
Social Networks 31 (2009) 190–203
Group betweenness and co-betweenness: Inter-related notions of coalition centrality☆
Eric D. Kolaczyk a,∗, David B. Chua b, Marc Barthélemy c,d
a Dept. of Mathematics and Statistics, Boston University, 111 Cummington Street, Boston, MA 02215, USA
b State Street Bank, Boston, MA, USA
c Commissariat à l’Energie Atomique, Centre d’Etudes de Bruyères Le Châtel, Bruyères Le Châtel, France
d Centre d’Analyse et Mathématique Sociales, École des Hautes Études en Sciences Sociales, Paris, France
Article info
Keywords: Centrality

Abstract
Vertex betweenness centrality is a metric that seeks to quantify a sense of the importance of a vertex in a
network in terms of its ‘control’ on the flow of information along geodesic paths throughout the network.
Two natural ways to extend vertex betweenness centrality to sets of vertices are (i) in terms of geodesic
paths that pass through at least one of the vertices in the set, and (ii) in terms of geodesic paths that pass
through all vertices in the set. The former was introduced by Everett and Borgatti [Everett, M., Borgatti,
S., 1999. The centrality of groups and classes. Journal of Mathematical Sociology 23 (3), 181–201], and
called group betweenness centrality. The latter, which we call co-betweenness centrality here, has not been
considered formally in the literature until now, to the best of our knowledge. In this paper, we show that
these two notions of centrality are in fact intimately related and, furthermore, that this relationship may be
exploited to obtain deeper insight into both. In particular, we provide an expansion for group betweenness
in terms of increasingly higher orders of co-betweenness, in a manner analogous to the Taylor series
expansion of a mathematical function in calculus. We then demonstrate the utility of this expansion by
using it to construct analytic lower and upper bounds for group betweenness that involve only simple
combinations of (i) the betweenness of individual vertices in the group, and (ii) the co-betweenness of
pairs of these vertices. Accordingly, we argue that the latter quantity, i.e., pairwise co-betweenness, is
itself a fundamental quantity of some independent interest, and we present a computationally efficient
algorithm for its calculation, which extends the algorithm of Brandes [Brandes, U., 2001. A faster algorithm
for betweenness centrality. Journal of Mathematical Sociology 25, 163] in a natural manner. Applications
are provided throughout, using a handful of different communication networks, which serve to illustrate
the way in which our mathematical contributions allow for insight to be gained into the interaction of
network structure, coalitions, and information flow in social networks.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
In social network analysis, the problem of determining the
importance of actors in a network has been studied for a long time
(see, for example, Wasserman and Faust, 1994). It is in this context
that the concept of the centrality of a vertex in a network emerged.
There are numerous measures that have been proposed to quantify
centrality, which differ both in the nature of the underlying notion
of vertex importance that they seek to capture and in the manner
in which that notion is encoded through some functional of the
network. See Borgatti and Everett (2006), for example, for a recent
review and categorization of centrality measures.
☆ Part of this work was supported by NSF grant CCR-0325701 and ONR awards N0001403-1-0043 and N00014-06-1-0096.
∗ Corresponding author.
E-mail address: [email protected] (E.D. Kolaczyk).
doi:10.1016/j.socnet.2009.02.003
Paths – as the routes by which flows (e.g., of information or
commodities) travel over a network – are fundamental to the functioning of many networks. Therefore, not surprisingly, a number
of centrality measures quantify importance with respect to the
sharing of paths in the network. One popular measure is betweenness centrality. First introduced in its modern form by Freeman
(1977), betweenness centrality is essentially a measure of how
many geodesic (i.e., ‘shortest’) paths pass through a given vertex. In
other words, in a social network for example, the betweenness centrality measures the extent to which an actor “lies between” other
actors in the network, with respect to the network path structure.
As such, it is a measure of the control that actor has over the flow
of information in the network.
Fig. 1. Graph representation of the physical topology of the Abilene network. Nodes represent regional network aggregation points (so-called ‘Points-of-Presence’ or PoPs),
and are labeled according to their metropolitan area, while the edges represent systems of optical transportation technologies and routing devices.

The standard betweenness centrality is defined with respect to
individual vertices. As a result, while this quantity can be used to
produce an ordering of the vertices in terms of their individual
importance, it is not clear a priori just how much insight it provides into the manner in which the vertices together exert influence
upon the network. Understanding behavior of this latter kind can
be important in presenting an appropriately more nuanced view of
the roles of the different vertices, beyond their individual importance, such as through their roles as members of potential ‘groups’
or ‘coalitions’. The interaction of network structure, information
flow, and the selection (if imposed) or formation (if autonomous)
of influential subsets of vertices is an area of substantial current
research interest in the overall network-oriented literature. Recent
work of this nature can be found on topics as diverse as the
spread of epidemics in population networks and rumors and information in social networks (e.g., Barrat et al., 2008, Chapters 9 & 10),
the effect of affiliation networks of interest groups on political processes (e.g., Dominguez, 2008), and the study of coalition formation
in multi-agent systems in economics and computer science (e.g.,
Merida-Campos and Willmott, 2007). Numerous additional references may be found in those just cited. The general concepts and
tools introduced in this paper are broadly relevant to work in this
area, in that they pertain to the important issue of how to quantify
and interpret betweenness centrality for collections of more than
one vertex.
There are two ways in which one might naturally extend vertex betweenness centrality to sets of vertices. The first is to define
the betweenness of a set in terms of geodesic paths that pass
through at least one of the vertices in the set, and the second, in
terms of geodesic paths that pass through all vertices in the set.
The former notion was introduced by Everett and Borgatti (1999),
and called group betweenness centrality. The latter, which we call
co-betweenness centrality in this paper, has not been considered
formally in the literature until now, to the best of our knowledge.
The first would arguably seem to be of more immediate interest
for applications (e.g., see Everett and Borgatti, 1999 for relevant
discussion). The primary contribution of this paper, however, is to
show that these two notions are in fact intimately related, and that
furthermore, this relationship provides interesting insight into the
nature of each. In particular, we develop a precise mathematical
characterization of this relationship and then use it to show how
the betweenness of a group of an arbitrary number of vertices can be
bounded, both above and below, by quantities involving only (i) the
betweenness of the individual vertices, and (ii) the co-betweenness
of pairs of these vertices. These bounds are found frequently to be
quite tight in the network datasets we examine and, in general,
their width can reveal information concerning higher-order aspects
of network path structure. We therefore argue that pairwise co-betweenness, which is critical to the construction of these bounds,
is itself a quantity of some fundamental interest, and we present an
algorithm for its efficient calculation across all pairs of vertices in a
network.
The organization of this paper is as follows. In Section 2,
we briefly review necessary notation and terminology, and then
illustrate the basic relationship between group betweenness and
co-betweenness centralities in the case of groups of m = 2 vertices. The general case of groups of m ≥ 2 vertices is addressed
in Section 3, wherein we provide an expression relating the two
notions of centrality, we develop our bounds, and we discuss some
of the implications of these bounds. The computation of pairwise
co-betweenness values is discussed in Section 4, where we sketch
our proposed algorithm. The concepts introduced throughout Sections 2 and 3 are motivated and illustrated in the context of an
Internet communication network. In Section 5, we provide further
illustration using two social networks. Some additional discussion is provided in Section 6. Finally, a formal description of our
algorithm for computation of pairwise co-betweenness, as well as
pseudo-code, may be found in Appendix A.
2. Preliminary material and results
2.1. Background
Let G = (V, E) denote an undirected graph with $n_v$ vertices in
V and $n_e$ edges in E. For convenience, and without loss of generality, we will assume G to be connected. Recall that a path on G,
from a vertex $v_0$ to another vertex $v_\ell$, is an alternating sequence
$\{v_0, e_1, v_1, \ldots, v_{\ell-1}, e_\ell, v_\ell\}$ of vertices and edges, where the endpoints of $e_i$ are $\{v_{i-1}, v_i\}$, such that no edges or vertices are repeated.
The length of this path is said to be $\ell$. A geodesic path (also often
called a ‘shortest’ path) between two vertices u, v ∈ V is a path
whose length is a minimum, among all paths between u and v.
The length of a geodesic path between two vertices is called their
geodesic distance. In the case that the graph G is weighted, i.e., there
is a collection of edge weights $\{w_e\}_{e \in E}$, where $w_e \ge 0$, geodesic
paths may be instead defined as paths for which the total sum of
edge weights is a minimum. In this paper, we will restrict our exposition primarily to the case of unweighted graphs, but extensions
to weighted graphs are straightforward. For additional background
of this type, see, for example, the textbook Clark and Holton (1991).
Let $\sigma_{st}$ denote the total number of geodesic paths that connect
vertices s and t (with $\sigma_{ss} \equiv 1$). Similarly, let $\sigma_{st}(v)$ denote the number
of geodesic paths between s and t that also pass through vertex v,
in the sense that v is an interior vertex on the path. Betweenness
centrality of a vertex v is defined as a weighted sum of the number
of geodesic paths through v,

$$B(v) = \sum_{s,t \in V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{1}$$

Note that this definition excludes the geodesic paths that start
or end at v. However, in a connected graph we will have $\sigma_{st}(v) = \sigma_{st}$
whenever s = v or t = v, so the exclusion amounts to removing a
constant term that would otherwise be present in the betweenness
centrality of every vertex. Sometimes the betweenness B(v) is normalized, in the form $\tilde{B}(v) = 2B(v)/[(n_v - 1)(n_v - 2)]$, so as to restrict
its range to between 0 and 1.
As an illustration, which we will use throughout this and the next
section, consider the network in Fig. 1. This is the Abilene network,
an Internet network that is part of the Internet2 project,¹ a research
project devoted to the development of the ‘next generation’ Internet. It serves as a so-called ‘backbone’ network for universities and
research labs across the United States, in a manner analogous to
the federal highway system of roads. The information traversing
this network takes the form of so-called ‘packets’, and the packets
flow between origins and destinations on this network along paths
strictly determined according to a set of underlying routing protocols. Due to the nature of these protocols, it is common to assume,
as a first approximation, that (i) information flows in this network
with respect to a set of geodesic paths and (ii) there is exactly one
geodesic path for each vertex pair.²
The vertices in Fig. 1 correspond to metropolitan regions, and
have been laid out roughly with respect to their true geographical locations. Note that the latter of the two assumptions above
implies that the betweenness B(v) of any given vertex v ∈ V will be
exactly equal to the number of geodesic paths through v. We will
find this fact convenient for the purposes of illustration, although
it is not necessary for (nor utilized in) our general development.
Intuitively, and according to earlier work on centrality in spatial
networks (Barrat et al., 2005), one might suspect that vertices near
the central portion of the network, such as Kansas City or Indianapolis, have larger betweenness, being likely forced to support most of
the flows of communication between east and west. Examination
of the underlying routing information and the paths induced by this
information show this to be the case.
Until recently, standard algorithms for computing betweenness
centralities B(v) for all vertices in a network had $O(n_v^3)$ running
times, which was a stumbling block to their application in large-scale network analyses. Faster algorithms now exist, such as those
introduced in Brandes (2001), which have running time of $O(n_v n_e)$
on unweighted networks and $O(n_v n_e + n_v^2 \log n_v)$ on weighted networks, with an $O(n_v + n_e)$ space requirement. These improvements
derive from exploiting a clever recursive relation for the partial
sums $\sum_{t \in V} \sigma_{st}(v)/\sigma_{st}$. We make use of similar techniques in the
development of our own algorithms here.

¹ http://www.internet2.edu.
² Technically, the Abilene network is more accurately described by a directed
graph. But, given the fact that routing is typically symmetric in this network, we
follow the Internet2 convention of displaying Abilene using an undirected graph.
In addition, although the uniqueness of geodesic paths in this network necessarily
implies that it is actually a weighted graph, we will not emphasize this fact here.

2.2. Betweenness and co-betweenness for pairs of vertices
We motivate our study in this paper of higher-order notions of
betweenness by first examining in some detail the case of m = 2
vertices. For two vertices u, v ∈ V, their individual betweenness values quantify the extent to which each is passed through by geodesic
paths in G. Similarly, the group betweenness of the pair quantifies
the extent to which either is passed through. In general, however,
this latter quantity will not necessarily be equal simply to the sum
of the former two quantities, as this sum will over-count geodesic
paths that pass through both vertices. Some correction is therefore
necessary in relating the betweennesses of two vertices to their
combined group betweenness, as we describe next. Note, however,
that rather than individual vertex betweennesses B(u) and B(v), we
will instead use slightly modified versions of these quantities, for
reasons that will become immediately apparent.

Table 1
Pairs of vertices in Abilene with the top 20 betweenness values B(u,v).

Rank  Vertex pair u, v            C(u, u)  C(v, v)  C(u, v)  B(u, v)  B̃(u, v)
1     Indianapolis/Houston        38       4        0        42       0.583
2     Kansas City/Atlanta         36       6        0        42       0.583
3     Indianapolis/Los Angeles    36       4        0        40       0.556
4     Kansas City/Houston         36       4        0        40       0.556
5     Kansas City/Los Angeles     36       4        0        40       0.556
6     Kansas City/Washington      36       4        0        40       0.556
7     Indianapolis/Sunnyvale      32       12       4        40       0.556
8     Kansas City/New York        34       10       6        38       0.528
9     Kansas City/Sunnyvale       32       12       6        38       0.528
10    Kansas City/Chicago         32       18       14       36       0.500
11    Indianapolis/Washington     32       4        0        36       0.500
12    Indianapolis/Atlanta        30       6        0        36       0.500
13    Indianapolis/Kansas City    32       32       30       34       0.472
14    Indianapolis/Denver         32       10       8        34       0.472
15    Indianapolis/Seattle        32       0        0        32       0.444
16    Kansas City/Seattle         32       0        0        32       0.444
17    Indianapolis/New York       30       10       8        32       0.444
18    Kansas City/Denver          30       10       10       30       0.417
19    Chicago/Atlanta             22       6        0        28       0.389
20    Chicago/Sunnyvale           18       12       2        28       0.389
Formally, we express the group betweenness of u and v as

$$B(u, v) = C_{\{u,v\}}(u, u) + C_{\{u,v\}}(v, v) - C_{\{u,v\}}(u, v), \tag{2}$$

where

$$C_{\{u,v\}}(i_1, i_2) = \sum_{s,t \in V \setminus \{u,v\}} \frac{\sigma_{st}(i_1, i_2)}{\sigma_{st}}, \tag{3}$$

for $i_1, i_2 \in \{u, v\}$, and $\sigma_{st}(i_1, i_2)$ is the number of geodesic paths
between vertices s and t that pass through both $i_1$ and $i_2$. Defined
in analogy to (1), we call the quantity in (3) the co-betweenness of
$i_1$ and $i_2$, with respect to {u, v}. The subscript {u, v} indicates that
only paths between vertices s, t ∈ V \ {u, v} are included in the sum,
while the argument $(i_1, i_2)$ indicates that we are counting paths
passing through both $i_1$ and $i_2$. Although somewhat redundant here,
the purpose of this convention will become clear below, where we
generalize to m ≥ 2. When the context allows, we will sometimes
abbreviate the quantity in (3) as $C(i_1, i_2)$.
Eq. (2) is just a re-expression of the group betweenness centrality defined in Everett and Borgatti (1999), for a group of size m = 2.
Following these authors, we also define the normalized form of this
measure as $\tilde{B}(u, v) = 2B(u, v)/[(n_v - 2)(n_v - 3)]$. Each of the three
components in (2) can be seen as having an important role to play
in defining the group betweenness of the pair u, v. However, note
that the quantities $C_{\{u,v\}}(u, u)$ and $C_{\{u,v\}}(v, v)$ are not actually equal
to the betweennesses B(u) and B(v), since the latter include paths
with one end at v or u, respectively, while the former do not. Nevertheless, with a slight abuse of terminology, we will refer to both the former
and the latter quantities as vertex betweenness centralities.
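As a minimal sketch of how the three components of Eq. (2) fit together, the following Python snippet enumerates geodesics by brute force on a small hypothetical graph and evaluates the co-betweenness in (3). The graph, the function names, and the choice to sum over unordered pairs s < t are assumptions made purely for illustration (summing over ordered pairs would double every value); this is not the algorithm of Section 4.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical six-vertex graph (not from the paper). Vertices 1 and 3 are
# joined by two geodesics, one through vertex 2 and one through vertex 5.
ADJ = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [1, 3]}


def geodesics(adj, s, t):
    """Return all shortest s-t paths (as tuples) in a connected, unweighted graph."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], preds[y] = dist[x] + 1, [x]
                queue.append(y)
            elif dist[y] == dist[x] + 1:
                preds[y].append(x)

    def back(v):
        # Walk predecessor lists from v back to s, collecting every path.
        if v == s:
            return [(s,)]
        return [p + (v,) for u in preds[v] for p in back(u)]

    return back(t)


def co_betweenness(adj, through, excluded):
    """Eq. (3): sum over pairs s < t outside `excluded` of the fraction of
    s-t geodesics that contain every vertex in `through`."""
    outside = [x for x in sorted(adj) if x not in excluded]
    total = Fraction(0)
    for s, t in combinations(outside, 2):
        paths = geodesics(adj, s, t)
        hits = sum(1 for p in paths if set(through) <= set(p))
        total += Fraction(hits, len(paths))
    return total


def group_betweenness_pair(adj, u, v):
    """Eq. (2), assembled from its three components."""
    pair = {u, v}
    return (co_betweenness(adj, {u}, pair)
            + co_betweenness(adj, {v}, pair)
            - co_betweenness(adj, pair, pair))


b1 = co_betweenness(ADJ, {1}, {1})          # vertex betweenness B(1), Eq. (1)
c11 = co_betweenness(ADJ, {1}, {1, 3})      # C_{1,3}(1, 1)
c13 = co_betweenness(ADJ, {1, 3}, {1, 3})   # C_{1,3}(1, 3)
b13 = group_betweenness_pair(ADJ, 1, 3)     # B(1, 3)
print(b1, c11, c13, b13)
```

On this toy graph B(1) = 9/2 while C_{1,3}(1, 1) = 7/2; the difference of 1 is the contribution of the pair (0, 3), whose geodesics end at vertex 3, which is exactly the distinction between the two quantities drawn in the text.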
To illustrate the role of these various components, consider
Table 1, in which we list the pairs of vertices u, v in the Abilene
network with the top twenty betweenness values B(u, v). We see
that the group betweenness is largest for pairs of a particular nature,
involving one of the vertices central to the northern east-west route
across the United States and one of the vertices along the southern east-west route. In particular, the six largest values of B(u, v) all
involve combinations of either Indianapolis or Kansas City with one
of Washington, Atlanta, Houston, or Los Angeles. Further examination of Table 1 reveals that in fact all but the last two entries involve
either Indianapolis or Kansas City (or, in one instance, both). These
two vertices have very large vertex betweenness, and clearly this is
the main factor in the high group betweenness for these first eighteen entries. In fact, pairing either of these two vertices with Seattle,
which is essentially peripheral to the network, with no betweenness itself with respect to the underlying routing protocols, is still
sufficient to achieve a high second-order group betweenness. When
we first see both Indianapolis and Kansas City absent, in the nineteenth and twentieth entries of the table, the nearby Chicago vertex
has assumed a similar role.

Fig. 2. Histogram of second-order group betweenness values B(u, v) for all pairs of
vertices in the Abilene network.

It is also informative to examine the full set of group betweenness values B(u, v), which we show in Fig. 2, in the form of a
histogram for all 55 distinct pairs of vertices in Abilene. It is evident
that there are three small clusters of values, in the high, medium,
and low ranges, against an otherwise fairly uniform background.
The high values correspond to the first ten or so values in Table 1
and, as we have just observed, are driven by the inclusion of either
Indianapolis or Kansas City. The medium values exclude these two
vertices, and instead tend to include Denver, Houston, and New
York, either together or with some of the vertices on the southern
east-west route. The low values involve only these vertices on the
southern east-west route, with Seattle as well, in some cases.
Now consider the co-betweennesses C(u, v), which are shown in
Table 1 as well. We have also displayed the co-betweenness values
visually in Fig. 3 using a graph, where each vertex v is again placed
roughly with respect to its actual geographic location, but is now
drawn in proportion to its betweenness B(v). Edges between pairs
of vertices u, v now represent non-zero co-betweenness C(u, v) for
the pair, and are drawn with a thickness in proportion to their value.
A number of interesting features are evident from this graph. First,
we see that, as was noted earlier, the more centrally located vertices tend to have the largest betweenness values. Second, it is these
vertices that typically are involved with the larger co-betweenness
values. Since the paths going through both a vertex u and a vertex v
are a subset of the paths going through either one or the other, this
tendency for large co-betweenness to associate with large betweenness is not a surprise. Third, the co-betweenness values tend to
be smaller between vertices separated by a larger geographical
distance, which again seems intuitive. Somewhat more surprising
perhaps, however, is the manner in which the network becomes
disconnected. The Seattle vertex is now isolated, as the underlying
routing protocols send no paths through that vertex, only to and
from. Additionally, the vertices Houston, Atlanta, and Washington
now form a separate component in this graph, indicating that information is routed on paths passing through both the first two and the
last two, but not through all three, and also not through any of these
and some other vertex. This observation suggests that these three
vertices, as a group, are somewhat more marginal in the network,
with respect to the flow of information.
Fig. 3. Graph representation of the betweenness and co-betweenness values for the Abilene network. Vertices are in proportion to their betweenness. The width of each link
is drawn in proportion to the co-betweenness of the two vertices incident to it.

3. Betweenness for sets of vertices
In this section we present our main results on group betweenness and co-betweenness, for sets of vertices of arbitrary size m ≥ 2.
We first develop an expansion for group betweenness, generalizing
the expression in (2), in terms of co-betweenness values of increasing orders. Based on this expansion, we then construct a set of lower
and upper bounds for group betweenness involving only vertex
betweenness and pairwise co-betweenness.
3.1. An expansion for group betweenness
Our expression for group betweenness in (2), for the case of
m = 2 vertices, explicitly incorporates the counting principle of
inclusion–exclusion. Seen from this perspective, $C_{\{u,v\}}(u, v)$ is the
key new quantity, where its importance is in correcting for double-counting in the vertex betweennesses $C_{\{u,v\}}(u, u)$ and $C_{\{u,v\}}(v, v)$
of paths that pass through both u and v. The same principle of
inclusion–exclusion can be used to produce an analogous expression for the betweenness centrality of subsets A ⊂ V of an arbitrary
number m = |A| of vertices.
To see this, following Everett and Borgatti (1999), we first define
the group betweenness of the set of vertices A as

$$B(A) = \sum_{s,t \notin A:\, s \neq t} \frac{\sigma^*_{st}(A)}{\sigma_{st}}, \tag{4}$$

where $\sigma^*_{st}(A)$ is the number of geodesic paths between s and t that
pass through at least one of the vertices in A. The normalized version
of B(A) is $\tilde{B}(A) = 2B(A)/[(n_v - m)(n_v - m - 1)]$. Next, we note that we
can express $\sigma^*_{st}(A)$ in the form

$$\sigma^*_{st}(A) = \sum_{j=1}^{m} (-1)^{j-1} \sum_{\mathbf{i}_j \subseteq A} \sigma_{st}(\mathbf{i}_j), \tag{5}$$

where $\mathbf{i}_j = \{i_1, \ldots, i_j\}$ denotes a subset of j vertices in A and $\sigma_{st}(\mathbf{i}_j)$ is
the number of geodesic paths between s and t that pass through
all of the vertices in $\mathbf{i}_j$. This expression is simply the result of
applying the inclusion–exclusion principle in a standard fashion.³
Finally, we extend our notion of co-betweenness to more than two
vertices. Specifically, for a subset $\mathbf{i}_j \subseteq A$, we define the j-th order
co-betweenness of $\mathbf{i}_j$, with respect to A, as

$$C_A(\mathbf{i}_j) = \sum_{s,t \notin A:\, s \neq t} \frac{\sigma_{st}(\mathbf{i}_j)}{\sigma_{st}}. \tag{6}$$

This value captures the number of geodesic paths in the network
that pass through all of the j vertices in $\mathbf{i}_j \subseteq A$. Note that, with respect
to this notation, the expression in (6) reduces to that in (3) when
A = {u, v}, j = 2, and $\mathbf{i}_2 = \{u, v\}$.
Now, given the definitions above, the expression for group
betweenness in (4) may be written alternatively as

$$B(A) = \sum_{j=1}^{m} (-1)^{j-1} \sum_{\mathbf{i}_j \subseteq A} C_A(\mathbf{i}_j). \tag{7}$$

That is, the group betweenness B(A) can be re-expressed in an
inclusion–exclusion manner, with respect to terms of increasingly
higher orders of co-betweenness among the elements of A. Note
that this formulation of B(A) reduces to that in (2), when A = {u, v}.
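As an illustration of the expansion in (7), consider a hypothetical group A = {a, b, c} of m = 3 vertices (the labels a, b, c are ours, introduced only for this example). Written out term by term, the expansion would read:

$$B(A) = \bigl[\, C_A(a) + C_A(b) + C_A(c) \,\bigr] - \bigl[\, C_A(a,b) + C_A(a,c) + C_A(b,c) \,\bigr] + C_A(a,b,c).$$

The first bracket collects the $\binom{3}{1} = 3$ first-order terms, the second the $\binom{3}{2} = 3$ pairwise terms, and the last term is the single third-order co-betweenness, with signs alternating as $(-1)^{j-1}$.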
³ The inclusion–exclusion principle states that the total number of elements in
the union of a finite number of sets of finite cardinality may be enumerated as a
summation of terms with alternating signs. The first term is obtained by adding (i.e.,
inclusion) the cardinalities of the sets. The second term is a correction to the first term
for ‘double counting’ of elements shared by pairs of sets, by appropriate subtraction
(i.e., exclusion). The third term is a further correction for excessive subtraction in the
second step, adding back any elements that were subtracted out more times than
necessary due to their being shared by triples of sets. And so on and so forth.

Formula (7) provides us with a type of expansion for group
betweenness, similar in spirit to a Taylor series expansion for a
mathematical function in calculus. This perspective in turn suggests
the potential for and usefulness of studying group betweenness
using principles of, for example, truncation and approximation. For
example, for an arbitrary group A, we might ask what the relative
contributions are of co-betweenness values at increasingly higher
orders. In the case m = 2, for the Abilene network, we saw that
the vertex betweenness (i.e., first-order co-betweenness under our
notational convention) plays a primary role, and the pairwise co-betweenness, a secondary and corrective role. In general, for m ≥ 2,
since there are $\binom{m}{j}$ terms in (7) involving subsets of j vertices in A,
clearly for many j there are a substantial number of terms. But if the
magnitude of these individual terms is small enough, their overall
contribution may still end up being small. Note, for example, as an
extreme case, that $C_A(\mathbf{i}_j) = 0$ for all $j > \ell_{\max}$, where $\ell_{\max}$ is the length
of the longest geodesic path in G (i.e., the so-called diameter of G).
Since $\ell_{\max} = O(\log n_v)$ in many networks (e.g., networks possessing
the ‘small-world’ property), this fact suggests that the number of
relevant orders in the expansion (7) may grow quite slowly with
the number of vertices.
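To make the role of the individual orders concrete, the sketch below computes the per-order sums of co-betweenness appearing in (7) by brute force and compares the alternating sum with the direct definition (4). The six-vertex graph, the function names, and the convention of summing over unordered pairs s < t are assumptions for illustration only; the exhaustive path enumeration is exponential in general and is not the method proposed in the paper.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical graph (not from the paper); its diameter is 4.
ADJ = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [1, 3]}


def geodesics(adj, s, t):
    """Return all shortest s-t paths (as tuples) in a connected, unweighted graph."""
    dist, preds, queue = {s: 0}, {s: []}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], preds[y] = dist[x] + 1, [x]
                queue.append(y)
            elif dist[y] == dist[x] + 1:
                preds[y].append(x)

    def back(v):
        if v == s:
            return [(s,)]
        return [p + (v,) for u in preds[v] for p in back(u)]

    return back(t)


def order_terms(adj, group):
    """Return [T_1, ..., T_m], where T_j sums C_A(i_j) of Eq. (6) over all
    size-j subsets i_j of the group A."""
    group = sorted(group)
    outside = [x for x in sorted(adj) if x not in group]
    all_paths = [geodesics(adj, s, t) for s, t in combinations(outside, 2)]
    terms = []
    for j in range(1, len(group) + 1):
        total = Fraction(0)
        for subset in combinations(group, j):
            for paths in all_paths:
                hits = sum(1 for p in paths if set(subset) <= set(p))
                total += Fraction(hits, len(paths))
        terms.append(total)
    return terms


def group_betweenness_direct(adj, group):
    """Eq. (4): fraction of geodesics meeting at least one vertex of A."""
    group = set(group)
    outside = [x for x in sorted(adj) if x not in group]
    total = Fraction(0)
    for s, t in combinations(outside, 2):
        paths = geodesics(adj, s, t)
        total += Fraction(sum(1 for p in paths if group & set(p)), len(paths))
    return total


A = [1, 2, 3, 5]
terms = order_terms(ADJ, A)
# enumerate starts at 0, so (-1) ** k is the sign (-1)^(j-1) for order j = k + 1.
expansion = sum((-1) ** k * term for k, term in enumerate(terms))
direct = group_betweenness_direct(ADJ, A)
print(terms, expansion, direct)
```

For this group the per-order sums are 3, 3, 1 and 0, so the fourth-order term vanishes and the alternating sum 3 − 3 + 1 − 0 = 1 agrees with the direct evaluation of (4).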
In fact, we show next that it is possible to produce useful bounds
for the group betweenness B(A) involving only vertex betweenness
and pairwise co-betweenness, i.e., involving only terms of first and
second order in (7). Furthermore, in Section 4, we present an algorithm for the efficient numerical computation of these quantities.
Specifically, our algorithm allows for the computation of second-order co-betweenness C(u, v) for all (u, v) ∈ V × V in at worst $O(n_v^3)$
time, and on sparse graphs we have witnessed typical run times
much closer to $O(n_v^2)$ in practice.⁴ Taken together, these two contributions allow for a novel characterization of B(A), in terms of its
lower- and higher-order components, in a computationally efficient
manner.
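For readers who want to experiment before reaching Section 4, the following is a naive baseline for pairwise co-betweenness on an unweighted, connected graph. It is explicitly not the authors' algorithm: it relies on the elementary fact that u precedes v on some s-t geodesic exactly when d(s,u) + d(u,v) + d(v,t) = d(s,t), in which case the number of such geodesics is the product of the three path counts. With one breadth-first search per vertex followed by a loop over all (u, v) and (s, t), it runs in roughly quartic time. The graph, function names, and the unordered-pair convention are assumptions.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

# Hypothetical six-vertex graph used only for this sketch.
ADJ = {0: [1], 1: [0, 2, 5], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [1, 3]}


def bfs_counts(adj, s):
    """Distances from s and the number of geodesics from s to each vertex."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], sigma[y] = dist[x] + 1, 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def pairwise_co_betweenness(adj):
    """C(u, v) of Eq. (3) for every unordered pair {u, v}, summing over
    unordered pairs s < t outside {u, v}."""
    nodes = sorted(adj)
    dist, sigma = {}, {}
    for s in nodes:
        dist[s], sigma[s] = bfs_counts(adj, s)
    result = {}
    for u, v in combinations(nodes, 2):
        others = [x for x in nodes if x not in (u, v)]
        total = Fraction(0)
        for s, t in combinations(others, 2):
            count = 0
            # At most one of the two visiting orders can lie on a geodesic.
            for a, b in ((u, v), (v, u)):
                if dist[s][a] + dist[a][b] + dist[b][t] == dist[s][t]:
                    count += sigma[s][a] * sigma[a][b] * sigma[b][t]
            total += Fraction(count, sigma[s][t])
        result[(u, v)] = total
    return result


C = pairwise_co_betweenness(ADJ)
print(C[(1, 3)], C[(1, 2)], C[(2, 5)])
```

A baseline of this kind is mainly useful as a correctness check on small graphs against a faster implementation.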
3.2. Lower and upper bounds for group betweenness
The bounds we present here are derived using Bonferroni-like
inequalities (e.g., Galambos and Simonelli, 1996), such as underlie
so-called ‘Bonferroni corrections’ used in statistics, which allow for
the calibration of a collection of statistical tests of, say, m hypotheses
$H_1, \ldots, H_m$, so that the probability of falsely rejecting any of these
hypotheses is controlled at a certain level. The most familiar version
of these types of corrections is based upon a bound of the form

$$\Pr\Bigl(\bigcup_{i=1}^{m} E_{H_i}\Bigr) \le \sum_{i=1}^{m} \Pr(E_{H_i}), \tag{8}$$

where $E_{H_i}$ denotes the event that $H_i$ is falsely rejected. This bound
derives from a truncation of an exact expression for the probability on the left-hand side, and this expression is identical in form to
that in (7), but with the components $C_A(\mathbf{i}_j)$ replaced by probabilities
$\Pr(E_{H_{i_1}} \cap \cdots \cap E_{H_{i_j}})$. Although used less commonly, various other
bounds (both lower and upper) have been obtained by more subtle truncations incorporating higher-order terms, with truncations
up to second-order being most common.
⁴ This algorithm was used to produce all of the numerical output for the examples
presented in the previous section, in the context of the Abilene network.

The same ideas may be used to bound B(A). Specifically, note
that the ratio $\sigma^*_{st}(A)/\sigma_{st}$ may be interpreted as the probability that
a randomly selected geodesic path between s and t passes through
at least one vertex in A. Since this is the probability of a union of
events, it can be bounded above as in (8), where the probabilities
on the right-hand side are of the form $\sigma_{st}(i)/\sigma_{st}$, for i ∈ A. That is,

$$\frac{\sigma^*_{st}(A)}{\sigma_{st}} \le \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}}. \tag{9}$$

Furthermore, a standard extension of (8) similarly yields that
$\sigma^*_{st}(A)/\sigma_{st}$ can be bounded below by the right-hand side in (9),
less a summation of all probabilities of the form $\sigma_{st}(i_1, i_2)/\sigma_{st}$, for
$i_1, i_2 \in A$. Applying this idea for each pair (s, t), summing over all
pairs, and collecting terms appropriately, yields the bounds

$$\sum_{i \in A} C_A(i) - \sum_{(i_1, i_2) \in A:\, i_1 < i_2} C_A(i_1, i_2) \le B(A) \le \sum_{i \in A} C_A(i). \tag{10}$$
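The step from the per-pair inequalities to (10) can be spelled out as follows; this is our own restatement of the argument above, using only the definitions in (4) and (6). For each fixed pair (s, t) with s, t ∉ A, the first- and second-order truncations give

$$\sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}} - \sum_{i_1 < i_2} \frac{\sigma_{st}(i_1, i_2)}{\sigma_{st}} \;\le\; \frac{\sigma^*_{st}(A)}{\sigma_{st}} \;\le\; \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}}.$$

Summing all three expressions over s, t ∉ A with s ≠ t, and exchanging the order of summation, turns the middle term into B(A) by (4), while each inner sum becomes a co-betweenness by (6):

$$\sum_{s,t \notin A:\, s \neq t} \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}} = \sum_{i \in A} C_A(i), \qquad \sum_{s,t \notin A:\, s \neq t} \sum_{i_1 < i_2} \frac{\sigma_{st}(i_1, i_2)}{\sigma_{st}} = \sum_{i_1 < i_2} C_A(i_1, i_2).$$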
Now, generally speaking, the lower bound in (10) can be
expected to be somewhat reasonable in practice. But the upper
bound will likely be rather rough; this is certainly the case for the
analogous inequality (8) used in statistics. However, techniques for
deriving improved Bonferroni inequalities can be brought to bear
here, using precisely the same logic as above. For example, direct
application of Corollary 1 of Worsley (1982) yields that

$$\frac{\sigma^*_{st}(A)}{\sigma_{st}} \le \sum_{i \in A} \frac{\sigma_{st}(i)}{\sigma_{st}} - \sum_{j=1}^{m-1} \frac{\sigma_{st}(i_j, i_{j+1})}{\sigma_{st}}, \tag{11}$$

where $i_1, \ldots, i_m$ is a given ordering of the vertices in A. Again,
applying this bound to each pair (s, t), summing over all pairs, and
collecting terms appropriately, yields the alternative upper bound

$$B(A) \le \sum_{i \in A} C_A(i) - \sum_{j=1}^{m-1} C_A(i_j, i_{j+1}). \tag{12}$$

The bound in (12) differs from the upper bound in (10) by
a correction factor composed of a certain subset of pairwise
co-betweenness values. Coupled with the lower bound in (10),
which involves a correction composed of all possible pairwise co-betweennesses, we have a pair of lower and upper bounds for the
group betweenness B(A) that can be trivially computed in $O(m^2)$ and
$O(m)$ operations, respectively, given the values $C_A(i)$ and $C_A(i_1, i_2)$.
And these latter values can all be computed using a minor modification of the algorithm for computing co-betweennesses C(u, v)
that we mentioned earlier, and describe below in Section 4.
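Given the first- and second-order values, the bounds in (10) and (12) reduce to a few sums. The sketch below assumes hypothetical values of $C_A(i)$ and $C_A(i_1, i_2)$ for a three-vertex group A = {1, 2, 3}; the numbers, the assumed exact value B(A) = 3, and the function names are all illustrative and do not come from the paper's data.

```python
from fractions import Fraction

# Illustrative (hypothetical) inputs for a group A = {1, 2, 3}.
C1 = {1: Fraction(2), 2: Fraction(1, 2), 3: Fraction(2)}      # C_A(i)
C2 = {frozenset({1, 2}): Fraction(1, 2),                       # C_A(i1, i2)
      frozenset({1, 3}): Fraction(1),
      frozenset({2, 3}): Fraction(1, 2)}
B_EXACT = Fraction(3)  # assumed exact group betweenness for these toy values


def lower_bound(c1, c2):
    """Lower bound in (10): all first-order terms minus all pairwise terms."""
    return sum(c1.values()) - sum(c2.values())


def upper_bound_first_order(c1):
    """Upper bound in (10): first-order terms only."""
    return sum(c1.values())


def upper_bound_ordered(c1, c2, ordering):
    """Upper bound (12): subtract only pairs adjacent in the given ordering."""
    consecutive = sum(c2[frozenset(p)] for p in zip(ordering, ordering[1:]))
    return sum(c1.values()) - consecutive


lo = lower_bound(C1, C2)
hi10 = upper_bound_first_order(C1)
hi12_a = upper_bound_ordered(C1, C2, (1, 2, 3))
hi12_b = upper_bound_ordered(C1, C2, (1, 3, 2))
print(lo, hi10, hi12_a, hi12_b)
```

With these inputs the lower bound is 5/2 and the first-order upper bound is 9/2, while (12) gives 7/2 for the ordering (1, 2, 3) and 3 for the ordering (1, 3, 2), showing how the choice of ordering affects the tightness of the upper bound.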
A simple measure of the accuracy of our bounds is provided
by their difference, which is the width of the interval they form.
To express this width succinctly, we define in association with the
set A a graph HA = (VA , EA ), where VA = A and EA contains an edge
between i1 , i2 ∈ VA if and only if CA (i1 , i2 ) =
/ 0. Note that HA is just
a sub-graph of the type of overall co-betweenness graph we introduced in Section 2.2, in Fig. 3, within the context of our discussion
of the Abilene network. In particular, it is the sub-graph induced
by the m vertices in A. Let Ebnd
⊆ EA be those edges for which terms
A
CA (ij , ij+1 ) were used in the upper bound (12). Then the width of the
interval formed by this bound and the lower bound in (10) is given
by
$$W = \sum_{(i_1, i_2) \in E_A \setminus E_A^{\mathrm{bnd}}} C_A(i_1, i_2). \tag{13}$$
In other words, the width is determined by the co-betweenness
values for those pairs of vertices not used in constructing the upper
bound.
Hence, an interesting question is how best to select the vertex pairs $\{i_1, i_2\} \in E_A^{\mathrm{bnd}}$. Intuitively, $E_A^{\mathrm{bnd}}$ should involve pairs with
large co-betweenness. However, recall from the construction of our
bounds that $E_A^{\mathrm{bnd}}$ also must involve only pairs adjacent to each other
under some ordering $i_1, \ldots, i_m$ of the $m$ vertices in $A$, and therefore not all possible combinations of pairs $\{i_1, i_2\}$ are available to
us. For small m, we can in principle create a list of all such orderings, evaluate the width (13) corresponding to each, and select
that which yields a minimum width. For example, if m = 3 and
A = {1, 2, 3}, there are only three unique orderings to consider. In
general, however, there will be m!/2 unique orderings (i.e., since
the left-right direction of the list is unimportant upon summation),
growing quickly in m. In fact, formally, the problem of minimizing
the width (13) is equivalent to that of finding the longest path in
$H_A$ or, if $H_A$ is not connected, the union of longest paths on the
component sub-graphs, where the length of each edge $\{i_1, i_2\} \in E_A$
is $C_A(i_1, i_2)$. This problem is known to be NP-hard, since the
Hamiltonian path problem is a special case. Algorithms for producing approximate solutions of this problem exist, although they vary
in their accuracy, particularly in connection with the density of $H_A$,
with sparse graphs being more difficult. See Karger et al. (1997).
We have not explored the issue of finding a good approximation
algorithm for when m is large.

Table 2
Upper and lower bounds on the normalized betweenness B̃(u, v, w) for the best
triple (u, v, w) obtained by joining one additional vertex to each of the first, tenth,
and twentieth best pairs (u, v) in Table 1.

Vertex triple (u, v, w)              Lower bound    Upper bound
Indianapolis/Houston/Sunnyvale       0.714          0.714
Kansas City/Chicago/Atlanta          0.643          0.643
Kansas City/Chicago/Houston          0.643          0.643
Kansas City/Chicago/Los Angeles      0.643          0.643
Chicago/Sunnyvale/Atlanta            0.607          0.607
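For small m, the exhaustive search over orderings described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `best_ordering`, the dictionary layout of `pair`, and the numerical values are hypothetical, and the brute-force loop visits all m! permutations (each ordering and its reverse give the same sum), so it is practical only for very small groups.

```python
from itertools import combinations, permutations


def best_ordering(vertices, pair):
    """Exhaustively find the ordering of `vertices` that minimizes the width (13).

    pair maps frozenset({i, j}) to C_A(i, j); missing pairs count as 0.
    Returns (ordering, adjacent_sum, width).
    """
    def c(i, j):
        return pair.get(frozenset((i, j)), 0.0)

    total = sum(c(i, j) for i, j in combinations(vertices, 2))
    best, best_sum = None, float("-inf")
    for perm in permutations(vertices):
        # Sum of co-betweennesses for pairs adjacent in this ordering,
        # i.e. the length of the corresponding path in H_A.
        s = sum(c(i, j) for i, j in zip(perm, perm[1:]))
        if s > best_sum:
            best, best_sum = perm, s
    # Width = co-betweenness of the pairs not used in the upper bound.
    return best, best_sum, total - best_sum


# Hypothetical pairwise co-betweenness values for A = {a, b, c}.
pair = {frozenset("ab"): 2.0, frozenset("bc"): 1.0, frozenset("ac"): 0.5}
ordering, adjacent_sum, width = best_ordering(["a", "b", "c"], pair)
```

Maximizing the adjacent sum and minimizing the width are the same problem, since the two quantities add up to the fixed total over all pairs.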
As a final note, we point out that it is possible to have one or both
of our lower and upper bounds actually achieve the value B(A). For
example, if CA (ij ) = 0 for j ≥ 3, then it follows trivially, by (7) and
(10), that B(A) and the lower bound will be equal. In our numerical work we have also encountered cases where the upper bound
equals B(A); in fact, this situation occurred quite frequently. Furthermore, in many cases we found that the upper and lower bounds
were equal, indicating that not only were the co-betweenness terms
involving three or more vertices all zero, but also that apparently
the co-betweenness terms for all pairs {i1 , i2 } ∈ EA not used in the
upper bound were zero, since the sum in (13) was zero.
3.3. Illustration: Abilene
In order to illustrate the nature and utility of the results developed in this section, we examine groups A of size m = 3 in the
Abilene network of Fig. 1. Consider the pairs of vertices with the
first, tenth, and twentieth highest betweenness ranking in Table 1.
Suppose that for each pair u, v we wish to add one additional vertex w, chosen so as to maximize the overall ‘control’ of the three
vertices over traffic in the network, i.e., to maximize B(u, v, w). The
results are shown in Table 2, presented in terms of the normalized
betweenness B̃(u, v, w). Here, since there are only three vertices
involved, the second term in the upper bound in (12) was chosen in the form CA (i1 , i2 ) + CA (i2 , i3 ), where i1 , i2 , and i3 are that
permutation of A = {u, v, w} which maximizes this sum.
Two points are interesting to note. First, in all three cases, there is
a reasonable sense of geographical dispersion of the vertices across
the continental United States. Second, in all three cases the lower
and upper bounds are equal and, therefore, these values are just
B̃(A). Of course, it certainly is not the case that all triples of vertices (u, v, w) necessarily will have equal lower and upper bounds.
Rather, it is a question of the location of the vertices relative to each
other and within the network as a whole. When the bounds are
equal, this suggests a relatively non-redundant role for each of the
vertices in the group. When they are not only equal but also high,
this suggests good coverage or ‘control’ as well.
For example, consider the Chicago/Sunnyvale pair, for which we
show in Table 3 the values of B̃(A) for the addition of each possible third vertex. Most vertices besides Atlanta also yield equal
lower and upper bounds, but with values lower than the 0.607
associated with Atlanta. The exceptions are Indianapolis and Kansas
City, whose values are not only lower than the maximum, but also
form non-trivial intervals, i.e., (0.429, 0.464) and (0.536, 0.571),
respectively. These latter two vertices lying as they do between
Chicago and Sunnyvale on the northern east-west route, clearly
admit redundancies in the paths controlled. Nevertheless, it is
interesting to note that these two intervals are still quite tight. In
particular, they are tight enough to conclude, for example, that the
betweenness resulting from the addition of Indianapolis is strictly
less than that resulting from the addition of Kansas City. In fact, our
bounds permit a complete ordering of all possible third vertices, as
shown in Table 3.

Table 3
Upper and lower bounds on the normalized betweenness B̃(u, v, w) for triples
(u, v, w) involving Chicago, Sunnyvale, and one other vertex in the Abilene network,
for all possible choices of additional vertex.

Third vertex w     Lower bound    Upper bound
Atlanta            0.607          0.607
Kansas City        0.571          0.607
Houston            0.536          0.536
Denver             0.500          0.500
Indianapolis       0.464          0.500
Washington         0.429          0.429
Seattle            0.357          0.357
Los Angeles        0.321          0.321
New York           0.321          0.321

4. Computation of pairwise co-betweenness

We discuss here the calculation of the pairwise co-betweenness
values $C(u, v)$ in (3), and the closely related values $C_A(u, v)$, for all
pairs $(u, v)$. At first glance, it would appear that an algorithm of $O(n_v^4)$
running time is necessary, given that the number of vertex pairs
grows as the square of the number of vertices. Such an implementation would render pairwise co-betweenness infeasible
for all but graphs of relatively modest size. However, exploiting ideas similar to those underlying the algorithms
of Brandes (2001) for calculating the vertex betweennesses $B(v)$,
a decidedly more efficient implementation may be obtained. We
describe the main ideas briefly in this section. Details may be
found in Appendix A.

Our algorithm for computing co-betweenness involves a three-stage procedure for each source vertex $s \in V$. In the first stage, we perform
a breadth-first traversal of the graph $G$, to quickly compute intermediary quantities such as $\sigma_{sv}$, the number of geodesic paths from
the source $s$ to each other vertex $v$ in the network; in the process we
form a directed acyclic graph that contains all geodesic paths leading from vertex $s$. In the second stage, we iterate through each vertex
in order of decreasing distance from $s$ and compute a score $\delta_s(v)$ for
each vertex. This score essentially captures the dependency of $s$ on
$v$, in the sense of its contribution to co-betweennesses involving $v$.
These contributions are then aggregated in a depth-first traversal
of the directed acyclic graph, which is carried out in the third and
final stage.

In order to compute the number of geodesic paths $\sigma_{sv}$ in the first
stage, we note that the number of geodesic paths from $s$ to a vertex
$v$ is the sum of the numbers of geodesic paths from $s$ to each parent of $v$ in the directed
acyclic graph rooted at $s$, the set of which we denote $p_s(v)$,
namely,

$$\sigma_{sv} = \sum_{t \in p_s(v)} \sigma_{st}. \tag{14}$$

In the case of an undirected graph, this can be computed in the
course of a breadth-first search with a running time of $O(n_e)$.

In the second stage, we compute $\delta_s(v)$ using the recursive relation established in Theorem 6 of Brandes (2001),

$$\delta_s(v) = \sum_{w \in c_s(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr), \tag{15}$$

where $c_s(v)$ denotes the set of child vertices of $v$ in the directed
acyclic graph rooted at $s$.

Finally, in the third stage, we compute the co-betweennesses by
interpreting the relation

$$C(u, v) = \sum_{s \in V \setminus \{u, v\}} \frac{\delta_s(v)}{\sigma_{sv}} \, \sigma_{sv}(u) \tag{16}$$

as assigning a contribution of $\delta_s(v)/\sigma_{sv}$ to $C(u, v)$ for each of the
$\sigma_{sv}(u)$ geodesic paths to $v$ that pass through $u$. We accumulate these
contributions at each step of the depth-first traversal when we visit
a vertex $v$ by adding $\delta_s(v)/\sigma_{sv}$ to $C(u, v)$ for every ancestor $u$ of the
current vertex $v$.
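The three stages described above can be sketched in Python for an unweighted, undirected graph. This is a minimal illustration and not the authors' Matlab implementation: the function name `co_betweenness` and the adjacency-dictionary input are assumptions, the depth-first stage is written recursively and so suits only small graphs, and the result sums over ordered endpoint pairs (s, t), so it should be halved if unordered pairs are intended.

```python
from collections import defaultdict, deque


def co_betweenness(adj):
    """Pairwise co-betweenness for a simple undirected graph.

    adj maps every vertex to an iterable of its neighbours (symmetric).
    Returns a dict keyed by frozenset({u, v}).
    """
    C = defaultdict(float)
    for s in adj:
        # Stage 1: breadth-first search from s. Builds the DAG of geodesics
        # (via `children`) and the path counts sigma, as in (14).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        children = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    children[v].append(w)

        # Stage 2: dependencies, visiting vertices by decreasing distance, as in (15).
        delta = defaultdict(float)
        for v in reversed(order):
            for w in children[v]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])

        # Stage 3: depth-first traversal of the DAG. Each arrival at v
        # corresponds to one geodesic path from s to v, so every ancestor u
        # (excluding s itself) receives delta[v] / sigma[v], as in (16).
        def visit(v, ancestors):
            for u in ancestors:
                C[frozenset((u, v))] += delta[v] / sigma[v]
            ancestors.append(v)
            for w in children[v]:
                visit(w, ancestors)
            ancestors.pop()

        for w in children[s]:
            visit(w, [])
    return dict(C)


# Usage on a five-vertex path a - b - c - d - e.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
C = co_betweenness(path)
```

On the path graph, for example, the pair {b, d} lies jointly on the geodesics between a and e only, which are counted once in each direction.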
Our proposed algorithms exploit recursions analogous to those
of Brandes (2001) to produce run-times that are in the worst case
$O(n_v^3)$, but in empirical studies were found to vary like
$O(n_v n_e + n_v^{2+p} \log n_v)$ in general, or $O(n_v^{2+p} \log n_v)$ in the case of sparse graphs.
Here $p$ is related to the total number of geodesic paths in the network and seems to lie comfortably between 0.1 and 0.5 in our
experience. In the case of unique geodesic paths, it may be shown
rigorously that the running time reduces to $O(n_v n_e + n_v^2 \log n_v)$, and
$O(n_v^2 \log n_v)$ if the network is sparse as well as ‘small-world’ (i.e.,
with diameter of size $O(\log n_v)$). See Appendix A for details.
On a final note, we point out that to compute co-betweenness
values CA (i1 , i2 ), for a given set A, it is sufficient to make two simple
changes to our algorithm, to adjust for the fact that the elements of
A are not allowed to serve as end-points of paths in our calculations.
First, in the third stage, the summation in (16) is restricted to be only
over s ∈ V \ A. Second, the contribution to the recursive sum in (15)
is modified to be $\delta_s(w)$, rather than $1 + \delta_s(w)$, if $w \in A$. Otherwise,
the algorithm remains unchanged and the at most $\binom{m}{2}$ relevant
values will be included among the $n_v^2$ values output; the rest may
be discarded. Therefore, to produce the bounds described in Section 3.2, once the relevant co-betweenness values are calculated,
requires only $O(m)$ operations for the upper bound, and $O(m^2)$ for
the lower bound. A more refined algorithm might try to economize
on just which co-betweenness values are computed, given the set A,
although we have not explored this possibility. We comment more
on this issue in Section 6.
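The two changes just described can be illustrated with a self-contained Python sketch for unweighted, undirected graphs. It is an assumption-laden illustration rather than the authors' code: the name `group_co_betweenness` and the input layout are hypothetical, only pairs within A are accumulated (instead of computing all pairs and discarding the rest), the first-order values $C_A(i)$ are taken to be the sum of the modified dependencies over sources outside A, and sums run over ordered endpoint pairs (s, t).

```python
from collections import defaultdict, deque


def group_co_betweenness(adj, group):
    """Return (single, pair): C_A(i) for i in A and C_A(i, j) for pairs in A."""
    group = set(group)
    single = defaultdict(float)
    pair = defaultdict(float)
    for s in adj:
        if s in group:
            continue  # change 1: sources are restricted to V \ A
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        children = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:  # breadth-first stage, unchanged
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    children[v].append(w)
        delta = defaultdict(float)
        for v in reversed(order):
            for w in children[v]:
                # change 2: a child in A may not act as an end-point,
                # so it contributes delta[w] rather than 1 + delta[w]
                endpoint = 0.0 if w in group else 1.0
                delta[v] += sigma[v] / sigma[w] * (endpoint + delta[w])
        for i in group:
            single[i] += delta[i]
        # Depth-first stage, written iteratively; `anc` holds the members of A
        # met so far on the current geodesic path from s.
        stack = [(w, ()) for w in children[s]]
        while stack:
            v, anc = stack.pop()
            if v in group:
                for u in anc:
                    pair[frozenset((u, v))] += delta[v] / sigma[v]
                anc = anc + (v,)
            stack.extend((w, anc) for w in children[v])
    return dict(single), dict(pair)


# Usage on the path a - b - c - d - e with A = {b, c, d}.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
single, pair = group_co_betweenness(path, {"b", "c", "d"})
```

In this small example every member of A lies on the single geodesic between a and e, so the first-order sum is 6, the pairwise sum is 6, and the upper bound (12) under the ordering b, c, d is 6 - 4 = 2, which is the ordered-pair group betweenness.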
5. Additional illustrations
We provide in this section additional illustrations of the results
developed in the previous sections, using two other networks. In
both cases, the data were obtained in studies investigating the flow
of information among actors in a social network.
5.1. Michael’s strike network
The goal of our first illustration is to provide additional insight
into the behavior and potential usage of our bounds. For this purpose we use the strike dataset of Michael (1997), which is also
analyzed in detail in Chapter 7 of de Nooy et al. (2005). New management took over at a forest products manufacturing facility, and
this management team proposed certain changes to the compensation package of the workers. The changes were not accepted by the
workers, and a strike ensued, which was then followed by a halt in
negotiations. At the request of management, who felt that the information about their proposed changes was not being communicated
adequately, an outside consultant analyzed the communication
structure among 24 relevant actors.
The social network in Fig. 4 represents the communication
structure among these actors, with an edge between two actors
indicating that they communicated at some minimally sufficient
level of frequency about the strike. Three subgroups are present
in the network: younger, Spanish-speaking employees (black vertices), younger, English-speaking employees (gray vertices), and
older, English-speaking employees (white vertices). In addition,
the two union negotiators, Sam and Wendle, are indicated by
asterisks next to their names. It is these last two who were responsible for explaining the details of the proposed changes to the
employees. When the structure of this network was revealed, two
additional actors, Bob and Norm, were approached and had the
changes explained to them, which they then discussed with their
colleagues, and within 2 days the employees requested that their
union representatives re-open negotiations. The strike was resolved
soon thereafter.

Fig. 4. Original strike-group communication network of Michael (1997). Three subgroups are represented in this network: younger, Spanish-speaking employees
(black vertices), younger, English-speaking employees (gray vertices), and older,
English-speaking employees (white vertices). The two union negotiators, Sam and
Wendle, are indicated by an asterisk next to their names. Edges indicate that the two
incident actors communicated at some minimally sufficient level of frequency about
the strike.
The formation of an appropriate coalition of actors was fundamental to resolving the strike. That Bob and Norm were approached
is not entirely surprising, from the perspective of network structure. Both serve as cut-vertices in Fig. 4, in that the removal of
either would disconnect the graph. In addition, both have high vertex betweenness centralities, as shown in Fig. 5. Similar to Fig. 3,
vertices in Fig. 5 (now arranged in a circular layout) are drawn in
proportion to their betweenness, and edges, to the co-betweenness
of their incident vertices, as calculated with respect to the original
graph in Fig. 4. Bob and Norm clearly have the largest betweenness
values, followed by Alejandro (who we remark also is a cut-vertex
in Fig. 4, but as part of a much smaller sub-network). As for the two
union representatives, their vertex betweenness values suggest that
Sam also plays a non-trivial role in facilitating communication, but
that Wendle is not well-situated in this regard. In fact, Wendle is
not even connected to the main component of the co-betweenness
graph in Fig. 5, since his vertex betweenness in the original graph
– and hence his co-betweenness with any other vertex – is zero (as
is also true for six other actors).
The coalition formed by Bob, Norm, Sam, and Wendle has a normalized group betweenness of B̃(A) = 0.7702. If instead of Wendle,
we include Alejandro, which might seem more reasonable, given
the discussion above, this value increases only slightly to 0.7807.
However, consider the lower and upper bounds for these numbers,
where the lower bound is given by the left-hand side of (10), and the
upper bound, by the optimal choice of the right-hand side of (12),
obtained through exhaustive search. For the coalition that includes
Wendle, these bounds are 0.7123 and 0.7702, respectively, while for
the coalition that includes Alejandro, they are 0.5018 and 0.7807,
respectively. Both of these intervals are somewhat loose, but the latter has almost five times the width of the former (i.e., 0.2789 vs
0.0579).

Fig. 5. Co-betweenness for the strike-group communication network. Actors located
apart from the network, in the corners, are isolated under this representation, as they
have zero betweenness and hence no co-betweenness with any other actors. (Note:
Isolated vertices are drawn to have unit diameter, and not in proportion to their
(zero) betweenness.)
The cause of this difference lies, of course, in the lower bound,
since the upper bound in both cases is exactly equal to the actual
value of the normalized group betweenness. Recall that the lower
bound too will equal this value only if the co-betweennesses CA (ij )
are equal to zero for all subsets of size j = 3 or greater. But examination of the original network, in Fig. 4, shows that this is not
the case for either coalition. In particular, the triple of actors {Bob,
Norm, Sam}, which is common to both coalitions, has a number
of geodesic paths that pass through it, from Xavier to all of the
Spanish-speaking employees and many of the younger, English-speaking employees. Hence, this triple has a non-trivial third-order
co-betweenness. Moreover, when Wendle is replaced by Alejandro, geodesic paths from Wendle to these same actors (except
Alejandro now) also contribute to the third-order co-betweenness
of {Bob, Norm, Sam}. Furthermore, there will now be a non-zero
co-betweenness as well for the quadruple {Bob, Norm, Sam, Alejandro}, based on geodesic paths from Wendle and Xavier to the
other Spanish-speaking employees. It is the influence of these
additional higher-order co-betweenness terms on the pairwise co-betweenness terms in equation (13) that increases the width of
our bounds. Or, put another way, the relative redundancy of these
actors as coalition members is reflected in the relative widths of
our bounds.
So a comparison of just the group betweenness values for
our two coalitions suggests that they are roughly equivalent. On
the other hand, a comparison of not only these values, but also
their bounds, is sufficient to highlight differences in higher-order
co-betweenness of coalition members, without actually computing these higher-order values. As a side note, we mention that
the wide bounds in this example are primarily a function of the
choice of vertices in our coalition, rather than a characteristic
of the network as a whole. For most other choices of coalitions of size m = 3 or 4 that we examined, the bounds were
of a similarly narrow width to that observed in our illustrations on the Abilene network, and frequently had a width of
zero.

Fig. 6. Karate club network of Zachary (1977). The gray vertices represent members of one of the two smaller clubs and the white vertices represent members who went to
the other club. The edges are drawn with a width proportional to the number of situations in which the two members interacted.
5.2. Zachary’s karate club network
The goal of our second illustration is to provide some additional
intuition into why pairwise co-betweenness values, when appropriately combined, can successfully summarize group betweenness
values. For this purpose, we use the karate club dataset of Zachary
(1977). Over the course of a couple of years in the 1970s, Zachary
collected information from the members of a university karate club,
including the number of situations (both inside and outside of the
club) in which interactions occurred between members. During
the course of this study, there was a dispute between the club’s
administrator and the principal karate instructor. As a result, the
club eventually split into two smaller clubs of approximately equal
size—one centered around the administrator and the other centered
around the instructor.
Fig. 6 displays the network of social interactions between club
members. The gray vertices represent members of one of the two
smaller clubs and the white vertices represent members who went
to the other club. The edges are drawn with a width proportional
to the number of situations in which the two members interacted.
The graph clearly shows that the original club was already polarized
into two groups centered about actors 1 and 34, who were the key
players in the dispute that split the club in two.
In Fig. 7 is shown a visualization of the vertex betweenness and
pairwise co-betweenness values, similar to those in Figs. 3 and 5,
where the layout is done using an energy minimization algorithm.
After actor 1, actor 34 has the largest vertex betweenness. Now suppose that actor 34 wishes to form a coalition but, due to the dispute,
refuses to do so with actor 1. If 34 wishes to join with an actor in
the same sub-network (i.e., white vertices), either of actors 32 or
33 would seem to be logical choices, based on their similarly large
vertex betweennesses. However, actor 34 has a substantially larger
co-betweenness with actor 32 (i.e., 35.96) than with actor 33 (i.e.,
0.4190), which suggests that {34, 33} will be a stronger coalition
than {34, 32}. This is confirmed by calculating the normalized group
betweenness, which is 0.4710 and 0.4125, respectively. On the other
hand, if 34 wishes to join with an actor in the other sub-network
(i.e., gray vertices), then actor 3 seems the logical choice, and the
coalition {34, 3} yields a normalized group betweenness of 0.4732.
Fig. 7. Co-betweenness for the karate club network. Actors in the upper-left and
lower-right corners, separated from the connected component, are isolated due to
zero betweenness. The two actors in the lower right-hand corner (i.e., a5 and a11)
have non-zero betweenness, but are bridges, in the sense that they only serve to
connect to other vertices, and hence have zero co-betweenness. (Note: The vertices
for actors with zero betweenness are drawn to have unit diameter, for purposes of
visibility.)
So the coalitions {34, 33} and {34, 3} are about equally strong. If
actor 34 desired a larger coalition, it would seem natural to consider
combining these two smaller coalitions to obtain {34, 33, 3}, and
doing so would yield a normalized group betweenness of 0.5818.
But it turns out that the alternative coalition {34, 32, 3} has an even
higher normalized group betweenness of 0.5984, despite the fact
that actor 32 was less preferable to actor 33 when paired with actor
34 alone. That such might be the case is suggested by the pattern
of pairwise co-betweenness values for these actors in Fig. 7. Actor
3 shares a non-trivial fraction of geodesic paths with actor 34, as
indicated by the rather thick line connecting them in this figure.
And actor 33, in turn, shares just slightly fewer geodesic paths with
actor 3. Since actors 33 and 34 themselves share few geodesic paths,
their two links to actor 3 indicate the sharing of two largely distinct
subsets of geodesic paths, thereby diminishing the contribution of
actor 3 ‘two-fold’ in some (rough) sense. On the other hand, while
actors 3 and 32 both share geodesic paths with actor 34, they do
not share any with each other. So while the contribution of actor
32, when joined with actor 34 alone, is less than that of actor 33,
its contribution when joined with the coalition {34,3} ultimately
surpasses that of actor 33.
While admittedly the relative strength of these coalitions is
something that could be inferred alternatively to some extent by
examination of the original network, in Fig. 6, and comparing the
location of the actors relative to each other within that network, the
above illustration is intended to demonstrate that it is possible to
reason quite effectively from the pairwise co-betweenness graph
alone. The bounds for group betweenness proposed in this paper
may be thought of as similarly utilizing this same information, but
in a more formal manner.
6. Discussion
Expansions are a common tool for representing the structure
of complex mathematical objects. And they can be especially useful when lower-order truncations are found to be accurate. For
example, the Taylor series expansion, and its corresponding first- or
second-order (i.e., linear or quadratic) approximations, is arguably
one of the most standard calculus tools used. Similarly, while probability distributions are known to have various representations in
terms of their complete set of moments, it is models based on
first- and second-order moments (i.e., means and covariances) that
underlie the vast majority of statistical modeling done in practice.
Here in this paper, we have shown that similar principles of expansion and truncation can be brought to bear on the study of the
betweenness centrality of groups of vertices in a graph, and we
have demonstrated the relevance of our results to the control of
information flow by coalitions of actors in social networks.
Our work makes clear the intrinsic nature of group betweenness
centrality. In particular, group betweenness is not simply a trivial
sum of the betweennesses of its individual actors, but rather is a
quantity that incorporates the co-betweennesses of all subgroups
of two or more actors. Nevertheless, we have also demonstrated
that it is possible to characterize the betweenness centrality of a
group of actors with sometimes remarkable accuracy using the co-betweenness of no more than pairs of actors. More generally, the
accuracy with which we can characterize group betweenness centrality in this manner provides direct insight into the composition
of the group and the relative redundancy of the actors, with respect
to the ‘control’ the group exerts over the flow of information over
the network. Specifically, greater accuracy implies more complementary roles, while lesser accuracy implies more redundant roles.
Such insight into the relative redundancy of actors in a group can
in turn be important, for example, in evaluating the robustness of
potential coalitions.
The idea that vertex betweenness and pairwise co-betweenness
together can provide significant insight into network information
flow, and its control, is further reinforced by the following interesting connection with the statistical modeling of network flows.
Recall the Abilene network described in Section 2, and suppose that
xs,t is a measure of the information (e.g., Internet packets) flowing
between vertices s and t in the network. Similarly, let yv be the
total information flowing through vertex v. Next, define x to be the
np × 1 vector of values xs,t , where np is the total number of pairs
of vertices exchanging information, and y, to be the nv × 1 vector
of values yv . And suppose, without loss of generality, that geodesic
paths in the network are unique (as is effectively the case in the Abilene network, for example). Then a common expression modeling
the relation between these two quantities is simply y = Rx, where
R is an nv × np matrix (i.e., the so-called ‘routing matrix’) of 0’s and
1’s, indicating through which vertices each given routed path goes.
If we now consider x as a random variable, with uncorrelated elements and sharing a common variance, then its covariance matrix
is simply proportional to the np × np identity matrix. The elements
of $y$, however, will be correlated, and their covariance matrix takes
the form $\propto RR^T$, by virtue of the linear relation between $y$ and $x$.
Importantly, note that the diagonal elements of $RR^T$ are the vertex
betweennesses $C(u, u)$ and, furthermore, the off-diagonal elements
are the co-betweennesses $C(u, v)$. In other words, it is the first- and second-order co-betweenness values that are captured in the
covariance matrix of the quantity of information flowing through
the vertices, under this simple model for network information flow.
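To make the connection with $RR^T$ concrete, the following small Python sketch builds a routing matrix for a five-vertex path graph, where geodesics are trivially unique, and forms $RR^T$ directly. Several choices here are assumptions made for illustration only: the graph and vertex labels are hypothetical, each unordered pair {s, t} is treated as one routed flow, and a path is recorded as passing through a vertex only when that vertex is interior to it, so that end-points do not count toward their own betweenness.

```python
from itertools import combinations

# Hypothetical path graph a - b - c - d - e; the geodesic between two
# vertices is simply the stretch of the path lying between them.
vertices = ["a", "b", "c", "d", "e"]
index = {v: k for k, v in enumerate(vertices)}
pairs = list(combinations(vertices, 2))  # one column of R per unordered pair


def interior(s, t):
    """Vertices strictly between s and t on the path."""
    i, j = sorted((index[s], index[t]))
    return vertices[i + 1:j]


# Routing matrix: R[v][k] = 1 if the k-th routed path passes through vertex v.
R = [[1 if v in interior(s, t) else 0 for (s, t) in pairs] for v in vertices]

# R R^T: the diagonal counts paths through a vertex (betweenness), and the
# off-diagonal counts paths through both vertices (co-betweenness).
n = len(vertices)
RRt = [[sum(a * b for a, b in zip(R[i], R[j])) for j in range(n)] for i in range(n)]
```

Here the entry for (b, b) counts the three paths with b in their interior, and the entry for (b, c) counts the two paths, a to d and a to e, that contain both b and c.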
This example also serves to reinforce another point of our
work, although one that admittedly we have made only indirectly throughout: that pairwise co-betweenness is a quantity
that potentially is itself of fundamental interest, much like vertex
betweenness. It remains to explore in greater depth the implications of this assertion. For example, following the tendencies in
the statistical physics literature on complex networks (Albert and
Barabási, 2002; Pastor-Satorras and Vespignani, 2004), it can be of
interest to explore the statistical properties of co-betweenness in
large-scale networks. Some work in this direction may be found
in Chua (2006), where co-betweenness and functions thereof were
examined in the context of standard network models. The most
striking properties discovered were certain basic scaling relations
with distance between vertices. In a related direction, see also
Chua et al. (2006), where an analytical result is given relating
edge betweenness to the eigen-values of an edge pairwise co-betweenness ‘covariance’ matrix, defined in analogy to the matrix
described above.
On a side note, it should be mentioned that extensions of co-betweenness to contexts other than that of an undirected graph are
certainly possible. For example, we have also developed the analogous quantities and algorithms for pairwise vertex co-betweenness
on weighted graphs (which were used in the computations for
the examples involving the Abilene network) and for pairwise
edge co-betweenness on unweighted and weighted graphs. Details
may be found in Chua (2006). The extension to directed graphs
should also be straightforward, although we have not implemented
it.
In terms of future work, it would also be of interest to explore
the implications of our expansion for group betweenness and the
accuracy of our bounds in networks of various topologies and various sizes. Such an exploration would be particularly relevant in the
context of coalition formation, and would facilitate a study of the
relationship between coalition robustness and redundancy of group
members, as referred to above. The work presented here is intended
to serve as a foundation in this regard, as it clearly enables such further explorations. In particular, we note that it is in larger networks
that groups of non-trivial size m can be examined and, similarly,
in which higher orders of co-betweenness will potentially become
more relevant (i.e., recall our discussion of the truncation induced
by the diameter of a graph, at the end of Section 3.1).
An interesting question left unaddressed by our work is whether
or not the structure inherent in the inclusion–exclusion formula
(7) can be exploited to develop efficient algorithms for exact computation of group betweenness. Certainly the fact that our upper
bound was so often observed in our numerical work to achieve the
actual group betweenness value suggests that this may be so. The
recent work of Puzis et al. (2007) is possibly relevant in this regard.
These authors provide a fast algorithm for successive computation
of group betweenness centrality, consisting of two stages. The first
stage, which they call a pre-processing stage, essentially computes
relevant second-order co-betweenness quantities for all pairs of
vertices, although the details of this stage are not given explicitly
and the notion of co-betweenness itself receives no special attention. In the second stage, a post-processing step is applied to these
quantities, which computes the betweenness of a group A of size m
recursively from the betweennesses of successive subgroups of size
$j \le m$, starting with $j = 2$. This latter stage takes $O(m^3)$ time, which
is more expensive than the $O(m)$ time required for our upper bound,
and the $O(m^2)$ time required for our lower bound. However, in general, of course, the computations in both their method and ours are
dominated by the initial pre-processing stage for sufficiently small
m. Puzis et al. (2007) offer some computational tricks for avoiding
the calculation of unnecessary co-betweenness values, which could
be incorporated into our method as well.
Appendix A

This appendix contains details specific to the proposed algorithm for computing co-betweenness, including a derivation of
key expressions, a rough analysis of algorithmic complexity,
and pseudo-code. Actual software implementing our algorithm,
written in the Matlab software environment, is available at
http://math.bu.edu/people/kolaczyk/software.html.

A.1. Derivation of key expressions

Central to our algorithm are the expressions in Eqs. (15) and
(16), the derivations for which we present here. Before doing so,
however, we need to introduce some definitions and relations. Let
$d(s, t)$ be the geodesic distance between two vertices $s$ and $t$. Note
that a simple combinatorial argument shows that

$$\sigma_{st}(v) = \begin{cases} \sigma_{sv}\,\sigma_{vt} & \text{if } d(s, t) = d(s, v) + d(v, t), \\ 0 & \text{otherwise}, \end{cases} \tag{17}$$

and

$$\sigma_{st}(u, v) = \begin{cases} \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} & \text{if } d(s, t) = d(s, u) + d(u, v) + d(v, t), \\ \sigma_{sv}\,\sigma_{vu}\,\sigma_{ut} & \text{if } d(s, t) = d(s, v) + d(v, u) + d(u, t), \\ 0 & \text{otherwise}. \end{cases} \tag{18}$$

For the sake of notational simplicity, we will assume, without
loss of generality, that

$$d(s, u) \le d(s, v) \tag{19}$$

for the remainder of this discussion.

The remaining quantities we need to introduce are notions of
the path-dependency of vertices. In the spirit of Brandes (2001),
we define the “dependency” of vertices $s$ and $t$ on the vertex pair
$(u, v)$ as

$$\delta_{st}(u, v) = \frac{\sigma_{st}(u, v)}{\sigma_{st}}, \tag{20}$$

and we define the dependency of $s$ alone on the pair of vertices
$(u, v)$ as

$$\delta_s(u, v) = \sum_{t \in V \setminus \{u, v\}} \delta_{st}(u, v) = \sum_{t \in V \setminus \{u, v\}} \frac{\sigma_{st}(u, v)}{\sigma_{st}}. \tag{21}$$

Similarly, we define the pair-wise dependency of $s$ and $t$ on a
single vertex $v$ as

$$\delta_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}, \tag{22}$$

and the dependency of $s$ alone on $v$ as

$$\delta_s(v) = \sum_{t \in V \setminus \{v\}} \delta_{st}(v) = \sum_{t \in V \setminus \{v\}} \frac{\sigma_{st}(v)}{\sigma_{st}}. \tag{23}$$

Note that unlike Brandes (2001), we exclude $t = v$ from the sum
in Eq. (23). Two relations that follow immediately from these definitions, combined with Eqs. (17) and (18), are

$$\sigma_{st}(u, v) = \sigma_{su}\,\sigma_{uv}\,\sigma_{vt} = \sigma_{sv}(u)\,\sigma_{vt} = \frac{\sigma_{sv}(u)}{\sigma_{sv}}\,\sigma_{sv}\,\sigma_{vt} = \delta_{sv}(u)\,\sigma_{st}(v), \tag{24}$$

and

$$\delta_{st}(u, v) = \frac{\sigma_{st}(u, v)}{\sigma_{st}} = \frac{\delta_{sv}(u)\,\sigma_{st}(v)}{\sigma_{st}} = \delta_{sv}(u)\,\delta_{st}(v). \tag{25}$$

These two relations allow us to show that

$$\delta_s(u, v) = \sum_{t \in V \setminus \{u, v\}} \delta_{st}(u, v) \tag{26}$$

$$= \sum_{t \in V \setminus \{u, v\}} \delta_{sv}(u)\,\delta_{st}(v) \quad \text{by (25)} \tag{27}$$

$$= \delta_{sv}(u)\,\delta_s(v) \quad \text{since } \delta_{su}(v) = 0 \text{ by (19)} \tag{28}$$

$$= \frac{\delta_s(v)}{\sigma_{sv}}\,\sigma_{sv}(u) \quad \text{by (24)}. \tag{29}$$

We use this result to re-express the co-betweenness defined in (3)
as

$$C(u, v) = \sum_{s, t \in V \setminus \{u, v\}} \delta_{st}(u, v) \tag{30}$$

$$= \sum_{s \in V \setminus \{u, v\}} \; \sum_{t \in V \setminus \{u, v\}} \delta_{st}(u, v) \tag{31}$$

$$= \sum_{s \in V \setminus \{u, v\}} \delta_s(u, v) \tag{32}$$

$$= \sum_{s \in V \setminus \{u, v\}} \frac{\delta_s(v)}{\sigma_{sv}}\,\sigma_{sv}(u). \tag{33}$$

Lastly, to establish the recursive relation in (15), note that for a
child vertex $w \in c_s(v)$ every path to $v$ gives rise to exactly one path
to $w$ by following the edge $(v, w)$. This means that

$$\sigma_{sw}(v) = \sigma_{sv} \quad \text{for } w \in c_s(v), \tag{34}$$

and that

$$\delta_{sw}(v) = \frac{\sigma_{sw}(v)}{\sigma_{sw}} = \frac{\sigma_{sv}}{\sigma_{sw}} \quad \text{for } w \in c_s(v). \tag{35}$$

Also note that for $t = w$ we have

$$\delta_{st}(w) = 1. \tag{36}$$

This allows us to decompose $\delta_s(v)$ in essentially the same manner
as Brandes (2001), namely,

$$\delta_s(v) = \sum_{t \in V \setminus \{v\}} \delta_{st}(v) \tag{37}$$

$$= \sum_{t \in V \setminus \{v\}} \; \sum_{w \in c_s(v)} \delta_{st}(v, w) \tag{38}$$

$$= \sum_{w \in c_s(v)} \; \sum_{t \in V \setminus \{v\}} \delta_{st}(v, w) \tag{39}$$
=
ısw (v) ıst (w)
by (25)
(40)
w ∈ cs (v) t ∈ V\{v}
=
sv
w ∈ cs (v)
=
sw
sv
w ∈ cs (v)
sw
1+
ıst (w)
by (35) and (36)
(41)
t ∈ V\{v,w}
(1 + ıs (w)).
(42)
Here the last equality is due to the fact that, since $w$ is a child of $v$, we have $\sigma_{sv}(w) = 0$ and thus $\delta_{sv}(w) = 0$.

A.2. Algorithmic complexity

Standard breadth-first search results put the running time for the first stage of our algorithm at $O(n_e)$, and since we touch each edge at most twice when we compute the dependency scores $\delta_s(v)$, the running time for the second stage is also $O(n_e)$. Since we repeat each stage for each vertex in the network, the first two stages have a running time of $O(n_v n_e)$. The running time for the depth-first traversal that occurs during the third stage depends on the number and length of all geodesic paths in the network. Overall, we visit every geodesic path once and compute a co-betweenness contribution for each edge of every geodesic path. For ‘small-world’ networks, i.e., networks with an $O(\log n_v)$ diameter, we must compute $O(\bar{\sigma} \log n_v)$ contributions, where

$$
\bar{\sigma} = \sum_{u,v \in V} \sigma_{uv}
\tag{43}
$$

is the total number of geodesic paths in the network. So the overall running time for the algorithm is $O(n_v n_e + \bar{\sigma} \log n_v)$. Empirical evidence suggests that the upper bound for the average $(1/|V|) \sum_{u \in V} \sigma_{uv}$ ranges from $n_v^{0.19}$ to $n_v^{0.32}$ for common random graph models, and at worst has been seen to reach $n_v^{0.62}$ in the case of a network of airports. (In the latter case, there were extreme fluctuations in $(1/|V|) \sum_{u \in V} \sigma_{uv}$, so the total number of geodesic paths, $\bar{\sigma}$, might be much smaller than $n_v(n_v - 1)$ times this upper bound.) This suggests a running time of $O(n_v n_e + n_v^{2+p} \log n_v)$, with $p$ the empirical exponent above, though it is an open question to show this rigorously. In the case of sparse networks, where $n_e \sim n_v$, this reduces to a running time of $O(n_v^{2+p} \log n_v)$.

A.3. Pseudo-code
Here we provide pseudo-code for the computation of the vertex
co-betweenness in the case of an undirected graph with no edge
weights. The main function listed in Algorithm 1 loops over each
vertex s ∈ V and performs the three stages of the co-betweenness
algorithm described in Section 4. Each of the three functions called in the
main loop carries out one of these three stages. Pseudo-code for BF-count-paths,
which carries out the breadth-first traversal used in the first stage,
is presented in Algorithm 2. The computation of dependency scores
$\delta_s(v)$ is handled in the second stage by score-vertices, which is
described in Algorithm 3, and the third stage, in which the contributions to the co-betweenness are accumulated, is detailed in the
pseudo-code for DF-visit given in Algorithm 4.
Algorithm 1. The main function for computing the vertex co-betweenness for all vertices.
Algorithm 2. Breadth-first traversal of the graph starting at s.
Used in the first stage to compute intermediary quantities needed
for the computation of the co-betweenness. Here $\sigma_s(v)$ is the $\sigma_{sv}$
that appeared earlier.
Algorithm 3. Computation of the vertex scores $\delta_s(v)$ defined in
(23). Used in the second stage of the computation.
Algorithm 4. The depth-first traversal of the third stage, used to
accumulate the vertex co-betweenness contributions.
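The three stages and the main loop can be sketched in Python as follows. This is an illustrative reading of Algorithms 1 to 4 and of Eqs. (33) and (42), not the authors' Matlab implementation. The function names mirror BF-count-paths, score-vertices and DF-visit, but the adjacency-dictionary input, the `frozenset` keys, the recursive form of the traversal, and the convention of summing over ordered pairs (s, t) are assumptions made here. The graph is assumed to be simple, undirected and unweighted.

```python
from collections import deque


def bf_count_paths(adj, s):
    """Stage 1: breadth-first traversal from s.

    Returns the geodesic counts sigma_s(v), the children c_s(v) of each
    vertex in the shortest-path DAG rooted at s, and the visit order.
    """
    dist, sigma, children, order = {s: 0}, {s: 1}, {s: []}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:            # w is reached for the first time
                dist[w], sigma[w], children[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:   # edge (v, w) lies on a geodesic
                sigma[w] += sigma[v]
                children[v].append(w)
    return sigma, children, order


def score_vertices(sigma, children, order):
    """Stage 2: dependency scores delta_s(v) from the recursion in Eq. (42)."""
    delta = {v: 0.0 for v in order}
    for v in reversed(order):            # children are scored before parents
        for w in children[v]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    return delta


def df_visit(v, path, sigma, children, delta, cobetw):
    """Stage 3: walk every geodesic path from s once (depth first).

    `path` holds the vertices strictly between s and v on the current
    geodesic. Each such u receives delta_s(v) / sigma_sv, so that summing
    over all geodesics through u accumulates the summand of Eq. (33).
    """
    for u in path:
        pair = frozenset((u, v))
        cobetw[pair] = cobetw.get(pair, 0.0) + delta[v] / sigma[v]
    path.append(v)
    for w in children[v]:
        df_visit(w, path, sigma, children, delta, cobetw)
    path.pop()


def co_betweenness(adj):
    """Main loop: run the three stages from every source vertex s."""
    cobetw = {}
    for s in adj:
        sigma, children, order = bf_count_paths(adj, s)
        delta = score_vertices(sigma, children, order)
        for w in children[s]:
            df_visit(w, [], sigma, children, delta, cobetw)
    return cobetw


# Toy graph (hypothetical): a 4-cycle 0-1-3-2-0 with a pendant vertex 4 on 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
C = co_betweenness(adj)
print(C[frozenset((1, 3))])  # 1.0
```

In the toy graph, vertices 1 and 3 lie together on one of the two geodesics between 0 and 4, which contributes 1/2 for each ordering of the endpoints. Under an unordered-pair convention the values would simply be halved. Pairs that never share a geodesic do not appear as keys, so `C.get(pair, 0.0)` is the safe lookup.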
References
Albert, R., Barabási, A.-L., 2002. Statistical mechanics of complex networks. Reviews
of Modern Physics 74, 47–97.
Barrat, A., Barthélemy, M., Vespignani, A., 2005. The effects of spatial constraints on
the evolution of weighted complex networks. Journal of Statistical Mechanics,
05003.
Barrat, A., Barthélemy, M., Vespignani, A., 2008. Dynamical Processes on Complex
Networks. Cambridge University Press, Cambridge.
Borgatti, S., Everett, M., 2006. A graph-theoretic perspective on centrality. Social
Networks 28, 466–484.
Brandes, U., 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 163.
Chua, D.B., 2006. Statistical analysis for whole networks. PhD thesis, Department of
Mathematics and Statistics, Boston University.
Chua, D.B., Kolaczyk, E.D., Crovella, M., 2006. Network kriging. IEEE Journal of
Selected Areas in Communications 24, 2263–2272.
Clark, J., Holton, D.A., 1991. A First Look at Graph Theory. World Scientific.
de Nooy, W., Mrvar, A., Batagelj, V., 2005. Exploratory Social Network Analysis with
Pajek. Cambridge University Press, Cambridge, UK.
Dominguez, C.B.K., 2008. Party coalitions and interest group networks. In: Paper
prepared for delivery at the Annual Meeting of the American Political Science
Association, Boston, MA, August 28–September 1, 2008.
Everett, M., Borgatti, S., 1999. The centrality of groups and classes. Journal of Mathematical Sociology 23 (3), 181–201.
Freeman, L.C., 1977. A set of measures of centrality based on betweenness. Sociometry 40, 35–41.
Galambos, J., Simonelli, I., 1996. Bonferroni-type Inequalities with Applications.
Springer, New York.
Karger, D., Motwani, R., Ramkumar, G., 1997. On approximating the longest path in
a graph. Algorithmica 18, 82–98.
Merida-Campos, C., Willmott, S., 2007. Exploring social networks in request for proposal dynamic coalition formation problems. Lecture Notes in Computer Science
4696, 143–152.
Michael, J., 1997. Labor dispute reconciliation in a forest products manufacturing
facility. Forest Products Journal 47, 41–45.
Pastor-Satorras, R., Vespignani, A., 2004. Evolution and Structure of the Internet: A
Statistical Physics Approach. Cambridge University Press, Cambridge.
Puzis, R., Elovici, Y., Dolev, S., 2007. Fast algorithm for successive computation of
group betweenness centrality. Physical Review E 76, 056709.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications.
Cambridge University Press, Cambridge.
Worsley, K.J., 1982. An improved Bonferroni inequality and applications. Biometrika
69 (2), 297–302.
Zachary, W., 1977. An information flow model for conflict and fission in small groups.
Journal of Anthropological Research 33, 452–473.