To protect the rights of the author(s) and publisher we inform you that this PDF is an uncorrected proof for internal business use only by the author(s), editor(s), reviewer(s), Elsevier and typesetter
TNQ Books and Journals Pvt Ltd. It is not allowed to publish this proof online or in print. This proof copy is the copyright property of the publisher and is confidential until formal publication.
ISB2 43112
Clustering and Cohesion in Networks: Concepts and Measures
James Moody and Jonathan Coleman, Duke University, Durham, NC, USA
© 2015 Elsevier Ltd. All rights reserved.
Abstract
Social networks are not homogeneous but are typically divided into strongly interconnected subgroups. Here we review the
literature on structural cohesion and clustering in networks. We divide our review into sections based on overall measures of
cohesion and approaches to finding subgroups in larger networks.
Introduction
Network cohesion and clustering are important for understanding how social networks shape communities, facilitate
norm maintenance, or form the basis of categorical
group identity, among many other things (Freeman, 1992;
Pescosolido and Rubin, 2002; Martin, 2009). Substantively, we
expect that norms and diffusion circulate more readily within
cohesive networks, and that clusters of people connected to
each other – peer groups or crowds – are likely to be similar. If
we seek to measure cohesion and identify clusters in networks,
then we must answer two related methodological questions:
first, how best to operationalize cohesion in a network, and
second, how to identify the naturally occurring ‘cohesive
subgroups’ that emerge in most social settings. This chapter
reviews current work on the related problems of structural
cohesion and clustering in social networks.
We understand social networks as graphs of social relations,
G(V,E), where the nodes (V) of the graph are typically people
and the relations (E) are some positive social connection, such
as friendship, love, or communication, linking pairs of nodes
(for the substantive problem of cohesion, relations should be
thought of as carrying positive social meaning rather than mere
contact or negative relations; see Friedkin (2004) for further
reflections on the relations between individual and group
cohesion). Relations can be directed and valued, although
most of the discussion below will focus on binary undirected
networks for simplicity. A path in the network is a (possibly
directed) sequence of nodes and edges that starts and ends with
a node and never repeats a node. Two nodes are said to be reachable
if there is a path from one node to the other, and the graph is
connected if all pairs are reachable. Two paths are node (edge)
independent if they share the same start and end nodes but
overlap on no other nodes (edges). The geodesic distance is the
length of the shortest path connecting two nodes. If there is no path connecting a pair, we define the distance as infinite. A cycle is
a path that starts and ends on the same node. The density of
a graph is the average value of edges calculated over all pairs:
$\left(\sum_{ij} A_{ij}\right) / [N(N - 1)]$.
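The definitions of density and geodesic distance can be made concrete with a short sketch. It assumes a binary undirected network stored as a dictionary of neighbor sets; the function names and the toy network are illustrative only and do not come from the literature reviewed here.

```python
from collections import deque
import math

def density(adj):
    """Density of a binary undirected graph stored as {node: set_of_neighbors}."""
    n = len(adj)
    # Summing neighbor counts equals sum_ij A_ij, which counts each undirected
    # tie twice, matching the N(N - 1) ordered pairs in the denominator.
    ties = sum(len(neighbors) for neighbors in adj.values())
    return ties / (n * (n - 1))

def geodesic_distance(adj, source, target):
    """Length of the shortest path from source to target (math.inf if unreachable)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return math.inf

# Hypothetical toy network: a three-node path a-b-c plus an isolate d.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
toy_density = density(toy)                 # 4 / (4 * 3)
d_ac = geodesic_distance(toy, "a", "c")    # two steps along the path
d_ad = geodesic_distance(toy, "a", "d")    # no path, so the distance is infinite
```

Because node d is unreachable, this toy graph is not connected even though its density is well above zero.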
Structural Cohesion
Cohesive networks should be difficult to separate. For cohesive
networks, the relations should bind the collective together. In
such settings, we expect that people have many ties with others
and that the ties are widely distributed (rather than routing
through one node). In cohesive networks, people are generally
close to each other (Moody and White, 2003; Friedkin, 2004).
International Encyclopedia of the Social & Behavioral Sciences, 2nd edition
The archetypical structural pattern capturing these intuitive
cohesion ideas is the clique (Luce and Perry, 1949; Alba, 1973):
a collection of nodes with edges connecting every pair in the
collection. Every node in a clique of size n is connected to all
n − 1 other nodes. The complete nature of a clique leads to
maximal values on other features, which have typically been
used as indicators of cohesion or clustering. For example, the
density of a binary clique is 1, the distance between all pairs of nodes is
1, and every pair has n − 2 contacts in common, implying that
all triads within the clique are complete. Disconnecting a clique
requires deleting at least n − 1 edges or nodes.
The complete adjacency requirement of cliques means they
are substantively difficult to use in real-world data settings
where some element of randomness is usually present. First,
known substantive groups are rarely complete cliques, as any
missing edge undermines the complete connectivity requirement (moreover, if a survey design for collecting network data
has limited the respondent to nominating k alters, then the
maximal clique size in the network is limited by k). Second,
since two cliques of size k can overlap (by as many as k − 1 nodes), attempts
to use cliques as markers for cohesion frequently must aggregate overlapping clique memberships (Palla et al., 2005; Everett
and Borgatti, 1998). This leads researchers to search for groups
that are ‘clique like’, but not complete. Understanding exactly
what counts as ‘clique like’ has been the heart of the methodological challenge in this area, but most approaches have focused on
relaxing one of the dimensions that defines a clique, effectively
capturing how ‘close’ to a pure clique a given graph might be.
The simplest (because it involves only summing edges) such
generalization has been to use the graph’s density. While
intuitive (cohesive groups should be dense), density is an
insufficient measure of cohesion. Consider Figure 1, which
shows three networks with the same density, but differing path
structures, and thus substantively different cohesion (for how
density nonetheless constrains structure, see Faust, 2006). This
general sort of problem, that networks with the same score on
a given clique-maximal characteristic nonetheless have
different structures that clearly imply varying cohesion, is at the
heart of the difficulty in generalizing clique features to measure
cohesion. This is true for many attempts at measuring cohesion, such as the distance between nodes in a group (N-Clique,
Bron and Kerbosch, 1973), the average number of nodes each
node is connected to (the k-plex or k-core, Seidman and Foster,
1978), or the proportion of transitive triples (such as the
clustering coefficient; Watts and Strogatz, 1998). At the heart of
the problem is the understanding that cohesion rests on levels
of connectivity – on the pattern of paths as much as the volume
of ties.
http://dx.doi.org/10.1016/B978-0-08-097086-8.43112-0
Figure 1 Three graphs with the same density but different cohesion.
Figure 2 Graph with high edge connectivity but low node connectivity.
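The point that equal density can hide very different path structures can be checked directly. The sketch below uses three hypothetical six-node graphs with six edges each; these are not the graphs drawn in Figure 1, and the helper names are assumptions made for illustration.

```python
def density(n_nodes, edges):
    """Share of the n(n - 1)/2 possible undirected ties that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))

def is_connected(nodes, edges):
    """True if every pair among `nodes` is reachable using only ties among them."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def cut_nodes(nodes, edges):
    """Nodes whose removal disconnects the remaining nodes."""
    return sorted(v for v in nodes if not is_connected(set(nodes) - {v}, edges))

nodes = list(range(6))
graphs = {
    "ring": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
    "triangle_with_tail": [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)],
    "two_triangles": [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)],
}
summary = {}
for name, edges in graphs.items():
    connected = is_connected(nodes, edges)
    summary[name] = (
        density(len(nodes), edges),
        connected,
        cut_nodes(nodes, edges) if connected else None,
    )
```

All three graphs have density 0.4, yet the ring has no cut node, the triangle with a tail depends on three separate cut nodes, and the two triangles are not connected at all.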
In response to the ambiguity of these many measures,
Moody and White (2003) introduced node connectivity as
a necessary measure of structural cohesion. Starting with the
simple idea that cohesion must at a minimum imply connectivity, Moody and White argued for generalizing connectivity
based on the difficulty of disconnecting a network. A graph can
be disconnected by removing either nodes or edges. Edge
connectivity is the minimum number of edges that one has to
remove to disconnect the graph while node connectivity is the
minimum number of vertices one has to remove to disconnect
the graph. While these might seem similar, it turns out that one
can easily have high edge connectivity even if node connectivity
is low, while the opposite is not true: high node connectivity
implies high edge connectivity. As an illustration, consider
Figure 2. Here the graph is four-edge connected but one-node
connected, meaning that graph resilience is dependent on the
center star-like cut-node. Since social cohesion turns on the
supraindividual character of the setting (Simmel, 1950), node
connectivity better characterizes the intuitive notion that
a cohesive group is held together but not dominated by a single
node, and thus Moody and White take biconnectivity (two-node-separable graphs) as the fundamental starting point for
structural cohesion.
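The contrast between edge and node connectivity can be verified by brute force on a small graph. The sketch below assumes a hypothetical ‘bow-tie’ of two five-node cliques sharing one node, chosen to mimic the pattern described for Figure 2 rather than to reproduce it. Exhaustive removal is only feasible for very small graphs and is not the algorithm used by Moody and White.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """True if the graph induced on `nodes` by `edges` is connected."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def node_connectivity(nodes, edges):
    """Smallest number of nodes whose removal disconnects the graph."""
    nodes = list(nodes)
    for k in range(len(nodes) - 1):
        for removed in combinations(nodes, k):
            if not is_connected(set(nodes) - set(removed), edges):
                return k
    return len(nodes) - 1  # convention for a complete graph

def edge_connectivity(nodes, edges):
    """Smallest number of edges whose removal disconnects the graph."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for removed in combinations(edges, k):
            dropped = set(removed)
            kept = [e for e in edges if e not in dropped]
            if not is_connected(nodes, kept):
                return k
    return len(edges)

# Two 5-cliques that share node 0 (hypothetical example).
left, right = [0, 1, 2, 3, 4], [0, 5, 6, 7, 8]
bowtie_nodes = sorted(set(left) | set(right))
bowtie_edges = list(combinations(left, 2)) + list(combinations(right, 2))

kappa_nodes = node_connectivity(bowtie_nodes, bowtie_edges)
kappa_edges = edge_connectivity(bowtie_nodes, bowtie_edges)
```

In this example four edges must be removed to disconnect the graph, but deleting the single shared node is enough, which is the dependence on one actor that node connectivity is meant to detect.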
Node connectivity implies other features that similarly map
well to our understanding of social cohesion. First, a k-connected component implies that every pair of nodes is
connected by at least k node-independent paths and implies
(k − 1) node-independent cycles among all pairs in the graph.
This means that k-components (for k > 1) contain multiple
paths between pairs that never cross the same set of nodes. For
social networks, the existence of multiple paths implies many
alternative routes through the network that could generate high
levels of information diffusion (see Centola and Macy, 2007
for related ideas with respect to diffusion) and promote the
conditions necessary for enforceable trust, a key element for
norm formation (Mark, 1998).
Figure 3 Nested k-components of a network.
Node connectivity implies a hierarchical ordering of the
groups in the network, since higher valued k-components are
nested within lower valued k-components. For example, all
nodes in Figure 3 are members of a (1-)component. Nested
within this are two bicomponents (sets {1–8}, {9–12}) and
nested within the largest bicomponent is a three-component
({3–8}). The full enumeration of the nested node components
provides a cohesive blocking of the network and is a complete
summary of the structural cohesion of the graph. The depth of
involvement within this hierarchical structure is a natural
operationalization of network embeddedness (Granovetter,
1985; Uzzi, 1999), and thus the highest value k-component
within which any given node is nested is recommended as the node-level measure.
Moments, such as the mean, of the distribution of pairwise
joint maximal cohesion provide simple global summaries,
although the minimum is the most theoretically consistent
measure of overall network cohesion (the network is as strong
as the weakest link).
Relative Density Measures
Moving from the network as a whole to subgroups within
a network, many recognize that ties within a group typically
come at the expense of ties between groups (Freeman, 1992).
Thus, we expect that group members
interact with each other more often than with others, and
measures based on this balance between intra- and intergroup
ties are common (we follow the sociological
convention and refer to these collections of nodes as ‘groups’;
other common terms, used in statistical physics and biology,
include ‘communities’ and ‘modules’, respectively). Theoretically we should distinguish the
relations within the group (what is best thought of as cohesion
per se) from the structural distinctiveness of the group – the
level of boundary crossing represented by between-group ties.
(To give an intuitive sense of the difference, imagine two
systems with multiple clusters but no ties between the clusters.
In that case the relative density measures would be the same for
any internal cluster density above zero.) Node cohesion
implies a particular path structure within groups (k node-independent paths between all pairs) and a maximum on contact
between groups (which cannot be more than k − 1). Building on this
basic balance of within vs. between-group ties, a natural
measure for how grouped a network is would be a relative score
for the ties within compared to the ties between. One can easily
summarize the group-level tie counts with a mixing matrix,
where each row and column represents a group, and cell values
are the count of relations within/between groups. Relative
density measures are some function of the values on the
diagonal compared to the values off the diagonal. An example
is given in Figure 4, with the groups indicated by the shaded
ellipses under the network. The mixing matrix cells provide tie
counts (top number in each cell) and expected values under
independence (bottom).
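A mixing matrix of this kind is simple to tabulate. The sketch below is a minimal illustration on a hypothetical network of two triangles joined by one tie, not the network shown in Figure 4. It assumes that each undirected tie is counted from both ends and that expected values are the usual row-total times column-total over grand-total products; the figure may use a different convention.

```python
def mixing_matrix(edges, group):
    """Return (labels, observed); observed[a][b] counts tie ends linking group a to group b."""
    labels = sorted(set(group.values()))
    index = {g: i for i, g in enumerate(labels)}
    observed = [[0] * len(labels) for _ in labels]
    for u, v in edges:
        a, b = index[group[u]], index[group[v]]
        # Count each undirected tie from both ends so the table is symmetric;
        # a within-group tie therefore adds 2 to its diagonal cell.
        observed[a][b] += 1
        observed[b][a] += 1
    return labels, observed

def expected_counts(observed):
    """Expected cell counts if row and column group membership were independent."""
    size = len(observed)
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(size)) for j in range(size)]
    total = sum(row)
    return [[row[i] * col[j] / total for j in range(size)] for i in range(size)]

# Hypothetical example: triangles {1, 2, 3} and {4, 5, 6} bridged by the tie 3-4.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
group = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

labels, observed = mixing_matrix(edges, group)
expected = expected_counts(observed)
```

Here the diagonal cells (6 tie ends each) sit well above their expected value of 3.5, while the off-diagonal cells (1 each) sit well below it, which is the contrast that relative density measures summarize.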
There has been some debate over the optimal function to
weight within and between-group ties. Freeman (1978) introduced the segregation index, (E − O)/E, where E is the
number of expected cross-group ties and O is the number of
observed cross-group ties. The logic of this index is that segregation implies difference from random mixing: if ties were
distributed randomly with respect to the group labels, we
would find as many observed as expected, and the score would
be zero (completely nonsegregated). If, on the other hand,
groups were strongly bounded we would observe very few
cross-group ties, and the score would approach one. Thus the
measure interpolates between negative values (more cross-group ties than expected), random mixing (0), and
strongly bounded groups (1.0). Similar scores simply take the
ratio of diagonal cells to off-diagonal or the odds ratio of
a within-group tie (overall or group specific, see Frank, 1995).
Each of these scores can be calculated for the network as
a whole or for specific groups.
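The segregation index can be sketched in a few lines. The version below assumes a binary undirected network and takes the expected number of cross-group ties to be the total number of ties multiplied by the share of dyads that cross a group boundary; this is one plausible reading of ‘random mixing’, and Freeman's original treatment should be consulted for the exact formulation.

```python
from collections import Counter

def freeman_segregation(edges, group):
    """Segregation index (E - O) / E for an undirected binary network."""
    n = len(group)
    m = len(edges)
    total_pairs = n * (n - 1) / 2
    within_pairs = sum(s * (s - 1) / 2 for s in Counter(group.values()).values())
    # Assumption: ties fall on dyads at random, so the expected cross-group
    # count is proportional to the share of dyads that cross groups.
    expected = m * (total_pairs - within_pairs) / total_pairs
    observed = sum(1 for u, v in edges if group[u] != group[v])
    return (expected - observed) / expected

group = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
separate = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6)]
bridged = separate + [(3, 4)]

s_separate = freeman_segregation(separate, group)  # no cross-group ties
s_bridged = freeman_segregation(bridged, group)    # one cross-group tie
```

With no cross-group ties the index reaches its ceiling of 1, and adding a single bridging tie lowers it while leaving it well above the random-mixing value of 0.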
Figure 4 Example of subgroups in a network and mixing table used for relative density measures.
Of late, the most common relative density measure is the
modularity index (Newman and Girvan, 2004). The modularity index is calculated as
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(C_i, C_j)$$
where m is the number of edges, $k_i$ is the degree of node i, A is the adjacency matrix, $\delta(C_i, C_j)$ is an indicator for whether nodes i and j are in
the same subgroup or not, and $\gamma$ is a ‘resolution parameter’
that identifies the scale at which clustering is observed
(Fortunato and Barthélemy, 2007; Reichardt and Bornholdt,
2006). Substantively, $[k_i k_j / 2m]$ represents the null model –
the expected likelihood of contact between two nodes –
here simply dependent on the degrees of the pair, so
$\left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right]$ is the connectivity above random expectation, normalized by the total volume of ties in the network.
This normalization is key, as it ensures that the index takes
a maximum value if all ties fall within separate groups. This is
the same logic as Freeman’s segregation index, but because Q
takes a value of 0 if there is only one group, the function must
have a maximum value for multiple group partitions, which
makes the quantity a very useful guide for identifying unknown
communities in networks (see Porter et al., 2009 for review).
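The modularity index can be computed directly from its definition. The sketch below follows the pairwise sum term by term for a binary undirected network; the function name, the `gamma` argument, and the toy network are illustrative assumptions, and the double loop is written for clarity rather than speed.

```python
def modularity(edges, community, gamma=1.0):
    """Q = (1/2m) * sum_ij [A_ij - gamma * k_i k_j / 2m] * delta(C_i, C_j)."""
    m = len(edges)
    degree = {v: 0 for v in community}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = 0.0
    for i in community:
        for j in community:
            if community[i] != community[j]:
                continue  # delta(C_i, C_j) = 0
            a_ij = 1 if (i, j) in adjacent else 0
            total += a_ij - gamma * degree[i] * degree[j] / (2 * m)
    return total / (2 * m)

# Hypothetical example: two triangles joined by a single tie.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {v: "all" for v in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Placing every node in one group returns Q = 0, as the text notes, while the two-triangle partition gives Q = 5/14 in this example.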
Three features of the modularity score are important to
note. First, since the function can be searched for a maximum
value, it had been seen as a solution to the problem of identifying the number of cohesive groups in a network. Identifying
the number of naturally occurring clusters in a dataset is a longstanding problem, and for networks, most prior work required
the user to specify the number of groups, or relied on principled but known-to-be arbitrary stopping criteria (Moody,
2001; Frank, 1995). Since modularity captures how grouped
a network is and has a clear maximum, people had assumed
that maximizing modularity automatically revealed the
number of groups in the data. Unfortunately, the discovery of
the resolution parameter issue (Fortunato and Barthélemy,
2007) has undone this sense of natural maximization, since
changing the resolution parameter will change the number of
groups that maximize Q for any given network, reintroducing
an arbitrary parameter that controls the number of clusters in
the network. (This is a common problem, not unique to finding
groups in networks. Any stopping/selection criterion for creating
distinctions in a continuum would be similar. For example, the
notion that principal components extracted from a factor
analysis are retained when the eigenvalue is greater than 1.0 is
similarly arbitrary (Kim and Mueller, 1978). The advantage of principled-but-arbitrary selection rules is that they provide a basis for
consistency across studies, rather than leaving it entirely at the
discretion of investigators.) Since a resolution parameter of one
(i.e., ignoring the parameter) is arbitrary, the notion that the
score determines the ‘natural’ number of found groups is not
likely true.
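The sensitivity to the resolution parameter can be illustrated on a ring of small cliques, a standard toy case in discussions of the resolution limit. The sketch below assumes ten triangles joined in a ring by single ties and compares two candidate partitions, one group per triangle versus one group per adjacent pair of triangles. It uses the group-level form of modularity (internal ties over m, minus gamma times the squared share of tie ends); the network size and the values of gamma are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import combinations

def ring_of_cliques(n_cliques, clique_size):
    """Edge list for cliques arranged in a ring, neighbors joined by one tie."""
    edges = []
    for c in range(n_cliques):
        members = [c * clique_size + i for i in range(clique_size)]
        edges.extend(combinations(members, 2))
        first_of_next = ((c + 1) % n_cliques) * clique_size
        edges.append((members[-1], first_of_next))
    return edges

def modularity(edges, community, gamma=1.0):
    """Group-level form: sum over groups of l_c / m - gamma * (d_c / 2m) ** 2."""
    m = len(edges)
    internal, degree_sum = Counter(), Counter()
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

edges = ring_of_cliques(10, 3)
singles = {v: v // 3 for v in range(30)}   # one group per triangle
pairs = {v: v // 6 for v in range(30)}     # adjacent triangles merged

q = {(name, g): modularity(edges, part, gamma=g)
     for name, part in (("singles", singles), ("pairs", pairs))
     for g in (1.0, 2.0)}
```

At gamma = 1 the merged-pairs partition scores higher than the intuitively correct one-triangle-per-group partition, and at gamma = 2 the ordering reverses, so the preferred number of groups depends on the chosen resolution.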
Second, having a high modularity score only ensures that the
groups as observed are distinct, not that they are internally
cohesive. The modularity score will always increase if disconnected groups are assigned to distinct communities, since the
score rests on the number of intragroup ties. This means that
any assignment to groups that truly maximizes the modularity
(see point three below) should identify internally connected
groups. Beyond this simple connectivity requirement, however,
the score is based merely on volume and does not ensure any
formal limit on the group’s internal structure.
Third, while the modularity score can be maximized, it is
not necessary that the maximum modularity value correspond
to a unique assignment of nodes to groups. That is, there
may be many partitions on a network that all maximize
modularity, but imply different substantive assignments
(Bagrow and Bollt, 2005; Clauset, 2005; Good et al., 2010).
While such assignments may overlap significantly, lack of
a one-to-one correspondence between maximum modularity
and group assignment has further lessened the extent to which
one can depend on modularity in defining cohesive groups.
Cohesive Group Detection
While modularity is a general index for the relative volume of
ties within and between specified groups, an obvious related
question turns on how one identifies cohesive subgroups in
a network. The field of approaches for identifying cohesive
groups – called ‘community detection’ within the computer
science and statistical physics literature – is growing rapidly. For
a full review see Porter et al. (2009) or Schaeffer (2007). Here,
we review the history of subgroup search procedures and
outline a few of the more popular approaches. We divide the
field by a basic distinction between deterministic, often graph-theoretic, approaches and heuristic approaches that
often include some level of randomization.
Deterministic Group Finding Approaches
The simplest approaches are the direct extensions of clique
features discussed above. A common approach is to identify
k-cores – maximal subgraphs where all nodes are connected to
at least k other nodes in the set (Seidman, 1983). While every
k-node-connected group lies within a k-core, not all k-cores are k-node
connected (Moody and White, 2003; White and Harary, 2001).
As such, k-cores are computationally efficient, but ambiguous.
In practice, this efficiency can be useful when trying to find
k-connected groups in large networks (Powell et al., 2005).
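The efficiency of k-cores comes from the fact that they can be found by repeatedly pruning low-degree nodes. The sketch below is a minimal pruning routine on a hypothetical network of two triangles sharing one node plus a pendant; the function names are illustrative.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_core(adj, k):
    """Nodes of the k-core: repeatedly drop nodes with fewer than k remaining ties."""
    core = {v: set(neighbors) for v, neighbors in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(core[v]) < k:
                for w in core[v]:
                    core[w].discard(v)  # v no longer counts toward w's degree
                del core[v]
                changed = True
    return set(core)

# Two triangles sharing node 0, plus node 5 hanging off node 4.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4), (4, 5)]
adj = build_adjacency(edges)

core_2 = k_core(adj, 2)
core_3 = k_core(adj, 3)
```

The 2-core here is {0, 1, 2, 3, 4}, yet removing node 0 alone disconnects it, so the 2-core is not 2-node connected. This is the ambiguity noted above.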
An obvious deterministic group-finding approach is to start
with cliques, but real-world networks tend to have many small
overlapping cliques that do not obviously represent groups.
One approach is to count the number of shared cliques between
nodes and treat this as a new higher order valued matrix, which
can then be clustered using standard cluster or factor analysis
tools (Everett and Borgatti, 1998). A nice generalization of this
approach can be found with the clique percolation method
(CPM) (Palla et al., 2005). CPM is a deterministic group-finding algorithm, which allows groups to overlap in a very
limited way. The CPM starts with cliques of size k (k-cliques)
and defines two k-cliques as adjacent if they overlap by k − 1
nodes. A community is then the maximal union of k-cliques
that can be reached through adjacent k-cliques. The communities so identified can overlap, but by no more than k − 2
nodes (Derenyi et al., 2005), and the method becomes difficult to use in
large networks with high values of k.
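The clique percolation steps can be sketched for small networks as follows. This is a naive illustration that enumerates every k-clique by brute force and then merges adjacent cliques with a union-find structure; it is not the optimized procedure of Palla et al., and it assumes node labels that can be sorted.

```python
from itertools import combinations

def k_cliques(adj, k):
    """All fully connected node sets of size k (brute-force enumeration)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def clique_percolation(adj, k):
    """Communities as unions of k-cliques chained through (k - 1)-node overlaps."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacent k-cliques
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)

# Hypothetical network: triangles 0-1-2 and 1-2-3 share two nodes,
# while triangle 3-4-5 touches the rest only at node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

communities = clique_percolation(adj, 3)
```

With k = 3 the first two triangles percolate into one community, {0, 1, 2, 3}, and the third forms its own, {3, 4, 5}, with node 3 belonging to both.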
A second deterministic approach focuses on cutting the
graph in careful ways. Early work used min-cut/max-flow features to break the network into edge-connected sets
(see, for example, Borgatti et al., 1990). The most popular
recent work in this area divides the network by removing key
edges until the graph is no longer connected (Girvan and
Newman, 2002). The trick to this method is to calculate
‘edge-betweenness’ as the number of times the shortest path
between any pair of nodes passes through a given edge. One
then deletes the edge with the highest score, recomputes the
edge-betweenness score for the remaining edges, and repeats
until the graph is disconnected. This is the first cut of the
graph. One then repeats the method on the resulting
components. The algorithm is computationally intensive,
but seems to perform quite well (note the procedure is
theoretically deterministic, but ties in the edge-betweenness
level will require a tie-breaking rule, which is often
random).
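The first cut of this procedure can be sketched as follows. The edge-betweenness routine uses a breadth-first accumulation in the style of Brandes, which splits credit evenly when a pair has several shortest paths. The tie-breaking rule (prefer the edge with the smallest sorted endpoints) is an assumption adopted only to keep the example deterministic.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected graph {node: neighbors}."""
    score = {}
    for source in adj:
        dist = {source: 0}
        n_paths = {v: 0 for v in adj}
        n_paths[source] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                    n_paths[w] += n_paths[u]
                    preds[w].append(u)
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):            # farthest nodes first
            for u in preds[w]:
                share = n_paths[u] / n_paths[w] * (1 + credit[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                credit[u] += share
    return {e: val / 2 for e, val in score.items()}  # each pair was seen twice

def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def first_cut(adj):
    """Remove highest-betweenness edges until the graph splits."""
    adj = {v: set(neighbors) for v, neighbors in adj.items()}
    removed = []
    while len(components(adj)) == 1 and any(adj.values()):
        scores = edge_betweenness(adj)
        edge = min(scores, key=lambda e: (-scores[e], sorted(e)))  # assumed tie-break
        u, v = edge
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(tuple(sorted(edge)))
    return removed, components(adj)

edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

removed, parts = first_cut(adj)
```

In this two-triangle example the bridging tie carries all nine cross-group shortest paths, so it is removed first and the graph splits into the two triangles.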
A third sort of deterministic clustering rests on functions of
the underlying eigenstructure of the network. This has emerged
repeatedly, variously seen as applications of principal component analysis (Cairns and Cairns, 1994), peer influence models
(Moody, 2001), or spectral clustering (Newman, 2006). Early
social network researchers used principal components or factor
analysis to identify groups from interaction (e.g., Wright and
Evitts, 1961) or nomination matrices (e.g., Bock and Husain,
1952; MacRae, 1960; Bagwell et al., 2000). The approaches
share the premise that groups can be conceptualized as individuals whose friendships are similar (i.e., correlated). A
common practice is to run factor analysis tools on the observed
network, then retain all factors with eigenvalues greater than
1.0 and factor loadings above a predetermined threshold (see
Gest et al., 2007 for comparison).
This approach has clear links to the block-modeling traditions
rooted in CONCOR (White et al., 1976), where actors are
classified as similar if they have similar nomination patterns to/
from others in the network. At their core, these models share
a use of eigenvalues (or close approximations of eigenvalues)
as the key information source to identify the partition. Such
models work because the eigenstructure captures an underlying
random-walk process that maps well onto diffusion ideas,
which are captured by cycles and redundancy in the network.
As such, these models are likely well suited to link groups to
behavior, as the underlying model is concordant between the
two features of groups.
Search/Optimization Approaches
Another general approach to finding groups in networks
involves searching over possible partitions to optimize some
function that identifies the best assignment of nodes to cohesive groups. In theory, one could imagine assigning nodes to
groups exhaustively: trying all possible assignments until the
best possible solution is found. Since such exhaustive search is
computationally prohibitive for any real-world sized networks,
we instead use heuristic approaches that aim to efficiently
approximate an ideal assignment of nodes to groups. In
general, these approaches must balance between computation
costs, accuracy, and arbitrariness. For example, a simple way to
reduce computation cost is to have users specify the number of
groups, but that is often an arbitrary choice. One can also
reduce computation by using graph approximations, which
sacrifices accuracy. Since most of the metrics used to optimize
heuristics are based on some version of a relative density score
(modularity, Freeman’s segregation index, odds ratio), all the
problems associated with those scores carry over to heuristics
designed to optimize them.
Direct optimization routines seek to sort nodes into cohesive
groups in a manner that maximizes some clustering
objective function. The Newman (2004) greedy method is one
such agglomerative technique that consecutively combines
nodes to form large groups. Starting with each node in its own
group, the routine joins groups together in pairs, choosing the
pairing that maximizes the increase in modularity. The Louvain
method, introduced by Blondel et al. (2008), is an extension
of this idea, locally optimizing modularity then replacing
communities with ‘supernodes’ and links between communities with weighted ties, yielding a smaller network. This iterates
until modularity no longer improves, leading to a hierarchical nesting
of nodes.
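The agglomerative logic can be sketched in a few lines. The version below is a simplified illustration rather than Newman's published implementation: it recomputes everything at each step, only considers merging groups that share at least one tie, and stops at the first step where no merge raises modularity instead of building the full dendrogram. The gain from merging groups a and b is e_ab/m − d_a d_b/(2m²), where e_ab is the number of ties between them and d is a group's total degree.

```python
from collections import Counter

def greedy_modularity(edges):
    """Merge groups pairwise while some merge still increases modularity."""
    m = len(edges)
    nodes = sorted({v for e in edges for v in e})
    label = {v: v for v in nodes}            # every node starts in its own group
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    while True:
        group_degree, between = Counter(), Counter()
        for v in nodes:
            group_degree[label[v]] += degree[v]
        for u, v in edges:
            a, b = label[u], label[v]
            if a != b:
                between[frozenset((a, b))] += 1
        best_gain, best_pair = 0.0, None
        for pair, e_ab in sorted(between.items(), key=lambda kv: sorted(kv[0])):
            a, b = pair
            gain = e_ab / m - group_degree[a] * group_degree[b] / (2 * m * m)
            if gain > best_gain + 1e-12:     # first-found wins exact ties
                best_gain, best_pair = gain, pair
        if best_pair is None:
            break                            # no merge improves modularity
        keep, drop = sorted(best_pair)
        for v in nodes:
            if label[v] == drop:
                label[v] = keep
    groups = {}
    for v in nodes:
        groups.setdefault(label[v], set()).add(v)
    return sorted(groups.values(), key=sorted)

# Hypothetical example: two triangles joined by a single tie.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
groups = greedy_modularity(edges)
```

On this toy network the routine recovers the two triangles, because merging them across the bridge would lower modularity.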
In a similar vein on a different metric, Ken Frank’s KliqueFinder (Frank, 1995) optimizes the odds of an in-group tie
directly. The objective function can then be represented as
a logistic regression model of the form
$$\log\left(\frac{p(A_{ij} = 1)}{p(A_{ij} = 0)}\right) = \theta_0 + \theta_1\,\mathrm{samegroup}_{ij}$$
So the task is to identify the group assignment labels in
a way that maximizes the mixing parameter $\theta_1$. The procedure
for making this assignment involves identifying a seed triad,
adding nodes to the group until no improvement is made,
then starting with a new seed, and repeating this assignment
process until all nodes are in initial groups. A similar
approach is used with latent space group models (Krivitsky
et al., 2009).
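For a fixed assignment of nodes to groups, the two parameters of this logistic model have a closed form, because the only predictor is a single indicator: θ0 is the log-odds of a tie among cross-group dyads, and θ1 is the log odds ratio for same-group dyads. The sketch below evaluates that objective for two candidate partitions of a hypothetical eight-node network. It illustrates the quantity being maximized, not KliqueFinder's search procedure, and it assumes that every cell of the dyad table is non-zero.

```python
import math
from itertools import combinations

def same_group_log_odds(nodes, edges, group):
    """Return (theta0, theta1) for logit p(A_ij = 1) = theta0 + theta1 * samegroup_ij."""
    ties = {frozenset(e) for e in edges}
    counts = {(same, tied): 0 for same in (0, 1) for tied in (0, 1)}
    for u, v in combinations(nodes, 2):
        same = int(group[u] == group[v])
        tied = int(frozenset((u, v)) in ties)
        counts[(same, tied)] += 1
    # Closed-form estimates for a single binary predictor (all cells must be > 0).
    theta0 = math.log(counts[(0, 1)] / counts[(0, 0)])
    theta1 = math.log(counts[(1, 1)] / counts[(1, 0)]) - theta0
    return theta0, theta1

nodes = list(range(1, 9))
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),      # dense block 1-4
         (5, 6), (5, 7), (6, 7), (6, 8), (7, 8),      # dense block 5-8
         (4, 5), (3, 6)]                              # two cross-block ties
by_block = {v: "A" if v <= 4 else "B" for v in nodes}
by_parity = {v: v % 2 for v in nodes}                 # a deliberately poor partition

theta_block = same_group_log_odds(nodes, edges, by_block)
theta_parity = same_group_log_odds(nodes, edges, by_parity)
```

The block partition gives θ1 = log 35, while the parity partition gives a negative θ1, so a search that maximizes θ1 would prefer the block assignment.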
In many of these approaches, initial decisions and ties in
clustering values can create assignments that are locally optimal
but not globally optimal. A higher order pass through the data
can then be used to move members between identified groups,
merge groups, or split groups. A formal model for this sort of
higher order mixing was proposed by Kernighan and Lin
(1970) (see also Newman, 2006), but similar ideas are
implemented in Frank’s KliqueFinder and Moody’s (2001)
algorithm, among others.
A new development in group-finding tools extends the
model to dynamic networks (Mucha et al., 2010). Here the task
is to identify clusters based on the evolution of the network.
The key insight builds on the random walk/diffusion notions
embedded in eigenmodels discussed above, but applies the
process to a graph containing two sorts of links: those connecting nodes to each other within time slices and those connecting nodes to themselves across time slices. The result is
a ‘multislice’ network, and modularity maximization tools can
then be run over the entire compiled network.
Summary and Conclusion
Cohesion is a seemingly simple idea that becomes more
complicated once we attempt to operationalize it in real
data. The problem is fundamental to the phenomenon: real-world data are ambiguous and messy, but analysts prefer
simple (exhaustive and mutually exclusive) groups with strict
graph-theoretic definitions or searches without arbitrary
parameter choices. While we have made progress in operationalizing structural cohesion overall (Moody and White,
2003), the algorithm is complex (i.e., slow in large
networks), and the resulting cohesive blocking does not lend
itself to a simple scalar representation. Work on subgroup
detection methods is developing rapidly, so it is too soon to
conclude that line of reasoning. However, the fundamental
challenge here is that we appear unlikely to identify a single
detection method that has no arbitrary parameters and can
identify real underlying groups with certainty. Instead, there
is always the risk of multiple equivalent partitions and only
locally optimal searches, with even these distinctions
dependent to some degree on user-specified parameters. This
implies that any search for groups in networks needs to be
empirically informed by theory and knowledge of the setting,
to help provide external validation for claimed group
assignments. This is, then, a rich and open area for future
research.
See also: 43106; 43107; 43116; 43120; 43121.
References
Alba, R.D., 1973. A graph-theoretic definition of a sociometric clique. The Journal of
Mathematical Sociology 3 (1), 113–126.
Bagrow, J.P., Bollt, E.M., 2005. A local method for detecting communities. Physical
Review E 72 (4), 046108.
Bagwell, C.L., Coie, J.D., Terry, R.A., Lochman, J.E., 2000. Peer clique participation
and social status in preadolescence. Merrill-Palmer Quarterly: Journal of Developmental Psychology.
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E., 2008. Fast unfolding of
communities in large networks. Journal of Statistical Mechanics.
Bock, R.D., Husain, S.Z., 1952. Factors of the tele: a preliminary report. Sociometry
15 (3/4), 206–219.
Borgatti, S.P., Everett, M.G., Shirey, P.R., 1990. LS sets, lambda sets and other
cohesive subsets. Social Networks 12 (4), 337–357.
Bron, C., Kerbosch, J., 1973. Algorithm 457: finding all cliques of an undirected
graph. Communications of the ACM 16 (9), 575–577.
Cairns, R.B., Cairns, B.D., 1994. Lifelines and Risks: Pathways of Youth in Our Time.
Cambridge University Press.
Centola, D.J., Macy, M., 2007. Complex contagions and the weakness of long ties.
American Journal of Sociology.
Clauset, A., 2005. Finding local community structure in networks. Physical Review E
72 (2), 026132.
Derenyi, I., Palla, G., Vicsek, T., 2005. Clique percolation in random networks. Physical
Review Letters 94 (16), 160202.
Everett, M.G., Borgatti, S.P., 1998. Analyzing clique overlap. Connections 21, 49–61.
Faust, K., 2006. Comparing social networks: size, density, and local structure.
Metodološki Zvezki 3 (2), 185–216.
Fortunato, S., Barthélemy, M., 2007. Resolution limit in community detection.
Proceedings of the National Academy of Sciences 104 (1), 36–41.
Frank, K.A., 1995. Identifying cohesive subgroups. Social Networks 17 (1), 27–56.
Freeman, L.C., 1978. Segregation in social networks. Sociological Methods and
Research 6, 411–429.
Freeman, L.C., 1992. The sociological concept of ‘group’: an empirical test of two
models. American Journal of Sociology 98 (1), 152–166.
Friedkin, N.E., 2004. Social cohesion. Annual Review of Sociology 30, 409–425.
Gest, S.D., Moody, J., Rulison, K.L., 2007. Density or distinction? The roles of data
structure and group detection methods in describing adolescent peer groups.
Journal of Social Structure 8 (1).
Girvan, M., Newman, M.E.J., 2002. Community structure in social and biological networks. Proceedings of
the National Academy of Sciences 99, 7821–7826.
Good, B.H., de Montjoye, Y.-A., Clauset, A., 2010. The performance of modularity
maximization in practical contexts. Physical Review E 81, 046106.
Granovetter, M., 1985. Economic action and social structure: the problem of
embeddedness. American Journal of Sociology 91, 481–510.
Kernighan, B.W., Lin, S., 1970. An efficient heuristic procedure for partitioning graphs.
Bell System Technical Journal 49, 291–307.
Kim, J.-O., Mueller, C.W., 1978. Introduction to factor analysis. SAGE.
Krivitsky, P., Handcock, M., Raftery, A., Hoff, P., 2009. Representing degree distributions, clustering, and homophily in social networks with latent cluster random
effects models. Social Networks 31, 204–213.
Luce, R., Perry, A., 1949. A method of matrix analysis of group structure. Psychometrika 14 (2), 95–116.
MacRae, D., 1960. Direct factor analysis of sociometric data. Sociometry 23 (4),
360–371.
Mark, N., 1998. Beyond individual differences: social differentiation from first principles. American Sociological Review 63 (3), 309–330.
Martin, J.L., 2009. Social Structures. Princeton University Press.
Moody, J., White, D.R., 2003. Structural cohesion and embeddedness: a hierarchical
concept of social groups. American Sociological Review 68 (1), 103–127.
Moody, J., 2001. Peer influence groups: identifying dense clusters in large networks.
Social Networks 23 (4), 261–283.
Mucha, P.J., Richardson, T., Macon, K., Porter, M.A., Onnela, J.-P., 2010. Community
structure in time-dependent, multiscale and multiplex networks. Science 328,
876–878.
Newman, M.E.J., 2004. Detecting community structure in networks. The European
Physical Journal B 38, 321–330.
Newman, M.E.J., 2006. Modularity and community structure in networks. Proceedings
of the National Academy of Sciences of the United States of America 103 (23),
8577–8582.
Newman, M.E.J., Girvan, M., 2004. Finding and evaluating community structure in
networks. Physical Review E 69 (2), 026113.
Palla, G., Derenyi, I., Farkas, I., Vicsek, T., 2005. Uncovering the overlapping
community structure of complex networks in nature and society. Nature 435
(7043), 814–818.
Pescosolido, B., Rubin, B.A., 2002. The web of group affiliations revisited. American
Sociological Review 65, 52–76.
Porter, M.A., Onnela, J.-P., Mucha, P.J., 2009. Communities in networks. Notices of
the American Mathematical Society 56, 1082–1166.
Powell, W.W., White, D.R., Koput, K.W., Owen-Smith, J., 2005. Network dynamics
and field evolution: the growth of interorganizational collaboration in the life
sciences. American Journal of Sociology 110 (4), 1132–1205.
Reichardt, J., Bornholdt, S., 2006a. Statistical mechanics of community detection.
Physical Review E 74 (1), 016110.
Reichardt, J., Bornholdt, S., 2006b. When are networks truly modular? Physica D:
Nonlinear Phenomena 224 (1–2), 20–26.
Schaeffer, S.E., 2007. Graph clustering. Computer Science Review 1, 27–64.
Seidman, S.B., 1983. Network structure and minimum degree. Social Networks 5 (3),
269–287.
Seidman, S.B., Foster, B.L., 1978. A graph-theoretic generalization of the clique
concept. The Journal of Mathematical Sociology 6 (1), 139–154.
Simmel, G., 1950. In: Wolff, Kurt H. (Ed.), The Sociology of Georg Simmel. Free Press,
New York.
Uzzi, B., 1999. Embeddedness in the making of financial capital: how social relations
and networks benefit firms seeking financing. American Sociological Review 64 (4),
481–505.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks.
Nature 393 (6684), 440–442.
White, D.R., Harary, F., 2001. The cohesiveness of blocks in social networks: node
connectivity and conditional density. Sociological Methodology 31, 305–359.
White, H.C., Boorman, S.A., Breiger, R.L., 1976. Social structure from multiple
networks. I. Blockmodels of roles and positions. American Journal of Sociology,
730–780.
Wright, B., Evitts, M.S., 1961. Direct factor analysis in sociometry. Sociometry
24 (1), 82–98.