Social Networks 31 (2009) 271–280
Opening the black box of link formation: Social factors underlying the
structure of the web
Sandra Gonzalez-Bailon
Oxford Internet Institute and Nuffield College, University of Oxford, 1 St. Giles, Oxford, UK
Article info
Keywords: Web; Links; Centrality; Visibility; Interorganisational networks; ERGMs
Abstract
Links play a twofold role on the web: they open the channels through which users access information, and
they determine the centrality of sites and their visibility. This paper adds two factors to the analysis of links
that aim to draw a parallel between the web and other offline interorganisational networks: the resources
that the organisations publishing online are able to mobilise, and the status or public recognition of those
organisations. Exponential random graph models (ERGMs) are used to analyse a sample of the web of
about one thousand sites, showing that both the economic resources of the producers of the sites (a proxy
to their wider pool of resources) and their presence in traditional news media (a proxy to their status)
significantly increase their probability of receiving more links, and therefore, their centrality. This adds a
sociologically relevant dimension to the analysis of the web that has been disregarded so far but that is
crucial to understand the way it distributes visibility.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
2. Visibility and the structure of the web
Links are the building blocks of the web. They open the channels through which users access information and they help
define the visibility of sites by making them more prominent
for search engines. Links hold the key to the way information
is accessed online: they not only open the roads through which users travel the
web but also signpost some flows of information more visibly than
others. The more links a site receives, the more visible the site
becomes because it is easier to encounter. Identifying the mechanisms that underlie the formation of links is relevant for three
reasons: first, because links hide the local mechanisms that generate the decentralised structure of the web; second, because links
determine the centrality of sites, and with that, the distribution
of visibility online; and third, because by making sites more central
and attracting audiences, links also help attract investment
from the advertising market. Finding out what mechanisms generate the structure of the web is important for reproducing its efficiency
in the transmission of information; but it is also important from
a less engineering-oriented, more sociological perspective: the efficiency of
the web hinges on an uneven distribution of visibility that grants a
competitive advantage to certain web sites when it comes to reaching audiences. By unravelling the factors that lead to the formation
of links, this paper aims to shed light on the forces that give more
prominence to certain sources of information and contents.
Attention is a scarce commodity in all sorts of media, including
the web: users can only devote a limited amount of time to process information and this leads to a competition between sources
to gain their interest (DiMaggio et al., 2001, p. 313). It is therefore hardly surprising that some web sites are more successful in
that competition than others; what is new is the role that links
play in determining who gets the pole position, and the bias that
this influence might be introducing in how information is retrieved
(Lawrence and Giles, 1999). On the web a small number of sites
attract a disproportionate number of links. These sites become
the gravity centres of the web because, first, the more links they
attracted in the past, the more links they are likely to attract in the
future and, second, the more central these sites become, the more
users will end up visiting them.
Researchers have shown that a ‘rich get richer’ principle (Price,
1976) is enough to successfully reproduce the structure of the web:
if, in a network in constant growth, new nodes send links to older
nodes in proportion to the number of links they already receive,
the structure that will emerge will have the same characteristic
long-tail degree distribution exhibited by the web (Barabási and
Albert, 1999; Barabási et al., 2000). This preferential attachment
principle is not intended to capture an empirical mechanism (it is
actually a black box when it comes to explaining what makes some
websites send links to other sites), but it provides a stylised way of
capturing the basic infrastructure of the web. It also suggests that
time generates a path-dependency that is difficult to counteract
and that gives an advantage to the most senior nodes.
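The growth rule described above can be illustrated with a short simulation. This is only a minimal sketch under stated assumptions, not the model estimated in this paper: the function name `grow_network`, the number of links sent by each new node, and the offset of one (so that nodes without links can still be chosen) are illustrative choices.

```python
import random
from collections import Counter

def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a directed network in which each new node sends links to
    older nodes with probability proportional to (in-degree + 1)."""
    rng = random.Random(seed)
    in_degree = Counter({0: 0})
    # Each node sits in the urn once, plus once per link it has received,
    # so a uniform draw from the urn is proportional to in-degree + 1.
    urn = [0]
    for new in range(1, n_nodes):
        k = min(links_per_node, new)  # cannot link to more nodes than exist
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(urn))
        for target in sorted(targets):
            in_degree[target] += 1
            urn.append(target)
        in_degree[new] = 0
        urn.append(new)
    return in_degree

in_degree = grow_network(5000)
ranked = sorted(in_degree.values(), reverse=True)
# Share of all links held by the 1% of nodes with the most links.
top_share = sum(ranked[: len(ranked) // 100]) / sum(ranked)
print(f"top 1% of nodes receive {top_share:.0%} of links")
```

Under this rule the oldest nodes tend to accumulate a share of links far above what an even split would give them, which is the path-dependency the paragraph refers to.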
doi:10.1016/j.socnet.2009.07.003
On the web, links not only attract further links but also larger
audiences: they open more points of entry to a given site and they
influence the method that search engines use to retrieve information (Lawrence and Giles, 1999; Pennock et al., 2002; Cho and
Roy, 2004). Search algorithms are very sensitive to the centrality
of sites when establishing the relevance of web contents because
they assume that the number of links reaching a site is a proxy
to its quality: links become a key factor in determining the ranking a site will obtain in query results (Brin and Page, 1998; Tomlin,
2003). Because users are more likely to look at the top 10 results
(Henzinger, 2007), search engines significantly boost
the popularity of sites by making them more visible and a likelier
destination. To the extent that large audiences attract the interest
of online advertisers, links become relevant not only to understand
how information is accessed, and what sources of information are
more visible, but also who benefits from the political economy of
the web.
Links are therefore the roads and the signposts of the web but
also the currency that measures value online, and it is this twofold
function, and the social implications that derive from it, that provides the starting point of this paper. So far, most approaches to the
web have assumed that links are proxies to either the quality of sites
(Brin and Page, 1998; Tomlin, 2003) or to some sort of affiliation
between the producers of those sites (Huberman, 2001; Adamic and
Adar, 2003). These studies are important because, ultimately, they
contribute to optimise techniques to retrieve information using the
structure of links, and what they represent, as the main recommendation criteria. Yet these studies have not taken into account
the impact that factors exogenous to the web, like the resources
or status of those producing the sites, might have on online linking patterns, which is particularly striking because these have long
been identified as crucial factors in shaping other interorganisational networks (Podolny, 2001; Diani, 2003; Baldassarri and Diani,
2007). The main claim this paper makes is that, given the public
function that the web serves as a form of media, more attention
should be paid to the impact that these factors have on its structure.
The argument is developed as follows. First, the paper reviews
the different approaches that have been used to analyse the structure of the web, paying special attention to how they account for
the mechanisms driving the formation of links. Then, it introduces
the empirical data on which the analyses are based. A description
is given of the procedure used to sample the web, and the producers of those sites are characterised using a number of measures
like their age, field of activity, economic resources, and presence
in traditional news media, which is used as a proxy to their status.
Section four presents the models used to identify the influence that
these characteristics have on the creation of links and therefore on
the distribution of centrality. What the models show is that, controlling for the structural properties of the network, and for the age
of sites, the richer organisations and those with higher status are
still more likely to receive links from other sites. The last section
discusses how these findings qualify previous approaches to the
structure of the web.
3. Approaches to link formation
In principle, the web offers to users whatever information they
want, as long as they know how to find it. In practice, users are more
likely to access some web sites rather than others because they are
more visible to the public. Gatekeepers to information like search
engines play a crucial role in directing users’ attention to certain
destinations. They use the very backbone of the web, the structure
of links, to rank its contents on the assumption that links are to
sites what academic citations are to papers: an objective measure
of relevance (Brin and Page, 1998). Some researchers have gone one
step further by adding a semantic layer to the interpretation of links
and analysing them not just as proxies to quality but also to common interests and affiliations (Huberman, 2001; Adamic and Adar,
2003). For both interpretations, links are essentially recommendations; they are not necessarily an endorsement (as with scientific
citations, sources can be cited for criticism) but they are an obvious
sign of acknowledgement: site A can only send a link to site B when
it is aware of B’s existence, and each link that site B receives is a
statement that it is, at least, worth a visit.
According to these two interpretations, the web is either a network of documents where links create a voting system that is used
to identify the best contents, or it is a social network driven by
homophily forces where users create links to similar others and, in
doing so, shape the overall structure. And yet the possibility that the
web, like other social networks, might also reflect an asymmetrical
distribution of resources or status, inherited from offline relations,
is somehow overlooked by these two approaches. What this possibility suggests is that the distribution of links might actually be
reflecting hierarchies that are not necessarily related to quality or
shared interests. A research institute might link to an organisation because it depends on its resources to fund its projects; or
an international NGO might not reciprocate the links it receives
from smaller organisations, even when they work on similar issues,
because it does not need them to attain visibility or legitimacy as
much as the smaller organisations need the NGO. Research on economic networks has shown that ties between organisations are not
only the channels through which information or resources flow,
but also assets that organisations use strategically to enhance their
legitimacy in the eyes of potential consumers: alliances with high-status third parties can improve the image and perception of an
otherwise unknown organisation (Podolny, 2001). A similar distinction could apply to web links: they are the channels through
which users find their way to online contents but, to the extent that
they also signal alliances between organisations, they become an
important clue for public recognition; in that sense, links from central, legitimised sites are more valuable than links from peripheral,
unknown organisations.
Links, like alliances between companies dealing with consumer uncertainty, can help improve the image of an
organisation. But to serve that purpose, links need to connect
with the right partners. Researchers of social movements, particularly those adhering to resource mobilisation theory, have
long acknowledged the importance of this strategic component in
the formation of networks. The larger the size of organisational
resources, the argument goes, the more influential and central an
organisation will be in the network: ties with such an organisation are more valuable because they give access to a larger pool
of resources (Diani, 2003). More recent research on civic associations has proposed an analytical distinction between two types of
connections: identity and instrumental ties (Baldassarri and Diani,
2007). While the former are based on a common ground of values
and interests, and promote the clustering of like-minded associations, the latter forge alliances with organisations that grant access
to the resources necessary to achieve certain goals, even when
these organisations do not necessarily share the same agenda or
principles.
These two types of connections result from different motivations, a difference that might also be reproduced on the web: links
might signal affiliation, but they can also respond to a need to obtain
resources that certain organisations would not be able to get on their
own, like more traffic to their websites. Being connected to a
large international agency is, in this respect, more important than
being connected to a smaller, local group because it contributes
much more to improve visibility and public perception. The formation of links on the web could therefore be also related to the status
and size of the resources managed by the organisations that publish
the sites, just as it happens with other interorganisational networks.
However, as the following two sections show, this possibility has not
yet been tested empirically.
3.1. Links as proxies to quality
Search engines have greatly improved the quality of their results
by using the information contained in the linking structure of the
web. This structure is interpreted as a citation graph and it is
assumed that pages that are well cited from many places around
the web are the most interesting pages to browse. Following this
logic, search engines propagate weights through the structure of
the web in a way that gives more relevance to the links sent by
the more central sites: a link sent by the World Bank weighs more
than a link sent by a small grassroots organisation when it comes to
defining the relevance of the source being linked. This technology,
originally devised by Google (Brin and Page, 1998) but soon adopted
by most search engines (Tomlin, 2003), is based on a variation of
the measure of eigenvector centrality: the centrality of a site, and
its significance in defining the relevance of other sites, depends on
how central the sites linking to it are themselves.
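The idea that a link counts for more when it comes from a central site can be sketched with a damped power iteration over a toy link list. This is a simplified, PageRank-style illustration, not the algorithm that any search engine runs in production: the function name `link_centrality`, the damping value of 0.85, and the site names are assumptions made for the example.

```python
def link_centrality(links, damping=0.85, iterations=100):
    """Score sites so that a link from a high-scoring site counts more."""
    nodes = sorted({site for edge in links for site in edge})
    n = len(nodes)
    out_links = {site: [] for site in nodes}
    for source, target in links:
        out_links[source].append(target)
    score = {site: 1.0 / n for site in nodes}
    for _ in range(iterations):
        # Sites without outgoing links spread their score evenly.
        dangling = sum(score[s] for s in nodes if not out_links[s])
        new = {s: (1 - damping) / n + damping * dangling / n for s in nodes}
        for source in nodes:
            for target in out_links[source]:
                new[target] += damping * score[source] / len(out_links[source])
        score = new
    return score

# Hypothetical sites: site_x and site_y each receive exactly one link,
# but site_x is linked by a well-connected agency.
LINKS = [
    ("p1", "large_agency"), ("p2", "large_agency"), ("p3", "large_agency"),
    ("large_agency", "site_x"),
    ("small_group", "site_y"),
]
scores = link_centrality(LINKS)
print(scores["site_x"] > scores["site_y"])  # True
```

Both recipients have the same number of incoming links; they differ only in the centrality of the site that links to them, which is the property the paragraph describes.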
However, the implementation of this technique rests on two
further assumptions that are more subtle but still crucial if we
are to understand the way in which the web has been conceptualised: first, that the web is essentially a network of documents,
and second, that the position of those documents in the network
has nothing to do with the attributes of the producers. Sociological
research has usually associated measures of network centrality to
power because the more central agents become in a network, the
better positioned they are to control flows of information (Cook et
al., 1983; Bonacich, 1987). Access to larger amounts of resources or
to high-status partners can accelerate the centrality of agents in a
network, but this possibility is mostly absent from how the web is
conceived by its main gatekeepers, which sets the web artificially
apart from how other interorganisational networks are formed.
Instead, search engines conceive the web as analogous to citation
networks.
In science, the number of times that papers are cited by other
papers is usually considered the best indicator of the significance of scientific work (Garfield, 1955; Cole and Cole, 1967). This
reward system sometimes penalises individuals by setting off a
rich-get-richer feedback mechanism that reinforces the visibility
of renowned scholars but overall it has been said to play a functional role because it increases the salience of discoveries from
which everybody benefits (Merton, 1968). The web, which grew as
a repository of scientific documents, shares the same type of structure as citation networks: a scale-free, long-tail degree distribution
where a minority of nodes concentrates the majority of the links
(Price, 1976; Redner, 1998; Albert et al., 1999; Broder et al., 2000).
But with its development, and the increasing participation of actors
willing to reach audiences at any cost, the web started to resemble
more a social network and less a network of documents: agents
sending and receiving links had an interest in attaining the most
visible positions and, most crucially, they were equipped with different stocks of resources to fulfil that aim. Whilst scientists do not
have the power to make their colleagues reference a paper, a funding institution might condition its grants to receiving an explicit
acknowledgement from the recipient; a small grassroots movement
might be compelled to associate with a larger organisation to reach
wider audiences; or local media platforms might collaborate and
reference each other in order to compete with the logistics of established news organisations. These dynamics can only be identified
if we approach the web not as a citation network but as an interorganisational network where links are used strategically to improve
the position of the agents involved.
A piece of evidence suggesting that links actually hide (and give
expression to) different strategic behaviour is that linking patterns
change across web domains. Corporate sites, for instance, are less
likely to send links to other sites: most of them occupy a section
of the web that is easy to reach but difficult to leave because there
are not many links offering a way out (Broder et al., 2000, p. 310).
In addition, when considered apart, other subsets of the web, like
university and newspaper homepages, or the sites published by
scientists, do not follow the characteristic power-law distribution:
relative to their own community, the sites that accumulate the
largest number of links are not as far away from the mode (Pennock
et al., 2002, p. 5208). These domains differ from each other in how
much they deviate from the power-law prediction, but they share a
‘winners don’t take all’ feature that qualifies previous models of the
web: when pages are compared with similar types, the unequal distribution of centrality is less extreme, and sites in different domains
do not show the same tendency to prioritise a few nodes over the
rest.
What these findings suggest is that the scale-free nature of the
web, and its distribution of visibility, hides generative mechanisms
that cannot be reduced to endogenous forces explained only in
terms of the quality of the contents published. But to conceptualise
the web as a social network, the attributes of the agents behind its
formation need to be incorporated in the analyses. Researchers who
see links as reflecting alliances between organisations follow this
line of inquiry, producing evidence in support of the claim that identity and homophily are, as in other social networks, a crucial factor
to explain the structure of the web. Yet, as the following section
shows, this line of research still leaves unexplored the instrumental role that links play both as signs of status and as channels for
the mobilisation of resources like traffic.
3.2. Links as proxies to alliances
Researchers factoring the notion of identity into the analysis
of the web have found that, online, agents also build bridges to
similar others to promote a common message. Different studies
have shown that links between sites respond to the strategies and
alliances of the organisations publishing them (Rogers and Marres,
2000; Adamic and Adar, 2003; Rogers, 2004; Adamic and Glance,
2005; Ackland et al., 2006). Individuals and organisations select
the links that come out of their sites in line with their own agenda
and interests. For instance, sites in the .com, .gov and .org domains
dealing with global climate change follow different linking styles
because the organisations behind those sites have different perspectives of the relevant issues. NGOs in the .org domain generate
the densest networks, with most of their links going to other NGOs
and governmental sites, and only a few to corporate sites; governmental sites, in turn, barely ever send links to other domains outside
.gov, whilst corporate sites do just the opposite: most of their links
are targeted outside their own .com domain (Rogers and Marres,
2000). What this research suggests is that links often respond to the
same motivation that underlies identity ties between civic associations (Baldassarri and Diani, 2007), and that visibility on the web
relies partly on relations that are forged offline, not just on the
actual quality of the contents. Links reveal the public perception
that organisations aim to build: this is why companies like Shell
send links to Greenpeace but Greenpeace refuses to send links back
to Shell (Rogers and Marres, 2000, p. 17).
The tendency of web sites to prioritise connections to like-minded organisations falls in line with the homophily principle
ubiquitous in other social networks (McPherson et al., 2001). Identifying this tendency is important because it helps explain
the clustering of the web and can lead to the design of more refined
search algorithms (Adamic, 1999); but it is also important because
it opens a point of connection with research on interorganisational
networks and its conceptual framework. As the previous section
argued, of particular relevance is the distinction between identity
and instrumental ties, and amongst the latter, between ties used
to mobilise resources and ties used to signal status. However, to
test whether these two types of connections have an impact on the
structure of the web, other variables need to be taken into account
in addition to those intended to approximate identity.
Some researchers have analysed the impact of geographical distance and economic development on NGOs (Shumate and Dewitt,
2008) and university web networks (Thelwall, 2002); others have
included characteristics of political candidates to analyse the linking patterns of political sites (Foot et al., 2002), and yet others
have focused on the impact that language has on, for instance,
the formation of university web networks (Vaughan, 2006). However, these studies do not make the conceptual distinction between
identity and instrumental ties that this paper aims to test, in line
with resource mobilisation theory and with what we know about
interorganisational networks. This distinction is relevant because it
qualifies the assumption that links are proxies to the quality of websites, and sheds light on factors that affect the way information is
arranged and retrieved online.
In order to test the impact that instrumental ties have on the
structure of the web, this paper takes into account two new variables not considered so far in the literature: the economic resources
of the organisations publishing the sites, and their status or prominence in public perception. The following section gives details on
how these variables were operationalised; what follows are the two
main hypotheses driving the collection of the data:
Hypothesis 1—organisations managing larger pools of resources are
more likely to attract a higher number of links.
Hypothesis 2—organisations holding better public recognition or
higher status are more likely to receive a higher number of links.
Both hypotheses aim to identify the importance that instrumental links have in shaping the structure of the web, but they refer to
different strategic aims: tapping into a larger pool of organisational
resources in the first case, and enhancing public perception by
associating with high-status organisations in the second case. If the
web reflects the dynamics of other interorganisational networks,
these two factors (economic resources and status) will have a positive effect on the centrality of sites for reasons that do not derive
from the quality of their contents but from the prominence of the
organisations that publish them. Their sites might still contain the
best information, but their centrality would be reinforced by the
strategic interests of the organisations linking to them, regardless
of their intrinsic value.
Given what we know about the factors that shape the structure
of the web, testing these hypotheses requires introducing some
controls. Additional variables that might exert an influence on the
centrality of sites are, in the light of the literature explored above,
homophily (which has been repeatedly found to be a crucial building block of the web) and age, both of the organisations and the
sites: the assumption is that the longer a site has been online or the
older an organisation is, the more opportunities both have to accumulate links and play more central roles in the network (Barabási
et al., 2000; Diani, 2003). Considering these variables leads to the
formulation of three additional hypotheses:
Hypothesis 3—organisations working on similar issues are more
likely to send links to each other.
Hypothesis 4—older sites receive a higher number of links.
Hypothesis 5—older organisations attract a higher number of links
to their websites.
These hypotheses will be tested controlling for the structure
of the network. This serves two purposes: first, it avoids overestimating the effects of the exogenous variables
and controls for the influence of unmeasured
attributes like geographical distance; second, it takes into account
endogenous mechanisms not captured by organisational attributes,
like the tendency to reciprocate existing ties or engage in transitive
clusters. The data gathered to test these hypotheses are presented
in the next section.
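The two endogenous tendencies mentioned above, reciprocation and transitive clustering, can be summarised as simple counts on a directed link list. The sketch below only illustrates what such structural statistics measure; it is not the model specification used in this paper, and the function names and the toy network are hypothetical. It assumes that no site links to itself.

```python
def reciprocity(links):
    """Fraction of directed ties i -> j for which j -> i also exists."""
    ties = set(links)
    mutual = sum(1 for (i, j) in ties if (j, i) in ties)
    return mutual / len(ties)

def transitive_triples(links):
    """Count ordered triples with i -> j, j -> k and i -> k (i, j, k distinct)."""
    ties = set(links)
    successors = {}
    for i, j in ties:
        successors.setdefault(i, set()).add(j)
    count = 0
    for i, j in ties:
        for k in successors.get(j, ()):
            if k != i and (i, k) in ties:
                count += 1
    return count

# Hypothetical network of four sites.
TOY_LINKS = [("a", "b"), ("b", "a"), ("b", "c"), ("a", "c"), ("c", "d")]
print(reciprocity(TOY_LINKS))         # 0.4: two of the five ties are reciprocated
print(transitive_triples(TOY_LINKS))  # 2: a->b->c with a->c, and b->a->c with b->c
```

Controlling for counts of this kind is what separates the effect of organisational attributes from patterns that the network would show regardless of who publishes the sites.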
4. Description of the data
4.1. Method for sampling the web
The sample of the web used in the analyses was collected following the procedure summarised in Fig. 1. First, one thousand
sites were randomly selected from the complete list of sites registered in the .org domain. We focus on this domain because it
is one of the oldest and most popular domains on the web, and
because it is one of the most representative: all sorts of organisations publish here, from charities and NGOs, environmental groups
and grassroots organisations, to UN and intergovernmental agencies, professional associations and religious groups, to name some.
Out of the initial random selection (stage A in Fig. 1), only 13% of
the sites were operative; a content analysis was performed on each
of them, obtaining information about the name and type of organisation publishing the site, and keeping track of the links sent to
other recommended sites within the domain. This excluded links to
non-relevant sites like hosting servers but also to commercial sites
in the .com domain. Following those links, additional sites were
added to the sample, and again information was obtained about
the producers of those sites and their links (stage B in Fig. 1). The
decision to proceed with the sampling following the links sent from
the operative sites as opposed to using another random selection
of the whole domain was taken for efficiency reasons: if 87% of
every thousand sites randomly selected are not operative or are
fake domains, it would have taken much more time to collect information for a network of the same size as the one considered here.
Links from operative sites, on the other hand, are more likely to
identify other sites that are also operative, hence the decision to
snowball from them.
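The logic of snowballing from operative seed sites can be sketched as a breadth-first traversal. This is a simplified illustration, not the collection procedure actually used: the in-memory link table `WEB`, the set `OPERATIVE`, the `keep` filter and the `max_waves` limit are all hypothetical stand-ins for the manual content analysis described above.

```python
from collections import deque

def snowball(seeds, get_links, keep, max_waves=2):
    """Follow links outwards from the seeds, keeping only eligible sites."""
    wave_of = {site: 0 for site in seeds if keep(site)}
    queue = deque(wave_of)
    edges = set()
    while queue:
        site = queue.popleft()
        for target in get_links(site):
            if target == site or not keep(target):
                continue  # skip self-links and non-eligible sites
            if target not in wave_of:
                if wave_of[site] >= max_waves:
                    continue  # do not grow the sample past the last wave
                wave_of[target] = wave_of[site] + 1
                queue.append(target)
            edges.add((site, target))
    return wave_of, edges

# Hypothetical link table: one seed is not operative, one link leaves .org.
WEB = {
    "a.org": ["b.org", "shop.com", "dead.org"],
    "b.org": ["c.org", "a.org"],
    "c.org": ["d.org"],
    "d.org": ["a.org"],
}
OPERATIVE = {"a.org", "b.org", "c.org", "d.org"}

def keep(site):
    return site.endswith(".org") and site in OPERATIVE

waves, edges = snowball(["a.org", "dead.org"], lambda s: WEB.get(s, []), keep)
print(waves)  # {'a.org': 0, 'b.org': 1, 'c.org': 2}
```

Because every site reached through a link has already been found to exist by the site that links to it, each step is more likely to yield an operative site than a fresh random draw from the whole domain.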
In this second stage, a selection was made to extract from the
sample the sites published by international organisations, or internationally oriented organisations. This filter was applied for two
reasons. The first, methodological, was to avoid a bias in favour of
US organisations, which do not use country code top level domains
(as, for instance, .uk for the United Kingdom, .jp for Japan or .cn
for China). These sites might attract an unrepresentative number
Fig. 1. Data collection procedure.
Table 1
Descriptive measures of the producers of sites in the sample.

| Type of organisation | N (%) | Budget (annual, millions of dollars) | Paid staff | Media presence | Year of foundation | Year of foundation online |
| --- | --- | --- | --- | --- | --- | --- |
| Charities-NGOs | 217 (22) | 22.8 (3.8–117.5) | 26 (8–255) | 2 (0–17) | 1986 (1968–1996) | 1998 (1996–1999) |
| Political | 149 (15) | 3.0 (1.0–8.7) | 17 (8–39) | 2 (0–8) | 1990 (1979–1996) | 1998 (1997–2000) |
| Health | 116 (12) | 17.8 (3.7–75.5) | 23 (9–52) | 2 (0–10) | 1992 (1978–1998) | 1998.5 (1997–2000) |
| Environmental | 113 (12) | 8.9 (1.9–62.4) | 22 (10–71) | 3 (0–9) | 1988 (1973–1995) | 1998 (1996–2000) |
| Religious | 85 (9) | 11.8 (0.9–84.7) | 27 (10–215) | 1 (0–7) | 1963.5 (1940–1981) | 1997 (1995–1999) |
| Research | 70 (7) | 7.2 (2.6–27) | 24 (10–39) | 4 (0–22) | 1985 (1964–1994) | 1997 (1996–1999) |
| UN | 60 (6) | 180.4 (78–679.6) | 541 (138–2847) | 9 (1–48) | 1971 (1951–1995) | 1998 (1995–2000) |
| Media | 49 (5) | 43.8 (NA) | 5 (5–6) | 2 (0–4) | 1994 (1984–1999) | 1999 (1997–1999) |
| Intergovernmental | 30 (3) | 85.4 (12.2–571) | 163 (68–574) | 14 (4–92) | 1967 (1959–1988) | 1996 (1995–1997) |
| Professional | 42 (4) | 14 (0.8–40.1) | 1 (1–87) | 1 (0–11) | 1969 (1926–1991) | 1998 (1996–2000) |
| Education | 15 (2) | 104.1 (54.4–153.9) | 67 (38–96) | 8 (3–60) | 1973 (1925–1995) | 1996 (1994–1998) |
| Sports | 15 (2) | 1.0 (0.2–1.5) | 5 (0–25) | 4 (0–15) | 1982 (1957–1992) | 2000 (1996–2001) |
| Security | 6 (1) | 26 (NA) | NA | 18 (2–34) | 1992.5 (1961–1996) | 2001 (1997–2002) |
| Total cases | 967 | 270 | 302 | 967 | 539 | 909 |

Note: Cells with attributes show the median and the 1st and 3rd quartiles (in brackets). NA means the statistics could not be calculated due to missing values.
of links, given that the US has one of the highest percentages of
internet users. The second reason, operational, was to be able to
complement the sample with information gathered from the Yearbook of International Organisations, and test whether the attributes
of the organisations behind these sites contribute to explain their
centrality. Snowballing from the sites published by international
organisations, additional sites were added to the network, resulting
in the final sample (stage C in Fig. 1), formed by about one thousand
sites and more than seven thousand links. Analyses not reported in
this paper show that the seed sites that resulted from the first round
of data collection do not have a significant advantage in attracting a higher number of links, which means that the snowballing
procedure is not imposing artificial centrality scores.
Thirty-four of the sites in the final sample had no links with
any other sites in the sample so they were removed from the analyses. The lack of links to or from these sites does not respond to
any substantive reason: their isolation is rather an artefact of the
data collection procedure and, in particular, of the size of the sample. Had the sample been larger, most of these isolated sites would
have been connected to the other sites even if only through long
paths: given what we know about the structure of the web, only
a small percentage of sites are secluded in isolated components
(Broder et al., 2000). According to the domain total registration figures, the fraction captured with this sample amounts to roughly 2%
of all sites registered as .org; however, the real fraction of organisations sampled is probably higher given the high percentage of
fake domains that are either not available or have automatic redirections to other domains like .com. The original random sample
was selected in November 2004 and the snowballing procedure
was applied between December 2004 and March 2005.
4.2. The attributes of the producers of sites
The name and the field of activity of the organisations publishing the sites were collected as part of the content analysis in the
procedure summarised above. In addition, information about other
attributes was also collected using the Yearbook of International
Organisations (printed edition) and the annual reports published
online by the organisations themselves. These attributes included
annual budget, number of paid staff, and year of foundation. The
first two variables are intended to measure the amount of economic
resources managed by the organisations. Since the network data
was collected during 2005, the information about these attributes
corresponds to 2004, or to the last available year before that. Annual
reports were used when the Yearbook did not contain enough information: 47% of the organisations in the sample were not listed in the
Yearbook, and some of those listed had missing information for the
budget and paid staff variables. When annual reports were used,
budget information was collected using the total income or total
assets reported by the organisation.
Information about the year of first online publication was collected using the search engine Alexa. The status of the organisations
was measured using their visibility in traditional news media, on
the assumption that high status organisations are more visible
and receive more press coverage. This was operationalised as the
number of times the sites of the organisations were cited by international newspapers (all full-text English language news and full-text
and abstract news for other languages stored in the database LexisNexis) during the year previous to the collection of the sample.
Table 1 provides some descriptive measures of these attributes.
The first column in the table contains the categories in which
sites were classified according to the field of activity of the organisations publishing them. This classification was done manually during
the collection of the sample using the same definition given by the
organisations themselves on their websites. When the nature of
the organisation was not clearly specified, and more than one category could apply (for instance, some NGOs work in environmental
issues) a decision was made to choose the category that defined
more accurately the nature of the organisation: Greenpeace, for
instance, was classified as an environmental organisation, not as an
NGO. This classification was done independently from the information contained in the Yearbook, which provides codes to identify
different types of organisations. The reason is that many of the
sites in our sample (47%, as mentioned above) were not included
in the Yearbook, most of them being internet-based organisations.
In order to check the robustness of this inductive classification, an
inter-rater agreement test was run. The test showed that there is
substantial agreement between the classification of sites displayed
in Table 1 and two other classifications carried out by independent
researchers, with a Cohen’s Kappa coefficient of 0.61 for the three
classifications.
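The agreement statistic mentioned above can be illustrated with a minimal Python sketch. It computes Cohen's kappa for two raters only: the coefficient is defined for a pair of raters, and the text does not state how the three classifications were combined into a single value. The function name `cohen_kappa` and the labels are hypothetical and are not data from the sample.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical field-of-activity labels for six sites (illustration only).
rater_1 = ["ngo", "ngo", "health", "political", "ngo", "health"]
rater_2 = ["ngo", "health", "health", "political", "ngo", "ngo"]
kappa = cohen_kappa(rater_1, rater_2)
```

Values between 0.61 and 0.80 are conventionally read as substantial agreement, which is the reading applied to the coefficient reported above.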
The second column of Table 1 specifies the relative size of each
of these categories. As the figures show, the domain sampled here
is mostly populated by charities and NGOs like Caritas, Amnesty
International or the Red Cross. Political, health and environmental organisations follow: sites like Corporate Watch, Family Health
International or Friends of the Earth amount to close to 40% of all
the sites in the sample. The least numerous categories are education organisations like the Institute of International Education,
sports associations like the International Athletics Foundation, and
security sites like the International Code Council Foundation.
The remaining columns in the table show information about the
attributes of these organisations. Budget and paid staff are both
measures of the economic resources managed by these organisations, and they are intended to test the impact that organisational assets
have on online centrality (hypothesis 1). The distance between the
Fig. 2. Relative inequality in the distribution of centrality.
median and the first and third quartiles shows that the distribution of economic resources is significantly skewed to the right in
all categories. The richest organisations are, as one would expect,
UN and intergovernmental agencies; but charities and NGOs follow, with the American Council for Voluntary International Aid
(InterAction) managing the largest amount of resources. According
to the median, the least resourceful organisations in terms of budget are, aside from sports associations, those devoted to political
issues, like Attac International or Minority Rights Group International, although in general these organisations have a comparatively
better score when the level of economic resources is measured
in terms of number of paid staff. Due to missing values (72% for
budget and 69% for paid staff), the approximation to the economic
resources of organisations is not very reliable for the less populated
categories.
The visibility of these sites in traditional media is, as the fourth
column shows, also skewed to the right, meaning that a few sites get
most of the attention from traditional news media and that there
is an elite of high-status organisations. This variable aims to test
to what extent the status of organisations contributes to increase
the centrality of their sites (hypothesis 2). Again, the most visible organisations are intergovernmental and UN agencies, although
sites devoted to security issues, like the Institute for the Analysis of
Global Security, attain an even better visibility. In general, only 7%
of the sites were cited more than a hundred times by traditional
newspapers during the year previous to the collection of the data,
with just two of them getting more than a thousand news citations.
Finally, the last two columns examine the age of the organisations
(44% of the cases are missing) and the age of the sites (6% missing). The youngest organisations are related to media issues, like
the Independent Media Centre (Indymedia), and the oldest to religious groups, like the Alliance of Baptists. All the sites, however,
went live online around the same time. Intergovernmental agencies seem to have a lead in publishing on the web, followed by
religious groups and research institutes like the Carnegie Endowment for International Peace.
4.3. The distribution of centrality
The network formed by these organisations is, as expected,
highly centralised in a few nodes. Fig. 2 captures the degree of
inequality in the centrality scores of the web sites according to
two measures: indegree (Freeman, 1979) and eigenvector centrality
(Bonacich, 1987). The first refers to the number of links that reach a
given site; the second measures the centrality of a site in proportion to the centrality of the sites that link to it. As a baseline test,
the observed distributions are compared with random networks of
the same size and density assembled following a Bernoulli process.
As expected, the Gini coefficients show that the inequality in the
centrality of sites is significantly larger for the observed network,
especially according to the eigenvector measure, a variety of which
is used by search engines when determining the prominence of
sites. This inequality is not surprising given what we know about
the long-tail, scale-free properties of the web; but it provides the
empirical starting point for the analyses presented in the following
section and, in particular, for this question: What makes a few web
sites be proportionally so much better connected than the majority
of sites?
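The comparison described in this section can be sketched in a few lines of Python. The sketch computes indegrees, their Gini coefficient, and a Bernoulli random network of the same size and density as a baseline. The toy network (one dominant hub plus a sparse random background), its size, and all function names are assumptions made for illustration; they are not the sample analysed in the paper, and eigenvector centrality is left out for brevity.

```python
import random


def indegrees(n, edges):
    """Number of incoming links per node for a directed edge list."""
    counts = [0] * n
    for _source, target in edges:
        counts[target] += 1
    return counts


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def bernoulli_digraph(n, density, rng):
    """Directed graph in which each ordered pair is linked with probability `density`."""
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < density]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 200                  # hypothetical size (the sample has about one thousand sites)
hub_links = {(i, 0) for i in range(1, n)}          # every site links to site 0
background = set(bernoulli_digraph(n, 0.01, rng))  # sparse random background
observed = sorted(hub_links | background)

# Baseline with the same number of nodes and the same expected density.
density = len(observed) / (n * (n - 1))
baseline = bernoulli_digraph(n, density, rng)

gini_observed = gini(indegrees(n, observed))
gini_baseline = gini(indegrees(n, baseline))
```

A single random draw is used here for simplicity; a fuller baseline would average the Gini coefficient over many simulated networks.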
The two main hypotheses driving this paper claim that
underlying this uneven distribution of centrality there might
be mechanisms similar to those that explain the formation of
other interorganisational networks. These networks rely heavily on
instrumental ties that give access to more resourceful partners, who
end up having the most central positions in the network (Diani,
2003). Online, resourceful organisations might also be more central because they count on the links sent by the organisations that
depend on their financial support: if the non-profit organisation
Family Care International is funded by the Gates Foundation, a link
will acknowledge that part of its work depends on the funds granted
by the Foundation. The same might happen with UN and intergovernmental agencies like the World Bank: this site is likely to be a hub
not only (or not necessarily) because of the quality of its contents,
but because the Bank is the pivot on which many organisations
revolve. It is in this sense that instrumental ties defining alliances
or partnerships offline might be having an effect on the structure
of the web. In addition, organisations with better resources might
also be able to hire the services that allow them to optimise their
web sites and make them more visible for search engines. If sending
a link to these sites can result in a reciprocated connection, organisations might be able to benefit from spilled-over traffic; this is
another incentive to send links to prominent sites rather than to
more peripheral organisations, again regardless of content.
Visibility in traditional media, in turn, might also exert an influence in determining the centrality of sites because it contributes to
enhance their public recognition and status (Podolny, 2001). Sites
that link to highly visible organisations might be trying to increase
their own visibility, if only by using links to influence their public perception and gain part of the audience that already trusts
the high-status organisation. A link to Greenpeace as opposed to
a less well known environmental group is more effective when an
organisation wants to send the message that it is committed to
the protection of the environment: all else equal (in this case, two
organisations working in environmental issues) the organisation
with a higher visibility might be a preferred partner because it helps
to convey a message in a more efficient way.
By taking into account the influence that offline status and economic resources have in linking patterns we can determine whether
there are significant points of connection between visibility on the
web and the dynamics that shape other interorganisational networks; this, in turn, can help us draw some important implications
about how visibility is built online and about potential biases that
this might be introducing in the way users retrieve information.
5. Disentangling the mechanisms of link formation
The models presented in Table 2 belong to the exponential group
of random graph models (ERGMs, also known as p* models, see
Snijders et al., 2006; Robins et al., 2007a,b). Given that the focus
of this paper lies in explaining the variance in the centrality (or
indegree) of sites, these models were conditioned on all outdegrees: they incorporate no structural effects predicting the number
of links that sites send, a feature that is modelled perfectly. The
models fitted without this condition were not successful because
the distribution of outdegrees is (as expected) very skewed. The
assumption when conditioning on outdegrees is that the number
of links that sites send is determined by factors that are internal
to the organisations and for which the models control as fully as
possible. This assumption is similar to that made by fixed effects
regression in longitudinal analyses to control for omitted variables
that differ between cases. The models were fitted using Siena v. 3.11
(Snijders et al., 2007).
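One way to picture what conditioning on outdegrees means is a null model in which every site keeps its observed number of outgoing links and only the targets of those links vary. The Python sketch below draws such a network uniformly at random. It is an illustration under that assumption only: it is not the estimation procedure implemented in Siena, and the function name and the outdegree sequence are hypothetical.

```python
import random


def sample_given_outdegrees(outdegrees, rng):
    """Random directed graph in which node i sends exactly outdegrees[i] links.

    Targets are drawn uniformly without replacement among the other nodes,
    so outdegrees are reproduced exactly and only indegrees vary.
    """
    n = len(outdegrees)
    edges = []
    for sender, k in enumerate(outdegrees):
        candidates = [j for j in range(n) if j != sender]  # no self-links
        for target in rng.sample(candidates, k):
            edges.append((sender, target))
    return edges


# Hypothetical, skewed outdegree sequence for six sites.
outdegrees = [5, 2, 1, 0, 0, 0]
edges = sample_given_outdegrees(outdegrees, random.Random(0))
```

Because the number of links sent is fixed by construction, any model built on this conditioning is informative only about which sites receive the links.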
The parameter estimates identify what affects the probability
that a site A will send a link to a site B. They are on a logit scale and
should be interpreted as unstandardised effects in logistic regression. There are two types of estimates in these models: structural
and attributes effects. As mentioned in the previous section, the
structural effects aim to act as controls for the analysis of the exogenous attributes by modelling the configurations that characterise
best the observed network. This ensures that the influence of organisations’ attributes is not overestimated, and that we control for
the influence of unmeasured attributes, but also that we explicitly
model relevant mechanisms, endogenous to the network, that are
not reducible to the characteristics of the organisations.
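Since the estimates are on a logit scale, exponentiating a coefficient gives the multiplicative change in the conditional odds of a link, holding the rest of the network fixed. The short Python sketch below applies this to two Model 1 estimates from Table 2. The baseline log-odds used to show the change in probability is a hypothetical value chosen for illustration, and the function names are assumptions of the sketch.

```python
import math


def odds_ratio(estimate):
    """Multiplicative change in the conditional odds of a link for a one-unit change."""
    return math.exp(estimate)


def conditional_probability(baseline_logit, estimate=0.0):
    """Link probability after adding one effect to a given baseline log-odds."""
    return 1.0 / (1.0 + math.exp(-(baseline_logit + estimate)))


same_field = 0.639        # 'Same field of activity', Model 1 (Table 2)
media_visibility = 0.134  # 'Media visibility (of target)', Model 1 (Table 2)

or_same_field = odds_ratio(same_field)
or_media = odds_ratio(media_visibility)

# Hypothetical baseline log-odds for a link between two otherwise unremarkable sites.
baseline_logit = -4.0
p_different_field = conditional_probability(baseline_logit)
p_same_field = conditional_probability(baseline_logit, same_field)
```

The odds ratio does not depend on the baseline, whereas the change in probability does, which is why the coefficients are usually compared on the odds scale.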
Table 2
The impact of resources and status on the probability of links controlling for structure, homophily, and age (ERGMs).

Parameters                              Model 1         Model 2         Model 3         Model 4
                                        Est.    SE      Est.    SE      Est.    SE      Est.    SE
Structural effects
  Reciprocity                           1.337   .087    1.447   .087    1.470   .087    1.453   .087
  Cyclic triads                         −.131   .024    −.110   .023    −.108   .022    −.108   .022
  Popularity                            .178    .045    .057    .049    .094    .049    .089    .049
  Higher order transitivity             2.061   .038    1.965   .038    1.922   .039    1.940   .040
  Association indegree and outdegree    −.255   .007    −.248   .007    −.241   .008    −.245   .007
  Direct and indirect links (reach)     −.609   .056    −.525   .056    −.497   .056    −.504   .059
  Indirect links (reach)                .251    .008    .243    .008    .236    .008    .240    .008
Attribute effects
  Same field of activity                .639    .020    .671    .021    .686    .022    .681    .019
  Paid staff (of target)                .057    .005    .062    .006    .057    .007    .053    .004
  Paid staff (missing)                  −.003   .027    .043    .030    .034    .033
  Media visibility (of target)          .134    .006    .119    .007    .100    .007    .102    .007
  Online yr of foundation (of target)                   −.056   .004    −.057   .004    −.057   .004
  Online yr of foundation (missing)                     −.301   .091    −.314   .087    −.316   .090
  Yr of foundation of org (of target)                   .003    .000    .003    .000    .003    .000
  Yr of foundation of org (missing)                     −.108   .006    −.108   .038    −.103   .037
  Large budget (of target)                                              .121    .096
  Large budget (similarity)                                             −.038   .103
  Large staff (of target)                                               −.088   .101
  Large staff (similarity)                                              −.018   .108
  Large media visibility (of target)                                    −.198   .103
  Large media visibility (similarity)                                   −.419   .114    −.200   .033

Note: Estimates significant at the 5% level are printed in bold.

For instance, the structural effects in Model 1 tell us that there
is a significant degree of reciprocity and a significant tendency to
form hierarchical connections, as suggested by the negative cyclic
triad coefficient. There is also a significant clustering, as measured
by the higher order transitivity parameter, which models not just
the number of triangles present in the network but the extent to
which these triangles exist in nested structures formed by more
than three nodes. Both the direct and indirect links parameters act
as controls for the estimation of transitivity: they differentiate the
effects of links that are a prerequisite for transitivity from the effects
of those links that do establish closure. The popularity parameter
estimates the tendency of certain nodes to receive a high number
of links, again placing decreasing probabilities on the higher indegrees. These parameters not only allow us to differentiate effects
that contribute to generate the same configurations, giving a more
precise account of the mechanisms that explain the emergence of
the network; they also help in preventing the degeneracy of the
models (more details in Snijders et al., 2006; Robins et al., 2007b).
Interestingly, the popularity parameter (which, again, models the
skewed indegree distribution of this network) loses its statistical
significance as more attributes are introduced in the models. This
suggests that the uneven distribution of centrality depicted in Fig. 2
is best explained in terms of factors that are exogenous to the web.
With regard to the exogenous factors, the first model tests the
influence that economic resources (as measured by number of paid
staff) and status (as measured by visibility in traditional newspapers) have on the probability of receiving links. The coefficients
are positive for the two variables, meaning that the richer and the
more visible an organisation is, the more likely it is that its website will receive a link from another organisation. These effects take
place controlling for the tendency of sites to link to similar sites: the
positive coefficient of the parameter measuring homophily (‘same
field of activity’) tells us that a link from site A to site B is more
likely if both are classified under the same category. The fourth
exogenous parameter in the model was introduced to control for
missing cases in economic resources, that is, to control for the possibility that these missing values may contribute differently to the
probability of links than the average values. This coefficient is not
significant. Overall, the convergence of the model is acceptable:
this is measured by t-ratios that summarise how much the values simulated by the model deviate from the observed values, so
the closer they are to zero, the better the convergence is. By convention, good convergence is assumed when the t-values for all
the parameters estimated are smaller than 0.15. This is the case for
all the parameters except three (those accounting for the direct and
indirect connections and the association of indegree and outdegree)
where the t-ratios are between 0.17 and 0.18.
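The convergence check described above can be sketched as follows. The sketch assumes that the t-ratio for a statistic is the difference between the mean of its simulated values and its observed value, divided by the standard deviation of the simulated values, which is consistent with the description in the text; the function names and the numbers are hypothetical.

```python
from statistics import mean, stdev


def convergence_t_ratio(simulated, observed):
    """Deviation of the simulated mean from the observed statistic, in simulated sd units."""
    return (mean(simulated) - observed) / stdev(simulated)


def converged(t_ratios, threshold=0.15):
    """True when every t-ratio is smaller than the threshold in absolute value."""
    return all(abs(t) < threshold for t in t_ratios)


# Hypothetical simulated counts of reciprocated links, against an observed count of 10.
simulated_reciprocity = [9, 10, 11, 10, 10]
t_reciprocity = convergence_t_ratio(simulated_reciprocity, observed=10)

# Hypothetical sets of t-ratios: the first passes the 0.15 convention, the second does not.
ok = converged([0.10, -0.12, 0.03])
not_ok = converged([0.17, 0.05, 0.18])
```

The threshold is passed as a parameter because 0.15 is a convention rather than a formal test.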
Model 2 adds the age of the organisations to the equation, measured as the year in which they were founded and the year in which
they first started to publish online. These variables were introduced
as additional controls for the analysis of exogenous resources, and
to test for the first-movers effect: those who start to publish on
the web earlier might have more chances to become a target of
links; likewise, older organisations may be more likely to engage
in a higher number of partnerships simply because they have been
available for a longer period of time. However, as the model shows,
the data do not support the latter hypothesis: controlling for missing values, the year of foundation of an organisation has a positive
impact on the probability of links, which means that as year of
foundation increases (the younger the organisation is) the more
likely it is that it receives a connection. Year of foundation online,
in turn, generates the expected effect: the estimate is negative,
which means that the longer the organisation has been publishing
on the web, the more likely it is that other organisations will send
links to it. The impact of economic resources and media visibility
(or offline status) remains largely unchanged when controlling for
these effects. The largest t-ratio in this model is 0.10.
Model 3 tests another possible effect of resources and status: the
existence of positive and negative assortativeness. The first would
take place if the most resourceful or visible organisations (those
in the top range of the respective distributions) would prioritise
connections with each other. The second would take place in the
opposite case: when organisations in the lower ranks of the distributions prioritise connections with those in the top range. To test
for these effects, six new variables were introduced in the model.
These variables classify organisations as being part (or not) of the
top range of the distribution in economic resources (measured
with budget and paid staff) and status (measured with visibility
in news media). What these variables model is the influence that
sharing a position in the top set has on the probability of creating links and, vice versa, the influence that being in the top set
has on receiving links from the lower ranks of the distribution.
These are dichotomised variables that obtain the value of 1 when
an organisation is in the top range of the distribution and 0 otherwise.
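The construction of these dichotomised variables can be sketched in Python as follows. The text does not state where the top range starts, so the `top_share` cut-off is a hypothetical parameter; the reading of the similarity effect as both organisations being in the top set follows the verbal description above and may differ from the exact definition used by the software. All names and values are illustrative.

```python
def top_indicator(values, top_share=0.10):
    """Return 1 for values in the top `top_share` of the distribution, 0 otherwise."""
    if not values:
        return []
    k = max(1, round(len(values) * top_share))      # how many cases count as 'top'
    cutoff = sorted(values, reverse=True)[k - 1]    # smallest value still in the top set
    return [1 if v >= cutoff else 0 for v in values]


def dyad_covariates(top, sender, target):
    """Covariates for a potential link from `sender` to `target`."""
    return {
        # Effect of the target being in the top set (link received from any rank).
        "large_of_target": top[target],
        # Effect of sender and target both being in the top set.
        "large_similarity": int(top[sender] == 1 and top[target] == 1),
    }


# Hypothetical news-citation counts for ten organisations (illustration only).
news_citations = [0, 2, 3, 5, 8, 12, 20, 35, 150, 1200]
top = top_indicator(news_citations, top_share=0.20)

between_top_sites = dyad_covariates(top, sender=8, target=9)
from_lower_rank = dyad_covariates(top, sender=0, target=9)
```

Ties at the cut-off are all assigned to the top set, so the top group can be slightly larger than the requested share.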
None of these variables is statistically significant with the exception of the similarity effect in large media visibility: organisations
that are highly visible in newspapers, and hold better status and
public recognition, do not tend to send links to each other; actually,
the negative coefficient suggests that they rather try to avoid each
other: when two organisations share the same high-status, a link
between their sites is less likely. This supports the idea that links
are used as strategic alliances to improve the status and visibility of
organisations only when there is an asymmetrical starting point.
Again, the effects identified in the previous two models remain
largely unchanged and all t-values fall below 0.15. Model 4 confirms
these trends, leaving out non-significant effects with the exception of popularity or indegree, without which the model did not
converge well. The highest t-ratio for convergence in Model 4 is
0.12.
A goodness-of-fit test was performed with Model 4 restricting
the value of the popularity-indegree parameter to zero, and therefore hypothesizing that this is a dispensable effect once the other
exogenous variables are controlled for. The Rao efficient score test
was used (Snijders et al., 2007, p. 33), which assesses the difference between the expected indegree according to the model (where
this parameter is assumed to be 0) and the indegree distribution
observed in the network. The larger the difference is, the larger the
misfit between the model and the network. According to the test,
the corresponding p-value for the statistic measuring this difference is 0.07, so if we use a 5% significance level, the difference
between the observed indegree and the modelled indegree is not
statistically significant. This suggests that the exogenous variables
considered in the model manage to reproduce successfully the centrality scores observed in the network, even in the absence of the
indegree parameter.
Going back to the hypotheses formulated above, Model 4 confirms that both the economic resources of organisations and their
status are significant explanatory factors of their position in the network, much as it happens in other interorganisational networks.
This influence takes place even when the age of both sites and
organisations is taken into account, and possible assortative effects
are controlled for. The influence of resources and status also holds in
the presence of endogenous network mechanisms like reciprocity
or transitivity, which confirms the instrumental role that links play
on the web: many organisations might link to high-resource, high-status partners because they want to profit from the advantages
associated with that partnership, for instance an increased traffic flow
to their websites; this is a likelier possibility if links end up being
reciprocated, as the reciprocity parameter suggests.
All in all, what these findings reveal is that links are not monolithic proxies to the quality of sites: they respond to social factors
that are not necessarily related to the contents of the documents
published online but rather to who is producing those contents.
In the light of these results, the web acquires a dimension that is
well known by the analysts of other social networks but that has
been disregarded so far by those who study the structure of online
connections: the relations of power embedded in the network. As
the influence of exogenous resources indicates, some agents enter
the web from a position of strength that does not derive from their
online activities but from their access to economic resources and
offline visibility. These models suggest that more attention should
be paid to how these variables shape the way users access information on the web.
6. Discussion
Research on interorganisational networks has distinguished two
types of ties: those based on identity, and those based on instrumental goals, used to either gain access to resources that would
otherwise be out of reach, or to project a more recognisable
image by associating with high-status organisations. This paper has
applied this conceptual distinction to the analysis of the web using
a sample of roughly a thousand sites, and incorporating resource
and status variables to the analysis of the network structure. Building on previous research of the web, which had only explored the
identity dimension of links, this paper has shown that organisations
reproduce online many of their offline strategic alliances, responding to the same incentives to prioritise partnerships with the most
resourceful and visible organisations.
The findings reported in this paper suggest that, in line with
resource mobilisation theory, the richer organisations are the more
central on the web because they attract more links to their sites.
This does not invalidate the possibility that centrality on the web
might also contribute to increase the resources and offline visibility of organisations, particularly of those that were born with
the internet. Many studies of the web have focused on that side
of the relationship, highlighting stories of success that range from
social movements, like the Zapatista struggle, to new business
models and collaborative platforms, like eBay or
Wikipedia (McCaughey and Ayers, 2003; Garrido and Halavais,
2003; Anderson, 2007; Tapscott and Williams, 2007). These cases
have been used to illustrate how the internet in general, and the
web in particular, are democratising access to the public domain
by allowing some agents to grow in unprecedented ways, capture
the attention of the international public and obtain funding and
revenue along the way. Yet this paper focused on the less explored
dimension of how the web is still reproducing old asymmetries and
inertias. Further analysis is needed to determine how the two sides
of this influence feed on each other.
The data analysed here poses the question of how general these
results are. What the findings presented suggest is that the networks formed in other domains, like .com, or using other web
technologies, like blogs, are also shaped by the sort of exogenous
factors identified here: well established corporations would, on
average, have a competitive advantage in gaining links and users’
attention, and the blogs written by known academics or writers
would, overall, be more likely to become central than those written
by ordinary users. There is evidence suggesting that this is indeed
the case (Hindman, 2009), but further research is needed to explore
longitudinal trends and how much the influence of exogenous
attributes changes over time. The web is a fast-changing medium,
and the relevance of offline visibility, for instance, might diminish as users learn to trust the web more. The models presented
here provide a baseline against which to assess the direction of that
evolution. This assessment would help envision the future of the
web, and detect potential biases affecting the way information is
accessed.
One of the morals of the findings presented here is that comparing the web with a network of documents might be misleading
when interpreting what links represent. As explained, what determines the centrality of sites is not just the quality of the contents
but the resources and status of the producers of those contents.
Surely enough, resources also matter in the configuration of citation
networks: the papers produced from departments in the richest
universities are more likely to become more cited and therefore
more central. The best scientists, however, tend to self-select in
better universities precisely because of the resources these make
available; but the best scientists are still more likely to produce the
best papers. Contents published on the web cannot be assessed
using the same barometer as scientific papers: the influence of
resources is more consequential on the web, especially given that it
is a form of public media. Studies in sociology and communication have long considered the negative effects that ownership and
concentration can have on the public role of media. If concentration
on the web is significantly affected by economic resources, this is a
trend that requires further attention and analysis.
This paper has presented empirical data that uncovers some
of the forces that promote the formation of links between two
sites. The mechanisms that underlie the formation of the web
are particularly relevant because the structure of the web is used
by most search engines as the main recommendation criterion to
rank their results. This has surely improved the quality of searches
but it might also be introducing biases in how information is
accessed that, at least, are worth identifying. If sites obtain
a competitive advantage in attaining visibility on the basis of their
economic resources and presence in traditional media, then the
web might not be distributing visibility as meritocratically as is
often assumed. This paper has tried to uncover this dimension of the web by showing that online networks follow similar
dynamics to other interorganisational networks. Focusing on the
mechanisms that online and offline networks have in common will
give us a better understanding of how the web evolves and gives
access to information.
Acknowledgements
Thanks to Tom Snijders for advice and guidance and to Michael
Biggs, Tak Wing Chan, Jon Fahlander and Mike Thelwall for their
comments and suggestions to previous versions of this paper. I am
also grateful to three anonymous reviewers for their recommendations and to Lucy Power and Nesrine Abdel-Sattar for their research
assistance. This work has been supported by the Economic and
Social Research Council (ESRC, grant number PTA-026-27-1334),
and it has benefited from the R + D project SEJ2006-00959/SOCI
financed by the Spanish Ministry of Education and Science.
References
Ackland, R., O’Neil, M., Bimber, B., Gibson, R., Ward, S., 2006. New methods for
studying online environmental-activist networks. In: Paper Presented at the
International Sunbelt Social Network Conference, Vancouver.
Adamic, L., 1999. The small world web. In: Abiteboul, S., Vercoustre, A.-M. (Eds.),
Lecture Notes in Computer Science. Springer, New York, pp. 443–454.
Adamic, L., Adar, E., 2003. Friends and neighbors on the web. Social Networks 25,
211–230.
Adamic, L., Glance, N.S., 2005. The political blogosphere and the 2004 U.S. election:
divided they blog. In: 2nd Annual Workshop on the Weblogging Ecosystem:
Aggregation, Analysis and Dynamics, WWW 2005, Japan.
Albert, R., Jeong, H., Barabási, A.L., 1999. Diameter of the world-wide web. Nature
401, 130–131.
Anderson, C., 2007. The Long Tail. How Endless Choice is Creating Unlimited Demand.
Random House, London.
Baldassarri, D., Diani, M., 2007. The integrative power of civic networks. American
Journal of Sociology 113, 735–780.
Barabási, A.L., Albert, R., 1999. Emergence of scaling in random networks. Science
286, 509–512.
Barabási, A.L., Albert, R., Jeong, H., 2000. Scale-free characteristics of random networks: the topology of the world wide web. Physica A 281, 69–77.
Bonacich, P., 1987. Power and centrality: a family of measures. American Journal of
Sociology 92, 1170–1182.
Brin, S., Page, L., 1998. The anatomy of a large-scale hypertextual web search engine.
Computer Networks and ISDN Systems 30, 107–117.
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., 2000. Graph structure
in the Web. Computer Networks 33, 309–320.
Cho, J., Roy, S., 2004. Impact of search engines on page popularity. In: WWW2004,
New York, NY, US.
Cole, S., Cole, J.R., 1967. Scientific output and recognition: a study in the operation of the reward system in science. American Sociological Review 32, 377–390.
Cook, K., Emerson, R.M., Gillmore, M.R., Yamagishi, T., 1983. The distribution of
power in exchange networks: theory and experimental results. American Journal
of Sociology 89, 275–305.
Diani, M., 2003. ’Leaders’ or brokers? Positions and influence in social movement
networks. In: Diani, M., McAdam, D. (Eds.), Social Movements and Networks.
Relational Approaches to Collective Action. Oxford University Press, New York.
DiMaggio, P., Hargittai, E., Russell Neuman, W., Robinson, J.P., 2001. Social implications of the internet. Annual Review of Sociology 27, 307–336.
Foot, K.A., Schneider, S.M., Dougherty, M., Xenos, M., Larsen, E., 2002. Analyzing linking practices: Candidate sites in the 2002 US Electoral Web Sphere. Journal of
Computer-Mediated Communication 8, 4.
Freeman, L.C., 1979. Centrality in social networks: conceptual clarification. Social
Networks 2, 215–239.
Garfield, E., 1955. Citation indexes for science. Science 122, 108–111.
Garrido, M., Halavais, A., 2003. Mapping networks of support for the zapatista
movement: applying social-networks analysis to study contemporary social
movements. In: McCaughey, M., Ayers, M.D. (Eds.), Cyberactivism: Online
Activism in Theory and Practice. Routledge, London.
Henzinger, M., 2007. Search technologies for the internet. Science 317, 468–471.
Hindman, M.S., 2009. The Myth of Digital Democracy. Princeton University Press,
Princeton, NJ.
Huberman, B.A., 2001. The Laws of the Web: Patterns in the Ecology of Information.
MIT Press, Cambridge, MA.
Lawrence, S., Lee Giles, C., 1999. Accessibility of information on the web. Nature 400,
107–109.
McCaughey, M., Ayers, M.D. (Eds.), 2003. Cyberactivism: Online Activism in Theory
and Practice. Routledge, London.
McPherson, M., Smith-Lovin, L., Cook, J., 2001. Birds of a feather: homophily in social
networks. Annual Review of Sociology 27, 415–444.
Merton, R.K., 1968. The Matthew effect in science. Science 159, 56–63.
Pennock, D.M., Flake, G.W., Lawrence, S., Glover, E.J., Lee Giles, C., 2002. Winners
don’t take all: characterizing the competition for links on the web. Proceedings
of the National Academy of Sciences 99, 5207–5211.
Podolny, J.M., 2001. Networks as the pipes and prisms of the market. American
Journal of Sociology 107, 33–60.
Price, D.S., 1976. A general theory of bibliometric and other advantage processes.
Journal of the American Society for Information Science 27, 292–306.
Redner, S., 1998. How popular is your paper? An empirical study of the citation
distribution. The European Physical Journal B 4, 131–134.
Robins, G., Pattison, P., Kalish, Y., Lusher, D., 2007a. An introduction to exponential
random graph (p*) models for social networks. Social Networks 29, 169–172.
Robins, G., Snijders, T.A.B., Wang, P., Handcock, M.S., Pattison, P., 2007b. Recent developments in exponential random graph (p*) models for social networks. Social
Networks 29, 192–215.
Rogers, R., Marres, N., 2000. Landscaping climate change: a mapping technique for
understanding science & technology debates on the world wide web. Public
Understanding of Science 9, 141–163.
Rogers, R., 2004. Information Politics on the Web. The MIT Press, Cambridge, MA.
Shumate, M., Dewitt, L., 2008. The North/South divide in NGO hyperlink networks.
Journal of Computer-Mediated Communication 13, 405–428.
Snijders, T.A.B., Pattison, P., Robins, G., Handcock, M.S., 2006. New specifications for
exponential random graph models. Sociological Methodology 36, 99–153.
Snijders, T.A.B., Steglich, C.E.G., Schweinberger, M., Huisman, M., 2007. Manual of
SIENA version 3. ICS, University of Groningen, Groningen.
Tapscott, D., Williams, A., 2007. Wikinomics. How Mass Collaboration Changes
Everything. Atlantic Books, London.
Thelwall, M., 2002. Evidence for the existence of geographic trends in university web
site interlinking. Journal of Documentation 58, 563–574.
Tomlin, J.A., 2003. A new paradigm for ranking pages on the world wide web. In:
WWW2003, Budapest, Hungary.
Vaughan, L., 2006. Visualizing linguistic and cultural differences using web co-link
data. Journal of the American Society for Information Science and Technology
59, 628–643.