ITR/SOC: The Structure and Dynamics of Electronic Social Networks

James W. Moody, PI, The Ohio State University
Project Description
1 Specific Aims and Scientific Motivation
Recent computer epidemics, such as ‘Melissa’ that spread using people’s email address books, demonstrate
the importance of social connectivity among electronic communication users. The effectiveness of this type
of virus rests on the fact that people’s social contacts are interconnected in a vast electronic network. Sociologically, it is impossible to assume that such a contact network is randomly structured. Social, economic, geographic and interpersonal factors likely determine the structure of this network. Understanding the basic topographical features of this network and the processes by which the network develops and changes are essential
components for understanding how new information technology affects people’s lives.
Unfortunately, we know very little about the factors that shape computer based relationships in natural,
non-organizational settings (for work on the structure of Usenet posts, see Smith 1999). At the same time, the
prevalence of computer communication is rising rapidly, with over 56% of American adults and close to 250
million people worldwide online (compared to just 9% of American adults online in 1995 (Taylor 1999)). Researchers have studied the topography of computer-based social networks in local and organizational settings
since the early 1980s (see Rice 1994; Wellman et al. 1996 for reviews of the state of the computer mediated
communication literature), but they have not paid similar attention to relations in unconstrained settings. Computer based communication, such as email, gives users an ability to segregate audiences and hide features of their own identity that is unavailable in face-to-face communication (Donath 1999). As such, the structure of social relations may differ dramatically from the structures we have commonly mapped among coworkers and
friends, providing a unique insight into the development of relational structure when the power of external attributes is minimized (Mark 1998). That many people are involved in relations that sociologists have yet to
document represents a glaring lacuna in our understanding of contemporary culture.
Traditionally, social network researchers have focused almost entirely on very small networks (usually fewer
than 100 actors, rarely more than 1000). This bias toward small group research makes it impossible to ask
questions about society-wide social structure, or to test the society-level theories developed by early network
theorists (see for example, Pool and Kochen 1978; Rapoport and Horvath 1961). The focus on small groups has
been pragmatic, since collecting network data on most forms of social interaction is expensive and efficient
tools for analyzing large networks have only recently become available (Batagelj and Mrvar 1999; White,
Batagelj, and Mrvar 1999). It is in this respect that computer-based social networks have a distinct advantage.
Using the World Wide Web, we can survey computer communication networks at extremely low cost. After an
initial investment in survey design and equipment, the marginal cost of each additional respondent is almost
zero. Computer-based social networks are thus ideal for providing researchers with a testing ground for substantive and methodological work on large social networks, extending the promise of this work beyond the constrained settings of small groups.
In light of these considerations, this proposed project has 4 specific aims: (1) to map the social topography
of electronic communication networks, (2) to identify the quality and content of computer-based relations, (3) to
identify how relations form and change over time, and (4) to provide the social network and information technology research communities with a public dataset of large, dynamic social networks. When met, these four
aims will allow multiple researchers the opportunity to develop theoretical and methodological approaches to
problems surrounding large social networks in a new and rapidly expanding social context.
1.1 Mapping The Topography of Electronic Communication Networks
Electronic communication networks, like all social networks, can be represented as finite graphs, with each
person in the graph represented as a point and each relation represented as an arc in the graph (see Freeman,
White, and Romney 1992; Wasserman and Faust 1994; Wellman and Berkowitz 1997 for a general introduction). This representation of the network allows one to use graph theoretic tools to map the social structure. For
this project, I propose to measure 6 dimensions of the social network’s topography (see §3 for measurement
details):
1.1.1 Social Distance
Most social interaction is constrained by salient social divisions. Moreover, such divisions usually correlate, creating a rigid social structure (Bourdieu 1989; Sewell 1992). In a network, social distance is captured by
the number of relational steps separating two individuals. Early network researchers noted that the distribution
of distance in a society can be used to directly measure the pattern of social divisions (Pool and Kochen 1978;
Rapoport and Horvath 1961), providing a direct measure of social structure. Most network studies in small,
bounded settings are limited to a section of the population that is much more homogeneous than the population
at large. By examining relational patterns without regard to organizational boundaries, we can potentially reach
respondents from all sections of the online community. As such, we gain a much more accurate picture of relations between people from very different class, race and cultural backgrounds. Such a network will provide us
with an interaction-based measure of the national social structure.1
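As a rough illustration of how a distribution of social distance could be tabulated, the following Python sketch counts pairs of actors at each geodesic distance using breadth-first search. The toy network and the function name `distance_distribution` are hypothetical, and the sketch assumes an undirected network stored as a dictionary of neighbor sets. It is not the measurement procedure described in §3.

```python
from collections import Counter, deque


def distance_distribution(adjacency):
    """Count unordered pairs of actors at each geodesic distance.

    Assumes `adjacency` maps every actor to the set of actors they are tied to,
    and that ties are symmetric (undirected).
    """
    counts = Counter()
    for source in adjacency:
        # Breadth-first search gives the number of relational steps to each actor.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for steps in dist.values():
            if steps > 0:
                counts[steps] += 1
    # Each undirected pair was counted once from each endpoint.
    return {steps: n // 2 for steps, n in sorted(counts.items())}


# Hypothetical toy network: a is tied to b; b, c and d are all tied to each other.
toy = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
print(distance_distribution(toy))  # {1: 4, 2: 2}
```

Pairs in different components never appear in the tally, so a real analysis would also need to report how many pairs are unreachable.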
1.1.2 Social Cohesion
One of the key features of any social system is the extent to which actors are bound together in social
groups (Durkheim 1984). Recent theoretical advances have shown that redundancy in social networks is the
essential element of social cohesion. Cohesive groups emerge when members are connected – and re-connected
– through multiple independent pathways (Brudner and White 1997; Harary and White 1999; Moody and White
1999; White 1998; White et al. 1999; see also White’s NSF grant BCS-9978282). The structural features of
cohesive groups provide a theoretical base for how information and resources flow through the network. For
example, information in a multiply connected group will likely flow freely, since no single member can control
the flow of information among actors. The circulation of information made possible by connectivity helps to
reinforce cultural standards, solidify group power, and lower hierarchical divisions within the group. Empirical
work across widely varying substantive domains has confirmed these theoretical expectations. We find, for
example, that members of cohesive groups are likely to have a stronger sense of community and act in uniform
ways politically (Moody and White, 2000). Brudner and White (1997) show that structurally cohesive groups
based on marital ties correlate with a stratified class system of single-heir succession of productive farmlands
in Austria. Similarly, White et al. (1999) show that cohesive groups defined by marital ties among Mexican
villagers were restricted to a select core of families who were resident for several generations, excluding recent
immigrants, while ritual kinship ties (between parents and godparents) crosscut this core to integrate immigrants
into an egalitarian class structure.
Identifying cohesive groups within the electronic communication networks will allow me to inductively
identify salient social communities in cyberspace, and evaluate how cohesive group membership relates to
community identification, ascribed characteristics, and social behavior. Sociologically, the private nature of
electronic communication provides a potentially rich and radically new setting to study social cohesion. Actors
can create individual identities for separate audiences and thus segregate their (local) audiences from each other.
Such action may lead to graphs that are less clustered than those observed in most social network research.
However, the same process that might lead to widely ranging ties at the individual level may simultaneously
lead to more cohesive social structures. When actors bridge different groups, multiple independent paths between actors will result. Thus, ironically, a local attempt to segregate relations may result in a globally cohesive network.
1.1.3 Small World Structure
The conventional wisdom that no two people are separated by more than ‘6-handshakes’ vividly captures
the image of a small world network (Milgram 1969). The small world phenomenon rests on a unique combination of distance and clustering in social relations. Actors must be embedded in dense local clusters that are interconnected through a small number of bridging ties. Such small-world graphs have properties we might expect from random graphs (short overall distances between people) while maintaining features that would only
be found in structured graphs (such as coordinated dynamic action). While the general features of small-world
graphs have been known since the late 1960s, recent work (Watts 1999; Watts and Strogatz 1998) has clarified
the mathematical properties underlying small-world graphs. Importantly, Watts developed a set of scalable parameters that can measure the extent to which a given graph has a small-world structure. If electronic networks
are locally clustered, but contain bridges between such groups, then a small-world structure is quite likely. By
identifying the magnitude of the small-world features of an electronic network, we can build on the known
characteristics of such graphs, providing insight into how information flows through the network.

1 Blau’s work on intermarriage patterns provides the closest example of a nation-wide image of interaction structure
as social structure. Of course, since one can be married to only one person at a time, the topography of this structure is
quite limited (Blau 1994; Blau and Schwartz 1984).
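The two ingredients of a small-world structure discussed in section 1.1.3, local clustering and short overall distances, can be computed directly on a small graph. The Python sketch below is only an illustration under stated assumptions: the network is undirected and connected, the graph `two_triangles` is a hypothetical example, and the functions return the raw average clustering coefficient and mean path length rather than Watts's scaled parameters.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean share of each actor's pairs of contacts who are themselves tied."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # actors with fewer than two contacts contribute zero
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Average geodesic distance over all reachable ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical example: two tight triangles joined by a single bridging tie (3-4).
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(average_clustering(two_triangles))
print(mean_path_length(two_triangles))
```

In practice these two values would be compared with those of a random graph of the same size and mean degree; high clustering combined with comparably short paths is the small-world signature.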
1.1.4 Social Balance
A cumulative research tradition in social networks has demonstrated that relatively simple individual action rules will generate global networks that are both highly clustered and hierarchically ordered (Cartwright
and Harary 1956; Davis 1963; Davis 1970; Davis and Leinhardt 1972; Johnsen 1985; 1986; Moody 1999c).
The guiding rule at the base of this work is social balance theory. Simply stated, social relations are balanced
when they are transitive, that is, when a friend of a friend is also a friend. Psychologically, balance theory rests
on the assumption that associating with people who do not like each other generates strain (Festinger 1957).
When this happens, people are expected to change their relations to create balanced networks. While balance
theory is based on attempts to reconcile local conditions, the theory has profound implications for the way complete social networks are shaped and evolve. It can be shown, for example, that under certain circumstances the
resulting global graph will consist of a set of tight-knit groups that are embedded in a wider field of loosely
connected people. The groups in this structure emerge and dissolve in an ever-shifting pattern, as each actor’s
local choices to balance their own networks create imbalance for others, who then respond in kind. As a result,
the broad macro-level topography – of an ordered set of groups – remains throughout, but the membership
within such groups is constantly in flux.
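The transitivity rule at the heart of balance theory ("a friend of a friend is also a friend") can be turned into a simple summary statistic. The following Python sketch, which assumes an undirected friendship network and uses a hypothetical four-person example, reports the share of two-step paths that are closed by a direct tie. It is one common way to summarize transitivity, not the dynamic balance model itself.

```python
def transitivity(adj):
    """Share of two-step paths i-j-k for which i and k are also directly tied."""
    two_paths = 0
    closed = 0
    for middle, nbrs in adj.items():
        for i in nbrs:
            for k in nbrs:
                if i != k:
                    two_paths += 1
                    if k in adj[i]:
                        closed += 1
    return closed / two_paths if two_paths else 0.0


# Hypothetical example: a, b and c are mutual friends; d is a friend of c only.
friends = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(transitivity(friends))  # 0.6
```

A value of 1.0 would mean every friend of a friend is a friend; the intransitive paths through c (for example a-c-d) are the ones balance theory expects actors to resolve over time.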
Since electronic relations consist of largely private dyadic exchanges, the third-person pressures evident
in most balance models may play a weaker role. Similarly, if people can manipulate their identities with different interaction partners, they may be able to segregate social worlds to thwart the kind of clustering social balance usually produces. Thus, the computer communication network provides a unique challenge to one of the
best-supported theories of social network formation, and a unique opportunity to test the model in a dynamic
new setting.
1.1.5 Overlap of Category and Network
In most social networks, there is a correspondence between actors’ attributes and the pattern of their relations. In fact, it has been argued that the social salience of such categories is directly related to the extent that
they shape social relations (Freeman 1972), and that the category itself often results from regular social interaction (White 1965).
The correspondence between social interaction and attributes provides an opportunity to study substantive
social segregation. In many social settings, heterogeneous groups share the same physical space, and are thus
formally integrated. If, however, people in the setting only interact with people from their own group, then the
setting is substantively segregated. Thus, while the number of minorities who use electronic communication is
increasing2 (Taylor 1999), if electronic relations are focused entirely within race, then the Internet is still substantively segregated (see Moody 1999a).
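One simple way to make the distinction between formal and substantive integration concrete is to compare the observed share of within-group ties with the share expected if ties ignored group membership. The Python sketch below does this for a hypothetical six-person setting; the function names and data are assumptions for illustration, and the comparison is a descriptive index rather than the segregation models used in Moody (1999a).

```python
from itertools import combinations


def same_group_share(edges, group):
    """Observed share of ties that connect two members of the same group."""
    same = sum(1 for u, v in edges if group[u] == group[v])
    return same / len(edges)


def expected_same_group_share(group):
    """Share of all possible pairs that are same-group (ties formed at random)."""
    pairs = list(combinations(group, 2))
    same = sum(1 for u, v in pairs if group[u] == group[v])
    return same / len(pairs)


# Hypothetical setting: two groups of three sharing one formally integrated space.
group = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("c", "d")]

print(same_group_share(edges, group))      # 5 of 6 ties are within-group
print(expected_same_group_share(group))    # 6 of 15 possible pairs are within-group
```

Here the setting is formally integrated (both groups are present) but substantively segregated, since within-group ties are far more common than chance mixing would produce.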
The extent of substantive segregation in communication networks may differ from face-to-face relations for
two reasons. First, actors’ attributes are not as public in email as in face-to-face communication. Actors can
choose whether to reveal their attributes, and thus interaction can be based on substantive interest instead of
ascribed status. Secondly, since electronic media are not restricted to physical focal organizations (Feld 1981),
segregation that results from differential interaction opportunities (such as school tracking) may not play a role.
We should not jump to the conclusion, however, that segregation will not persist on the Internet. First, if
race is important to the actor, then they will likely seek out others who share the same opinions, and there are
fewer social controls to limit the formation of deviant groups. We find, then, that the web has become an ideal
setting for skin-heads seeking to build a white-only community. Secondly, to the extent that electronic media
simply mirror other personal relations, we will expect relationship segregation levels similar to those observed
in friendship networks. Third, even if electronic relations are integrated with respect to ascribed characteristics
such as race, they may still be very homogeneous with respect to other characteristics (a group formed around
an online game, for example). Empirically, research on Usenet groups shows that race and gender are both
pieces of information people seek out in online communication (Burkhalter 1999; O'Brien 1999). One of the
primary results of this work will be to identify the extent of social integration on the Internet.

2 American Internet users are predominately white (81% compared to 76% of the US population), better educated
(64% with at least some college education, compared to 48% of the total US population), and have a higher household
income (41% earn more than $50,000 a year, compared to 32% of the total population) than the nation as a whole
(Taylor 1999; see also CyberAtlas 2000; Newburger 1997).
1.1.6 Network Position
Network topography has significant implications for an individual’s social position in the group. Networks
are relationally differentiated,3 and we can thus distinguish types of actors based on the pattern of relations they
are involved in. First, role positions in any group can be identified through regular interaction patterns (Nadel
1955; Lorrain and White 1971; White et al. 1976). For example, we can identify people who are liaisons between multiple disconnected groups or people who are at the top of a relational hierarchy (members of a
school’s leading crowd (Coleman 1961), for example). By identifying the most common interaction patterns in
a network and identifying how positions relate to each other, we can identify the role system for any given network. Secondly, we can characterize each actor’s position at the individual level through measures such as
network centrality. These measures situate each person in the social topography relative to the position of every
other person in the network. Our current understanding about position in electronic networks comes from
studies of (comparatively) small settings within organizations, and thus, not surprisingly, interaction patterns tend
to follow the organizational chart (Rice 1994). This project will allow us to describe the array of positions in
non-organizational settings, and thus understand the links between general interaction and behavior outside of
the organizational contexts of previous work.
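To illustrate what an individual-level position measure looks like, the Python sketch below computes two standard centrality scores, degree and closeness, on a hypothetical star-shaped network. The choice of these two measures and the example data are assumptions made for illustration; the sketch assumes an undirected, connected network.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct contacts for each actor."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable actors divided by the total distance to them (higher = more central)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical example: a liaison "hub" tied to three otherwise unconnected actors.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_centrality(star))
print(closeness_centrality(star))
```

The hub scores highest on both measures, which matches the intuition that a liaison between otherwise disconnected actors occupies a distinct position.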
1.2 Evaluating the Use and Content of Social Relations
The meaning of any global network structure depends as much on the content as on the pattern of social
relations. Traditionally, social network researchers have relied on relations with strong face validity (such as
‘friendship’ or ‘coworker’) where the substantive meaning of the relation could be readily inferred. Computer
based relations, however, do not have the same unambiguous meaning, and thus we need to identify the content
of the relation directly. We can do this by identifying how people use their computer relations and letting them
evaluate the quality of such relations.
I propose to identify the relational content of electronic communication in two ways. The first method for
determining the content of relations is to ask respondents to identify relevant dimensions of each relationship.
For example, I will ask respondents to rate relations on the content of the communication (work vs. personal for
example), the extent to which they trust the people they email often, how often they use the link for social support (either giving or receiving) and how important this person is in their lives. Secondly, relationships are
often multi-layered. People are friends with their co-workers and work with their relatives. By identifying how
various relationship types overlap, we can better characterize the content of each relation. Thus, using algebraic
techniques on local networks (Mandel 1983; Pattison 1993) I can identify the characteristic patterns of electronic relations relative to well known relations such as friendship and kin.
1.3 The Network Dynamics of Computer Based Relations
While most social network research has tended toward static, cross-sectional designs, there has been a
strong recent interest in the dynamic aspects of social networks (Doreian 1986; Galaskiewicz and Wasserman
1981; Hummon and Fararo 1995; Leenders 1995; 1996; Morgan et al. 1997; Stokman and Doreian 1996; Weesie
and Flap 1990; Zeggelink 1994; Zeggelink et al. 1996). This surge in dynamic research comes from the realization that to understand the properties of any social system, one needs to understand the trajectories of both the actors within the system and the global characteristics of the system. To that end, I propose to follow a sample
of actors and network clusters four times over the course of a year, and resample the full network a year after
the first contact.
At the individual level there are two relevant dimensions of relational change. First, we want to document
ego-level changes in relational behavior, such as changes in the volume of computer communication partners
and changes in the frequency of contact with any given partner. Such changes can be modeled as a function of
changes in the relational environment ego is embedded within as well as changes in individual characteristics
and life-course position. Second, I will document changes in an individual’s position relative to the wider network he or she is embedded within. Since the structural features of a given network are largely independent of
the actions of any single actor, an individual’s position in the global network can change even if he or she
makes no changes in his or her own relations. To document positional changes, we will identify sequences of positions that actors occupy over time. The set of all such position sequences will enumerate the evolving role
structure of the group. Given the rapid growth and newness of the Internet, these images will provide a unique
vision of the early development of a large interacting social system.

3 Except in the rare case of a completely random network or completely connected clique.
1.4 Public Use Database
Scientific study benefits from multiple perspectives and approaches to any given problem. The small-network bias evident in most social network research is due, at least in part, to a relative lack of data on large
social networks. This project will provide the social network research community with multiple, large networks
that will push the frontier of methodological and theoretical network research. I will provide two types of data
to best serve the varying needs of the research community. First, after removing any identifying characteristics
from the network records, I will make the raw network data available to any researchers through my web page
or by placing the data in centralized data archives. This raw data will include both the network adjacency
structure and a wide range of substantive and demographic attributes of the respondents. Researchers
will then be free to develop and test new methods for large-scale network research as well as substantive theories about electronic communication networks and social behavior. Second, since many researchers have substantive interests that would benefit by including network measures but don’t have the technical training to construct such measures, I will also make a constructed network dataset available that any researcher will be able to
use with standard sociological analysis techniques. This will provide the general research community with a
rich, multiple use dataset on a unique sample of people who are active online.
2. Relation to Investigator’s previous work
2.1 Adolescent Social Networks
This project extends my previous substantive work on adolescent friendship networks and my methodological work on large-scale social networks into a new substantive domain over a larger scale. My previous
work focused on friendship and romantic relations among adolescents (Bearman et al. 1997b; Moody 1999c)
and large scale social cohesion (Moody and White, 2000). In my dissertation (Moody 1999c), I build on a cumulative research line in sociology on the role of social balance in the development of global network structures
(Cartwright and Harary 1956; Davis 1963; Davis 1970; Davis and Leinhardt 1972; Doreian et al. 1996; Hallinan
1974; Holland and Leinhardt 1971; Hummon and Fararo 1995; Johnsen 1985). I developed a theory for how
positive relationships, such as friendship, develop and change, and identified the global network structures and
resulting dynamics that would follow from the relationship formation process.
Empirically, I use panel network models, trajectory models of social position, and dynamic simulation to
show that the macro structure of high schools will remain constant even while relations at the local level are
continuously changing. In all of the Add Health high schools, an ordered hierarchy of friendship groups rested
within a loosely connected collection of actors who were not in cohesive groups. These ‘background’ actors
would, over time, change their relations to balance the local friendships they were involved in. In so doing,
they created imbalance for those around them and shifted the population of people who were embedded in
friendship groups. This dynamic modeling effort is one of the largest dynamic network studies completed to
date.
A second feature of my work on adolescent social relations has focused on relational race segregation
(Moody 1999a). Consider the image of “Countryside School District” below. In this figure, points represent
students and lines represent relations among students. In general, two people who have many friends in common are plotted close to each other, while two people who have few friends in common are distant from each
other. In this school, we see a clear split between White students on the left part of the figure and Black students on the right.4 When I compare students across multiple different schools, I find that racial heterogeneity
in the school setting tends to increase the tendency for students to choose friends of their own race, but that integrated extracurricular activity mitigates this same-race selection process (Moody 1999a).
4 Within the race groups, the clustering evident is between Jr. High students (top-left) and High school students (bottom-right).
Figure 1. Social Relations in “Countryside” School District (points colored by race: White, Black, Mixed/Other)
2.2 Large Scale Social Cohesion
I have recently been working with Douglas R. White (Moody and White 1999) to extend his foundational
work on the connection between network connectivity and social cohesion (Brudner and White 1997; Harary
and White 1999; White 1998; White et al. 1999). We define cohesiveness as the minimum number of actors
who, if removed from a group, would disconnect the group. We show that this conception
of cohesion leads to hierarchically nested sets of ever-increasing connectivity. This hierarchical nesting provides a rigorous analytic operationalization of network embeddedness, which we show is a significant factor in
empirical applications as wide ranging as adolescent school attachment and the political action similarity of
corporations. Extending the insights of this work into very large networks, such as those that result from computer communication, will provide an opportunity to test for cohesion effects over great social distances.
White is a prominent figure in social network analysis and chairs the program in social network analysis at
the University of California – Irvine. He has had extensive experience analyzing very large social networks
(Brudner and White 1997; White et al. 1999) and is currently developing longitudinal models and comprehensive multiple investigator data sets focusing on large-scale social cohesion (see NSF grant BCS-9978282,
“Longitudinal Social Network Studies and Predictive Social Cohesion Theory, 1999-2002”). White will serve
as a consultant on this project and his expertise with respect to large networks, social cohesion, and identifying
equivalence positions in social networks will strengthen the quality of the project.
2.3 Large Social Networks Methods
In addition to my substantive work on relations among adolescents, I have extensive methodological experience developing techniques for large social networks (Moody 1998a; Moody 1998b; Moody 1999b; Moody
and White 1999). I have developed an integrated set of network analysis modules, including the only currently
implemented algorithm for identifying all connectivity sets in large social networks (Moody 1999b). My work
on large networks has focused on understanding the temporal features of STD flows (Moody 2000), identifying
cohesive peer groups (Moody 1998b), and enumerating the triad structure of social networks (Moody 1998c),
which are integral to describing the broad structural patterns in a social network (Johnsen 1985; 1986). Empirically, I have developed and implemented a wide range of measures for multiple large networks as part of my
work on the National Longitudinal Survey of Adolescent Health (Add Health), resulting in a publicly available
dataset of network measures for general use by other researchers (Bearman et al. 1997a).
Since much of the substantive work in social networks has focused on small groups, the techniques used to
analyze social networks tend to be inefficient when applied to the much larger (and usually much more sparse)
networks. I have used, and will continue to adapt, new graph exploration algorithms from computer science,
which make analyzing large networks much more feasible than was possible even 15 years ago (see for example, Auletta et al. 1999; Ball and Provan 1983; Chartrand and Oellermann 1993; Gibbons 1985; Kanevsky 1993;
Khuller and Raghavachari 1995).
3 Research Design
3.1 Overview
The proposed project has a three-stage design. In the first stage, I will conduct a snowball sample to
identify large connected components, drawn from wide-ranging geographic areas. This sample will provide
basic demographic and global network information, as well as provide the frame from which to draw specialized sub-samples for in-depth longitudinal study. In the second stage, I will select three types of people for longitudinal study: (1) a representative random sample of the stage-one snowball sample, (2) an ego-network sample, and (3) a cohesive peer group sample. Each member of the special samples will be followed for a year and
given detailed interviews on the content and quality of their relations 4 times. At the time of the 4th interview, I
will re-contact all people from the original snowball sample for a short follow-up survey, to provide a global
context for the detailed temporal data. From these three data sources, I will be able to estimate the features of
network topography outlined in section 1, identify the quality and significance of electronic communities in the
lives of respondents, identify how relations change over time, and provide the research community with a national sample of longitudinal, electronic social networks.
3.2 Data collection
3.2.1 Stage 1: Global Network Snowball Sample
To identify properties of network distance, social cohesion and balance, we need to have data that extends
beyond the individual to the greater social network. Ideally, this would include all actors linked in a given network. Given the extreme size of the computer communication network (presumably most of the estimated 259
million people online are connected in a single giant component (Parker, 1985)), it is impossible to analyze the
entire population. Given that the majority of online activity occurs in English and in the United States, restricting
the survey to those explicit criteria helps limit the size, but the number of English-speaking online adults in the
United States is still over 100 million, and thus some sampling procedure is required.
The best way to sample from a global network is to use snowball sampling techniques. A snowball sample
is an intuitive choice since the object of study, the structure of the network, defines the data collection procedure (Frank 1977; Frank 1978; Frank 1979). Starting with a geographically dispersed initial seed sample, I will
ask each respondent to name the people they email with most frequently. I will then contact the people that
they name, and ask them for the names of the people they email with most frequently, and so on.
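The wave-by-wave logic of this procedure can be sketched in a few lines of Python. In the sketch below, the dictionary `contacts` is a hypothetical stand-in for survey responses (each person's list of frequent email partners), and the stopping parameters `max_steps` and `max_size` are illustrative arguments rather than part of the survey software.

```python
def snowball(contacts, seeds, max_steps, max_size):
    """Return the list of sampling waves, starting with the seeds.

    `contacts[person]` is the list of people that person names; people who
    have already been sampled are not added again.
    """
    sampled = set(seeds)
    wave = list(seeds)
    waves = [list(seeds)]
    for _ in range(max_steps):
        next_wave = []
        for person in wave:
            for named in contacts.get(person, []):
                if named not in sampled and len(sampled) < max_size:
                    sampled.add(named)
                    next_wave.append(named)
        if not next_wave:
            break  # nobody new was named, so the snowball has stopped growing
        waves.append(next_wave)
        wave = next_wave
    return waves


# Hypothetical responses: who each person says they email most often.
contacts = {"s": ["a", "b"], "a": ["s", "c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
print(snowball(contacts, ["s"], max_steps=2, max_size=100))
# [['s'], ['a', 'b'], ['c', 'd']]
```

Because c is named by both a and b but enters the sample only once, the size of each wave depends on how often newly named people have already been reached.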
The size of a snowball sample depends on the number of steps followed and the probability that any newly
named person has already been selected into the sample. These two quantities effectively govern the sample
size at a given sample step, and in a random network, can be closely approximated with:
$$p_{i+1} = (1 - X_i)\left(1 - e^{-a p_i}\right) \qquad (1)$$
where $p_i$ is the proportion of the total population reached at the $i$th step, $X_i$ is the cumulative proportion and $a$ is
the mean degree of actors in the network. If the network is not random, but structured due to reciprocity, transitivity or clustering around ego attributes, a is effectively reduced. A network with structured ties will thus
have a reachability profile similar to a random network with smaller average degree, α (see the work of Fararo
and Skvoretz (Fararo and Skvoretz 1987;Skvoretz and Fararo 1996; Skvoretz 1983; 1985) for a detailed description of this effect).5
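To see how equation 1 behaves, the recursion can be iterated numerically. The Python sketch below does so under the assumption that the cumulative proportion at each step is the running sum of the step proportions; the starting proportion and the two mean-degree values are hypothetical numbers chosen only to show the contrast between a higher and a lower effective degree.

```python
import math


def reach_profile(a, p0, steps):
    """Iterate p_{i+1} = (1 - X_i)(1 - exp(-a * p_i)), with X_i the running total."""
    p = p0
    cumulative = p0
    profile = [p0]
    for _ in range(steps):
        p = (1.0 - cumulative) * (1.0 - math.exp(-a * p))
        cumulative += p
        profile.append(p)
    return profile


# Hypothetical inputs: 0.1% of the population as seeds, followed for 10 steps.
for mean_degree in (5, 2):
    profile = reach_profile(mean_degree, 0.001, 10)
    print(mean_degree, round(sum(profile), 3))
```

With these illustrative inputs the lower-degree network reaches a noticeably smaller share of the population in the same number of steps, which is the sense in which structured ties slow the growth of a snowball sample.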
I thus propose to seed the snowball sample with 200 people, chosen from widely varying geographic areas.
I will then snowball out from the initial set to a depth of not more than 10 steps, or until I reach 50,000 people from any
initial snowball seed.6 This then sets the maximum size of any connected component at 50,000 nodes, which
ought to be large enough to estimate connectivity parameters and to identify high-connectivity clusters within
the extended neighborhood of each seed member. The sample design ensures a maximum of 200 connected components, which would provide multiple large images of the entire electronic communication network, though
there will be fewer if the initial sample seeds are fewer than 10 steps apart (which is likely for at least some nodes
within each chosen city).
5
An implicit assumption in equation 1 is that the probability of inclusion is essentially continuous over the network.
Thus, the approximation may not hold well in heterogeneously clustered (or nested) networks.
3.2.1.1 Selecting the Snowball Seed Sample
No unified sampling frame for all email addresses exists. Thus, to ensure geographic dispersion, I
will first stratify the US into 10 large geographic areas (by state combinations), and within each area I will
then randomly select 2 large cities (with large defined as in the top 10% of cities in the wider geographic area). Using
on-line email search engines, such as Netscape’s Who Where People finder, I will randomly select respondents
until 10 seeds from each city agree to participate. Each person will be chosen by randomly selecting a letter
for the last name and then selecting randomly within the list of all people whose name starts with the selected
letter in the given city.
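The two random draws described above (cities within areas, then a person within a letter) can be sketched as follows. This is only an illustration: the function names, the directory format, and the choice to draw the letter from those actually present in the directory (rather than redrawing when a letter has no entries) are assumptions.

```python
import random

def pick_seed_cities(cities_by_area, per_area=2, rng=random):
    """cities_by_area maps an area name to the list of large cities in that area."""
    return {area: rng.sample(cities, per_area)
            for area, cities in cities_by_area.items()}

def draw_candidate(directory, rng=random):
    """directory is a list of (last_name, email) pairs for one city.

    A first letter is drawn at random, then one person is drawn at random
    from everyone whose last name starts with that letter.
    """
    letters = sorted({name[0].upper() for name, _ in directory})
    letter = rng.choice(letters)
    pool = [entry for entry in directory if entry[0][0].upper() == letter]
    return rng.choice(pool)

if __name__ == "__main__":
    rng = random.Random(1)
    areas = {"area-1": ["City A", "City B", "City C"], "area-2": ["City D", "City E"]}
    print(pick_seed_cities(areas, rng=rng))
    people = [("Adams", "adams@example.org"), ("Baker", "baker@example.org")]
    print(draw_candidate(people, rng=rng))
```

In practice the draw would be repeated until 10 people in the city agree to participate.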
3.2.1.2 Snowball Sample Mechanics
All data will be collected using a Computer Assisted Data Interview (CADI) enabled Web survey on a
dedicated secure server. Since the population of interest is all people who are online, a web-based survey provides an excellent medium for collecting the network data. Most importantly, once the initial investment in
hardware and software development has been made, the marginal cost of each survey response is essentially
zero. For a network survey of the size needed to identify the macro-level properties of electronic networks, this
is a decided advantage.
The Stage 1 snowball survey will consist of a short demographic questionnaire and an email network name
generator. The demographic portion of the survey will collect data on gender, race, residence, socio-economic
status, age and family structure. The attributes identified in this portion of the survey will be used to identify
basic mixing matrices (Morris 1997) and network bias parameters (Fararo 1981; Fararo and Skvoretz 1987;
Skvoretz and Fararo 1996). The second part of the snowball sample questionnaire will be a network name generator that asks respondents to identify the people they email with most frequently. Once the names and email
addresses of each alter have been entered, I can use a Java applet to allow each respondent to draw the links
among the alters they nominate, providing a simple, complete ego-network generator that ought to be fun for
respondents to use. Here the respondent will also identify some basic demographic characteristics of each alter
(race, gender, age and occupation). This technique provides direct information on email communication from
ego and an indirect estimate of communication among ego’s alters, which will be useful for estimating linkages
among those actors who do not agree to participate.
The web survey will automatically check the newly identified names against those already contacted. If
a nominated email address has not been previously contacted, the survey will send that address an email describing
the purpose and content of the study and inviting the person to participate. In order to maintain a manageable volume at the study server, we can control the number of active surveys in the field at any given time.
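The bookkeeping described here (check each nomination against the addresses already contacted, and stop at the depth and size limits) can be sketched as a breadth-first crawl. The `nominate` callback is a hypothetical stand-in for a completed survey response, and the limits are the ones stated earlier in this section.

```python
from collections import deque

def snowball(seeds, nominate, max_depth=10, max_size=50_000):
    """Follow nominations outward from the seeds.

    nominate(address) returns the addresses that respondent names
    (a hypothetical callback standing in for a survey response).
    Returns a dict mapping each contacted address to the step at which
    it was first reached.
    """
    contacted = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        ego = queue.popleft()
        depth = contacted[ego]
        if depth >= max_depth:
            continue  # do not follow nominations beyond the step limit
        for alter in nominate(ego):
            if alter in contacted:
                continue  # already invited, so no second invitation is sent
            if len(contacted) >= max_size:
                return contacted  # size cap reached
            contacted[alter] = depth + 1
            queue.append(alter)
    return contacted

if __name__ == "__main__":
    ties = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"], "e": []}
    print(snowball(["a"], lambda x: ties[x], max_depth=2))
```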
Survey response rates are important for any social survey, but especially so for network surveys. Most
methods and measures for networks require population data. Table 1 below shows what proportion of the
population would be reachable under different response assumptions, based on simulated clustered networks.7
We see, for example, that we would gather information on 86% of the total population (from either self reports
or alter reports) if 30% of the people we contacted agreed to participate in the survey and then agreed to provide
us with information on 70% of their contacts. Importantly, we can cover most of the observed networks even if
overall response rates are fairly low (30% - 50%). Since we will have multiple estimates of relations among
people we do not sample (from their close associates), we will have some information about the structure of
relations among most of the people in the graph.
6
The 10-step limit is based on estimates derived from equation 1 and a national population of 100 million online users. Assuming a graph with an effective degree equal to that of large high schools, the 50,000 person limit will be
reached in between 6 and 7 steps from a 10-person starting sample (the number I will seed in any given community).
7
Results are from 1000 trials on simulated networks with an average degree of 11 that consisted of randomly generated primary groups of size 50 loosely embedded in larger groups of 200, which were more loosely linked in a population of 10,000. The results are only marginally different if you assume a smaller (mean degree = 9) or larger (mean
degree = 13) degree value or change the size of the network.
Table 1. Network coverage under various response patterns

                        Linkage Rate
Participation Rate    50%    70%    90%
30%                   63%    86%    94%
40%                   84%    95%    97%
50%                   92%    97%    99%
Because the final size of a snowball sample cannot be known in advance, it is impossible
to calculate a cost for individual participation inducements. More importantly, even a very small inducement,
which would likely not help increase participation rates, would result in a huge cost if given to every respondent. In keeping with many online surveys, I propose a lottery system for rewarding participation. Thus, each
respondent will be entered into a drawing for $1,000.
3.2.2 Stage 2: Longitudinal Sub-samples
3.2.2.1 Overview
The snowball sample will provide information on the broad structural features of the electronic social network. To understand the dynamics and qualitative details of electronic communication networks, we need to
follow a smaller group of people, using more detailed survey instruments, over time. In this section, I describe
three sub-samples of the original network. First, I will select a simple random sample of 2000 respondents from
the network generated by the snowball sample procedure. Second, for each member of the random sample, I
will identify the people they email with most often and bring them into a combined ego-network sample.8
Third, I will use a cohesive peer group identification method to identify cohesive communities through the pattern and frequency of interaction. While there is no pre-specified size for such groups, I will attempt to sample
a broad spectrum of groups from small (less than 100) to large (the maximum observed group size).9 Each person selected into a special sample will receive a detailed survey 4 times over the year, which will be designed to
measure changes in their local electronic networks, gauge the overlap of their electronic relations with other
social relations, and to relate attitudes and behaviors to network position.
3.2.2.2 Simple Random and Ego-Network Samples
The snowball samples will extend over a potentially very large population. To gather greater information
about this population, and to understand how people in varying positions in the larger network behave, I will
sample a group of people at random from the entire snowball sample. Because this will be a representative
sample of the snowball networks, I will be able to relate attitudes, behaviors and network activity to a respondent’s position in the overall network. This will provide a broad base of information to identify how positions in
the original snowball network relate to behaviors.
For each person in the random sample, I will also select the people they are adjacent to in the electronic
network. This will provide the local sociometric context within which each actor is situated, and thus allow for
a comparison of activities between ego and his or her network neighbors. Changes in behavior in the local network can then be linked to an individual’s actions, providing a direct context for each person’s behaviors.
Moreover, the detailed ego-network data will provide information on mixing patterns for many more attributes
than collected in the short snowball sample form.
3.2.2.3 Cohesive Peer Group Sample
While the ego-network samples provide information on the local context actors are embedded within, the
promise and scientific interest of network analysis comes from looking beyond the individual to the wider
groups he or she is embedded within. Extending a method I have developed for identifying cohesive peer
groups in large networks, I will construct groups from the identified snowball sample based on two principles:
the volume of interaction and the cohesive pattern of interaction. For the purposes of this project, I will first
implement a tri-connected component algorithm (Hopcroft and Tarjan 1973). Identifying tri-components will
reduce the size of the sub-graph I need to search over, simplifying the remaining search. Within each tri-component, I identify partitions that maximize the number of within-group ties and minimize the number of
between-group ties, while ensuring that the graph is at least bi-connected (for a method that is conceptually
similar but does not include the connectivity restriction, see Frank 1995). This results in an interaction group
that is both cohesive and heavily interactive.
8
The expected sample size will be the mean degree times 2000, minus any overlap. To make the task of reporting on
relations manageable for respondents, the size will be limited to 20 alters. Previous research indicates that people
have between 11 and 17 close relations (Fischer 1982, Wellman 1992).
9
As a benchmark, the mean size of such groups in the Add Health data was 22 members.
3.2.2.4 The Stage-2 Survey Instrument
All respondents selected into the stage-2 sample will receive the same survey, again administered through a
Web based CADI system on a secure dedicated server. This survey will consist of three modules: a general
social survey module, an Internet behavior module, and a network name generator.
The general social survey section will include items on demographics and family structure, employment,
attitudes and feelings, and other commonly studied social behaviors. Items will be chosen to maximize the potential usability by other social scientists who want to understand how network processes affect important sociological questions. I will query network researchers, through organizations such as the International Network
for Social Network Analysis (INSNA), on topics that they would prefer to see included in the survey. Certain
elements are sure to be included, such as questions that focus on community involvement and identification,
which help identify the place of electronic relations in the lives of respondents. Whenever possible, questionnaire items will be taken from well-known general social surveys, such as the General Social Survey (GSS), the
National Longitudinal Survey of Youth (NLSY), the Current Population Survey, and the National Longitudinal
Survey of Adolescent Health (Add Health). This will allow researchers to compare responses on the network
sample to a known national probability sample. Researchers will then be able to relate questions about network
position and composition to a wide range of substantive topics, enriching the scientific return to the data collection greatly.
In addition to general social behavior questions, I will include a module specifically designed to understand
the qualitative aspects of electronic communication networks. Respondents will be asked to identify how much
time they spend online with their common email friends and how important such relations are in their everyday
lives. They will be asked to identify how often they use such contacts for typically studied network effects,
such as social support (Wellman 1992; Wellman and Wortley 1990), help getting jobs (Granovetter 1973), information gathering (Buskens and Yamaguchi 1999; Friedkin 1991; Meyer 1994), and companionship (Bell
1988; Duck 1991; Leenders 1996; Zeggelink 1993).
The third section of the detailed interview will consist of a replication of the GSS social network module
for close friends, without any reference to the electronic network. This module will allow us to construct ego-networks that are not necessarily constrained to electronic relations, and thereby compare the email networks to
general friendship networks. By also collecting data on family structure and whether or not any named alter is
kin, we will be able to compare electronic kinship nets and friendship networks.
As with the first stage instrument, response rates are important – especially with the cohesive network sample. Given the longer time commitment required of respondents, I propose to provide an additional lottery inducement of $1500 for each wave of the in-depth survey.
3.2.3 Snowball Network Resample
With the collection of the data outlined above, I will have detailed information on the starting network, and
4 snapshots of parts of that network over time. A longitudinal picture of the total network over this period is
still needed to situate the sub-samples within the wider computer communication network. I propose to recontact all people identified in the original survey at the time of the last longitudinal sample. This short follow-up questionnaire will contain only the network module and questions about changes in status since we last contacted them, and will serve to anchor the global network sample at the end of the study, providing the ability to
situate the sub-sampled actors within the wider population structure.
4. Data Analysis
Once the snowball sampling limits have been reached, the next task is identifying the longitudinal network
samples, which requires identifying all cohesive peer groups in the network. Once these groups have been
identified and the longitudinal survey put into the field, work on identifying the properties outlined in the first
section can begin. The ability to analyze large social networks is expanding rapidly, thanks largely to the development of PAJEK (Batagelj and Mrvar 1999), a program for analyzing large networks. For the work below, I
will use PAJEK when possible, and develop separate software as needed.
4.1 Network Topography
The electronic social network is represented as a digraph, G(V,A), where the vertices, V, represent our set
of |V| actors and the arcs, A, represent the relations among actors, defined as a set of ordered pairs $(v_i, v_j)$. Actor
i is adjacent to actor j if $(v_i, v_j) \in A$. A path in the network is defined as a sequence of adjacent, distinct vertices
and edges, starting with one node and ending with another. Actor i can reach actor j if there is a path in the
graph starting with i and ending with j.
4.1.1 Social Distance
Graph theoretically, distance is defined as the minimum number of edges in a path connecting two actors.
We can identify the distance from ego to any other actor in the network using a simple breadth-first search (Gibbons
1985). One measure of the social distance in the achieved snowball graphs can be calculated by tracing the
geodesics in the observed graph. However, the sampling procedure will constrain the diameter of the graph to a
20 step maximum (10-steps on either ‘side’ of the initial seed), and smaller if 50,000 people are reached in
fewer steps from the seed. Thus for two nodes that are closer to the frontier of the snowball sample than the
distance between them, their geodesics may be overestimated, since we cannot know if a contact in the next
(not-sampled) step would link them. For all other nodes in the sample graph, the geodesic will provide an accurate estimate of the social distance amongst the actors.
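As an illustration of the geodesic calculation, here is a minimal breadth-first search over a list of arcs. The function name and the toy arc list are hypothetical. Arcs are followed in their stated direction, and actors that cannot be reached are simply absent from the result.

```python
from collections import deque

def geodesic_distances(arcs, source):
    """Return {actor: number of arcs on a shortest path from source}."""
    adj = {}
    for i, j in arcs:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, [])
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

if __name__ == "__main__":
    arcs = [(1, 2), (2, 3), (1, 3), (4, 1)]
    print(geodesic_distances(arcs, 1))
```

For nodes near the frontier of the snowball, the distances returned are upper bounds, for the reason given above.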
A second approach to social distance will be to estimate network bias parameters (Fararo 1981; 1983;
Fararo and Skvoretz 1987; Skvoretz 1983; 1985) from the extended local networks of each person contacted.
Conceptually, bias parameters control the difference between a and α in a snowball recursion formula, such as
that given in equation 1. The reachability curves for any given population can be traced, just as in the construction of a snowball sample, and a parameter governing reachability can be estimated from these curves. Figure 2
below, for example, gives the reachability curves for three large American high schools.
Figure 2. Mean Trace Profile for three high schools. Data from the National Longitudinal Survey of
Adolescent Health.
[Figure: proportion contacted (vertical axis, 0 to 1) plotted against step (horizontal axis, 0 to 14).]
In each case, the curve is much shallower (reaches fewer people in k steps) than that of a random network of similar mean degree. By estimating the proportion of new people brought into the sample at each wave, we can approximate the entire reachability curve. One can estimate bias parameters based on transitivity (the likelihood
that if i nominates j, and j nominates k, i also nominates k), reciprocity (the likelihood that if i nominates j, j also
nominates i), and homogeneity or clustering parameters (the likelihood that a person with attribute k nominates
another person with attribute k, or who is x units different from ego on k).
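The reciprocity and transitivity likelihoods defined above can be approximated by simple observed rates, as in the sketch below. These are raw proportions and not the bias parameters of the Fararo and Skvoretz framework. The function names are hypothetical, and self-nominations are assumed to be absent.

```python
def reciprocity(arcs):
    """Share of nominations i -> j for which j -> i is also present."""
    arcs = set(arcs)
    return sum((j, i) in arcs for i, j in arcs) / len(arcs)

def transitivity(arcs):
    """Share of two-step chains i -> j -> k (k != i) that are closed by i -> k."""
    arcs = set(arcs)
    out = {}
    for i, j in arcs:
        out.setdefault(i, set()).add(j)
    two_paths = closed = 0
    for i, j in arcs:
        for k in out.get(j, ()):
            if k != i:
                two_paths += 1
                closed += (i, k) in arcs
    return closed / two_paths if two_paths else 0.0

if __name__ == "__main__":
    arcs = [(1, 2), (2, 1), (2, 3), (1, 3)]
    print(reciprocity(arcs), transitivity(arcs))
```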
4.1.2 Social Cohesion and Peer Communities
I measure social cohesion through the nested node connectivity sets. First, I will employ linear time algorithms to identify bicomponents and tri-components in the networks (Gibbons 1985; Hopcroft and Tarjan 1973),
and higher levels of connectivity by combining low polynomial time algorithms for testing connectivity (Even
and Tarjan 1975) and identifying cut-sets of the graph (Kanevsky 1993).10 This procedure will identify a
nested set of cohesive groups. Identifying how behavior similarity and relation stability differ within and between such groups is an important test for the relation between node connectivity and social cohesion.
The procedure for identifying peer groups starts with identifying the tri-components of the graph, and
searching for dense interaction regions within the tri-components. Once an initial set of dense regions is identified, using a standard distance clustering method (or any initial cut method, such as the adjacency sorting
method implemented in NEGOPY (Richards 1995)), I then identify the returns to relative density that follow
from moving any member of one group to another group. The iterative procedure then assigns people to groups
if a reassignment would increase the fit index for both groups. At each re-assignment stage, new groups are
checked for minimal connectivity to ensure that all identified groups are cohesive. The re-assignment procedure works on the underlying mixing matrix, and thus greatly reduces the computational size of the problem.
Three positions result from this group assignment procedure. Nodes either belong to cohesive groups (members), are between multiple groups (liaisons), or are outside of the system (people who are not members of the
largest bi-component).
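Two ingredients of the reassignment procedure can be sketched briefly: counting within-group and between-group ties for a candidate assignment, and checking that a group is still bi-connected. The connectivity check below is a brute-force stand-in that removes each node in turn, not the linear-time algorithms cited above. All names are hypothetical, and ties are treated as undirected.

```python
def is_connected(nodes, adj):
    """True if the sub-graph induced by nodes is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def is_biconnected(nodes, adj):
    """True if the group stays connected after removing any single member."""
    nodes = set(nodes)
    if len(nodes) < 3 or not is_connected(nodes, adj):
        return False
    return all(is_connected(nodes - {v}, adj) for v in nodes)

def tie_counts(groups, adj):
    """groups maps node -> group label; returns (within, between) tie counts."""
    within = between = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected tie once
                if groups[u] == groups[v]:
                    within += 1
                else:
                    between += 1
    return within, between

if __name__ == "__main__":
    # Two triangles joined by a single tie between nodes 2 and 3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(tie_counts(groups, adj), is_biconnected({0, 1, 2}, adj))
```

A reassignment step would move one node to another group, recompute the tie counts, and keep the move only if both groups remain bi-connected.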
4.1.3 Small World Structure
A small world network can be characterized as one in which most ties are sent within a relatively small, local group of actors, where each of these small groups has a few ties distributed throughout the graph, thereby
linking the small clusters together. In his recent work on the small world problem, Duncan Watts identifies a
set of simple mixing parameters that capture the broad features of a small world network (Watts 1999; Watts
and Strogatz 1998). Combining the median length of the geodesics in a graph (what Watts refers to as the characteristic path length, L) and the clustering coefficient, γv, which measures the extent to which vertices adjacent
to any vertex v, are adjacent to each other,11 we can measure the degree to which a given graph represents a
small-world structure. Watts provides two general mathematical models for small world graphs, which depend
on a single parameter bounded between 0 and 1. At one extreme, we have a completely random graph while at
the other a completely clustered graph.12 By estimating this parameter for the observed electronic networks
and building on the theoretical applications Watts provides, we can identify the potential dynamic properties
associated with the graph.
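The two quantities combined in this measure can be computed as in the sketch below: the local clustering coefficient of a vertex (the density of ties among its neighbors) and the characteristic path length, taken here as the median geodesic over all reachable pairs, following the description in the text. The function names and the undirected adjacency-set representation are assumptions.

```python
from collections import deque
from statistics import median

def local_clustering(adj, v):
    """Density of ties among the neighbors of v (0.0 if v has fewer than two)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return links / (k * (k - 1) / 2)

def characteristic_path_length(adj):
    """Median geodesic length over all ordered pairs of connected vertices."""
    lengths = []
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        lengths.extend(d for t, d in dist.items() if t != s)
    return median(lengths)

if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant vertex attached to node 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print([local_clustering(adj, v) for v in adj], characteristic_path_length(adj))
```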
4.1.4 Social Balance
The transitivity level, or functions of the transitivity level, of a given network is a standard measure of social balance, which rests on the distribution of triads in a network. While efficient matrix methods for identifying the triad census in moderately sized dense networks are available (Moody 1998c), for sparse networks it is
more efficient to enumerate the triads directly, since they can be calculated from the 2-step
neighborhoods of each node. For this part of the project, I will identify the triad census for the graph as a
whole, as well as for regions of varying distance around each node. In so doing, I will be able to assess the extent of social balance at various removes from ego, and thus evaluate the structure of both the global and meso-level communities each actor is involved in.
10
The efficiency of both algorithms can be improved when searching for nested sets, as the search area in any graph
can be constrained to those nodes with degree greater than or equal to the previous connectivity level, k.
11
γv is equal to the density of the local neighborhood of each vertex v.
12
The difference between the two models is that one assumes a ring-substrate. This is likely the most applicable
model, since a giant bicomponent (a cycle connecting every node in the network) will likely encompass most of the
nodes in the connected components.
Functions of the transitivity index are problematic measures of balance when other features, such as organizational clustering and homophily, can lead to transitive friendship groups without reference to a balance
model (Feld and Elmore 1982). To account for such features, a statistical model that controls for clustering in
the graph will be used. The p* family of statistical models for social networks allows one to estimate the effect
of a given relational pattern (such as the number of transitive or intransitive triads that would result from a
given relation) on the likelihood of a relation being present, net of other features in the graph (Robins et al.
1997; Wasserman and Pattison 1996). The models are estimated using a logistic regression on properties of the
ordered dyads in a network. A model such as the following would be a typical cross-sectional example
(see Moody 1999c for model details and alternative specifications) that could be estimated on the clustered sub-samples of the total graph:
$$\log\left(\frac{p(Y_{ij} = 1)}{p(Y_{ij} = 0)}\right) = a + b_1(E_i) + b_2(p_j) + b_3(H_{ij}) + b_4(r_{ji}) + b_5(T_{ij}) + e_{ij} \qquad (2)$$
where a captures the effect of density, $b_1$ is a coefficient for ego out-degree, $b_2$ is a coefficient for alter in-degree, $b_3$
is a vector of coefficient(s) describing effect(s) of dyad attribute differences on email relations (multiple homophily
parameters), $b_4$ is a coefficient for reciprocity effects, $b_5$ is a coefficient for transitivity, describing the impact on the
number of transitive/intransitive triples associated with the ijth dyad, and $e_{ij}$ is a random error term for the ijth dyad. I
have successfully used similar models on a large sample of fairly large networks (129 networks, ranging in size from
25 to 2000, with a mean of just under 500 nodes).
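To show what "properties of the ordered dyads" might look like in practice, the sketch below builds one row of predictors per ordered dyad, loosely following the terms of equation 2. It is a simplified illustration and not the specification in Moody 1999c: the function name and column names are hypothetical, homophily is reduced to a single same-attribute indicator, and the transitivity term is approximated by the number of two-step paths from i to j.

```python
def dyad_rows(nodes, arcs, attr):
    """Return one dict of predictors per ordered dyad (i, j), i != j."""
    arcs = set(arcs)
    out = {v: set() for v in nodes}
    inn = {v: set() for v in nodes}
    for i, j in arcs:
        out[i].add(j)
        inn[j].add(i)
    rows = []
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            y = int((i, j) in arcs)
            rows.append({
                "i": i, "j": j, "y": y,
                "ego_outdegree": len(out[i]) - y,    # excludes the i -> j tie itself
                "alter_indegree": len(inn[j]) - y,   # excludes the i -> j tie itself
                "same_attribute": int(attr[i] == attr[j]),
                "reciprocity": int((j, i) in arcs),
                "two_paths": len(out[i] & inn[j]),   # actors k with i -> k and k -> j
            })
    return rows

if __name__ == "__main__":
    rows = dyad_rows([1, 2, 3], [(1, 2), (2, 3), (1, 3)], {1: "a", 2: "a", 3: "b"})
    for row in rows:
        print(row)
```

The resulting rows could be passed to any standard logistic regression routine, with `y` as the outcome.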
4.1.5 Category and Network
Group interaction can be modeled through mixing matrices, square tables that count the number of connections from people in one group to people in another. One can then fit log linear models to the tables to statistically model the likelihood of a person of one race nominating a person of another, or use such categories as
elements in the homophily parameters in the p* model of equation 2. Simple descriptions of mixing frequency
illustrate the saliency of a given category. Consider as an example Table 2 below, which measures interaction
patterns among high school students (based on Add Health data). This table provides race-specific mixing ratios, $\alpha_{ij}$, and measures the relative odds that someone of the row race nominates someone of the column race as
a friend. For example, we see that the odds of a White student nominating another White student are about 4.4
times the odds of a White student nominating a non-White student.
Table 2. Race Specific Mixing Patterns*

                      Race of person nominated as a friend
Race of nominator    White    Black    Hispanic    Asian
White                4.44     0.29     0.79        0.70
Black                0.21     9.16     1.18        0.39
Hispanic             0.81     1.29     2.51        0.78
Asian                0.81     0.51     0.69        7.90
*Row to column odds ratio
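A mixing matrix and a row-to-column odds ratio can be computed from a list of nominations as sketched below. The odds ratio here is the usual cross-product ratio of the collapsed 2x2 table (row group or not, column group or not). That this is exactly how the ratios in Table 2 were computed is an assumption, and the function names are hypothetical.

```python
from collections import Counter

def mixing_matrix(arcs, group):
    """Count nominations by (group of nominator, group of nominee)."""
    return Counter((group[i], group[j]) for i, j in arcs)

def odds_ratio(counts, row, col):
    """Cross-product ratio for the 2x2 table collapsed around (row, col).

    Raises ZeroDivisionError if an off-diagonal cell of the collapsed
    table is empty.
    """
    a = counts[(row, col)]
    b = sum(n for (r, c), n in counts.items() if r == row and c != col)
    c_other = sum(n for (r, c), n in counts.items() if r != row and c == col)
    d = sum(n for (r, c), n in counts.items() if r != row and c != col)
    return (a * d) / (b * c_other)

if __name__ == "__main__":
    group = {1: "W", 2: "W", 3: "B", 4: "B"}
    arcs = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (3, 1)]
    counts = mixing_matrix(arcs, group)
    print(dict(counts), odds_ratio(counts, "W", "W"))
```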
4.1.6 Network Position
Network position describes how an individual is situated within the overall network. On this broad view,
position includes measures such as popularity (which captures a volume dimension of position), centrality
(positioning each node relative to the center or periphery of the network), and relational pattern similarity
(structural equivalence). Popularity is measured through actor in-degree – the number of people who nominate
ego as an electronic communication partner. Centrality can be measured in many ways (Bonacich 1987; Freeman 1979; Friedkin 1991; Kim 1997). I will calculate multiple centrality measures to provide other researchers
with choices for models of centrality on behavior. Role equivalence can be measured well by calculating the
triad position each actor is involved in. There are 16 possible triads in a directed graph, and 36 positions within
those triads. Each actor occupies a particular distribution of the positions, and two actors with identical triad
position vectors will have equivalent role positions in the network (Burt 1990; Moody 1999c).
4.2 Evaluating Use and Content of Online Relationships
Models of relational content will seek to describe (1) how people use their computer relations, (2) how important such relations are in their lives and (3) how such relations correspond to other network relations. The
first two goals are met largely through simple descriptive statistics. Ordered scales will be constructed to measure the importance and saliency of each relation, and standard statistical modeling techniques will then be used
to explain variance in the importance and use of electronic networks based on the attributes and network positions of actors. The third goal is met both descriptively (what proportion of electronic partners are also kin, for
example) and algebraically (Mandel 1983; Pattison 1993). By compounding multiple relations, we can reduce
the complexity of the ego-network pattern to a containment set (Mandel, 1983:379) which characterizes the local role of any actor. When calculated for all actors, we have an inventory of the role structures in the electronic network, which can then be used in statistical models of behavior and attitudes. For example, we will be
able to identify sets of actors for whom friendship and electronic relations are identical (the compound relation
equals the individual relation) or completely disjoint (the compound relation is empty) as well as a range of intermediate local role structures.
4.3 Dynamic Features
The technical tools for describing change in social networks are less well developed than models and methods for describing networks in the cross section. The simplest analysis of network change involves describing
changes in the various statistics calculated on each cross section. For example, as more people become involved in electronic communication, the overlap between friendship relations and electronic relations may increase. By describing patterns of change in transitivity, reciprocity, and position, we go a long way towards
understanding how networks evolve.
More formally, I will extend techniques I used on the Add Health sample of high school networks (Moody,
1999c). This includes using time-ordered panel extensions of the p* models developed by Wasserman, Pattison
and Robins. These models approximate previous Markov approaches (Leenders 1995), while maintaining the
modeling flexibility of the p* logit models. In these cases, we can model the current network as a function of
the past network and patterns of the current network. Another approach to capturing dynamic features of the
networks will be to treat movement in network positions as a mobility problem. For example, we can model
changes in actor popularity as a mobility matrix, using standard log-linear models to describe the mobility regime in any given network. These analyses can be extended at the individual level by modeling the trajectory
of each person over time (Han and Moen 1999), using variants of sequence analysis (Abbott 1995).
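Treating popularity change as a mobility problem amounts to cross-tabulating each actor's popularity level in one wave against the level in the next. The sketch below builds that table. The three levels and their cut points are hypothetical, and the resulting counts are what a log-linear model would then be fit to.

```python
from collections import Counter

def popularity_level(in_degree, cuts=(1, 5)):
    """Bin in-degree into 'low', 'middle' or 'high' (cut points are hypothetical)."""
    if in_degree < cuts[0]:
        return "low"
    if in_degree < cuts[1]:
        return "middle"
    return "high"

def mobility_matrix(indegree_t1, indegree_t2):
    """Count actors by (level at wave 1, level at wave 2).

    Only actors observed in both waves are counted.
    """
    table = Counter()
    for actor, d1 in indegree_t1.items():
        if actor in indegree_t2:
            origin = popularity_level(d1)
            destination = popularity_level(indegree_t2[actor])
            table[(origin, destination)] += 1
    return table

if __name__ == "__main__":
    wave1 = {"a": 0, "b": 3, "c": 7}
    wave2 = {"a": 2, "b": 3, "c": 1, "d": 9}
    print(dict(mobility_matrix(wave1, wave2)))
```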
4.4 Public Use Data Preparation
My goal is to make as much data publicly available as possible while maintaining the strict confidentiality
of all respondents. I will create two forms of the data for public release. The first dataset will contain all information in the datafile (the network adjacency information as well as demographic and behavioral data), with
all identifying information removed. This file will contain no information that could identify an individual.
Thus, all email addresses and outlying values will be recoded, and geographic information will be limited to
city. The audience for these data will be researchers with training and interest in analyzing large complex social
networks, who want to develop detailed behavior models (such as peer influence models and network autoregression models (Dow 1986; Friedkin 1998; Friedkin and Cook 1990; Friedkin and Johnsen 1997)) and measurement techniques.
The second data file will consist of a series of constructed network variables appended to the demographic
and behavioral files. This file will contain information on positional variables (centrality, degree, geodesic distance to others), local network context variables (reciprocity of local network, transitivity in local network, density in local network, etc.), local network composition variables (heterogeneity, mean values of substantive
variables, geographic dispersion of the ego-network, kin-composition, etc.), and sub-group membership variables (position indicators and cohesive group membership). This will then be a simple rectangular file that
other researchers can use with standard social science methods. The target audience for these data will be those
with a substantive interest in the importance of network context for behavior and those interested in online behavior and activity, who are not trained to calculate network measures directly.
Cited References
Abbott, A. 1995. "Sequence Analysis: New Methods for Old Ideas." Annual Review of Sociology 21:93-113.
Auletta, V., Ye. Dinitz, Z. Nutov, and D. Parente. 1999. "A 2-Approximation Algorithm for Finding an
Optimum 3-Vertex Connected Spanning Subgraph." Journal of Algorithms 32:21-30.
Ball, M. O. and J. S. Provan. 1983. "Calculating Bounds on Reachability and Connectedness in Stochastic
Networks." Networks 13:253-78.
Batagelj, Vladimir and Andrej Mrvar. 1999. PAJEK. Vers. 49.
Bearman, P., J. Moody, and K. Stovel. 1997a. "The Add Health Network Variable Codebook." University
of North Carolina at Chapel Hill.
———. 1997b. "Chains of Affection: The Structure of Adolescent Romantic Networks." University of
North Carolina at Chapel Hill. Manuscript.
Bell, R. R. 1988. Worlds of Friendship. Beverly Hills: Sage Publications.
Blau, P. M. 1994. Structural Contexts of Opportunities. Chicago and London: University of Chicago Press.
Blau, P. M. and J. E. Schwartz. 1984. Crosscutting Social Circles: Testing a Macrostructural Theory of
Intergroup Relations. Orlando: Academic Press.
Bonacich, P. 1987. "Power and Centrality: A Family of Measures." American Journal of Sociology
92:1170-1182.
Bourdieu, P. 1989. "Social Space and Symbolic Power." Sociological Theory :14-25.
Brudner, L. A. and D. R. White. 1997. "Class, Poverty, and Structural Endogamy: Visualizing Networked
Histories." Theory and Society 26:161-208.
Burkhalter, B. 1999. "Reading Race Online: Discovering Racial Identity in Usenet Discussions." Pp. 60-75
in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Burt, R. S. 1990. "Detecting Role Equivalence." Social Networks 12:83-97.
Buskens, V. and K. Yamaguchi. 1999. "A New Model for Information Diffusion in Heterogeneous Social
Networks." Sociological Methodology 29:281-325.
Cartwright, D. and F. Harary. 1956. "Structural Balance: A Generalization of Heider's Theory."
Psychological Review 63:277-93.
Chartrand, G. and O. R. Oellermann. 1993. Applied and Algorithmic Graph Theory. New York: McGraw-Hill Inc.
Coleman, J. S. 1961. The Adolescent Society. New York: Free Press.
CyberAtlas. 2000. "The World's Online Populations."
http://cyberatlas.internet.com/big_picture/geographics/article/0,1323,5911_151151,00.html
Davis, J. A. 1963. "Structural Balance, Mechanical Solidarity, and Interpersonal Relations." American
Journal of Sociology 68:444-62.
———. 1970. "Clustering and Hierarchy in Interpersonal Relations: Testing Two Graph Theoretical
Models on 742 Sociomatrices." American Sociological Review 35:843-51.
Davis, J. A. and S. Leinhardt. 1972. "The Structure of Positive Relations in Small Groups." Pp. 218-51 in
Sociological Theories in Progress, vol. 2, J. Berger, M. Zelditch, and B. Anderson. Boston, MA:
Houghton Mifflin.
Donath, J. S. 1999. "Identity and Deception in the Virtual Community." Pp. 29-59 in Communities in
Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Doreian, P., R. Kapuscinski, D. Krackhardt, and J. Szczypula. 1996. "A Brief History of Balance Through
Time." Journal of Mathematical Sociology 21:113-31.
Doreian, P. 1986. "On the Evolution of Group and Network Structure II: Structures Within Structures."
Social Networks 8:22-64.
Dow, M. M. 1986. "Model Selection Procedures for Network Autocorrelated Disturbances Models."
Sociological Methods and Research 14:403-22.
Duck, S. W. 1991. Friends for Life: The Psychology of Personal Relationships. New York: Harvester.
Durkheim, E. 1984. The Division of Labor in Society. translator W. D. Halls. New York: The Free Press.
Even, S. and R. E. Tarjan. 1975. "Network Flow and Testing Graph Connectivity." SIAM Journal on
Computing 4:507-18.
Fararo, T. J. 1981. "Biased Networks and Social Structure Theorems." Social Networks 3:137-59.
———. 1983. "Biased Networks and the Strength of Weak Ties." Social Networks 5:1-11.
Fararo, T. J. and J. Skvoretz. 1987. "Unification Research Programs: Integrating Two Structural Theories."
American Journal of Sociology 92:1183-209.
Feld, S. L. 1981. "The Focused Organization of Social Ties." American Journal of Sociology 86:1015-35.
Feld, S. L. and R. Elmore. 1982. "Patterns of Sociometric Choices: Transitivity Reconsidered." Social
Psychological Quarterly 45:77-85.
Festinger, L. 1957. A Theory of Cognitive Dissonance. Evanston, IL: Row, Peterson & Co.
Fischer, C. S. 1982. To Dwell Among Friends: Personal Networks in Town and City. Chicago: University
of Chicago Press.
Frank, K. A. 1995. "Identifying Cohesive Subgroups." Social Networks 17:27-56.
Frank, O. 1977. "Survey Sampling in Graphs." Journal of Statistical Planning and Inference 1:235-64.
———. 1978. "Sampling and Estimation in Large Social Networks." Social Networks 1:91-101.
———. 1979. "Estimation of Population Totals by Use of Snowball Samples." Pp. 319-48 in Perspectives
on Social Network Research, Paul. W. Holland and Samuel Leinhardt. New York: Academic Press.
Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.
Freeman, L. C. 1979. "Centrality in Social Networks: Conceptual Clarification." Social Networks 1:215-39.
Freeman, L. C., D. R. White, and K. A. Romney. 1992. Research Methods in Social Network Analysis.
New Brunswick and London: Transaction Publishers.
Friedkin, N. E. 1991. "Theoretical Foundations for Centrality Measures." American Journal of Sociology
96:1478-504.
———. 1998. A Structural Theory of Social Influence. Cambridge: Cambridge University Press.
Friedkin, N. E. and K. S. Cook. 1990. "Peer Group Influence." Sociological Methods and Research
19(1):122-43.
Friedkin, N. E. and E. C. Johnsen. 1997. "Social Positions in Influence Networks." Social Networks
19:209-22.
Galaskiewicz, J. and S. Wasserman. 1981. "A Dynamic Study of Change in a Regional Corporate
Network." American Sociological Review 46:475-84.
Gibbons, A. 1985. Algorithmic Graph Theory. Cambridge: Cambridge University Press.
Granovetter, M. 1973. "The Strength of Weak Ties." American Journal of Sociology 78:1360-80.
Hallinan, M. T. 1974. "A Structural Model of Sentiment Relations." American Journal of Sociology
80:364-78.
Han, S.-K. and P. Moen. 1999. "Clocking Out: Temporal Patterning of Retirement." American Journal of
Sociology 105:191-236.
Harary, F. and D. R. White. 1999. "Measuring Social Cohesion: Node Connectivity and Conditional
Density." Manuscript.
Holland, P. W. and S. Leinhardt. 1971. "Transitivity in Structural Models of Small Groups." Comparative
Group Studies 2:107-24.
Hopcroft, J. E. and R. E. Tarjan. 1973. "Dividing a Graph into Triconnected Components." SIAM Journal
on Computing 2:135-58.
Hummon, N. P. and T. J. Fararo. 1995. "Assessing Hierarchy and Balance in Dynamic Network Models."
Journal of Mathematical Sociology 20:145-59.
Johnsen, E. C. 1985. "Network Macrostructure Models for the Davis-Leinhardt Set of Empirical
Sociomatrices." Social Networks 7:203-24.
———. 1986. "Structure and Process: Agreement Models for Friendship Formation." Social Networks
8:257-306.
Kanevsky, A. 1993. "Finding All Minimum-Size Separating Vertex Sets in a Graph." Networks 23:533-41.
Khuller, S. and B. Raghavachari. 1995. "Improved Approximation Algorithms for Uniform Connectivity
Problems." Proceedings of the 27th Annual ACM Symposium on the Theory of Computing :1-10.
Kim, H. 1997. "Structural Holes, Strategic Communication, and Control Centrality in Social Networks."
Workshop on Structures in Process, Working Papers Series (1). University of North Carolina at
Chapel Hill.
Leenders, R. Th. A. J. 1996. "Evolution of Friendship and Best Friendship Choices." Pp. 149-64 in
Evolution of Social Networks, Editors P. Doreian and Frans N. Stokman. New York: Gordon and
Breach.
Leenders, R. Th. A. J. 1995. "Models for Network Dynamics: A Markovian Framework." Journal of
Mathematical Sociology 20:1-21.
Lorrain, F. and H. C. White. 1971. "Structural Equivalence of Individuals in Social Networks." Journal of
Mathematical Sociology 1:49-80.
Mandel, M. 1983. "Local Roles and Social Networks." American Sociological Review 48:376-86.
Mark, N. 1998. "Beyond Individual Differences: Social Differentiation From First Principles." American
Sociological Review 63:309-30.
Meyer, G. W. 1994. "Social Information Processing and Social Networks: A Test of Social Influence
Mechanisms." Human Relations 47:1013-47.
Milgram, S. 1969. "The Small World Problem." Psychology Today 22:61-67.
Moody, J. 1998a. "A General Method for Creating Approximate Conditionally Uniform Random Graphs."
University of North Carolina at Chapel Hill. Manuscript.
———. 1998b. "Identifying Cohesive Subgroups in Large Networks." University of North Carolina,
Chapel Hill. Manuscript.
———. 1998c. "Matrix Methods for Calculating the Triad Census." Social Networks 20:291-99.
———. 1999a. "School Friendship Segregation: Racial Heterogeneity and Friendship Choice in American
High Schools." The Ohio State University. Manuscript.
———. 1999b. SPAN: SAS Programs for Analyzing Networks. Vers. .30. The Ohio State University.
———. 1999c. "The Structure of Adolescent Social Relations: Modeling Friendship in Dynamic Social
Settings." Dissertation. University of North Carolina, Chapel Hill.
———. 2000. "Indirect Connectivity and STD Infection Risk: The Importance of Relationship Timing for
STD Diffusion." Manuscript. The Ohio State University.
Moody, J. and D. R. White. 1999. "Social Cohesion and Embeddedness: A Hierarchical Conception of
Social Groups." The Ohio State University. Manuscript.
Morgan, D. L., M. B. Neal, and P. Carder. 1997. "The Stability of Core and Peripheral Networks Over
Time." Social Networks 19(1):9-25.
Morris, M. 1997. "Sexual Networks and HIV." AIDS 97: Year in Review 11(Suppl A):S209-S216.
Nadel, S. F. 1955. The Theory of Social Structure. London: Cohen and West.
Newburger, Eric C. 1997. Computer Use in the United States. p20-522. Washington, D. C.: U.S. Census
Bureau.
O'Brien, J. 1999. "Writing in the Body: Gender (Re)Production in Online Interaction." Pp. 76-106 in
Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Pattison, P. 1993. Algebraic Models for Social Networks. Cambridge England: Cambridge University
Press.
Pool, I. d. S. and M. Kochen. 1978. "Contacts and Influence." Social Networks 1:5-51.
Rapoport, A. and W. J. Horvath. 1961. "A Study of a Large Sociogram." Behavioral Science 6:279-91.
Rice, R. E. 1994. "Network Analysis and Computer-Mediated Communication Systems." Pp. 167-203 in
Advances in Social Network Analysis, Editors Stanley Wasserman and Joseph Galaskiewicz. Thousand Oaks, CA: Sage.
Richards, William D. 1995. NEGOPY. Vers. 4.30. Burnaby, B.C. Canada: Simon Fraser University.
Robins, G., P. Pattison, and S. Wasserman. 1997. "Logit Models and Logistic Regressions for Social
Networks: III. Valued Relations." Manuscript.
Sewell, W. H. Jr. 1992. "A Theory of Structure: Duality, Agency, and Transformation." American Journal
of Sociology 98:1-29.
Skvoretz, J. and T. J. Fararo. 1996. "Status and Participation in Task Groups: A Dynamic Network
Model." American Journal of Sociology 101:1366-414.
Skvoretz, J. 1983. "Salience, Heterogeneity and Consolidation of Parameters." American Sociological
Review 48:360-75.
———. 1985. "Random and Biased Networks: Simulations and Approximations." Social Networks 7:225-61.
Smith, M. A. 1999. "Invisible Crowds in Cyberspace: Mapping the Social Structure of the Usenet." Pp.
195-219 in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London:
Routledge.
Stokman, F. N. and P. Doreian. 1996. "Evolution of Social Networks: Processes and Principles." Pp. 233-50
in Evolution of Social Networks, Editors P. Doreian and Frans N. Stokman. New York: Gordon and
Breach.
Taylor, H. 1999. "Online Population Growth Surges to 56% of All Adults."
http://www.harrisinteractive.com/harris_poll/pdf/dec22_1999.pdf .
Wasserman, S. and K. Faust. 1994. Social Network Analysis. Cambridge: Cambridge University Press.
Wasserman, S. and P. Pattison. 1996. "Logit Models and Logistic Regressions for Social Networks: I. An
Introduction to Markov Graphs and P*." Psychometrika 61:401-25.
Watts, D. J. 1999. Small Worlds: The Dynamics of Networks Between Order and Randomness. Princeton:
Princeton University Press.
Watts, D. J. and S. H. Strogatz. 1998. "Collective Dynamics of 'Small-World' Networks." Nature 393:440-42.
Weesie, J. and H. Flap. 1990. Social Networks Through Time. Utrecht, Netherlands: ISOR.
Wellman, B. 1992. "Which Types of Ties and Networks Give What Kinds of Social Support?" Pp. 207-35
in Advances in Group Processes, vol. 9, E. Lawler, B. Markovsky, C. Ridgeway, and H. Walker.
Greenwich, CT: JAI Press.
Wellman, B. and S. D. Berkowitz. 1997. Social Structures: A Network Approach. London: JAI Press.
Wellman, B., J. Salaff, D. Dimitrova, L. Garton, M. Gulia, and C. Haythornthwaite. 1996. "Computer
Networks As Social Networks: Collaborative Work, Telework and Virtual Community." Annual
Review of Sociology 22:213-38.
Wellman, B. and S. Wortley. 1990. "Different Strokes From Different Folks: Community Ties and Social
Support." American Journal of Sociology 96:558-88.
White, D. R. 1998. "Concepts of Cohesion, Old and New: Which Are Valid Which Are Not?" University
of California, Irvine. Manuscript.
White, D. R., V. Batagelj, and A. Mrvar. 1999. "Analyzing Large Kinship and Marriage Networks with
Pgraph and Pajek." Social Science Computer Review 17:245-74.
White, D. R., M. Schnegg, L. A. Brudner, and H. Nutini. 1999. "Multiple Connectivity and Its Boundaries
of Reticulate Integration: A Community Study." University of California, Irvine. Manuscript.
White, H. C. 1965. "Notes on the Constituents of Social Structure." Social Relations Department, Harvard
University. Manuscript.
White, H. C., S. A. Boorman, and R. L. Breiger. 1976. "Social Structure From Multiple Networks I."
American Journal of Sociology 81:730-780.
Zeggelink, E. P. H. 1993. Strangers into Friends: the Evolution of Friendship Networks Using an
Individual Oriented Modeling Approach. Amsterdam: ICS.
———. 1994. "Dynamics of Structure: An Individual Oriented Approach." Social Networks 16:295-333.
Zeggelink, E. P. H., F. N. Stokman, and G. G. Van De Bunt. 1996. "The Emergence of Groups in the
Evolution of Friendship Networks." Journal of Mathematical Sociology 21:29-55.