ITR/SOC: The Structure and Dynamics of Electronic Social Networks
James W. Moody, PI, The Ohio State University

Project Description

1 Specific Aims and Scientific Motivation

Recent computer epidemics, such as ‘Melissa’, which spread using people’s email address books, demonstrate the importance of social connectivity among electronic communication users. The effectiveness of these types of viruses rests on the fact that people’s social contacts are interconnected in a vast electronic network. Sociologically, it is impossible to assume that such a contact network is randomly structured. Social, economic, geographic and interpersonal factors likely determine the structure of this network. Understanding the basic topographical features of this network and the processes by which the network develops and changes is an essential component of understanding how new information technology affects people’s lives. Unfortunately, we know very little about the factors that shape computer-based relationships in natural, non-organizational settings (for work on the structure of Usenet posts, see Smith 1999). At the same time, the prevalence of computer communication is rising rapidly, with over 56% of American adults and close to 250 million people worldwide online (compared to just 9% of American adults online in 1995 (Taylor 1999)). Researchers have studied the topography of computer-based social networks in local and organizational settings since the early 1980s (see Rice 1994; Wellman et al. 1996 for reviews of the state of the computer-mediated communication literature), but they have not paid similar attention to relations in unconstrained settings. Computer-based communication, such as email, gives users an ability to segregate audiences and hide features of their own identity that is impossible in face-to-face communication (Donath 1999). 
As such, the structure of social relations may differ dramatically from those we have commonly mapped among coworkers and friends, providing a unique insight into the development of relational structure when the power of external attributes is minimized (Mark 1998). That many people are involved in relations that sociologists have yet to document represents a glaring lacuna in our understanding of contemporary culture. Traditionally, social network researchers have focused almost entirely on very small networks (usually less than 100 actors, rarely more than 1000). This bias toward small group research makes it impossible to ask questions about society-wide social structure, or to test the society-level theories developed by early network theorists (see for example, Pool and Kochen 1978; Rapoport and Horvath 1961). The focus on small groups has been pragmatic, since collecting network data on most forms of social interaction is expensive and efficient tools for analyzing large networks have only recently become available (Batagelj and Mrvar 1999; White, Batagelj, and Mrvar 1999). It is in this respect that computer-based social networks have a distinct advantage. Using the World Wide Web, we can survey computer communication networks at extremely low cost. After an initial investment in survey design and equipment, the marginal cost of each additional respondent is almost zero. Computer-based social networks are thus ideal for providing researchers with a testing ground for substantive and methodological work on large social networks, extending the promise of this work beyond the constrained settings of small groups. 
In light of these considerations, this proposed project has 4 specific aims: (1) to map the social topography of electronic communication networks, (2) to identify the quality and content of computer-based relations, (3) to identify how relations form and change over time, and (4) to provide the social network and information technology research communities with a public dataset of large, dynamic social networks. When met, these four aims will allow multiple researchers the opportunity to develop theoretical and methodological approaches to problems surrounding large social networks in a new and rapidly expanding social context.

1.1 Mapping The Topography of Electronic Communication Networks

Electronic communication networks, like all social networks, can be represented as finite graphs, with each person in the graph represented as a point and each relation represented as an arc in the graph (see Freeman, White, and Romney 1992; Wasserman and Faust 1994; Wellman and Berkowitz 1997 for a general introduction). This representation of the network allows one to use graph theoretic tools to map the social structure. For this project, I propose to measure 6 dimensions of the social network’s topography (see §3 for measurement details):

1.1.1 Social Distance

Most social interaction is constrained by salient social divisions. Moreover, such divisions usually correlate, creating a rigid social structure (Bourdieu 1989; Sewell 1992). In a network, social distance is captured by the number of relational steps separating two individuals. Early network researchers noted that the distribution of distance in a society can be used to directly measure the pattern of social divisions (Pool and Kochen 1978; Rapoport and Horvath 1961), providing a direct measure of social structure. Most network studies in small, bounded settings are limited to a section of the population that is much more homogeneous than the population at large. 
By examining relational patterns without regard to organizational boundaries, we can potentially reach respondents from all sections of the online community. As such, we gain a much more accurate picture of relations between people from very different class, race and cultural backgrounds. Such a network will provide us with an interaction-based measure of the national social structure.[1]

1.1.2 Social Cohesion

One of the key features of any social system is the extent to which actors are bound together in social groups (Durkheim 1984). Recent theoretical advances have shown that redundancy in social networks is the essential element of social cohesion. Cohesive groups emerge when members are connected – and re-connected – through multiple independent pathways (Brudner and White 1997; Harary and White 1999; Moody and White 1999; White 1998; White et al. 1999; see also White’s NSF grant BCS-9978282). The structural features of cohesive groups provide a theoretical base for how information and resources flow through the network. For example, information in a multiply connected group will likely flow freely, since no single member can control the flow of information among actors. The circulation of information made possible by connectivity helps to reinforce cultural standards, solidify group power, and lower hierarchical divisions within the group. Empirical work across widely varying substantive domains has confirmed these theoretical expectations. We find, for example, that members of cohesive groups are likely to have a stronger sense of community and to act in uniform ways politically (Moody and White 2000). Brudner and White (1997) show that structurally cohesive groups based on marital ties correlated with a stratified class system of single-heir succession of productive farmlands in Austria. 
Similarly, White et al. (1999) showed that cohesive groups defined by marital ties among Mexican villagers were restricted to a select core of families who were resident for several generations, excluding recent immigrants, while ritual kinship ties (between parents and godparents) crosscut this core to integrate immigrants into an egalitarian class structure. Identifying cohesive groups within the electronic communication networks will allow me to inductively identify salient social communities in cyberspace, and evaluate how cohesive group membership relates to community identification, ascribed characteristics, and social behavior. Sociologically, the private nature of electronic communication provides a potentially rich and radically new setting to study social cohesion. Actors can create individual identities for separate audiences and thus segregate their (local) audiences from each other. Such action may lead to graphs that are less clustered than those observed in most social network research. However, the same process that might lead to widely ranging ties at the individual level may simultaneously lead to more cohesive social structures. When actors bridge different groups, multiple independent paths between actors result. Thus, ironically, a local attempt to segregate relations may result in a globally cohesive network.

1.1.3 Small World Structure

The conventional wisdom that no two people are separated by more than ‘6 handshakes’ vividly captures the image of a small world network (Milgram 1969). The small world phenomenon rests on a unique combination of distance and clustering in social relations. Actors must be embedded in dense local clusters that are interconnected through a small number of bridging ties. Such small-world graphs have properties we might expect from random graphs (short overall distances between people) while maintaining features that would only be found in structured graphs (such as coordinated dynamic action). 
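
The combination described above, dense local clustering together with short overall distances, can be illustrated on a toy graph. The following is a minimal sketch under stated assumptions, not a measure proposed in this project: it builds a ring lattice, adds two bridging ties, and reports the mean clustering coefficient and mean path length before and after. The function names, the 40-node size, and the choice of bridges are all hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Tie each of n nodes to its k nearest neighbours on each side of a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def with_bridges(adj, bridges):
    """Return a copy of the graph with extra (hypothetical) bridging ties added."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for u, v in bridges:
        new[u].add(v)
        new[v].add(u)
    return new


def mean_clustering(adj):
    """Average share of each node's neighbour pairs that are themselves tied."""
    total = 0.0
    for nbrs in adj.values():
        if len(nbrs) < 2:
            continue  # nodes with fewer than two neighbours contribute zero
        pairs = list(combinations(nbrs, 2))
        linked = sum(1 for u, v in pairs if v in adj[u])
        total += linked / len(pairs)
    return total / len(adj)


def mean_distance(adj):
    """Average number of steps between reachable pairs (breadth-first search)."""
    total, count = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


if __name__ == "__main__":
    lattice = ring_lattice(40, 2)
    bridged = with_bridges(lattice, [(0, 20), (10, 30)])
    for name, graph in (("lattice", lattice), ("bridged", bridged)):
        print(name, round(mean_clustering(graph), 3), round(mean_distance(graph), 3))
```

In this toy case the two bridges shorten the average distance while leaving most local clustering intact, which is the qualitative pattern the text describes.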
While the general features of small-world graphs have been known since the late 1960s, recent work (Watts 1999; Watts and Strogatz 1998) has clarified the mathematical properties underlying small-world graphs. Importantly, Watts developed a set of scalable parameters that can measure the extent to which a given graph has a small-world structure. If electronic networks are locally clustered, but contain bridges between such groups, then a small world structure is quite likely. By identifying the magnitude of the small-world features of an electronic network, we can build on the known characteristics of such graphs, providing insight into how information flows through the network.

Footnote 1: Blau’s work on intermarriage patterns provides the closest example of a nation-wide image of interaction structure as social structure. Of course, since one can be married to only one person at a time, the topography of this structure is quite limited (Blau 1994; Blau and Schwartz 1984).

1.1.4 Social Balance

A cumulative research tradition in social networks has demonstrated that relatively simple individual action rules will generate global networks that are both highly clustered and hierarchically ordered (Cartwright and Harary 1956; Davis 1963; Davis 1970; Davis and Leinhardt 1972; Johnsen 1985; 1986; Moody 1999c). The guiding rule at the base of this work is social balance theory. Simply stated, social relations are balanced when they are transitive, that is, when a friend of a friend is also a friend. Psychologically, balance theory rests on the assumption that associating with people who do not like each other generates strain (Festinger 1957). When this happens, people are expected to change their relations to create balanced networks. While balance theory is based on attempts to reconcile local conditions, the theory has profound implications for the way complete social networks are shaped and evolve. 
It can be shown, for example, that under certain circumstances the resulting global graph will consist of a set of tight-knit groups that are embedded in a wider field of loosely connected people. The groups in this structure emerge and dissolve in an ever-shifting pattern, as each actor’s local choices to balance their own networks creates imbalance for others, who then respond in kind. As a result, the broad macro-level topography – of an ordered set of groups – remains throughout, but the membership within such groups is constantly in flux. Since electronic relations consist of largely private dyadic exchanges, the third-person pressures evident in most balance models may play a weaker role. Similarly, if people can manipulate their identities with different interaction partners, they may be able to segregate social worlds to thwart the kind of clustering social balance usually produces. Thus, the computer communication network provides a unique challenge to one of the best-supported theories of social network formation, and a unique opportunity to test the model in a dynamic new setting. 1.1.5 Overlap of Category and Network In most social networks, there is a correspondence between actors’ attributes and the pattern of their relations. In fact, it has been argued that the social salience of such categories is directly related to the extent that they shape social relations (Freeman 1972), and that the category itself often results from regular social interaction (White 1965). The correspondence between social interaction and attributes provides an opportunity to study substantive social segregation. In many social settings, heterogeneous groups share the same physical space, and are thus formally integrated. If, however, people in the setting only interact with people from their own group, then the setting is substantively segregated. 
Thus, while the number of minorities who use electronic communication is increasing[2] (Taylor 1999), if electronic relations are focused entirely within race, then the Internet is still substantively segregated (see Moody 1999a).

Footnote 2: American Internet users are more predominantly white (81% compared to 76% of the US population), better educated (64% with at least some college education, compared to 48% of the total US population), and have a higher household income (41% earn more than $50,000 a year, compared to 32% of the total population) than the nation as a whole (Taylor 1999; see also CyberAtlas 2000; Newburger 1997).

The extent of substantive segregation in communication networks may differ from face-to-face relations for two reasons. First, actors’ attributes are not as public in email as in face-to-face communication. Actors can choose whether to reveal their attributes, and thus interaction can be based on substantive interest instead of ascribed status. Secondly, since electronic media are not restricted to physical focal organizations (Feld 1981), segregation that results from differential interaction opportunities (such as school tracking) may not play a role. We should not jump to the conclusion, however, that segregation will not persist on the Internet. First, if race is important to the actor, then they will likely seek out others who share the same opinions, and there are fewer social controls to limit the formation of deviant groups. We find, then, that the web has become an ideal setting for skin-heads seeking to build a white-only community. Secondly, to the extent that electronic media simply mirror other personal relations, we will expect relationship segregation levels similar to those observed in friendship networks. Third, even if electronic relations are integrated with respect to ascribed characteristics such as race, they may still be very homogeneous with respect to other characteristics (a group formed around an online game, for example). 
Empirically, research on Usenet groups shows that race and gender are both pieces of information people seek out in online communication (Burkhalter 1999; O'Brien 1999). One of the primary results of this work will be to identify the extent of social integration on the Internet.

1.1.6 Network Position

Network topography has significant implications for an individual’s social position in the group. Networks are relationally differentiated,[3] and we can thus distinguish types of actors based on the pattern of relations they are involved in. First, role positions in any group can be identified through regular interaction patterns (Nadel 1955; Lorrain and White 1971; White et al. 1976). For example, we can identify people who are liaisons between multiple disconnected groups or people who are at the top of a relational hierarchy (members of a school’s leading crowd (Coleman 1961), for example). By identifying the most common interaction patterns in a network and identifying how positions relate to each other, we can identify the role system for any given network. Secondly, we can characterize each actor’s position at the individual level through measures such as network centrality. These measures situate each person in the social topography relative to the position of every other person in the network. Our current understanding about position in electronic networks comes from studies of (comparatively) small settings within organizations, and thus, not surprisingly, interaction patterns tend to follow the organizational chart (Rice 1994). This project will allow us to describe the array of positions in non-organizational settings, and thus understand the links between general interaction and behavior outside of the organizational contexts of previous work.

1.2 Evaluating the Use and Content of Social Relations

The meaning of any global network structure depends as much on the content as on the pattern of social relations. 
Traditionally, social network researchers have relied on relations with strong face validity (such as ‘friendship’ or ‘coworker’) where the substantive meaning of the relation could be readily inferred. Computer based relations, however, do not have the same unambiguous meaning, and thus we need to identify the content of the relation directly. We can do this by identifying how people use their computer relations and letting them evaluate the quality of such relations. I propose to identify the relational content of electronic communication in two ways. The first method for determining the content of relations is to ask respondents to identify relevant dimensions of each relationship. For example, I will ask respondents to rate relations on the content of the communication (work vs. personal for example), the extent to which they trust the people they email often, how often they use the link for social support (either giving or receiving) and how important this person is in their lives. Secondly, relationships are often multi-layered. People are friends with their co-workers and work with their relatives. By identifying how various relationship types overlap, we can better characterize the content of each relation. Thus, using algebraic techniques on local networks (Mandel 1983; Pattison 1993) I can identify the characteristic patterns of electronic relations relative to well known relations such as friendship and kin. 1.3 The Network Dynamics of Computer Based Relations While most social network research had tended toward static, cross-sectional research, there has been a strong recent interest in the dynamic aspects of social networks (Doreian 1986; Galaskiewicz and Wasserman 1981; Hummon and Fararo 1995; Leenders 1996;1995; Morgan et al. 1997; Stokman and Doreian 1996; Weesie and Flap 1990; Zeggelink 1994; Zeggelink et al. 1996). 
This surge in dynamic research comes from the realization that to understand the properties of any social system, one needs to understand the trajectory of both actors within the system and of the global characteristics of the system. To that end, I propose to follow a sample of actors and network clusters four times over the course of a year, and resample the full network a year after the first contact. At the individual level there are two relevant dimensions of relational change. First, we want to document ego-level changes in relational behavior, such as changes in the volume of computer communication partners and changes in the frequency of contact with any given partner. Such changes can be modeled as a function of changes in the relational environment ego is embedded within as well as changes in individual characteristics and life-course position. Second, I will document changes in an individual’s position relative to the wider network he or she is embedded within. Since the structural features of a given network are largely independent of the actions of any single actor, an individual’s position in the global network can change even if he or she makes no changes in their own relations. To document positional changes, we will identify sequences of positions that actors occupy over time. The set of all such position sequences will enumerate the evolving role structure of the group. Given the rapid growth and newness of the Internet, these images will provide a unique vision of the early development of a large interacting social system.

Footnote 3: Except in the rare case of a completely random network or completely connected clique.

1.4 Public Use Database

Scientific study benefits from multiple perspectives and approaches to any given problem. The small-network bias evident in most social network research is due, at least in part, to a relative lack of data on large social networks. 
This project will provide the social network research community with multiple, large networks that will push the frontier of methodological and theoretical network research. I will provide two types of data to best serve the varying needs of the research community. First, after removing any identifying characteristics from the network records, I will make the raw network data available to any researchers through my web page or by placing the data in centralized data archives. This raw data will include both the network adjacency structure, as well as a wide range of substantive and demographic attributes of the respondents. Researchers will then be free to develop and test new methods for large-scale network research as well as substantive theories about electronic communication networks and social behavior. Second, since many researchers have substantive interests that would benefit by including network measures but don’t have the technical training to construct such measures, I will also make a constructed network dataset available that any researcher will be able to use with standard sociological analysis techniques. This will provide the general research community with a rich, multiple use dataset on a unique sample of people who are active online. 2. Relation to Investigator’s previous work 2.1 Adolescent Social Networks This project extends my previous substantive work on adolescent friendship networks and my methodological work on large-scale social networks into a new substantive domain over a larger scale. My previous work focused on friendship and romantic relations among adolescents (Bearman et al. 1997b; Moody 1999c) and large scale social cohesion (Moody and White, 2000). In my dissertation, (Moody 1999c) I build on a cumulative research line in sociology on the role of social balance in the development of global network structures (Cartwright and Harary 1956; Davis 1963; Davis 1970; Davis and Leinhardt 1972; Doreian et al. 
1996; Hallinan 1974; Holland and Leinhardt 1971; Hummon and Fararo 1995; Johnsen 1985). I developed a theory for how positive relationships, such as friendship, develop and change, and identified the global network structures, and the resulting dynamics, that would follow from the relationship formation process. Empirically, I use panel network models, trajectory models of social position, and dynamic simulation to show that the macro structure of high schools will remain constant even while relations at the local level are continuously changing. In all of the Add Health high schools, an ordered hierarchy of friendship groups rested within a loosely connected collection of actors who were not in cohesive groups. These ‘background’ actors would, over time, change their relations to balance the local friendships they were involved in. In so doing, they created imbalance for those around them and shifted the population of people who were embedded in friendship groups. This dynamic modeling effort is one of the largest dynamic network studies completed to date. A second feature of my work on adolescent social relations has focused on relational race segregation (Moody 1999a). Consider the image of “Countryside School District” below. In this figure, points represent students and lines represent relations among students. In general, two people who have many friends in common are plotted close to each other, while two people who have few friends in common are distant from each other. In this school, we see a clear split between White students on the left part of the figure and Black students on the right.[4] When I compare students across multiple different schools, I find that racial heterogeneity in the school setting tends to increase the tendency for students to choose friends of their own race, but that integrated extracurricular activity mitigates this same-race selection process (Moody 1999a).

Footnote 4: Within the race groups, the clustering evident is between Jr. High students (top-left) and High school students (bottom-right).

[Figure 1. Social Relations in “Countryside” School District. Points colored by race: White, Black, Mixed/Other.]

2.2 Large Scale Social Cohesion

I have recently been working with Douglas R. White (Moody and White 1999) to extend his foundational work on the connection between network connectivity and social cohesion (Brudner and White 1997; Harary and White 1999; White 1998; White et al. 1999). We define cohesiveness as the minimum number of actors who, if removed from a group, would disconnect the group. We show that this conception of cohesion leads to hierarchically nested sets of ever-increasing connectivity. This hierarchical nesting provides a rigorous analytic operationalization of network embeddedness, which we show is a significant factor in empirical applications as wide ranging as adolescent school attachment and the political action similarity of corporations. Extending the insights of this work into very large networks, such as those that result from computer communication, will provide an opportunity to test for cohesion effects over great social distances. White is a prominent figure in social network analysis and chairs the program in social network analysis at the University of California – Irvine. He has had extensive experience analyzing very large social networks (Brudner and White 1997; White et al. 1999) and is currently developing longitudinal models and comprehensive multiple investigator data sets focusing on large-scale social cohesion (see NSF grant BCS-9978282, “Longitudinal Social Network Studies and Predictive Social Cohesion Theory, 1999-2002”). White will serve as a consultant on this project and his expertise with respect to large networks, social cohesion, and identifying equivalence positions in social networks will strengthen the quality of the project. 
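
The definition of cohesiveness used in this section (the minimum number of actors whose removal disconnects a group) can be made concrete with a brute-force sketch. This is only an illustration for very small graphs, written under my own assumptions; it is not the connectivity-set algorithm cited in this proposal, and the helper names `from_edges`, `is_connected` and `node_connectivity` are hypothetical. It follows the usual graph-theoretic convention that a completely connected group of n actors has connectivity n - 1.

```python
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency map from a list of (u, v) ties."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, removed=frozenset()):
    """Check whether the actors left after removing `removed` form one component."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def node_connectivity(adj):
    """Smallest number of actors whose removal disconnects the group.

    Exhaustive search over removal sets, so only feasible for tiny graphs.
    """
    n = len(adj)
    for size in range(n - 1):  # always leave at least two actors in place
        for cut in combinations(adj, size):
            if not is_connected(adj, frozenset(cut)):
                return size
    return n - 1  # no cut set exists: completely connected group


if __name__ == "__main__":
    cycle = from_edges([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
    bow_tie = from_edges([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])
    print(node_connectivity(cycle), node_connectivity(bow_tie))
```

The bow-tie example (two triangles sharing one actor) depends on a single person to stay connected, whereas the five-person cycle requires removing two people, which is the sense in which the cycle is the more cohesive group under this definition.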
2.3 Large Social Networks Methods

In addition to my substantive work on relations among adolescents, I have extensive methodological experience developing techniques for large social networks (Moody 1998a; Moody 1998b; Moody 1999b; Moody and White 1999). I have developed an integrated set of network analysis modules, including the only currently implemented algorithm for identifying all connectivity sets in large social networks (Moody 1999b). My work on large networks has focused on understanding the temporal features of STD flows (Moody 2000), identifying cohesive peer groups (Moody 1998b), and enumerating the triad structure of social networks (Moody 1998c), which are integral to describing the broad structural patterns in a social network (Johnsen 1985; 1986). Empirically, I have developed and implemented a wide range of measures for multiple large networks as part of my work on the National Longitudinal Survey of Adolescent Health (Add Health), resulting in a publicly available dataset of network measures for general use by other researchers (Bearman et al. 1997a). Since much of the substantive work in social networks has focused on small groups, the techniques used to analyze social networks tend to be inefficient when applied to much larger (and usually much sparser) networks. I have used, and will continue to adapt, new graph exploration algorithms from computer science, which make analyzing large networks much more feasible than was possible even 15 years ago (see for example, Auletta et al. 1999; Ball and Provan 1983; Chartrand and Oellermann 1993; Gibbons 1985; Kanevsky 1993; Khuller and Raghavachari 1995).

3 Research Design

3.1 Overview

The proposed project has a three-stage design. In the first stage, I will conduct a snowball sample to identify large connected components, drawn from wide-ranging geographic areas. 
This sample will provide basic demographic and global network information, as well as provide the frame from which to draw specialized sub-samples for in-depth longitudinal study. In the second stage, I will select three types of people for longitudinal study: (1) a representative random sample of the stage-one snowball sample, (2) an ego-network sample, and (3) a cohesive peer group sample. Each member of the special samples will be followed for a year and given detailed interviews on the content and quality of their relations 4 times. At the time of the 4th interview, I will re-contact all people from the original snowball sample for a short follow-up survey, to provide a global context for the detailed temporal data. From these three data sources, I will be able to estimate the features of network topography outlined in section 1, identify the quality and significance of electronic communities in the lives of respondents, identify how relations change over time, and provide the research community with a national sample of longitudinal, electronic social networks.

3.2 Data collection

3.2.1 Stage 1: Global Network Snowball Sample

To identify properties of network distance, social cohesion and balance, we need to have data that extends beyond the individual to the greater social network. Ideally, this would include all actors linked in a given network. Given the extreme size of the computer communication network (presumably most of the estimated 259 million people online are connected in a single giant component (Parker, 1985)), it is impossible to analyze the entire population. Given that the majority of online activity occurs in English and in the United States, restricting the survey to those explicit criteria helps limit the size, but the number of English-speaking online adults in the United States is still over 100 million and thus some sampling procedure is required. The best way to sample from a global network is to use snowball sampling techniques. 
A snowball sample is an intuitive choice since the object of study, the structure of the network, defines the data collection procedure (Frank 1977; Frank 1978; Frank 1979). Starting with a geographically dispersed initial seed sample, I will ask each respondent to name the people they email with most frequently. I will then contact the people that they name, and ask them for the names of the people they email with most frequently, and so on. The size of a snowball sample depends on the number of steps followed and the probability that any newly named person has already been selected into the sample. These two quantities effectively govern the sample size at a given sample step, and in a random network, can be closely approximated with:

$$p_{i+1} = (1 - X_i)\left(1 - e^{-a p_i}\right) \qquad (1)$$

where $p_i$ is the proportion of the total population reached at the $i$th step, $X_i$ is the cumulative proportion reached through that step and $a$ is the mean degree of actors in the network. If the network is not random, but structured due to reciprocity, transitivity or clustering around ego attributes, $a$ is effectively reduced. A network with structured ties will thus have a reachability profile similar to a random network with smaller average degree, α (see the work of Fararo and Skvoretz (Fararo and Skvoretz 1987; Skvoretz and Fararo 1996; Skvoretz 1983; 1985) for a detailed description of this effect).[5]

Footnote 5: An implicit assumption in equation 1 is that the probability of inclusion is essentially continuous over the network. Thus, the approximation may not hold well in heterogeneously clustered – or nested – networks.

I thus propose to seed the snowball sample with 200 people, chosen from widely varying geographic areas. I will then snowball out from the initial set to a depth of not more than 10 steps, or until I reach 50,000 people from any initial snowball seed. 
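
To show how equation 1 behaves, the recursion can be iterated directly. The sketch below is an illustration under stated assumptions rather than part of the proposed design: it takes the starting proportion to be 200 seeds out of a population of 100 million, treats the seeds as step 0 so that the cumulative proportion starts equal to the seed proportion, and compares two hypothetical mean degrees (2 and 4) that are not estimates from any data. The function name `reach_profile` is likewise hypothetical.

```python
import math


def reach_profile(p0, mean_degree, steps):
    """Iterate p_{i+1} = (1 - X_i) * (1 - exp(-a * p_i)) for a random network.

    p0          -- proportion of the population in the seed sample (step 0)
    mean_degree -- a, the mean number of ties per actor (assumed value)
    steps       -- number of snowball steps to follow
    Returns a list of (step, proportion newly reached, cumulative proportion).
    """
    p, cumulative = p0, p0
    profile = [(0, p, cumulative)]
    for step in range(1, steps + 1):
        p = (1 - cumulative) * (1 - math.exp(-mean_degree * p))
        cumulative += p
        profile.append((step, p, cumulative))
    return profile


if __name__ == "__main__":
    seed_share = 200 / 100_000_000  # 200 seeds, population size assumed
    for degree in (2.0, 4.0):       # hypothetical mean degrees
        print("mean degree", degree)
        for step, new, total in reach_profile(seed_share, degree, 10):
            print(f"  step {step:2d}  new {new:.6f}  cumulative {total:.6f}")
```

Running the two profiles side by side illustrates the point made in the text: a lower effective mean degree, as expected in a structured network, reaches a smaller share of the population at every step.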
This stopping rule then sets the maximum size of any connected component at 50,000 nodes,6 which ought to be large enough to estimate connectivity parameters and to identify high-connectivity clusters within the extended neighborhood of each seed member. The sample design ensures a maximum of 200 connected components, which would provide multiple large images of the entire electronic communication network, though there will be fewer if the initial sample seeds are less than 10 steps apart (which is likely for at least some nodes within each chosen city).

3.2.1.1 Selecting the Snowball Seed Sample

No unified sampling frame for all email addresses exists. Thus, to ensure geographic dispersion, I will first stratify the US into 10 large geographic areas (by state combinations), and within each area I will then randomly select 2 large cities (with large defined as in the top 10% of the wider geographic area). Using on-line email search engines, such as Netscape’s Who Where People finder, I will randomly select respondents until 10 seeds from each setting agree to participate. Each person will be chosen by randomly selecting a letter for the last name and then selecting randomly within the list of all people whose name starts with the selected letter in the given setting.

3.2.1.2 Snowball Sample Mechanics

All data will be collected using a Computer Assisted Data Interview (CADI) enabled Web survey on a dedicated secure server. Since the population of interest is all people who are online, a web-based survey provides an excellent medium for collecting the network data. Most importantly, once the initial investment in hardware and software development has been made, the marginal cost of each survey response is essentially zero. For a network survey of the size needed to identify the macro-level properties of electronic networks, this is a decided advantage. The Stage 1 snowball survey will consist of a short demographic questionnaire and an email network name generator.
The demographic portion of the survey will collect data on gender, race, residence, socio-economic status, age and family structure. The attributes identified in this portion of the survey will be used to identify basic mixing matrices (Morris 1997) and network bias parameters (Fararo 1981; Fararo and Skvoretz 1987; Skvoretz and Fararo 1996). The second part of the snowball sample questionnaire will be a network name generator that asks respondents to identify the people they email with most frequently. Once the names and email addresses of each alter have been entered, I can use a JAVA applet to allow each respondent to draw the links among the alters they nominate; providing a simple, complete ego-network generator that ought to be fun for respondents to use. Here the respondent will identify some basic demographic characteristics of each person (race, gender, age and occupation). This technique provides direct information on email communication from ego and an indirect estimate of communication among ego’s alters, that will be useful for estimating linkages among those actors who do not agree to participate. The web survey will automatically check the newly identified names against those currently contacted. If email addresses are nominated that have not been previously contacted, they will be sent an email informing them about the purpose and content of the study and inviting them to participate. In order to maintain a manageable volume at the study server, we can control the number of active surveys in the field at any given time. Survey response rates are important for any social survey, but especially so for network surveys. Most methods and measures for networks require population data. 
Table 1 below shows what proportion of the population would be reachable under different response assumptions, based on simulated clustered networks.7 We see, for example, that we would gather information on 86% of the total population (from either self reports or alter reports) if 30% of the people we contacted agreed to participate in the survey and then agreed to provide us with information on 70% of their contacts. Importantly, we can cover most of the observed networks even if overall response rates are fairly low (30% - 50%). Since we will have multiple estimates of relations among people we do not sample (from their close associates), we will have some information about the structure of relations among most of the people in the graph.

6 The 10-step limit is based on estimates derived from equation 1 and a national population of 100 million online users. Assuming a graph with an effective degree equal to that of large high schools, the 50,000 person limit will be reached in between 6 and 7 steps from a 10-person starting sample (the number I will seed in any given community).

7 Results are from 1000 trials on simulated networks with an average degree of 11 that consisted of randomly generated primary groups of size 50 loosely embedded in larger groups of 200, which were more loosely linked in a population of 10,000. The results are only marginally different if you assume a smaller (mean degree = 9) or larger (mean degree = 13) degree value or change the size of the network.

Table 1. Network coverage under various response patterns

                        Linkage Rate
Participation Rate      50%     70%     90%
30%                     63%     86%     94%
40%                     84%     95%     97%
50%                     92%     97%     99%

Since we cannot know the final sample size before starting when using a snowball sample, it is impossible to calculate a cost for individual participation inducements. More importantly, even a very small inducement, which would likely not help increase participation rates, would result in a huge cost if given to every respondent.
In keeping with many online surveys, I propose a lottery system for rewarding participation. Thus, each respondent will be entered into a drawing for $1,000.

3.2.2 Stage 2: Longitudinal Sub-samples

3.2.2.1 Overview

The snowball sample will provide information on the broad structural features of the electronic social network. To understand the dynamics and qualitative details of electronic communication networks, we need to follow a smaller group of people, using more detailed survey instruments, over time. In this section, I describe three sub-samples of the original network. First, I will select a simple random sample of 2000 respondents from the network generated by the snowball sample procedure. Second, for each member of the random sample, I will identify the people they email with most often and bring them into a combined ego-network sample.8 Third, I will use a cohesive peer group identification method to identify cohesive communities through the pattern and frequency of interaction. While there is no pre-specified size for such groups, I will attempt to sample a broad spectrum of groups from small (less than 100) to large (the maximum observed group size).9 Each person selected into a special sample will receive a detailed survey 4 times over the year, which will be designed to measure changes in their local electronic networks, gauge the overlap of their electronic relations with other social relations, and to relate attitudes and behaviors to network position.

3.2.2.2 Simple Random and Ego-Network Samples

The snowball samples will extend over a potentially very large population. To gather greater information about this population, and to understand how people in varying positions in the larger network behave, I will sample a group of people at random from the entire snowball sample.
Because this will be a representative sample of the snowball networks, I will be able to relate attitudes, behaviors and network activity to a respondent’s position in the overall network. This will provide a broad base of information to identify how positions in the original snowball network relate to behaviors. For each person in the random sample, I will also select the people they are adjacent to in the electronic network. This will provide the local sociometric context within which each actor is situated, and thus allow for a comparison of activities between ego and his or her network neighbors. Changes in behavior in the local network can then be linked to an individual’s actions, providing a direct context for each person’s behaviors. Moreover, the detailed ego-network data will provide information on mixing patterns for many more attributes than collected in the short snowball sample form.

8 The expected sample size will be the mean degree times 2000, minus any overlap. To make the task of reporting on relations manageable for respondents, the size will be limited to 20 alters. Previous research indicates that people have between 11 and 17 close relations (Fischer 1982; Wellman 1992).

9 As a benchmark, the mean size of such groups in the Add Health data was 22 members.

3.2.2.3 Cohesive Peer Group Sample

While the ego-network samples provide information on the local context actors are embedded within, the promise and scientific interest of network analysis comes from looking beyond the individual to the wider groups he or she is embedded within. Extending a method I have developed for identifying cohesive peer groups in large networks, I will construct groups from the identified snowball sample based on two principles: the volume of interaction and the cohesive pattern of interaction. For the purposes of this project, I will first implement a tri-connected component algorithm (Hopcroft and Tarjan 1973).
Identifying tri-components will reduce the size of the sub-graph I need to search over, simplifying the remaining search. Within each tri-component, I identify partitions that maximize the number of within-group ties and minimize the number of between-group ties, while ensuring that the graph is at least bi-connected (for a method that is conceptually similar but does not include the connectivity restriction, see Frank 1995). This results in an interaction group that is both cohesive and heavily interactive.

3.2.2.4 The Stage-2 Survey Instrument

All respondents selected into the stage-2 sample will receive the same survey, again administered through a Web based CADI system on a secure dedicated server. This survey will consist of three modules: a general social survey module, an Internet behavior module, and a network name generator. The general social survey section will include items on demographics and family structure, employment, attitudes and feelings, and other commonly studied social behaviors. Items will be chosen to maximize the potential usability by other social scientists who want to understand how network processes affect important sociological questions. I will query network researchers, through organizations such as the International Network for Social Network Analysis (INSNA), on topics that they would prefer to see included in the survey. Certain elements are sure to be included, such as questions that focus on community involvement and identification, which help identify the place of electronic relations in the lives of respondents. Whenever possible, questionnaire items will be taken from well known general social surveys, such as the General Social Survey (GSS), the National Longitudinal Survey of Youth (NLSY), the Current Population Survey, and the National Longitudinal Survey of Adolescent Health (Add Health). This will allow researchers to compare responses on the network sample to a known national probability sample.
Researchers will then be able to relate questions about network position and composition to a wide range of substantive topics, greatly enriching the scientific return to the data collection. In addition to general social behavior questions, I will include a module specifically designed to understand the qualitative aspects of electronic communication networks. Respondents will be asked to identify how much time they spend online with their common email friends and how important such relations are in their everyday lives. They will be asked to identify how often they use such contacts for typically studied network effects, such as social support (Wellman 1992; Wellman and Wortley 1990), help getting jobs (Granovetter 1973), information gathering (Buskens and Yamaguchi 1999; Friedkin 1991; Meyer 1994), and companionship (Bell 1988; Duck 1991; Leenders 1996; Zeggelink 1993). The third section of the detailed interview will consist of a replication of the GSS social network module for close friends, without any reference to the electronic network. This module will allow us to construct ego-networks that are not necessarily constrained to electronic relations, and thereby compare the email networks to general friendship networks. By also collecting data on family structure and whether or not any named alter is kin, we will be able to compare electronic kinship nets and friendship networks. As with the first stage instrument, response rates are important, especially with the cohesive network sample. Given the longer time commitment required of respondents, I propose to provide an additional lottery inducement of $1500 for each wave of the in-depth survey.

3.2.3 Snowball Network Resample

With the collection of the data outlined above, I will have detailed information on the starting network, and 4 snapshots of parts of that network over time.
A longitudinal picture of the total network over this period is still needed to situate the sub-samples within the wider computer communication network. I propose to recontact all people identified in the original survey at the time of the last longitudinal sample. This short follow-up questionnaire will contain only the network module and questions about changes in status since we last contacted them, and will serve to anchor the global network sample at the end of the study, providing the ability to situate the sub-sampled actors within the wider population structure.

4. Data Analysis

Once the snowball sampling limits have been reached, the next task is identifying the longitudinal network samples, which requires identifying all cohesive peer groups in the network. Once these groups have been identified and the longitudinal survey put into the field, work on identifying the properties outlined in the first section can begin. The ability to analyze large social networks is expanding rapidly, thanks largely to the development of PAJEK (Batagelj and Mrvar 1999), a program for analyzing large networks. For the work below, I will use PAJEK when possible, and develop separate software as needed.

4.1 Network Topography

The electronic social network is represented as a digraph, G(V,A), where the vertices, V, represent our set of |V| actors and the arcs, A, represent the relations among actors defined as a set of ordered pairs $(v_i, v_j)$. Actor i is adjacent to actor j if $(v_i, v_j) \in A$. A path in the network is defined as a sequence of adjacent, distinct vertices and edges, starting with one node and ending with another. Actor i can reach actor j if there is a path in the graph starting with i and ending with j.

4.1.1 Social Distance

Graph theoretically, distance is defined as the minimum number of edges in a path connecting two actors. We can identify the distance from ego to any other actor in the network using a simple breadth-first search (BFS) (Gibbons 1985).
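As a minimal sketch of the breadth-first search just mentioned, the function below returns the geodesic distance from one actor to every actor it can reach. It assumes the digraph is supplied as a list of ordered pairs (i, j); the function name `geodesic_distances` and that input layout are illustrative choices, not part of the proposal or of PAJEK.

```python
from collections import deque


def geodesic_distances(arcs, ego):
    """Return {actor: distance} for every actor reachable from ego.

    arcs -- iterable of ordered pairs (i, j), meaning i nominates j
    ego  -- the starting actor
    """
    # Build successor lists so each arc is examined once during the search.
    successors = {}
    for i, j in arcs:
        successors.setdefault(i, []).append(j)

    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        current = queue.popleft()
        for alter in successors.get(current, []):
            if alter not in distance:          # first visit is the shortest path
                distance[alter] = distance[current] + 1
                queue.append(alter)
    return distance
```

Actors missing from the returned dictionary are unreachable from ego, which in a snowball sample may simply mean that the connecting path lies beyond the sampled frontier.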
One measure of the social distance in the achieved snowball graphs can be calculated by tracing the geodesics in the observed graph. However, the sampling procedure will constrain the diameter of the graph to a 20-step maximum (10 steps on either ‘side’ of the initial seed), and smaller if 50,000 people are reached in fewer steps from the seed. Thus, for two nodes that are each closer to the frontier of the snowball sample than they are to each other, the geodesic may be overestimated, since we cannot know if a contact in the next (not-sampled) step would link them. For all other nodes in the sample graph, the geodesic will provide an accurate estimate of the social distance among the actors. A second approach to social distance will be to estimate network bias parameters (Fararo 1981; 1983; Fararo and Skvoretz 1987; Skvoretz 1983; 1985) from the extended local networks of each person contacted. Conceptually, bias parameters control the difference between $a$ and α in a snowball recursion formula, such as that given in equation 1. The reachability curves for any given population can be traced, just as in the construction of a snowball sample, and a parameter governing reachability can be estimated from these curves. Figure 2 below, for example, gives the reachability curves for three large American high schools.

Figure 2. Mean Trace Profile for three high schools. Data from the National Longitudinal Survey of Adolescent Health.
[Figure not reproduced. Vertical axis: Proportion Contacted (0 to 1); horizontal axis: Step (0 to 14).]

In each case, the curve is much shallower (reaches fewer people in k steps) than that of a random network of similar mean degree. By estimating the proportion of new people brought into the sample at each wave, we can approximate the entire reachability curves.
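One way to compute such a reachability curve from observed data is sketched below, under the assumption that the network is stored as a dictionary mapping every actor to the list of people that actor nominates. The names `trace_profile` and `mean_trace_profile` are hypothetical, and averaging over all possible seeds is an illustrative choice rather than the procedure used to draw Figure 2.

```python
def trace_profile(adjacency, seed, max_steps):
    """Cumulative proportion of the population contacted after each step.

    adjacency -- dict mapping every actor to a list of nominated alters
    seed      -- the actor the trace starts from
    max_steps -- number of snowball steps to follow
    """
    population = len(adjacency)
    reached = {seed}
    frontier = [seed]
    profile = [len(reached) / population]
    for _ in range(max_steps):
        next_frontier = []
        for actor in frontier:
            for alter in adjacency[actor]:
                if alter not in reached:       # count each person only once
                    reached.add(alter)
                    next_frontier.append(alter)
        frontier = next_frontier
        profile.append(len(reached) / population)
    return profile


def mean_trace_profile(adjacency, max_steps):
    """Average the trace profiles obtained by starting from every actor in turn."""
    profiles = [trace_profile(adjacency, seed, max_steps) for seed in adjacency]
    return [sum(step_values) / len(profiles) for step_values in zip(*profiles)]
```

Comparing the resulting curve with the curve implied by equation 1 for a random network of the same mean degree gives a rough sense of how much the structure of the observed network slows reachability.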
One can estimate bias parameters based on transitivity (the likelihood that if i nominates j, and j nominates k, i also nominates k), reciprocity (the likelihood that if i nominates j, j also nominates i), and homogeneity or clustering parameters (the likelihood that a person with attribute k nominates another person with attribute k, or who is x units different from ego on k).

4.1.2 Social Cohesion and Peer Communities

I measure social cohesion through the nested node connectivity sets. First, I will employ linear time algorithms to identify bicomponents and tri-components in the networks (Gibbons 1985; Hopcroft and Tarjan 1973), and higher levels of connectivity by combining low polynomial time algorithms for testing connectivity (Even and Tarjan 1975) and identifying cut-sets of the graph (Kanevsky 1993).10 This procedure will identify a nested set of cohesive groups. Identifying how behavior similarity and relation stability differ within and between such groups is an important test for the relation between node connectivity and social cohesion. The procedure for identifying peer groups starts with identifying the tri-components of the graph, and searching for dense interaction regions within the tri-components. Once an initial set of dense regions is identified, using a standard distance clustering method (or any initial cut method, such as the adjacency sorting method implemented in NEGOPY (Richards 1995)), I then identify the returns to relative density that follow from moving any member of one group to another group. The iterative procedure then reassigns people to groups if a reassignment would increase the fit index for both groups. At each reassignment stage, new groups are checked for minimal connectivity to ensure that all identified groups are cohesive. The reassignment procedure works on the underlying mixing matrix, and thus greatly reduces the computational size of the problem. Three positions result from this group assignment procedure.
Nodes either belong to cohesive groups (members), are between multiple groups (liaisons), or are outside of the system (people who are not members of the largest bi-component).

4.1.3 Small World Structure

A small world network can be characterized as one in which most ties are sent within a relatively small, local group of actors, where each of these small groups has a few ties distributed throughout the graph, thereby linking the small clusters together. In his recent work on the small world problem, Duncan Watts identifies a set of simple mixing parameters that capture the broad features of a small world network (Watts 1999; Watts and Strogatz 1998). Combining the median length of the geodesics in a graph (what Watts refers to as the characteristic path length, L) and the clustering coefficient, $\gamma_v$, which measures the extent to which vertices adjacent to any vertex v are adjacent to each other,11 we can measure the degree to which a given graph represents a small-world structure. Watts provides two general mathematical models for small world graphs, which depend on a single parameter bounded between 0 and 1. At one extreme, we have a completely random graph, while at the other we have a completely clustered graph.12 By estimating this parameter for the observed electronic networks and building on the theoretical applications Watts provides, we can identify the potential dynamic properties associated with the graph.

4.1.4 Social Balance

The transitivity level, or functions of the transitivity level, of a given network is a standard measure of social balance, which rests on the distribution of triads in a network. While efficient matrix methods for identifying the triad census in moderately sized dense networks are available (Moody 1998c), for sparse networks it is more efficient to enumerate the distribution of triads in the network directly, as they can be calculated from the 2-step neighborhoods of each node.
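To illustrate why 2-step neighborhoods suffice in a sparse graph, the sketch below computes a simple transitivity index by walking every directed two-path i → j → k and checking whether the arc i → k is present. It is not the full 16-type triad census, and the function name `transitivity_index` and the arc-list input are assumptions made for the example.

```python
def transitivity_index(arcs):
    """Share of directed two-paths i -> j -> k (i != k) closed by an arc i -> k.

    arcs -- iterable of ordered pairs (i, j), meaning i nominates j
    Returns None when the graph contains no two-paths.
    """
    # Successor sets give constant-time checks for the closing arc.
    successors = {}
    for i, j in arcs:
        if i != j:                             # ignore self-nominations
            successors.setdefault(i, set()).add(j)

    two_paths = 0
    closed = 0
    for i, alters in successors.items():
        for j in alters:
            # Only the alters of i's alters are visited: the 2-step neighborhood.
            for k in successors.get(j, ()):
                if k == i:
                    continue                   # i -> j -> i is reciprocity, not a triple
                two_paths += 1
                if k in alters:
                    closed += 1
    return closed / two_paths if two_paths else None
```

Because the loops touch only the alters of each actor's alters, the work grows with the number of two-paths rather than with the cube of the number of actors, which is the reason enumeration is attractive for sparse networks.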
For this part of the project, I will identify the triad census for the graph as a whole, as well as for regions of varying distance around each node. In so doing, I will be able to assess the extent of social balance at various removes from ego, and thus evaluate the structure of both the global and meso-level communities each actor is involved in.

10 The efficiency of both algorithms can be improved when searching for nested sets, as the search area in any graph can be constrained to those nodes with degree greater than or equal to the previous connectivity level, k.

11 $\gamma_v$ is equal to the density of the local neighborhood of each vertex v.

12 The difference between the two models is that one assumes a ring-substrate. This is likely the most applicable model, since a giant bicomponent – a cycle connecting every node in the network – will likely encompass most of the nodes in the connected components.

Functions of the transitivity index are problematic measures of balance when other features, such as organizational clustering and homophily, can lead to transitive friendship groups without reference to a balance model (Feld and Elmore 1982). To account for such features, a statistical model that controls for clustering in the graph will be used. The p* family of statistical models for social networks allows one to estimate the effect of a given relational pattern (such as the number of transitive or intransitive triads that would result from a given relation) on the likelihood of a relation being present, net of other features in the graph (Robins et al. 1997; Wasserman and Pattison 1996). The models are estimated using a logistic regression on properties of the ordered dyads in a network. A model such as the following would be a typical cross-sectional example (see Moody 1999c for model details and alternative specifications) that could be estimated on the clustered subsamples of the total graph.
$$\log\left(\frac{p(Y_{ij}=1)}{p(Y_{ij}=0)}\right) = a + b_1(E_i) + b_2(p_j) + b_3(H_{ij}) + b_4(r_{ji}) + b_5(T_{ij}) + e_{ij} \tag{2}$$

where $a$ captures the effect of density, $b_1$ is a coefficient for ego out-degree, $b_2$ is a coefficient for alter in-degree, $b_3$ is a vector of coefficient(s) describing effect(s) of dyad attribute differences on email relations (multiple homophily parameters), $b_4$ is a coefficient for reciprocity effects, $b_5$ is a coefficient for transitivity, describing the impact on the number of transitive/intransitive triples associated with the ijth dyad, and $e_{ij}$ is a random error term for the ijth dyad. I have successfully used similar models on a large sample of fairly large networks (129 networks, ranging in size from 25 to 2000, with a mean of just under 500 nodes).

4.1.5 Category and Network

Group interaction can be modeled through mixing matrices, square tables that count the number of connections from people in one group to people in another. One can then fit log-linear models to the tables to statistically model the likelihood of a person of one race nominating a person of another, or use such categories as elements in the homophily parameters in the p* model of equation 2. Simple descriptions of mixing frequency illustrate the saliency of a given category. Consider as an example Table 2 below, which measures interaction patterns among high school students (based on Add Health data). This table provides race-specific mixing ratios, $\alpha_{ij}$, which measure the relative odds that someone of the row race nominates someone of the column race as a friend. For example, we see that the odds of a White student nominating another White student are about 4.4 times the odds of a White student nominating a non-White student.

Table 2. Race Specific Mixing Patterns*

                      Race of person nominated as a friend
Race of nominator     White     Black     Hispanic     Asian
White                 4.44      0.29      0.79         0.70
Black                 0.21      9.16      1.18         0.39
Hispanic              0.81      1.29      2.51         0.78
Asian                 0.81      0.51      0.69         7.90

*Row to column odds ratio

4.1.6 Network Position

Network position describes how an individual is situated within the overall network. On this broad view, position includes measures such as popularity (which captures a volume dimension of position), centrality (positioning each node relative to the center or periphery of the network), and relational pattern similarity (structural equivalence). Popularity is measured through actor in-degree – the number of people who nominate ego as an electronic communication partner. Centrality can be measured in many ways (Bonacich 1987; Freeman 1979; Friedkin 1991; Kim 1997). I will calculate multiple centrality measures to provide other researchers with choices for models of the effect of centrality on behavior. Role equivalence can be measured well by calculating the triad positions each actor is involved in. There are 16 possible triads in a directed graph, and 36 positions within those triads. Each actor occupies a particular distribution of the positions, and two actors with identical triad position vectors will have equivalent role positions in the network (Burt 1990; Moody 1999c).

4.2 Evaluating Use and Content of Online Relationships

Models of relational content will seek to describe (1) how people use their computer relations, (2) how important such relations are in their lives and (3) how such relations correspond to other network relations. The first two goals are met largely through simple descriptive statistics. Ordered scales will be constructed to measure the importance and saliency of each relation, and standard statistical modeling techniques will then be used to explain variance in the importance and use of electronic networks based on the attributes and network positions of actors.
The third goal is met both descriptively (what proportion of electronic partners are also kin, for example) and algebraically (Mandel 1983; Pattison 1993). By compounding multiple relations, we can reduce the complexity of the ego-network pattern to a containment set (Mandel, 1983:379) which characterizes the local role of any actor. When calculated for all actors, we have an inventory of the role structures in the electronic network, which can then be used in statistical models of behavior and attitudes. For example, we will be able to identify sets of actors for whom friendship and electronic relations are identical (the compound relation equals the individual relation) or completely disjoint (the compound relation is empty) as well as a range of intermediate local role structures.

4.3 Dynamic Features

The technical tools for describing change in social networks are less well developed than models and methods for describing networks in the cross section. The simplest analysis of network change involves describing changes in the various statistics calculated on each cross section. For example, as more people become involved in electronic communication, the overlap between friendship relations and electronic relations may increase. By describing patterns of change in transitivity, reciprocity, and position, we go a long way towards understanding how networks evolve. More formally, I will extend techniques I used on the Add Health sample of high school networks (Moody, 1999c). This includes using time-ordered panel extensions of the p* models developed by Wasserman, Pattison and Robins. These models approximate previous Markov approaches (Leenders 1995), while maintaining the modeling flexibility of the p* logit models. In these cases, we can model the current network as a function of the past network and patterns of the current network.
Another approach to capturing dynamic features of the networks will be to treat movement in network positions as a mobility problem. For example, we can model changes in actor popularity as a mobility matrix, using standard log-linear models to describe the mobility regime in any given network. These analyses can be extended at the individual level by modeling the trajectory of each person over time (Han and Moen 1999), using variants of sequence analysis (Abbott 1995).

4.4 Public Use Data Preparation

My goal is to make as much data publicly available as possible while maintaining the strict confidentiality of all respondents. I will create two forms of the data for public release. The first dataset will contain all information in the datafile (the network adjacency information as well as demographic and behavioral data), with all identifying information removed. This file will contain no information that could identify an individual. Thus, all email addresses and outlying values will be recoded, and geographic information will be limited to city. The audience for these data will be researchers with training and interest in analyzing large complex social networks, who want to develop detailed behavior models (such as peer influence models and network autoregression models (Dow 1986; Friedkin 1998; Friedkin and Cook 1990; Friedkin and Johnsen 1997)) and measurement techniques. The second data file will consist of a series of constructed network variables appended to the demographic and behavioral files.
This file will contain information on positional variables (centrality, degree, geodesic distance to others), local network context variables (reciprocity of local network, transitivity in local network, density in local network, etc.), local network composition variables (heterogeneity, mean values of substantive variables, geographic dispersion of the ego-network, kin-composition, etc.), and sub-group membership variables (position indicators and cohesive group membership). This will then be a simple rectangular file that other researchers can use with standard social science methods. The target audience for these data will be those with a substantive interest in the importance of network context for behavior and those interested in online behavior and activity, who are not trained to calculate network measures directly.

Cited References

Abbott, A. 1995. "Sequence Analysis: New Methods for Old Ideas." Annual Review of Sociology 21:93-113.
Auletta, V., Ye. Dinitz, Z. Nutov, and D. Parente. 1999. "A 2-Approximation Algorithm for Finding an Optimum 3-Vertex Connected Spanning Subgraph." Journal of Algorithms 32:21-30.
Ball, M. O. and J. S. Provan. 1983. "Calculating Bounds on Reachability and Connectedness in Stochastic Networks." Networks 13:253-78.
Batagelj, Vladimir and Andrej Mrvar. 1999. PAJEK. Vers. 49.
Bearman, P., J. Moody, and K. Stovel. 1997a. "The Add Health Network Variable Codebook." University of North Carolina at Chapel Hill.
———. 1997b. "Chains of Affection: The Structure of Adolescent Romantic Networks." University of North Carolina at Chapel Hill. Manuscript.
Bell, R. R. 1988. Worlds of Friendship. Beverly Hills: Sage Publications.
Blau, P. M. 1994. Structural Contexts of Opportunities. Chicago and London: University of Chicago Press.
Blau, P. M. and J. E. Schwartz. 1984. Crosscutting Social Circles: Testing a Macrostructural Theory of Intergroup Relations. Orlando: Academic Press.
Bonacich, P. 1987. "Power and Centrality: A Family of Measures." American Journal of Sociology 92:1170-1182.
Bourdieu, P. 1989. "Social Space and Symbolic Power." Sociological Theory :14-25.
Brudner, L. A. and D. R. White. 1997. "Class, Poverty, and Structural Endogamy: Visualizing Networked Histories." Theory and Society 26:161-208.
Burkhalter, B. 1999. "Reading Race Online: Discovering Racial Identity in Usenet Discussions." Pp. 60-75 in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Burt, R. S. 1990. "Detecting Role Equivalence." Social Networks 12:83-97.
Buskens, V. and K. Yamaguchi. 1999. "A New Model for Information Diffusion in Heterogeneous Social Networks." Sociological Methodology 29:281-35.
Cartwright, D. and F. Harary. 1956. "Structural Balance: A Generalization of Heider's Theory." Psychological Review 63:277-93.
Chartrand, G. and O. R. Oellermann. 1993. Applied and Algorithmic Graph Theory. New York: McGraw-Hill Inc.
Coleman, J. S. 1961. The Adolescent Society. New York: Free Press.
CyberAtlas. 2000. "The World's Online Populations." http://cyberatlas.internet.com/big_picture/geographics/article/0,1323,5911_151151,00.html
Davis, J. A. 1963. "Structural Balance, Mechanical Solidarity, and Interpersonal Relations." American Journal of Sociology 68:444-62.
———. 1970. "Clustering and Hierarchy in Interpersonal Relations: Testing Two Graph Theoretical Models on 742 Sociomatrices." American Sociological Review 35:843-51.
Davis, J. A. and S. Leinhardt. 1972. "The Structure of Positive Relations in Small Groups." Pp. 218-51 in Sociological Theories in Progress, vol. 2, J. Berger, M. Zelditch, and B. Anderson. Boston, MA: Houghton Mifflin.
Donath, J. S. 1999. "Identity and Deception in the Virtual Community." Pp. 29-59 in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Doreian, P., R. Kapuscinski, D. Krackhardt, and J. Szczypula. 1996. "A Brief History of Balance Through Time." Journal of Mathematical Sociology 21:113-31.
Doreian, P. 1986. "On the Evolution of Group and Network Structure II: Structures Within Structures." Social Networks 8:22-64.
Dow, M. M. 1986. "Model Selection Procedures for Network Autocorrelated Disturbances Models." Sociological Methods and Research 14:403-22.
Duck, S. W. 1991. Friends for Life: the Psychology of Personal Relationships. New York: Harvester.
Durkheim, E. 1984. The Division of Labor in Society. Translator W. D. Halls. New York: The Free Press.
Even, S. and Endre Tarjan. 1975. "Network Flow and Testing Graph Connectivity." SIAM Journal of Computing 4:507-18.
Fararo, T. J. 1981. "Biased Networks and Social Structure Theorems." Social Networks 3:137-59.
———. 1983. "Biased Networks and the Strength of Weak Ties." Social Networks 5:1-11.
Fararo, T. J. and J. Skvoretz. 1987. "Unification Research Programs: Integrating Two Structural Theories." American Journal of Sociology 92:1183-209.
Feld, S. L. 1981. "The Focused Organization of Social Ties." American Journal of Sociology 86:1015-35.
Feld, S. L. and R. Elmore. 1982. "Patterns of Sociometric Choices: Transitivity Reconsidered." Social Psychological Quarterly 45:77-85.
Festinger, L. 1957. A Theory of Cognitive Dissonance. Evanston, IL: Row, Peterson & Co.
Fischer, C. S. 1982. To Dwell Among Friends: Personal Networks in Town and City. Chicago: University of Chicago Press.
Frank, K. A. 1995. "Identifying Cohesive Subgroups." Social Networks 17:27-56.
Frank, O. 1977. "Survey Sampling in Graphs." Journal of Statistical Planning and Inference 1:235-64.
———. 1978. "Sampling and Estimation in Large Social Networks." Social Networks 1:91-101.
———. 1979. "Estimation of Population Totals by Use of Snowball Samples." Pp. 319-48 in Perspectives on Social Network Research, Paul W. Holland and Samuel Leinhardt. New York: Academic Press.
Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.
Freeman, L. C. 1979. "Centrality in Social Networks: Conceptual Clarification." Social Networks 1:215-39.
Freeman, L. C., D. R. White, and K. A. Romney. 1992. Research Methods in Social Network Analysis. New Brunswick and London: Transaction Publishers.
Friedkin, N. E. 1991. "Theoretical Foundations for Centrality Measures." American Journal of Sociology 96:1478-504.
———. 1998. A Structural Theory of Social Influence. Cambridge: Cambridge.
Friedkin, N. E. and K. S. Cook. 1990. "Peer Group Influence." Sociological Methods and Research 19(1):122-43.
Friedkin, N. E. and E. C. Johnsen. 1997. "Social Positions in Influence Networks." Social Networks 19:209-22.
Galaskiewicz, J. and S. Wasserman. 1981. "A Dynamic Study of Change in a Regional Corporate Network." American Sociological Review 46:475-84.
Gibbons, A. 1985. Algorithmic Graph Theory. Cambridge: Cambridge University Press.
Granovetter, M. 1973. "The Strength of Weak Ties." American Journal of Sociology 81:1287-303.
Hallinan, M. T. 1974. "A Structural Model of Sentiment Relations." American Journal of Sociology 80:364-78.
Han, S.-K. and P. Moen. 1999. "Clocking Out: Temporal Patterning of Retirement." American Journal of Sociology 105:191-236.
Harary, F. and D. R. White. 1999. "Measuring Social Cohesion: Node Connectivity and Conditional Density." Manuscript.
Holland, P. W. and S. Leinhardt. 1971. "Transitivity in Structural Models of Small Groups." Comparative Group Studies 2:107-24.
Hopcroft, J. E. and R. E. Tarjan. 1973. "Dividing a Graph into Triconnected Components." SIAM Journal of Computing 2:135-58.
Hummon, N. P. and T. J. Fararo. 1995. "Assessing Hierarchy and Balance in Dynamic Network Models." Journal of Mathematical Sociology 20:145-59.
Johnsen, E. C. 1985. "Network Macrostructure Models for the Davis-Leinhardt Set of Empirical Sociomatrices." Social Networks 7:203-24.
———. 1986. "Structure and Process: Agreement Models for Friendship Formation." Social Networks 8:257-306.
Kanevsky, A. 1993. "Finding All Minimum-Size Separating Vertex Sets in a Graph." Networks 23:533-41.
Khuller, S. and B. Raghavachari. 1995. "Improved Approximation Algorithms for Uniform Connectivity Problems." Proceedings of the 27th Annual ACM Symposium on the Theory of Computing :1-10.
Kim, H. 1997. "Structural Holes, Strategic Communication, and Control Centrality in Social Networks." Workshop on Structures in Process, Working Papers Series (1). University of North Carolina at Chapel Hill.
Leenders, R. Th. A. J. 1996. "Evolution of Friendship and Best Friendship Choices." Pp. 149-64 in Evolution of Social Networks, Editors P. Doreian and Frans N. Stokman. New York: Gordon and Breach.
Leenders, R. Th. A. J. 1995. "Models for Network Dynamics: A Markovian Framework." Journal of Mathematical Sociology 20:1-21.
Lorrain, F. and H. C. White. 1971. "Structural Equivalence of Individuals in Social Networks." Journal of Mathematical Sociology 1:49-80.
Mandel, M. 1983. "Local Roles and Social Networks." American Sociological Review 48:376-86.
Mark, N. 1998. "Beyond Individual Differences: Social Differentiation From First Principles." American Sociological Review 63:309-30.
Meyer, G. W. 1994. "Social Information Processing and Social Networks: A Test of Social Influence Mechanisms." Human Relations 47:1013-47.
Milgram, S. 1969. "The Small World Problem." Psychology Today 22:61-67.
Moody, J. 1998a. "A General Method for Creating Approximate Conditionally Uniform Random Graphs." University of North Carolina at Chapel Hill. Manuscript.
———. 1998b. "Identifying Cohesive Subgroups in Large Networks." University of North Carolina, Chapel Hill. Manuscript.
———. 1998c. "Matrix Methods for Calculating the Triad Census." Social Networks 20:291-99.
———. 1999a. "School Friendship Segregation: Racial Heterogeneity and Friendship Choice in American High Schools." The Ohio State University.
Manuscript.
———. 1999b. SPAN: SAS Programs for Analyzing Networks. Vers. .30. The Ohio State University.
———. 1999c. "The Structure of Adolescent Social Relations: Modeling Friendship in Dynamic Social Settings." Dissertation. University of North Carolina, Chapel Hill.
———. 2000. "Indirect Connectivity and STD Infection Risk: The Importance of Relationship Timing for STD Diffusion." Manuscript. The Ohio State University.
Moody, J. and D. R. White. 1999. "Social Cohesion and Embeddedness: A Hierarchical Conception of Social Groups." The Ohio State University. Manuscript.
Morgan, D. L., M. B. Neal, and P. Carder. 1997. "The Stability of Core and Peripheral Networks Over Time." Social Networks 19(1):9-25.
Morris, M. 1997. "Sexual Networks and HIV." AIDS 97: Year in Review 11(Suppl A):S209-S216.
Nadel, S. F. 1955. The Theory of Social Structure. London: Cohen and West.
Newburger, Eric C. 1997. Computer Use in the United States. p20-522. Washington, D. C.: U.S. Census Bureau.
O'Brien, J. 1999. "Writing in the Body: Gender (Re)Production in Online Interaction." Pp. 76-106 in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Pattison, P. 1993. Algebraic Models for Social Networks. Cambridge England: Cambridge University Press.
Pool, I. d. S. and M. Kochen. 1978. "Contacts and Influence." Social Networks 1:5-51.
Rapoport, A. and W. J. Horvath. 1961. "A Study of a Large Sociogram." Behavioral Science 6:279-91.
Rice, R. E. 1994. "Network Analysis and Computer-Mediated Communication Systems." Pp. 167-203 in Advances in Social Network Analysis, Editors Stanley G. J. Wasserman. Sage.
Richards, William D. 1995. NEGOPY. Vers. 4.30. Burnaby, B.C. Canada: Simon Fraser University.
Robins, G., P. Pattison, and S. Wasserman. 1997. "Logit Models and Logistic Regressions for Social Networks: III. Valued Relations." Manuscript.
Sewell, W. H. Jr. 1992. "A Theory of Structure: Duality, Agency, and Transformation." American Journal of Sociology 98:1-29.
Skvoretz, J. and T. J. Fararo. 1996. "Status and Participation in Task Groups: A Dynamic Network Model." American Journal of Sociology 101:1366-414.
Skvoretz, J. 1983. "Salience, Heterogeneity and Consolidation of Parameters." American Sociological Review 48:360-75.
———. 1985. "Random and Biased Networks: Simulations and Approximations." Social Networks 7:225-61.
Smith, M. A. 1999. "Invisible Crowds in Cyberspace: Mapping the Social Structure of the Usenet." Pp. 195-219 in Communities in Cyberspace, Editors Peter Kollock and Marc A. Smith. London: Routledge.
Stokman, F. N. and P. Doreian. 1996. "Evolution of Social Networks: Processes and Principles." Pp. 233-50 in Evolution of Social Networks, Editors P. Doreian and Frans N. Stokman. New York: Gordon and Breach.
Taylor, H. 1999. "Online Population Growth Surges to 56% of All Adults." http://www.harrisinteractive.com/harris_poll/pdf/dec22_1999.pdf
Wasserman, S. and K. Faust. 1994. Social Network Analysis. Cambridge: Cambridge University Press.
Wasserman, S. and P. Pattison. 1996. "Logit Models and Logistic Regressions for Social Networks: I. An Introduction to Markov Graphs and P*." Psychometrika 61:401-25.
Watts, D. J. 1999. Small Worlds: The Dynamics of Networks Between Order and Randomness. Princeton: Princeton University Press.
Watts, D. J. and S. H. Strogatz. 1998. "Collective Dynamics of 'Small-World' Networks." Nature 393:440-442.
Weesie, J. and H. Flap. 1990. Social Networks Through Time. Utrecht, Netherlands: ISOR.
Wellman, B. 1992. "Which Types of Ties and Networks Give What Kinds of Social Support?" Pp. 207-35 in Advances in Group Processes, vol. 9, E. Lawler, B. Markovsky, C. Ridgeway, and H. Walker. Greenwich, CT: JAI Press.
Wellman, B. and S. D. Berkowitz. 1997. Social Structures: A Network Approach. London: JAI Press.
Wellman, B., J. Salaff, D. Dimitrova, L. Garton, M. Gulia, and C. Haythornthwaite. 1996. "Computer Networks As Social Networks: Collaborative Work, Telework and Virtual Community." Annual Review of Sociology 22:213-38.
Wellman, B. and S. Wortley. 1990. "Different Strokes From Different Folks: Community Ties and Social Support." American Journal of Sociology 96:558-88.
White, D. R. 1998. "Concepts of Cohesion, Old and New: Which Are Valid Which Are Not?" University of California - Irvine. Manuscript.
White, D. R., V. Batagelj, and A. Mrvar. 1999. "Analyzing Large Kinship and Marriage Networks with Pgraph and Pajek." Social Science Computer Review 17:245-74.
White, D. R., M. Schnegg, L. A. Brudner, and H. Nutini. 1999. "Multiple Connectivity and Its Boundaries of Reticulate Integration: A Community Study." University of California - Irvine. Manuscript.
White, H. C. 1965. "Notes on the Constituents of Social Structure." Social Relations Department, Harvard University. Manuscript.
White, H. C., S. A. Boorman, and R. L. Breiger. 1976. "Social Structure From Multiple Networks I." American Journal of Sociology 81:730-780.
Zeggelink, E. P. H. 1993. Strangers into Friends: the Evolution of Friendship Networks Using an Individual Oriented Modeling Approach. Amsterdam: ICS.
———. 1994. "Dynamics of Structure: an Individual Oriented Approach." Social Networks 16:295-333.
Zeggelink, E. P. H., F. N. Stokman, and G. G. Van De Bunt. 1996. "The Emergence of Groups in the Evolution of Friendship Networks." Journal of Mathematical Sociology 21:29-55.