Procedures for Analyses of Online Communities
JCMC 8 (4) July 2003
http://www.ascusc.org/jcmc/vol8/issue4/rosen.html

Devan Rosen, Cornell University
Joseph Woelfel, State University of New York at Buffalo
Dean Krikorian, Cornell University
George A. Barnett, State University of New York at Buffalo

● Abstract
● Introduction
● Online Community Research
● Method
● Online Community Description: SciCentr
● Data
● Analytic Method
● Outputs
● Findings
● Discussion
● Implications for Future Development
● Footnotes
● Acknowledgments
● References
● About the Authors

Abstract

This article details a set of procedures for the analysis and interpretation of the content and structure of online networks and communities. These novel methods allow for the analysis of online chat, including parsing the data into separate and interrelated files to determine individual, group and organizational patterns. An illustrative example of an educational online community in Active Worlds Educational Universe (AWEDU) is provided that uses three-dimensional virtual worlds for student interaction. Findings from semantic network analysis procedures reveal elements of the online interaction that would otherwise be difficult to extract given the great amount of textual data produced in such communities. The case study allows for qualitative and quantitative analyses. The limitations of the procedures are discussed along with planned developments and their social implications.
Introduction

"Communication has emerged as a necessary object of attention in the 20th century, not because it's new, but because it's that portion of the social organism now undergoing elephantiasis"
-Marshall McLuhan (1969)

As McLuhan (1969) prophetically pointed out, our ability to communicate and interact with each other has catalyzed the information revolution. Considering the change in scale introduced to human interaction by communication technology, the impact may be affecting us with as much force as the industrial revolution. Although the railway did not introduce movement to human society, it accelerated and enlarged the scale of previous human functions and the concept of space. This comparison is quite applicable to the change in human affairs that the global Internet has introduced: "Under instant circuitry, nothing is remote in time or space...it is now" (McLuhan, in Benedetti, 1996, p. 8). The change in scale associated with the human ability to connect with each other results in the shrinking of this conceptual space, as well as a change in the way we conceive of cognitive and communicative distance.

Online communities are emerging as a powerful catalyst of the increase in global communication forums. Ranging from simple text-based newsgroups to intricate immersive virtual reality multi-user environments, these communities, whether graphic or not, are strung together by conversational text, or chat. Chat entails any number of individuals communicating with each other using text-based communication, often appearing as a chat window in graphic environments. Johnson (1997, p. 71) points out that, "For the most part the social fabric of cyberspace is still stitched together by the gossamer thread of text." Through these communities this social fabric is being wrapped around the world and connecting humans with humans in much the same way a village does.
Perhaps a village is even too big a metaphor, as McLuhan noted:

Transmitted at the speed of light, all events on this planet are simultaneous... The absence of space brings to mind the idea of the village. But actually, at the speed of light, the planet is not much bigger than the room we're in... The acoustic or simultaneous space in which we now live in is like a sphere whose center is everywhere and whose margins are nowhere. (1974, in Benedetti, 1996, p. 24)

As this sphere of contact continues to enrich our interactions across domains, from online stock trading discussions to medical advice communities, the use of chat may also increase. Parallel with the increased complexity and development of online communicative forums is the need to develop methods and tools that will allow the users and creators of future iterations of communication technology to be proactive rather than caught in the cycle of reactivity that is currently too common. For example, many of the technologies that would be considered part of the communication technology revolution, such as online newsgroups, multiuser dungeons (MUDs), and e-mail, were created and widely used before they were studied by social scientists. This may be a result of the origins of many such technologies in industry; their widespread use was what catalyzed scientific inquiry and the development of analysis tools. Thus a tool for the systematic and intelligent analysis of the chat medium is necessary to understand further its social and communicative implications. This is not to say that the methods described herein are not subject to reactivity, as chat has existed for a number of years; however, the sooner such methods are explicated, the more informed developers of communication technologies will be. The integration of quantitative methods into the qualitative study of these communities (for a review of these strategies and relevant literature see Paccagnella, 1997) is also a promising development.
A review of some of the quantitative techniques currently used among a multidisciplinary community is presented, followed by the introduction of the methods used herein as a viable and necessary procedure for the quantitative analysis of characteristics of online communities. Our method, discussed at length below, applies a semantic network approach (Footnote 1) to the measurement of communication variables and is posed as a potential mate for the qualitative analysis of online communication. An illustrative example of the benefit of combining quantitative and qualitative analysis is provided in a study of an online educational community. This analysis also demonstrates possible shortcomings of the method.

Online Community Research

Because very little research using semantic networks has been performed in chat settings, this section reviews relevant research on online community and networks. Given the tremendous task of making sense of cyberspace, there are several online community analysis approaches available offering unique and informative perspectives on communities formed on the Internet (see Wellman & Gulia, 1999). A multidisciplinary approach would offer great assistance in unpacking phenomena found in online interaction, for "such interfaces would only benefit from the plethora of developers whom a publicly available platform would attract" (Smith & Fiore, 2001).

Smith's research offers much insight into the social aspects and structure of online interaction through his research on Usenet (Smith, 1999) and V-Chat (Smith, Farnham, & Drucker, 2000). His research and development of Netscan provide entrance into various elements of newsgroups. Netscan uses a "dashboard" display that offers the user a thread-tree (Footnote 2) visualization, a piano roll component, interpersonal connections, and message display (see Smith & Fiore, 2001).
These visualization components illustrate patterns of activity as well as some conversational structures in Usenet newsgroup threads. Smith's research is of particular importance because it offers some of the first successful structural visualization of very large online Usenet communities. However, representing the structure is only one level of analysis needed for the understanding of how people use online communities.

The study of spatial movement can also provide useful understandings of online interaction. The two following studies are offered as examples of this sort of analysis. In a study done on V-Chat, Smith and his colleagues (Smith et al., 2000) were able to analyze 3D chat spaces by extracting measures on avatar gesturing and positioning, along with several other analyses of session length and number of sessions attended. Smith et al. (2000) found that these users did indeed use the spatial features unique to 3D chat, but that they did so less and less over time. They also found that the spatial interaction was similar to that of physical interaction. These findings provide some of the first insights into the spatial uses of online communities, particularly in regard to the use of avatar functions.

Using a different method than Smith et al. (2000), Krikorian, Lee, Chock, and Harms (2000) analyzed user interaction in a graphical chat room using video capture techniques to measure avatar proximity. The authors found a positive parabolic relationship between users' liking and avatar distance, where those who liked each other more tended to be either closer to or farther away from each other. Later results indicated that male-male and female-female clusters explained the social distance and attraction parabola (Krikorian & Lee, 2003). The findings by Krikorian et al. offer unique insights when compared to prior studies in that they relate the spatial distance of avatars to social liking.
Another methodological innovation was introduced with the development of measures of asynchronous interaction by Krikorian and Kiyomiya (2002). In that study, these measures were used to model Usenet newsgroups as self-organizing systems and to develop the newsgroup death model, which identifies the communication constructs of these online communities that indicate their decline. This model was able to produce "box-scores" and scatter-plots that indicated the health and status of these newsgroups (Krikorian & Ludwig, 2002). Unlike previous methods, this approach is able to compare extremely large Usenet groups to gain an understanding of the functions behind the failure of certain newsgroups, as well as to identify such groups before their actual "death." Such a perspective is useful in developing preventive actions for those groups in danger of failing.

Krikorian and Ludwig (2003) developed overtime network mapping software that represents any threaded message network as a longitudinal movie, revealing patterns over time. This software is also unique in that it reveals a network of users as well as messages, creating a bimodal network using novel clique detection processes. Likewise, the ability to visualize large networks over time is a method not previously available and brings a heightened understanding to the lifecycle of such networks and communities.

The methods covered thus far address the structural and spatial study of online communities; however, the content of the interaction within these communities is also of particular importance. One such method was developed by Sack (2000) with the Conversation Map software, which represents the threads in a newsgroup over a period of time as glyphs, as well as the threads' social and semantic interrelationships.
Conversation Map has interactive features that enable analysts to create maps at different levels of detail, while still being able to see the content of specific messages. This tool is one of the first to allow entrance into the content of threaded online communities.

Along with methods developed to study the structure and content of online communities, methods have also been developed to understand the nature of users' coordination and impressions. Hancock and Dunham (2001a, 2001b) use quantitative measures to gain insight into the use of coordination devices in text-based computer-mediated communication (CMC) environments (2001a), and to compare impression formation processes in CMC and face-to-face environments (2001b). Walther and D'Addario (2001) used experimental methods to study the impact of emoticons, graphical representations of facial expressions, on message interpretation. Emoticons are widely used in all of the online environments mentioned in the previous studies in this section, yet little has been learned about their impact. Walther and D'Addario found that verbal content outweighed emoticons, but both were found to influence message interpretation. Such studies represent a very important parallel in the methodological approaches to the study of online environments in that the structure and content of such communities are only part of the story. There is much to gain from understanding the differences between CMC and face-to-face environments.

There has also been work on the quantitative structural analysis of online networks. Garton, Haythornthwaite, and Wellman (1997) provide an exhaustive description of how social network techniques can be used in the study of online communities and networks. Barnett, Chon, and Rosen (2001) used social network analysis to study the structure of international Internet flows.
Park, Barnett, and Kim described the structure of Korean political communication via Internet networks (2000), as well as the Internet communication structure in the Korean National Assembly (2001). Park, Barnett, and Nam (2002a, 2002b) revealed the hyperlink network structure of top Websites in Korea.

It is of particular importance to highlight network analytic studies in relation to the analysis of online communities as many of the methods used by the online community studies discussed in this section, as well as the methods in this study, use network-based measures and methods of analysis. For example, Sack's (2000) Conversation Map is largely connected to network measures. Sack notes "The Conversation Map system provides the means to spatially navigate through social networks.... one of the results of a VLSC (very large scale conversation) is a social network" (p. 75). Likewise, it is of great use to the online researcher to have a fundamental understanding of network analytic methods since the very structure of the Internet is a network itself, much as Krikorian (in press) refers to it as the "Interactive Network."

Although progress has been made toward the development of techniques to map, display and study online communities, most techniques have been implemented on thread-based communities such as Usenet groups. Analysis of complex, dynamic non-threaded interaction, such as chat-room conversation, remains unresolved. Graphical chat-rooms sequentially log chat interaction that is difficult to separate and analyze as sub-groups or parsed interaction. Thus, it is useful to have a method for the analysis of non-threaded interaction since such interactions produce large volumes of data that are difficult to navigate and interpret (see Figure 1).

Rolland: So yes...if anyone wants to take a look into this, that would be great.
MrC: sounds interesting....hello doctor could you prescribe a tomato for me?
nadya: hey guys, has anyone else found some interesting info?
Rolland: has anyone followed up on the sites that I mentioned last week?
keeper1616: I'm looking through this book, its got a lot, but a lot of junk too
nadya: that's great cyrus
nadya: anyone else
Rolland: Yeah...you really have to sort through a lot of stuff, but it helps your research skills a lot because it forces you to pare down your reserach and look for very specific things.
nadya: hey guys...i was just wondering what is everyone working on now?
nadya: jeremy is here
keeper1616: im reading 'the tomato in america
nadya: hi jeremy
skippy: *y
nadya: huh?
Immigration Officer: You are being joined by Damir.
skippy: Sorry, my bad
nadya: no prob
Damir: Damir on line
nadya: hey guys
Damir: hi
nadya: we were checking out cyrus' and cj's green house
nadya: looking good!
Rolland: Hi Damir
Rolland: Great job on the green houses.
nadya: does anyone know if mike or trevor were going to build a new green house for africe?
nadya: africa...not africe
nadya: also, is trevor coming back ever?
skippy: Unknown on both
nadya: ok thanks

Figure 1. Raw chat data.

It is our goal in this paper to explicate a method facilitating the parsing and analysis of data that is recorded in a non-threaded manner, such as chat. The method used for this form of analysis is discussed in detail below.

Method

The purpose of the method used in this study is to adapt, automate, and implement neural-based content analysis software to observe Internet communication patterns in chat rooms. This implementation uses Catpac™ (Woelfel & Woelfel, 1997a), a developed and proven semantic network analysis package which has the capability to extract word patterns and clusters. Clusters are extracted by sliding a text-window through the text and associating each word in the window with a neuron in an artificial neural network.
Using a proprietary variation of an interactive activation and competition algorithm, connection strengths or weights are generated as a function of the coactivation patterns among the neurons. These weights in turn serve as the basis of cluster analysis and Galileo mapping (Footnote 3; Woelfel, 1993).

Catpac consists of four general modules. The first module is system input, which consists of subsystems for locating the input data, parsing and breaking it into "elements" for analysis, and formatting it for presentation to the main neural engine.

The second module is the neural engine itself. This is a proprietary variant of the interactive activation and competition type neural network. Each neuron in the network represents an element of the input data. As elements of the input data flow through the network, nodes that represent those elements become active, and connections among active nodes are strengthened according to one of four optional learning algorithms, including sigmoid, hypertangential and linear algorithms. The output of this second module is a square matrix of connection weights among the neurons representing the elements of the data.

It should be noted that, in its original form, Catpac was driven by a simple co-occurrence engine. After the development of the neural engine, Catpac was offered with the option to use either co-occurrence data or the neural network engine. Since users overwhelmingly favored the results of the neural engine, subsequent versions of Catpac have dropped the co-occurrence option. The neural engine produces much deeper, more complex and detailed structures than the co-occurrence model, because all indirect links are considered. In the co-occurrence model, only linkages between elements which co-occur in a case are strengthened.
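As a rough illustration of the simpler co-occurrence model described above (not of Catpac's proprietary neural engine), the following Python sketch slides a fixed-size window across a list of words and counts how often each pair of distinct words falls inside the same window. The function name, the window size, and the sample text are assumptions made for this example; Catpac's actual parsing and weighting rules are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(words, window_size=3):
    """Count how often two distinct words share a sliding window.

    Hypothetical helper for illustration only; it is not Catpac's algorithm.
    """
    weights = Counter()
    # Slide the window one word at a time across the text.
    for start in range(max(len(words) - window_size + 1, 1)):
        window = sorted(set(words[start:start + window_size]))
        # Every unordered pair of distinct words in the window co-occurs once.
        for pair in combinations(window, 2):
            weights[pair] += 1
    return weights


# Toy text, invented for the example.
text = "tomato greenhouse build tomato greenhouse stuff".split()
weights = cooccurrence_weights(text, window_size=3)
for pair, count in weights.most_common():
    print(pair, count)
```

In this toy text the pair (greenhouse, tomato) shares all four windows and so receives the largest count. The resulting counts form a symmetric word-by-word matrix of the kind that a clustering routine could take as input.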
In the Catpac neural model, by contrast, linkages between elements which co-occur are strengthened, as are linkages with other elements already linked to the co-occurring nodes, in proportion to their degree of linkage. These indirect connections allow for significantly more complex patterns to be stored and retrieved.

The third module consists of several multivariate analysis systems, which analyze the underlying structure of the connection weight matrix. These currently consist of cluster analysis routines, perceptual mapping (multidimensional scaling) routines, and a neural network which allows tracing associations among the elements of the data; that is, one or more elements may be selected, and others most closely connected to them will be displayed. This technique displays clusters of words in the form of a dendrogram, and as clusters of points in three-dimensional space using a sister program, ThoughtView™ (Woelfel & Woelfel, 1997b).

This wide variety of analytic methods makes it possible to adapt Catpac to an assortment of types of data. Catpac has been used for the study of traditional text (Doerfel & Barnett, 1995; Freeman & Barnett, 1994; Salisbury, 2001), such as articles and long response questionnaires. It has been successful in revealing clusters of associated words in text that provided helpful quantitative data to support qualitative interpretations.

One of the most important aspects of the method used in this procedure is the ability to analyze data based on set parameters. For this, an algorithm has been developed which parses chat data into separate and interrelated files used to determine individual, group, and systematic organizational patterns over time.
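A minimal sketch of this kind of parsing is given below in Python. It assumes that each log line has the form "name: message", as in the raw chat of Figure 1, and that group membership is supplied separately in a simple lookup table. The function names and the layout of the lookup table are hypothetical and are not taken from the authors' algorithm.

```python
from collections import defaultdict


def parse_chat(lines):
    """Split 'name: message' lines into a dict of logon name -> messages."""
    by_user = defaultdict(list)
    for line in lines:
        name, sep, message = line.partition(":")
        if not sep:  # skip lines that carry no 'name:' prefix
            continue
        by_user[name.strip()].append(message.strip())
    return dict(by_user)


def bundle(by_user, name_file, attribute):
    """Pool the messages of all users sharing a value of `attribute`."""
    groups = defaultdict(list)
    for name, messages in by_user.items():
        info = name_file.get(name)
        if info is None:  # users absent from the lookup table are left out
            continue
        groups[info[attribute]].extend(messages)
    return dict(groups)


log = [
    "Rolland: Great job on the green houses.",
    "nadya: hey guys",
    "keeper1616: im reading 'the tomato in america",
    "nadya: looking good!",
    "Immigration Officer: You are being joined by Damir.",
]
# Hypothetical lookup table; the attribute names are illustrative.
name_file = {
    "Rolland": {"location": "Cornell"},
    "nadya": {"location": "Cornell"},
    "keeper1616": {"location": "high school"},
}
by_user = parse_chat(log)
by_location = bundle(by_user, name_file, "location")
print(by_location)
```

Because system messages such as those of the Immigration Officer have no entry in the lookup table, they drop out of the bundled sets, while the per-user dictionary still retains them for any analysis of the raw log.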
This parsing becomes useful when combined with a qualitative analysis in which the researcher has an ethnographic understanding of the community members, because a "name file" allows for directed analysis and the labeling of contributions. For example, if the online community were associated with a large undergraduate class, the teacher would have the ability to observe semantic clusters extracted from only the sophomores' communication, or only what the Communication majors are contributing as compared to the Psychology majors. If the analysis were of a medical community, one could observe the difference between communication originating from doctors as compared to lay users. Other uses bridge to industry, where virtual task groups' interaction could be parsed, revealing both potentially positive and negative trends in the interaction. The following section explains the use of the method as a case study analyzing an educational online community.

Online Community Description: SciCentr

To develop and test the method, data from the Cornell Theory Center's SciCentr were used (www.scicentr.org). SciCentr is an effort of the Cornell Theory Center to explore the use of three-dimensional online virtual worlds for student interaction. The mission of the project is to engage and educate community members in relation to the excitement and complexity of computational science (Corbit, 2000). One of the main efforts in SciCentr's creation is to complete this mission by allowing users to interact in ways appropriate to their own level of interest, commitment, and ability.

SciCentr's user community is composed of advisors and technical experts, university and high school students, as well as content and exhibits developers. The target audience is youth between the ages of 11 and 21. The purpose of this virtual world is to allow teens in after-school programs to create knowledge spaces, based on research they are conducting on tomatoes with the help of researchers at Cornell.
Research topics comprise heritage, diversity, uses, production, and breeding; they will soon include genetic engineering and molecular genetics.

Active Worlds client/server technology is used for the implementation of a 3-D multi-user virtual science museum, SciCentr. This "world" is graphically modeled after the 1939 World's Fair, and combines the setting of a museum and an outdoor fair, or collection of demonstrations and exhibits, to create what might be described as a virtual, outdoor exhibit-based museum. An example of one of these exhibits is the Fourier Fountain (Figure 2, from Corbit, 2000), modeled after the Singing Fountains of the 1939 World's Fair. In this exhibit users, represented as avatars, can play the small keys of the fountain, and corresponding sections of the central crystal structure flash yellow as the sound is played on the desktop. Other users can also hear the chords played. Thus, users from around the world can participate in making music, and hear it played in "real time." Further, the chords are visually represented on the wall of the room using sound generators in MATLAB.

Figure 2. Fourier Fountain, an exhibit from Cornell Theory Center's SciCentr.

Another example of a SciCentr exhibit is the Plant Breeding Beds (Figure 3, from Corbit, 2000), which represents an inquiry-based digital laboratory for plant breeding simulations.

Figure 3. Plant Breeding Beds, an exhibit from Cornell Theory Center's SciCentr.

Data

The two figures from SciCentr provided above are just two examples from the wealth of exhibits that the students can navigate and help create.
It is of particular importance to note that both of these figures show a chat window in the lower left of the figure, in which the immigration officer is welcoming a user to the SciCentr. These chat windows are part of all SciCentr navigation and are the main way that the users can verbally interact with each other. All developers, including student, professional, and high school world builders, have "citizenships" that allow them to visit all of the worlds in Active Worlds Educational Universe (AWEDU). These users all select user IDs and passwords. Likewise, all student visitors also have user IDs (see Figure 1 for examples of user IDs). These consistent logon names allow for effective collection of chat data since the chat windows are present during all SciCentr activity (see chat boxes in Figures 2 and 3). Chat sessions are stored as log files containing the raw chat data (for an example of the raw chat data see Figure 1). The people included in this data set are high school students at a rural high school, science students at Cornell University acting as mentors for the high school students, and volunteers who helped guide the interaction.

Analytic Method

The data are parsed into several different levels of analysis. First, the chat data are left as is, including logon names, to analyze how the presence of these IDs would affect the results. This analysis also allows for an increased understanding of the structure of the community, since the logon names will be clustered together if the users had frequent common interaction. Second, the logon names are eliminated from the chat logs to allow the outputs to represent only the actual conversational interaction. This step is taken because of the frequency of logon names in the data and their expected domination of the outputs. Chat statements of individual users are then parsed into separate files containing only those users' text.
Third, individual user files are bundled into location sets, one for the high school and one for Cornell. This step is taken to see if there is a difference between trends in what the high school students are saying compared to what the Cornell mentors are saying. Finally, the individual user files are bundled by gender to test for differences and similarities. These data sets are analyzed using Catpac, producing dendrogram outputs clustered with Ward's method (Footnote 4). Visual representation was further enhanced using the three-dimensional representation software, ThoughtView.

Outputs

All results from the analysis are represented as follows: lists of the most frequent words in descending as well as alphabetical order (see Table 1); icicled dendrograms, which are read vertically, with the height of the pillars connecting the words representing their association and thus clustering words together; and three-dimensional planes containing the word clusters. These planes can be rotated around any of the three axes to gain visual access to word clusters that may be hidden in any given view (see Figures 5 and 6 for examples of complete output).

In the test concerning the analysis of the raw chat, the four most frequently occurring words in the descending frequency list are logon names. Likewise, 9 of the 12 words in a large cluster in the dendrogram are also logon names (see Figure 4). The clustering of users revealed that the two mentors (Nadya and Rolland) were most central along with the technical coordinator (MrC), with the most active students (Emrys, Keeper, Kikki, Telekenetix, and Tiona) centered about the mentors. This has important structural implications discussed in the next section.

Figure 4. Dendrogram view of all chat with logon names.
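The word lists described under Outputs can be approximated with a few lines of Python. The sketch below is an assumption-laden illustration rather than Catpac's procedure: it splits messages on whitespace without stripping punctuation, optionally drops an exclusion list (for example, logon names), and reports each word's frequency and its percentage of all retained words. The case-based columns of Table 1 are not computed, and the sample messages are invented.

```python
from collections import Counter


def frequency_lists(messages, exclude=()):
    """Return (descending, alphabetical) lists of (word, freq, pcnt) rows.

    Hypothetical helper; pcnt is the share of all retained words.
    """
    excluded = {word.lower() for word in exclude}
    words = [
        word
        for message in messages
        for word in message.lower().split()
        if word not in excluded
    ]
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:  # nothing left to count
        return [], []
    rows = [(w, n, round(100 * n / total, 1)) for w, n in counts.items()]
    # Most frequent first; ties are broken alphabetically.
    descending = sorted(rows, key=lambda row: (-row[1], row[0]))
    alphabetical = sorted(rows)
    return descending, alphabetical


# Invented sample messages for the example.
messages = [
    "hi rolland",
    "green house looking good",
    "rolland the green house is great",
]
desc, alpha = frequency_lists(messages, exclude=["rolland"])
for row in desc:
    print(row)
```

Running the same function with and without the exclusion list gives a quick sense of how strongly logon names dominate a list before they are removed.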
Removing the logon names from the data set yielded results that represent a more accurate representation of clusters of words in the actual interaction. The strongest clusters are: ● I'm - today - joined - OK http://www.ascusc.org/jcmc/vol8/issue4/rosen.html (9 of 19) [1/11/2005 9:13:46 AM] Procedures for Analyses of Online Communities ● ● ● ● ● ● ● ● everyone - me - good - guys - great don't - know will - think - yes want - go - going - am greenhouse - tomato build - stuff bye - next - time Cornell - doing The results of the data representing all statements written by mentors located on the Cornell campus reveal the following clusters: ● ● ● ● ● ● everyone - want - question - go - me - sign OK - guys - hey - good - know - that's - great - am America - working - greenhouse - greenhouses tomato - tomatoes don't - think - Rolland feel - free - worlds - doing DESCENDING FREQUENCY LIST ALPHABETICALLY SORTED LIST WORD FREQ PCNT CASE CASE WORD FREQ PCNT FREQ PCNT CASE CASE FREQ PCNT HOUSE 156 3.6 189 10.0 AM 76 1.7 235 12.4 TIME 142 3.2 147 7.8 AMERICA 71 1.6 82 4.3 GREEN 127 2.9 187 9.9 ASK 65 1.5 70 3.7 GUYS 113 2.6 297 15.7 BIT 76 1.7 89 4.7 KNOW 111 2.5 239 12.6 BUILD 54 1.2 68 3.6 EVERYONE 110 2.5 324 17.1 BUILDING 79 1.8 132 7.0 ME 110 2.5 240 12.7 CORNELL 53 1.2 105 5.6 HOUSES 108 2.5 120 6.3 CYRUS 54 1.2 213 11.3 US 108 2.5 168 8.9 DOING 60 1.4 0 0.0 GO 106 2.4 231 12.2 DON'T 67 1.5 212 11.2 WORLD 105 2.4 95 5.0 EVERYONE 110 2.5 324 17.1 ROLLAND 104 2.4 169 8.9 FEEL 46 1.0 46 2.4 NEXT 103 2.3 158 8.4 FREE 49 1.1 25 1.3 STUFF 98 2.2 123 6.5 GO 106 2.4 231 12.2 WANT 98 2.2 205 10.8 GOING 88 2.0 221 11.7 WILL 98 2.2 176 9.3 GOOD 70 1.6 266 14.1 INFO 92 2.1 170 9.0 GREAT 80 1.8 238 12.6 QUESTIONS 89 2.0 180 9.5 GREEN 127 2.9 187 9.9 GOING 88 2.0 221 11.7 GREENHOUSE 88 2.0 125 6.6 GREENHOUSE 88 2.0 125 6.6 GREENHOUSES 73 1.7 110 5.8 OVER 86 2.0 82 4.3 GUYS 113 2.6 297 15.7 TOMATOES 86 2.0 121 6.4 HEY 64 1.5 269 14.2 TODAY 83 1.9 181 9.6 HOUSE 156 3.6 189 10.0 
http://www.ascusc.org/jcmc/vol8/issue4/rosen.html (10 of 19) [1/11/2005 9:13:46 AM] Procedures for Analyses of Online Communities GREAT 80 1.8 238 12.6 HOUSES 108 2.5 120 6.3 OK 80 1.8 301 15.9 INFO 92 2.1 170 9.0 BUILDING 79 1.8 132 7.0 111 2.5 239 12.6 AM 76 1.7 235 12.4 LAST 71 1.6 131 6.9 BIT 76 1.7 89 4.7 LOOKING 51 1.2 119 6.3 TOMATO 75 1.7 136 7.2 MARGARET 50 1.1 83 4.4 GREENHOUSES 73 1.7 110 5.8 ME 110 2.5 240 12.7 WORK 73 1.7 125 6.6 NAME 57 1.3 79 4.2 AMERICA 71 1.6 82 4.3 NEXT 103 2.3 158 8.4 LAST 71 1.6 131 6.9 OK 80 1.8 301 15.9 GOOD 70 1.6 266 14.1 OKAY 68 1.5 286 15.1 OKAY 68 1.5 286 15.1 OVER 86 2.0 82 4.3 DON'T 67 1.5 212 11.2 PEOPLE 46 1.0 149 7.9 ASK 65 1.5 70 3.7 89 2.0 180 9.5 HEY 64 1.5 269 14.2 REALLY 54 1.2 126 6.7 THINK 62 1.4 213 11.3 ROLLAND 104 2.4 169 8.9 DOING 60 1.4 0 0.0 SIGN 52 1.2 51 2.7 NAME 57 1.3 79 4.2 STUFF 98 2.2 123 6.5 TUESDAY 54 1.2 68 3.6 THINK 62 1.4 213 11.3 CYRUS 54 1.2 213 11.3 TIME 142 3.2 147 7.8 REALLY 54 1.2 126 6.7 TODAY 83 1.9 181 9.6 WORKING 54 1.2 96 5.1 TOMATO 75 1.7 136 7.2 CORNELL 53 1.2 105 5.6 TOMATOES 86 2.0 121 6.4 SIGN 52 1.2 51 2.7 TUESDAY 108 2.5 168 8.9 MARGARET 50 1.1 83 4.4 WANT 98 2.2 205 10.8 FREE 49 1.1 25 1.3 WILL 98 2.2 176 9.3 KNOW QUESTIONS Table 1. Cornell all chat, word list and dendogram view. http://www.ascusc.org/jcmc/vol8/issue4/rosen.html (11 of 19) [1/11/2005 9:13:46 AM] Procedures for Analyses of Online Communities Figure 5. Cornell all chat, word list and dendogram view. http://www.ascusc.org/jcmc/vol8/issue4/rosen.html (12 of 19) [1/11/2005 9:13:46 AM] Procedures for Analyses of Online Communities Figure 6. 
3-D Representation of Cornell Location Users

The results of the text analysis of the high school location presented the following clusters:
● yeah - OK - will - go - Rolland - good - don't - I'll - bye
● are - doing - additional - leaning - someone - requires - program - needs
● first - place
● else - someone
● name - doing
● able - objects

Analysis of the males' chat output presented several smaller clusters:
● yes - I'm - doing - good
● everyone - that's - want - know
● later - ya - go
● last - year
● can't - thing - going
● built - us - greenhouse

Much like the male output, the female analysis presented several smaller clusters:
● I'm - go - name - me - tomatoes - don't - know
● killing - people - wrong
● okay - bye - time
● sure - want - information
● death - penalty
● sister - think
● hey - am - doing - something

Findings

The analysis performed with the logon names present in the data revealed that these logon names are indeed dominant and did not allow for a good view of the content of the chat interaction. However, it did provide an understanding of the social structure of the community with regard to which individuals were frequently grouped together in the interaction. The main group is composed of the mentors at Cornell and the community developers, followed by the most active student users. This is important because it shows that the mentors were indeed often present to assist the students, and because it indicates the extent of use by the students and who the most frequent users are. This finding can be associated with a network clique; the ability to identify cliques within larger social groups (e.g., departments within a corporation) is, to a large extent, one of the main strengths of social network analysis. Likewise, the most frequent users can be associated with centrality, or how central an actor is within a network.
Although these are only associations, they do provide a basic groundwork for interpretation of chat interaction.

Output from the data with the logon names extracted gives a view of the conversational aspects of the chat sessions. The mentors were the most frequent users; thus the clusters are likely more representative of what they said than of what the students said. This is reflected in the results, where the content of the clusters seems to consist of initiating, leading, and closing statements such as I'm-joined-today, everyone-me-good-guys-great, bye-next-time, and build-stuff. Also revealed is the presence of greenhouse and tomato, which are both central topics of the SciCentr.

Given that the mentors were dominant in the data, the analysis separating them from the high school students is important for an accurate understanding of the community. The mentors were indeed giving positive feedback such as OK-guys-hey-good-know-that's-great and everyone-want-questions-go-me-sign, as well as addressing central themes of the research, such as America-working-greenhouse and tomato-tomatoes. Likewise, the high school students seem to be accepting the help, with clusters like yeah-ok-will-go and are-doing-additional-someone-requires-program-needs.

An interesting finding is the difference in the gender-based analysis. The females have such clusters as killing-people-wrong and death-penalty, which were completely absent from the male data. Males had clusters like yes-I'm-doing-good, can't-thing-going, and built-us-greenhouse. Although the implications of differences like these will require further investigation (mainly in the analysis of subsequent iterations of the SciCentr project), interaction in the online community indicated that both males and females in the high school found the 3D virtual worlds stimulating, increasing their interest in scientific research (Corbit, 2000).
Highlighting this point are relative similarities such as the males' clusters everyone - that's - want - know and built - us - greenhouse, and the females' clusters hey - am - doing - something and sure - want - information. It is anticipated that the semantic network analysis results explicated above, when combined with analysis of the subsequent iterations of the community, may begin to catalyze an understanding of the possible similarities and differences in why male and female high school students are interested in scientific discussion.

Discussion

Findings from the semantic network analysis provide increased insight into the interaction in the educational online community. This level of insight was not previously accessible to the developers because of the large amount of data generated by chat interaction. It is in this sense that quantitative methods such as those explicated in this article, when combined with the qualitative/ethnographic approach that the researchers are already using, truly allow for an increased understanding of such communities. Indeed, an ethnographic understanding of the community would allow for intuitions such as the moderators being the most frequent users, yet the quantitative measures to back these intuitions were not previously available.

It should also be noted that this analysis provided a substantive framework for the development of the next iterations of the SciCentr project, mainly regarding the amount of chat data created. Having the ability to analyze the large amount of data produced in the community enabled the developers to foster the use of the chat ability of the SciCentr. Likewise, the moderators of the community were more motivated to encourage conversation, since they understood both the impact that the chat was having on the students and the ability to extract themes from the interaction.
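
As a rough illustration of the kind of word list reported in Table 1, the sketch below computes the four columns (FREQ, PCNT, CASE FREQ, CASE PCNT) for a block of chat text. It is not Catpac's implementation: the tokenizer, the optional stop-word set, the treatment of a "case" as one position of a sliding window over the word stream, and the default window size of 7 are all assumptions made for this example, and the function name word_table is hypothetical.

```python
from collections import Counter
import re


def word_table(text, window=7, stop_words=frozenset()):
    """Return {word: (freq, pct, case_freq, case_pct)}, most frequent word first.

    Assumption: a "case" is one position of a sliding window of `window`
    words; case_freq counts the windows in which the word appears.
    """
    # Hypothetical tokenizer: lower-case runs of letters and apostrophes.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in stop_words]
    freq = Counter(words)
    total = len(words)

    # Build every window position; short texts still yield one (partial) case.
    cases = [words[i:i + window]
             for i in range(max(len(words) - window + 1, 1))]
    case_freq = Counter()
    for case in cases:
        case_freq.update(set(case))  # count each word once per case
    n_cases = len(cases)

    return {
        w: (f,
            round(100 * f / total, 1),
            case_freq[w],
            round(100 * case_freq[w] / n_cases, 1))
        for w, f in freq.most_common()
    }


if __name__ == "__main__":
    sample = "tomato greenhouse tomato build greenhouse tomato"
    for word, row in word_table(sample, window=3).items():
        print(word.upper(), *row)
```

Counting each word once per window is what lets the case frequency exceed the raw frequency, as it does for most rows of Table 1.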
An important development for the study of online communities is the ability to perform longitudinal analysis (for examples of other online longitudinal research see Krikorian, in press; Sack, 2000; and Smith et al., 2000). Interaction logs have been recorded for subsequent semesters of the SciCentr project and are currently being processed for an over-time perspective on the community. Having a longitudinal understanding of online communities will catalyze insights into such dynamics as organizing, communication convergence, and relative success of such communities. This will increase the insight provided by the quantitative methods, further stressing the need to combine the two approaches.

An important consideration when using outputs from programs like Catpac is that the picture gained is highly interpretive, and there is always the possibility that threads of text were excluded because they occurred infrequently. These threads may be extremely informative. Likewise, the clusters that are revealed are only clusters, and it becomes difficult to identify the context in which they occur. Again, this is where a qualitative understanding is crucial to take full advantage of the network tools available. Researchers using network analysis programs should keep in mind that these programs are only tools and will not provide meaningful findings unless the full story behind the data is investigated. It is in this sense that this article, along with the others in this volume, works to stress the benefits of using a multi-level approach that combines quantitative methods with the qualitative insight needed to interpret the findings.

Similarly, multidimensional representation should be used as a supplemental tool, since the apparent distances between nodes can be somewhat arbitrary.
The presence of these clusters in relation to each other can be informative (given some separation in the output) in that one can gain an understanding of the relation of words in the dataset. Thus, outputs offered in this article, as well as other multidimensional representations, should be viewed with these considerations in mind.

The level of information gained using semantic network analysis tools and the power of the outputs are positively related to the amount of text analyzed. If there are more data, the neural network has more opportunity to learn how to associate the words. The dataset used in this study came from only one semester of interaction, and the insight gained by these methods increases with the amount of information analyzed. Thus, such methods will greatly assist interpretive understanding in studies of larger online communities. Paccagnella (1997) notes, "deep, interpretive research on virtual communities could consequently be greatly helped by an accurate use of new analytic, powerful yet flexible tools, exploiting the possibility of cheaply collecting, organizing and exploring digital data."

Implications for Future Development

The case study provided many insights that inform the development of the methods described in this paper. The method of parsing chat data so that they can be studied on individual and group levels is extremely useful. Automating this process allows users to input a name file, allowing for the immediate analysis of the data on any demographic or systems level.

The analysis suggests that Catpac should be redesigned to deal specifically with chat data. It is currently built to handle sentence- and paragraph-style text, whereas chat data are typically conversational and thus composed of very short statements made by a variety of users. The engine will need to be rebuilt to deal with this difference. Another limitation of the current engine is the inability to handle certain languages.
Although Catpac does have the ability to process over 12 languages, it cannot handle such languages as Mandarin, Korean, and Thai, among many others. As the Internet becomes an increasingly global medium, international comparative research becomes more important, yet Catpac research is limited in this regard (see Footnote 5).

It has also become quite apparent that the methods used will need to incorporate many other network tools, such as the ability to search for specific information in the text. This ability will be beneficial whether searching within a specific chat room or spidering the Internet to locate chat rooms currently using certain terms.

For increased visual representation, future applications would benefit from implementing a real-time, continuously updating engine, introducing the ability to observe the three-dimensional representation as the chat is progressing. Words could thus be seen entering and leaving the window, along with the fluid and continuous movement of clusters and word relations, adding another level to the ability to observe online interaction.

Factors such as gender, age, income, geographic proximity, and anonymity will all play important roles in computer-mediated communication (see Miller, 1995; Wallace, 1999). Likewise, chat contexts such as stock trading, romance, fan clubs, moderated chat (e.g., with experts or personalities), rural civic groups, science tutoring for children, and feedback forums provide examples of the diversity of communities. Given this dynamic spectrum, it is crucial that methodological approaches be developed to extract various demographic, institutional, and attitudinal information. Eventually, an analysis of a large array of genres will help us understand the nature of chat interaction in a multitude of contexts and levels of measurement.
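
The name-file parsing step described earlier in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' tool: it assumes each log line has the form "logon name: message", that the name file has already been read into a dictionary mapping lower-cased logon names to a group label (for example mentor or student), and that logon names should be dropped from the output text. The function name split_chat_by_group and the sample names are hypothetical.

```python
from collections import defaultdict


def split_chat_by_group(log_lines, name_to_group, default_group="unknown"):
    """Group chat messages by the speaker's label, dropping the logon names.

    Assumptions: each line looks like "name: message", and `name_to_group`
    maps lower-cased logon names to a group label.
    """
    groups = defaultdict(list)
    for line in log_lines:
        # Split on the first colon only, so colons inside messages survive.
        name, sep, message = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like "name: message"
        group = name_to_group.get(name.strip().lower(), default_group)
        groups[group].append(message.strip())
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical name file contents and log excerpt.
    names = {"mentor_a": "mentor", "student_1": "student"}
    log = [
        "Mentor_A: hey guys",
        "student_1: hi",
        "*** system notice without a speaker",
        "Mentor_A: bye: next time",
    ]
    for group, messages in split_chat_by_group(log, names).items():
        print(group, messages)
```

Each group's message list can then be joined into a separate text file and analyzed on its own, which mirrors the separation of mentor, student, male, and female chat reported in the results.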
As human interaction continues to be increasingly integrated with communication technologies, quantitative procedures for the analysis of the interactions need to be developed at a pace on par with the technological development. Not only will analysis procedures help us understand the changing human condition, they will also assist in the heuristic development of future communication technology. Researchers using these methods should also incorporate qualitative research methods, for it is crucial to integrate these techniques to develop a multi-method approach in determining the resultant epistemological changes new media will catalyze.

"If we understand the revolutionary transformations caused by new media, we can anticipate and control them; but if we continue in our self-induced subliminal trance, we will be their slaves." (1974 McLuhan interview, in Benedetti, 1996, p. 74)

Acknowledgements

All correspondence should be sent to the first author at [email protected]. Special thanks go to Margaret Corbit, the Cornell Theory Center, and the whole SciFair project for their generosity. Also, many thanks to Melissa Carvalho for her time and effort, and to Joe Walther for help with editing.

Footnotes

1. Semantic network analysis is similar to social network analysis in that it uses a node (or actor, which can be a discrete individual, corporate, or social entity) and a link (or relational tie, the defining feature of which is that it establishes a connection of some form between the nodes). However, the words in a body of text are treated as the nodes, and the connection weights become the links.
2. Threads are chains of posts linked to each other, where each post contains a header that records information about the post.
3. Galileo theory offers laws of processes similar to common laws of physics. In Galileo mapping, items have not only location, but also equivalents to mass and velocity.
Thus, a Galileo map is not a void space with occasional concepts in it; it is more closely associated with Einsteinian space-time, in which forces exist between items in the space. It is in this sense that there isn't empty space, but rather areas of increased mass and thus their associated forces.
4. For a detailed description of clustering algorithms available in Catpac, see Catpac Users Manual (Woelfel & Woelfel, 1997a).
5. For example, Park (2002) had to translate Korean texts into English.

References

Barnett, G. A., Chon, B. S., & Rosen, D. (2001). The structure of international internet flows in cyberspace. NETCOM (Network and Communication Studies), 15(1-2), 61-80.
Benedetti, P., & Dehart, N. (Eds.). (1996). Forward through the rearview mirror: Reflections on and by Marshall McLuhan. Toronto: Prentice Hall.
Corbit, M. (2000). Building virtual worlds for informal science learning (SciCentr and SciFair) in the Active Worlds educational universe (AWEDU). Paper presented to the Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, at the National Institute of Standards and Technology. Retrieved January 5, 2003 from http://www.tc.cornell.edu/~corbitm/corbit.nist.2000.htm.
Doerfel, M. L., & Barnett, G. A. (1996). The use of CATPAC for textual analysis. Cultural Anthropology Methods, 8, 4-7.
Freeman, C. A., & Barnett, G. A. (1994). An alternative approach to using interpretative theory to examine corporate messages and organizational culture. In L. Thayer & G. A. Barnett (Eds.), Organization<->communication: Emerging perspectives (pp. 60-73). Norwood, NJ: Ablex.
Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of Computer-Mediated Communication, 3(1). Retrieved March 10, 2003 from http://www.ascusc.org/jcmc/vol3/issue1/garton.html.
Hancock, J. T., & Dunham, P. J. (2001a). Language use in computer-mediated communication: The role of coordination devices. Discourse Processes, 31, 91-110.
Hancock, J. T., & Dunham, P. J. (2001b). Impression formation in computer-mediated communication revisited: An analysis of the breadth and intensity of impressions. Communication Research, 28, 325-347.
Johnson, S. (1997). Interface culture. San Francisco: Basic Books.
Krikorian, D., Lee, J., Chock, T. M., & Harms, C. (2000). Isn't that spatial?: Distance and communication in a 2-D virtual environment. Journal of Computer-Mediated Communication, 5(4). Retrieved January 8, 2003 from http://www.ascusc.org/jcmc/vol5/issue4/krikorian.html.
Krikorian, D. (in press). The newsgroup death model: Internet groups as self-organizing. In G. A. Barnett & R. Houston (Eds.), Progress in communication sciences, Vol. 18: Self-organizing systems. Greenwich, CT: Ablex.
Krikorian, D., & Kiyomiya, T. (2002). Bona fide groups as self-organizing systems: Applications to electronic newsgroups. In L. R. Frey (Ed.), Group communication in context: Studies of bona fide groups (pp. 335-365). New York: Lawrence Erlbaum.
Krikorian, D., & Ludwig, G. (2002, March). Groupscope: Data mining tools for online communication networks. Paper presented at the 22nd annual Sunbelt Social Network Conference, New Orleans, LA.
Krikorian, D., & Ludwig, G. (2003, February). Advances in network analysis: Over-time visualization, dual-mode relations, and clique detection methods. Paper presented at the 23rd annual Sunbelt Social Network Conference, Cancun, Mexico.
Krikorian, D., & Lee, J. (2003). Explaining the social attraction-distance parabola: Same sex effects in online stranger interaction. Working paper. Ithaca, NY: Cornell University.
McLuhan, M. (1969). Counterblast. New York: H.B. & W. Inc.
Miller, H. (1995). The presentation of self in electronic life: Goffman on the Internet. Paper presented at Embodied Knowledge and Virtual Space, London, 1995. Retrieved March 1, 2003 from http://ess.ntu.ac.uk/miller/cyberpsych/goffman.htm.
Paccagnella, L. (1997). Getting the seats of your pants dirty: Strategies for ethnographic research on virtual communities. Journal of Computer-Mediated Communication, 3(1). Retrieved January 8, 2003 from http://www.ascusc.org/jcmc/vol3/issue1/paccagnella.html.
Park, H. W. (2002). Examining the determinants of who is hyperlinked to whom: A survey of webmasters in Korea. First Monday, 7(11). Retrieved April 5, 2003 from http://www.firstmonday.dk/issues/issue7_11/.
Park, H. W., Barnett, G. A., & Kim, C. S. (2000). Political communication structure in Internet networks: A Korean case. Sunggok Journalism Review, 12, 67-90.
Park, H. W., Barnett, G. A., & Kim, C. S. (2001). Internet communication structure in Korean National Assembly: A network analysis. Korean Journal of Journalism & Communication Studies, Special English Edition, 185-204.
Park, H. W., Barnett, G. A., & Nam, I. Y. (2002a). Hyperlink-affiliation network structure of top websites: Examining affiliates with hyperlinks in Korea. Journal of the American Society for Information Science and Technology, 53(7), 1-10.
Park, H. W., Barnett, G. A., & Nam, I. Y. (2002b). Interorganizational hyperlink networks among websites in South Korea. NETCOM: Network and Communication Studies, 16(3/4, Special issue on the Internet development in Asia), 155-173.
Sack, W. (2000). Conversation map: An interface for very large scale conversations. Journal of Management Information Systems, 17(3), 73-92.
Salisbury, J. G. T. (2001). Using neural networks to assess corporate image. In M. West (Ed.), Progress in communication sciences, Vol. 17: Applications of computer content analysis (pp. 65-86). Westport, CT: Ablex.
Smith, M., & Fiore, A. (2001). Visualization components for persistent conversations. In ACM SIG CHI 2001. Retrieved February 10, 2003 from http://www.research.microsoft.com/~masmith/Visualization Components for Persistent Conversations - Final.doc.
Smith, M., Farnham, S., & Drucker, S. (2000). The social life of small graphical chat spaces. In ACM SIG CHI 2000. Retrieved February 10, 2003 from http://research.microsoft.com/~masmith/The Social Life of Small Graphical Chats.doc.
Smith, M. (1999). Invisible crowds in cyberspace: Measuring and mapping the social structure of USENET. In M. Smith & P. Kollock (Eds.), Communities in cyberspace: Perspectives on new forms of social organization. London: Routledge Press.
Wallace, P. (1999). The psychology of the Internet. Cambridge, U.K.: Cambridge University Press.
Walther, J. B., & D'Addario, K. P. (2001). The impacts of emoticons on message interpretation in computer-mediated communication. Social Science Computer Review, 19, 323-345.
Wellman, B., & Gulia, M. (1999). Net surfers don't ride alone: Virtual communities as communities. In M. Smith & P. Kollock (Eds.), Communities in cyberspace (pp. 331-367). London: Routledge Press.
Woelfel, J. (1993). Artificial neural networks in policy research: A current assessment. Journal of Communication, 43(1), 63-80.
Woelfel, J., & Woelfel, J. (1997a). Catpac version 2.0. Galileo Corporation.
Woelfel, J., & Woelfel, J. (1997b). ThoughtView version 2.0. Galileo Corporation.

About the Authors

Devan Rosen is currently a doctoral student at Cornell University in the Department of Communication, with a focus on Communication Technology and Networks. He received his B.A. at the University at Buffalo, Department of Communication, with a focus in Organizational and Intercultural Communication. He then worked in industry before returning to the University at Buffalo to receive his M.A.
from the Department of Communication, with a focus in Social Network Analysis and Organizational Communication. His research foci range from the self-organization and emergence of human interaction to the use of social network measures and neural network applications for the longitudinal study of online communities.
Address: Department of Communication, Cornell University, 336 Kennedy Hall, Ithaca, NY 14853-4203.

Joseph Woelfel received his Bachelor's degree from Canisius College, and his Master's and Ph.D. from the University of Wisconsin at Madison. He has served on the faculty of the University of Illinois at Urbana-Champaign, Michigan State University, and the State University of New York at Albany, where he was Professor of Communication and Director of Research and Founding Fellow of the Institute for the Study of Information Science. He is currently Professor and former Chair of the Department of Communication at the University at Buffalo. Professor Woelfel was a Senior Fellow at the East West Center in Honolulu, a Fulbright scholar in Yugoslavia, and Senior Fellow at the Rockefeller Institute of Government at the State University of New York. Dr. Woelfel is the author of numerous books and articles, including The Measurement of Communication Processes: Galileo Theory and Method, with E. L. Fink. He is a principal developer of extensive computer software, including the suite of Galileo programs, and CATPAC, a text analysis program utilizing artificial neural technology. Dr. Woelfel has also served as president of Terra Research and Computing, and is currently president of The Galileo Company. Dr. Woelfel's clients include many of the Fortune Top 50, and his software is widely used in both academic and business settings worldwide. A current biography can be found in Who's Who in America and Who's Who in The World.
Address: Department of Communication, State University of New York at Buffalo, 528 Baldy Hall, Buffalo, NY 14260-1020.

Dean H. Krikorian (Ph.D., University of California, Santa Barbara) is an Assistant Professor in the Department of Communication at Cornell University. His research examines organizational communication, small group decision-making processes, and the Internet. He is director of the Cornell Communication Network Laboratory, which examines network communication patterns, particularly in online environments. He is currently developing network analytic software for Internet groups.
Address: Department of Communication, Cornell University, 336 Kennedy Hall, Ithaca, NY 14853-4203.

George A. Barnett (Ph.D., Michigan State University, 1976) is currently Chair and Professor of Communication at the State University of New York at Buffalo. Dr. Barnett has also taught at Rensselaer Polytechnic Institute and the University of Texas at Austin. He has written over 100 books, articles and conference papers on such topics as organizational, mass, international, intercultural, political, technical and scientific communication, as well as marketing communication, public relations and the diffusion of innovations. He has edited the Handbook of Organizational Communication (Ablex, Norwood NJ, 1988) and is currently editor of Organization <--> Communication: Emerging Perspectives and Progress in Communication Science. The goal of his current research is to describe the patterns of use or structure of international communication, in general, and telecommunications (telephone and computer-based communication, such as the World Wide Web) in particular. He also has an interest in the sociology of knowledge, especially as it applies to the field of communication. Currently, he is involved in a study that examines the absolute and distributed information in the field along with its applications to other social organizations.
Address: Department of Communication, State University of New York at Buffalo, 528 Baldy Hall, Buffalo, NY 14260-1020.
© Copyright 2003 Journal of Computer-Mediated Communication