Procedures for Analyses of Online Communities

JCMC 8 (4) July 2003
http://www.ascusc.org/jcmc/vol8/issue4/rosen.html
Devan Rosen
Cornell University
Joseph Woelfel
State University of New York at Buffalo
Dean Krikorian
Cornell University
George A. Barnett
State University of New York at Buffalo
● Abstract
● Introduction
● Online Community Research
● Method
● Online Community Description: SciCentr
● Data
● Analytic Method
● Outputs
● Findings
● Discussion
● Implications for Future Development
● Footnotes
● Acknowledgments
● References
● About the Authors
Abstract
This article details a set of procedures for the analysis and interpretation of the content and structure of online
networks and communities. These novel methods allow for the analysis of online chat, including parsing the data
into separate and interrelated files to determine individual, group and organizational patterns. An illustrative
example of an educational online community in Active Worlds Educational Universe (AWEDU) is provided that
uses three-dimensional virtual worlds for student interaction. Findings from semantic network analysis
procedures reveal elements of the online interaction that would otherwise be difficult to extract given the great
amount of textual data produced in such communities. The case study allows for qualitative and quantitative
analyses. The limitations of the procedures are discussed along with planned developments and their social
implications.
Introduction
"Communication has emerged as a necessary object of attention in the 20th century, not because it's new, but because it's
that portion of the social organism now undergoing elephantiasis" -Marshall McLuhan (1969)
As McLuhan (1969) prophetically pointed out, our ability to communicate and interact with each other has
catalyzed the information revolution. Considering the change in scale that communication technology has
introduced to human interaction, the impact may be affecting us with as much force as the industrial revolution.
Although the railway did not introduce movement to human society, it certainly accelerated and enlarged the
scale of previous human functions and the concept of space. This comparison applies equally to the change in
human affairs that the global Internet has introduced: "Under instant circuitry, nothing is remote in time or
space...it is now" (McLuhan, in Benedetti, 1996, p. 8). The change in scale associated with the human ability
to connect with each other shrinks this conceptual space and alters the way we conceive of cognitive and
communicative distance.
Emerging as a powerful catalyst of the increase in global communication forums are online communities.
Ranging from simple text-based newsgroups to intricate immersive virtual reality multi-user environments, these
communities, whether graphic or not, are strung together by conversational text, or chat. Chat entails any
number of individuals communicating with each other using text-based communication, often appearing as a
chat window in graphic environments. Johnson (1997, p. 71) points out that, "For the most part the social fabric of
cyberspace is still stitched together by the gossamer thread of text." Through these communities this social
fabric is being wrapped around the world and connecting humans with humans in much the same way a village
does. Perhaps a village is even too big a metaphor, as McLuhan noted:
Transmitted at the speed of light, all events on this planet are simultaneous... The absence of space brings to mind the idea
of the village. But actually, at the speed of light, the planet is not much bigger than the room we're in... The acoustic or
simultaneous space in which we now live in is like a sphere whose center is everywhere and whose margins are nowhere.
(1974, in Benedetti, 1996, p. 24)
As this sphere of contact continues to enrich our interactions across domains, from online stock trading
discussions to medical advice communities, the use of chat may also increase.
Parallel with the increased complexity and development of online communicative forums is the need to develop
methods and tools that will allow the users and creators of future iterations of communication technology to be
proactive, rather than repeating the cycle of reactivity that is currently too common. For example, many of the
technologies that would be considered part of the communication technology revolution, such as online
newsgroups, multi-user dungeons (MUDs), and e-mail, were created and widely used before they were studied by
social scientists. This may be a result of the origins of many such technologies in industry; their widespread use
was what catalyzed scientific inquiry and the development of analysis tools. Thus, a tool for the systematic and
intelligent analysis of the chat medium is necessary to understand further its social and communicative
implications. This is not to say that the methods described herein are not subject to reactivity, as chat has
existed for a number of years; however, the sooner such methods are explicated, the more informed developers
of communication technologies will be.
The integration of quantitative methods into the qualitative study (for a review of these strategies and relevant
literature see Paccagnella, 1997) of these communities is also a promising development. A review of some of
the quantitative techniques currently used among a multidisciplinary community is presented, followed by the
introduction of the methods used herein as a viable and necessary procedure for the quantitative analysis of
characteristics of online communities. Our method, discussed at length below, applies a semantic network
approach¹ to the measurement of communication variables and is posed as a potential mate for the qualitative
analysis of online communication. An illustrative example of the benefit of combining quantitative and qualitative
analysis is provided as a study of an online educational community. This analysis also demonstrates possible
shortcomings of the method.
Online Community Research
Because very little research using semantic networks has been performed in chat settings, this section reviews
relevant research on online community and networks. Given the tremendous task of making sense of
cyberspace, there are several online community analysis approaches available offering unique and informative
perspectives on communities formed on the Internet (see Wellman & Gulia, 1999). A multidisciplinary approach
would offer great assistance in unpacking phenomena found in online interaction, for "such interfaces would only
benefit from the plethora of developers whom a publicly available platform would attract" (Smith & Fiore, 2001).
Smith's work offers much insight into the social aspects and structure of online interaction through his
studies of Usenet (Smith, 1999) and V-Chat (Smith, Farnham, & Drucker, 2000). His research and
development of Netscan provide entrance into various elements of newsgroups. Netscan uses a "dashboard"
display that offers the user a thread-tree² visualization, a piano roll component, interpersonal connections, and
message display (see Smith & Fiore, 2001). These visualization components illustrate patterns of activity as well
as some conversational structures in Usenet newsgroup threads. Smith's research is of particular importance
because it offers some of the first successful structural visualizations of very large online Usenet communities.
However, representing the structure is only one level of analysis needed for the understanding of how people
use online communities. The study of spatial movement can also provide useful understandings of online
interaction. The two following studies are offered as examples of this sort of analysis.
In a study done on V-Chat, Smith and his colleagues (Smith et al., 2000) were able to analyze 3D chat spaces
by extracting measures on avatar gesturing and positioning, along with several other analyses of session length
and number of sessions attended. Smith et al. (2000) found that these users did indeed use the spatial features
unique to 3D chat, but that they did so less and less over time. They also found that the spatial interaction was
similar to that of physical interaction. These findings provide some of the first insights into the spatial uses of
online communities, particularly in regard to the use of avatar functions.
Using a different method than Smith et al. (2000), Krikorian, Lee, Chock, and Harms (2000) analyzed user
interaction in a graphical chat room using video capture techniques to measure avatar proximity. The authors
found a positive parabolic relationship between users' liking and avatar distance, where those who liked each
other more tended to be either closer or farther away from each other. Later results indicated that male-male and
female-female clusters explained the social distance and attraction parabola (Krikorian & Lee, 2003). The
findings by Krikorian et al. offer unique insights when compared to prior studies in that they relate the spatial
distance of avatars to that of social liking.
Another methodological innovation was introduced with the development of measures of asynchronous
interaction by Krikorian and Kiyomiya (2002). In that study these measures were used to model Usenet
newsgroups as self-organizing systems and to develop the newsgroup death model, which identifies
the communication constructs of these online communities that indicate their decline. This model was able to
produce "box-scores" and scatter-plots that indicated the health and status of these newsgroups (Krikorian &
Ludwig, 2002). Unlike previous methods, this approach is able to compare extremely large Usenet groups
to gain an understanding of the functions behind the failure of certain newsgroups, as well as to identify such
groups before their actual "death." Such a perspective is useful in developing preventive actions for those groups
in danger of failing.
Krikorian and Ludwig (2003) developed over-time network mapping software that represents any threaded
message network as a longitudinal movie, revealing patterns over time. This software is also unique in that it
reveals a network of users as well as messages, creating a bimodal network using novel clique detection
processes. Likewise, the ability to visualize large networks over time is a method not previously available and
brings a heightened understanding to the lifecycle of such networks and communities.
The methods covered thus far visit the structural and spatial study of online communities; however, the content
of the interaction within these communities is also of particular importance. One such method was developed by
Sack (2000) with the Conversation Map software, which represents the threads in a newsgroup over a period of
time as glyphs, as well as the threads' social and semantic interrelationships. Conversation Map has interactive
features that enable analysts to create maps at different levels of detail, while still being able to see the content
of specific messages. This tool is one of the first to allow entrance into the content of threaded online
communities.
Along with methods developed to study the structure and content of online communities, methods have also
been developed to understand the nature of users' coordination and impressions. Hancock and Dunham (2001a,
2001b) use quantitative measures to gain insight into the use of coordination devices in text-based
computer-mediated communication (CMC) environments (2001a), and to compare impression formation processes in CMC
and face-to-face environments (2001b). Walther and D'Addario (2001) used experimental methods to study the
impact of emoticons, graphical representations of facial expressions, on message interpretation. Emoticons are
widely used in all of the online environments mentioned in the previous studies in this section, yet little has been
learned about their impact. Walther and D'Addario found that verbal content outweighed emoticons, but both
were found to influence message interpretation. Such studies represent a very important parallel in the
methodological approaches to the study of online environments in that the structure and content of such
communities is only part of the story. There is much to gain from understanding the differences between CMC
and face-to-face environments.
There has also been work on the quantitative structural analysis of online networks. Garton, Haythornthwaite, and
Wellman (1997) provide an exhaustive description of how social network techniques can be used in the study of
online communities and networks. Barnett, Chon, and Rosen (2001) used social network analysis to study the
structure of international Internet flows. Park, Barnett, and Kim described the structure of Korean political
communication via Internet networks (2000), as well as the Internet communication structure in the Korean
National Assembly (2001). Park, Barnett, and Nam (2002a, 2002b) revealed the hyperlink network structure of
top Websites in Korea. It is of particular importance to highlight network analytic studies in relation to the
analysis of online communities as many of the methods used by the online community studies discussed in this
section, as well as the methods in this study, use network-based measures and methods of analysis. For
example, Sack's (2000) Conversation Map is largely connected to network measures. Sack notes "The
Conversation Map system provides the means to spatially navigate through social networks.... one of the results
of a VLSC (very large scale conversation) is a social network" (p. 75). Likewise, it is of great use to the online
researcher to have a fundamental understanding of network analytic methods since the very structure of the
Internet is a network itself, much as Krikorian (in press) refers to it as the "Interactive Network."
Although progress has been made toward the development of techniques to map, display and study online
communities, most techniques have been implemented on thread-based communities such as Usenet groups.
Analysis of complex, dynamic non-threaded interaction, such as chat-room conversation, remains unresolved.
Graphical chat-rooms sequentially log chat interaction that is difficult to separate and analyze as sub-groups or
parsed interaction. Thus, it is useful to have a method for the analysis of non-threaded interaction since such
interactions produce large volumes of data that are difficult to navigate and interpret (see Figure 1).
Rolland: So yes...if anyone wants to take a look into this, that would be great.
MrC: sounds interesting....hello doctor could you prescribe a tomato for me?
nadya: hey guys, has anyone else found some interesting info?
Rolland: has anyone followed up on the sites that I mentioned last week?
keeper1616: I'm looking through this book, its got a lot, but a lot of junk too
nadya: that's great cyrus
nadya: anyone else
Rolland: Yeah...you really have to sort through a lot of stuff, but it helps your research skills a lot because it forces you to pare
down your reserach and look for very specific things.
nadya: hey guys...i was just wondering what is everyone working on now?
nadya: jeremy is here
keeper1616: im reading 'the tomato in america
nadya: hi jeremy
skippy: *y
nadya: huh?
Immigration Officer: You are being joined by Damir.
skippy: Sorry, my bad
nadya: no prob
Damir: Damir on line
nadya: hey guys
Damir: hi
nadya: we were checking out cyrus' and cj's green house
nadya: looking good!
Rolland: Hi Damir
Rolland: Great job on the green houses.
nadya: does anyone know if mike or trevor were going to build a new green house for africe?
nadya: africa...not africe
nadya: also, is trevor coming back ever?
skippy: Unknown on both
nadya: ok thanks
Figure 1. Raw chat data.
It is our goal in this paper to explicate a method facilitating the parsing and analysis of data that is recorded in a
non-threaded manner, such as chat. The method used for this form of analysis is discussed in detail below.
Method
The purpose of the method used in this study is to adapt, automate, and implement neural-based content
analysis software to observe Internet communication patterns in chat rooms. This implementation uses
Catpac™ (Woelfel & Woelfel, 1997a), a developed and proven semantic network analysis package which has
the capability to extract word patterns and clusters. Clusters are extracted by sliding a text-window through the
text and associating each word in the window with a neuron in an artificial neural network. Using a proprietary
variation of an interactive activation and competition algorithm, connection strengths or weights are generated as
a function of the coactivation patterns among the neurons. These weights in turn serve as the basis of cluster
analysis and Galileo mapping³ (Woelfel, 1993).
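The windowing step can be illustrated with a plain co-occurrence count. This is only a minimal sketch and not
Catpac's proprietary neural algorithm: the tokenizer, the window size, and the function names are assumptions
made for illustration, and the sample sentence is invented.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def window_cooccurrence(tokens, window_size=5):
    """Slide a fixed-size window one token at a time and count word pairs.

    Returns a Counter mapping a sorted (word_a, word_b) pair to the number
    of window positions in which both words appear.
    """
    counts = Counter()
    last_start = max(len(tokens) - window_size, 0)
    for start in range(last_start + 1):
        # Each distinct word in the window plays the role of one active node.
        active = sorted(set(tokens[start:start + window_size]))
        for pair in combinations(active, 2):
            counts[pair] += 1
    return counts


# Invented sample text, used only to exercise the functions.
sample = "has anyone found info on the tomato green house the green house looks good"
pair_counts = window_cooccurrence(tokenize(sample), window_size=4)
for (word_a, word_b), n in pair_counts.most_common(3):
    print(word_a, word_b, n)
```

The resulting pair counts form a symmetric word-by-word matrix, which is the kind of input that the clustering
and mapping routines described in this section operate on.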
Catpac consists of four general modules. The first module is system input, which consists of subsystems for
locating the input data, parsing and breaking it into "elements" for analysis, and formatting it for presentation to
the main neural engine.
The second module is the neural engine itself. This is a proprietary variant of the interactive activation and
competition type neural network. Each neuron in the network represents an element of the input data. As
elements of the input data flow through the network, nodes that represent those elements become active, and
connections among active nodes are strengthened according to one of four optional learning algorithms,
including sigmoid, hypertangential and linear algorithms. The output of this second module is a square matrix of
connection weights among the neurons representing the elements of the data.
It should be noted that, in its original form, Catpac was driven by a simple co-occurrence engine. After the
development of the neural engine, Catpac was offered with the option to use either co-occurrence data or the
neural network engine. Since most users overwhelmingly favored the results of the neural engine, subsequent
versions of Catpac have dropped the co-occurrence option. The neural engine produces much deeper, more
complex and detailed structures than the co-occurrence model, due to the fact that all indirect links are
considered. In the co-occurrence model, linkages between elements which co-occur in a case are strengthened.
But in the Catpac neural model, linkages between elements which co-occur are strengthened, as well as
linkages with other elements already linked to the co-occurring nodes in proportion to their degree of linkage.
These indirect connections allow for significantly more complex patterns to be stored and retrieved.
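The contrast between the two models can be made concrete with a toy update rule. The sketch below is an
assumption-laden illustration, not the Catpac engine: the names new_network, link, and present_case, the
learning rate, and the spread factor are all hypothetical, and no activation function is modeled.

```python
from collections import defaultdict
from itertools import combinations


def new_network():
    """Create an empty symmetric weight table: weights[a][b] -> float."""
    return defaultdict(lambda: defaultdict(float))


def link(weights, a, b, amount):
    """Strengthen the symmetric connection between words a and b."""
    weights[a][b] += amount
    weights[b][a] += amount


def present_case(weights, words, rate=1.0, spread=0.5, indirect=True):
    """Update the network for one case (one window of co-occurring words)."""
    active = set(words)
    # Snapshot the links that existed before this case, so indirect updates
    # depend only on earlier learning and not on iteration order.
    before = {w: dict(weights[w]) for w in active}
    # Direct links: every pair of co-occurring words is strengthened.
    for a, b in combinations(sorted(active), 2):
        link(weights, a, b, rate)
    if not indirect:
        return
    # Indirect links: a word is also tied to the prior neighbours of the
    # words it co-occurs with, in proportion to those neighbours' linkage.
    for a in active:
        for b in active - {a}:
            for c, strength in before[b].items():
                if c not in active:
                    link(weights, a, c, spread * strength)


cases = [["tomato", "greenhouse"], ["greenhouse", "build"]]
plain, spreading = new_network(), new_network()
for case in cases:
    present_case(plain, case, indirect=False)
    present_case(spreading, case)

# "build" and "tomato" never share a case; only the spreading variant links them.
print(plain["build"].get("tomato", 0.0))
print(spreading["build"].get("tomato", 0.0))
```

In the co-occurrence-only variant the two words stay unconnected, whereas the spreading variant records a
weaker link between them through their shared neighbour.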
The third module consists of several multivariate analysis systems, which analyze the underlying structure of the
connection weight matrix. These currently consist of cluster analysis routines, perceptual mapping
(multidimensional scaling) routines, and a neural network which allows tracing associations among the elements
of the data; that is, one or more elements may be selected, and others most closely connected to them will be
displayed. This technique displays clusters of words in the form of a dendrogram, and as clusters of points in
three-dimensional space using a sister program, ThoughtView™ (Woelfel & Woelfel, 1997b). This wide variety
of analytic methods makes it possible to adapt Catpac to an assortment of types of data.
Catpac has been used for the study of traditional text (Doerfel & Barnett, 1995; Freeman & Barnett, 1994;
Salisbury, 2001), such as articles and long response questionnaires. It has been successful in revealing clusters
of associated words in text that provided helpful quantitative data to support qualitative interpretations.
One of the most important aspects of the method used in this procedure is the ability to analyze data based on
set parameters. For this, an algorithm has been developed which parses chat data into separate and interrelated
files used to determine individual, group, and systematic organizational patterns over time. This becomes useful
when combined with a qualitative analysis in which the researcher has an ethnographic understanding of the
community members, because a "name file" allows for directed analysis and the labeling of
contributions. For example, if the online community were associated with a large undergraduate class, the
teacher would be able to observe semantic clusters extracted from only the sophomores' communication,
or to compare what the Communication majors are contributing with what the Psychology majors are contributing.
If the analysis were of a medical community, one could observe the difference between communication originating
from doctors and that originating from lay users. Other uses bridge to industry, where virtual task groups'
interaction could be parsed, revealing both potentially positive and negative trends in the interaction.
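A minimal sketch of this kind of parsing is given below, assuming log lines of the form "logon: message" as in
Figure 1. The structure of the name file, the attribute labels assigned to each logon name, and the function
names are hypothetical and serve only to illustrate the idea; they are not the parsing algorithm used in the
study.

```python
from collections import defaultdict

# Hypothetical name file: logon name -> attributes known to the researcher.
# The labels below are illustrative assumptions, not data from the study.
NAME_FILE = {
    "Rolland": {"location": "Cornell"},
    "nadya": {"location": "Cornell"},
    "keeper1616": {"location": "high school"},
    "skippy": {"location": "high school"},
}

LOG = [
    "Rolland: has anyone followed up on the sites that I mentioned last week?",
    "keeper1616: I'm looking through this book, its got a lot, but a lot of junk too",
    "nadya: that's great cyrus",
    "Immigration Officer: You are being joined by Damir.",
    "skippy: Sorry, my bad",
]


def parse_chat(lines):
    """Yield (logon, message) pairs; lines without a 'logon: ' prefix are skipped."""
    for line in lines:
        speaker, sep, message = line.partition(": ")
        if sep:
            yield speaker.strip(), message.strip()


def split_by_user(lines):
    """Collect each user's statements, without the logon prefix."""
    per_user = defaultdict(list)
    for speaker, message in parse_chat(lines):
        per_user[speaker].append(message)
    return dict(per_user)


def bundle_by(per_user, name_file, attribute):
    """Merge per-user text into one bundle per attribute value."""
    bundles = defaultdict(list)
    for user, messages in per_user.items():
        if user in name_file:  # unlisted speakers (e.g. system messages) are left out
            bundles[name_file[user][attribute]].extend(messages)
    return dict(bundles)


per_user = split_by_user(LOG)
bundles = bundle_by(per_user, NAME_FILE, "location")
for group, messages in bundles.items():
    print(group, len(messages))
```

Each bundle can then be written to its own file and analyzed separately, so that the same log yields
individual-level and group-level views.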
The following section explains the use of the method as a case study analyzing an educational online
community.
Online Community Description: SciCentr
To develop and test the method, data from the Cornell Theory Center's SciCentr were used (www.scicentr.org).
SciCentr is an effort of the Cornell Theory Center to explore the use of three-dimensional online virtual worlds for
student interaction. The mission of the project is to engage and educate community members in relation to the
excitement and complexity of computational science (Corbit, 2000). One of the main efforts in SciCentr's
creation is to complete this mission by allowing users to interact in ways appropriate to their own level of interest,
commitment, and ability. SciCentr's user community is comprised of advisors and technical experts, university
and high school students, as well as content and exhibits developers. The target audience is youth between the
ages of 11 and 21.
The purpose of this virtual world is to allow teens in after-school programs to create knowledge spaces, based
on research they are conducting on tomatoes with the help of researchers at Cornell. Research topics comprise
heritage, diversity, uses, production, breeding; they will soon include genetic engineering and molecular
genetics. Active Worlds client/server technology is used for the implementation of a 3-D multi-user virtual
science museum, SciCentr. This "world" is graphically modeled after the 1939 World's Fair, and combines the
setting of a museum and an outdoor fair, or collection of demonstrations and exhibits, to create what might be
described as a virtual, outdoor exhibit-based museum.
An example of one of these exhibits is the Fourier Fountain (Figure 2, from Corbit, 2000), modeled after the
Singing Fountains of the 1939 World's Fair. In this exhibit users, represented as avatars, can play the small keys
of the fountain, and corresponding sections of the central crystal structure flash yellow as the sound is played on
the desktop. Other users can also hear the chords played. Thus, users from around the world can participate in
making music, and hear it played in "real time." Further, the chords are visually represented on the wall of the
room using sound generators in MATLAB.
Figure 2. Fourier Fountain, an exhibit from Cornell Theory Center's SciCentr.
Another example of a SciCentr exhibit is the Plant Breeding Beds (Figure 3, from Corbit, 2000), which represents
an inquiry-based digital laboratory for plant breeding simulations.
Figure 3. Plant Breeding Beds, an exhibit from Cornell Theory Center's SciCentr.
Data
The two figures from SciCentr provided above are just two examples from the wealth of exhibits that the students
can navigate and help create. It is of particular importance to note that both of these figures show a chat window
in the lower left of the figure, in which the immigration officer is welcoming a user to the SciCentr. These chat
windows are part of all SciCentr navigation and are the main way that the users can verbally interact with each
other.
All developers, including student, professional, and high school world builders, have "citizenships" that allow
them to visit all of the worlds in Active Worlds Educational Universe (AWEDU). These users all select user IDs
and passwords. Likewise, all student visitors also have user IDs (see Figure 1 for examples of user IDs). These
consistent logon names allow for effective collection of chat data since the chat windows are present during all
SciCentr activity (see chat boxes in Figures 2 and 3). Chat sessions are stored as log files containing the raw
chat data (for an example of the raw chat data see Figure 1). The people included in this data set are high
school students at a rural high school, science students at Cornell University acting as mentors for the high
school students, and volunteers who helped guide the interaction.
Analytic Method
The data are parsed into several different levels of analysis. First, the chat data are left as is, including logon
names, to analyze how the presence of these IDs would affect the results. This analysis also allows for the
increased understanding of the structure of the community since the logon names will be clustered together if the
users had frequent common interaction. Second, the logon names are eliminated from chat logs to allow the
outputs to represent only the actual conversational interaction. This step is taken because of the frequency of
logon names in the data and the expected domination of them in the outputs. Chat statements of individual users
are then parsed into separate files containing only those users' text. Third, individual user files are bundled into
location sets, one for the high school and one for Cornell. This step is taken to see if there is a difference
between trends of what the high school students are saying compared to what the Cornell mentors are saying.
Finally, the individual user files are bundled by gender to test for differences and similarities.
These data sets are analyzed using Catpac, producing dendrogram outputs clustered with Ward's method.⁴
Visual representation was further enhanced using the three-dimensional representation software, ThoughtView.
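Ward's method merges, at each step, the two clusters whose union produces the smallest increase in
within-cluster variance. The following is a minimal sketch of that criterion, assuming each word is described
by a numeric profile (for example, its row of connection weights); the function name, the profile values, and
the choice of input are assumptions, and Catpac's own routine may differ.

```python
def ward_merges(points):
    """Agglomerate points using Ward's minimum-variance criterion.

    Returns a list of (members_a, members_b, cost) tuples in merge order,
    where cost is the increase in within-cluster sum of squares.
    """
    clusters = {i: (1, list(p)) for i, p in enumerate(points)}  # id -> (size, centroid)
    members = {i: (i,) for i in clusters}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                size_a, cen_a = clusters[a]
                size_b, cen_b = clusters[b]
                dist2 = sum((u - v) ** 2 for u, v in zip(cen_a, cen_b))
                # Increase in error sum of squares if a and b were merged.
                cost = size_a * size_b / (size_a + size_b) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        size_a, cen_a = clusters.pop(a)
        size_b, cen_b = clusters.pop(b)
        total = size_a + size_b
        centroid = [(size_a * u + size_b * v) / total for u, v in zip(cen_a, cen_b)]
        clusters[next_id] = (total, centroid)
        members[next_id] = members[a] + members[b]
        merges.append((members[a], members[b], cost))
        next_id += 1
    return merges


# Made-up two-dimensional profiles for four words, for illustration only.
words = ["tomato", "greenhouse", "bye", "next"]
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
for left, right, cost in ward_merges(profiles):
    print([words[i] for i in left], [words[i] for i in right], round(cost, 2))
```

The merge order and costs are the information a dendrogram displays: words joined early, at low cost, sit
together under short pillars.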
Outputs
All results from the analysis are represented as follows: lists of most frequent words in descending as well as
alphabetical order (see Table 1); icicled dendrograms, which are read vertically, where the height of the pillars
connecting the words represents their association and thus clusters words together; and three-dimensional
planes containing the word clusters. These planes are rotated around any of the three axes to gain visual access
to word clusters that may be hidden in any given view (see Figures 5 and 6 for examples of complete output).
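The two word lists can be approximated with a short routine. This is a sketch under stated assumptions: a
"case" is taken here to be one position of the sliding window, percentages are computed over all tokens
supplied, and the function name and window size are hypothetical; Catpac's exact definitions may differ.

```python
from collections import Counter


def frequency_table(tokens, window_size=7):
    """Build rows of (word, freq, pcnt, case_freq, case_pcnt).

    freq/pcnt count raw occurrences; case_freq/case_pcnt count the window
    positions ("cases") in which the word appears at least once.
    Returns the rows sorted by descending frequency and alphabetically.
    """
    total = len(tokens)
    freq = Counter(tokens)
    n_cases = max(total - window_size + 1, 1)
    case_freq = Counter()
    for start in range(n_cases):
        case_freq.update(set(tokens[start:start + window_size]))
    rows = [
        (word, n, 100.0 * n / total, case_freq[word], 100.0 * case_freq[word] / n_cases)
        for word, n in freq.items()
    ]
    descending = sorted(rows, key=lambda row: (-row[1], row[0]))
    alphabetical = sorted(rows, key=lambda row: row[0])
    return descending, alphabetical


# Invented token sequence, used only to exercise the function.
tokens = "green house green house tomato info guys".split()
descending, alphabetical = frequency_table(tokens, window_size=3)
for word, n, pcnt, cases, case_pcnt in descending:
    print(f"{word:<10}{n:>5}{pcnt:>7.1f}{cases:>6}{case_pcnt:>7.1f}")
```

Because windows overlap, a word's case frequency can exceed its raw frequency, which matches the pattern
visible in Table 1.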
In the test concerning the analysis of the raw chat, the four most frequently occurring words in the descending
frequency list are logon names. Likewise, 9 of the 12 words in a large cluster in the dendrogram are also logon
names (see Figure 4). The clustering of users revealed that the two mentors (Nadya and
Rolland) were most central along with the technical coordinator (MrC), with the most active students (Emrys,
Keeper, Kikki, Telekenetix, and Tiona) centered about the mentors. This has important structural implications
discussed in the next section.
Figure 4. Dendrogram view of all chat with logon names (clustered with Ward's method).
Removing the logon names from the data set yielded a more accurate representation of
clusters of words in the actual interaction. The strongest clusters are:
● I'm - today - joined - OK
● everyone - me - good - guys - great
● don't - know
● will - think - yes
● want - go - going - am
● greenhouse - tomato
● build - stuff
● bye - next - time
● Cornell - doing
The results of the data representing all statements written by mentors located on the Cornell campus reveal the
following clusters:
● everyone - want - question - go - me - sign
● OK - guys - hey - good - know - that's - great - am
● America - working - greenhouse - greenhouses
● tomato - tomatoes
● don't - think - Rolland
● feel - free - worlds - doing
DESCENDING FREQUENCY LIST
ALPHABETICALLY SORTED LIST
WORD
FREQ
PCNT CASE CASE
WORD
FREQ PCNT
FREQ PCNT
CASE CASE
FREQ PCNT
HOUSE
156
3.6
189
10.0 AM
76
1.7
235
12.4
TIME
142
3.2
147
7.8
AMERICA
71
1.6
82
4.3
GREEN
127
2.9
187
9.9
ASK
65
1.5
70
3.7
GUYS
113
2.6
297
15.7 BIT
76
1.7
89
4.7
KNOW
111
2.5
239
12.6 BUILD
54
1.2
68
3.6
EVERYONE
             110   2.5   324  17.1    BUILDING      79   1.8   132   7.0
ME           110   2.5   240  12.7    CORNELL       53   1.2   105   5.6
HOUSES       108   2.5   120   6.3    CYRUS         54   1.2   213  11.3
US           108   2.5   168   8.9    DOING         60   1.4     0   0.0
GO           106   2.4   231  12.2    DON'T         67   1.5   212  11.2
WORLD        105   2.4    95   5.0    EVERYONE     110   2.5   324  17.1
ROLLAND      104   2.4   169   8.9    FEEL          46   1.0    46   2.4
NEXT         103   2.3   158   8.4    FREE          49   1.1    25   1.3
STUFF         98   2.2   123   6.5    GO           106   2.4   231  12.2
WANT          98   2.2   205  10.8    GOING         88   2.0   221  11.7
WILL          98   2.2   176   9.3    GOOD          70   1.6   266  14.1
INFO          92   2.1   170   9.0    GREAT         80   1.8   238  12.6
QUESTIONS     89   2.0   180   9.5    GREEN        127   2.9   187   9.9
GOING         88   2.0   221  11.7    GREENHOUSE    88   2.0   125   6.6
GREENHOUSE    88   2.0   125   6.6    GREENHOUSES   73   1.7   110   5.8
OVER          86   2.0    82   4.3    GUYS         113   2.6   297  15.7
TOMATOES      86   2.0   121   6.4    HEY           64   1.5   269  14.2
TODAY         83   1.9   181   9.6    HOUSE        156   3.6   189  10.0
GREAT         80   1.8   238  12.6    HOUSES       108   2.5   120   6.3
OK            80   1.8   301  15.9    INFO          92   2.1   170   9.0
BUILDING      79   1.8   132   7.0    KNOW         111   2.5   239  12.6
AM            76   1.7   235  12.4    LAST          71   1.6   131   6.9
BIT           76   1.7    89   4.7    LOOKING       51   1.2   119   6.3
TOMATO        75   1.7   136   7.2    MARGARET      50   1.1    83   4.4
GREENHOUSES   73   1.7   110   5.8    ME           110   2.5   240  12.7
WORK          73   1.7   125   6.6    NAME          57   1.3    79   4.2
AMERICA       71   1.6    82   4.3    NEXT         103   2.3   158   8.4
LAST          71   1.6   131   6.9    OK            80   1.8   301  15.9
GOOD          70   1.6   266  14.1    OKAY          68   1.5   286  15.1
OKAY          68   1.5   286  15.1    OVER          86   2.0    82   4.3
DON'T         67   1.5   212  11.2    PEOPLE        46   1.0   149   7.9
ASK           65   1.5    70   3.7    QUESTIONS     89   2.0   180   9.5
HEY           64   1.5   269  14.2    REALLY        54   1.2   126   6.7
THINK         62   1.4   213  11.3    ROLLAND      104   2.4   169   8.9
DOING         60   1.4     0   0.0    SIGN          52   1.2    51   2.7
NAME          57   1.3    79   4.2    STUFF         98   2.2   123   6.5
TUESDAY       54   1.2    68   3.6    THINK         62   1.4   213  11.3
CYRUS         54   1.2   213  11.3    TIME         142   3.2   147   7.8
REALLY        54   1.2   126   6.7    TODAY         83   1.9   181   9.6
WORKING       54   1.2    96   5.1    TOMATO        75   1.7   136   7.2
CORNELL       53   1.2   105   5.6    TOMATOES      86   2.0   121   6.4
SIGN          52   1.2    51   2.7    TUESDAY      108   2.5   168   8.9
MARGARET      50   1.1    83   4.4    WANT          98   2.2   205  10.8
FREE          49   1.1    25   1.3    WILL          98   2.2   176   9.3
Table 1. Cornell all chat, word list and dendrogram view.
Figure 5. Cornell all chat, word list and dendrogram view.
Figure 6. 3-D Representation of Cornell Location Users
The results of the text analysis of the high school location presented the following clusters:
● yeah - OK - will - go - Rolland - good - don't - I'll - bye
● are - doing - additional - leaning - someone - requires - program - needs
● first - place
● else - someone
● name - doing
● able - objects
Analysis of the males' chat output presented several smaller clusters:
● yes - I'm - doing - good
● everyone - that's - want - know
● later - ya - go
● last - year
● can't - thing - going
● built - us - greenhouse
Much like the male output, the female analysis presented several smaller clusters:
● I'm - go - name - me - tomatoes - don't - know
● killing - people - wrong
● okay - bye - time
● sure - want - information
● death - penalty
● sister - think
● hey - am - doing - something
Findings
The analysis performed with the logon names left in the data showed that the logon names dominate the output
and obscure the content of the chat interaction. It did, however, reveal the social structure of the community,
in the sense of which individuals were frequently grouped together in the interaction. The main group comprises
the mentors at Cornell and the community developers, followed by the most active student users. This is
important because it shows that the mentors were often present to assist the students, and it indicates both the
extent of student use and who the most frequent users are. This grouping can be associated with a network
clique; identifying cliques within larger social groups (e.g., departments within a corporation) is one of the
main strengths of social network analysis. Likewise, the most frequent users can be associated with centrality,
or how central an actor is within a network. Although these are only associations, they provide a basic
groundwork for interpreting chat interaction.
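The association with cliques and centrality can be made concrete with a small sketch. The code below is an illustration only, not the procedure used in this study (which ran the logon names through Catpac as ordinary words). It assumes each chat session has already been reduced to the list of logon names that appear in it; the session data and the names `mentor_a`, `student_1`, and so on are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sessions):
    """Count how often each pair of logon names appears in the same session."""
    pairs = Counter()
    for names in sessions:
        # Use each name once per session and sort so (a, b) == (b, a).
        for a, b in combinations(sorted(set(names)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs):
    """Share of the other actors that each actor co-occurs with at least once."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {actor: len(ties) / (n - 1) for actor, ties in neighbors.items()}


# Hypothetical sessions: which logon names were present together.
sessions = [
    ["mentor_a", "mentor_b", "student_1"],
    ["mentor_a", "student_2"],
    ["mentor_a", "mentor_b"],
]
pairs = cooccurrence(sessions)
centrality = degree_centrality(pairs)
```

In this toy data the mentor who appears in every session is tied to all other actors, which mirrors the pattern described above in which mentors form the core group.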
Output from the data with the logon names removed gives a view of the conversational aspects of the chat
sessions. The mentors were the most frequent users, so the clusters are likely more representative of what
they said than of what the students said. This is reflected in the results, where the content of the clusters
seems to consist of initiating, leading, and closing statements such as I'm joined today, everyone me good guys
great, bye next time, and build stuff. Also revealed is the presence of greenhouse and tomato, both of which are
central topics of the SciCentr.
Given that the mentors were dominant in the data, the analysis separating them from the high school students is
important for an accurate understanding of the community. The mentors' clusters show positive feedback, such as
OK-guys-hey-good-know-that's-great and everyone-want-questions-go-me-sign, as well as central themes of the
research, such as America-working-greenhouse and tomato-tomatoes. Likewise, the high school students seem to be
accepting the help, with clusters like yeah-ok-will-go and
are-doing-additional-someone-requires-program-needs.
An interesting finding is the difference in the gender-based analysis. The females have clusters such as
killing-people-wrong and death-penalty, which were completely absent from the male data. Males had clusters like
yes-I'm-doing-good, can't-thing-going, and built-us-greenhouse. Although the implications of differences like
these will require further investigation (mainly in the analysis of subsequent iterations of the SciCentr
project), interaction in the online community indicated that both males and females in the high school found the
3D virtual worlds stimulating, increasing their interest in scientific research (Corbit, 2000). Highlighting this
point are relative similarities such as the males' clusters everyone - that's - want - know and built - us -
greenhouse, and the females' clusters hey - am - doing - something and sure - want - information. It is
anticipated that the semantic network analysis results described above, when combined with analysis of the
subsequent iterations of the community, may begin to clarify the possible similarities and differences in why
male and female high school students are interested in scientific discussion.
Discussion
Findings from the semantic network analysis provide increased insight into the interaction in the educational
online community. This level of insight was not previously accessible to the developers because of the large
amount of data generated by chat interaction. It is in this sense that quantitative methods such as those
described in this article, combined with the qualitative/ethnographic approach that the researchers are already
using, allow for an increased understanding of such communities. An ethnographic understanding of the community
would allow for intuitions such as the moderators being the most frequent users, yet the quantitative measures to
back these intuitions were not previously available. It should also be noted that this analysis provided a
substantive framework for the development of the next iterations of the SciCentr project, mainly regarding the
amount of chat data created. Having the ability to analyze the large amount of data produced in the community
enabled the developers to foster the use of the chat ability of the SciCentr. Likewise, the moderators of the
community were more motivated to encourage conversation since they understood both the impact that the chat was
having on the students and the possibility of extracting themes from the interaction.
An important development for the study of online communities is the ability to perform longitudinal analysis (for
examples of other online longitudinal research see Krikorian, in press; Sack, 2000; and Smith et al., 2000).
Interaction logs have been recorded for subsequent semesters of the SciCentr project and are currently being
processed for an over-time perspective on the community. Having a longitudinal understanding of online
communities will catalyze insights into such dynamics as organizing, communication convergence, and relative
success of such communities. This will increase the insight provided by the quantitative methods, further
stressing the need to combine the two approaches.
An important consideration when using outputs from programs like Catpac is that the picture gained is highly
interpretive, and there is always the possibility that threads of text were excluded because they were
infrequent. These threads may be extremely informative. Likewise, the clusters that are revealed are only
clusters, and it is difficult to identify the context in which they occur. Again, this is where a qualitative
understanding is crucial to take full advantage of the network tools available. Researchers using network
analysis programs should keep in mind that these programs are only tools and will not provide meaningful
findings unless the full story behind the data is investigated. It is in this sense that this article, along with
the others in this volume, works to stress the benefits of a multi-level approach that combines quantitative
methods with the qualitative insight needed to interpret the findings.
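The point about infrequent threads can be illustrated with a short sketch. It assumes, for illustration only, that the analysis keeps the `top_n` most frequent words and discards the rest; the function name `split_by_frequency` and the toy sentence are hypothetical, and the actual inclusion rules of Catpac are not reproduced here. Returning the discarded words alongside the retained ones gives the researcher a list to inspect qualitatively.

```python
import re
from collections import Counter


def split_by_frequency(text, top_n):
    """Return (kept, dropped) word counts for a simple top-N frequency cutoff."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()  # most frequent first
    kept = dict(ranked[:top_n])
    dropped = dict(ranked[top_n:])
    return kept, dropped


# Toy text: two frequent topic words and one rare but meaningful thread.
text = "greenhouse tomato greenhouse tomato greenhouse death penalty"
kept, dropped = split_by_frequency(text, top_n=2)
```

Here the rare words fall below the cutoff even though they may mark exactly the kind of thread a qualitative reading would flag as informative.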
Similarly, multidimensional representation should be used as a supplemental tool, since the apparent distances
between nodes can be somewhat arbitrary. The presence of these clusters in relation to each other can be
informative (given some separation in the output) in that one can gain an understanding of the relation of words
in the dataset. Thus, outputs offered in this article, as well as other multidimensional representations, should be
viewed with these considerations in mind.
The level of information gained using semantic network analysis tools and the power of the outputs are positively
related to the amount of text analyzed. If there are more data the neural network has more opportunity to learn
how to associate the words. The dataset used in this study was only from one semester of interaction, and the
insight gained by these methods increases with the amount of information analyzed. Thus, such methods will
greatly assist the interpretive understanding when studying larger online communities. Paccagnella (1997) notes,
"deep, interpretive research on virtual communities could consequently be greatly helped by an accurate use of
new analytic, powerful yet flexible tools, exploiting the possibility of cheaply collecting, organizing and exploring
digital data."
Implications for Future Development
The case study provided many insights that illuminate the development of the methods described in this paper.
The method of parsing chat data so that they can be studied on individual and group levels is extremely useful.
Automating this process allows users to input a name file allowing for the immediate analysis of the data on any
demographic or systems level.
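A minimal sketch of this kind of name-file parsing follows. It is not the authors' implementation: it assumes a log in which each line has the form `logon_name: message` and a name file in which each line has the form `logon_name,group`, both of which are hypothetical formats chosen for illustration. The `drop_names` flag corresponds to analyzing the chat with or without the logon names.

```python
from collections import defaultdict


def load_name_file(lines):
    """Parse 'logon_name,group' lines into a lookup table; blank lines are skipped."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, group = (part.strip() for part in line.split(",", 1))
        groups[name.lower()] = group
    return groups


def split_chat_by_group(chat_lines, groups, drop_names=True):
    """Route each 'name: message' line to the group of its speaker."""
    out = defaultdict(list)
    for line in chat_lines:
        name, sep, message = line.partition(":")
        if not sep:
            continue  # not a 'name: message' line (e.g., a system notice)
        group = groups.get(name.strip().lower(), "unknown")
        out[group].append(message.strip() if drop_names else line.strip())
    return dict(out)


# Hypothetical name file and chat log.
groups = load_name_file(["Mentor_A,mentor", "", "Student_1,student"])
chat = ["Mentor_A: hey guys", "Student_1: yeah ok", "system notice", "Guest: hello"]
by_group = split_chat_by_group(chat, groups)
```

Each resulting list could then be written to its own file and analyzed separately, so the same log can be re-split by location, gender, or role simply by supplying a different name file.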
The analysis suggests that Catpac should be redesigned to deal specifically with chat data. It is currently built
to handle sentence- and paragraph-style text, whereas chat data are typically conversational and thus composed of
very short statements made by a variety of users. The engine will need to be rebuilt to deal with this difference.
Another limitation of the current engine is its inability to handle certain languages. Although Catpac can
process over 12 languages, it cannot handle Mandarin, Korean, Thai, and many others. As the Internet becomes an
increasingly global medium, international comparative research becomes more important, yet Catpac research is
limited in this regard (see footnote 5).
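Returning to the short-statement problem raised above, one possible adaptation is to treat each chat statement as its own unit of co-occurrence, so that two words are linked only when they occur in the same statement. The sketch below is offered under that assumption and is not a description of Catpac's engine; the tokenizer, the stopword list, and the sample statements are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

TOKEN = re.compile(r"[a-z']+")


def utterance_cooccurrence(utterances, stopwords=frozenset()):
    """Count word pairs that occur together within a single chat statement."""
    counts = Counter()
    for text in utterances:
        words = {w for w in TOKEN.findall(text.lower()) if w not in stopwords}
        # Sorting makes each unordered pair map to one key.
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical statements from different users.
utterances = ["hey guys, good work", "good work on the greenhouse", "bye"]
counts = utterance_cooccurrence(utterances, stopwords=frozenset({"on", "the"}))
```

A one-word statement such as "bye" contributes no pairs, which illustrates why very short chat turns yield sparser word associations than sentence- or paragraph-style text.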
It has also become quite apparent that the methods used will need to incorporate many other network tools,
such as the ability to search for specific information in the text. This ability will be beneficial whether within a
specific chat room or spidering the Internet to locate chat rooms currently using certain terms.
For increased visual representation, future applications would benefit from implementing a real-time,
continuously updating engine, introducing the ability to observe the three-dimensional representation as the chat
is progressing. Words could then be seen entering and leaving the window, and clusters and word relations could
be seen moving fluidly and continuously, adding another level to the ability to observe online interaction.
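One way such a continuously updating engine might maintain its word relations is sketched below. This is a hypothetical design, not an existing feature of the software discussed here: it keeps co-occurrence counts for only the most recent `max_messages` chat messages, so word pairs enter the counts as messages arrive and leave them as messages expire. The class name `RollingCooccurrence` is an assumption made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class RollingCooccurrence:
    """Word-pair counts over a moving window of the most recent messages."""

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self.window = deque()    # pair lists, oldest message first
        self.counts = Counter()  # current pair counts for the window

    @staticmethod
    def _pairs(message):
        words = sorted(set(message.lower().split()))
        return list(combinations(words, 2))

    def add(self, message):
        """Add a new message and expire the oldest one if the window is full."""
        pairs = self._pairs(message)
        self.window.append(pairs)
        self.counts.update(pairs)
        if len(self.window) > self.max_messages:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            # Remove pairs that no longer occur so they leave the view.
            for pair in expired:
                if self.counts[pair] <= 0:
                    del self.counts[pair]
        return self.counts


rolling = RollingCooccurrence(max_messages=2)
for message in ["build stuff", "build stuff today", "bye next time"]:
    rolling.add(message)
```

A display layer could redraw the three-dimensional map from `rolling.counts` after each call to `add`, although the mapping itself is outside the scope of this sketch.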
Factors such as gender, age, income, geographic proximity, and anonymity will all play important roles in
computer-mediated communication (see Miller, 1995; Wallace, 1999). Likewise chat contexts such as stock
trading, romance, fan clubs, moderated chat (e.g., with experts or personalities), rural civic groups, science
tutoring for children, and feedback forums provide examples of the diversity of communities. Given this dynamic
spectrum, it is crucial that methodological approaches be developed to extract various demographic, institutional,
and attitudinal information. Eventually an analysis of a large array of genres will help us understand the nature
of chat interaction in a multitude of contexts and levels of measurement.
As human interaction continues to be increasingly integrated with communication technologies, quantitative
procedures for the analysis of the interactions need to be developed at a pace on par with the technological
development. Not only will analysis procedures help us understand the changing human condition, they will also
assist in the heuristic development of future communication technology. Researchers using these methods
should also incorporate qualitative research methods, for it is crucial to integrate these techniques to develop a
multi-method approach in determining the resultant epistemological changes new media will catalyze.
If we understand the revolutionary transformations caused by new media, we can anticipate and control them; but if we
continue in our self-induced subliminal trance, we will be their slaves. (1974 McLuhan interview, in Benedetti, 1996, p. 74)
Acknowledgements
All correspondence should be sent to the first author at [email protected]. A special thanks goes to Margaret
Corbit, the Cornell Theory Center, and the whole SciFair project for their generosity. Also, many thanks to
Melissa Carvalho for her time and effort, and Joe Walther for help with editing.
Footnotes
1. Semantic network analysis is similar to social network analysis in that it uses nodes (or actors, which can be discrete individual,
corporate, or social entities) and links (or relational ties, the defining feature of which is that they establish a connection of some
form between the nodes). In semantic network analysis, however, the words in a body of text are treated as the nodes and the
connection weights become the links.
2. Threads are chains of posts linked to each other, where each post contains a header that records information about the post.
3. Galileo theory offers laws of processes similar to common laws of physics. In Galileo mapping, items have not only location but
also equivalents to mass and velocity. Thus, a Galileo map is not a void space with occasional concepts in it; it is more akin
to Einsteinian space-time, in which forces exist between items in the space. It is in this sense that there is no empty space, but
rather areas of increased mass and their associated forces.
4. For a detailed description of clustering algorithms available in Catpac, see Catpac Users Manual (Woelfel & Woelfel, 1997a).
5. For example, Park (2002) had to translate Korean texts into English.
References
Barnett, G. A., Chon, B. S., & Rosen, D. (2001). The structure of international internet flows in cyberspace.
NETCOM (Network and Communication Studies), 15 (1-2), 61-80.
Benedetti, P., & Dehart, N. (Eds.) (1996). Forward through the rearview mirror: Reflections on and by Marshall
McLuhan. Prentice Hall: Toronto.
Corbit, M. (2000). Building virtual worlds for informal science learning (SciCentr and SciFair) in the Active Worlds
educational universe (AWEDU). Paper presented to the Workshops on Enabling Technologies: Infrastructure for
Collaborative Enterprises, at the National Institute of Standards and Technology. Retrieved January 5, 2003 from
http://www.tc.cornell.edu/~corbitm/corbit.nist.2000.htm.
Doerfel, M. L., & Barnett, G. A. (1996). The use of CATPAC for textual analysis. Cultural Anthropology Methods,
8, 4-7.
Freeman, C. A., & Barnett, G. A. (1994). An alternative approach to using interpretative theory to examine
corporate messages and organizational culture. In L. Thayer & G.A. Barnett (Eds.),
Organization<->communication: Emerging perspectives (pp. 60-73). Norwood, NJ: Ablex.
Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of Computer
Mediated Communication, 3 (1). Retrieved March 10, 2003 from
http://www.ascusc.org/jcmc/vol3/issue1/garton.html.
Hancock, J. T., & Dunham, P. J. (2001a). Language use in computer-mediated communication: The role of
coordination devices. Discourse Processes, 31, 91-110.
Hancock, J. T., & Dunham, P. J. (2001b). Impression formation in computer-mediated communication revisited:
An analysis of the breadth and intensity of impressions. Communication Research, 28, 325-347.
Johnson, S. (1997). Interface culture. San Francisco: Basic Books.
Krikorian, D., Lee, J., Chock, T. M., & Harms, C. (2000). Isn't that spatial?: Distance and communication in a 2-D
virtual environment. Journal of Computer Mediated Communication, 5(4). Retrieved January 8, 2003 from
http://www.ascusc.org/jcmc/vol5/issue4/krikorian.html.
Krikorian, D. (in press). The newsgroup death model: Internet groups as self-organizing. In G.A. Barnett, & R.
Houston, (Eds.) Progress in communication sciences Vol. 18, Self-organizing Systems. Greenwich, CT: Ablex.
Krikorian, D. & Kiyomiya, T. (2002). Bona fide groups as self-organizing systems: Applications to electronic
newsgroups. In L.R. Frey (Ed.), Group communication in context: Studies of bona fide groups (pp. 335-365).
New York: Lawrence Erlbaum.
Krikorian, D., & Ludwig, G. (2002, March). Groupscope: Data mining tools for online communication networks.
Paper presented at the 22nd annual Sunbelt Social Network Conference, New Orleans, LA.
Krikorian, D., & Ludwig, G. (2003, February). Advances in network analysis: Over-time visualization, dual-mode
relations, and clique detection methods. Paper presented at the 23rd annual Sunbelt Social Network
Conference, Cancun, Mexico.
Krikorian, D., & Lee, J. (2003). Explaining the social attraction-distance parabola: Same sex effects in online
stranger interaction. Working paper. Ithaca, NY: Cornell University.
McLuhan, M. (1969). Counterblast. New York: H.B. & W. Inc.
Miller, H. (1995). The presentation of self in electronic life: Goffman on the Internet. Paper presented at
Embodied Knowledge and Virtual Space, London, 1995. Retrieved March 1, 2003 from
http://ess.ntu.ac.uk/miller/cyberpsych/goffman.htm.
Paccagnella, L. (1997). Getting the seats of your pants dirty: Strategies for ethnographic research on virtual
communities. Journal of Computer Mediated Communication, 3 (1). Retrieved January 8, 2003 from
http://www.ascusc.org/jcmc/vol3/issue1/paccagnella.html.
Park, H. W. (2002). Examining the determinants of who is hyperlinked to whom: A survey of webmasters in
Korea. First Monday, 7 (11). Retrieved April 5, 2003 from http://www.firstmonday.dk/issues/issue7_11/.
Park, H. W., Barnett, G. A. & Kim, C. S. (2000). Political communication structure in Internet networks-- A
Korean case. Sunggok Journalism Review, 12, 67-90.
Park, H. W., Barnett, G. A., & Kim, C. S. (2001). Internet communication structure in Korean National Assembly:
A network analysis. Korean Journal of Journalism & Communication Studies, Special English Edition, 185-204.
Park, H. W., Barnett, G. A., & Nam, I. Y. (2002a). Hyperlink-affiliation network structure of top websites:
Examining affiliates with hyperlinks in Korea. Journal of the American Society for Information Science and
Technology, 53 (7), 1-10.
Park, H. W., Barnett, G. A., & Nam, I. Y. (2002b). Interorganizational hyperlink networks among websites in
South Korea. NETCOM: Network and Communication Studies, 16(3/4, Special issue on the Internet
development in Asia), 155-173.
Sack, W. (2000). Conversation map: An interface for very large scale conversations. Journal of Management
Information Systems, 17(3), 73-92.
Salisbury, J. G. T. (2001). Using neural networks to assess corporate image. In M. West (Ed.), Progress in
communication sciences, Vol 17: Applications of computer content analysis (pp.65-86). Westport, CT: Ablex.
Smith, M., & Fiore, A. (2001). Visualization components for persistent conversations. In ACM SIG CHI 2001.
Retrieved February 10, 2003 from http://www.research.microsoft.com/~masmith/Visualization Components for
Persistent Conversations - Final.doc.
Smith, M., Farnham, S., & Drucker, S. (2000). The social life of small graphical chat spaces. In ACM SIG CHI
2000. Retrieved February 10, 2003 from http://research.microsoft.com/~masmith/The Social Life of Small
Graphical Chats.doc.
Smith, M. (1999). Invisible crowds in cyberspace: Measuring and mapping the social structure of USENET. In M.
Smith & P. Kollock (Eds.), Communities in cyberspace: Perspectives on new forms of social organization.
London: Routledge Press.
Wallace, P. (1999). The psychology of the Internet. Cambridge, U.K: Cambridge University Press.
Walther, J. B., & D'Addario, K. P. (2001). The impacts of emoticons on message interpretation in
computer-mediated communication. Social Science Computer Review, 19, 323-345.
Wellman, B., & Gulia, M. (1999). Net surfers don't ride alone: Virtual communities as communities. In M. Smith &
P. Kollock (Eds.), Communities in cyberspace (pp. 331-367). London: Routledge Press.
Woelfel, J. (1993). Artificial neural networks in policy research: A current assessment. Journal of
Communication, 43(1), 63-80.
Woelfel, J., & Woelfel, J. (1997a). Catpac version 2.0, Galileo Corporation.
Woelfel, J., & Woelfel, J. (1997b). ThoughtView version 2.0, Galileo Corporation.
About the Authors
Devan Rosen is currently a doctoral student at Cornell University in the Department of Communication, with a
focus on Communication Technology and Networks. He received his B.A. at the University at Buffalo,
Department of Communication, with a focus in Organizational and Intercultural Communication. He then worked
in industry before returning to the University at Buffalo to receive his M.A. from the Department of
Communication, with a focus in Social Network Analysis and Organizational Communication. His research foci
range from the self-organization and emergence of human interaction, to the use of social network measures
and neural network applications for the longitudinal study of online communities.
Address: Department of Communication, Cornell University, 336 Kennedy Hall, Ithaca, NY 14853-4203.
Joseph Woelfel received his Bachelor's degree from Canisius College, and his Master's and Ph.D. from the
University of Wisconsin at Madison. He has served on the faculty of the University of Illinois at
Urbana-Champaign, Michigan State University, and the State University of New York at Albany, where he was Professor
of Communication and Director of Research and Founding Fellow of the Institute for the Study of Information
Science. He is currently Professor and former Chair of the Department of Communication at the University at
Buffalo. Professor Woelfel was a Senior Fellow at the East West Center in Honolulu, a Fulbright scholar in
Yugoslavia, and Senior Fellow at the Rockefeller Institute of Government at the State University of New York. Dr.
Woelfel is the author of numerous books and articles, including The Measurement of Communication Processes:
Galileo Theory and Method, with E. L. Fink. He is a principal developer of extensive computer software, including
the suite of Galileo programs, and CATPAC, a text analysis program utilizing artificial neural technology. Dr.
Woelfel has also served as president of Terra Research and Computing, and is currently president of The
Galileo Company. Dr. Woelfel's clients include many of the Fortune Top 50, and his software is widely used in
both academic and business settings worldwide. Current biography can be found in Who's Who in America and
Who's Who in The World.
Address: Department of Communication, State University of New York at Buffalo, 528 Baldy Hall, Buffalo, NY
14260-1020.
Dean H. Krikorian (Ph.D., University of California, Santa Barbara) is an Assistant Professor in the
Department of Communication at Cornell University. His research examines organizational communication,
small group decision-making processes, and the Internet. He is director of the Cornell Communication Network
Laboratory, which examines network communication patterns, particularly in online environments. He is currently
developing network analytic software for Internet groups.
Address: Department of Communication, Cornell University, 336 Kennedy Hall, Ithaca, NY 14853-4203.
George A. Barnett (Ph.D., Michigan State University, 1976) is currently Chair and Professor of Communication at
the State University of New York at Buffalo. Dr. Barnett has also taught at Rensselaer Polytechnic Institute and
the University of Texas at Austin. He has written over 100 books, articles and conference papers on such topics
as organizational, mass, international, intercultural, political, technical and scientific communication, as well as
marketing communication, public relations and the diffusion of innovations. He has edited the Handbook of
Organizational Communication (Ablex, Norwood NJ, 1988) and is currently editor of Organization <-->
Communication: Emerging Perspectives and Progress in Communication Science. The goal of his current
research is to describe the patterns of use or structure of international communication, in general, and
telecommunications (telephone and computer-based communication, namely the World Wide Web) in particular. He also
has an interest in the sociology of knowledge, especially as it applies to the field of communication. Currently,
he is involved in a study that examines the absolute and distributed information in the field along with its
applications to other social organizations.
Address: Department of Communication, State University of New York at Buffalo, 528 Baldy Hall, Buffalo, NY
14260-1020.
©Copyright 2003 Journal of Computer-Mediated Communication