A Methodological Framework for Socio-Cognitive
Analyses of Collaborative Design of Open Source
Software*
Warren Sack (1), Françoise Détienne (2), Jean-Marie Burkhardt (2), Flore Barcellini (2),
Nicolas Ducheneaut (3), Dilan Mahendran (4)
(1) University of California, Santa Cruz, USA
(2) INRIA, Eiffel research group, Domaine de Voluceau, Rocquencourt, BP 105,
78153 Le Chesnay, France
(3) Palo Alto Research Center (PARC), 3333 Coyote Hill Road,
Palo Alto, CA 94304, USA
(4) University of California, Berkeley, CA 94720-2316, USA
The Open Source Software (OSS) movement has received enormous attention in the last
several years. It is often characterized as a fundamentally new way to develop software that
poses a serious challenge to the commercial software business that dominates most software
markets today (Raymond, 2001). It is claimed, for example, that defects are found and fixed
very quickly because there are “many eyeballs looking for the problems.” Code is written
with more care and creativity, because developers are working only on things for which they
have a real passion. All these potential advantages are said to emerge from the following
characteristics of work and collaboration inside OSS projects:
• OSS systems are built by potentially large numbers of volunteers.
• Work is not assigned; people undertake the work they choose to undertake.
• There is no explicit system-level design, or even detailed design.
• There is no project plan, schedule, or list of deliverables.
* Presented at the Workshop on Distributed Collective Practices, ACM CSCW (Computer-Supported
Cooperative Work), Chicago, November 2004.
http://tech-web-n2.utt.fr/cscw04/
OSS represents an extreme but successful case of geographically distributed development: co-designers work in arbitrary locations, rarely or never meet face-to-face, and coordinate their
design activity almost exclusively through three information spaces: the implementation space
(the CVS code repository), the documentation space and the discussion space (Ducheneaut, 2003; Gasser et
al., 2003; Mockus et al., 2002).
1. Objective
The objective of our research is to understand the specific hybrid weaving accomplished by
the actors of the design process. The design process involves various types of actors: people
with prescribed roles, and also elements of the three information spaces. This paper
presents the methodological framework we have constructed to analyse, from a socio-cognitive
perspective, the links that emerge between these elements.
There is a wide variety of ongoing Open Source Software (OSS) projects. We chose to
work on the design processes of an OSS project devoted to the development of a
programming language called Python (see http://www.python.org).
The Python project is particularly interesting because the designers of Python engage in a
specific design process called Python Enhancement Proposals (PEPs), which is similar to two
design processes used in conventional software projects: RFCs (Requests for Comments) and
technical review meetings. The negotiation, refinement and editing of PEPs are akin to the
RFC process, which has been practiced for decades to define Internet standards
(especially by the Internet Engineering Task Force, IETF). PEPs are also
comparable to technical review meetings (D’Astous et al., in press) as practiced in many
corporate and governmental settings.
The Python project is also interesting because the PEP design process can be seen as
distributed through three information spaces: the implementation space, the documentation
space and the discussion space. It thus offers us interesting data to analyse the links
constructed between these three spaces and people involved in the design process. Our object
of study is the hybrid weaving accomplished by the actors involved in the negotiation,
elaboration, development and implementation of the PEPs.
2. Information spaces and design process in Python
PEPs are the main mechanisms for proposing new features, for collecting community input on
an issue, and for documenting the design decisions that have gone into Python. A PEP is a
design document providing information to the Python community, or describing a new feature
for Python. It should provide a concise technical specification of the feature, a rationale for
the feature and a reference implementation.
Each PEP has a champion (the author of the PEP). The PEP champion should first collect
community feedback by posting the idea to the comp.lang.python newsgroup (which is also available as a mailing list). The PEP champion then emails a proposed title and a draft of the PEP to the PEP editors, who assign
PEP numbers and change PEP status. If the PEP
editor approves, he will assign the PEP a number and give it the status "Draft".
The author of the PEP is then responsible for posting the PEP to the community forums
(the relevant Python mailing lists), where the PEP is discussed. Finally,
it is Guido (the project leader, known as the Benevolent Dictator For Life, or BDFL) and his chosen
consultants who may accept or reject a PEP or send it back to the author(s) for revision. Once
a PEP has been accepted, the reference implementation must be completed. When the
reference implementation is complete and accepted by Guido, the status is changed to
"Final", and the implementation can take place. A PEP can also be assigned the status "Deferred"
or "Rejected", or be replaced by a different PEP.
PEP workflow is as follows:
    Draft -> Accepted -> Final -> Replaced
      ^
      +----> Rejected
      v
    Deferred
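The status diagram above can be read as a small state machine. The following Python sketch encodes one reading of it as a transition table and checks whether a sequence of statuses is a legal history. The names `ALLOWED_TRANSITIONS` and `is_valid_history` are illustrative, and treating the arrow between Draft and Deferred as two-way is an assumption, not something the text states.

```python
# Hypothetical encoding of the PEP status diagram as a transition table.
# Assumption: the arrow between Draft and Deferred is read as two-way,
# i.e. a deferred PEP may later return to Draft.
ALLOWED_TRANSITIONS = {
    "Draft": {"Accepted", "Rejected", "Deferred"},
    "Accepted": {"Final"},
    "Final": {"Replaced"},
    "Deferred": {"Draft"},
    "Rejected": set(),
    "Replaced": set(),
}


def is_valid_history(statuses):
    """Return True if the statuses start at Draft and every step is an allowed transition."""
    if not statuses or statuses[0] != "Draft":
        return False  # a PEP enters the workflow with the status "Draft"
    return all(
        nxt in ALLOWED_TRANSITIONS.get(cur, set())
        for cur, nxt in zip(statuses, statuses[1:])
    )
```

For instance, `["Draft", "Accepted", "Final"]` is accepted, while `["Draft", "Final"]` is rejected because it skips acceptance.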
We analyse the PEPs design process as proceeding through three information spaces: the
discussion space, the documentation space and the implementation space.
The discussion space is composed of several newsgroups and mailing lists. Most of the
newsgroups are also available as mailing lists for participants who don't have Usenet access
or who prefer to receive messages as e-mail.
The comp.lang.python newsgroup is about developing with Python, not about development of
the Python interpreter itself. PEP ideas are discussed here before getting an official PEP
status (or not). The python-dev mailing list is for work on developing Python: fixing bugs and
adding new features to Python itself. Practically everyone with CVS write privileges is on
python-dev, and first drafts of PEPs are posted here for review and rewriting before their
public appearance on python-announce. The comp.lang.python.announce newsgroup is a
forum for Python-related announcements. New modules and programs are announced, and
PEPs are posted to get comments from the community.
Special Interest Groups (SIGs) are smaller communities focused on a particular topic or
application, such as databases. Every SIG has its own mailing list. There are other mailing lists,
such as the patches mailing list and the python-help mailing list.
In the documentation space, PEP drafts are maintained as text files under CVS control.
Archives of discussions are kept on python.org, sourceforge.net and gmane.org. Messages can be
viewed according to several orderings: by time, by topic (e.g., PEP number) and by thread (reply-to relations).
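Viewing archived messages by topic can be approximated by grouping them on the PEP number mentioned in their subject lines. The sketch below is hypothetical: the function name `group_by_pep` is invented for illustration, and it assumes subjects mention PEPs in a form such as "PEP 279" or "pep279", which real archives do not guarantee.

```python
import re
from collections import defaultdict

# Assumption: a message's topic is recognised by a "PEP <number>" mention in its subject.
PEP_IN_SUBJECT = re.compile(r"\bPEP\s*(\d+)\b", re.IGNORECASE)


def group_by_pep(messages):
    """Group (message_id, subject) pairs by the PEP numbers mentioned in each subject."""
    groups = defaultdict(list)
    for msg_id, subject in messages:
        # A set avoids listing a message twice if its subject repeats the same PEP.
        for number in {int(n) for n in PEP_IN_SUBJECT.findall(subject)}:
            groups[number].append(msg_id)
    return dict(groups)
```

Subject-based grouping is cheap but coarse; it complements, rather than replaces, grouping by thread.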
In the implementation space, PEPs are implemented. The CVS (Concurrent
Versions System) tool is used to manage changes within the source code tree. CVS stores the current
version of each piece of source code, together with a record of all the changes made to it since
earlier versions and of who made them. While read access to the
CVS repository is free, CVS write privileges are given only to a subset of the Python community
(the developers).
Figure 1 shows an overview of the Python PEP process with links to the three information spaces.
Figure 1: Overview of the PEP (Python Enhancement Proposal) process. Once a pre-PEP is accepted, it becomes a PEP, which is discussed in
the “discussion space”. Archives of discussions, decisions regarding a PEP and the different versions of a PEP are kept in the documentation
space. The status of PEPs and information about them are thus distributed across these two spaces. Even when a PEP is accepted, it has to be reviewed
by the BDFL, and this review can reopen discussion of the PEP. Finally, a PEP can produce a new piece of code (implementation space).
3. A socio-cognitive methodological framework
Following the conventions of actor-network analysis in the field of science and technology
studies we hereafter refer to the “elements” of an OSS project as either “actors” or “actants”
and their interrelationships as a “network.” Thus, people, code archives, messages, threads
and PEP documents are all actors or actants and their links and relationships of cohesion
constitute an actor-network. Concurrently, we refer to the process of PEP development, and
OSS development in general, as a process of hybridization – a collective process of knitting
together in a cohesive manner the diverse elements of an OSS project.
While it is ultimately necessary to understand the emerging or accomplished coherence
attained in a PEP, we have found it methodologically more tractable to analyze the textual,
material – i.e., literal – signs of coherence in the PEP process. Following the linguistic
(specifically systemic-functional) convention, we call these literal, textual signs of coherence
cohesion. Consequently, our cognitive analysis of the PEP process can be understood as an
investigation into the emerging cohesion between the many textual elements of an OSS
project.
Specifically, we identified the following important textual elements, and developed an XML tagging
schema to define them, their parts and the relationships between them: (a)
the published PEP document (usually a webpage of a very specific format that defines the
final consensus); (b) email messages exchanged during the negotiation and development of a
PEP; (c) threads, i.e., sequences of email message replies; and
(d) code archive and editing (i.e., CVS) records.
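The XML tagging schema itself is not reproduced here. As an illustration only, the sketch below shows one hypothetical way to encode an email message together with its reply and quotation links, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`message`, `reply`, `quote`, `ref`) and the function `tag_message` are invented for this example.

```python
import xml.etree.ElementTree as ET


def tag_message(msg_id, author, date, in_reply_to=None, quoted_ids=()):
    """Build a hypothetical <message> element recording reply and quotation links."""
    msg = ET.Element("message", id=msg_id, author=author, date=date)
    if in_reply_to is not None:
        # Threading: a message replies to at most one earlier message.
        ET.SubElement(msg, "reply", ref=in_reply_to)
    for ref in quoted_ids:
        # Quotation: a message may quote any number of earlier messages.
        ET.SubElement(msg, "quote", ref=ref)
    return msg
```

Keeping `reply` and `quote` as separate elements preserves the distinction between the two kinds of cohesion discussed later in the paper; PEP documents and CVS records could be given analogous elements.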
We have examined OSS design as a set of both cognitive and social processes.
Methodologically we have combined qualitative and quantitative approaches including
ethnography, discourse analysis, social network analysis, and actor-network analysis. Our
work has also entailed the design and implementation of various computational tools for the
analysis of email and code archives and the testing and verification of these tools.
We have employed ethnographic methods to drive a needs assessment for the design of
algorithms and interfaces for analyzing the archives of OSS projects. Specifically, we have
built systems for analyzing email-based discussions and CVS code archives (e.g., the work of
Ducheneaut shown in Figure 3). Consequently, some of our ethnographic observations have
been embodied in software useful for further examination of OSS development efforts. And,
some of our qualitative and quantitative work has allowed us to debug, redesign and evaluate
our analysis software. For example, the quotation analysis shown in the Appendix was done by
hand and has led us to redesign our threading and quotation analysis software.
Depending upon which actors we focus on, our results can be understood as a social analysis
or as a cognitive analysis. Four complementary views of the socio-technical interaction
network have been constructed:
• A view on how power is distributed across three information spaces (the discussion,
implementation and documentation spaces) shows the social and governance structures in
the design project;
• A view on the evolution of links between people and two information spaces (the
discussion and implementation spaces) shows the progressive integration of people into
the socio-technical network;
• A view on the dynamics in the discussion space and their links with the social structure
shows how the design activity reflects the social and organizational structure of the project
and how people influence the design;
• A view on the links between the code space (architecture) and the social structure shows
how the technical structure influences the social structure of the project.
3.1 Social and governance structures
Much of the focus of our work has been on understanding the diversity, interrelationships, and
dimensions of the social and organizational roles played by participants in the Python project
and, specifically, in the PEP process (cf., Gacek and Arief, 2004). Some of these roles are
explicit; others are implicit. For example, the founder of the project, Guido van Rossum, is
referred to playfully – but explicitly – as the Python Project’s BDFL, “Benevolent Dictator for
Life.” Others in the project have explicit roles insofar as, for instance, they are assigned to
lead the development of, or to administer, specific parts of the project. Other roles are
implicit: question-answerers in online discussions, novices seeking help, etc. We have done a
long-term ethnography of the Python project (Mahendran, 2002) and roughly sketched the
interrelationships between roles in the Python project using the hierarchy shown in Figure 2.
Figure 2: Sociotechnical stratification of roles in the Python project.
Figure 2 reveals a very conventional
organizational structure: one leader (van Rossum) has
control over the project; directly below him, in
organizational power, are a few people who work
directly with him and are known as the PythonLabs
core team, below them are members of a particular
mailing list (Python-dev) who also have the power to
directly change the code of the project, below them are
advanced members who can comment on the project
but cannot change the code, and newbies (or novices)
exist on the bottom rung of the organizational
hierarchy. From this description one can understand
that power in the Python project is distributed across
the elements of the project which might generally be
distinguished as three different “spaces”: (1)
discussion spaces; (2) implementation or coding
spaces; and, (3) comment or documentation spaces.
Project participants with more power can contribute to
all of the spaces. Other participants with limited power
have, literally, certain aspects of the project that are
“off limits” to them. For example, not everyone can
make changes to the code of the project.
Perhaps one of the most striking observations of this ethnography of Python project members
concerns how they explain and talk about their roles in the project using a vocabulary of
pre-industrial, craft- or artisan-based roles. The notion of the artisan is not uncommon in other
free software communities. In the Python community, we observed that master/apprentice work
relationships were quite common. The guild-like structure of Python, with senior developers
handing off programming projects to junior developers, is striking and sets free software
development apart from commercial software ventures. Many of the social relations can be
reduced to the trope of master and apprentice. In short, one of our results is rather
paradoxical: the hypothesized “new” and “different” structure of OSS development relies on
very old ideas of production based on strict, hierarchical models of production and
social/organizational roles. These old ideas are apparent in the Python project both when
participants talk about their own roles in the project and when detailed quantitative studies of
work activities are carried out.
3.2 Integration of people into the actor-network
Following this ethnography of the Python project, we carried out quantitative studies
analyzing the observable cohesion between the various elements or actors of the project.
These studies of, for example, the code and email archives of the project reflect and further
substantiate the observed social and governance structures discovered in the ethnographic
work.
Our analysis allows us to define and follow participants’ roles and changing status within the
PEP process. We employ some standard social network metrics (e.g., measurements of
centrality and connectedness) extended to allow the inclusion of non-human actants in the
network (e.g., email messages and pieces of code appear as nodes in the following actor-network). The following is taken from an automatic analysis of the corpus of messages
exchanged in the Python OSS project. The automatic analysis (see Figure 3) was done using
tools we had developed (Ducheneaut, 2003). The analysis shows how the participant “Greg”
starts in January 2002 with a proposal to extend the Python language. As an outsider, he needs
to work his way into the center of the social and technical network of the project before his
proposal has any chance of success. He managed to work his way from outsider to insider in
about 10 months by contributing both to the ongoing discussion and by writing code for
the project (cf., the work of Madey et al., 2004).
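The kind of metric involved can be sketched as follows. People and non-human actants (messages, source files) become nodes of a single undirected graph, and closeness centrality is computed by breadth-first search. This is a minimal illustration, not the tool described above (Ducheneaut, 2003): the node names and the two snapshots are hypothetical, and closeness is only one of several centrality measures that could be used.

```python
from collections import defaultdict, deque


def build_network(links):
    """Build an undirected actor-network from (person, artifact) links.

    Artifacts are non-human actants such as email messages ("msg:...")
    or source files ("file:..."); people and artifacts are both nodes.
    """
    adjacency = defaultdict(set)
    for person, artifact in links:
        adjacency[person].add(artifact)
        adjacency[artifact].add(person)
    return adjacency


def closeness(adjacency, node):
    """Closeness centrality via breadth-first search.

    Uses the Wasserman-Faust scaling (reachable / (n - 1)) so that nodes in
    small, disconnected components are not over-rated.
    """
    if node not in adjacency or len(adjacency) < 2:
        return 0.0
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    reachable = len(distance) - 1
    if reachable == 0:
        return 0.0
    total = sum(distance.values())
    return (reachable / total) * (reachable / (len(adjacency) - 1))


# Hypothetical snapshots of one newcomer before and after integration.
core = [("alice", "msg:10"), ("bob", "msg:10"),
        ("bob", "file:ceval.c"), ("alice", "file:ceval.c")]
early = build_network(core + [("newcomer", "msg:1")])
late = build_network(core + [("newcomer", "msg:1"),
                             ("newcomer", "msg:10"),
                             ("newcomer", "file:ceval.c")])
```

With these toy snapshots, the newcomer's closeness rises from 0.2 to about 0.71 once they post in the core thread and commit to a shared file; tracking such values over successive time windows gives a numerical counterpart to the trajectory pictured in Figure 3.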
Figure 3: Map of the progressive
integration of a software designer into
the social (i.e., online discussion) and
technical (i.e., code) networks of
Python, an Open Source Software
project.
The round, black nodes
indicate people; the square, blue nodes
indicate code. Thus, the integrated
network of people and code is a
sociotechnical network (i.e., an actor
network), not simply a social network.
3.3 Organizational structure and citation
In the discussion space, we analyse the emerging cohesion of the email messages themselves
through a cognitive analysis, closer to analogous work in psychology and in linguistic
discourse analysis. Our research question concerned the dynamics in the discussion space and their
links with the social structure.
A central aspect of coherence is how a message connects to previous messages in a discourse
context. In face-to-face conversation, coherence (how a turn connects to previous turns in a
dialogue) can be seen as actively constructed by participants through turn-taking. In online
conversations, a message can be separated both in time and place from the message it
responds to. Thus, according to a (time-based) sequential model of online conversation
(messages are posted in the order received by the system), turn adjacency is disrupted,
i.e., relevant responses do not occur temporally adjacent to initiating turns (Herring, 1999):
this is a violation of sequential coherence (the pragmatic principles of adjacency and relevance).
Prior work on online discussions (e.g., Venolia & Neustaedter, 2003; Popolov et al., 2000)
assumes that the conversational structure is determined by “threading” (i.e., reply relations):
a message may either start a new conversation or be a reply to a single prior message. This
representation is most useful for analysing the interactional roles of proposers and
repliers in turn-taking, and for obtaining a picture of the centrality (versus periphery) of
participants in the social network (central participants being those whose posts tend to
receive the most responses). However, it is less suited to analysing the referential coherence
of the conversation.
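The threading model described above, in which each message either starts a conversation or replies to exactly one earlier message, yields a forest of reply trees. A minimal sketch, assuming messages are given as hypothetical `(msg_id, author, parent_id)` triples with `parent_id` set to `None` for a new conversation, recovers the conversation roots and counts the direct replies each author's posts received.

```python
from collections import Counter


def reply_structure(messages):
    """messages: (msg_id, author, parent_id) triples; parent_id is None for a new conversation.

    Returns (roots, replies_received): the ids that start conversations, and a
    Counter giving, per author, how many direct replies their posts attracted.
    """
    author_of = {msg_id: author for msg_id, author, _parent in messages}
    roots = [msg_id for msg_id, _author, parent in messages if parent is None]
    replies_received = Counter(
        author_of[parent]
        for _msg_id, _author, parent in messages
        if parent is not None and parent in author_of  # skip replies to unarchived messages
    )
    return roots, replies_received
```

Because each message has at most one parent, this structure cannot record a message that responds to several earlier ones, which is the limitation that motivates the quotation-based view.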
We examine an alternative view based on quoting or citation (Yee, 2002) and on content
analysis. On the basis of content analysis, Eklundh and Rodriguez (2004) distinguish
between several types of conversational linking strategies in online conversations around
documents:
• Explicit references: to a message number (in fact, never used), to an author's name (e.g.,
“even though Fred may be right”), or to a subject, either by quoting or paraphrasing;
• Implicit references: deictic or anaphoric reference to previous messages (e.g., “as you
mention”), conversational sequencing (question or response move), topic relatedness;
• External references: to other documents, to group experience.
Quoting is seen as a linguistic strategy used by participants to connect a comment to previous
discourse contributions. Preliminary studies on the practice of quoting in online conversations
(Herring, 1999; Eklundh & Rodriguez, 2004) show that it creates the illusion of adjacency: it
incorporates portions of two turns within a single message. It maintains context, and later
messages can retrace the history of the conversation.
We began this analysis manually, on a single corpus; the next step will be to analyse further
corpora with software support. We selected a corpus of 126 email messages posted to the
main Python development mailing list from March 28th to April 8th, 2002 by 22 developers,
including 6 administrators. (This corpus corresponds to the entire discussion of PEP 279.)
We distinguish two types of cohesion that occur between the messages: (1) Reply: Email
messages can be explicit responses to previously-posted messages (this is usually visible via a
subject: header shared by both messages); and, (2) Quotation: Email messages frequently
quote from previously-posted messages (quotations usually appear as indented or prefixed
lines -- e.g., lines starting like this: >>> -- in the citing message).
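Unlike a reply, a message can quote several earlier messages, so quotation yields a graph rather than a tree. The sketch below builds "is quoted by" edges under simplifying assumptions: quoted lines are marked by one or more leading `>` characters and copied verbatim, and each quoted line is attributed to the most recent earlier message in which it appears unquoted. The function names are hypothetical, and real archives, with re-wrapped or edited quotations, would need fuzzier matching.

```python
def quoted_lines(body):
    """Lines of a message body marked as quotations, with leading '>' markers stripped."""
    result = []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            text = stripped.lstrip("> ").strip()  # handles nested markers such as ">>>"
            if text:
                result.append(text)
    return result


def original_lines(body):
    """Non-empty lines written by the message's own author (i.e. not quoted)."""
    return {
        line.strip()
        for line in body.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    }


def citation_edges(messages):
    """messages: (msg_id, body) pairs in posting order.

    Returns a set of (source, citing) pairs meaning "source is quoted by citing".
    """
    edges = set()
    for i, (citing_id, body) in enumerate(messages):
        for text in quoted_lines(body):
            # Search backwards for the most recent earlier message that wrote this line.
            for source_id, source_body in reversed(messages[:i]):
                if text in original_lines(source_body):
                    edges.add((source_id, citing_id))
                    break
    return edges
```

Attributing nested quotations (">>") to the message that originally wrote the line, rather than to the intermediate message, is a design choice; a different convention would yield a different graph.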
From a close analysis of the discussion organized by quotation, we find that not all posters
participated equally in the PEP discussion. When we quantitatively distinguish the high-frequency posters from the low-frequency posters (i.e., those who contributed many versus
those who contributed few messages), we can see that high-frequency posters are mostly
people who have assigned, administrative positions in the Python project. Moreover, those
posters who integrated either no (i.e., zero) quotes or multiple quotes from prior messages
into their responses tended to be administrators; those who used single quotes in their replies
tended to be developers, not administrators. In short, this is a simple example of how analysis
of the activity in the OSS project reflects the social and organizational role structure of the
project (and vice versa).
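The two comparisons described above (frequency of posting, and zero, single or multiple quotation) reduce to simple counts. In this hypothetical sketch, each message is summarized as an `(author, n_quoted_sources)` pair and roles come from a separate mapping; the tabulation is done at the message level, and the frequency threshold is an analyst's choice, not a value taken from the study.

```python
from collections import Counter


def quoting_profile(n_sources):
    """Bucket a message by how many earlier messages it quotes."""
    if n_sources == 0:
        return "none"
    return "single" if n_sources == 1 else "multiple"


def high_frequency_posters(messages, threshold):
    """Authors who posted at least `threshold` messages (threshold is hypothetical)."""
    counts = Counter(author for author, _n in messages)
    return {author for author, n in counts.items() if n >= threshold}


def role_by_profile(messages, role_of):
    """messages: (author, n_quoted_sources) pairs; role_of maps author -> role.

    Returns a Counter keyed by (role, quoting profile).
    """
    return Counter((role_of[author], quoting_profile(n)) for author, n in messages)
```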
Furthermore, the patterns of quotation (sequential versus branching structure) tend to be linked
to the social position of the poster in the Python project (see Appendix). For
example, we note that (1) the branching structure is generally initiated by a message posted by
either Guido or the PEP’s champion; and (2) the sequential structure tends to show an alternation
between administrators' and developers' posts. However, in thematic drift (as in P8), this
is not observed, as Guido and the PEP’s champion no longer participate (except when
Guido stops the discussion). This analysis shows again the links between the social structure
and elements in the discussion space, and how these links shape influence in the design process.
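Given the "is quoted by" edges of a citation graph, sequential and branching structures can be told apart by how many later messages quote a given message. A minimal sketch, assuming edges are `(source, citing)` pairs and that authors and roles are available as mappings; the function names are hypothetical.

```python
from collections import defaultdict


def classify_structure(edges):
    """edges: (source, citing) pairs meaning "source is quoted by citing".

    Returns (branching, sequential): messages quoted by two or more later
    messages open a branch; messages quoted by exactly one continue a chain.
    """
    quoted_by = defaultdict(set)
    for source, citing in edges:
        quoted_by[source].add(citing)
    branching = {m for m, citers in quoted_by.items() if len(citers) >= 2}
    sequential = {m for m, citers in quoted_by.items() if len(citers) == 1}
    return branching, sequential


def initiator_roles(branching, author_of, role_of):
    """Map each branch-opening message to the role of its author."""
    return {m: role_of[author_of[m]] for m in branching}
```

For example, the edges from message 0 to messages 1, 22 and 68 (the Appendix example) make message 0 a branch point; adding a hypothetical edge from 1 to 2 makes message 1 part of a sequential chain.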
A more fine-grained content analysis is in progress. It categorizes messages according to a
coding scheme inspired by our own previous work on collaborative design (Détienne et al.,
2003; Détienne et al., in press). We distinguish between: (1) Theme: the problem addressed; and (2)
Activity: proposition of (alternative) solutions (Prop); agreement/disagreement (with or
without arguments); group regulation; problem setting; synthesis; clarification; explicit
decision. Our objective is to analyse patterns of activities with respect to the quoting structure
and the participants' roles.
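One way to make such a coding scheme operational is to represent the activity categories as an enumeration and tally them by participant role or by theme. The sketch below is illustrative only: it splits agreement and disagreement into two categories, omits the with/without-arguments distinction, and assumes the coded triples come from manual coding.

```python
from collections import Counter
from enum import Enum


class Activity(Enum):
    """Activity categories of the coding scheme (agreement/disagreement split in two here)."""
    PROPOSITION = "proposition of (alternative) solution"
    AGREEMENT = "agreement"
    DISAGREEMENT = "disagreement"
    GROUP_REGULATION = "group regulation"
    PROBLEM_SETTING = "problem setting"
    SYNTHESIS = "synthesis"
    CLARIFICATION = "clarification"
    EXPLICIT_DECISION = "explicit decision"


def activity_by_role(coded_messages):
    """coded_messages: (role, theme, Activity) triples. Counter keyed by (role, Activity)."""
    return Counter((role, activity) for role, _theme, activity in coded_messages)


def activity_by_theme(coded_messages):
    """Counter keyed by (theme, Activity), to compare activities across design problems."""
    return Counter((theme, activity) for _role, theme, activity in coded_messages)
```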
3.4 Social structure and technical structure
Within the field of software engineering, it has been noted that, for any large software system,
one can map out an "ownership architecture" (cf., Bowman and Holt, 1998). Specifically, one
can chart out who "owns" -- i.e., who makes changes and extensions to -- which modules of
the system. General software engineering implications of such “architectures” include, for
instance, "Conway's Law": the social structure of the project (e.g., who manages whom, who
communicates with whom, etc.) has a direct influence on the structure of the software itself
(e.g., its division into modules). Conway’s law (Herbsleb and Grinter, 1999) was the first
explicit recognition that communication patterns leave an indelible mark upon the product
built. Most OSS projects produce conventional software products (e.g., programming
languages, operating systems, network clients and servers, etc.).
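An ownership architecture in this sense can be approximated from version-control records by counting who changes which module. A minimal sketch, assuming commit records have already been extracted from the CVS log as `(author, path)` pairs and that a module is simply the top-level directory of a path (both assumptions made for this example); the owner of a module is taken to be its most frequent committer.

```python
from collections import Counter, defaultdict


def ownership(commits):
    """commits: (author, path) pairs, e.g. extracted from CVS log records.

    Returns {module: (owner, share)}, where module is the top-level directory,
    owner is the author with the most commits to it, and share is that
    author's fraction of the module's commits.
    """
    per_module = defaultdict(Counter)
    for author, path in commits:
        module = path.split("/", 1)[0]
        per_module[module][author] += 1
    result = {}
    for module, counts in per_module.items():
        owner, n = counts.most_common(1)[0]
        result[module] = (owner, n / sum(counts.values()))
    return result
```

Comparing such a map with the role hierarchy of Figure 2 is one possible way to look for the "inverse Conway's Law" effect discussed below.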
We are exploring the possible influences of an "inverse Conway's Law" (Sack et al., 2003) that
could explain how the “miracle” of organization of OSS development is not at all miraculous:
the technical structure of the software might directly influence the social structure of the
project. It may be the case that OSS development methods work only because the "parceling
out" of the work is well known to most computer scientists even before the start of the
project. Furthermore, this “parceling out” seems to entail the reinvention of some very
old (rather than new) work roles and governance/administrative structures (e.g., a strict, top-down
hierarchy and a reenactment of the artisan guild roles of master, apprentice, etc., as
described in Section 3.1). We need to complete further work to test these hypotheses concerning
the possible, surprising, “conservative” nature of the organization and structure of OSS
development processes.
4. Discussion
While our work has uncovered some interesting possible similarities and differences between
OSS design and conventional software design, we feel one of our largest accomplishments
has simply been to develop a framework (a variant of actor-network analysis) for the analysis
of OSS development that integrates social and cognitive dimensions. Note that our framework
might be compared with analogous work currently under development in the UC system (cf.,
Scacchi, 2004). Our methodology has also resulted in the integration of qualitative and
quantitative work and has engendered the development of automatic tools for the analysis of
OSS project archives. Our joint work has given us a methodological framework and a set of
practical software tools with which to continue to expand and deepen our research in this area.
References
Bowman, I. T., & Holt, R. C. (1998) Software architecture recovery using
Conway's law. Proceedings of the 1998 Conference of the Centre for Advanced Studies
on Collaborative Research, November 1998.
D’Astous, P., Détienne, F., Robillard, P. N., & Visser, W. (in press) Changing our view on
design evaluation meetings methodology: a study of software technical evaluation
meetings. Design Studies.
Détienne, F., Burkhardt, J-M., & Visser, W. (2003) Cognitive effort in collective software
design: methodological perspectives in cognitive ergonomics. Proceedings of the 2nd
Workshop in the Workshop Series on Empirical Software Engineering "The Future of
Empirical Studies in Software Engineering", pages 17-25, Monte Porzio Catone (Rome,
Italy), 29 September, 2003.
Détienne, F., Martin, G., & Lavigne, E. (in press) Viewpoints in co-design: a field study in
concurrent engineering. Design Studies.
Ducheneaut, N. (2003) The reproduction of Open Source Software programming
communities. Ph.D. Dissertation, School of Information Management and Systems, UC
Berkeley, May 2003.
Eklundh, K. S., & Rodriguez, H. (2004) Coherence and interactivity in text-based group
discussions around web documents. Proceedings of the 37th Hawaii International
Conference on System Sciences.
Gacek, C., & Arief, B. (2004) The Many Meanings of Open Source, IEEE Software, 21(1),
34-40, January/February 2004.
Gasser, L., Scacchi, W., Ripoche, G., & Penne, B. (2003) Understanding Continuous Design
in F/OSS Projects. 16th International Conference on Software Engineering & its
Applications (ICSSEA-03), December, 2003, Paris, France.
Herbsleb, J. D., & Mockus, A. (2003) An empirical study of speed and communication in
globally-distributed software development. IEEE Transactions on Software
Engineering, 29(6).
Herring, S. C. (1999) Interactional coherence in CMC. Proceedings of the 32nd Hawaii
International Conference on System Sciences.
Latour, B. (1987) Science in Action, Cambridge, MA: Harvard University Press.
Madey, G., Freeh, V., & Tynan, R. (to appear, 2004) Modeling the F/OSS Community: A
Quantitative Investigation. In Koch, S. (ed.): Free/Open Source Software Development.
Idea Group Publishing.
Mahendran, D. (2002) Serpents and Primitives: An ethnographic excursion into an Open
Source community. Master’s Thesis, School of Information Management and Systems,
UC Berkeley, May 2002.
Mockus, A., Fielding, R.T., & Herbsleb, J. D. (2002) Two case studies of Open Source
Software development: Apache and Mozilla. ACM Transactions on Software
Engineering and Methodology, 11(3), 309-346.
Popolov, D., Callaghan, M., & Luker, P. (2000) Conversation space: visualizing multi-threaded conversation. AVI 2000, Palermo, Italy.
Raymond, E. S. (2001) The Cathedral & the Bazaar: Musings
on Linux and Open Source by an Accidental Revolutionary. Sebastopol, CA: O'Reilly.
Also available at http://www.tuxedo.org/~esr/writings/cathedral-bazaar.
Sack, W., Ducheneaut, N., Mahendran, D., Détienne, F., & Burkhardt, J-M. (2003) Social
Architecture and Technological Determinism in Open Source Software Development,
International 4S Conference: Social Studies of Science and Society, Atlanta, GA,
October 2003.
Scacchi, W. (2004) Socio-Technical Interaction Networks in Free/Open Source Software
Development Processes. In S.T. Acuña and N. Juristo (eds.): Peopleware and the
Software Process. World Scientific Press.
Venolia, G., & Neustaedter, C. (2003) Understanding sequence and reply relationships within
email conversations: a mixed-model visualization. CHI 2003, April 5-10, Florida,
USA.
Yee, K-P. (2002) Zest: discussion mapping for mailing lists. CSCW 2002 (demo).
APPENDIX. Citation graph of PEP 279 discussion
Overview of the graph
This graph represents a part of the conversation of PEP 279. Each circle represents an email message,
which is labeled with an arbitrary number; arrows that join messages symbolize the relation “is quoted
by”. For example, the message labeled “0” is quoted by “1”, “22” and “68”. The colors of the circles represent
the main problem (theme) treated by each message.
Detailed view of the graph (three parts)
In the graph below, we propose a more detailed view of the same conversation, introducing time and the roles
of the participants. On the horizontal axis (abscissa), one can see the day and the time at which messages were sent. Messages
are represented by a different symbol according to the role of their author in the project (BDFL,
Administrators, Developers). The colors of the outlines represent the theme (design problem) addressed in
the messages; the colors inside the symbols represent the main activity conducted via the message (agreement,
disagreement, proposition, etc.). Arrows joining symbols still express the relation “is quoted by”.
