Experiment Design - DI PUC-Rio

2016
Experiment Design
Identifying Code Smells with Collaborative
Practices: A Controlled Experiment
This material consists of an experiment design for investigating
collaborative practices in the identification of code smells.
Bernardo Estácio
PhD Student (PUC-RS)
Roberto Oliveira
PhD Student (PUC-Rio)
Summary
1. Introduction
2. Research Question
3. Planning
3.1 Context Selection
3.2 Hypothesis Formulation
3.3 Variables Selection
3.4 Subjects Selection
3.5 Experiment Design
4. Instrumentation
5. Threats to validity
6. References
1. Introduction
Code smells are program structures that often indicate software maintainability
problems [4]. Identifying code smells is the first essential task in improving
software modularity and maintainability, which supports the software's ability to evolve
over time [8]. Ideally, programmers should promptly identify (and remove) code smells as
they produce their source code. Whenever this identification is ineffective or postponed,
recognizing such code smells can become costly or even prohibitive in the long run [6].
However, identifying code smells is not a trivial task and is consequently
error-prone [7]. It requires programmers to subjectively recognize the anomalous
structure of each possible anomaly in the source code. It is therefore necessary to
investigate new ways to facilitate the identification of code smells. Our study
aims to analyze the use of collaborative practices as a means to facilitate this task.
In this context, we intend to explore empirically two collaborative practices: Pair
Programming and Coding Dojo. Pair Programming is a practice in which two programmers
work collaboratively, side by side, on the same activity, such as designing an algorithm,
developing code, or analyzing or testing a program [1][3]. Coding Dojo, in turn, is
a practice that promotes group collaboration. In the Coding Dojo Randori variant, two
programmers work as a pair (pilot and copilot) while the others (the audience) follow
the development. The roles rotate after fixed time cycles so that all participants work at
least once as pilot and copilot, developing the software together [9].
Pair Programming (PP) is one of the most popular collaborative practices, and it has
been the subject of a considerable number of empirical studies [5][12] involving
programmers with different knowledge levels and professional experience. In contrast,
only a few studies provide empirical evidence about
Coding Dojo Randori (CDR). According to Rooksby et al. [11], there is a lack of studies
that evaluate the effectiveness of CDR in different software development tasks. Moreover,
existing studies did not evaluate whether these collaborative practices improve the
accuracy of programmers in the identification of code smells, especially in the industry
context. Therefore, this study aims at comparing and evaluating the efficiency of
collaborative practices in the identification of code smells.
2. Research Question
This protocol addresses the following general research question: "Do collaborative
practices improve the effectiveness of novice developers in the identification of code
smells when compared to solo programming?" In order to investigate this main research
question, we describe the goal of the study following the GQM template, as shown in
Table 1 below.
Table 1 Goal of the controlled experiment
Analyze                     Pair Programming (PP) and Coding Dojo (CD)
For the purpose of          Characterization
With respect to             Effectiveness on the identification of code smells
From the point of view of   Researchers
In the context of           Novice developers identifying code smells
3. Planning
The planning activity is the fundamental basis of this experiment: it is where the
context of the experiment is determined in detail. This description involves: (1) the
selection of the context, (2) the formulation of hypotheses, (3) the selection of
variables, (4) the selection of subjects, (5) the experiment design, (6) the description
of the instrumentation, and (7) the assessment of the validity of the experiment.
Figure 1 illustrates the steps involved in planning experiments.
Figure 1 - Experiment Planning. Adapted from Wohlin et al. [14]
3.1 Context Selection
The context selection defines where the experiment will be executed, that is, the
environment. According to Wohlin et al. [14], the context is characterized in four
dimensions:
• Process: on-line / off-line;
• Participants: students / professionals;
• Reality: real problem / modeled problem;
• Generality: specific / general.
The study was conducted with 28 novice developers. They were classified
according to two levels of experience: (i) novice developers without experience in
industrial software projects and (ii) novice developers who worked in at least one industrial
software project. They were selected either from an undergraduate course in computer
science or from an Agile Software Development course called Software Kaizen [3]. This
course provides undergraduate Computer Science students with a four-month immersion in
industrial software projects.
3.2 Hypothesis Formulation
One of the essential aspects of an experiment is to know and to state clearly what we
intend to evaluate. In order to address our general research question
(Section 2), we formalized the respective hypotheses.
Hypothesis 1 for RQ1:
Null Hypothesis, H1_0: The use of collaborative practices does not affect the
effectiveness on the identification of code smells.
H1_0: (V1 = V2 = V3), where
V1 = Average number of code smells identified in one hour in Pair Programming.
V2 = Average number of code smells identified in one hour in Coding Dojo.
V3 = Average number of code smells identified in one hour in Solo Programming.
Alternative Hypothesis, H1_alt1: Novice developers using pair programming
identify more code smells within time constraints than novice developers using solo
programming.
H1_alt1: (V1 > V3)
Alternative Hypothesis, H1_alt2: Novice developers using group programming
identify more code smells within time constraints than novice developers using solo
programming.
H1_alt2: (V2 > V3)
These hypotheses compare the number of smells identified under each
treatment. This analysis is important because it allows us to check for trends or
relationships in the number of smells identified by the participants of the
experiment and thus verify the effectiveness of each treatment. In addition, the
hypotheses were created to investigate the way programmers perform the
identification of code smells, individually and collaboratively. Based on this analysis, we
can also identify the main (dis)advantages of collaborative practices in order
to: (i) define a set of features that support the definition of a collaborative strategy,
and (ii) improve the effectiveness of the task of identifying code smells.
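As an illustration of how the averages V1, V2, and V3 could be compared, the following is a minimal Python sketch. It rests on several assumptions: the counts are invented for illustration and are not experiment results, the function name `one_sided_permutation_p` is hypothetical, and the one-sided permutation test on independent samples is only one possible choice. This protocol does not prescribe a statistical test, and because the same participants appear under several treatments, a test that accounts for repeated measures may be more appropriate.

```python
import random
from statistics import mean

# Hypothetical numbers of code smells identified per one-hour session.
# These values are invented for illustration; they are NOT experiment results.
pair_counts = [5, 6, 4, 7]         # Pair Programming sessions (V1 is their mean)
dojo_counts = [6, 5, 7, 6]         # Coding Dojo sessions (V2 is their mean)
solo_counts = [3, 4, 2, 5, 4, 3]   # Solo Programming sessions (V3 is their mean)

def one_sided_permutation_p(treatment, control, rounds=2000, seed=0):
    """Estimate how often a random relabelling of the sessions yields a mean
    difference at least as large as the observed one (treatment - control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n = len(treatment)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    return hits / rounds

v1, v2, v3 = mean(pair_counts), mean(dojo_counts), mean(solo_counts)
p_pair = one_sided_permutation_p(pair_counts, solo_counts)  # evidence for V1 > V3
p_dojo = one_sided_permutation_p(dojo_counts, solo_counts)  # evidence for V2 > V3
print(f"V1={v1}, V2={v2}, V3={v3}, p(V1>V3)={p_pair}, p(V2>V3)={p_dojo}")
```

A small estimated p-value would count as evidence against the null hypothesis in favor of the corresponding alternative hypothesis, under the simplifying assumptions stated above.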
3.3 Variables Selection
Before starting the experiment, we have to choose the independent and dependent
variables. The independent variables are those that we can control and change in the
experiment. The dependent variables are the elements that we wish to measure
throughout the experiment in a particular context. By changing the independent variables,
we can detect whether the dependent variables are affected and thus see how the variables
are related in the experimental process.
Independent Variables: composition of pairs and groups.
Dependent Variables: effectiveness.
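The dependent variable could be measured in several ways. The following minimal Python sketch assumes one of them: effectiveness is counted as the number of smells reported in a session that also appear in a reference list prepared beforehand. Both the reference list and every class and smell name below are hypothetical and serve only as an illustration. This protocol does not fix the operationalization.

```python
# Hypothetical reference list for one program: (location, smell type) pairs.
# All names are illustrative assumptions, not taken from the experiment.
reference = {
    ("OrderManager", "God Class"),
    ("OrderManager.process", "Long Method"),
    ("Invoice", "Feature Envy"),
}

def effectiveness(reported, reference):
    """Count the reported smells that also appear in the reference list."""
    return len(set(reported) & set(reference))

# Answers from one hypothetical session: two entries match the reference list.
session_report = [
    ("OrderManager", "God Class"),
    ("Invoice", "Feature Envy"),
    ("Customer", "Data Class"),
]
score = effectiveness(session_report, reference)
print(score)
```

Averaging such scores over all sessions of a treatment would yield values comparable to V1, V2, and V3 in Section 3.2.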
3.4 Subjects Selection
The study was conducted with 28 novice developers. They were classified
according to two levels of experience: (i) novice developers without experience in
industrial software projects and (ii) novice developers who worked in at least one industrial
software project. They were selected either from an undergraduate course in computer
science or from an Agile Software Development course called Software Kaizen [3]. This
course provides undergraduate Computer Science students with a four-month immersion in
industrial software projects. In order to participate in the
study, all the developers signed a consent form. They also filled out a characterization form
with objective questions about their expertise in the topics related to the study: (a)
programming; (b) Java; (c) Pair Programming (PP); and (d) Group Programming (GP), i.e.,
Coding Dojo Randori (CDR).
3.5 Experiment Design
The experimental study design has to deal with some restrictions. Accordingly,
the design uses: (i) one factor (the impact of collaborative
practices in contrast to individual practice); (ii) three treatments: Solo Programming, Pair
Programming, and Group Programming; and (iii) three objects of study (different programs
with several code smells).
For each object of study, participants will identify code smells in one of three
programs pre-selected by the researchers. A participant will never be given the same
program under more than one practice. The resulting design of the experimental study is
illustrated in Figure 2. To
facilitate the understanding of this figure, the three treatments are represented in
gray. The columns represent the steps with their objects of study, and the rows represent
the participants of each step. All steps of the experiment will be conducted in the computer
lab at PUC-RS.
In the first step of the experiment, program "A" will be applied in individual
practice to participants 1 to 6 and 15 to 20, in pairs to participants 11 to 14
and 25 to 28, and in coding dojo to participants 7 to 10 and 21 to 24. In the second step,
program "B" will be applied in individual practice to participants 7 to 10 and 21
to 24, in pairs to participants 1 to 6 and 15 to 20, and in coding dojo to participants
11 to 14 and 25 to 28. In the third and final step, program "C" will be applied in
individual practice to participants 11 to 14 and 25 to 28, in pairs to participants
7 to 10 and 21 to 24, and in coding dojo to participants 1 to 6 and 15 to 20, as shown in
Figure 2.
                    Program A (step one)   Program B (step two)   Program C (step three)
Solo Programming    1-6, 15-20             7-10, 21-24            11-14, 25-28
Pair Programming    11-14, 25-28           1-6, 15-20             7-10, 21-24
Group Programming   7-10, 21-24            11-14, 25-28           1-6, 15-20
(cells list participant numbers)
Figure 2 - Design of the experimental study. Three rounds, involving twenty-eight participants, three treatments, and
three study objects.
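The rotation in Figure 2 can be reproduced with a short Python sketch. The participant ranges, treatments, and programs come from the figure. The block ordering, the treatment labels, and the function names `build_schedule` and `sessions_of` are assumptions introduced here for illustration.

```python
# Participant blocks as listed in Figure 2. The block order is an assumption
# chosen so that a simple rotation reproduces the figure.
BLOCKS = [
    list(range(1, 7)) + list(range(15, 21)),    # participants 1-6 and 15-20
    list(range(7, 11)) + list(range(21, 25)),   # participants 7-10 and 21-24
    list(range(11, 15)) + list(range(25, 29)),  # participants 11-14 and 25-28
]
TREATMENTS = ["solo", "pair", "group"]
PROGRAMS = ["A", "B", "C"]  # one program per step

def build_schedule():
    """Map each (program, treatment) cell to the participants assigned to it."""
    schedule = {}
    for step, program in enumerate(PROGRAMS):
        for t_index, treatment in enumerate(TREATMENTS):
            # Each step shifts the blocks by one position across the treatments.
            block = BLOCKS[(step - t_index) % len(BLOCKS)]
            schedule[(program, treatment)] = block
    return schedule

def sessions_of(participant, schedule):
    """List the (program, treatment) cells in which a participant appears."""
    return sorted(cell for cell, members in schedule.items()
                  if participant in members)

schedule = build_schedule()
print(sessions_of(1, schedule))
```

For participant 1, the sketch lists program A under solo programming, program B under pair programming, and program C under group programming, which matches Figure 2. The same check can be run for all 28 participants to confirm that nobody receives the same program or the same treatment twice.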
4. Instrumentation
In order to support the experiment, we selected three programs. The researchers
will analyze these programs before the experiment.
The experiment will also have other artifacts, such as:
• Consent form;
• Pre-evaluation questionnaire to characterize the group of students;
• Post-evaluation questionnaire for qualitative analysis.
5. Threats to validity
Construct validity: Regarding the experimental planning, we tried to include all
the programmers of the system analyzed in order to obtain a broad analysis of the system's
code anomalies from different perspectives. Moreover, we clearly explained
the experimental tasks to avoid misguiding the subjects. Finally, we prepared a simple
and direct questionnaire to ease the understanding of the assigned task.
Internal validity: We considered the following threats to internal validity: (i)
different knowledge levels among the participants (e.g., Java programming, pair
programming, and group programming), and (ii) differences among experimental tasks. To
mitigate the first threat, we applied the design principles of balancing, blocking, and
random assignment, as suggested by Wohlin et al. [14]. Regarding the second threat, we
applied a crossed design, in which independent groups rotate through all
treatments and all tasks.
External validity: Our study involved a small sample of programmers, which is
the biggest threat to external validity. Therefore, we cannot generalize our conclusions.
The sample was composed of novice programmers, including programmers who worked on real
projects in the industry. The data extracted from this study may provide important
results related to the identification of code anomalies. However, as this is an initial
study on this subject, we do not make any external validity claims and call for
replications to allow the results to be generalized further.
Conclusion validity: We believe the questionnaire and experimental tasks were
properly built to obtain the expected answers to our research questions. For instance, they
allow us to detect the reliance of inexperienced programmers on software
metrics as support for assessing classes. Therefore, we tried to mitigate this threat to
conclusion validity by basing our analysis only on the information gathered from the
subjects' answers and on qualitative data obtained from logs and videos.
6. References
[1] Beck, K. and Andres, C. Extreme Programming Explained: Embrace Change. 2nd ed. Boston: Addison-Wesley, 2004.
[2] Estácio, B. et al. Software Kaizen: Using Agile to Form High-Performance Software Development Teams. In Agile Conference (AGILE), 2014, 1-10.
[3] Fowler, M. et al. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
[4] Hanks, B. Problems Encountered by Novice Pair Programmers. J. on Educ. Res. in Comp., 7, 4, 2008, 1-10.
[5] Macia, I. et al. On the Relevance of Code Smells for Identifying Architecture Degradation Symptoms. In Proceedings of CSMR, 2012a, 277-286.
[6] Macia, I. et al. Are Automatically-Detected Code Smells Relevant to Architectural Modularity? An Exploratory Analysis of Evolving Systems. In Proceedings of AOSD, 2012b, 167-178.
[7] Opdyke, W. F. Refactoring Object-Oriented Frameworks. Univ. of Illinois at Urbana-Champaign, 1992.
[8] Sato, D.T., Corbucci, H. and Bravo, M.V. Coding Dojo: An Environment for Learning and Sharing Agile Practices. In Proceedings of AGILE, 2008, 459-464.
[9] Rooksby, J., Hunt, J. and Wang, X. The Theory and Practice of Randori Coding Dojos. In International Agile Conference (XP), 2014, 251-259.
[10] Rabbit. https://marketplace.eclipse.org/content/rabbit. Accessed in Mar. 2015.
[11] Rooksby, J., Hunt, J. and Wang, X. The Theory and Practice of Randori Coding Dojos. In International Agile Conference (XP), 2014, 251-259.
[12] Tsantalis, N., Chaikalis, T. and Chatzigeorgiou, A. JDeodorant: Identification and Removal of Type-Checking Bad Smells. In Proceedings of CSMR, 2008, 329-331.
[13] Williams, L. et al. Strengthening the Case for Pair Programming. IEEE Software, 17, 4, Jul. 2000, 19-25.
[14] Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B. and Wesslén, A. Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, 2000, ISBN 0-7923-8682-5.