Using Students as Experiment Subjects –
An Analysis on Graduate and Freshmen Student Data
Per Runeson
Lund University, Dept. of Communication Systems,
Box 118, SE-221 00 Lund, Sweden
[email protected]
ABSTRACT
The question whether students can be used as subjects in software engineering experiments is debated. In order to investigate the feasibility of using students as subjects, a study is conducted in the context of the Personal Software Process (PSP), in which the performance of freshmen students and graduate students is compared, and also related to another study in an industrial setting. The hypothesis is that graduate students perform similarly to industry personnel, while freshmen students’ performance differs. A quantitative analysis compares the freshmen and the graduate students. The improvement trends are also compared to industry data, although limited data access does not allow a full comparison. It can be concluded that very much the same improvement trends can be identified for the three groups, although the dispersion is larger in the freshmen group. The absolute levels of the measured characteristics differ significantly between the student groups, primarily with respect to time, i.e. graduate students complete the tasks in shorter time. The data does not give a sufficient answer to the hypothesis, but provides a basis for further studies on the issue.
1 INTRODUCTION
People, process and technology are aspects that affect the capabilities of software development organizations. The three aspects interact, but it is not clear to what extent each aspect contributes to success or failure in software engineering. It is important to know which aspects contribute to, for example, increased productivity when introducing a new process. The issue can be analyzed by conducting empirical studies [22]. Many experiments are conducted using students as subjects, and it is often questioned whether these studies give valid results applicable to a population of software engineering professionals.
The Personal Software Process (PSP) [7, 8, 9] is presented as a contributor to the process part, and to some extent a contributor to the technologies in the area of project management. The PSP defines an approach to personalized software development processes with continuous improvement, packaged in process descriptions and course material. The PSP consists of a set of processes, ranging from the PSP0 Baseline Process, via the PSP1 Planning Process and the PSP2 Quality Management Process, to the PSP3 Cyclic Process. Each step adds features to the previous one in terms of planning, measurement and quality control.
New technologies are presented continuously in research
and industry, and are to some extent also evaluated. Technologies that are presented and evaluated are, for example,
different techniques for inspections [1, 18].
In empirical studies, people with different backgrounds and experience have contributed as subjects. However, it is not clear how people interact with the process and technology issues. In most studies, the experiment design blocks the people factor in order to evaluate the process or technology part, i.e. the study is intended to be independent of the people. It is, however, important to clarify the impact of, and interaction with, the people issue in empirical software engineering, in order to validate studies, in particular those with students as subjects, and the generalizability of such studies.
Empirical studies on the effect of using the PSP have addressed the question of interaction between people and process by comparing the improvements made by graduate students using the PSP to the improvements made by industry people [19]. The improvements achieved are almost the same in both cases, i.e. the graduate students behave similarly to the industry people when taking the PSP course. In order to investigate this further, this paper presents a study which compares the performance of freshmen students taking the PSP course to that of graduate students and, in a second step, industry people.
Our hypothesis is that there are small differences between graduate students and industry people on the one hand, while there are significant differences between graduate students and freshmen students on the other.
The differences investigated are of two types. First, it is investigated whether the same improvements are achieved in the improvement steps between PSP levels 0, 1 and 2, i.e. whether estimation accuracy, defect density and productivity improve. Second, it is analyzed whether there are differences in performance, i.e. time consumption, productivity and number of defects. The outline of the study is shown in Figure 1, where the improvement comparisons are marked with solid lines and the performance comparisons with dashed lines. Limited access to industry data does not allow the performance analysis to be conducted on the industry data.
The paper is structured as follows. In Section 2 the context of the study is presented. In Section 3 the hypotheses are formally defined and the analysis is reported. Section 4 contains a discussion of the interpretation of the results, and finally in Section 5 a summary is given.
2 STUDY CONTEXT
Since Humphrey presented the Personal Software Process in his book [7], different studies related to the PSP have been conducted. There are reports of a descriptive nature which present positive results in general, for example, experience reports [4, 8]. Other studies are related to the quality of the data collected in the use of the PSP [3, 11, 12]. Further, studies that investigate within-course effects of the PSP methods are presented [5, 6, 19], as well as attempts to assess post-course impact [14]. Reports on the use of the PSP in industry exist [13], and reports on the use of the PSP for teaching are numerous, e.g. [2, 16]. It has also been proposed to use the PSP as a context for software engineering experiments [21].
This study is conducted on data primarily from students at Lund University, Sweden, taking the PSP course as defined by Humphrey [7]. The course settings are almost identical to the settings in the Wesslén study [19]. In this study we have one group of freshmen students at the undergraduate level [16] in addition to the graduate students. The graduate students studied in Masters programs which are scheduled for 4.5 years in sequence, including both undergraduate and graduate studies. Hence, most students study their topics without industrial experience between their undergraduate and graduate studies.
The PSP course was given at Lund University for the first time during the fall semester of 1996. It was then given to graduate students during their fourth year of studies. The course attendants are students in the Computer Science and Engineering (CSE) program and the Electrical Engineering (EE) program. During the spring semester of 1999, the PSP course was given to undergraduate students in their first year of study in a Bachelors program in Software Engineering (SE). In addition, the course was given to Ph.D. students at Linköping University, Sweden, in 1997. In this section, the context of the course occasions contributing to the study is presented.

FIGURE 1. Outline of study (improvement comparisons between the Freshmen, Graduate and Industry groups marked with solid lines; performance comparisons with dashed lines)
The students were informed that the data collected might be used in future empirical research, under guaranteed anonymity [17]. The grading in the courses was partly based on how well the students adhered to the process, but not on the collected metrics as such.
The industry data is collected at the Software Engineering Institute (SEI) and reported by Hayes and Over [5]. The data is collected from courses given by the SEI on 23 different occasions, comprising 298 students. Half of the courses were given in an academic setting and half in an industrial setting.
2.1 General for all students
All the university courses used the original PSP book by Humphrey as the key source of information [7]. In addition to the book, all students from 1996 onwards were given a booklet that guided them in each task by giving pointers to relevant parts of the PSP book, and by clarifying the use of, for example, the estimation method proposed in the book. The programming tasks performed are presented in Table 1.
In order to ease the data collection and thereby improve the quality of the data, electronic support was given to the students. In the 1996 course setting, an ASCII-based solution was used, while from 1997 onwards, a spreadsheet-based tool for data collection was used. The students filled out a spreadsheet for each task and submitted it electronically for examination.
The spreadsheets of the individual students were then linked together for analysis. The code counting data was collected using the code counting program developed as exercises 2A and 3A, which was based on a common code counting standard.

TABLE 1. Programming tasks in the PSP course

#   Description
1A  Calculate standard deviation of a data set
2A  Count lines of code in a source file
3A  Extend 2A to count length of methods or functions
4A  Calculate linear regression of a data set
5A  Integrate a function numerically
6A  Calculate a prediction interval based on 4A and 5A
7A  Calculate the correlation between two data sets
8A  Sort elements of a linked list
9A  Test for normality using a Chi-square test
Based on experience from the initial courses, the order of tasks 6A and 7A was switched in the courses from 1998 onwards. The reason is that the complexity of the tasks grows more smoothly when they are taken in this order.
The design method presented by Humphrey was not prescribed in any of the university courses. It was left to the students to use any method they wanted.
2.2 PSP for graduate students
The graduate students attending the PSP courses had taken programming courses in various languages. The CSE students had taken more courses than the EE students, but all had taken at least one programming course. At the first course occasion, C was the mandatory programming language. At the other occasions, the students were free to choose programming language, as long as they were familiar with the language they decided to use. Wesslén reports analyses of the outcomes of the courses [19].
2.3 PSP for freshmen students
At Lund University, a new Bachelors program in Software Engineering (SE) was launched in 1998 [15]. The program is designed to make the students software engineers not as a last add-on, but by providing them from the very beginning with means for quantifying, analyzing and managing their software development tasks. It is therefore assumed that the students’ attitudes are set towards software engineering from the very beginning.
At the first run of the SE program, an introductory course
in Java was given during the first semester. In addition, a
brief introduction to the PSP concepts was given based on
the PSP introductory book [9]. During the PSP introduction
the basic forms were used, i.e. project plan summary, time
reporting log and defect reporting log. During the second semester, the full PSP course was given according to Humphrey’s book [7]. In parallel with the PSP course, a statistics
course was given to teach the statistics needed to implement
the PSP programs and to analyze the data. Experiences from
teaching this course are reported by Runeson [16].
The undergraduate students used Java as the mandatory language. In contrast to the graduate students, they were allowed to use a list package as support for the programs, which affects tasks 1A, 4A and 6A.
The different groups of students are summarized in
Table 2. The data reported by Hayes and Over is characterized in Table 3.
TABLE 2. Overview of student subjects in the study

Year   University  Level          Language  # stud
96/97  Lund        Graduate       C         42
96/97  Linköping   Ph.D.          mixed     30
97/98  Lund        Graduate       mixed     59
99     Lund        Undergraduate  Java      31
Sum                                         162
TABLE 3. Subjects in the Hayes and Over study

Type                 Number of classes  Class size category  Number of classes
Instructor Training  4                  4 to 10              6
Industry Setting     8                  11 to 15             11
Academic Setting     11                 16 to 21             6
Sum                  23 (298 subjects)                       23
3 ANALYSIS

3.1 Hypotheses
The informal hypothesis presented in the introduction is formally defined below. The hypotheses are of two types: improvement hypotheses and performance hypotheses.
The improvement hypotheses are summarized in Table 4. The primary hypotheses are tested using the freshmen, graduate and industry data. The additional hypotheses are tested only on the student data, due to limited access to the raw industry data. The improvement hypotheses are the same as in the studies by Hayes [5] and Wesslén [19]. They are formulated non-directionally, to allow comparison with the original studies. Directional hypotheses would allow one-sided statistical tests, which are more powerful than two-sided tests.
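To illustrate the power remark: when the observed effect lies in the hypothesized direction, the one-sided p-value of a t-test is half the two-sided one, so a directional hypothesis reaches significance more easily. A minimal sketch with made-up data, assuming scipy is available:

# Two-sided vs. one-sided t-test on made-up data: the one-sided
# p-value is half the two-sided one when the observed effect lies
# in the hypothesized direction.
from scipy import stats

psp0 = [40.0, 55.0, 48.0, 60.0, 52.0]  # e.g. estimation error at PSP0
psp1 = [38.0, 45.0, 40.0, 51.0, 42.0]  # e.g. estimation error at PSP1

t, p_two = stats.ttest_ind(psp0, psp1)                          # non-directional
t1, p_one = stats.ttest_ind(psp0, psp1, alternative="greater")  # directional
print(f"two-sided p={p_two:.3f}, one-sided p={p_one:.3f}")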
The performance hypotheses investigate differences in
the measurements between the groups. Due to limited access
to industry data, these hypotheses are only tested on the
freshmen and graduate student data. The hypotheses are
summarized in Table 5.
3.2 Data validation
In the graduate student group, the individuals were removed who “had not finished the course, received more help than the other individuals or had not reported trustworthy data” [19]. The data validation reduces the data set from the original 131 data points to, at the least, 113 data points for the different analyses, i.e. at most 18 out of 131 are removed.
TABLE 4. Improvement hypotheses

Area                        Primary hypothesis                             Additional hypothesis
Size estimation accuracy    Estimation gets better for each PSP level      Dispersion in estimation reduced for each PSP level
Effort estimation accuracy  Estimation gets better for each PSP level      Dispersion in estimation reduced for each PSP level
Defect density              Defect density gets lower for each PSP level,  Dispersion in defect density reduced for each PSP level
                            overall, and for compile and test respectively
Pre-compile defect yield    Yield gets higher for each PSP level           –
Productivity                Productivity gets higher for each PSP level    Dispersion reduced for each PSP level
TABLE 5. Performance hypotheses

Area              Hypothesis
Size              Freshmen students write programs of different size compared to graduate students
Effort            Freshmen students spend a different amount of time compared to graduate students
Productivity      Freshmen students have different productivity compared to graduate students
Defects           Freshmen students have a different number of defects in their programs compared to graduate students
Defect density    Freshmen students have a different number of defects per size unit compared to graduate students
Defect intensity  Freshmen students make a different number of defects per time unit compared to graduate students
Applying the same validation procedure to the freshmen student data set involves several risks. The data set is smaller, and thereby each subject contributes more to the total. If the individuals who did not perform well were removed, the results would appear better than the sample actually indicates. Hence, two alternative validation procedures are applied, and the analysis results are reported for both.
The first approach is to follow the same procedure as in the graduate student group, below referred to as the “reduction” approach. The data set is then reduced from the original 31 data points to between 17 and 25 data points for the analyses, i.e. between 6 and 14 out of 31 are removed.
The second approach is to fill in missing data values (below referred to as the “fill-in” approach), according to the following procedure (a sketch of the fallback chain is given after the list):
1. If the data value is available, but not in the correct data sheet, it is filled in. For example, the actual size is reported in the Project Plan Summary for the previous task, but not moved into the sheet for the current task.
2. If the data is available for other tasks at the same PSP level, this data is used. For example, if the yield is missing for task 8A but filled in for 7A, this data is used.
3. Otherwise, the population average is used.
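The fill-in procedure is essentially a three-step fallback chain. The following Python sketch illustrates it under assumed data structures (a per-student, per-task dict of metric values, with None for missing data); all names are illustrative and not taken from the actual analysis tools.

# Sketch of the three-step "fill-in" validation approach. Assumes
# records[student][task] is a dict of metric values with None where
# data is missing. Data structures and names are illustrative only.
PSP_LEVEL = {"1A": 0, "2A": 0, "3A": 0, "4A": 1, "5A": 1,
             "6A": 1, "7A": 2, "8A": 2, "9A": 2}

def fill_in(records, student, task, metric, other_sheets, population):
    value = records[student][task].get(metric)
    if value is not None:
        return value
    # 1. Look for the value misplaced in another data sheet.
    value = other_sheets.get((student, task, metric))
    if value is not None:
        return value
    # 2. Reuse the value from another task at the same PSP level.
    level = PSP_LEVEL[task]
    for other, lvl in PSP_LEVEL.items():
        if lvl == level and other != task:
            value = records[student][other].get(metric)
            if value is not None:
                return value
    # 3. Fall back to the population average for this task and metric.
    return population[(task, metric)]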
When applying the “fill-in” approach to the data, 11 data values are found in other data sheets, 12 data values are taken from other tasks and 54 data values are taken from the population average. This can be compared to the total number of data values, about 1 700 per student [12], i.e. about 53 000 for 31 students.
The analyses in this study are conducted on data validated
by both of the approaches, and it is reported where the results
differ. In the Hayes and Over data set, between 222 and 277
data points out of 298 students were possible to use. They did
not apply any method to complete the data.
3.3 Improvement study
The hypotheses in the improvement study are tested and compared to the previous studies, referred to as graduate [19] and industry [5], respectively.
The analysis procedure follows the previous studies. Within each of the three groups, an ANOVA test is used to test whether there are any differences between the adjacent PSP levels. If the ANOVA test rejects the null hypothesis that there is no difference, pair-wise t-tests are conducted to see in which step the improvements are made. For the freshmen and graduate groups, an F-test is conducted to test whether the dispersion is reduced at the more sophisticated PSP levels. The analysis results are summarized in Table 6. An “X” means that the null hypothesis is rejected at the 0.05 significance level, i.e. with confidence higher than 0.95.
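For illustration, the test chain could be scripted as below. This is a minimal sketch with made-up data, using scipy's f_oneway and ttest_ind; since scipy provides no direct two-sample F-test, the variance-ratio test is computed by hand.

# Minimal sketch of the analysis procedure described above, assuming
# by_level maps each PSP level to per-subject measurements (e.g.
# effort estimation error). The data below is illustrative only.
from scipy import stats

by_level = {
    "PSP0": [42.0, 55.0, 61.0, 38.0, 70.0, 49.0],
    "PSP1": [30.0, 41.0, 35.0, 28.0, 52.0, 33.0],
    "PSP2": [22.0, 30.0, 27.0, 25.0, 35.0, 24.0],
}

# 1. ANOVA: is there any difference between the levels at all?
f_stat, p_anova = stats.f_oneway(*by_level.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3f}")

# 2. If ANOVA rejects, pair-wise t-tests on adjacent levels show
#    in which step the improvement is made.
if p_anova < 0.05:
    for a, b in [("PSP0", "PSP1"), ("PSP1", "PSP2")]:
        t, p = stats.ttest_ind(by_level[a], by_level[b])
        print(f"t-test {a} vs {b}: t={t:.2f}, p={p:.3f}")

# 3. F-test on the variance ratio, testing whether the dispersion
#    is reduced at the higher PSP level.
def f_test(x, y):
    f = stats.tvar(x) / stats.tvar(y)
    p = stats.f.sf(f, len(x) - 1, len(y) - 1)  # one-sided: var(x) > var(y)
    return f, p

for a, b in [("PSP0", "PSP1"), ("PSP1", "PSP2")]:
    f, p = f_test(by_level[a], by_level[b])
    print(f"F-test {a} vs {b}: F={f:.2f}, p={p:.3f}")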
It can be noted that the improvements are very much the same for the three groups. In the step from PSP0 to PSP1, the freshmen group improves significantly in four out of six areas, and a fifth area improves as well under the “reduction” approach to data validation. There is no reduction in test defect density but, apart from that, the result is consistent with both the graduate students and the industry people. Productivity improves for the freshmen in the step from PSP0 to PSP1 but, as for the other two groups, it does not improve in the subsequent step.

TABLE 6. Summary of results in the improvement analysis (rows: size estimation accuracy, effort estimation accuracy, overall/compile/test defect density, pre-compile defect yield, productivity; columns: mean and dispersion, for PSP0 vs. PSP1 and PSP1 vs. PSP2, for the freshmen, graduate and industry groups; an “X” marks a rejected hypothesis; a: only for the “reduction” validation approach; b: only for the “fill-in” validation approach)
The dispersion analysis shows less consistent results. The dispersion in the freshmen group is not reduced for size estimation accuracy and productivity, while the graduate student group has reduced dispersion for the yield. The freshmen group tends to reduce the dispersion in the step from PSP1 to PSP2, while the graduate student group reduces the dispersion already in the step from PSP0 to PSP1.
The median improvements from PSP0 to PSP2 are of the same order of magnitude for the three groups, as presented in Table 7. The exception is the effort estimation accuracy, for which the freshmen improve by a factor of 14.9, while the graduate students and the industry people improve by factors of 3.0 and 1.75, respectively.

TABLE 7. Median improvement from PSP0 to PSP2

Area                        Freshmen  Graduate         Industry
Size Estimation Accuracy    1.79      2.1              2.5
Effort Estimation Accuracy  14.9      3.0              1.75
Overall Defect Density      1.8       1.4              1.5
Compile Defect Density      3.4       2.9              3.7
Test Defect Density         1.8       2.0              2.5
Pre-Compile Defect Yield    45%       39%              50%
Productivity                1.58      No gain or loss  0.9 (0.86)

It can be concluded that there are no other significant differences between the groups with respect to their improvement within the PSP context. The next question to study is whether the performance metrics show any statistical differences.

3.4 Performance study
In order to investigate the differences between the groups further, the metrics for the different development performance characteristics collected in the PSP are compared for the freshmen students and the graduate students. The limited access to industry data makes it impossible to make the same comparison with the industry group.
The following metrics are compared:
• Size of the program, measured in LOC
• Total development time, measured in minutes
• Productivity, measured in LOC per hour
• Total number of defects
• Defect density, measured as the number of defects per LOC
• Defect intensity, measured as the number of defects per development hour
For each of the metrics, a t-test is conducted to test the null hypothesis that the performance is the same for freshmen students and graduate students. Further, the mean percentage difference between the groups is calculated according to the following formula:
Diff = (F(freshmen) / F(graduate) − 1) × 100

where

F = [Size, Time, Prod, Defects, Density, Intensity]
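Applied to group means, the formula amounts to the following sketch; the numbers are placeholders, not the study’s data.

# Sketch of the Diff computation defined above: the mean of each
# performance metric for the freshmen group relative to the graduate
# group, as a percentage. All values below are made-up placeholders.
freshmen = {"Size": 95.0, "Time": 610.0, "Prod": 11.0,
            "Defects": 7.5, "Density": 80.0, "Intensity": 0.75}
graduate = {"Size": 117.0, "Time": 415.0, "Prod": 17.5,
            "Defects": 8.2, "Density": 71.0, "Intensity": 1.10}

for metric in freshmen:
    diff = (freshmen[metric] / graduate[metric] - 1) * 100
    print(f"{metric}: {diff:+.1f}%")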
Relative differences are analyzed, not absolute values; hence, the variety of languages used does not impact the size comparison. The analyses are summarized in Table 8, where * refers to a significance level of 0.9 and ** refers to a significance level of 0.95.
TABLE 8. Summary of analysis (per-task results for tasks 1A–9A, grouped under PSP0, PSP1 and PSP2, marked * or ** per the significance levels above; summarized here by hypothesis and mean difference)

Hypothesis                            Mean Diff
Size(fresh) < Size(grad)              –18.7%
Size(fresh) < Size(grad), Java only   –10.4%
Time(fresh) > Time(grad)              46.8%
Prod(fresh) < Prod(grad)              –37.4%
Defects(fresh) < Defects(grad)        –9.1%
Density(fresh) > Density(grad)        12.7%
Intensity(fresh) < Intensity(grad)    –32.1%

a. ** for Size(fresh) > Size(grad)

In the performance analysis, the differences between freshmen students and graduate students are clearer than in the improvement analysis. Freshmen students write significantly smaller programs for the PSP0 and PSP1 tasks; the average difference is 19% relative to the graduate students. In tasks 1A, 4A and 6A the groups have different prerequisites, i.e. the freshmen students are allowed to use a list package. The comparison to the subset of graduate students using Java shows the same trend, although it differs for the individual tasks. However, only 9 students in the graduate group used Java, so the basis for any conclusions is rather limited.

The freshmen students spend significantly more time on 8 out of 9 tasks. On average, the freshmen students spent 47% more time than the graduate students. A direct consequence of this large difference is that the productivity is significantly lower for the freshmen students: they write smaller programs and take longer to do so.

The number of defects does not differ between the groups. This is an issue where the data quality can be debated; it can be questioned whether the freshmen students really report all the problems they encounter. The time data indicates that they have more problems, but the defect data does not.

Although there is no significant difference in the number of defects, the defect density is significantly higher for the freshmen in 4 out of 9 cases. On the other hand, the defect intensity, i.e. the number of defects per development hour, is lower. This indicates that the real difference is in the time consumption.
3.5 Qualitative differences

Having experience from teaching the two student groups, there are also some qualitative differences worth mentioning [16]. Some of the issues are measurable, but they were not measured during the courses.
The freshmen students tend to raise more questions on programming issues, while the graduate students are more focused on the process parts. This is not surprising, as the freshmen students attended the course directly after their first programming course, while the graduate students attended the course in their fourth year of studies. On the other hand, there may be some learning effects for the graduate student group as well, in particular for the electrical engineering students. They take their programming courses primarily in the first and second years, and focus on other topics during their third year. Hence, they have to recover their programming skills.
The variation within the groups is larger for the freshmen students. Few students in the graduate student group have serious problems, while the share of students with problems in the freshmen group is larger. This is indicated by the number of data points removed in the “reduction” approach to data validation. The graduate group of 131 subjects is reduced to 113, i.e. by 14%. The freshmen group of 31 is reduced to between 25 and 17, i.e. by 20 to 45% for the different analyses.
3.6 Threats to validity

The most important threats to the validity of the study are discussed below.
Conclusion validity is threatened by the fact that the data are collected in different settings. This is particularly true regarding the industry data. However, as the PSP environment is well defined, this reduces the threat. The reliability of the measures can be questioned in this study, as in other PSP studies [3, 11, 12], and hence the conclusions as well. Further, the data validation is performed using two alternative approaches (fill-in and reduction), which give slightly different results.
Internal validity is threatened by instrumentation issues. In its standard format, the PSP material provides an extensive paper-based set of forms to fill out. In the student settings, most of the data are collected using electronic support. It is unknown to what degree this impacts the results. The selection of subjects within each group is based on convenience sampling, and is hence not a true sample of any larger population. In all data sets, there are subjects who drop out, and we do not know how this impacts the results.
Regarding construct validity, the use of the PSP context is the largest threat. It increases the internal validity, as it adds rigor to the process and the data collection, but it decreases the construct validity, since few software engineering settings are so well defined, nor are the tasks to be solved so small. However, as the key question is to investigate the validity of using students as subjects in experiments, and the PSP is quite similar to what experiment packages look like, this reduces the validity threat with regard to the purpose of this study.
For the external validity of the study, the question is whether the study is representative of other software engineering experiments, as the purpose is to analyze whether students can be successfully used as subjects. We believe that the student groups and the industry group are quite similar to groups conducting different types of experiments, and thus the external validity is reasonably high. Whether an experiment conducted in a student environment is valid in other environments is another issue; that is the very question under investigation.
4 DISCUSSION
The analysis presented shows two clear trends: 1) the improvements between the PSP levels are very much the same for all three groups, and 2) the freshmen students spend significantly more time on their tasks. The question now is how this can be interpreted.
In the three groups, the process is the same: all groups follow the PSP course with minor variations. The technology is also rather similar, even though the groups use different languages. The tasks in the PSP are small, and thus there is no need for extensive tool support to do a good job. The people have different experience and knowledge. It can be debated which of these issues has the largest impact on the total result, but it is hard to measure.
However, as the PSP course is designed with continuous improvement through the three PSP levels and the addition of new methods, the improvement is probably to a large extent due to the methods as such, and not due to the people learning. In PSP0, there is no estimation method and very limited experience data available, while in PSP1 experience data is available and the PROBE estimation method is gradually introduced. Hence, it is not surprising that the estimation accuracy improves. The same holds for the pre-compile defect yield. Code and design reviews are introduced in PSP2, while in PSP0 and PSP1 reviews are not a formal part of the process. Again, it is no surprise that applying reviews reveals more defects before compile than not applying reviews.
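For concreteness, the regression-based estimation that PROBE builds on, and that the students themselves implement in tasks 4A and 6A, can be sketched as follows; a minimal Python illustration with made-up historical data, using the 70% prediction interval that the PSP works with.

# Sketch of regression-based estimation with a prediction interval,
# in the spirit of tasks 4A and 6A and the PROBE method. The
# historical (estimated size, actual effort) pairs are made up.
import numpy as np
from scipy import stats

x = np.array([60.0, 85.0, 110.0, 140.0, 170.0])    # estimated size (LOC)
y = np.array([250.0, 310.0, 400.0, 520.0, 590.0])  # actual effort (min)

n = len(x)
b1, b0 = np.polyfit(x, y, 1)   # slope and intercept
x_new = 120.0                  # new task's estimated size
y_hat = b0 + b1 * x_new        # point estimate of effort

# 70% prediction interval around the estimate.
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))
t = stats.t.ppf(0.85, n - 2)   # two-sided 70% -> 0.85 quantile
half = t * s * np.sqrt(1 + 1/n + (x_new - x.mean())**2 / np.sum((x - x.mean())**2))
print(f"estimate: {y_hat:.0f} min, 70% interval: [{y_hat - half:.0f}, {y_hat + half:.0f}]")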
These issues are related to the process, and it seems that, independently of the people, almost the same effects can be observed. The direct measurements show a significant difference in time consumption: in the study, the freshmen spend 47% more time than the graduate students do. This indicates that the people part actually is different between the freshmen and graduate student groups. Unfortunately, industry data is not available to make the same comparison between graduate students and industry people.
A last question related to the data is why there is no difference in defect levels between the freshmen and graduate students. Here, it is tempting to assume that the freshmen students do not report all defects. The reported repair time in the defect reporting log seems somewhat low relative to the time spent on compile and test, but no systematic investigation has been performed on this issue. The quality of PSP data is investigated and debated [3, 11, 12], but not concerning the defect reporting.
How should these results be interpreted in terms of the feasibility of using students as subjects in software engineering experiments? The improvement study may give the impression that any subject is feasible for a software engineering experiment. The performance study and the qualitative judgments suggest rather that there are substantial differences between the two student groups. Unfortunately, industry data is not available to perform the same comparison to the industry group. Hence the general question remains unanswered, while it can be stated that freshmen students should not be used as subjects for software engineering experiments.
5 SUMMARY
It is generally accepted that people, process and technology are three different aspects that affect software engineering. In order to learn more about the different parts, experiments are conducted. An important question is whether students can be used as subjects and still give generalizable results. In this paper, three sets of PSP data are compared in order to evaluate differences regarding the people issues between freshmen students, graduate students and industry people. It is observed that almost the same improvements are made between the different PSP levels for the three groups: the estimation accuracy is improved and the defects are reduced. This is, however, primarily an effect of the PSP process as such, rather than of the people; new steps for estimation and defect reduction are introduced, which give the observed effects.
The measurements of the absolute performance of the freshmen student group and the graduate student group show more varying results. The freshmen students spend significantly more time to complete the tasks than the graduate students do. From this, we conclude that there is a difference in the people issue between the two student groups, which is also supported by the qualitative observations.
The conclusions drawn from the study can neither reject nor accept the hypothesis on differences between freshmen students, graduate students and industry people. The difference between freshmen and graduate students is observed, while the data is not sufficient to evaluate similarities or differences between industry people and graduate students. Hence, this relation is a subject for further studies.
ACKNOWLEDGEMENT
Thanks to Dr. Anders Wesslén for letting me use his analysis tools in the study and for guidance on the data access. Thanks to Dr. Thomas Thelin and Dr. Magnus C. Ohlsson for good cooperation during the PSP course for freshmen students. Thanks to Dr. Martin Höst for reviewing a draft of this paper.
REFERENCES
[1] V. R. Basili, S. Green, O. Laitenberger, F. Lanubile, F. Shull,
S. Sørumgård, and M. Zelkowitz, “The Empirical Investigation of Perspective-Based Reading”, Empirical Software
Engineering, 1(2):133-164, 1996.
[2] J. Börstler, D. Carrington, G. W. Hislop, S. Lisack, K. Olson
and L. Williams, “Teaching PSP: Challenges and Lessons
Learned”, IEEE Software, Sep./Oct. 2002, pp. 42-48.
[3] A. M. Disney and P. M. Johnson, “Investigating Data Quality
Problems in the PSP”, FSE-6, 1998.
[4] P. Ferguson, W. S. Humphrey, S. Khajenoori, S. Macke and A.
Matvya, “Results of Applying the Personal Software Process”, IEEE Computer, No 5, 1997, pp. 24-31.
[5] W. Hayes and J. W. Over, “The Personal Software Process
(PSP): An Empirical Study of the Impact of PSP on Individual
Engineers”, Technical Report CMU/SEI-97-TR-001, ESC-TR-97-001, Software Engineering Institute, December 1997.
[6] W. Hayes, “Using a Personal Software Process to Improve
Performance”, Proc. 5th International Metrics Conference,
pp. 61-71, 1998.
[7] W. S. Humphrey, A Discipline for Software Engineering,
Addison Wesley, 1995.
[8] W. S. Humphrey, “Using a Defined and Measured Personal
Software Process”, IEEE Software, May 1996, pp. 77-88.
[9] W. S. Humphrey, Introduction to the Personal Software Process, Addison Wesley, 1997.
[10] M. Höst, B. Regnell and C. Wohlin, “Using Students as Subjects – A Comparative Study of Students and Professionals in Lead-Time Impact Assessment”, Journal of Empirical Software Engineering, 5(3):201-214, 2000.
[11] P. M. Johnson and A. M. Disney, “The Personal Software Process: A Cautionary Case Study”, IEEE Software, Nov./Dec.
1998, pp. 85-88.
[12] P. M. Johnson and A. M. Disney, “A Critical Analysis of PSP
Data Quality: Results from a Case Study”, Empirical Software
Engineering 4(4):317-349, 1999.
[13] M. Morisio, “Applying the PSP in Industry”, IEEE Software,
Nov./Dec. 2000, pp. 90-95
[14] L. Prechelt and B. Unger, “An Experiment Measuring the
Effects of Personal Software Process (PSP) Training”, IEEE
Trans. on Software Engineering, 27(5):465-472, 2001.
[15] P. Runeson, “A New Software Engineering Programme –
Structure and Initial Experiences”, Proc. 13th Conference on
Software Engineering Education & Training, pp. 223-232,
2000.
[16] P. Runeson, “Experience from Teaching PSP for Freshmen”,
Proc. 14th Conference on Software Engineering Education &
Training, pp. 98-107, 2001.
[17] J. Singer and N. G. Vinson, “Ethical Issues in Empirical Studies of Software Engineering”, IEEE Trans. on Software Engineering, 28(12): 1171-1180, 2002.
[18] T. Thelin, P. Runeson, and B. Regnell, “Usage-Based Reading
- An Experiment to Guide Reviewers with Use Cases”, Information and Software Technology, 43(15):925-938, 2001.
[19] A. Wesslén, “A Replicated Empirical Study of the Impact of the Methods in the PSP on Individual Engineers”, Empirical Software Engineering, 5(2):93-123, 2000.
[20] C. Wohlin, “Meeting the Challenge of Large Scale Software
Development in an Educational Environment”, Proc. 10th
Conference on Software Engineering Education & Training,
pp. 40-52, 1997.
[21] C. Wohlin, “The Personal Software Process as a Context for
Empirical Studies”, IEEE TCSE Software Process Newsletter,
pp. 7-12, No. 12, Spring 1998.
[22] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell and A. Wesslén, Experimentation in Software Engineering – An Introduction, Kluwer Academic Publishers, Boston, MA, USA, 2000.