Software Fault Reporting Processes in
Business-Critical Systems
Jon Arvid Børretzen
Doctoral Thesis
Submitted in partial fulfilment of the requirements for the degree of
Philosophiae Doctor
Department of Computer and Information Science
Faculty of Information Technology, Mathematics and Electrical Engineering
Norwegian University of Science and Technology
Copyright © 2007 Jon Arvid Børretzen
ISBN 82-471-xxxx-x (printed)
ISBN 82-471-xxxx-x (electronic)
ISSN 1503-8181
NTNU 2007:xx (local report series)
Printed in Norway by NTNU Trykk, Trondheim
Abstract
Today’s society is crucially dependent on software systems. The number of areas where
functioning software is at the core of operation is growing steadily. Financial
systems and e-business systems alike rely on increasingly large and complex
computer and software systems. To increase, for example, the reliability and performance of such
systems, we rely on a plethora of methods, techniques and processes specifically aimed
at improving the development, operation and maintenance of such software. The BUCS
project (BUsiness-Critical Systems) is seeking to develop and evaluate methods to
improve the support for development, operation and maintenance of business-critical
software and systems. Improving software processes relies on the ability to analyze
previous projects and derive concrete improvement proposals. The research in this
thesis is based on empirical studies performed in several Norwegian companies that
develop business-critical software. The work specifically aims to assess the use of fault
reporting approaches, and describe how improvement in this area can benefit process
and product quality.
Some specific software methods will be adopted from safety-critical software
engineering practices, while others will be taken from general software engineering.
Together they will be tuned and refined for this particular context. A specific goal in the
BUCS project has been to facilitate the use of traditional Software Criticality Analysis
techniques for the development of business-critical software. This encompasses
techniques used to evaluate and explore potential risks and hazards in a system. The
thesis describes six studies of software development technology for business-critical
systems. The main goal is to attain a better understanding of business-critical systems,
as well as to adapt and improve relevant methods and processes. Through data mining
of historical software project data and other studies of relevant projects, we have
gathered information to be evaluated with the goal of improving business-critical
systems development. The BUCS project has investigated development projects
for business-critical systems, investigations that have been
continued in the EVISOFT user-driven project. The main goal was to study the effects
of revised development methods for business-critical software, in order to improve
important quality aspects of these systems.
The main research questions in this work are:
• RQ1. What is the role of fault reporting in existing industrial software development?
• RQ2. How can we improve existing fault reporting processes?
• RQ3. What are the most common and severe fault types, and how can we reduce
them in number and severity?
• RQ4. How can we use safety analysis techniques together with fault report
analysis to improve the development process?
The main contributions of this thesis are:
• C1. Describing how to utilize safety criticality techniques to improve the
development process for business-critical software.
• C2. Identification of typical shortcomings in fault reporting.
• C3. Improved model of fault origins and types for business-critical software.
Preface
This thesis is submitted to the Norwegian University of Science and Technology
(NTNU) in partial fulfilment of the requirements for the degree Philosophiae Doctor.
The work has been performed at the Department of Computer and Information Science,
NTNU, Trondheim, with Professor Reidar Conradi as the main advisor, and Professor
Tor Stålhane and Professor Torbjørn Skramstad as co-advisors.
The thesis is part of the BUCS project (BUsiness-Critical Systems) and has been
financed for three years by the Norwegian Research Council through the IKT’2010
basic IT Programme under NFR grant number 152923/V30. An additional year was
financed through a teaching assistantship at NTNU. The BUCS project has been led by Professor
Tor Stålhane. Some of the work in this thesis has also partly been financed by the
EVISOFT user-driven R&D project under NFR grant number 174390/I40.
Acknowledgements
During the work on this thesis, I have been lucky to have been in contact with many people
who have provided help, inspiration and motivation. First of all, I want to thank my
supervisor, Professor Reidar Conradi, for giving valuable feedback and comments on
many drafts and ideas during the last four years. Also, I want to thank Professor Tor
Stålhane, my co-advisor, for being the source of a lot of good advice and many bad
jokes. I also want to thank the present and former members of the software engineering
group at IDI, NTNU for giving me a good working environment. A special thanks to my
BUCS colleagues Torgrim Lauritsen and Per Trygve Myhrer for collaboration in our
research and daily work.
Parts of the work for this thesis have been done in collaboration with people from
several industrial organizations. I am very grateful to these companies and the people I
have been in touch with from these organizations who have been helpful and
accommodating when sharing their information and experience with me. I also want to
thank master student Jostein Dyre-Hansen for helping me analyze a great deal of data
material.
Finally, I want to thank my family and friends for their encouragement and inspiration,
and I would especially express my thanks to Ingvild for her love and patience.
Trondheim, Nov 1, 2007
Jon Arvid Børretzen
Table of contents
1 Introduction .............................................................................................................. 1
1.1 Motivation .............................................................................................................. 1
1.2 Research Context.................................................................................................... 2
1.3 Research design ...................................................................................................... 4
1.4 Research questions and contributions..................................................................... 4
1.5 Included research papers ........................................................................................ 5
1.6 Thesis structure....................................................................................................... 7
2 State-of-the-art.......................................................................................................... 9
2.1 Introduction ............................................................................................................ 9
2.2 Software engineering.............................................................................................. 9
2.3 Software Quality................................................................................................... 11
2.4 Anomalies: Faults, errors, failures and hazards.................................................... 12
2.5 Current methods and practices ............................................................................. 16
2.6 Business-critical software..................................................................................... 20
2.6.1 Criticality definitions......................................................................................... 20
2.7 Techniques and methods used to develop safety-critical systems........................ 22
2.8 Empirical Software Engineering .......................................................................... 26
2.9 Main challenges in business-critical software engineering .................................. 29
3 Research Context and Design................................................................................. 31
3.1 BUCS Context ...................................................................................................... 31
3.2 Research Focus ..................................................................................................... 32
3.3 Research approach and research design ............................................................... 35
3.4 Overview of the studies ........................................................................................ 41
4 Results .................................................................................................................... 43
4.1 Study 1: Preliminary Interviews with company representatives (used in P1)...... 43
4.2 Study 2: Combining safety methods in the BUCS project (Paper P1) ................. 44
4.3 Study 3: Fault report analysis (Papers P2, P3, P5) ............................................... 45
4.4 Study 4: Fault report analysis (Paper P4) ............................................................. 48
4.5 Study 5: Interviewing practitioners about fault management (Paper P6)............. 50
4.6 Study 6: Using hazard identification to identify faults (Paper P7)....................... 51
4.7 Study 7: Experiences from fault report studies (Technical Report P8)................ 52
5 Evaluation and Discussion ..................................................................................... 55
5.1 Contributions ........................................................................................................ 55
5.2 Contribution of this thesis vs. literature................................................................ 57
5.3 Revisiting the Thesis Research Questions, RQ1-RQ4 ......................................... 58
5.4 Evaluation of validity ........................................................................................... 59
5.5 Industrial relevance of results............................................................................... 60
5.6 Reflection: Research cooperation with industry................................................... 61
6 Conclusions and future work.................................................................................. 63
6.1 Conclusions .......................................................................................................... 63
6.2 Future Work.......................................................................................................... 64
Glossary .......................................................................................................................... 67
Term definitions ......................................................................................................... 67
References ...................................................................................................................... 73
Appendix A: Papers........................................................................................................ 81
Appendix B: Interview guide ....................................................................................... 175
List of Figures
Figure 1-1  The studies with their related papers and contributions ............................ 5
Figure 1-2  The structure of this thesis ......................................................................... 8
Figure 2-1  Relationship between faults, errors, failures and reliability ..................... 13
Figure 2-2  Relationship between hazards, accidents and safety ................................ 14
Figure 2-3  Faults, Hazards, Reliability and Safety .................................................... 15
Figure 2-4  The Rational Unified Process ................................................................... 18
Figure 2-5  Relationship of business-critical and other types of criticality ................ 21
Figure 2-3  Relationship between faults, errors and failures ...................................... 22
Figure 4-1  Combining PHA/HazOp and Safety Case ................................................ 45
Figure 4-2  Percentage of high severity faults in some fault categories ..................... 47
Figure 4-3  Quality views associated to defect data, and their relations .................... 48
Figure 4-4  Distribution of severity with respect to fault types for all projects .......... 50
Figure 4-5  Distribution of hazards represented as fault types (%) ............................ 51
List of Tables
Table 2-1  Examples of different systems’ criticality ................................................. 22
Table 2-2  Properties of some safety criticality analysis techniques .......................... 25
Table 2-3  12 ways of studying technology, from [Zelkowitz98] .............................. 26
Table 2-4  Empirical research approaches .................................................................. 28
Table 3-1  Description of our studies ......................................................................... 33
Table 3-2  Type of studies in this thesis ..................................................................... 41
Table 3-3  Relation between main and local research questions ................................ 41
Table 4-1  Distribution of all faults in fault type categories ....................................... 47
Table 4-2  Distribution of all faults in fault type categories ....................................... 47
Table 4-3  Fault type distribution across all projects .................................................. 49
Table 5-1  Relationship of contributions and research questions ............................... 56
Abbreviations
BUCS      Business-Critical Software (project)
CBD       Component-Based Development
CBSE      Component-Based Software Engineering
CCA       Cause-Consequence Analysis
COTS      Commercial Off The Shelf
DBMS      Data Base Management System
GQM       Goal Question Metric
GUI       Graphical User Interface
ETA       Event Tree Analysis
EVISOFT   EVidence based Improvement of SOFTware engineering (project)
FMEA      Failure Mode and Effects Analysis
FMECA     Failure Mode Effects and Criticality Analysis
FTA       Fault Tree Analysis
HAZOP     Hazard and Operability Analysis
IEEE      Institute of Electrical and Electronics Engineers
INCO      Incremental and component-based software development (project)
ISO       International Organization for Standardization
NFR       Norwegian Research Council
NS-ISO    Norwegian Standard
NTNU      Norwegian University of Science and Technology
OMG       Object Management Group
OS        Operating System
OSS       Open Source Software
PHA       Preliminary Hazard Analysis
QA        Quality Assurance
RUP       Rational Unified Process (by Rational)
SPI       Software Process Improvement
UML       Unified Modelling Language (by Rational, later OMG)
XP        Extreme Programming
1 Introduction
In this chapter, the background and research context for this thesis are presented. The
chapter also introduces the research design, the research questions and the contributions.
Finally, the list of papers and the outline of the thesis are presented.
1.1 Motivation
The technological development in our society has led to software systems being
introduced into an increasing number of different business domains. In many of these
areas we become more or less dependent on these systems, and their potential
weaknesses could have grave consequences. In this respect, we can coarsely divide
software products into three categories: safety-critical software (e.g. controlling traffic
signals), business-critical software (e.g. for banking) and non-critical software (e.g. for
word processing).
Evidently, the definition of business-critical versus the other two categories may be
difficult to state precisely, and would in many cases depend on the particular viewpoint
of the business and users. To clarify the distinction between business-critical and safety-critical, we can consider the consequences of an operational failure (observable and erroneous
behaviour of the system compared to the requirements) in the two different
cases. For safety-critical applications, the result of a failure could easily be a physical
accident or an action leading to physical harm for one or more human beings. In the
case of business-critical systems, the consequences of failures are less grave, in the
sense that accidents do not cause real physical damage; instead, the negative
implications are of a more financial or trust-threatening nature.
Ian Sommerville states that business-criticality signifies the ability of a business’s core
computer and other support systems to deliver sufficient quality of service (QoS) to preserve the stability of
the business [Sommerville04]. Thus business-critical systems are those whose failure
could threaten the stability of a business.
The overall goal for the BUCS project is to better understand and thus sensibly improve
software technologies, including processes used for developing business-critical
software. In order to do this, empirical studies of projects have been performed in
cooperation with Norwegian ICT industry.
Specific BUCS goals as presented in the BUCS project proposal [BUCS02] are the
following:
BG1 To obtain a better understanding of the problems encountered by Norwegian
industry during development, operation and maintenance of business-critical
software.
BG2 Study the effects of introducing safety-critical methods and techniques into
the development of business-critical software, to reduce the number of
system failures (increased reliability).
BG3 Provide adapted and annotated methods and processes for development of
business-critical software.
BG4 Package and disseminate the effective methods into Norwegian software
industry.
In this thesis, we aim to study how software faults and software fault reporting practices
affect business-critical software, and whether techniques (e.g. PHA and HazOp) from the
area of safety-critical systems development can have a positive effect on quality
attributes other than safety (e.g. reliability). The relation between faults and failures is
explained in Section 2.4.
1.2 Research Context
This thesis is a part of the work done in the BUCS basic research and development
project (BUsiness-Critical Software). The BUCS project was funded by the Norwegian
Research Council as a basic R&D project in IT, and was run in 2003-2007. Some parts
of the work in this thesis were also financed by the EVISOFT project, a national, user-driven R&D project on software process improvement funded by the Norwegian
Research Council [EVISOFT06].
Within the BUCS project, this thesis will focus on fault reporting processes in business-critical systems. Some important research issues we want to study are the following:
• How do software faults affect the reliability and safety of business-critical systems?
• What are the common fault types in business-critical systems?
• How can we use system safety methods in business-critical application development?
1.2.2 The BUCS project
The goal of the BUCS project is not to help developers to finish their development on
schedule and budget. We are not particularly interested in the delivered functionality or
how to identify or avoid process and project risk. This is not because we think these
properties are unimportant – it is simply that we have defined them to be outside the scope of the BUCS
project.
The goal of the BUCS project is to help developers, users and other stakeholders to
develop software whose later use is less prone to critical problems, i.e. has sufficient
reliability and safety. In a business environment this means that the system seldom
behaves in such a way that it causes the customer or his users to lose money, important
information, or both. We will use the term business-critical for this characteristic.
Another term is business-safe, which means that a system fulfils the criteria for
business-safety in a business-critical system.
That a system is business-safe does not mean that the system is fault-free, i.e. cannot
possibly fail. What this means is that the system will have a low probability of entering
a state where it will cause serious losses. In this respect, the system characteristic is
close to the term “safe”. This term is, however, wider, since it is concerned with all
activities that can cause damage to people, equipment, the environment or severe
economic losses. Just as with general safety, business-safety is not a characteristic of the
system alone – it is a characteristic of the system’s interactions with its usage
environment.
BUCS considers two groups of stakeholders and wants to help them both:
• The customers and their users. They need methods that enable them to:
o Understand the dangers that can occur when they start to use the system as
part of their business.
o Write or state requirements to the developers so that they can take care of the
risks incurred when operating the system.
• The developers. They need help to implement the system so that:
o It can be made business-safe.
o They can support their claims with analysis and documentation.
o It is possible to change the systems in such a way that when the operating
environment or profile changes, the systems are still business-safe.
BUCS aims to help the developers to build a business-safe system without large
increases in development costs or schedule. This is achieved by the following
contributions from BUCS:
BC1 A set of methods for analysing business-safety concerns. These methods are
adapted to the software development process in general and – for the first
version – especially to the Rational Unified Process (RUP).
BC2 A systematic approach for analysing, understanding, and protecting against
business-safety related events.
BC3 A method for testing that the customers’ business-safety concerns are
adequately taken care of in the implementation.
Why should development organizations do something that costs extra – is this a smart
business proposition? We definitely believe that the answer is “Yes”, for the
following reasons:
• The only solution most companies can offer to customers with business-safety concerns today is that the developers will be more careful and test more –
this is not a good enough solution.
• By building a business-safe system, the developers will help the customer to
achieve an efficient operation of their business and thus build an image of a
company that has its customers’ interests in focus. Applying new methods to
increase the products’ business-safety must thus be viewed as an investment.
The return on the investment will come as more business from large, important
customers.
BUCS will not invent entirely new methods. What we will do is take commonly used
methods, especially from the area of systems safety, such as Hazard Analysis and
FMEA, and adapt them to more mainstream software development. This is done by
extending the methods, making them:
• More practical to use in a software development environment.
• Suitable to fit into the ways developers work in a software project environment –
concerning both process and related software tools and methods.
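As a concrete illustration of the kind of systems-safety worksheet being adapted (a generic FMEA sketch under textbook conventions, not the BUCS project's actual adapted method): an FMEA records, per failure mode, ratings for severity, occurrence, and detection, and ranks modes by their risk priority number (RPN = severity × occurrence × detection). The component names, failure modes, and ratings below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (almost certainly detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: conventional FMEA ranking metric
        return self.severity * self.occurrence * self.detection

# Hypothetical worksheet entries for a business-critical payment service
worksheet = [
    FailureMode("payment", "duplicate transaction posted", 8, 3, 4),
    FailureMode("payment", "timeout leaves order unconfirmed", 6, 5, 2),
    FailureMode("audit log", "log entry silently dropped", 7, 2, 9),
]

# Address the highest-RPN modes first
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.component}: {fm.mode} (RPN={fm.rpn})")
```

Here the silently dropped log entry ranks highest (RPN 126) despite its moderate severity, because it is nearly undetectable – the kind of prioritization insight such worksheets are meant to surface.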
1.3 Research design
As stated in the BUCS project proposal, “The principal goal is through empirical studies
to understand and improve the software technologies and processes used for developing
business-critical software” [BUCS02]. This entails both quantitative and qualitative
studies, and in some cases a combination. Several aspects have to be considered when
performing such studies, and particularly:
• Deciding on the metrics used in the investigations.
• Deciding on the process of retrieving information (data mining, observation,
surveys).
Members of the BUCS project have conducted interviews, experiments, data analysis,
surveys, and case studies. The methods employed in this part of the BUCS project are
structured interviews, historical data mining and analysis, and case studies.
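At its simplest, the historical fault-report mining mentioned above amounts to tallying fault reports by type and severity. A minimal sketch, assuming a hypothetical CSV export with `fault_type` and `severity` fields – the field names, categories, and data are illustrative, not the actual schemas of the studied companies:

```python
import csv
import io
from collections import Counter

# Hypothetical fault report export; real fault repositories vary widely.
raw = """id,fault_type,severity
1,logic,high
2,gui,low
3,logic,medium
4,data handling,high
5,logic,high
"""

reports = list(csv.DictReader(io.StringIO(raw)))

by_type = Counter(r["fault_type"] for r in reports)
high_by_type = Counter(r["fault_type"] for r in reports if r["severity"] == "high")

# Fault type distribution, with the share of high-severity faults per type
for fault_type, n in by_type.most_common():
    share_high = high_by_type[fault_type] / n
    print(f"{fault_type}: {n} faults, {share_high:.0%} high severity")
```

Tables such as "Distribution of all faults in fault type categories" and figures such as "Percentage of high severity faults in some fault categories" in Chapter 4 are, in essence, the output of this kind of tally performed on the industrial data sets.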
1.4 Research questions and contributions
The goal of this research is to explore quality issues of business-critical software, with
focus on fault reporting and management, as well as the use of safety analysis
techniques for this type of software development. In this thesis, four overall research
questions have been defined:
RQ1. What is the role of fault reporting in existing industrial software development?
RQ2. How can we improve on existing fault reporting processes?
RQ3. What are the most common and severe fault types, and how can we reduce them
in number and severity?
RQ4. How can we use safety analysis techniques together with fault report analysis to
improve the development process?
[Figure 1-1: The studies with their related papers and contributions. A timeline from June 2003 to June 2007 in three phases: Phase 1 – Study 1 (Preliminary Interview Study, 2003) and Study 2 (Literature study of safety methods, 2004; Paper P1); Phase 2 – Study 3 (First fault report analysis study, 2005; Papers P2, P3, P5) and Study 4 (Second fault report analysis study, 2006; Paper P4); Phase 3 – Study 5 (Interviews on fault reports, 2007; Paper P6), Study 6 (Assessing hazard analysis vs. fault report analysis, 2007; Paper P7) and Study 7 (Experiences on fault reporting, 2007). The legend marks studies as quantitative or qualitative, links them by input arrows to papers (P) and contributions (C1-C3), and a background cloud indicates industrial cooperation.]
The research questions together with the studies performed have resulted in the
following contributions:
C1. Describing how to utilize safety criticality techniques to improve the
development process for business-critical software.
C2. Identification of typical shortcomings in fault reporting.
C3. Improved model of fault origins and types for business-critical software.
Figure 1-1 illustrates how the studies, contributions and research papers are connected.
It also shows the time and sequence of the studies and how the different studies have
influenced each other with input and experience. The background cloud shows which
studies were performed with industrial cooperation.
1.5 Included research papers
This thesis includes seven papers numbered P1 to P7, whose full text is included
verbatim in Appendix A. The papers are briefly described in the following:
P1. Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer:
"Safety activities during early software project phases", In Proc. Norwegian Informatics
Conference (NIK'04), pp. 180-191, Stavanger, 29. Nov. - 1. Dec. 2004.
Relevance to the thesis: This paper describes the introduction and use of safety
criticality analysis techniques in early project phases. It presents several relevant
techniques and how they can be combined with a common development methodology
like RUP.
My contribution: I was the leading author and contributed 80% of the work, including
literature review and paper writing.
P2. Jon Arvid Børretzen and Reidar Conradi: "A study of Fault Reports in
Commercial Projects", In Jürgen Münch and Matias Vierimaa (Eds): Proc. 7th
International Conference on Product Focused Software Process Improvement
(PROFES'2006), pp. 389-394, Amsterdam, the Netherlands, 12-14 June 2006.
Relevance to the thesis: This paper presents work done in the area of fault report
analysis, and describes how using a fault categorization scheme can help identify
problem areas in the development process.
My contribution: I was the leading author and contributed 80% of the work, including
research design, data collection, data analysis and paper writing.
P3. Parastoo Mohagheghi, Reidar Conradi, and Jon A. Børretzen: "Revisiting the
Problem of Using Problem Reports for Quality Assessment", In Kenneth Anderson
(Ed.): Proc. the 4th Workshop on Software Quality, held at ICSE'06, 21 May 2006 - as
part of Proc. 28th International Conference on Software Engineering & Co-Located
Workshops, 21-26 May 2006, Shanghai, P. R. China, ACM Press 2006, ISBN 1-59593-085-X, ISSN 0270-5257
Relevance to the thesis: This paper describes experience with working with problem
reports from industry. It discusses several problems with using this type of data and how
they can be used for assessing software quality.
My contribution: I contributed on 30% of the work, including data collection and
analysis, commenting the data material and draft paper.
P4. Jon Arvid Børretzen and Jostein Dyre-Hansen: Investigating the Software Fault
Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical
Study, Proceedings of the European Systems & Software Process Improvement and
Innovation Conference 2007 (EuroSPI07), pp. 212-223, Potsdam, Germany, 26-28 Sept.
2007.
Relevance to the thesis: This paper continues the fault report study focus, refining the
design and execution of the previous study and confirming several of our findings.
My contribution: I was the leading author and contributed 80% of the work, including
research design, data collection, data analysis and paper writing.
P5. Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi: "The
Empirical Studies on Quality Benefits of Reusing Software Components" Proc. The
First IEEE International Workshop on Quality Oriented Reuse of Software
(QUORS'2007), held in conjunction with IEEE COMPSAC 2007, 5 p, Beijing, July 23-27, 2007.
Relevance to the thesis: This paper uses data from our first fault report study, and
presents a study comparing defect types in reusable components with those in non-reusable components.
My contribution: I contributed 20% of the work, including theory definition, data
collection, data analysis and commenting on data material, results and draft paper.
P6. Jon Arvid Børretzen: “Fault classification and fault management: Experiences
from a software developer perspective”. 14 pages, submitted to Journal of Systems and
Software.
Relevance to the thesis: This paper presents findings from a series of interviews
performed with developers involved in fault reporting, and seeks to describe problems
and issues in fault management and reporting as seen from the practitioners’ viewpoint.
My contribution: I contributed 95% of the work, including interviews, transcription,
coding, analysis and paper writing.
P7. Jon Arvid Børretzen: “Using Hazard Identification to Identify Potential Software
Faults: A Proposed Method and Case Study”. 10 pages, submitted to the First
International Conference on Software Testing, Verification and Validation,
Lillehammer, Norway, April 9-11, 2008.
Relevance to the thesis: This paper seeks to combine the knowledge gained from fault
reports analysis with the potential of hazard analysis techniques, and proposes a novel
method for doing this.
My contribution: I contributed 85% of the work, including Hazard analysis, Fault
report analysis, data analysis and paper writing.
1.6 Thesis structure
Chapter 2 deals with issues about software engineering in general and state-of-the-art,
including software criticality and especially business-critical software and an overview
of the most important challenges in these areas. Chapter 3 presents the context for the
BUCS project and research, the methods used and the research questions for this thesis.
Chapter 4 presents the results of the studies performed. An evaluation of the
contributions and results is made in Chapter 5. Chapter 6 sums up the thesis work and
presents relevant issues for further work. Figure 1-2 illustrates how the thesis is
composed.
[Figure 1-2: The structure of this thesis – Theory and state-of-the-art (Chapter 2), Research context and design (Chapter 3), Results (Chapter 4), Evaluation (Chapter 5), and Conclusions and future work (Chapter 6).]
In this thesis I have used the term “we” when presenting the work, both in my own
description of the work in Chapters 1-6 and in the collaborative work
from the papers P1 to P7 in Appendix A.
2 State-of-the-art
This chapter describes the challenges in software engineering that are the motivation
behind improving approaches for business-critical software development. Then, there is
a presentation of literature related to business-critical software development. The
definitions of these subjects are discussed and research challenges are described for
each of them. Finally, the chapter is summarized and the research challenges are
described related to the studies in this thesis.
2.1 Introduction
In the engineering of business-critical software systems, as in the engineering of other
software systems, there is a multitude of different methods, techniques and processes
being employed by industry. Since business-critical applications are not really an
established phrase or topic within the software engineering community, it is therefore
difficult to point out specific methods and techniques that are being used when
business-critical systems are being developed. Instead, the most common methods of
software engineering will be presented in the following, with additional comments on
how they may be best utilized to aid the development of business-critical systems. Also,
a presentation of methods from the development of safety-critical applications will be
made as these are relevant for use in the BUCS project in general.
2.2 Software engineering
Software Engineering is an engineering discipline dealing with all aspects of software
development from the early stages of system specification to maintaining the system
after it has gone into use. Software engineering is the profession concerned with
creation and maintenance of software by applying computer technology, project
management, domain knowledge, and other skills and technologies. Fairley says that:
"Software engineering is the technological and managerial discipline concerned with
systematic production and maintenance of software products" of required functionality
and quality "on time and within cost estimates" [Fairley85].
On the other hand, software systems have social and economic value, by making people
more productive, improving their quality of life, and making them able to perform work
and actions that would otherwise be impossible, like controlling a modern aeroplane.
Software engineering technologies and practices help developers by improving
productivity and quality. The field has been evolving continuously from its early days
in the 1940s until today in the 2000s. The ongoing goal is to improve technologies and
practices, seeking to improve the productivity of practitioners and the quality of
applications for the users.
The effort in software engineering technology was stepped up due to the “software
crisis” (a term coined in 1968), which identified many problems of software
development [Glass94]. Many software projects ran over budget and schedule. Some
projects caused property damage, and a few projects actually caused loss of life. The
software crisis was originally defined in terms of productivity, but evolved to emphasize
quality. The most common result of failed software development projects is overrun of
schedule and budget, but more serious consequences may also follow from poorly
executed software projects.1
Cost and Budget Overruns: A survey conducted at the Simula Research Laboratory in
2003 showed that 37% of the investigated projects used more effort than estimated. The
average effort overrun was 41%, with 67% in projects with a public client, and 21% for
projects with a private client [Moløkken04].
Property Damage: Software defects can cause property damage. Poor software
security allows hackers to steal identities, and defective control systems can damage the
physical systems the software is controlling. The result is lost time, money, and
damaged reputation. The expensive European Ariane 5 rocket exploded on its virgin
voyage in 1996, because its software operated under different flight conditions than the
software was designed for [Kropp98].
Life and Death: Defects in software can be lethal. Some software systems used in
radiotherapy machines failed so gravely that they administered lethal doses of radiation
to patients [Leveson95].
The use of the term “software crisis” has been slowly fading out, perhaps because the
software engineering community has come to the understanding that it is unrealistic
and unproductive to remain in crisis mode for so many years. Software engineers accept
that the problems of software engineering are truly difficult and that only hard
work over a long period of time can solve them. Processes and methods have become
major parts of software engineering, e.g. object-orientation (OO) and the Rational
Unified Process (RUP). Studies have, however, shown that many practitioners resist
formalized processes, which often treat them impersonally, like machines rather than
creative people [Thomas96]. The profession of software engineering is important, and
has made big improvements since 1968, even though it is not perfect. Software
engineering is a relatively young field, and practitioners and researchers continually
work to improve the technologies and practices, in order to improve the final products
and to better comply with the needs of the users and customers.
1 Peter G. Neumann has done much work on this subject, and edits a contemporary list of software problems and
disasters on his website http://catless.ncl.ac.uk/Risks/ [Neumann07].
2.3 Software Quality
The word quality can have several meanings and definitions, even though most of these
definitions try to communicate practically the same idea. Often, the context in which
quality is to be judged decides which definition will be used. The context could be
user-oriented, product-oriented, production-oriented or even emotionally oriented.
ISO defines quality as “The totality of features and characteristics of a product/service
that bears upon its ability to satisfy stated or implied needs” [ISO 8402]. Another ISO
definition is “Quality: ability of a set of inherent characteristics of a product, system or
process to fulfil requirements of customers and other interested parties” [ISO 9001].
Aune presents the following simplified definitions from the ISO 8402 standard
[Aune00]:
1. Quality: Conformity with requirements (or needs, expectations, specifications)
2. Quality: The satisfaction of the customer
3. Quality: Suitability for use (at a given time)
Software quality in terms of reliability is often related to faults and failures, e.g. in
number of faults found, or failure rate over a period of time during use. In addition,
as the aforementioned definitions imply, there are other quality factors that are
important, e.g. the software’s ability to be used in an effective way (i.e. its usability).
There is a multitude of concepts that together can be used to define quality, where the
importance of a given factor or characteristic depends on the software context.
Reliability, Usability, Safety, Security, Availability and Performance are common
examples. The glossary in Appendix A describes some of the relevant quality attributes.
2.3.1 Software Quality practices
Quality Assurance (QA)
QA is the planned and systematic effort needed to gain sufficient confidence that a
product or a service will satisfy stated quality requirements (e.g. a degree of
safety/reliability). Alternatively, QA is control of product and process throughout
software development, to increase the probability that we manage to fulfil the
requirements specifications. Software QA involves the entire software development
process: monitoring and improving the process, making sure that any agreed-upon
standards and procedures are followed, and ensuring that problems are found and dealt
with. QA work is oriented towards problem prevention. Solving problems is a
high-visibility process; preventing problems is low-visibility.
Among the duties of a QA team are certification and standardization work, as well as
internal inspections and reviews. Other relevant QA tasks are inspections, testing,
verification and validation, some of which are presented further in section 2.5.4.
Software Process Improvement (SPI)
Software Process Improvement is basically systematic improvement of the work
processes used in a software-producing organization, based on organizational goals and
backed by empirical studies and results. Capability Maturity Model Integration (CMMI)
and ISO 9000 are examples of ways to assess and certify software processes. Statistical
Process Control (SPC) and the Goals/Question/Metric (GQM) paradigm are examples
of methods used to implement Software Process Improvement [Dybå00], but these
require a certain level of stability in an organization to be applicable.
To be able to measure improvement, we have to introduce measurement into software
development processes. SPI initiatives are generally based on measurement of
processes, followed by results and information feedback into the process under study.
The work in this thesis is directed towards measurement of faults in software, and how
this information may be used to improve the software process and product.
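As a sketch of such measurement, assuming a hypothetical fault-report record format (the field names below are ours, not taken from any particular tracking tool), fault counts can be aggregated to feed an SPI feedback loop, e.g. as the "Metric" part of a GQM plan:

```python
from collections import Counter

# Hypothetical fault-report records; the fields are illustrative only.
fault_reports = [
    {"component": "billing", "severity": "critical"},
    {"component": "billing", "severity": "minor"},
    {"component": "ui",      "severity": "minor"},
    {"component": "billing", "severity": "major"},
]

# Fault counts per component: a simple measurement that can be fed
# back into the development process under study.
per_component = Counter(r["component"] for r in fault_reports)

# Share of critical reports, a crude severity indicator.
critical_share = sum(r["severity"] == "critical" for r in fault_reports) / len(fault_reports)

print(per_component.most_common())            # components ranked by fault count
print(f"{critical_share:.0%} of reports are critical")
```

In a real SPI initiative the aggregation would of course be driven by the organization's own goals and questions, not by a fixed pair of metrics.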
2.4 Anomalies: Faults, errors, failures and hazards
Improving software quality is a goal of most software development organizations. This
is not a trivial task, and different stakeholders will have different views on what
software quality is. In addition, the character of the actual software will influence what
is considered the most important quality attributes of that software. For many
organizations, analyzing routinely collected data could be a way to improve their process
and product quality. Fault reports are one possible source of such data, and research
shows that fault analysis can be a viable approach to certain parts of software process
improvement [Grady92]. One important issue in developing business-critical software is
to remove possible causes of failure, which may lead to incorrect operation of the
system. In our studies we will investigate fault reports from business-critical industrial
software projects.
Software quality encompasses a great number of properties or attributes. The ISO 9126
standard defines many of these attributes as sub-attributes of the term “quality of use”
[ISO91]. When speaking about business-critical systems, the critical quality attribute is
often experienced as the dependability of the system. In [Laprie95], Laprie states that “a
computer system’s dependability is the quality of the delivered service such that
reliance can justifiably be placed on this service.” According to [Avizienis04] and
[Littlewood00], dependability is a software quality attribute that encompasses several
other attributes, especially reliability, availability, safety, integrity and maintainability2.
The term dependability can also be regarded subjectively as the “amount of trust one
has in the system”. Quality-of-Service (QoS) is the dependability plus performance,
usability and certain provision aspects [Emstad03].
Much effort has been put into reducing the probability of software failures, but this has
not removed the need for post-release fault-fixing. Faults in the software are detrimental
to the software’s quality, to a greater or lesser extent dependent on the nature and
severity of the fault. Therefore, one way to improve the quality of developed software is
to reduce the number of faults introduced into the system during initial development.
2 In Laprie’s initial dependability definition, the attribute security was present, while the attributes integrity and
maintainability were not [Laprie95].
Faults are potential flaws (i.e. incorrect versus explicitly stated requirements) in a
software system that may later be activated to produce an error (an incorrect internal
dynamic state). An error is the execution of a "passive fault", and may lead to a failure
(an incorrect external dynamic state). This relationship is illustrated in Figure 2-1. A
failure results in observable and incorrect external behaviour and system state. The
remedies for errors and failures are to limit the consequences of an active error or
failure, in order to resume service. This may be in the form of duplication, repair,
containment etc. These kinds of remedies do work, but studies have shown that this
kind of downstream (late) protection is more expensive than preventing the faults from
being introduced into the code [Leveson95].
Figure 2-1 Relationship between faults, errors, failures and reliability (Fault (static): potential flaw, erroneous program → Error (dynamic): erroneous internal system state → Failure (dynamic): erroneous external behaviour)
Faults that have unintentionally been introduced into the system during some lifecycle
phase can be discovered either by formal proof or manual inspection before the system
is run, by testing during development, or when the application runs on site. The
discovered faults are then reported in some fault reporting system, to be candidates for
later correction. Software may very well have faults that do not lead to failure, since
they may never be executed, given the actual context and usage profile. Many such
faults will remain in the system unknown to the developers and users. That is, a system
with few discovered faults is not necessarily the same as a system with few faults.
Indeed, many reported faults may be deemed too “exotic” or irrelevant to correct.
Conversely, a system with many reported faults may be a very reliable system, since most
relevant faults may have been eliminated. Faults are also commonly known as defects or
bugs, while a more extensive concept is anomaly, used in the IEEE 1044 standard
[IEEE 1044].
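To make the fault/error/failure distinction concrete, here is a small illustrative sketch (the function and its bug are invented for this example): the fault is a static flaw in the code, it produces an error only when a particular input executes it, and the error surfaces as an observable failure.

```python
# A fault (static flaw): the boundary check uses > instead of >=,
# so the invalid index len(buffer) slips past validation.
def read_item(buffer, index):
    if index > len(buffer):          # fault: should be >=
        raise ValueError("index out of range")
    return buffer[index]             # error arises here when index == len(buffer)

buf = [10, 20, 30]

# For typical usage the fault stays dormant: no error, no failure.
assert read_item(buf, 1) == 20

# Only the input index == 3 activates the fault; the erroneous internal
# state (an out-of-bounds access) propagates to an observable failure.
try:
    read_item(buf, 3)
except IndexError:
    print("failure observed")
```

This also illustrates the point above: had index 3 never occurred in the actual usage profile, the fault would have remained in the system unknown to developers and users.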
The relationship between faults, errors and failures concerns the reliability dimension. If
we look at the safety dimension, we have a relationship between hazards and accidents.
A hazard is a state or set of conditions of a system or an object that, together with other
conditions in the environment of the system or object, may lead to an accident (safety
dimension) [Leveson95]. Leveson defines an accident as “an undesired and unplanned
(but not necessarily unexpected) event that results in at least a specified level of loss.”
The connection between hazards and safety is defined through Leveson’s definition of
safety: “Safety is freedom from accidents or losses”. Figure 2-2 illustrates this
relationship.
Figure 2-2 Relationship between hazards, accidents and safety (Hazard (static): potential negative event → Accident/loss (dynamic): negative effect of the event occurring)
To reduce the chances of critical faults existing in a software system, the system should
be analyzed in the context of its environment and operation to identify possible
hazardous events [Leveson95]. Hazard analysis techniques like Failure Mode and Effect
Analysis (FMEA) and Hazard and Operability Study (Hazop) can help us reduce the
product risk stemming from such accidents. Hazards encompass a greater scope than
faults, because a system can be prone to many hazards even if it has no faults. Hazards
are related to the system’s environment, not just to the software itself. Therefore they
may be present even though the system fulfils the requirements specifications
completely, i.e. has no faults.
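As a toy illustration of how an FMEA-style analysis prioritizes failure modes, the classic Risk Priority Number (severity × occurrence × detection difficulty) can be sketched as follows; the failure modes and 1-10 ratings below are invented for illustration, not taken from any real analysis:

```python
# Minimal FMEA-style worksheet; entries and ratings are hypothetical.
failure_modes = [
    # (failure mode, severity, occurrence, detection-difficulty)
    ("payment recorded twice",      8, 3, 4),
    ("transaction log unavailable", 6, 2, 2),
    ("stale exchange rate used",    7, 5, 6),
]

# Risk Priority Number (RPN) = severity * occurrence * detection;
# a higher RPN means the failure mode deserves attention first.
ranked = sorted(failure_modes, key=lambda m: m[1] * m[2] * m[3], reverse=True)
for mode, s, o, d in ranked:
    print(f"RPN {s * o * d:3d}  {mode}")
```

In a real analysis the modes and ratings would be produced by a domain team, and the numbers only serve to rank where mitigation effort should go.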
The full lines in Figure 2-3 show the common view of how faults are related to
reliability and hazards to safety. In parts of the thesis we also suggest that
faults may influence safety and hazards may influence reliability, as shown by the
dotted lines. Literature searches show that little work has been done in this specific
area, but the fact that faults and hazards do share some characteristics makes
connections between faults and safety, and between hazards and reliability, plausible
as well, at least from a pragmatic viewpoint.
Avizienis et al. emphasize that fault prevention and fault tolerance aim to provide the
ability to deliver a service that can be trusted, while fault removal and fault forecasting
aim to reach confidence in that ability, by justifying that the functional, dependability
and security specifications are adequate, and that the system is likely to meet them
[Avizienis04]. Hence, by working towards techniques that can prevent faults and reduce
the number and severity of faults in a system, the quality of the system can be improved
in the area of reliability (and thus dependability).
Figure 2-3 Faults, Hazards, Reliability and Safety
A usual course of events leading to a fault report is that someone reports a failure
through testing or operation, whereupon a report is logged. This report could initially be
classed as a failure report, as it describes what happened when the system failed. As
developers examine the report, they will eliminate reported “problems” that were not
real failures (often caused by wrong user commands) or duplicates of previously
reported ones. Primarily, they work to establish what caused the failure, i.e. the original
fault. When they identify the fault, they can choose to repair it and report what
the fault was and how it was repaired. The failure report has thus become a fault report.
When looking at a large collection of fault/failure reports in a system in testing or
operation, some faults have been repaired, while others have not (and may never be).
Still, we choose to refer to a report of a software failure as a fault report, even if the
fault has not yet been identified, since it is stored with the other fault reports, and work
is usually being done to identify the fault that caused the failure.
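The report lifecycle described above can be sketched as a simple state model; the state names and transitions below are our own illustration, not a specific tool's workflow:

```python
# Hypothetical status model for a failure/fault report.
TRANSITIONS = {
    "reported":         {"analyzing"},
    "analyzing":        {"rejected", "duplicate", "fault-identified"},
    "fault-identified": {"fixed", "deferred"},
}

def advance(state, new_state):
    """Move a report to new_state, refusing transitions the model forbids."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"cannot go from {state} to {new_state}")
    return new_state

s = "reported"
s = advance(s, "analyzing")
s = advance(s, "fault-identified")   # the failure report becomes a fault report
s = advance(s, "fixed")
print(s)
```

Note that "deferred" is a terminal state here, reflecting the observation above that some reported faults may never be repaired yet remain stored among the fault reports.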
2.4.1 Reflection and challenges
As stated in Section 1.1, the terminology from the literature, although clear and concise
in each individual field and source, becomes confusing and conflicting when definitions
are compared. In our work, we have not tried to redefine the terms and definitions to
make them fit together smoothly; we merely want to explain some of our understanding
of faults and fault reporting, to the degree it is relevant for the thesis.
We still see a need for work unifying concepts, especially in the reliability area. There is
great diversity in the literature on the terminology used to report software or system
related problems. The possible differences between problems, troubles, bugs,
anomalies, defects, errors, faults or failures are discussed in books (e.g., [Fenton97]),
standards and classification schemes such as IEEE Std. 1044-1993 [IEEE 1044], the
United Kingdom Software Metrics Association (UKSMA)’s scheme [UKSMA], and
papers; e.g., [Freimut01]. Until there is agreement on the terminology used in reporting
problems, we must be aware of these differences and answer the above questions when
using a term.
2.5 Current methods and practices
2.5.1 General software engineering paradigms
In software engineering there have been many different paradigms or life-cycle models.
The most common and well known paradigms are presented in the following.
The traditional software process (waterfall): The waterfall model was the first widely
used software development model. It was first proposed in 1970 by W. W. Royce
[Royce70], in which software development is seen as flowing steadily through the
phases of requirements analysis, design, implementation, testing (validation),
integration and maintenance. In the original article, Royce advocated using the model
repeatedly, in an iterative way. However, this is not widely known, and some have
unjustly discredited the paradigm for real use. In practice, the process rarely proceeds
in a purely linear fashion; iterations, by going back to or adapting results of previous
stages, are common.
The spiral model: The spiral model was defined by Barry Boehm [Boehm88], and
combines elements of both design and prototyping in stages, making it a mix of top-down
and bottom-up concepts. This model was not the first to discuss iteration, but it
was the first to explain why iteration is important. As originally envisioned, the
iterations were typically 6 months to 2 years long. This persisted until around 2000.
Increasingly, development has turned towards shorter iteration periods, because of
higher time-to-market demand. In her doctoral thesis, Parastoo Mohagheghi reports
iterations of 2-3 months being common [Mohagheghi04b].
Prototyping, iterative and incremental development: The prototyping model is a
software development process that starts with (incomplete) requirements gathering,
followed by prototyping and user evaluation. Often the customer/user may not be able
to provide a complete set of application objectives, detailed input, processing, or output
requirements at the start. After user evaluation, another prototype will be built based on
feedback from users, and again the cycle returns to customer evaluation.
Agile methods: The benefits of agile methods for small teams working with rapidly
changing requirements have been documented [Beck99]. However, the applicability of
agile methods to larger projects is hotly debated, both by proponents and critics.
Large-scale projects, with high QA requirements, have traditionally been seen as the
home ground for plan-driven software development methods. Deciding when to use agile
methods also depends on the values and principles that a developer wishes to be
reflected in her/his work. Extreme Programming (XP) [Beck99], one of the more
popular of the agile methods, is explicit in its demand for developers to follow a "code
of software conduct" that transmits these values and principles to the project at hand. In
keeping with the philosophy of agile methods, there is no rigid structure defining when
to use any particular feature of these approaches.
2.5.2 Software Reuse
Reuse in software development is a term describing development that includes
systematic activities for creation and later incorporation ("reuse") of common,
domain-specific artifacts. Reuse can run into profound technological, practical, economic, and
legal obstacles, but the benefits may be substantial. It mostly concerns program artifacts
in the form of components. In the SEI’s report [Bachmann00] on technical aspects of
CBD, a component is defined as:
• An opaque implementation of functionality.
• Subject to third-party composition.
• Conformant to a component model.
Software development that systematically develops domain-specific and generalized
software artifacts for possible, later reuse is called software development for reuse.
Software development that systematically makes use of such pre-made, reusable
artefacts, is called software development with reuse.
Component-based software engineering (CBSE)
Component-based software engineering is a field of study within software engineering,
building on prior theories of software objects, software architectures, software
frameworks and software design patterns, and on the extensive theory of object-oriented
(OO) programming and design underlying all of these. It claims that software components,
like hardware components used e.g. in telecommunication systems, can ultimately be
made interchangeable and reliable. CBSE is often said to be mostly software
development with reuse, and with emphasis on reusing components developed outside
the actual project.
Commercial Off-The-Shelf (COTS)
COTS components are external executable software components being sold, leased, or
licensed to the general public; offered by a vendor trying to profit from it; supported and
evolved by the vendor, and used by the customers normally without source code access
or modification ("black box"). Different ways of incorporating COTS-based activities are
described by Li et al. in [Li06].
Open Source Software (OSS)
Open Source Software is software released following the principles of the open source
movement. In particular, it must be released under an Open Source license as defined by
the Open Source Definition, of which there are over 50 types. The Open Source
movement is a result of the free software movement; it advocates the term "Open
Source Software" as an alternative term for free software, and primarily makes its
arguments on pragmatic rather than philosophical grounds. Nearly all Open Source
Software is also "Free Software". An OSS component is an external component for
which the source code is available ("white box"), and the source code can be acquired
either free of charge or for a nominal fee, and with a possible obligation to report back
any changes done.
2.5.3 Specific software development methods
The two following methods are well-known and commonly used in software
development.
Rational Unified Process (RUP)
The Rational Unified Process (RUP) is a software process, design and development
method created by the Rational Software Corporation [Rational], and is described in
[Kruchten00] and [Kroll03]. It describes how to effectively deploy software using
commercially proven techniques. It is a heavyweight process, and therefore
particularly applicable to larger software development teams working on large projects.
It is essentially an incremental development process which centers on the Unified
Modelling Language (UML) [Fowler04]. It divides a project into four distinct phases;
Inception, Elaboration, Construction and Transition. Figure 2-4 shows the overall
architecture of the RUP.
Figure 2-4 The Rational Unified Process
Patterns and Architecture-driven methods
Design patterns are recurring solutions to problems in object-oriented design. The
phrase was introduced to computer science in the 1990s by the book “Design Patterns:
Elements of Reusable Object-Oriented Software” [Gamma95]. The scope of the term
remained a matter of dispute into the next decade. Algorithms are not thought of as
design patterns, since they solve implementation problems rather than design problems.
Typically, a design pattern is thought to encompass a tight interaction of a few classes
and objects. Three major terms have been proposed: pattern languages, pattern catalogs
and pattern systems [Riehle96].
The architect Christopher Alexander's work on a pattern language, for designing
buildings and communities, was the inspiration for the design patterns of software
[Price99]. Interest in sharing patterns in the software community has led to a number of
books and symposia. The goal of the pattern literature is to make the experience of past
designers accessible to beginners and others in the field. Design patterns thus present
different solutions in a common format, to provide a language for discussing design
issues.
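As a minimal illustration, here is a sketch of the well-known Observer pattern from [Gamma95]; the class and method names are ours, and a real pattern catalog entry would also document intent, trade-offs and known uses:

```python
# Observer pattern: a Subject notifies loosely coupled observers of events,
# so the Subject needs no knowledge of what its observers do.
class Subject:
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        for observer in self._observers:
            observer.update(event)

# One concrete observer: it simply records the events it sees.
class Logger:
    def __init__(self):
        self.events = []

    def update(self, event):
        self.events.append(event)

subject = Subject()
log = Logger()
subject.attach(log)
subject.notify("state changed")
print(log.events)
```

The point of the common format is exactly what the text describes: any developer who knows the pattern's name immediately understands this tight interaction of a few classes.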
2.5.4 Techniques for increasing trust in software systems
In addition to the general practices of QA and SPI for improving quality in software
systems, there are also some specific verification techniques that are commonly used in
software development to increase the trust in software. Software verification is a
discipline whose goal is to assure that software fully satisfies all the expected
requirements, and the following are some well known techniques in use:
Testing: Dynamic verification is performed during the execution of software, and
dynamically checks its behaviour; it is commonly known as testing. Testing is part of
more or less all software development processes, and can be performed at many levels,
for instance unit level, interface level and system level.
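A minimal sketch of unit-level testing follows; the unit under test and its tests are invented for illustration, showing the basic pattern of exercising one small unit in isolation against expected outputs:

```python
# Unit under test: a trivial rounding helper (hypothetical example).
def round_to_cents(amount):
    return round(amount * 100) / 100

# Unit-level tests: each exercises the unit in isolation with a
# known input and an expected output.
def test_typical_value():
    assert round_to_cents(3.14159) == 3.14

def test_already_exact():
    assert round_to_cents(2.5) == 2.5

test_typical_value()
test_already_exact()
print("all unit tests passed")
```

Interface- and system-level tests follow the same input/expected-output idea, but exercise component interactions and the assembled system rather than a single unit.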
Inspections: An inspection is also a very common sort of review used in software
development projects. The goal of the inspection is for all of the inspectors to reach
consensus on a work product and approve it for use in the project. Commonly inspected
work products include software requirements specifications, design documentation and
test plans. A work product is selected for review and a team is gathered for an
inspection meeting to review it. A defect is any part of the work product that will keep
an inspector from approving it. For example, if the team is inspecting a software
requirements specification, each defect will be text in the document which an inspector
disagrees with. Basili et al. describe an investigation of an inspection technique called
perspective-based reading in [Basili00].
Formal methods: Formal methods are mathematically-based techniques for the
specification, development and verification of software and hardware systems. The use
of formal methods for software and hardware design is motivated by the expectation
that, as in other engineering disciplines, performing appropriate mathematical analyses
can contribute to the reliability and robustness of a design. However, the high cost of
using formal methods means that they are usually only used in the development of
high-integrity systems, where safety or security is important. Heimdahl and Heitmeyer
present some issues concerning formal methods in [Heimdahl98].
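The flavor of formal verification, reasoning exhaustively about all states rather than sampling behaviour by tests, can be hinted at with a toy sketch; real formal methods use dedicated tools and mathematical proof, and the model below is entirely invented:

```python
from itertools import product

# A toy "verification" in the spirit of explicit-state model checking:
# exhaustively check a safety property over every state and input of a
# small saturating counter. Enumeration only works because the model
# is tiny; it merely hints at the idea of covering ALL behaviours.
LIMIT = 5

def step(count, increment):
    """One transition of the saturating counter."""
    return min(count + increment, LIMIT)

# Safety property: the counter never exceeds LIMIT, for all reachable
# states and all inputs in the modeled range.
assert all(step(c, i) <= LIMIT
           for c, i in product(range(LIMIT + 1), range(4)))
print("property holds over the full state space")
```

Testing samples a few of these (state, input) pairs; the exhaustive check above is what distinguishes the formal approach, and is also what makes it expensive for realistic systems.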
2.5.5 Business-Critical computing and related terms
At first glance there is little evidence of work on business-critical computing when
searching the literature. The term “mission-critical” is much more commonly used, and
can be interpreted to include many of the characteristics of “business-critical”. The key
similarity is that both terms relate to the core activity of an organization, and that
the computer systems supporting this activity should not fail. Another term, coming
from the Software Engineering Institute (SEI), is “performance-critical” [SEI], which has
much the same meaning as “business-critical”.
“Safety-critical” systems are closely connected to the former terms, but this term has a
more severe meaning. Nonetheless, most of the main characteristics of these terms are
the same; i.e. that reliability, availability and similar quality attributes are deemed very
important. Safety-critical systems have been much more thoroughly researched than the
other types of “-critical” systems, simply because of the seriousness and potential
effects of failure in safety-critical systems.
2.6 Business-critical software
As mentioned, our society’s dependency on timely and well-functioning software
systems is increasing. Banking systems, train control systems, airport landing systems,
automatic teller machines and industrial process control systems are just a few examples
of the systems many of us are directly or indirectly critically dependent on. Of these,
some are highly critical to our safety (e.g. traffic control), while others are critical only
in the sense that they let us perform operations that we want or need in order to carry
out our work/business (for instance cinema ticket sales).
That a software-intensive system is business-critical means that:
If and when a system failure occurs, the consequences are restricted to
financial or financially related negative implications, not including
physical harm to humans, animals or physical objects. The consequences
are severe enough to mean a considerable loss of money if the fault or
failure is not corrected or averted swiftly enough.
2.6.1 Criticality definitions
Business-critical software systems have a lot in common with safety-critical systems,
but there are also quite telling differences. A simplistic way to distinguish them is to put
them into classes according to the effects that software anomalies (faults or hazards)
may have on the environment. The classes are safety-critical, mission-critical,
performance-critical, business-critical, and non-critical:
Safety-critical: A safety-critical system could be a computer, electronic or
electromechanical system where a hazardous event may cause injury or even death to
human beings, or physical harm to other objects that interact with the system. Examples
are aircraft control systems and nuclear power-station control systems, where an
accident in most cases will lead to economic losses as well as injury and other physical
damage. Common tools to design safety-critical systems are redundancy and formal
methods, and a spectrum of specialized technologies exist for safety-critical systems
(Hazop, Fault-tree analysis etc). The IEC 61508 standard is intended to be a basic
functional safety standard applicable to all kinds of industry, and is also used to define
the safety standards of some safety-critical systems [IEC 61508].
Mission-critical: The term mission-critical system reflects military usage and is used to
describe activities, processing etc. that are deemed vital to the organization's business
success and, possibly, its very existence. A major software system, product or service is
described as mission-critical if a failure or unavailability of it has a significant negative
impact upon the organization. Such systems typically include support for
accounts/billing, customer
balances, computer-controlled machinery and production lines, just-in-time ordering,
and delivery scheduling. Examples of related technologies are Enterprise Resource
Planning tools, such as SAP [SAP].
Performance-critical: The SEI defines performance-criticality as the ability of
software-intensive systems to perform successfully under adverse circumstances, e.g.,
under heavy or unexpected load or in the presence of subsystem failures. One trivial
example is the performance of SMS telecom services on New Year’s Eve. Some such
services can have critical functions, and yet the behaviour of systems under such
circumstances is often less than acceptable [SEI].
Business-critical: The difference between a business-critical and a regular commercial
software system is really defined by the business. There is no established general
definition telling us which software applications are critical to an operation. In a retail
business, a Customer Relationship Management (CRM) system may be the most
important. On the other hand, it may be the manufacturing or supplier management
software that is the most important. We need to consider the impact that software
services have on the business operations, and determine how much value each brings to
the business and what the impact of each part being unavailable would be. The impact can
be lost revenue, corrupted data or lost user time, as well as indirect and more elusive
losses in customer reputation, goodwill, slipped deadlines, and increased levels of stress
among employees and customers.
Non-critical: Although still important, some types of software will simply not be
classified as critical. Word processors, spreadsheets and graphical design software are
examples of such software. It is of course expected that such tools are reasonably
fault-free and stable, but should they fail, the damage will usually be limited, typically a
person-day of effort in the worst case scenario.
Figure 2-5 shows the relationship between business-criticality and the other types of
criticality defined here. As we see, safety-, performance-, and mission-critical systems
can also be business-critical, but a business-critical system need not be one of the
others. Table 2-1 illustrates the overlap between the different categories.
Figure 2-5
Relationship of business-critical and other types of criticality
Table 2-1
Examples of different systems’ criticality

Safety-critical: Nuclear reactor control system.

Performance-critical: Electronic toll collection in traffic, which must process and
transfer information quickly enough to keep up with traffic.

Mission-critical: Software handling financial transactions between banks. Functional
and non-functional aspects of such applications are considered.

Business-critical: Software handling financial transactions between banks. As
mission-critical, but wider consequences are also considered.

Non-critical: Computer games, word processor application.
2.7 Techniques and methods used to develop safety-critical
systems
There are a number of methods and techniques that are commonly employed when
making safety-critical systems. Some of them will be presented here and related to
business-critical computing. According to [Leveson95] and [Rausand91], the most
common ones are the following:
o PHA (Preliminary Hazard analysis): Preliminary Hazard Analysis (PHA) is used
in the early project life cycle stages to identify critical system functions and broad
system hazards, so as to enable hazard elimination, reduction or control further on in
the project. The identified hazards are assessed and prioritized, and safety design
criteria and requirements are identified. A PHA is started early in the concept
exploration phase so that safety considerations are included in tradeoff studies and
design alternatives. This process is iterative, with the PHA being updated as more
information about the design is obtained and as changes are being made. The results
serve as a baseline for later analysis and are used in developing system safety
requirements and in the preparation of performance and design specifications. Since
PHA starts at the concept formation stage of a project, little detail is available, and
the assessments of hazard and risk levels are therefore qualitative. A PHA should be
performed by a small group with good knowledge about the system specifications.
o HAZOP (Hazard and Operability Analysis): This is a method to identify possible
safety-related or operational problems that can occur during the use and
maintenance of a system. Both Preliminary Hazard Analysis and Hazard and
Operability Analysis (HAZOP) are performed to identify hazards and potential
problems that the stakeholders see at the conceptual stage, and that could be created
by system usage. A HAZOP study is a systematic analysis of how deviations from
the intended design specifications in a system can arise, and whether these
deviations can result in hazards. Both analysis methods build on information that is
available at an early stage of the project. This information can be used to reduce the
severity or build safeguards against the effects of the identified hazards. HAZOP is
a creative team method, using a set of guidewords to trigger creative thinking
among the stakeholders and the cross-functional team in RUP. The guidewords are
applied to all parts and aspects of the system concept plan and early design
documents, to find and eliminate possible deviations from design intentions. An
example of a guideword is MORE, meaning an increase of some quantity in the
system. For example, applying the “MORE” guideword to “a customer client
application”, you would have “MORE customer client applications”, which could
spark ideas like “How will the system react if the servers get swamped with
customer client requests?” and “How will we deal with many different client
application versions making requests to the servers?” A HAZOP study is conducted
by a team consisting of four to eight persons with a detailed knowledge of the
system to be analysed. The main difference between HAZOP and PHA is that PHA is
a lighter method, needing less effort and less available information than HAZOP.
Since HAZOP is a more thorough and systematic analysis method, the
results will be more specific. If there is enough information available for a HAZOP
study, and the development team can spare the effort, a HAZOP study will most
likely produce more precise and suitable results for a safety requirement
specification.
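The guideword mechanism itself is mechanical enough to sketch in code: each guideword is crossed with each part of the system concept to produce deviation prompts for the team to discuss. The guideword list below follows common HAZOP practice, while the system parts are invented examples, not taken from an actual study:

```python
# Standard HAZOP guidewords; the system parts are invented examples.
GUIDEWORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]
system_parts = ["customer client application", "payment server", "transaction log"]

def deviation_prompts(parts, guidewords):
    """Cross every system part with every guideword to trigger team discussion."""
    return [f"What if we have {gw} {part}?" for part in parts for gw in guidewords]

prompts = deviation_prompts(system_parts, GUIDEWORDS)
print(len(prompts))  # 3 parts x 7 guidewords = 21 prompts
print(prompts[1])    # What if we have MORE customer client application?
```

The prompts are of course only triggers; the value of a HAZOP study lies in the team discussion each prompt provokes.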
o FMEA (Failure Modes and Effects Analysis): The method of Failure Modes and
Effects Analysis, alternatively the variant Failure Modes, Effects and Criticality
Analysis (FMECA), is used to study the potential effects of fault occurrences in a
system. Failure Modes and Effects Analysis is a method for analyzing potential
reliability problems early in the development cycle. Here, it is easier to overcome
such issues, thereby enhancing the reliability through design. FMEA is used to
identify potential failure modes, determine their effect on the operation of the
system, and identify actions to mitigate such failures. A crucial step is anticipating
what might go wrong with a product. While anticipating every failure mode is not
possible, the development team should formulate an extensive list of potential failure
modes. Early and consistent use of FMEAs in the design process can help the
engineer to design out failures and produce more reliable and safe products. FMEAs
can also be used to capture historical information for use in future product
improvement.
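One common (though not the only) way to prioritize the failure modes recorded in an FMEA worksheet is the Risk Priority Number (RPN), the product of severity, occurrence and detection ratings. The sketch below illustrates the idea; the components, failure modes and ratings are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    mode: str
    effect: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (certain to detect) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: a common way to rank failure modes."""
        return self.severity * self.occurrence * self.detection

worksheet = [
    FailureMode("payment service", "timeout on bank gateway", "transaction lost", 8, 4, 3),
    FailureMode("report module", "wrong date format", "cosmetic error in report", 2, 6, 2),
]

# Address the highest-risk failure modes first.
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.component}: {fm.mode} (RPN={fm.rpn})")
```

Sorting by RPN lets the team spend mitigation effort where the combination of severity, frequency and poor detectability is worst.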
o FTA (Fault Tree Analysis): A Fault Tree Analysis diagram is a logical diagram
which illustrates the connection between an unwanted event and the causes of this
event. The causes can include environmental factors, human error, strange
combinations of “innocent” events, normal events and outright component failures.
The two main results are: 1) The fault tree diagram which shows the logical
structure of failure effects. 2) The cut-sets, which show the sets of events which can
cause the top event – system failure. If we can assign probability values or failure
rates to each basic event, we can also get quantitative predictions for Mean Time To
Failure (MTTF) and failure rate for the system.
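As a small illustration of the quantitative side of FTA, the sketch below computes a top-event probability from minimal cut-sets using the rare-event approximation, assuming independent basic events. The event names and probability values are invented for illustration:

```python
# Basic-event failure probabilities (invented values for illustration).
basic_events = {"pump_fails": 0.01, "valve_stuck": 0.02, "sensor_dead": 0.005}

# Minimal cut-sets: the top event occurs if all events in any one set occur.
cut_sets = [{"pump_fails", "valve_stuck"}, {"sensor_dead"}]

def cut_set_probability(cut_set, probs):
    """Probability that every basic event in one cut-set occurs (independence assumed)."""
    p = 1.0
    for event in cut_set:
        p *= probs[event]
    return p

def top_event_probability(cut_sets, probs):
    """Rare-event approximation: sum the cut-set probabilities.
    This slightly over-estimates, which is acceptable when probabilities are small."""
    return sum(cut_set_probability(cs, probs) for cs in cut_sets)

print(top_event_probability(cut_sets, basic_events))  # ≈ 0.0052
```

Here the single-event cut-set dominates: one cheap sensor failing alone contributes far more to system failure than the two-event combination.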
o ETA (Event-tree analysis): An event-tree is a graphical representation of a
sequence of related events. Each branching point in the tree is a point in time where
we can get one of two or more possible consequences. The event-tree can be
described with or without branching probabilities. In economic analyses it is
customary to assign a benefit or cost to each possible alternative – or branch. An
event tree can help our understanding and documentation of one or more sequences
of events in a system or part of a system. Areas where we can use event-trees are: 1)
Study of error propagation through a complete system – people, operational
procedures, hardware, and software. 2) Building usage scenarios to enhance HAZOP:
“what could happen if…?”
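The branching probabilities and per-branch costs mentioned above combine mechanically: multiplying probabilities along each path gives the path probability, and weighting each outcome by its cost gives an expected cost. A minimal sketch, with an invented initiating event, branch names and numbers:

```python
# An event tree for an (invented) initiating event. Each branch carries a
# probability and, as is customary in economic analyses, a cost.
tree = {
    "p_initiating": 0.1,  # probability that the initiating event occurs
    "branches": [         # (consequence, branch probability, cost of this outcome)
        ("failover succeeds", 0.95, 0),
        ("failover fails", 0.05, 100_000),
    ],
}

def path_probabilities(tree):
    """Multiply the probabilities along each path through the tree."""
    return {desc: tree["p_initiating"] * p for desc, p, _ in tree["branches"]}

def expected_cost(tree):
    """Weight the cost of each outcome by the probability of its path."""
    return sum(tree["p_initiating"] * p * cost for _, p, cost in tree["branches"])

print(path_probabilities(tree))  # path probabilities, e.g. 'failover fails' ≈ 0.005
print(expected_cost(tree))       # ≈ 500 (= 0.1 * 0.05 * 100000)
```

A deeper tree would simply nest further branch points, multiplying probabilities along each complete path in the same way.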
o CCA (Cause-Consequence Analysis): Cause-consequence analysis (CCA) is a
two-part system safety analytical technique that combines Fault Tree Analysis and
Event Tree Analysis. Fault Tree Analysis considers the “causes” and Event Tree
Analysis considers the “consequences”, and hence both deductive and inductive
analysis is used. The purpose of CCA is to identify chains of events that can result
in unwanted consequences. With the probabilities of the various events in a CCA
diagram, the probabilities of the various consequences can be calculated, thus
establishing the risk level of the system. A CCA starts with a critical event and
determines the causes of the event (using top-down or backward search) and the
consequences it might create (using forward search). The cause-consequence
diagram can show both temporal dependencies and causal relationships among
events. The notation builds on the FTA and ETA notations, and extends these with
timing, condition and decision alternatives. The result is a diagram (along with
elaborated documentation), showing both a logical structure of the cause of a critical
event and a graphical representation of the effect the critical event can have on the
system. CCA enables probability assessments of success/failure outcomes at staged
increments of system examination. Also, the CCA method helps in creating a link
between the FTA and ETA methods. CCA shows the sequence of events explicitly,
which makes CCA diagrams especially useful in studying start-up, shutdown and
other sequential control issues. Other advantages are that multiple outcomes are
analyzed from each critical event, and different levels of success/failure are
distinguishable, as CCA may be used for quantitative assessment.
In addition to these techniques, we included the Safety Case method for use
alongside the safety criticality analysis methods above. Its purpose is to keep track of
the requirements and information acquired when using these analysis
methods. Usage of the Safety Case method is also presented in paper P1.
o Safety Case: The Safety Case method seeks to minimise safety risks and
commercial risks by constructing a demonstrable safety case. Bishop and
Bloomfield [Adelard98, Bishop98] define a safety case as: “A documented body of
evidence that provides a convincing and valid argument that a system is adequately
safe for a given application in a given environment”. The safety case method is a
vehicle for managing safety claims, containing a reasoned argument that a system is
or will be safe. It is manifested as a collection of data, metadata and logical
arguments. The Safety Case documents answer questions like “How will we argue
that this system can be trusted and is safe?” The Safety Case shows how safety
requirements are decomposed and addressed, and provides an appropriate
answer to such questions. The layered structure of the Safety Case allows
lifetime evolution and helps to establish the safety requirements at different detail
levels.
Table 2-2 shows a comparison of the safety criticality analysis methods we have
considered. The properties shown are relevant when choosing between such analysis
techniques. The costs involved are described for each method by the properties
“Formalization” and “Effort needed”. Other properties are the requirements for
available system information, which can range from a sketchy system description to a
full system description including all technical documentation and code. The process
stage is also important, as it tells us where in the development cycle the technique is best
suited.
Table 2-2
Properties of some safety criticality analysis techniques

PHA
  Formalization: Low
  Effort needed: Low
  Process stage: Early
  System information requirements: Low. Any system information.
  Output: Identification of hazards, their causes, effects and possible barriers or measures.
  Participant roles/groups: Small group (with moderator).
  Application: Identifying hazards.

HAZOP
  Formalization: Moderate
  Effort needed: Moderate
  Process stage: Middle
  System information requirements: Moderate. Specification and design documentation of the system.
  Output: Identification of possible safety- or operational problems, their causes, effects and suggested solution.
  Participant roles/groups: Moderator, secretary, 4-6 domain experts.
  Application: Identifying hazards.

FMEA
  Formalization: High
  Effort needed: Moderate
  Process stage: Middle
  System information requirements: Moderate. Detailed information about the system.
  Output: Identification of fault modes for all components, their causes, effects and severity.
  Participant roles/groups: System developers with good knowledge of the system’s operating environment.
  Application: Predict events.

FTA
  Formalization: High
  Effort needed: Moderate
  Process stage: Late
  System information requirements: High. Knowledge about the system’s failure modes (from FMEA analysis).
  Output: Logical illustration of the relationship between an unwanted event and its causes.
  Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
  Application: Analyzing causes of hazards.

ETA
  Formalization: Moderate
  Effort needed: Low
  Process stage: Early
  System information requirements: High. Quantitative reliability data for the analyzed parts.
  Output: Identification of event chains that could lead to accidents.
  Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
  Application: Predict events.

CCA
  Formalization: High
  Effort needed: Moderate
  Process stage: Middle
  System information requirements: High. As for ETA and FTA.
  Output: Combination of ETA and FTA, where time sequenced events and discrete, staged levels of outcome are shown.
  Participant roles/groups: 1-4 persons, depending on the size of the system, some with knowledge of the system and its environment.
  Application: Combines ETA and FTA.

Safety Case
  Formalization: Moderate
  Effort needed: Moderate/high
  Process stage: Entire
  System information requirements: High. All available information concerning system safety, and related documentation.
  Output: A documented body of evidence that provides a valid argument that a system is adequately safe for a given application in a given environment.
  Participant roles/groups: All personnel involved in software safety work.
  Application: Building a case for safe systems.
2.8 Empirical Software Engineering
Empirical software engineering is not software development per se, but a branch of
software engineering research and practice which emphasizes empirical studies to
investigate processes, methods, techniques and technology.
According to Votta et al., the goal of empirical software engineering is to construct a
“credible empirical base ... that is of value to both professional developers and
researchers” [Votta95]. They argue that empirical software engineering inherits most of
the methodological approaches and techniques of social sciences, since its goal is to
examine complex social settings, contexts where the human interaction is the most
critical factor determining the quality and effectiveness of the results being produced. In
particular, empirical work is accomplished through the execution of empirical studies.
This entails observations of specific settings, where the purpose is collecting and
analysing/deriving useful information on their behaviour and attributes.
Empirical studies can be classified in three categories, according to the increasing
degree of rigor and confidence in the results of the study [Wohlin00]:
(1) Surveys
(2) Case studies
(3) Controlled experiments
This is in line with the classification made by Votta et al. where the different types of
investigations are said to be anecdotal studies, case studies and experiments [Votta95].
It can be argued that surveys and case studies should swap positions, as case studies
tend to be hard to replicate and very open-ended, while surveys can be made very
rigorous and are well suited for replication.
Zelkowitz et al. describe different ways of studying technology [Zelkowitz98], and
state that the empirical method is the following: “Empirical method: A statistical
method is proposed as a means to validate a given hypothesis. Unlike the scientific
method, there may not be a formal model or theory describing the hypothesis. Data is
collected to verify the hypothesis.” Zelkowitz et al. describe 12 different ways of
studying technology, in three dimensions: Observational, Historical and Controlled. The
different ways are described in detail in [Zelkowitz98], and the models they propose
for validating technology are shown in table 2-3. In addition to these, there are a few
other techniques commonly used in empirical software engineering. Firstly, as
previously mentioned, Wohlin et al. discuss surveys in [Wohlin00]. Secondly, the
Action Research method is presented by Avison et al. This method involves researchers
and practitioners acting together on a set of activities, including problem diagnosis,
action intervention and reflective learning [Avison99]. Finally, a research method that is
increasingly used in studies of software-developing organizations is Grounded
Theory, which emphasizes generation of theory from data. This method originates from
the sociologists Barney Glaser and Anselm Strauss and is presented in [Strauss98].
Table 2-3
12 ways of studying technology, from [Zelkowitz98]

Validation method   Category       Description
Project monitoring  Observational  Collect development data
Case study          Observational  Monitor project in depth
Assertion           Observational  Use ad hoc validation techniques
Field study         Observational  Monitor multiple projects
Literature search   Historical     Examine previously published studies
Legacy              Historical     Examine data from completed projects
Lessons learned     Historical     Examine qualitative data from completed projects
Static analysis     Historical     Examine structure of developed product
Replicated          Controlled     Develop multiple versions of product
Synthetic           Controlled     Replicate one factor in laboratory setting
Dynamic analysis    Controlled     Execute developed product for performance
Simulation          Controlled     Execute product with artificial data
2.8.1 Research strategies in Empirical Research
There are three main types of research strategies, each with distinct approaches to
empirical studies, and they may all be used for Empirical Software Engineering
[Wohlin00][Seaman99]:
• Quantitative approaches are mainly used to quantify a relationship or to compare
groups, with the aim of identifying cause-effect relationships, verifying
hypotheses or testing theories.
• Qualitative approaches are observational studies with the aim of interpreting a
phenomenon based on information collected from various sources. This
information is usually subjective and non-numeric.
• Mixed-method approaches combine the advantages of the two strategies above
and overcome their limitations, e.g. by triangulation of data.
Table 2-4 gives an overview of empirical research approaches and examples of
strategies for each. This is taken from [Moghaghegi04b] and [Creswell03].
The boundaries between these approaches are not sharp. For instance, case studies can
combine quantitative and qualitative studies, and although case studies are often classed
as qualitative in nature, Yin states that case studies do not by their nature equal
qualitative research [Yin03].
Table 2-4
Empirical research approaches

Quantitative approach:
  Strategies: Experimental design, surveys, case studies.
  Methods: Predetermined, instrument-based questions, numeric data, statistical analysis.
  Knowledge claims: Postpositivism (theory verification, empirical observation and measurement).

Qualitative approach:
  Strategies: Ethnographies, grounded theory, case studies.
  Methods: Emerging methods, open-ended questions, interview data, observation data, document data, text and image analysis.
  Knowledge claims: Constructivism (theory generation, understanding, interpretations of data).

Mixed methods approach:
  Strategies: Sequential, concurrent, transformative.
  Methods: Both predetermined and emerging methods, multiple forms of data drawing on several possibilities, statistical and text analysis.
  Knowledge claims: Pragmatism (consequences of action, problem-centered, pluralistic).
2.9 Main challenges in business-critical software engineering
The following is a short list of some of the general challenges in software engineering
today. They are sometimes cited as reasons for difficulties in software engineering, e.g.
in [Charette05]:
• Poor Requirements
• Rising Complexity
• Ongoing Change
• Failure to Pinpoint Causes
In our work, we concentrate on issues dealing with reduction of product risk, improving
requirement specifications, coping with complexity, and helping with pinpointing
causes for failures. This adds up to a main challenge for business-critical software
development:
• Developing methods that help reduce product risk, without increasing costs too
much compared to the received benefit from these methods.
This means that we need to introduce low-cost methods and techniques that
focus on important areas, spending just a little more effort to reduce the largest
problems. This is in line with Boehm’s notion of value-based software engineering,
where the main agenda is developing an overall framework to better focus effort where
it is needed [Boehm03].
As far as empirical software engineering research is concerned, an important challenge
is that of following up research and technology change proposals with continued
observation and measurement in the field when practitioners put theory into practice.
Much of the research being performed concerns the software development process up to
a point, but does not follow the eventual implementation of these results further.
3
Research Context and Design
In this chapter, the project context is presented in more detail. The overall research
design, which combines quantitative and qualitative studies, is presented. Finally, a
more detailed description of each study is given.
3.1 BUCS Context
In the last decade, computers have taken on a more important role in several areas of
commercial business. As an effect of this, many of the functions required by industry
and services depend on software and computer systems. Failures in such systems can
have serious consequences for businesses that depend on these systems for their
livelihood. As in many related areas, substantial savings can be made by discovering,
reducing or removing these potential failures early in the system’s life-cycle. In fact,
it should be possible to address most potential failures very early in the process of the
system’s development.
The main goal for the BUCS project is to better understand and to improve the software
technologies and processes used for developing business-critical software. Much of the
information about current practices and possible problems was collected at Norwegian
IT companies. It was important that the relationship between the BUCS project and the
involved companies and organizations was based on mutual profit for both parties.
The BUCS project has – through literature reviews, controlled experiments, historical
data analysis and case studies – investigated methods from the area of safety-critical
software. These methods include Preliminary Hazard Analysis (PHA), HazOp, Fault
Tree Analysis (FTA), Cause-Consequence Diagrams (CCD), and Failure Mode and
Effect Analysis (FMEA). We have also studied important standards in this area, such as
IEC 61508 – a standard for functional safety. The effects of both methods and standards
have been studied using controlled student experiments and through industrial case
studies. All the above-mentioned methods are rather general. They can be applied to
both local and distributed systems, and they can be used on hardware, software and
“wetware” (people). This is especially important when we are dealing with a problem as
wide and diverse as business criticality.
Important techniques that we have sought to adapt from the development of safety-critical software are mainly PHA, HAZOP, and FMEA.
In addition to the industrial focus of the BUCS basic R&D project, some of the studies
in this thesis were conducted in cooperation with organizations involved in the
EVISOFT project, an industrially-driven research project [EVISOFT].
3.2 Research Focus
When deciding the focus areas of this thesis, the input was the BUCS project context,
less-exploited research areas, and available sources of research data. During our
literature studies, and after contact with Norwegian IT companies, we decided on some
key areas on which to focus the work in this thesis:
• Business-criticality in terms of software faults
• Fault report analysis
• Fault reporting in software development
In terms of goals for the research, we formulated the following research questions:
• RQ1. What is the role of fault reporting in existing industrial software
development?
• RQ2. How can we improve on existing fault reporting processes?
• RQ3. What are the most common and severe fault types, and how can we reduce
them in number and severity?
• RQ4. How can we use safety analysis techniques together with failure report
analysis to improve the development process?
To obtain answers to these research questions, we decided on common metrics for our
studies. We started broadly, including attributes like structural fault location, functional
fault location and fault repair effort. When we received actual data from industrial
projects, we had to reduce the scope somewhat. This was due to lack of complete
information in the data material, and great variation between organizations on what data
they stored.
The main metrics we identified for fault report studies were:
• The number of detected faults is an indirect metric, attained simply by counting
the number of faults of a certain type or for a certain system part etc.
• The metrics that are used directly from the data in the fault reports are the
reported type, severity, priority, and release version of the fault.
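The counting metric is straightforward once the fault reports are in a structured form. The sketch below mirrors the fields listed above (type, severity, priority, release version); the record values are invented examples, not data from the studied projects:

```python
from collections import Counter

# A minimal fault report record with the fields used directly in the studies.
# The values are invented examples.
fault_reports = [
    {"type": "logic", "severity": "high",   "priority": 1, "release": "2.0"},
    {"type": "gui",   "severity": "low",    "priority": 3, "release": "2.0"},
    {"type": "logic", "severity": "medium", "priority": 2, "release": "2.1"},
]

def count_by(reports, field):
    """Indirect metric: count the number of faults per value of a field."""
    return Counter(r[field] for r in reports)

print(count_by(fault_reports, "type"))      # Counter({'logic': 2, 'gui': 1})
print(count_by(fault_reports, "severity"))
```

The same counting can be applied per release or per priority level, which is essentially how the fault distributions in the later studies were produced.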
There were several reasons why we decided to focus on software faults:
• The BUCS project is concerned with business-critical systems. A recurring
theme in the definition of business-criticality is that the major threat to such
systems is failures that stop or limit the use of the system. As described in
section 2.4, “Faults are potential flaws in a software system, that later may be
activated to produce an error. An error is the execution of a "passive fault",
leading to a failure.” This means that by working to reduce the number and
criticality of faults in the software, we would also reduce the number or
frequency of failures.
• As Avizienis et al. suggest, one way of attaining better dependability in a
system is fault removal to reduce the number and severity of faults
[Avizienis04]. By working to identify critical or numerous fault types,
developers can eliminate a larger number of faults by focusing on preventing
such fault types.
• As for industrial data, fault report data is abundant in most software-developing
organizations. Thus we had a wide array of potential industrial
partners to collect data from.
As shown in Figure 1-1, the studies we have performed are all connected on the topic of
business-critical software and fault report analysis, and have been performed in
sequence. A short description of the studies is given in Table 3-1, and each study is
elaborated in Section 3.3.
Table 3-1
Description of our studies

Study 1 (2003): Structured interviews: Preliminary interviews about business-critical
software and state-of-practice in Norwegian IT industry.

Study 2 (2004): Literature review: Software criticality techniques, fault reporting and
management literature.

Study 3 (2004-05): Historical data mining: Fault report analysis of industrial projects
from four organizations.

Study 4 (2006-07): Historical data mining: Fault report analysis of industrial projects.

Study 5 (2007): Structured interviews: Exploring the results from Study 4 further,
regarding fault report analysis and fault reporting processes.

Study 6 (2006-07): Case study: Comparison of hazard analysis and fault report
analysis in a practical setting.

Study 7 (2004-07): Lessons learned from our experiences with fault report studies.
3.2.1 Data collection
Before starting to plan and conduct our empirical studies, we decided on the goals of
our studies, which types of studies we were going to perform, and which data sources
we were going to need to complete them. Data collection was split up in three phases.
First, there was a pre-study phase of initial data collection and pre-analysis of this to
narrow down research areas and questions. Second, we would focus the data collection
on the deeper issues that seemed the most relevant. Finally, there would be an
analysis phase to summarize, reflect and collect lessons learned.
As the BUCS project is aimed at supporting business-critical systems, and part of the
BUCS goals was close cooperation with Norwegian IT industry, we chose early on to
focus on empirical studies of Norwegian commercial projects and organizations. This
meant that we had to contact and select relevant organizations developing business-critical applications. This raised the issue of which organizations we wanted to study.
As it turned out, the sampling of companies was mostly done out of convenience,
because of apparent reluctance to disclose sensitive information about quality data and
processes, as well as many organizations simply being “too busy”.
Another issue was whether data collection should be performed in pre-implementation
phases or post-implementation ones. Pre-implementation studies are better for working
with possibilities for improvement initiatives, but a problem is knowing which data to
collect. In this case the data would be more of a qualitative nature, and thus harder to
analyse. Post-implementation studies, on the other hand, would be better for obtaining
quantified data, but then there is the question of whether the data is relevant for
investigation.
Once the studies to be performed were tentatively planned, the actual data collection
depended on the available projects and their data, i.e. which companies we were able to
cooperate with, and which processes those companies were willing to let us participate
in. The employed data collection methods were interviews, surveys, and field/case
studies, as well as historical data mining and analysis of reports or logs on relevant
issues.
Because of the nature of historical data analysis, some of our research was based on
bottom-up data collection. That is, we needed to examine the data material prior to
being able to formulate research questions and goals. As Basili et al. state in [Basili94],
data collection should ideally proceed in a top-down rather than a bottom-up fashion,
e.g. by employing GQM to define relevant metrics [Solingen99]. However, some
reasons why bottom-up studies are also useful are given in [Mohaghegi04c]:
1. There is a gap between the state of the art (best theories) and the state of the
practice (current practices). Therefore, most data gathered in companies’
repositories are not collected following the GQM paradigm.
2. Many projects have been running for a while without having improvement
programs and may later want to start one. The projects want to assess the
usefulness of the data that is already collected and to relate data to goals (reverse
GQM).
3. Even if a company has a measurement program with defined goals and metrics,
these programs need improvements from bottom-up studies.
Exploring industrial data repositories can be part of an exploratory study (identifying
relations and trends in data) or a formal study (confirmatory) to validate other or newer
theories than those originally underlying the collected data.
3.3 Research approach and research design
This section explains the research design used to collect and analyze the relevant data.
The thesis combines qualitative and quantitative techniques, mainly by using
quantitative studies on historical data sources and qualitative studies on practice and
processes. The reasons for combining these different types of studies are the following:
• By doing quantitative studies of ongoing commercial projects or by reusing
historical data, we could collect information about real-life projects.
• Results of these studies were confirmed by other studies using other, often
qualitative, methods, thus triangulating the data and results.
The research design for each individual study has been both bottom-up and top-down.
The chosen design has depended on the maturity of the research and available
information. Some of the research questions were a result of our literature studies and
common work in the BUCS project, in a top-down manner. Other research questions
were bottom-up, because of the available data sets and the actual practices in the
organizations we studied.
The research can be split into three phases, as shown in Figure 1-1:
Phase 1: Literature studies of state-of-the-art and industrial interviews to increase the
understanding of practice (top-down research questions) (Study 1 and 2).
Phase 2: Quantitative studies of fault reports. This started with a bottom-up exploratory
(Study 3) study and continues with top-down confirmatory studies (Study 4).
Phase 3: Qualitative studies to expand the knowledge gained from the quantitative
studies (top-down research questions) (Study 5, 6 and 7).
Sections 3.3.1 through 3.3.7 explain the research design and practical setting for each of
the studies that make up this thesis.
3.3.1 Study 1: Interviews with company representatives
To establish a basis for identifying the most commonly used methods and the most
common problems encountered in companies that develop business-critical software,
several semi-structured interviews were carried out with representatives from
cooperating companies. These companies were chosen partly for their relevance to
'business-critical' issues, and partly out of convenience of location and availability.
Before the interviews, a list of topics was discussed and agreed upon; the later
interviews/talks with the company representatives were based on these topics.
Research questions for Study 1:
RQ.S1.a: How may the use of well-known software development methods improve
business-critical system development?
RQ.S1.b: Do companies know much about safety-critical methods at all? If so, how do
they view the possibility of using such safety methods to improve business-critical
system development?
RQ.S1.c: What are the most common reliability/safety-related problems in
business-critical system development? That is, we must identify the most important
factors leading to failures or accidents.
We also wanted answers to questions such as:
• What are the most important hindrances for achieving high quality products
when developing business-critical software?
• How does industry handle these problems now?
• What are the most important problems encountered during the operation of
business-critical software?
• How can we remove or reduce these problems by changing the way business-critical
systems are developed, operated and maintained?
Validity comment. The main validity concerns in this study would be the relatively low
number of respondents and that the interviews were carried out by four different
researchers.
3.3.2 Study 2: Literature review - Software Criticality Techniques,
Fault reporting and management literature.
This study proposed a way to integrate software criticality techniques into a common
development regime like RUP. Taking the results from Study 1 into account, together
with a literature review of state-of-the-art in software engineering and safety methods,
we sought to combine the common and the special, by introducing special techniques
from safety related development into the common way of developing business-critical
software.
The research questions for Study 2 were:
RQ.S2.a: Which software criticality analysis techniques were most eligible for
introduction into a common development framework?
RQ.S2.b: Where in the development process would introduction of such techniques be
most effective or easiest to implement?
Validity comment. Being a literature review, we would not be able to validate any
findings further than referring to literature.
3.3.3 Study 3: First Empirical analysis of software faults in industrial
projects
This study looks at when and how faults have been introduced into a system under
development, and how they have been found and dealt with. By analysing fault-/change
reports for several (semi-)completed development projects, we wanted to investigate if
there are common causes for faults being introduced and not being discovered early
enough. The goal was to improve the knowledge about why and how faults are
introduced, and how we can identify and rectify them earlier in the software.
This study is based on historical data collection/data mining, where the data consists of
fault reports we have received from four commercial projects in four different
companies. The steps of the study were the following:
1. Define study goals and research questions.
2. Contact eligible companies for cooperation.
3. Select suitable projects for study and agree on cooperation practicalities.
4. Collect and convert data from projects.
5. Filter data – extracting only fault reports from the total data sets (which in some
cases included change reports), and removing duplicate data.
6. Categorize faults according to fault type, software module and severity.
7. Analyze resulting data sets by comparing project internal data, as well as
projects against each other.
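Steps 5 through 7 above amount to a small filtering-and-aggregation pipeline. A minimal sketch follows; the record fields and category names are hypothetical, since each company's report format differed:

```python
from collections import Counter

# Hypothetical raw records from a project's issue tracker; the real
# fields and fault taxonomy varied between the participating companies.
raw_reports = [
    {"id": 1, "kind": "fault", "type": "Funct. logic", "module": "billing", "severity": "high"},
    {"id": 2, "kind": "change", "type": "-", "module": "ui", "severity": "low"},
    {"id": 1, "kind": "fault", "type": "Funct. logic", "module": "billing", "severity": "high"},  # duplicate
    {"id": 3, "kind": "fault", "type": "GUI", "module": "ui", "severity": "low"},
]

# Step 5: keep only fault reports and drop duplicates (here: by id).
seen = set()
faults = []
for r in raw_reports:
    if r["kind"] == "fault" and r["id"] not in seen:
        seen.add(r["id"])
        faults.append(r)

# Step 6: categorize according to fault type and software module.
by_type = Counter(r["type"] for r in faults)
by_module = Counter(r["module"] for r in faults)

# Step 7: express the per-project profile as percentages, so that
# projects of different sizes can be compared against each other.
total = len(faults)
distribution = {t: round(100 * n / total, 1) for t, n in by_type.items()}
```

The percentage profile, rather than raw counts, is what makes the cross-project comparison in step 7 meaningful when projects differ in size.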
This investigation was mostly a bottom-up process, because of the initial uncertainty
about the available data from the potential participants. After establishing a dialogue
with the participating projects, and acquiring the actual fault reports, our initial research
questions and goals were altered accordingly.
Initially, we wanted to find which types of faults are most frequent, and whether there are
some parts of the systems with higher fault-density than others. This also helps show if
the pre-defined fault taxonomy is suitable. When we know which types of faults
dominate and where these faults appear in the systems, we can focus on the most severe
faults to identify the most important targets for later improvement work.
The research questions for Study 3 are:
RQ.S3.a: Which types of faults are most typical for the different software components
and parts?
RQ.S3.b: Are certain types of faults considered to be more severe than others by the
developers?
Validity comment. Since the number of projects would not be large, we knew that
external validity was a concern. The differences in domain, environments and fault
reporting procedure, added to these concerns.
3.3.4 Study 4: Second Empirical analysis of software faults in
industrial projects
This study was based on the lessons learned in Study 3, with somewhat refined metrics
to make sure the data material was more suitable for this type of study. The research
design was similar to that of Study 3, i.e. it was a historical data collection/data mining
study to further explore and confirm the issues from Study 3. In this study, we had
access to five projects from one company.
This investigation was a top-down study, as we had identified our research goals before
initiating the study. The research questions for Study 4 are:
RQ.S4.a: Which types of faults are the most common for the studied projects?
RQ.S4.b: Which fault types are rated as the most severe faults?
RQ.S4.c: How do the results of this study compare with our previous fault report study
(Study 3)?
Validity comment. For this study, we had more fault reports and more projects to
study, but everything would be collected from the same organization. Again this would
impact external validity.
3.3.5 Study 5: Interviews focusing on empirical results
Study 5 was a qualitative study where we interviewed representatives that had been
involved in the five projects we studied in Study 4. We performed semi-structured
interviews using an interview guide with seven main topics and 32 questions.
We selected interviewees who had been actively involved in some of the five projects
we had studied in this organization before and who also had hands-on experience with
fault management in the same projects. The interviews were conducted as open-ended
interviews, with the same questions asked to each interviewee. However, the
interviewees were given room to talk about what they felt was important within the
topic of the question.
Each question in the interview guide was related to one or more local research
questions, and the different responses for each question were compared to extract
answers related to the research questions. In line with using the constant comparison
method, we coded each answer into groups. The codes were postformed, i.e. constructed
as a part of the coding process, since the interviews were open-ended. Additionally, we
received feedback about the topic at hand through discussions and comments during
two workshops that were held in the organization in conjunction with the fault report
study and interviews.
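The constant comparison coding described here can be sketched roughly as follows; the answers, keywords and codes are invented for illustration, and the real coding was done manually on full transcripts:

```python
from collections import defaultdict

# Hypothetical transcribed answers to one interview question. In the
# actual study, codes were postformed during analysis, not predefined.
answers = {
    "interviewee_1": "We rarely get feedback from the fault reports.",
    "interviewee_2": "Fault data is collected but never analyzed.",
    "interviewee_3": "Categories in the tool are too coarse.",
}

# Seed codebook; the keyword lists stand in for the analyst's judgement.
codebook = {
    "no_feedback": ["feedback", "analyzed"],
    "classification": ["categories", "classification"],
}

def assign_code(answer: str, codebook: dict) -> str:
    """Constant comparison: reuse a matching code, else create a new one."""
    text = answer.lower()
    for code, keywords in codebook.items():
        if any(k in text for k in keywords):
            return code
    new_code = f"code_{len(codebook) + 1}"   # postformed code
    codebook[new_code] = [text.split()[0]]
    return new_code

# Group interviewees by code, so responses to the same question can be
# compared across interviews.
groups = defaultdict(list)
for who, answer in answers.items():
    groups[assign_code(answer, codebook)].append(who)
```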
This study is based on the results from Study 4, on fault reports. The main research
questions for this study were therefore derived from the researchers’ viewpoint in Study
4.
Firstly, we wished to see if the experience of the practitioners in the actual projects was
similar to the analysis results we had found. Secondly, we wanted to draw on their
experience to hear if they thought a common fault type classification scheme could be
helpful towards improving their development processes. We also wanted to hear their
opinions on possibly increasing the effort in data collection and fault report analysis in
order to improve their software development processes. Lastly, we wanted to ask them
where they thought that there was most potential of improvement in their fault
management system, to elicit areas that they felt were lacking in their current fault
reporting process.
This led to the following four research questions for Study 5:
RQ.S5.a: How can the large number of identified faults from early development phases
be explained?
RQ.S5.b: Can the introduction of a standard fault classification scheme like
Orthogonal Defect Classification (ODC) be useful to improve development processes?
RQ.S5.c: Do they see feedback from fault report analysis as a useful software process
improvement tool?
RQ.S5.d: Do they see any potential improvement areas in their fault management
system?
Summed up, the main topics covered in the interviews were:
• The results from our quantitative Study 4 of their development projects,
• The organization’s own measurements of faults,
• Their existing quality and fault management system,
• Fault categorization and fault management,
• Communicating feedback from fault reporting to developers,
• Attitudes to process change and quality improvement for fault management.
Validity comment. The interviews, transcription and data coding would all be
performed by one person, which was a threat to internal validity. In addition, there was a
relatively low number of interviews, which would affect external validity.
3.3.6 Study 6: Comparing results from Hazard Analysis and analysis
of Faults
This study was prompted by our experiences with fault report analysis, and how some
of the faults were comparable to hazards identified from hazard analysis.
By conducting a qualitative hazard analysis of a small existing web-application and
database concept/specification, and comparing the results with a quantitative fault report
analysis of the actual completed system, we wanted to explore the possibility of using
the PHA hazard analysis method to reduce the number of faults being introduced into a
system.
The fault report analysis was performed in the same manner as in Studies 3 and 4, and
applied on the fault reports we received from the maintainers of the DAIM system. The
hazard analysis of the DAIM system was performed by a group of BUCS project
researchers, and was performed in a series of PHA sessions. Finally the results of the
two analyses were compared.
The three research questions for Study 6 were the following:
RQ.S6.a: What kind of faults in terms of Orthogonal Defect Classification (ODC) fault
types does the PHA technique help elicit?
RQ.S6.b: How does the distribution of fault types found in the fault analysis compare to
the one found in the PHA?
RQ.S6.c: Does the PHA technique identify potential hazards that also actually appear
as faults in the software?
Validity comment. Being a study of just one system, external validity would be weak.
Another concern was construct validity, as we would be making a comparison of
hazards and faults, which are two different concepts.
3.3.7 Study 7: Fault management and reporting
This study does not have explicit research questions, but is a compilation of lessons
learned over the course of studying fault management and fault reporting in several
different organizations. This was based on our experience from collecting and analysing
fault reports as well as from literature studies and feedback from the organizations
involved in our studies.
Validity comment. The main validity concern is that our experience comes from a
limited number of organizations, and our main means of validating the lessons learned
is comparison with the literature.
3.4 Overview of the studies
In Table 3-2, an overview of the studies is given, together with a short description of the
type of each study.
Table 3-2 Types of studies in this thesis

Study  Description                          Type                           Paper
1      Interviews with company              Qualitative, explorative       (P1)
       representatives
2      Literature study                     Qualitative, descriptive       P1
3      Fault report study of four projects  Quantitative, explorative      P2, P3, P5
4      Fault report study of five projects  Quantitative, confirmative     P4
5      Fault reporting and management       Qualitative, confirmative      P6
       interviews
6      Hazard analysis vs. fault report     Quantitative and qualitative,  P7
       analysis - DAIM                      combining two different
                                            types of results
7      Fault management and reporting       Qualitative, descriptive       (P8)
Table 3-3 shows how the local research questions for each study relate to the main
research questions in this thesis.
Table 3-3 Relation between main and local research questions

Main research question  Local research questions
RQ1                     RQ.S1.a, RQ.S1.c
                        RQ.S5.c, RQ.S5.d
RQ2                     RQ.S5.b
RQ3                     RQ.S3.a, RQ.S3.b
                        RQ.S4.a, RQ.S4.b, RQ.S4.c
                        RQ.S5.a
                        RQ.S6.b
RQ4                     RQ.S1.b
                        RQ.S2.a, RQ.S2.b
                        RQ.S6.a, RQ.S6.c
4 Results
This chapter summarizes the research results for each of the studies. The results are
reported in more detail in the papers in Appendix A, but this chapter also includes some
results of work that so far has not been reported in papers.
4.1 Study 1: Preliminary Interviews with company representatives (used in P1)
In order to learn more about the way business-critical software projects are being
executed, we sought out a few companies and conducted short interviews with
representatives from these companies. Eight interviews were conducted in eight
different companies. The companies were picked partly for being representative of the
Norwegian IT industry, partly because we expected them to be relevant to the
business-critical topic, and partly out of convenience with respect to geographic
location and general availability. The companies were represented by
persons in different positions in the company structure, from directors to project
managers and developers. The interviews lasted 30-45 minutes, and each interview was
performed by one researcher taking notes. The questions, or topics, had been worked
out beforehand. They were partly taken from literature studies, and dealt with areas we
felt were important to solicit answers to this early in the project. After the interviews,
the researchers compiled and wrote up an internal BUCS technical report, for use as
future reference for the BUCS project members [Stålhane03].
The main results to be extracted from the interview sessions were the following:
• The industry defines the term ‘business-critical’ as something that is related to
their economy, their reputation and their position in the market.
• RUP, or some variant of it, is common among companies who actually employ
some specified process.
• Business-critical software development is a very common activity among
software development companies.
• A typical problem in development of business-critical software is
communication, both within the company and towards the customer.
• The companies generally do not consider the technical risk aspects of a project
in detail, perhaps mainly due to a lack of an instrument for this.
Contributions of Study 1:
The purpose of these interviews was to elicit knowledge about how the situation in
Norwegian IT industry was with respect to development of business-critical software.
As this was the first investigation of the BUCS project, the goal was mainly to get an
overview and a general impression of the situation. Also, it was intended as a
basis for further work, both for further empirical studies, and as a tool to help us focus
future research.
This study was the first step towards the main contribution C1: “Describing how to
utilize safety criticality techniques to improve the development process for
business-critical software.”
4.2 Study 2: Combining safety methods in the BUCS project
(Paper P1)
Study 2 was carried out by doing a literature review of software engineering practices
and safety criticality analysis methods. We wanted to propose a way to combine these
into a more unified tool set.
P1. Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer:
"Safety activities during early software project phases"
Abstract to P1: This paper describes how methods taken from safety-critical practices
can be used in development of business-critical software. The emphasis is on the early
phases of product development, and on use together with the Rational Unified Process.
One important part of the early project phases is to define safety requirements for the
system. This means that in addition to satisfying the need for functional system
requirements, non-functional requirements about system safety must also be included.
By using information that already is required or produced in the first phases of RUP
together with some suitable “safety methods”, we are able to produce a complete set of
safety requirements for a business-critical system before the system design process is
started.
In P1, we showed how the Preliminary Hazard Analysis, Hazard and Operability
Analysis and Safety Case methods can be used together in the RUP inception phase, to
help produce a safety requirements specification; this is illustrated in Figure 4-1. The
shown example is simple, but demonstrates how the combination of these methods will
work in this context. By building on information made available in an iterative
development process like RUP, we can use the presented methods to improve the
process for producing a safety requirements specification. The paper also emphasizes
that early development phases are prime candidates for efficient safety analysis work.
Figure 4-1 Combining PHA/HazOp and Safety Case (elements: customer requirements,
environment, PHA and/or HazOp, safety requirements, safety case)
Contributions of Study 2:
The contribution of this study was showing possible integration of a common software
development method with techniques taken from development of safety-critical
systems. This study thus supports the main contribution C1.
4.3 Study 3: Fault report analysis (Papers P2, P3, P5)
The work and results of Study 3 have been presented in three papers: P2 (main), P3 and
P5. The basis was a quantitative study of fault reports in four companies.
P2. Jon Arvid Børretzen and Reidar Conradi: "A study of Fault Reports in Commercial
Projects"
Abstract to P2: Faults introduced into systems during development are costly to fix,
and especially so for business-critical systems. These systems are developed using
common development practices, but have high requirements for dependability. This
paper reports on an ongoing investigation of fault reports from Norwegian IT
companies, where the aim is to seek a better understanding of faults that have been
found during development and how this may affect the quality of the system. Our
objective in this paper is to investigate the fault profiles of four business-critical
commercial projects to explore if there are differences in the way faults appear in
different systems. We have conducted an empirical study by collecting fault reports
from several industrial projects, comparing findings from projects where components
and reuse have been core strategies with more traditional development projects.
Findings show that some specific fault types are generally dominant across reports from
all projects, and that some fault types are rated as more severe than others.
P3. Parastoo Mohagheghi, Reidar Conradi, and Jon A. Børretzen: "Revisiting the
Problem of Using Problem Reports for Quality Assessment”
Abstract to P3: In this paper, we describe our experience with using problem reports
from industry for quality assessment. The non-uniform terminology used in problem
reports and validity concerns have been the subject of earlier research but are far from
settled. To distinguish between terms such as defects or errors, we propose to answer
three questions on the scope of a study related to what (problem appearance or its
cause), where (problems related to software; executable or not; or system), and when
(problems recorded in all development life cycles or some of them). Challenges in
defining research questions and metrics, collecting and analyzing data, generalizing the
results and reporting them are discussed. Ambiguity in defining problem report fields
and missing, inconsistent or wrong data threatens the value of collected evidence. Some
of these concerns could be settled by answering some basic questions related to the
problem reporting fields and improving data collection routines and tools.
P5. Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi: "The Empirical
Studies on Quality Benefits of Reusing Software Components"
Abstract to P5: The benefits of reusing software components have been studied for
many years. Several previous studies have concluded that reused components have
fewer defects in general than non-reusable components. However, few of these studies
have gone a further step, i.e., investigating which type of defects has been reduced
because of reuse. Thus, it is suspected that making a software component reusable will
automatically improve its quality. This paper presents an on-going industrial empirical
study on the quality benefits of reuse. We are going to compare the defect types, which
are classified by ODC (Orthogonal Defect Classification), of the reusable component
vs. the non-reusable components in several large and medium software systems. The
intention is to figure out which defects have been reduced because of reuse and the
reasons for the reduction.
Paper P2 was the main paper for this study, and it presented some results of an
investigation on fault reports in industrial projects. The main conclusions of this paper
were:
• When looking at all faults in all projects, “functional logic” faults were the
dominant fault type. For high severity faults, “functional logic” and “functional
state” faults were dominant. This is shown in Tables 4-1 and 4-2.
Table 4-1 Distribution of all faults in fault type categories (projects A-D)

Table 4-2 Distribution of high severity faults in fault type categories (projects A-D)

(Both tables break the faults of projects A-D down into the fault type categories
Assignment, Checking, Data, Documentation, Environment, Funct. comp., Funct. logic,
Funct. state, GUI, I/O, Interface, Memory, Missing data, Missing funct., Missing value,
Performance, Wrong funct., Wrong value and Unknown.)
Also, we saw that some fault types were rated more severe than others, for instance
“Memory fault”. However, the fault type “GUI fault” was rated as less severe for the
two projects employing systematic software reuse in development; this is illustrated in
Figure 4-2.
Figure 4-2 Percentage of high severity faults in some fault categories
The main conclusions of P3 were the following. We identified three questions that
define a fault:
• what – whether the term applies to the manifestation of a problem or to its cause,
• where – whether problems are related to the software alone or also to the
environment supporting it, and whether the problems are related to executable
software or to all types of artifacts, and
• when – whether the problem reporting system records problems detected in all or
only some life cycle phases.
We also described how data from problem reports may be used to evaluate quality from
different quality views, as shown in Figure 4-3, and how measures from problem or
defect data are among the few measures used in all quality views. Finally, we discussed
how data from problem reports should be collected and analyzed, and what the validity
concerns are when using such reports for evaluating quality.
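The what/where/when scoping questions can be made concrete as explicit fields on a problem report record; a sketch in which the field values are purely illustrative:

```python
from dataclasses import dataclass

# Sketch of a problem report carrying explicit answers to the three
# scoping questions from P3. The field values below are invented.
@dataclass
class ProblemReport:
    summary: str
    what: str    # "manifestation" (observed failure) or "cause" (fault)
    where: str   # "executable software", "non-code artifact", or "environment"
    when: str    # life cycle phase in which the problem was detected

r = ProblemReport(
    summary="Null pointer on empty invoice list",
    what="cause",
    where="executable software",
    when="system test",
)
```

Making the scope explicit per report, rather than per study, is one way to reduce the terminology ambiguity that P3 discusses.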
Figure 4-3 Quality views associated to defect data, and their relations
In P5, we presented the research design of an on-going empirical study to investigate
the benefit or cost of software reuse on software quality. By analyzing the defect reports
of several software systems, which include both reusable and non-reusable components,
we planned to deepen the understanding of why reuse improves the quality of software.
This paper also described the future work: to collect data from projects with different
contexts, such as application domains, technologies, and development processes, in
order to find the common good practices and lessons learned of software reuse.
Contributions of Study 3:
In paper P2, the contributions were the description of the most typical and severe faults
found by analyzing fault reports, which was related to main contribution C3: “Improved
model of fault origins and types for business-critical software”.
In paper P3, we described our experience with using fault reports for quality
assessment, and in answering three questions about what, where and when faults are and
how they are discovered, we showed that improvements in how faults are described and
worked with are needed. This was related to the main contribution C2: “Identification of
typical shortcomings in fault reporting”.
The contribution in P5 was using fault categorization to compare defect types of reused
and non-reused components, which was related to main contribution C3.
4.4 Study 4: Fault report analysis (Paper P4)
P4. Jon Arvid Børretzen and Jostein Dyre-Hansen: “Investigating the Software Fault
Profile of Industrial Projects to Determine Process Improvement Areas: An Empirical
Study”
Abstract to P4: Improving software processes relies on the ability to analyze previous
projects and derive which parts of the process that should be focused on for
improvement. All software projects encounter software faults during development and
have to put much effort into locating and fixing these. A lot of information is produced
when handling faults, through fault reports. This paper reports a study of fault reports
from industrial projects, where we seek a better understanding of faults that have been
reported during development and how this may affect the quality of the system. We
investigated the fault profiles of five business-critical industrial projects by data mining
to explore if there were significant trends in the way faults appear in these systems. We
wanted to see if any types of faults dominate, and whether some types of faults were
reported as being more severe than others. Our findings show that one specific fault
type is generally dominant across reports from all projects, and that some fault types are
rated as more severe than others. From this we could propose that the organization
studied should increase effort in the design phase in order to improve software quality.
The results from P4 were the following:
• We have found that "function" faults, closely followed by "GUI" faults, are the
fault types that occur most frequently in the projects as shown in Table 4-3. To
reduce the number of faults introduced in the systems, the organization should
focus on improving the processes which are most likely to contribute to these
types of faults, namely the specification and design phases of development.
Table 4-3 Fault type distribution across all projects

Fault type            # of faults   %
Function              191           27,0 %
GUI                   138           19,5 %
Unknown               87            12,3 %
Assignment            75            10,6 %
Checking              58            8,2 %
Data                  46            6,5 %
Algorithm             37            5,2 %
Environment           36            5,1 %
Interface             11            1,6 %
Timing/Serialization  11            1,6 %
Relationship          9             1,3 %
Documentation         8             1,1 %
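The percentage column of Table 4-3 follows directly from the fault counts; a quick recomputation using the counts from the table:

```python
# Recomputing the percentage column of Table 4-3 from the fault counts.
counts = {
    "Function": 191, "GUI": 138, "Unknown": 87, "Assignment": 75,
    "Checking": 58, "Data": 46, "Algorithm": 37, "Environment": 36,
    "Interface": 11, "Timing/Serialization": 11, "Relationship": 9,
    "Documentation": 8,
}
total = sum(counts.values())          # 707 faults across the five projects
share = {t: round(100 * n / total, 1) for t, n in counts.items()}
```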
• The most severe fault types were "relationship" and "timing/serialization" faults,
while the fault types "GUI" and "documentation" were considered the least severe.
This is illustrated in Figure 4-4. Although "function" faults were not rated as the
most severe, this fault type still dominates when looking at the distribution of
highly severe faults only.
• We also observed that the organization's fault reporting process could be improved
by adding additional information to the fault reports, e.g. fault location (name of
program module) and fault repair effort. This would facilitate more effective
targeting of fault types and locations in order to better focus future efforts for
improvement.
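A severity-by-type breakdown of the kind shown in Figure 4-4 can be derived from the same fault records; a minimal sketch with invented records and the study's five-level severity scale:

```python
from collections import Counter, defaultdict

# Hypothetical (fault type, severity) records; the real study's scale
# ran from 1 (critical) to 5 (enhancement).
fault_records = [
    ("Function", 1), ("Function", 3), ("GUI", 4),
    ("GUI", 4), ("Relationship", 1), ("Timing/Serialization", 2),
]

per_type = defaultdict(Counter)
for ftype, severity in fault_records:
    per_type[ftype][severity] += 1

def severity_profile(ftype):
    """Percentage of each severity level within one fault type --
    one stacked bar of Figure 4-4."""
    c = per_type[ftype]
    n = sum(c.values())
    return {s: round(100 * k / n, 1) for s, k in sorted(c.items())}
```

Ranking fault types by their share of level-1 and level-2 faults is what supports statements like "relationship faults were rated most severe".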
(Legend for Figure 4-4 – severity levels: 1 Critical, 2 Cannot be circumvented,
3 Can be circumvented, 4 Cosmetic, 5 Enhancement.)
Figure 4-4 Distribution of severity with respect to fault types for all projects
Contribution of Study 4:
In paper P4, we describe findings on fault types and fault origins in commercial projects.
We also identified some issues that are common shortcomings in fault reporting. These
contributions relate to the main contributions C2 and C3.
4.5 Study 5: Interviewing practitioners about fault management
(Paper P6)
P6. Jon Arvid Børretzen: “Fault classification and fault management: Experiences from
a software developer perspective”
Abstract to P6: In most software development projects, faults are unintentionally
injected in the software, and are later found through inspection, testing or field use and
reported in order to be fixed later. The associated fault reports can have uses that go
beyond just fixing discovered faults. This paper presents the findings from interviews
performed with representatives involved in fault reporting and correcting processes in
different software projects. The main topics of the interviews were fault management
and fault reporting processes. The objective was to present practitioners’ view on fault
reporting, and in particular fault classification, as well as to expand and deepen the
knowledge gained from a previous study on the same projects. Through interviews and
use of Grounded Theory we wanted to find the potential weaknesses in a current fault
reporting process and elicit improvement areas and their motivation. The results show
that fault management could and should include steps to improve product quality. The
interviews also supported our quantitative findings in previous studies on the same
development projects, where much rework through fault fixing needs to be done after
testing because work in early project stages has been neglected.
The interviews were conducted by one interviewer, using an interview guide and a
digital voice recorder. These interviews were later transcribed and coded by the same
person. The main results of P6 were the following:
• The interviewees agreed with our conclusions from the previous quantitative
study from P4, i.e. that the early phases in their development process had
weaknesses that led to a high number of software faults from early
development phases.
• They also expressed a need for better fault categorization in their fault reports, in
order to analyze previous projects with the intention of improving their work
processes.
• The proposed ODC fault types were seen as a useful basis for introducing a
better fault classification scheme, although simplicity was important.
• They were positive about using fault report analysis feedback to improve
development processes, although introducing such analysis for regular use
would have to be done carefully in the organization.
• Finally, they revealed some areas in their fault reporting scheme that could be
improved in order to make analysis more useful, for instance by including
attributes such as fault-finding and correction effort and the component location
of the fault. The knowledge was present; it was just not recorded formally.
Contributions of Study 5:
Our main contribution is showing that practitioners are motivated to use their existing
knowledge of software faults in a more extensive manner to improve their work
practices. These findings support our main contributions C2 and C3.
4.6 Study 6: Using hazard identification to identify faults (Paper
P7)
Abstract to P7: When designing a business-critical software system, early analysis
with correction of software faults and hazards (commonly called anomalies) may
improve the system’s reliability and safety, respectively. We wanted to investigate if
safety hazards, identified by Preliminary Hazard Analysis, could also be related to the
actual system faults that had been discovered and documented in existing fault reports
from testing and field use. A research method for this is the main contribution of this
paper. For validation, a small web-based database for management of student theses was
studied, using both Preliminary Hazard Analysis and analysis of fault reports. Our
findings showed that Preliminary Hazard Analysis was suited to find potential
specification and design faults in software.
P7 presented the description and an implementation of a novel method for identifying
software faults using the PHA technique. This method identified 6 faults that were
actually found in the system as well as 20 potential faults that may be in the system. We
also showed that there are certain types of faults that analysis techniques such as PHA
can help to uncover in an early process phase. Performing the PHA elicited many
hazards that could have been found in the system as “function” faults, as shown in
Figure 4-5. That is, faults which originate from early phases of system development,
and are related to the specification and design of the system. From this we conclude that
PHA can be useful for identifying hazards that are related to faults introduced early in
software development.
[Figure: bar chart, y-axis 0–35 %. X-axis: the fault types GUI, Data, Environment, Assignment, Timing/Serialization, Interface, Documentation, Duplicate, Relationship, Unknown, Function, Checking, Not fault and Algorithm.]
Figure 4-5 Distribution of hazards represented as fault types (%)
As for finding direct ties between hazards found in PHA and faults reported in fault
reports, we were not very successful. This, we feel, was mainly due to the studied
system’s particular fault type profile which was very different from fault distribution
profiles we had found in earlier studies. Some weak links were found, but the data did
not support any systematic links.
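The core step of relating PHA results to fault reports is expressing each identified hazard as a fault type. A minimal sketch of such a mapping is shown below; the keyword table and the classification rule are illustrative assumptions, not the procedure actually used in the study, which relied on discussion during the hazard analysis sessions.

```python
# Hypothetical keyword table mapping hazard descriptions to ODC-style
# fault types. The keywords here are invented for illustration only.
HAZARD_KEYWORDS = {
    "function": ["missing feature", "wrong behaviour", "specification"],
    "interface": ["input", "message", "protocol"],
    "data": ["corrupt", "inconsistent", "lost record"],
    "timing/serialization": ["deadlock", "race", "timeout"],
}

def classify_hazard(description: str) -> str:
    """Return the first fault type whose keywords match, or 'unknown'."""
    text = description.lower()
    for fault_type, keywords in HAZARD_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return fault_type
    return "unknown"

# Example hazards of the kind a PHA session on a thesis-management
# database might produce (invented examples).
hazards = [
    "Thesis record lost due to corrupt database entry",
    "Deadlock when two users edit the same record",
    "Missing feature: no confirmation before deletion",
]
print([classify_hazard(h) for h in hazards])
# -> ['data', 'timing/serialization', 'function']
```

Once hazards carry fault types, their distribution can be compared directly with the distribution found in the fault reports, as in Figure 4-5.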
Contributions of Study 6:
The main contribution of this paper was the description and implementation of the
method for identifying software faults using the PHA technique. The contributions of
this study are related to the main contributions C1 and C3.
4.7 Study 7: Experiences from fault report studies (Technical Report
P8)
This section will describe, sum up and reflect upon our experiences from several fault
reporting studies. It has not yet been written as a final paper, but this is planned in the
near future. See the technical report P8 in Appendix A.
P8. Jon Arvid Børretzen: “Diverse Fault management – a prestudy of industrial
practice”
Abstract to P8: This report describes our experiences with fault reports and fault
reporting from working with fault reports from several different organizations. Data
from projects we have studied is presented in order to show the variance and at times
lack of information in the reports used. We also show that although useful process
information is readily available, it is seldom used or analyzed with process
improvement in mind. An important challenge is to describe to practitioners why using
a common description of faults is advantageous, and also to propose a way to better use
the knowledge gained in collecting data about faults. The main contribution is to explain
why more effort should be put into the production of fault reports, and how this
information can be used to improve the software development process. We explain how
fault reports can become more useful just by including information that is already
available in development projects.
P8 presents an overview of studies performed concerning fault reports, and shows the
type of information that exists in and is lacking from such reports. We have learnt
that fault data is in some cases under-reported, and in most cases under-analyzed. By
including some of the information that the organization already has, more focused
analyses could be made possible. One possibility is to introduce a standard for fault
reporting, where the most important and useful fault information is mandatory.
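A standard with mandatory attributes can be enforced at the point where a report is recorded. The sketch below shows one possible shape of such a record; the field names and the set of fault types are assumptions for illustration, not a scheme prescribed by the thesis.

```python
from dataclasses import dataclass

# Hypothetical set of fault type categories, loosely modeled on the
# ODC-style types used in the fault report studies.
FAULT_TYPES = {
    "function", "interface", "data", "gui", "documentation",
    "algorithm", "timing/serialization", "relationship",
    "assignment", "checking", "environment", "unknown",
}

@dataclass
class FaultReport:
    """A fault report in which the most useful attributes are mandatory."""
    description: str
    fault_type: str            # one of FAULT_TYPES
    severity: int              # 1 (critical) .. 5 (enhancement)
    location: str              # name of the program module containing the fault
    repair_effort_hours: float

    def __post_init__(self):
        # Reject reports that omit or garble the mandatory attributes.
        if not self.description.strip():
            raise ValueError("description is mandatory")
        if self.fault_type not in FAULT_TYPES:
            raise ValueError(f"unknown fault type: {self.fault_type}")
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be in 1..5")

report = FaultReport(
    description="Order total not recalculated after discount",
    fault_type="function",
    severity=2,
    location="billing.calculator",
    repair_effort_hours=3.5,
)
```

Because the attributes are validated when the report is created, later analyses such as type or severity distributions never have to discard records for missing data.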
Furthermore, we have learnt that the effort spent by external researchers to produce
useful results based on the available data is quite small compared to the collective effort
spent by developers recording this data. This shows that very little effort may give
substantial effects for many software developing organizations.
Finally, there are two main points we want to convey as a result of the studies we have
presented:
• It is important to be able to approach the subject of fault data analysis with a
bottom-up approach, at least in early phases of such research and analysis
initiatives. The data is readily available; the work that has to be performed is
designing and performing a study of these data.
• Much of the recorded fault data is of poor quality. This is most likely due to a
lack of interest in using the data.
We are planning to write a final paper P8 to combine lessons learned from Study 3 and
5, cf. Section 4.3 and 4.5. This is partly in response to very positive review comments
on paper P3. The preliminary paper is presented as a Technical Report in Appendix A.
Contributions of Study 7:
This study directly identifies issues that are common shortcomings in fault reporting,
and suggests actions to improve and support the use of fault report analysis as a tool for
process improvement. These findings support our main contribution C2.
5 Evaluation and Discussion
This chapter answers the four research questions RQ1-RQ4 based on the results, and
discusses the relations between the thesis contributions and the research questions.
The research context, papers and BUCS goals are also discussed.
There is also a discussion of validity threats and experience from industrial cooperation.
5.1 Contributions
From Section 1.4 we reiterate the main contributions in this thesis and elaborate on
them:
C1. “Describing how to utilize safety criticality techniques to improve the
development process for business-critical software.”
• We have described ways of integrating safety criticality techniques with
regular development practices to improve the development process for
business-critical software. We have proposed integrating safety techniques
like PHA and Hazop into early development phases in order to help improve
safety and reliability of the resulting software [P1], although this has not
been validated industrially. In addition we have shown that the PHA
technique is useful in eliciting hazards that are related to faults that are
introduced in early development process phases [P7].
C2. “Identification of typical shortcomings in fault reporting.”
• Through our studies on fault reports, we have described several issues
concerning shortcomings in fault reporting. The most striking is that
commercial organizations generally do not exploit the fault report data they
possess for more than day-to-day fault logging or at most shallow analysis.
Additionally, it is clear that fault reporting is treated more as a necessary
chore, than as a potential source for process improvement. Fault reports are
often inaccurate, incomplete or incomprehensible, which makes for poor
reusability for analysis. In addition fault data that could easily have been
recorded for process improvement gains, e.g. correction effort or location of
fault, are not even considered in fault reports.
C3. Improved model of fault origins and types for business-critical software.
• We have described studies to give insight in what fault types are most
common or severe in business-critical software. We found that the most
common faults were ones that originated from early process phases, namely
specification and design. We have also shown that certain fault types tend to
be more severe than others [P2][P4].
These contributions were described more briefly in Section 1.4. Table 5-1 shows the
relationship between the contributions C1-C3 and research questions RQ1-RQ4.
Table 5-1 Relationship of contributions and research questions.

Contribution   RQ1   RQ2   RQ3   RQ4
C1                           X     X
C2              X     X
C3                    X      X     X
5.1.1 Contributions related to BUCS goals
The relation between our contributions and the BUCS goals as defined in Section 1.1
is now considered:
BG1 To obtain a better understanding of the problems encountered by Norwegian
industry during development, operation and maintenance of business-critical
software.
Regarding BG1, we have found that early development phases like specification and
design are a source of a high number of faults in software. Lack of communication and
adequate tools and processes for describing development difficulties in these phases
seem to be the main problem. We believe the work in this thesis advances the
state-of-the-art of software engineering of business-critical software as defined by our
contributions C1-C3. Better understanding of problems encountered by Norwegian
industry is achieved, as is reflected in contributions C1 and C3.
BG2 Study the effects of introducing safety-critical methods and techniques into the
development of business-critical software, to reduce the number of system
failures (increased reliability).
Our studies on fault reports suggest concrete measures to reduce the largest group of
faults found in studies of business-critical software in our contribution C3. In addition,
we have found that lightweight hazard analysis such as the PHA method is useful in
eliciting hazards that could be avoided to reduce the number of faults originating in
early development phases, from our contribution C1.
BG3 Provide adapted and annotated methods and processes for business-critical
software.
Although the goal BG3 has not been an explicit focus of this thesis, we describe how
fault report analysis and certain hazard analysis methods can be used to improve the
development process, related to C1.
BG4 Package and disseminate the effective methods into Norwegian software
industry.
Most results are published, or planned to be published, and presented at international
and national conferences and workshops. During this thesis work, several Masters
students have directly or indirectly been involved in activities, project work or Masters
theses concerning business-critical systems and the BUCS project. Furthermore, the
knowledge gained from our studies in commercial organizations have been
disseminated back to them through reports and internal workshops. This relates to all
contributions C1-C3.
5.2 Contribution of this thesis vs. literature
In this section we present how our results and contributions compare with the state-of-the-art.
Looking at the wide perspective, our research on business-critical systems and software
has shown not to be directly comparable with much of the literature on software
engineering. This is something we were aware of from the start of the BUCS project.
The introduction of safety related methods into “regular” software engineering is not
common; many of these methods are still regarded as resource-hungry and rigid,
and this is difficult to combine with the emergence of agile and other lightweight
methods [Beck99]. On the other hand, there are many types of systems that demand a
more rigorous development process to ensure reliability and related qualities (e.g.
financial systems), and for these types of systems we have contributed both on a process
level and with techniques that could be applicable.
In our work, we have proposed a novel method for doing fault inspections of
specification and design documents [P6]. This adds to the existing literature on
inspections, for instance that of Basili et al. concerning perspective-based reading
[Basili00, Shull00].
The results of our quantitative studies on fault reports [P2][P4] show that in many
systems, faults originating from specification and design phases constitute a major part
of the total number of faults being found in testing. This is in line with the findings of
Vinter et al. [Vinter00], but in contrast to findings by Eldh et al., where a common type
of early process fault (function) was not very frequent [Eldh07]. Our fault report study
of a small and simple system in [P7] did, however, show that systems have different
fault profiles. This may be a result of both the type of system and the development
method used when designing and implementing the system.
Further, we have discussed the need for improving fault reporting as a support tool for
process improvement. Several sources present fault management processes as useful for
such improvement in a software organization, among others [Grady92, Chillarege92].
We support this stance and suggest how to better utilize the available fault information
[P3][P4][P6][P8].
5.3 Revisiting the Thesis Research Questions, RQ1-RQ4
In answering our four research questions, we have the following:
RQ1. What is the role of fault reporting in existing industrial software development?
a) Fault reporting seems to generally be underused and undervalued. Our
experience is that the recorded data is often not of high quality, which not only
makes any analysis hard, but also diminishes the usefulness of the fault reports
for fixing faults.
b) All software developing organizations have a fault reporting system in
operation, but its use differs substantially. The most basic fault reporting system
is only used as a means to document faults that have been found and that are to
be corrected, but more advanced use of the available data can easily be arranged.
c) Even where fault report data is thoroughly recorded and stored, it is not
systematically used as a tool for software process improvement. A lot of detailed
information is stored in the fault management systems of software organizations,
but is never used beyond the simplest applications.
RQ2. How can we improve on existing fault reporting processes?
a) Developers should be more conscious about the potential for improvement by
analyzing fault reports. Only through feedback on quality/fault data can an
organization “learn from their mistakes”.
b) We need more formalized reporting schemes, and clearly defined procedures for
reporting faults.
c) Introduce updated fault reporting schemes (fault type, severity, priority, effort,
location etc) for the organization’s needs, so that the correct and complete
information is reported. There is a need for a process looking at the requirements
and possibilities in each organization.
RQ3. What are the most common and severe fault types, and how can we reduce them
in number and severity?
a) P2 and P4 show that the most common fault type is the “function” fault type, i.e.
faults originating in the specification and design phases of development.
“GUI” faults are also numerous, and can in many cases also be related to
specification and design phases.
b) Our studies on safety-critical analysis techniques have shown that the PHA
technique is a useful tool for eliciting hazards that can be related to the fault
types that are most common [P1][P7].
RQ4. How can we use safety analysis techniques together with failure report analysis
to improve the development process?
a) In P7, we have found that the PHA technique is useful for eliciting hazards that
can be related to faults that are commonly introduced in early development
phases.
5.4 Evaluation of validity
For the validity of the work in this thesis, there are some overall issues to be discussed.
Initial validity concerns of the individual studies are discussed for each study in Section
3, as well as in each individual paper.
To improve validity of the studies seen as a whole, some possible actions can be
performed:
1. Replication of studies, both over time and in other organizations. This applies
especially to the quantitative studies, in order to track development over time and
also to ensure that the results are generalizable. Example: our fault report studies on
projects from five different organizations show very similar main results for
most projects [P2, P4].
2. Using different research strategies to triangulate the research results. By using
different research methods for the same study objects etc., we increase the
validity of the results. Examples: Fault report study combined with interview
sessions on the topic of fault report management [P4,P6], combining a
qualitative study and a quantitative one in the DAIM study [P7].
Wohlin et al. define four main categories of validity threats [Wohlin00], which are
further discussed in the next section, for different types of studies performed.
5.4.1 Quantitative studies: construct, internal, conclusion and
external validity
Studies 3, 4 and 6 used quantitative methods, and were mostly concerned with
analyzing fault report data. These data were collected from existing fault report
collections made by the organizations’ internal measures. Our contribution was the
categorization of faults in the data where this had not been performed. Some threats to
the validity of quantitative studies and how they were handled are described here:
• Construct validity: In study 6, the main threat to construct validity is the
conceptual difference between hazards and faults. We had to perform a
conversion of the hazards found to potential fault types. It should be verified
whether this type of hazard to fault type conversion is consistently correct, but
during hazard analysis, there was a discussion of how each hazard could
influence the system, and in many cases a software fault was proposed.
• Internal validity: In study 3, the greatest threat to internal validity is missing
data in the fault reports. Many fault reports were not described well enough to be
categorized and had to be left out. In certain fault reports, the fault had been
classified by the developers, and they may have had a different opinion of the
fault types than we had. In addition, with respect to severity of faults, it is not
certain that the developers reporting the fault necessarily reported the true
severity. In study 6, the hazard analysis sessions were time limited, so only the
most obvious hazards were taken into account. Also, these sessions were
performed over a period of time, so some maturation in the form of better
understanding of the system being analyzed can have occurred.
• Conclusion validity: One possible threat to conclusion validity in studies 3 and 4
is low reliability of measures, because of some missing and ambiguous data.
Because categorization of faults into fault types is a subjective task, it was
important that the data we based the categorization on was correct and
understandable. To prevent mistakes, we added an “unknown” type to filter out
the faults we were not able to confidently categorize. The subjective nature of
categorization is also a threat to conclusion validity.
• External validity: Especially in study 6 where we only studied one project, but
also partly in studies 3 and 4 one threat to external validity is the relatively low
number of projects studied. In study 6 we were not able to gain access to system
documentation of more systems where we could also have fault report data. The
projects under study may also not necessarily be the most typical business-critical systems, but this is hard to verify in any way.
5.4.2 Qualitative studies: internal and external validity.
Studies 1, 2, 5 and partly 6 are qualitative studies, mostly explorative and descriptive in
nature. The collected data is mainly from interviews and other subjective techniques
(PHA) and are subject to interpretation. Here we have identified internal and external
validity threats as the most serious.
• Internal validity. For Study 5, the main internal validity threat is that the same
person performed interviewing, transcribing and information coding, which may
introduce bias to how responses have been interpreted. By having workshops as
feedback sessions after the interviews, we feel bias has been reduced.
• External validity. In Study 1, the main validity threat was the low number of
organizations interviewed, and in Study 5 all interviews were performed with
representatives from the same organization, although this is explained by the need
to interview the people who had been involved in the specific projects we had
studied earlier.
5.5 Industrial relevance of results
As many of our studies involved industrial data, our results were interesting not only to
us, but also to the organizations the data was collected from. As such, we were able to
present our results to the organizations and receive feedback both in terms of the results
of the studies and how we should interpret the results.
In general, the organizations received general reports on the results, but also a specific
report concerning the results from their organization. After Study 4, a workshop was
held in order to convey our results to the organization as well as to receive more
feedback.
5.6 Reflection: Research cooperation with industry
Both the BUCS and the EVISOFT research projects are based on cooperation with
industrial partners. Whereas EVISOFT had a number of industrial organizations
involved from the project start, the BUCS project had no formal connections to any
industrial partners as the project got under way. This meant that some effort had to be
made in order to initiate contact and agreement with organizations in order to collect
research data.
The hardest part of industrial cooperation was establishing contact and an agreement
about what was going to be performed. In Study 3, the first fault report study, we
initially contacted over 40 different organizations developing business-critical software.
Despite many positive responses initially, we ended up only being able to use fault
report data from four of them. There were two serious barriers for setting up
cooperation with commercial organizations. Firstly, we experienced unwillingness by
such organizations to disclose information about faults and failures in their systems,
despite promises of anonymization. Secondly, many of the organizations decided that
they were not able to spare the effort to facilitate our data collection, due to their own
deadlines. In addition to this, a few organizations chose to end their cooperation with us
before the data had been analyzed, because of resource issues. Finally, there was the
issue of lack of communication: in one instance we were ready to collect data for
analysis when it turned out that all but one fault report had been deleted from their fault
management system.
When performing the second fault report study, we were in contact with an organization
that was already involved in the EVISOFT project as a participating partner, which
made establishment of contact and research agreement much simpler.
However, a common issue through all our industrial cooperation was that since we were
external researchers who were just collecting and analyzing existing data, we were not
part of a planned sequence of events for the organization, and therefore were not
prioritized when times were busy.
6 Conclusions and future work
This thesis presents the results from several empirical studies investigating management
of fault reports within a business-critical software perspective. This is augmented by
work concerning business-critical software in general. We have combined literature
studies, quantitative studies of historical data sources, qualitative studies through
interviews of industry representatives, and a case study using both qualitative and
quantitative methods. By combining different empirical strategies in a mixed-method
research design, we could combine results and answer questions that had not been
answered previously.
This work analyzed historical fault data that the source organizations had not analyzed
in such a manner and to this extent. The results were backed up with interviews and
feedback from the involved organizations to improve the validity of the results.
6.1 Conclusions
6.1.1 Fault reporting as a tool for process improvement
Our findings show that there is much to gain by using fault report data to support
process improvement through reduction of faults. Our analyses showed that a large
number of faults had their origin in early development phases, something some of the
organizations had suspected but had not been able (or willing) to quantify.
We also uncovered a lack of consistency in fault reporting. Fault reports in an
organization often did not follow a strict standard, which could make it difficult for the
data to be used in an analytic fashion. Another finding is that many software
organizations are in possession of data resources concerning their own products and
processes that they do not exploit fully. Through better recording of available
information and simple analysis, many organizations would be able to focus process
improvement initiatives better.
Added to this, our work has also included literature studies of fault categorization
schemes. We have described how fault categorization and subsequent fault report
analysis could identify improvement areas of the development process.
6.1.2 Empirical findings
During our fault report studies of several industrial projects, we have presented results
on fault type frequency and severity that for larger business-critical applications seem to
be valid and general. Some fault types have been shown to be considerably more
frequent than others, and we have identified fault types that are likely to be more severe
than others.
Drawing on experience from others, we have concluded that many of the occurrences of
the most frequent fault types that are reported have their origins in early phases like
system specification and design.
6.1.3 Software safety and reliability from a fault perspective
This thesis’ overall contribution is showing how a focus on fault management and
reporting in the software development process may pinpoint areas of improvement in
terms of software safety and reliability. We have also proposed how to utilize
techniques taken from safety analysis in software development to elicit and record
possible faults in the software. Our conclusion is that such techniques should be used
early in the development phases, both because suitable techniques like PHA work well
in early process phases, and also because identifying and correcting faults early is more
efficient than correcting them in later phases.
6.2 Future Work
This work has covered several aspects of fault management and the use of hazard
analysis techniques to improve the process of developing business-critical software.
Still, we see the need for more work in these areas, and the following sections propose
possible directions for future work.
6.2.1 Following fault reporting throughout the development process
The software projects under study during this thesis have all been more or less
completed development projects. Thus, we have not been able to get reports from all
phases of the development projects. The faults found and fixed in design phases, and in
many cases also in unit testing during implementation, have not been studied. By
including this information in fault studies, we could learn even more about the potential
for fault report analysis as a process improvement tool.
6.2.2 Further studies of Hazard Analysis results and fault reports
Combining hazard analysis and fault report analysis showed that hazard identification
could be helpful in eliciting possible hazardous events caused by faults possibly existing
in the system. Unfortunately the system we studied had a very different fault type
profile (mostly coding faults) than the other systems we had studied. This may have
been a contributing reason for the lack of actual faults being identified by hazard
analysis, although the number of potential faults found was high.
By performing a similar study on a system where the fault profile is more skewed
towards faults introduced in early development phases, we may have a larger portion of
faults found by the PHA technique and similar. This would be useful to validate this as
a useful technique for reducing faults.
Glossary
Term definitions
To address the relevant issues, we need reasonably precise definitions of the terms used.
The following contains a table of short definitions of some terms. Where relevant, they
are re-iterated and elaborated in the thesis. These terms are mostly taken from
[Conradi07].
Availability
The degree to which a system or component is operational and
accessible when required for use [IEEE 610.12].3
BUCS
BUsiness Critical Software – a basic R&D project at NTNU in 20032007 under the ICT-2010 program at the Research Council of Norway,
lead by Tor Stålhane. See http://www.idi.ntnu.no/grupper/su/bucs.html
Businesscritical
The ability of core computer and other support systems of a business to
have sufficient QoS to preserve the stability of the business
[Sommerville04].
Businesscritical
systems
Systems whose failure could threaten the stability of the business.
Criticality
A state of urgency. In this context to signify the graveness of the effects
a failure (i.e. erroneous external behaviour) in a system can have.
³ Thus reliability means that the system continues to be available.
67
Dependability The trustworthiness of a computing system which allows reliance to be
justifiably placed on the service it delivers [Avizienis01], an integrating
concept that encompasses the following attributes:
• Availability: readiness for correct service;
• Reliability: continuity of correct service;
• Safety: absence of catastrophic consequences on the user(s) and
the environment;
• Security: the concurrent existence of (a) availability for
authorized users only, (b) confidentiality, and (c) integrity.
In the later work [Avizienis04], security is split off as a separate quality, and
dependability is rephrased as:
• Availability: readiness for correct service;
• Reliability: continuity of correct service;
• Safety: absence of catastrophic consequences on the user(s) and
the environment;
• Integrity: absence of improper system alterations;
• Maintainability: ability to undergo modifications and repairs.
Error
• That at least one (or more) internal state of the system deviates from
the correct service state. The adjudged or hypothesized cause of an
error is called a fault. In most cases, a fault first causes an error in
the service state of a component that is a part of the internal state of
the system and the external state is not immediately affected. …
many errors do not reach the system’s external state and cause a
failure [Avizienis04].
• The difference between a computed, observed, or measured value or
condition and the true, specified, or theoretically correct value or
condition. For example, a difference of 30 meters between a
computed result and the correct result [IEEE 610.12].
Failure
• The non-performance or inability of the system or component to
perform its intended function for a specified time under specified
environmental conditions. A failure may be caused by design flaws
– the intended, designed and constructed behavior does not satisfy
the system goal [Leveson95].
• The inability of a system or component to perform its required
function within specified performance requirements [IEEE 610.12].
• Since a service is a sequence of the system’s external states, a
service failure means that at least one (or more) external state of the
system deviates from the correct service state [Avizienis04].
Fault
An incorrect step, process, or data definition in a computer program
[IEEE 610.12].
FMEA
Failure Mode and Effects Analysis (FMEA) is a risk assessment
technique for systematically identifying potential failures in a system or
a process.
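A common way to operationalise FMEA in practice (though not elaborated in this glossary) is to rank each potential failure mode by a Risk Priority Number, RPN = severity × occurrence × detection, each scored on a 1-10 scale. The sketch below assumes such a scheme; the component names and scores are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    effect: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (certain to detect) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: higher means higher priority for mitigation.
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("payment service", "duplicate withdrawal", 9, 3, 4),
    FailureMode("report module", "stale figures shown", 4, 6, 2),
]
# Address the highest-RPN failure modes first.
prioritised = sorted(modes, key=lambda m: m.rpn, reverse=True)
```

The point of the ranking is to focus limited analysis and mitigation effort on the failure modes with the worst combination of impact, likelihood and detectability.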
Hazard
• A physical situation with a potential for human injury [IEC 61508].
• A state or set of conditions that, together with other conditions in
the environment, will lead to an accident (loss event). Note that a
hazard is not equal to a failure [Leveson95].
• A software condition that is a prerequisite to an accident
[IEEE 1228] + [IEC 61508].
HazOp
Hazard and Operability analysis is a systematic method for examining
complex facilities or processes to find actual or potentially hazardous
procedures and operations so that they may be eliminated or mitigated.
Performance
The speed or volume offered by a service, e.g. delay/transmission time
for data communication, storage capacity in a database, image
resolution on a screen, or sound quality over a telephone line.
Quality
• The degree to which a system, component or process meets
specified requirements.
• The degree to which a system, component or process meets
customer or user needs or expectations [IEEE 610.12].
• Ability of a set of inherent characteristics of a product, system or
process to fulfil requirements of customers and other interested
parties [ISO 9000].
Quality of Service (QoS)
• In telephony, QoS can simply be defined as “user satisfaction with
the service” [ITU-T E.800].
• "A set of quality requirements on the collective behavior of one or
more objects" [ITU-T X.902].
Comment: That is, the behavioral properties of a service must be
acceptable (of high enough quality) for the user, which can be another
system, an end-user, or a social organization. Such properties
encompass technical aspects like dependability (i.e. trustworthiness),
security, and timely performance (transfer rate, delay, jitter, and loss),
as well as human-social aspects (from perceived multimedia reception
to sales, billing, and service handling). NB: not defined in IEEE 610.12.
See the popular paper on QoS [Emstad03], where the more subjective
term QoE (Quality of Experience) is introduced, and also [Cekro99].
Reliability
• The characteristic of an item expressed by the probability that it will
perform its required function in the specified manner over a given
time period and under specified or assumed conditions. Reliability
is not a guarantee of safety [Leveson95].
• Continuity of correct service [Avizienis04].
• The ability of a system or component to perform its required
functions under stated conditions for a specified period of time
[IEEE 610.12].
• A set of attributes that bear on the capability of software to maintain
its level of performance under stated conditions for a stated period
of time [ISO 9126].
Often measured as Mean-Time-To-Failure (e.g. 1 year), failure rate
(e.g. 10⁻⁹ per second) or fault density (e.g. 7 faults/KLOC).
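As a small illustration (not part of the glossary source), the three measures above can be computed directly from field data; all figures below are invented:

```python
def mttf_hours(uptimes_between_failures):
    """Mean Time To Failure: average operating time between observed failures."""
    return sum(uptimes_between_failures) / len(uptimes_between_failures)

def failure_rate_per_hour(num_failures, total_operating_hours):
    """Observed failure rate; the reciprocal of MTTF for a constant rate."""
    return num_failures / total_operating_hours

def fault_density(num_faults, lines_of_code):
    """Faults per thousand lines of code (faults/KLOC)."""
    return num_faults / (lines_of_code / 1000.0)

# Hypothetical data: 4 failures over 2000 operating hours, 35 faults in 5000 LOC.
mttf = mttf_hours([400.0, 600.0, 500.0, 500.0])
rate = failure_rate_per_hour(4, 2000.0)
density = fault_density(35, 5000)
```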
Requirement
1. A condition or capability needed by a user to solve a problem or
achieve an objective.
2. A condition or capability that must be met or possessed by a system
or system component to satisfy a contract, standard, specification,
or other formally imposed documents.
A documented representation of a condition or capability as in 1) or 2)
[IEEE 610.12].
Robustness
The ability to limit the consequences of an active error or failure, in
order to resume (partial) service. Ways to improve this attribute are
duplication, repair, containment etc.
RUP
The Rational Unified Process [Kruchten00] [Kroll03], an incremental
development process based around UML [Fowler04].
Safety
• Freedom from unacceptable risk of physical injury or of damage to
the health of people, either directly or indirectly as a result of
damage to property or to the environment [IEC 61508].
• Freedom from software hazards [IEEE 1228].
Security
Protection against unauthorized access (e.g. read / write / search) of
data / information. Remedies: encryption and strict access control, e.g.
by passwords and physical barriers.
Software
Computer programs, procedures and possibly associated documentation
and data pertaining to the operation of a computer system [IEEE
610.12].
Software Safety
Features and procedures which ensure that a product performs
predictably under normal and abnormal conditions, thereby minimizing
the likelihood of an unplanned event occurring, controlling and
containing its consequences, and preventing accidental injury, death,
destruction of property and/or damage to the environment, whether
intentional or unintentional [Herrmann99].
Survivability
The degree to which essential services continue to be provided in spite
of either accidental or malicious harm [Firesmith03].
System
An entity that interacts with other entities, i.e., other systems, including
hardware, software, humans, and the physical world with its natural
phenomena. These other systems are the environment of the given
system. The system boundary is the common frontier between the
system and its environment.
References
[Aune00] Aune, A.: Kvalitetsdrevet ledelse, kvalitetsstyrte bedrifter. Gyldendal Norsk
Forlag, Oslo, 2000.
[Avison99] Avison, D., Lau, F., Myers, M.D., Nielson, P.A.: Action Research.
Communications of the ACM, (42)1, pp. 94-97, January 1999.
[Avizienis04] Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts
and taxonomy of dependable and secure computing. IEEE Transactions on
Dependable and Secure Computing, 1(1), pp.11-33, Jan.-March 2004.
[Bachmann00] Bachmann, F., Bass, L., Buhman, C., Comella-Dorda, S., Long, F.,
Robert, J., Seacord, R., and Wallnau, K.: Volume II: Technical Concepts of
Component-Based Software Engineering. SEI Technical Report CMU/SEI-2000-TR-008,
2000, available at: http://www.sei.cmu.edu/
[Basili94] Basili, V.R., Caldiera, G., Rombach, H.D.: Goal Question Metric Paradigm.
In: Marciniak, J.J. (ed.): Encyclopaedia of Software Engineering, pp. 528-532,
Wiley, New York, 1994.
[Basili00] Basili, V., Green, S., Laitenberger, O., Shull, F., Sorumgaard, S., and
Zelkowitz, M.: The Empirical Investigation of Perspective-Based Reading. Empirical
Software Engineering: An International Journal, 1(2), pp. 133-164, October 1996.
[Beck99] Beck, K.: Extreme Programming Explained: Embrace Change. ISBN
0201616416, Addison-Wesley Professional, 1999.
[Boehm88] Boehm, B.W.: A Spiral Model of Software Development and Enhancement.
IEEE Computer, (21)5, pp. 61-72, May 1988.
[Boehm91] Boehm B.W.: Software Risk Management: Principles and Practices. IEEE
Software, (17)1, pp. 32-41, January 1991.
[Boehm03] Boehm B.: Value-Based Software Engineering. ACM Software Engineering
Notes, (28)2, pp.1-12, March 2003.
[Bishop98] Bishop, P.G., Bloomfield, R.E.: A Methodology for Safety Case
Development. Proceedings of the Safety-critical Systems Symposium, Birmingham,
UK, Feb 1998.
[Cekro99] Cekro, Z.: Quality of Service – Overview of Concepts and Standards. Report
for COST 256, Free University of Brussels, April 1999, available from
http://www.iihe.ac.be/internal-report/1999/COSTqos.doc.
[Charette05] Charette, R.N.: Why Software Fails. IEEE Spectrum, September 2005.
[Chillarege92] Chillarege, R., Bhandari, I.S., Chaar. J.K., Halliday, M.J., Moebus, D.S.,
Ray, B.K., Wong, M.-Y.: Orthogonal defect classification - a concept for in-process
measurements. IEEE Transactions on Software Engineering, 18(11), pp. 943 – 956,
Nov. 1992.
[Chillarege02] Chillarege, R., Prasad, K.R.: Test and development process
retrospective- a case study using ODC triggers. Proceedings of the International
Conference on Dependable Systems and Networks (DSN’02), pp. 669- 678,
Bethesda, USA, 2002.
[Conradi03] Reidar Conradi (Ed.): Software engineering mini glossary. IDI, NTNU,
available from http://www.idi.ntnu.no/grupper/su/se-defs.html, August 2003.
[Conradi07] Reidar Conradi (Ed.): Mini-glossary of software quality terms, with
emphasis on safety. IDI, NTNU, available from
http://www.idi.ntnu.no/grupper/su/publ/ese/se-qual-glossary-v3_0-rc-4jun07.doc,
June 2007.
[Crnkovic02] Crnkovic, I., Larsson M.: Building reliable component-based software
systems. Artech House, Boston, 2002.
[Dawkins97] Dawkins, S., Kelly, T.: Supporting the use of COTS in safety critical
applications. IEE Colloquium on COTS and Safety Critical Systems (Digest No.
1997/013), pp. 8/1 -8/4, 28 Jan. 1997.
[Dybå00] Dybå, T., Wedde, K.J., Stålhane, T., Moe, N.B., Conradi, R., Dingsøyr, T.,
Sjøberg, D.I.K., Jørgensen, M.: SPIQ Metodehåndbok. Department of Informatics,
University of Oslo, Research Report(282), 2000.
[Eldh07] Eldh, S., Punnekkat, S., Hansson, H., Jönsson, P.: Component Testing Is Not
Enough - A Study of Software Faults in Telecom Middleware. Proceedings of the
19th IFIP International Conference on Testing of Communicating Systems
TESTCOM/FATES 2007, pp. 74-89, Tallinn, Estonia, June 2007.
[El Emam98] El Emam, K., Wieczorek, I.: The repeatability of code defect
classifications. Proceedings of The Ninth International Symposium on Software
Reliability Engineering, pp. 322-333, Paderborn, Germany, 4-7 Nov. 1998
[Emstad03] Emstad, P.J., Helvik, B.E., Knapskog, S.J., Kure, Ø., Perkis, A., Swensson,
P.: A Brief Introduction to Quantitative QoS. In Annual Report for 2003 from Q2S
Centre of Excellence, NTNU, pp. 18-29, 2003.
[EVISOFT] EVISOFT project, available at:
http://www.idi.ntnu.no/grupper/su/evisoft.html, 2006.
[Fairley85] Fairley, R.: Software Engineering Concepts. McGraw-Hill, 1985.
[Fenton97] Fenton, N., Pfleeger, S.L.: Software metrics (2nd ed.): a rigorous and
practical approach. PWS Publishing Co., Boston, 1997.
[Firesmith03] Firesmith, D.G.: Common Concepts Underlying Safety, Security, and
Survivability Engineering. Technical Note CMU/SEI-2003-TN-033, Software
Engineering Institute, Pittsburgh, Pennsylvania, December 2003.
[Fowler04] Fowler, M.: UML Distilled. Third Edition, Addison-Wesley, 2004.
[Freimut01] Freimut, B.: Developing and using defect classification schemes. IESE-Report No. 072.01/E, Version 1.0, Fraunhofer IESE, Sept. 2001.
[Gamma95] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: Elements
of reusable object-oriented software. Addison Wesley, 1995.
[Glass94] Glass, R.L.: The Software Research Crisis. IEEE Software, (11)6, pp. 42-47,
Nov. 1994.
[Grady92] Grady, R.: Practical Software Metrics for Project Management and Process
Improvement. Prentice Hall, 1992.
[Heimdahl98] Heimdahl, M.P.E., Heitmeyer, C.L.: Formal methods for developing high
assurance computer systems: working group report. Proceedings of the 2nd IEEE
Workshop on Industrial Strength Formal Specification Techniques, pp. 60-64, 21-23
Oct. 1998.
[Heineman01] Heineman, G.T., Councill, W.T.: Component-Based Software
Engineering. Addison-Wesley, Boston, 2001.
[Herrmann99] Herrmann, D.S., Peercy, D.E.: Software reliability cases: the bridge
between hardware, software and system safety and reliability. Proceedings of the
Annual Reliability and Maintainability Symposium, pp. 396-402, Washington, DC,
USA,18-21 Jan. 1999.
[Hongxia01] Hongxia J., Santhanam, P.: An approach to higher reliability using
software components. Proceedings of 12th International Symposium on Software
Reliability Engineering, pp. 2-11, Hong Kong, China, 27-30 Nov. 2001.
[IEC61508] IEC: Functional safety and IEC 61508 – A basic guide. 11 p., Geneva,
Switzerland, available from
http://www.iee.org/oncomms/pn/functionalsafety/HLD.pdf, Nov. 2002.
[IEEE 1228] IEEE: Standard for Software Safety Plans, IEEE STD 1228-1994. 17
logical p. of 23 physical pages.
[IEEE 1044] IEEE: Standard Classification for Software Anomalies, IEEE STD 1044-1993. December 2, 1993.
[IEEE 610.12] IEEE: IEEE Standard Glossary of Software Engineering Terminology,
IEEE STD 610.12-1990. 84 p., created in 1990 and reaffirmed in 2002.
[ISO91] ISO: ISO/IEC 9126 - Information technology - Software evaluation – Quality
characteristics and guide-lines for their use. December 1991.
[ISO 9000] ISO: Quality management and quality assurance standards, Part 1:
Guidelines for selection and use, ISO 9000-1. Geneva, 1994
[ISO 9001] ISO: Quality Management Systems - Requirements for quality assurance,
ISO 9001:2000. Geneva, 2000.
[ITU-T E.800] ITU: Telephone Network and ISDN, Quality of Service, Network
Management and Traffic Engineering – Terms and Definitions Related to Quality of
Service And Network Performance Including Dependability, ITU-T Recommendation
E.800. 54 p, Geneva, Switzerland, August 1994.
[ITU-T X.902] ITU: Open Distributed Processing – Reference Model – Part 2:
Foundations, ITU-T Recommendation X.902. 20 p, Geneva, Switzerland, 1995.
[Jarke93] Jarke, M., Bubenko, J.A., Rolland, C., Sutcliffe, A., Vassiliou, Y.: Theories
Underlying Requirements Engineering: An Overview of NATURE at Genesis.
Proceedings of the IEEE Symposium on Requirements Engineering, pp. 19-31, IEEE
Computer Society Press, San Diego, January 1993.
[Kohl99] Kohl, R.J.: Establishing guidelines for suitability of COTS for a mission
critical application. Proceedings of The Twenty-Third Annual International
Computer Software and Applications Conference, COMPSAC '99, pp. 98 -99,
Phoenix, AZ, USA, 27-29 Oct. 1999.
[Kroll03] Kroll, P., Kruchten, P.: The Rational Unified Process Made Easy: A
Practitioner's Guide to Rational Unified Process. Addison-Wesley, Boston, 2003.
[Kropp98] Kropp, N.P., Koopman Jr., P.J., Siewiorek, D.P.: Automated Robustness
Testing of Off-the-Shelf Software Components. Proceedings of the 29th Symposium
on Fault-Tolerant Computing, pp. 230-239, Madison, Wisconsin, USA, June 15-18,
1999.
[Kruchten00] Kruchten, P.: The Rational Unified Process. An Introduction. Addison-Wesley, Boston, 2000.
[Laprie95] Laprie, J.-C.: Dependable computing and fault tolerance: Concepts and
terminology. Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing, Pasadena, California, June 27-30, 1995.
[Leveson95] Leveson, N.: Safeware: System safety and computers. Addison Wesley,
1995.
[Leveson07] Leveson, N.: System Safety Engineering: Back To The Future (web version
of updates to 1995 book), available from http://sunnyday.mit.edu/book2.pdf , 2007.
[Li06] Li, J., Bjoernson, F.O., Conradi, R., Kampenes, V.B.: An Empirical Study of
Variations in COTS-based Software Development Processes in Norwegian IT
Industry. Journal of Empirical Software Engineering, 11(3), pp. 433-461, 2006.
[Littlewood00] Littlewood, B., Strigini, L.: Software reliability and dependability: a
roadmap. Proceedings of the Conference on The Future of Software Engineering,
22nd International Conference on Software Engineering, pp. 175-188, Limerick,
Ireland, June 2000.
[Mohagheghi04] Mohagheghi, P., Conradi, R., Killi, O.M., Schwarz, H.: An Empirical
Study of Software Reuse vs. Defect Density and Stability. In Proceedings of the 26th
International Conference on Software Engineering (ICSE'04), pp. 282-292,
Edinburgh, Scotland, May 2004.
[Mohagheghi04b] Mohagheghi, P.: The Impact of Software Reuse and Incremental
Development on the Quality of Large Systems. PhD Thesis, NTNU 2004:95, ISBN
82-471-6408-6, 10 July 2004.
[Mohagheghi04c] Mohagheghi, P., Conradi, R.: Exploring Industrial Data Repositories:
Where Software Development Approaches Meet. In Proceedings of the 8th ECOOP
Workshop on Quantitative Approaches in Object-Oriented Software Engineering
(QAOOSE’04), 9 p., Oslo, Norway, 15 June 2004.
[Mohagheghi06] Mohagheghi, P., Conradi, R., Børretzen, J.A.: Revisiting the Problem
of Using Problem Reports for Quality Assessment. Proceedings of the 4th Workshop
on Software Quality, held at ICSE'06, Shanghai, pp. 45-50, 21 May 2006.
[Moløkken04] Moløkken-Østvold, K.J., Jørgensen, M., Tanilkan, S.S., Gallis, H., Lien,
A.C., Hove, S.E.: Simula Report 2004-03. “Results from the BEST-Pro (Better
Estimation of Software Tasks and Process Improvement) survey”, 2004.
[Neumann07] Neumann, P.G.: The Risks Digest. Available from:
http://catless.ncl.ac.uk/Risks/, 2007.
[Parnas03] Parnas, D.L., Lawford, M.: The role of inspection in software quality
assurance. IEEE Transactions on Software Engineering, 29(8), pp 674-676, Aug.
2003.
[Price99] Price, J.: Christopher Alexander's pattern language. IEEE Transactions on
Professional Communication, (42)2, pp. 117-122, June 1999.
[Rational] Rational Software, available at: http://www-306.ibm.com/software/rational/,
2007.
[Rausand91] Rausand, M.: Risikoanalyse. Tapir Forlag, Trondheim, 1991.
[Riehle96] Riehle, D. and Zullighoven, H.; Understanding and Using Patterns in
Software Development. Theory and Practice of Object Systems, 2(1), pp. 3-13, 1996.
[Royce70] Royce, W.W.: Managing the Development of Large Software Systems.
Proceedings of IEEE WESCON, pp. 1-9, August 1970.
[SAP] SAP AG: SAP ERP, http://www.sap.com/index.epx
[Schneidewind98] Schneidewind, N.F.: Methods for assessing COTS reliability,
maintainability, and availability. Proceedings of IEEE International Conference on
Software Maintenance, pp. 224-225, Bethesda, Maryland, USA, 16-20 Nov. 1998.
[Seaman99] Seaman, C.B.: Qualitative Methods in Empirical Studies of Software
Engineering. IEEE Transactions on Software Engineering, (25)4, pp. 557–572,
July/August 1999.
[SEI] Carnegie Mellon Software Engineering Institute: Performance-Critical Systems
(PCS) Introduction. Available from: http://www.sei.cmu.edu/pcs/introduction.html,
2007.
[Shull00] Shull, F., Russ, I., Basili, V.: How Perspective-Based Reading Can Improve
Requirements Inspections. IEEE Computer, 33(7), pp. 73-79, July 2000.
[Solingen99] van Solingen, R., Berghout, E.: The Goal/Question/Metric Method.
McGraw Hill, 1999.
[Sommerville04] Sommerville, I.: Software Engineering. 7th edition, Addison-Wesley,
2004.
[Stålhane02] Stålhane, T., Conradi, R., Sjøberg, D.: Proposal for BUCS project. pp. 1-29, October 2002.
[Stålhane03] Stålhane, T., Myhrer, P.T., Lauritsen, T., Børretzen, J.A.: Intervju med
utvalgte norske bedrifter omkring utvikling av forretningskritiske systemer. Internal
BUCS report, 6 pages, available at:
http://www.idi.ntnu.no/grupper/su/bucs/files/BUCS-rapport-h03.doc, 2003.
[Strauss98] Strauss, A., Corbin, J.: Basics of Qualitative Research. Sage Publications,
London, UK, 1998.
[Thomas96] Thomas, S.A., Hurley, S.F., Barnes, D.J.: Looking for the human factors in
software quality management. Proceedings of International Conference on Software
Engineering: Education and Practice, pp. 474-480, Dunedin, New Zealand, 24-27
Jan. 1996.
[UKSMA] United Kingdom Software Metrics Association:
http://www.uksma.co.uk.
[Vinter00] Vinter, O., Lauesen, S.: Analyzing Requirements Bugs. Software Testing &
Quality Engineering Magazine, Vol. 2-6, Nov/Dec 2000
[Votta95] Votta, L.G., Zajak, M.L.: Design Process Improvement Case Study Using
Process Waiver Data. Proceedings of the 5th European Software Engineering
Conference, pp.44-58, Barcelona, Spain, September 25-28, 1995.
[Wohlin00] Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén,
A.: Experimentation in software engineering: an introduction. Kluwer Academic
Publishers, Norwell, MA, USA, 2000.
[Yin03] Yin, R.K.: Case Study Research, Design and Methods. Sage Publications,
2003.
[Zelkowitz98] Zelkowitz, M.V., Wallace, D.R.: Experimental models for validating
technology. IEEE Computer, (31)5, pp. 23-31, May 1998.
Appendix A: Papers
This section contains the seven papers P1-P7 as presented in section 1.5, as well as a
proposed paper P8 presented as a technical report. It should be noted that the papers
have been re-formatted from their original format to fit into this thesis.
P1. Safety activities during early software project phases
Jon Arvid Børretzen, Tor Stålhane, Torgrim Lauritsen, and Per Trygve Myhrer
Department of Computer and Information Science,
Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
Email: [email protected]
Abstract
This paper describes how methods taken from safety-critical practices can be used in
the development of business-critical software. The emphasis is on the early phases of
product development, and on use together with the Rational Unified Process. One
important part of the early project phases is to define safety requirements for the
system. This means that in addition to satisfying the need for functional system
requirements, non-functional requirements about system safety must also be included.
By using information that already is required or produced in the first phases of RUP
together with some suitable “safety methods”, we are able to produce a complete set of
safety requirements for a business-critical system before the system design process is
started.
1. Introduction
Software systems play an increasingly important role in our daily lives. The
technological development has led to the introduction of software systems into an
increasing number of areas. In many of these areas we become dependent on these
systems, and their weaknesses could have grave consequences. There are areas where
correctly functioning software is important for the health and well-being of humans,
such as air-traffic control and health systems. There are, however, other systems that we also
expect and hope will run correctly because of the negative effects of failure, even if the
consequences are mainly of an economic nature. This is what we call business-critical
systems, and business-critical software. The number of areas where functioning
software is at the core of operation is steadily increasing. Both financial systems and e-business systems rely on increasingly large and complex software systems.
In order to increase the quality and efficiency of such products we need methods,
techniques and processes specifically aimed at improving the development, use and
maintenance of this type of software.
In this paper, we will discuss methods that can be used together with Rational Unified
Process in the early parts of a development project. These methods are Safety Case,
Preliminary Hazard Analysis and Hazard and Operability Analysis. Our contribution is
to combine these methods into a comprehensive method for use early in the
development of business-critical systems.
1.1 BUCS
The BUCS project is a research project funded by the Norwegian Research Council
(NFR). The goal of the BUCS project is to help developers, users and customers to
develop software that is safe to use. In a business environment this means that the
system seldom or never behaves in such a way that it causes the customer or the
customer’s users to lose money or important information. We will use the term
“business-safe” for this characteristic.
The goal of the BUCS project is not to help developers to finish their development on
schedule and to the agreed price. We are not particularly interested in delivered
functionality or how to identify or avoid process and project risk. This is not because we
think that these things are not important – it is just that we have defined them out of the
BUCS project.
The BUCS project is seeking to develop a set of integrated methods to improve support
for analysis, development, operation, and maintenance of business-critical systems.
Some methods will be taken from safety-critical software engineering practices, while
others will be taken from general software engineering. Together they are tuned and
refined to fit into this particular context and to be practical to use in a software
development environment. The research will be based on empirical studies, where
interviews, surveys and case studies will help us understand the needs and problems of
the business critical software developers.
Early in the BUCS project, we conducted a series of short interviews with eight
software developing companies as a pre-study to find some important issues we should
focus on [Stålhane03]. These interviews showed us that many companies used or
wanted to use RUP or similar processes, and that a common concern in the industry was
lack of communication, both internally and with the customers. With this basis, the
BUCS project has decided to use RUP as the environment for our enhanced methods,
and the methods used will be helpful in improving communication on requirements
gathering, implementation and documentation in a software development project.
Adaptation of methods from safety-critical development has to be done so that the
methods introduced fit into RUP and are less complicated and time consuming than
when used in regular safety-critical development.
That a system is business-safe does not mean that the system is error free. What it
means is that the system will have a low probability of causing losses for the users. In
this respect, the system characteristic is close to the term “safe”. This term is, however,
wider, since it is concerned with all activities that can cause damage to people,
equipment or the environment or severe economic losses. Just as with general safety,
business-safety is not a characteristic of the system alone – it is a characteristic of the
system’s interactions with its environment.
BUCS is considering two groups of stakeholders and wants to help them both.
• The customers and their users. They need methods that enable them to:
o Understand the dangers that can occur when they start to use the system as
part of their business.
o Write or state requirements to the developers so that they can take care of the
risks incurred when operating the system – product risk.
• The developers. They need help to implement the system so that:
o It is business-safe.
o They can create confidence by supporting their claims with analysis and
documentation.
o It is possible to change the systems so that when the environment changes,
the systems are still business-safe.
This will not make it cheaper to develop the system. It will, however, help the
developers to build a business-safe system without large increases in the development
costs.
Why should developing companies do something that costs extra – is this a smart
business proposition? We definitely believe that the answer is “Yes”, for the
following reasons:
• The only solution most companies have to offer to customers with business-safety
concerns today is that the developers will be more careful and test more – which is
not a good enough solution.
• By building a business-safe system the developers will help the customer achieve
efficient operation of their business and thus build an image of a company that has
its customers’ interests in focus. Applying new methods to increase the products’
business-safety must thus be viewed as an investment. The return on the investment
will come as more business from large, important customers.
2. The Rational Unified Process
The Rational Unified Process (RUP) is a software engineering process. It provides a
disciplined approach to assigning tasks and responsibilities within a development
organization. Its goal is to ensure the production of high-quality software that meets the
needs of its end users within a predictable schedule and budget.
RUP is developed and supported by Rational Software [Rational]. The framework is
based on popular development methods used by leading actors in the software industry.
RUP consists of four phases: inception, elaboration, construction and transition. The
BUCS project has identified the first three phases as most relevant to our work, and will
make proposals for the introduction of safety methods in these phases. In this paper, we
will concentrate on the inception phase.
Figure 1 - Rational Unified Process; © IBM [Rational]
Figure 1 shows the overall architecture of the RUP, and its two dimensions:
• The horizontal axis which represents time and shows the lifecycle aspects of the
process as it unfolds
• The vertical axis which represents disciplines and group activities to be
performed in each phase.
The first dimension represents the dynamic aspect of the process as it is enacted, and is
expressed in terms of phases, iterations, and milestones. The second dimension
represents the static aspect of the process: how it is described in terms of process
components, disciplines, activities, workflows, artefacts, and roles [Kroll03]
[Kruchten00]. The graph shows how the emphasis varies over time. For example, in
early iterations, we spend more time on requirements, and in later iterations we spend
more time on implementation.
The ideas presented in this paper are valid even if the RUP process is not used. An
iterative software development process will in most cases be quite similar to a RUP
process in broad terms, with phases and where certain events, artefacts and actions exist.
Some companies also use other process frameworks that in principle differ from RUP
mostly in name. Therefore, it is possible and beneficial to include and integrate the
safety methods we propose into any iterative development process.
2.1 Inception
Early in a software development project, system requirements will always be on top of
the agenda. In the same way as well thought-out plans are important for a system in
general, well thought-out plans for system safety are important when trying to build a
correctly functioning, safe system. Our goal is to introduce methods that are helpful for
producing a safety requirements specification, which can largely be seen as one type of
non-functional requirements. However, safety requirements also force us to include the
system’s environment. In RUP, with its use-case driven approach, this process can be
seen as analogous to the process of defining general non-functional requirements, since
use-case driven processes are not well suited for non-functional requirements
specification. Because the RUP process itself does not explicitly mandate safety
requirements, just as it does not mandate non-functional requirements, other
methods have to be introduced for this purpose. On the other hand, the
architecture-centric approach in RUP is helpful for producing non-functional
requirements, as these requirements are strongly linked to a system’s
architecture. Considerations about system
architecture will therefore influence non-functional and safety requirements.
Although designing safety into the system from the beginning (upstream protection)
may incur some design trade-offs, eliminating or controlling hazards may result in
lower costs during both development and overall system lifetime, due to fewer delays
and less need for redesign [Leveson95]. Working in the opposite direction, adding
protection features to a completed design (downstream protection) may cut costs early
in the design process, but will increase system costs, delays and risk to a much greater
extent than the costs owing to early safety design.
The main goal of the inception phase is to achieve a common understanding among the
stakeholders on the lifecycle objectives for the development project [Krutchen00]. You
should decide exactly what to build, and from a financial perspective, whether you
should start building it at all. Key functionality should be identified early. The inception
phase is important, primarily for new development efforts, in which there are significant
project risks which must be addressed before the project can proceed. The primary
objectives of the inception phase include (from [Kroll03] [Krutchen00]):
• Establishing the project's software scope and boundary conditions, including an
operational vision, acceptance criteria and what is intended to be included in the
product and what is not.
• Identifying the critical use cases of the system, the primary scenarios of
operation that will drive the major design trade-offs. This also includes deciding
which use cases are the most critical.
• Exhibiting, and maybe demonstrating, at least one candidate architecture against
some of the primary scenarios.
• Estimating the overall cost and schedule for the entire project (and more detailed
estimates for the elaboration phase that will immediately follow).
• Assessing risks and the sources of unpredictability.
• Preparing the supporting environment for the project.
3. Safety methods introduced by BUCS
Early in a project’s life-cycle, many decisions have not yet been made, and we have to
deal with a conceptual view or even just ideas for the forthcoming system. Therefore,
much of the information we have to base our safety-related work on is at a conceptual
level. The methods we can use will therefore be those that can work with this kind of high-level information, and that are suited to the early phases of software
development projects.
We have identified five safety methods that are suitable for the inception phase of a
development project. Two of them, Safety Case and Intent Specification, are methods
that are well suited for use throughout the development project [Adelard98]
[Leveson00], as they focus on storing and combining information relevant to safety
through the product’s life-cycle. The other three, Preliminary Hazard Analysis, Hazards
and Operability Analysis and Event Tree Analysis are focused methods [Rausand91]
[Leveson95], well suited for use in the inception phase, as they can be used on a project
where many details are yet to be defined. In this paper, the Safety Case, Preliminary
Hazard Analysis and Hazard and Operability Analysis methods are used as examples of
how such methods can be used in a RUP context.
When introducing safety related development methods into an environment where the
aim is to build a business-safe system, but not necessarily error-free and completely
safe, we have to accept that usage of these methods will not be as stringent and effort
demanding as in a safety-critical system. This entails that the safety methods used in
business-critical system development will be adapted and simplified versions, in order
to save time and resources.
3.1 Safety Case
A safety case is a documented body of evidence that provides a convincing and valid
argument that a system is adequately safe for a given application in a given environment
[Adelard98] [Bishop98]. The safety case method is a tool for managing safety claims,
containing a reasoned argument that a system is or will be safe. It is manifested as a
collection of data, metadata and logical arguments. The main elements of a safety case
are shown in Figure 2:
• Claims about a property of the system or a subsystem (Usually about safety
requirements for the system)
• Evidence which is used as basis for the safety argument (Facts, assumptions, subclaims)
• Arguments linking the evidence to the claim
• Inference rules for the argument
[Figure 2 shows the argument structure of a safety case: evidence and sub-claims are linked to a claim through arguments governed by inference rules.]
Figure 2 – How a safety case is built up
The arguments can be:
• Deterministic: Application of predetermined rules to derive a true/false claim, or
demonstration of a safety requirement.
• Probabilistic: Quantitative statistical reasoning, to establish a numerical level.
• Qualitative: Compliance with rules that have an indirect link to the desired
attributes.
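The claim–evidence–argument structure described above can be sketched as a small data model. This is an illustrative assumption on our part: the element names follow the safety case terminology, but the classes, fields and the `is_supported` check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    """A fact, assumption or sub-claim used as basis for an argument."""
    description: str

@dataclass
class Argument:
    """Links evidence to a claim via an inference rule.

    kind is one of 'deterministic', 'probabilistic' or 'qualitative',
    matching the three argument types listed above.
    """
    kind: str
    inference_rule: str
    evidence: List[Evidence] = field(default_factory=list)

@dataclass
class Claim:
    """A claim about a safety property of the system or a subsystem."""
    statement: str
    arguments: List[Argument] = field(default_factory=list)
    sub_claims: List["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A claim is structurally supported when every argument has at
        # least one piece of evidence and all sub-claims are supported.
        return (all(a.evidence for a in self.arguments)
                and all(c.is_supported() for c in self.sub_claims))

claim = Claim("Credit info must be correct when sending invoice")
arg = Argument("qualitative", "consistency checking covers all update paths",
               [Evidence("Implementation of credit info consistency check")])
claim.arguments.append(arg)
print(claim.is_supported())  # -> True
```

In a real safety case such a structural check would of course be complemented by a review of the arguments themselves; the sketch only shows how claims, sub-claims, evidence and inference rules relate.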
The safety case method can be used throughout a system’s life-cycle, and divides a
project into four phases: Preliminary, Architectural, Implementation, and Operation and
Installation. This is similar to the phases of RUP, and makes it reasonable to tie a
preliminary safety case to the inception phase of a development project. The
development of a safety case does not follow a simple step by step process. The main
activities interact with each other and iterate as the design proceeds and as the level of
detail in the system design increases. This also fits well with the RUP process.
The question the safety case documents will answer is in our case “How will we argue
that this system can be trusted?” The safety case shows how safety requirements are
decomposed and addressed, and will provide an appropriate answer to the question. The
characteristics of the safety case elements in the inception phase are:
1. Establish the system context, whether the safety case is for a complete system
or a component within a system.
2. Establish safety requirements and attributes for the current level of the design,
and how these requirements and attributes are related to the system’s safety
analysis.
3. Define important operational requirements and constraints such as
maintenance levels, time to repair and issues related to the operating
environment.
3.2 Preliminary Hazard Analysis and Hazard and Operability Analysis
Preliminary Hazard Analysis (PHA) is used in the early life cycle stages to identify
critical system functions and broad system hazards. The identified hazards are assessed
and prioritized, and safety design criteria and requirements are identified. A PHA is
started early in the concept exploration phase so that safety considerations are included
in tradeoff studies and design alternatives. This process is iterative, with the PHA being
updated as more information about the design is obtained and as changes are being
made. The results serve as a baseline for later analysis and are used in developing
system safety requirements and in the preparation of performance and design
specifications. Since PHA starts at the concept formation stage of a project, little detail
is available, and the assessments of hazard and risk levels are therefore qualitative. A
PHA should be performed by a small group with good knowledge about the system
specifications.
Both Preliminary Hazard Analysis and Hazard and Operability Analysis (HazOp) are
performed to identify hazards and potential problems that the stakeholders see at the
conceptual stage, and that could be created by the system after being put into operation.
A HazOp study is a more systematic analysis of how deviations from the design
specifications in a system can arise, and whether these deviations can result in hazards.
Both analysis methods build on information that is available at an early stage of the
project. This information can be used to reduce the severity or build safeguards against
the effects of the identified hazards.
HazOp is a creative team method, using a set of guidewords to trigger creative thinking
among the stakeholders and the cross-functional team in RUP. The guidewords are
applied to all parts and aspects of the system concept plan and early design documents,
to find possible deviations from design intentions that have to be handled. Examples of
guidewords are MORE and LESS, which mean an increase or decrease of some
quantity. For example, by using the “MORE” guideword on “a customer client
application”, you would have “MORE customer client applications”, which could spark
ideas like “How will the system react if the servers get swamped with customer client
requests?” and “How will we deal with many different client application versions
making requests to the servers?” A HazOp study is conducted by a team consisting of
four to eight persons with a detailed knowledge of the system to be analysed.
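The guideword mechanics described above can be sketched in a few lines. This is a hypothetical illustration: the guideword list is an illustrative subset, and the prompt wording is our own.

```python
# Minimal HazOp-style prompt generator: apply each guideword to each
# element of the conceptual design to trigger deviation questions.
GUIDEWORDS = ["NO", "MORE", "LESS", "OTHER THAN"]  # illustrative subset

def hazop_prompts(elements):
    """Yield (element, guideword, prompt) triples for the team to discuss."""
    for element in elements:
        for gw in GUIDEWORDS:
            yield (element, gw,
                   f"What if we have {gw} {element}? Can this deviation "
                   f"lead to a hazard, and how should it be handled?")

elements = ["customer client application", "credit info update"]
for element, gw, prompt in hazop_prompts(elements):
    print(prompt)
```

The prompts are only triggers; the value of a HazOp study comes from the cross-functional team's discussion of each deviation, not from the mechanical enumeration itself.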
The main difference between a HazOp and a PHA is that PHA is a lighter method that
needs less effort and available information than the HazOp method. Since HazOp is a
more thorough and systematic analysis method, the results will be more specific. If
there is enough information available for a HazOp study, and the development team can
spare the effort, a HazOp study will most likely produce more precise and more suitable
results for the safety requirement specification definition.
4. Integration: Using safety methods in the RUP Inception phase
In the inception phase we will focus on understanding the overall requirements and
scoping the development effort. When a project goes through its inception phase, the
following artifacts will be established/produced:
• Requirements, leading to a System Test Plan
• Identification of key functionality
• Proposals for possible solutions
• Vision documents
• Internal business case
• Proof of concept
The artifacts in bold are the ones that are interesting from a system-safe point of view,
and the fact that the RUP inception phase requires development teams to produce such
information eases the introduction of safety methods into the process. Because of RUP’s
demands on information collection, using these methods does not lead to extensive extra
work for the development team.
By using the safety methods we have proposed, we can produce safety requirements for
the system. These are high-level requirements, and must be specified before the project
goes from the inception to the elaboration phase. When the project moves on from the
inception to the elaboration phase, identification of the business-critical aspects should
be mostly complete, and we should have high confidence in having identified the
requirements for those aspects.
The safety work in the project continues into the elaboration phase, and some of the
methods, like Safety Case and Intent Specification will also be used when the project
moves on to this phase.
4.1 Software Safety Case in a RUP context
According to [Bishop98], we need the following information when producing a safety
case:
• Information used to construct the safety argument
• Safety evidence
As indicated in 3.1, to implement a safety case we need to:
• make an explicit set of claims about the system
• produce the supporting evidence
• supply a set of safety arguments linking the claims to the evidence, shown in
Figure 2
• make clear the assumptions and judgements underlying the arguments
The safety case is broken down into claims about non-functional attributes for subsystems, such as reliability, availability, fail-safety, response time, robustness to
overload, functional correctness, accuracy, usability, security, maintainability,
modifiability, and so on.
The evidence used to support a safety case argument comes from:
• The design itself
• The development processes
• Simulation of problem solution proposals
• Prior experience from similar projects or problems
Much of the work done early in conjunction with safety cases tries to identify possible
hazards and risks, for instance by using methods like Preliminary Hazard Analysis
(PHA) and Hazard and Operability Analysis (HazOp). These are especially useful in
combination with Safety Case for identifying the risks and safety concerns that the
safety case is going to handle. Also, methods like Failure Mode and Effects Analysis,
Event Tree Analysis, Fault Tree Analysis and Cause Consequence Analysis can be used
as tools to generate evidence for the safety case [Rausand91].
The need for concrete project artefacts as input in the safety case varies over the project
phases, and is not strictly defined. Early on in a project, only a general system
description is needed for making the safety requirements specification. When used in
the inception phase, the Safety Case method will support the definition of a safety
requirements specification document by forcing the developers to “prove” that their
intended system can be trusted. When doing that, they will have to produce a set of
safety requirements that will follow the project through its phases, and which will be
updated along with the safety case documents.
The Safety Case method, when used to its full potential, will be too elaborate when not
dealing with safety-critical projects. The main concept and structure will, however, help
trace the connection between hazards and solutions through the design from top level
down to detailed level implementation.
Much of the work that has to be performed when constructing a software safety case is
to collect information and arrange this information in a way that shows the reasoning
behind the safety case. Thus, the safety case does not in itself bring much new
information into the project; it is mainly a way of structuring the information.
4.2 Preliminary Hazard Analysis and Hazard and Operability Analysis in a
RUP context
By performing a PHA or HazOp we can identify threats attached to both malicious
actions and unintended design deviations, for instance as a result of unexpected use of
the system or as a result of operators or users without necessary skills executing an
unwanted activity.
To perform a PHA or HazOp, we only need a conceptual system description, and a
description of the system’s environment. RUP encourages such information to be
produced in the inception phase of a project. When a hazard is identified, either by PHA
or HazOp, it is categorized and we have to decide if it is acceptable or if it needs further
investigation. When trustworthiness is an issue, the hazard should be tracked in a hazard
log and subjected to review along the development process. This makes a basis for
further analysis, and produces elements to be considered for the safety requirement
specification.
The result of a PHA or HazOp investigation is the identification of possible deviations
from the intent of the system. For every deviation, the causes and consequences are
examined and documented in a table. The results are used to focus work effort and to
solve the problems identified. The results of PHA and HazOp are also incorporated into
the safety case documents either as problems to be solved, or as evidence used in
existing safety claim arguments.
4.3 Combining the methods
By introducing the use of Safety Case and PHA/HazOp into the RUP inception phase,
we have a process where the system safety requirements are maintained in the safety
case documents. PHA and HazOp studies on the system specification, together with its
customer requirements and environment description, produce hazard identification logs
that are incorporated into the safety case as issues to be handled. This also leads to
revision of the safety requirements. Thus, the deviations found with PHA/HazOp will
be covered by these requirements as shown in Figure 3.
From the inception phase of the development process, the safety requirements and
safety case documents are used in the remaining phases where the information is used in
the implementation of the system.
[Figure 3 shows how the customer requirements and the environment description feed a PHA and/or HazOp study, whose results flow into the safety requirements, which in turn feed the safety case.]
Figure 3 – Combining PHA/HazOp and Safety Case
5. A small example
Let us assume a business that needs a database containing information about its
customers and the customers’ credit information. When developing a computer system
for this business, not only should we ask the business representatives which functions
they need and what operating system they would like to run their system on, but we
should also use proper methods to improve the development process with regard to
business-critical issues. An example of an important requirement for such a system
would be ensuring the correctness and validity of customers’ credit information. Any
problems concerning this information in a system would seriously impact a company’s
ability to operate satisfactorily.
The preliminary hazard analysis method will be helpful here, by making stakeholders
think about each part of the planned system and any unwanted events that could occur.
By doing this, we will get a list of possible hazards that have to be eliminated, reduced
or controlled. This adds directly to the safety requirements specification. An example is
the potential event that the customer information database becomes erroneous, corrupt
or deleted. By using a preliminary hazard analysis, we can identify the possible causes
that can lead to this unwanted event, and add the necessary safety requirements.
We can use the system’s database as an example. In order to identify possible database
problems – Dangers – we can consider each database item in turn and ask: “What will
happen if this information is wrong or is missing?” If the identified effect could be
dangerous for the system’s users or owner – Effects – we will have to consider how it
could happen – Causes - and what possible barriers we could insert into the system. The
PHA is documented in a table. The table, partly filled out for our example, is shown
below in Table 1.
Customer info management

Danger: Wrong address
  Causes: Wrong address inserted; Update error; Database error
  Effects: Correspondence sent to wrong address
  Barriers: Check against name and public info, e.g. “Yellow pages”; Testing

Danger: Wrong credit info
  Causes: Wrong credit info inserted; Update error; Database error
  Effects: Wrong billing. Can have serious consequences
  Barriers: Manual check required; Consistency check; Testing

Table 1 – PHA example
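A PHA table like Table 1 maps naturally onto a simple record structure from which candidate safety requirements can be derived. This is a sketch: the field names mirror the table's columns, but the classes and the requirement wording are our own assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PhaEntry:
    """One row of a PHA table: a danger with its causes, effects and barriers."""
    danger: str
    causes: List[str]
    effects: str
    barriers: List[str]

def requirement_candidates(entries):
    """Turn each barrier into a candidate safety requirement."""
    reqs = []
    for e in entries:
        for barrier in e.barriers:
            reqs.append(f"To mitigate '{e.danger}', the system shall "
                        f"provide: {barrier}")
    return reqs

pha = [
    PhaEntry("Wrong credit info",
             ["Wrong credit info inserted", "Update error", "Database error"],
             "Wrong billing. Can have serious consequences",
             ["Manual check", "Consistency check", "Testing"]),
]
for r in requirement_candidates(pha):
    print(r)
```

The generated sentences are only starting points; each candidate would be reviewed and reworded before entering the safety requirements specification.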
When we have finished the PHA, we must show that each identified danger that can
have a serious effect will be addressed during the development process. In BUCS we
have chosen to use safety cases for this.
When using the safety case method, the developers will have to show that the way they
want to implement a function or some part of the system is trustworthy. This is done by
producing evidence and a reasoned argument that this way of doing things will be safe.
From Table 1, we see that for the customer’s credit information, the safety case should
be able to document what the developers are going to do to make sure that the credit
information used in billing situations is correct. Figure 4 shows a high level example of
how this might look in a safety case diagram. The evidence may come from earlier
experience with implementing such a solution, or the belief that their testing methods
are sufficient to ensure safety.
[Figure 4 shows a safety case diagram for the example. Claim: “Credit info must be correct when sending invoice”. Arguments: “Insertion and updating credit info is made trustworthy” and “Credit info DB is sufficiently reliable”. Evidence: “Implementation of manual credit info check”, “Implementation of credit info consistency check”, and “Database implementation testing”.]
Figure 4 – Safety Case example
The lowest level in the safety case in Figure 4 contains the evidence. In our case, these
pieces of evidence give rise to three types of requirements:
• Manual procedures. These are not realised in software but the need to perform
manual checks will put extra functional requirements onto the system.
• The software. An example in Figure 4 is the need to implement a credit information
consistency check.
• The process. The safety case requires us to put extra effort into testing the
database implementation. Most likely this will be realised either by allocating more
effort to testing or by allocating a disproportionately large part of the testing effort
to the database.
After using these methods for eliciting and documenting safety requirements, in the next
development stages the developers will have to produce the evidence suggested in the
diagram, show how the evidence supports the claims by making suitable arguments and
finally document that the claims are supported by the evidence and arguments. Some
examples of evidence are trusted components from a component repository, statistical
evidence from simulation, or claims about sub-systems that are supported by evidence
and arguments in their own right. Examples of relevant arguments are a formal proof that
two pieces of evidence together support a claim, quantitative reasoning to establish a
required numerical level, or compliance with some rules that have a link to the relevant
attributes.
Further on in the development process, in the elaboration and construction phases, the
evidence and arguments in the safety case will be updated with information as we get
more knowledge about the system. Each piece of evidence and argumentation should be
directly linked to some part of the system implementation. The responsibility of the
safety case is to show that the selected barriers and their implementation are sufficient
to prevent the dangerous event from taking place. When the evidence and arguments in
the safety case diagram are implemented and later tested in the development process,
the safety case documentation is updated to show that the safety case claim has been
validated.
By using PHA to find potential hazards and deviations from intended operation, and
Safety Case to document how we intend to solve these problems, we produce elements
to the safety requirements specification that might otherwise have been missed.
6. Conclusion and further work
We have shown how the Preliminary Hazard Analysis, Hazard and Operability Analysis
and Safety Case methods can be used together in the RUP inception phase, to help
produce a safety requirements specification. The shown example is simple, but
demonstrates how the combination of these methods will work in this context. By
building on information made available in an iterative development process like RUP,
we can use the presented methods to improve the process for producing a safety
requirements specification.
As a development project moves into the succeeding phases, the need for safety effort
will still remain to ensure the development of a trustworthy system. The other RUP
phases contain different development activities and therefore different safety activities.
The BUCS project will make similar descriptions of the other RUP phases and show
how safety related methods can be used beneficially also in these phases.
BUCS will also continue the effort in working with methods for improving safety
requirements collection, and will make contributions in the following areas:
• Proposals on adaptation of methods from safety development for business-critical system development.
• Guides and advice on business-critical system development.
• Tools supporting development of business-critical systems.
• Investigations on use of component-based development in the development of
business-critical systems.
References
[Adelard98] “ASCAD, Adelard Safety Case Development Manual”, Published 1998 by
Adelard.
[Bishop98] P.G. Bishop, R.E. Bloomfield, "A Methodology for Safety Case Development",
Safety-critical Systems Symposium (SSS 98), Birmingham, UK, Feb, 1998.
[Kroll03] P. Kroll, P. Krutchen, The Rational Unified Process Made Easy: A Practitioner's
Guide to Rational Unified Process, Addison Wesley, Boston, 2003, ISBN: 0-321-16609-4.
[Krutchen00] P. Krutchen, The Rational Unified Process: An Introduction (2nd Edition),
Addison Wesley, Boston, 2000, ISBN: 0-201-70710-1.
[Leveson95] N.G. Leveson, Safeware: System safety and computers, Addison Wesley, USA,
1995, ISBN: 0-201-11972-2.
[Leveson00] N.G Leveson, “Intent specifications: an approach to building human-centered
specifications”, IEEE Transactions on Software Engineering, Volume: 26, Issue: 1, Jan. 2000,
Pages:15 – 35.
[Rational] Rational Software, http://www.rational.com
[Rausand91] M. Rausand, Risikoanalyse, Tapir Forlag, Trondheim, 1991, ISBN: 82-519-0970-8.
[Stålhane03] T. Stålhane, T. Lauritsen, P.T. Myhrer, J.A. Børretzen, BUCS rapport - Intervju
med utvalgte norske bedrifter omkring utvikling av forretningskritiske systemer, October 2003,
available from: http://www.idi.ntnu.no/grupper/su/bucs/files/BUCS-rapport-h03.doc
P2. Results and Experiences from an Empirical Study of Fault
Reports in Industrial Projects
Jon Arvid Børretzen, Reidar Conradi
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU),
NO-7491 Trondheim, Norway
[email protected], [email protected]
Abstract. Faults introduced into systems during development are costly to fix, and
especially so for business-critical systems. These systems are developed using common
development practices, but have high requirements for dependability. This paper reports on
an ongoing investigation of fault reports from Norwegian IT companies, where the aim is to
seek a better understanding on faults that have been found during development and how
this may affect the quality of the system. Our objective in this paper is to investigate the
fault profiles of four business-critical commercial projects to explore if there are differences
in the way faults appear in different systems. We have conducted an empirical study by
collecting fault reports from several industrial projects, comparing findings from projects
where components and reuse have been core strategies with more traditional development
projects. Findings show that some specific fault types are generally dominant across reports
from all projects, and that some fault types are rated as more severe than others.
1. Introduction
Producing high quality software is an important goal for most software developers. The
notion of software quality is not trivial; different stakeholders will have different views
on what software quality is. In the Business-Critical Software (BUCS) project [1] we
are seeking to develop a set of methods to improve support for analysis, development,
operation, and maintenance of business-critical systems. These are systems that we
expect and hope will run correctly because of the possibly severe effects of failure, even
if the consequences are mainly of an economic nature. In these systems, software
quality is important, and the main target for developers will be to make systems that
operate correctly all the time [1]. One important issue in developing these kinds of
systems is to remove any possible causes for failure, which may lead to wrong operation
of the system.
The study presented here investigated fault reports from two software projects using
components and reuse strategies, and two projects using a more traditional development
process. It compares the fault profiles of the reuse-intensive projects with the other two,
in several dimensions: fault type, fault severity and fault location.
2. Previous studies on software faults and fault implications
Software quality is a notion that encompasses a great number of attributes. When
speaking about business-critical systems, the critical quality attribute is often
experienced as the dependability of the system. According to Littlewood et al. [2],
dependability is a software quality attribute that encompasses several other attributes,
the most important being reliability, availability, safety and security.
Faults in the software lessen the software’s quality, and by reducing the number of
faults introduced during development you can improve the quality of software. Faults
are potential flaws in a software system, that later may be activated to produce an error.
An error is the execution of a fault, leading to a failure. A failure results in erroneous
external behaviour, system state or data state. Remedies known for errors and failures
are to limit the consequences of a failure, in order to resume service, but studies have
shown that this kind of late protection is more expensive than removing the faults
before they are introduced into the code [3]. Faults are also known as defects or bugs,
and a more extensive concept is anomalies, which is used in the IEEE 1044 standard
[4]. Orthogonal Defect Classification – ODC – is a way of studying defects in software
systems [5, 6, 7, 8]. ODC is a scheme to capture the semantics of each software fault
quickly.
It has been debated whether faults can be tied to reliability in a cause-effect relationship. Some
papers like [6, 8] indicate that this is valid, while others like [9] are more critical. Still,
reducing the number of faults will make the system less prone to failure, so by
removing faults without adding new ones, there is a good case for the system reliability
increasing. This is the premise of “reliability-growth models”, discussed by Hamlet in [9].
Avizienis et al. state [10] that fault prevention aims to provide the ability to deliver a
service that can be trusted. Hence, by preventing faults and reducing their number and
severity in a system, the quality of the system can be improved in the area of
dependability.
3. Research design
Research questions. Initially we want to find which types of faults are most
frequent, and whether some parts of the systems have more faults than others:
RQ1: Which types of faults are most typical for the different software parts?
When we know which types of faults dominate and where these faults appear in the
systems, we can choose to concentrate on the most serious ones in order to identify the
most important issues to target in improvement work:
RQ2: Are certain types of faults considered to be more severe than others by the
developers?
Research method. This study is based on data mining, where the data consists of fault
reports we have received from four commercial projects. The investigation has mostly
been a bottom-up process, because of the initial uncertainty about the available data
from potential participants. After establishing a dialogue with the projects, and
acquiring the fault reports, our initial research questions and goals were altered
accordingly.
The metrics used. The metrics have been chosen based on what we wanted to
investigate and on what data turned out to be available from the projects participating in
the study. The frequency of detected faults is an indirect metric, attained by
counting the number of faults of a given type or for a given system part. The metrics used
directly from the data in the reports are type, severity and location of the fault.
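The counting step behind the frequency metric can be sketched as follows. The report tuples here are made up for illustration; the real fault reports contain many more fields.

```python
from collections import Counter

# Each fault report is reduced to the three directly used metrics:
# (fault type, severity, location).
reports = [
    ("Functional fault - logic", "high", "module A"),
    ("Functional fault - logic", "medium", "module B"),
    ("GUI fault", "low", "module A"),
]

# Indirect metric: frequency of each fault type.
type_frequency = Counter(ftype for ftype, _, _ in reports)

# Cross-tabulation of type and severity, used for RQ2-style questions.
severity_by_type = Counter((ftype, sev) for ftype, sev, _ in reports)

print(type_frequency.most_common(1))
# -> [('Functional fault - logic', 2)]
```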
3.1 Fault categories
There are several taxonomies for fault types, two examples are the ones used in the
IEEE 1044 standard [4] and in a variant of the Orthogonal Defect Classification (ODC)
scheme [6]. The fault types used in this study are shown in Table 1. They have been
derived by using the existing data material in the reports, combined with two
taxonomies found in literature, IEEE 1044 and ODC.
Categorization of the faults in this investigation was performed partly by the projects
themselves and completed by us as part of this investigation, based on the fault
reports' textual descriptions and partial categorization. In addition, fault severities are
defined by grading each fault's consequences for the system and the system
environment. All severity grading was done by the fault reporters in the projects.
Table 1. Fault types used in this study

Assignment fault
Checking fault
Data fault
Documentation fault
Environment fault
Functional fault - computation
Functional fault - logic
Functional fault - state
GUI fault
I/O fault
Interface fault
Memory fault
Missing data
Missing functionality
Missing value
Performance fault
Wrong functionality called
Wrong value used
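As an illustration of how a partial categorization might be completed programmatically from a report's textual description, the sketch below applies simple keyword rules. The rules are invented for illustration only; in the study the categorization was done manually by the projects and by us.

```python
# Illustrative keyword rules (hypothetical); the study's categorization
# was done manually from each report's textual description.
KEYWORD_RULES = [
    ("gui", "GUI fault"),
    ("memory", "Memory fault"),
    ("interface", "Interface fault"),
]

def suggest_category(description: str) -> str:
    """Suggest a fault category from a report's free-text description."""
    text = description.lower()
    for keyword, category in KEYWORD_RULES:
        if keyword in text:
            return category
    return "UNKNOWN"
```

Reports that match no rule would be left for manual classification, which is why an UNKNOWN category appears in the distribution tables below.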
3.2 Data collection
The data sample. We contacted over 40 different companies that we believed had
relevant projects we could study. In the end, four projects fit our criteria and were
willing to proceed with the study. The most likely reasons for the low participation rate
among the contacted companies were skepticism about releasing sensitive
information, lack of organized fault handling, and lack of resources. Table 2
contains information about the participating projects.
Table 2. Information about the participating projects

Project | Project description               | Domain               | Platform  | # reports | Dev. effort
A       | Financial system                  | Finance              | MVS, OS/2 | 52        | ~27400 hours
B       | Real-time embedded system         | Security             | VxWorks   | 360       | ~32000 hours
C       | Public administration application | Publ. administration | J2EE, EJB | 1684      | ~17600 hours
D       | Task management system            | Publ. administration | J2EE      | 379       | 2165 hours
Note that projects C and D have been developed using modern practices, including
component-based development, while projects A and B have been developed using
more traditional development practices.
4. Research results
RQ1 – Which types of faults are most typical?
To answer RQ1, we look at the distribution of the fault type categories for the projects,
shown in Table 3. For projects C and D, we see that functional logic faults are
dominant, with 49% and 58% of the faults for those projects. Functional logic faults are
also a large part of the faults in projects A and B.
In the same manner, the distribution of faults with a severity rating of "high" is shown
in Table 4. Functional logic faults are still dominant in projects C and D, with 45% and
69% of the faults, respectively. Project A is a special case here, as only a single fault
was reported to be of high severity.
Table 3. Distribution of all faults in fault type categories

Fault type     |   A  |   B  |   C  |   D
Assignment     |  7 % |  4 % |  1 % |  1 %
Checking       |  4 % |  1 % |  3 % |  1 %
Data           |  4 % |  3 % |  2 % |  4 %
Documentation  |  0 % |  2 % |  6 % |  3 %
Environment    |  0 % |  6 % |  1 % |  0 %
Funct. comp.   | 13 % |  5 % |  1 % |  0 %
Funct. logic   | 20 % |  1 % | 49 % | 58 %
Funct. state   |  0 % |  2 % |  3 % |  5 %
GUI            |  2 % |  1 % |  8 % |  7 %
I/O            |  0 % | 29 % |  1 % |  0 %
Interface      |  0 % | 25 % |  0 % |  0 %
Memory         |  0 % |  8 % |  0 % |  0 %
Missing data   |  2 % |  2 % |  1 % |  2 %
Missing funct. | 13 % |  4 % |  8 % |  3 %
Missing value  |  4 % |  1 % |  1 % |  1 %
Performance    |  0 % |  0 % |  3 % |  1 %
Wrong funct.   |  0 % |  8 % |  2 % |  1 %
Wrong value    | 27 % |  1 % |  3 % |  4 %
UNKNOWN        |  2 % |  1 % |  5 % |  8 %
Table 4. Distribution of high severity faults in fault type categories

Fault type     |   A   |   B  |   C  |   D
Assignment     | 100 % |  1 % |  0 % |  0 %
Data           |   0 % |  6 % | 15 % |  4 %
Documentation  |   0 % |  0 % |  2 % |  0 %
Environment    |   0 % |  4 % |  5 % |  0 %
Funct. logic   |   0 % | 19 % | 45 % | 69 %
Funct. state   |   0 % | 36 % |  8 % |  9 %
GUI            |   0 % | 10 % |  2 % |  0 %
I/O            |   0 % |  1 % |  5 % |  0 %
Interface      |   0 % |  3 % |  0 % |  0 %
Memory         |   0 % |  3 % |  0 % |  2 %
Missing data   |   0 % |  0 % |  2 % |  4 %
Missing funct. |   0 % |  7 % |  2 % |  4 %
Missing value  |   0 % |  1 % |  2 % |  0 %
Performance    |   0 % |  3 % |  9 % |  0 %
Wrong funct.   |   0 % |  0 % |  0 % |  2 %
Wrong value    |   0 % |  6 % |  6 % |  4 %
When looking at the distribution of faults, especially the high severity faults, we see
that two categories dominate the picture: "Functional logic" and "Functional state".
We also see that for all faults, "GUI" faults have a large share of the reports (around
8% for projects B, C and D), while for the high severity faults the share of GUI faults is
strongly reduced in projects C and D, to 2% and 0% respectively.
RQ2 – Are certain types of faults considered to be more severe?
To answer RQ2, we look at the number of faults rated "high" severity in the different
fault categories. Figure 1 shows the percentage of high severity faults in some fault
categories for three of the projects. Project A is left out because only one high severity
fault was reported.
From Figure 1, we see that some fault types seem to be judged as more severe than
others. In the projects that report them, "Memory fault" stands out as a high severity
type of fault. For projects C and D, "GUI faults" are not judged to be very severe, while
project B rates them in line with other fault types. We also see that project B has
generally rated more of its faults as highly severe than projects C and D have.
Comparing projects C and D, which employed reuse strategies in development, with
the other two projects, we find no evidence that development with reuse has had any
significant effect on fault distribution or severity.
[Figure not reproduced: a bar chart with one bar per fault category (Assignment fault,
Data fault, Environment fault, Functional fault - logic, Functional fault - state, GUI
fault, I/O fault, Interface fault, Memory fault, Missing data, Missing functionality,
Missing value, Performance fault, Wrong function called, Wrong value used) for each
of projects B, C and D, with the x-axis running from 0% to 100%.]
Figure 1. Percentage of high severity faults in some fault categories
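The quantity plotted in Figure 1 — for each category, the share of its faults rated high severity — can be derived from the raw reports as follows. A minimal Python sketch with hypothetical report records:

```python
from collections import Counter

def high_severity_share(reports):
    """Per fault category: percentage of its reports rated high severity."""
    total = Counter(r["type"] for r in reports)
    high = Counter(r["type"] for r in reports if r["severity"] == "high")
    return {t: 100.0 * high[t] / total[t] for t in total}

# Hypothetical records, for illustration only.
reports = [
    {"type": "Memory fault", "severity": "high"},
    {"type": "Memory fault", "severity": "high"},
    {"type": "GUI fault", "severity": "high"},
    {"type": "GUI fault", "severity": "medium"},
    {"type": "GUI fault", "severity": "low"},
]
shares = high_severity_share(reports)
```

Note that this normalizes by each category's own fault count, so a rarely reported category such as "Memory fault" can still stand out.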
5. Discussion
A major issue in the analysis of the collected data was its heterogeneity. The data come
from four different companies where data collection had not been coordinated
beforehand, and as each company used its own proprietary fault reporting system, no
reporting standard was followed. Another issue was missing data in reports, e.g.,
missing information about fault location. Because the reports had been used for
development rather than for research purposes, the developers had not always
entered all data into the reports. A final issue was incompatibility between the fault
reports for one of the projects and other information concerning the project. No
satisfactory link between the functional and structural modules was available in project
D. This prevented us from separating the reused parts from the rest of the system, and
hindered a valid comparison of reused and non-reused system parts at this time.
Concerning validity, the most serious threats to external validity are the small number
of projects under investigation and the fact that the chosen projects are not necessarily
typical. As for conclusion validity, one possible threat is low reliability of measures,
because of missing data or missing parts of the data.
6. Conclusion and future work
This paper has presented some preliminary results of an investigation of fault reports in
industrial projects. The results answer our two questions:
RQ1: Which types of faults are most typical for the different software parts? Looking
at all faults in all projects, "functional logic" faults were the dominant fault type. For
high severity faults, "functional logic" and "functional state" faults were dominant.
RQ2: Are certain types of faults considered to be more severe than others? We have
seen that some fault types are rated as more severe than others, for instance "Memory
fault", while "GUI fault" was rated as less severe in the two projects employing reuse
in development.
Results from this study are preliminary, and the next step is to focus on the differences
between reuse-based development projects and non-reuse projects. We will also try to
incorporate fault report data from 2-3 other projects into the investigation in order to
increase the validity of the study.
Later, the BUCS project wants to focus on the most typical and serious faults, and
describe how we can identify and prevent these at an earlier development stage. This
may be in the form of a checklist for some hazard analysis scheme.
References
1. J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early
software project phases”. Proceedings, Norwegian Informatics Conference, 2004
2. B. Littlewood; L. Strigini, “Software reliability and dependability: a roadmap”, Proceedings
of the Conference on The Future of Software Engineering, Limerick, Ireland, 2000, Pages:
175 - 188
3. N. Leveson, Safeware: System safety and computers, Addison-Wesley, Boston, 1995
4. IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2,
1993
5. K. Bassin; P. Santhanam, “Managing the maintenance of ported, outsourced, and legacy
software via orthogonal defect classification”, Proceedings. IEEE International Conference
on Software Maintenance, 2001, 7-9 Nov. 2001
6. K. El Emam; I. Wieczorek, “The repeatability of code defect classifications”, Proceedings.
The Ninth International Symposium on Software Reliability Engineering, 1998, 4-7 Nov.
1998 Page(s):322 – 333
7. R. Chillarege; I.S. Bhandari; J.K. Chaar; M.J. Halliday; D.S. Moebus; B.K. Ray; M.-Y.
Wong, “Orthogonal defect classification-a concept for in-process measurements”, IEEE
Transactions on Software Engineering, Volume 18, Issue 11, Nov. 1992 Page(s):943 - 956
8. R.R. Lutz; I.C. Mikulski, “Empirical analysis of safety-critical anomalies during operations”,
IEEE Transactions on Software Engineering, 30(3):172-180, March 2004
9. D. Hamlet, “What is software reliability?”, Proceedings of the Ninth Annual Conference on
Computer Assurance, 1994. COMPASS '94 'Safety, Reliability, Fault Tolerance,
Concurrency and Real Time, Security', 27 June-1 July 1994 Page(s):169 - 170
10. A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr; Basic Concepts and Taxonomy of
Dependable and Secure Computing, IEEE Transactions on Dependable and Secure
Computing, vol. 1, no. 1, January-March 2004
P3. Revisiting the Problem of Using Problem Reports for
Quality Assessment
Parastoo Mohagheghi, Reidar Conradi, Jon Arvid Børretzen
Department of Computer and Information Science
Norwegian University of Science and Technology
No-7491, Trondheim- Norway
{parastoo, conradi, borretze}@idi.ntnu.no
Abstract
In this paper, we describe our experience with using problem reports from industry for quality
assessment. The non-uniform terminology used in problem reports and related validity concerns have
been the subject of earlier research but are far from settled. To distinguish between terms such as
defect or error, we propose answering three questions on the scope of a study, related to what (a
problem's appearance or its cause), where (problems related to software, executable or not, or to the
system), and when (problems recorded in all development life cycle phases or only some of them).
Challenges in defining research questions and metrics, collecting and analyzing data, generalizing the
results and reporting them are discussed. Ambiguity in defining problem report fields, together with
missing, inconsistent or wrong data, threatens the value of the collected evidence. Some of these
concerns could be settled by answering some basic questions about the problem reporting fields and
improving data collection routines and tools.
Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics- product metrics, process metrics; D.2.4
[Software Engineering]: Software/Program Verification- reliability, validation.
General Terms
Measurement, Reliability.
Keywords
Quality, defect density, validity.
1. INTRODUCTION
Data collected on defect or faults (or in general problems) are used in evaluating
software quality in several empirical studies. For example, our review of extant
literature on industrial software reuse experiments and case studies verified that
problem-related measures were used in 70% of the reviewed papers to compare quality
of reused software components versus the non-reused ones, or development with
systematic reuse to development without it. However, the studies report several
concerns using data from problem reports and we identified some common concerns as
well. The purpose of this paper is to reflect over these concerns and generalize the
experience, get feedback from other researchers on the problems in using problem
reports, and how they are handled or should be handled.
In this paper, we use data from six large commercial systems, all developed by
Norwegian industry. Although most quantitative results of the studies are already
published [4, 12, 18], we felt there was a need to summarize the experience of
using problem reports, identify common questions and concerns, and raise the
level of discussion by answering them. Examples from similar research are provided to
further illustrate the points. The main goal is to improve the quality of future research
on product or process quality using problem reports.
The remainder of this paper is organized as follows. Section 2 partly builds on the work
of others; e.g., [14] has integrated IEEE standards with the Software Engineering
Institute (SEI)'s framework and knowledge from four industrial companies to build an
entity-relationship model of problem report concepts, and [9] has compared attributes of
a number of problem classification schemes (the Orthogonal Defect Classification
(ODC) [5], the IEEE Standard Classification for Software Anomalies (IEEE Std.
1044-1993), and a classification used by Hewlett-Packard). We have identified three
dimensions that may be used to clarify the vagueness in defining and applying terms
such as problem, anomaly, failure, fault or defect. In Section 3 we discuss why
analyzing data from problem reports is interesting for quality assessment and who the
users of such data are. Section 4 discusses practical problems in defining goals and
metrics, collecting and analyzing data, and reporting the results, through examples.
Finally, Section 5 contains the discussion and conclusion.
2. TERMINOLOGY
There is great diversity in the literature on the terminology used to report software or
system related problems. The possible differences between problems, troubles, bugs,
anomalies, defects, errors, faults or failures are discussed in books (e.g., [7]), standards
and classification schemes such as IEEE Std. 1044-1993, IEEE Std. 982.1-1988 and
982.2-1988 [13], the United Kingdom Software Metrics Association (UKSMA)’s
scheme [24] and the SEI's scheme [8], and papers; e.g., [2, 9, 14]. The intention of this
section is not to provide a comparison and draw conclusions, but to classify the
differences and discuss their practical impact on research. We have identified the
following three questions, which should be answered to distinguish the above terms
from one another; we call these the problem dimensions:
What- appearance or cause: The terms may be used for manifestation of a problem
(e.g., to users or testers), its actual cause or the human encounter with software. While
there is consensus on “failure” as the manifestation of a problem and “fault” as its
cause, other terms are used interchangeably. For example, “error” is sometimes used for
the execution of a passive fault, and sometimes for the human encounter with software
[2]. Fenton uses “defect” collectively for faults and failures [7], while Kajko-Mattson
defines “defect” as a particular class of cause that is related to software [14].
Where- software (executable or not) or system: The reported problem may be related to
the software or to the whole system, including system configuration, hardware or
network problems, tools, misuse of the system, etc. Some definitions exclude
non-software related problems while others include them. For example, the UKSMA's
defect classification scheme is designed for software-related problems, while the SEI
uses two terms: "defects" are related to the software under execution or examination,
while "problems" may be caused by misunderstanding, misuse, hardware problems or a
number of other factors that are not related to software. Software-related problems may
also be recorded for executable software only or for all types of artefacts: "fault" is
often used for an incorrect step, logic or data definition in a computer program (IEEE
Std. 982.1-1988), while a "defect" or "anomaly" [13] may also be related to
documentation, requirement specifications, test cases, etc. In [14], problems are
divided into static and dynamic ones (failures), where the dynamic ones are related to
executable software.
When- detection phase: Sometimes problems are recorded in all life cycle phases, while
in other cases they are recorded only in later phases, such as system testing or field
use. Fenton gives examples where "defect" is used to refer to faults prior to coding
[7], while according to IEEE Std. 982.1-1988, a "defect" may be found during early
life cycle phases or in software mature enough for testing and operation [from 14]. The
SEI distinguishes the static finding mode, which does not involve executing the
software (e.g., reviews and inspections), from the dynamic one.
Until there is agreement on the terminology used in reporting problems, we must be
aware of these differences and answer the above questions when using a term.
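The three dimensions can be made explicit by recording a study's scope as a small record type. A sketch, where the field names and value vocabulary are our own, not taken from any standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProblemScope:
    """Answers to the what/where/when questions that fix a study's scope."""
    what: str   # "appearance" (failure) or "cause" (fault/defect)
    where: str  # "software-executable", "software-any-artefact", or "system"
    when: str   # "all-phases" or e.g. "system-test-and-later"

# Example: "fault" as commonly used -- the cause of a problem, in
# executable software, recorded in any life cycle phase.
fault_scope = ProblemScope(what="cause",
                           where="software-executable",
                           when="all-phases")
```

Stating the scope this explicitly, whether in code or prose, is what allows two studies using the same term to be compared.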
Some problem reporting systems cover enhancements in addition to corrective changes.
For example, an “anomaly” in IEEE Std. 1044-1993 may be a problem or an
enhancement request, and the same is true for a “bug” as defined by OSS (Open Source
Software) bug reporting tools such as Bugzilla [3] or Trac [23]. An example of
ambiguity in separating change categories is given by Ostrand et al. in their study of 17
releases of an AT&T system [20]. In this case, there was generally no identification in
the database of whether a change was initiated because of a fault, an enhancement, or
some other reason such as a change in the specifications. The researchers defined a rule
of thumb that if only one or two files were changed by a modification request, it
was likely a fault, while if more than two files were affected, it was likely not a fault.
We have seen examples where minor enhancements were registered as problems to
accelerate their implementation and major problems were classified as enhancement
requests (S5 and S6 in Section 4).
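Ostrand et al.'s rule of thumb is simple enough to state directly as code; a transcription of the rule as described above:

```python
def classify_change(files_changed: int) -> str:
    """Rule of thumb from Ostrand et al. [20]: a modification request
    touching one or two files is likely a fault fix; more than two files
    is likely an enhancement or other non-fault change."""
    return "likely fault" if files_changed <= 2 else "likely not a fault"
```

Such a heuristic only patches over the missing change-category field; it cannot recover the cases described above where changes were deliberately misclassified.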
In addition to the diversity in definitions of a problem, problem report fields such as
Severity or Priority are also defined in multiple ways as discussed in Section 4.
3. QUALITY VIEWS AND DEFECT DATA
In this section, we use the term "problem report" to cover all recorded problems related
to software or to other parts of a system offering a service, in executable or
non-executable artefacts, detected in the phases specified by an organization; we use
"defect" for the cause of a problem.
Kitchenham and Pfleeger refer to David Garvin's study on quality in different
application domains [15]. It shows that quality is a complex and multifaceted concept
that can be described from five perspectives: the user view (quality as fitness for
purpose, or validation), the product view (tied to characteristics of the product), the
manufacturing view (called the software process view here, or verification as
conformance to specification), the value-based view (quality depends on the amount a
customer is willing to pay for it), and the transcendental view (quality can be
recognized but not defined). We have dropped the transcendental view, since it is
difficult to measure, and added the planning view (quality as conformance to plans), as
shown in Figure 1 and described below ("Q" stands for a quality view). While there are
several metrics to evaluate quality in each of the above views, data from problem
reports are among the few measures of quality applicable to most views.
[Figure not fully reproducible: it shows defect data at the centre, linked to the five
quality views and their users: Q1 - quality-in-use and external metrics (user); Q2 -
internal product quality metrics (developers); Q3 - process quality metrics (quality
manager); Q4 - resource and project planning (project leader); Q5 - cost of rework vs.
value of corrections.]
Figure 1. Quality views associated to defect data, and relations between them
Q1.Evaluating product quality from a user's view. What truly represents software
quality in the user's view can be elusive. Nevertheless, the number and frequency of
defects associated with a product (especially those reported during use) are inversely
proportional to the quality of the product [8], or, more specifically, to its reliability.
Some problems are also more severe than others from the user's point of view.
Q2.Evaluating product quality from the organization’s (developers’) view. Product
quality can be studied from the organization’s view by assuming that improved
internal quality indicators such as defect density will result in improved external
behavior or quality in use [15]. One example is the ISO 9126 definition of internal,
external and quality-in-use metrics. Problem reports may be used to identify defect-prone parts and take actions to correct them and prevent similar defects.
Q3.Evaluating software process quality. Problem reports may be used to identify when
most defects are injected, e.g., in requirement analysis or coding. Efficiency of
Verification and Validation (V&V) activities in identifying defects and the
organization’s efficiency in removing such defects are also measurable by defining
proper metrics of defect data [5].
Q4.Planning resources. Unsolved problems represent work to be done. Cost of rework
is related to the efficiency of the organization to detect and solve defects and to the
maintainability of software. A problem database may be used to evaluate whether
the product is ready for roll-out, to follow project progress and to assign resources
for maintenance and evolution.
Q5.Value-based decision support. There should be a trade-off between the cost of
repairing a defect and its presumed customer value. The number of problems and
their criticality for users may also be used as a quality indicator for purchased or
reused software.
Table 1. Relation between quality views and problem dimensions

Quality view: Q1-user, Q4-planning and Q5-value-based
Problem dimensions: what - external appearance; where - system, executable software
or not (user manuals); when - field use
Examples of problem report fields to evaluate the quality view: IEEE Std. 1044-1993
sets Customer value in the recognition phase of a defect. It also asks about impacts on
project cost, schedule or risk, and about correction effort, which may be used to assign
resources. The count or density of defects may be used to compare software developed
in-house with reused software.

Quality view: Q2-developer and Q3-process
Problem dimensions: what - cause; where - software, executable or not; when - all
phases
Examples of problem report fields to evaluate the quality view: ODC is designed for
in-process feedback to developers before operation. IEEE Std. 1044-1993 and the
SEI's scheme cover defects detected in all phases and may be used to compare the
efficiency of V&V activities. Example metrics are the types of defects and the
efficiency of V&V activities in detecting them.
Table 1 relates the dimensions defined in Section 2 to the quality views. E.g., in the first
row, "what - external appearance" means that the external appearance of a problem is
important for users, while the actual cause is important for developers (Q2-developer).
Examples of problem report fields or metrics that may be used to assess a particular
quality view are given. Mendonça and Basili [17] refer to identifying quality views
as identifying data user groups.
We conclude that the contents of problem reports should be adjusted to quality views.
We discuss the problems we faced in our use of problem reports in the next section.
4. INDUSTRIAL CASES
Our own and others' experience of using problem reports for assessment, control or
prediction of software quality (the three quality functions defined in [21]) shows
problems in defining measurement goals and metrics, collecting data from problem
reporting systems, analyzing the data and, finally, reporting the results. An overview of
our case studies is shown in Table 2.
4.1 Research Questions and Metrics
The most common purpose of a problem reporting system is to record problems and
follow their status (maps to Q1, Q4 and Q5). However, as discussed in Section 3, they
may be used for other views as well if proper data is collected. Sometimes quality views
and measurement goals are defined top-down when initiating a measurement program
(e.g., by using the Goal-Question-Metric paradigm [1]), while in most cases the
top-down approach is followed by a bottom-up approach such as data mining or
Attribute Focusing (AF) to identify useful metrics when some data is available; e.g.,
[17, 19, 22]. We do not intend to focus on the goals more than what is already discussed
in Section 3 and refer to the literature on that. But we have encountered the same
problem in several industrial cases: the difficulty of collecting data across several tools
to answer a single question. Our experience suggests that questions that need measures
from different tools are difficult to answer unless effort is spent to integrate the tools or
the data. Examples are:
− In S6, problems for systems not based on the reusable framework were not recorded
in the same way as for those based on it. It was therefore not possible to evaluate
whether defect density improved with the introduction of the reusable framework [12].
− In S5, correction effort was recorded in an effort reporting tool, and modified
modules could be identified by analyzing change logs in the configuration
management tool, but there was little interoperability between these tools and the
problem reporting tool. This has been observed in several studies: although problem
reporting systems often included fields for reporting correction effort and
modifications, these data were not reliable or consistent with other data. Thus,
evaluating the correction effort or the number of modified modules per defect or per
type of defect was not possible.
Graves gives another example of the difficulty of integrating data [11]. The difference
between two organizations' problem reporting systems within the same company led to
a large discrepancy in the fault rates of modules developed by the two organizations,
because the international organization would report an average of four faults for a
problem that would prompt one fault for the domestic organization.
To solve the problem, researchers often collect or mine industrial data, transform it and
save it in a common database for further analysis. Examples are given in the next
section.
Table 2. Case studies using data from problem reports

System Id. and description | Approx. size (KLOC) and language | # problem reports | # releases reported on
S1 - Financial system | Not available (but large), in C, COBOL and COBOL II | 52 | 3
S2 - Controller software for a real-time embedded system | 271, in C and C++ | 360 | 4
S3 - Public administration application | 952, in Java and XML | 1684 | 10
S4 - Combined web system and task management system | Not available (but large), in Java | 379 | 3
S5 - Large telecom system | 480 in the latest studied release, in Erlang, C and Java | 2555 | 2
S6 - A reusable framework for developing software systems for the oil and gas sector | 16, in Java | 223 | 3
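The transform-and-load step mentioned above (mining data from several tools into a common database for analysis) can be sketched as follows. The record shapes and field names for the two hypothetical source tools are invented for illustration; SQLite stands in for whatever database is used:

```python
import sqlite3

# Hypothetical rows mined from two tools with different field names,
# normalized into one schema before loading.
tool_a = [("S5", "PR-1", "high", 4.0)]   # (system, id, severity, effort hours)
tool_b = [{"sys": "S6", "report": "42", "sev": "medium"}]  # no effort field

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE problem_reports
                (system TEXT, report_id TEXT, severity TEXT, effort_hours REAL)""")
conn.executemany("INSERT INTO problem_reports VALUES (?, ?, ?, ?)", tool_a)
# Fields absent in a source tool are loaded as NULL rather than guessed.
conn.executemany("INSERT INTO problem_reports VALUES (?, ?, ?, ?)",
                 [(r["sys"], r["report"], r["sev"], None) for r in tool_b])

# A single question can now be answered across tools, e.g. reports per system.
rows = conn.execute(
    "SELECT system, COUNT(*) FROM problem_reports GROUP BY system").fetchall()
```

The common schema makes cross-tool questions answerable, at the cost of the normalization effort discussed above.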
4.2 Collecting and Analyzing Data
Four problems are discussed in this section:
1. Ambiguity in defining problem report fields, even when the discussion on
terminology is settled. A good example is the impact of a problem:
− The impact of a problem on the reporter (user, customer, tester etc.) is called
Severity in [24], Criticality in [8] or even Product status in IEEE Std.
1044-1993. This field should be set when reporting a problem.
− The urgency of correction from the maintenance engineer's view is called
Priority in [24], Urgency in [8] or Severity in IEEE Std. 1044-1993. It should be
set during resolution.
Some problem reporting systems include one or the other, or do not
distinguish between them. Thus, the severity field may be set by the reporter and
later changed by the maintenance engineer. Here are some examples of how these
fields are used:
− For reports in S1 and S4 there was only one field (S1 used “Consequence”, while
S4 used “Priority”), and we do not know if the value has been changed from the
first report until the fault has been fixed.
− S2 used the terms “Severity” and “Priority” in the reports.
− S3 used two terms: “Importance” and “Importance Customer”, but these were
mostly judged to be the same.
In [14], it is recommended to use four fields for reporter and maintenance criticality,
and reporter and maintenance priority. We have not seen examples of such detailed
classification. In addition to the problem of ambiguity in definitions of severity or
priority, there are other concerns:
− Ostrand et al. reported that severity ratings were highly subjective and
sometimes inaccurate because of political considerations unrelated to the
importance of the change to be made. A rating might be downplayed so that
friends or colleagues in the development organization "looked better", provided
they agreed to fix the fault with the speed and effort normally reserved for the
highest severity faults [20].
− Severity of defects may be downplayed to allow launching a release.
− Probably, most defects are set to medium severity, which reduces the value of
such a classification. E.g., 90% of problem reports in S1, 57% in S2, 72% in S3,
57% in S4, and 57% in release 2 of S5 (containing 1953 problem reports) were
set to medium severity.
2. A second problem is related to release-based development. While most systems are
developed incrementally or release-based, problem reporting systems and
procedures may not be adapted to distinguish between releases of a product. As an
example, problem reports in S6 did not include a release number, only the date of
reporting; the study assumed that problems were related to the latest release. In S5,
we experienced that the size of software components (used to measure defect
density) was not collected systematically on the date of a release. Problem report
fields had also changed between releases, making the data inconsistent.
3. The third problem is related to the granularity of data. The location of a problem,
used to measure defect density or to count defects, may be given for coarse units
(components or subsystems, as in S6), fine ones (software modules or functional
modules, as in S4), or both (as in S5). Too coarse-grained data gives little
information, while collecting fine-grained data requires more effort.
4. Finally, data is recorded in different formats and problem reporting tools. The
commercial problem reporting tools used in our case studies often did not support
data collection and analysis. In S1, data were given to the researchers as
hardcopies of the problem reports, which were scanned and converted to digital
form. In S2, the output of the problem reporting system was an HTML document. In
S3 and S4, data were given to the researchers as Microsoft Excel spreadsheets,
which provide some facilities for analysis, but not for advanced analysis. In S5,
problem reports were stored in text files and were transferred to an SQL database by
the researchers. In S6, data were transferred to Microsoft Excel spreadsheets for
further analysis. Thus, the researchers had to transform the data in most cases. In a
large-scale empirical study to identify reuse success factors, data from 25 NASA
software projects were inserted by researchers into a relational database for analysis
[22]. One plausible conclusion is that the collected data were rarely analyzed by the
organizations themselves, beyond collecting simple statistics.
The main purpose for industry should always be to collect business-specific data and
avoid "information graveyards". Unused data are costly and lead to poor data quality
(low internal validity), and even to animosity among developers. Improving tools and
routines makes it possible to make sense of the collected data and give feedback.
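One recurring transformation in the cases above is reconciling each system's own impact field before analysis. The sketch below uses the field names reported for S1-S4 in this section ("Consequence", "Severity", "Importance", "Priority"); the value scales and the common target scale are hypothetical:

```python
# Field names per system come from the case descriptions above;
# the raw value vocabulary and target scale are invented for illustration.
SEVERITY_FIELD = {"S1": "Consequence", "S2": "Severity",
                  "S3": "Importance", "S4": "Priority"}
COMMON_SCALE = {"critical": "high", "major": "high",
                "normal": "medium", "minor": "low"}

def normalized_severity(system: str, report: dict) -> str:
    """Map a report's system-specific impact field onto one common scale."""
    raw = report.get(SEVERITY_FIELD[system], "")
    return COMMON_SCALE.get(raw.lower(), "unknown")

# e.g. a report from S1, which records impact in its "Consequence" field
sev = normalized_severity("S1", {"Consequence": "Major"})
```

Reports with a missing or unrecognized value fall through to "unknown" rather than being silently miscounted, which keeps the missing-data problem visible in later analysis.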
4.3 Validity Threats
We have identified the following main validity threats in the studies:
1. Construct validity is related to using counts of problems (or defects) or their density
as quality indicators. For example, a high defect density before operation may be an
indicator of thorough testing or of poor quality. Since this is discussed in the papers
in the reference list, we refer to them and to [21] on validating metrics.
2. Internal validity: Missing, inconsistent or wrong data is a threat to internal validity.
Table 3 shows the percentages of missing data in some studies. In Table 3,
“Location” gives the defect-prone module or component, while “Type” has different
classifications in different studies.
Table 3. Percentages of missing data

System Id | Severity | Location                                                        | Type
S1        | 0        | 0                                                               | 0
S2        | 4.4      | 25.1                                                            | 2.5
S3        | 20.0     | 20.0                                                            | 8.6* (4.3)
S4        | 0        | 0                                                               | 9.0* (8.4)
S5        | 0**      | 22 for large subsystems, 46 for smaller blocks inside subsystems | 44 for 12 releases in the dataset

Notes:
* These are the sum of uncategorized data points (unknown, duplicate, not fault). In parentheses are "unknown" only.
** For release 2
The data in Table 3 show large variation between studies, but the problem is
significant in some cases. Missing data is often related to a problem reporting
procedure that allows reporting a problem, or closing it, without filling in all the
fields. We wonder whether problem reporting tools may be improved to force
developers to enter sufficient information. In the meantime, researchers have to
discuss the introduced bias and how missing data is handled, for example by mean
substitution or by verifying that data are missing at random. One observation is that
most cases discussed in this paper collected data at least on product, location of a
fault or defect, severity (assigned by reporter, developer or both) and type of
problem. These data may therefore form a minimal basis for comparing systems and
releases, but with sufficient care.
3. Conclusion validity: Most studies referred to in this paper have applied statistical
tests such as the t-test, Mann-Whitney test or ANOVA. In most cases, there is no
experimental design and no random allocation of subjects to treatments. Often all
available data is analyzed, rather than samples of it. Preconditions of the tests, such
as the assumption of normality or equal variances, should be discussed as well.
Studies often chose a fixed significance level and did not discuss the effect size or
power of the tests (see [6]). The conclusions should therefore be evaluated with
care.
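The Mann-Whitney U statistic, which makes no normality assumption, and its rank-biserial effect size can both be computed directly from the data. A sketch with invented defect-density samples (not data from the cited studies):

```python
# Mann-Whitney U statistic and rank-biserial effect size, computed directly.
# The two defect-density samples below are invented for illustration.
def mann_whitney_u(x, y):
    """U for sample x versus y: number of pairs where x wins, ties counting half."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

def rank_biserial(x, y):
    """Effect size in [-1, 1]: share of pairs favoring x minus share favoring y."""
    return 2.0 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1.0

group_a = [1.2, 0.8, 1.5, 0.9]   # e.g., defect densities in one group
group_b = [2.1, 1.9, 2.5, 1.7]   # ... and in another

print(mann_whitney_u(group_a, group_b))  # 0.0: no pair favors the first group
print(rank_biserial(group_a, group_b))   # -1.0: maximal effect size
```

Reporting such an effect size alongside the p-value addresses part of the concern about fixed significance levels.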
4. External validity or generalization: There are arguments for generalization on the
background of cases, e.g., to products in the same company if the case is a probable
one. But "formal" generalization, even to future releases of the same system, needs
careful discussion [10]. Another type of generalization is to theories or models [16],
which is seldom done. Results of a study may be considered relevant, which is
different from being generalizable.
4.4 Publishing the Results
If a study manages to overcome the above barriers in metrics definition, data collection
and analysis, there is still the barrier of publishing the results in major conferences or
journals. We have faced the following:
1. The referees will justifiably ask for a discussion of the terminology and the relation
between terms used in the study and standards or other studies. We believe that this
is not an easy task, and hope that this paper can help clarify the issue.
2. Collecting evidence in the field requires comparing results across studies, domains
and development technologies. We tried to collect such evidence for studies on
software reuse and immediately faced the challenge of inconsistent terminology and
ambiguous definitions. More effort should be put in meta-analysis or review type
studies to collect evidence and integrate the results of different studies.
3. Companies may resist publishing results or making data available to other
researchers.
5. DISCUSSION AND CONCLUSION
We have described our experience with using problem reports for quality assessment in
various industrial studies. While industrial case studies assure a higher degree of
relevance, there is little control of collected data. In most cases, researchers have to
mine industrial data, transform or recode it, and cope with missing or inconsistent data.
Relevant experiments can give more rigor (such as in [2]), but the scale is small. We
summarize the contributions of this paper in answering the following questions:
1. What is the meaning of a defect versus other terms such as error, fault or failure?
We identified three questions to answer in Section 2: what (whether the term applies
to the manifestation of a problem or to its cause), where (whether problems relate
only to software or also to the environment supporting it, and whether they relate to
executable software or to all types of artifacts), and when (whether the problem
reporting system records problems detected in all or only some life cycle phases).
We gave examples of how standards and schemes use different terms and are
intended for different quality views (Q1 to Q5).
2. How may data from problem reports be used to evaluate quality from different
views? We used the model described in [15] and extended in Section 3. Measures
from problem or defect data are among the few measures used in all quality views.
3. How should data from problem reports be collected and analyzed? What are the
validity concerns in using such reports for evaluating quality? We discussed these
questions with examples in Section 4. The examples show challenges that
researchers face in different phases of research.
One possible remedy to ensure consistent and uniform problem reporting is to use a
common tool for this - cf. the OSS tools Bugzilla or Trac (which store data in SQL
databases with search facilities). However, companies will need local relevance
(tailoring) of the collected data and will require that such a tool can interplay with
existing processes and tools, either for development or project management - i.e.,
interoperability. Another problem is related to stability and logistics. Products,
processes and companies are volatile entities, so longitudinal studies may be very
difficult to perform. And given the popularity of sub-contracting/outsourcing, it is
difficult to impose a standard measurement regime (or in general to reuse common
artifacts) across subcontractors, possibly in different countries. Nevertheless, we are
evaluating adapting an OSS tool, defining a common defect classification scheme for
our research purposes, and collecting the results of several studies.
6. REFERENCES
[1] Basili, V.R., Caldiera, G. and Rombach, H.D. Goal Question Metrics Paradigm. In
Encyclopedia of Software Engineering, Wiley, I (1994), 469-476.
[2] Basili, V.R., Briand, L.C. and Melo, W.L. How software reuse influences productivity in
object-oriented systems. Communications of the ACM, 39, 10 (Oct. 1996), 104-116.
[3] The Bugzilla project: http://www.bugzilla.org/
[4] Børretzen, J.A. and Conradi, R. Results and experiences from an empirical study of fault
reports in industrial projects. Accepted for publication in Proceedings of the 7th International
Conference on Product Focused Software Process Improvement (PROFES'2006), 12-14 June,
2006, Amsterdam, Netherlands, 6 p.
[5] Chillarege, R. and Prasad, K.R. Test and development process retrospective - a case study
using ODC triggers. In Proceedings of the International Conference on Dependable Systems
and Networks (DSN'02), 2002, 669-678.
[6] Dybå, T., Kampenes, V. and Sjøberg, D.I.K. A systematic review of statistical power in
software engineering experiments. Accepted for publication in Journal of Information and
Software Technology.
[7] Fenton, N.E. and Pfleeger, S.L. Software Metrics: A Rigorous & Practical Approach.
International Thomson Computer Press, 1996.
[8] Florac, W. Software quality measurement: a framework for counting problems and defects.
Software Engineering Institute, Technical Report CMU/SEI-92-TR-22, 1992.
[9] Freimut, B. Developing and using defect classification schemes. IESE Report No.
072.01/E, Version 1.0, Fraunhofer IESE, Sept. 2001.
[10] Glass, R.L. Predicting future maintenance cost, and how we’re doing it wrong. IEEE
Software, 19, 6 (Nov. 2002), 112, 111.
[11] Graves, T.L., Karr, A.F., Marron, J.S. and Harvey, S. Predicting fault incidence using
software change history. IEEE Trans. Software Eng., 26, 7 (July 2000), 653-661.
[12] Haug, M.T. and Steen, T.C. An empirical study of software quality and evolution in the
context of software reuse. Student project report, Department of Computer and Information
Science, NTNU, 2005.
[13] IEEE standards on http://standards.ieee.org
[14] Kajko-Mattsson, M. Common concept apparatus within corrective software maintenance.
In Proceedings of 15th IEEE International Conference on Software Maintenance (ICSM'99),
IEEE Press, 1999, 287-296.
[15] Kitchenham, B. and Pfleeger, S.L. Software quality: the elusive target. IEEE Software, 13,
1 (Jan. 1996), 12-21.
[16] Lee, A.S. and Baskerville, R.L. Generalizing generalizability in information systems
research. Information Systems Research, 14, 3 (2003), 221-243.
[17] Mendonça, M.G. and Basili, V.R. Validation of an approach for improving existing
measurement frameworks. IEEE Trans. Software Eng., 26, 6 (June 2000), 484-499.
[18] Mohagheghi, P., Conradi, R., Killi, O.M. and Schwarz, H. An empirical study of software
reuse vs. defect-density and stability. In Proceedings of the 26th International Conference on
Software Engineering (ICSE’04), IEEE Press, 2004, 282-292.
[19] Mohagheghi, P. and Conradi, R. Exploring industrial data repositories: where software
development approaches meet. In Proceedings of the 8th ECOOP Workshop on Quantitative
Approaches in Object-Oriented Software Engineering (QAOOSE’04), 2004, 61-77.
[20] Ostrand, T.J., Weyuker, E.J. and Bell, R.M. Where the bugs are. In Proceedings of the
International Symposium on Software Testing and Analysis (ISSTA’04), ACM SIGSOFT
Software Engineering Notes, 29, 4 (2004), 86–96.
[21] Schneidewind, N.F. Methodology for validating software metrics. IEEE Trans. Software
Eng., 18, 5 (May 1992), 410-422.
[22] Selby, R.W. Enabling reuse-based software development of large-scale systems. IEEE
Trans. Software Eng., 31, 6 (June 2005), 495-510.
[23] The Trac project: http://projects.edgewall.com/trac/
[24] UKSMA - United Kingdom Software Metrics Association: http://www.uksma.co.uk/
P4. Investigating the Software Fault Profile of Industrial
Projects to Determine Process Improvement Areas: An
Empirical Study
Jon Arvid Børretzen and Jostein Dyre-Hansen
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU),
NO-7491 Trondheim, Norway
{borretze, dyrehans}@idi.ntnu.no
Abstract. Improving software processes relies on the ability to analyze previous projects
and derive which parts of the process should be focused on for improvement. All
software projects encounter software faults during development and have to put much effort
into locating and fixing these. Much information is produced when handling faults,
in the form of fault reports. This paper reports a study of fault reports from industrial projects,
where we seek a better understanding of faults that have been reported during development
and how this may affect the quality of the system. We investigated the fault profiles of five
business-critical industrial projects by data mining to explore if there were significant
trends in the way faults appear in these systems. We wanted to see if any types of faults
dominate, and whether some types of faults were reported as being more severe than others.
Our findings show that one specific fault type is generally dominant across reports from all
projects, and that some fault types are rated as more severe than others. From this we could
propose that the organization studied should increase effort in the design phase in order to
improve software quality.
1. Introduction
Improving software quality is a goal most software development organizations aim for.
This is not a trivial task, and different stakeholders will have different views on what
software quality is. In addition, the character of the actual software will influence what
is considered the most important quality attributes of that software. For many
organizations, analyzing routinely collected data could be used to improve their process
and product quality. Fault report data is one possible source of such data, and research
shows that fault analysis can be a good approach to software process improvement [1].
The Business-Critical Software (BUCS) project [2] is seeking to develop a set of
techniques to improve support for analysis, development, operation, and maintenance of
business-critical systems. Aside from safety-critical systems, like air-traffic control and
health care systems, there are other systems that we also expect will run correctly
because of the possibly severe effects of failure, even if the consequences are mainly of
an economic nature. This is what we call business-critical systems and software. In
these systems, software quality is highly important, and the main target for developers
will be to make systems that operate correctly [2]. One important issue in developing
these kinds of systems is to remove any possible causes for failure, which may lead to
wrong operation of the system. In a previous study [3], we investigated fault reports
from four business-critical industrial software projects. Building on the results of that
study, the study presented here investigates fault reports from five further industrial
software projects, examining their fault profiles in two main dimensions: fault type and
fault severity.
The rest of this paper is organized as follows. Section 2 gives our motivation and related
work. Section 3 describes the research design and research questions. Section 4 presents
the results found, and Section 5 presents analysis and discussion of the results. The
conclusion and further work is presented in Section 6.
2. Motivation and Related Work
The motivation for the work described in this paper is to further the knowledge gained
from a previous study on fault reports from industrial projects. We also wanted to
present empirical data on the results of fault classification and analysis, and show how
this can be of use in a software process improvement setting.
When considering quality improvement in terms of fault analysis, there are several
related topics to consider. Several issues about fault reporting are discussed in [4] by
Mohagheghi et al. General terminology in fault reporting is one problem mentioned; the
validity of using fault reports as a means for evaluating software quality is another. One
of its conclusions is that “There should be a trade-off between the cost of repairing a
fault and its presumed customer value. The number of faults and their severity for users
may also be used as a quality indicator for purchased or reused software.”
Software quality is a notion that encompasses a great number of attributes. The ISO
9126 standard defines many of these attributes as sub-attributes of the term “quality of
use” [5]. When speaking about business-critical systems, the critical quality attribute is
often experienced as the dependability of the system. In [6], Laprie states that “a
computer system’s dependability is the quality of the delivered service such that reliance
can justifiably be placed on this service.” According to Littlewood and Strigini [7],
dependability is a software quality attribute that encompasses several other attributes,
the most important being reliability, availability, safety and security. The term
dependability can also be regarded subjectively as the “amount of trust one has in the
system”.
Much effort is being put into reducing the probability of software failures, but this has
not removed the need for post-release fault-fixing. Faults in the software are detrimental
to the software’s quality, to a greater or lesser extent dependent on the nature and
severity of the fault. Therefore, one way to improve the quality of developed software is
to reduce the number of faults introduced into the system during development. Faults
are potential flaws in a software system that may later be activated to produce an error.
An error is the execution of a "passive fault", leading to a failure. A failure results in
observable and erroneous external behaviour, system state or data state. The remedies
known for errors and failures are to limit the consequences of an active error or failure,
in order to resume service. This may be in the form of duplication, repair, containment
etc. These kinds of remedies do work, but as Leveson states in [8], studies have shown
that this kind of downstream (late) protection is more expensive than preventing the
faults from being introduced into the code.
Faults that have been introduced into the system during implementation can be
discovered either by inspection before the system is run, by testing during development
or when the application is run on site. The discovered faults are then reported in a fault
reporting system, to be fixed later. Faults are also commonly known as defects or bugs,
while another, similar but more extensive concept is anomalies, which is used in the
IEEE 1044 standard [9].
Orthogonal Defect Classification (ODC) is one way of studying defects in software
systems, and is mainly suited to design and coding defects. Some papers on ODC and
its use in empirical studies are [10, 11, 12, 13, 14]. ODC is a scheme to quickly capture
the semantics of each software fault.
Several papers have discussed whether faults can be tied to reliability in a more or less
cause-effect relationship. Some papers, like [12, 14, 15], indicate that this kind of
connection is valid, while others, like [16], are more critical of this approach.
Even if many of the studies point towards a connection being present between faults and
reliability, they also emphasize that it is not easy to tie faults to reliability directly.
Thus, it does not follow that a system with a low number of faults necessarily has a
higher reliability than a system with a high number of faults. Still, reducing the number
of faults in a system will make the system less prone to failure, so if you can remove the
faults you find without adding new ones, there is a good case for the reliability of the
system being increased. This is the premise of "reliability-growth models", as discussed
by Hamlet in [16] and by Paul et al. in [15].
Avizienis et al. state [17] that fault prevention and fault tolerance aim to provide the
ability to deliver a service that can be trusted, while fault removal and fault forecasting
aim to reach confidence in that ability by justifying that the functional and the
dependability and security specifications are adequate and that the system is likely to
meet them. Hence, by working towards techniques that can prevent faults and reduce
the number and severity of faults in a system, the quality of the system can be improved
in the area of dependability.
An example of results in a related study is the work of Vinter and Lauesen [18].
This paper used a different fault taxonomy, as proposed by Beizer [19], and reports that
in their studied project close to a quarter of the faults found were of the type
"Requirements and Features".
3. Research design
This paper builds on a previous study [3] where we investigated the fault profiles of
industrial projects, and this paper expands on those findings, using a similar research
design. We want to explore the fault profiles of the studied projects with respect to fault
types and fault severity. In order to study the faults, we categorized them into fault types
as described in Section 3.2.
3.1 Research questions
Initially, we want to find which types of faults are most frequent, and also the
distribution of faults into different fault types:
RQ1: Which types of faults are most common for the studied projects?
When we know which types of faults dominate and where these faults appear in the
systems, we can choose to concentrate on the most serious ones in order to identify the
most important issues to target in improvement work (note that the severity of the faults
is judged by the developers who report them):
RQ2: Which fault types are rated as the most severe faults?
We also want to compare the results from this study with the results we found in the
previous study on this topic [3]:
RQ3: How do the results of this study compare with our previous fault report study?
3.2 Fault categorization
There are several taxonomies for fault types, two examples are the ones used in the
IEEE 1044 standard [9] and in a variant of the Orthogonal Defect Classification (ODC)
scheme by El Emam and Wieczorek [12]. The fault reports we received were already
categorized in some manner by the developers and testers, but using a very broad
categorization scheme, which mainly placed the fault into categories of “fault caused by
others”, “change request”, “test environment fault”, “analysis/design fault”, “test fault”
and "coding fault". The fault types used in this study are shown in Table 1. The scheme
is very similar to the ODC scheme used in [12], but with the addition of a GUI fault
type. This classification scheme was used because it is quite simple to apply while still
discerning the fault types well. Further descriptions of the fault types used can be found in
Chillarege et al. [13].
Table 1. Fault types used in this study
Fault types
Algorithm
Assignment
Checking
Data
Documentation
Environment
Function
GUI
Interface
Relationship
Timing/serialization
Unknown
The categorization of faults in this investigation has been performed by the authors of
this paper, based on the fault reports’ textual description and partial categorization.
In addition, grading the faults’ consequences upon the system and system environment
enables fault severities to be defined. All severity grading was done by the developers
and testers performing the fault reporting in the projects. In the projects under study, the
faults have been graded on a severity scale from 1 to 5, where 1 is “critical” and 5 is
“change request”. The different severity classifications are shown in Table 2.
Table 2. Fault severity classification

1 - Critical
2 - Can not be circumvented
3 - Can be circumvented
4 - Cosmetic
5 - Change request
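The severity scale of Table 2 can be captured as a simple lookup. The sketch below is our own illustration (not tooling from the studied projects); the helper marks the two highest levels, which later sections treat together as the most severe faults:

```python
# The severity scale of Table 2 as a lookup table (illustrative sketch).
SEVERITY = {
    1: "Critical",
    2: "Can not be circumvented",
    3: "Can be circumvented",
    4: "Cosmetic",
    5: "Change request",
}

def is_high_severity(code):
    """Levels 1 and 2 are treated as the most severe faults."""
    return code in (1, 2)

print(SEVERITY[1], is_high_severity(3))  # Critical False
```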
3.3 The data sample
The data collected for this study come from five different projects, all from the same
company, but from development groups at different locations. The software systems
developed in these projects are all on-line systems of a business-critical nature, and they
have all been put into full or partial production. Altogether, we classified and analyzed
981 fault reports from the five projects. Table 3 contains information about the
participating projects. The fault reports consisted of fault summary, severity rating, a
coarse fault categorization, description of fault and comments made by testers and
developers after the fault had been reported, while fixing the fault.
Table 3. Information about the participating projects

Project | Description             | Technical platform | Development language | Development effort (hours) | Number of fault reports
P1      | Registering data        | J2EE               | Java                 | N/A                        | 490
P2      | Administration tool     | J2EE               | Java                 | 7900                       | 212
P3      | Merging of applications | Unix, Oracle       | Java                 | 14000                      | 42
P4      | Administration tool     | J2EE, Unix, Oracle | Java                 | 2100                       | 123
P5      | Transaction tool        | N/A                | Java                 | 6000                       | 34
4. Results
4.1 RQ1 – Which types of faults are most frequent?
To answer RQ1, we look at the distribution of the fault type categories for the different
projects. Table 4 shows the distribution of fault types across all projects studied, and
Table 5 shows the distribution of faults for each project. A plot of Table 5 is shown in
Figure 1. We see that "function" and "GUI" faults are the most common fault types,
with "assignment" also being quite frequent. Fault types like "documentation",
"relationship", "timing/serialization" and "interface" are infrequent.
Table 4. Fault type distribution across all projects

Fault type           | # of faults | %
Function             | 191         | 27.0 %
GUI                  | 138         | 19.5 %
Unknown              | 87          | 12.3 %
Assignment           | 75          | 10.6 %
Checking             | 58          | 8.2 %
Data                 | 46          | 6.5 %
Algorithm            | 37          | 5.2 %
Environment          | 36          | 5.1 %
Interface            | 11          | 1.6 %
Timing/Serialization | 11          | 1.6 %
Relationship         | 9           | 1.3 %
Documentation        | 8           | 1.1 %
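The percentages in Table 4 follow directly from the per-type counts; a sketch of the computation, using the counts of Table 4:

```python
from collections import Counter

# Fault type counts from Table 4; the percentages follow from the total.
counts = Counter({"Function": 191, "GUI": 138, "Unknown": 87, "Assignment": 75,
                  "Checking": 58, "Data": 46, "Algorithm": 37, "Environment": 36,
                  "Interface": 11, "Timing/Serialization": 11,
                  "Relationship": 9, "Documentation": 8})

total = sum(counts.values())  # 707 classified faults
distribution = {t: round(100.0 * n / total, 1) for t, n in counts.most_common()}

print(distribution["Function"], distribution["GUI"])  # 27.0 19.5
```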
Table 5. Fault type distribution for each project

Fault type           | P1     | P2     | P3     | P4     | P5
Algorithm            | 1.1 %  | 12.0 % | 4.9 %  | 8.6 %  | 6.7 %
Assignment           | 9.5 %  | 7.4 %  | 14.6 % | 14.0 % | 26.7 %
Checking             | 6.3 %  | 15.4 % | 2.4 %  | 7.5 %  | 0.0 %
Data                 | 1.9 %  | 15.4 % | 2.4 %  | 10.8 % | 3.3 %
Documentation        | 1.4 %  | 0.6 %  | 0.0 %  | 2.2 %  | 0.0 %
Environment          | 4.6 %  | 7.4 %  | 2.4 %  | 4.3 %  | 3.3 %
Function             | 25.3 % | 24.0 % | 53.7 % | 24.7 % | 36.7 %
GUI                  | 29.9 % | 5.7 %  | 14.6 % | 10.8 % | 6.7 %
Interface            | 0.3 %  | 1.1 %  | 0.0 %  | 5.4 %  | 10.0 %
Relationship         | 0.3 %  | 1.7 %  | 0.0 %  | 4.3 %  | 3.3 %
Timing/Serialization | 1.4 %  | 2.3 %  | 2.4 %  | 1.1 %  | 0.0 %
Unknown              | 18.2 % | 6.9 %  | 2.4 %  | 6.5 %  | 3.3 %
[Figure 1: stacked bar chart (0-100%) showing the fault type distribution of Table 5 for each project P1 to P5.]
Fig. 1. Fault type distribution for each project
If we focus only on the faults that are rated with "critical" severity (7.6% of all faults),
the distribution is as shown in Figure 2. "Function" faults do not just dominate the total
distribution, but also the distribution of "critical" faults. A very similar distribution also
holds for faults rated with "can not be circumvented" severity.
When looking at the distribution of faults, especially the high severity faults, we see
that "function" faults dominate the picture. We also see that for all faults, "GUI" faults
have a large share (19.5% in total) of the reports, while for the critical severity faults the
share of "GUI" faults is strongly reduced, to 1.5%.
[Figure 2: bar chart showing the percentage of each fault type among the faults rated as critical.]
Fig. 2. Distribution of faults rated as critical
4.2 RQ2 – What types of faults are rated as most severe?
As for the severity of fault types, Figure 3 illustrates the distribution of severities for
each fault type. The "relationship" fault type has the highest share of "critical" faults,
and also the highest share when considering both "critical" and "can not be
circumvented" severity faults together. The most numerous fault type, "function", does
not stand out as a particularly severe fault type compared with the others. The fault
types rated as least severe are "GUI" and "data" faults.
[Figure 3: stacked bar chart showing, for each fault type, the share of faults at each severity level (1 - critical, 2 - can not be circumvented, 3 - can be circumvented, 4 - cosmetic, 5 - enhancement/change request).]
Fig. 3. Distribution of severity with respect to fault types for all projects
4.3 RQ3 – How do the results compare with the previous study?
Previously, we conducted a similar study of fault reports from industrial projects,
described in [3]. In that study, "function" faults were the dominant fault type,
making up 33.3% to 61.3% of the reported faults in the four investigated projects. The
percentage of "function" faults is lower for the five projects studied in this paper, but it
is still the dominant fault type, making up 24.0% to 53.7% of the reported faults in P1 to
P5 as shown in Table 5.
When looking at the highest severity rated faults reported, this study also shows that
"function" faults are the most numerous of the "critical" severity rated faults, with
35.8%, as shown in Figure 2. This is in line with the previous study, where "function"
faults were also dominant among the most severe faults reported, with 45.3%.
5. Analysis and discussion
5.1 Implications of the results
The results found in this study coincide with the results of the previous fault study we
performed with different development organizations. In both studies the “function”
faults have been the most numerous, both in general and among the faults rated as most
severe. As “function” faults are mainly associated with the design process phase, as
stated by Chillarege et al. in [13] and also by Zheng et al. in [20] as shown in Table 6,
this indicates that a large number of faults had their origin in early phases of
development. This is a sign that the design and specification process is not working as
well as it should, making it the source of faults that are demanding and expensive to fix,
as “function” faults will generally involve larger fixing efforts than pure code errors
like "checking" and "assignment" types of faults. We can therefore recommend that the
developers in the studied projects increase the effort spent during design in order to
reduce the total number of effort-demanding faults in their products.
This finding is also similar to the one from the study of Vinter and Lauesen [18], where
“Requirements and Features” faults were the dominating fault type.
Table 6. ODC fault types and development process phase associations [20]

Process association | Fault types
Design              | Function
Low Level Design    | Interface, Checking, Timing/Serialization, Algorithm
Code                | Checking, Assignment
Library Tools       | Relationship
Publications        | Documentation
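Given the associations in Table 6, the fault counts from Table 4 can be aggregated per process phase to see where faults originate. A sketch of our own aggregation (here "checking" is counted under Code, although Table 6 also lists it under low level design; GUI, data, environment and unknown faults have no association in Table 6 and are omitted):

```python
# Map each ODC fault type to its associated process phase (Table 6) and
# aggregate the fault counts from Table 4. Checking is counted under Code
# here, although Table 6 also associates it with low level design.
PHASE_OF = {
    "Function": "Design",
    "Interface": "Low Level Design",
    "Timing/Serialization": "Low Level Design",
    "Algorithm": "Low Level Design",
    "Checking": "Code",
    "Assignment": "Code",
    "Relationship": "Library Tools",
    "Documentation": "Publications",
}
counts = {"Function": 191, "Interface": 11, "Timing/Serialization": 11,
          "Algorithm": 37, "Checking": 58, "Assignment": 75,
          "Relationship": 9, "Documentation": 8}

by_phase = {}
for fault_type, n in counts.items():
    phase = PHASE_OF[fault_type]
    by_phase[phase] = by_phase.get(phase, 0) + n

print(by_phase["Design"], by_phase["Code"])  # 191 133
```

Under this mapping, design-associated faults ("function") alone outnumber the pure coding faults, which is consistent with the recommendation to strengthen the design phase.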
When looking at each fault type in Figure 3, we see which fault types tend to
produce the most severe faults. One observation here is that although "function" faults
dominate the picture for critical severity faults in Figure 2, it is the "relationship" and
"timing/serialization" fault types that have the highest proportions of critical severity rated faults.
It can therefore be argued that the "relationship" and "timing/serialization" fault types
are important to prevent, as these types of faults likely have greater consequences than,
for instance, "GUI" and "data" type faults. "Function" faults are important to focus on
preventing due to their sheer number, both in general and among the "critical" severity
rated faults. Although "function" faults do not stand out as a fault type where most
faults are rated as "critical", they are still the biggest contributor to "critical" severity
rated faults.
When we informed the organization involved of the results of this study, the feedback
was anecdotal confirmation of our findings: they told us that they were indeed
having issues with design and specification, even though their own fault statistics
showed most faults to be coding faults. We would like to study this issue further in our
future work on the subject.
In many cases, fault reporting is performed with one goal in mind: to fix faults that are
uncovered through inspection and testing. Once the fault has been corrected, the fault
report information is not used again. The available information can be employed in a
useful fashion as long as future development projects are similar to, or based on
previous projects. By reusing the information that has been accumulated during fault
discovery through testing and during production, we are able to learn about possible
faults for new similar projects and further development of current projects.
Measuring quality and effects on quality in a software system is not a trivial matter. As
presented in Section 2, opinions on how, and whether, this can be done are divided. One
of the means Avizienis et al. suggest for attaining better dependability in a system is
fault removal, in order to reduce the number and severity of faults [17]. By identifying
common fault types, developers can reduce a larger number of faults by focusing their
efforts on preventing these types of faults. Also, identifying the most severe fault types
makes developers able to focus on preventing those faults that have the biggest
detrimental impact on the system.
5.2 Further issues concerning fault reporting in this organization
In addition to our quantitative study results, we were able to identify some points of
possible improvement in the studied organization's fault reporting. Two attributes that
we found lacking, and which should be possible to include in fault reporting, are Fault
Location and Fault Fixing Effort. The location of a fault should be readily known once
a fault report has been dealt with, as fault fixing must have a target module or software
part. This information would be very helpful if the organization wants to investigate
which software modules produce the most serious faults, and they can then make a
reasoned argument if these modules are of a particularly critical type (like infrastructure
or server components), or if some modules are simply of a poorer quality than others.
Including fault fixing effort into the fault reports is also an issue that could be of great
benefit when working to improve fault prevention processes. By recording such
information, we can see which fault types that produce the most expensive faults in
terms of effort when fixing them. These are issues that will be presented to the
organization under study. Their current process of testing and registering faults in a
centralized way hinders the testers and developers from including this valuable
information from the fault reports. The testers who initially produce the fault reports do
not necessarily know which software modules the fault is located in, and developers
fixing the fault do not communicate the location it was found in after it has been found
and fixed.
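As an illustration, the two proposed attributes could extend a fault report record roughly as sketched below. All field names are hypothetical; the organization's actual report schema is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaultReport:
    # Attributes of the kind already present in the organization's fault reports
    report_id: str
    description: str
    fault_type: str            # e.g. "function", "GUI", "interface"
    severity: str              # e.g. "critical", "high", "medium", "low"
    # Proposed additions (filled in by the developer who fixes the fault)
    fault_location: Optional[str] = None        # module or component that was fixed
    fixing_effort_hours: Optional[float] = None # effort spent correcting the fault

def most_expensive_types(reports):
    """Sum the recorded fixing effort per fault type, most expensive first."""
    totals = {}
    for r in reports:
        if r.fixing_effort_hours is not None:
            totals[r.fault_type] = totals.get(r.fault_type, 0.0) + r.fixing_effort_hours
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

With these two fields filled in at fault-fixing time, the organization could rank modules by the severity of the faults they produce and rank fault types by fixing cost.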
5.3 Threats to validity
When performing an empirical study on industrial projects, it is not possible to control
the environment or data collected as we would do in an experiment. The following is a
short presentation of what we see as the main validity threats.
Internal validity. An issue here might be factors affecting the distribution of fault
types. When the fault data was collected, it was intended solely for fault fixing, not
for being studied in this way. The coarse classification given by the developers could
have been biased. We hopefully reduced such bias and other inconsistencies by
reclassifying the fault reports using new fault types.
External validity. The small number of projects under investigation is a threat to
external validity. However, the results of this study support the findings of a previous,
similar study of fault reports from other software development organizations. The
projects under study may also not be entirely typical, but this is hard to verify.
Conclusion validity. One possible threat here is the reliability of measures, as the
categorization of faults into fault types is a subjective task. To avoid placing faults
we were unsure of into the wrong category, we used an “unknown” type to filter out the
faults we were not able to categorize confidently.
6. Conclusion and future work
In this paper we have described the results of a study of fault reports from five software
projects in a company developing business-critical software. The fault reports have
been categorized and analyzed according to our research questions. We found that
"function" faults, closely followed by "GUI" faults, are the fault types that occur most
frequently in the projects. To reduce the number of faults introduced into the systems,
the organization should focus on improving the processes that are most likely to
contribute to these types of faults, namely the specification and design phases of
development. Faults of the types "documentation", "relationship", "timing/serialization"
and "interface" are the least frequently occurring.
The fault types that are most often rated as most severe are "relationship" and
"timing/serialization" faults, while the fault types "GUI" and "documentation" are
considered the least severe. Although “function” faults are not rated as the most severe
type of fault, this fault type still dominates when looking at the distribution of highly
severe faults only.
In addition to these results, we observed that the organization’s fault reporting process
could be improved by adding some information to the fault reports. This would
facilitate more effective targeting of fault types and locations, and thus better focus
future improvement efforts.
In terms of future work, we want to continue studying the projects explored in this
paper, using qualitative methods to further explain our quantitative results. Feedback
from the developers’ organization would help us understand the source of these results,
and help us suggest concrete measures for process improvement in the organization.
Acknowledgements
The authors would like to thank Reidar Conradi for careful reviewing and valuable
input. We also thank the organization involved for their participation and cooperation
during the study.
References
1. R. Grady, Practical Software Metrics for Project Management and Process Improvement,
Prentice Hall, 1992
2. J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early
software project phases”. Proceedings, Norwegian Informatics Conference, 2004
3. J. A. Børretzen; R. Conradi, “Results and Experiences From an Empirical Study of Fault
Reports in Industrial Projects”. Proc. 7th International Conference on Product Focused
Software Process Improvement (PROFES'2006), 12-14 June 2006, Amsterdam, Pages: 389-394
4. P. Mohagheghi; R. Conradi; J.A. Børretzen, "Revisiting the Problem of Using Problem
Reports for Quality Assessment", Proc. the 4th Workshop on Software Quality, held at
ICSE'06, 21 May 2006, Shanghai, Pages: 45-50
5. ISO, ISO/IEC 9126 - Information technology - Software evaluation – Quality characteristics
and guide-lines for their use, ISO, December 1991
6. J.-C. Laprie, “Dependable computing and fault tolerance: Concepts and terminology”,
Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from
Twenty-Five Years', June 27-30, 1995
7. B. Littlewood; L. Strigini, “Software reliability and dependability: a roadmap”, Proceedings
of the Conference on The Future of Software Engineering, Limerick, Ireland, 2000, Pages:
175 - 188
8. N. Leveson, Safeware: System safety and computers, Addison-Wesley, Boston, 1995
9. IEEE, IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993,
December 2, 1993
10. K.A. Bassin; T. Kratschmer; P. Santhanam, “Evaluating software development objectively”,
IEEE Software, 15(6): 66-74, Nov.-Dec. 1998
11. K. Bassin; P. Santhanam, “Managing the maintenance of ported, outsourced, and legacy
software via orthogonal defect classification”, Proceedings. IEEE International Conference
on Software Maintenance, 2001, 7-9 Nov. 2001
12. K. El Emam; I. Wieczorek, “The repeatability of code defect classifications”, Proceedings.
The Ninth International Symposium on Software Reliability Engineering, 1998, 4-7 Nov.
1998 Page(s):322 – 333
13. R. Chillarege; I.S. Bhandari; J.K. Chaar; M.J. Halliday; D.S. Moebus; B.K. Ray; M.-Y.
Wong, “Orthogonal defect classification-a concept for in-process measurements”, IEEE
Transactions on Software Engineering, Volume 18, Issue 11, Nov. 1992 Page(s):943 - 956
14. R.R. Lutz; I.C. Mikulski, “Empirical analysis of safety-critical anomalies during
operations”, IEEE Transactions on Software Engineering, 30(3):172-180, March 2004
15. R.A. Paul; F. Bastani; I-Ling Yen; V.U.B. Challagulla, “Defect-based reliability analysis for
mission-critical software”, The 24th Annual International Computer Software and
Applications Conference, 2000. COMPSAC 2000. 25-27 Oct. 2000 Page(s):439 - 444
16. D. Hamlet, “What is software reliability?”, Proceedings of the Ninth Annual Conference on
Computer Assurance, 1994. COMPASS '94 'Safety, Reliability, Fault Tolerance,
Concurrency and Real Time, Security', 27 June-1 July 1994 Page(s):169 – 170
17. A. Avizienis; J.-C. Laprie; B. Randell; and C. Landwehr, “Basic Concepts and Taxonomy of
Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure
Computing, vol. 1, no. 1, January-March 2004
18. O. Vinter; S. Lauesen, “Analyzing Requirements Bugs”, Software Testing & Quality
Engineering Magazine, Vol. 2-6, Nov/Dec 2000.
19. B. Beizer, Software Testing Techniques. Second Edition, Van Nostrand Reinhold, New
York, 1990
20. J. Zheng; L. Williams; N. Nagappan; W. Snipes; J.P. Hudepohl; M.A. Vouk, “On the value
of static analysis for fault detection in software”, IEEE Transactions on Software
Engineering, Volume 32, Issue 4, April 2006 Page(s):240 – 253
P5. The Empirical Studies on Quality Benefits of Reusing
Software Components
(Position paper)
Jingyue Li, Anita Gupta, Jon Arvid Børretzen, and Reidar Conradi
Department of Computer and Information Science (IDI)
Norwegian University of Science and Technology (NTNU)
{jingyue, anitaash, borretze, conradi}@idi.ntnu.no
Abstract
The benefits of reusing software components have been studied for many years. Several
previous studies have concluded that reused components in general have fewer defects
than non-reusable components. However, few of these studies have gone a step further
and investigated which types of defects are reduced because of reuse. Thus, it is
merely assumed that making a software component reusable will automatically improve its
quality. This paper presents an on-going industrial empirical study on the quality
benefits of reuse. We are going to compare the defect types, classified using ODC
(Orthogonal Defect Classification), of the reusable components vs. the non-reusable
components in several large and medium software systems. The intention is to find out
which defects are reduced because of reuse, and the reasons for the reduction.
1. Introduction
Software reuse is a management strategy for software evolution, in terms of
development for and with reuse. Development for reuse refers to the generalization of
components towards reuse, while development with reuse concerns the inclusion of these
reusable components in new and future development [1]. Understanding the issues related
to software reuse, including its purpose and promises, has been a focus since the 1970s.
Attention has been given to how to develop for/with reuse, to
technical/managerial/organizational aspects, to measuring reuse in terms of quality and
productivity, and to reporting successes and failures of reuse practices. Although some
studies have found that reusable software components have lower defect density than
non-reusable components [2]-[7], few studies have examined why software defects are
reduced because of reuse.
In this study, we have first collected defect reports from several large and medium
software systems, which include both reusable and non-reusable software components.
Second, we will classify the defects in the defect reports using ODC (Orthogonal Defect
Classification) [8]. Then, we will compare the defect density and severity of different
defect types of the reusable components vs. the non-reusable components. We expect to
identify the types of defects that are not related to reuse (i.e., their presence is the
same for both reusable and non-reusable components). In addition, we hope to find
defects that may be more probable in reusable components than in non-reusable
components. Finally, we will show the results of our defect density analysis to the
project members building the reusable components and to those building the non-reusable
components. By discussing with and interviewing these project members, we expect to
find out why making a component reusable has or has not helped to reduce certain defect
types. This paper is structured as follows: Section 2 introduces related work; Section 3
presents our research motivation and research questions; Section 4 illustrates the
detailed research design; Section 5 shows the currently available data that we are going
to analyze; Section 6 concludes the paper.
2. Related work
Several industrial empirical studies, shown in Table 1, conclude that reuse reduces the
defect density, and therefore helps to improve the quality (especially the reliability)
of the system. However, most studies focus on the quantity of defects, such as the
number of defects or the defect density, without considering the defect type.
Some studies have investigated why reusable components have better quality than
non-reusable components. The study of Succi et al. [6] concludes that implementing a
systematic reuse policy, such as the adoption of a domain-specific library, improves
customer satisfaction. Results from the study of Selby [7] show that software modules
reused without revision had the fewest faults. However, the results of these studies
only show the connection between the reuse policy and the number of defects.
Table 1: Studies related to defect density and reuse
(each entry gives quality focus, quality measures, and conclusion)

Reusable vs. non-reusable components [2]
  Quality measures: No definition of what a defect is. Defect density is given as
  defects per kilo non-comment source statements (KNCSS).
  Conclusion: Reuse can provide improved quality, increased productivity, shortened
  time-to-market, and enhanced economics.

Reusable vs. non-reusable components [3]
  Quality measures: Defect density (number of defects/LOC) and stability (module size
  and size of modified code). Size is in SLOC.
  Conclusion: Reused components had lower defect density than the non-reused ones, and
  they were also less modified (more stable) than non-reused ones. Reused components
  had a higher number of defects of the highest severity before delivery, but fewer
  defects post-delivery.

Reusable vs. non-reusable components [4]
  Quality measures: Defect density (number of defects/SLOC) and change density (number
  of change requests/SLOC).
  Conclusion: The quality of the reusable framework improves and it becomes more
  stable over several releases.

Reusable vs. newly developed components [5]
  Quality measures: Error/defect densities (errors/defects per thousand source
  statements). No definition for error/defect. Size is in SLOC.
  Conclusion: Reuse provides an improvement in error density (more than a 90%
  reduction) compared to new development.

Code reuse [6]
  Quality measures: Customer Complaint Density (CCD) is the ratio of customer
  complaints to LOC, and is a post-release defect density.
  Conclusion: Reuse is significantly positively correlated with customer satisfaction.

Reused, modified and newly developed modules [7]
  Quality measures: Module fault rate: number of faults in a module. An error
  correction may affect more than one module; each module affected by an error is
  counted as having a fault. Size is in SLOC.
  Conclusion: Software modules reused without revision had the fewest faults, fewest
  faults per source line, and lowest fault correction effort. Software modules reused
  with major revisions had the highest fault correction effort and highest fault
  isolation effort.
3. Research motivation and research questions
Our motivation here is to investigate issues related to defects, namely classification
and severity, in relation to software reuse. In other words, we are interested in the
connection between the reuse policy and different kinds of defects. We want to know
whether reuse helps to reduce all kinds of defects or only certain kinds. We also
suspect that reuse may increase certain types of defects; for example, familiarity with
a component might prevent it from being tested thoroughly. The results of this study
will help industrial practitioners to better understand the benefits of reuse. It can
also help them to improve their reuse policy in order to obtain a better quality
system. Our research questions are as follows:
• RQ1: What are the more common defect types in the reusable components vs. the
non-reusable components?
• RQ2: What are the severities of defects for the reusable components vs. the
non-reusable components?
• RQ3: What are the reasons for reduced defects in the reusable components?
4. Detailed research design
When abnormalities in the operation of a system are found, they are reported as
failures. These failures are reported to developers through failure reports. A fault is
an underlying problem in a software system that leads to a failure. Error is used to
denote the execution of a passive fault, which leads to incorrect behavior (with respect
to the requirements) or an incorrect system state [9], and also for any fault or failure
resulting from human activity [10]. In our study, defect is used in place of fault,
error or failure, without distinguishing the origin or whether it is active or passive.
To answer the first research question, RQ1, we plan to use the Orthogonal Defect
Classification (ODC) scheme defined by IBM [8] to classify the defects into different
types. The goal of IBM’s ODC scheme is to categorize defects such that each defect
type is associated with a specific stage of development. ODC has been used to evaluate
and improve technology; for example, to investigate the value of automatic static
analysis, the defects found by static analysis and those not found by it can be
classified [11].
Reuse is proposed as a mechanism to improve the efficiency and quality of software
development. It is therefore reasonable to use ODC to analyze the quality improvement
due to reuse. The attribute defect type in ODC captures the fix that was made to resolve
the defect. For example, defects of type function are those that require a formal design
change. Examples of the defect types are given in Table 2. Details of other defect types
are in [12].
Table 2. Examples of defect types in ODC

Assignment
  Description: Value(s) assigned incorrectly or not assigned at all.
  Example: An internal variable or a variable within a control block did not have the
  correct value, or did not have any value at all.

Checking
  Description: Errors caused by missing or incorrect validation of parameters or data
  in conditional statements.
  Example: A value greater than 100 is not valid, but the check to make sure that the
  value was less than 100 was missing.

Algorithm or Method
  Description: Efficiency or correctness problems that affect the task and can be
  fixed by (re)implementing an algorithm or local data structure without the need for
  requesting a design change.
  Example: The number and/or types of parameters of a method or an operation are
  incorrectly specified.
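As a naive sketch of what assigning defect reports to the ODC types above could look like in tooling, one might prefilter fix descriptions by keyword. This is purely illustrative: real ODC classification is performed manually by inspecting the fix that was made, not by keyword matching, and the keyword lists below are our own invention.

```python
# Hypothetical keyword lists, invented for illustration; not part of ODC itself.
ODC_KEYWORDS = {
    "assignment": ["assigned", "initialized"],
    "checking": ["validation", "check", "condition"],
    "algorithm": ["algorithm", "data structure", "efficiency"],
}

def guess_defect_type(fix_description):
    """Return a rough ODC type guess for a fix description, or 'unclassified'."""
    text = fix_description.lower()
    for defect_type, keywords in ODC_KEYWORDS.items():
        if any(k in text for k in keywords):
            return defect_type
    return "unclassified"
```

Such a prefilter could at most suggest candidate types for a human classifier to confirm.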
To answer RQ1, we will compare the density and distribution of the different defect
types of the reusable components with those of the non-reusable components.
To answer RQ2, we will compare the distribution of defect severities, which are usually
defined by testers or developers, of the reusable components with that of the
non-reusable components.
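The per-type density comparison for RQ1 could be sketched as follows. The defect type labels and KLOC figures here are purely illustrative, not the study's actual data:

```python
from collections import Counter

def defect_density_by_type(defect_types, kloc):
    """Defects per KLOC for each ODC defect type in one component group.

    defect_types: one ODC type label per defect report.
    kloc: size of the component group in thousand lines of code.
    """
    counts = Counter(defect_types)
    return {t: n / kloc for t, n in counts.items()}

# Illustrative data only (not from the studied systems):
reused = defect_density_by_type(
    ["assignment", "checking", "assignment", "algorithm"], kloc=56)
non_reused = defect_density_by_type(
    ["assignment", "checking", "checking", "checking"], kloc=67)

# Types present in either group can then be compared side by side:
for t in sorted(set(reused) | set(non_reused)):
    print(t, round(reused.get(t, 0.0), 4), round(non_reused.get(t, 0.0), 4))
```

A defect type whose density is similar in both groups would count as unrelated to reuse, while large gaps in either direction point to types that reuse reduces or amplifies.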
After obtaining the results for RQ1 and RQ2, we will do a causal analysis to answer
RQ3, comparing the development processes, quality assurance, change management,
application domain, and other contexts of the projects building the reusable components
with those building the non-reusable components. The causal analysis will be done by
interviewing project managers, supplemented by documentation analysis. The purpose is
to find out why the reusable components have lower or higher defect density and
severity for certain defect types than the non-reusable components.
5. The available data
We currently have data from two software systems, in companies A and B (the company
names are not given in the paper for confidentiality reasons). More data from several
other companies will be collected in the near future.
5.1. Available data from Company A
Company A is a large Norwegian company in the oil and gas industry. The central IT
department of the company is responsible for developing and delivering software that is
meant to give key business areas better flexibility in their operation. It is also
responsible for the operation and support of IT systems. This department consists of
approximately 100 developers, located mainly in Norway. Since 2003, a central IT
strategy of the O&S (Oil Sales, Trading and Supply) business area has been to explore
the potential benefits of reusing software systematically. Company A has developed a
customized framework of reusable components, based on J2EE (Java 2 Enterprise
Edition). The reusable components have been developed over three releases, with 56 KLOC
in total. Several applications use the functionality of the reusable components. One of
them is a document storage application comprising several components; the components in
this application are defined as non-reusable components in our study. The application
also has three releases, with a total of 67 KLOC.
In company A, the defects are recorded in the Rational ClearQuest tool. Each trouble
report contains an ID, a headline description, a severity (indicating how critical the
problem is judged to be by developers: critical, high, medium, or low), a classification
(Error, Error in other system, Duplicate, Rejected or Postponed), estimated time to fix,
remaining time to fix, subsystem location, as well as an updated action and timestamp
record for each new state the defect enters in the workflow. There are 223 trouble
reports for the reusable framework and 438 trouble reports for the non-reusable
application.
5.2. Available data from Company B
Company B is a large Nordic company in the IT industry. They specialize in applications
for workflow and process support for both public and corporate purposes, and also do
consultancy work for their customers. The company employs around 500 people. The
project studied from company B is a combined web presentation and task management
system used in the administration of public information and application processing.
The defect reports stem from three releases of the project, which occupied 6-7
developers. The project from company B is developed using a framework with reusable
components and generated code, and is also based on J2EE. Major parts of the reused
code were automatically generated by a code generation tool, and the company did not
report the number of lines of code. The defects have been reported in the Atlassian
Jira bug tracking tool. The trouble reports contain an ID (Key), a short summary, type,
status (Resolved/Closed/Open), severity as evaluated by the developers (Blocker,
Critical, Major, Normal, Minor, Trivial), resolution (Fixed, Cannot Reproduce, Won't
Fix, etc.), the person assigned responsibility (Assignee), who reported the defect
(Reporter), time created, time updated, the version the defect was found in, the
version it should be fixed in, and the subsystem location (Components). There are 379
trouble reports from company B, of which 286 come from the reused parts and 93 from
the non-reused parts.
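Since the two companies' tools use different severity scales, any cross-company comparison of severities would first require mapping both scales onto a common one. A minimal sketch follows; the paper does not prescribe a mapping, so the one below is our own illustrative choice:

```python
# Hypothetical mapping of each tool's severity labels onto a common 1-4 scale
# (1 = most severe). The mapping actually used in the study may differ.
CLEARQUEST = {"critical": 1, "high": 2, "medium": 3, "low": 4}
JIRA = {"blocker": 1, "critical": 1, "major": 2, "normal": 3, "minor": 4, "trivial": 4}

def normalize_severity(tool, severity):
    """Map a tool-specific severity label onto the common 1-4 scale."""
    scale = {"clearquest": CLEARQUEST, "jira": JIRA}[tool.lower()]
    return scale[severity.lower()]
```

For example, a Jira "Blocker" and a ClearQuest "critical" would both land at level 1, so severity distributions from the two data sets become directly comparable.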
6. Conclusion and future work
In this position paper, we present the research design of an on-going empirical study
investigating the benefits and costs of software reuse with respect to software
quality. By analyzing the defect reports of several software systems, which include
both reusable and non-reusable components, we expect to deepen the understanding of why
reuse improves the quality of software. The conclusions of this study will be given as
guidelines for improving the reuse process to the companies involved in the study. In
order to generalize our conclusions, future work is to collect data from projects with
different contexts, such as application domains, technologies, and development
processes, in order to find common good practices and lessons learned in software reuse.
7. References
[1] G. Sindre, R. Conradi, and E. Karlsson, “The REBOOT Approach to Software Reuse”,
Journal of System Software, 30(3): 201–212, 1995.
[2] W. C. Lim, “Effect of Reuse on Quality, Productivity and Economics”, IEEE Software,
11(5): 23-30, Sept./Oct. 1994.
[3] P. Mohagheghi, R. Conradi, O. M. Killi, H. Schwarz, “An Empirical Study of Software
Reuse vs. Defect Density and Stability”, Proc. 26th Int’l Conference on Software
Engineering (ICSE’2004), 23-28 May 2004, Edinburgh, Scotland, pp. 282-291, IEEE-CS
Press.
[4] A. Gupta, O. P. N. Slyngstad, R. Conradi, P. Mohagheghi, H. Rønneberg, and E. Landre:
“An Empirical Study of Defect-Density and Change-Density and their Progress over Time in
Statoil ASA”, Proc. 11th European Conference on Software Maintenance and Reengineering
(CSMR’07), 21-23 March 2007, Amsterdam, The Netherlands, p. 10.
[5] W.M. Thomas, A. Delis and V.R. Basili, “An analysis of Errors in a Reuse-Oriented
Development Environment”, Journal of Systems and Software, 38(3): 211-224, September
2004.
[6] G. Succi, L. Benedicenti, and T. Vernazza, “Analysis of the Effects of Software Reuse on
Customer Satisfaction in an RPG Environment”, IEEE Transactions on Software
Engineering, 27(5): 473-479, May 2001.
[7] W. Selby, “Enabling Reuse-Based Software Development of Large-Scale Systems”, IEEE
Transactions on Software Engineering, 31(6): 495-510, June 2005.
[8] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, M.-Y.
Wong, “Orthogonal Defect Classification - a Concept for in-Process Measurements”, IEEE
Transactions on Software Engineering, 18(11): 943-956, Nov. 1992.
[9] IEEE Standard Glossary of Software Engineering Terminology, IEEE Standard 610.12,
1990.
[10] A. Endres and D. Rombach, “A Handbook of Software and Systems Engineering:
Empirical Observations, Laws and Theories”, Addison-Wesley, 2004.
[11] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, M. A. Vouk, “On the
Value of Static Analysis for Fault Detection in Software”, IEEE Transactions on Software
Engineering, 32 (4): 240-253, April, 2006.
[12] ODC defect type: http://www.research.ibm.com/softeng/ODC/DETODC.HTM#type
P6. Fault classification and fault management: Experiences
from a software developer perspective
Jon Arvid Børretzen
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU),
NO-7491 Trondheim, Norway
[email protected]
Abstract: In most software development projects, faults are unintentionally injected
into the software, and are later found through inspection, testing or field use and
reported in order to be fixed. The associated fault reports can have uses that go
beyond just fixing the discovered faults. This paper presents the findings from
interviews with representatives involved in fault reporting and correction processes in
different software projects. The main topics of the interviews were fault management
and fault reporting processes. The objective was to present practitioners’ views on
fault reporting, and in particular fault classification, as well as to expand and
deepen the knowledge gained from a previous study on the same projects. Through
interviews and the use of Grounded Theory we wanted to find potential weaknesses in a
current fault reporting process and elicit improvement areas and their motivation. The
results show that fault management could and should include steps to improve product
quality. The interviews also supported our quantitative findings from previous studies
on the same development projects, where much rework through fault fixing had to be done
after testing because areas of work in early project stages had been neglected.
Keywords: Process improvement, Fault management, Fault classification, Software
quality.
1. Introduction
An important goal for most software development organizations is improving software
quality, typically reliability. There are several ways to go about this, but it is not a trivial
task, and different stakeholders have different views on what software quality is. In
addition, the application domain of the actual software will influence what quality
attributes we consider to be most relevant. For many organizations, their routinely
collected data is an untapped source for process analysis and possible process
improvement. Indeed, leaving collected data largely unused can demotivate the
developers and reduce data quality. Our conjecture, and supported by previous research,
is that fault analysis can be an effective approach to software process improvement
(Grady, 1992).
The Business-Critical Software (BUCS) project (Børretzen et al., 2004) works to
develop a set of techniques to improve support for analysis, development, operation,
and maintenance of business-critical systems. Business-critical systems are systems
that we expect to run correctly and safely, even if the consequences of failure are
mainly of a “mild” economic nature. In these systems, the software is critical, and the
main target for developers is to make systems that operate correctly and without
serious consequences in case of failures (erroneous behaviour with respect to
specifications). One important issue when developing such systems is to remove possible
causes of failure, which may lead to wrong operation of the system.
In two previous studies (Børretzen and Conradi, 2006; Børretzen and Dyre-Hansen,
2007), we have investigated fault reports in nine business-critical industrial software
projects. These were quantitative studies based on fault report analysis, where we
mainly studied the types and severity of faults. Building on the most recent study, we
want to gain a better understanding of its results. A considerable share of the
reported faults were of types associated with early software development phases,
indicating flaws in the quality control of these phases. This paper presents the
results of interviews with representatives from some of the projects studied in
(Børretzen and Dyre-Hansen, 2007), as well as two workshops with further
representatives on fault reporting and fault classification.
The rest of this paper is organized as follows. Section 2 gives our motivation and
related work. Section 3 describes the research, its questions and procedure. Section 4
presents the results from the interviews and workshop feedback, and Section 5 discusses
the work and the results. Conclusions and further work are presented in Section 6.
2. Background and Motivation
The motivation for the work described in this paper is to expand on the knowledge
gained from a previous quantitative study of fault reports from five projects in a
software development organization. By performing an additional qualitative
investigation with representatives from some of the same projects, we sought to better
understand the reasons for the results we saw in the quantitative study, as well as to
receive input from practitioners on how more thorough fault management can be part of a
software process improvement initiative. “Value-based” software engineering holds that
developing models and measures focused on value received enables trade-off decisions
that help us concentrate on the right issues in process improvement (Biffl et al.,
2006). There are several motivations for preventing and correcting faults as early in
the process as possible, but the main one is usually economic.
2.1 State-of-the-art
When considering quality improvement through fault analysis, there are many related
topics to consider. Several issues in fault reporting are discussed by Mohagheghi et
al. (2006). General terminology in fault reporting is one problem; the validity of
using fault reports as a means for evaluating software quality is another. Important
issues to consider are how to describe a fault by “what” – the cause of the fault,
“where” – the location of the fault, and “when” – the phase in which the fault was
detected. One of the conclusions in (Mohagheghi et al., 2006) is that “There should be
a trade-off between the cost of repairing a fault and its presumed customer value. The
number of faults and their severity for users may also be used as a quality indicator
for purchased or reused software.” By using fault report analysis, one can get a step
closer to understanding the cost of repairing faults of various categories.
Fault reports are the records of information about faults that are discovered during development, testing and field use of a software system. These reports can contain a range of different information; common fault report attributes include fault description, fault severity, fault type and fault location.
Their most obvious function is to be the link between fault discovery and fault
correction, but they are also valuable when performing fault analysis of a system.
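To make the attribute list concrete, a fault report record can be sketched as a small data structure. The attribute names and values below are illustrative only, not the scheme used by the studied organization:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class FaultReport:
    """Minimal sketch of a fault report record; field names are hypothetical."""
    report_id: int
    description: str       # free-text description of the observed fault
    severity: str          # e.g. "critical", "major", "minor"
    fault_type: str        # e.g. an ODC-style category such as "Checking"
    location: str          # component or module where the fault was found
    detection_phase: str   # "inspection", "testing" or "field use"
    reported: date = field(default_factory=date.today)

# A record like this links fault discovery to fault correction, and a
# collection of such records is the raw material for later fault analysis.
report = FaultReport(1, "Null value not checked before use", "major",
                     "Checking", "payment module", "testing")
```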
One way to improve the quality of developed software is to reduce the number
and severity of faults introduced into the system during development. Faults are potential flaws in a software system that may later be activated to produce an error. An error is the execution of such a “passive fault”, which may lead to a failure. A failure is observable, erroneous external behaviour, i.e. an inconsistent system state. Faults that
have been introduced into the system during implementation can be discovered either
by inspection before the system is run, by testing during development or when the
application is run on site. The discovered faults are then reported in a fault reporting
system, and will normally be fixed later. Faults are also commonly known as defects or bugs, and fall under the broader concept of anomalies.
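The fault → error → failure chain can be illustrated with a deliberately faulty function; this is a hypothetical example, not taken from the studied projects:

```python
def average(values):
    # FAULT: a dormant flaw - the empty-list case is not handled.
    return sum(values) / len(values)

# The fault stays passive as long as the faulty code path is not exercised:
ok = average([2, 4, 6])

# Activating the fault produces an error (here a ZeroDivisionError),
# which surfaces as a failure: observable erroneous external behaviour.
failed = False
try:
    average([])
except ZeroDivisionError:
    failed = True
```

The fault existed in the source all along; only its activation on a particular input turned it into an error and then an observable failure.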
Orthogonal Defect Classification (ODC) is a way of studying defects in software systems, and is mainly suited to design and coding defects. Chillarege et al. (1992) and El Emam and Wieczorek (1998) describe ODC and its use in empirical studies. ODC provides a classification scheme in which the semantics of each software fault can be captured quickly and easily.
Avizienis et al. (2004) state that fault prevention and fault tolerance aim to
provide the ability to deliver a service that can be trusted, while fault removal and fault
forecasting aim to reach confidence in that ability by justifying that the functional and
the dependability and security specifications are adequate and that the system is likely
to meet them. Hence, by working towards techniques that can prevent faults and reduce
the number and severity of faults in a system, the quality of the system can be improved
in the area of dependability.
There are different perspectives and motivations for working to prevent and
correct faults in software, but the most important motivation is that of cost. Correcting
faults is costly and in many instances is nothing but redoing the work that should have
been done correctly in the first place.
2.2 Previous work
Previously, we had conducted a fault report analysis of five projects in a software
development organization. We studied the projects with regard to reported faults,
through analysing fault reports from the development of the applications. Looking at
descriptions of the individual faults, as well as other data reported about the faults, we
classified the faults into different fault types. By grouping faults into fault types, we
tried to find indications on where reported faults originated in the development process.
The analysis was based on ODC, and we categorized faults into fault types
based on that technique. Table 1 shows the fault types that were used in that study. The
fault analysis of the five projects and the results are described further in (Børretzen and
Dyre-Hansen, 2007).
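The grouping step described above can be sketched as a simple frequency count over classified fault reports. The fault reports and their distribution below are invented for illustration:

```python
from collections import Counter

# Hypothetical fault reports, each already classified into one of the
# ODC-based fault types used in the study (see Table 1).
fault_types = ["Function", "Interface", "Function", "Checking",
               "GUI", "Function", "Interface"]

# Counting occurrences per fault type is the core of this kind of fault
# profile analysis: frequently occurring types hint at where in the
# development process the reported faults originated (e.g. many
# "Function" faults point towards specification/design weaknesses).
profile = Counter(fault_types)
most_common_type, count = profile.most_common(1)[0]
```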
The results of the fault analysis were presented to the interviewees, and the basis
for the interviews was therefore our analysis of projects in their organization as well as
their own experiences on fault reporting in the organization. Quantitative results from
the fault report study indicated that a large share of faults reported originated from early
phases of development, and we wanted to explore this area further with qualitative
feedback from people involved in the studied projects. An earlier study performed by us
in four other companies had shown the same tendency of high numbers of early phase
faults (Børretzen and Conradi, 2006). Another example of results in a related study is
the work done by Vinter and Lauesen (2000). This paper used a different fault type
taxonomy, but reports that in their studied project close to a quarter of the faults found were of the comparable type “Requirements and Features”. In addition to building on the
quantitative results, we wanted to learn about practitioners’ opinions and knowledge
about fault reporting and management.
Table 1. Fault types used in (Børretzen and Dyre-Hansen, 2007)
Fault types
Algorithm
Assignment
Checking
Data
Documentation
Environment
Function
GUI
Interface
Relationship
Timing/serialization
Following on from our two previous quantitative studies on fault reports, we wanted to continue studying the projects explored in (Børretzen and Dyre-Hansen, 2007) using qualitative methods to further explain our quantitative results. Feedback from the developers’ organization would help us understand the source of these results, and help us suggest concrete measures for process improvement in the organization.
2.3 Study Context
The development organization we have studied is a part of a large company developing
and maintaining business-critical applications in the financial sector. It has been
involved in the user-driven EVISOFT research and development project concerning
industrial process improvement since 2006 (EVISOFT, 2006). The organization under
study is distributed over several locations, and it employs hundreds of software
developers. Our study has mainly involved representatives from two locations in
Norway.
Table 2. The organization’s existing fault categorization scheme.
Fault types
Other/not fault
User error
Fault at subcontractor
Change Request
Fault in generation
Wrong version
Fault in internal test systems
Fault when establishing test environment
Fault in sequencing
Fault in external test environment
Fault in test data
Fault in test specification
Specification/design fault
Code fault
This organization had been working on process improvement concerning their
fault management routines for some time, and some changes in the way faults were
reported and handled were on the verge of being introduced when our first quantitative
study started. They had an existing fault categorization scheme, although this was
mostly focused on issues concerning the test environment, and did not capture the
semantics of specification, design and coding faults in a detailed manner. This existing
scheme is shown in Table 2. The feedback they received from our study (Børretzen and
Dyre-Hansen, 2007) prompted some further proposals for change, which were to be
implemented in a pilot project. The organization used a common process for reporting
and managing faults, with minor customizations for individual projects. Test leaders
were responsible for dealing with fault reports and reporting the faults back to
developers for fixing.
2.4 The Grounded Theory Approach
Grounded Theory is a systematic research methodology originating from the social sciences, developed by the sociologists Barney Glaser and Anselm Strauss, which emphasizes the generation of theory from data. When the principles of grounded theory are followed, a
researcher using this approach will formulate a theory about the phenomena they are
studying that can be evaluated. The grounded theory approach is a “qualitative research
method that uses a systematic set of procedures to develop an inductively derived
grounded theory about a phenomenon” (Strauss and Corbin, 1998). The methodology is
designed to help researchers produce “conceptually dense” theories that consist of
relationships among concepts representing “patterns of action and interaction between
and among various types of social units” (Strauss and Corbin, 1998). Potential sources
of data for developing grounded theory include interviews, field observations,
documents, and videotapes.
At the heart of the grounded theory methodology are three coding procedures: open coding, axial coding, and selective coding (Strauss and Corbin, 1998). These
codes are generated and validated using the constant comparison method, and coding, at
each stage, terminates when theoretical saturation is achieved with no further codes or
relationships among codes emerging from the data.
Open coding involves immersion in the data and generation of concepts with
dimensionalized properties using constant comparison. This is done by “breaking down,
examining, comparing, conceptualizing, and categorizing data”, often in terms of
properties and dimensions (Strauss and Corbin, 1998). The examination of data in order
to fracture it and generate codes could proceed “line by line”, by sentence or paragraph,
or by a holistic analysis of an entire document. The open coding process, while
procedurally guided and promoting a realist ontology, requires researchers to “include
the perspectives and voices of the people” whom they study. Data, for open coding, is
selected using a form of theoretical sampling known as “open sampling.” Open
sampling involves identifying situations or portions of the transcripts that lead to greater
understanding of categories and their properties.
Axial coding refers to the analytic activity for "making connections between a
category and its sub-categories" developed during open coding, that is, reassembling
fractured data by utilizing "a coding paradigm involving conditions, context,
action/interactional strategies and consequences" (Strauss and Corbin, 1998).
Selective coding involves the identification of the “core category” (central
phenomenon that needs to be theorized about) and linking the different categories to the
core category using the paradigm model (consisting of conditions, context, strategies,
and consequences). Often, this integration takes the shape of a process model with the
linking of action/interactional sequences.
Although Grounded Theory is not the most common research method in computer science, several studies show that this way of building theory and drawing conclusions from qualitative data is highly applicable in this field as well as in the social sciences (Bryant, 2002; Hansen and Kautz, 2005; Sarker et al., 2000).
3. Research design
As this study is based on qualitative methods, we had not initially formulated any rigorous research questions before starting the study, only an interview guide. As the interview guide was prepared, though, we could put the different questions into related groups that
would help answer some common questions. Interviews were carried out based on the
interview guide, and the transcribed interview responses were coded and analyzed using
the Grounded Theory method.
3.1 Research Questions
This investigation is based on the results we got from a previous study on fault reports.
The main research questions for this study, derived from the researchers’ viewpoint after the quantitative study, are the following:
Firstly, we wished to hear if the experience of the practitioners involved in the projects
we had analyzed was similar to the analysis results we had found in previous studies.
RQ1: How can the large number of faults originating in early development phases
which was found in the quantitative study be explained?
Secondly, we wanted to draw on their experience to hear if they thought a fault (type)
classification scheme could be helpful towards improving their development processes.
RQ2: Can the introduction of a fault classification scheme like ODC be useful to
improve development processes?
We also wanted to hear their opinions on increasing effort in data collection and fault
report analysis in order to improve their software development processes.
RQ3: Do they see feedback from fault report analysis as a useful software process
improvement tool?
Lastly, we wanted to ask them where they thought there was most potential for improvement in their fault management system, to elicit areas that they felt were lacking
in their current fault reporting process.
RQ4: Do they see any potential improvement areas in their fault management system?
We designed a semi-structured interview using an interview guide containing seven
topics and incorporating 32 questions.
3.2 Research Procedure
We started by defining our research goals and questions, by drawing on the conclusions
from our previous quantitative study in the organization. We proceeded to design the
interview guide, with adjustment of research questions accordingly. Figure 1 shows the
main structure of our research procedure.
Figure 1. Research procedure: interview guide and research questions → conducting interviews → transcribing interviews and data coding → presentation of results → workshops to get developer feedback.
Interviews
The data used to answer our research questions were collected
through expert interviews. The interviews were conducted by the author, a PhD-student,
using an interview guide and a digital voice recorder. The interviews were subsequently
transcribed and coded by the same person.
When selecting the interviewees, we wanted to have individuals who had been
actively involved in some of the five projects we had studied in this organization before
and who also had hands-on experience from dealing with fault management in these
projects. From these criteria, we contacted project managers and test managers from the
five projects we had studied in conjunction with our contact person in the organization.
The outcome was interviews with three persons from projects we had studied, in
addition to one other person who had worked in a similar project.
The interviews were conducted as open-ended, but structured interviews. The
same questions were asked in every interview, but the interviewees were given a lot of
room to talk about what they felt was important within the topic of the question.
Interview data analysis
Each question in the interview guide was related to one or more research questions, and
the different responses for each question were compared to extract answers related to
the research question. Grouping the answers related to each research question, we could
extract information helping to answer our four research questions. Following this, the
answers to each question were coded according to Grounded Theory in order to be able
to separate different views on the questions. In line with using the constant comparison
method, we coded each answer into groups. The codes were postformed, i.e. constructed
as a part of the coding process, since the interviews were open and we had no preconceived expectations of how the interviewees would answer. After one round of coding had been
performed, we went through the data once more in order to make sure that the responses
grouped together actually said the same things. As Seaman (1999) states, the work of
finding patterns and trends is largely creative, but as most of the responses in the
interviews were rather direct, drawing general conclusions from interview responses
was not difficult.
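The mechanical part of this grouping step, placing coded responses under a common code so they can be re-read and compared, can be sketched as follows. The interviewee labels and codes are invented for illustration:

```python
from collections import defaultdict

# Hypothetical coded interview fragments: (interviewee, code) pairs.
# The codes were postformed, i.e. constructed during analysis rather
# than defined in advance.
coded_answers = [
    ("A", "poor specification"), ("B", "poor specification"),
    ("C", "missing guidelines"), ("A", "missing guidelines"),
    ("D", "poor specification"),
]

# Constant comparison in miniature: collect the fragments assigned to
# each code, so that responses grouped together can be checked to
# confirm they actually say the same thing.
groups = defaultdict(list)
for interviewee, code in coded_answers:
    groups[code].append(interviewee)
```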
Workshops
In addition, we later received feedback about the topic at hand through discussions and
comments during two workshops that were held in the organization in conjunction with
the fault report study and interviews. The participants of these workshops had the same
job description and responsibilities as the participants in the interviews, and all of them
worked on similar business-critical application development projects. These comments
and discussions were not formally recorded by voice recorder, but notes were taken as the workshops proceeded. This information was used to clarify and support the findings in
the interviews.
3.3 Research Execution
The organization under study is a large Norwegian software development organization
that develops and maintains applications in several business-critical domains. The
interviewees had worked in projects which had been developed for external customers.
The software systems developed in these projects are all on-line systems of a business-critical nature, and they have all been put into full or partial production. The
organization used a commercially available and common fault management tool,
Mercury Quality Center, and had developed a fault report template that was used in all projects, with minor amendments as needed by the projects.
Interviews were conducted with four representatives from the organization from which
we had studied five development projects with respect to fault reports (Børretzen and
Dyre-Hansen, 2007). Two of the interviewees had worked as test managers in the
projects we had studied, and one interviewee had worked as a test manager in another,
but similar project in the organization. One interviewee had been the project manager
for one of the studied projects.
Prior to the interview, the interviewees had been given information about our
fault report study and our intentions with the interview. On the day of the interview, we
had a presentation about the findings in our study and our interpretations of these
results.
There were four separate interviews, one with each interviewee, held at two
separate locations. The duration of the interviews was from 40 to 55 minutes. The
interviews were subsequently transcribed, coded and analyzed using the constant
comparison method (Seaman, 1999), and following the grounded theory method as
described in (Strauss and Corbin, 1998). The interview guide consisted of 32 questions
from seven categories as shown in Table 3.
Table 3. Interview categories
Interviewee background
Results of the previous study of the same organization
The organization’s own measurements and analysis on faults
The existing quality system and fault management system used in the organization
Fault reporting and fault categorization
Feedback on fault reporting to developers
Attitude to process changes and quality improvement initiatives in the organization
In addition to the interviews, we received feedback from participants during two workshops on fault reporting and fault categorization that we had organized.
The workshop participants were all test managers from the same organization where we
had conducted the fault report study. The three test managers that had been interviewed
also participated in these workshops. As the discussions and feedback from these
workshops were very fruitful, we decided to include some of this elicited information in
this study, to augment the knowledge we got from the interviews. The workshops also
worked to confirm much of what had been talked about in the interviews.
4. Results
This section presents the findings in our study, together with the augmenting feedback
we received from the workshop we arranged with the organization.
4.1 Interview response
The following presents a summary of how the interviewees responded to the interview
questions related to each research question. This is based on the coded interview
answers, and using the grounded theory method we can formulate some general findings
for each of the research questions we had defined.
RQ1: How can the large number of faults originating in early process phases which was
found in the quantitative study be explained?
In answering RQ1, there was consensus that work processes in early stages of
development, i.e. specification and design, should be improved. Many of the faults were
caused by poor specifications or design, or difficult access to specifications and design.
Lack of information forwarded from analysts and designers to the developers was often
cited as a cause of faults. Especially for more complex systems there was a need for
more effort in early development phases. Poor guidelines for developers were also
mentioned as a probable cause for the high number of faults related to specification and
design. The interviewees agreed that the general results from the study of the five
projects were also relevant for their individual projects. The findings in our previous quantitative fault analysis work supported the interviewees’ suspicion that the work
processes in early development were not optimal and that introducing fault analysis that
could help pinpoint fault origins would be very useful.
RQ2: Can the introduction of a fault classification scheme like ODC be useful to
improve development processes?
When answering questions related to RQ2, the response was that introducing a fault
categorization scheme like ODC would be a good idea, given that the introduction of a
new scheme was performed in steps and with the cooperation of everyone involved.
They felt that by introducing a reporting scheme like this, it would be easier to
document how development processes needed improvement, since the different fault
types are quite distinct and descriptive. They were not particularly pleased with their
current fault categorization scheme, which was not very detailed except for faults
related to test environment. Introducing a new scheme was seen as an easy technical
task, as they used a very flexible fault management system.
RQ3: Do they see feedback from fault report analysis as a useful process improvement
tool?
In answers to questions on RQ3, there was strong agreement that fault management could and should include steps to improve process and product quality. There had been scattered attempts at fault analysis in the organization, but the interviewees did not believe the correct metrics had been used to exploit it fully. Using fault report analysis more actively, with
more descriptive fault reports was seen as a very useful tool, but they also warned that
the concept would have to be introduced to the developers who were going to use and
be affected by this in the right manner, especially since more detailed fault reporting
could lay developers more open to “blame” for faults that had been introduced in
development.
RQ4: Do they see any potential improvement areas in their fault management system?
The response to the questions related to RQ4 indicated that information flow could be
improved between testers discovering faults and developers who were fixing faults.
Although all respondents initially said they were pleased with the fault reporting
scheme in terms of what information was entered into it, they had some comments on
expansion of the current scheme. The potential improvements that were mentioned were
better registration of effort (hours) used in fault discovery and correction tasks, and
better registration of fault location on component or code level.
4.2 Feedback and experience from workshops
In addition to the responses and results we got from the interviews, we would also like to
include some of the information and feedback we received on this topic when we
arranged a workshop on the topics of fault reporting and fault categorization for
representatives of the studied organization. These representatives were only shown the
results from our quantitative study, not any information from the interviews, although
some of the participants in the workshop had also been involved in the interview
sessions.
Related to RQ1, on the large number of faults related to early process phases, the
general reaction was again that in their experience, a documented development process
was lacking, and there were clear indications that improving specification and design
processes and work would be a positive move. The major complaints were that design
phases were too hasty, there were not enough reviews and documentation was not good
enough. Discussions around topics for RQ2 told us that most of the people at the
workshop could see the need for better fault classification. Still, several people were
skeptical of introducing an all new fault taxonomy without involving the people who
were going to actually use it for classifying faults found. On the topic of feedback from
fault report analysis as a process improvement tool (RQ3), the general consensus was
that it could be very useful, and that the completed quantitative study performed showed
that it had been useful already. As for potential improvement of their fault management
system (RQ4), they seemed to agree that the actual system in use was sufficient, but that
the information put into the system should be improved.
5. Discussion
This section first discusses some issues concerning the results in view of our research
questions, the study’s validity, and relevance for the company.
In RQ1, we wished to hear if the experience of the practitioners involved in the projects
we had analyzed, was similar to the analysis results we had found in previous studies.
They generally agreed that the results from the quantitative study were valid and
seemed relevant for their individual projects as well. Results also showed that
experience in the organization was in line with our previous findings, that there were
weaknesses in the early development phases, specification and design, that were origins
for faults being introduced in the software. This is a similar conclusion as presented by
Vinter and Lauesen (2000).
In RQ2, we wanted to draw on their experience to hear if they thought a fault type
classification scheme could be helpful towards improving their development processes.
The response was that introducing a better and more structured fault classification
scheme would be a good idea, as long as it was a scheme that was clearly defined and
usable for the developers. El Emam and Wieczorek (1998) claim that as long as fault type classes are well understood, developers are able to correctly categorize faults into fault types.
In RQ3, we also wanted to hear their opinions on increasing effort in data collection
and fault report analysis in order to improve their software development processes. The
respondents were all very positive about introducing fault report analysis for process improvement into their fault management process. By using fault report analysis more actively, they would be able to produce better metrics for process improvement. As most software developing organizations produce large quantities of information about the faults in the systems they develop, it makes sense to utilize this information beyond its simplest and most obvious use as a pure fault reporting and correction log.
Lastly in RQ4, we wanted to ask where they thought there was most potential for improvement in their fault management system. This was to elicit areas that they felt
were lacking in their current fault reporting process. The answers indicated that
information flow between testers and developers should be improved, and there were
areas of improvement on the information stored in fault reports. This is an issue we
have touched upon in our previous studies (Børretzen and Conradi, 2006; Børretzen and
Dyre-Hansen, 2007).
5.1 Validity
Concerning the validity of this study, we have to consider internal validity (i.e., the credibility, believability and plausibility of findings and results) and external validity (i.e., the generalizability or applicability of the study's findings, results and conclusions to other circumstances). We also touch upon an issue concerning construct validity.
The main internal validity threat we have identified is that the interview,
transcribing and information coding were all performed by the same person. This may
introduce bias to how certain responses have been interpreted. The reason for having
just one person performing all the tasks in the study has been due to resource
limitations. By using feedback from the workshops to augment the interview responses,
we feel that the potential bias has been reduced. In addition, we had a dialogue with
some of the interviewees to clear up some questions we had after the transcription of the
interview recordings. Also, as Sarker et al. (2000) state, “grounded theory coding and
sampling must never be delegated to hired assistants, but must be done by the
researchers who have a stake in the theory emerging from the project.”
The main threat to external validity is that interviews have all been carried out
with interviewees from the same organization. This means that we have only
investigated the opinions of people using the same form of fault management and
reporting system. The reason for this was that we wanted to base our interviews on
information gathered in the previous quantitative study, and we therefore had to
interview the people who had been involved in the actual projects.
As for construct validity, one threat was in the truthfulness of the responses in
the interviews. The topics of faults in software and fault management are delicate ones,
as it touches upon aspects concerning quality deficiencies in the product an organization
delivers. The interviewees might feel that they were being evaluated, and present a
better picture than what was actually true. Nevertheless, we found the interviewees to be
truthful and open minded, and do not suspect that they were holding information back
or presenting a more attractive situation in their organization than what is actually the
case. A more likely issue could actually be participants being more positive to the ideas
when presented with them in interviews than what they would actually feel about them
in real life.
5.2 Relevance of results for the studied organization
The organization involved in the study was already working to improve its fault management processes, which meant that our suggestions and findings based on its fault report data would be welcome. Our previous quantitative fault report study led to
a specific suggestion of new fault types being introduced into a pilot project, and the
results of these interviews and workshops will most likely be used to fine tune the
selection of fault types used. Compared to their original fault type classification, the
new classification scheme should be better suited for process improvement work.
6. Conclusion and Future Work
We have performed qualitative interviews with representatives of a software developing
organization regarding fault reporting and fault classification. This has expanded our knowledge and builds on results from a quantitative study of five projects in the same
organization. Our main contribution is showing that practitioners are motivated to use
their existing knowledge of software faults in a more extensive manner to improve their
work practices. By triangulation of both qualitative and quantitative methods, we have
increased the validity of our studies. Our main findings are that:
• The interviewees agreed with our conclusions from the previous quantitative
study (Børretzen and Dyre-Hansen, 2007), i.e. that the early phases in their
development process had weaknesses that led to a high number of software
faults from early development phases.
• They also expressed a need for better fault categorization in their fault reports, in
order to analyze previous projects with intention of improving their work
processes.
• The proposed ODC fault types were seen as a useful basis for introducing a
better fault classification scheme, although simplicity was important.
• They were positive to using fault report analysis feedback to improve development processes, although introducing such analysis for regular use would have to be done carefully in the organization.
• Finally, they revealed some areas in their fault reporting scheme that could be improved in order to improve analysis usefulness, for instance including attributes like fault finding and correction effort and component location of fault. The knowledge was present; it was just not recorded formally.
In terms of future work, we would like to perform a second series of interviews
in the organization after the new fault categorization scheme has been in use for some
time. Through this we would be able to ascertain how this initiative has worked in the
organization, and how it influences their project analyses and development process. We
would also like to expand the generalizability of the study by including other software
developing organizations using similar fault management processes.
Acknowledgements
The author would like to thank Reidar Conradi for careful reviewing and valuable input.
We also thank the organization involved for their participation and cooperation during
the study.
References
Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C., 2004. Basic Concepts and Taxonomy of
Dependable and Secure Computing. In: IEEE Transactions on Dependable and Secure
Computing. (1)1, January-March 2004.
Biffl, S., Aurum, A., Boehm, B., Erdogmus, H., Grünbacher, P., 2006. Value-Based Software
Engineering, Springer, Berlin Heidelberg.
Bryant, A., 2002. Grounding systems research: re-establishing grounded theory. Proceedings of
the 35th Annual Hawaii International Conference on System Sciences, (HICSS’02). IEEE
Computer Society, pp. 3446-3455, Big Island, Hawaii, 7-10 January 2002.
Børretzen, J.A., Stålhane, T., Lauritsen, T., Myhrer, P.T., 2004. Safety activities during early
software project phases. Proceedings of the Norwegian Informatics Conference, pp. 180-191, Stavanger, Norway.
Børretzen, J.A., Conradi, R., 2006. Results and Experiences From an Empirical Study of Fault
Reports in Industrial Projects. Proceedings of the 7th International Conference on Product
Focused Software Process Improvement (PROFES'2006), pp. 389-394, Amsterdam, 12-14 June 2006.
Børretzen, J.A., Dyre-Hansen, J., 2007. Investigating the Software Fault Profile of Industrial
Projects to Determine Process Improvement Areas: An Empirical Study, Proceedings of
the European Systems & Software Process Improvement and Innovation Conference
2007 (EuroSPI07), pp. 212-223, Potsdam, Germany, 26-28 September 2007.
Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.Y., 1992. Orthogonal defect classification - a concept for in-process measurements. IEEE
Transactions on Software Engineering, (18)11, pp. 943 – 956, Nov. 1992.
El Emam, K., Wieczorek, I., 1998. The repeatability of code defect classifications. Proceedings
of The Ninth International Symposium on Software Reliability Engineering, pp. 322-333,
4-7 November 1998.
EVISOFT, 2006. User-driven R&D project on SPI. Available at:
http://www.idi.ntnu.no/grupper/su/evisoft.html.
Grady, R., 1992. Practical Software Metrics for Project Management and Process Improvement,
Prentice Hall.
Hansen, B.H., Kautz, K., 2005. Grounded Theory Applied - Studying Information Systems
Development Methodologies in Practice. Proceedings of the 38th Annual Hawaii
International Conference on System Sciences (HICSS'05), IEEE Computer Society, 10 p.,
Big Island, Hawaii, January 3-6 2005.
Hove, S.E., Anda, B., 2005. Experiences from conducting semi-structured interviews in
empirical software engineering research. 11th IEEE International Symposium on
Software Metrics, 10 pages, 19-22 Sept. 2005.
Leveson, N., 1995. Safeware: System safety and computers, Addison-Wesley, Boston.
Mohagheghi, P., Conradi, R., Børretzen, J.A., 2006. Revisiting the Problem of Using Problem
Reports for Quality Assessment. Proceedings of the 4th Workshop on Software Quality,
held at ICSE'06, pp. 45-50, Shanghai, 21 May 2006.
Sarker, S., Lau, F., Sahay, S., 2000. Building an inductive theory of collaboration in virtual
teams: an adapted grounded theory approach. Proceedings of the 33rd Annual Hawaii
International Conference on System Sciences, pp. 1-10, Hawaii, 4-7 Jan. 2000.
Seaman, C.B., 1999. Qualitative Methods in Empirical Studies of Software Engineering. IEEE
Transactions on Software Engineering, (25)4, pp. 557-572, July 1999.
Strauss, A., Corbin, J., 1998. Basics of Qualitative Research, Sage Publications, London, UK.
Vinter, O., Lauesen, S., 2000. Analyzing Requirements Bugs. Software Testing & Quality
Engineering Magazine, Vol. 2-6, Nov/Dec 2000.
P7. Using Hazard Identification to Identify Potential Software
Faults: A Proposed Method and Case Study
Jon Arvid Børretzen
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU),
NO-7491 Trondheim, Norway
[email protected]
Abstract
When designing a business-critical software system, early analysis with correction of
software faults and hazards (commonly called anomalies) may improve the system’s
reliability and safety, respectively. We wanted to investigate if safety hazards, identified
by Preliminary Hazard Analysis, could also be related to the actual system faults that
had been discovered and documented in existing fault reports from testing and field use.
A research method for this is the main contribution of this paper. For validation, a
small web-based database for management of student theses was studied, using both
Preliminary Hazard Analysis and analysis of fault reports. Our findings showed that
Preliminary Hazard Analysis was suited to find potential specification and design faults
in software.
1. Introduction
When developing a critical software system, much effort is put into ensuring that the
system will have as few critical anomalies (faults and hazards) as possible in the context
of its environment and modes of operation. Despite this effort, critical systems still fail
due to software faults, which reduces reliability and possibly safety. A goal for the
research community is to develop and introduce processes and techniques to reduce the
number of critical faults and hazards in software. In this paper we present a novel
method, results and lessons learned in a study where we compared the findings from
Preliminary Hazard Analysis (PHA) with findings by traditional analysis of system
testing and field use fault reports, both applied to the same system. PHA is a review
technique for safety-critical systems, and is used in the early stages of development.
This paper is organized as follows. Section 2 gives our motivation and related work.
Section 3 describes the research method, research questions and procedure. Section 4
presents the proposed method, the results from the hazard analysis and the fault report
analysis. Section 5 presents the interpretation of these results and evaluation of the
work. The conclusion and further work is presented in Section 6.
2. Motivation and Background
When proposing a method in the border area between reliability and safety, we need
to clarify some of the terminology. A fault is an incorrect part of the system (program,
hardware, even “data”), i.e. where possible later execution will violate stated
requirements and cause a system failure (reliability dimension). A hazard is a state or
set of conditions of a system or an object that, together with other conditions in the
environment of the system or object, may lead to an accident (safety dimension) [1]. In
this paper, we will investigate whether the hazard analysis technique PHA can help us
to identify not only hazards, but also faults and in turn failures, thereby reducing the
loss of reliability stemming from these failures.
Since hazard analysis, e.g. by PHA, is typically performed in the earlier phases of
system development, we were motivated to investigate whether the PHA technique can
also be used to reveal faults early in the development process of a given system. This
paper describes a method and study where we analyzed fault reports from system testing
and field use and compared them with results from hazard analysis of the same software
system. In doing this we can compare the results from the PHA and the analysis of fault
reports, to see if some faults could potentially have been identified and removed earlier.
2.1 State-of-the-art
Measuring quality and effects on quality in a software system is not a trivial matter.
One of the means Avizienis et al. suggest for attaining dependability in a system is fault
removal and fault prevention in order to reduce the number and severity of faults [2]. By
identifying common fault types, developers can reduce the number of critical faults by
focusing their efforts on preventing such faults. Also, identifying the most severe fault
types enables developers to focus on preventing those faults that have the biggest
detrimental impact on the system. This concurs with Boehm’s concept of “value-based”
software engineering and value-based testing, as presented in [3, 4]. Fault report
analysis can thereby be of help in identifying the most important fault types, in order to
focus quality improvement work on these in later projects.
Basili et al. have presented several experiments where inspection techniques are
compared to testing, for example in [5]. In a related article, Shull et al. present
Perspective-Based Reading and how this technique can improve requirements
inspections [6]. Wagner has made a survey of the quality economics of defect-detection
techniques in [7], where he presents some numbers on the costs of removing faults at
different stages of development. In [8], Ciolkowski et al. state that the software review
is a popular quality assurance method, and presents a survey concluding that reviews
should be integrated in the development process, performed systematically rather than
ad hoc, and be optimized for their target system.
The PHA method is used in the early life cycle stages to identify critical system
functions and broad system hazards. The identified hazards are assessed and prioritized,
and safety design criteria and requirements are identified. As Rausand states, a PHA is
started early in the concept exploration phase, so that safety considerations are included
in trade-off studies and design alternatives [9]. This process is iterative, with the PHA
being updated as more information about the design is obtained and as changes are
being made. PHA is a relatively light-weight method; its information requirements are
low, as high-level documentation like a concept description and requirements
specification is sufficient for an early PHA analysis. The method is also not very
training-intensive, and practitioners can start using it fairly quickly. The PHA sessions are performed
as semi-structured brainstorming using the available documentation as source of
information. The results are sets of PHA sheets containing the identified hazards and
further information about the hazards, e.g. the cause and effect of the hazard and also
proposed measures for removing the hazard. This serves as a baseline for later analysis
and is used in developing system safety requirements and in the preparation of
performance and design specifications. Since PHA starts at the concept formation stage
of a project, little detail is available, and the assessments of hazard and risk levels are
therefore qualitative. A PHA should be performed by a small group with some
knowledge about the system requirements [10]. PHA is usually performed in order to
identify system hazards, translate system hazards into high-level system safety design
constraints, assess hazards if necessary, and establish a hazard log. These system
hazards are not equivalent to faults or failures. Failures (incorrect behaviour vs.
requirement specifications) may contribute to hazards, but hazards are system states that,
combined with certain environmental conditions, may cause accidents regardless of whether
requirement specifications are violated.
More commonly used alternatives than the PHA method are different inspection
techniques for specification, design and code. Table 1 shows in which development
phases the different techniques are used to identify faults.
Table 1. Different techniques identify faults in different development phases.
Development phase | Inspections | PHA | Program execution
Requirements | ● | ● |
Design | ● | ● |
Coding | ● | (●) |
Testing | | | ●
Field use | | | ●
2.2 The DAIM context
DAIM is a web-based database for delivery and processing of academic master
theses. It is a small system developed internally at the Department of Computer
and Information Science at the Norwegian University of Science and Technology
(NTNU). The development process was small-scale, with a strong user orientation. The
specification and design process involved system users and administrators, and used
interviews and paper prototyping to produce specification and design documents. The
implementation was carried out by a small team, and consists mainly of a database
implementation and a PHP-based web presentation application.
The system description contains 14 distinct use cases, with description of
functionality for the different user types.
3. Research Method
This work proposes a method which combines two different analysis techniques,
where PHA is applied in early stages of software development, and testing or field use
with fault analysis is performed late in the development process, typically after system
testing or when the system is put in production. By comparing the results from a PHA
performed on available documentation of system concepts and specifications, with the
results from analysis of fault reports from late testing and field use, we want to
investigate how the PHA helps us in identifying hazards that are relevant to faults
actually found in the system.
The dotted lines in Figure 1 show the common view of how faults are related to
reliability and how hazards are related to safety; the full line shows our proposal that
findings from hazard analysis may also be related to reliability.
[Diagram: “Faults” linked to “Reliability” and “Hazards” linked to “Safety” by dotted lines; “Hazards” linked to “Reliability” by a full line]
Figure 1. Linking hazards to reliability.
This results in a method as described in Section 4.1, of using safety reviews (like
PHA) on requirements and design documentation to not only find hazards but also find
faults. The converse could also be considered, using reliability reviews and inspections
of requirements, design and code documents to not only find faults but also hazards. In
Figure 1, this would have been represented by an arrow linking faults to safety.
3.1 Research questions
The research questions we wanted to explore in this study were the following:
RQ1: What kind of faults in terms of Orthogonal Defect Classification (ODC) fault
types does the PHA technique help elicit?
RQ2: How does the distribution of fault types found in the fault analysis compare to
the one found in the PHA?
RQ3: Does the PHA technique identify potential hazards that also actually appear as
faults in the software?
3.2 Hazard analysis by PHA
The hazard analysis was to be performed prior to studying the fault reports, so that
we would not be influenced by the faults that had actually been reported. This is also the
order the analyses would follow in a practical project; the hazard analysis would be performed at
an early stage of development, while the fault report analysis would be performed at the
very end of the development process.
To be able to compare the results from fault report analysis with those from hazard
analysis, we assigned one or more fault types to each of the hazards identified in the
PHA. We had to assign several fault types for some hazards that were somewhat generic
in nature and which could correspond to several different fault types. Some hazards
were not possible to relate to a fault type, for instance hazards related to human error or
manual routines not directly related to the software under study. These were then
classed as “Not fault” in accordance with our classification scheme.
3.3 Fault analysis
The fault analysis was based on Orthogonal Defect Classification (ODC) [11, 12],
and was performed by categorizing faults into fault types based on that technique. Table
2 shows the fault types that were used. The reason for using this categorization scheme
was that we had already performed fault analysis of several projects previously using
this scheme. We were therefore accustomed to using these fault types and felt they
worked well. In addition to the actual fault types, we added two categories: “Unknown”,
which was used for faults that we could not classify with certainty into one of the
fault types, and “Not fault”, which was used when a reported fault was a false positive,
i.e. reported as a fault, but not actually a fault vs. the system specifications.
Table 2. Fault types used in fault analysis.
Fault types: Algorithm, Assignment, Checking, Data, Documentation, Environment,
Function, GUI, Interface, Relationship, Timing/serialization, Unknown, Not fault
One property of the ODC fault types is that they can be associated with different
process phases, as stated by Chillarege et al. in [11] and also by Zheng et al. in [13].
Table 3 shows the associations as presented in [13]. This division of fault types into
process phases cannot be considered unassailable, but it gives a good indication
of where a fault of a certain type is most likely to have originated.
3.4 Research execution
This PHA was performed not as a part of analyzing specifications, but rather after the
DAIM system had been developed and been put in use for some time. Usually a PHA
would be performed much earlier, but we chose to analyze a completed system in order
to compare PHA results with fault analysis results. The hazard analysis was performed
in four sessions, each session concentrating on the use-cases for certain user types.
These sessions were attended by five to six persons, of which one participant was a
system expert, and the others had experience in performing PHA. One person was
responsible for leading the sessions, and one person was scribe, recording the PHA
elicitations to PHA sheets. In total, the PHA sessions consisted of 38 staff-hours of
effort.
The fault analysis of the DAIM system was done by two researchers individually
categorizing the fault reports using a fault categorizing scheme based on that used in the
Orthogonal Defect Classification (ODC) scheme [11, 12]. We used fault descriptions in
the fault reports to categorize the faults into the fault types shown in Table 2.
Afterwards, we compared our categorization results and came to a consensus on the
reports where our initial categorization was dissimilar.
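A minimal sketch of this two-rater consensus step, with hypothetical report IDs and categorizations (not data from the study), could look like this:

```python
# Hypothetical sketch: each rater independently assigns an ODC fault type
# per fault report; disagreements are flagged for a consensus discussion.
rater_a = {"FR-1": "Function", "FR-2": "GUI", "FR-3": "Assignment"}
rater_b = {"FR-1": "Function", "FR-2": "Data", "FR-3": "Assignment"}

# Reports where the initial categorizations differ are resolved by discussion.
disagreements = sorted(r for r in rater_a if rater_a[r] != rater_b[r])
print(disagreements)  # ['FR-2']
```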
Table 3. ODC fault types and development process phase associations.
Process phase association | Fault types
Design | Function
Low Level Design | Interface, Checking, Timing/Serialization, Algorithm
Code | Checking, Assignment
Library Tools | Relationship
Publications | Documentation
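As an illustrative sketch (not part of the study), the associations in Table 3 can be captured as a lookup that returns the candidate origin phases for a given ODC fault type; note that “Checking” is associated with two phases:

```python
# Sketch: inverting Table 3 to look up the likely origin phase(s)
# of an ODC fault type. Names follow Table 3 of this paper.
PHASE_TO_TYPES = {
    "Design": {"Function"},
    "Low Level Design": {"Interface", "Checking", "Timing/Serialization", "Algorithm"},
    "Code": {"Checking", "Assignment"},
    "Library Tools": {"Relationship"},
    "Publications": {"Documentation"},
}

def likely_origin_phases(fault_type):
    """Return the set of process phases associated with a fault type."""
    return {phase for phase, types in PHASE_TO_TYPES.items() if fault_type in types}

print(likely_origin_phases("Function"))          # {'Design'}
print(sorted(likely_origin_phases("Checking")))  # ['Code', 'Low Level Design']
```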
4. Results
The results are presented in the form of a description of the method we used for
evaluating the use of PHA to identify faults (Section 4.1), followed by the results of
this evaluation (Sections 4.2–4.5).
4.1 Method description: Using PHA to identify faults
1) Initially, we define and delimit the system to be studied. The information required is
the same as for a PHA: a system description plus requirements and design
documentation such as use-cases, high-level class diagrams, or similar documentation. It is
important to make clear the system context and the roles of the members of the PHA
group: Are they to be independent of development, are they part of the development
team, are they domain experts?
2) Executing the PHA session(s). This involves making sure the group understands
the use of the PHA technique and that they have some knowledge of the system to be
analyzed, like its main functionality and the actors involved in system use. This group
meets and performs a systematic walkthrough of the available use-cases and system
descriptions to identify possible hazards. These are decided upon through discussion
and consensus and recorded in PHA tables, an example of which is shown in Table 4.
3) Next, the resulting hazards are considered in terms of which fault types they
potentially may cause. This is not necessarily a one-to-one relation; a hazard can be the
potential origin of several faults. The fault types used should be the same as those used
in the categorization task in step 4.
4) A collection of fault reports (from testing and field use) is compiled for the
system. If the fault reports are not already categorized, the faults are categorized by
using the same fault type categorization scheme as in step 3. This categorization should
be performed by persons that understand the fault type categories well. This also
requires that the fault reports are descriptive enough to be properly categorized.
5) Finally, we perform a comparison of the fault reports with the possible faults from
the PHA session, helped by the categorization of faults.
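As a minimal sketch (with invented data, not results from the study), steps 3–5 can be expressed as comparing the multiset of fault types assigned to hazards with the multiset of fault types observed in the fault reports:

```python
# Hypothetical sketch of steps 3-5: fault types assigned to PHA hazards
# (step 3) are compared with fault types from categorized fault reports
# (step 4). The lists below are invented for illustration only.
from collections import Counter

hazard_fault_types = ["Function", "Function", "Checking", "Not fault"]  # step 3
report_fault_types = ["Function", "Assignment", "Assignment", "GUI"]    # step 4

# Step 5: which fault types were both predicted by the PHA and observed?
predicted = Counter(t for t in hazard_fault_types if t != "Not fault")
observed = Counter(t for t in report_fault_types if t != "Not fault")
overlap = predicted & observed  # multiset intersection (minimum counts)

print(dict(overlap))  # {'Function': 1}
```

In the actual study this comparison was done manually, report by report; the multiset intersection only illustrates the type-level part of the matching.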
We sum up some attributes for this method in Table 4, which is based on a similar
description of review methods by Laitenberger et al. in [14].
Table 4. Method attributes.
Process goals | Quality improvement through fault and hazard detection and removal.
Participants | Personnel familiar with the PHA method, with at least high-level system knowledge.
Prescription | Walkthrough of system documentation, like use-cases and high-level design documents.
Preparation for meeting | Participants have studied the system description and documentation, but need not have performed any individual inspection.
Meeting | Group members suggest and discuss system hazards, with or without a designated moderator.
4.2 Hazard analysis (PHA)
The result of the PHA sessions was a set of PHA sheets containing the potential hazards the
group had elicited from the system description and documentation. Table 5 shows two
short examples of hazard results from the PHA sessions.
In total, the PHA identified 33 hazards in the DAIM system. By assigning fault types
to the hazards, with some hazards potentially causing several types of faults, we
identified 43 potential faults in total. Six of these were later classified as
“Not fault”, bringing the number of identified potential faults to 37. Figure 3
shows the distribution of hazards represented as fault types.
Table 5. PHA Sheet example.
Actor: Student

Hazard description | Cause | Effect | Barriers/measures
Unauthorized access | Illegal username in DB | Data destroyed | System feedback
No access | Username missing from DB because of faulty data import | Cannot perform work | Manual control routines
No access | Too strict network policy | Cannot perform work | Use different login system
Figure 3. Distribution of hazards represented as fault types (%).
4.3 Fault analysis
In total, 117 fault reports collected by both human reporting during system testing
and automatic failure log generation during system use were categorized using the ODC
fault types shown in Table 2. Figure 4 shows the distribution of fault types. Of these
117 faults, 25 were categorized as “Not fault”, giving us 92 actual faults found in the
system.
The collected fault reports from the DAIM system were split into two groups: one from system testing, and one from the first months of field use. The two groups had
different distribution of fault types. Figure 5 shows the difference in the distribution of
the faults reported in field use and the faults reported in system testing.
We see that there are certain fault types that were only reported at system test level
and not in field use, such as “documentation”, “function” and “GUI” type faults.
Figure 4. Distribution of fault types (%).
Figure 5. Distribution of fault reports in the two DAIM fault report collections (%).
4.4 Comparing hazards and faults
The method of employing hazard analysis together with analysis of fault reports, as used
here, can be described as a triangulation of techniques to show how hazard analysis can
be used to identify possible faults in software.
Combining the distributions from Figures 3 and 4 in one graph, we get Figure 6,
which shows that the distributions of hazards and faults are quite different from each
other.
Figure 6. Distribution of hazards identified and faults reported (%).
During the PHA, we found 33 hazards, of which 7 were classed as “Not fault”, i.e. these
hazards could not be connected to faults in the software vs. specifications. Of the
remaining 26 hazards, we found 6 hazards that could be linked to the actual faults that
were reported from testing and field use. Table 6 shows a short description of the
hazards and faults that were connected.
In addition to these 6, there were 20 more hazards that could be assigned a fault type,
i.e. they could potentially lead to a fault in the software. These could still exist in the
system but they have not been discovered in testing or field use. In Figure 7 we
illustrate this by adding the numbers to Figure 1. The six faults identified by hazard
analysis are shown in bold, and signify the same faults for each arrow.
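A quick arithmetic check, using only the counts stated in this section, shows how the numbers fit together:

```python
# Counts taken from the text of this section.
hazards_total = 33
hazards_not_fault = 7
hazards_linked_to_reported_faults = 6
faults_reported = 92

hazards_fault_related = hazards_total - hazards_not_fault               # 26
hazards_potential_only = (hazards_fault_related
                          - hazards_linked_to_reported_faults)          # 20
potential_faults_total = faults_reported + hazards_potential_only      # 92 + 20

assert hazards_fault_related == 26
assert hazards_potential_only == 20
assert potential_faults_total == 112  # matches Figure 7
```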
Table 6. Examples of hazards that were linked to faults in fault reports.

Fault type | Hazard description | Fault report
Data | Missing/unauthorized access due to user import failure | Missing username, user import error
Environment | E-mail addresses containing illegal characters | Wrong character set in use
Function | One person in group delivers thesis without consent | E-mail was not sent when group was established
GUI | Group composition difficulties | Missing member of group
GUI | Wrong type of values used in contract | Student used wrong entry fields
Timing/Ser. | Group composition difficulties | No group members assigned to master thesis

[Diagram: “Fault report analysis: 92 faults” and “PHA: 26 hazards leading to faults” both linked to “Potential faults affecting reliability: 112 potential faults” by arrows labelled 86 + 6 and 20 + 6; “PHA” linked to “Potential hazards affecting safety: 33 hazards” by an arrow labelled 20 + 6 + 7]
Figure 7. Linking hazards to faults.
With only 6 of the 92 reported faults also being identified by the PHA analysis, there
are many faults that were not identified by the PHA. With many of the reported faults
being pure coding faults, like those of the “assignment” fault type, this was to be
expected. A typical example of a fault report description of this fault type was “Thesis-ID missing for document”.
4.5 Efficiency of PHA for fault identification
Looking at staff-hours spent per fault identified, we get two figures: one for the faults
that were actually found in system testing and field use, and one for the total number of
potential faults in the system (including the actual faults). Table 7 shows these figures
and compares them with mean numbers on inspection efficiency from [7].
Table 7. Staff-hours spent per fault found.
 | Staff-hours/fault
Reported faults from testing also identified by PHA in this study | 38/6 = 6.33
Potential faults identified by PHA in this study | 38/26 = 1.46
Mean cost of requirements inspection [7] | 1.06
Mean cost of design inspection [7] | 2.31
This result is based on four PHA sessions with five or six participants, but PHA can
also be performed with as few as two participants and still produce good results in the
number of anomalies identified. This would have reduced the ratio of staff-hours per fault considerably.
5. Discussion
5.1 The results in terms of our Research Questions
Our main findings related to RQ1 were that PHA was most useful in eliciting hazards
that were related to “function” faults. These types of faults are related to specification
and design, as shown in Table 3 and stated in [11, 13]. Hazards related to the
“checking” and “algorithm” fault types were also common. Our reasoning about this
result is that when performing a PHA, you are mostly basing your analysis on
documentation and artefacts for the early stages of development. This means that you
are more likely to be able to elicit possible hazards that are related to more general
design and specification. Other types of hazards are found as well, but as the system
details are unclear, it is more difficult in the PHA to specify exactly what can go wrong
technically.
For RQ2, we did not find any correlation between hazards elicited through PHA and
faults found in the fault analysis. As for finding direct matches between PHA findings
and fault analysis, as stated in RQ3, there was a very low match rate. Of the 92 fault
reports, only 6 of them could be said to have been specifically elicited as hazards in the
PHA.
In this instance we believe that an important reason for the lack of match between
elicited hazards and faults reported is the nature of the system under study. Compared to
other systems we have performed fault analyses of, the DAIM system has a very
different fault distribution profile. Earlier, we performed similar analyses of fault
reports, and these had distributions where “function” and “GUI” faults were
the most frequent [15, 16].
5.2 Comparing fault distribution with previous studies
When comparing the fault distribution for DAIM with fault distributions we have
found in previous studies, we see that the distribution for DAIM seems atypical. As an
example, we compare the fault distributions of DAIM and that of another fault report
study where five industrial projects were analyzed from system testing and field use
[16]. Figure 8 shows the difference between the distributions.
Figure 8. Comparison of DAIM fault distribution with previous fault study (%).
This is one example of why we think the DAIM system has an atypical fault
distribution, and this is supported by findings we made in [15] and also by Vinter et
al. in [17]. As we see in Figure 8, the fault types “function” and “GUI”, which were
very numerous for the systems analyzed in [16], are not at all numerous in the fault
reports for the DAIM system. This naturally affects the ability to compare the fault
reports of the DAIM system to the hazards found, where “function” and “GUI” were
more numerous. It might have been more appropriate to perform a “post-mortem” hazard
analysis on the systems studied in [16], as they had a fault profile more similar to the
profile that hazard analyses are likely to produce.
5.3 Method evaluation: Improving specification and design inspection
The comparison of possible hazards identified during PHA sessions and the faults
found after system testing and field use is a novel approach for exploring how the PHA
method can be used for eliciting possible faults in a software system. PHA is a very
light-weight and easy to learn method, which is suited for use in very early phases of
development, as shown in Table 1. If we compare with the Perspective-Based Reading
technique from [6], using PHA will result in inspections where the role of the readers
has an emphasis on system safety.
Compared to the economics of other inspection methods, the results in terms of
efficiency depend on whether one counts the number of faults actually found or the
number of potential faults found. According to Wagner’s literature survey, the mean
inspection efficiency is 1.06 staff-hours per defect found for requirements and 2.31
staff-hours per defect found for design [7]. Our study showed an efficiency of 6.33
staff-hours per defect for actual faults found in fault reports, and 1.46 staff-hours per
defect for potential faults found, as shown in Table 7. As the DAIM system under study had
not been injected with known faults, but was used because it was an accessible real life
system with available documentation and fault reports, it is difficult to say how many
actual faults the system had.
It should also be noted that the fault distribution of the DAIM system was very different from that of several systems in our previous studies. Another remark is that since PHA is a safety review technique, it will also catch potential safety hazards, as the technique was originally meant to do. When using the proposed method one could therefore combine the results for safety review and fault review, which would yield another efficiency measure: staff-hours per caught anomaly (hazards and faults).
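The relationship between these efficiency figures can be checked with a small calculation. Note that the total effort of roughly 38 staff-hours is our own back-of-the-envelope reconstruction from the reported ratios (6 actual faults at 6.33 staff-hours each), not a figure stated directly above.

```python
# Back-of-the-envelope check of the reported PHA inspection efficiencies.
actual_faults = 6          # hazards from PHA later confirmed as faults in fault reports
potential_faults = 20      # additional potential faults identified by PHA

total_effort = actual_faults * 6.33   # ~38 staff-hours, derived from the reported ratio

per_actual = total_effort / actual_faults                        # staff-hours per actual fault
per_anomaly = total_effort / (actual_faults + potential_faults)  # staff-hours per found anomaly

print(round(per_actual, 2), round(per_anomaly, 2))  # 6.33 1.46
```

Counting all 26 anomalies (actual plus potential) is what brings the PHA efficiency close to the 1.06-2.31 staff-hours per defect range reported in Wagner's survey [7].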
5.4 Validity threats
The main validity threats in this study are:
Construct validity:
The main threat to construct validity is the difference between hazards and faults. Hazard analysis and fault report analysis do not produce the same type of reports; by converting the hazards found into fault types we were able to compare the two. As the results show, most of the identified hazards did not show up in the fault reports as actual faults. Still, many of the hazards identified through the PHA would have manifested themselves as faults if, over time and with diverse users, they had in fact occurred in the system. Whether these faults would then have manifested themselves as observable failures in some future execution context is another matter.
Internal validity:
One threat to internal validity is that the fault categorization was performed by us. Since such categorization is a subjective task, this can affect the reliability of the measures. By having two persons categorize independently and then comparing the results, we feel we reduced this threat.
Similarly, PHA sessions are based on subjective views of the description of a system. This threat cannot be circumvented, as the PHA technique is based on personal ideas and collective "brainstorming". The quality of the PHA results is dependent on the participants' experience and knowledge.
Another threat is the issue of unconscious bias or "fishing". In comparing the results from the PHA and the fault analysis we were looking for connections between the results, and this could have led us to find "weak" connections that others would not have found.
The group of hazards that were compared to the reported faults may pose a validity threat in the form of selection. The PHA sessions were time-limited, so only the most obvious hazards were taken into account. Also, the PHA sessions were performed over a period of time, so some maturation, in the form of a better understanding of the actual system, may have occurred.
Another issue here is the time span of the study. It is possible that by studying too
short a time span of fault report collection, some fault types were underreported in the
collected fault reports.
External validity:
In our work we have analyzed data from only one, possibly atypical, software system, DAIM, and this limits our ability to generalize the results. Another issue is the size and simplicity of the system studied, which may be smaller than many other web-based projects of a similar type. The development process of this system has also been rather small-scale, with few people involved in design, implementation and testing. This may influence the distribution of faults found, as the developers have had a less complex system to develop. The reason the DAIM system was chosen for study was that it was a system developed close to us, which gave us considerable freedom with respect to documentation access and the possibility of data collection and clarification with the developers.
6. Conclusion and further work
This paper has presented the description and an implementation of a novel method for identifying software faults using the PHA technique. Because of the nature of the system, the results did not turn out as clearly as we had hoped. The fault reports were few and mostly limited to certain types. On the other hand, we did identify 6 faults that were actually found in the system, as well as 20 potential faults that may be present in the system. The hazard analysis also showed that there is a certain type of fault that analysis techniques such as PHA can help to uncover in an early process phase. Performing the PHA elicited many hazards that could have appeared in the system as "function" faults.
That is, faults which originate from early phases of system development, and are related
to the specification and design of the system. From this we conclude that PHA can be
useful for identifying hazards that are related to faults introduced early in software
development.
As for finding direct ties between hazards found in PHA and faults reported in fault reports, we were not very successful. This, we feel, is mainly due to the studied system's particular fault type profile, which was very different from the fault distribution profiles we had found in earlier studies. Some weak links were found, but the data did not support any systematic links.
The method we have proposed in this paper should be validated by performing similar studies in the future. Because of the circumstances and the type of system analyzed here, interesting further work would be to perform a similar study on a larger system whose fault distribution is more similar to the other systems we have conducted fault report analyses of.
Acknowledgements
The author wishes to thank Professor Reidar Conradi for valuable input and reviewing. I would also like to thank Jostein Dyre-Hansen, Professor Tor Stålhane, Kai Torgeir Dragland, Torgrim Lauritsen and Per Trygve Myhrer for their assistance during the execution of this study.
References
[1] N. Leveson, Safeware: System safety and computers, Addison-Wesley, Boston, 1995.
[2] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr: “Basic Concepts and Taxonomy of
Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure
Computing, (1)1, January-March 2004.
[3] L. Huang, B. Boehm: “How Much Software Quality Investment Is Enough: A Value-Based
Approach”, IEEE Software, (23)5, pp. 88-95, Sept.-Oct. 2006.
[4] S. Biffl, A.Aurum, B. Boehm, H. Erdogmus, P. Grünbacher: Value-Based Software
Engineering, Springer, Berlin Heidelberg, 2006.
[5] Basili, V.R., Selby, R.W.: “Comparing the Effectiveness of Software Testing Strategies”,
IEEE Transactions on Software Engineering, (13)12, pp. 1278 – 1296, Dec. 1987.
[6] Shull, F., Rus, I., Basili, V.: "How Perspective-Based Reading Can Improve Requirements Inspections", IEEE Computer, (33)7, pp. 73-79, July 2000.
[7] Stefan Wagner: "A literature survey of the quality economics of defect-detection techniques", Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering (ISESE'06), Rio de Janeiro, Brazil, September 21-22, 2006.
[8] Ciolkowski, M, Laitenberger, O, Biffl, S.: “Software reviews, the state of the practice”,
IEEE Software, (20)6, pp. 46-51, Nov.-Dec. 2003.
[9] M. Rausand: Risikoanalyse, Tapir Forlag, Trondheim, 1991.
[10] J. A. Børretzen; T. Stålhane; T. Lauritsen; P. T. Myhrer, “Safety activities during early
software project phases,” Proceedings of the Norwegian Informatics Conference, pp. 180-191,
Stavanger, Norway, 2004.
[11] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M.-Y.
Wong: “Orthogonal defect classification-a concept for in-process measurements”, IEEE
Transactions on Software Engineering, (18)11, pp. 943-956, Nov. 1992.
[12] K. El Emam, I. Wieczorek: “The repeatability of code defect classifications”, Proceedings
of The Ninth International Symposium on Software Reliability Engineering, pp. 322-333, 4-7
Nov. 1998.
[13] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J.P. Hudepohl, M.A. Vouk: “On the value
of static analysis for fault detection in software”, IEEE Transactions on Software Engineering,
(32)4, pp. 240-253, April 2006.
[14] Oliver Laitenberger, Sira Vegas, Marcus Ciolkowski: “The State of the Practice of Review
and Inspection Technologies in Germany”, Technical Report ViSEK/010/E, ViSEK, 2002.
[15] J. A. Børretzen, R. Conradi: “Results and Experiences From an Empirical Study of Fault
Reports in Industrial Projects”, Proceedings of the 7th International Conference on Product
Focused Software Process Improvement, pp. 389-394, Amsterdam, 12-14 June 2006.
[16] Jon Arvid Børretzen, Jostein Dyre-Hansen: “Investigating the Software Fault Profile of
Industrial Projects to Determine Process Improvement Areas: An Empirical Study”,
Proceedings of the 14th European Systems & Software Process Improvement and Innovation
Conference, pp. 212-223, Potsdam, Germany, 26-28 Sept. 2007.
[17] O. Vinter, S. Lauesen: “Analyzing Requirements Bugs”, Software Testing & Quality
Engineering Magazine, Vol. 2-6, Nov/Dec 2000.
Technical Report (P8)
Diverse Fault Management – a comment and prestudy of
industrial practice
Jon Arvid Børretzen
Department of Computer and Information Science,
Norwegian University of Science and Technology (NTNU),
NO-7491 Trondheim, Norway
[email protected]
Abstract: This report describes our experiences with fault reports and their processing
from several organizations. Data from investigated projects is presented in order to
show the diversity and at times lack of information in the reports used. Also, we show
that although useful process information is readily available, it is seldom used or
analyzed with process improvement in mind. An important challenge is to describe to
practitioners why a standard description of faults could be advantageous and also to
propose better use of the knowledge gained about faults. The main contribution is to explain why more effort should be put into codifying fault reports, and how this information can be used to improve the software development process.
1. Introduction
In all software development organizations there is a need for some minimum fault
logging and follow-up to respond to faults discovered during development and testing,
as well as claimed fault reports (really failure reports) from customers and field use.
Such reports typically contain fault attributes that are used to describe, classify, analyze,
decide on and correct faults. There are many standards for this kind of information, although the original fault reports are often more ad hoc in character than conforming to a specified standard. A software development organization will, in addition to a fault report scheme, typically have defined its own customized metrics and processes related to fault management.
Systematic fault management is often also motivated by certification efforts, for
instance ISO 9000. Software Process Improvement (SPI) and Quality Assurance (QA)
initiatives can also be a motivation for fault management improvement work.
Despite this, in most organizations there is still much underused or even unused data, either from lack of knowledge about the subject or from lack of procedures to assist in using the available data. As Jørgensen et al. state, "no data is better than unused data" [Jør98]. This is because collecting data that is not used leads to wasted effort during data collection, poor data quality, and possibly even a negative attitude to any kind of measurement during development or other SPI and QA activities.
In this investigative pre-study, we report earlier experience from case studies and data mining in 8 Norwegian IT organizations and an Open Source Software community, where fault data has been under-reported and/or under-analyzed: poor or wrongly coded classification of faults, missing fault information for the affected program module, no effort registration, and so on. There is also the issue of fragmented data representation: partial fault reports, Software Configuration Management logs, or mere comments in code.
This paper describes our experiences with fault reports and fault reporting from working with fault reports from several different organizations. Results from these studies have been published in papers such as [Bor06, Bor07, Con99, Moh04, Moh06], but there is also a need to summarize what we have learned from these studies in a descriptive manner.
In this field of study, different terminology is used in various sources. For this paper, we
use the term fault in the same meaning as bug or defect. That is, a fault is the passive
flaw in the system that could lead to an observable failure (vs. requirements) when
executed. For fault report, other terms that are used are problem report and trouble
report.
2. Metrics
Our studies have given us insight and knowledge about the practice and information
available from fault report repositories in several commercial organizations. This
section presents these organizations and some attributes of these repositories. Such
information gives a quick insight into how fault reporting is performed and what possibilities are available in terms of analysis and process improvement.
Table 1 shows an overview of the 8+1 involved organizations. Because of non-disclosure agreements with some organizations, their identities have been anonymized (O1-O8). We compare with the Open Source organization Gentoo [Gen].
Table 1. Organization information
• Organizations and periods under study: O1 (1993-98), O2 (2000-04), O3 (2004-05), O4 (2004-05), O5 (2004-05), O6 (2004-05), O7 (2006-07), O8 (2007), and the Gentoo community (2004-06).
• Domains: Telecom (two organizations), Finance, Financial, Risk management, Knowledge and process management (two organizations), Security, and Operating system (Gentoo).
• Organization size (not just SW developers): 300; 400; ~150 in Norway; ~320 in Norway; ~500 in Norway; over 500 in Scandinavia; ~3900 in Norway and Sweden; ~7000 worldwide; ~320 active (Gentoo).
• Development languages: SDL and PLEX; Erlang and C; C, COBOL and COBOL II; Java (three organizations); C and C++; several (Gentoo); N/A (one organization).
• Information collected from: 1 project over 3 releases (two organizations); 1 project of 18, 10, 9, 24 and 12 months, respectively; 1 project; 5 projects over 6 months (Gentoo).
• Studied by: Master students; Master and PhD students (two organizations); PhD students; a Post.Doc.
For each of these organizations, we have studied and analyzed fault reports in one or
more development projects. From this, we have selected some relevant fault report
attributes, and report the situation for each of the organizations. The attributes are the
following:
• Fault report description: Whether the initial description of the fault is long or
short, this indicates how well the fault has been described when found.
• Fault severity: How many levels of fault severity does the organization use to
discern their faults?
• Fault type categorization: Does the organization classify faults according to
type?
• Fault location: Does the organization describe where the fault is located, either
structurally (i.e. which component) or functionally (what user function the fault
relates to)?
• Release version of fault: Does the organization register in which release of the
software the fault was found?
• Correction log: Does the organization keep a correction log for each fault,
where developers can enter information relevant to the identification of fault
cause and correction?
• Solution description: Does the organization record what the solution of the
problem was and how the fault was corrected?
• Correction effort: Is information recorded about the effort needed to find and
correct the fault?
• Mandatory completion: Are all fault report entry fields mandatory for
completion?
• Specialized fault report system or change reports: Is the fault reporting
system a separate entity, or is it used in combination with all change reports?
• Standard or custom fault reporting system: Does the organization use a
standard available fault reporting system, or do they use a custom made system?
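As a rough sketch, the attributes above could be collected in a record such as the following; the field names and types are our own illustration, not any organization's actual scheme.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FaultReport:
    """Illustrative fault report record covering the attributes discussed above."""
    description: str                        # initial description, long or short
    severity: int                           # level on the organization's severity scale
    fault_type: Optional[str] = None        # type category, e.g. an ODC-style type
    location: Optional[str] = None          # structural (component) or functional location
    release_version: Optional[str] = None   # release in which the fault was found
    correction_log: list = field(default_factory=list)  # developer notes during correction
    solution: Optional[str] = None          # how the fault was corrected
    correction_effort_hours: Optional[float] = None     # effort to find and correct

# Hypothetical example entry:
report = FaultReport(description="Login fails for empty password",
                     severity=2, fault_type="function", location="auth component")
print(report.severity)  # 2
```

Whether each field is mandatory, and whether such records live in a specialized fault report system or a common change report system, is exactly what varies between the organizations in Table 2.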
These attributes are shown for each organization in Table 2.
Table 2 shows that there is a wide range of information used in the fault reports of these
organizations. For instance, the amount of information recorded differs from well-described faults with a correction log and solution description, to cases where the faults are scantily described and the only information about correction or solution is whether the fault has been solved or not.
Table 2. Fault report attributes for each organization
• O1: long fault report descriptions; 2 severity levels; fault status recorded; no fault type categorization; structural fault location; specialized fault report system; custom tool.
• O2: long descriptions; 2 severity levels; fault status recorded; fault type categorization; structural location (but many misspellings); common change report system; standard tool (ClearCase).
• O3: long descriptions; 3 severity levels; no fault status; no type categorization; functional location; specialized fault report system; custom tool.
• O4: short descriptions; 3 severity levels; fault status recorded; coarse type categorization; functional location; common change report system; custom tool.
• O5: short descriptions; 6 severity levels; fault status recorded; no type categorization; mixed location; common change report system; standard tool (Jira).
• O6: long (mostly) descriptions; 3 severity levels; fault status recorded; no type categorization; structural component location; common change report system; custom tool (SQL and web).
• O7: long descriptions; 5 severity levels; fault status recorded; fault type categorization; functional location; common change report system; standard tool (Mercury Quality Centre).
• O8: long descriptions; 5 severity levels; fault status recorded; no type categorization; structural component location; specialized fault report system; custom tool.
• Gentoo: long descriptions; 7 severity levels; fault status recorded; no type categorization; specialized fault report system; standard tool (Bugzilla).
Whether the release version of the fault, a correction log or description, a solution description, the correction effort and mandatory completion of fault reports are recorded varies across the organizations, from "Yes" through "Partly" and anecdotal descriptions in some correction logs, to "No"; in some cases only a date is recorded, and in a few cases the information is unknown.
3. Process
Using fault report information to support process improvement can be a viable approach to certain parts of software process improvement [Gra92]. Some organizations have used this approach actively, while others have not. For the most part, organizations had done little work in this area until external researchers started studying (by data mining) their fault report repositories. For each organization, we describe the level of fault report use beyond fault correction, i.e. whether any analysis work has been performed by the organization itself, followed by what has been performed by us as researchers. This is shown in Table 3. As we see, as external researchers we have been able to exploit the available data in the companies to a much larger degree than the organizations themselves.
Table 3. Level of fault report work and external research
For each organization, the organization's own work beyond fault correction is given in parentheses, followed by the external research process performed:
• O1 (Marginal): Study of the relation between complexity/modification-rate and number of defects in different development phases, and whether defects found during design inspections can be used to predict defects in the same module in later phases and releases.
• O2 (Planned only): Study of defect-density and stability of software components in the context of reuse.
• O3 (None): Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the software components with numerous faults.
• O4 (Basic analysis): Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults.
• O5 (Basic analysis): Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults.
• O6 (Basic analysis): Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types, the fault types leading to severe faults, and the software components with numerous and severe faults.
• O7 (Some analysis): Study of faults and fault types with the aim of locating potential for general process improvement, as well as identifying the most numerous fault types and the fault types leading to severe faults.
• O8 (N/A): Follow-up study on the organization's attitude to fault type classification.
• O9 (None): N/A
One example of research results is from organization O1, where one conclusion of the work performed by the external researchers was that performing software inspections was cost-effective for that organization. Inspections found about 70% of the recorded defects, but only cost 6-9% of the effort compared with testing. This yielded a saving of 21-34%. In addition, this study showed that the existing inspection practice was based on too short inspections. By increasing the length of inspections, there was a large saving of effort compared to the effort needed in testing. Figure 1 shows that by slowing the inspection rate from 8 pages/hour down to 5 pages/hour, they could find almost twice as many faults. Calculations showed that by spending 200 extra analysis hours and 1250 more inspection hours, they could save ca. 8000 test hours!
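The trade-off in the last sentence can be made explicit with a small calculation; the net-saving figure is our own simplification of the reported numbers for O1.

```python
# Cost-benefit arithmetic for O1's slower, longer inspections (numbers as reported).
extra_analysis_hours = 200     # additional analysis effort
extra_inspection_hours = 1250  # additional inspection effort at the slower rate
test_hours_saved = 8000        # testing effort avoided

net_saving = test_hours_saved - (extra_analysis_hours + extra_inspection_hours)
print(net_saving)  # 6550 staff-hours saved overall
```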
In O7, external researchers concluded through analysis of fault reports and fault types that the organization's development process had definite weaknesses in the specification and design phases, as a large percentage of the faults found during system testing were of types that originated mainly in these early phases. Additionally, this external research led the organization to alter the way it classified faults in a pilot project in order to study these issues further.
Figure 1. Inspection rates/defects in organization O1 (recommended vs. actual inspection rate).
Another result we have drawn from several studies is that the data material is not always well suited for analysis, mostly because of missing, incorrect or ambiguous data. It is apparent that since the organization generally does not use this data after recording it, the motivation for recording correct data is low. In O3, for instance, 97% of the fault reports were classed as "medium" severity faults. This was the default severity when recording a fault, and it was rarely altered even if the fault was actually of a more severe character.
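A default-dominated field of this kind is easy to detect automatically. The sketch below uses hypothetical data shaped like O3's severity distribution; the 90% threshold is an arbitrary choice of ours.

```python
from collections import Counter

def dominant_value(values):
    """Return the most common value in a field and its share of all entries."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)

# Hypothetical O3-like severity data: 97 of 100 reports left at the default.
severities = ["medium"] * 97 + ["high"] * 2 + ["low"]
value, ratio = dominant_value(severities)
if ratio > 0.9:
    print(f"'{value}' dominates ({ratio:.0%}); possibly an unchanged default")
```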
There are some interesting fault report attributes that are not in wide use, even if the information is most likely available. Such information could be very useful in process improvement initiatives, and the cost of collecting and analyzing this data is marginal. Some examples of such attributes are the following:
• Fault location: This attribute addresses where the fault is located, either as a functional location or a structural location. When the functional location of a fault is reported, this is mainly from the view of the users or testers. It tells us in which function or functional part of the system the fault was discovered through a failure. In the case of structural location, the fault report points to a place (or several) in the code, an interface, or a component where the fault has been found. For analysis purposes, the structural location is often the more useful information.
• Fault injection/discovery phase: The fault injection phase describes when in the development process the fault was introduced. Sometimes faults are injected in the specification phase, but most faults are introduced in design and implementation. Faults can even be introduced during testing, if test preparation is included as part of system implementation. The fault discovery phase describes the phase in which the fault was discovered. The gap between injection and discovery should preferably be as small as possible, because the longer a fault is present in the system, the more effort it will take to remove it.
• Fault cost (effort): This shows how much effort has gone into finding and/or correcting a fault. Such information shows how expensive a fault has been for a project, and may be an indication of fault complexity or of areas where a project needs to improve its knowledge or work process.
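The injection-discovery gap mentioned above can be expressed as a simple phase distance; the phase list below is our assumption of a typical ordering, not a standard.

```python
# Sketch of the injection-discovery gap; a larger gap generally means
# more effort to remove the fault.
PHASES = ["specification", "design", "implementation", "testing", "field use"]

def phase_gap(injected: str, discovered: str) -> int:
    """Number of phases a fault survives before discovery; smaller is cheaper."""
    return PHASES.index(discovered) - PHASES.index(injected)

print(phase_gap("design", "testing"))           # 2
print(phase_gap("specification", "field use"))  # 4: typically the costliest case
```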
By introducing and implementing a core set of fault data attributes (i.e. a metric) to be recorded and analyzed, we could establish a common process for fault reporting. Several schemes for recording and classifying faults already exist, like the Orthogonal Defect Classification scheme [Chi92] or the IEEE 1044 standard [IEEE 1044]. A core process could be customized for organizations that want a broader approach to the analysis of fault reports. Some organizations use custom-made tools for fault reporting, but a great many use standard commercial or open source tools. Introducing a core set of fault report attributes in these tools would encourage organizations to record the most useful information that can serve as a basis for process improvement. Many tools already have functionality for analyzing the data sets they contain.
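A minimal sketch of the kind of analysis such a core attribute set enables is a fault-type tally in the style of ODC [Chi92]; the sample data here is hypothetical.

```python
from collections import Counter

# Hypothetical fault-type data over ODC-style categories. A dominant category
# points at the development phase needing attention: e.g. many "function"
# faults suggest weaknesses in specification and design.
fault_types = ["function", "assignment", "interface", "function",
               "checking", "function", "algorithm", "assignment"]

distribution = Counter(fault_types)
print(distribution.most_common(2))  # [('function', 3), ('assignment', 2)]
```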
4. Use of data repositories
By industrial data repositories, we mean the contents of defect reporting systems, source control systems, or any other data repository containing information on a software product or project. This is data gathered during the lifetime of a product or project, and it may or may not be part of a measurement program. Some of this data is stored in databases that have facilities for search or mining, while other data is not. Zelkowitz and Wallace define examining data from completed projects as a type of historical study [Zel98].
As the fields of Software Process Improvement (SPI) and empirical research have
matured, these communities have increasingly focused on gathering data consciously
and according to defined goals. This is best reflected in the Goal-Question-Metric
(GQM) paradigm first developed by Basili [Bas94]. This explicitly states that data collection should proceed in a top-down way (i.e. designing research goals and process before examining data) rather than a bottom-up way (i.e. designing research goals and process after seeing what data is available). However, some reasons why bottom-up studies are useful are (taken from [Moh04]):
studies are useful are (taken from [Moh04]):
1. There is a gap between the state of the art (best theories) and the state of the
practice (current practices). Therefore, most data gathered in companies’
repositories are not defined and collected following the GQM paradigm.
2. Many projects have been running for a while without having improvement
programs and may later want to start one. The projects want to assess the
usefulness of the data that is already collected and to relate data to goals (reverse
GQM).
3. Even if a company has a measurement program with defined goals and metrics,
these programs need improvements from bottom-up studies.
Another issue with data repositories is the ease with which data can be extracted for analysis. An example is from O1, where the researchers had to go to a great deal of effort to convert the fault data into a form that could be analyzed. In O3, the fault reports could only be accessed for analysis by printing hardcopies of the reports, which in turn had to be scanned and converted into analyzable data. To support process analysis in an efficient manner, fault repositories should be kept available in a standard, well maintained form.
5. Discussion and conclusion
We have presented an overview of studies performed concerning fault reports, and
shown the type of information that exists and is lacking from such reports.
What we have learnt from studying the fault report repositories of these organizations is that the data is in some cases under-reported, and in most cases under-analyzed. By including some of the information that the organization already has, more focused analyses could be made possible. For instance, specific information about fault location and fault correction effort is generally not reported, even though this information is easy to register. One possibility is to introduce a standard for fault reporting, where the most important and useful fault information is mandatory.
A reasonable approach to improving fault reporting and using fault reports as a support
for process improvement is to start by being pragmatic. At first, use the readily
available data that has already been collected, and in time change the amount and type
of data that is collected through development and testing to tune this process.
We have learnt that the effort spent by external researchers to produce useful results
based on the available data is quite small compared to the collective effort spent by
developers recording this data. This shows that very little effort may give substantial
effects for many software developing organizations.
Finally, there are two main points we want to convey as a result of the studies we have
done in these organizations:
• It is important to approach the subject of fault data analysis with a bottom-up approach, at least in the early phases of such research and analysis initiatives. The data is readily available; the work that remains is designing and performing a study of this data.
• Much of the recorded fault data is of poor quality, most likely because of the lack of interest in using the data.
References
[Bas94] Basili, V.R., Caldiera, G., Rombach, H.D.: Goal Question Metric Paradigm.
In: Marciniak, J.J. (ed.): Encyclopaedia of Software Engineering, pp. 528-532, Wiley,
New York, 1994.
[Bor06] Børretzen, J.A., Conradi, R.: Results and Experiences From an Empirical Study
of Fault Reports in Industrial Projects. Proceedings of the 7th International Conference
on Product Focused Software Process Improvement (PROFES'2006), pp. 389-394,
Amsterdam, 12-14 June 2006.
[Bor07] Børretzen, J.A., Dyre-Hansen, J.: Investigating the Software Fault Profile of
Industrial Projects to Determine Process Improvement Areas: An Empirical Study.
Proceedings of the European Systems & Software Process Improvement and Innovation
Conference 2007 (EuroSPI’07), pp. 212-223, Potsdam, Germany, 26-28 September
2007.
[Con99] Conradi, R., Marjara, A.S., Skåtevik, B.: An Empirical Study of Inspection and
Testing Data at Ericsson. Proceedings of the International Conference on Product
Focused Software Process Improvement (PROFES'99), pp. 263-284, Oulu, Finland, 22-24 June 1999.
[Chi92] Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray,
B.K., Wong, M.-Y.: "Orthogonal defect classification - a concept for in-process
measurements", IEEE Transactions on Software Engineering, (18)11, pp. 943-956,
Nov. 1992.
[Gra92] Grady, R.: Practical Software Metrics for Project Management and Process
Improvement, Prentice Hall, 1992.
[Gen] The Gentoo linux project, available from: http://www.gentoo.org/
[IEEE 1044] IEEE Standard Classification for Software Anomalies, IEEE Std 1044-1993, December 2, 1993.
[Jør98] Jørgensen, M., Sjøberg, D.I.K., Conradi R.: Reuse of software development
experience at Telenor Telecom Software. In Proceedings of the European Software
Process Improvement Conference (EuroSPI'98), pp. 10.19-10.31, Gothenburg, Sweden,
16-18 November 1998.
[Moh04] Mohagheghi, P., Conradi, R.: Exploring Industrial Data Repositories: Where
Software Development Approaches Meet. In Proceedings of the 8th ECOOP Workshop
on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE’04),
pp. 61-77, Oslo, Norway, 15 June 2004.
[Moh06] Mohagheghi, P., Conradi, R., Børretzen, J.A.: Revisiting the Problem of
Using Problem Reports for Quality Assessment. Proceedings of the 4th Workshop on
Software Quality, held at ICSE'06, pp. 45-50, Shanghai, 21 May 2006.
[Zel98] Zelkowitz, M.V., Wallace, D.R.: Experimental models for validating
technology. IEEE Computer, (31)5, pp. 23-31, May 1998.
Appendix B: Interview guide
Questions for Test Managers
Background
1. Which responsibilities do you have in the organization?
2. How long have you been working in the company?
3. What was your involvement in the project under study?
4. Are you still involved in work with this project?
On the study results
1. The results from our study (both on the organization in general and this project) show that many faults are of a character that points to them having been introduced in the specification and design phases. How does this compare to your impression of the faults found in your projects?
2. How do the analysis results for this project compare with your own experience
of the project?
3. What do you think about the fault categorization scheme we have used, based on
ODC?
On the organization’s own measurements and results
1. The organization currently uses its own scheme for categorizing faults; how
well do you think this works?
2. Some results we have received from the organization indicate where in the
development process faults were introduced and where they were discovered;
does your project report this type of information?
3. How do you distinguish design faults from implementation faults when
reporting faults? Do design faults sometimes get reported as change
requests?
The quality system
1. What is the fault reporting process like in your organization, and who is
responsible for quality?
2. Which tools do you use in fault reporting? Are they the same as in change
request reporting?
3. What is the fault correction process like?
4. How much effort does it take to register a fault report? Do you think this task
could or should be simplified?
5. Do the reporters of a fault have the same access to system and information as the
ones who are going to correct it?
6. Do you think that all the necessary information is accessible when reporting a
fault?
7. Do you think that all the necessary information is accessible when correcting a
fault?
8. Is the fault reporting in any way used as a basis for process improvement, or is it
only used as a log of faults that are to be corrected?
9. Do you register the hours of effort spent on fault finding and correction? This is
relevant for knowing which faults require the most resources.
10. Do you think the tool support for fault reporting is good enough?
Fault reports: Available information
Amount of information, correct fields, number of fields
1. Do you think the fields that are used in the fault reporting system are sufficient?
2. Are there any extraneous fields that are not used, or that are filled in without the
information being used further?
3. Do you think that any fields are missing?
4. Do you have any information about fault location? In some projects the field
“Testobjekt” is used; does this describe functional modules or structural modules
that can be linked to code?
5. Do you have the necessary information available to tell which components are
involved in a fault correction, or is this implicit knowledge that only the
developers have?
6. Is it possible to later use the fault reporting tool to identify which components or
code parts have been involved in a fault correction, for example to find which
components have the most severe faults?
Feedback from fault reporting
1. Do you give any sort of feedback to the developers based on what you find in
your quality system?
2. How do you think feedback on what is being done could be used for
improvement?
3. Have there been any changes in technical issues, development processes, or
your work as a systems developer, based on what is uncovered as faults during
development of your systems?
Process changes
1. Do you think the organization is willing to change its reporting routines, with
respect to adding information for use in analysis (or changing routines to
increase the precision and correctness of the information)?
2. Do you think such changes would be useful in order to improve product quality?
3. How much effort do you think the company invests, and which actions does it
initiate, in order to change processes as part of process improvement?