Standard error classification to support software reliability assessment
by JOHN B. BOWEN
Hughes-Fullerton
Fullerton, California
SUMMARY
A standard software error classification is viable, based on experimental use of different schemes on Hughes-Fullerton projects. Error classification schemes have proliferated independently because of varied emphasis on the depth of causal traceability and on when error data was collected. A standard classification is proposed that can be applied to all phases of software development. It includes a major causal category for design errors. Software error classification is a prerequisite both for feedback for error prevention and detection, and for prediction of residual errors in operational software.
INTRODUCTION
The ability of managers and technical developers to influence the reliability of software is very high at the outset of a project, but it declines rapidly as commitments are made, schedule time and budgets are used, and code and documents are produced. By the acceptance test phase, little chance remains to influence the reliability of the system except by rebuilding the deficient parts. A significant goal, therefore, is to alert management as early as possible in the development phases to critical problems and adverse trends that could degrade software reliability. Since up to 60 percent of the errors detected in the life cycle of software have been committed during the design phase,1 a major challenge is to
devise error categories that are sensitive to that phase, and
thereby provide feedback. Management feedback has been
difficult to obtain, because programmers have traditionally
enjoyed a pride of codemanship that rarely admits to the
existence of errors. However, with the advent of Modern
Programming Practices (MPPs), such as code reviews, software errors are available for analysis and feedback even before a program module is executed.
A special conference on the problems of data collection2 concluded that "The most success in data collection has been realized in those places where there has been feedback." Over three years ago Marcia Finfer3 noted that "Many papers addressing the problem of error collection and quantization state that greater understanding of software errors will lead to the improvement in the design and
application of software development tools and techniques, but the reality of the situation does not support this conclusion. Project managers who have the ability both to initiate error reporting procedures and to analyze the incoming data do not consistently take action resulting from the analysis of the error reports." This observation is still true today. Typically, the reason for not acting on the analysis of error trends is the overriding pressure of getting the immediate job done on schedule. Such a reason is understandable, particularly during the latter stages of development. However, software management appears to be remiss in not applying the results of error analysis to subsequent projects.

In the case of predictive software reliability models, most program managers have doubts about their usefulness. Consequently, some view error data collection as a nonproductive extra burden. This view is unfortunate, because only with the support of conscientious error data collection can proposed quantitative reliability models be validated.

The Rome Air Development Center (RADC) has sponsored numerous studies on software error collection and analysis, starting with a software reliability study by TRW1 that included an error category scheme generated as the raw error data was analyzed. This error scheme was used in later RADC studies,4,5 but no approved standard error classification has emerged within the Air Force to date. The Navy has included a software trouble classification in its recent MIL-STD-1679;6 however, the four categories do not have enough detail to assist in feeding back constructive information to management.

This paper proposes the standardization of a set of software error classifications that have causal, severity, and source phase properties. Such a set will assist the project manager in taking remedial action to improve reliability, support company and software community efforts in evaluating the impact of reliability-producing techniques, and aid in validating software reliability models and metrics. The term software reliability, as used in this paper, therefore represents both the assessment of the use of reliability-producing factors and the prediction of residual errors. Although reliability models are primarily used to predict residual errors existing after acceptance testing, they can also be applied to earlier development phases if primed with sufficient error data. Reliability prediction is not concerned with the causal properties of errors, but should be concerned with severity and source phase properties.
NEED FOR A STANDARD ERROR CLASSIFICATION
Like most human activities, the software engineering environment is a complex of a great variety of interrelated factors. Some researchers, such as Willmorth et al.,7 conclude that "No one set of data parameters collected for research purposes will significantly support a wide range of reliability analyses." Weiss8 contends that error classifications need to be tailored for each study or application so that the questions of interest can be answered. I contend that there is a need for a standard scheme to classify error data that represents the basic characteristics of the software environment.

In fact, a number of organizations and agencies, such as the Joint Logistic Commanders, U.S. Navy, IEEE Computer Society, and a number of industrial companies, have developed, or are in the process of standardizing, software error classifications. Unfortunately, few of these schemes are compatible with each other. Only the severity classifications are similar, and even in this case the number of severity categories ranges from three to five. RADC has inaugurated a software data collection and analysis program9 which has as one of its major objectives to "Promote standards of software data collection, and support the development and definition of common software data collection terminology."
The necessity of a standard error classification scheme
becomes evident when the needs of a large project and research activities are examined. A few examples are: to provide feedback to develop software design standards; provide
guidance to test engineers; evaluate modern programming
practices; evaluate verification and validation tools; and validate and support quantitative reliability models. The minimal
ingredients of such a scheme are listed in Table I.
Since some studies report that as much as 60 percent of
all software errors originate in the design phase, it is important that error collection and classification be sensitive
to the point in time in the life cycle of a program when the
error occurs. Only then can improved software design standards be developed. In addition, the distribution of types of
errors from related projects can assist test engineers and
quality analysts in concentrating their activities. For instance, if one particular application is expected to have a preponderance of computational errors, then the test planners would profit by applying dynamic tools, rather than static tools, to uncover such errors.10 Thus, while it has been
TABLE I.-Questions that can be answered by a feedback-oriented classification scheme

When - In what phase in the software development cycle did the error originate?
How  - What did the designer/analyst/programmer do wrong?
What - What is the effect of exercising the resultant fault?
established that the use of error classifications can aid in
evaluating all phases of software development, the most rewarding efforts occur during the early phases, such as design. As suggested by Finfer,3 error analysis can indicate the
necessity to apply additional personnel to a particularly
error-prone program or subsystem, and a cluster of errors
in a related group of programs may indicate that particular
software is poorly designed. In a study for NASA-Langley,
Hecht11 recommended that "Classification by cause of failure is desirable in order to organize remedial measures. This
information is of value for the management of the immediate
project on which it is obtained, for overall software management (e.g., in guiding the allocation of resources), and
for the development of improved software engineering tools
and procedures (language processors, test tools)."
Thus while these examples illustrate the underlying necessity for developing a standard software error classification scheme, the problem is not exactly new. A software
data collection conference in 19752 concluded that: "Standardization of data items, collection procedures, and project
characteristics is needed to provide comparability of measures in evaluating tools, techniques, and methods." This is
still true today, especially in the validation of predictive software reliability models and software reliability metrics, as well as in the selection of the best V&V tools and techniques.
One of the major hurdles in comparability is the difficulty
in controlling all of the factors that influence software development during an experiment that compares two software
development activities using different modern programming
practices. It is difficult to compare the programming activity
of different projects using error analysis because of uncontrollable factors, such as programmer background, hardware and software environment, and applications. Error
density is frequently used to evaluate MPPs. For example,
IBM12 compared two large projects: one project, with top-down design, structured code, chief programmer teams, and
a librarian, had an error rate of 1.0 per 100 lines of code.
Another project, using conventional techniques, had twice
that rate. This report is an example of the typical use of
unqualified errors to evaluate the effectiveness of MPPs. In
the final analysis, such a use can be misleading unless the
researcher reveals when the errors were detected, and how
severely these errors impact mission performance. Even if
the errors are qualified, there must be a common understanding of the classification scheme. Susan Gerhart13 reports: "The study of observed errors on the fallibility of modern programming methodologies suffered from an inconsistent error domain which caused several types of classification schemes to be difficult to construct and to interpret."
Castle, in a thesis on validation of software reliability math
models,14 states that if he had to make one recommendation,
it would be the importance of continued software error collection. He pointed out, "A disease cannot be cured without
knowledge of the cause. So is the case with unreliable software." In a list of 22 software error characteristics for collection, he includes the phase in which the error occurred,
the criticality of the error, and the error categories (causal)
with unambiguous definitions. As a result of a study of candidate software reliability models, Kruszewski15 recommended improved data collection with formal error reporting
and using causal and severity categories. Schafer, in a recent RADC study to validate candidate software reliability models,16
used 16 sets of project error data which represented a total
of 31,181 errors. The results of the study indicated that in
general the software models fit poorly due to vagaries of the
data, rather than shortcomings of the models. The study report concluded that more work remains in the area of software error data collection. Echoing these findings, Sukert,
at a recent conference,17 recommended the development of
software error data collection standards, and the study of
software reliability predictions based on error criticality categories.
SURVEY OF CANDIDATE ERROR CLASSIFICATION SCHEMES
An excellent survey of the state-of-the-art in software
error data collection and analysis was published by Robert
Thibodeau.18 His report describes recent efforts of government agencies, educational institutions, and private companies; and includes synopses of several studies on software
error collection and analysis. On the topic of error classification he states:
"The study of software errors requires them to be separated according to their attributes. This is the first step in
understanding what causes them and, subsequently, how
they may be prevented. The need for a practical error classification is important and, since it applies to nearly all areas
of software research, it deserves to be treated as a separate
topic."
TRW software reliability study
During a study for RADC,1 TRW-Redondo Beach devised
a software error classification scheme with twelve major
causal categories. The study also developed a source phase
classification. These classifications, which were iteratively developed during a 2.5-year study, are listed in Table II.
Study of errors found in validation
Raymond Rubey, in a technical paper published in 1975,20 presents several error categories. He stated that "The most
basic data required about the errors found during validation
are the frequency of occurrence of those errors in defined
error categories and their relative effect or severity." Three
of the proposed error classification schemes are included in
Table III.
AN/SLQ-32(V) verification and validation
In May 1977 the Navy distributed a statement of work for
V&V services21 which characterized the software errors encountered during software development as follows:
Requirements
Processing Design
Data Base Design
Interface Design
Processing Construction
Data Base Construction
Interface Construction
Verification
Specification (all documentation)
Mitre error classification study
In early 1973 MITRE Corporation, under contract to
RADC, developed a general software error classification
methodology.19 The methodology was designed to serve as
a guideline for experiment-specific application. The proposed classification scheme is hierarchical, and consists of
five major categories:
1. Where did the error take place?
2. What did the error look like?
3. How was the error made?
4. When did the error occur?
5. Why did the error occur?
The associated subcategories are not unique to the major
categories and include attributes such as People, Hardware,
Software, Mechanical, Intellectual, and Communicational.
The scheme accounts for the fact that a single error can have
a number of characteristics occurring simultaneously. The
report addresses the problem of multiple classification of the same error, and suggests the use of fuzzy set theory, where multiple classifications are qualified by degree to fully describe a single software error.
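The idea can be pictured with a small sketch of my own (it is not code or notation from the MITRE report): a single error carries several classification attributes at once, each qualified by a degree of membership between 0 and 1.

# Illustrative sketch of degree-qualified multiple classification of one error,
# in the spirit of the fuzzy-set suggestion in the MITRE report (not its method).
error_memberships = {
    "Intellectual": 0.7,     # mostly a misunderstanding of the specification
    "Communicational": 0.4,  # partly a breakdown between designer and coder
    "Mechanical": 0.1,       # a small clerical component
}

def dominant_classes(memberships, threshold=0.5):
    """Return the attributes whose degree of membership meets the threshold."""
    return [name for name, degree in memberships.items() if degree >= threshold]

print(dominant_classes(error_memberships))  # ['Intellectual']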
TABLE II.-Software error classifications developed during TRW reliability study

Causal

COMPUTATIONAL
LOGIC
DATA INPUT
DATA HANDLING
DATA OUTPUT
INTERFACE
DATA DEFINITION
DATA BASE
OPERATION
OTHER
DOCUMENTATION
PROBLEM REPORT REJECTION

Source Phase

REQUIREMENTS
DESIGN
CODING
MAINTENANCE
NOT KNOWN
TABLE III.-Error classifications proposed by Rubey study

Causal

INCOMPLETE OR ERRONEOUS SPECIFICATION
INTENTIONAL DEVIATION FROM SPECIFICATION
VIOLATION OF PROGRAMMING STANDARDS
ERRONEOUS DATA ACCESSING
ERRONEOUS DECISION LOGIC OR SEQUENCING
ERRONEOUS ARITHMETIC COMPUTATIONS
INVALID TIMING
IMPROPER HANDLING OF INTERRUPTS
WRONG CONSTANTS AND DATA VALUES
INACCURATE DOCUMENTATION

Severity

SERIOUS
MODERATE
MINOR

Source Phase

DEFINING THE PROGRAM SPECIFICATION
DEFINING THE PROGRAM
CODING
PERFORMING MAINTENANCE FUNCTIONS
JLC preliminary error classification
In April 1979 the Joint Logistics Commanders Joint Policy
Coordinating Group on Computer Resource Management
held a software workshop22 where preliminary general categories for classifying software errors were defined. As shown in Table IV, three major causal categories and four severity categories were included.
Discussion

Most of the software error classification schemes surveyed have a separate classification for severity or impact on mission performance. However, there was no general agreement on using distinct classifications for cause and source phase. Some error causes are phase-peculiar; therefore a combined single category would result in fewer subcategories than all possible combinations of source phase and causal subcategories. This advantage appears to be outweighed, however, by the ease of implementing automated statistical analysis of the phase and causal attributes when the categories are separated.
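A minimal sketch of this point, with invented records, shows how separately recorded source phase and causal codes lend themselves directly to automated cross-tabulation.

# Sketch: cross-tabulating separately recorded source-phase and causal codes.
from collections import Counter

reports = [  # (source phase, causal category) -- invented example records
    ("Design", "Interface"), ("Design", "Logic"), ("Coding", "Logic"),
    ("Coding", "Data Handling"), ("Requirements", "Other"),
]

crosstab = Counter(reports)
for (phase, cause), count in sorted(crosstab.items()):
    print(f"{phase:<13} {cause:<14} {count}")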
It should be noted that only the Navy AN/SLQ-32(V) causal classification scheme included unique categories for design errors. Rubey's classification contains only one special
design category, intentional deviation from specification.
(This category could be interpreted as representing either a
design or coding activity.) The JLC classification has design
categories; however they are combined with requirements
(e.g., incomplete requirements or design).
RESULTS OF USING EXPERIMENTAL
CLASSIFICATIONS ON HUGHES-FULLERTON
PROJECTS
For over two years Hughes experimented with a software
error classification scheme on an Army project during the
development phases. The classification scheme used on this
project was based on the scheme proposed by Rubey.20
Three classifications were used: Severity, Cause, and Miscellaneous, as shown in Table V.
TABLE IV.-Software error categories proposed by Joint Logistic Commanders

Software Specifications
1. Unnecessary functions
2. Incomplete requirements or design
3. Inconsistent requirements or design
4. Untestable requirements or design
5. Requirements not traceable to higher specifications
6. Incorrect algorithm
7. Incomplete or inaccurate interface specifications

Code
1. Syntax errors
2. Non-compliance with specification(s)
3. Interface errors
4. Exception handling errors
5. Shared variable accessing error
6. Software support environment errors
7. Violation of programming standards
8. Operational support environment errors

1. Accuracy
2. Precision
3. Consistency

Severity
1. Prevents accomplishment of its primary function, jeopardizes safety, or inhibits maintainability of the software
2. Degrades performance or maintainability, with no workaround
3. Degrades performance or maintainability, but a workaround exists
4. Doesn't adversely affect performance or maintainability (such as documentation errors transparent to users)

TABLE V.-Hughes-Fullerton experimental error classification

Severity
CR - System Crash or Serious Effect on Mission Performance
MA - Incorrect Values that Reduce Mission Performance
MJ - Incorrect Values that have Tolerable Effect on Mission Performance

Cause
REQMT - Expanded, Reduced, or Erroneous Requirements
PROGM - Nonresponsive Program Design
SPECS - Incomplete or Erroneous Program Design Specifications
LOGIC - Erroneous Decision Logic or Sequencing
IMPVE - Improved Program Storage or Response Time
INTRT - Improper Handling of Interrupts
LINKE - Incorrect Module or Routine Linkage
ARITH - Erroneous Arithmetic Computations
ALGOR - Insufficient Accuracy in Implementation of Algorithm
DOCUM - Inaccurate or Incomplete Comments or Prologue
EDIT  - Erroneous Editing for New Version Update
DATA1 - Incomplete or Inconsistent Data Structure Definition
DATA2 - Wrong Value for Constant or Preset Data
DATA3 - Improper Scaling of Constant or Preset Data
DATA4 - Uncoordinated Use of Data by More than One User
DATA5 - Erroneous Access or Transfer of Data
DATA6 - Erroneous Reformatting or Conversion of Data
DATA7 - Improper Masking and Shifting During Data Extraction and Storage
DATA8 - Failure to Initialize Counters, Flags, or Data Areas

Miscellaneous
INTRO - New Error Introduced During Correction
STAND - Noncompliance with Programming Standards and Conventions
The causal classification was open-ended; that is to say, categories were added as required during the project. The Data category was assigned most frequently (23 percent of total errors); consequently it was divided into eight subcategories. Incomplete or erroneous program design specifications accounted for 15 percent of the total number of errors; logic for 14 percent; and requirements, program design, and access or transfer of data for 10 percent each.
On a similar Army project,23 Hughes has over a year's
experience in using an error classification scheme based on
the TRW/RADC scheme. The causal classification was assigned separately from the source phase, and was tailored
to the following ten major categories (percent of total errors
are shown in parentheses):
Computational (4)
Logic (38.5)
Data definition (20.5)
Data handling (14)
Data base (3)
Interface (4.5)
Operation (1)
Documentation (0.5)
Problem Report Rejection (NA)
Other (13.5)
The major categories, Data Input and Data Output, were
dropped, because they were not appropriate to the application.
An analysis of error trends on this project revealed that
eight problems were caused by the improper selection of
instructions. Accordingly, it was felt that this class of errors
warranted a separate subcategory. Since such a selection
could result from either misunderstanding or carelessness,
the following two subcategories were added to the Other
category:
Selection of wrong instruction or statement
Careless selection/omission of instruction or statement
It is believed that these two categories will determine the
need for improved training of new programmers on subsequent projects in the understanding of the instruction repertoire. Such categories may be useful in validating complexity metrics such as the one proposed by Ruston.24 The metric is based on information theory, and assumes that the less frequently an operator or operand is used, the more difficult it is for the programmer to use correctly.
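The following sketch is only in the spirit of such a metric and is not Ruston's formula: each operator occurrence is weighted by the negative logarithm of its relative frequency, so rarely used operators carry more weight.

# Sketch only: an information-content style weighting in the spirit described
# above (rarely used operators count for more), not Ruston's actual metric.
import math
from collections import Counter

def rarity_weight(operator_stream):
    """Per-operator weight -log2(relative frequency): rarer operators weigh more."""
    counts = Counter(operator_stream)
    total = len(operator_stream)
    return {op: -math.log2(n / total) for op, n in counts.items()}

program_ops = ["ADD"] * 9 + ["XOR"]        # XOR appears only once in this stream
for op, weight in rarity_weight(program_ops).items():
    print(op, round(weight, 2))            # ADD carries ~0.15, the rare XOR ~3.32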
On a Navy project, Hughes employed a code review technique which included the recording of errors according to categories. Five hundred modules had a total of 765 errors; the remaining 742 modules had no problems.25 Table VI presents the distribution of the most frequent errors, and compares the distribution with comparable categories from IBM's code inspection technique.26 The high percentage of errors due to missing or insufficient listing prologues and comments for the Hughes project was probably due to the novelty of such a requirement early in the coding phase.
TABLE VI.-Distribution of errors detected during code inspection

Category                 Hughes (% of total)   IBM (% of total)
Prologue/Comments               44                  17.0
Design Conflict                 19.5                25.5
Logic                           11.5                30.5
Programming Standards           11                   4.5
Language Usage                   5                  12.5
Other                            3                   3.5
Module Interface                 3                   6.5
Data Base                        3
Total                          100.0               100.0
RECOMMENDED ERROR CATEGORIES
With respect to proposed error classification schemes, limited applicability to more than one project and excessive granularity and ambiguity of subcategories have been called out as problems. Hughes has found that the use of a minimal set of three software error classifications (Cause, Severity, and Source) solves these problems and is sufficient to support the assessment of software reliability. As summarized in Table VII, Source tells in which software development phase the error originated, Cause tells what the analyst or programmer did wrong, and Severity tells whether the manifestation of the error degrades mission performance.
The recommended causal classification for software reliability assessment, containing seven major categories, is shown in Table VIII. The scheme can be tailored by adding subcategories of interest or exception, such as problem rejection, to the Other category. A definition of each category/subcategory is presented in Appendix A.

A severity classification of at least three categories (for example, Critical, Major, and Minor) is recommended. In addition to guiding project managers in assigning priorities to the troubleshooting and resolution of problems, severity categories are necessary for practical application of predictive software reliability models. In order for the prediction of residual software faults to be meaningful, the impact of the execution or manifestation of the fault on system mission performance must also be included. Some proposed reliability models, such as the execution time theory model, can accommodate severity by running separate predictions for each severity category of interest. The justification for the recommended error causal and source phase categories/subcategories is discussed in the following subparagraphs.
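A minimal sketch of that idea follows; it deliberately substitutes a constant failure rate for the execution time model itself, and all numbers are invented, but it shows the mechanics of producing one prediction per severity class.

# Sketch: a separate, deliberately simple reliability estimate per severity class.
# A constant failure rate is assumed purely for illustration; the execution time
# model cited in the text is more elaborate. All numbers are invented.
from collections import Counter

observed_severities = ["Critical", "Major", "Major", "Minor", "Minor", "Minor"]
total_exposure_hours = 40.0   # CPU hours of test execution behind the observations

for severity, count in Counter(observed_severities).items():
    rate = count / total_exposure_hours          # failures per CPU hour
    print(f"{severity}: rate {rate:.3f}/h, mean time between failures {1/rate:.0f} h")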
TABLE VII.-A software error classification scheme that provides feedback

Source   - Phase in which the error of omission/commission was made (e.g., Requirements, Design, Coding, Test, Maintenance, and Corrective Maintenance)
Cause    - The causal description of the error, rather than symptomatic
Severity - The resulting effect of the error on mission performance (e.g., Critical, Major, and Minor)
TABLE VIII.-Causal categories to support software reliability analysis

Design
  Nonresponsive to requirements
  Inconsistent or incomplete data base
  Incorrect or incomplete interface
  Incorrect or incomplete program structure
  Extreme conditions neglected
Interface
  Wrong or nonexistent subroutine called
  Subroutine call arguments not consistent
  Improper use or setting of data base by a routine
  Improper handling of interrupts
Data Definition
  Data not initialized properly
  Incorrect data units or scaling
  Incorrect variable type
Logic
  Incorrect relational operator
  Logic activities out of sequence
  Wrong variable being checked
  Missing logic or condition tests
  Loop iterated incorrect number of times (including endless loop)
  Duplicate logic
Data Handling
  Data accessed or stored improperly
  Variable used as a flag or index not set properly
  Bit manipulation done incorrectly
  Incorrect variable type
  Data packing/unpacking error
  Units or data conversion error
  Subscripting error
Computational
  Incorrect operator/operand in equation
  Sign convention error
  Incorrect/inaccurate equation used
  Precision loss due to mixed mode
  Missing computation
  Rounding or truncation error
Other
  Not applicable to software reliability analysis
  Not compatible with project standards
  Unacceptable listing prologue/comments
  Code or design inefficient/not necessary
  Operator
  Clerical

Category for design-related errors

Although a causal category for design-related errors is redundant with the source phase category Design, sufficient error volume has been associated with software design activities to warrant a separate causal category. In analyzing design-related category assignments on three software projects at Hughes-Fullerton, it was found that the categories accounted for 25, 17, and 8 percent of the total errors. Furthermore, the results of the error category frequency distributions collected during code reviews/inspections (refer to Table VI) reveal that design conflicts constitute a significant portion of the error causes (25.5 and 19.5 percent). Another study8 performed at the Naval Research Laboratory (NRL) reported that design misunderstandings contributed to 19 percent of the total errors (see Table IX).

TABLE IX.-Misunderstandings as sources of errors during NRL experiment

Category              % of Total Errors
Clerical                     36
Design                       19
Coding Specs                 13
Careless Omission
Language
Interface
Requirements
Coding Standards
Total                       100

Subcategory for clerical errors

Two experimental studies, one performed at the Naval Postgraduate School (NPGS)27 and the other performed at the Naval Research Laboratory (NRL),8 found it necessary to include clerical as a major error category. In fact, both studies found that the clerical category was the most frequent error cause (see Tables IX and X). The NPGS error distributions represent a composite of four projects. On one project the Clerical, Manual subcategory contributed 36 percent of the total errors. Due to the high occurrence of clerical errors reported on these two unrelated projects, it is recommended that Clerical be added as a subcategory to the Other category.

TABLE X.-Most frequent error types found during NPGS experiment

Subcategory                                     % of Total Errors
Clerical, Manual                                      18.5
Coding, Representation                                10.0
Coding, Syntax                                         7.0
Design, Extreme Condition Neglected                    6.5
Coding, Inconsistency in Naming                        5.0
Coding, Forgotten Statements                           5.0
Design, Forgotten Cases or Steps                       4.5
Design, Loop Control                                   4.0
Coding, Missing Declarations or Block Limits           4.0
Coding, Level Problems                                 3.0
Coding, Sequencing                                     3.0
Design, Indexing                                       2.5
Coding, Missing Data Declarations                      2.5
Clerical, Mental                                       2.5
Other (combined)                                      22.0
Total                                                100.0

Maintenance category

Maintenance errors are defined by Thayer1 as those errors resulting from the correction of previously documented errors. He reported that in one project this category of errors reached 9 percent of the total number of errors; however, he estimated that a practical norm for this type of error ranges from 2 to 5 percent. Fries5 reported "... a surprisingly high 6.5 percent of the errors were a result of attempts to fix previous errors or update the software. Thus, the number of errors introduced by the correction process itself is nontrivial. This is an important consideration when developing reliability model assumptions." Note that Fries' 6.5 percent includes updates or enhancement changes as well as corrections of previously documented errors; therefore the actual percentage for maintenance errors would probably lie in the 2 to 5 percent range.
At Hughes-Fullerton three projects have been monitored during development phases for maintenance errors. The portions of total errors for these three projects are 14, 12, and 8 percent. One possible reason these percentages are higher than the previously reported range of two to five percent is that none of the three Hughes projects controlled the number of allowable patches. Consequently, there was always the extra risk of a wrong correction in patch form due to hasty implementation, or the subsequent incorrect symbolic implementation of a successful patch. It is estimated that maintenance errors contribute as much as 20 percent of the total errors after a system is fielded. Because of the frequency of this type of error, and the interest in reducing the causes of maintenance errors, a separate category is required. Either a Maintenance subcategory could be added to the Other causal category, or a Corrective Maintenance category could be added to the source phase classification. It is recommended that a new category be added to the Source phase classification, because including maintenance error as a causal subcategory would preclude the assignment of the more descriptive cause (e.g., Subscripting error).
Optional category/subcategory assignment
The original TRW/RADC classification for Project 5 was designed for universal application by allowing the option to assign categories at only the major category level (e.g., Computational, Logic, Data Handling, etc.). The TRW study report1 commented as follows about the applicability of the subcategories: "The detailed categories, however, are less universal and suffer in applicability due to differences in language, development philosophy, software type, etc. When data are collected may also have a bearing on applicability [of the detailed categories] to some software test environments. For Project 5 the list used was apparently adequate for the real time applications and simulator software, as well as the Product Assurance tools. However, there was criticism concerning applicability of detailed categories to the real time operating system software problems." Hughes-Fullerton has employed the two-level (category/subcategory) option, and has found it to be satisfactory for all projects.
ERROR COLLECTION GUIDELINES
It is human nature not to admit to errors; therefore it is
essential that software engineers be informed of the significance of reporting accurate error data to support software
reliability analysis. It should be emphasized that the purpose
of error reporting is to measure the technology and not the
people. I agree with Gerhart's13 statement: "It is necessary
to view errors as a phenomenon of programming which requires study and, while it is necessary to be sensitive to
peoples' reactions when threatened by exposure of errors,
it may be healthier to get the errors and the errants out in
the open rather than to cover up the human origin of errors."
Automatic data collection may be the only means to ensure objective data, but short-term projects cannot afford it. In
most instances, useful software reliability information can
be obtained by only slight modifications to existing problem
report/correction systems. The use of coded error category
descriptors on program trouble and correction reports tends
to alleviate thoughts of incrimination.
Guideline procedures for assigning and approving error
categories should be included in project standard practices
to promote consistent interpretation of the error categories.
In addition to the error categories, the procedure should
contain detailed definitions of the error subcategories. Those
definitions guide individual programmers in assigning the
most appropriate category to represent the error at hand.
Even with the use of such an error category dictionary, programmers may assign different categories for the same errors. Therefore, it is further suggested that a senior programmer or reliability analyst be responsible for reviewing all
error category assignments for consistency and accuracy.
Certain less offensive subcategories such as Clerical require
special monitoring, because a programmer will lean toward
them when given a choice.
Programmers must be reminded to fill out a separate problem correction report for each distinguishable correction at the module level. It is recommended that the following data be collected in addition to the error classifications:

• Date/time that the error/incident was detected
• Date/time that the error was resolved by the programmer
• Date/time that the resolution was verified
• Principal module responsible for the error
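The following sketch suggests the kind of automated support a reviewing analyst might use; the field names and the threshold are illustrative assumptions, not part of the recommended guidelines. It flags reports with missing entries and warns when the Clerical subcategory appears to be over-assigned.

# Illustrative reviewer aid: flag incomplete reports and possible over-assignment
# of the 'Clerical' subcategory, which the guidelines single out for monitoring.
REQUIRED_FIELDS = ("module", "source", "cause", "severity",
                   "detected", "resolved", "verified")

def review(reports, clerical_share_limit=0.25):
    problems = []
    for i, rpt in enumerate(reports):
        missing = [f for f in REQUIRED_FIELDS if not rpt.get(f)]
        if missing:
            problems.append(f"report {i}: missing {', '.join(missing)}")
    clerical = sum(1 for r in reports if r.get("cause") == "Other/Clerical")
    if reports and clerical / len(reports) > clerical_share_limit:
        problems.append("Clerical assignments exceed the review threshold")
    return problems

reports = [
    {"module": "NAV", "source": "Coding", "cause": "Logic", "severity": "Major",
     "detected": "1979-10-02", "resolved": "1979-10-04", "verified": "1979-10-05"},
    {"module": "NAV", "source": "Coding", "cause": "Other/Clerical", "severity": "Minor",
     "detected": "1979-10-03", "resolved": "", "verified": ""},
]
print(review(reports))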
CONCLUSIONS
It appears from the survey of proposed software error
classification schemes that they differ primarily because of
varying emphasis on different areas of software development. I agree with some researchers that error classifications must reflect areas of interest; however, this does not preclude
the development of a standard minimal set of software error
classifications that has universal application-including reliability assessment. Therefore, I suggest that the proposed
error classification scheme be considered as a standard for
use in software reliability assessment. The proposed scheme
can be used during design reviews, code reviews, and testing.
In order to satisfy all activities, additional error characteristics will have to be collected. For example, in the validation and use of predictive software reliability models the
date and time of detection of a fault, and the date and time
of correction of the error are additional data that are required
to be collected. However, if the cause of the "error" is ignored, a reliability model could be fed time/date data for a
problem report, such as integration of new software, that is
not analogous to the residual class of errors that quantitative
models predict.
The development of a set of standard software error classifications is a prerequisite for the development of a meaningful software reliability discipline. Such a set of classifications can serve two promising approaches to the discipline: 1) those that emphasize the use and assessment of reliability-producing techniques during the early development phases,
and 2) those that focus on the prediction and measurement
of the number of residual errors after acceptance, by statistical math models. Both approaches require error classifications to effectively assess and measure software reliability.
Concurrent with the development and acceptance by the software community of a standard set of causal, severity, and source classifications, there is a need for research and development in the automation of error collection through compilers and test runs. Also, the capabilities of emerging independent V&V tools, when augmented by standard error classifications, can be extended to improve test plan and procedure generation.
REFERENCES
1. Thayer, T. A., et al., "Software Reliability Study," TRW-Redondo Beach, RADC-TR-76-238 (Aug 1976).
2. Willmorth, N. E., "Proceedings of Data Collection Problem Conference," RADC-TR-76-329, Vol. VI (Dec 1976).
3. Finfer, M. C., "Software Data Collection Study," System Development Corp., RADC-TR-76-329, Vol. III (Dec 1976).
4. Baker, W. F., "Software Data Collection and Analysis: A Real-Time System Project History," IBM Corp., RADC-TR-77-192 (Jun 1977).
5. Fries, M. J., "Software Error Data Acquisition," Boeing-Seattle, RADC-TR-77-130 (April 1977).
6. Chief of Naval Materiel, Military Standard for Weapon System Software Development, MIL-STD-1679 (Navy), AMSC No. 23033 (Dec 1978).
7. Willmorth, N. E., et al., "Software Data Collection Study, Summary and Conclusions," RADC-TR-76-329, Vol. I (Dec 1976).
8. Weiss, D. M., "Evaluating Software Development by Error Analysis: The Data from the Architecture Research Facility," Naval Research Laboratory, NRL Report 8268 (Dec 1978).
9. Nelson, R., "Software Data Collection and Analysis" (draft, partial report), RADC (Sep 1978).
10. Gannon, C., "Error Detection Using Path Testing and Static Analysis," Computer, pp. 26-31 (Aug 1979).
11. Hecht, H., "Measurement, Estimation, and Prediction of Software Reliability," Aerospace Corp., NASA CR-145135 (Jan 1977).
12. Motley, R. W. and Brooks, W. D., "Statistical Prediction of Programming Errors," IBM Corp., RADC-TR-77-175 (May 1977).
13. Gerhart, S. L., "Development of a Methodology for Classifying Software Errors," Duke University (July 1976).
14. Castle, S. G., "Software Reliability: Modelling Time-to-Error and Time-to-Fix," master's thesis, Air Force Institute of Technology (Mar 1978).
15. Kruszewski, G., "Modeling Software Reliability Growth," Proceedings of Surface Warfare Systems RMQ Seminar, Norfolk, VA (Sept 1978).
16. Schafer, R. E., et al., "Validation of Software Reliability Models," Hughes-Fullerton, RADC-TR-79-147 (Aug 1979).
17. Sukert, A., "State of the Art in Software Reliability," Presentation, NSIA Software Conference, Buena Park, CA (Feb 1979).
18. Thibodeau, R., "The State-of-the-Art in Software Error Data Collection and Analysis," AIRMICS (Jan 1979).
19. Amory, W. and Clapp, J. A., "Engineering of Quality Software Systems (A Software Error Classification Methodology)," MITRE Corp., MTR-2648, Vol. VII (Jan 1975); also RADC-TR-74-324, Vol. VII.
20. Rubey, R. J., "Quantitative Aspects of Software Validation," Proceedings of the 1975 International Conference on Reliable Software, Los Angeles, pp. 246-251 (April 1975).
21. NAVSEA, Statement of Work for AN/SLQ-32(V) Verification and Validation, Appendix A (May 1977).
22. Hartwick, R. Dean, "Software Acceptance Criteria Panel Report," Joint Logistics Commanders Joint Policy Coordinating Group on Computer Resource Management, Software Workshop, Monterey, CA (April 1979).
23. Bowen, J. B., "AN/TPQ-36 Software Reliability Status Report," Hughes-Fullerton, CDRL 8-18-015 (Dec 1979).
24. Shooman, M. L. and Ruston, H., "Summary of Technical Progress, Investigation of Software Models," Polytechnic Institute of New York, RADC-TR-79-188 (July 1979).
25. Thielen, B. J., "SURTASS Code Review Statistics," Hughes-Fullerton, IDC 78/1720.1004 (Jan 1978).
26. Fagan, M. E., "Inspecting Software Design and Code," Datamation, pp. 133-144 (Oct 1977).
27. Hoffman, H., "An Experiment in Software Error Occurrence and Detection," master's thesis, Naval Postgraduate School (Jun 1977).

APPENDIX A

DEFINITION OF RECOMMENDED ERROR CATEGORIES/SUBCATEGORIES
Design
The Design category reflects software errors caused by
improper translation of requirements into design. The design
at all levels of program and data structure is included (subsystem through module and data base through table). Such
errors normally occur in the design phase, but are not limited
to that phase. Errors due to inconsistent, incomplete, or incorrect requirements do not qualify for this category; such
errors should be assigned to the subcategory "Not Applicable to Software Reliability Analysis."
Interface
The Interface category includes those errors concerned with communication between 1) routines and subroutines, 2) routines and functions, 3) routines and the data base, 4) the executive routine and other routines, and 5) external interrupts and the executive routine.
Data definition
This category pertains to errors involved with permanent
data, such as retained, global, and COMPOOL. It includes
common variable and constant data, as well as preset, initialized, and dynamically set variables.
Logic
The Logic category includes all logic-related errors at
the intramodule level. Examples of this category are incorrect relational operators and incorrect looping control. Improper or incomplete logic occurrences at the intermodule
level do not qualify for this category, and should be assigned
to the Interface category.
Data handling

The Data Handling category is concerned with errors in the initialization, accessing, and storage of local data, as well as the conversion and modification of all data.

Computational

The Computational category pertains to inaccuracies and mistakes in the implementation of addition, subtraction, multiplication, and division operations.

Other

The Other category is designed to provide flexibility for each application. However, once selected for a project, the subcategories should not change. The following suggested subcategories deserve further explanation.

Operator

This subcategory includes errors caused by inaccurate users manuals for both operational and diagnostic applications.

Clerical

This subcategory includes errors that can be traced to careless keypunch, configuration control, or system generation operations.