
Defining Defects, Errors, and Service Degradations
Gertrude Neuman Levine
Fairleigh Dickinson University
[email protected]
Abstract
The study of defects is a principal topic of software systems, affecting all phases of a system’s lifecycle. Defects are the cause of errors and service degradations. Unresolved errors cause failures. If defects cannot be prevented effectively, then error control mechanisms must be evaluated. We introduce a model to distinguish between defects, errors, and service degradations. A two-dimensional classification scheme is developed for defects, defined by the types of process interaction and software corruption that are involved. A third dimension is added to this taxonomy for defects that cause service degradation, based on the deviations in service quality that are tolerated. We investigate the role of service degradation in error prevention.

Keywords: defects, errors, failure, service degradation

I Introduction
A project is a set of cooperating processes that are developed to satisfy user and system requirements. Each process consists of project requests that are bound together by restrictions to regulate their movement as a unit through layers of a software system. If a process contains a defect, its execution can result in errors and/or service degradations. Sometimes a defect remains dormant, perhaps in an uncalled subroutine or in an unread page of a document or in an unexploited vulnerability. A defect becomes “active” when it causes an error [1] or a degradation of service. An error occurs when an authorized user process loses service that is specified in the user/system contract. Either an error is confined within a user project, or else an external process interferes with a user process. External interference implies that there is a defect in the resource management system; the activation of such a defect introduces a vulnerability that can be compromised by another process.

Computer science literature contains several classification schemes that were developed to assist in defect prevention. Defect prevention, however, is not always possible or cost effective, so that service degradations and errors are common [21]. When defects cause errors, they can sometimes be detected and resolved by a resource management system [8, 20]. Alternatively, when defects become active, errors can be prevented or mitigated through service degradations. Errors that are neither prevented, nor mitigated, nor resolved cause failures.

We introduce a two-dimensional classification scheme for defects. We identify those types of errors for which resolution mechanisms are problematic, making prevention critical. A three dimensional classification scheme is presented for service degradation. Service degradations are discussed in terms of their role in the prevention of errors. (They do not prevent defects!) Our classes are organized according to constructs of a model that previously was integrated into the study of specific types of defects [14, 15, 17]. We generalize previous findings [16] in order to obtain a unified approach to the control of defects.

In an earlier paper, we defined resource deadlock and developed a classification scheme for dead states [15]. Resource deadlock is enabled by a defect in a system’s scheduling mechanisms that allows processes to be trapped in a cyclical wait for resources. Cyclical waits can be prevented. When resource deadlock is rare and prevention mechanisms are onerous, however, systems have relied on (perhaps heuristics of) detection and recovery mechanisms to enable delivery of completed service [7].

The remainder of this paper is organized as follows:
• Related work. Open questions concerning related work.
• Introduction of a model to structure our classifications.
• Presentation of definitions for our terminology.
• Representation of the states of a system.
• Development of a two-dimensional classification scheme for defects.
• Illustration of active defects in each class. Identification of classes that have poor potential for recovery.
• Taxonomy of service degradations. Discussion of the relationship between service degradations and errors.

II Related Work
Some of the terminology of our introduction is defined in the literature of computer science. ISO/IEC [10] contains a three-step definition, from failure (violation of a contract) (13.5.1), to error (cause of a failure if unresolved; manifestation of a fault) (13.5.2), to fault (situation that can cause an error) (13.5.3). We prefer the term “defect” rather than “fault” since we do not include physical phenomena in our study.

The IEEE Standard Glossary of Software Engineering Terminology [9] has different meanings for some of the above terms. A fault is defined as both a hardware defect and alternatively as an incorrect step, process, or data definition in a computer program. The second definition of fault is also used for the term “error,” with an alternate definition provided for error as an incorrectly computed result or a human action that produces an incorrect result. We seek to identify characteristics of faults that can be detected and controlled after they begin executing but before errors or failures result. We need to differentiate between faults, errors, and service degradations and thus do not use the IEEE definitions.

Landwehr et al. [12] develop a taxonomy of software security flaws (defects). Outcomes resulting from (active) security flaws are classified as: unauthorized disclosure, unauthorized destruction of data, unauthorized modification of data, and denial of service. These outcomes can be reconciled with our classification. Unauthorized modification and destruction of data both involve unauthorized requests that conflict in stored data values with authorized requests (absence of required values implies presence of unacceptable values) and, according to our classification, result from defects in data output. (See section 6.2.) Unauthorized disclosures are outputs at unauthorized locations, but result from defects in data inputs. Denial of service requires that processes be prevented from obtaining requested resources, which can occur, for example, during system overload; such behavior results from defects in times restrictions. Other denial of service attacks include corrupted web links (defective dependencies) and page ranks (defective priorities). Landwehr et al. also develop a taxonomy of system flaws based on causation in terms of motive, time, and location of introduction, with the class of time (when the flaw enters the system) specified as occurring during development, maintenance, or operation. Our model divides the system into layers in which both development and maintenance are services of the lowest layer, while operation occurs at upper layers. Defects originate during development or maintenance, but cause errors during operation.

Chillarege et al. [4], in their classification of defects, do not distinguish between defects and errors. They list defect types as function, interface, checking, algorithm, assignment, build/package/
merge, timing/serialization, and documentation. We exclude the
classes of function, documentation, and algorithm from our study;
such “defects” cause errors or service degradations because their
implementations contain defects in request attributes. Assignment
defects are subsets of our class of data output defects; checking
defects are subsets of data input defects; build/package/merge and
interface defects are subsets of defects in dependency restrictions;
timing/serialization defects are subsets of defects involving times
restrictions. We add a category for defects in priority restrictions.
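
The reconciliations described in this section can be summarized as lookup tables. The following Python sketch is only an illustration: the dictionary and function names and the string labels are hypothetical, chosen here for readability, and the entries restate the correspondences given in the text rather than add new ones.

```python
# Hypothetical lookup tables restating the correspondences described in the text.
# Keys are categories of the cited taxonomies; values name the corrupted
# attribute (class of defect) in the scheme of this paper.

LANDWEHR_OUTCOME_TO_DEFECT = {
    "unauthorized modification of data": "data output",
    "unauthorized destruction of data": "data output",
    "unauthorized disclosure": "data input",
    # The text also mentions corrupted web links (dependencies) and
    # page ranks (priorities) as other denial of service attacks.
    "denial of service": "times restrictions",
}

CHILLAREGE_TYPE_TO_DEFECT = {
    "assignment": "data output",
    "checking": "data input",
    "build/package/merge": "dependency restrictions",
    "interface": "dependency restrictions",
    "timing/serialization": "times restrictions",
}

# Types that the text excludes from the study.
CHILLAREGE_EXCLUDED = {"function", "documentation", "algorithm"}


def classify_chillarege(defect_type):
    """Return the corresponding class, or None for an excluded type."""
    if defect_type in CHILLAREGE_EXCLUDED:
        return None
    return CHILLAREGE_TYPE_TO_DEFECT[defect_type]
```

Defects in priority restrictions have no counterpart among the listed types, which is why the text adds that category separately.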
Avizienis et al. [1] present taxonomies for dependable and secure computing, introducing definitions for common terminology and issues. They differentiate between faults (defects) and errors, similar to ISO/IEC. They state that a fault is “active” when it causes an error. We claim that an “active” fault can cause a service degradation that need not be an error. Mechanisms to achieve dependability and security are classified as: fault prevention, fault tolerance, fault removal, and fault forecasting. Fault tolerance, the ability to provide (degraded) service despite the presence of faults, is germane to our study. Perhaps there exist faults whose prevention is not critical. Some faults cause errors that result in partial service [1], in which failure is tolerated for processes that are nonessential for project completion. Other faults cause degraded output or input; applications in certain domains can tolerate small deviations from optimal values [6, 19]. Active faults that result in delays are routinely tolerated within a limited range [3, 19]. Some systems attempt the selection of an optimal choice, but accept non-optimal alternatives. Many faults cause errors that are resolvable [8]; resolution methods include rollback and exception handling, but introduce delay (degradation). There are also classes of faults that threaten irreparable harm to a system if they become active; we need to identify these faults and, in dealing with them, channel our energies towards their prevention.

Some researchers do not distinguish between faults and errors [e.g., 4, 12]. Others define a fault as the cause of an error [e.g., 1, 10], such that an error occurs when a fault is “active” [1]. Defining a software defect (fault) is difficult. Even blatant defects, when executed, need not result in an error. Consider the transmission of unencrypted top secret data over an insecure wireless connection. Is this defect active when it executes (when data are transmitted insecurely), when the vulnerability is compromised (when unauthorized data are read), or only when captured data are exploited (when data are used for identity theft or other attacks on confidentiality)?

Do all active defects cause errors? Do all unresolved errors cause failures (ignoring toleration of partial service)? Then, if “cutting the line” is an error and if an intruder is not bumped off the line, failure must result. Yet failure need not result; typically all cars behind the intruder complete service, although they suffer service degradation. Similarly, priority inversion [13] is an active defect that causes service degradation, but it does not necessarily cause the failure of higher priority processes.

Defect prevention mechanisms are applied before defects begin execution. Resolution mechanisms are applied after defects cause errors. Defect prevention frequently is onerous; can prevention be delayed until after defect execution, but before errors occur? Can defect activation be used to trigger the employment of error prevention mechanisms? Can service degradations assist in error prevention? Can service degradations assist in the mitigation of errors with more flexibility and less cost than error resolution?

III A Model for Software Systems
We introduce a model to enable definitions of the above terms and to structure classifications of errors and service degradation. User requirements are expressed, in their simplest form, as requests for input or output at resources. Requests are combined into processes, which are the entities of completion of service at system layers (stages that must be reached or relinquished in order).

3.1 The Layers
M is a set of ordered layers of requests in a software system.
• The Process Conception Layer, PC, is a set of requests that are being developed from Requirements into a project. Requests are conceived and reconceived (corrected, modified, maintained) in this layer until their processes are in a form that can be accepted by the resource system.
• A Process Buffer Layer, PB, is a set of requests that are buffered (stored, delayed, postponed) awaiting delivery to the resource layers.
• An Independent Delivery Layer, ID, is a set of requests that are being delivered to resource layers (perhaps transmitted via the Internet). An ID contains requests from multiple projects.
• A Resource Buffer Layer, RB, is a set of requests that are buffered in the resource system.
• A Resource Service Layer, RS, is a set of requests that are executing at the resource, via inputs or outputs, requesting completion of service.
• A Service Completion Layer, SC, is a set of requests whose processes have completed service at RS. (If the layers are circularly ordered, SC and PC denote the same layer and processes that move to SC / PC can be modified for reuse [17]).

We define an ordering relation, >, on the elements of M, such that SC > RS > RB > ID > PB > PC. For m, m' ∈ M, if m' > m we say that m' is a higher layer than m.

All processes seek to linearly traverse layers, but are impeded by restrictions assigned during different stages of their lifetimes. (For example, assume that an Ada programmer conceives a program to
play the Game of Life on a personal computer. A GNAT compiler
translates the source code. If syntax errors are found during compilation, the program is returned to the programmer. After correction
and recompilation, executable code is stored in user buffers waiting to be submitted for input and output service at devices and intermediate resources. Once execution begins, statements are stored
in system buffers, pending the service of preceding statements.
Many errors, such as attempting to open nonexistent files, incur
traps to the operating system and interrupt the program’s service.
The operating system might signal the program via an exception
handler and enable it to complete service.) A six layer model is a
simplification, chosen to correspond to basic computer systems.
Some systems contain null layers. (The personal computer in the
above example does not utilize an ID.) Processes progress through
empty layers, and only through empty layers, without delay. They
remain for at least one resource unit at each nonempty layer, to
obtain the service, such as buffering or delivery, of that layer.
Many systems contain sub-layers that processes must traverse
(such as in the waterfall model of software engineering).
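
The ordering relation on M and the treatment of null layers can be sketched in a few lines of Python. This is a minimal illustration, not part of the model itself: the names `Layer`, `is_higher`, and `min_units_before_completion` are hypothetical, and the sketch assumes that exactly one unit is counted for each nonempty layer strictly between PC and SC.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The six layers of M; the integer values encode SC > RS > RB > ID > PB > PC."""
    PC = 0  # Process Conception
    PB = 1  # Process Buffer
    ID = 2  # Independent Delivery
    RB = 3  # Resource Buffer
    RS = 4  # Resource Service
    SC = 5  # Service Completion


def is_higher(m_prime, m):
    """True if m_prime is a higher layer than m."""
    return m_prime > m


def min_units_before_completion(nonempty_layers):
    """Lower bound on the units a process spends between PC and SC.

    Assumption of this sketch: one unit is counted per nonempty
    intermediate layer; empty (null) layers are crossed without delay.
    """
    intermediate = [m for m in Layer if Layer.PC < m < Layer.SC]
    return sum(1 for m in intermediate if m in nonempty_layers)


# The personal computer of the example above does not utilize an ID layer.
pc_layers = {Layer.PB, Layer.RB, Layer.RS}
print(is_higher(Layer.RS, Layer.RB))           # True
print(min_units_before_completion(pc_layers))  # 3
```

Encoding the layers as integers keeps the relation > of the text directly usable as the comparison operator.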
When peer-to-peer systems are modeled, requests are sent and received at each end of the layers by processes that control the resources at that end. The ordering relation is then defined by the direction from source of conception to destination of completed service.

3.2 Time
T is a finite set of linearly ordered discrete units of Time, represented by an initial subset of the natural numbers and bounded by the lifetime of the system.

3.3 The Resource
R is a set of resources in a software system. Each resource element, r ∈ R, is an ordered set of units that are bound together with dependency restrictions (see section 3.7) so that units of the same resource element are allocated in sequence, corresponding to units of T. Dependency restrictions also bind together different resource elements for combined allocation. (For example, an operating system combines disk sectors into blocks for indivisible allocation and deallocation.) A resource’s attributes include data values as well as keys for status, resource and system identification, and for user authorization. Fungible resources contain elements that are indistinguishable to user processes, but have additional fields with which the resource management system differentiates between them.

3.4 The Request
A request is an atomic entity expressed by the tuple (m, r, t):
• m ∈ M identifies the current position of the request in the system layers. A request at layer m requests to be mapped to the layer m' > m.
• r ∈ R identifies a resource where service is requested. For certain processes, such as implementations of search algorithms, a request can be mapped to different identifiable resources during its lifetime.
• t ∈ T identifies a request’s current position in the discrete ordering of Time. A request seeks movement to a higher layer at its current unit, t ∈ T, but might be delayed (mapped to t' > t).

Each request has a permanent attribute, denoting it as input or output, as well as process and data authorization keys. In addition, a request is assigned, at different stages of its mapping, a restriction set (see section 3.7) and a data set. The data set includes values (such as literals, colors, and signal strength), format and type keys, a work-area, and an operation.

An input request with a wild card for a value requests the retrieval of the stored value from the accessed resource, followed by the output of that value into its work-area. Any input request except a match request that matches the format key of the accessed resource (in contrast to undecipherable input, for example) enables comprehension after retrieval and is called a read.

An output request seeks to store, at the requested resource, either its value and/or the value in its work-area, perhaps modified as specified by its operation. Certain types of output requests, including those that constitute an update, share their work-area.

An input request that contains any value except a wild card is a match request and is serviced at its layer only if its key and value match those of the accessed resource. Guards for conditionals and loops are match requests; if their attributes do not match those of the resource, the dependent requests in their structure are refused service without affecting the service of their processes. A match request that conflicts with an authentication lock, on the other hand, impairs its entire process.

3.5 The Process
P is a set of processes in a software system. A process, p ∈ P, is a nonempty, ordered set of cooperating requests that are bound together by dependency restrictions so that requests of the same process are serviced sequentially in RS and so that a process completes service and/or is demoted as an entity. Additional restrictions determine control within and between processes (involving conditional statements, loops, reference parameters, structure chart relationships, and concurrency synchronization, for example). A project’s completion of service is dependent upon the completion of service of all of its processes (or of essential processes, if the project supports partial service). Each process has a unique key that is assigned to its requests. This key contains fields for process, project, and user identification and authorization.

3.6 The Software System
A Software System is a quintuple (M, P, R, T, F), where M is a set of ordered layers, P is a set of processes, R is a set of resources, T is a finite set of discrete units of Time, and F is a function that controls the movement of requests. For the requests of each process and for each single and composite layer, F assigns restrictions as well as a relation, g_M: M → M, such that
g_m(m, r, t) = (m', r', t'), m' > m, if restrictions permit this mapping (called promotion), else
g_m(m, r, t) = g_m(m, r', t'), t' > t, if restrictions permit this mapping (called rescheduling), else
g_m(m, r, t) = g_m(g_m'(m', r', t')), m > m' (called demotion).

We have simplified the mapping above. When a request is promoted, a duplicate may be generated. Sometimes copies are rescheduled at their former layers. (For example, stations using the Transmission Control Protocol (TCP) retain duplicates when they transmit data units.) Alternatively, duplicates are promoted concurrently (as in RAID architecture or child processes). Multiple mappings occur in many circumstances.

Requests ultimately request promotion to SC. During their progress, if restrictions prevent promotion but allow rescheduling, requests are mapped within the same layer to later units of T. If restrictions prevent both promotion and rescheduling, requests are demoted to lower layers to repeat their quest for completed service. Service at a layer is provided during rescheduling. Rescheduling also determines which elements of fungible resources are allocated.

3.7 The Restrictions
At each layer, a request is assigned restrictions, both statically and dynamically, of dependencies, times, and priorities. These constraints, together with attributes of data input and output, determine the classes of active defects that are defined in section 6.2 and the classes of service deviations that are presented in section 7.

3.7.1 Dependency restrictions
Promotion and demotion dependencies are sets of events that restrict a request’s movement. Before a request is promoted, it must wait for completion of (some combination of) the events contained within its promotion dependency. (As examples, while a user request in RB waits for an input or output completion, its process remains on the blocked list; an access to a block referenced by an i-node’s triple indirect pointer is dependent upon three accesses to blocks containing intermediate pointers; minimizing the number of events in a dependency decreases the degree of coupling.) A request is demoted if an event in its demotion dependency occurs. (For example, a request and its process are demoted to the blocked list when it makes a system call requesting output. This demotion is not an error since the state of the process is stored, enabling resumption without loss of service.)

3.7.2 Times restrictions
A promotion times restriction is a nonnegative integer that determines the minimum number of times that a request must be rescheduled at a layer within some interval of T before promotion. A demotion times restriction is a nonnegative integer that determines the maximum number of times that a request can be rescheduled at a layer within some interval of T before demotion. These values are decremented, down to 0, each time a request is rescheduled without conflict. A positive value for a promotion times restriction prevents a request from being promoted. (For example, a sleep(n) statement in the C programming language forces the next statement to be rescheduled at its layer at least n times before its promotion to RS.) A zero value for a demotion times restriction forces a request, in any layer except PC, to be demoted. (Requests with a hard deadline are assigned demotion times restrictions limiting the number of times that they can be rescheduled. Demotion times restrictions also limit the number of times that modems attempt connections during a session. Data transmission rates have an upper bound for the number of bits that are sent per second; such maximum values are set by communications protocols and the hardware. Even if a sender could exceed a maximum transmission rate, the receiver would not supply service.) All requests are assigned composite demotion times restrictions that are bounded by the lifetime of the system so that, at the end of T, processes that are in a state of execution are demoted to PC.

3.7.3 Priority restrictions
A request is enabled for promotion iff its promotion dependency set is empty and its promotion times restriction is 0. A request is enabled for rescheduling iff its demotion times restriction is positive and its demotion dependency is unchanged. Two enabled requests that request mapping to the same resource, delivery, or buffer unit compete if one is an output and the other has a nonmatching authorization and/or data key. Conflict takes place if two or more competing requests are mapped to the same service unit. Processes are said to conflict (or compete) with each other if their requests conflict (or compete). At most one conflicting process can be serviced at a resource unit; others are demoted. Conflicts involving match requests for locks or logins are important protection mechanisms that prevent unauthorized access. They result in errors only in intruders (if utilized correctly). Conflicts resulting in data inconsistencies, on the other hand, can cause errors for all involved parties, including authorized user processes.

Priority restrictions determine which competing or conflicting requests are promoted, rescheduled, or demoted. (For example, hardware interrupt service routines are assigned higher priorities than competing user processes.) Priorities are determined differently in different systems. (For example, newly promoted virus outputs conflict with and overwrite authorized outputs that were promoted earlier but were still being serviced at the resource. On the other hand, in broadcast systems using C-Aloha, conflicting outputs with weaker signals are demoted even if they are newly promoted, while all conflicting outputs are demoted in Alohanet.)

IV Definitions
The above model enables definitions of the fundamental terms of this paper. Defects are defined and categorized.

Execution is the rescheduling of a request in ID, RB, or RS.

Service is the execution of a request with the decrement of its promotion times restriction. Service is lost if a process is demoted without maintaining state information. (Its promotion times restriction at each layer of demotion is reset.)

Abortion is the demotion of an executing process to PC.

User Requirements are a set of user requests for completed service at specified resources: processing specified inputs, producing specified outputs, and satisfying specified restrictions. User requests are developed in PC into a project and its processes and requests to satisfy a user/system contract. We assume that, as part of the contract, the project and resource management system agree upon behavior to provide completed service. Projects are assigned authorization keys that are attributed to their processes and requests. The resource management system assigns keys to resources for authorization of requests and processes.

Authorized processes have keys that match the authorization keys of the resources that their requests are accessing.

Authorized requests have keys that match the authorization keys of data stored at the resources that they are accessing, as well as the authorization keys of the resources.

Interference is the corruption of attributes of authorized user or system requests, impeding the service of an authorized user. Interference is transitive; any process whose service is impeded by a process that suffers interference also suffers interference.
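
The promotion, rescheduling, and demotion mappings of section 3.6, together with the times restrictions of section 3.7.2, can be sketched in code. The Python below is a simplified illustration under stated assumptions, not the paper's formal function F: the names `Request`, `step`, and `LAYERS` are hypothetical; a request carries a single pair of times restrictions instead of one pair per layer; conflicts and priorities are not modeled; and the demotion dependency is reduced to a boolean flag.

```python
from dataclasses import dataclass, field

LAYERS = ["PC", "PB", "ID", "RB", "RS", "SC"]  # ordered from lowest to highest


@dataclass
class Request:
    layer: str                # m: current layer
    resource: str             # r: resource where service is requested
    time: int = 0             # t: current unit of T
    promotion_times: int = 0  # reschedulings still required before promotion
    demotion_times: int = 1   # reschedulings still allowed before demotion
    promotion_deps: set = field(default_factory=set)  # events still awaited


def step(req, demotion_event=False):
    """Apply one mapping to req and return its name.

    Promotion is tried first, then rescheduling, then demotion,
    mirroring the order of the three cases in section 3.6.
    """
    i = LAYERS.index(req.layer)
    if req.layer == "SC":
        return "completed"
    # Enabled for promotion: empty dependency set and times restriction of 0.
    if not req.promotion_deps and req.promotion_times == 0:
        req.layer = LAYERS[i + 1]
        return "promotion"
    # Enabled for rescheduling: positive demotion times restriction and
    # no event of the demotion dependency has occurred.
    if req.demotion_times > 0 and not demotion_event:
        req.time += 1
        req.promotion_times = max(0, req.promotion_times - 1)
        req.demotion_times -= 1
        return "rescheduling"
    if i > 0:
        req.layer = LAYERS[i - 1]
        return "demotion"
    # Assumption of this sketch: a request in PC cannot be demoted and waits.
    req.time += 1
    return "rescheduling"


# sleep(n)-like restriction: two reschedulings in RB precede promotion to RS.
r = Request(layer="RB", resource="disk", promotion_times=2, demotion_times=5)
print([step(r) for _ in range(3)], r.layer, r.time)
# ['rescheduling', 'rescheduling', 'promotion'] RS 2

# Hard deadline: only one rescheduling is allowed, so the request is demoted.
d = Request(layer="RB", resource="disk", promotion_times=3, demotion_times=1)
print([step(d) for _ in range(2)], d.layer)
# ['rescheduling', 'demotion'] ID
```

In this sketch a caller would assign fresh restrictions after each promotion or demotion, since the text assigns restrictions at each layer.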
An intruder is a process containing unauthorized requests that interfere with authorized requests of a different project. An intruder might be an unauthorized process, or else it might be an authorized process of a user or resource management system that contains unauthorized requests. An authorized project that contains unauthorized requests that interfere only with its own requests is not an intruder.

A defect (flaw) is a nonempty set of requests whose execution can result in interference. We identify three classes of defects:
1) A development defect contains unauthorized requests that, when executed, interfere with authorized user requests. The unauthorized requests might belong to the project of the authorized user requests. Otherwise they belong to an intruder, in which case interference is enabled only if the resource management system is defective.
2) A scheduling defect contains resource management requests that, when executed, enable interference between authorized user processes. A scheduling defect results in interference only if user processes compete for service. (For example, priority mechanisms might be corrupted, causing conflict between competing authorized processes. Conflict results in demotion and loss of service of at least one of the authorized processes.)
3) A security defect contains resource management requests that, when executed, enable interference between an intruder and an authorized user process. A security defect results in interference only if an intruder exploits the vulnerability. (For example, a system’s match requests to verify access control rights might be defective, allowing an intruder to overwrite user data.)

A failure is the abortion of an authorized user process.

An error is the demotion of an authorized user request and its loss of service caused by the execution of a defect. If the demoted request does not repeat upward movement through the layers from which it was demoted as well as service at those layers (called resolution), and if its service is necessary to its process, the process will fail.

Standard service is service without defects.

A service degradation is a deviation from requested optimal service of an authorized user process caused by the execution of a defect. Most user/system contracts tolerate deviations from optimal service that result during standard service (such as delays in completion); we consider such deviations to be an accommodation, not a degradation of service.¹ Systems deploy service degradations to prevent errors. Where possible, processes accept deviations in service. In addition, systems monitor suspected service degradations and adjust restrictions to prevent the occurrence of errors.

A defect is active when it causes service degradation or an error.

¹ Degradation of service has been defined as a mechanism for tolerating an increased load on a system [25]. Unless the increased load is beyond service specifications, we consider a deviation in quality of service that accommodates more users to be a responsibility of standard service.

V The System States
Avizienis et al. [1] assert that the behavior of a system can be described by its states, which consist of computation, communication, stored information, interconnection, and physical condition. These states are expressible with the constructs of our model, both at a specific unit of T and over the lifetime of the system. Resources and requests at a given unit of T provide a snapshot where:
• Information is stored in the data values of requests that are currently being serviced and in the resources and buffers where they are being serviced.
• Computation is specified in the operations of output requests.
• Interconnection and blockage are defined by the dependency sets of requests.
• Communication is achieved via input and output service of requests.
• The system’s physical condition, although excluded from our study, can be expressed by resource status fields.

The behavior of a system is defined by request mappings. At each unit of T, a request is mapped to another request, either in a higher or lower layer or for a later unit of T. Our model can thus be considered to be a very large finite state machine, with the mappings of requests identifying transitions from one state to another. (Multiple mappings require multiple transitions following a single input.) Note that a cycle cannot exist; each mapping is either to a request at a later unit of T, or to an adjacent layer followed by a mapping to a later unit of T at the adjacent layer (unless the adjacent layer is empty). The mapping of a process to a lower layer signals a possible error. The continuous mapping of a process within the same layer(s) signals potential service degradation.

VI The Classification Scheme
Defects are classified as scheduling, security, or development, depending on the type of process interaction involved. Active defects are classified according to the corrupted attribute that causes them.

6.1 Classification of Defects
All software defects originate during development, either from user and/or from system requirements. Some development defects contain unauthorized requests that are isolated within their authorized user project, potentially harming only their process and/or project. Other development defects are found in intruders. Scheduling defects consist of incomplete or inconsistent competition mechanisms of resource management systems that interfere with access to resources of authorized user processes. Security defects consist of incomplete or inconsistent cooperation mechanisms of resource management systems that allow intruders to interfere with authorized user processes.

6.1.1 Development Defects
Development is the production of a software project in PC, understanding, planning, implementing, testing, documenting, and maintaining the processes that cooperate in comprising a project. Development defects cannot always be prevented. Furthermore, defect prevention frequently is cost prohibitive. Systems can detect some errors caused by inadvertent development defects and signal exception handlers (if they are provided), assisting processes in completing service. Operating systems and programming environments also detect and rectify certain program errors and service
degradations. (For example, “canaries” help systems detect buffer overflow [20] and the Java and LISP programming environments utilize garbage collection to remedy memory leaks.)
6.1.2 Scheduling Defects
Scheduling is the assignment of a set of asynchronous processes to shared resources. (For example, an operating system assigns threads to processors and a network router places packets in output queues of selected ports.) To enhance throughput and minimize response time, a system schedules processes to resources for concurrent units of T. A defective resource manager can interfere with user processes during certain traffic conditions. Systems resolve many scheduling errors with backups and restarts (maintaining duplicates at lower layers, promoting the demoted processes, and repeating lost service). Systems diminish many scheduling service degradations by adjusting restrictions (perhaps by measuring waiting times as decrements in times restrictions and raising priorities accordingly). The greatest efficiency for error resolution is achieved at the layer closest to the initial demotion. Typically, service degradations are controlled at the layers at which defects are activated. Degradation mitigation thus saves the cost of promotion and repeated service that is involved in error resolution.
6.1.3 Security Defects
Security is the prevention of the access of intruders to resources that are shared among authorized processes. (For example, a UNIX system authenticates processes before they are provided entry.) An active security defect in a resource manager enables interference from advertent or inadvertent intruders. Intruders can be both advertent and authorized (such as an administrator that misuses granted authority), advertent and unauthorized (such as viruses), inadvertent and authorized (such as temperamental device drivers that crash the system), or inadvertent and unauthorized (such as buggy games installed by a careless employee). Systems recover from certain types of security errors with backups and restarts.
6.2 Classification of Active Defects
An active defect can be placed into one of seven classes, depending upon which request attribute is corrupted. These classes are: output and input data attributes, promotion and demotion dependency restrictions, promotion and demotion times restrictions, and priority restrictions. Errors and service degradations are usually caused by multiple defects, such as an intruder gaining access to a system due to a defect in authentication mechanisms and then overwriting user data using corrupted access control mechanisms, dependencies, and data attributes (typical activities of viruses). The next sections contain examples of active defects in the above classes. (See Table 2 and Table 3 for charts of sample errors and service degradations in these classes.)
6.2.1 Data output defects
A development data output defect causes an error, for example, when a mistyped initialization is stored in a database. (If data are within acceptable range, service degradations instead of errors occur.) A scheduling data output defect causes an error during a lost update. (Authorized requests from different processes are assigned priorities that allow them to output concurrently to the same resource [2] and the two updates are interleaved in execution. The output of the second update overwrites the first update, whose promotion had been prevented by a dependency, assigned dynamically following its conflict with the input of the second update. Overwriting the data of the first update causes its loss of previous service.) A security data output defect causes an error, for example, when a virus is allowed into a system and its requests overwrite data being stored by authorized output requests. Resource management systems frequently detect errors resulting from scheduling and security output defects via mechanisms such as checksums and integrity check values, and by logs and audits. They recover from detected errors via rollbacks and restarts in RB or ID. Systems also detect certain errors resulting from user development data output defects using mechanisms such as overflow circuitry; error resolution is achieved, for example, via signals to exception handlers in user processes. Such error resolutions, however, cause delays and are thus examples of service degradation caused by corrupted data outputs.
6.2.2 Data input defects
A development data input defect causes an error, for example, if a process inputs from a resource before a value has been stored there and an output request places that (unauthorized) value in another location. The error occurs during data output, but the cause is a data input defect. A scheduling data input defect causes an error during inconsistent retrieval [2]. (One transaction transfers money between two resources. Its outputs are interleaved with another transaction, which inputs from both resources and outputs the sum of the retrieved values elsewhere. If the second transaction’s inputs execute in between the first transaction’s outputs, for example, the second transaction’s output is erroneous.) A security data input defect becomes an error, for example, during identity theft, where an intruder inputs data from a resource at which the output of an authorized process is being serviced. When that data are misused, an error occurs. Errors caused by data input errors are difficult to detect, since retrieved values can be stored in work-areas that are outside the control of the system. Logs and audits are not always effective; an intruder might eavesdrop over an unprotected connection or else corrupt the logs. Assuming that unauthorized input is detected, recovery is not feasible if the data have been output beyond the system’s control. Even if recovery is achieved, it delays the service of user processes and causes service degradation.
6.2.3 Promotion dependency defects
A development promotion dependency defect causes an error, for example, if a pointer calculation is incorrect and there is no path to the resource that was dependent upon the pointer. A scheduling promotion dependency defect causes an error when switches send packets to crashed routers based upon circular information, perhaps using the distance vector algorithm [24]. A security promotion dependency defect becomes an error, for example, if intruders corrupt web link dependencies. Referential links can be lost or, more dangerously, reference a masquerading site [3]. A development promotion dependency defect causes service degradation, for example, if the corruption of a promotion dependency causes a request to wait for a nonexistent process [7]. (The continued rescheduling during the wait can be detected by the process with a match on a time-out, and initiate the discarding of the waiting request.) A scheduling promotion dependency defect causes service degradation when scheduling enables a closed circular chain of dependencies, as in a resource deadlock [7, 15]. A scheduling defect also causes service degradation in a circular dependency in which four cars are each idling at different corners of a four-way intersection, guarded by stop signs, waiting for cars on the left to proceed first [15]. A security dependency defect causes service
degradation, for example, if an intruder corrupts a promotion dependency in TCP [24]. A SYN (connection request) flood can prevent authorized users from establishing a connection, since TCP
systems wait for nonexistent acknowledgements before releasing
buffers necessary for new connections. If defects cause service
degradation and their demotion times restrictions expire, waiting
processes are demoted – an error. If composite demotion times
restrictions expire, processes fail. Systems maintain metrics of
waiting times and resource usage. Such statistical tools, although
heuristics, are useful in alleviating many types of service degradations by initiating the adjustment of parameters (such as the size of
TCP’s backlog queue). They also monitor for circular links. If cycles are found in resource deadlock, systems force errors and initiate recovery procedures. Compensation through redundancy [1] is
effective for preventing dependency defects from becoming errors.
Doubly linked lists, for example, provide alternate paths to list
elements, contained within the promotion dependency of requests
for list elements. Typically, however, the alternative path must
traverse more links than the preferred path, causing degradation of
service when a link to the preferred path is lost.
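The doubly linked list case can be illustrated with a minimal sketch. The names `Node`, `build`, and `find` are hypothetical and chosen only for this illustration. A lookup first follows the preferred forward links and, if the element cannot be reached that way, falls back to the redundant backward links, typically traversing more links.

```python
class Node:
    """List element holding links to both neighbours."""
    def __init__(self, value):
        self.value = value
        self.next = None   # preferred (forward) link
        self.prev = None   # redundant (backward) link


def build(values):
    """Build a doubly linked list; return (head, tail, nodes)."""
    nodes = [Node(v) for v in values]
    for left, right in zip(nodes, nodes[1:]):
        left.next, right.prev = right, left
    return nodes[0], nodes[-1], nodes


def find(head, tail, target):
    """Return (path_used, links_traversed), or None if unreachable."""
    for path, start, step in (("forward", head, "next"),
                              ("backward", tail, "prev")):
        node, links = start, 0
        while node is not None:
            if node.value == target:
                return path, links
            node, links = getattr(node, step), links + 1
    return None


head, tail, nodes = build("abcde")
print(find(head, tail, "b"))   # ('forward', 1): preferred path
nodes[0].next = None           # simulate a lost forward link
print(find(head, tail, "b"))   # ('backward', 3): alternate path, more links
```

The element is still reached after the link is lost, so no error occurs, but the longer traversal is the degradation of service described above.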
6.2.4 Demotion dependency defects
A development demotion dependency defect causes an error, for example, when a process deletes its only pointer to heap data; the dependency set of the heap data has been corrupted and memory leaks occur. (Storage cannot be freed.) A scheduling demotion dependency defect causes an error, for example, if TCP cannot reassemble a message due to a corrupted dependency. Assume that an acknowledgment is delayed and arrives at a new connection where the connection window has wrapped around. Since identifiers now appear to match, a duplicate data unit that has been stored is demoted. If the original data unit is lost as well, TCP will not be able to reassemble the message at the destination [24]. A security demotion dependency defect causes an error if a hard disk is erased because of a corrupt dependency (a trigger) assigned by a logic bomb or virus. (Other defects allowed the virus into the system.) A development demotion dependency defect can cause service degradation. For example, in virtual memory systems, requests are assigned demotion dependency sets that contain events for deallocating assigned memory frames. Assume that a programmer sets a lock bit if it is necessary to keep a page in memory. If the bit is not set, the process is delayed as it is swapped in and out of memory, probably causing an error and failure. Scheduling and security demotion dependency defects cause service degradation when, for example, logs and audits detect demotion dependency errors and initiate resolution in RB or ID. Typically, processes are restarted from duplicates maintained at lower layers.
6.2.5 Promotion times defects
A development promotion times defect causes an error if a promotion times restriction is too small, resulting in premature request promotion. For example, a project’s Beta version might be delivered to a user before it fulfills all service requirements. As another example, a Linux student can insert a sleep command in a program and execute the program in the background in order to execute “ps” in the foreground and obtain the process identifier (pid). If the sleep interval is too small, the process completes executing before its pid is displayed. A scheduling promotion times defect can cause an error if a duplicate TCP data unit is assigned a promotion times restriction that is smaller than the round trip propagation delay, so that it is transmitted before the original’s acknowledgement can be received. If the window has wrapped around at the destination, the duplicate might conflict with a message with the same connection identifier [24]. A security promotion times defect becomes an error when a password is too small to foil a password attack and enables an intruder to attack the system. The above defects cause service degradation as well. If the scheduling and security defects are detected by logs and audits and resolved with rollbacks and restarts, there will be delays in service completion. A prematurely delivered Beta version might satisfy sufficient service requirements for some users, and thus cause only partial failure [1], a type of service degradation. Even if window wrap-around does not occur, defective transmission of a duplicate data unit wastes system resources and potentially causes delays for other processes.
Promotion times defects also become active when a promotion times restriction is too large and a request is rescheduled too many times. A cake can be burned (error) or a non-terminating UNIX ping command can waste resources (service degradation) due to development promotion times defects. Queueing theory tells us that promotion times restrictions allowing new processes into a system must be considerably less, on average, than the promotion times restrictions required for service completion. (Average customer arrival rates must be less than average service rates.) If too many processes are accepted (the promotion times restriction for the resource management system’s acceptance of new customers into the system is too large) instability or congestion ensue; these are service degradations caused by scheduling promotion times defects. Similar security promotion times defects enable “denial of service” attacks during the replication of worms. Systems monitor metrics to detect overload. Effective handling of service degradation resulting from overload involves throttling new traffic (lowering system promotion times restrictions) and decreasing the amount of resources assigned to processes (perhaps changing restrictions on resources so that fewer elements, such as channels, are assigned as a unit [22, 25] or decreasing times restrictions on users for their rate of access). Systems also choose competing processes to drop (lower their priorities) rather than store, forcing errors [18]. Reducing bandwidth for all processes is an effective service degradation mechanism to prevent errors involved in dropping service for some [25]. Even when packet dropping is used to control service degradation, it is an error according to our definition. Authorized user processes have been demoted and lost their previous service because of the activation of a defect.
6.2.6 Demotion times defects
Development demotion times defects can cause errors if assigned demotion times restrictions are too small or too large. For example, an Internet Protocol packet contains a TimeToLive (TTL) field for the number of times that it may still be rescheduled to routers. TTLs are decremented with each packet hop. A packet is demoted when its TTL reaches 0 [24]. If the assigned TTL is too small, the packet will be demoted before reaching the destination and lose previous service. As an example of a too large demotion times restriction, on the other hand, a loss of confidentiality can result when a computer is sold without erasing sensitive data on its hard disk. A scheduling demotion times defect can cause an error, for example, if a hash storage scheme allows a limited number of rehashes to fixed size buckets following collisions. Some data will not be stored if the key distribution is not spread adequately by the hash function. An example of an active security demotion times defect is that of a program manager who purposely assigns deadlines that are too small to be met or so large that they waste management funds. Most errors resulting from too small demotion times defects are detected when service is not completed. It is typically too late, however, for resolution.
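The hash storage example can be sketched as follows. This is only an illustration under stated assumptions: the function `store`, the modulo hash with linear rehashing, and the parameters `bucket_size` and `max_rehashes` are hypothetical and are not taken from any particular storage scheme. When keys cluster on one bucket, the rehash limit (the demotion times restriction) is exhausted and a key is not stored.

```python
def store(table, key, bucket_size=1, max_rehashes=2):
    """Try to store key; return False once the rehash limit is exhausted."""
    n = len(table)
    for attempt in range(max_rehashes + 1):
        index = (key + attempt) % n          # hypothetical hash with linear rehash
        if len(table[index]) < bucket_size:  # fixed-size bucket has room
            table[index].append(key)
            return True
    return False                             # key is demoted: data not stored


table = [[] for _ in range(8)]
# All four keys hash to bucket 0, a poorly spread key distribution.
results = [store(table, key) for key in (0, 8, 16, 24)]
print(results)   # [True, True, True, False]
```

Raising `max_rehashes` or `bucket_size` in this sketch allows the fourth key to be stored, at the cost of more probing or more storage per bucket.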
6.2.7 Priority defects
Defects in development priority restrictions are common during
project development. The disproportionate allocation of manpower
and resources to processes that are of lesser importance to the
overall project frequently causes both service degradations and
errors. Scheduling priority defects cause service degradation in
priority inversions, where higher priority processes are suspended
while lower priority processes are chosen for execution. If the suspension time exceeds the process’s demotion times restriction, the
process will be demoted. Scheduling priority defects become active in networks that use a policy of “wine and milk” for dropping
packets. Multimedia packets are treated as milk, since old packets
are probably worthless [24]. Bank or database transactions are
considered “wine” since older packets have consumed more network resources and will require additional resources to enable their
recovery. The determinations of priorities by these criteria are heuristics, however, so that more important packets are sometimes
delayed or demoted. A security priority defect causes service degradation when an intruder exploits vulnerabilities in search algorithms to raise its page rank via cyclical links or spamming blogs.
Access to appropriate links is either delayed or prevented, the latter causing an error. Priority defects are of great concern during
system design and management’s allocation of resources. Resultant
errors are difficult to control. Yet we have not found this type of
defect included in defect classifications in the literature.
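The “wine and milk” dropping policy can be sketched as a bounded queue. The function `enqueue` and its parameters are hypothetical names used only for illustration: under the milk policy the oldest queued packet is dropped to admit the new one, and under the wine policy the new packet is dropped so that older packets are kept.

```python
from collections import deque


def enqueue(queue, packet, capacity, policy):
    """Add packet to a bounded queue; return the dropped packet, or None."""
    if len(queue) < capacity:
        queue.append(packet)
        return None
    if policy == "milk":            # old packets are probably worthless
        dropped = queue.popleft()   # drop the oldest, keep the newest
        queue.append(packet)
        return dropped
    if policy == "wine":            # older packets have consumed more resources
        return packet               # drop the newcomer, keep the old
    raise ValueError("unknown policy: " + policy)


media = deque([1, 2, 3])
print(enqueue(media, 4, 3, "milk"), list(media))   # 1 [2, 3, 4]
bank = deque([1, 2, 3])
print(enqueue(bank, 4, 3, "wine"), list(bank))     # 4 [1, 2, 3]
```

Either way a packet is demoted when the queue is full; the policy only selects which one, which is why a heuristic choice of policy can delay or demote the more important packet.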
VII Software Service Degradation
Fault tolerance is the ability to continue service, with reduced quality, when faults become active. When a redundant hardware unit fails, service is maintained in degraded mode, preventing failure of the entire system. Similarly, when software defects are activated, redundancy mechanisms maintain service in degraded mode. Four types of service degradations are used to prevent or mitigate errors that frequently lead to system failure. (Service degradations do not prevent defects.) Systems monitor service deviations, which signal the possible activation of defects. They control suspected service degradations, usually by adjusting the restrictions that they assume have been corrupted. Since the handler is determined dynamically and activated on a “need to use” basis, its cost is greatly reduced. The project must accept deviations. These include: slower service that is completed later than optimal (redundancies in acceptable units of time); data values that vary from optimal (redundancies in acceptable data values); partial services that are a subset of optimal services (redundancies in acceptable sets of delivered services); skewed voting results in which non-optimal winners are chosen (redundancies in choices); or a combination of the above. (Requests accept any choice of elements of fungible resources, which provide alternatives of equal quality. This option is neither a degradation nor a deviation, but the definition of fungible.) Most systems accept deviations during routine service and continuously work to reduce them, particularly any suspected degradations, and improve the quality of service provided.
An active defect causes service degradation and/or an error. An active defect will not result in an error in a mandatory process if processes can complete service in a limited manner. Degradation occurs when interference causes a non-optimal result, but one that is within acceptable limits. We have already shown that request attributes determine the types of corruptions that cause service degradations. We now show that the same set of attributes defines the types of variations in service quality.
7.1 Times Service Degradation
Times service degradation is the continued rescheduling of an authorized request due to an active defect. Slow service [1] is a subset of times service degradation. Slow service, or, more generally, slower service, can refer to delays in obtaining resources or in completing service at resources. These service degradations need not waste resources nor become errors. (For example, when priority inversion occurs, the processor is allocated to a lower priority process, degrading service for an enabled higher priority process. The processor, however, is fully utilized. The higher priority process need not lose previous service and can complete remaining service if its times restrictions permit. Interference, however, has delayed its access to the processor. Defects in scheduling priority restrictions have caused times service degradations.)
Times service degradation has been characterized as a delay. More generally, times service degradation is determined by the number of times of rescheduling per specified interval, either more or less than the value agreed to be optimal. A system, for example, can reduce the number of bits transmitted per second, or increase the number of packets dropped per interval to alleviate system overload. Both of these mechanisms to control overload can cause delays if the packets must be resent or the same amount of data must be sent. Alternatively, a receiver can accept lower resolution, with no resultant delay. Continued rescheduling also occurs at composite layers, such as frequent page faulting during thrashing and repeated restarts during system design in an unfamiliar domain.
Delays and other deviations that result from standard service are not degradations. (For example, processes encounter standard delays when higher priority processes are chosen for service. Higher priority processes may have their priorities lowered and encounter standard delays when systems prevent starvation with aging mechanisms. Queues are mechanisms for preventing conflict and resultant errors. The delays involved, if bounded and prioritized, are the result of standard service as processes wait for turns at shared resources.) In order to distinguish between standard service and service degradation, statistical tools monitor metrics such as page fault frequencies and number of dropped packets. These tools, although heuristics, are effective in identifying parameters that should be adjusted. (For example, infinite delays for cars waiting at a four-way stop sign intersection are prevented by eliminating the dependency on one car. Failures during system overloads can be prevented by throttling processes or decreasing bandwidth.)
Times restrictions are carefully adjusted to limit times service degradations that have been identified as frequently leading to errors. In particular, system overloads must be controlled before they feed on resultant errors and error recovery routines. Adjusting times restrictions to prevent overloads causes delays, but less severe ones than are caused by overload. Defect prevention for system overload requires resource overload (as in hard real-time systems) and restrictive process access. Error resolution for overload conditions, such as attempting to recover dropped packets and connections, is generally too late to be successful. Delaying the binding of an error prevention handler to the detection of probable defect activation is the preferred and most cost-effective method of controlling overload for most applications.
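One way to picture the adjustment of times restrictions under overload is the following sketch. It is a hypothetical heuristic, not a mechanism taken from the cited literature: the function `adjust_limit`, the threshold of five dropped packets per interval, and the halving rule are all assumptions made for illustration. The admission limit is throttled sharply when the monitored metric suggests overload and is relaxed slowly otherwise.

```python
def adjust_limit(limit, dropped_in_interval, drop_threshold=5,
                 floor=1, ceiling=100):
    """Return the admission limit to use for the next interval."""
    if dropped_in_interval > drop_threshold:
        return max(floor, limit // 2)   # probable overload: throttle new traffic
    return min(ceiling, limit + 1)      # no sign of overload: relax slowly


limit = 40
history = []
for dropped in (0, 2, 9, 12, 1):        # dropped packets observed per interval
    limit = adjust_limit(limit, dropped)
    history.append(limit)
print(history)   # [41, 42, 21, 10, 11]
```

The handler acts only when the metric crosses the threshold, which reflects the idea of binding error prevention to the detection of probable defect activation rather than restricting access at all times.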
7.2 Input/Output Service Degradation
Input/output (i/o) service degradation is the input or output at a resource of a non-optimal data value due to the execution of a defect. In multi-media or web applications, standard service may include some loss in data values. (For example, digitizing audio information causes quantizing noise and color specifications for web pages are limited based on browser support.) But data loss without interference is neither a degradation of service nor an error. Degraded modes of data output or input include, for example, output of choppy sound or video or input of signal fading [6]. They are caused by such active defects as memory leaks, resulting from corrupted dependencies, or packet loss, resulting from corrupted priority or times restriction. By accepting the service of degraded i/o, particularly in multimedia applications, systems can tolerate dropped packets (partial service) without retransmission. Times mechanisms monitor deviations, such as increases in packet loss. Such statistical tools detect degradations in signal quality during process execution and adjust restrictions as appropriate [6, 18, 19].
7.3 Dependency Service Degradation
Dependency service degradation is the loss of services that are not critical to the successful completion of a project (partial service [1]) due to interference. Typically, a project is assigned a promotion dependency restriction so that it cannot complete service until all of its processes complete service. In dependency service degradations, a user/system contract specifies an optimal set of services, but only a subset is mandatory. (Application programs and operating systems, for example, provide features that most users never use. After experimenting with one of these services and finding that it doesn’t work, a user probably abandons it.) Optional processes are not contained in the promotion dependency sets of mandatory processes or their project; the project completes service even if optional processes fail. The tolerance for failures of unessential services prevents errors and failures of mandatory processes. Partial service degradations are caused by all type(s) of errors.
7.4 Priority Service Degradation
Priority service degradation occurs when an active defect causes a non-optimal selection from a set of acceptable alternatives. Systems rely on voting schemes to identify the optimal choice among available possibilities [22]. Higher fault tolerance is obtained, providing compensation through redundancy [1], since alternatives are available if the optimal choice becomes disabled or downgraded. Multiple deviations contribute to voting decisions, including possible degradations in time units, i/o values, and selected service. Page replacement, routing, and web page ranking algorithms are all heuristic-based voting schemes that sometimes make non-optimal choices. (For example, routers schedule packets to dynamically selected outgoing ports that have the “shortest” distance to their destination. In the distance vector algorithm of the Arpanet, routers periodically exchanged packets with their neighbors to estimate the optimal port based on queuing delays. A scheduling defect existed because decision making was cyclical, relying partly on information gained from the router that sent the packet. If a router suddenly became congested, bad news travelled slowly [24]; packets continued to be scheduled to the port of the overloaded router, a non-optimal selection. A security defect existed because a malfunctioning router could broadcast zero delay and its neighbors would route their packets to it [24], also a poor choice.) Byzantine fault tolerance [11] is a voting mechanism to identify and eliminate malfunctioning members from a set of alternatives.
7.5 More on Service Degradations
Most systems balance multiple types of service degradations together with other service deviations. Fuzzy set heuristics are useful for adjusting parameters (restrictions) in order to optimize the combined result [5, 18]. Game theory has also been studied for optimization of parameters and dropping choices [23].
The causes of service degradations are frequently difficult to determine. Furthermore, it is not always clear when deviations result from active defects or from standard operation. It is also challenging to identify when service degradations actually begin [11, 20]. Errors, on the other hand, occur at the time unit of process demotion and usually are easily distinguishable from standard demotions such as preemptions.
VIII Summary
The term “service degradation” implies that:
• There exists a set of optimal service outcomes that have been agreed upon in the user/system contract.
• There is a range of acceptable deviations specified by the user/system contract in addition to the optimal outcomes.
• At least one acceptable deviation has occurred due to the execution of a defect.
Service degradation involves one or more of these deviations:
• Deviation, within acceptable limits, from optimal numbers of rescheduling because of interference.
• Deviation, within an acceptable range, from the optimal outcome of input and/or output because of interference.
• Deviation from requested services, with delivery of a proper subset of services that includes all mandatory services, because of an unresolved error.
• Deviation from an optimal choice, such that an inferior but acceptable alternative is selected, because of interference.
Errors and service degradation are integrally related. Both are caused by active defects. Both are classifiable in terms of causation by corrupted request attributes. This same set of request attributes defines the types of outcomes of service degradation. All errors cause service degradation; resource units are wasted during error resolution and/or during previous service, as well as during movements up and down layers. Unresolved errors in non-mandatory services cause degradation (partial service). Service degradations that exceed acceptable deviations cause errors. Service degradations are useful in preventing errors, both by tolerating degraded service and by monitoring deviations and adjusting restrictions accordingly. Forcing errors is useful in the control of infinite loops (by killing a process in a resource deadlock, for example) and system overloads (by dropping packets, for example). See Table 1 for a chart of the interdependence of errors and service degradations.
This paper presents a classification scheme for defects that is applicable to all phases of a software system and to both service degradations and errors. With the aid of a model developed for other
research areas, we obtain detailed definitions for defect, service
degradation, and error. We identify classes of treatable errors,
mechanisms appropriate for their prevention and resolution, and
the layers of a system at which such mechanisms are effective. We
offer a two-dimensional classification for errors, as well as a three-dimensional classification for service degradation. Service degradations are categorized according to three types of process interaction, seven types of corrupted request attributes that enable
degradation (including two subcategories), and four types of degradations that result, yielding (more than) eighty-four classes. Examples of active defects in many of these categories follow (Table
2 and Table 3).
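The base count of eighty-four classes is the product of the three dimensions (3 x 7 x 4). A short enumeration makes this concrete. The labels are taken from the section headings of this paper, while the variable names are ours and serve only this illustration; the subcategories mentioned above would add further classes and are not enumerated here.

```python
from itertools import product

interactions = ["development", "scheduling", "security"]
attributes = ["data output", "data input",
              "promotion dependency", "demotion dependency",
              "promotion times", "demotion times", "priority"]
degradations = ["times", "input/output", "dependency", "priority"]

# Each class is one (interaction, corrupted attribute, resulting degradation) triple.
classes = list(product(interactions, attributes, degradations))
print(len(classes))   # 84
print(classes[0])     # ('development', 'data output', 'times')
```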
Further research should be conducted to develop an expression of
our definitions in mathematical notation in order to eliminate remaining ambiguities. Although the translation of many of our definitions into mathematical terms is straightforward, defining defects
is difficult. A comprehensive study of service degradation is another open area of research.
References
[1] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr (2004), Basic Concepts and Taxonomy for Dependable and Secure Computing, IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1 (Jan.-Mar. 2004) pp. 11-33.
[2] P. A. Bernstein and N. Goodman (1981), Concurrency Control in Distributed Database Systems, ACM Computing Surveys, vol. 13, no. 2 (June 1981) pp. 185-211.
[3] A. Bremler-Barr, E. Cohen, H. Kaplan, and Y. Mansour (2002), Predicting and Bypassing End-to-End Internet Service Degradations, Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement (November 2002) pp. 307-320.
[4] R. Chillarege, I.S. Bhandari, J.K. Chaar, M. Halliday, D.S. Moebus, B.K. Ray, and M-Y. Wong (1992), Orthogonal Defect Classification - a Concept for In-process Measurement, IEEE Trans. on Software Engineering, vol. 18, no. 11 (Nov. 1992) pp. 943-956.
[5] S. Ghosh, Q. Razouqi, H. J. Schumacher, and A. Celmins (1998), A Survey of Recent Advances in Fuzzy Logic in Telecommunications Networks and New Challenges, IEEE Transactions on Fuzzy Systems, volume 6, (e) (Aug. 1998) pp. 443-447.
[6] M. Hadzialic, M. Hamza, and P. Begovic (2007), An Approach to Cell Signal Coverage Reliability in the Presence of Different Fading Models, Proceedings of the 5th ACM International Workshop on Mobility Management and Wireless Access (2007) pp. 91-98.
[7] R.C. Holt (1972), Some Deadlock Properties of Computer Systems, ACM Computing Surveys, vol. 4, no. 3 (Sept. 1972) pp. 179-195.
[8] J.N. Herder, H. Bos, B. Gras, P. Homburg, and A.S. Tanenbaum (2006), MINIX 3: a Highly Reliable, Self-repairing Operating System, Operating Systems Review, ACM Press, vol. 40, no. 3 (July 2006) pp. 80-89.
[9] IEEE Computer Society (1990), Standard Glossary of Software Engineering Terminology, ANSI/IEEE Standard 610.12-1990, IEEE Press, New York.
[10] ISO Reference Model for Open Distributed Processing (1996), ISO/IEC
10746-2:1996 (E) at
http://standards.iso.org/ittf/PubliclyAvailableStandards/
[11] R. Kotla and M. Dahlin (2004), High Throughput Byzantine Fault Tolerance, International Conference on Dependable Systems and Networks
(June 2004) pp.575-584.
[12] C. E. Landwehr, A.R. Bull, J.P. McDermott, and W.S. Choi (1994) A Taxonomy of Computer Program Security Flaws, ACM Computing Surveys,
vol.26, no.3 (Sept. 1994) pp. 211-254.
[13] G. Levine (1988) The Control of Priority Inversion in Ada, Ada Letters, vol.
8, no.6 (Nov., Dec. 1988) pp. 53-56.
[14] G. Levine (1989) The Control of Starvation, International Journal of General Systems, vol.15 (1989) pp. 113-127.
[15] G. Levine (2003) Defining Deadlock, Operating Systems Review, ACM
Press, vol.37, no.1 (Jan. 2003) pp. 54-64.
[16] G. Levine (2005) A Model for Anomalies of Software Engineering, in T. Sobh and K. Elleithy (Eds.), Advances in Systems, Computing Sciences and Software Engineering, Springer, 2005, pp. 243-250.
[17] G. Levine (1996) A Model for Software Reuse, OOPSLA, San Diego, CA, Oct. 1996, pp. 71-87.
[18] H. Liao, X. Wang, and H. Chen (2008), Adaptive Call Admission Control
for Multi-class services in Wireless Networks, IEEE International Conference on Communications, (May 2008) pp. 2840–2844.
[19] J. Liebeherr and D. Liao (1995) A Service With Bounded Degradation in Quality-of-Service Networks, Proceedings of the Fourteenth Annual Joint Conference of the IEEE Computer and Communication Societies, vol. 3, April 1995, pp. 1103-1110.
[20] G. Novark, E. D. Berger, and B.G. Zorn (2008), Exterminator: Automatically Correcting Memory Errors with High Probability, CACM, vol 51 (12)
(Dec. 2008), pp. 87-95.
[21] H. D. Owens, B.F. Womack, and M.J. Gonzalez (1996) Software Error
Classification using Purify, Proceedings, International Conference on
Software Maintenance., Nov. 1996, pp. 104-112.
[22] W.O. Rom and S. A. Slotnick (2009), Order Acceptance Using Genetic Algorithms, Computer and Operations Research, 36 (2009), pp. 1758-1767.
[23] A. N. Rouskas, A. A. Kikilis, and S. S. Ratsiatos, A game theoretical formulation of integrated admission and pricing in wireless networks, European Journal of Operational Research, vol 191 (3), 2008, pp. 1175-1188.
[24] A. S. Tanenbaum (2002) Computer Networks, 4th edition, Prentice-Hall, 2002
[25] G. V. Zaruba, I. Chlamtac, S.K. Das, A Prioritized Real-time Wireless Call
Degradation Framework for Optimal Call Mix Selection (2002), Mobile
Networks and Applications, vol 7, (2), April 2002, pp 143-151.
Is this anomaly caused by an active defect?
  Error: Yes
  Service Degradation: Yes

Is this anomaly caused by corruption in all classes of restrictions?
  Error: Yes
  Service Degradation: Yes

Does demotion occur when the anomaly becomes active?
  Error: Yes
  Service Degradation: Yes, during failures causing partial service. Yes, during the continued rescheduling of composite times service degradations. Service degradations, however, typically prevent demotions.

Does the project fail if the anomaly is not controlled in composite layers?
  Error: Yes
  Service Degradation: Partial service is specifically designed to prevent project failure when errors in nonmandatory services are not resolved. Uncontrolled times service degradations (e.g., in infinite waits) result in failures.

Is the anomaly used as a prevention mechanism for an anomaly?
  Error: If a process is detected to be waiting for so long a time interval that an infinite delay is suspected, the process is demoted to the layer where its duplicate is maintained, possibly triggering restart and recovery. In addition, during overload, processes are dropped. These forced errors help control types of times service degradation. Forcing demotion of intruders prevents errors, degradations, and failures; these are not errors. Errors occur if recovery involves restarting authorized user processes. Recovery causes service degradation.
  Service Degradation: Times service degradation (slower service, etc.) delays access and prevents conflict between requests assigned the same priority. Times service degradation (fewer time slots per time interval, etc.) and I/O degradation (fewer resource elements per time interval) raise the number of customers serviced per time interval and help prevent errors and failures resulting from system overload. Accepting lower quality I/O values during I/O degradation raises the service completion rate. Dependency service degradation (partial service) prevents failure of essential services. Priority service degradation provides alternative choices when previously identified optimal choices fail or deteriorate, thus preventing delays or errors.

Which mechanisms are most effective for the recovery of the anomaly?
  Error: Exception handling; rollback and restart.
  Service Degradation: Adjustment of restrictions; forcing an error to achieve rollback or to drop a service.

What is the result of not controlling the anomaly?
  Error: Failure or partial service.
  Service Degradation: Error; other service degradations; potential failure.

Does one anomaly cause the other anomaly?
  Error: All errors cause service degradation. Unresolved errors cause partial service or failure.
  Service Degradation: Service degradations cause errors when demotion times restrictions expire. Dependency service degradations cause times service degradations or errors when nonoptimal choices execute.

When is the handler for this anomaly applied?
  Error: An error handler is applied when an error occurs, assuming that it is detected.
  Service Degradation: Detection of service degradation is ambiguous. Typically, systems monitor behavior and continuously adjust parameters to minimize deviations from optimal service.

When is this anomaly applied as a mechanism for handling an anomaly?
  Error: Errors are used to force rollback and recovery for times service degradations that are assumed to be unbounded. These are applied at the time unit of detection, typically estimated by heuristics.
  Service Degradation: Service degradation becomes active after the defect is activated but before an error results. Statistical tools monitor the effects of service degradation and adjust restrictions in order to prevent the occurrence of an error.

Table 1. Comparison of Errors and Service Degradation
Footnote 2: The term “anomaly” refers to error, service degradation, or failure in this chart.
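Table 1 notes that a process suspected of waiting indefinitely is forced into an error and demoted to the layer where its duplicate is maintained, possibly triggering restart. A minimal sketch of that idea follows, assuming a tick-based polling loop and a fixed heuristic bound on the wait. The names `ForcedError`, `wait_for`, `run_with_restart`, and `make_ready` are hypothetical and do not come from the paper.

```python
class ForcedError(Exception):
    """Raised deliberately when a wait is suspected to be unbounded."""


def wait_for(ready, max_ticks):
    """Poll ready() once per tick; force an error after max_ticks polls."""
    for tick in range(max_ticks):
        if ready():
            return tick
    raise ForcedError(f"no progress after {max_ticks} ticks")


def run_with_restart(make_ready, max_ticks, max_restarts):
    """Restart from a fresh copy each time a forced error occurs.

    make_ready(attempt) stands in for restoring the process from its
    saved duplicate (an assumption of this sketch).
    """
    for attempt in range(max_restarts + 1):
        ready = make_ready(attempt)
        try:
            return attempt, wait_for(ready, max_ticks)
        except ForcedError:
            continue  # rollback: discard this attempt and restart
    raise ForcedError("restarts exhausted")


def make_ready(attempt):
    """Simulated workload: the first attempt never completes."""
    if attempt == 0:
        return lambda: False          # simulated unbounded wait
    polls = iter(range(100))
    return lambda: next(polls) >= 2   # ready on the third poll


print(run_with_restart(make_ready, max_ticks=5, max_restarts=1))  # (1, 2)
```

The bound `max_ticks` plays the role of the heuristic estimate mentioned in the table: set too small, it forces errors on processes that would have completed; set too large, it prolongs the times service degradation.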
Defect classes (columns):
  Scheduling Defect: Caused by resource management systems that interfere with authorized user processes.
  Security Defect: Caused by resource management systems that enable an intruder to interfere with an authorized user process.
  Development Defect: Caused by unauthorized requests that interfere with authorized requests of the same authorized user project. (Examples of intruders are placed under security defects.)

Data Output Defect
  Example
    Scheduling: Lost update
    Security: Destruction of data by virus
    Development: Mistyped file name
  Detection mechanism
    Scheduling: Frame check sequences, CRCs, logs and audits
    Security: Integrity check values, logs and audits
    Development: I/O system detecting nonexistent file name
  Resolution mechanism
    Scheduling: Rollback and restart of authorized processes in RB or ID
    Security: Rollback and restart of authorized process in RB or ID; removal of intruders
    Development: Exception handling in RS and RB based on detected demotion

Data Input Defect
  Example
    Scheduling: Inconsistent retrieval
    Security: Identity theft
    Development: Data format inconsistency
  Detection mechanism
    Scheduling: System audit of transactions and keys
    Security: System audit of transaction keys; audit of user accounts
    Development: Exception handling in RS or RB; logs and audits
  Resolution mechanism
    Scheduling: Resolution is difficult without access to all output sites
    Security: Removal of intruders; resolution is difficult
    Development: Recovery is difficult without user involvement

Promotion Dependency Restriction Defect
  Example
    Scheduling: Packet sent to crashed router based on circular exchange of data
    Security: Corrupted web link leading to intruder’s site
    Development: Configuration error so that process is linked to obsolete version
  Detection mechanism
    Scheduling: Internet audits; duplicate maintained waiting for acknowledgment
    Security: Internet audits; intrusion detection systems
    Development: System logs and audits; possible traps to operating system
  Resolution mechanism
    Scheduling: Transmission of duplicate in ID
    Security: Removal of intruder; rollback and restart in RB or ID
    Development: Automated reconfiguration tools

Demotion Dependency Restriction Defect
  Example
    Scheduling: Acknowledgment terminating wrong duplicate packet
    Security: Logic-bomb event causing the erasure of a hard disk
    Development: Lost pointer; heap object unreachable
  Detection mechanism
    Scheduling: System logs and audits
    Security: System logs and audits
    Development: Difficult without user involvement
  Resolution mechanism
    Scheduling: Rollback and restart in ID, if duplicate is available at lower layer
    Security: Removal of intruders; rollback and restart in RB or ID
    Development: Recovery is difficult without user involvement

Promotion Times Restriction Defect (restriction too small)
  Example
    Scheduling: Transmitted duplicate packet causing conflict
    Security: Password too short to prevent password attacks
    Development: Beta or final version released too soon
  Detection mechanism
    Scheduling: System logs and audits detecting duplicate packet
    Security: System logs and audits detecting password attempts
    Development: System logs and audits detecting resultant (other) errors
  Resolution mechanism
    Scheduling: Possible rollback and restart in RB or ID
    Security: Removal of intruders; rollback and restart in RB or ID
    Development: Recovery is difficult without user involvement

Demotion Times Restriction Defect
  Example
    Scheduling: Hashing function not appropriate for application
    Security: Deliberate setting of too short a deadline
    Development: Internet packet assigned too small a time-to-live field by user
  Detection mechanism
    Scheduling: Budgetary controls; system logs and audits
    Security: Budgetary and management controls
    Development: Logs and audits; statistical tools (heuristics)
  Resolution mechanism
    Scheduling: Recovery is not possible if data are lost
    Security: Recovery is not possible if deadlines are exceeded
    Development: Recovery is difficult without user involvement

Priority Restriction Defect
  Example
    Scheduling: Priority inversion / higher priority process misses a hard deadline
    Security: Corrupted page rankings causing choice of wrong page/product
    Development: Misallocation of manpower, resources
  Detection mechanism
    Scheduling: Importance of lost services is self-evident
    Security: Web browsers checking blogs and circular links
    Development: Statistical tools signaling unmet time and service goals
  Resolution mechanism
    Scheduling: None possible
    Security: Resolution is impractical
    Development: Recovery is impractical

Table 2. Classification of Errors in Software Systems
Defect classes by causes (columns):
  Scheduling Defect: Caused by resource management systems that interfere with authorized user processes.
  Security Defect: Caused by resource management systems that enable an intruder to interfere with an authorized user process.
  Development Defect: Caused by unauthorized requests that interfere with authorized requests of the same authorized user project. (Examples of intruders are placed under security defects.)

Defect classes by outcomes (row groups), each subdivided by the corrupted attributes that cause the degradation:

Degradations causing data input or output deviations
  Caused by corrupted data input or output attributes
    Scheduling: Transmission interference, such as static (within acceptable range), due to overlapping channels
    Security: Intruder’s corruption of values (within acceptable range)
    Development: Degraded data initialization (within acceptable range)
  Caused by corrupted times restrictions
    Scheduling: Dropped packets due to system overloads of authorized packets (congestion)
    Security: Dropped multimedia packets due to DDoS and other overload attacks
    Development: Degraded data value (within acceptable range) due to too few iterations in computation
  Caused by corrupted dependency restrictions
    Scheduling: Lost multimedia packets due to circular routing
    Security: Degraded outputs due to memory leaks in the system
    Development: Incorrect parameter binding causing delivery of wrong (acceptable) value
  Caused by corrupted priority restrictions
    Scheduling: Wrong dropped packet due to choice of congested port using distance vector routing
    Security: Wrong dropped multimedia packet due to a corrupted voting choice (Byzantine Generals)
    Development: Wrong dropped packet due to incorrect QoS specifications

Degradations causing deviations (delays) in time
  Caused by corrupted data input or output attributes
    Scheduling: Lost update resolution
    Security: Malware resolution
    Development: Typing error placing too large a parameter in sleep statement
  Caused by corrupted times restrictions
    Scheduling: Too many authorized processes accepted into the system
    Security: Distributed Denial of Service attack
    Development: Retransmission of packet assigned a TTL field that was too small
  Caused by corrupted dependency restrictions
    Scheduling: Resource deadlock; deadlock resolution
    Security: TCP backlog queue filled with half-open connections from SYN attack
    Development: Wait for a nonexistent process
  Caused by corrupted priority restrictions
    Scheduling: Priority inversion (delays for high priority processes)
    Security: Corrupted page rankings (delays for higher priority pages)
    Development: Priorities in time-lines mishandled in project assignments

Degradations causing deviations in dependencies (partial service)
  Caused by corrupted data input or output attributes
    Scheduling, Security, Development: See Table 2 for any unresolved error in this class.
  Caused by corrupted times restrictions
    Scheduling, Security, Development: See Table 2 for any unresolved error in this class.
  Caused by corrupted dependency restrictions
    Scheduling, Security, Development: See Table 2 for any unresolved error in this class.
  Caused by corrupted priority restrictions
    Scheduling, Security, Development: See Table 2 for any unresolved error in this class.

Degradations causing deviations in priorities (inferior voting choice)
  Caused by corrupted data input or output attributes
    Scheduling: Distance vector routing cyclical computation of (incorrect) weight to determine port priorities
    Security: Router intentionally transmitted wrong delay value using distance vector routing
    Development: Typing error assigned wrong priority to process
  Caused by corrupted times restrictions
    Scheduling: Page replacement by FIFO or packet discarding based on “wine and milk” is incorrect
    Security: Page ranks determined by number of hits; number is corrupted
    Development: Priority assigned for file by developer, perhaps based incorrectly on when it was last used
  Caused by corrupted dependency restrictions
    Scheduling: Anticipatory page fetching (dependency by location) picks wrong page
    Security: Page ranks determined by links in other pages; links are corrupted
    Development: Process eliminated from voting set due to incorrectly assigned dependency
  Caused by corrupted priority restrictions
    Scheduling: Page replacement giving I/O bound jobs higher priority picks wrong page
    Security: (Corrupted) administrator’s priorities allowed to determine priorities in system implementation
    Development: Programmer-assigned erroneous priorities are implemented

Table 3. Classification of Service Degradations in Software Systems