Defining Defects, Errors, and Service Degradations

Gertrude Neuman Levine
Fairleigh Dickinson University

Abstract

We introduce a model to distinguish between defects, errors, and service degradations. A two-dimensional classification scheme is developed for defects, defined by the types of process interaction and software corruption that are involved. A third dimension is added to this taxonomy for defects that cause service degradation, based on the deviations in service quality that are tolerated. We investigate the role of service degradation in error prevention.

Keywords: defects, errors, failure, service degradation

I Introduction

A project is a set of cooperating processes that are developed to satisfy user and system requirements. Each process consists of project requests that are bound together by restrictions that regulate their movement as a unit through the layers of a software system. If a process contains a defect, its execution can result in errors and/or service degradations. Sometimes a defect remains dormant, perhaps in an uncalled subroutine, in an unread page of a document, or in an unexploited vulnerability. A defect becomes "active" when it causes an error [1] or a degradation of service. An error occurs when an authorized user process loses service that is specified in the user/system contract. Either an error is confined within a user project, or else an external process interferes with a user process. External interference implies that there is a defect in the resource management system; the activation of such a defect introduces a vulnerability that can be compromised by another process.

The computer science literature contains several classification schemes that were developed to assist in defect prevention. Defect prevention, however, is not always possible or cost effective, so service degradations and errors are common [21]. When defects cause errors, the errors can sometimes be detected and resolved by a resource management system [8, 20]. Alternatively, when defects become active, errors can be prevented or mitigated through service degradations. Errors that are neither prevented, nor mitigated, nor resolved cause failures.

We introduce a two-dimensional classification scheme for defects. We identify those types of errors for which resolution mechanisms are problematic, making prevention critical. A three-dimensional classification scheme is presented for service degradation. Service degradations are discussed in terms of their role in the prevention of errors. (They do not prevent defects!) Our classes are organized according to constructs of a model that was previously applied to the study of specific types of defects [14, 15, 17]. We generalize previous findings [16] in order to obtain a unified approach to the control of defects.

The study of defects is a principal topic in software systems, affecting all phases of a system's lifecycle. Defects are the cause of errors and service degradations. Unresolved errors cause failures. If defects cannot be prevented effectively, then error control mechanisms must be evaluated. In an earlier paper, we defined resource deadlock and developed a classification scheme for dead states [15]. Resource deadlock is enabled by a defect in a system's scheduling mechanisms that allows processes to be trapped in a cyclical wait for resources. Cyclical waits can be prevented. When resource deadlock is rare and prevention mechanisms are onerous, however, systems have relied on detection and recovery mechanisms (perhaps heuristic ones) to enable delivery of completed service [7].

The remainder of this paper is organized as follows:
- Related work.
- Open questions concerning related work.
- Introduction of a model to structure our classifications.
- Presentation of definitions for our terminology.
- Representation of the states of a system.
- Development of a two-dimensional classification scheme for defects.
- Illustration of active defects in each class.
- Identification of classes that have poor potential for recovery.
- Taxonomy of service degradations.
- Discussion of the relationship between service degradations and errors.

II Related Work

Some of the terminology of our introduction is defined in the literature of computer science. ISO/IEC [10] contains a three-step definition, from failure (violation of a contract) (13.5.1), to error (cause of a failure if unresolved; manifestation of a fault) (13.5.2), to fault (situation that can cause an error) (13.5.3). We prefer the term "defect" rather than "fault" since we do not include physical phenomena in our study.

The IEEE Standard Glossary of Software Engineering Terminology [9] has different meanings for some of the above terms. A fault is defined both as a hardware defect and, alternatively, as an incorrect step, process, or data definition in a computer program. The second definition of fault is also used for the term "error," with an alternate definition provided for error as an incorrectly computed result or a human action that produces an incorrect result. We seek to identify characteristics of faults that can be detected and controlled after they begin executing but before errors or failures result. We need to differentiate between faults, errors, and service degradations and thus do not use the IEEE definitions.

Landwehr et al. [12] develop a taxonomy of software security flaws (defects). Outcomes resulting from (active) security flaws are classified as unauthorized disclosure, unauthorized destruction of data, unauthorized modification of data, and denial of service. These outcomes can be reconciled with our classification. Unauthorized modification and destruction of data both involve unauthorized requests that conflict in stored data values with authorized requests (absence of required values implies presence of unacceptable values) and, according to our classification, result from defects in data output. (See section 6.2.) Unauthorized disclosures are outputs at unauthorized locations, but result from defects in data inputs. Denial of service requires that processes be prevented from obtaining requested resources, which can occur, for example, during system overload; such behavior results from defects in times restrictions. Other denial of service attacks include corrupted web links (defective dependencies) and page ranks (defective priorities).

Landwehr et al. also develop a taxonomy of system flaws based on causation in terms of motive, time, and location of introduction, with the class of time (when the flaw enters the system) specified as occurring during development, maintenance, or operation. Our model divides the system into layers in which both development and maintenance are services of the lowest layer, while operation occurs at upper layers. Defects originate during development or maintenance, but cause errors during operation.

Chillarege et al. [4], in their classification of defects, do not distinguish between defects and errors. They list defect types as function, interface, checking, algorithm, assignment, build/package/merge, timing/serialization, and documentation. We exclude the classes of function, documentation, and algorithm from our study; such "defects" cause errors or service degradations because their implementations contain defects in request attributes. Assignment defects are subsets of our class of data output defects; checking defects are subsets of data input defects; build/package/merge and interface defects are subsets of defects in dependency restrictions; timing/serialization defects are subsets of defects involving times restrictions. We add a category for defects in priority restrictions.

Avizienis et al. [1] present taxonomies for dependable and secure computing, introducing definitions for common terminology and issues. They differentiate between faults (defects) and errors, as ISO/IEC does. They state that a fault is "active" when it causes an error. We claim that an "active" fault can cause a service degradation that need not be an error. Mechanisms to achieve dependability and security are classified as fault prevention, fault tolerance, fault removal, and fault forecasting. Fault tolerance, the ability to provide (degraded) service despite the presence of faults, is germane to our study. Perhaps there exist faults whose prevention is not critical. Some faults cause errors that result in partial service [1], in which failure is tolerated for processes that are nonessential for project completion. Other faults cause degraded output or input; applications in certain domains can tolerate small deviations from optimal values [6, 19]. Active faults that result in delays are routinely tolerated within a limited range [3, 19]. Some systems attempt to select an optimal choice, but accept non-optimal alternatives. Many faults cause errors that are resolvable [8]; resolution methods include rollback and exception handling, but these introduce delay (degradation). There are also classes of faults that threaten irreparable harm to a system if they become active; we need to identify these faults and, in dealing with them, channel our energies towards their prevention.

Some researchers do not distinguish between faults and errors [e.g., 4, 12]. Others define a fault as the cause of an error [e.g., 1, 10], such that an error occurs when a fault is "active" [1]. Defining a software defect (fault) is difficult. Even blatant defects, when executed, need not result in an error. Consider the transmission of unencrypted top secret data over an insecure wireless connection. Is this defect active when it executes (when data are transmitted insecurely), when the vulnerability is compromised (when unauthorized data are read), or only when captured data are exploited (when data are used for identity theft or other attacks on confidentiality)? Do all active defects cause errors? Do all unresolved errors cause failures (ignoring toleration of partial service)? If so, then if "cutting the line" (a car pushing into a queue of waiting cars) is an error, and the intruder is not bumped off the line, failure must result. Yet failure need not result; typically all cars behind the intruder complete service, although they suffer service degradation. Similarly, priority inversion [13] is an active defect that causes service degradation, but it does not necessarily cause the failure of higher priority processes.

Defect prevention mechanisms are applied before defects begin execution. Resolution mechanisms are applied after defects cause errors. Defect prevention frequently is onerous; can prevention be delayed until after defect execution, but before errors occur? Can defect activation be used to trigger the employment of error prevention mechanisms? Can service degradations assist in error prevention? Can service degradations assist in the mitigation of errors with more flexibility and less cost than error resolution?

III A Model for Software Systems

We introduce a model to enable definitions of the above terms and to structure classifications of errors and service degradation. User requirements are expressed, in their simplest form, as requests for input or output at resources. Requests are combined into processes, which are the entities of completion of service at system layers (stages that must be reached or relinquished in order).

3.1 The Layers

M is a set of ordered layers of requests in a software system.
- The Process Conception Layer, PC, is a set of requests that are being developed from Requirements into a project. Requests are conceived and reconceived (corrected, modified, maintained) in this layer until their processes are in a form that can be accepted by the resource system.
- A Process Buffer Layer, PB, is a set of requests that are buffered (stored, delayed, postponed) awaiting delivery to the resource layers.
- An Independent Delivery Layer, ID, is a set of requests that are being delivered to resource layers (perhaps transmitted via the Internet). An ID contains requests from multiple projects.
- A Resource Buffer Layer, RB, is a set of requests that are buffered in the resource system.
- A Resource Service Layer, RS, is a set of requests that are executing at the resource, via inputs or outputs, requesting completion of service.
- A Service Completion Layer, SC, is a set of requests whose processes have completed service at RS. (If the layers are circularly ordered, SC and PC denote the same layer, and processes that move to SC/PC can be modified for reuse [17].)

We define an ordering relation, >, on the elements of M, such that SC > RS > RB > ID > PB > PC. For m, m' ∈ M, if m' > m, we say that m' is a higher layer than m.

All processes seek to traverse the layers linearly, but are impeded by restrictions assigned during different stages of their lifetimes. (For example, assume that an Ada programmer conceives a program to play the Game of Life on a personal computer. A GNAT compiler translates the source code. If syntax errors are found during compilation, the program is returned to the programmer. After correction and recompilation, executable code is stored in user buffers waiting to be submitted for input and output service at devices and intermediate resources. Once execution begins, statements are stored in system buffers, pending the service of preceding statements. Many errors, such as attempting to open nonexistent files, incur traps to the operating system and interrupt the program's service.
The operating system might signal the program via an exception handler and enable it to complete service.)

A six-layer model is a simplification, chosen to correspond to basic computer systems. Some systems contain null layers. (The personal computer in the above example does not utilize an ID.) Processes progress through empty layers, and only through empty layers, without delay. They remain for at least one resource unit at each nonempty layer to obtain the service of that layer, such as buffering or delivery. Many systems contain sub-layers that processes must traverse (such as in the waterfall model of software engineering). When peer-to-peer systems are modeled, requests are sent and received at each end of the layers by processes that control the resources at that end. The ordering relation is then defined by the direction from the source of conception to the destination of completed service.

3.2 Time

T is a finite set of linearly ordered discrete units of Time, represented by an initial subset of the natural numbers and bounded by the lifetime of the system.

3.3 The Resource

R is a set of resources in a software system. Each resource element, r ∈ R, is an ordered set of units that are bound together with dependency restrictions (see section 3.7) so that units of the same resource element are allocated in sequence, corresponding to units of T. Dependency restrictions also bind together different resource elements for combined allocation. (For example, an operating system combines disk sectors into blocks for indivisible allocation and deallocation.) A resource's attributes include data values as well as keys for status, for resource and system identification, and for user authorization. Fungible resources contain elements that are indistinguishable to user processes, but that have additional fields with which the resource management system differentiates between them.

3.4 The Request

A request is an atomic entity expressed by the tuple (m, r, t):
- m ∈ M identifies the current position of the request in the system layers. A request at layer m requests to be mapped to the layer m' > m.
- r ∈ R identifies a resource where service is requested. For certain processes, such as implementations of search algorithms, a request can be mapped to different identifiable resources during its lifetime.
- t ∈ T identifies a request's current position in the discrete ordering of Time. A request seeks movement to a higher layer at its current unit, t ∈ T, but might be delayed (mapped to t' > t).

Each request has a permanent attribute, denoting it as input or output, as well as process and data authorization keys. In addition, a request is assigned, at different stages of its mapping, a restriction set (see section 3.7) and a data set. The data set includes values (such as literals, colors, and signal strength), format and type keys, a work-area, and an operation.

An input request with a wild card for a value requests the retrieval of the stored value from the accessed resource, followed by the output of that value into its work-area. An input request that is not a match request and that matches the format key of the accessed resource (in contrast to undecipherable input, for example) enables comprehension after retrieval and is called a read. An output request seeks to store, at the requested resource, its value, the value in its work-area, or both, perhaps modified as specified by its operation. Certain types of output requests, including those that constitute an update, share their work-area. An input request that contains any value except a wild card is a match request and is serviced at its layer only if its key and value match those of the accessed resource. Guards for conditionals and loops are match requests; if their attributes do not match those of the resource, the dependent requests in their structure are refused service without affecting the service of their processes. A match request that conflicts with an authentication lock, on the other hand, impairs its entire process.

3.5 The Process

P is a set of processes in a software system. A process, p ∈ P, is a nonempty, ordered set of cooperating requests that are bound together by dependency restrictions so that requests of the same process are serviced sequentially in RS and so that a process completes service and/or is demoted as an entity. Additional restrictions determine control within and between processes (involving conditional statements, loops, reference parameters, structure chart relationships, and concurrency synchronization, for example). A project's completion of service is dependent upon the completion of service of all of its processes (or of its essential processes, if the project supports partial service). Each process has a unique key that is assigned to its requests. This key contains fields for process, project, and user identification and authorization.

3.6 The Software System

A Software System is a quintuple (M, P, R, T, F), where M is a set of ordered layers, P is a set of processes, R is a set of resources, T is a finite set of discrete units of Time, and F is a function that controls the movement of requests. For the requests of each process and for each single and composite layer, F assigns restrictions as well as a relation, $g_M: M \to M$, such that

$$
g_m(m, r, t) =
\begin{cases}
(m', r', t'),\ m' > m, & \text{if restrictions permit this mapping (called promotion); else}\\
g_m(m, r', t'),\ t' > t, & \text{if restrictions permit this mapping (called rescheduling); else}\\
g_m(g_{m'}(m', r', t')),\ m > m', & \text{(called demotion).}
\end{cases}
$$

We have simplified the mapping above. When a request is promoted, a duplicate may be generated. Sometimes copies are rescheduled at their former layers. (For example, stations using the Transmission Control Protocol (TCP) retain duplicates when they transmit data units.) Alternatively, duplicates are promoted concurrently (as in RAID architectures or child processes). Multiple mappings occur in many circumstances.

Requests ultimately request promotion to SC. During their progress, if restrictions prevent promotion but allow rescheduling, requests are mapped within the same layer to later units of T. If restrictions prevent both promotion and rescheduling, requests are demoted to lower layers to repeat their quest for completed service. Service at a layer is provided during rescheduling. Rescheduling also determines which elements of fungible resources are allocated.

3.7 The Restrictions

At each layer, a request is assigned restrictions, both statically and dynamically, of dependencies, times, and priorities. These constraints, together with attributes of data input and output, determine the classes of active defects that are defined in section 6.2 and the classes of service deviations that are presented in section 7.

3.7.1 Dependency restrictions

Promotion and demotion dependencies are sets of events that restrict a request's movement. Before a request is promoted, it must wait for completion of (some combination of) the events contained within its promotion dependency. (As examples, while a user request in RB waits for an input or output completion, its process remains on the blocked list; an access to a block referenced by an i-node's triple indirect pointer is dependent upon three accesses to blocks containing intermediate pointers; minimizing the number of events in a dependency decreases the degree of coupling.) A request is demoted if an event in its demotion dependency occurs. (For example, a request and its process are demoted to the blocked list when the request makes a system call requesting output. This demotion is not an error, since the state of the process is stored, enabling resumption without loss of service.)

3.7.2 Times restrictions

A promotion times restriction is a nonnegative integer that determines the minimum number of times that a request must be rescheduled at a layer within some interval of T before promotion. A demotion times restriction is a nonnegative integer that determines the maximum number of times that a request can be rescheduled at a layer within some interval of T before demotion. These values are decremented, down to 0, each time a request is rescheduled without conflict. A positive value for a promotion times restriction prevents a request from being promoted. (For example, a sleep(n) call in a C program forces the next statement to be rescheduled at its layer at least n times before its promotion to RS.) A zero value for a demotion times restriction forces a request, in any layer except PC, to be demoted. (Requests with a hard deadline are assigned demotion times restrictions limiting the number of times that they can be rescheduled. Demotion times restrictions also limit the number of times that modems attempt connections during a session. Data transmission rates have an upper bound on the number of bits that are sent per second; such maximum values are set by communications protocols and the hardware. Even if a sender could exceed a maximum transmission rate, the receiver would not supply service.) All requests are assigned composite demotion times restrictions that are bounded by the lifetime of the system so that, at the end of T, processes that are in a state of execution are demoted to PC.

3.7.3 Priority restrictions

A request is enabled for promotion iff its promotion dependency set is empty and its promotion times restriction is 0. A request is enabled for rescheduling iff its demotion times restriction is positive and its demotion dependency is unchanged. Two enabled requests that request mapping to the same resource, delivery, or buffer unit compete if one is an output and the other has a nonmatching authorization and/or data key. Conflict takes place if two or more competing requests are mapped to the same service unit. Processes are said to conflict (or compete) with each other if their requests conflict (or compete). At most one conflicting process can be serviced at a resource unit; the others are demoted. Conflicts involving match requests for locks or logins are important protection mechanisms that prevent unauthorized access. They result in errors only in intruders (if utilized correctly). Conflicts resulting in data inconsistencies, on the other hand, can cause errors for all involved parties, including authorized user processes.

Priority restrictions determine which competing or conflicting requests are promoted, rescheduled, or demoted. (For example, hardware interrupt service routines are assigned higher priorities than competing user processes.) Priorities are determined differently in different systems. (For example, newly promoted virus outputs conflict with and overwrite authorized outputs that were promoted earlier but were still being serviced at the resource. On the other hand, in broadcast systems using C-Aloha, conflicting outputs with weaker signals are demoted even if they are newly promoted, while all conflicting outputs are demoted in ALOHAnet.)

IV Definitions

The above model enables definitions of the fundamental terms of this paper. Defects are defined and categorized.

Execution is the rescheduling of a request in ID, RB, or RS.

Service is the execution of a request with the decrement of its promotion times restriction. Service is lost if a process is demoted without maintaining state information. (Its promotion times restriction at each layer of demotion is reset.)

Abortion is the demotion of an executing process to PC.

User Requirements are a set of user requests for completed service at specified resources: processing specified inputs, producing specified outputs, and satisfying specified restrictions. User requests are developed in PC into a project and its processes and requests to satisfy a user/system contract. We assume that, as part of the contract, the project and the resource management system agree upon behavior to provide completed service. Projects are assigned authorization keys that are attributed to their processes and requests. The resource management system assigns keys to resources for authorization of requests and processes.

Authorized processes have keys that match the authorization keys of the resources that their requests are accessing.

Authorized requests have keys that match the authorization keys of data stored at the resources that they are accessing, as well as the authorization keys of the resources.

Interference is the corruption of attributes of authorized user or system requests, impeding the service of an authorized user. Interference is transitive; any process whose service is impeded by a process that suffers interference also suffers interference.

An intruder is a process containing unauthorized requests that interfere with authorized requests of a different project. An intruder might be an unauthorized process, or else it might be an authorized process of a user or resource management system that contains unauthorized requests. An authorized project that contains unauthorized requests that interfere only with its own requests is not an intruder.

A defect (flaw) is a nonempty set of requests whose execution can result in interference. We identify three classes of defects:
1) A development defect contains unauthorized requests that, when executed, interfere with authorized user requests. The unauthorized requests might belong to the project of the authorized user requests. Otherwise they belong to an intruder, in which case interference is enabled only if the resource management system is defective.
2) A scheduling defect contains resource management requests that, when executed, enable interference between authorized user processes. A scheduling defect results in interference only if user processes compete for service. (For example, priority mechanisms might be corrupted, causing conflict between competing authorized processes. Conflict results in demotion and loss of service of at least one of the authorized processes.)
3) A security defect contains resource management requests that, when executed, enable interference between an intruder and an authorized user process. A security defect results in interference only if an intruder exploits the vulnerability. (For example, a system's match requests to verify access control rights might be defective, allowing an intruder to overwrite user data.)

A failure is the abortion of an authorized user process.

An error is the demotion of an authorized user request and its loss of service caused by the execution of a defect. If the demoted request does not repeat its upward movement through the layers from which it was demoted, as well as its service at those layers (called resolution), and if its service is necessary to its process, the process will fail.

Standard service is service without defects.

A service degradation is a deviation from requested optimal service of an authorized user process caused by the execution of a defect. Most user/system contracts tolerate deviations from optimal service that result during standard service (such as delays in completion); we consider such deviations to be an accommodation, not a degradation of service.¹ Systems deploy service degradations to prevent errors. Where possible, processes accept deviations in service. In addition, systems monitor suspected service degradations and adjust restrictions to prevent the occurrence of errors.

A defect is active when it causes service degradation or an error.

¹ Degradation of service has been defined as a mechanism for tolerating an increased load on a system [25]. Unless the increased load is beyond service specifications, we consider a deviation in quality of service that accommodates more users to be a responsibility of standard service.

V The System States

Avizienis et al. [1] assert that the behavior of a system can be described by its states, which consist of computation, communication, stored information, interconnection, and physical condition. These states are expressible with the constructs of our model, both at a specific unit of T and over the lifetime of the system. Resources and requests at a given unit of T provide a snapshot where:
- Information is stored in the data values of requests that are currently being serviced and in the resources and buffers where they are being serviced.
- Computation is specified in the operations of output requests.
- Interconnection and blockage are defined by the dependency sets of requests.
- Communication is achieved via input and output service of requests.
- The system's physical condition, although excluded from our study, can be expressed by resource status fields.

The behavior of a system is defined by request mappings. At each unit of T, a request is mapped to another request, either in a higher or lower layer or for a later unit of T. Our model can thus be considered to be a very large finite state machine, with the mappings of requests identifying transitions from one state to another. (Multiple mappings require multiple transitions following a single input.) Note that a cycle cannot exist; each mapping is either to a request at a later unit of T, or to an adjacent layer followed by a mapping to a later unit of T at the adjacent layer (unless the adjacent layer is empty). The mapping of a process to a lower layer signals a possible error. The continuous mapping of a process within the same layer(s) signals potential service degradation.

VI The Classification Scheme

Defects are classified as scheduling, security, or development, depending on the type of process interaction involved. Active defects are classified according to the corrupted attribute that causes them.

6.1 Classification of Defects

All software defects originate during development, from user requirements, system requirements, or both. Some development defects contain unauthorized requests that are isolated within their authorized user project, potentially harming only their process and/or project. Other development defects are found in intruders. Scheduling defects consist of incomplete or inconsistent competition mechanisms of resource management systems that interfere with access to resources of authorized user processes. Security defects consist of incomplete or inconsistent cooperation mechanisms of resource management systems that allow intruders to interfere with authorized user processes.

6.1.1 Development Defects

Development is the production of a software project in PC: understanding, planning, implementing, testing, documenting, and maintaining the processes that cooperate in comprising a project. Development defects cannot always be prevented. Furthermore, defect prevention frequently is cost prohibitive. Systems can detect some errors caused by inadvertent development defects and signal exception handlers (if they are provided), assisting processes in completing service. Operating systems and programming environments also detect and rectify certain program errors and service degradations. (For example, "canaries" help systems detect buffer overflow [20], and the Java and LISP programming environments utilize garbage collection to remedy memory leaks.)

6.1.2 Scheduling Defects

Scheduling is the assignment of a set of asynchronous processes to shared resources. (For example, an operating system assigns threads to processors, and a network router places packets in output queues of selected ports.) To enhance throughput and minimize response time, a system schedules processes to resources for concurrent units of T. A defective resource manager can interfere with user processes during certain traffic conditions. Systems resolve many scheduling errors with backups and restarts (maintaining duplicates at lower layers, promoting the demoted processes, and repeating lost service). Systems diminish many scheduling service degradations by adjusting restrictions (perhaps by measuring waiting times as decrements in times restrictions and raising priorities accordingly). The greatest efficiency for error resolution is achieved at the layer closest to the initial demotion. Typically, service degradations are controlled at the layers at which defects are activated. Degradation mitigation thus saves the cost of promotion and repeated service that is involved in error resolution.

6.1.3 Security Defects

Security is the prevention of intruders' access to resources that are shared among authorized processes. (For example, a UNIX system authenticates processes before they are provided entry.) An active security defect in a resource manager enables interference from advertent or inadvertent intruders. Intruders can be advertent and authorized (such as an administrator that misuses granted authority), advertent and unauthorized (such as viruses), inadvertent and authorized (such as temperamental device drivers

...cally following its conflict with the input of the second update. Overwriting the data of the first update causes its loss of previous service.) A security data output defect causes an error, for example, when a virus is allowed into a system and its requests overwrite data being stored by authorized output requests. Resource management systems frequently detect errors resulting from scheduling and security output defects via mechanisms such as checksums and integrity check values, and by logs and audits. They recover from detected errors via rollbacks and restarts in RB or ID. Systems also detect certain errors resulting from user development data output defects using mechanisms such as overflow circuitry; error resolution is achieved, for example, via signals to exception handlers in user processes. Such error resolutions, however, cause delays and are thus examples of service degradation caused by corrupted data outputs.

6.2.2 Data input defects

A development data input defect causes an error, for example, if a process inputs from a resource before a value has been stored there and an output request places that (unauthorized) value in another location. The error occurs during data output, but the cause is a data input defect. A scheduling data input defect causes an error during inconsistent retrieval [2]. (One transaction transfers money between two resources. Its outputs are interleaved with another transaction, which inputs from both resources and outputs the sum of the retrieved values elsewhere. If the second transaction's inputs execute in between the first transaction's outputs, for example, the second transaction's output is erroneous.) A security data input defect becomes an error, for example, during identity theft, where an intruder inputs data from a resource at which the output of an authorized process is being serviced. When these data are misused, an error occurs.
Errors caused by data input errors are difficult to that crash the system), or inadvertent and unauthorized (such as detect, since retrieved values can be stored in work-areas that are buggy games installed by a careless employee). Systems recover outside the control of the system. Logs and audits are not always from certain types of security errors with backups and restarts. effective; an intruder might eavesdrop over an unprotected connection or else corrupt the logs. Assuming that unauthorized input is 6.2 Classification of Active Defects detected, recovery is not feasible if the data have been output beAn active defect can be placed into one of seven classes, dependyond the system’s control. Even if recovery is achieved, it delays ing upon which request attribute is corrupted. These classes are: the service of user processes and causes service degradation. output and input data attributes, promotion and demotion dependency restrictions, promotion and demotion times restrictions, and 6.2.3 Promotion dependency defects priority restrictions. Errors and service degradations are usually A development promotion dependency defect causes an error, for caused by multiple defects, such as an intruder gaining access to a example, if a pointer calculation is incorrect and there is no path to system due to a defect in authentication mechanisms and then the resource that was dependent upon the pointer. A scheduling overwriting user data using corrupted access control mechanisms, promotion dependency defect causes an error when switches send dependencies, and data attributes (typical activities of viruses). The packets to crashed routers based upon circular information, pernext sections contain examples of active defects in the above clas- haps using the distance vector algorithm [24]. A security promoses. (See Table 2 and Table 3 for charts of sample errors and ser- tion dependency defect becomes an error, for example, if intruders vice degradations in these classes.) 
corrupt web link dependencies. Referential links can be lost or, more dangerously, reference a masquerading site [3]. A develop6.2.1 Data output defects ment promotion dependency defect causes service degradation, for A development data output defect causes an error, for example, example, if the corruption of a promotion dependency causes a when a mistyped initialization is stored in a database. (If data are request to wait for a nonexistent process [7]. (The continued rewithin acceptable range, service degradations instead of errors scheduling during the wait can be detected by the process with a occur.) A scheduling data output defect causes an error during a match on a time-out, and initiate the discarding of the waiting relost update. (Authorized requests from different processes are asquest.) A scheduling promotion dependency defect causes service signed priorities that allow them to output concurrently to the same degradation when scheduling enables a closed circular chain of resource [2] and the two updates are interleaved in execution. The dependencies, as in a resource deadlock [7, 15]. A scheduling deoutput of the second update overwrites the first update, whose fect also causes service degradation in a circular dependency in promotion had been prevented by a dependency, assigned dynamiwhich four cars are each idling at different corners of a four-way intersection, guarded by stop signs, waiting for cars on the left to proceed first [15]. A security dependency defect causes service degradation, for example, if an intruder corrupts a promotion dependency in TCP [24]. A SYN (connection request) flood can prevent authorized users from establishing a connection, since TCP systems wait for nonexistent acknowledgements before releasing buffers necessary for new connections. If defects cause service degradation and their demotion times restrictions expire, waiting processes are demoted – an error. If composite demotion times restrictions expire, processes fail. 
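The closed circular chain of dependencies in a resource deadlock, or among the four cars at the stop-sign intersection, can be viewed as a cycle in a wait-for graph. The Python sketch below is a minimal illustration of how a monitor might find such a cycle, under the simplifying assumption that each waiting process waits on exactly one other process; the function name `find_wait_cycle` and the car labels are hypothetical and not part of the paper's model.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one circular chain of waiting processes, or None if there is none.

    `waits_for` maps each waiting process to the single process it waits on
    (an assumption of this sketch: at most one outstanding wait per process).
    """
    cleared = set()  # processes already shown not to lead into a cycle
    for start in waits_for:
        path: List[str] = []
        position: Dict[str, int] = {}  # index of each process on the current path
        node: Optional[str] = start
        while node is not None and node not in cleared:
            if node in position:
                # Came back to a process on this path: a closed circular chain.
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = waits_for.get(node)  # None when `node` is not waiting
        cleared.update(path)
    return None


if __name__ == "__main__":
    # Hypothetical four-way stop: each car waits for the next car in the ring.
    cars = {"north": "east", "east": "south", "south": "west", "west": "north"}
    print(find_wait_cycle(cars))  # ['north', 'east', 'south', 'west']
    # Eliminating the dependency of one car breaks the chain.
    del cars["west"]
    print(find_wait_cycle(cars))  # None
```

Following one outgoing edge per process suffices only under the single-wait assumption; if a process can wait on several others, a general depth-first search over the wait-for graph would be needed. Removing one dependency, as in the usage lines, corresponds to forcing an error on one member of the chain so that the others can proceed.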
Systems maintain metrics of waiting times and resource usage. Such statistical tools, although heuristic, are useful in alleviating many types of service degradation by initiating the adjustment of parameters (such as the size of TCP’s backlog queue). They also monitor for circular links. If cycles are found in resource deadlock, systems force errors and initiate recovery procedures. Compensation through redundancy [1] is effective for preventing dependency defects from becoming errors. Doubly linked lists, for example, provide alternate paths to list elements, contained within the promotion dependency of requests for list elements. Typically, however, the alternative path must traverse more links than the preferred path, causing degradation of service when a link in the preferred path is lost.

6.2.4 Demotion dependency defects

A development demotion dependency defect causes an error, for example, when a process deletes its only pointer to heap data; the dependency set of the heap data has been corrupted, and a memory leak occurs. (The storage cannot be freed.) A scheduling demotion dependency defect causes an error, for example, if TCP cannot reassemble a message due to a corrupted dependency. Assume that an acknowledgment is delayed and arrives at a new connection where the connection window has wrapped around. Since identifiers now appear to match, a duplicate data unit that has been stored is demoted. If the original data unit is lost as well, TCP will not be able to reassemble the message at the destination [24]. A security demotion dependency defect causes an error if a hard disk is erased because of a corrupt dependency (a trigger) assigned by a logic bomb or virus. (Other defects allowed the virus into the system.) A development demotion dependency defect can cause service degradation. For example, in virtual memory systems, requests are assigned demotion dependency sets that contain events for deallocating assigned memory frames. Assume that a programmer sets a lock bit if it is necessary to keep a page in memory. If the bit is not set, the process is delayed as it is swapped in and out of memory, probably causing an error and failure. Scheduling and security demotion dependency defects cause service degradation when, for example, logs and audits detect demotion dependency errors and initiate resolution in RB or ID. Typically, processes are restarted from duplicates maintained at lower layers.

6.2.5 Promotion times defects

A development promotion times defect causes an error if a promotion times restriction is too small, resulting in premature request promotion. For example, a project’s Beta version might be delivered to a user before it fulfills all service requirements. As another example, a Linux student can insert a sleep command in a program and execute the program in the background in order to run “ps” in the foreground and obtain the process identifier (pid). If the sleep interval is too small, the process completes execution before its pid is displayed. A scheduling promotion times defect can cause an error if a duplicate TCP data unit is assigned a promotion times restriction that is smaller than the round-trip propagation delay, so that it is transmitted before the original’s acknowledgement can be received. If the window has wrapped around at the destination, the duplicate might conflict with a message with the same connection identifier [24]. A security promotion times defect becomes an error when a password is too short to foil a password attack and enables an intruder to attack the system.

The above defects cause service degradation as well. If the scheduling and security defects are detected by logs and audits and resolved with rollbacks and restarts, there will be delays in service completion. A prematurely delivered Beta version might satisfy sufficient service requirements for some users, and thus cause only partial failure [1], a type of service degradation. Even if window wraparound does not occur, defective transmission of a duplicate data unit wastes system resources and potentially causes delays for other processes.

Promotion times defects also become active when a promotion times restriction is too large and a request is rescheduled too many times. A cake can be burned (error) or a non-terminating UNIX ping command can waste resources (service degradation) due to development promotion times defects. Queueing theory tells us that promotion times restrictions allowing new processes into a system must be considerably less, on average, than the promotion times restrictions required for service completion. (Average customer arrival rates must be less than average service rates.) If too many processes are accepted (the promotion times restriction for the resource management system’s acceptance of new customers into the system is too large), instability or congestion ensues; these are service degradations caused by scheduling promotion times defects. Similar security promotion times defects enable “denial of service” attacks during the replication of worms. Systems monitor metrics to detect overload. Effective handling of service degradation resulting from overload involves throttling new traffic (lowering system promotion times restrictions) and decreasing the amount of resources assigned to processes (perhaps changing restrictions on resources so that fewer elements, such as channels, are assigned as a unit [22, 25], or decreasing times restrictions on users for their rate of access). Systems also choose competing processes to drop (lowering their priorities) rather than store, forcing errors [18]. Reducing bandwidth for all processes is an effective service degradation mechanism to prevent the errors involved in dropping service for some [25]. Even when packet dropping is used to control service degradation, it is an error according to our definition: authorized user processes have been demoted and have lost their previous service because of the activation of a defect.

6.2.6 Demotion times defects

Development demotion times defects can cause errors if assigned demotion times restrictions are too small or too large. For example, an Internet Protocol packet contains a Time to Live (TTL) field that limits the number of times it can be rescheduled to routers. The TTL is decremented with each packet hop, and a packet is demoted when its TTL reaches 0 [24]. If the assigned TTL is too small, the packet will be demoted before reaching its destination and lose previous service. As an example of a too-large demotion times restriction, on the other hand, a loss of confidentiality can result when a computer is sold without erasing sensitive data on its hard disk. A scheduling demotion times defect can cause an error, for example, if a hash storage scheme allows a limited number of rehashes to fixed-size buckets following collisions. Some data will not be stored if the key distribution is not spread adequately by the hash function. An example of an active security demotion times defect is that of a program manager who purposely assigns deadlines that are too short to be met or so long that they waste management funds. Most errors resulting from demotion times restrictions that are too small are detected when service is not completed. By then, however, it is typically too late for resolution.

6.2.7 Priority defects

Defects in development priority restrictions are common during project development.
The disproportionate allocation of manpower and resources to processes that are of lesser importance to the overall project frequently causes both service degradations and errors. Scheduling priority defects cause service degradation in priority inversions, where higher priority processes are suspended while lower priority processes are chosen for execution. If the suspension time exceeds the process’s demotion times restriction, the process will be demoted. Scheduling priority defects also become active in networks that use a “wine and milk” policy for dropping packets. Multimedia packets are treated as milk, since old packets are probably worthless [24]. Bank or database transactions are treated as “wine”, since older packets have consumed more network resources and would require additional resources to recover. The priorities determined by these criteria are heuristic, however, so more important packets are sometimes delayed or demoted. A security priority defect causes service degradation when an intruder exploits vulnerabilities in search algorithms to raise its page rank via cyclical links or spamming blogs. Access to appropriate links is either delayed or prevented, the latter causing an error. Priority defects are of great concern during system design and management’s allocation of resources, and the resultant errors are difficult to control. Yet we have not found this type of defect included in the defect classifications in the literature.

VII Software Service Degradation

Fault tolerance is the ability to continue service, with reduced quality, when faults become active. When a redundant hardware unit fails, service is maintained in degraded mode, preventing failure of the entire system. Similarly, when software defects are activated, redundancy mechanisms maintain service in degraded mode. Four types of service degradation are used to prevent or mitigate errors that frequently lead to system failure. (Service degradations do not prevent defects.) Systems monitor service deviations, which signal the possible activation of defects. They control suspected service degradations, usually by adjusting the restrictions that they assume have been corrupted. Since the handler is determined dynamically and activated on a “need to use” basis, its cost is greatly reduced.

The project must accept deviations. These include: slower service that is completed later than optimal (redundancies in acceptable units of time); data values that vary from optimal (redundancies in acceptable data values); partial services that are a subset of optimal services (redundancies in acceptable sets of delivered services); skewed voting results in which non-optimal winners are chosen (redundancies in choices); or a combination of the above. (Requests accept any choice of elements of fungible resources, which provide alternatives of equal quality. This option is neither a degradation nor a deviation, but the definition of fungible.) Most systems accept deviations during routine service and continuously work to reduce them, particularly any suspected degradations, and to improve the quality of service provided.

An active defect causes service degradation and/or an error. An active defect will not result in an error in a mandatory process if processes can complete service in a limited manner. Degradation occurs when interference causes a non-optimal result, but one that is within acceptable limits. We have already shown that request attributes determine the types of corruptions that cause service degradations. We now show that the same set of attributes defines the types of variations in service quality.

7.1 Times Service Degradation

Times service degradation is the continued rescheduling of an authorized request due to an active defect. Slow service [1] is a subset of times service degradation. Slow service, or, more generally, slower service, can refer to delays in obtaining resources or in completing service at resources. These service degradations need not waste resources or become errors. (For example, when priority inversion occurs, the processor is allocated to a lower priority process, degrading service for an enabled higher priority process. The processor, however, is fully utilized. The higher priority process need not lose previous service and can complete remaining service if its times restrictions permit. Interference, however, has delayed its access to the processor. Defects in scheduling priority restrictions have caused times service degradations.)

Times service degradation has been characterized as a delay. More generally, times service degradation is determined by the number of reschedulings per specified interval, either more or less than the value agreed to be optimal. A system, for example, can reduce the number of bits transmitted per second, or increase the number of packets dropped per interval, to alleviate system overload. Both of these mechanisms to control overload can cause delays if the packets must be resent or the same amount of data must be sent. Alternatively, a receiver can accept lower resolution, with no resultant delay. Continued rescheduling also occurs at composite layers, such as frequent page faulting during thrashing and repeated restarts during system design in an unfamiliar domain.

Delays and other deviations that result from standard service are not degradations. (For example, processes encounter standard delays when higher priority processes are chosen for service. Higher priority processes may have their priorities lowered and encounter standard delays when systems prevent starvation with aging mechanisms. Queues are mechanisms for preventing conflict and resultant errors. The delays involved, if bounded and prioritized, are the result of standard service as processes wait for turns at shared resources.) In order to distinguish between standard service and service degradation, statistical tools monitor metrics such as page fault frequencies and numbers of dropped packets. These tools, although heuristic, are effective in identifying parameters that should be adjusted. (For example, infinite delays for cars waiting at a four-way stop sign intersection are prevented by eliminating the dependency on one car. Failures during system overloads can be prevented by throttling processes or decreasing bandwidth.)

Times restrictions are carefully adjusted to limit times service degradations that have been identified as frequently leading to errors. In particular, system overloads must be controlled before they feed on resultant errors and error recovery routines. Adjusting times restrictions to prevent overloads causes delays, but less severe ones than those caused by overload. Defect prevention for system overload requires resource overload (as in hard real-time systems) and restrictive process access. Error resolution for overload conditions, such as attempting to recover dropped packets and connections, is generally too late to be successful. Delaying the binding of an error prevention handler until the detection of probable defect activation is the preferred and most cost-effective method of controlling overload for most applications.

7.2 Input/Output Service Degradation

Input/output (i/o) service degradation is the input or output at a resource of a non-optimal data value due to the execution of a defect. In multimedia or web applications, standard service may include some loss in data values. (For example, digitizing audio information causes quantizing noise, and color specifications for web pages are limited based on browser support.) But data loss without interference is neither a degradation of service nor an error. Degraded modes of data output or input include, for example, output of choppy sound or video or input of signal fading [6]. They are caused by such active defects as memory leaks, resulting from corrupted dependencies, or packet loss, resulting from corrupted priority or times restrictions. By accepting the service of degraded i/o, particularly in multimedia applications, systems can tolerate dropped packets (partial service) without retransmission. Times mechanisms monitor deviations, such as increases in packet loss. Such statistical tools detect degradations in signal quality during process execution and adjust restrictions as appropriate [6, 18, 19].

7.3 Dependency Service Degradation

Dependency service degradation is the loss, due to interference, of services that are not critical to the successful completion of a project (partial service [1]). Typically, a project is assigned a promotion dependency restriction so that it cannot complete service until all of its processes complete service. In dependency service degradations, a user/system contract specifies an optimal set of services, but only a subset is mandatory. (Application programs and operating systems, for example, provide features that most users never use. After experimenting with one of these services and finding that it doesn’t work, a user probably abandons it.) Optional processes are not contained in the promotion dependency sets of mandatory processes or their project; the project completes service even if optional processes fail. The tolerance for failures of unessential services prevents errors and failures of mandatory processes. Partial service degradations can be caused by errors of every type.

7.4 Priority Service Degradation

Priority service degradation occurs when an active defect causes a non-optimal selection from a set of acceptable alternatives. Systems rely on voting schemes to identify the optimal choice among available possibilities [22]. Higher fault tolerance is obtained, providing compensation through redundancy [1], since alternatives are available if the optimal choice becomes disabled or downgraded. Multiple deviations contribute to voting decisions, including possible degradations in time units, i/o values, and selected services. Page replacement, routing, and web page ranking algorithms are all heuristic-based voting schemes that sometimes make non-optimal choices. (For example, routers schedule packets to dynamically selected outgoing ports that have the “shortest” distance to their destination. In the distance vector algorithm of the Arpanet, routers periodically exchanged packets with their neighbors to estimate the optimal port based on queuing delays. A scheduling defect existed because decision making was cyclical, relying partly on information gained from the router that sent the packet. If a router suddenly became congested, bad news travelled slowly [24]; packets continued to be scheduled to the port of the overloaded router, a non-optimal selection. A security defect existed because a malfunctioning router could broadcast zero delay and its neighbors would route their packets to it [24], also a poor choice.) Byzantine fault tolerance [11] is a voting mechanism to identify and eliminate malfunctioning members from a set of alternatives.

7.5 More on Service Degradations

Most systems balance multiple types of service degradation together with other service deviations. Fuzzy set heuristics are useful for adjusting parameters (restrictions) in order to optimize the combined result [5, 18]. Game theory has also been studied for optimization of parameters and dropping choices [23].

The causes of service degradations are frequently difficult to determine. Furthermore, it is not always clear whether deviations result from active defects or from standard operation. It is also challenging to identify when service degradations actually begin [11, 20]. Errors, on the other hand, occur at the time unit of process demotion and usually are easily distinguishable from standard demotions such as preemptions.

VIII Summary

The term “service degradation” implies that:
- There exists a set of optimal service outcomes that have been agreed upon in the user/system contract.
- There is a range of acceptable deviations specified by the user/system contract in addition to the optimal outcomes.
- At least one acceptable deviation has occurred due to the execution of a defect.

Service degradation involves one or more of these deviations:
- Deviation, within acceptable limits, from the optimal number of reschedulings because of interference.
- Deviation, within an acceptable range, from the optimal outcome of input and/or output because of interference.
- Deviation from requested services, with delivery of a proper subset of services that includes all mandatory services, because of an unresolved error.
- Deviation from an optimal choice, such that an inferior but acceptable alternative is selected, because of interference.

Errors and service degradation are integrally related. Both are caused by active defects. Both are classifiable in terms of causation by corrupted request attributes. This same set of request attributes defines the types of outcomes of service degradation. All errors cause service degradation; resource units are wasted during error resolution and/or during previous service, as well as during movements up and down layers. Unresolved errors in non-mandatory services cause degradation (partial service). Service degradations that exceed acceptable deviations cause errors. Service degradations are useful in preventing errors, both by tolerating degraded service and by monitoring deviations and adjusting restrictions accordingly. Forcing errors is useful in the control of infinite loops (by killing a process in a resource deadlock, for example) and system overloads (by dropping packets, for example).
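The Summary's conditions for service degradation (an agreed optimal outcome, a contractually acceptable range of deviation, and a deviation caused by an executed defect) can be read as a small decision procedure. The Python sketch below is one illustrative reading, not the paper's formalism. It assumes a single numeric metric with explicit lower and upper acceptable bounds, and a flag recording whether the deviation was caused by an active defect; the `Contract`, `Outcome`, and `classify` names and the frame-rate figures are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    STANDARD = "standard service"
    DEGRADATION = "service degradation"
    ERROR = "error"


@dataclass(frozen=True)
class Contract:
    """Hypothetical user/system contract for a single numeric metric."""
    optimal: float  # the agreed optimal outcome
    low: float      # smallest acceptable value
    high: float     # largest acceptable value


def classify(contract: Contract, observed: float, caused_by_defect: bool) -> Outcome:
    """Classify one observation against the contract (illustrative only)."""
    if not (contract.low <= observed <= contract.high):
        # Assumption of this sketch: any value outside the acceptable range
        # is treated as an error (degradations that exceed acceptable
        # deviations cause errors).
        return Outcome.ERROR
    if observed != contract.optimal and caused_by_defect:
        # An acceptable deviation produced by the execution of a defect.
        return Outcome.DEGRADATION
    # Either the optimal outcome, or a deviation arising during standard
    # service (an accommodation, which is not counted as a degradation).
    return Outcome.STANDARD


if __name__ == "__main__":
    # Hypothetical frame-rate contract: 30 frames/s optimal, 24 to 30 acceptable.
    video = Contract(optimal=30.0, low=24.0, high=30.0)
    for value, defect in [(30.0, True), (27.0, False), (27.0, True), (20.0, True)]:
        print(value, defect, classify(video, value, defect).value)
```

The same observed value (27 frames/s here) is classified differently depending on its cause, which mirrors the paper's distinction between an accommodation during standard service and a degradation caused by an active defect.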
See Table 1 for a chart of the interdependence of errors and service degradations.

This paper presents a classification scheme for defects that is applicable to all phases of a software system and to both service degradations and errors. With the aid of a model developed for other research areas, we obtain detailed definitions for defect, service degradation, and error. We identify classes of treatable errors, mechanisms appropriate for their prevention and resolution, and the layers of a system at which such mechanisms are effective. We offer a two-dimensional classification for errors, as well as a three-dimensional classification for service degradation. Service degradations are categorized according to three types of process interaction, seven types of corrupted request attributes that enable degradation (including two subcategories), and four types of degradations that result, yielding (more than) eighty-four classes. Examples of active defects in many of these categories are given in Table 2 and Table 3.

Further research should be conducted to express our definitions in mathematical notation in order to eliminate remaining ambiguities. Although the translation of many of our definitions into mathematical terms is straightforward, defining defects is difficult. A comprehensive study of service degradation is another open area of research.

References

[1] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr (2004), Basic Concepts and Taxonomy of Dependable and Secure Computing, IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1 (Jan.-Mar. 2004) pp. 11-33.
[2] P. A. Bernstein and N. Goodman (1981), Concurrency Control in Distributed Database Systems, ACM Computing Surveys, vol. 13, no. 2 (June 1981) pp. 185-211.
[3] A. Bremler-Barr, E. Cohen, H. Kaplan, and Y. Mansour (2002), Predicting and Bypassing End-to-End Internet Service Degradations, Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement (Nov. 2002) pp. 307-320.
[4] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. Halliday, D. S. Moebus, B. K. Ray, and M.-Y. Wong (1992), Orthogonal Defect Classification: A Concept for In-Process Measurement, IEEE Transactions on Software Engineering, vol. 18, no. 11 (Nov. 1992) pp. 943-956.
[5] S. Ghosh, Q. Razouqi, H. J. Schumacher, and A. Celmins (1998), A Survey of Recent Advances in Fuzzy Logic in Telecommunications Networks and New Challenges, IEEE Transactions on Fuzzy Systems, vol. 6 (Aug. 1998) pp. 443-447.
[6] M. Hadzialic, M. Hamza, and P. Begovic (2007), An Approach to Cell Signal Coverage Reliability in the Presence of Different Fading Models, Proceedings of the 5th ACM International Workshop on Mobility Management and Wireless Access (2007) pp. 91-98.
[7] R. C. Holt (1972), Some Deadlock Properties of Computer Systems, ACM Computing Surveys, vol. 4, no. 3 (Sept. 1972) pp. 179-195.
[8] J. N. Herder, H. Bos, B. Gras, P. Homburg, and A. S. Tanenbaum (2006), MINIX 3: A Highly Reliable, Self-Repairing Operating System, Operating Systems Review, ACM Press, vol. 40, no. 3 (July 2006) pp. 80-89.
[9] IEEE Computer Society (1990), Standard Glossary of Software Engineering Terminology, ANSI/IEEE Standard 610.12-1990, IEEE Press, New York.
[10] ISO Reference Model for Open Distributed Processing (1996), ISO/IEC 10746-2:1996(E), at http://standards.iso.org/ittf/PubliclyAvailableStandards/
[11] R. Kotla and M. Dahlin (2004), High Throughput Byzantine Fault Tolerance, International Conference on Dependable Systems and Networks (June 2004) pp. 575-584.
[12] C. E. Landwehr, A. R. Bull, J. P. McDermott, and W. S. Choi (1994), A Taxonomy of Computer Program Security Flaws, ACM Computing Surveys, vol. 26, no. 3 (Sept. 1994) pp. 211-254.
[13] G. Levine (1988), The Control of Priority Inversion in Ada, Ada Letters, vol. 8, no. 6 (Nov.-Dec. 1988) pp. 53-56.
[14] G. Levine (1989), The Control of Starvation, International Journal of General Systems, vol. 15 (1989) pp. 113-127.
[15] G. Levine (2003), Defining Deadlock, Operating Systems Review, ACM Press, vol. 37, no. 1 (Jan. 2003) pp. 54-64.
[16] G. Levine (2005), A Model for Anomalies of Software Engineering, in T. Sobh and K. Elleithy (Eds.), Advances in Systems, Computing Sciences and Software Engineering, Springer (2005) pp. 243-250.
[17] G. Levine (1996), A Model for Software Reuse, OOPSLA, San Diego, CA (Oct. 1996) pp. 71-87.
[18] H. Liao, X. Wang, and H. Chen (2008), Adaptive Call Admission Control for Multi-class Services in Wireless Networks, IEEE International Conference on Communications (May 2008) pp. 2840-2844.
[19] J. Liebeherr and D. Liao (1995), A Service With Bounded Degradation in Quality-of-Service Networks, Proceedings of the Fourteenth Annual Joint Conference of the IEEE Computer and Communication Societies, vol. 3 (April 1995) pp. 1103-1110.
[20] G. Novark, E. D. Berger, and B. G. Zorn (2008), Exterminator: Automatically Correcting Memory Errors with High Probability, Communications of the ACM, vol. 51, no. 12 (Dec. 2008) pp. 87-95.
[21] H. D. Owens, B. F. Womack, and M. J. Gonzalez (1996), Software Error Classification Using Purify, Proceedings, International Conference on Software Maintenance (Nov. 1996) pp. 104-112.
[22] W. O. Rom and S. A. Slotnick (2009), Order Acceptance Using Genetic Algorithms, Computers and Operations Research, vol. 36 (2009) pp. 1758-1767.
[23] A. N. Rouskas, A. A. Kikilis, and S. S. Ratsiatos (2008), A Game Theoretical Formulation of Integrated Admission and Pricing in Wireless Networks, European Journal of Operational Research, vol. 191, no. 3 (2008) pp. 1175-1188.
[24] A. S. Tanenbaum (2002), Computer Networks, 4th edition, Prentice-Hall (2002).
[25] G. V. Zaruba, I. Chlamtac, and S. K. Das (2002), A Prioritized Real-time Wireless Call Degradation Framework for Optimal Call Mix Selection, Mobile Networks and Applications, vol. 7, no. 2 (April 2002) pp. 143-151.
Is this anomaly caused by an active defect?
Error: Yes.
Service degradation: Yes.

Is this anomaly caused by corruption in all classes of restrictions?
Error: Yes.
Service degradation: Yes.

Does demotion occur when the anomaly becomes active?
Error: Yes.
Service degradation: Yes, during failures causing partial service, and during the continued rescheduling of composite times service degradations. Service degradations, however, typically prevent demotions.

Does the project fail if the anomaly is not controlled in composite layers?
Error: Yes.
Service degradation: Partial service is specifically designed to prevent project failure when errors in nonmandatory services are not resolved. Uncontrolled times service degradations (e.g., infinite waits) result in failures.

Is the anomaly used as a prevention mechanism for an anomaly?
Error: If a process is detected to be waiting for so long a time interval that an infinite delay is suspected, the process is demoted to the layer where its duplicate is maintained, possibly triggering restart and recovery. In addition, during overload, processes are dropped. These forced errors help control types of times service degradation. Forcing demotion of intruders prevents errors, degradations, and failures; these are not errors. Errors occur if recovery involves restarting authorized user processes. Recovery causes service degradation.
Service degradation: Times service degradation (slower service, etc.) delays access and prevents conflict between requests assigned the same priority. Times service degradation (fewer time slots per time interval, etc.) and I/O degradation (fewer resource elements per time interval) raise the number of customers serviced per time interval and help prevent errors and failures resulting from system overload. Accepting lower-quality I/O values during I/O degradation raises the service completion rate. Dependency service degradation (partial service) prevents failure of essential services. Priority service degradation provides alternative choices when previously identified optimal choices fail or deteriorate, thus preventing delays or errors.

Which mechanisms are most effective for the recovery of the anomaly?
Error: Exception handling; rollback and restart.
Service degradation: Adjustment of restrictions; forcing an error to achieve rollback or to drop a service.

What is the result of not controlling the anomaly?
Error: Failure or partial service.
Service degradation: Error; other service degradations; potential failure.

Does one anomaly cause the other anomaly?
Error: All errors cause service degradation. Unresolved errors cause partial service or failure.
Service degradation: Service degradations cause errors when demotion times restrictions expire. Dependency service degradations cause times service degradations or errors when nonoptimal choices execute.

When is the handler for this anomaly applied?
Error: An error handler is applied when an error occurs, assuming that it is detected.
Service degradation: Detection of service degradation is ambiguous. Typically, systems monitor behavior and continuously adjust parameters to minimize deviations from optimal service.

When is this anomaly applied as a mechanism for handling an anomaly?
Error: Errors are used to force rollback and recovery for times service degradations that are assumed to be unbounded. These are applied at the time unit of detection, typically estimated by heuristics.
Service degradation: Service degradation becomes active after the defect is activated but before an error results. Statistical tools monitor the effects of service degradation and adjust restrictions in order to prevent the occurrence of an error.

Table 1. Comparison of Errors and Service Degradation
Note 2: The term “anomaly” refers to error, service degradation, or failure in this chart.
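Table 1 notes that an error can be forced deliberately: when a process has waited so long that an infinite delay is suspected, it is demoted and may be restarted from its duplicate, with the time of detection typically estimated by heuristics. The Python sketch below illustrates one such heuristic under stated assumptions. The threshold is a hypothetical multiple (`factor`) of the mean of previously completed waits, and `WaitMonitor`, `start_wait`, `finish_wait`, and `check` are illustrative names, not mechanisms taken from the paper. The sketch only reports which processes to demote; rollback and restart are left to the recovery layer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class WaitMonitor:
    """Hypothetical monitor that turns a suspected unbounded wait into a forced error (demotion)."""

    factor: float = 4.0  # assumed heuristic multiplier; not a value from the paper
    completed_waits: list[float] = field(default_factory=list)
    waiting_since: dict[str, float] = field(default_factory=dict)

    def start_wait(self, pid: str, now: float) -> None:
        """Record that process `pid` began waiting at time `now`."""
        self.waiting_since[pid] = now

    def finish_wait(self, pid: str, now: float) -> None:
        """Record a wait that ended normally; its length feeds the heuristic."""
        self.completed_waits.append(now - self.waiting_since.pop(pid))

    def threshold(self) -> float | None:
        """Heuristic limit: `factor` times the mean completed wait (None until one wait completes)."""
        if not self.completed_waits:
            return None
        return self.factor * mean(self.completed_waits)

    def check(self, now: float) -> list[str]:
        """Return the processes whose wait exceeds the limit and stop tracking them.

        The caller is expected to demote these processes, for example by
        restarting them from a duplicate kept at a lower layer.
        """
        limit = self.threshold()
        if limit is None:
            return []
        suspects = [pid for pid, start in self.waiting_since.items() if now - start > limit]
        for pid in suspects:
            del self.waiting_since[pid]
        return suspects
```

For example, after one completed wait of 2.0 time units the threshold is 8.0. At time 10.0, a process that began waiting at 1.0 is reported for demotion, and one that began at 9.0 is not. A fixed multiplier is the simplest choice, but it needs tuning: too small a value turns ordinary delays into forced errors.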
Columns of Table 2 (defect classes by type of process interaction):
Scheduling defect: caused by resource management systems that interfere with authorized user processes.
Security defect: caused by resource management systems that enable an intruder to interfere with an authorized user process.
Development defect: caused by unauthorized requests that interfere with authorized requests of the same authorized user project. (Examples of intruders are placed under security defects.)

Data output defect
Scheduling. Example: lost update. Detection: frame check sequences, CRCs, logs and audits. Resolution: rollback and restart of authorized processes in RB or ID.
Security. Example: destruction of data by a virus. Detection: integrity check values, logs and audits. Resolution: rollback and restart of authorized processes in RB or ID; removal of intruders.
Development. Example: mistyped file name. Detection: I/O system detecting a nonexistent file name. Resolution: exception handling in RS and RB based on detected demotion.

Data input defect
Scheduling. Example: inconsistent retrieval. Detection: system audit of transactions and keys. Resolution: difficult without access to all output sites.
Security. Example: identity theft. Detection: system audit of transaction keys; audit of user accounts. Resolution: removal of intruders; resolution is difficult.
Development. Example: data format inconsistency. Detection: exception handling in RS or RB; logs and audits. Resolution: recovery is difficult without user involvement.

Promotion dependency restriction defect
Scheduling. Example: packet sent to a crashed router based on a circular exchange of data. Detection: Internet audits; duplicate maintained while waiting for acknowledgment. Resolution: transmission of the duplicate in ID.
Security. Example: corrupted web link leading to an intruder’s site. Detection: Internet audits; intrusion detection systems. Resolution: removal of the intruder; rollback and restart in RB or ID.
Development. Example: configuration error that links a process to an obsolete version. Detection: system logs and audits; possible traps to the operating system. Resolution: automated reconfiguration tools.

Demotion dependency restriction defect
Scheduling. Example: acknowledgment terminating the wrong duplicate packet. Detection: system logs and audits. Resolution: rollback and restart in ID, if a duplicate is available at a lower layer.
Security. Example: logic-bomb event causing the erasure of a hard disk. Detection: system logs and audits. Resolution: removal of intruders; rollback and restart in RB or ID.
Development. Example: lost pointer; heap object unreachable. Detection: difficult without user involvement. Resolution: recovery is difficult without user involvement.

Promotion times restriction defect (restriction too small)
Scheduling. Example: transmitted duplicate packet causing a conflict. Detection: system logs and audits detecting the duplicate packet. Resolution: possible rollback and restart in RB or ID.
Security. Example: password too short to prevent password attacks. Detection: system logs and audits detecting password attempts. Resolution: removal of intruders; rollback and restart in RB or ID.
Development. Example: beta or final version released too soon. Detection: system logs and audits detecting resultant (other) errors. Resolution: recovery is difficult without user involvement.

Demotion times restriction defect
Scheduling. Example: hashing function not appropriate for the application. Detection: budgetary controls; system logs and audits. Resolution: recovery is not possible if data are lost.
Security. Example: deliberate setting of too short a deadline. Detection: budgetary and management controls. Resolution: recovery is not possible if deadlines are exceeded.
Development. Example: Internet packet assigned too small a time-to-live field by the user. Detection: logs and audits; statistical tools (heuristics). Resolution: recovery is difficult without user involvement.

Priority restriction defect
Scheduling. Example: priority inversion (a higher-priority process misses a hard deadline). Detection: the importance of lost services is self-evident. Resolution: none possible.
Security. Example: corrupted page rankings causing the choice of the wrong page or product. Detection: web browsers checking blogs and circular links. Resolution: impractical.
Development. Example: misallocation of manpower and resources. Detection: statistical tools signaling unmet time and service goals. Resolution: recovery is impractical.

Table 2. Classification of Errors in Software Systems

Columns of Table 3 (defect classes by cause): scheduling defects, caused by resource management systems that interfere with authorized user processes; security defects, caused by resource management systems that enable an intruder to interfere with an authorized user process; and development defects, caused by unauthorized requests that interfere with authorized requests of the same authorized user project (examples of intruders are placed under security defects). Rows are grouped by outcome, that is, by the deviation the degradation causes, and within each group by the corrupted attribute that enables it.

Degradations causing data input or output deviations
Corrupted data input or output attributes. Scheduling: transmission interference, such as static (within acceptable range), due to overlapping channels. Security: intruder’s corruption of values (within acceptable range). Development: degraded data initialization (within acceptable range).
Corrupted times restrictions. Scheduling: dropped packets due to system overloads of authorized packets (congestion). Security: dropped multimedia packets due to DDoS and other overload attacks. Development: degraded data value (within acceptable range) due to too few iterations in a computation.
Corrupted dependency restrictions. Scheduling: lost multimedia packets due to circular routing. Security: degraded outputs due to memory leaks in the system. Development: incorrect parameter binding causing delivery of a wrong (but acceptable) value.
Corrupted priority restrictions. Scheduling: wrong packet dropped due to the choice of a congested port under distance vector routing. Security: wrong multimedia packet dropped due to a corrupted voting choice (Byzantine Generals). Development: wrong packet dropped due to incorrect QoS specifications.

Degradations causing deviations (delays) in time
Corrupted data input or output attributes. Scheduling: lost update resolution. Security: malware resolution. Development: typing error placing too large a parameter in a sleep statement.
Corrupted times restrictions. Scheduling: too many authorized processes accepted into the system. Security: distributed denial of service attack. Development: retransmission of a packet assigned a TTL field that was too small.
Corrupted dependency restrictions. Scheduling: resource deadlock; deadlock resolution. Security: TCP backlog queue filled with half-open connections from a SYN attack. Development: wait for a nonexistent process.
Corrupted priority restrictions. Scheduling: priority inversion (delays for high-priority processes). Security: corrupted page rankings (delays for higher-priority pages). Development: priorities in time-lines mishandled in project assignments.

Degradations causing deviations in dependencies (partial service)
All corrupted attributes (data input or output, times, dependency, priority) and all defect classes: see Table 2 for any unresolved error in this class.

Degradations causing deviations in priorities (inferior voting choice)
Corrupted data input or output attributes. Scheduling: distance vector routing’s cyclical computation of an (incorrect) weight to determine port priorities. Security: router intentionally transmitting a wrong delay value under distance vector routing. Development: typing error assigning the wrong priority to a process.
Corrupted times restrictions. Scheduling: page replacement by FIFO or packet discarding based on “wine and milk” is incorrect. Security: page ranks determined by number of hits, where the number is corrupted. Development: priority assigned to a file by the developer, perhaps based incorrectly on when it was last used.
Corrupted dependency restrictions. Scheduling: anticipatory page fetching (dependency by location) picks the wrong page. Security: page ranks determined by links in other pages, where the links are corrupted. Development: process eliminated from the voting set due to an incorrectly assigned dependency.
Corrupted priority restrictions. Scheduling: page replacement giving I/O-bound jobs higher priority picks the wrong page. Security: (corrupted) administrator’s priorities allowed to determine priorities in the system implementation. Development: programmer-assigned erroneous priorities are implemented.

Table 3. Classification of Service Degradations in Software Systems
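The three dimensions of the service-degradation classification can be enumerated mechanically. The sketch below makes one assumption of ours: the seven corrupted-attribute types are taken to be the seven defect rows of Table 2, whereas Table 3 itself groups them into four attribute classes. The three process-interaction types are the scheduling, security, and development columns, and the four outcomes are Table 3's row groups. Their Cartesian product gives the 3 × 7 × 4 = 84 classes counted in the text. The names `INTERACTIONS`, `ATTRIBUTES`, `OUTCOMES`, and `classes_with` are illustrative.

```python
from __future__ import annotations

from itertools import product

# Dimension 1: type of process interaction (the defect-class columns of Tables 2 and 3).
INTERACTIONS = ("scheduling", "security", "development")

# Dimension 2: corrupted request attribute.
# Assumption: the seven defect rows of Table 2 are the seven attribute types.
ATTRIBUTES = (
    "data output",
    "data input",
    "promotion dependency restriction",
    "demotion dependency restriction",
    "promotion times restriction",
    "demotion times restriction",
    "priority restriction",
)

# Dimension 3: resulting degradation (the outcome groups of Table 3).
OUTCOMES = (
    "data input or output deviation",
    "deviation (delay) in time",
    "deviation in dependencies (partial service)",
    "deviation in priorities (inferior voting choice)",
)

# Every service-degradation class is one (interaction, attribute, outcome) triple.
CLASSES = list(product(INTERACTIONS, ATTRIBUTES, OUTCOMES))


def classes_with(outcome: str) -> list[tuple[str, str, str]]:
    """Return the classes that share one outcome (one row group of Table 3)."""
    return [c for c in CLASSES if c[2] == outcome]


print(len(CLASSES))  # 84 = 3 * 7 * 4
```

Each outcome group contains 3 × 7 = 21 classes. The text's "(more than)" qualifier suggests the scheme can be refined further, which this flat enumeration does not attempt.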
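Table 3 lists resource deadlock, together with its resolution, as a delay caused by corrupted dependency restrictions under scheduling defects. Detection-based handling of resource deadlock typically searches a wait-for graph for a cycle. The following minimal Python sketch of that search assumes a hypothetical input format: each process maps to the set of processes it is waiting on. It uses a three-color depth-first search. Because the search is recursive, the sketch suits only small graphs.

```python
from __future__ import annotations


def find_wait_cycle(waits_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in a wait-for graph, or None if the graph is acyclic.

    waits_for[p] holds the processes that p is waiting on (hypothetical input format).
    A cycle means every process in it waits on the next one: a resource deadlock.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: dict[str, int] = {}
    path: list[str] = []

    def visit(p: str) -> list[str] | None:
        color[p] = GREY
        path.append(p)
        for q in waits_for.get(p, ()):
            state = color.get(q, WHITE)
            if state == GREY:
                # q is already on the current path, so path from q onward closes a cycle.
                return path[path.index(q):]
            if state == WHITE:
                cycle = visit(q)
                if cycle is not None:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in waits_for:
        if color.get(p, WHITE) == WHITE:
            cycle = visit(p)
            if cycle is not None:
                return cycle
    return None
```

For instance, with P1 waiting on P2, P2 on P3, and P3 on P1, the function returns the cycle P1, P2, P3. Resolution would then select a process in the returned cycle to roll back and restart; the victim-selection policy is outside this sketch.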
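Table 2 lists frame check sequences and CRCs among the detection mechanisms for data output defects. It also lists a duplicate maintained while waiting for acknowledgment, and its later retransmission, as detection and resolution mechanisms for a dependency defect. The following minimal Python sketch combines the two. It assumes a CRC-32 from the standard `zlib` module, appended as a 4-byte big-endian trailer. `make_frame`, `check_frame`, and the `Sender` class are hypothetical helpers, not part of any protocol described in the paper.

```python
from __future__ import annotations

import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 frame check sequence to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bytes | None:
    """Return the payload if its frame check sequence matches, else None."""
    if len(frame) < 4:
        return None
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == fcs:
        return payload
    return None


class Sender:
    """Hypothetical sender that keeps a duplicate of every unacknowledged frame."""

    def __init__(self) -> None:
        self.unacked: dict[int, bytes] = {}

    def send(self, seq: int, payload: bytes) -> bytes:
        frame = make_frame(payload)
        self.unacked[seq] = frame  # duplicate retained until acknowledged
        return frame

    def on_ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)  # acknowledged: the duplicate can be released

    def on_nak(self, seq: int) -> bytes:
        return self.unacked[seq]  # negative acknowledgment: retransmit the duplicate
```

A frame with a flipped bit fails `check_frame`, so the receiver would return a negative acknowledgment, and the sender retransmits the retained duplicate. CRC-32 detects all single-bit errors but not every possible corruption; Table 2 lists logs and audits alongside it.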