Data Diverse Software Fault Tolerance Techniques

5.1 Retry Blocks

The basic RtB technique is one of the two original data diverse software fault tolerance techniques. The hardware fault tolerance architecture related to the RtB technique is standby sparing, or passive dynamic redundancy. The RtB technique is the data diverse complement of the recovery block (RcB) scheme. It uses acceptance tests (ATs) and backward recovery to accomplish fault tolerance. The technique typically uses one data re-expression algorithm (DRA) and two algorithms, a primary and a backup. A watchdog timer (WDT) is also used; it triggers execution of the backup algorithm if the primary algorithm does not produce an acceptable result within a specified period of time.

The primary algorithm is first executed using the original system input. Its results are examined by an AT that has the same form and purpose as the AT used in the RcB. If the results pass the AT, the RtB is complete. If they are not acceptable, the input is re-expressed and the same primary algorithm runs again on the new, re-expressed input data. This continues until the AT finds an acceptable result, the DRA options are exhausted, or the WDT deadline expires. In either of the latter two cases, a backup algorithm may be invoked to execute on the original input data.

5.1.1 Retry Block Operation

The RtB technique consists of an executive, an AT, a DRA, a WDT, and primary and backup algorithms. The executive orchestrates the operation of the RtB, which has the general syntax:

    ensure Acceptance Test
      by Primary Algorithm(Original Input)
      else by Primary Algorithm(Re-expressed Input)
      else by Primary Algorithm(Re-expressed Input)
      …
      [Deadline Expires]
      else by Backup Algorithm(Original Input)
      else failure exception

Figure 5.1 illustrates the structure and operation of the basic RtB with a WDT.
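The executive's control flow described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the names `retry_block`, `RetryBlockFailure`, `faulty_sort`, `rotate_left` and `sorted_permutation_at` are hypothetical, an exception raised by the primary is assumed to be handled like an AT failure, and the deadline is only checked between attempts, so unlike a real WDT it cannot interrupt a primary that hangs.

```python
import time
from collections import Counter
from typing import Callable, Iterable, List, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


class RetryBlockFailure(Exception):
    """Failure exception: no acceptable result from primary or backup."""


def retry_block(
    x: X,
    primary: Callable[[X], Y],
    backup: Callable[[X], Y],
    acceptance_test: Callable[[X, Y], bool],
    reexpressions: Iterable[Callable[[X], X]],
    deadline_s: float,
) -> Y:
    start = time.monotonic()
    # Try the original input first, then each re-expression of it.
    attempts: List[Callable[[X], X]] = [lambda v: v, *reexpressions]
    for reexpress in attempts:
        # Deadline is checked only between attempts (not a preemptive WDT).
        if time.monotonic() - start > deadline_s:
            break
        try:
            y = primary(reexpress(x))
        except Exception:
            continue  # treat an exception like an AT failure and retry
        if acceptance_test(x, y):
            return y
    # Deadline expired or DRA options exhausted: backup on the original input.
    try:
        y = backup(x)
    except Exception as exc:
        raise RetryBlockFailure("backup algorithm raised an exception") from exc
    if acceptance_test(x, y):
        return y
    raise RetryBlockFailure("backup result failed the acceptance test")


# --- Hypothetical example: a sort routine with an injected fault ---
def faulty_sort(xs: List[int]) -> List[int]:
    """Primary: fails whenever the first element is the maximum."""
    if xs and xs[0] == max(xs):
        return list(xs)  # injected fault: returns the input unchanged
    return sorted(xs)


def rotate_left(xs: List[int]) -> List[int]:
    """Exact DRA: sorting a rotation of xs gives the same output as sorting xs."""
    return xs[1:] + xs[:1]


def sorted_permutation_at(x: List[int], y: List[int]) -> bool:
    """AT: y is in nondecreasing order and is a permutation of x."""
    ordered = all(a <= b for a, b in zip(y, y[1:]))
    return ordered and Counter(x) == Counter(y)


print(retry_block([3, 1, 2], faulty_sort, sorted, sorted_permutation_at,
                  [rotate_left], deadline_s=1.0))  # [1, 2, 3]
```

Here the first attempt falls in the injected failure region and fails the AT; the re-expressed input [1, 2, 3] lies outside it, so the second attempt passes. Because the AT checks the result against the original input, it works unchanged for every retry as long as the DRA is exact.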
We examine several scenarios to describe RtB operation:
• Failure-free operation;
• Exception in primary algorithm execution;
• Primary's results are on time but fail the AT; successful execution with re-expressed input;
• All DRA options are used without success; successful backup execution;
• All DRA options are used without success; backup executes but fails the AT.

Retry Block Example

2.3.1 Overview of Data Re-Expression

Data re-expression is used to obtain alternate (or diverse) input data by generating logically equivalent input data sets. Given initial data within the program failure region, the re-expressed input data should exist outside that failure region. A re-expression algorithm, R, transforms the original input x to produce the new input, y = R(x). The input y may either approximate x or contain x's information in a different form. The program, P, and R together determine the relationship between P(x) and P(y). Figure 2.3 illustrates basic data re-expression. The requirements for the DRA can be derived from characteristics of the outputs.

Figure 2.3 Basic data re-expression method: x is executed by P to give P(x); x is also re-expressed as y = R(x), which is executed by P to give P(y). (Source: [45], © 1988, IEEE. Reprinted with permission.)

Other re-expression structures exist. Re-expression with postexecution adjustment (Figure 2.4) allows the DRA to produce more diverse inputs than those produced using the basic structure. A correction, A, is performed on P(y) to undo the distortion produced by the re-expression algorithm, R. If the distortion induced by R can be removed after execution, this approach allows major changes to the inputs and allows copies of the program to operate in widely separated regions of the input space [45].

In another approach, data re-expression via decomposition and recombination (Figure 2.5), an input x is decomposed into a related set of inputs, and the program is then run on each of these related inputs. The results are then recombined.

Basic data re-expression and re-expression with postexecution adjustment allow for both exact and approximate DRAs (defined in the following section). New data re-expression methods may be developed by variation on the basic method or by entirely new methods and algorithms.

5.3 Two-Pass Adjudicators

The TPA technique, developed by Pullum [7.9], is a set of combination data and design diverse software fault tolerance techniques. TPA is also a combination static and dynamic technique, depending on the recovery technique required. The hardware fault tolerance architecture related to the technique is N-modular redundancy. The processes can run concurrently on different computers or sequentially on a single computer, but they are designed to run concurrently.

The TPA technique uses a decision mechanism (DM) and both forward and backward recovery to accomplish fault tolerance. It uses one or more DRAs and at least two variants of a program. The system operates like N-version programming (NVP) unless and until the DM cannot determine a correct result from the variant results. If this occurs, the inputs are run through the DRA(s) to be re-expressed. The variants then re-execute using the re-expressed data as input (each variant receives a different input, one of which may be the original input value). A DM examines the results of the variant executions of this second pass and selects the best result, if one exists. A number of alternative detection and selection mechanisms are available for use with TPA; these are discussed in Section 5.3.1.

5.3.1 Two-Pass Adjudicator Operation

The basic TPA technique consists of an executive, 1 to n DRAs, n variants of the program or function, and a DM.
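The DRAs that TPA (like the RtB) relies on can take the forms described in Section 2.3.1. As a minimal sketch, assuming a hypothetical program `P` that sums a list of integers, the following contrasts basic re-expression (an exact permutation DRA) with re-expression plus postexecution adjustment (a shift by a constant `c`, undone by a correction step). The function names are illustrative only.

```python
def P(xs: list[int]) -> int:
    """Hypothetical program under protection: sums a list of integers."""
    return sum(xs)


# Basic re-expression: y = R(x) is a permutation of x, so P(y) == P(x) exactly.
def reexpress_reverse(xs: list[int]) -> list[int]:
    return xs[::-1]


# Re-expression with postexecution adjustment, step 1: R shifts every element
# by c, moving the input to a distant region of the input space.
def reexpress_shift(xs: list[int], c: int) -> list[int]:
    return [v + c for v in xs]


# Step 2: the correction A removes the distortion, since P(y) = P(x) + n*c.
def adjust_shift(result: int, n: int, c: int) -> int:
    return result - n * c


x = [4, -7, 12, 5]
print(P(x))                                                      # 14
print(P(reexpress_reverse(x)))                                   # 14
print(adjust_shift(P(reexpress_shift(x, 1000)), len(x), 1000))   # 14
```

Integers are used deliberately: with floating-point data the same shift would generally be undone only approximately because of rounding, which moves the DRA from the exact to the approximate category.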
The executive orchestrates the TPA technique operation, which has the general syntax:

    Pass 1: run Variant 1(original input), Variant 2(original input), …, Variant n(original input)
            if (Decision Mechanism(Result(Pass 1, Variant 1), Result(Pass 1, Variant 2), …, Result(Pass 1, Variant n)))
                return Result
            else
    Pass 2:     run DRA 1, DRA 2, …, DRA n
                run Variant 1(result of DRA 1), Variant 2(result of DRA 2), …, Variant n(result of DRA n)
                if (Decision Mechanism(Result(Pass 2, Variant 1), Result(Pass 2, Variant 2), …, Result(Pass 2, Variant n)))
                    return Result
                else failure exception

TPA operation can be described by three scenarios:
1. Failure-free operation.
2. Partial failure scenario: incorrect results on the first pass.
3. Failure scenario: incorrect results on both passes.

5.3.3 Two-Pass Adjudicator Example

Suppose we have a network of cities, all of which must be visited, starting at city node 1. The network of airline flights and flight durations is shown in Figure 5.8. No revisiting of cities is allowed. The objective is to determine the route (or routes) that satisfy the requirements in the least amount of time. Table 5.5 provides the list of all possible routes meeting the requirements.

Suppose a program is implemented using TPA solution category I to solve this type of problem. Let the inputs (the flight network information) defining the specific problem be those shown in Figure 5.8. The output of each variant is the route letter or the list of cities. If all variants are operating correctly, a majority vote will yield either route A or route B as the correct answer: in this example there are two correct answers and three correctly operating variants for this input domain, so at least two of the three variants must agree. Consider the case in which one of the variants fails for this input domain and the resulting decision vector is (A, B, C). A majority voter would raise an exception in this case, even though two of the answers are correct.
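The two-pass syntax can be sketched in Python as follows. This is a minimal sketch, not the author's implementation and not the route-finding program from the example: it assumes a simple majority voter as the DM (one of several possible DMs), variant results that are hashable, and hypothetical integer-valued variants with injected faults. All names are illustrative. The second call mirrors the partial failure scenario: pass 1 yields three different answers (no majority, like the (A, B, C) vector above), and the re-expressed inputs let pass 2 agree.

```python
from collections import Counter
from typing import Any, Callable, Sequence

NO_MAJORITY = object()  # sentinel: the DM found no majority


class TPAFailure(Exception):
    """Failure exception: neither pass produced a majority result."""


def run_variant(variant: Callable[[Any], Any], arg: Any) -> Any:
    try:
        return variant(arg)
    except Exception:
        return object()  # a fresh object never agrees with any other result


def majority_dm(results: Sequence[Any]) -> Any:
    """Majority voter: the value produced by more than half of the variants."""
    value, count = Counter(results).most_common(1)[0]
    return value if 2 * count > len(results) else NO_MAJORITY


def two_pass_adjudicator(x, variants, dras):
    # Pass 1: all variants run on the original input.
    result = majority_dm([run_variant(v, x) for v in variants])
    if result is not NO_MAJORITY:
        return result
    # Pass 2: variant i runs on the output of DRA i.
    result = majority_dm([run_variant(v, dra(x)) for v, dra in zip(variants, dras)])
    if result is not NO_MAJORITY:
        return result
    raise TPAFailure("no majority on either pass")


# --- Hypothetical variants that sum a list; b and c have injected faults ---
def variant_a(xs):
    return sum(xs)


def variant_b(xs):
    return sum(xs) + 1 if xs[0] < 0 else sum(xs)  # fault: negative first element


def variant_c(xs):
    return sum(xs) - 1 if xs[-1] == max(xs) else sum(xs)  # fault: maximum last


def identity(xs):
    return list(xs)


def reverse(xs):
    return xs[::-1]


def rotate_left(xs):
    return xs[1:] + xs[:1]


variants = [variant_a, variant_b, variant_c]
dras = [identity, reverse, rotate_left]
print(two_pass_adjudicator([1, 2, 3], variants, dras))   # 6: pass 1 votes (6, 6, 5)
print(two_pass_adjudicator([-3, 5, 8], variants, dras))  # 10: pass 1 (10, 11, 9), pass 2 (10, 10, 10)
```

Mapping a crashing variant to a unique object lets it be outvoted instead of aborting the adjudicator, which keeps the DM the single place where agreement is decided.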