Optimizing the Fault Tolerance Capabilities of Distributed Real-Time Systems

Abhilash Thekkilakattil, Radu Dobrin, Sasikumar Punnekkat, and Huseyin Aysan
Mälardalen Real-Time Research Centre, Mälardalen University, Sweden

Why?

• The pervasiveness of real-time systems has increased their complexity
  • Hard and soft tasks lead to different task criticalities
  • Tasks of mixed criticalities are a key characteristic of modern real-time systems
• Transient errors are dealt with through task re-executions
  • Different tasks may require different numbers of re-executions based on, e.g., their criticality
• Zonal analysis
  • A technique for analyzing the reliability of components in different locations
  • May require a re-execution to be carried out on a different node
• Re-execution constraints
  • Multiple re-executions (tasks with multiple alternates)
  • Re-execution on a different node

What?

• Guarantee the re-execution constraints using the minimum number of processors
• This is an NP-hard problem; we simplify it by introducing Feasibility Windows, i.e., temporal intervals in which tasks need to execute:
  • Fault Tolerant Feasibility Windows for critical tasks
  • Fault Aware Feasibility Windows for non-critical tasks

Method

1. Find the minimum number of processors:
   $$N = \max\left(\left\lceil U_{c,a} \right\rceil,\ \max_i(m_i) + 1\right)$$
   where $U_{c,a}$ is the utilization of the critical primaries and their alternates, and $m_i$ is the number of re-executions of task $i$ required on a different node.
2. Derive and allocate Fault Tolerant Feasibility Windows for critical tasks.
3. Derive and allocate Fault Aware Feasibility Windows for non-critical tasks.
4. Derive scheduler-specific attributes that ensure executions stay within the derived windows.

Constraints for derivation and allocation of feasibility windows

• The minimum size of a task's window equals the WCET of the task
• Windows of instances of the same task (e.g., A and its alternates A1 and A2) are disjoint in time
• During allocation, the processor demand in any interval must not exceed the length of that interval, to avoid overloads

Example

[Figure: Example schedules on Node1 and Node2 under maximum fault occurrence, worst case, and better than worst case, showing the Fault Tolerant Feasibility Windows FT_FW(A), FT_FW(A1), FT_FW(A2), FT_FW(B), FT_FW(B1) and the Fault Aware Feasibility Window FA_FW(C).]
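The processor-count rule of step 1 and the three window constraints can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the `Task` and `Window` types, the functions `min_processors` and `check_windows`, and the reading of "demand" as the summed WCETs of the windows lying entirely inside an interval are all hypothetical. The usage lines apply `min_processors` to the two critical tasks of the example (A: T=10, C=2, R=2, m=1; B: T=5, C=2, R=1, m=1) and exercise `check_windows` on made-up windows.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import combinations
import math


@dataclass(frozen=True)
class Task:
    name: str
    period: int      # T_i
    wcet: int        # C_i
    reexecs: int     # R_i: re-executions (alternates) required
    remote: int      # m_i: re-executions required on a different node
    critical: bool


def min_processors(tasks):
    """Step 1: N = max(ceil(U_ca), max_i(m_i) + 1)."""
    # U_ca: utilization of each critical primary plus its R_i alternates,
    # computed with exact fractions to avoid floating-point rounding.
    u_ca = sum(
        (Fraction((1 + t.reexecs) * t.wcet, t.period) for t in tasks if t.critical),
        Fraction(0),
    )
    max_m = max((t.remote for t in tasks), default=0)
    return max(math.ceil(u_ca), max_m + 1)


@dataclass(frozen=True)
class Window:
    task: str
    job: int         # hypothetical: index of the task instance
    node: int
    start: int       # the window is the half-open interval [start, end)
    end: int
    wcet: int


def check_windows(windows):
    """Return the violated window constraints (an empty list means none)."""
    problems = []
    # Constraint 1: a window must be at least as long as the task's WCET.
    for w in windows:
        if w.end - w.start < w.wcet:
            problems.append(f"{w.task}#{w.job}: window shorter than WCET")
    # Constraint 2: windows of the same task instance must not overlap in time.
    for a, b in combinations(windows, 2):
        same = (a.task, a.job) == (b.task, b.job)
        if same and a.start < b.end and b.start < a.end:
            problems.append(f"{a.task}#{a.job}: overlapping windows")
    # Constraint 3 (assumed reading): on each node, the WCETs of windows lying
    # entirely inside [t1, t2) must not sum to more than t2 - t1. Checking t1 at
    # window starts and t2 at window ends covers every distinct case.
    for node in sorted({w.node for w in windows}):
        local = [w for w in windows if w.node == node]
        for t1 in sorted({w.start for w in local}):
            for t2 in sorted({w.end for w in local}):
                if t2 <= t1:
                    continue
                demand = sum(w.wcet for w in local if t1 <= w.start and w.end <= t2)
                if demand > t2 - t1:
                    problems.append(f"node {node}: demand {demand} exceeds [{t1}, {t2})")
    return problems


# Critical tasks A and B from the example.
tasks = [
    Task("A", period=10, wcet=2, reexecs=2, remote=1, critical=True),
    Task("B", period=5, wcet=2, reexecs=1, remote=1, critical=True),
]
print(min_processors(tasks))  # 2

# Made-up windows: primary of A on node 1, its alternate later on node 2.
ok = [Window("A", 0, node=1, start=0, end=3, wcet=2),
      Window("A", 0, node=2, start=3, end=6, wcet=2)]
print(check_windows(ok))  # []

# Made-up overload: two 2-unit windows share the 3-unit interval [0, 3) on node 1.
overloaded = [Window("A", 0, node=1, start=0, end=3, wcet=2),
              Window("B", 0, node=1, start=0, end=3, wcet=2)]
print(check_windows(overloaded))  # ['node 1: demand 4 exceeds [0, 3)']
```

Using `Fraction` keeps $\lceil U_{c,a} \rceil$ exact: with floats, a utilization that should be a whole number could round slightly above it and inflate $N$ by one.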
Task | Period (Ti) | WCET (Ci) | Re-executions required (Ri) | Re-executions required on a different node (mi) | Criticality
A | 10 | 2 | 2 | 1 | C
B | 5 | 2 | 1 | 1 | C

Criticality: C = critical, N = non-critical. Tasks C and D are non-critical (N).

• Utilization of critical primaries and alternates: $U_{c,a} = \frac{(1+2)\cdot 2}{10} + \frac{(1+1)\cdot 2}{5} = 0.6 + 0.8 = 1.4$, so $\lceil U_{c,a} \rceil = 2$
• Maximum number of re-executions required on a different node, plus one: $\max_i(m_i) + 1 = 1 + 1 = 2$
• Number of processors required: $N = \max(2, 2) = 2$

Conclusion and Future Work

• We have proposed a methodology for the allocation and scheduling of tasks with mixed criticalities which:
  • Allows the maximum number of re-executions for the critical tasks
  • Maximizes the service to non-critical tasks
  • Is scheduler independent
  • Uses the minimum number of processors
• Ongoing efforts
  • Incorporate energy-awareness mechanisms that use the generated slack to conserve energy through voltage scaling
  • Include space redundancy techniques such as TMR