Presentation

Optimizing the Fault Tolerance Capabilities of
Distributed Real-Time Systems
Abhilash Thekkilakattil, Radu Dobrin, Sasikumar Punnekkat, and
Huseyin Aysan
Mälardalen Real-Time Research Centre, Mälardalen University,
Sweden
Why?
• Pervasiveness of real-time systems – increased complexity
   • Hard and soft tasks leading to different task criticalities
   • Tasks of mixed criticalities - a key characteristic of modern real-time systems
• Dealing with transient errors – task re-executions
   • Different tasks may require different numbers of re-executions based on, e.g., task criticalities
• Zonal analysis
   • A technique for analyzing the reliability of components in different locations
   • May require the re-execution to be carried out on a different node
• Re-execution constraints
   • Multiple re-executions (tasks with multiple alternates)
   • Re-execution on a different node
What?
• Guarantee the re-execution constraints using the minimum number of processors
• An NP-hard problem: we simplify it by introducing Feasibility Windows
   • Temporal intervals in which tasks need to execute
   • Fault Tolerant Feasibility Windows (FT_FW) for critical tasks
   • Fault Aware Feasibility Windows (FA_FW) for non-critical tasks
Method
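1. Find the minimum number of processors: $\max\left(\lceil U(c+a) \rceil,\ \max_i(m_i) + 1\right)$, where $U(c+a)$ is the utilization of the critical primaries plus alternates and $m_i$ is the number of re-executions of task $i$ required on a different node
2. Derive and allocate Fault Tolerant Feasibility Windows for critical tasks
3. Derive and allocate Fault Aware Feasibility Windows for non-critical tasks
4. Derive scheduler-specific attributes to ensure executions stay within the derived windows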
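The first step of the method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class `CriticalTask` and the function `min_processors` are hypothetical names, and the utilization of "primaries plus alternates" is assumed to be the sum of (1 + Ri) * Ci / Ti over the critical tasks, i.e. every alternate is assumed to have the same WCET and period as its primary.

```python
import math
from dataclasses import dataclass


@dataclass
class CriticalTask:
    """Hypothetical record for one critical task."""
    name: str
    period: float             # Ti
    wcet: float               # Ci
    reexecutions: int         # Ri, number of re-executions required
    remote_reexecutions: int  # mi, re-executions required on a different node


def min_processors(tasks):
    """Return max(ceil(U(c+a)), max(mi) + 1) for a list of critical tasks.

    Assumption: each alternate costs the same WCET as its primary, so a task
    contributes (1 + Ri) * Ci / Ti to the utilization.
    """
    utilization = sum((1 + t.reexecutions) * t.wcet / t.period for t in tasks)
    max_remote = max((t.remote_reexecutions for t in tasks), default=0)
    # Round first so floating-point noise (e.g. 1.4000000000000001) cannot
    # push an exact integer utilization up to the next integer.
    return max(math.ceil(round(utilization, 9)), max_remote + 1)


if __name__ == "__main__":
    # Illustrative values: two critical tasks with periods 10 and 5.
    tasks = [
        CriticalTask("A", period=10, wcet=2, reexecutions=2, remote_reexecutions=1),
        CriticalTask("B", period=5, wcet=2, reexecutions=1, remote_reexecutions=1),
    ]
    print(min_processors(tasks))  # prints 2
```

The second term, max(mi) + 1, reflects that a task needing mi re-executions on a different node requires at least mi further nodes besides the one running the primary.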
Constraints for derivation and allocation of feasibility windows
• Minimum size of a task's window = WCET of the task
• Windows of instances of the same task are disjoint in time
• During allocation, the processor utilization demand in any interval must not exceed the length of the interval, to avoid overloads
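The three constraints above can be expressed as simple checks over a candidate set of windows. The sketch below is an assumption-laden illustration: the `Window` record and the function names are hypothetical, and "demand in an interval" is interpreted as the total WCET of the windows on one node that lie entirely inside that interval, which is one common reading of a processor-demand test and may differ from the authors' exact definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical feasibility window for one task instance."""
    task: str     # task the instance belongs to (primary and alternates share it)
    node: int     # processor the window is allocated to
    start: float
    end: float
    wcet: float   # WCET of the task


def long_enough(w):
    """Constraint 1: a window is at least as long as the task's WCET."""
    return w.end - w.start >= w.wcet


def disjoint_per_task(windows):
    """Constraint 2: windows of the same task never overlap in time."""
    by_task = {}
    for w in windows:
        by_task.setdefault(w.task, []).append(w)
    for group in by_task.values():
        ordered = sorted(group, key=lambda w: w.start)
        for earlier, later in zip(ordered, ordered[1:]):
            if later.start < earlier.end:
                return False
    return True


def no_overload(windows, node):
    """Constraint 3 (assumed reading): on the given node, the WCET that must
    run entirely inside any interval never exceeds the interval length."""
    on_node = [w for w in windows if w.node == node]
    # The demand only changes at window boundaries, so it is enough to test
    # intervals that begin at a window start and finish at a window end.
    for s in {w.start for w in on_node}:
        for e in {w.end for w in on_node}:
            if e <= s:
                continue
            demand = sum(w.wcet for w in on_node if w.start >= s and w.end <= e)
            if demand > e - s:
                return False
    return True
```

For instance, two windows on the same node that both span [0, 3] and each need 2 time units fail `no_overload`, because 4 units of work cannot fit into an interval of length 3.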
Example
Task   Time Period (Ti)   Worst Case Execution Time (Ci)   Re-executions required (Ri)   No. of re-executions required on a different node (mi)   Criticality
A      10                 2                                2                             1                                                        C
B      5                  2                                1                             1                                                        C
C      5                  -                                -                             -                                                        N
D      10                 -                                -                             -                                                        N
(C = critical, N = non-critical)

Utilization of critical primaries + alternates = 1.4, $\lceil 1.4 \rceil = 2$
max(number of re-executions required on a different node, $m_i$) + 1 = 2
Number of processors required = 2

[Figure: schedules on Node1 and Node2 for two scenarios, "Worst Case" (maximum fault occurrence) and "Better than Worst Case" (less than maximum fault occurrence), showing the windows FT_FW(A), FT_FW(A1), FT_FW(A2), FT_FW(B), FT_FW(B1) and FA_FW(C) with tasks A, A1, A2, B, B1, C and D placed inside them]
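The figures quoted in this example can be checked by hand. The derivation below assumes that each alternate has the same WCET and period as its primary, so that a critical task contributes (1 + Ri) * Ci / Ti to the utilization; this formula is not stated on the slide, but with Ti = 10, Ci = 2, Ri = 2 for task A and Ti = 5, Ci = 2, Ri = 1 for task B it reproduces the slide's value of 1.4.

```latex
\[
U(c+a) \;=\; \sum_{i \in \{A,B\}} \frac{(1 + R_i)\,C_i}{T_i}
       \;=\; \frac{(1+2)\cdot 2}{10} + \frac{(1+1)\cdot 2}{5}
       \;=\; 0.6 + 0.8 \;=\; 1.4
\]
\[
\max\!\left(\lceil U(c+a) \rceil,\ \max(m_A, m_B) + 1\right)
       \;=\; \max\!\left(\lceil 1.4 \rceil,\ 1 + 1\right)
       \;=\; \max(2,\ 2) \;=\; 2
\]
```

Both terms evaluate to 2 here, so neither the utilization bound nor the different-node requirement alone dictates the processor count in this example.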
Conclusion and Future Work
• We have proposed a methodology for the allocation and scheduling of tasks with mixed criticalities which:
   • Allows the maximum number of re-executions for the critical tasks
   • Maximizes the service to non-critical tasks
   • Is scheduler independent
   • Uses the minimum number of processors
• Ongoing efforts
   • Incorporate energy-awareness mechanisms that use the generated slack to conserve energy through voltage scaling
   • Include space redundancy techniques such as triple modular redundancy (TMR)