Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
Rainer Müller, Daniel Danner, Wolfgang Schröder-Preikschat, Daniel Lohmann
July 10, 2014

Sloth: Let the Hardware do the Work!
  Sloth kernels use hardware for OS purposes, and
  - are concise (200–500 LoC)
  - are small (300–900 bytes)
  - are fast (latency speed-up 2x to 170x)
  - implement industry standards (OSEK, AUTOSAR OS)

Automotive Domain: The AUTOSAR OS Family
  - Families of completely statically configured RTOS
    - OSEK OS / OSEKtime
    - AUTOSAR OS
  - System model with different control-flow types:
    - Basic Tasks: software-triggered, run to completion
    - Extended Tasks: software-triggered, may block
    - ISRs: hardware-triggered, run to completion
    - ...
  - Preemptive, fixed-priority scheduling

Sloth: Building Blocks
  Main Idea
  - Threads are interrupt handlers, synchronous thread activation is IRQ
  - ⇒ Interrupt subsystem does scheduling and dispatching work
  Advantages:
  - Unified control-flow abstraction for HW/SW events
  - Single, shared priority space
  - No priority inversion
  - Predictable event latency
  [Figure: four IRQ sources with prio=1 (Task1), prio=2 (ISR2), prio=3 (Task3), and prio=4 (Task4). Requests are raised by ActivateTask(Task1), ActivateTask(Task3), ActivateTask(Task4), or by a HW IRQ from the hardware periphery. The IRQ Arbitration Unit interrupts the CPU (curprio = X), which dispatches through the IRQ Vector Table: task1(), isr2(), task3(), task4().]
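The main idea on the Building Blocks slide can be illustrated with a small software model. The following is only a sketch under stated assumptions, not Sloth kernel code: the variables `pending` and `curprio`, the helper `arbitrate()`, and the lower-case `activate_task()` are hypothetical stand-ins for what the interrupt hardware does on its own. The priorities 1 to 4 follow the figure; what each handler does in its body is invented for the demonstration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SOURCES 5                 /* priorities 1..4 are used; 0 means idle */

typedef void (*handler_t)(void);

/* Simulated hardware state (hypothetical model, not real registers). */
static uint32_t pending;              /* bit p set: source with priority p requests service */
static unsigned curprio;              /* current CPU priority */
static handler_t vector_table[NUM_SOURCES];

static char trace[16];                /* records the order in which handlers run */
static unsigned trace_len;

static void record(char c) {
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

/* Models the arbitration unit: dispatch every pending source whose
 * priority exceeds curprio, highest priority first. */
static void arbitrate(void) {
    for (;;) {
        unsigned best = 0;
        for (unsigned p = NUM_SOURCES - 1; p > curprio; --p) {
            if (pending & (1u << p)) { best = p; break; }
        }
        if (best == 0) return;        /* nothing may preempt right now */
        pending &= ~(1u << best);     /* request is acknowledged */
        unsigned saved = curprio;
        curprio = best;               /* handler runs at its own priority */
        vector_table[best]();
        curprio = saved;              /* return from interrupt */
    }
}

/* Synchronous activation: set the request bit, let the "hardware" decide. */
static void activate_task(unsigned prio) {
    assert(prio > 0 && prio < NUM_SOURCES);
    pending |= 1u << prio;
    arbitrate();
}

static void task4(void) { record('4'); }
static void isr2(void)  { record('2'); }

static void task3(void) {
    record('3');
    activate_task(2);                 /* lower priority: stays pending */
    activate_task(4);                 /* higher priority: preempts immediately */
    record('c');
}

static void task1(void) {
    record('1');
    activate_task(3);                 /* preempts task1 */
    record('a');
}

/* Runs the scenario from an idle CPU and returns the execution trace. */
static const char *run_demo(void) {
    memset(trace, 0, sizeof trace);
    trace_len = 0;
    pending = 0;
    curprio = 0;
    vector_table[1] = task1;
    vector_table[2] = isr2;
    vector_table[3] = task3;
    vector_table[4] = task4;
    activate_task(1);
    return trace;
}
```

In this model, `run_demo()` yields the trace `134c2a`: Task3 and Task4 preempt their activators at once, while ISR2 stays pending until the priority-3 handler has returned. No software scheduler data structure is involved; the ordering falls out of the priority comparison alone.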
Sloth OS family
  - Sloth: event-triggered (Hofer et al., RTSS ’09)
  - Sleepy Sloth: + events/blocking (Hofer et al., RTSS ’11)
  - Sloth on Time: + time-triggered features (Hofer et al., RTSS ’12)
  - Safer Sloth: + memory/privilege isolation (Danner et al., RTAS ’14)
  - Multi Sloth: + multi-core support (Müller et al., ECRTS ’14)

Multi Sloth: Design
  Multi Sloth: + multi-core support (Müller et al., ECRTS ’14)
  - AUTOSAR system model: partitioned fixed-priority scheduling
  - Reusing building blocks of Sloth
  - Generative approach: tailoring the kernel to the application and the hardware platform
  - Reference implementation for the Infineon AURIX
    - 32-bit RISC µ-Controller
    - upcoming multi-core platform in the automotive industry
    - 3 integrated TriCore CPUs
    - IRQ system with 256 priority levels and up to 1024 remappable sources

Multi Sloth: Herding Sloths
  Reusing building blocks of Sloth:
  - Task Activation
  - Task Dispatching
  - Task Blocking
  - Resource Management
  [Figure: ActivateTask(), SetEvent(), etc. and the HW periphery raise requests on IRQ sources. Each source has request and enable bits, a priority, and a target core: Src 0: Task1 (prio=23, core=0); Src 1: ISR1 (prio=24, core=0); Src 2: Task2 (prio=5, core=1); Src 3: Task4 (prio=8, core=2). One ICU (IRQ Arbitration) per core feeds Core 0 (curprio = X, IRQ Vector Table: task1(), isr1(), ...), Core 1 (curprio = Y, IRQ Vector Table: task2(), ...), and Core 2 (curprio = Z, IRQ Vector Table: task4(), ...).]
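The partitioned mapping in the Herding Sloths figure can be sketched as a static table plus one arbitration function per core. This is a model under assumptions, not the generated Multi Sloth kernel: the struct `irq_source_t`, the function `icu_arbitrate()`, and the lower-case `activate_task()` are hypothetical names, and on the real platform the arbitration is performed by the interrupt hardware. The table entries are taken from the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One remappable IRQ source, statically bound to a task or ISR. */
typedef struct {
    const char *name;
    unsigned    prio;     /* priority of the task or ISR */
    unsigned    core;     /* core whose ICU arbitrates this source */
    bool        request;  /* activation pending */
    bool        enable;   /* source may take part in arbitration */
} irq_source_t;

/* Static configuration as shown in the figure. */
static irq_source_t sources[] = {
    { "Task1", 23, 0, false, true },
    { "ISR1",  24, 0, false, true },
    { "Task2",  5, 1, false, true },
    { "Task4",  8, 2, false, true },
};
#define NUM_SOURCES (sizeof sources / sizeof sources[0])

/* Activation only sets the request bit; it does not matter which core
 * the caller runs on, because the source itself names the target core. */
static void activate_task(size_t src) {
    assert(src < NUM_SOURCES);
    sources[src].request = true;
}

/* Models the ICU of one core: returns the index of the highest-priority
 * requesting source mapped to that core that beats curprio, or -1. */
static int icu_arbitrate(unsigned core, unsigned curprio) {
    int winner = -1;
    unsigned best = curprio;
    for (size_t i = 0; i < NUM_SOURCES; ++i) {
        const irq_source_t *s = &sources[i];
        if (s->core == core && s->request && s->enable && s->prio > best) {
            best = s->prio;
            winner = (int)i;
        }
    }
    return winner;
}
```

Under this model a cross-core activation is the same operation as a local one: after `activate_task(2)`, only the arbitration for core 1 reports source 2, and the other cores are unaffected.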
Synchronization in AUTOSAR OS
  Synchronization primitives as specified by AUTOSAR OS
  - Local access on each core ⇒ Resources with OSEK priority-ceiling protocol
    - GetResource()
    - ReleaseResource()
  - Global access across multiple cores ⇒ Spinlocks
    - GetSpinlock()
    - ReleaseSpinlock()
    - TryToGetSpinlock()

MPCP: Resource access synchronization
  - AUTOSAR lacks a definition of semantics for spinlocks
  - Requirement: priority-aware synchronization ⇒ Synchronization protocols
  - Multi-processor Priority Ceiling Protocol (MPCP)
    - published by Rajkumar et al. (RTSS ’88, ICDCS ’90)
    - proposed for AUTOSAR OS by Lakshmanan et al. (SAE 2011)
  - Reusing building blocks to implement MPCP in Multi Sloth

MPCP for AUTOSAR OS System Model
  - All tasks use their assigned priority except in critical sections; all cores share the same priority space
  - Local resource access is synchronized using OSEK single-core PCP
  - Each global resource has a ceiling priority above all task priorities: πres = πmax + πtask
  - Acquiring a global resource raises the current execution priority to the ceiling priority
  - If the resource is already held by another task, the requesting task blocks
  - When a task leaves a global critical section, the highest-priority task waiting for this global resource is signaled and resumes at the higher priority

MPCP: Implementation for Infineon AURIX
  - Priority-ordered waiting queue
    - queue as bitmask, one bit for each task accessing this resource
    - most significant bit for highest-priority task
    - marks task holding the lock
  - Atomic enqueue with swapmsk
    - new instruction on Infineon AURIX
    - swaps single bits in a word according to a bit mask
    - returns previous data word
  - Get top of queue with clz
    - count of leading zeros directly determines highest-priority waiting task
  Wait-free implementation of MPCP in Multi Sloth!
  - No interference for concurrent actions on the same resource
  - Guaranteed progress on a per-thread basis
  - Acquiring and releasing MPCP resources is wait-free
  [Flowchart GetMPCPResource(), boxes: Disable Local IRQs; Disable IRQ Source; Raise Source Prio; Enqueue; Other tasks in queue? (yes: Block Task until unblocked; no: continue); Enable IRQ Source; Raise CPU Prio; Enable Local IRQs; Resource Acquired.]
  [Flowchart ReleaseMPCPResource(), boxes: Disable Local IRQs; Reset Source Prio; Dequeue; Other tasks in queue? (yes: Get Top of Queue, Unblock Task; no: continue); Lower CPU Prio; Enable Local IRQs; Resource Released.]
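The bitmask waiting queue described for GetMPCPResource() and ReleaseMPCPResource() can be sketched in portable C11. This is an illustration under assumptions, not the AURIX implementation: `atomic_fetch_or` and `atomic_fetch_and` stand in for the swapmsk instruction (both return the previous data word), a software loop stands in for clz, and the names `mpcp_resource_t`, `mpcp_enqueue()`, `mpcp_dequeue()`, and `clz32()` are hypothetical. IRQ masking, priority changes, and the actual blocking and unblocking of tasks are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One global resource: bit i is set while task i holds or waits for it.
 * A higher bit index means a higher task priority. */
typedef struct {
    _Atomic uint32_t queue;
} mpcp_resource_t;

/* Portable count of leading zeros; stands in for a clz instruction. */
static unsigned clz32(uint32_t x) {
    assert(x != 0);
    unsigned n = 0;
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* Enqueue the calling task with a single atomic read-modify-write.
 * Returns 1 if the queue was empty before (resource acquired),
 * 0 if another task already holds the resource (caller must block). */
static int mpcp_enqueue(mpcp_resource_t *r, unsigned task_bit) {
    assert(task_bit < 32);
    uint32_t prev = atomic_fetch_or(&r->queue, UINT32_C(1) << task_bit);
    return prev == 0;
}

/* Dequeue the releasing task. Returns the bit index of the
 * highest-priority waiting task that should be unblocked, or -1. */
static int mpcp_dequeue(mpcp_resource_t *r, unsigned task_bit) {
    assert(task_bit < 32);
    uint32_t mask = UINT32_C(1) << task_bit;
    uint32_t prev = atomic_fetch_and(&r->queue, ~mask);
    uint32_t rest = prev & ~mask;      /* tasks still queued after removal */
    if (rest == 0) {
        return -1;                     /* nobody is waiting */
    }
    return 31 - (int)clz32(rest);      /* most significant set bit wins */
}
```

Each operation touches the queue word with exactly one atomic instruction and contains no retry loop, which is the property the slide calls wait-free. The bit of the unblocked task stays set, so it doubles as the marker for the new lock holder.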
Evaluation
  Microbenchmarks of system service overheads
  Basic system on the Infineon AURIX

  Transition                              Cycles
  ActivateTask w/o dispatch                   65
  ActivateTask w/ dispatch                    87
  ChainTask w/ dispatch                       97
  GetResource                                 36
  ReleaseResource w/o dispatch                19
  ReleaseResource w/ dispatch                 41
  TerminateTask w/ dispatch                   20
  ActivateTask() cross-core round-trip       135

Evaluation
  Microbenchmarks of system service overheads
  MPCP resources on the Infineon AURIX

  Transition                                             Cycles
  GetMPCPResource() w/ blocking                             217
  GetMPCPResource() w/o blocking                            112
  ReleaseMPCPResource() w/ local dispatch                   360
  ReleaseMPCPResource() w/o dispatch                        134
  ReleaseMPCPResource() w/ remote unblock                   183
  ReleaseMPCPResource() w/ local unblock and dispatch       311
  ReleaseMPCPResource() w/o unblock w/ dispatch             231

Conclusion
  Multi Sloth ...
  - is an efficient multi-core AUTOSAR OS
  - implements MPCP: multi-core systems beyond AUTOSAR
  - offers low latency with predictable overheads
  - adopts design philosophy of the Sloth kernel family