Multi Sloth: An Efficient Multi-Core RTOS using Hardware

Multi Sloth:
An Efficient Multi-Core RTOS using
Hardware-Based Scheduling
Rainer Müller, Daniel Danner,
Wolfgang Schröder-Preikschat, Daniel Lohmann
July 10, 2014
Sloth: Let the Hardware do the Work!
Sloth kernels use hardware for OS purposes, and
are concise (200–500 LoC)
are small (300–900 bytes)
are fast (latency speed-up 2x to 170x)
implement industry standards (OSEK, AUTOSAR OS)
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
2
Automotive Domain: The AUTOSAR OS Family
Families of completely statically configured RTOS
OSEK OS / OSEKtime
AUTOSAR OS
System model with different control-flow types:
Basic Tasks
software-triggered, run to completion
Extended Tasks software-triggered, may block
ISRs
hardware-triggered, run to completion
...
Preemptive, fixed-priority scheduling
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
3
Sloth: Building Blocks
Main Idea
Threads are interrupt handlers, synchronous thread activation is IRQ
⇒ Interrupt subsystem does scheduling and dispatching work
ActivateTask(Task1)
Hardware
HW IRQ
Advantages:
Periphery
prio=1 IRQ Source
request
Task1
prio=2 IRQ Source
request
ISR2
Unified control-flow abstraction for
prio=3 space
Single, shared priority
IRQ Source
ActivateTask(Task3)
request
Task3
No priority inversion
Predictable event latency
ActivateTask(Task4)
Rainer Müller
prio=4 IRQ Source
request
Task4
CPU
curprio = X
IRQ
ArbiHW/SW
tration
Unit
events
IRQ Vector
Table
task1()
isr2()
task3()
task4()
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
4
Sloth: Building Blocks
Main Idea
Threads are interrupt handlers, synchronous thread activation is IRQ
⇒ Interrupt subsystem does scheduling and dispatching work
ActivateTask(Task1)
Hardware
HW IRQ
Advantages:
Periphery
prio=1 IRQ Source
request
Task1
prio=2 IRQ Source
request
ISR2
Unified control-flow abstraction for
prio=3 space
Single, shared priority
IRQ Source
ActivateTask(Task3)
request
Task3
No priority inversion
Predictable event latency
ActivateTask(Task4)
Rainer Müller
prio=4 IRQ Source
request
Task4
CPU
curprio = X
IRQ
ArbiHW/SW
tration
Unit
events
IRQ Vector
Table
task1()
isr2()
task3()
task4()
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
4
Sloth OS family
Sleepy Sloth
Sloth on Time
+ events/blocking
+ time-triggered features
Hofer et al. RTSS’ 11
Hofer et al. RTSS ’12
Sloth
event-triggered
Hofer et al. RTSS ’09
Safer Sloth
Multi Sloth
+ memory/privilege isolation
+ multi-core support
Danner et al. RTAS ’14
Müller et al. ECRTS’ 14
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
5
Multi Sloth: Design
Multi Sloth
+ multi-core support
Müller et al. ECRTS’ 14
AUTOSAR system model: partitioned fixed-priority scheduling
Reusing building blocks of Sloth
Generative approach:
tailoring the kernel to the application and the hardware platform
Reference implementation for the Infineon AURIX
32-bit RISC µ-Controller
upcoming multi-core platform in the automotive industry
3 integrated TriCore CPUs
IRQ system with 256 priority levels and up to 1024 remappable sources
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
6
Multi Sloth: Herding Sloths
ActivateTask(), SetEvent(), etc.
HW Periphery
Src 0: Task1
request prio=23
enable core=0
IRQ
ICU
(IRQ Arbitration)
Src 1: ISR1
Src 2: Task2
request prio=24
enable core=0
request
enable
prio=5
core=1
ICU
(IRQ Arbitration)
Src 3: Task4
request
enable
prio=8
core=2
ICU
(IRQ Arbitration)
Reusing building blocks of Sloth:
curprio = X
Core 0
IRQ Vector Table
task1()
isr1()
...
Rainer Müller
curprio = Y
Task Activation
Core 1
Task Dispatching
Task Blocking
IRQ
Vector Table
Resource
Management
task2()
...
curprio = Z
Core 2
IRQ Vector Table
task4()
...
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
7
Multi Sloth: Herding Sloths
ActivateTask(), SetEvent(), etc.
HW Periphery
Src 0: Task1
request prio=23
enable core=0
IRQ
ICU
(IRQ Arbitration)
Src 1: ISR1
Src 2: Task2
request prio=24
enable core=0
request
enable
prio=5
core=1
ICU
(IRQ Arbitration)
Src 3: Task4
request
enable
prio=8
core=2
ICU
(IRQ Arbitration)
Reusing building blocks of Sloth:
curprio = X
Core 0
IRQ Vector Table
task1()
isr1()
...
Rainer Müller
curprio = Y
Task Activation
Core 1
Task Dispatching
Task Blocking
IRQ
Vector Table
Resource
Management
task2()
...
curprio = Z
Core 2
IRQ Vector Table
task4()
...
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
7
Synchronization in AUTOSAR OS
Synchronization primitives as specified by AUTOSAR OS
Local access on each core
⇒ Resources with OSEK priority-ceiling protocol
GetResource()
ReleaseResource()
Global access across multiple cores
⇒ Spinlocks
GetSpinlock()
ReleaseSpinlock()
TryToGetSpinlock()
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
8
MPCP: Resource access synchronization
AUTOSAR lacks definiton of semantics for spinlocks
Requirement: priority-aware synchronization
⇒ Synchronization protocols
Multi-processor Priority Ceiling Protocol (MPCP)
published by Rajkumar et al. (RTSS ’88, ICDCS ’90)
proposed for AUTOSAR OS by Lakshmanan et al. (SAE 2011)
Resuing building blocks to implement MPCP in Multi Sloth
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
9
MPCP for AUTOSAR OS System Model
All tasks use their assigned priority except in critical sections,
all cores share the same priority space
Local resource access is synchronized using OSEK single-core PCP
Each global resource has a ceiling priority above all task priorities
πres = πmax + πtask
Acquiring a global resource raises the current execution priority to
the ceiling priority
If resource is already held by another task, the requesting task blocks
When a task leaves a global critical section, the highest-priority task
waiting for this global resource is signaled and resumes at the higher
priority
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
10
MPCP: Implementation for Infineon AURIX
GetMPCPResource():
Priority-ordered waiting queue
Disable Local IRQs
queue as bitmask, one bit for each
task accessing this resource
most significant bit for highest-priority
task marks task holding the lock
Disable IRQ Source
Raise Source Prio
Atomic enqueue with swapmsk
Enqueueimplementationnew
Wait-free
of
instruction
Infineon
AURIX
MPCP inon
Multi
Sloth!
swaps single bits in a word according
NoOther
interference
for concurrent actions on the same resource
tasks
to a bit mask
in queue? progress on a per-thread basis
Guaranteed
yes
returns previous data word
Acquiring
and releasing MPCP resources is wait-free
no
Raise CPU Prio
Block Task
Enable Local IRQs
Enable IRQ Source
Get top of queue with clz
count of leading zeros directly
determines highest-priority
waiting task
Resource Acquired
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
11
MPCP: Implementation for Infineon AURIX
GetMPCPResource():
ReleaseMPCPResource():
Priority-ordered waiting queue
Disable Local IRQs
Localone
IRQsbit for each
queue asDisable
bitmask,
task accessing this resource
Reset Source Prio
most significant
bit for highest-priority
task marks task holding the lock
Disable IRQ Source
Dequeue
Raise Source Prio
Atomic enqueue with swapmsk
Other on
Enqueueimplementationnew
Infineon
AURIX
Wait-free
of instruction
MPCP
in tasks
Multi
Sloth!
in queue?
swaps single bits in a word according
NoOther
interference
for concurrent actions on the same resource
tasks
no
to a bit mask yes
in queue? progress on a per-thread basis
Guaranteed
yes
returns previous data word
Get Top
Queue
Acquiring
and releasing MPCP resources
is ofwait-free
no
Enable IRQ Source
d
un
blo
Enable Local IRQs
Block Task
cke
Raise CPU Prio
Resource Acquired
Rainer Müller
Unblock
Get top
of Task
queue with clz
count
leading
LowerofCPU
Prio zeros directly
determines highest-priority
waiting
task IRQs
Enable Local
Resource Released
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
11
MPCP: Implementation for Infineon AURIX
GetMPCPResource():
ReleaseMPCPResource():
Priority-ordered waiting queue
Disable Local IRQs
Localone
IRQsbit for each
queue asDisable
bitmask,
task accessing this resource
Reset Source Prio
most significant
bit for highest-priority
task marks task holding the lock
Disable IRQ Source
Dequeue
Raise Source Prio
Atomic enqueue with swapmsk
Other on
Enqueueimplementationnew
Infineon
AURIX
Wait-free
of instruction
MPCP
in tasks
Multi
Sloth!
in queue?
swaps single bits in a word according
NoOther
interference
for concurrent actions on the same resource
tasks
no
to a bit mask yes
in queue? progress on a per-thread basis
Guaranteed
yes
returns previous data word
Get Top
Queue
Acquiring
and releasing MPCP resources
is ofwait-free
no
Enable IRQ Source
d
un
blo
Enable Local IRQs
Block Task
cke
Raise CPU Prio
Resource Acquired
Rainer Müller
Unblock
Get top
of Task
queue with clz
count
leading
LowerofCPU
Prio zeros directly
determines highest-priority
waiting
task IRQs
Enable Local
Resource Released
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
11
Evaluation
Microbenchmarks of system service overheads
Basic system on the Infineon AURIX
Transition
ActivateTask w/o dispatch
ActivateTask w/ dispatch
ChainTask w/ dispatch
GetResource
ReleaseResource w/o dispatch
ReleaseResource w/ dispatch
TerminateTask w/ dispatch
ActivateTask() cross-core round-trip
Rainer Müller
Cycles
65
87
97
36
19
41
20
135
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
12
Evaluation
Microbenchmarks of system service overheads
MPCP resources on the Infineon AURIX
Transition
GetMPCPResource() w/ blocking
GetMPCPResource() w/o blocking
ReleaseMPCPResource() w/ local dispatch
ReleaseMPCPResource() w/o dispatch
ReleaseMPCPResource() w/ remote unblock
ReleaseMPCPResource() w/ local unblock and dispatch
ReleaseMPCPResource() w/o unblock w/ dispatch
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
Cycles
217
112
360
134
183
311
231
13
Conclusion
Multi Sloth . . .
is an efficient multi-core AUTOSAR OS
implements MPCP: multi-core systems beyond AUTOSAR
offers low latency with predictable overheads
adopts design philosophy of the Sloth kernel family
Rainer Müller
Multi Sloth: An Efficient Multi-Core RTOS using Hardware-Based Scheduling
14