White Paper – Renesas R-IN32M3 Industrial Network ASSP
HW-RTOS: Improved RTOS Performance by Implementation in Silicon

Author: Carl Stenquist, Renesas Electronics America Inc.
May 2014

Abstract
A Real-Time Operating System (RTOS) is an integral part of an embedded system as applications become more complex. According to one embedded market study [1], an RTOS or scheduler is required in more than 68% of applications. The problem with many software RTOSes is that their performance depends inherently on the processor’s performance and load. Finding ways to optimize RTOS performance within each CPU architecture is therefore key.

This paper describes the performance improvement from an RTOS accelerator implemented in silicon. The Renesas Industrial Network ASSP (R-IN32M3) embeds a “Real-Time OS Accelerator” (HW-RTOS) block that executes common RTOS system calls in hardware, including task scheduling and prioritization as well as semaphore and mailbox operations. Benchmarks show that context switching can execute up to 2-3x faster than typical SW-RTOS operation at the same CPU clock speed, with significantly less jitter.

I. Introduction
An RTOS enables a system to be conveniently divided into subtasks (processes) with clear interfaces between them. These subsystems, that is, the RTOS tasks, can then be designed independently. The tasks communicate with each other through message queues, semaphores, flags, etc., provided by the RTOS services. An RTOS also provides the means to schedule tasks so that time deadlines are met.

A typical software RTOS is a kernel library that manages all of this. Its algorithms order execution by task priority level and distribute access to limited hardware resources (see Figure 1). In a preemptive RTOS, a CPU timer ‘tick’ wakes the kernel at a regular interval to determine whether it is time to switch the running task.
White Paper – HW-RTOS: Improved RTOS Performance by Implementation in Silicon Page 1 of 11

This tick-driven approach works well for most applications, where the processor already has many other subsystem checks and operations to manage. Industrial networking applications, however, must support the real-time behavior of protocols such as EtherCAT, EtherNet/IP or PROFINET IO. Adding a traditional RTOS could reduce overall system speed and may add jitter.

[Figure 1: Services that an RTOS provides to an application. In a traditional SW-RTOS, the application (software) makes a system call and is assigned an OS resource; the SW-RTOS library performs system call processing, task scheduling, OS resource management, tick processing and dispatch; the hardware supplies the timer (tick count).]

Benefits of hardware accelerated RTOS
In this paper we look at how implementing the RTOS in silicon can reduce administrative CPU overhead. There is no “timer tick” interrupt to determine whether it is time to preempt the currently running task, since this is handled by a timer in the HW-RTOS block. Because this is done in hardware, there is also inherently less execution jitter. Jitter is caused by variation in the time RTOS functions take to run, which depends on the system state (number of tasks, resources in use, etc.) managed by the CPU.

II. Software RTOS vs. Hardware RTOS
Although the RTOS manages parallel tasks, the actual task sequence depends on how the RTOS manages the CPU. Figure 2 shows a typical RTOS operation. While Task A is running, an interrupt from a peripheral causes the RTOS to execute a glue routine that invokes the interrupt handler. Within the interrupt handler, a system call is made as the operation requires. If a higher-priority task (Task B) must now run, the interrupt handler exits and the RTOS dispatches Task B.
[Figure 2: Typical SW-RTOS operation. A peripheral interrupt reaches the CPU; a glue routine invokes the interrupt handler, which makes a system call; the RTOS then dispatches from user Task A to Task B. Jitter is introduced due to indeterminate CPU loading.]

An RTOS can also manage interrupt routines. When such an interrupt service routine (ISR) finishes, the RTOS determines whether a different task should run instead of the one that was interrupted. This is called preemption. (Preemption can also occur at a timer tick or timeout.)

However, even in a system that uses a modern, fast MCU and a SW-RTOS, there may be several tasks with conflicting hard deadlines that must be met. This creates uncertainty. How long will it take for my data input to be read and processed, and for a new output value to be set? How do I calculate the worst case?

In addition, if the timer tick rate is set high to avoid missing a deadline, the constant scheduler interruptions may themselves consume precious CPU time. Since scheduling takes place in hardware for HW-RTOS (context switching is still done in software), total context switch time is reduced.

Reduced Jitter
Even if a worst-case timing is determined, a deadline may no longer be met when the system later grows. An engineer who changes the system may not be aware that some other deadline, outside his work assignment, could be violated. This is in large part due to jitter.

With some of the RTOS functionality, such as time management and task scheduling, handled in hard-wired logic rather than by CPU instructions, the variation in RTOS function execution times due to system state (number of tasks and other OS resources) diminishes. There are two reasons for this. First, the time a CPU needs to fetch and execute code is greater than the time it takes hard-wired logic gates in silicon to run to completion.
Second, for a SW-RTOS the execution time varies with the number of tasks, semaphores, flags, queue sizes, etc. The time taken depends on the system state at the moment of the timer tick, and this state may be very complicated in a non-trivial system. In short, it may be difficult to work out the maximum time the RTOS needs to carry out an operation. With HW-RTOS, however, scheduling and system resources are managed in hardware, in parallel with the CPU and faster than the CPU could manage them, so this uncertainty diminishes.

Task timing and OS tick offload
In a conventional preemptive RTOS, an OS “timer tick” regularly interrupts the current task to check whether a higher-priority task is ready to run. As part of this system interrupt, the kernel checks whether any task has requested a timeout and is therefore a candidate to run. This tick processing uses precious CPU time even when no rescheduling occurs.

In the HW-RTOS there is no tick interrupt; timing management is handled by the HW-RTOS block in hardware. With HW-RTOS, a running task can be switched:
• When the HW-RTOS internal clock, the OS reference timer, causes preemption for a timeout previously requested by a task.
• When a call is made to the RTOS.
• When an interrupt occurs.

Suppose that at a given moment only one task has made an OS call with a pending timeout; for example, one task is waiting for a flag to be set, with a timeout. If the timeout expires, HW-RTOS preempts the running task and reschedules to the task with the highest priority. In the meantime, no time is lost running a timer tick handler and executing the kernel only to find that no timeouts are pending. Besides speeding things up, this also helps reduce jitter. For an illustration of scheduling and task switching, see Figure 6.

III. HW-RTOS on R-IN32
The R-IN32M3 is an industrial network ASSP that combines peripherals and hardware IP blocks that accelerate Ethernet communication processing with the ability to manage RTOS operation for complex industrial applications. The device includes either an EtherCAT slave controller with an integrated PHY, or a CC-Link IE slave that supports Gigabit Ethernet performance. An SRAM interface can be used as a high-speed slave port when connecting to a host CPU. In addition, the R-IN32M3 has an ARM Cortex-M3 32-bit RISC CPU core, an integrated dual 10/100 MAC, a hardware three-port switch, a dual Ethernet PHY (-EC version), a dedicated DMA controller, and a separate buffer area for the network processor.

[Block diagram: R-IN32M3-EC. Cortex-M3 CPU core (100 MHz); hardware real-time OS; internal RAM with ECC (instruction 768 KB, data 512 KB); buffer 64 KB; Ethernet accelerator (checksum/header ENDEC, buffer allocator/buffer manager); EtherCAT slave controller; Ethernet MAC; 2-port switch with two 100 Mbps Ethernet PHYs (Tx/Rx); DMAC 1ch and general DMAC 4ch; CAN 2ch, UART 2ch, CSI 2ch, I2C 2ch; 4ch timer array; watchdog timer; general port; CC-Link and real-time ports; serial flash ROM I/F; SRAM I/F or host CPU I/F.]

In conjunction with the HW-RTOS, the Ethernet Accelerator on the R-IN32M3 helps achieve more deterministic communication (less jitter and higher speed).

SW-RTOS functions done in hardware
Figure 1 shows a typical SW-RTOS, in which resource management, queuing and task scheduling are done in the SW-RTOS kernel. On the R-IN32M3, by contrast, the HW-RTOS executes the system calls, including scheduling and tick processing. The advantage is that the same system calls can be made through a standard SW-RTOS API, while some functions are accelerated within the HW-RTOS block.
[Figure 3: Functional diagram of how functionality has moved from software to hardware. Left, traditional SW-RTOS: the SW-RTOS library performs system call processing, task scheduling, OS resource management, tick processing and dispatch, using a hardware timer (tick count). Right, RTOS accelerator in hardware: the software keeps system call processing and dispatch, while the HW-RTOS performs system call execution, task scheduling, OS resource management and tick processing.]

HW-RTOS blocks
Figure 4 shows the HW-RTOS scheduler block and OS resources, the CPU, the instruction/data bus, and how interrupts are routed.

[Figure 4: Block diagram of HW-RTOS in the R-IN32M3. Elements shown: HW-RTOS (hardware ISR, task scheduler, system timer, system call register, OS resources: task, event, semaphore, mailbox); interrupt x1 from HW-RTOS to the CPU (Cortex-M3 on R-IN32M3); data RAM (current task, SP_table[], stacks for Task 1 to Task n); instruction memory (OS library); bus (AHB on R-IN32M3) with bus I/F (AHB bridge on R-IN32M3).]

The “Hardware ISR”, the top box in the HW-RTOS block, handles interrupts, except for “x1”, which is issued to the CPU to call the HW-RTOS driver library (bottom of the figure).

The HW-RTOS provides semaphores, mailboxes, flags and mutexes. It has OS management calls to put a task to sleep, rotate task precedence, disable OS dispatching, etc. The HW-RTOS also has a “hard” interrupt mechanism in which preregistered service calls run automatically when a particular interrupt occurs. These automatic interrupt service calls can signal a semaphore, set a flag, or wake up a task. No software is involved.

Tasks, semaphores, flags, mutexes, mailboxes, etc. can be created statically at compile time, or dynamically at runtime as appropriate. Figure 5 below shows the number of resource management and communication objects available for HW-RTOS on the R-IN32M3.
HW-RTOS resources on R-IN32M3:
• Total number of contexts that can be handled: 64
• Number of context priorities: 16
• Number of semaphores (binary or counting) and mutexes: 128 in total
• Number of events: 64
• Number of mailboxes: 64
• Number of mailbox messages: 192
• HW-ISRs: max. 32, selectable from 128 QINTs
Figure 5: Table of HW-RTOS resources

API
The HW-RTOS API is modeled on both the uITRON RTOS standard API and the uC/OS-III HW-RTOS API. There are some 30 system calls for resources such as event flags, semaphores and mailboxes.

Priorities
When the scheduler is invoked, the task with the highest priority (lowest number) runs. Several tasks may share the same priority; in that case they are scheduled by a first-come, first-served (FCFS) mechanism that can manage up to 32 tasks.

Task scheduling
There is no need for preemptive scheduling driven by a “timer tick”, because tasks may be scheduled to run by any of the following causes:
1. At a specific time, that is, a certain system clock value; for example, receiving from a mailbox with a timeout, or locking a mutex with a timeout.
2. A HW-RTOS (“system”) call is made from a task, and the kernel sees that another task has a higher priority.
3. An interrupt occurs:
a. A system call can be made from an interrupt without any software involvement. This feature must be set up at compile time in the “Hardware ISR” table; an entry causes a given interrupt to make a flag or semaphore call.
b. A system call is made from a SW ISR.

Figure 6 shows how HW-RTOS determines the execution flow when a non-interrupt system call is made. The actual context switch is done by a driver library.

[Figure 6: Non-interrupt system call execution flow. Task A makes a system call; the driver sets the HW-RTOS registers (converting the request to a hardware setting); the HW-RTOS system call operation picks the next task and returns an error code and the next task ID. If no context switch is needed, control returns to Task A. Otherwise the driver saves the current context (register set), changes the stack pointer, restores the next context (register set) and runs Task B. HW-RTOS determines which task to execute, and a driver does the context switch.]

Queues
There are no queuing services in the HW-RTOS API; instead, mailbox services with flexible priority schemes are provided. Each mailbox is consumed in FIFO order, in task priority order, or in message priority order. Each message contains the mailbox ID and a pointer to the actual message data.

Interrupts, the HW-ISR
When a task timeout expires and HW-RTOS determines it is time to reschedule, it uses an interrupt reserved for this purpose in the ARM core. The CPU services this interrupt and passes execution to the task selected by HW-RTOS.

For higher speed, instead of using a software ISR, a user can preconfigure the HW-ISR table to perform certain services: set a flag, signal a semaphore, or wake up a task. In that case a software ISR is not needed at all. This is illustrated in Figure 7.

[Figure 7: SW-RTOS case vs. HW-RTOS case. In the SW-RTOS case, Task B waits on a semaphore (wait_semaphore); the interrupt runs an ISR dedicated to signaling the semaphore (sig_sem()), which wakes Task B. In the HW-RTOS case, the interrupt signals the semaphore directly: sig_sem() is called by HW-RTOS, Task B is woken, and the RTOS raises the interrupt that changes task, so no ISR is necessary, saving time. If the static HW-ISR table is prepared at compile time, interrupt service routines in software can be omitted.]
[Figure 8: HW- and SW-ISR processing in greater detail, with time on the horizontal axis. A peripheral interrupt is processed by the HW-ISR, including system call processing (for example Set Flag, Sig Sem, Rel Wai, Wup Tsk), before any CPU ISR, and Task A runs uninterrupted while the HW-ISR runs. The CPU is interrupted to run an ISR only if one exists, and a context switch happens only if needed. Without a SW-ISR, Task B runs only if the HW-ISR causes a context switch; otherwise Task A continues unless a system call in the ISR causes a context switch.]

Mutex
HW-RTOS does not protect against deadlock or priority inversion for mutexes. Priority inversion occurs when a high-priority task waits for a resource held by a lower-priority task that has been suspended. Since HW-RTOS does not support priority inheritance, this must be added in user software.

Other features
Some other features worth mentioning:
• Release a task from waiting, wake up a task, cancel a wakeup, and put the calling task to sleep, with a timeout option.
• The task delay argument is 32 bits wide, covering 1 ms to 1100 hours.
• Ethernet MAC with built-in DMAC.

Features not available
• Priority inheritance (to prevent priority inversion) or the priority ceiling protocol of uITRON. These would have to be implemented in software.
• Deadlock detection/avoidance on non-mutex resources. However, a deadlock can be broken with a timeout.
• Stack overflow/underflow monitoring in hardware or software.

IV. Performance Test
A performance test was done using a Tessera R-IN32M3-EC evaluation board connected to a Windows 7 64-bit PC.

HW-RTOS vs. off-the-shelf SW-RTOS
The author ran benchmarks comparing HW-RTOS with Micrium’s SW-RTOS uC/OS-III, ported to the R-IN32M3. Note that the uC/OS-III port used did not use the HW-RTOS block. (Such a port has since been developed.) The tests were run on an R-IN32M3-EC board with the IAR toolchain (ARM 6.70).
The system clock was used as the OS reference timer; it ran at 100 MHz in the studied system. The author only tested semaphores and event flags, each with and without a context switch resulting from the call.

The author found that the main benefit of HW-RTOS on the R-IN32M3 is for applications with heavy task switching, that is, where user software processes are frequently swapped in and out. This is common in, e.g., motor control systems. HW-RTOS performed task-switching operations more than twice as fast for most calls.

Without a context switch (just a system call, after which the same task proceeds), HW-RTOS timing did not change much. Here the SW-RTOS did much better and was in fact faster than HW-RTOS. For the SW-RTOS, the time measured with a context switch was around 5 to 8 times the time for the same call without a context switch.

The following are situations in which the author saw noteworthy benefits in the number of microseconds needed to switch tasks. In all of these cases, a task switch occurs:
1. A task calls the RTOS, and another task with a higher priority is waiting.
2. A task is released from waiting (pending) on a resource (e.g. flag, semaphore) that was not available at the time. This was the case both when the resource was released by another task and, even more so, when it was released by a call from an interrupt.
3. A task has previously called HW-RTOS asking to be awoken at a specific time.

OS calls that did not result in a task switch showed no improvement; these were actually slower.

Measured times in CPU clocks at 100 MHz (lower is faster):
Category | Scenario | Test type | HW-RTOS | Micrium (Preempt)
Start/Create task | - | - | 175 | 268
Semaphore | Create | - | 0 (static), 188 (dynamic) | 86
Semaphore | Signal (Post) | No context switch | 137 | 74
Semaphore | Signal (Post) | With task switch | 168 | 497
Semaphore | Wait (Pend) | Non-block | 156 | 76
Semaphore | Wait (Pend) | With task switch | 201 | 399
Event flag | Create | - | 0 (static), 190 (dynamic) | 79
Event flag | Set (Post) | No context switch | 149 | 79
Event flag | Set (Post) | With task switch | 191 | 529
Event flag | Wait (Pend) | Non-block | 183 | 119
Event flag | Wait (Pend) | With task switch | 202 | 480
Interrupt context switch | End of ISR to task waiting for flag | Resume | 147 | 186 (823*)
Interrupt context switch | End of ISR to task waiting for semaphore | Resume | 142 | (not measured)
Figure 9. Actual measurements made by the author comparing HW-RTOS with a traditional SW-RTOS* on the R-IN32M3-EC board.
*Note that since this paper was written, the Micrium uC/OS-III HW-RTOS port has been developed for the R-IN32M3.

Memory footprint
In testing, the author found that, compared with the SW-RTOS used (uC/OS-III), HW-RTOS used around 25% less RAM and around 15% less flash. The larger memory needs of a SW-RTOS typically consist of the tasks’ stacks and the space required to store data structures and the RTOS program code itself.

Jitter
As noted above, jitter is variation in task execution timing over time. Lowering jitter reduces the risk of varying or unexpected behavior; that is, it results in more deterministic performance. The author did not measure the improvement in jitter, as this requires a larger project using the RTOS. An internal study by Renesas Japan roughly estimated a 20% to 80% reduction in jitter compared with a software RTOS implementation on a different MCU, showing that the HW-RTOS has a much more consistent (stable) execution period.
[Figure 10: Four bar charts of OS processing time per system call (0 to 8 µs), comparing the R-IN32M3 (HW-RTOS) with a SW-RTOS: task management (sta_tsk (act_tsk), ter_tsk, ext_tsk), event flag (set_flg, clr_flg, wai_flg), semaphore (sig_sem, wai_sem, pol_sem) and mailbox (snd_mbx, rcv_mbx).]
Figure 10: Comparison between the R-IN32M3 using HW-RTOS and a comparable MCU running at 100 MHz using a SW-RTOS.

V. Summary
In this paper we analyzed the features and performance improvements of the “Real-Time OS Accelerator” (HW-RTOS) hardware IP on the Renesas R-IN32M3 industrial networking ASSP. Benchmarks showed that task switching could execute up to 3x faster than typical SW-RTOS operation at the same CPU clock speed, with significantly less jitter.

Compared with typical software RTOS operation, which is essentially sequential, the HW-RTOS is closely tied to the CPU, allows interrupts to be handled without interrupting the current task, and its performance does not depend on the amount of task switching. By simply making system calls through familiar RTOS environments such as uITRON or uC/OS-III HW-RTOS, one can easily manage multiple tasks while the hardware IP does the heavy lifting of resource management and prioritization.

The R-IN32M3 HW-RTOS therefore does improve RTOS operation, and the benefit is most evident in industrial networking applications, which matches the R-IN32M3 target. That said, an RTOS accelerator in silicon would benefit a wider range of applications as well.

References
1. 2013 Embedded Market Study, UBM Tech Electronics, April 2013.
2. “R-IN32M3 Series Programming Manual (OS edition)”, doc. no. r18uz0011ej0300_rin32m3.
3. “Hardware Real-Time Operating System for FPGA based embedded systems”, Anders Blaabjerg Lange, June 2011.
4. uITRON 4.0 specification, http://www.t-engine.org/wp-content/themes/wp.vicuna/pdf/specifications/en_US/WG024S001-04.03.00_en.pdf
5. Micrium, “Hardware-Accelerated RTOS: µC/OS-III HW-RTOS and the R-IN32M3”.