Operating Systems Should
Manage Accelerators
Sankaralingam Panneerselvam
Michael M. Swift
Computer Sciences Department
University of Wisconsin, Madison, WI
HotPar 2012
Accelerators
- Specialized execution units for common tasks: data-parallel tasks, encryption, video encoding, XML parsing, network processing, etc.
- Tasks are offloaded to accelerators for high performance, energy efficiency, or both
- Examples: GPU, APU, DySER, crypto accelerators, the wire-speed processor
Why accelerators?
- Moore's law and Dennard scaling: from single core to multi-core
- Dark silicon: more transistors, but a limited power budget [Esmaeilzadeh et al., ISCA '11]
- Accelerators trade area for specialized logic
  - Performance up by 250x and energy efficiency by 500x [Hameed et al., ISCA '10]
- The result: heterogeneous architectures
Heterogeneity everywhere
- Processing elements with different properties in the same system
- Our work: focus on diverse heterogeneous systems
How can the OS support this architecture?
1. Abstract heterogeneity
2. Enable flexible task execution
3. Multiplex shared accelerators among applications
Outline
Motivation
Classification of accelerators
Challenges
Our model - Rinnegan
Conclusion
IBM wire-speed processor
[Diagram: CPU cores, an accelerator complex, and a memory controller on one chip]
Classification of accelerators
Accelerators are classified based on their accessibility:
- Acceleration devices
- Co-processors
- Asymmetric cores
Classification

| Property | Acceleration devices | Co-processor | Asymmetric cores |
|---|---|---|---|
| Systems | GPU, APU, crypto accelerators in Sun Niagara chips, etc. | C-cores, DySER, IBM wire-speed processor | NVIDIA's Kal-El processor, scalable cores like WiDGET |
| Location | Off- or on-chip device | On-chip device | Regular core that can execute threads |
| Accessibility: control | Device drivers | ISA instructions | Thread migration, RPC |
| Accessibility: data | DMA or zero copy | System memory | Cache coherency |
| Resource contention | Multiple applications might want to use the device | Sharing of the resource across multiple cores could result in contention | Multiple threads might want to access the special core |

Two observations follow:
1) Applications must deal with different classes of accelerators with different access mechanisms.
2) Resources must be shared among multiple applications.
Challenges
- Task invocation: execution on different processing elements
- Virtualization: time-multiplexing the accelerators
- Sharing across multiple applications
- Scheduling
Task invocation
- Current systems decide statically where to execute a task
- Programmer problems where the system can help:
  - Data granularity
  - Power budget limitations
  - Presence/availability of an accelerator
[Diagram: an AES task can run on the CPU or be offloaded to the crypto accelerator]
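To make these placement factors concrete, here is a minimal sketch of a dynamic placement decision (Python pseudocode; `choose_target`, the byte threshold, and the power figures are hypothetical illustrations, not part of Rinnegan's published interface):

```python
def choose_target(data_len, accel_available, power_headroom_watts,
                  accel_power_watts=5.0, min_offload_bytes=4096):
    """Pick where to run a task, weighing the three factors from the slide:
    accelerator availability, data granularity, and the power budget."""
    if not accel_available:                       # accelerator absent or busy
        return "cpu"
    if data_len < min_offload_bytes:              # too little data to amortize offload cost
        return "cpu"
    if power_headroom_watts < accel_power_watts:  # offload would exceed the power budget
        return "cpu"
    return "accelerator"
```

The point is that no static, compile-time answer is right for all three checks, which is why the deck argues the system should help.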
Virtualization
- A process may be preempted after launching a computation
- The system must be aware of computation completion
Data addressing for accelerators
- Accelerators need to understand virtual addresses
- The OS must provide virtual-address translations
[Diagram: CPU cores and accelerator units (crypto, XML) share virtual memory; e.g., crypt(0x240) names a virtual address]
Scheduling
- Scheduling is needed to prioritize access to shared accelerators
- User-mode access to accelerators complicates scheduling decisions: the OS cannot interpose on every request
Outline
Motivation
Classification of Accelerators
Challenges
Our model - Rinnegan
Conclusion
Rinnegan
- Goals:
  - Applications leverage the heterogeneity
  - The OS enforces system-wide policies
- Rinnegan provides support for different classes of accelerators:
  - Flexible task execution through the accelerator stub
  - Safe multiplexing through the accelerator agent
  - System-policy enforcement through the accelerator monitor
Accelerator stub
- Entry point for accelerator invocation, and a dispatcher
- Abstracts different processing elements
- Binding: selecting the best implementation for the task
  - Static: during compilation
  - Early: at process start
  - Late: at call time
[Diagram: a process invokes the stub, which either encrypts through AES instructions or offloads to the crypto accelerator]
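The difference between early and late binding can be sketched with a toy dispatcher (Python pseudocode; the implementation registry and the `accel_free` flag are hypothetical stand-ins for the deck's AES-instruction and crypto-accelerator paths):

```python
def encrypt_aes_insns(data):        # stand-in for encrypting via AES ISA instructions
    return b"aes:" + data

def encrypt_crypto_accel(data):     # stand-in for offloading to a crypto accelerator
    return b"acc:" + data

IMPLS = {"aes": encrypt_aes_insns, "accel": encrypt_crypto_accel}

def make_stub(binding, pick):
    """binding='early' fixes the implementation once (e.g., at process start);
    binding='late' re-evaluates pick() on every call."""
    if binding == "early":
        impl = IMPLS[pick()]                     # bound now, never reconsidered
        return impl
    return lambda data: IMPLS[pick()](data)      # late binding: decide per call

# A late-bound stub follows the accelerator's availability as it changes:
state = {"accel_free": True}
stub = make_stub("late", lambda: "accel" if state["accel_free"] else "aes")
```

Static binding (at compilation) has no runtime analogue here; it corresponds to the compiler emitting only one of the two implementations.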
Accelerator agent
- Manages an accelerator
- Roles of an agent:
  - Virtualize the accelerator
  - Forward requests to the accelerator
  - Implement scheduling decisions
  - Expose accounting information to the OS
[Diagram: processes offloading parallel tasks call user-space stubs; kernel-space agents (a device driver for acceleration hardware, thread migration for an asymmetric core) act as resource allocators; a process can also communicate directly with the accelerator]
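One way to picture those four roles together is a toy agent (a hedged Python sketch; the queue-based interface and accounting fields are illustrative, not Rinnegan's actual kernel API):

```python
import collections

class AcceleratorAgent:
    """Toy agent: virtualizes one accelerator by queuing requests per process,
    forwarding them to the device, and keeping accounting data for the OS."""
    def __init__(self, device):
        self.device = device                              # callable standing in for the hardware
        self.queues = collections.defaultdict(collections.deque)
        self.usage = collections.Counter()                # accounting info exposed to the OS

    def submit(self, pid, request):
        self.queues[pid].append(request)                  # virtualization: each process sees its own queue

    def dispatch_one(self):
        # Scheduling decision: serve the first process (in arrival order) with pending work.
        for pid, q in self.queues.items():
            if q:
                result = self.device(q.popleft())         # forward the request to the accelerator
                self.usage[pid] += 1                      # account the work to the process
                return pid, result
        return None
```

A real agent would implement a richer policy than first-come-first-served, but the division of labor (queue, forward, schedule, account) is the same.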
Accelerator monitor
- Monitors information such as resource utilization
- Responsible for system-wide energy and performance goals
- Acts as an online modeling tool
- Helps decide where to schedule a task
[Diagram: a process calls the user-space stub; the kernel-space accelerator agent and accelerator monitor sit between the stub and the accelerator hardware]
Programming model
- Based on task-level parallelism (e.g., Intel TBB, Cilk)
- Task creation and execution:
  - Write: different implementations of the task
  - Launch: similar to a function invocation
  - Fetching the result: supports synchronous and asynchronous invocation

application()
{
    task = cryptStub(inputData, outputData);
    ...
    waitFor(task);
}

TaskHandle cryptStub(input, output)
{
    if (factor1)
        output = implmn_AES(input);
    else if (factor2)
        output = implmn_CACC(input);
}
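The launch/wait pattern in the pseudocode above is essentially a future-based API; a minimal runnable sketch (Python; `crypt_stub` and the AES stand-in are illustrative names, not the deck's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)

def impl_aes(data):                  # stand-in for the AES-instruction implementation
    return b"enc:" + data

def crypt_stub(data):
    """Launch the task asynchronously and return a handle,
    mirroring the slide's cryptStub/TaskHandle."""
    return _pool.submit(impl_aes, data)

# Asynchronous launch, then a synchronous wait for the result:
task = crypt_stub(b"secret")
result = task.result()               # plays the role of waitFor(task)
```

The handle lets the application overlap its own work with the offloaded computation and only block when the result is actually needed.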
Summary
[Diagram: a process in user space invokes the stub; in kernel space, the accelerator agent, accelerator monitor, and other kernel components manage the CPU cores and the accelerator; the process can also communicate directly with the accelerator]
Related work
- Task invocation based on data granularity: Merge, Harmony
- Generating device-specific implementations of the same task: GPUOcelot
- Resource scheduling (sharing between multiple applications): PTask, Pegasus
Conclusions
- Accelerators will be common in future systems
- Rinnegan embraces accelerators through its stub mechanism, agent, and monitor components
- Other challenges remain:
  - Data movement between devices
  - Power measurement
Thank You