Arrakis: The Operating System is the Control Plane

Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos,
Arvind Krishnamurthy, Thomas Anderson (University of Washington),
Timothy Roscoe (ETH Zurich)

Symposium on Operating Systems Design and Implementation (OSDI), October 2014
Motivation
Fast, low-latency hardware: Ethernet, SSD
I/O-intensive applications
OS on critical (data) path
[Figure 1: Linux networking architecture and workflow. Applications in userspace; the kernel network stack and scheduler sit between the cores and the NIC's incoming/outgoing queues.]

System call duration, UDP echo (1,024 byte payload, 1,000 samples, times in µs):

Linux                    Receiver running   CPU idle
Network stack   in       1.26 (37.6 %)      1.24 (20.0 %)
                out      1.05 (31.3 %)      1.42 (22.9 %)
Scheduler                0.17  (5.0 %)      2.40 (38.8 %)
Copy            in       0.24  (7.1 %)      0.25  (4.0 %)
                out      0.44 (13.2 %)      0.55  (8.9 %)
Kernel crossing return   0.10  (2.9 %)      0.20  (3.3 %)
                syscall  0.10  (2.9 %)      0.13  (2.1 %)
Total                    3.36 (σ = 0.66)    6.19 (σ = 0.82)

[Paper excerpt: with the kernel removed from the data path, there is an opportunity to rethink the POSIX API; in addition to a POSIX-compatible interface, Arrakis provides a native interface (Arrakis/N) that supports true zero-copy I/O. To illustrate storage stack overheads, the paper executes small writes each followed by an fsync system call in a tight loop of 10,000 iterations, with the file system on a RAM disk, so the measured latencies represent purely CPU overhead.]
Goal

Direct application-level access to I/O devices
Maintain the properties of a classical server OS: isolation, rate limiting, global naming

Use hardware virtualisation mechanisms:
  SR-IOV: virtual PCI devices with dedicated resources (see the sketch below)
  Standard for NICs, expected for RAID controllers
  Memory protection via the IOMMU

Support only policies that the hardware can implement efficiently
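As a concrete illustration of the SR-IOV mechanism referenced above (not from the paper; Arrakis itself runs on Barrelfish): on a Linux host, virtual functions are instantiated by writing a count to the device's standard sriov_numvfs sysfs attribute. The PCI address below is a placeholder.

```c
/* sriov_enable.c -- illustrative only: request SR-IOV virtual functions
 * on a Linux host. sriov_numvfs is the standard Linux sysfs interface;
 * the PCI address is a made-up placeholder. Each virtual function then
 * appears as an independent PCI device with its own queues, and the
 * IOMMU confines its DMA to the owner's memory. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *attr = "/sys/bus/pci/devices/0000:01:00.0/sriov_numvfs";
    FILE *f = fopen(attr, "w");
    if (f == NULL) { perror("fopen"); return EXIT_FAILURE; }
    /* Ask the physical function to expose 4 virtual functions. */
    if (fprintf(f, "%d\n", 4) < 0) {
        perror("fprintf");
        fclose(f);
        return EXIT_FAILURE;
    }
    fclose(f);
    puts("requested 4 virtual functions");
    return EXIT_SUCCESS;
}
```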
Arrakis — Hardware Model

Virtual interface cards (VICs), managed by the control plane:

Virtual network interface card (VNIC)
  Dedicated queues and rate limiters (queue-pair sketch below)
  Transmit and receive filters
  Optional support for offloading

Virtual storage interface controller (VSIC)
  Virtual storage areas (VSAs)
  Emulated by a dedicated CPU core in the prototype

[Figure 3: Arrakis architecture. Applications link a library OS and drive their VNICs and VSICs directly from userspace; the kernel control plane configures the NIC's switch and the storage controller, which maps VSAs to physical storage.]

[Paper excerpt: applications perform I/O through their protected virtual device instance without requiring kernel intervention. They rely on a user-level I/O stack provided as a library; since that stack can assume exclusive access to a virtualised device instance, it can be tailored to the application, removing any features not necessary for the application's functionality.]
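A hedged sketch of what a VNIC's dedicated queue pair might look like. This mirrors generic descriptor-ring NIC designs (such as the 82599 used in the evaluation) rather than a documented Arrakis structure; all field names are assumptions.

```c
/* Illustrative sketch (not a documented Arrakis structure) of the queue
 * pair a VNIC dedicates to one application. */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* One DMA descriptor: where a packet buffer lives and how big it is. */
struct desc {
    uint64_t buf_addr;  /* device-visible buffer address (IOMMU-translated) */
    uint16_t len;       /* buffer length (RX) or frame length (TX) */
    uint16_t flags;     /* e.g., a "descriptor done" bit set by the NIC */
};

/* Each application owns its rings and doorbells outright, so enqueueing
 * and dequeueing packets involves no system calls; isolation comes from
 * the IOMMU and the hardware filters, not from kernel mediation. */
struct vnic_queue_pair {
    struct desc *rx_ring, *tx_ring;
    uint32_t rx_size, tx_size;
    volatile uint32_t *rx_tail;  /* doorbell: publish newly freed RX slots */
    volatile uint32_t *tx_tail;  /* doorbell: publish new TX descriptors */
};

int main(void) {
    struct vnic_queue_pair qp = { 0 };
    qp.rx_size = qp.tx_size = 256;
    qp.rx_ring = calloc(qp.rx_size, sizeof *qp.rx_ring);
    qp.tx_ring = calloc(qp.tx_size, sizeof *qp.tx_ring);
    if (!qp.rx_ring || !qp.tx_ring) return 1;
    printf("queue pair with %u RX / %u TX descriptors\n",
           (unsigned)qp.rx_size, (unsigned)qp.tx_size);
    free(qp.rx_ring);
    free(qp.tx_ring);
    return 0;
}
```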
Arrakis — Control Plane

Provides VICs, filters, VSAs, and rate specifiers
IPC endpoints ("doorbells") associated with events on VICs (see the sketch below)
Virtual file system interfaces with directories exported by applications

[Figure 3 repeated: the kernel control plane configures VNICs and VSAs; data-plane I/O bypasses the kernel entirely.]
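A minimal sketch of the control-plane interaction, assuming hypothetical call names (ctrl_create_vnic, ctrl_create_filter, ctrl_create_doorbell) with stubbed bodies so it runs standalone; in Arrakis these would be requests to the kernel control plane, which programs the hardware once and then steps aside.

```c
/* All names here are hypothetical stand-ins, not the paper's verbatim API. */
#include <stdio.h>
#include <stdint.h>

enum { FILTER_RX = 0, FILTER_TX = 1 };

typedef struct { int id; } vnic_t;
typedef struct { int id; } doorbell_t;

/* Stub: in Arrakis this would be an RPC to the kernel control plane,
 * which programs the SR-IOV NIC and hands back a protected VNIC. */
static vnic_t *ctrl_create_vnic(void) { static vnic_t v = { 1 }; return &v; }

/* Stub: installs a hardware demultiplexing rule so packets matching
 * (direction, address, port) land in this VNIC's receive queue. */
static int ctrl_create_filter(vnic_t *v, int dir, uint32_t ip, uint16_t port) {
    printf("filter on vnic %d: dir=%d ip=%u port=%u\n",
           v->id, dir, (unsigned)ip, (unsigned)port);
    return 0;
}

/* Stub: a doorbell is an IPC endpoint the app waits on for VNIC events,
 * replacing per-packet kernel crossings. */
static doorbell_t *ctrl_create_doorbell(vnic_t *v) {
    static doorbell_t d = { 1 };
    (void)v;
    return &d;
}

int main(void) {
    vnic_t *v = ctrl_create_vnic();
    if (ctrl_create_filter(v, FILTER_RX, /*ip*/0x0a000001, /*port*/7) != 0)
        return 1;
    doorbell_t *db = ctrl_create_doorbell(v);
    printf("vnic %d ready, doorbell %d armed; kernel off the data path\n",
           v->id, db->id);
    return 0;
}
```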
Arrakis — Data Plane

Applications use (virtual) hardware directly
Asynchronous operations; doorbells signal completion
Optimised native interfaces, with a POSIX compatibility layer on top

Network
  Send/receive packets to/from queues (echo loop sketched below)
  Extaris: user-level network stack, partly based on lwIP

Storage
  Read/Write/Flush within a VSA, at any offset and of arbitrary size
  Caladan: library of persistent data structures (log, queue)
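A minimal sketch of the native (Arrakis/N) data-plane shape described above. All function names are hypothetical stand-ins (the paper does not publish this exact API), stubbed so the sketch compiles and runs; what matters is the shape: post buffers, wait on the doorbell, handle completions, with no system call per packet.

```c
/* echo_sketch.c -- hedged sketch of a zero-copy echo loop. */
#include <stddef.h>
#include <stdio.h>

typedef struct { char data[2048]; size_t len; } pkt_t;

/* Stub: dequeue a received packet descriptor from the VNIC's RX queue,
 * blocking on the doorbell when the queue is empty. */
static pkt_t *rx_dequeue(void) {
    static pkt_t p = { "ping", 4 };
    static int once = 0;
    return once++ ? NULL : &p;   /* deliver one packet, then stop */
}

/* Stub: enqueue the same buffer on the TX queue -- zero-copy echo. */
static void tx_enqueue(pkt_t *p) { printf("echo %zu bytes\n", p->len); }

/* Stub: return the buffer to the RX descriptor ring once the NIC is done. */
static void rx_repost(pkt_t *p) { (void)p; }

int main(void) {
    for (pkt_t *p; (p = rx_dequeue()) != NULL; ) {
        tx_enqueue(p);   /* transmit directly from the receive buffer */
        rx_repost(p);    /* recycle the buffer for future receives */
    }
    return 0;
}
```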
Evaluation

Implemented by extending Barrelfish
Six-machine cluster; each machine:
  6-core Intel Xeon E5-2430 (Sandy Bridge) @ 2.2 GHz
  Intel X520 (82599-based) 10 Gb Ethernet adapter
  Intel MegaRAID RS3DC040 RAID controller
  100 GB Intel DC S3700 SSD
One machine runs the application; the others act as clients
Baseline: "tuned" Ubuntu Linux 13.04 (kernel 3.8)
UDP Echo
(1,024 byte payload, 1,000 samples, times in µs)

                         Linux                                 Arrakis
                         Receiver running   CPU idle           POSIX             Native
Network stack   in       1.26 (37.6 %)      1.24 (20.0 %)      0.32 (22.3 %)     0.21 (55.3 %)
                out      1.05 (31.3 %)      1.42 (22.9 %)      0.27 (18.7 %)     0.17 (44.7 %)
Scheduler                0.17  (5.0 %)      2.40 (38.8 %)      —                 —
Copy            in       0.24  (7.1 %)      0.25  (4.0 %)      0.27 (18.7 %)     —
                out      0.44 (13.2 %)      0.55  (8.9 %)      0.58 (40.3 %)     —
Kernel crossing return   0.10  (2.9 %)      0.20  (3.3 %)      —                 —
                syscall  0.10  (2.9 %)      0.13  (2.1 %)      —                 —
Total                    3.36 (σ = 0.66)    6.19 (σ = 0.82)    1.44 (σ < 0.01)   0.38 (σ < 0.01)

[Plot: throughput (k packets/s, 0 to 1,200) vs. per-packet processing time (0 to 64 µs) for Linux, Arrakis/P, Arrakis/N, and the raw driver.]
Redis
(64k random keys, 1,024 byte value size, 1,000 samples, times in µs)

                  Read Hit                            Durable Write
                  Linux            Arrakis/P            Linux               Arrakis/P
epoll             2.42 (27.91 %)   1.12 (27.52 %)       2.64  (1.62 %)      1.49  (4.73 %)
recv              0.98 (11.30 %)   0.29  (7.13 %)       1.55  (0.95 %)      0.66  (2.09 %)
Parse Input       0.85  (9.80 %)   0.66 (16.22 %)       2.34  (1.43 %)      1.19  (3.78 %)
Lookup/Set Key    0.10  (1.15 %)   0.10  (2.46 %)       1.03  (0.63 %)      0.43  (1.36 %)
Log Marshaling    —                —                    3.64  (2.23 %)      2.43  (7.71 %)
write             —                —                    6.33  (3.88 %)      0.10  (0.32 %)
fsync             —                —                  137.84 (84.49 %)     24.26 (76.99 %)
Prepare Response  0.60  (6.92 %)   0.64 (15.72 %)       0.59  (0.36 %)      0.10  (0.32 %)
send              3.17 (36.56 %)   0.71 (17.44 %)       5.06  (3.10 %)      0.33  (1.05 %)
Other             0.55  (6.34 %)   0.46 (11.30 %)       2.12  (1.30 %)      0.52  (1.65 %)
Total             8.67 (σ = 2.55)  4.07 (σ = 0.44)    163.14 (σ = 13.68)   31.51 (σ = 1.91)

fsync dominates the durable-write path in both systems; Arrakis/P replaces Linux's write()+fsync() logging with an append to Caladan's persistent log (sketched after the plots below).
[Bar chart: Redis GET and SET throughput (k transactions/s, 0 to 300) for Linux, Arrakis/P, Arrakis/P [15 µs], and Linux/Caladan.]

[Plot fragment: throughput (k transactions/s) vs. number of CPU cores for Linux threads, Linux procs, and Arrakis/P.]
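A minimal sketch of the durable append behind those numbers, in the spirit of Caladan's persistent log. The names (vsa_write_flush, plog_append) are hypothetical, not Caladan's real API, and the VSA is stubbed with plain memory so the sketch runs standalone.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define VSA_SIZE (1u << 20)
static uint8_t vsa[VSA_SIZE];  /* stand-in for a virtual storage area */

/* Stub for the VSIC data path: write at an arbitrary offset/size, then
 * flush; completion would be signalled on a doorbell, with no kernel
 * crossing on the way. */
static int vsa_write_flush(uint64_t off, const void *buf, size_t len) {
    if (off + len > VSA_SIZE) return -1;
    memcpy(vsa + off, buf, len);
    return 0;
}

typedef struct { uint64_t tail; } plog_t;  /* log append cursor */

/* Append one [length][payload] record in a single write+flush, rather
 * than a write() system call followed by an fsync() through the kernel. */
static int plog_append(plog_t *log, const void *rec, uint32_t len) {
    uint8_t buf[4 + 512];
    if (len > sizeof buf - 4) return -1;
    memcpy(buf, &len, 4);          /* record header: payload length */
    memcpy(buf + 4, rec, len);     /* payload */
    if (vsa_write_flush(log->tail, buf, 4 + len) != 0) return -1;
    log->tail += 4 + len;
    return 0;
}

int main(void) {
    plog_t log = { 0 };
    const char *entry = "SET key value";
    if (plog_append(&log, entry, (uint32_t)strlen(entry)) == 0)
        printf("appended %zu bytes durably\n", strlen(entry));
    return 0;
}
```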
Conclusion
Control plane configures hardware to enforce policies
Direct application access to (virtualised) hardware
Specialised, application-level I/O stacks
Significant performance improvements
Merged back into Barrelfish mainline