Arrakis: The Operating System is the Control Plane
Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos,
Arvind Krishnamurthy, Thomas Anderson, Timothy Roscoe∗
University of Washington, ∗ETH Zurich
Symposium on Operating Systems Design and Implementation (OSDI), October 2014

1 / 10

Motivation

- Fast, low-latency hardware: Ethernet, SSD
- I/O-intensive applications
- OS on critical (data) path

[Figure 1 from the paper: Linux networking architecture and workflow.
Applications sit in userspace; the kernel network stack runs across cores
between them and the NIC's incoming and outgoing queues.]

System call duration, UDP echo (1,024 byte payload, 1,000 samples, times in µs):

                              Linux, receiver running   Linux, CPU idle
  Network stack    in             1.26 (37.6 %)           1.24 (20.0 %)
                   out            1.05 (31.3 %)           1.42 (22.9 %)
  Scheduler                       0.17  (5.0 %)           2.40 (38.8 %)
  Copy             in             0.24  (7.1 %)           0.25  (4.0 %)
                   out            0.44 (13.2 %)           0.55  (8.9 %)
  Kernel crossing  return         0.10  (2.9 %)           0.20  (3.3 %)
                   syscall        0.10  (2.9 %)           0.13  (2.1 %)
  Total                           3.36 (σ = 0.66)         6.19 (σ = 0.82)

From the paper (§2): "…there is an opportunity to rethink the POSIX API
for more streamlined networking. In addition to a POSIX compatible
interface, Arrakis provides a native interface (Arrakis/N) which supports
true zero-copy I/O."

From the paper (§2.2, Storage Stack Overheads): "To illustrate the overhead
of today's OS storage stacks, we conduct an experiment, where we execute
small write operations immediately followed by an fsync system call in a
tight loop of 10,000 iterations, measuring each operation's latency. We
store the file system on a RAM disk, so the measured latencies represent
purely CPU overhead."

2 / 10
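The §2.2 excerpt describes the storage microbenchmark precisely enough to
sketch it. Below is a minimal C sketch of that loop, assuming a file on a
RAM-disk-backed mount; the path /mnt/ramdisk/bench.dat and the 1,024-byte
write size are illustrative assumptions, not details taken from the paper.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERS 10000  /* tight loop of 10,000 iterations, as in the excerpt */

    static double now_us(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void) {
        char buf[1024];                 /* "small write"; 1,024 bytes is an assumption */
        memset(buf, 'x', sizeof(buf));
        /* Store the file on a RAM disk so latencies reflect CPU overhead only. */
        int fd = open("/mnt/ramdisk/bench.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }
        for (int i = 0; i < ITERS; i++) {
            double t0 = now_us();
            if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) { perror("write"); return 1; }
            if (fsync(fd) != 0) { perror("fsync"); return 1; }  /* force the write through the kernel storage stack */
            printf("%.2f\n", now_us() - t0);  /* per-operation latency in µs */
        }
        close(fd);
        return 0;
    }

Every iteration pays the full kernel storage stack even though no device
I/O happens, which is exactly the per-operation CPU cost the experiment
isolates.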
Goal

- Direct application-level access to I/O devices
- Maintain the properties of a classical server OS:
  isolation, rate limiting, global naming
- Use hardware virtualisation mechanisms
  - SR-IOV: virtual PCI devices with dedicated resources
    (standard for NICs, expected for RAID controllers)
  - Memory protection via the IOMMU
- Support only policies that hardware can implement efficiently

3 / 10

Arrakis — Hardware Model

- Virtual network interface card (VNIC)
  - Dedicated queues and rate limiters
  - Transmit and receive filters
  - Optional support for offloading
- Virtual storage interface controller (VSIC)
  - Virtual storage areas (VSAs)
  - Emulated by a dedicated CPU core
- Virtual interface cards are managed by the control plane

[Figure 3 from the paper: Arrakis architecture. Each application links a
library OS and talks to its VNIC (behind the NIC switch) and its VSIC
directly; the kernel control plane configures them, and the storage
controller maps VSAs to physical storage.]

From the paper: applications operate on their protected virtual device
instance without requiring kernel intervention. To perform these
operations, they rely on a user-level I/O stack provided as a library.
That stack can be tailored to the application, since it can assume
exclusive access to a virtualised device instance, allowing any features
not necessary for the application's functionality to be removed.

4 / 10

Arrakis — Control Plane

- Provides VICs, filters, VSAs, and rate specifiers
- IPC endpoints ("doorbells") associated with events on a VIC
- Virtual file system interfaces with directories exported by applications

5 / 10

Arrakis — Data Plane

- Applications use the (virtual) hardware directly
- Asynchronous operations; doorbells signal completion
- Optimised native interfaces with a POSIX-compatibility layer on top
- Network: send/receive packets to/from queues
  - Extaris: user-level network stack, partly based on lwIP
- Storage: read/write/flush within a VSA, at any offset and of arbitrary size
  - Caladan: library of persistent data structures (log, queue)

6 / 10
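To make the data-plane slide concrete, here is a hedged sketch of what a
queue-based, doorbell-driven network path could look like from the
application's side. All names below (struct vnic_qp, vnic_recv_poll,
vnic_send_post, vnic_doorbell_wait, pkt_desc) are invented for
illustration; this is not the Arrakis/Extaris API, only the shape of
interface the slide describes.

    #include <stdbool.h>
    #include <stdint.h>

    /* A packet descriptor pointing into application-owned buffer memory
     * that is also mapped for DMA, so received data can be retransmitted
     * without a copy. */
    struct pkt_desc {
        void    *buf;   /* application buffer */
        uint16_t len;   /* payload length in bytes */
    };

    /* Hypothetical queue-pair handle, mapped into the application's
     * address space by the control plane at setup time. */
    struct vnic_qp;

    /* Assumed primitives; in Arrakis the libos/VNIC would provide these. */
    extern bool vnic_recv_poll(struct vnic_qp *qp, struct pkt_desc *d);      /* true if a packet arrived */
    extern void vnic_send_post(struct vnic_qp *qp, const struct pkt_desc *d);/* enqueue for transmit */
    extern void vnic_doorbell_wait(struct vnic_qp *qp);                      /* block until the VNIC signals an event */

    /* Echo every received packet back out of the same buffer. */
    void echo_loop(struct vnic_qp *qp) {
        struct pkt_desc d;
        for (;;) {
            while (vnic_recv_poll(qp, &d))
                vnic_send_post(qp, &d);  /* zero-copy: reuse the receive buffer */
            vnic_doorbell_wait(qp);      /* sleep until the next doorbell fires */
        }
    }

The point of this shape: the receive buffer goes straight back onto the
transmit queue, so an echo involves no copies and no kernel crossings,
which is what the scheduler, copy, and kernel-crossing rows disappearing
in the Arrakis columns of the UDP echo table below reflect.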
Evaluation

- Implemented by extending Barrelfish
- 6-machine cluster for evaluation:
  - 6-core Intel Xeon E5-2430 (Sandy Bridge) @ 2.2 GHz
  - Intel X520 (82599-based) 10Gb Ethernet adapter
  - Intel MegaRAID RS3DC040 RAID controller
  - 100GB Intel DC S3700 SSD
- One machine executes the application, the others act as clients
- Baseline: "tuned" Ubuntu Linux 13.04 (Linux 3.8)

7 / 10

UDP Echo (1,024 byte payload, 1,000 samples, times in µs)

                                     Linux                          Arrakis
                         Receiver running   CPU idle          POSIX            Native
  Network stack  in        1.26 (37.6 %)   1.24 (20.0 %)   0.32 (22.3 %)   0.21 (55.3 %)
                 out       1.05 (31.3 %)   1.42 (22.9 %)   0.27 (18.7 %)   0.17 (44.7 %)
  Scheduler                0.17  (5.0 %)   2.40 (38.8 %)        —               —
  Copy           in        0.24  (7.1 %)   0.25  (4.0 %)   0.27 (18.7 %)       —
                 out       0.44 (13.2 %)   0.55  (8.9 %)   0.58 (40.3 %)       —
  Kernel         return    0.10  (2.9 %)   0.20  (3.3 %)        —               —
  crossing       syscall   0.10  (2.9 %)   0.13  (2.1 %)        —               —
  Total                    3.36 (σ = 0.66) 6.19 (σ = 0.82) 1.44 (σ < 0.01) 0.38 (σ < 0.01)

[Plot: throughput [k packets/s] vs. per-packet processing time [µs], 0 to 64,
for Linux, Arrakis/P, Arrakis/N, and the raw driver]

8 / 10

Redis (64k random keys, 1,024 byte value size, 1,000 samples, times in µs)

                               Read hit                        Durable write
                         Linux          Arrakis/P          Linux             Arrakis/P
  epoll               2.42 (27.91 %)  1.12 (27.52 %)     2.64  (1.62 %)    1.49  (4.73 %)
  recv                0.98 (11.30 %)  0.29  (7.13 %)     1.55  (0.95 %)    0.66  (2.09 %)
  Parse input         0.85  (9.80 %)  0.66 (16.22 %)     2.34  (1.43 %)    1.19  (3.78 %)
  Lookup/set key      0.10  (1.15 %)  0.10  (2.46 %)     1.03  (0.63 %)    0.43  (1.36 %)
  Log: marshaling          —               —             3.64  (2.23 %)    2.43  (7.71 %)
  Log: write               —               —             6.33  (3.88 %)    0.10  (0.32 %)
  Log: fsync               —               —           137.84 (84.49 %)   24.26 (76.99 %)
  Prepare response    0.60  (6.92 %)  0.64 (15.72 %)     0.59  (0.36 %)    0.10  (0.32 %)
  send                3.17 (36.56 %)  0.71 (17.44 %)     5.06  (3.10 %)    0.33  (1.05 %)
  Other               0.55  (6.34 %)  0.46 (11.30 %)     2.12  (1.30 %)    0.52  (1.65 %)
  Total               8.67 (σ = 2.55) 4.07 (σ = 0.44)  163.14 (σ = 13.68) 31.51 (σ = 1.91)

[Plot: throughput [k transactions/s] for GET and SET:
Linux, Arrakis/P, Arrakis/P [15 µs], Linux/Caladan]

9 / 10
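The durable-write column is dominated by forcing the log to stable
storage: fsync is 84 % of the request on Linux, and the equivalent step is
still 77 % on Arrakis but at roughly a sixth of the absolute cost. As a
hedged illustration of the two paths, here is a sketch; the caladan_*
names are invented stand-ins, not the published Caladan API.

    #include <stddef.h>
    #include <unistd.h>

    /* Linux path, as measured in the table: the write itself is cheap,
     * fsync through the kernel storage stack dominates. */
    void durable_append_linux(int fd, const void *rec, size_t len) {
        if (write(fd, rec, len) != (ssize_t)len) return;  /* 6.33 µs in the table */
        fsync(fd);                                        /* 137.84 µs: 84 % of the request */
    }

    /* Hypothetical user-level path: append into a virtual storage area
     * (VSA) and wait on the device's completion doorbell, with no kernel
     * on the data path. These names are assumptions for illustration. */
    struct caladan_log;
    extern void caladan_log_append(struct caladan_log *lg, const void *rec, size_t len);
    extern void caladan_log_wait_durable(struct caladan_log *lg);

    void durable_append_arrakis(struct caladan_log *lg, const void *rec, size_t len) {
        caladan_log_append(lg, rec, len);    /* 0.10 µs "write" row in the table */
        caladan_log_wait_durable(lg);        /* ~24 µs durability wait in the table */
    }

The Linux/Caladan bar in the throughput plot above suggests most of the
win comes from this storage path rather than the network path: running the
persistent log library on Linux already recovers much of the gap.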
Conclusion

- Control plane configures hardware to enforce policies
- Direct application access to (virtualised) hardware
- Specialised, application-level I/O stacks
- Significant performance improvements
- Merged back into Barrelfish mainline

10 / 10