IX: A Protected Dataplane Operating System for High Throughput and Low Latency
By: Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion
Presented by: Xianghan Pei
EECS 582 – W16

Outline
• Overview
• Motivation
• IX Design
• IX Implementation
• Evaluation

Overview
• OSDI 2014 Best Paper
• Extension of the Dune work (Dune is an extension to Linux that provides safe and efficient access to kernel-only CPU features)
• Some slides are borrowed from the IX conference slides

Challenges for Datacenter Applications
• Microsecond Tail Latency
• High Packet Rates
• Protection
• Resource Efficiency

Motivation
• HW is fast, but SW is a bottleneck
[Figure: 64-byte TCP Echo]

Motivation
• Why is SW slow?
• HW and workloads are changing
  • Dense multicore + 10 GbE
  • Scale-out workloads
• Berkeley sockets were designed for CPU time sharing
• Complex interface and code paths convoluted by interrupts and scheduling
• Designed on the assumption that packet inter-arrival times are many times higher than the latency of interrupts and system calls

Motivation
• IX closes the SW performance gap
• Protection and direct HW access through virtualization
• Execution model for low latency and high throughput
[Figure: 64-byte TCP Echo]

IX Design
• Separation and protection of control and data plane
  • Control plane is responsible for resource configuration, provisioning, etc.
  • Data plane runs the networking stack and application logic
• Run to completion with adaptive batching
  • Data and instruction cache locality
• Native, zero-copy API with explicit flow control
• Flow-consistent, synchronization-free processing

IX Implementation
• Separation of Control and Data Plane

IX Implementation
• Run to Completion with Adaptive Batching

IX Implementation
• Dataplane details
• Different memory management
  • Large blocks, internal memory fragmentation allowed, memory swapping turned off
• Hierarchical timing wheel implementation for timers such as TCP retransmission timeouts
• Redesigned API

IX Implementation
• Dataplane API and Operation

IX Implementation
• Multi-core scalability
  • Elastic threads operate in a synchronization-free and coherence-free manner
  • API calls commute
  • Flow-consistent hashing at the NICs
  • Small number of shared structures
• Security model
  • A malicious application cannot corrupt the networking stack or other applications

Evaluation
• Comparison of IX with Linux and mTCP
• TCP microbenchmarks and Memcached

Evaluation
• NetPIPE performance

Evaluation
• Multicore Scalability for Short Connections

Evaluation
• n round-trips per connection

Evaluation
• Different message sizes s (n=1)

Evaluation
• Connection Scalability

Evaluation
• Memcached over TCP

Discussion
• Why is IX fast?
• Subtleties of adaptive batching
• Limitations

Conclusion
• A protected dataplane OS for datacenter applications with an event-driven model and demanding connection scalability requirements
• Efficient access to HW, without sacrificing security, through virtualization
• High throughput and low latency enabled by a dataplane execution model

Conventional Wisdom
• Bypass the kernel
  • Move TCP to user-space (Onload, mTCP, Sandstorm)
  • Move TCP to hardware (TOE)
• Avoid the connection scalability bottleneck
  • Use datagrams instead of connections (DIY congestion management)
  • Use proxies at the expense of latency
• Replace classic Ethernet
  • Use a lossless fabric (InfiniBand)
  • Offload memory access (RDMA)
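
Example: Run to Completion with Adaptive Batching
The run-to-completion model with adaptive batching named on the IX Design slide can be illustrated with a small simulation. This is a minimal sketch, not IX code: the names `run_to_completion`, `echo_handler`, and `MAX_BATCH`, and the bound of 4, are hypothetical and chosen only for illustration. It assumes that "adaptive" means the loop takes whatever is already queued, up to a bound, and never waits for a batch to fill.

```python
from collections import deque

MAX_BATCH = 4  # hypothetical upper bound on batch size, for illustration only


def echo_handler(payload):
    """Hypothetical application logic: echo the payload back."""
    return payload


def run_to_completion(rx_queue, handler, max_batch=MAX_BATCH):
    """Drain rx_queue in adaptive batches.

    Returns (responses, batch_sizes) so the batching behaviour is visible.
    """
    responses = []
    batch_sizes = []
    while rx_queue:
        # Adaptive: take only what is already queued, capped by the bound.
        # The loop never waits for more packets to arrive.
        n = min(len(rx_queue), max_batch)
        batch = [rx_queue.popleft() for _ in range(n)]

        # Run to completion: the whole batch passes through every stage
        # before any new packets are taken from the queue.
        received = list(batch)                      # stand-in for protocol processing
        replies = [handler(p) for p in received]    # application logic
        responses.extend(replies)                   # stand-in for transmit

        batch_sizes.append(n)
    return responses, batch_sizes
```

With a single queued packet the loop processes a batch of one immediately, which is the low-latency case. With ten queued packets and a bound of four it forms batches of 4, 4, and 2, so batching appears only when work has piled up.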