The nom Profit-Maximizing Operating System Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir Lightbits Labs Technion—Israel Institute of Technology Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 1 / 43 This talk is about money Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 2 / 43 In the clouds Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 3 / 43 It all began a few years ago, . . . The Resource-as-a-service (RaaS) Cloud [HotCloud’12,VEE’14,CACM] Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 4 / 43 Trend towards RaaS: Temporal Granularity 3 years on average: buying hardware Months: web hosting Hours: EC2 on-demand (pay-as-you-go) 5 minutes: EC2 Spot Instances (pay-as-you-go) 3 minutes: GridSpot 1 minute: Profitbricks, Google Compute Engine ... Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 5 / 43 Trend towards RaaS: Spatial Granularity Amazon allows clients to dynamically add and remove I/O devices Amazon allows clients to set a desired I/O rate on a per-block-instance basis GridSpot, ProfitBricks and CloudSigma let their clients compose a flexible bundle—with prices depending on current cost of resources Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 6 / 43 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 7 / 43 RaaS leads to the need in automatic economic mechanisms inside the machine, and it’s happening today: CloudSigma’s burst pricing Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 8 / 43 CloudSigma’s burst prices, a typical day in July 2012 Image by Kristof Kovacs, taken from http://kkovacs.eu/cloudsigma-burst-price-chart Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 9 / 43 CloudSigma’s burst RAM prices, May 2015–Jan 2016 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 10 / 43 CloudSigma’s burst CPU prices, May 2015–Jan 2016 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 11 / 43 Dynamic pricing is great. . . Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 12 / 43 Except for the fly in the ointment Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 13 / 43 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 14 / 43 For traditional operating systems, performance is everything Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 15 / 43 Costs? They don’t care about no costs Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 16 / 43 Dynamic pricing clouds mandate change Resource ownership and control the OS is no longer sole owner Economic model Cloud provider vs. various clients Resource granularity fine-grained resource allocation Architectural support physical and virtual hardware Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 17 / 43 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 18 / 43 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 19 / 43 A profit maximizing operating system Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 20 / 43 Requirements Maximize profit optimize for both performance and cost Expose resources move the kernel out of the way Isolate applications . . . with a little help from hardware friends Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 21 / 43 nom Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 22 / 43 Design and Implementation Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 23 / 43 Networking: private application stacks Applications run with private network stacks and device drivers. nom provides default TCP/IP stack and a device driver for the virtio virtnet virtual device. Applications configure the stack and driver independently. Potentially: application-specific optimizations to stacks and drivers. Applications choose polling vs. interrupt-driven operations: trade off CPU cycles and power for better performance Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 24 / 43 Networking: batching delay Application controls network stack and device driver behavior on receive and transmit via the batching delay Batching delay 0: no delay, every packet is run to completion Batching delay Wµsec : batch packets together but don’t let any packet wait more than Wµsec Balancing throughput, latency, and jitter Throughput increases with W (good) up to a point Latency and jitter increase with W (bad!) What’s the right batching delay? Depends! Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 25 / 43 Utility of network bandwidth=Income-Expenses Goal: Maximize the profit of a server running in the guest OS. Income: depends on its performance and client SLA. Expenses: depend on its resource consumption and on current prices. Given: price of bandwidth and application load Means: throughput, latency, and jitter Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 26 / 43 Assumptions on Utility Assumption: network server’s utility increases as throughput rises 1 Can serve more users 2 Can serve users faster Assumption: server’s utility decreases as latency and jitter rise 1 Lower quality of service 2 More likely to violate SLO’s Assumption: server’s utility decreases as the price of bandwidth rises 1 Bandwidth becomes more expensive 2 Expenses rise ⇒ lower profits Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 27 / 43 The bonus utility function Ubonus = T · (α + γ − P) L Key idea: users give the server bonuses for better latency T throughput in Gbit s or application operations/second α server’s valuation of useful bandwidth ($/Gbit or $/operation) γ size of the bonus P bandwidth price in $/Gbit or $/operation Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 28 / 43 The refund utility function Urefund = T · (max(0, α − β · L) − P) Key idea: the server gives its users refunds as latency increases T throughput in Gbit s or application operations/second α server’s valuation of useful bandwidth ($/Gbit or $/operation) β extent of the refund P bandwidth price in $/Gbit or $/operation Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 29 / 43 The penalty utility function Upenalty = T · (α · (1 − min(1, X · N (L0 , L, σ))) − P) Key idea: server pays penalties when it fails to meet SLAs T throughput in Gbit s or application operations/second α server’s valuation of useful bandwidth ($/Gbit or $/operation) X penalty factor for not meeting users’ SLOs L mean latency (µsecs) L0 the maximal latency allowed by the SLAs σ the latency’s standard deviation (jitter) N (L0 , L, σ) the probability that a latency sample will not meet the latency SLO P bandwidth price in $/Gbit or $/operation Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 30 / 43 Implementation x86-64 SMP virtual machines Direct access to I/O devices Runs on bare-metal too C and (a little) assembly Applications: memcached key-value store nhttpd web server NetPIPE network benchmark Penalty, refund, and bonus utility Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 31 / 43 Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 32 / 43 Methodology: key questions 1 Does optimizing for cost preclude optimizing for performance? 2 Does optimizing for both cost and performance improve profit? 3 Is changing behavior at runtime important for maximizing profits? Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 33 / 43 Methodology: experiment design memcached, nhttpd, and NetPIPE running in Linux, OSv, and nom virtual machines with the same∗ resources Each experiment runs for 120 seconds First 60 seconds: Serving many users together Cloud provider charges $1/Gb for bandwidth Second 60 seconds: Serving a single important user at a time Cloud provider charges $10/Gb for bandwidth Results reported are averages of 5 runs nom applications know throughput, latency, jitter as a function of batching delay and change it when conditions change Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 34 / 43 40 110 38 105 100 36 95 34 90 32 85 30 80 75 28 70 26 65 throughput latency 24 0 5 10 15 20 25 30 batching delay 'w' [usec] latency [usec] throughput [1K ops/s] Effect of batching on throughput and latency 60 35 40 Figure : memcached throughput and latency as a function of batching delay 1 Throughput rises with batching delay up to a point 2 Latency is best with batching delay 0 (no delay) Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 35 / 43 Pareto frontier 110 latency [usec] 105 100 32 95 90 85 80 75 70 0 65 60 24 24 26 10 8 6 28 30 32 34 36 throughput [1K ops/s] 38 14 12 40 Figure : The memcached throughput/latency Pareto frontier 1 2 3 No single “best” batching delay setting ⇒ No single “best” stack or driver ⇒ Applications can pick the right working point according to current price and load Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 36 / 43 48 100 24 0 0 0 1.29x 0.27x 4 3 2 1 2.47x 0 Up to 1.29x better throughput and latency than Linux 250 200 150 1.11x 1.22x 100 50 0 nom 200 nom 72 OSv 300 1.4 0 5 0.93x 96 2.8 0.32x 2 OSv 120 4 Linux 1.01x 0.91x 4.2 latency [usec] 0.93x 400 1.16x 5.6 6 nom 500 8 OSv 0 nhttpd single 7 1.23x Linux 10 latency [sec] 20 0.93x nhttpd many 10 nom 30 1.28x OSv 40 memcached single 18 15 12 9 6 3 0 Linux 0.93x 1.01x throughput [1K reqs/s] memcached many 50 Linux latency [usec] throughput [1K ops/s] How well does nom perform? 1.2x–3.9x better throughput and up to 9.1x better latency than Linux and OSv Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 37 / 43 What makes nom fast? 1 Hypervisor friendly: 10K-20K exits vs. 43K-90K for Linux and OSv 2 Polling: 1K interrupts vs. 12K-22K for Linux and OSv 3 Zero-copy transmit and receive, no allocations on data path 4 Application-specific configuration of stacks and drivers Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 38 / 43 Does nom profit more? 1.04x 0.93x 1 11.14x bonus utility 1.5 0.75 0.9 0.5 0.6 0.25 0.3 0 0 Linux Linux nom Linux nom 0.00x OSv 0.5 0.25 0 1.12x 0.91x 1.2 nom 1 0.75 OSv Profit [$1M] refund utility 1.25 OSv penalty utility 1.25 memcached: nom yields up to 11x more profit refund utility 275 220 0 195 0.57x 130 0.40x 65 0 nom 0 1.25x 260 OSv 55 1.81x nom 110 12 nom 24 OSv 165 OSv 1.00x 36 bonus utility 325 Linux 48 Linux Profit [$1K] 1.30x Linux penalty utility 60 nhttpd: nom yields up to 1.8x more profit Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 39 / 43 What makes nom profitable? bonus utility 1.2 0.14x 0.5 0.6 0.3 0 0 adp lat tpt 0.92x 0.73x 0.9 0.25 0 adp 0.69x adp 0.5 0.25 0.97x 0.75 lat 0.82x 0.75 lat 1.5 1 tpt refund utility 1.25 1 tpt Profit [$1M] penalty utility 1.25 memcached static vs. adaptive bonus utility 275 220 0.87x0.78x 165 165 0 lat 55 0 tpt 110 55 0 adp 110 12 lat 24 tpt 0.90x 0.80x 220 lat 36 refund utility 275 tpt 48 adp Profit [$1K] 0.94x 0.73x adp penalty utility 60 nhttpd static vs. adaptive Runtime adaptation is a must for maximizing profit Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 40 / 43 Conclusions Clouds with dynamic pricing present challenges and opportunities. Traditional OSes are a poor fit for such clouds because they ignore costs. We present nom, a profit-maximizing operating system. In nom there is no “best” software stack: Application-specific stacks and drivers Use utility functions to adapt runtime behavior and maximize profit. Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 41 / 43 Thank you! Questions? Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 42 / 43 Related work Userspace I/O and virtual machine device assignment The Exokernel [SOSP’95,SOSP’97,TOCS’02] Library operating systems [VEE’07,ASPLOS’11] Dune [OSDI’12] and IX [OSDI’14] Mirage [HotCloud’10], [ASPLOS’13] NoHype [ISCA’10] Arrakis [HotOS’13], [OSDI’14] RaaS [HotCloud’12],[VEE’14],[CACM] Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir The nom Profit-Maximizing OS VEE, April, 2016 43 / 43
© Copyright 2026 Paperzz