The nom Profit-Maximizing Operating System

The nom Profit-Maximizing Operating System
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
Lightbits Labs
Technion—Israel Institute of Technology
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
1 / 43
This talk is about money
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
2 / 43
In the clouds
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
3 / 43
It all began a few years ago, . . .
The Resource-as-a-service (RaaS) Cloud
[HotCloud’12,VEE’14,CACM]
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
4 / 43
Trend towards RaaS: Temporal Granularity
3 years on average: buying hardware
Months: web hosting
Hours: EC2 on-demand (pay-as-you-go)
5 minutes: EC2 Spot Instances (pay-as-you-go)
3 minutes: GridSpot
1 minute: Profitbricks, Google Compute Engine
...
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
5 / 43
Trend towards RaaS: Spatial Granularity
Amazon allows clients to dynamically add and remove I/O devices
Amazon allows clients to set a desired I/O rate on a
per-block-instance basis
GridSpot, ProfitBricks and CloudSigma let their clients compose a
flexible bundle—with prices depending on current cost of
resources
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
6 / 43
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
7 / 43
RaaS leads to the need in automatic economic mechanisms inside the
machine, and it’s happening today: CloudSigma’s burst pricing
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
8 / 43
CloudSigma’s burst prices, a typical day in July 2012
Image by Kristof Kovacs, taken from
http://kkovacs.eu/cloudsigma-burst-price-chart
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
9 / 43
CloudSigma’s burst RAM prices, May 2015–Jan 2016
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
10 / 43
CloudSigma’s burst CPU prices, May 2015–Jan 2016
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
11 / 43
Dynamic pricing is great. . .
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
12 / 43
Except for the fly in the ointment
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
13 / 43
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
14 / 43
For traditional operating systems, performance is everything
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
15 / 43
Costs? They don’t care about no costs
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
16 / 43
Dynamic pricing clouds mandate change
Resource ownership and control the OS is no longer sole owner
Economic model Cloud provider vs. various clients
Resource granularity fine-grained resource allocation
Architectural support physical and virtual hardware
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
17 / 43
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
18 / 43
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
19 / 43
A profit maximizing operating system
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
20 / 43
Requirements
Maximize profit optimize for both performance and cost
Expose resources move the kernel out of the way
Isolate applications . . . with a little help from hardware friends
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
21 / 43
nom
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
22 / 43
Design and Implementation
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
23 / 43
Networking: private application stacks
Applications run with private network stacks and device drivers.
nom provides default TCP/IP stack and a device driver for the virtio
virtnet virtual device.
Applications configure the stack and driver independently.
Potentially: application-specific optimizations to stacks and
drivers.
Applications choose polling vs. interrupt-driven operations: trade
off CPU cycles and power for better performance
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
24 / 43
Networking: batching delay
Application controls network stack and device driver behavior on
receive and transmit via the batching delay
Batching delay 0: no delay, every packet is run to completion
Batching delay Wµsec : batch packets together but don’t let any
packet wait more than Wµsec
Balancing throughput, latency, and jitter
Throughput increases with W (good) up to a point
Latency and jitter increase with W (bad!)
What’s the right batching delay? Depends!
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
25 / 43
Utility of network bandwidth=Income-Expenses
Goal: Maximize the profit of a server running in the guest OS.
Income: depends on its performance and client SLA.
Expenses: depend on its resource consumption and on current
prices.
Given: price of bandwidth and application load
Means: throughput, latency, and jitter
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
26 / 43
Assumptions on Utility
Assumption: network server’s utility increases as throughput rises
1
Can serve more users
2
Can serve users faster
Assumption: server’s utility decreases as latency and jitter rise
1
Lower quality of service
2
More likely to violate SLO’s
Assumption: server’s utility decreases as the price of bandwidth rises
1
Bandwidth becomes more expensive
2
Expenses rise ⇒ lower profits
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
27 / 43
The bonus utility function
Ubonus = T · (α +
γ
− P)
L
Key idea: users give the server bonuses for better latency
T throughput in
Gbit
s
or application operations/second
α server’s valuation of useful bandwidth ($/Gbit or $/operation)
γ size of the bonus
P bandwidth price in $/Gbit or $/operation
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
28 / 43
The refund utility function
Urefund = T · (max(0, α − β · L) − P)
Key idea: the server gives its users refunds as latency increases
T throughput in
Gbit
s
or application operations/second
α server’s valuation of useful bandwidth ($/Gbit or $/operation)
β extent of the refund
P bandwidth price in $/Gbit or $/operation
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
29 / 43
The penalty utility function
Upenalty = T · (α · (1 − min(1, X · N (L0 , L, σ))) − P)
Key idea: server pays penalties when it fails to meet SLAs
T throughput in
Gbit
s
or application operations/second
α server’s valuation of useful bandwidth ($/Gbit or $/operation)
X penalty factor for not meeting users’ SLOs
L mean latency (µsecs)
L0 the maximal latency allowed by the SLAs
σ the latency’s standard deviation (jitter)
N (L0 , L, σ) the probability that a latency sample will not meet the latency SLO
P bandwidth price in $/Gbit or $/operation
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
30 / 43
Implementation
x86-64 SMP virtual machines
Direct access to I/O devices
Runs on bare-metal too
C and (a little) assembly
Applications:
memcached key-value store
nhttpd web server
NetPIPE network benchmark
Penalty, refund, and bonus utility
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
31 / 43
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
32 / 43
Methodology: key questions
1
Does optimizing for cost preclude optimizing for performance?
2
Does optimizing for both cost and performance improve profit?
3
Is changing behavior at runtime important for maximizing profits?
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
33 / 43
Methodology: experiment design
memcached, nhttpd, and NetPIPE running in Linux, OSv, and
nom virtual machines with the same∗ resources
Each experiment runs for 120 seconds
First 60 seconds:
Serving many users together
Cloud provider charges $1/Gb for bandwidth
Second 60 seconds:
Serving a single important user at a time
Cloud provider charges $10/Gb for bandwidth
Results reported are averages of 5 runs
nom applications know throughput, latency, jitter as a function of
batching delay and change it when conditions change
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
34 / 43
40
110
38
105
100
36
95
34
90
32
85
30
80
75
28
70
26
65
throughput
latency
24
0
5
10
15
20
25
30
batching delay 'w' [usec]
latency [usec]
throughput [1K ops/s]
Effect of batching on throughput and latency
60
35
40
Figure : memcached throughput and latency as a function of batching delay
1
Throughput rises with batching delay up to a point
2
Latency is best with batching delay 0 (no delay)
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
35 / 43
Pareto frontier
110
latency [usec]
105
100
32
95
90
85
80
75
70
0
65
60
24
24
26
10
8
6
28
30
32
34
36
throughput [1K ops/s]
38
14
12
40
Figure : The memcached throughput/latency Pareto frontier
1
2
3
No single “best” batching delay setting
⇒ No single “best” stack or driver
⇒ Applications can pick the right working point according to
current price and load
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
36 / 43
48
100
24
0
0
0
1.29x
0.27x
4
3
2
1
2.47x
0
Up to 1.29x better throughput and
latency than Linux
250
200
150
1.11x 1.22x
100
50
0
nom
200
nom
72
OSv
300
1.4
0
5
0.93x
96
2.8
0.32x
2
OSv
120
4
Linux
1.01x
0.91x
4.2
latency [usec]
0.93x
400
1.16x
5.6
6
nom
500
8
OSv
0
nhttpd single
7
1.23x
Linux
10
latency [sec]
20
0.93x
nhttpd many
10
nom
30
1.28x
OSv
40
memcached single
18
15
12
9
6
3
0
Linux
0.93x 1.01x
throughput [1K reqs/s]
memcached many
50
Linux
latency [usec]
throughput [1K ops/s]
How well does nom perform?
1.2x–3.9x better throughput and up to
9.1x better latency than Linux and OSv
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
37 / 43
What makes nom fast?
1
Hypervisor friendly: 10K-20K exits vs. 43K-90K for Linux and OSv
2
Polling: 1K interrupts vs. 12K-22K for Linux and OSv
3
Zero-copy transmit and receive, no allocations on data path
4
Application-specific configuration of stacks and drivers
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
38 / 43
Does nom profit more?
1.04x
0.93x
1
11.14x
bonus utility
1.5
0.75
0.9
0.5
0.6
0.25
0.3
0
0
Linux
Linux
nom
Linux
nom
0.00x
OSv
0.5
0.25
0
1.12x
0.91x
1.2
nom
1
0.75
OSv
Profit [$1M]
refund utility
1.25
OSv
penalty utility
1.25
memcached: nom yields up to 11x more profit
refund utility
275
220
0
195
0.57x
130
0.40x
65
0
nom
0
1.25x
260
OSv
55
1.81x
nom
110
12
nom
24
OSv
165
OSv
1.00x
36
bonus utility
325
Linux
48
Linux
Profit [$1K]
1.30x
Linux
penalty utility
60
nhttpd: nom yields up to 1.8x more profit
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
39 / 43
What makes nom profitable?
bonus utility
1.2
0.14x
0.5
0.6
0.3
0
0
adp
lat
tpt
0.92x
0.73x
0.9
0.25
0
adp
0.69x
adp
0.5
0.25
0.97x
0.75
lat
0.82x
0.75
lat
1.5
1
tpt
refund utility
1.25
1
tpt
Profit [$1M]
penalty utility
1.25
memcached static vs. adaptive
bonus utility
275
220
0.87x0.78x
165
165
0
lat
55
0
tpt
110
55
0
adp
110
12
lat
24
tpt
0.90x
0.80x
220
lat
36
refund utility
275
tpt
48
adp
Profit [$1K]
0.94x
0.73x
adp
penalty utility
60
nhttpd static vs. adaptive
Runtime adaptation is a must for maximizing profit
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
40 / 43
Conclusions
Clouds with dynamic pricing present
challenges and opportunities.
Traditional OSes are a poor fit for such
clouds because they ignore costs.
We present nom, a profit-maximizing
operating system.
In nom there is no “best” software stack:
Application-specific stacks and drivers
Use utility functions to adapt runtime
behavior and maximize profit.
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
41 / 43
Thank you! Questions?
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
42 / 43
Related work
Userspace I/O and virtual machine device assignment
The Exokernel [SOSP’95,SOSP’97,TOCS’02]
Library operating systems [VEE’07,ASPLOS’11]
Dune [OSDI’12] and IX [OSDI’14]
Mirage [HotCloud’10], [ASPLOS’13]
NoHype [ISCA’10]
Arrakis [HotOS’13], [OSDI’14]
RaaS [HotCloud’12],[VEE’14],[CACM]
Muli Ben-Yehuda, Orna Agmon Ben-Yehuda, Dan Tsafrir
The nom Profit-Maximizing OS
VEE, April, 2016
43 / 43