Let Me Contain That For You

Victor Marmol ([email protected])
Rohit Jnagal ([email protected])
Google Confidential and Proprietary
Containers @ Google
● Early users: Scaling process management and isolation.
● What: Linux cgroups + user-space policies and monitoring.
● Everywhere: SaaS, PaaS, IaaS; Private and Public clouds.
● Containerizing shared machines
  ○ Asymmetric workloads: Latency, bandwidth, and priority
  ○ Asymmetric isolation
  ○ High churn
● Goals:
  ○ Performance guarantees.
  ○ High utilization across resources.
  ○ Shared resources.
  ○ Overcommitment: Invisible workload from reclaimed resources.
  ○ Near zero overhead.
● Other use cases: ChromeOS et al
LPC 2013
A Shared Google Machine
[Figure: diagram of a shared machine. Labels: TASKS, BACKGROUND, I/O:CPU:Mem Sensitive, Front End Job, Back End Job, Allocation, System Daemons, Batch workload, Soaker workload.]
Resource Isolation
● Quality of service
  ○ Bandwidth - Fair share, progress guarantees, availability.
  ○ Latency - wakeup, allocation, access times
  ○ Priority - Order of importance.
  ○ Performance: Microarchitecture interference (CPI2); Locality
● Solution:
  ○ Scheduling a good mix.
  ○ Hierarchical resource management for effective sharing.
  ○ Maximize utilization across all dimensions.
  ○ Cgroup-aware tasks:
    ■ User subcontainers [e.g. Query management]
    ■ User schedulers.
    ■ Self-correcting tasks: Notifications
Scalability
● Churn
  ○ 1 Creation/Deletion per 10 seconds
● Per Container
  ○ Read: O(10) cgroup-based stats per second
  ○ Write: O(1) cgroup-based param per second
● Per Machine
  ○ O(100) containers
  ○ Looks to grow dramatically
● Overall
  ○ Read: 1000s per second
  ○ Write: 100s per second
● Users can do a lot more.
● Precise accounting for chargeback
● Monitoring built in at multiple layers
● Extremely low overhead
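As a rough worked example of how the per-container figures add up to the overall rates, the sketch below multiplies the order-of-magnitude numbers from this slide. The concrete values (10 reads, 1 write, 100 containers) are illustrative stand-ins for the O(10), O(1), and O(100) estimates, not measurements.

```python
# Order-of-magnitude stand-ins for the O(10), O(1), and O(100) figures above.
READS_PER_CONTAINER_PER_SEC = 10
WRITES_PER_CONTAINER_PER_SEC = 1
CONTAINERS_PER_MACHINE = 100


def machine_rates(containers, reads_each, writes_each):
    """Return (reads/s, writes/s) summed over all containers on one machine."""
    return containers * reads_each, containers * writes_each


reads, writes = machine_rates(
    CONTAINERS_PER_MACHINE,
    READS_PER_CONTAINER_PER_SEC,
    WRITES_PER_CONTAINER_PER_SEC,
)
print(f"reads/s ~ {reads}, writes/s ~ {writes}")
```

With these stand-in values the totals land in the thousands of reads and hundreds of writes per second per machine, which is the scale the "Overall" bullet quotes.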
Let Me Contain That For You
● Revised container management
  ○ Separate cgroup abstraction from policies.
  ○ Configuring cgroups with an intent-based resource specification.
● Built for scalability and parallel access.
● Also includes extra kernel patches for:
  ○ Improving resource isolation.
  ○ Providing tighter performance guarantees.
  ○ Precise accounting in face of sharing.
  ○ Cap for global resources.
● Allow users to create subcontainers with restrictions.
● Open-source: Sharing use-cases, problems, and benchmarks.
● Implement policies in a higher layer:
  ○ Continuous monitoring and fine-tuning.
  ○ No critical loops [Remember LPC2011?]
  ○ Machine-level utilization and isolation management.
  ○ Isolated from system APIs.
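To make "intent-based resource specification" concrete, here is a minimal sketch of the idea: the caller states what a task needs, and a separate layer translates that into cgroup control-file values. The `ResourceIntent` type, the `to_cgroup_params` function, and the 1024-shares-per-core convention are assumptions made for illustration; the talk does not describe the actual specification format. `cpu.shares` and `memory.limit_in_bytes` are cgroup v1 control files.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed convention for this sketch only: one core maps to 1024 CPU shares.
CPU_SHARES_PER_CORE = 1024


@dataclass(frozen=True)
class ResourceIntent:
    """Hypothetical intent: what the task needs, not which files to write."""
    cpu_cores: float
    memory_bytes: int


def to_cgroup_params(intent: ResourceIntent) -> Dict[str, str]:
    """Translate an intent into cgroup v1 control-file names and values."""
    return {
        "cpu.shares": str(int(intent.cpu_cores * CPU_SHARES_PER_CORE)),
        "memory.limit_in_bytes": str(intent.memory_bytes),
    }


params = to_cgroup_params(ResourceIntent(cpu_cores=2.0, memory_bytes=4 * 1024**3))
print(params)
```

Keeping the translation in one function is what lets the policy layer change (for example, a different shares convention) without touching callers that only express intent.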
Hierarchical Sharing
An allocation A1 with two tasks T1 and T2
/dev/cgroup/cpu/A1 [2048]
    T1 [1536]
    T2 [512]
/dev/cgroup/mem/A1 [4G]
    T1 [2G]
    T2 [3G]
Task running in an allocation sharing resources with co-located siblings.
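The numbers in the A1 example can be read as follows, assuming the bracketed CPU values are proportional shares and the bracketed memory values are limits in gigabytes (the slide does not label the units). Under that reading, T1 and T2 split A1's CPU in proportion to their shares when both are busy, and their memory limits together exceed A1's limit, so the siblings share the parent's memory rather than partition it. The helper names are hypothetical.

```python
def cpu_fractions(shares):
    """Fraction of the parent's CPU each sibling gets when all are runnable."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


def memory_overcommit_gb(parent_limit_gb, child_limits_gb):
    """How far the children's limits exceed the parent's limit (<= 0 means none)."""
    return sum(child_limits_gb.values()) - parent_limit_gb


cpu = cpu_fractions({"T1": 1536, "T2": 512})
over = memory_overcommit_gb(4, {"T1": 2, "T2": 3})
print(cpu, over)
```

Here T1 gets three quarters of A1's CPU under contention, and the children's memory limits add up to 1G more than A1's 4G.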
Managing priority across resources
Block I/O: Default [0.1], T1 [0.1], T2 [0.8], T3 [0.1]
Cpu: Default [2], T1 [512], T2 [1024], T3 [256]
Memory: T1 [1G], T2 [2G], T3 [1G]
Legend: cgroups for a latency sensitive task; cgroups for low-priority batch tasks
Managing priority across resources
Block I/O: Default [0.1], T1 [0.8] (subcontainers: T1 [0.3], T1 [PRIO] [0.5]), T2 [0.1]
Cpu: Default [2], T1 [2048], T2 [1024]
Memory: T1 [4G], T2 [2G]
Legend: cgroups for a high I/O priority latency sensitive task; cgroups for a low priority task
A task may require multiple containers for the same resource to balance its
workload priorities. I/O server T1 uses two subcontainers to differentiate incoming
I/O requests and moves threads to the right subcontainer.
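A minimal sketch of the "move threads to the right subcontainer" step, assuming a cgroup v1 hierarchy, where writing a thread ID to a cgroup's `tasks` file moves that thread into the cgroup. The subcontainer names and the `move_current_thread` helper are hypothetical, and the `demo` function runs against a temporary directory standing in for the real block I/O hierarchy, so no thread is actually moved there.

```python
import tempfile
import threading
from pathlib import Path

# Hypothetical subcontainer names; the talk does not name them.
SUBCONTAINERS = {"high": "T1/prio", "normal": "T1/normal"}


def move_current_thread(cgroup_root: Path, io_class: str) -> Path:
    """Write the calling thread's ID to the chosen subcontainer's tasks file.

    On a cgroup v1 mount this write moves the thread. The root is a parameter
    so the sketch can be exercised against an ordinary directory.
    """
    tasks_file = cgroup_root / SUBCONTAINERS[io_class] / "tasks"
    tasks_file.write_text(str(threading.get_native_id()))
    return tasks_file


def demo() -> str:
    """Exercise the helper against a temporary stand-in directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in SUBCONTAINERS.values():
            (root / name).mkdir(parents=True)
        return move_current_thread(root, "high").read_text()


print(demo())
```

An I/O server following this pattern would classify each incoming request first and call the helper from the worker thread that is about to serve it.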
Splitting hierarchies for performance
Block I/O: Default [0.1], T1 [0.8] (subcontainers: T1 [0.5|P], T1 [0.3]), T2 [0.1], T3 [0.1]
Cpu: Default [2], T1 [2048], T2 [1024], T3 [1024]
Memory: T1 [4G], T2 [2G], T3 [2G]
Legend: Cpu, Memory and I/O sensitive task; Cpu & Memory sensitive task with low I/O priority; low priority batch task
Splitting hierarchies reduces stranded resources and improves performance for highly sensitive tasks.
User Subcontainers
[Figure: an App Engine task containing a protected server app (Server) and subcontainers with tailored spec and priority (Instances: Instance1, Instance2, Instance3), one marked OOM.]
App Engine uses on-demand container creation:
fair sharing, notifications, and isolation of misbehaving apps
Takeaways
● Cgroups support goes beyond containerized VMs.
● Sharing and overcommitment are key to higher utilization.
● Managing each resource separately helps fine-tune utilization and performance.
● More power to users means better flexibility and scalability.
Come find us for a chat, discussions, a BoF, and drinks.
Or virtually:
[email protected]
[email protected]