Control Plane Architectures: Design Solutions
Shane Gibson, Cloud Infrastructure Architect
ZeroStack, Inc. - https://zerostack.com/
OpenStack Summit - Boston, MA - May 11, 2017
© ZeroStack Inc. | zerostack.com

QR Code
Why take pix? Just use the QR Code!
https://www.slideshare.net/ShaneGibson3/openstack-control-plane-architectures-design-solutions

IMPORTANT LEGAL STUFF
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris venenatis posuere odio vel auctor. Fusce non turpis nec lorem varius dictum. Nulla felis neque, convallis a congue sed, molestie vel lectus. Aenean hendrerit metus non nunc commodo sodales. Etiam ac erat at massa tincidunt lobortis at et ligula. Fusce sed lorem tellus. Suspendisse potenti. Ut dignissim suscipit aliquet. Donec luctus pulvinar lectus quis condimentum. Etiam sed iaculis nunc, sed blandit magna. Fusce mattis nisl nec dapibus luctus. Proin a augue facilisis, vehicula mauris non, cursus augue. Mauris tristique, justo vitae tempor tincidunt, metus ligula ornare tellus, at condimentum ex diam nec odio. Sed porttitor ultrices libero sed efficitur. Cras sem diam, eleifend sit amet dui eu, pretium cursus fermentum nibh. Donec tincidunt cursus enim a varius. Nam placerat eu nunc id rutrum. Praesent ullamcorper fringilla eros, vitae rutrum elit consectetur in. Aliquam eu tempus dui, a feugiat nulla. Quisque laoreet imperd ex, in facilisis elit tristique a. Nunc ante felis, faucibus at semper nec, consequat commodo magna.

About Shane Gibson
Shane Gibson serves as the Cloud Infrastructure Architect for ZeroStack, Inc., a private cloud solutions company. There he is responsible for the architecture, implementation, and management of the internal cloud platform that drives the SaaS and Cloud Portal that power the ZeroStack solution. Previously, he served as Sr.
Principal Infrastructure Architect at Symantec for the Cloud Platform Engineering (CPE) team. He was responsible for the infrastructure design of the underlying platforms, operating systems, tools, and application stack that enable the OpenStack clusters within the CPE group. In previous roles, Shane has served as a Systems Architect, Network Architect, Security Architect, Unix Systems Administrator, Mainframe Operator, and Mainframe Hardware Specialist, and has also served in the United States Marine Corps. In his "spare" time, he loves to do anything on two wheels: motorcycling, mountain biking, road biking, cyclocross, etc…

Agenda
● what we'll be talking about (and not)
● problem statement
● needs analysis
● solutions
● summary
● questions
● thank you
● references

what we'll be talking about (and not) …

What we'll be talking about
● Short definition of what "Control Plane" means
● Short definition of what "Data Plane" means
● How much Control Plane do you need?
● Briefly discuss general HA design solutions
● Introduce four design architectures
  ○ Stand alone (seriously!)
  ○ Active/passive
  ○ Fully Redundant, separate control plane
  ○ Distributed, embedded control plane
● Discuss the architecture of these design solutions

What we won't be talking about
● Things that aren't OpenStack
  ○ Ancillary services (eg AD/LDAP behind Keystone)
  ○ Server Load Balancer architectures (they're key to HA!)
    ■ ok, we'll talk about them a bit …
● Specifics of Network Controller architecture
● Container Orchestration Engine (COE) HA
● Physical infrastructure (eg power, cooling, etc.)
● Complex DB setups (sharding, multisite … )
● Multi-site Control Plane
● Storage HA architecture (Ceph, Swift, etc…)
Control Plane Definition
● The control plane is the management traffic responsible for sending signaling and commands, for example:
  ○ give me a token so I can do something
  ○ create port, network, router
  ○ instantiate/terminate an instance
● Sort of like a Drill Sergeant:
  ○ instructs recruits (data plane)
  ○ signals and commands
Ref: 1

Data Plane Definition
● The data plane is all of the bits and bytes moving around to do the work as instructed by the control plane:
  ○ actually instantiating the instance
  ○ east/west traffic between VMs
  ○ north/south traffic in and out of your cloud
● Kind of like these poor Recruits
  ○ stand at attention, pass out at attention !!
Ref: 1

problem statement

Problem Statement
● So you've completed a PoC … like what you see …
● Need to build a shiny new cloud
● From PoC to production - what architecture do you need?
● Understand your needs
● Match your needs to a design
● Overbuilding is just as dangerous as underbuilding
● But keep in mind - you may need, want, or be forced to scale
  ○ Your control plane needs to grow with your cloud
[Cartoon caption: "man, this devstack is easy !!"]

needs analysis

Needs Analysis
● Understanding how much reliability you need is critical to determining an appropriate CP architecture
● Quantify how available your platform needs to be
● Be honest … can you live with a 95% available CP? How about 98%? Do you *need* 99.9%? Can you afford to build, staff, support, and maintain 99.999%?
● Complexity adds cost, time, and significant risk
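To make those availability percentages concrete, the arithmetic behind a downtime budget can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the year length of 365.2425 days (the average Gregorian year) is an assumption about how uptime.is does the calculation. The slides quote 365.243 days per year, which looks like a rounding of the same figure.

```python
# Sketch: convert an availability percentage into an allowed-downtime budget.
# Assumption: an average Gregorian year of 365.2425 days; a month is 1/12 of that.
SECONDS_PER_DAY = 86_400
PERIODS_IN_DAYS = {
    "yearly": 365.2425,
    "monthly": 365.2425 / 12,
    "weekly": 7.0,
    "daily": 1.0,
}


def allowed_downtime_seconds(availability_pct, period="yearly"):
    """Return the seconds of downtime permitted in one period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return unavailable_fraction * PERIODS_IN_DAYS[period] * SECONDS_PER_DAY


def format_duration(seconds):
    """Format seconds as 'Nd Nh Nm N.Ns', omitting leading zero units."""
    # Work in whole tenths of a second so rounding never yields '60.0s'.
    tenths = round(seconds * 10)
    days, rem = divmod(tenths, 24 * 3600 * 10)
    hours, rem = divmod(rem, 3600 * 10)
    minutes, rem = divmod(rem, 60 * 10)
    parts = []
    if days:
        parts.append(f"{days}d")
    if days or hours:
        parts.append(f"{hours}h")
    if days or hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{rem / 10:.1f}s")
    return " ".join(parts)


if __name__ == "__main__":
    for pct in (95, 98, 99, 99.5, 99.9, 99.99, 99.999):
        budget = allowed_downtime_seconds(pct, "yearly")
        print(f"{pct}% -> {format_duration(budget)} per year")
```

For example, `format_duration(allowed_downtime_seconds(99.9, "daily"))` gives `1m 26.4s`: a "three nines" control plane may be down for less than a minute and a half per day.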
Needs Analysis - how much is enough
● Downtime, based on percentage of availability:

Percentage   Yearly             Monthly            Weekly         Daily
95%          18d 6h 17m 27.6s   1d 12h 31m 27.3s   8h 24m 0.0s    1h 12m 0.0s
98%          7d 7h 18m 59.0s    14h 36m 34.9s      3h 21m 36.0s   28m 48.0s
99%          3d 15h 39m 29.5s   7h 18m 17.5s       1h 40m 48.0s   14m 24.0s
99.5%        1d 19h 49m 44.8s   3h 39m 8.7s        50m 24.0s      7m 12.0s
99.9%        8h 45m 57.0s       43m 49.7s          10m 4.8s       1m 26.4s
99.99%       52m 35.7s          4m 23.0s           1m 0.5s        8.6s
99.999%      5m 15.6s           26.3s              6.0s           0.9s

Based on 365.243 days per year (leap year, baby!), 52.178 weeks per year, 30.437 days per month, and 4.348 weeks per month.
Calculations source: http://uptime.is/

Needs Analysis
● To match your uptime/downtime threshold
  ○ Understand business use of your platform
  ○ Survey your user groups to determine what applications they will be using, and how critical they are
● Determine how much talent (be honest) you have to build, or can buy (hire or rent), for the platform you need…
  ○ *You* might be a rock star, but you need a dedicated and competent team to tend to a complex HA solution
  ○ A well tended single server solution *may* outperform a poorly managed, highly complex one
    ■ performance, of course, notwithstanding …

Needs Analysis: match uptime to solution
● A complete (bogus?) guideline:
[Figure: availability bands of 95 to 98%, 98 to 99.5%, 99.5 to 99.99%, and 99.99+% set against Standalone, Active/Passive, and Active/Active or Distributed designs]

Needs Analysis
● How much capacity (compute, memory, storage, etc) do you need for your control plane services?
  ○ Great resource/data:
  ○ URL: https://docs.openstack.org/developer/performance-docs/test_results/
  ○ Example Control Plane resource consumption for:
    ■ 6 nodes
    ■ 200 nodes
    ■ 400 nodes
    ■ 1000 nodes

patterns - basics of availability designs
HA Design Solutions - single system with hardware redundancy
● Server with redundant hardware subsystems
● Typically located in a datacenter(like) location with redundant power, network, cooling, etc…
● Capacity / scaling is going to be your bugaboo (you can only scale "up" so much); suggest building in a service LB from the beginning

HA Design Solutions - active/passive
● either bare metal or virtualized / containerized workloads
[Diagram: an active and a standby node behind a VIP, each running svc A, svc B, svc C, and mysql (active / standby)]
● Service-based replication, example: mysql replication
● Externally replicated data (eg DRBD), where the service is unaware - eg use of load balancer and pacemaker + DRBD

HA Design Solutions - clustered
[Diagram: one leader and two followers, each holding A, B, and C]
● The application maintains and controls cluster replication, leader election, and take-overs

HA Design Solutions - virtualized services
● Implement simple hypervisors (eg just bare KVM)
● or implement a small OpenStack cluster (caution !!)
● a lot of interesting Containerized CP solutions are maturing
[Diagram: hypervisors 1, 2, and 3, each hosting a VM for service A and a VM for service B, fronted by VIP A and VIP B]

HA Design Solutions - distributed services
● Embed a VM or Container in each hypervisor of your cluster which is responsible for service orchestration tasks
[Diagram: hypervisors A, B, and C, each hosting many tenant VMs plus an embedded controller (controller service A, B, C) with VIP A / B, services, and orchestration data]

solutions - as applied to control plane

Solutions: Overview
● Standalone (yes, still!)
● Active/Passive
  ○ one master, one standby system
● Active/Active Cluster
  ○ Multiple members in cluster
  ○ Leader election system / quorum protocols
  ○ Or - use of load balancers for singleton systems
● Distributed System
  ○ Embedded across the cluster

Solutions: how the control plane correlates
● Remember our Drill Sergeant?
[Figure: example architecture with the control plane and data plane portions labelled. Ref: 3, Ref: 1]

Solutions: how the control plane correlates
● how this might look in racks …
[Diagram: racks topped by paired top-of-rack switches tor01 to tor12. Node names shown: infra01 to infra03, builder, controller01 to controller03, db01 to db03, network01 to network03, storage01 to storage03, compute01, compute02, compute07, compute08, compute13, compute14, compute19, compute20, ..., and data01, data02, data04, data05, data07, data08, data10, data11, ...]

Solutions: standalone
● Stand alone doesn't have to mean "prone to failure"
  ○ Redundant power supplies (with redundant feeds)
  ○ Redundant NICs/separate LOM + PCIe (or 2x PCIe)
  ○ Hardware RAID based storage
  ○ Redundant Top-of-Rack (bonded NICs)
  ○ In an environmentally controlled facility
    ■ cooling, power, electrical, etc.
● You would be surprised how fault tolerant a single, well designed system can be…
● Can only "scale up" so much before you have to "scale out"
  ○ Edgar Magana of Workday: OpenStack HA, or not HA
    ■ not HA - Level 4 Ballroom G at 5:30pm

Solutions: standalone
[Figure]

Solutions: how much can HA/Reliability cost you?
● Have you ever heard of the Jepsen tests or articles?
  ○ Check out "The Network is Reliable" [Ref: 2] (Kyle Kingsbury):
  ○ it just might chill your blood …

Solutions: active/passive
● Ok, maybe standalone doesn't cut it for you …
● Active/Passive uses a service to monitor the main (active) service, and then executes a coup if trouble is detected… for example: STONITH (Shoot The Other Node In The Head)

Solutions: active/passive
● Most of the services aren't aware that they have a "shadow partner" …
● Use various tools to monitor services, and initiate a takeover if the primary/active service fails
  ○ keepalived, pacemaker, corosync, STONITH, etc…
● Data is usually replicated outside of the application's knowledge
  ○ DRBD (Distributed Replicated Block Device)
    ■ very stable, around a LONG time, actively maintained and supported
  ○ xNBD/bNBD, SAN based replication
  ○ Ceph RBD (replica of 2), GlusterFS, etc…
  ○ Or … "simply" via database replication

Solutions: active/passive
● Primary mechanism is a Service LB with a watchdog of some type
● Let distributed services (eg rabbitmq and mysql) replicate natively
● Shared storage for things like configurations, backing instances, etc.
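The watchdog idea behind active/passive can be sketched as a small state machine. This is a toy model under stated assumptions, not how keepalived or pacemaker are implemented: the class name, the `max_failures` threshold, and the node names `ctrl01`/`ctrl02` are all hypothetical, and "fencing" and "moving the VIP" are just recorded as event strings.

```python
# Toy model of an active/passive pair watched by a health-check watchdog.
# All names here are illustrative; real tools (keepalived, pacemaker, ...)
# have their own configuration and fencing mechanisms.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivePassivePair:
    active: str                 # node currently holding the VIP
    standby: str                # node waiting to take over
    max_failures: int = 3       # consecutive failed checks before takeover
    failures: int = 0
    events: List[str] = field(default_factory=list)

    def observe(self, healthy: bool) -> str:
        """Record one health check of the active node; return the VIP holder."""
        if healthy:
            # Any successful check resets the counter, so a single blip
            # does not trigger a takeover.
            self.failures = 0
            return self.active
        self.failures += 1
        if self.failures >= self.max_failures:
            self._fail_over()
        return self.active

    def _fail_over(self) -> None:
        failed = self.active
        # Fence first (STONITH style) so the old active cannot keep writing,
        # then promote the standby and move the VIP to it.
        self.events.append(f"fence {failed}")
        self.active, self.standby = self.standby, failed
        self.events.append(f"move VIP to {self.active}")
        self.failures = 0


if __name__ == "__main__":
    pair = ActivePassivePair(active="ctrl01", standby="ctrl02")
    for check in [True, False, True, False, False, False]:
        holder = pair.observe(check)
    print(holder, pair.events)
```

Requiring several consecutive failures is a deliberate trade-off: a lower threshold fails over faster, but raises the chance of a needless takeover on a transient network hiccup.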
Solutions: fully redundant
● So you've decided you're "all in"
  ○ Fully Redundant - requires very careful consideration
  ○ Complex HA and Reliability solutions have their own baggage that just might cost you more than you bargained for
  ○ But if you need to drive towards 99% and better uptime…
  ○ Each service requires its own treatment in terms of architecture … but there are common threads

Solutions: fully redundant - virtualized
● Like active/passive - but we now scale to 3, 5, etc… fully active members (odd numbers for proper quorum)

Solutions: fully redundant - containerized
● New alternatives are emerging around COE models for managing your Control Plane services.
● Kubernetes Example: Kubernetes Master with HA - one of many proposed HA models

Solutions: fully redundant - containerized
● Kubernetes Worker Nodes:
[Diagram: Kubernetes masters (kubernetes masterN) and workers 1, 2, and 3, each worker running a kubelet and control plane services such as mysql, neutron, glance, nova, cinder, ...etc...]

Solutions: distributed
● Big departure from the traditional model
● With distributed (embedded) clusters, there are some special considerations:
  ○ Be very careful of the "noisy neighbor" problem causing your control plane grief
  ○ See "Quantifying the Noisy Neighbor Problem" by ZeroStack from the Austin 2016 Summit
  ○ Designing the algorithms for placing and managing your control plane systems in the cluster can be very complex
  ○ Need a distributed state/service orchestration piece (eg etcd, consul, serf, atomix, zookeeper)

Solutions: distributed
● or ...
● Consider a COE (container orchestration engine) to manage the placement and healing properties of your CP:
  ○ Still a relatively young solution with potential pitfalls
  ○ Can be used with the Fully Redundant or Distributed models
  ○ Consider tight QoS controls (eg namespaces and cgroups) for service guarantees if using Distributed

Solutions: distributed - four node cluster
[Figure]

Solutions: distributed
● When you have a CP that dynamically does this, auto-heals, deals with noisy neighbors, and can scale on demand …

QUESTIONS ?

We are hiring!!
Check us out on the thingy called the "web", at: https://www.zerostack.com/careers/

THANK YOU!
Shane Gibson
ZeroStack Inc. | zerostack.com

References
[1] CartoonStock License Agreement: https://www.cartoonstock.com/licenseagreement.asp
[2] "The network is reliable" (Kyle Kingsbury and Peter Bailis): https://aphyr.com/posts/288-the-network-is-reliable
[3] OpenStack Operators Guide: http://docs.openstack.org/openstack-ops/content/example_architecture.html#example_archs_conclusion