Flexible Networking at Large Mega-Scale
Exploring issues and solutions
What is “Mega-Scale”?
One or more of:
● > 10,000 compute nodes
● > 100,000 IP addresses
● > 1 Tb/s aggregate bandwidth
● Massive East/West traffic between tenants
Yahoo is “Mega-Scale”
What are our goals?
● Mega-Scale, with
○ Reliability
■ Yahoo supports ~200 million users/day -- it must be reliable
○ Flexibility
■ Yahoo has 100s of internal and user-facing services
○ Simplicity
■ Undue complexity is the enemy of scale!
Our Strategy
Leverage high-performance network design with:
➢ OpenStack
➢ Augmented with additional automation
➢ Hosting applications designed to be “disposable”
- Fortunately, we already had many of the needed pieces
Traditional network design
● Large layer 2 domains
● Cheap to build and manage
● Allows great flexibility of solutions
● Leverage pre-existing network design
● IP mobility across the entire domain
It’s Simple. But...
L2 Networks Have Limits
● The L2 Domain can only be extended so far
○ Hardware TCAM limitations (size and update rate)
○ STP scaling/stability issues
● But an L3 network can
○ scale larger
○ at less cost
○ but limits flexibility
Potential Solutions
● Why not use a Software Defined Network?
○ Overlay allows IP mobility but
■ Control plane limits scale and reliability
■ Overhead at on-ramp boundaries
○ OpenFlow-based solutions
■ Not yet ready for mega-scale with L3 support
■ Control plane complexities
Not Ready for Mega-Scale
Our Solution
● Use Clos design network backplane
● Each cabinet has a Top-Of-Rack router
○ Cabinet is a separate L2 domain
○ Cabinets “own” one or more subnets (CIDRs)
○ OpenStack is patched to “know” which subnet to use (sketched below)
● Network backplane supports East-West and North-South traffic equally well
● Structure is ideal if we decide to deploy SDN overlay
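To make the “patched to know which subnet to use” idea concrete, here is a toy sketch (not the actual patch): a lookup from hypervisor host to the subnet its cabinet owns. The hostnames, rack IDs, and CIDRs are invented for illustration.

```python
# Illustrative sketch only -- not the real OpenStack patch.
# Maps a hypervisor host to its cabinet, and the cabinet to the subnet(s)
# it "owns"; a real system would read this from inventory data.
import ipaddress

RACK_OF_HOST = {
    "compute-r01-n01": "rack01",   # invented hostnames / rack IDs
    "compute-r01-n02": "rack01",
    "compute-r02-n01": "rack02",
}

SUBNETS_OF_RACK = {
    "rack01": [ipaddress.ip_network("10.1.1.0/24")],   # invented CIDRs
    "rack02": [ipaddress.ip_network("10.1.2.0/24")],
}

def subnet_for_host(hostname):
    """Return the subnet a VM placed on `hostname` must draw its IP from."""
    rack = RACK_OF_HOST[hostname]
    # A VM that moves to another rack cannot keep its IP -- which is why
    # load balancing is reintroduced later for flexibility.
    return SUBNETS_OF_RACK[rack][0]

print(subnet_for_host("compute-r02-n01"))   # -> 10.1.2.0/24
```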
A solution for scale: Layer 3 to the rack
[Diagram: Clos-based L3 backplane connecting per-rack L2 domains across compute racks and a compute + admin rack]
• Clos-based L3 network
• TOR (Top Of Rack) routers
• Admin = API, DB, MQ, etc.
Adding Robustness With Availability Zones
Problems
● No IP Mobility Between Cabinets
○ Moving a VM between cabinets requires a re-IP
○ Many small subnets rather than one or more large ones
○ Scheduling complexities:
■ Availability zones, rack-awareness
● Other issues
○ Coordination between clusters
○ Integration with existing infrastructure
You call that “flexible”?
(re-)Adding Flexibility
● Leverage Load Balancing
○ Allows VMs to be added and removed (remember, our VMs are mostly “disposable”)
○ Conceals IP changes (such as rack/rack movement)
○ Facilitates high-availability
○ Is the key to flexibility in what would otherwise be a constrained architecture
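As a rough illustration of how a VIP conceals rack-to-rack movement: when a “disposable” VM is rebuilt in another cabinet and comes up with a new IP, only the pool membership behind the VIP changes. `LBClient` below is a stand-in written for this sketch, not a real load-balancer API.

```python
# Toy stand-in for a load balancer: clients always hit the VIP, so a
# backend re-IP is just a membership swap behind that VIP.
class LBClient:
    def __init__(self):
        self.members = {}  # vip -> set of backend IPs

    def add_member(self, vip, ip):
        self.members.setdefault(vip, set()).add(ip)

    def remove_member(self, vip, ip):
        self.members.get(vip, set()).discard(ip)


def replace_backend(lb, vip, old_ip, new_ip):
    """Swap one backend for another without touching the client-facing VIP."""
    lb.add_member(vip, new_ip)      # bring the replacement into rotation
    lb.remove_member(vip, old_ip)   # then drop the old address


lb = LBClient()
lb.add_member("203.0.113.10", "10.1.1.23")                     # original VM
replace_backend(lb, "203.0.113.10", "10.1.1.23", "10.1.2.47")  # moved rack, new IP
print(lb.members)   # {'203.0.113.10': {'10.1.2.47'}}
```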
(re-)Adding Flexibility (cont’d)
● Automate it:
○ Load Balancer Management
■ Device selection based on capacity & quotas (sketched below)
■ Association between service groups and VIPs
■ Assignment of VMs to VIPs
○ Availability Zone selection & balancing
○ Multiple cluster integration
● Implement “Service Groups”
○ (external to OpenStack -- for now)
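One plausible shape for the “device selection based on capacity & quotas” step: pick the load balancer that still has VIP quota and the most throughput headroom. The device records, fields, and numbers are invented for the example.

```python
# Illustrative only: invented LB device records and a greedy selection rule.
from dataclasses import dataclass

@dataclass
class LBDevice:
    name: str
    max_vips: int            # quota: VIPs this device may host
    used_vips: int
    max_gbps: float          # throughput capacity
    used_gbps: float

    def headroom(self):
        """Fraction of throughput still unused (0.0 - 1.0)."""
        return 1.0 - self.used_gbps / self.max_gbps

def pick_device(devices, needed_gbps):
    """Choose the device with the most headroom that still has VIP quota."""
    candidates = [d for d in devices
                  if d.used_vips < d.max_vips
                  and d.max_gbps - d.used_gbps >= needed_gbps]
    if not candidates:
        raise RuntimeError("no load balancer has capacity for this service group")
    return max(candidates, key=LBDevice.headroom)

devices = [
    LBDevice("lb-a", max_vips=1000, used_vips=1000, max_gbps=40, used_gbps=10),
    LBDevice("lb-b", max_vips=1000, used_vips=200,  max_gbps=40, used_gbps=30),
]
print(pick_device(devices, needed_gbps=2).name)   # -> lb-b (lb-a is out of VIP quota)
```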
Service Groups
● Consists of groups of VMs running the same application
● Can be a layer of an application stack, an implementation of an internal service, or a user-facing server
● Present an API that functions behind a VIP
○ Web services everywhere!
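A minimal sketch of the service-group idea as a data structure: a named set of VMs behind a VIP. The class and field names here are ours, not the production system’s.

```python
# Sketch of a "service group": VMs running the same application, fronted by a VIP.
from dataclasses import dataclass, field

@dataclass
class ServiceGroup:
    name: str                                   # e.g. a stack layer or internal service
    vip: str                                    # the address clients actually talk to
    members: set = field(default_factory=set)   # backend VM IPs, freely added/removed

    def register(self, vm_ip):
        self.members.add(vm_ip)

    def deregister(self, vm_ip):
        self.members.discard(vm_ip)

web_tier = ServiceGroup(name="web-frontend", vip="203.0.113.20")   # invented names/IPs
web_tier.register("10.1.1.31")
web_tier.register("10.1.2.44")   # members may live in different racks/subnets
```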
Service Group Creation
Integrating With OpenStack
Putting It Together
● Registration of hosts and services
○ A VM is associated with a service group at creation
○ A tag associated with the service group is accessible to resource allocation (illustrated below)
● Control of load balancers
○ Allocates and controls hardware
○ Manages VMs for each service group
○ Provides elasticity and robustness
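One way the create-time association could be carried is as instance metadata on the boot request; the `service_group` key name is an assumption made for this sketch, not necessarily what the real system uses.

```python
# Sketch only: a create request tagged with its service group via metadata.
create_request = {
    "name": "web-frontend-0042",                     # invented instance name
    "flavor": "m1.medium",
    "image": "webapp-image",                         # invented image name
    "metadata": {"service_group": "web-frontend"},   # the tag ("service_group" is our invention)
}

def service_group_of(request):
    """What resource allocation and the external automation would look up."""
    return request["metadata"]["service_group"]

print(service_group_of(create_request))   # -> web-frontend
```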
Putting It Together (cont’d)
● OpenStack Extensions and Patches
○ Three points of integration:
1. Intercept request before issue
2a. Select network based on hypervisor
2b. Transmit new instance information to external automation
3. Transmit deleted instance information to external automation
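Integration points 2b and 3 amount to telling the external automation when instances appear and disappear. Below is a hedged sketch of one way to do that, listening for Nova’s `compute.instance.create.end` / `compute.instance.delete.end` notifications with oslo.messaging; the `register_with_service_group` / `deregister` hooks are hypothetical placeholders for the load-balancer automation.

```python
# Sketch: forward Nova instance lifecycle notifications to external automation.
# Assumes a reasonably recent oslo.messaging; the hooks below are placeholders.
import oslo_messaging
from oslo_config import cfg


def register_with_service_group(payload):        # hypothetical hook
    print("add", payload.get("instance_id"), "on host", payload.get("host"))


def deregister(payload):                         # hypothetical hook
    print("remove", payload.get("instance_id"))


class InstanceEvents(object):
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if event_type == "compute.instance.create.end":
            register_with_service_group(payload)   # point 2b: new instance -> add to its VIP
        elif event_type == "compute.instance.delete.end":
            deregister(payload)                    # point 3: deleted instance -> remove from its VIP


transport = oslo_messaging.get_notification_transport(cfg.CONF)
targets = [oslo_messaging.Target(topic="notifications")]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [InstanceEvents()], executor="threading")
listener.start()
listener.wait()
```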
Whither OpenStack?
● Our Goals:
○ Minimize patching code
○ Minimize points of integration with external systems
○ Contribute back patches of general use
○ Replace custom code with community code:
■ Use Heat for automation
■ Use LBaaS to control load balancers
○ Share our experiences
Complications
● OpenStack clusters don’t exist in a vacuum -- this makes scaling them harder:
○ Existing physical infrastructure
○ Existing management infrastructure
○ Interaction with off-cluster resources
○ Security and organizational policies
○ Requirements of existing software stack
○ Stateful applications introduce complexities
Conclusion
● Mega-Scale has unique issues
○ Many potential solutions don’t scale sufficiently
○ Some flexibility must be sacrificed
*BUT*
○ Mega-Scale also admits solutions that aren’t practical or cost-effective at smaller scale
○ Automation and integration with external infrastructure are key
Questions?
email: [email protected]