Networking Challenges for the Next Decade
Amin Vahdat
On behalf of Google Technical Infrastructure and Google Cloud Platform
April 4, 2017

Google Network
More than a collection of data centers
[Map. Labeled: FASTER (US, JP, TW) 2016; SJC (JP, HK, SG) 2013; Unity (US, JP) 2010. Legend: Network fiber; Points of presence >100; Google Global Cache edge nodes]

Google Cloud Regions
Adding 11 new regions
[Map. Legend: current regions and number of zones; future regions and number of zones. Regions labeled: Netherlands, London, Oregon, California, Iowa, Montreal, Finland, Frankfurt, Belgium, N Virginia, S Carolina, Mumbai, Tokyo, Taiwan, Singapore, São Paulo, Sydney]

Ubiquitous Cloud...10x Scaling
• Datacenter (10x): Next-gen disaggregation of storage, memory and compute
• Campus & Metro (10x): Cloud regions and campus expansion driving DC interconnect
• WAN (10x): Cloud replication and bandwidth intensive cloud services (e.g., turnkey video, IoT)
Step Function Disruptions: Bandwidth, Latency, Availability, Predictability

The Pillars of SDN @ Google
• B4: WAN Interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter Networking
• Espresso: SDN for public Internet

B4: Google's Software Defined WAN
B4: [Jain et al, SIGCOMM 13]
BwE: [Jain et al, SIGCOMM 15]

B4: From Copy Network to Business Critical
[Chart: B4 traffic, 2012 to 2016]
B4: [Jain et al, SIGCOMM 13]
BwE: [Jain et al, SIGCOMM 15]

Andromeda
[Diagram. Google Infrastructure Services: Load Balancing, DoS, ACLs, VPN, NFV. VNETs: 10.1.1/24, 192.168.32/24, 5.4/16. Internal Network: four ToRs serving 10.1.1/24, 10.1.2/24, 10.1.3/24, 10.1.4/24]

Google Datacenter Network Innovation
And hardware scale that we could not buy
[Chart: Capacity over Time. Generations labeled: 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, Jupiter. Annotation: 1.3Pb/s clusters in 2013]

The Pillars of SDN @ Google
• B4: WAN Interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter Networking
• Public Internet?
The Pillars of SDN @ Google
• B4: WAN Interconnect
• Andromeda: NFV and network virtualization
• Jupiter: Datacenter Networking
• Espresso: SDN for public Internet

Espresso in Context
[Diagram, built up over three slides: User, Internet, Peering Metro, Espresso, B2, B4, Jupiter, Data Center, Google]

Espresso: Before and After
Before
• Router Centric
• Cloud 1.0
• Protocols
• Local view
• Connectivity first
• Coarse fault recovery
After: Espresso SDN Peering
• Per-metro and global view
• Application signals
• Real-time optimization

Espresso Architecture Overview
[Diagram, built up over three slides. Global Controller, Application Signals, and Local Control above the Espresso Metro. Peering Fabric: BGP speaker, Label-switched Fabric, eBGP Peering to External Peer. Hosts run a Packet Processor. Labeled packets specify egress]

Next Decade Challenges in Networking
The next wave in computing
• Serverless compute in Cloud 3.0
• IoT
• Tightly coupled, general purpose distributed computing
It’s time to put it all together
• Agile Scale
• Jitter
• Isolation
• Performance is great, but only meaningful with availability, manageability, and velocity

Last Decade
Cloud 1.0
• Virtualization delivers capex savings to enterprise DCs

Now: HW on Demand
Cloud 2.0
• Public cloud frees enterprise from private HW infrastructure
• Scheduling, load balancing primitives, “big data” query processing

The Third Wave of Cloud Computing: Compute, not servers
Cloud 3.0
• Serverless compute, real-time intelligence, and machine learning
• Not data placement, load balancing, OS configuration and patching

The Third Wave of Cloud Computing
Cloud 1.0, Cloud 2.0, Cloud 3.0
Networking should be aiming for Cloud 3.0

Networking and Cloud 3.0
• Storage disaggregation: the datacenter is the storage appliance
• Seamless telemetry and scale up/down
• Transparent live migration
• Open Marketplace of services, securely placed and accessed

Networking and Cloud 3.0
• Applications+Functions, not VMs
• Policy, not middleboxes
• Actionable Intelligence, not data processing
• SLOs, not placement/load balancing/scheduling

Next Decade Challenges in Networking
• The network will enable next-generation compute infrastructure
• The network can define next-generation storage infrastructure
• The right network infrastructure can deliver fundamental new capability

How we Prioritize Infrastructure Work
[Diagram labels, in slide order: Performance, Stranding, Velocity, Manageability, Availability]

Availability is Paramount
• First things first: an insecure infrastructure is an unavailable infrastructure
• Stability is more important than efficiency
• Network management is critical
• Configuration is hard
• Automation matters but can be counter to availability
“Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure.” SIGCOMM 2016.
Build for Velocity
• Velocity is the speed of iteration
• Retrospective on “Tussle in Cyberspace: Defining Tomorrow’s Internet”
• Build for hitless upgrades and self-validation
• Debugging and tracing matter
  ○ Without visibility, performance does not matter
• Network fabrics built for expansion and evolution
• Launch and Iterate

Isolation is Critical; Stranding is Terrible
Isolation with reservations is easy but leads to huge resource stranding
● General-purpose, shared infrastructure to approximate custom-built and reserved
Isolation has many components
● Latency, bandwidth, but also the control plane
● Accounting and chargeback are big missing pieces
Congestion Control is still really hard
● Rationalizing multiple control loops: flow, endpoint, flow group, Traffic Engineering

Performance only Matters if End to End
Amdahl’s law applies, and so an incredible, localized optimization that takes any effort to adopt will be ignored
1. Scale
2. Jitter
3. Storage Disaggregation
Must optimize from the application all the way to the end user

How we Prioritize Infrastructure Work
[Diagram labels, in slide order: Performance, Stranding, Velocity, Manageability, Availability]

Next Decade Challenges in Networking
The next wave of computing
• Serverless compute in Cloud 3.0
• IoT
• Tightly coupled, general purpose distributed computing
It’s time to put it all together
• Agile Scale
• Jitter
• Isolation
• Performance is great, but only meaningful with availability, manageability, and velocity

Thank You!

Open Source
[Slide labels: Google MapReduce, Google Bigtable, Google Borg, Google Dremel, Google Cloud Platform]

Open Source
[Slide labels: QUIC, TCP BBR, gRPC, Open Config, ..., Google Cloud Platform]
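
Note on the Amdahl's law point from "Performance only Matters if End to End": the slide does not give numbers, so the sketch below is only an illustration of why a localized optimization has limited end-to-end effect. The function name amdahl_speedup, its parameters, and the sample fractions are assumptions made for this example and do not come from the talk.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when only part of the end-to-end time gets faster.

    fraction_improved: share of total time the optimization touches (0..1).
    local_speedup: how much faster that share becomes (> 0).
    Both names are hypothetical, chosen for this illustration.
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # Amdahl's law: the untouched share (1 - f) bounds the overall gain.
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


if __name__ == "__main__":
    # Hypothetical scenario: one component made 10x faster, at varying
    # shares of the total end-to-end time.
    for fraction in (0.1, 0.5, 0.9):
        overall = amdahl_speedup(fraction, 10.0)
        print(f"share={fraction:.0%} local=10x overall={overall:.2f}x")
```

With these made-up inputs, a 10x local gain that covers only a tenth of the end-to-end time yields less than 1.1x overall, which is one way to read the slide's call to optimize from the application all the way to the end user.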