NFV in Cloud Infrastructure: Ready or Not
Bhavesh Davda, Office of the CTO, VMware, Inc.

Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action

x86 Virtualization Timeline
Figure: milestones from 1998 to 2015 — VMware founded; Workstation 1.0; ESX Server 1.0; ESX 2.0 (vSMP, 2 vCPUs); Xen 1.0; x86-64 and dual-core CPUs; hardware virtualization support (Intel VT-x, AMD-V, later Intel EPT and AMD RVI); Workstation 5.5 (64-bit guests); ESX 3.0 (4 vCPUs); ESX 3.5; Fusion 1.0; ESX 4.0 (8 vCPUs); first KVM release in RHEL 5.4; ESXi 6.0 (128 vCPUs, 4 TB RAM).

x86 Virtualization 1998-2006 (pre-hardware virtualization)
Figure: user code runs via direct execution; kernel code runs under binary translation (BT). Faults, syscalls and interrupts trap into the VMM, which resumes the guest via IRET/SYSRET.

x86 Virtualization 2006-present (hardware virtualization)
Figure: guest mode runs ring 3 applications and the ring 0 guest OS; a VM exit transfers control to the VMM in root mode, and a VM enter resumes the guest.

Datacenter Workloads: The State of Virtualization
Figure: share of datacenter workloads that are virtualized vs. unvirtualized.

Business Critical Application Virtualization
Figure: virtualization rates of roughly 49%–60% across business-critical applications — Microsoft SharePoint, Microsoft SQL, Microsoft Exchange, other mission-critical apps, high performance computing, other middleware, Oracle Applications, SAP, Oracle Middleware and Oracle Databases. Source: Cloud Infrastructure Annual Primary Research, 2Q13 (N=576).

Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action

Telecommunications
Figure: the breadth of network functions across a carrier's fixed and mobile networks — mobile access (2G/3G/4G LTE and CDMA macrocells, small cells, femtocells, RNCs, NodeBs/eNodeBs, non-3GPP access, Wi-Fi, ANDSF, ePDG), transport and optical networks (FTTx/NGA, OTN, ADMs), core networks (MME, SGW, PGW/PCEF, HSS/AAA, PCRF, OCS, location, SMSC), IMS and voice (CSCF, MGCF, MRF, TAS, MMTel, VoLTE, VoIP, voicemail, media gateways (MG), RCS(-e), FMC, UC), Gi/SGi-interface services (carrier NAT, firewall, IPS, antivirus, URL filtering, DPI, caching, video optimisation, WAN optimisation, load balancing, border gateway router), content delivery networks (CDN), IPTV, ISP services (DNS, DHCP, IP address management/DDI), fixed access (POTS, DSLAM, MSAN/FSAN, PSTN, ADSL, DOCSIS, WLL), enterprise DMZ and corporate LAN services, plus OSS, BSS and SDP — all candidates for virtualization.

NFV in a nutshell

NFV Motivation: From Margin Erosion to Sustainable Profitability
• Margin erosion: over-the-top (OTT) services ride on network investment funded by the CSP, while ARPU stays flat and the cost per subscriber keeps rising.
• Sustainable profitability: a cloud operating model on a cloud platform enables service agility and service differentiation.

NFV Architecture
Figure: the ETSI NFV reference architecture — OSS/BSS and the NFV Orchestrator (with service, VNF and infrastructure descriptions) on top; VNF Managers and the Virtualized Infrastructure Manager forming NFV Management & Orchestration (NFV M&O); EMS1–EMS3 managing VNF1–VNF3; and the NFVI, where a virtualization layer turns compute, storage and network hardware into virtual compute, virtual storage and virtual network.

Many Use Cases

Example Use Case: 4G LTE Network
• LTE network elements:
Figure: the Evolved UTRAN (E-UTRAN) and Evolved Packet Core (EPC) — the LTE-UE attaches over LTE-Uu to an evolved Node B (eNB) in a cell; eNBs interconnect over X2 and reach the MME (Mobility Management Entity) over S1-MME and the Serving Gateway over S1-U; the MME talks to the HSS over S6a, to other MMEs over S10 and to the Serving Gateway over S11; the Serving Gateway connects over S5/S8 to the PDN Gateway (together the SAE Gateway), which reaches the PCRF (Policy & Charging Rule Function) over S7/Rx+ and packet data networks over SGi.

Deployed in production in multiple carriers worldwide — LTE: vEPC (MME, PGW, SGW)
Figure: the ETSI NFV architecture mapped onto a concrete stack — the ESXi hypervisor as the virtualization layer providing virtual computing, VSAN providing virtual storage, NSX providing virtual networking, vCenter with vCloud Director or OpenStack as the Virtualized Infrastructure Manager, a generic VNF Manager or the VNF vendor's optional VNF Manager, the vendor's EMS, and a Service Orchestrator / Automation Platform as the NFV Orchestrator under OSS/BSS. The VNFs are the virtualized EPC elements — MME/SGSN, PGW/SGW/GGSN, HSS and PCRF — running on standard computing, storage and network hardware.

Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action

Why performance is important to NFV
• NFV workloads differ from enterprise / datacenter workloads
• NFV Performance & Portability Best Practices (ETSI GS NFV-PER 001) classification:
– Data plane workloads: tasks related to packet handling in an end-to-end communication between edge applications; expected to be very intensive in I/O operations and memory R/W operations. E.g. CDN cache, router, IPsec tunneling
• High packet rates (see the worked example below)
– Control plane workloads: communication between NFs that is not directly related to the end-to-end data communication between edge applications, such as session management, routing or authentication. Compared to data plane workloads, control plane workloads are expected to be much less intensive in terms of transactions per second, while the complexity of the transactions might be higher
• CPU and memory intensive
– Signal processing workloads: NF tasks related to digital signal processing, such as the FFT decoding and encoding in a C-RAN Base Band Unit (BBU); expected to be very intensive in CPU processing capacity and highly delay-sensitive
• Latency and jitter sensitive
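To make "high packet rates" concrete — an illustrative back-of-the-envelope calculation, not from the original deck — consider 64-byte packets at 10 Gb/s line rate. Each frame occupies 64 + 20 bytes on the wire once the Ethernet preamble and inter-frame gap are counted, so

\[
\frac{10 \times 10^{9}\ \text{bit/s}}{84\ \text{B} \times 8\ \text{bit/B}} \approx 14.88\ \text{Mpps}
\quad\Longrightarrow\quad
\frac{1}{14.88\ \text{Mpps}} \approx 67\ \text{ns per packet},
\]

which is why per-packet virtualization overhead dominates data plane VNF performance.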
ESXi Networking Datapath Overview
Figure: a VM's virtual NIC connects through the ESXi virtual switch and server interconnect to the physical NIC and on to the network switch, alongside management agents and background tasks running in the ESXi hypervisor. The transmit path:
• Message copy from the application to the Guest OS (kernel)
• Guest OS network stack + vNIC driver queues the packet for the vNIC
• VM exit to the VMM/hypervisor
• The vNIC implementation emulates DMA from the VM and hands the packet to the vSwitch
• The vSwitch queues the packet for the pNIC
• The pNIC DMAs the packet and transmits it on the wire

Guest OS bypass and hypervisor bypass
Figure: an application can bypass the guest OS network stack (guest OS bypass), and a VM can bypass the hypervisor's virtual switch entirely (hypervisor bypass). The hypervisor provides — and hypervisor bypass gives up:
• Live migration (vMotion)
• Network virtualization (VXLAN, STT, Geneve)
• Virtual network monitoring, troubleshooting, stats
• High availability
• Fault tolerance
• Distributed resource scheduling
• VM snapshots
• Hot add/remove of devices, vCPUs, memory
The Latency Sensitivity Feature in vSphere 5.5
• Latency-sensitivity feature
– Minimize virtualization overhead
– Achieve near bare-metal performance

ESXi 5.5 Latency and Jitter Improvements
• New virtual machine property: "Latency sensitivity"
– High => lowest latency
– Medium => low latency
• Exclusively assign physical CPUs to virtual CPUs of "Latency Sensitivity = High" VMs
– Physical CPUs not used for scheduling other VMs or ESXi tasks
• Idle in the Virtual Machine Monitor (VMM) when the Guest OS is idle
– Lowers the latency to wake up the idle Guest OS, compared to idling in the ESXi vmkernel
• Disable vNIC interrupt coalescing
• For DirectPath I/O, optimize the interrupt delivery path for lowest latency
• Make the ESXi vmkernel more preemptible
– Reduces jitter due to long-running kernel code

ESXi 6.0 Network Latencies and Jitter
Figure: average and standard deviation of round-trip time (microseconds) for Native, SR-IOV, SR-IOV with Latency Sensitivity, VMXNET3, and VMXNET3 with Latency Sensitivity.
• Single 4-vCPU VM to native, Ubuntu 14.04 LTS, RTT from 'ping -i 0.00001 -c 1000000'
• Intel Xeon E5-2697 v2 @ 2.70 GHz, Intel 82599EB pNIC

ESXi Real-time Virtualization Performance
Figure: interrupt latency and jitter measured against the ISR period.
• cyclictest -p 99 -a 1 -m -n -D 10m -q

              ESXi 5.0    ESXi 6.0
  Min            5 μs        2 μs
  Max         1676 μs       78 μs
  Avg           13 μs        4 μs
  99%-ile       56 μs        5 μs
  99.99%-ile   118 μs        7 μs

ESXi MPI InfiniBand Latencies
Figure: MPI PingPong half round-trip latency (µs) vs. message size (1 to 1024 bytes) for Native, ESX 5.5, ESX 6.0 and ESX 6.0u1.
• osu_latency benchmark, Open MPI 1.6.5, MLX OFED 2.2, RedHat 6.5
• HP DL380p Gen8, Mellanox FDR InfiniBand
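For context on what osu_latency measures: it times a ping-pong exchange between two MPI ranks and reports half the round-trip time. Below is a minimal, illustrative C sketch of such a ping-pong (not the OSU benchmark source; fixed 1-byte messages, no warm-up phase), runnable with mpirun -np 2:

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 10000, size = 1;   /* 1-byte messages, like the smallest osu_latency point */
        char buf[1024];
        memset(buf, 0, sizeof(buf));

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0)   /* half round-trip in microseconds, matching the chart's metric */
            printf("latency: %.2f us\n", (t1 - t0) * 1e6 / (2.0 * iters));

        MPI_Finalize();
        return 0;
    }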
Data Plane Development Kit (DPDK)
• Libraries for network application development on Intel platforms
– Scales from Intel Atom to multi-socket Intel Xeon architecture platforms
• Speeds up networking functions
• Enables user-space application development
• Facilitates both run-to-completion and pipeline models
• Free, open-sourced, BSD licensed
– Git: http://dpdk.org/git/dpdk
• Rich selection of pre-built sample applications
Figure: the DPDK stack — customer applications, ISV eco-system applications and DPDK sample applications run on top of the Environment Abstraction Layer (EAL); core libraries (MALLOC, MBUF, MEMPOOL, RING, TIMER — standard libraries for arrays, buffers and rings); platform libraries (KNI, POWER, IVSHMEM, ETHDEV — IP interface, power management, shared memory); quality-of-service libraries (QoS, SCHED, METER); classification libraries (EXACT MATCH, LPM, ACL — pattern matching); and Ethernet poll-mode drivers for physical and virtual hardware (E1000, IGB, IXGBE, I40e, VMXNET3, VIRTIO, XENVIRT, RING, PCAP, 3rd-party NICs). Packet access happens in user space via native and virtual PMDs; KNI and IGB_UIO live in the Linux kernel.

DPDK in vSphere ESXi Environment
Figure: three ways to run a workload in a VM on ESXi —
• No bypass: an unmodified application uses the guest Linux kernel network stack over a VMXNET3 vNIC and the ESXi vSwitch
• Guest OS bypass: a DPDK application drives the VMXNET3 vNIC directly with a poll-mode driver, still going through the ESXi vSwitch
• Guest OS bypass + hypervisor bypass: a DPDK application drives an IXGBE device directly, either a VF via SR-IOV or the whole device via DirectPath I/O, skipping the vSwitch
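The characterization discussed next is built around DPDK's l2fwd sample application. As a rough sketch of what such a poll-mode data plane looks like — simplified to two ports, one queue each, a single forwarding direction and no error handling, and not the actual l2fwd source — consider:

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_lcore.h>

    #define BURST 32

    int main(int argc, char **argv)
    {
        /* Initialize the Environment Abstraction Layer (hugepages, lcores, devices). */
        if (rte_eal_init(argc, argv) < 0)
            return -1;

        /* Pool of packet buffers shared with the poll-mode drivers. */
        struct rte_mempool *pool = rte_pktmbuf_pool_create("mbufs", 8192, 256, 0,
                                                           RTE_MBUF_DEFAULT_BUF_SIZE,
                                                           rte_socket_id());

        /* One RX and one TX queue on each of ports 0 and 1. */
        struct rte_eth_conf conf = {0};
        for (uint16_t port = 0; port < 2; port++) {
            rte_eth_dev_configure(port, 1, 1, &conf);
            rte_eth_rx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL, pool);
            rte_eth_tx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL);
            rte_eth_dev_start(port);
            rte_eth_promiscuous_enable(port);
        }

        /* Busy-poll loop: whatever arrives on port 0 is forwarded out port 1.
         * (l2fwd forwards between port pairs in both directions; one direction shown.) */
        for (;;) {
            struct rte_mbuf *pkts[BURST];
            uint16_t n = rte_eth_rx_burst(0, 0, pkts, BURST);
            uint16_t sent = rte_eth_tx_burst(1, 0, pkts, n);
            while (sent < n)                  /* drop anything that could not be queued */
                rte_pktmbuf_free(pkts[sent++]);
        }
        return 0;
    }

Inside a VM the same loop runs unchanged over the VMXNET3 PMD (guest OS bypass) or over an SR-IOV VF PMD (guest OS plus hypervisor bypass), which is exactly the distinction drawn in the deployment options above.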
Unrealistic Performance Characterization
Benchmark application: a packet generator sends a single flow of 64-byte UDP packets through a physical 10G switch into pNIC1 and pNIC2; virtual network connections carry them to vNIC1 and vNIC2 of a VNF VM running DPDK l2fwd on the hypervisor. Why this is unrealistic:
1. No packet processing in the VM
2. No memory or cache accesses in the VM
3. Port-to-port packet forwarding is best done on a physical switch via ASICs
4. No real telco network has only 64-byte UDP packets for a single flow
"Congratulations! You do nothing faster than anyone else!" [Lori MacVittie, f5, 2010-13-10]

Realistic Performance Characterization
Benchmark application: Intel Virtualized Broadband Remote Access Server (dppd)

NUMA I/O
Figure: with two NUMA nodes, the default placement can leave a VM's hardware threads on one node while the I/O device is attached to the other; NUMA I/O affinitizes the VM's threads, the hypervisor's I/O threads and the I/O device to the same NUMA node.

Improvement due to NUMA I/O
• Test setup: 1 VM, 1 vCPU; NUMA: 2 sockets; workload: TCP bulk, 4 sessions; I/O device: Mellanox 40GigE
Figure: throughput (Gbps) and CPU efficiency (CPU utilization per Gbps) for default vs. NUMA I/O placement, for both transmit and receive; NUMA I/O improves both metrics in both directions, with gains in the 10%–24% range.

NFV Challenges and Call to Action
• Performance focus
– Reminiscent of the move from PSTN to VoIP
– Shift away from unrealistic benchmarks
• Orchestration
• Matching reality to expectations
• Incumbent pushback
The IP Hourglass (Steve Deering, Dave Clark): design the NFV infrastructure both for existing applications and hardware platforms and for unknown future applications and hardware platforms.
Improve OS and hypervisor performance instead of bypassing the hypervisor and losing its capabilities.

Thank you!
[email protected]