
NFV in Cloud Infrastructure: Ready or Not?
Bhavesh Davda
Office of the CTO
VMware, Inc.
Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action
X86 Virtualization Timeline
[Timeline, 1998-2015: VMware founded; Workstation 1.0; ESX Server 1.0; ESX 2.0 (vSMP, 2 vCPUs); Xen 1.0; Workstation 5.5 (64-bit guests); x86-64 and dual-core CPUs; hardware virtualization support (Intel VT-x, AMD-V, then Intel EPT and AMD RVI); ESX 3.0 (4 vCPUs); ESX 3.5; Fusion 1.0; first KVM release in RHEL 5.4; ESX 4.0 (8 vCPUs); ESXi 6.0 (128 vCPUs, 4 TB RAM)]
X86 Virtualization 1998-2006 (pre-hardware virtualization)
[Figure: guest user code runs via direct execution; guest kernel code runs under binary translation (BT). Faults, syscalls, and interrupts trap into the VMM, which returns control to the guest via IRET/SYSRET.]
X86 Virtualization 2006-present (hardware virtualization)
[Figure: apps (ring 3) and the guest OS (ring 0) run in guest mode; a VM exit transfers control to the VMM in root mode, and a VM enter resumes the guest.]
Datacenter Workloads
The State of Virtualization
[Bar chart, "Business Critical Application Virtualization": share of each workload virtualized vs. unvirtualized, ranging from roughly 49% to 60% across Oracle Applications, SAP, Oracle Middleware, Oracle Databases, other middleware, high performance computing, other mission-critical apps, Microsoft SharePoint, Microsoft SQL, and Microsoft Exchange, plus the overall average.]
Source: Cloud Infrastructure Annual Primary Research, 2Q13 (N=576)
Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action
Telecommunications
[Figure: map of a telco network spanning mobile access networks (2/3/4G (LTE) and CDMA macrocells with BSC; radio network controllers: RNC (2G), NodeB (3G), eNodeB (LTE); small cells/microcells, femtocells, Wi-Fi, FMC), fixed access (PSTN, ADSL, DOCSIS, WLL; MSAN → FSAN: POTS, DSLAM, OTN, ADMs; FTTx, NGA), transport and optical networks, core networks (3GPP architecture: MME, SGW, PGW (PCEF), ePDG, HSS/AAA, PCRF, OCS, ANDSF, the Gi interface, non-3GPP networks), IMS and applications (CSCF, MRF, MGCF, MMTel, TAS, VoLTE, VoIP, UC, RCS(-e), SMSC, VMail, location, media gateways (MG)), value-added and Gi-LAN functions (carrier NAT, firewall, IPS, antivirus, DPI, caching, URL filtering, load balancing, video optimisation, WAN optimisation, IPTV, content delivery networks (CDN), border gateway routers, ISP services: DNS, DHCP, IP management (DDI)), OSS, BSS and SDP, corporate and ISP DMZ LANs, external networks, and the Internet.]
NFV in a nutshell
NFV Motivation: From Margin Erosion to Sustainable Profitability
[Figure: Margin erosion: OTT services ride on network investment funded by CSP $, while ARPU per customer lags the rising cost per subscriber. Sustainable profitability: a cloud operating model on a cloud platform brings down cost per subscriber, while service agility and service differentiation lift ARPU.]
NFV Architecture
[Figure: ETSI NFV reference architecture. OSS/BSS on top; NFV Management & Orchestration (NFV M&O) comprising the Orchestrator, VNF Managers, and the Virtual Infrastructure Manager, driven by service, VNF & infrastructure descriptions; VNFs (VNF1, VNF2, VNF3) with their element managers (EMS1, EMS2, EMS3); and the NFVI providing virtual compute, virtual storage, and virtual network through a virtualization layer over compute, storage, and network hardware.]
Many Use Cases
Example Use Case: 4G LTE Network
• LTE Network Elements
[Figure: Evolved UTRAN (E-UTRAN) and Evolved Packet Core (EPC). The LTE-UE attaches over LTE-Uu to the Evolved Node B (eNB) in its cell; eNBs interconnect over X2 and reach the EPC over S1-MME (to the MME) and S1-U (to the Serving Gateway). The MME connects to the HSS over S6a, to other MMEs over S10, and to the Serving Gateway over S11; the Serving Gateway connects to the PDN Gateway over S5/S8, which reaches the PDN over SGi; the PCRF attaches over S7 and Rx+. MME: Mobility Management Entity. PCRF: Policy & Charging Rule Function. The Serving Gateway and PDN Gateway together form the SAE Gateway.]
LTE: vEPC – MME, PGW, SGW
Deployed in production in multiple carriers worldwide
[Figure: ETSI NFV architecture mapped to a vEPC deployment. OSS/BSS and a service orchestrator / automation platform acting as the NFV orchestrator; VNF-vendor element managers (EMS A and others) with an optional vendor VNF manager alongside a generic VNF manager; VNFs for MME/SGSN, PGW/SGW/GGSN, HSS, and PCRF; a virtualization layer built on the ESXi hypervisor (virtual computing), VSAN (virtual storage), and NSX (virtual network); vCloud Director or OpenStack together with vCenter as the virtualized infrastructure manager, running over computing, storage, and network hardware.]
Outline
• Brief history of x86 virtualization
• The case for Network Functions Virtualization
• Hypervisor performance characteristics
• Challenges and Call to Action
Why performance is important to NFV
• NFV workloads differ from typical enterprise / datacenter workloads
• NFV Performance & Portability Best Practices (ETSI GS NFV-PER 001) classification:
– Data plane workloads: tasks related to packet handling in an end-to-end communication between edge applications; expected to be very intensive in I/O operations and memory R/W operations, e.g. CDN cache, router, IPsec tunneling
• High packet rates (see the sketch after this list)
– Control plane workloads: communication between NFs that is not directly related to the end-to-end data communication between edge applications, such as session management, routing or authentication; compared to data plane workloads, expected to be much less intensive in terms of transactions per second, while the complexity of the transactions may be higher
• CPU and memory intensive
– Signal processing workloads: NF tasks related to digital signal processing, such as FFT decoding and encoding in a C-RAN Base Band Unit (BBU); expected to be very intensive in CPU processing capacity and highly delay-sensitive
• Latency and jitter sensitive
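To make the data plane category concrete, here is a minimal illustrative sketch in C (plain POSIX sockets; socket setup is omitted and the next-hop argument is hypothetical; this is not taken from the ETSI document). Every packet costs a receive system call, at least one read/write pass over the payload in memory, and a send system call, which is why data plane VNFs stress I/O and memory rather than computation:

/* Illustrative data-plane style loop: receive a packet, touch its payload,
 * and forward it to a fixed next hop. Socket creation/binding omitted. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

void forward_loop(int rx_sock, int tx_sock, const struct sockaddr_in *next_hop)
{
    unsigned char buf[2048];                 /* one packet buffer */
    for (;;) {
        /* I/O: one system call per received packet */
        ssize_t n = recv(rx_sock, buf, sizeof buf, 0);
        if (n <= 0)
            continue;
        buf[0] ^= 0x01;                      /* memory R/W: touch the payload */
        /* I/O: one system call per transmitted packet */
        sendto(tx_sock, buf, (size_t)n, 0,
               (const struct sockaddr *)next_hop, sizeof *next_hop);
    }
}

Kernel-bypass approaches such as DPDK (covered later in this deck) exist precisely to remove the per-packet system calls and copies shown here.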
ESXi Networking Datapath Overview
[Figure: VMs plus management agents and background tasks run above the ESXi hypervisor (kernel); traffic flows through the virtual switch to the server's NIC and out to the physical network switch.]
• Message copy from application to GOS
• GOS (network stack) + vNIC driver queues packet for vNIC
• VM exit to VMM/Hypervisor
• vNIC implementation emulates DMA from VM, sends to vSwitch
• vSwitch queues packet for pNIC
• pNIC DMAs packet and transmits on the wire
Guest OS bypass and hypervisor bypass
[Figure: VMs, management agents, and background tasks above the ESXi hypervisor and virtual switch, connected through the NIC to the physical network switch. Guest OS bypass skips the guest network stack; hypervisor bypass additionally skips the virtual switch and hypervisor datapath.]
Hypervisor provides:
• Live migration (vMotion)
• Network virtualization (VXLAN, STT, Geneve)
• Virtual network monitoring, troubleshooting, stats
• High availability
• Fault tolerance
• Distributed resource scheduling
• VM snapshots
• Hot add/remove of devices, vCPUs, memory
The Latency Sensitivity Feature in vSphere 5.5
[Figure: two hypervisors, one with the latency-sensitivity feature enabled.]
• Latency-sensitivity feature
– Minimize virtualization overhead
– Achieve near bare-metal performance
ESXi 5.5 Latency and Jitter Improvements
• New virtual machine property: “Latency sensitivity” (a configuration sketch follows this list)
– High => lowest latency
– Medium => low latency
• Exclusively assign physical CPUs to virtual CPUs of “Latency Sensitivity = High” VMs
– Physical CPUs not used for scheduling other VMs or ESXi tasks
• Idle in Virtual Machine monitor (VMM) when Guest OS is idle
– Lowers latency to wake up the idle Guest OS, compared to idling in ESXi vmkernel
• Disable vNIC interrupt coalescing
• For DirectPath I/O, optimize interrupt delivery path for lowest latency
• Make ESXi vmkernel more preemptible
– Reduces jitter due to long-running kernel code
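As a rough illustration of how this surfaces to an operator, the per-VM property maps to VM configuration (.vmx) entries along the lines of the sketch below; the exact keys are taken from VMware tuning guidance of that era, may vary by release, and are typically combined with full CPU and memory reservations for the VM:

sched.cpu.latencySensitivity = "high"
ethernet0.coalescingScheme = "disabled"

The first line requests the exclusive physical CPU treatment described above; the second explicitly disables interrupt coalescing on the VM's first vNIC.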
ESXi 6.0 Network Latencies and Jitter
[Bar charts: average round-trip time and standard deviation, in microseconds, for Native, SR-IOV, SR-IOV with Latency Sensitivity (LS), VMXNET3, and VMXNET3 LS. Averages fall in roughly the 0-12 µs range; standard deviations in roughly the 0-60 µs range.]
Single 4-vCPU VM to Native, Ubuntu 14.04 LTS; RTT from ‘ping -i 0.00001 -c 1000000’
Intel Xeon E5-2697 v2 @ 2.70 GHz, Intel 82599EB pNIC
ESXi Real-time Virtualization Performance
[Figure: latency is the delay from an interrupt to its ISR; jitter is the variation in the period between ISRs.]
• cyclictest -p 99 -a 1 -m -n -D 10m -q (a simplified measurement sketch follows)

            ESXi 5.0   ESXi 6.0
Min         5 μs       2 μs
Avg         13 μs      4 μs
99%-ile     56 μs      5 μs
99.99%-ile  118 μs     7 μs
Max         1676 μs    78 μs
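For context on what cyclictest reports, here is a simplified sketch of its core measurement idea (not the actual cyclictest source): sleep until an absolute deadline with clock_nanosleep(), then measure how far past the deadline the thread actually woke up; the distribution of that overshoot is the latency and jitter shown above.

/* Simplified cyclictest-style wakeup-latency measurement (illustrative). */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

#define INTERVAL_NS 1000000L     /* 1 ms period */

static long long ns_diff(struct timespec a, struct timespec b)
{
    return (a.tv_sec - b.tv_sec) * 1000000000LL + (a.tv_nsec - b.tv_nsec);
}

int main(void)
{
    struct timespec next, now;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < 10000; i++) {
        next.tv_nsec += INTERVAL_NS;
        while (next.tv_nsec >= 1000000000L) {    /* normalize the timespec */
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        /* Sleep until the absolute deadline */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        /* Wakeup latency in microseconds: how late we woke past the deadline */
        printf("%lld\n", ns_diff(now, next) / 1000);
    }
    return 0;
}

The real tool additionally pins the measurement thread to a CPU, locks its memory, and runs at real-time priority (the -a, -m, and -p flags in the command above).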
ESXi MPI InfiniBand Latencies
[Line chart: MPI PingPong half round-trip latency (µs, roughly 0-3.5) vs. message size from 1 to 1024 bytes, for Native, ESX 5.5, ESX 6.0, and ESX 6.0u1.]
osu_latency benchmark; Open MPI 1.6.5, MLX OFED 2.2, RedHat 6.5; HP DL380p Gen8, Mellanox FDR InfiniBand
Data Plane Development Kit (DPDK)
[Figure: DPDK block diagram. User-space components used by DPDK sample applications, ISV ecosystem applications, and customer applications:
• Core libraries (standard libraries for arrays, buffers, rings): EAL, MALLOC, MBUF, MEMPOOL, RING, TIMER
• Platform libraries (IP interface, power, shared memory): ETHDEV, KNI, POWER, IVSHMEM
• Classification algorithms (pattern matching): EXACT MATCH, ACL, LPM
• Quality of service: METER, SCHED
• Packet access via poll-mode drivers (PMD, native & virtual): E1000, IGB, IXGBE, I40e, VMXNET3, VIRTIO, XENVIRT, RING, PCAP, 3rd-party NICs; KNI and IGB_UIO modules in the Linux kernel]
• Speeds up networking functions
• Enables user-space application development
• Facilitates both run-to-completion and pipeline models (a minimal forwarding-loop sketch follows)
• Libraries for network application development on Intel platforms
• Scales from Intel Atom to multi-socket Intel Xeon architecture platforms
• Rich selection of pre-built example applications
• Free, open-sourced, BSD licensed
– Git: http://dpdk.org/git/dpdk
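To ground the run-to-completion model, here is a minimal forwarding loop in the style of the DPDK l2fwd sample (a hedged sketch: EAL initialization, port/queue configuration, and mbuf pool setup are omitted, and the port numbers are assumptions). The key point is that the poll-mode driver is called in a busy loop from user space, with no interrupt or system call per packet:

/* Sketch of an l2fwd-style run-to-completion loop (setup omitted). */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

void l2fwd_style_loop(void)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll-mode receive: returns immediately with 0..BURST_SIZE packets */
        uint16_t nb_rx = rte_eth_rx_burst(0 /* port */, 0 /* queue */,
                                          bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* Forward the received mbufs out of the other port */
        uint16_t nb_tx = rte_eth_tx_burst(1 /* port */, 0 /* queue */,
                                          bufs, nb_rx);

        /* Free any packets the TX queue could not accept */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}

In the "unrealistic benchmark" discussed later, a loop essentially like this is the entire workload of the VNF VM.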
DPDK in vSphere ESXi Environment
[Figure: four deployment models.
• No bypass: unmodified application over the Linux kernel network stack and the VMXNET3 vNIC, through the ESXi vSwitch.
• Guest OS bypass: DPDK application using the VMXNET3 poll-mode driver, bypassing the guest kernel but still traversing the ESXi vSwitch.
• Guest OS bypass + hypervisor bypass (DirectPath I/O): DPDK application using the IXGBE driver directly against a pass-through pNIC, bypassing both the guest kernel and the vSwitch.
• Guest OS bypass + hypervisor bypass (SR-IOV): DPDK application using the IXGBE VF driver against an SR-IOV virtual function, likewise bypassing the guest kernel and the vSwitch.]
Unrealistic Performance Characterization Benchmark Application
[Figure: a packet generator sends a single flow of 64-byte UDP packets through a physical 10G switch into pNIC1, over a virtual network connection to vNIC1 of a VNF VM running DPDK l2fwd on the hypervisor, out through vNIC2 and pNIC2, and back through the switch.]
1. No packet processing in the VM
2. No memory or cache accesses in the VM
3. Port-to-port packet forwarding is best done on a physical switch via ASICs
4. No real telco network has only 64-byte UDP packets for a single flow
“Congratulations! You do nothing faster than anyone else!”
[Lori MacVittie, f5, 2010-13-10]
Realistic Performance Characterization Benchmark Application
Intel Virtualized Broadband Remote Access Server (dppd)
NUMA I/O
[Figure: a VM with four worker threads on a two-socket host (NUMA0, NUMA1), with the I/O device attached to one node. By default the hypervisor may schedule the VM's threads on the node remote from the device; with NUMA I/O, the hypervisor affinitizes the threads and I/O processing to the device-local NUMA node.]
Improvement due to NUMA I/O
[Bar charts: throughput (Gbps) and CPU utilization per Gbps, Default vs. NUMA I/O, for transmit and receive; annotated improvements of 24%, 19%, 16%, and 10%.] (An application-level sketch of the same idea follows.)
• Test setup:
– 1 VM, 1 vCPU
– NUMA: 2 sockets
– Workload: TCP bulk, 4 sessions
– I/O device: Mellanox 40GigE
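The same affinity idea can be applied inside a guest or a bare-metal application; the sketch below uses libnuma and sysfs (neither is mentioned in the talk, and the PCI address is a placeholder) to pin the calling thread and its packet buffers to the NIC's local NUMA node, which is what the hypervisor-level NUMA I/O feature arranges transparently for the VM:

/* NUMA-local placement sketch: bind thread and buffers to the NIC's node.
 * Build with -lnuma. The PCI address below is hypothetical. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    /* Read the device's local NUMA node from sysfs */
    int node = 0;
    FILE *f = fopen("/sys/bus/pci/devices/0000:81:00.0/numa_node", "r");
    if (f) {
        if (fscanf(f, "%d", &node) != 1 || node < 0)
            node = 0;
        fclose(f);
    }

    /* Run this thread on the device-local node and allocate buffers there */
    numa_run_on_node(node);
    void *pkt_buffers = numa_alloc_onnode(2 * 1024 * 1024, node);
    if (pkt_buffers == NULL)
        return 1;

    printf("pinned to NUMA node %d, buffers allocated locally\n", node);
    /* ... packet processing would go here ... */

    numa_free(pkt_buffers, 2 * 1024 * 1024);
    return 0;
}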
NFV Challenges and Call to Action
• Performance focus
– Reminiscent of the move from PSTN to VoIP
– Shift away from unrealistic benchmarks
• Orchestration
• Matching reality to expectations
• Incumbent pushback
The IP Hourglass (Steve Deering, Dave Clark)
Design NFV infrastructure both for existing applications and hardware platforms and for unknown future applications and hardware platforms
Improve OS and hypervisor performance instead of bypassing them and losing capabilities
Thank you!
[email protected]