OvS Lookup Optimization

OvS Lookup Optimization
Using Two-Layer Table Lookup
Sameh Gobriel and Charlie Tai
Intel Labs
OvS De Facto Virtual Switch for NFV Environments
Monolithic Purpose-built Boxes
Networking VMs on Standard Servers
NFV
•
Network appliances use purpose-built H/W &
ASICs (e.g., TCAM) for flow classification
•
General purpose processors with Cache/memory hierarchy
can support much larger flow tables.
•
Cost & power consumption are limiting
factors to support large number of flows
•
Multicores architecture provide a scalable competitive
flow classification performance.
2
Open vSwitch Flow Lookup
1. Set of disjoint sub-table with no priority
2. Rule is only inserted into one sub-table (lookup terminates after first match)
3. Lookup is done by sequentially search each sub-table until a match is found
OvS Flow Classification is
a bottleneck
Fig. Vtunes OVS flow lookup process (bypass EMC). Test case: 20 sub-tables, each has 100 rules.
3
Two Layer Table Lookup Abstraction for MFC
L Lookups  1 lookup + 1st Level Indirection Overhead
4
Bloom Filter – Background
X
Y
Hash1
Hash1
Hash 2
1
0
0
0
1
0
0
Lookup Z
1
0
Hash 2
0
0
0
0
1
0
0
Lookup Y
• With certain false positive rate, bloom filter is used to check if a variable
(x,y,z) is a member. Member means the variable has been inserted already.
• We can use bloom filter to check if a flow is inside a sub-table or not, before
searching the sub-table.
5
Bloom Filter – Lookup Scheme
Mask 0
Mask 1
Mask 2
By using BF
Sub-table
Sub-table
Mask 0
Mask 1
Mask 2
bloomfilter0
bloomfilter1
bloomfilter2
Sub-table
Sub-table
Sub-table
Sub-table
• Before searching into sub-table, we use bloom filter to check if the masked
key (sub-key) is a member of the sub-table or not.
6
Cycles Breakdown Assuming L-subtable traversal
Using Bloom Filter
Using Current Scheme
Operation
Cycles
Repetition
I/O
210
1
Miniflow Extract
103
1
Hash For Submask
97
L
Subtable Sig. Cmp
53
L
Full Key Comparison
82
1
Hit Cycles = 395 + L * 150
Operation
Cycles
Repetition
I/O
210
1
Miniflow Extract
103
1
Hash For Bloom Filter
88
L
Bloom Filter Lookup
30
L
Check Subtable Sig.
Cmp
53
1
Full Key Comparison
82
1
Hit Cycles = 448 + L * 118
7
Vector Bloom Filter – Lookup Scheme
Full Unmasked
Key (512B)
Mask 0
Mask 1
Mask 2
vBF0
vBF1
vBF2
By using VBF
bloomfilter0
bloomfilter1
bloomfilter2
Mask 0
Mask 1
Mask 2
Sub-table
Sub-table
Sub-table
Sub-table
Sub-table
Sub-table
•
Vector Bloom Filter (or vBF) hashes and stores
unmasked full keys (like EMC).
•
vBF Filter for each sub-table store encountered
full keys corresponding to rules in sub-tables
•
A new flow always misses vBF (similar to EMC) but
can hit a rule in the sub-table.
8
Vector Bloom Filter – Cost Analysis
Using vBF
Packet header
Using Original Scheme
1
Flow mask
rules
2
vbf
3
vbf
vbf
11110000
11100000
11000000
1010 xxxx
0011 xxxx
1011 xxxx
101x xxxx
001x xxxx
101x xxxx
10xx xxxx
00xx xxxx
10xx xxxx
Operation
Cycles
Repetition
Operation
Cycles
Repetition
I/O
210
1
I/O
210
1
Miniflow Extract
103
1
Miniflow Extract
103
1
97
L
Hash For XBloom (full
Key)
159
Hash For Submask
1
Subtable Sig. Cmp
53
L
Bloom Filter Lookup
30
L
Full Key Comparison
82
1
Check Subtable Sig. Cmp
53
1
Full Key Comparison
82
1
Hit Cycles = 395 + L * 150
Hit Cycles = 607 + L * 30
9
Lookup Cycles Based on Model
OVS
BF
vBF
4000
3500
3000
Cycles
2500
2000
1500
~2X
1000
500
0
1
2
3
4
5
6
7
8
9
10 11 12 13 14 15 16 17 18 19 20
Average Number of Subtables Traversed
Results and performance figures are for an experimental prototype and is work in progress. The result reflect specific components
on a particular test, in specific systems and should not be generalized for actual products. Differences in hardware, software, or
configuration will affect actual performance.
Results are generated using a model based on processing cycles of Intel Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz using OvS 2.6.0
with 20 sub-table and uniform random traffic.
10
Counting Bloom Filters to Handle Deletion
Start with an m bit array, filled with 0s.
B
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
1
0
1
1
0
1
0
Hash each item xj in k times. If Hi(xj) = a, add 1 to B[a].
B
0
3
0
0
1
0
2
0
0
3
2
1
To delete xj decrement the corresponding counters.
B
0
2
0
0
0
0
2
0
0
3
2
1
0
Can obtain a corresponding Bloom filter by reducing to 0/1.
B
0
1
0
0
0
0
1
0
0
1
4 bits/counter  Probability of Overflow = 6.78 E-17
1
1
0
1
Vector Bloom Filter – Results
Throughput Comparison
4000000
Throughput
(PPS/Core)
3500000
3000000
2500000
2000000
~2X
~1.6X
1500000
vBF
OvS
(optimized)
1
9
25
49
81
121
169
225
289
361
441
529
625
729
841
961
1089
1225
1369
1521
1681
1849
2025
2209
2401
1000000
flows per table
Fig. 20 subtables, and each su-btable contains various numbers of rules. Note that we disabled EMC for showing the benefits.
•
vBF provides significant lookup performance gain when compared with native OvS.
•
Gain increases with increasing number of sub-tables.
Results and performance figures are for an experimental prototype and is work in progress. The result
reflect specific components on a particular test, in specific systems and should not be generalized for
actual products. Differences in hardware, software, or configuration will affect actual performance.
Results are generated using uniform random traffic with 20 sub-tables running on Intel Intel(R) Xeon(R)
CPU E5-2699 v4 @ 2.20GHz and using OvS 2.6.0..
12
Conclusion
• Flow Lookup is a performance bottleneck for OvS, especially with increasing
number of flows and sub-tables.
• Two layer table architecture optimizes flow lookup in OvS and avoids the
sequential search of the sub-tables.
• Vector Bloom Filter (vBF) uses bloom filters as the first layer and can
significantly improves lookup performance for OvS.
• Future Work:
• Investigate other technologies to use as the first layer of indirection.
• Realistic traffic pattern and workload
13
Questions?
[email protected]
[email protected]