OvS Lookup Optimization Using Two-Layer Table Lookup Sameh Gobriel and Charlie Tai Intel Labs OvS De Facto Virtual Switch for NFV Environments Monolithic Purpose-built Boxes Networking VMs on Standard Servers NFV • Network appliances use purpose-built H/W & ASICs (e.g., TCAM) for flow classification • General purpose processors with Cache/memory hierarchy can support much larger flow tables. • Cost & power consumption are limiting factors to support large number of flows • Multicores architecture provide a scalable competitive flow classification performance. 2 Open vSwitch Flow Lookup 1. Set of disjoint sub-table with no priority 2. Rule is only inserted into one sub-table (lookup terminates after first match) 3. Lookup is done by sequentially search each sub-table until a match is found OvS Flow Classification is a bottleneck Fig. Vtunes OVS flow lookup process (bypass EMC). Test case: 20 sub-tables, each has 100 rules. 3 Two Layer Table Lookup Abstraction for MFC L Lookups 1 lookup + 1st Level Indirection Overhead 4 Bloom Filter – Background X Y Hash1 Hash1 Hash 2 1 0 0 0 1 0 0 Lookup Z 1 0 Hash 2 0 0 0 0 1 0 0 Lookup Y • With certain false positive rate, bloom filter is used to check if a variable (x,y,z) is a member. Member means the variable has been inserted already. • We can use bloom filter to check if a flow is inside a sub-table or not, before searching the sub-table. 5 Bloom Filter – Lookup Scheme Mask 0 Mask 1 Mask 2 By using BF Sub-table Sub-table Mask 0 Mask 1 Mask 2 bloomfilter0 bloomfilter1 bloomfilter2 Sub-table Sub-table Sub-table Sub-table • Before searching into sub-table, we use bloom filter to check if the masked key (sub-key) is a member of the sub-table or not. 6 Cycles Breakdown Assuming L-subtable traversal Using Bloom Filter Using Current Scheme Operation Cycles Repetition I/O 210 1 Miniflow Extract 103 1 Hash For Submask 97 L Subtable Sig. Cmp 53 L Full Key Comparison 82 1 Hit Cycles = 395 + L * 150 Operation Cycles Repetition I/O 210 1 Miniflow Extract 103 1 Hash For Bloom Filter 88 L Bloom Filter Lookup 30 L Check Subtable Sig. Cmp 53 1 Full Key Comparison 82 1 Hit Cycles = 448 + L * 118 7 Vector Bloom Filter – Lookup Scheme Full Unmasked Key (512B) Mask 0 Mask 1 Mask 2 vBF0 vBF1 vBF2 By using VBF bloomfilter0 bloomfilter1 bloomfilter2 Mask 0 Mask 1 Mask 2 Sub-table Sub-table Sub-table Sub-table Sub-table Sub-table • Vector Bloom Filter (or vBF) hashes and stores unmasked full keys (like EMC). • vBF Filter for each sub-table store encountered full keys corresponding to rules in sub-tables • A new flow always misses vBF (similar to EMC) but can hit a rule in the sub-table. 8 Vector Bloom Filter – Cost Analysis Using vBF Packet header Using Original Scheme 1 Flow mask rules 2 vbf 3 vbf vbf 11110000 11100000 11000000 1010 xxxx 0011 xxxx 1011 xxxx 101x xxxx 001x xxxx 101x xxxx 10xx xxxx 00xx xxxx 10xx xxxx Operation Cycles Repetition Operation Cycles Repetition I/O 210 1 I/O 210 1 Miniflow Extract 103 1 Miniflow Extract 103 1 97 L Hash For XBloom (full Key) 159 Hash For Submask 1 Subtable Sig. Cmp 53 L Bloom Filter Lookup 30 L Full Key Comparison 82 1 Check Subtable Sig. Cmp 53 1 Full Key Comparison 82 1 Hit Cycles = 395 + L * 150 Hit Cycles = 607 + L * 30 9 Lookup Cycles Based on Model OVS BF vBF 4000 3500 3000 Cycles 2500 2000 1500 ~2X 1000 500 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Average Number of Subtables Traversed Results and performance figures are for an experimental prototype and is work in progress. The result reflect specific components on a particular test, in specific systems and should not be generalized for actual products. Differences in hardware, software, or configuration will affect actual performance. Results are generated using a model based on processing cycles of Intel Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz using OvS 2.6.0 with 20 sub-table and uniform random traffic. 10 Counting Bloom Filters to Handle Deletion Start with an m bit array, filled with 0s. B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 1 1 0 1 0 Hash each item xj in k times. If Hi(xj) = a, add 1 to B[a]. B 0 3 0 0 1 0 2 0 0 3 2 1 To delete xj decrement the corresponding counters. B 0 2 0 0 0 0 2 0 0 3 2 1 0 Can obtain a corresponding Bloom filter by reducing to 0/1. B 0 1 0 0 0 0 1 0 0 1 4 bits/counter Probability of Overflow = 6.78 E-17 1 1 0 1 Vector Bloom Filter – Results Throughput Comparison 4000000 Throughput (PPS/Core) 3500000 3000000 2500000 2000000 ~2X ~1.6X 1500000 vBF OvS (optimized) 1 9 25 49 81 121 169 225 289 361 441 529 625 729 841 961 1089 1225 1369 1521 1681 1849 2025 2209 2401 1000000 flows per table Fig. 20 subtables, and each su-btable contains various numbers of rules. Note that we disabled EMC for showing the benefits. • vBF provides significant lookup performance gain when compared with native OvS. • Gain increases with increasing number of sub-tables. Results and performance figures are for an experimental prototype and is work in progress. The result reflect specific components on a particular test, in specific systems and should not be generalized for actual products. Differences in hardware, software, or configuration will affect actual performance. Results are generated using uniform random traffic with 20 sub-tables running on Intel Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz and using OvS 2.6.0.. 12 Conclusion • Flow Lookup is a performance bottleneck for OvS, especially with increasing number of flows and sub-tables. • Two layer table architecture optimizes flow lookup in OvS and avoids the sequential search of the sub-tables. • Vector Bloom Filter (vBF) uses bloom filters as the first layer and can significantly improves lookup performance for OvS. • Future Work: • Investigate other technologies to use as the first layer of indirection. • Realistic traffic pattern and workload 13 Questions? [email protected] [email protected]
© Copyright 2026 Paperzz