Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
By F. Bonomi et al.
Presented by Kenny Cheng, Tonny Mak Yui Kuen

Introduction
A) Motivation
B) Objectives
C) Problem statements

A) Motivation
• Routers increasingly keep per-flow state
• Storing the state of a large number of flows takes a lot of memory (~100 bits per flow)
• If the memory per flow can be reduced, the state can be kept in fast on-chip memory to improve performance

B) Objectives
• Introduce the idea of an Approximate Concurrent State Machine (ACSM), which sacrifices some accuracy to save memory
• Introduce and compare several solutions to the ACSM problem
• Find the approach with the highest accuracy-to-memory ratio

C) Problem statements
• Describe 3 techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation

Bloom Filter
• A data structure proposed by Bloom in 1970
• Designed for membership tests, i.e. testing whether an element is in a set
• Fast and compact
• False positives are possible, i.e. an element not in the set may be wrongly reported as present
• No false negatives, i.e. an element in the set is always reported as present

How a Bloom Filter Works
[Figure: bit array, all zeros, with hash functions 1, 2, 3, ..., k]
• A bit array, initially all zeros
• k hash functions

How a Bloom Filter Works: Insertion
[Figure: element x hashed by functions 1, 2, 3, ..., k to positions in the bit array, which are set to 1]
• Hash the element with the k hash functions to get k indices into the bit array
• Set those bits to 1

How a Bloom Filter Works: Lookup
[Figure: element x hashed by functions 1, 2, 3, ..., k to positions in the bit array]
• Hash the element with the k hash functions
• If all the corresponding bits are 1, the element is reported as in the set; if any bit is 0, it is not in the set

How a Bloom Filter Works: Deletion
[Figure: element x hashed by functions 1, 2, 3, ..., k to bits marked "?"]
• Deletion is not supported
• A bit may also be used by other elements, so it cannot simply be cleared

Counting Bloom Filter
[Figure: array of counters; element x hashed by functions 1, 2, 3, ..., k to k counters]
• Replace each bit with a counter
• For insertion, increment the counters
• For deletion, decrement the counters
• Problems: more space, counter overflow

3 Approaches to ACSM
• Approaches:
  1. Direct Bloom Filter
  2. Stateful Bloom Filter
  3. Fingerprint-compressed Filter
• Operations to implement:
  1. Insert(flow, state)
  2. Lookup(flow) returns (state)
  3. Delete(flow)
  4. Update(flow, new_state)

Direct Bloom Filter Approach
• Use a counting Bloom filter
• 4 operations:
  Insert – insert the (flow_id, state) pair
  Lookup – if the state is not provided, look up (flow_id, state) for every state; return “don’t know” if more than one state is found
  Delete – look up, then decrement the counters
  Update – delete the old pair, then insert the new one
• Improvement: use timing-based deletion to handle non-terminated flows

Timing-based Deletion
[Figure: array of counters with a timing bit per cell; element x hashed by functions 1, 2, 3, ..., k]
• Add a timing bit to each cell
• Set the bit when the cell is touched
• Periodically clear the cells that were not touched, and reset the timing bits
• Alternative to the Direct Bloom Filter (DBF): use a standard Bloom filter instead of a counting one, and delete elements only by timing-based deletion

Stateful Bloom Filter Approach
• The Direct Bloom Filter does not store the state of a flow, so every state has to be looked up
• Improvement: store a state value in each cell for faster lookup
• Hash flow_id only, instead of the (flow_id, state) pair
• Introduce a “don’t know” (DK) state for when a collision occurs
• Keep timing-based deletion

Stateful Bloom Filter Approach
• Insert, modify, delete – similar to the Direct Bloom Filter; set the cell value to DK on a collision (counter > 1)
• Lookup:
  If all cells are DK, return DK
  If all cells are either state i or DK, return state i
  If more than one state other than DK is found, return “not found”

Fingerprint-compressed Filter Approach
• Store a fingerprint of the flow, together with its state, in a d-left hash table
[Figure: element x hashed into subtables 1, 2, ..., d; each bucket entry holds a 10-bit fingerprint and a state]
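To make the d-left layout above concrete, here is a minimal Python sketch of a fingerprint-compressed filter. It is an illustration under stated assumptions, not the authors' implementation: the class name, the default sizes, the use of SHA-256 to derive the fingerprint and bucket indices, and the handling of full buckets are all hypothetical choices made for this example.

```python
import hashlib

DK = "DK"  # "don't know" marker for ambiguous lookups


class FingerprintCompressedFilter:
    """Toy d-left table storing (fingerprint -> state) entries per bucket."""

    def __init__(self, d=4, buckets_per_table=64, bucket_size=8, fp_bits=10):
        self.d = d
        self.buckets_per_table = buckets_per_table
        self.bucket_size = bucket_size
        self.fp_bits = fp_bits
        # tables[t][b] is one bucket: a dict mapping fingerprint -> state
        self.tables = [[{} for _ in range(buckets_per_table)] for _ in range(d)]

    def _hash(self, salt, flow_id):
        digest = hashlib.sha256(f"{salt}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _candidates(self, flow_id):
        """Return the fingerprint and the candidate bucket in each subtable."""
        fp = self._hash("fp", flow_id) % (1 << self.fp_bits)
        buckets = [self.tables[t][self._hash(t, flow_id) % self.buckets_per_table]
                   for t in range(self.d)]
        return fp, buckets

    def insert(self, flow_id, state):
        fp, buckets = self._candidates(flow_id)
        # min() returns the first minimum, so ties go to the left-most subtable.
        target = min(buckets, key=len)
        if len(target) >= self.bucket_size:
            return False  # assumption: a full bucket simply rejects the insert
        target[fp] = state
        return True

    def lookup(self, flow_id):
        fp, buckets = self._candidates(flow_id)
        hits = [b[fp] for b in buckets if fp in b]
        if not hits:
            return None   # flow not present
        if len(hits) > 1:
            return DK     # fingerprint found in multiple buckets
        return hits[0]

    def update(self, flow_id, new_state):
        fp, buckets = self._candidates(flow_id)
        hits = [b for b in buckets if fp in b]
        if len(hits) != 1:
            return False  # absent or ambiguous: leave the table unchanged
        hits[0][fp] = new_state
        return True

    def delete(self, flow_id):
        fp, buckets = self._candidates(flow_id)
        for b in buckets:
            if fp in b:
                del b[fp]  # simplification: remove the left-most match only
                return True
        return False
```

For example, after `f = FingerprintCompressedFilter()` and `f.insert("flow-a", 1)`, calling `f.lookup("flow-a")` returns the stored state, and `f.update("flow-a", 2)` changes it in place. Timing-based deletion is left out of this sketch.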
Fingerprint-compressed Filter Approach
• Insert – hash the element to find the corresponding bucket in each hash table, and insert the fingerprint + state into the bucket with the fewest elements (choose the left-most one to break ties)
• Lookup – retrieve the state stored with the fingerprint
• Delete – remove the fingerprint
• Update – update the state in place, or remove the old entry and add the new one
• Return DK when a fingerprint is found in multiple buckets
• Timing-based deletion can still be applied

Simulation
• Investigates the size/accuracy trade-off of the 3 approaches
• State machine: 10 states
• Legal state changes: 1 → 2 → 3 → … → 10
• Run for 1 million flows
• About 60000 simultaneous flows
• 100 ± 40 packets per flow
• Some packets trigger a state change

Simulation
• 3 kinds of simulated flows:
• Interesting flows (30%) – flows with legal state changes only, always complete
• Noise flows (30%) – flows with random (legal or illegal) state changes, never complete
• Random flows (40%) – flows without state changes

Simulation
• False positive rate: percentage of flows reported as completed that are not interesting flows
• False negative rate: percentage of interesting flows that are not reported as completed

Applications
Use in application-level QoS:
• Video congestion control
• Peer-to-Peer (P2P) traffic identification

Video congestion control
• Applies to MPEG video streaming
• 3 kinds of frames in MPEG video:
  I frame – scene information
  P frame – differential information
  B frame – least important information
• Up to 30% of B frames can be dropped with acceptable quality
• Need to keep track of the current frame

Video congestion control
• Use the Fingerprint-compressed Filter (FCF) ACSM to keep track of state
• Experimentally, the highest acceptable false positive rate is 0.37%
• This requires 27 bits of memory per flow (about ¼ of the original 100 bits)

P2P Traffic Identification
• Limit P2P flows to improve quality for other applications
• One possible way to identify a P2P flow: concurrent TCP and UDP flows
• Use ACSM for real-time P2P identification

Conclusion
• ACSMs are feasible
• The FCF approach is the best of the three approaches
• Two potential applications of ACSM are introduced
• ACSM may benefit QoS applications, which are fault-tolerant

Comments
• The authors focus on accuracy and memory size, but not on real performance
• The FCF approach may not perform well in hardware

Question & Answer
- End -
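As a supplement to the Direct Bloom Filter slides, the following Python sketch shows how the four ACSM operations could sit on top of a counting Bloom filter. It is a rough illustration only: the class names, the default sizes, and the use of SHA-256 as the hash family are assumptions made for this example, and timing-based deletion and counter overflow are not modelled.

```python
import hashlib

DK = "DK"  # "don't know": more than one state matched


class CountingBloomFilter:
    """Bloom filter whose cells are counters, so elements can be deleted."""

    def __init__(self, num_cells=1024, num_hashes=4):
        self.counters = [0] * num_cells
        self.num_hashes = num_hashes

    def _indices(self, item):
        # Derive k indices by salting one hash function (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def insert(self, item):
        for i in self._indices(item):
            self.counters[i] += 1

    def contains(self, item):
        return all(self.counters[i] > 0 for i in self._indices(item))

    def delete(self, item):
        if self.contains(item):
            for i in self._indices(item):
                if self.counters[i] > 0:
                    self.counters[i] -= 1


class DirectBloomFilterACSM:
    """Stores (flow_id, state) pairs; lookup probes every possible state."""

    def __init__(self, states, num_cells=1024, num_hashes=4):
        self.states = list(states)
        self.cbf = CountingBloomFilter(num_cells, num_hashes)

    def insert(self, flow_id, state):
        self.cbf.insert((flow_id, state))

    def lookup(self, flow_id):
        found = [s for s in self.states if self.cbf.contains((flow_id, s))]
        if not found:
            return None      # flow not present
        if len(found) > 1:
            return DK        # ambiguous because of a false positive
        return found[0]

    def delete(self, flow_id):
        state = self.lookup(flow_id)
        if state is not None and state != DK:
            self.cbf.delete((flow_id, state))

    def update(self, flow_id, new_state):
        self.delete(flow_id)              # delete old pair
        self.insert(flow_id, new_state)   # insert new pair
```

With `acsm = DirectBloomFilterACSM(states=range(1, 11))`, a flow moves through the 10-state machine from the simulation by calling `acsm.insert(flow, 1)` followed by `acsm.update(flow, 2)` and so on. Each lookup costs one Bloom filter probe per state, which is the cost the Stateful Bloom Filter approach tries to avoid.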