The Bloom Paradox
Ori Rottenstreich
Joint work with Isaac Keslassy
Technion, Israel

Problem Definition
[Figure: the user, the local cache S (access cost 1, holding x), and the central memory M (access cost 10, holding all elements x, y, z, u, v).]
• Requirement: A data structure at the user with a fast answer to "is $x \in S$?"
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives

Two Possible Errors
• False Positive: $y \notin S$ but the data structure answers $y \in S$.
• Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: $x \in S$ but the data structure answers $x \notin S$.
• Results in an expensive access to the central memory instead of the local cache. Additional cost of 10 - 1 = 9.

Bloom Filters (Bloom, 1970)
• Initialization: Array of $m$ zero bits.
• Insertion: Each of the $n$ elements is hashed $k$ times, and the corresponding bits are set.
• Query: Hashing the element, checking that all $k$ bits are set.
• False positive rate (probability) of approximately $(1 - e^{-kn/m})^k$.
• No false negatives.
[Figure: example bit arrays for the elements x, y, z, w.]

Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System

The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary

Bloom Paradox Example
• Parameters: [...]
• Extreme case without locality: All elements have equal probability of belonging to the cache.
o Toy example
• Let $B$ be the set of elements that the Bloom filter indicates are in $S$.
o In particular, $S \subseteq B$ (no false negatives in the Bloom filter).
• Intuition: [...]
• Surprise: [...]
[Figure: the set B of elements with a positive Bloom filter indication, drawn together with the local cache S (cost 1) and the central memory M (cost 10).]
• The Bloom filter indicates the membership of [...] elements. Only [...] of them are indeed in $S$.
• When the Bloom filter states that $x \in S$, it is wrong with probability [...]
• Average cost if we listen to the Bloom filter: [...]
• Average cost if we don't: [...]
• The Bloom filter is useless! Don't listen to the Bloom filter.

Costs of the Two Possible Errors
• The cost of a false positive: 1
• The cost of a false negative: [...]
• In the cache example: a false negative costs $10 - 1 = 9$
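The cost comparison behind the Bloom paradox can be sketched numerically. The Python sketch below is an illustration under stated assumptions, not the slides' own formula: it applies Bayes' rule to a filter with no false negatives, and uses the cache example's extra costs (1 for a false positive, 9 for a false negative). The function names and the sample values of `prior` and `fpr` are hypothetical; the toy example's actual parameters are not given here.

```python
def posterior_given_positive(prior: float, fpr: float) -> float:
    """Pr(x in S | filter answers yes), assuming no false negatives."""
    return prior / (prior + (1.0 - prior) * fpr)


def expected_positives(total: int, in_cache: int, fpr: float) -> float:
    """Expected number of elements the filter reports as members."""
    return in_cache + (total - in_cache) * fpr


def paradox_occurs(prior: float, fpr: float,
                   fp_cost: float = 1.0, fn_cost: float = 9.0) -> bool:
    """True if ignoring a positive answer is cheaper on average than trusting it."""
    post = posterior_given_positive(prior, fpr)
    # Trusting the filter: we pay fp_cost whenever the positive answer was wrong.
    cost_if_trusting = (1.0 - post) * fp_cost
    # Ignoring the filter: we go straight to central memory and pay fn_cost
    # whenever the element was in fact in the cache.
    cost_if_ignoring = post * fn_cost
    return cost_if_ignoring < cost_if_trusting


if __name__ == "__main__":
    # Hypothetical values: a tiny prior makes most positive answers wrong.
    print(paradox_occurs(prior=1e-6, fpr=1e-3))
    # With a large prior, the positive answer is worth trusting.
    print(paradox_occurs(prior=0.5, fpr=1e-3))
```

Under these assumptions, the comparison reduces to checking whether the prior falls below `fpr * fp_cost / (fn_cost + fpr * fp_cost)`, which shows why a small prior, a large false positive rate, and a small false negative cost all favor the paradox.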
Conditions for the Bloom Paradox
• Let [...] be the a priori membership probability of $x$
o i.e. before getting the answer of the Bloom filter
• Intuition: The Bloom paradox occurs more often when:
o [...] is small
o [...] is large (i.e. [...] is small)
o [...] is small (because the Bloom filter implicitly assumes [...])
• Example: If [...] and [...], the Bloom paradox occurs if [...]
• Theorem 1 (Boundaries of the Bloom Paradox): The Bloom paradox occurs if and only if [...]
[Figure: local cache, Bloom filter, central memory.]

Bloom Filter Improvements
• Theorem 1: The Bloom paradox occurs if and only if [...]
• Use the formula to improve the Bloom filter
o Only insert into / query the Bloom filter if the formula expects it to be useful
[Figure: local cache, Bloom filter, central memory.]

Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
[Figure: bit array with the elements x and y.]
• The solution: Counting Bloom filters, storing an array of counters instead of bits.
o Insertion: Incrementing $k$ counters by one.
o Deletion: Decrementing $k$ counters by one.
o Query: Checking that $k$ counters are positive.
[Figure: counter array with x and y inserted, each insertion adding +1 to its counters.]
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for [...]
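The three CBF operations (insert, delete, query) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name, the use of salted SHA-256 to derive the k positions, and the unbounded Python integers used as counters are all assumptions made for readability (a real CBF would use small fixed-width counters).

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters, k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        # Insertion: increment the k counters by one.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def query(self, item: str) -> bool:
        # Query: all k counters must be positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def delete(self, item: str) -> None:
        # Deletion: decrement the k counters by one.
        # Only meaningful for items that were inserted earlier; deleting
        # anything else would corrupt the counters of other items.
        if not self.query(item):
            raise ValueError("item is not in the filter")
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=1000, k=5)
    cbf.insert("x")
    cbf.insert("y")
    cbf.delete("x")
    # y is still reported as present: deleting x removed only x's increments.
    print(cbf.query("y"))
```

Because deleting x only undoes x's own increments, any counter shared with y stays positive, which is exactly what resetting bits in a plain Bloom filter cannot guarantee.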
Counting Bloom Filter Query
• Query
o Checking that $k$ counters are positive.
[Figure: counter array with the counters pointed to by two queried elements, y and z.]
o Question: Which is more likely to be correct, y or z?

The Bloom Paradox in the Counting Bloom Filter
• Theorem 2: Let [...] denote the set of values of the $k$ counters pointed to by the hash functions. Then, [...]
• Only the counters product matters!

CBF Based Membership Probability
• Parameters: n = 3328, m = 28485, k = 6
• Before checking the CBF: a priori membership probability = [...] ≈ 0.03
• CBF indicates counters product = 8: a posteriori membership probability ≈ 0.69

Experimental Results
• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: $n = 2^{10}$, $m / n = 30$, $k = 5$, $2^{20}$ queries

Concluding Remarks
• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the correctness of a positive indication of the CBF

Thank You
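The role of the counters product in the CBF membership probability can be illustrated with a short Python sketch. It is an approximation derived here, not necessarily the exact expression of Theorem 2: it treats each counter as independent with mean load nk/m, so that a counter value c is about c / (nk/m) times more likely when x is a member than when it is not. The function name, the prior of 0.03 (assumed here to be the a priori probability in the example), and the particular counter values are illustrative assumptions.

```python
import math


def cbf_posterior(prior: float, counters: list[int], n: int, m: int) -> float:
    """Approximate Pr(x in S | counter values) for a counting Bloom filter.

    Assumption: counters are independent, each with mean load n*k/m, so the
    likelihood ratio (member vs. non-member) is prod(counters) / load**k.
    """
    k = len(counters)
    load = n * k / m
    likelihood_ratio = math.prod(counters) / load ** k
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    n, m = 3328, 28485
    # Two hypothetical counter sets with k = 6 and the same product, 8.
    a = cbf_posterior(0.03, [2, 2, 2, 1, 1, 1], n, m)
    b = cbf_posterior(0.03, [8, 1, 1, 1, 1, 1], n, m)
    print(a, b)
```

With these inputs the sketch yields a posterior in the same range as the 0.69 quoted in the example, and both counter sets give the same value, since only their product enters the formula. A zero counter gives a product of 0 and hence a posterior of 0, consistent with the absence of false negatives.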