The Bloom Paradox in the Counting Bloom Filter

The Bloom Paradox
Ori Rottenstreich
Joint work with Isaac Keslassy
Technion, Israel
Problem Definition
[Figure: a user, a local cache S (cost = 1), and a central memory M with all elements (cost = 10).]
• Requirement: A data structure in the user with a fast answer to the membership query: is a given element in the local cache S?
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives
Two Possible Errors
• False Positive: y ∉ S, but the data structure answers y ∈ S.
• Results in a redundant access to the local cache.
→ Additional cost of 1.
• False Negative: x ∈ S, but the data structure answers x ∉ S.
• Results in an expensive access to the central memory instead of the local cache.
→ Additional cost of 10-1=9.
Bloom Filters (Bloom, 1970)
• Initialization: Array of m zero bits.
• Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.
• Query: Hashing the element and checking that all k bits are set.
• False positive rate (probability) of approximately (1 - e^(-kn/m))^k.
• No false negatives.
[Figure: a bit array with elements x and y inserted, and elements z and w queried.]
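The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: the class name `BloomFilter` and the use of salted SHA-256 to simulate k hash functions are assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str) -> list:
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the hash index i.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: True may be a false positive; False is never wrong.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=5)
for element in ("x", "y"):
    bf.insert(element)
```

Inserted elements always answer positively, which is the "no false negatives" property; an element that was never inserted answers positively only if all k of its bits happen to be set by other elements.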
Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System
The Bloom Paradox
Sometimes, it is better to disregard the Bloom
filter results, and in fact not to even query it,
thus making the Bloom filter useless.
Outline
• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary
Bloom Paradox Example
• Parameters:
• Extreme case without locality: all elements have an equal probability of belonging to the cache.
o Toy example
• Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, no false negatives in the Bloom filter → S ⊆ B.
• Intuition:
[Figure: the user, the Bloom filter, the local cache S (cost = 1), the central memory M with all elements (cost = 10), and the set B.]
• Surprise: The Bloom filter indicates the membership of |B| elements, where B is the set of elements it reports as being in S. Only |S| of them, possibly a small fraction, are indeed in S.
• When the Bloom filter states that x ∈ S, it is wrong with probability |B \ S| / |B|, since all elements are equally likely to belong to the cache.
• The average cost if we listen to the Bloom filter is higher than the average cost if we don't.
→ The Bloom filter is useless! Don't listen to the Bloom filter.
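The comparison can be checked numerically with a short sketch. The function name `expected_costs` and all numeric values below are illustrative assumptions rather than the slide's parameters; the cost model (cache access 1, central memory access 10, fall back to memory after a cache miss) follows the costs stated earlier in the talk.

```python
def expected_costs(n_cache, n_total, fpr, cache_cost=1.0, memory_cost=10.0):
    """Expected access cost after a positive Bloom filter answer.

    Assumes every element of the central memory is equally likely to be
    queried and that the filter has no false negatives.
    """
    false_positives = (n_total - n_cache) * fpr  # expected |B \ S|
    indicated = n_cache + false_positives        # expected |B|
    p_wrong = false_positives / indicated        # Pr(x not in S | positive)
    listen = cache_cost + p_wrong * memory_cost  # try the cache, then memory on a miss
    ignore = memory_cost                         # go straight to the central memory
    return p_wrong, listen, ignore


# Illustrative values only (assumptions): a small cache, a huge memory.
p_wrong, listen, ignore = expected_costs(
    n_cache=10_000, n_total=10_000_000_000, fpr=1e-3
)
```

With these assumed values a positive answer is almost always wrong, so `listen` exceeds `ignore`. When the cache holds a large share of the elements (for example half of them), the same function shows that listening is far cheaper.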
Costs of the Two Possible Errors
• The cost of a false positive: 1
• The cost of a false negative: α
• In the cache example: α = 10 - 1 = 9
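Conditions for the Bloom Paradox
• Let Pr(x ∈ S) be the a priori membership probability of x
o i.e. before getting the answer of the Bloom filter
• Intuition: The Bloom paradox occurs more often when:
o Pr(x ∈ S) is small
o the false positive rate is large (i.e. the number of bits per element is small)
o the cost of a false negative is small (because the Bloom filter, which has no false negatives, implicitly assumes that this cost is infinite)
[Figure: local cache, Bloom filter, central memory.]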
• Theorem 1 (boundaries of the Bloom paradox): The Bloom paradox occurs if and only if Pr(x ∈ S) < FPR / (α + FPR), where FPR is the false positive rate of the Bloom filter and α is the cost of a false negative when the cost of a false positive is 1.
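One way to arrive at a condition of this kind is a short Bayes argument under the cost model of the talk. The symbols here are assumptions introduced for this sketch: a is the a priori membership probability Pr(x ∈ S), f is the false positive rate, α is the cost of a false negative, and a false positive costs 1.

```latex
\begin{align*}
p &= \Pr(x \in S \mid \text{positive answer}) = \frac{a}{a + (1-a)\,f}
  && \text{(Bayes' rule, no false negatives)} \\
\text{extra cost of listening} &= (1-p)\cdot 1,
  \qquad \text{extra cost of ignoring} = p\cdot\alpha \\
\text{paradox} &\iff p\,\alpha < 1-p
  \iff a\,\alpha < (1-a)\,f
  \iff a < \frac{f}{\alpha + f}
\end{align*}
```

The condition matches the intuition: it is easier to satisfy when a is small, when f is large, and when α is small.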
Bloom Filter Improvements
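• Theorem 1: The Bloom paradox occurs if and only if Pr(x ∈ S) < FPR / (α + FPR), where FPR is the false positive rate and α is the cost of a false negative relative to that of a false positive.
• Use the formula to improve the Bloom filter
o Only insert into / query the Bloom filter if the formula expects it to be useful
[Figure: local cache, Bloom filter, central memory.]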
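A selective policy of this kind could look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the false positive rate uses the standard approximation (1 - e^(-kn/m))^k, and the threshold prior < FPR / (α + FPR) follows from the cost model with false positive cost 1 and false negative cost α.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation of the Bloom filter false positive rate."""
    return (1.0 - math.exp(-k * n / m)) ** k


def paradox_occurs(prior: float, fpr: float, alpha: float) -> bool:
    """True if a positive answer should be ignored for an element with this prior."""
    return prior < fpr / (alpha + fpr)


def should_use_filter(prior: float, n: int, m: int, k: int, alpha: float) -> bool:
    """Insert into / query the filter only when its answer is expected to help."""
    return not paradox_occurs(prior, false_positive_rate(n, m, k), alpha)


# Hypothetical parameters: 1000 elements, 10000 bits, 7 hash functions, alpha = 9.
fpr = false_positive_rate(n=1000, m=10_000, k=7)
rare = should_use_filter(prior=1e-6, n=1000, m=10_000, k=7, alpha=9.0)
common = should_use_filter(prior=0.5, n=1000, m=10_000, k=7, alpha=9.0)
```

In this sketch an element with a very small a priori probability bypasses the filter, while a likely member still uses it.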
Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
[Figure: a bit array holding elements x and y.]
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
o Insertion: Incrementing counters by one.
o Deletion: Decrementing counters by one.
o Query: Checking that counters are positive.
[Figure: a counter array with elements x and y inserted.]
• The same false positive probability.
• Requires too much memory, e.g. 57 bits per element for a false positive rate of 10^-3.
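The counter-based operations can be sketched as follows. This is a minimal illustration: the class name `CountingBloomFilter` and the salted SHA-256 hashing are assumptions for this example, and Python integers are unbounded, whereas a real CBF uses fixed-width counters.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter sketch: m counters, k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list:
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for an item that was previously inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")  # x stays queryable because shared counters are decremented, not reset
```

Deleting y lowers each shared counter by one instead of clearing it, so the query for x still succeeds.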
Counting Bloom Filter Query
• Query
o Checking that counters are positive.
[Figure: a counter array (values 0 1 0 2 5 0 1 8 3 0 2 1) queried for elements y and z.]
o Question: Which is more likely to be correct? y or z?
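The Bloom Paradox in the Counting Bloom Filter
• Theorem 2: Let C = (c_1, ..., c_k) denote the values of the counters pointed to by the set of k hash functions. Then the a posteriori membership probability Pr(x ∈ S | C) depends on the counters only through their product c_1 · ... · c_k.
Only the counters product matters!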
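A rough way to see why only the product of the counters matters is a Poisson approximation. This is a sketch under assumptions, not the theorem's exact statement: each counter is treated as an independent Poisson variable with mean λ = nk/m when x ∉ S, and as one plus such a variable when x ∈ S; a denotes the a priori probability Pr(x ∈ S).

```latex
\begin{align*}
\lambda &= \frac{nk}{m}, \qquad
\frac{\Pr(c_i \mid x \in S)}{\Pr(c_i \mid x \notin S)}
  \approx \frac{e^{-\lambda}\,\lambda^{c_i-1}/(c_i-1)!}{e^{-\lambda}\,\lambda^{c_i}/c_i!}
  = \frac{c_i}{\lambda} \\
\Pr(x \in S \mid C) &\approx
  \frac{a \prod_{i=1}^{k} c_i}{a \prod_{i=1}^{k} c_i + (1-a)\,\lambda^{k}}
\end{align*}
```

Each counter contributes a factor c_i / λ to the likelihood ratio, so the counters enter the result only through their product.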
CBF Based Membership Probability
• Parameters: n=3328, m=28485, k=6
- Before checking the CBF: a priori membership probability ≈ 0.03
- The CBF indicates a counters product of 8 → a posteriori membership probability ≈ 0.69
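The example can be approximated with a few lines of Python. The function `cbf_posterior` is a hypothetical helper based on a Poisson approximation (each counter has mean nk/m when the element is absent), not the exact formula of the talk; with the rounded prior 0.03 it gives roughly 0.68, close to the ≈ 0.69 above.

```python
import math


def cbf_posterior(prior: float, counters: list, n: int, m: int) -> float:
    """Approximate Pr(x in S | counters) under a Poisson approximation.

    posterior = prior * prod / (prior * prod + (1 - prior) * (n*k/m)**k),
    where prod is the product of the k counters.
    """
    k = len(counters)
    load = n * k / m                 # expected counter value when x is not in S
    product = math.prod(counters)
    if product == 0:
        return 0.0                   # a zero counter rules out membership
    weight_in = prior * product
    weight_out = (1.0 - prior) * load ** k
    return weight_in / (weight_in + weight_out)


# Any six positive counters with product 8 give the same answer.
posterior = cbf_posterior(prior=0.03, counters=[2, 2, 2, 1, 1, 1], n=3328, m=28485)
same_product = cbf_posterior(prior=0.03, counters=[8, 1, 1, 1, 1, 1], n=3328, m=28485)
all_ones = cbf_posterior(prior=0.03, counters=[1] * 6, n=3328, m=28485)
```

A larger counters product raises the a posteriori probability, while six counters equal to one leave the element much less likely to be a true member.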
Experimental Results
• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: n = 2^10, m/n = 30, k = 5, 2^20 queries
Concluding Remarks
• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the
correctness of a positive indication of the CBF
Thank You