Blocked Bloom Filter

Access-Efficient Balanced Bloom Filters
Motivation
• Classical Bloom Filters need a significant number of memory accesses.
• A Bloom Filter with 30 bits per element needs 30 · ln 2 ≈ 21 memory accesses.
• Each memory access may be to a distinct memory block, which is energy- and cache-inefficient.
• Blocked Bloom Filter [Putze et al.]: choose a memory block first, then use a local Bloom Filter within that block.
• This might cause imbalance between memory blocks: efficient, but with a poor false positive rate.
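The contrast between the two designs can be illustrated with a minimal Python sketch. It assumes a seeded SHA-256 helper (`_hash`, hypothetical) and a Python integer used as the bit set of each block. The class and parameter names are illustrative and are not taken from Putze et al.

```python
import hashlib
import math


def optimal_hash_count(bits_per_element: float) -> int:
    """Optimal number of hash functions of a classical Bloom filter: (m/n) * ln 2."""
    return round(bits_per_element * math.log(2))


def _hash(item: str, seed: int) -> int:
    """Hypothetical helper: a seeded 64-bit hash derived from SHA-256."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BlockedBloomFilter:
    """Sketch: one hash picks the block, the others set bits inside that block only."""

    def __init__(self, num_blocks: int, block_bits: int, num_hashes: int):
        self.num_blocks = num_blocks
        self.block_bits = block_bits
        self.num_hashes = num_hashes
        self.blocks = [0] * num_blocks  # each block is an int used as a bit set

    def _block_and_mask(self, item: str):
        block = _hash(item, 0) % self.num_blocks  # choose the memory block first
        mask = 0
        for seed in range(1, self.num_hashes + 1):
            mask |= 1 << (_hash(item, seed) % self.block_bits)
        return block, mask

    def add(self, item: str) -> None:
        block, mask = self._block_and_mask(item)
        self.blocks[block] |= mask  # a single block is touched

    def __contains__(self, item: str) -> bool:
        block, mask = self._block_and_mask(item)
        return (self.blocks[block] & mask) == mask  # a single block is read


print(optimal_hash_count(30))  # 21
```

Every insertion and query touches exactly one block, whereas the classical filter with 30 bits per element may touch up to 21 distinct blocks.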
Our Proposal
• Balance the load between the blocks using
multiple hash functions.
• Upon a query, an element might be found in
more than one bucket.
• Objective: minimize the average number of
buckets that need to be checked, while ensuring
optimal balancing
between the buckets.
• In addition, restrict the worst-case
number of buckets that need to be checked.
• Utilize a limited-size overflow list (in CAM)
to
mitigate such extreme situations.
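One way to picture the proposal is the following Python sketch. It is only an illustration under stated assumptions: the class `BalancedBuckets`, the helper `bucket_hash`, the concrete threshold, and the use of a Python set as a stand-in for each bucket's local Bloom Filter are all hypothetical, and deletions are not supported.

```python
import hashlib


def bucket_hash(item: str, seed: int, num_buckets: int) -> int:
    """Hypothetical helper: seeded hash mapping an item to a bucket index."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class BalancedBuckets:
    def __init__(self, num_buckets: int, max_hashes: int, threshold: int):
        self.num_buckets = num_buckets
        self.max_hashes = max_hashes  # worst-case number of buckets checked
        self.threshold = threshold    # assumed: a bucket at this occupancy is full
        self.occupancy = [0] * num_buckets
        # Stand-in for the local Bloom Filter of each bucket (assumption).
        self.members = [set() for _ in range(num_buckets)]
        self.overflow = set()         # models the overflow list (in CAM)

    def insert(self, item: str) -> int:
        """Insert a distinct item; return how many buckets were examined."""
        for seed in range(self.max_hashes):
            b = bucket_hash(item, seed, self.num_buckets)
            if self.occupancy[b] < self.threshold:
                self.occupancy[b] += 1
                self.members[b].add(item)
                return seed + 1
        self.overflow.add(item)       # every candidate bucket was full
        return self.max_hashes

    def query(self, item: str):
        """Return (found, number of buckets checked)."""
        if item in self.overflow:
            return True, 0
        for seed in range(self.max_hashes):
            b = bucket_hash(item, seed, self.num_buckets)
            if item in self.members[b]:
                return True, seed + 1
            if self.occupancy[b] < self.threshold:
                # Without deletions, a non-full bucket was never full, so the
                # item would have been placed here or earlier if it existed.
                return False, seed + 1
        return False, self.max_hashes
```

The second value returned by `query` is the quantity the objective refers to: averaged over queries it gives the mean number of buckets checked, and it never exceeds `max_hashes`.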
Lower Bound – Basic Idea
Model
• The occupancy of a bucket, given the number of buckets, the number of
elements, and a balancing scheme (a random
variable).
• A general cost function mapping
bucket occupancy to its cost.
• Overall balancing quality:
• Given and , we consider an initial
insertion of
distinct elements.
• Then, we remove elements from the
most occupied buckets, until exactly
elements are left.
• The resulting occupancy distribution looks like a “pushed-back” Poisson
distribution.
• The specific cost function for a Bloom filter is the false positive rate (FPR).
• The bound is valid for any convex cost function.
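The insert-then-remove construction can be simulated with a short Python sketch. The parameter names (`initial_elements`, `final_elements`) are hypothetical placeholders for the quantities in the model. The per-block FPR uses the standard Bloom filter approximation. Averaging the cost over buckets is an assumption about how the overall balancing quality is computed.

```python
import heapq
import random


def block_fpr(occupancy: int, block_bits: int, num_hashes: int) -> float:
    """Standard FPR approximation of a Bloom filter block holding `occupancy` elements."""
    fill = 1.0 - (1.0 - 1.0 / block_bits) ** (num_hashes * occupancy)
    return fill ** num_hashes


def pushed_back_occupancies(num_buckets, initial_elements, final_elements, rng):
    """Throw elements uniformly at random, then remove from the most occupied buckets."""
    occ = [0] * num_buckets
    for _ in range(initial_elements):
        occ[rng.randrange(num_buckets)] += 1
    heap = [-c for c in occ]  # negated counts give a max-heap
    heapq.heapify(heap)
    for _ in range(initial_elements - final_elements):
        top = heapq.heappop(heap)      # currently the most occupied bucket
        heapq.heappush(heap, top + 1)  # remove one element from it
    return sorted(-c for c in heap)


def average_cost(occupancies, block_bits, num_hashes):
    """Assumed quality measure: mean per-bucket cost, here the block FPR."""
    total = sum(block_fpr(c, block_bits, num_hashes) for c in occupancies)
    return total / len(occupancies)


rng = random.Random(1)
occ = pushed_back_occupancies(50, 400, 300, rng)
print(occ)
print(average_cost(occ, block_bits=256, num_hashes=4))
```

Removing elements from the fullest buckets truncates the upper tail of the occupancy distribution, which is the "pushed-back" shape mentioned above.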
Upper Bounds
• Upon a query, a bounded number of hash functions are
used one by one, where the next hash function is
used only if the occupancy of the mapped
bucket exceeds some threshold.
• More specifically, the following schemes
are used:
  • A simple scheme (using only a single
  hash function).
  • A sequential scheme, using hash
  functions with uniform distribution.
  • A Multi-level Hash Table (MHT) scheme, which
  divides the memory into subtables with
  a single hash function for each subtable.
Example: The Balanced Bloom Filter
• Based on Multi-level Hash Tables (MHT).
• Using 3 subtables (of 4, 2 and 1 memory
blocks).
• In each memory block, 2 bits are saved for
a counter so bucket occupancy can be
determined.
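The MHT example with subtables of 4, 2 and 1 memory blocks can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The threshold of 3 (the largest value a 2-bit counter can hold), the helper `subtable_hash`, and the class name are assumptions. Only the counters are modelled, not the Bloom Filter bits of each block.

```python
import hashlib

SUBTABLE_SIZES = (4, 2, 1)  # memory blocks per subtable, as in the example
COUNTER_MAX = 3             # assumption: largest value of a 2-bit counter


def subtable_hash(item: str, level: int, size: int) -> int:
    """Hypothetical helper: one independent hash function per subtable."""
    digest = hashlib.sha256(f"level{level}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class MultiLevelHashTable:
    def __init__(self, sizes=SUBTABLE_SIZES, threshold=COUNTER_MAX):
        self.sizes = sizes
        self.threshold = threshold
        self.counters = [[0] * size for size in sizes]  # per-block occupancy
        self.overflow = []                              # overflow list

    def insert(self, item: str):
        """Return (level, block) where the item was placed, or None on overflow."""
        for level, size in enumerate(self.sizes):
            block = subtable_hash(item, level, size)
            if self.counters[level][block] < self.threshold:
                self.counters[level][block] += 1
                return level, block
        self.overflow.append(item)  # all mapped blocks were full
        return None


mht = MultiLevelHashTable()
placements = [mht.insert(f"element-{i}") for i in range(30)]
print(mht.counters, len(mht.overflow))
```

With 7 blocks and a threshold of 3, at most 21 elements fit in the subtables, so inserting 30 elements sends at least 9 to the overflow list. This toy size only serves to make the mechanism visible.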
Evaluation
• Two orders of magnitude FPR improvement over the
Blocked Bloom Filter for a memory block size of
256 bits, using MHT with an overflow list size of 0.49% and
only 1.2 memory accesses per element.
• All results were confirmed using real-life traces.