Bloom Filters in Adversarial Environments

Eylon Yogev
Moni Naor
Weizmann Institute of Science
Dagstuhl Workshop on Hashing. May 2017
Who is your Adversary
and what power does it have?
Worst case analysis of algorithms is the hallmark
of theoretical computer science
• For a given algorithm A find an input D on
which A has the worst performance
– Evaluate the algorithm based on this value
• But what about probabilistic algorithms?
• What about reactive algorithms?
Approximate Set Membership
Bloom Filters
• Universe 𝑈, subset 𝑆 ⊂ 𝑈 of size 𝑛.
• Goal: Answer “Is 𝑥 ∈ 𝑆?”
• Data structure should be:
– Fast (even constant time query).
– Small (smaller than the explicit representation, which takes log (𝑢 choose 𝑛) ≈ 𝑛 log(𝑢/𝑛) bits for 𝑢 = |𝑈|).
• The price: Allow errors.
• Introduced by Bloom 1970.
For simplicity: let 𝑆 be fixed in advance.
[Figure: the set 𝑆 inside the universe 𝑈]
Bloom Filters
• An (𝑛, 𝜀)-Bloom filter is a pair of algorithms (𝑃, 𝑄): a preprocessing algorithm 𝑃 and a query response algorithm 𝑄.
• Definition: for 𝑀 ← 𝑃(𝑆) we require:
– For any 𝑥 ∈ 𝑆: Pr[𝑄_𝑀(𝑥) = ‘Yes’] = 1
– For any 𝑥 ∉ 𝑆: Pr[𝑄_𝑀(𝑥) = ‘Yes’] ≤ 𝜀
• Probability taken over 𝑃.
• Constructions: 𝑚 ≈ 𝑛 log(1/𝜀)
– This is tight. (Carter et al. 1978)
Bloom Filter Constructions
• Universe 𝑈, subset 𝑆 ⊂ 𝑈 of size 𝑛.
• Goal: Answer “Is 𝑥 ∈ 𝑆?”
The classics
Storing 𝑥:
• Hash 𝑥 to a range [ℓ] by function ℎ
– Have an array of size ℓ
– Put ‘1’ in location ℎ(𝑥)
Looking up 𝑥:
• Check whether location ℎ(𝑥) is ‘1’
Repeat log(1/𝜀) times.
Memory: 1.4𝑛 log(1/𝜀)
[Figure: elements of 𝑆 ⊂ 𝑈 hashed into a bit array, e.g. 0 1 1 0 1 0 1 0]
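The classic construction can be sketched in a few lines of Python. This is only an illustration under assumptions not on the slide: the "repeat" step is implemented as 𝑘 = ⌈log₂(1/𝜀)⌉ hash functions sharing one bit array, the hash functions are derived from SHA-256 with an index prefix, and the names `BloomFilter`, `add` and `query` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter sketch: k hash functions into one bit array."""

    def __init__(self, n: int, eps: float):
        # k = ceil(log2(1/eps)) repetitions and about 1.44 * n * k bits.
        self.k = max(1, math.ceil(math.log2(1 / eps)))
        self.size = max(1, math.ceil(1.44 * n * self.k))
        self.bits = bytearray(self.size)  # one byte per bit, for readability

    def _positions(self, x: bytes):
        # Assumption: the i-th hash function is SHA-256 with the index i prepended.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + x).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, x: bytes) -> None:
        # Storing x: put '1' in every location h_i(x).
        for pos in self._positions(x):
            self.bits[pos] = 1

    def query(self, x: bytes) -> bool:
        # Looking up x: answer 'Yes' only if every location h_i(x) is '1'.
        return all(self.bits[pos] for pos in self._positions(x))
```

Elements that were added always answer ‘Yes’; other elements answer ‘Yes’ only when all their locations happen to be set, which is the false positive event bounded by 𝜀.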
Reduction to exact membership
• Universe 𝑈, subset 𝑆 ⊂ 𝑈 of size 𝑛.
• Goal: Answer “Is 𝑥 ∈ 𝑆?”
Possible to construct a Bloom filter from any exact dictionary D:
• Storing 𝑥: store 𝑔(𝑥) in D
• Query 𝑥: check if 𝑔(𝑥) is in D
Sufficient: 𝑔 is pairwise independent.
Memory: 𝑛 log(1/𝜀)
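A minimal sketch of this reduction, with a Python `set` playing the role of the exact dictionary D. Several details are assumptions rather than part of the slide: 𝑔 maps into a range of size 𝑛/𝜀, it is drawn from the family ((𝑎𝑥 + 𝑏) mod 𝑝) mod 𝑟 (which is only approximately pairwise independent), and the names `make_pairwise_hash` and `DictBloomFilter` are hypothetical.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; assumes universe elements are ints below P


def make_pairwise_hash(r: int, rng: random.Random):
    """g(x) = ((a*x + b) mod P) mod r with a != 0 (approximately pairwise independent)."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % r


class DictBloomFilter:
    """Approximate membership by storing g(x) in an exact dictionary."""

    def __init__(self, items, eps: float, seed: int = 0):
        items = list(items)
        r = max(1, int(len(items) / eps))   # assumed range of g: n / eps
        self.g = make_pairwise_hash(r, random.Random(seed))
        self.d = {self.g(x) for x in items}  # the exact dictionary D

    def query(self, x: int) -> bool:
        return self.g(x) in self.d
```

For 𝑥 ∉ 𝑆 a false positive needs 𝑔(𝑥) to collide with one of at most 𝑛 stored values, which under these assumptions happens with probability about 𝑛 · (1/𝑟) = 𝜀.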
Bloom Filter Constructions and Dictionaries
• Universe 𝑈, subset 𝑆 ⊂ 𝑈 of size 𝑛.
• Goal: Answer “Is 𝑥 ∈ 𝑆?”
Possible to construct a Bloom filter from any oblivious dictionary:
• Searching for 𝑥: by accessing a sequence of locations
– determined by 𝑥 and hash functions
• Linear Probing
• Cuckoo Hashing
• Until finding 𝑥, or giving up
• Instead of storing 𝑥: put 𝑣(𝑥)
Sufficient: 𝑣 is pairwise independent.
[Figure: table of stored values 𝑣(𝑥), e.g. 0 6 3 3 5 0 9 0]
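The idea of storing 𝑣(𝑥) in place of 𝑥 can be illustrated with linear probing. The sketch below makes its own assumptions: both the probe start and 𝑣 come from SHA-256 (a stand-in, not a pairwise independent family), the value 0 is reserved for empty slots, there are no deletions, and `FingerprintTable` is a hypothetical name.

```python
import hashlib


def _h(tag: bytes, x: bytes, mod: int) -> int:
    digest = hashlib.sha256(tag + x).digest()
    return int.from_bytes(digest[:8], "big") % mod


class FingerprintTable:
    """Linear probing table that stores v(x) instead of x."""

    def __init__(self, capacity: int, fp_bits: int):
        self.slots = [0] * capacity  # 0 marks an empty slot
        self.fp_bits = fp_bits

    def _start(self, x: bytes) -> int:
        return _h(b"loc", x, len(self.slots))

    def _v(self, x: bytes) -> int:
        # Values lie in 1 .. 2^fp_bits - 1, so 0 can keep meaning "empty".
        return 1 + _h(b"fp", x, (1 << self.fp_bits) - 1)

    def add(self, x: bytes) -> None:
        i, v = self._start(x), self._v(x)
        for _ in range(len(self.slots)):
            if self.slots[i] in (0, v):
                self.slots[i] = v
                return
            i = (i + 1) % len(self.slots)
        raise RuntimeError("table is full")

    def query(self, x: bytes) -> bool:
        i, v = self._start(x), self._v(x)
        for _ in range(len(self.slots)):
            if self.slots[i] == 0:
                return False  # give up at the first empty slot
            if self.slots[i] == v:
                return True
            i = (i + 1) % len(self.slots)
        return False
```

A query for 𝑥 ∉ 𝑆 errs only if one of the probed slots holds a value equal to 𝑣(𝑥), so the error shrinks as the number of bits of 𝑣 grows.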
Applications
• Extremely useful. Applied in various areas:
– Databases, networking, spam filtering, streaming,
security, caching...
• Common usage: approximate a cache’s content.
[Figure: a Web Proxy with a Bloom Filter in front of the Disk and the Internet]
Definition, Revisited
• For any 𝑥 ∉ 𝑆: Pr[𝑄_𝑀(𝑥) = ‘Yes’] ≤ 𝜀.
• Suppose the user gets the responses of the queries.
– E.g. by measuring the response time.
• Can a (malicious) user increase the false positive probability?
[Figure: Web Proxy response times: Bloom Filter 0.01 sec., Disk 1 sec., Internet 2 sec.]
Definition, Revisited
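Game between an Adversary and a Challenger:
• 𝑆 is given; the Challenger computes 𝑀 ← 𝑃(𝑆).
• The Adversary sends 𝑥; the Challenger replies 𝑦 ← 𝑄_𝑀(𝑥).
Adversary wins if:
• 𝑦 = 1
• 𝑥 ∉ 𝑆
Security: Pr[𝐴 wins] ≤ 𝜀.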
Definition, Revisited
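Game between an Adversary and a Challenger:
• 𝑆 is given; the Challenger computes 𝑀 ← 𝑃(𝑆).
• The Adversary sends queries 𝑥𝑖 and receives 𝑦𝑖 ← 𝑄_𝑀(𝑥𝑖).
• The Adversary outputs 𝑥∗; the Challenger computes 𝑦∗ ← 𝑄_𝑀(𝑥∗).
Adversary wins if:
• 𝑦 = 1
• 𝑥 ∉ 𝑆
Security: Pr[𝐴 wins] ≤ 𝜀.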
Definition, Revisited
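Game between an Adversary and a Challenger:
• 𝑆 is given; the Challenger computes 𝑀 ← 𝑃(𝑆).
• The Adversary sends queries 𝑥𝑖 and receives 𝑦𝑖 ← 𝑄_𝑀(𝑥𝑖).
• The Adversary outputs 𝑥∗; the Challenger computes 𝑦∗ ← 𝑄_𝑀(𝑥∗).
Adversary wins if:
• 𝑦∗ = 1
• 𝑥∗ ∉ 𝑆 ∪ {𝑥𝑖}_𝑖
Security: Pr[𝐴 wins] ≤ 𝜀.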
Adversarial Resilient Bloom Filter
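Game between an Adversary and a Challenger:
• 𝑆 is given; the Challenger computes 𝑀 ← 𝑃(𝑆).
• The Adversary sends queries 𝑥𝑖 and receives 𝑦𝑖 ← 𝑄_𝑀(𝑥𝑖).
• The Adversary outputs 𝑥∗; the Challenger computes 𝑦∗ ← 𝑄_𝑀(𝑥∗).
Adversary wins if:
• 𝑦∗ = 1
• 𝑥∗ ∉ 𝑆 ∪ {𝑥𝑖}_𝑖
(𝑛, 𝜀)-Strong Adversarial Resilient Bloom filter: Pr[𝐴 wins] ≤ 𝜀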
Adversarial Resilient Bloom Filter
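Game between an Adversary and a Challenger:
• 𝑆 is given; the Challenger computes 𝑀 ← 𝑃(𝑆).
• The Adversary sends 𝑡 queries 𝑥𝑖 and receives 𝑦𝑖 ← 𝑄_𝑀(𝑥𝑖).
• The Adversary outputs 𝑥∗; the Challenger computes 𝑦∗ ← 𝑄_𝑀(𝑥∗).
Adversary wins if:
• 𝑦∗ = 1
• 𝑥∗ ∉ 𝑆 ∪ {𝑥𝑖}_𝑖
(𝑛, 𝜀, 𝑡)-Adversarial Resilient Bloom filter: Pr[𝐴 wins] ≤ 𝜀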
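The (𝑛, 𝜀, 𝑡) game can be written down as a small test harness. This is a sketch of the definition only; the function name `play_game` and the way the adversary is passed in (a callable that receives 𝑆 and a query oracle and returns 𝑥∗) are assumptions made for the example.

```python
from typing import Callable, Set


def play_game(preprocess: Callable[[Set[int]], Callable[[int], bool]],
              S: Set[int],
              attack: Callable[[Set[int], Callable[[int], bool]], int],
              t: int) -> bool:
    """Run one round of the game; return True iff the adversary wins."""
    query = preprocess(S)  # M <- P(S), exposed to the game as the callable Q_M
    asked = []

    def oracle(x: int) -> bool:
        # The adversary gets at most t adaptive queries x_i with answers y_i.
        if len(asked) >= t:
            raise RuntimeError("query budget t exhausted")
        asked.append(x)
        return query(x)

    x_star = attack(S, oracle)
    y_star = query(x_star)
    # Win condition: y* = 1 and x* is neither in S nor among the queries.
    return bool(y_star) and x_star not in S and x_star not in asked
```

A filter is (𝑛, 𝜀, 𝑡)-resilient if no admissible adversary makes `play_game` return `True` with probability above 𝜀, where the probability is over the randomness of the filter and of the adversary.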
This Talk
• Defining adversarial resilient Bloom filter.
• A transformation making any Bloom filter adversarial
resilient
– Using PRPs.
– Also a concrete implementation.
• The necessity of one way functions.
– Even for unsteady representations.
• A construction resilient to unbounded adversaries.
• Implementation issues
Transformation
Theorem: A transformation for any Bloom filter using PRPs, preserving the parameters.
• Standard Bloom filter: parameters (𝑛, 𝜀), memory 𝑚.
• Adversarial resilient Bloom filter: parameters (𝑛, 𝜀), memory 𝑚 + 𝜆, where 𝜆 is the security parameter.
Tools: pseudo-random permutations.
Pseudo-Random Permutations (PRPs)
Function family ℱ = {ℱ𝑛 : {0,1}^𝑛 → {0,1}^𝑛} for 𝑛 ∈ ℕ
• Each member of ℱ𝑛 is a permutation on {0,1}^𝑛
• It is efficiently sampleable and computable
– Given the key
• Hard to distinguish from random
– Given blackbox access
Known (existentially): One-way functions ⇒ PRF ⇒ PRP
Models block ciphers such as DES or AES
[Figure: Input → PRP (key of length 𝜆) → Output]
Transformation
Theorem: A transformation for any Bloom filter using PRPs, preserving the parameters.
Construction: 𝑥 → PRP(𝑥) → Standard Bloom filter
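The transformation is a thin wrapper: every element passes through a keyed permutation before it reaches the underlying filter. The sketch below is illustrative only. As an assumption, the PRP is a 4-round Feistel network on 16-byte blocks with HMAC-SHA256 as the round function, chosen to stay within the Python standard library (it is a stand-in, not a vetted cipher and not necessarily the talk's concrete implementation). The names `FeistelPRP` and `PRPBloomFilter` are hypothetical, and `inner` is any object with `add` and `query` methods.

```python
import hashlib
import hmac


class FeistelPRP:
    """Keyed permutation on 16-byte blocks (4-round Feistel, HMAC-SHA256 rounds)."""

    def __init__(self, key: bytes, rounds: int = 4):
        # Derive one round key per round from the master key.
        self.round_keys = [
            hmac.new(key, bytes([i]), hashlib.sha256).digest() for i in range(rounds)
        ]

    @staticmethod
    def _f(round_key: bytes, half: bytes) -> bytes:
        return hmac.new(round_key, half, hashlib.sha256).digest()[:8]

    @staticmethod
    def _xor(a: bytes, b: bytes) -> bytes:
        return bytes(p ^ q for p, q in zip(a, b))

    def permute(self, block: bytes) -> bytes:
        if len(block) != 16:
            raise ValueError("block must be exactly 16 bytes")
        left, right = block[:8], block[8:]
        for rk in self.round_keys:
            left, right = right, self._xor(left, self._f(rk, right))
        return left + right

    def invert(self, block: bytes) -> bytes:
        left, right = block[:8], block[8:]
        for rk in reversed(self.round_keys):
            left, right = self._xor(right, self._f(rk, left)), left
        return left + right


class PRPBloomFilter:
    """x -> PRP(x) -> standard Bloom filter."""

    def __init__(self, inner, key: bytes):
        self.inner = inner          # any filter exposing add(x) and query(x)
        self.prp = FeistelPRP(key)  # the key is the extra memory (lambda bits)

    def add(self, x: bytes) -> None:
        self.inner.add(self.prp.permute(x))

    def query(self, x: bytes) -> bool:
        return self.inner.query(self.prp.permute(x))
```

Because the map is a permutation, distinct inputs stay distinct, so the parameters (𝑛, 𝜀) of the inner filter carry over; the only extra state is the key.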
Necessity of Cryptographic Functions
Theorem 2: (Non-trivial) resilient Bloom filters must use one-way functions.
• Non-trivial: uses less space than it takes to store the elements explicitly.
Recall: One-way functions ⇒ PRF ⇒ PRP
Proof of Necessity of OWF
• Show: No-OWF implies no-resilient Bloom filter.
• Proof Recipe:
1. Query some 𝑥1, …, 𝑥𝑡 and get 𝑦1, …, 𝑦𝑡, where 𝑡 = 𝑂(𝑚/𝜀). (It takes 1/𝜀 queries to learn 1 bit.)
2. “Learn” some 𝑀′ ≈ 𝑀.
3. Find an 𝑥∗ such that 𝑄_𝑀′(𝑥∗) = 1.
4. Prove that (w.h.p.) 𝑄_𝑀(𝑥∗) = 1.
• Modeled as a PAC learning problem.
• Use the inverter to find a consistent hypothesis.
Proof of Necessity of OWF
• Show: No-OWF implies no-resilient Bloom filter.
• Proof Recipe:
1. Query some 𝑥1, …, 𝑥𝑡 and get 𝑦1, …, 𝑦𝑡.
2. “Learn” some 𝑀′ ≈ 𝑀.
3. Find an 𝑥∗ such that 𝑄_𝑀′(𝑥∗) = 1.
4. Prove that (w.h.p.) 𝑄_𝑀(𝑥∗) = 1.
Here 𝑀′ ≈ 𝑀 means 𝑄_𝑀′(⋅) ≈ 𝑄_𝑀(⋅).
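To make the recipe concrete, here is a toy version of the attack against one specific target: a small, unkeyed Bloom filter whose hash functions are public. This is not the general proof (which uses an inverter and a PAC learning argument); the parameters, the SHA-256 hashing and the names `positions`, `build` and `attack` are all assumptions for illustration.

```python
import hashlib

M_BITS, K = 64, 2  # deliberately tiny filter (illustrative parameters)


def positions(x: int):
    """Public hash functions: the adversary can evaluate them too."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M_BITS for i in range(K)]


def build(S):
    """P(S): set the bits of every element; returns Q_M as a callable."""
    bits = [0] * M_BITS
    for x in S:
        for p in positions(x):
            bits[p] = 1
    return lambda x: all(bits[p] for p in positions(x))


def attack(query, S, t, search_limit=10**6):
    """Steps 1-3 of the recipe, specialized to this unkeyed filter."""
    known_ones, asked = set(), set()
    x = max(S) + 1
    for _ in range(t):                    # step 1: t queries outside S
        asked.add(x)
        if query(x):                      # step 2: a 'Yes' reveals K set bits (M')
            known_ones.update(positions(x))
        x += 1
    for cand in range(x, x + search_limit):   # step 3: offline search, no queries
        if all(p in known_ones for p in positions(cand)):
            return cand, asked            # Q_M'(cand) = 1
    return None, asked
```

In this toy case step 4 holds with certainty rather than w.h.p.: every bit the adversary believes to be set really is set in 𝑀, so any 𝑥∗ accepted by the learned 𝑀′ is also accepted by 𝑀, although it was never queried and is not in 𝑆.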
Unsteady Representations
What if the query algorithm 𝑄 had more power?
• 𝑄 could be randomized.
– Add noise to responses.
– Differential Privacy style
• 𝑄 could change the underlying representation.
– Be consistent with previous responses.
[Figure: a query 𝑥 goes to Bloom filter 1 on the “< 100” branch and to Bloom filter 2 on the “≥ 100” branch]
Unsteady Representations
Theorem: (Non-trivial) resilient Bloom filters with unsteady representations must use one-way functions.
Adversarial resilient Bloom filter (even with unsteady representation) ⇒ One-way functions
Proof for Unsteady Representations
• Show: No-OWF implies no-resilient Bloom filter.
• Proof Recipe:
1. Query some 𝑥1, …, 𝑥𝑡 and get 𝑦1, …, 𝑦𝑡, where 𝑡 = 𝑂(𝑚/𝜀²).
2. “Learn” some 𝑀′ ≈ 𝑀𝑡.
3. Find an 𝑥∗ such that 𝑄_𝑀′(𝑥∗) = 1.
4. Prove that (w.h.p.) 𝑄_𝑀𝑡(𝑥∗) = 1.
Uses the “Adaptively Changing Distributions” framework [NR06].
Proof for Unsteady Representations
• Show: No-OWF implies no-resilient Bloom filter.
• Proof Recipe:
1. Query some 𝑥1, …, 𝑥𝑡 and get 𝑦1, …, 𝑦𝑡.
2. “Learn” some 𝑀′ ≈ 𝑀𝑡.
3. Find an 𝑥∗ such that 𝑄_𝑀′(𝑥∗) = 1.
4. Prove that (w.h.p.) 𝑄_𝑀𝑡(𝑥∗) = 1.
Δ(𝑄_𝑀′(⋅), 𝑄_𝑀(⋅)) is small.
Unbounded Adversaries
Theorem: For any 𝜀, 𝑡 there exists an (𝑛, 𝜀, 𝑡)-resilient Bloom filter against unbounded adversaries that uses 𝑚 bits of memory, where
𝑚 = 𝑂(𝑛 log(1/𝜀) + 𝑡)
• 𝑛 log(1/𝜀): required for any Bloom filter.
• 𝑡: additional memory, since the adversary learns 1 bit per query.
Unbounded Adversaries
Theorem: For any 𝜀, 𝑡 there exists an (𝑛, 𝜀, 𝑡)-resilient Bloom filter against unbounded adversaries that uses 𝑚 bits of memory, where
𝑚 = 𝑂(𝑛 log(1/𝜀) + 𝑡)
– The adversary learns 1 bit per query.
• Lower Bound: 𝑚 ≥ 𝑛 log(1/𝜀) + 𝜀𝑡
– The adversary learns 1 bit per 1/𝜀 queries.
Open: close the gap!
Bloom Filter Constructions and Dictionaries
• Universe 𝑈, subset 𝑆 ⊂ 𝑈 of size 𝑛.
• Goal: Answer “Is 𝑥 ∈ 𝑆?”
Can construct a Bloom filter from any oblivious dictionary:
• Searching for 𝑥: by accessing a sequence of locations
– determined by 𝑥 and hash functions
• Linear Probing
• Cuckoo Hashing
• Until finding 𝑥, or giving up
• Instead of storing 𝑥: put the fingerprint 𝑣(𝑥)
Remaining choices:
• Need to specify dictionary
• Need to construct 𝑣(𝑥)
[Figure: table of stored fingerprints, e.g. 0 6 3 3 5 0 9 0]
Cuckoo Hashing: Basics
• Introduced by Pagh and Rodler (2001)
• Extremely simple:
– 2 tables: T1 and T2
• Each of size r = (1+𝜷)n
– 2 hash functions: h1 and h2
• Lookup:
– Check in T1 and T2
[Figure: “Where is x?” The element x can only be at T1[h1(x)] or T2[h2(x)]; the other cells hold other elements.]
The Cuckoo Graph
Set S ⊂ U containing n elements
h1, h2 : U → {0,...,r-1}
Bipartite graph with |L| = |R| = r
• Nodes: locations in memory
• Edge (h1(x), h2(x)) for every x ∈ S
Fact: S is successfully stored ⇔ every connected component in the cuckoo graph has at most one cycle
– The insertion algorithm achieves this
Expected insertion time: O(1)
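A compact sketch of cuckoo hashing with two tables of size r = (1+𝛽)n. The hash functions (SHA-256 with a table index prefix), the eviction limit `max_kicks`, and the class name `CuckooTable` are assumptions for illustration; a production table would rehash when an insertion gives up.

```python
import hashlib


class CuckooTable:
    """Two tables T1, T2 of size r = (1 + beta) * n; each x lives at T1[h1(x)] or T2[h2(x)]."""

    def __init__(self, n: int, beta: float = 0.5, max_kicks: int = 100):
        self.r = max(1, int((1 + beta) * n))
        self.tables = [[None] * self.r, [None] * self.r]
        self.max_kicks = max_kicks

    def _h(self, which: int, x: int) -> int:
        digest = hashlib.sha256(bytes([which]) + x.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:8], "big") % self.r

    def lookup(self, x: int) -> bool:
        # Lookup checks exactly two locations.
        return any(self.tables[i][self._h(i, x)] == x for i in (0, 1))

    def insert(self, x: int) -> bool:
        if self.lookup(x):
            return True
        which = 0
        for _ in range(self.max_kicks):
            slot = self._h(which, x)
            # Place x and pick up whatever occupied the slot before.
            x, self.tables[which][slot] = self.tables[which][slot], x
            if x is None:
                return True
            which = 1 - which  # the evicted element moves to its other table
        return False  # gave up: the element currently held is not stored
```

An insertion walks along a path in the cuckoo graph; it terminates as long as the component it touches has at most one cycle, which is why `insert` only gives up in the rare bad case.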
t-wise Independent Hash Functions
• Family H of hash functions {hi}, hi ∈ H
• ∀i, hi : U → {0,...,m-1}
Definition: A family H of hash functions is t-wise independent if for any distinct x1,...,xt ∈ U and for any y1,...,yt ∈ {0,...,m-1}:
Pr_{h∈H}[h(x1)=y1 ∧ h(x2)=y2 ∧ ... ∧ h(xt)=yt] = 1/m^t
Implementing 𝑣 by a t-wise independent family
Need: t-wise ind. family that can be computed efficiently
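One textbook way to get a t-wise independent family is a random polynomial of degree below t over a prime field. The sketch below is a standard construction, not taken from the slides; the prime `P` and the function names are assumptions. Over Z_p the family is exactly t-wise independent; the final reduction mod m introduces a small bias unless m divides p.

```python
import random

P = (1 << 61) - 1  # prime field size; assumes universe elements are ints in [0, P)


def poly_hash(coeffs, x: int, p: int) -> int:
    """Evaluate c0 + c1*x + ... + c_{t-1}*x^(t-1) mod p with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def sample_twise_hash(t: int, m: int, rng: random.Random, p: int = P):
    """Draw h from the family: t random coefficients, output reduced to {0,...,m-1}."""
    coeffs = [rng.randrange(p) for _ in range(t)]
    return lambda x: poly_hash(coeffs, x, p) % m
```

Evaluation costs t multiplications per element, which is the efficiency concern that motivates the table-based constructions.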
t-wise Uniform Hashing
[Pagh-Pagh’08]
Function defined by:
• h1, h2
• T1, T2 (random tables)
Compute F(x): combine
• T1[h1(x)]
• T2[h2(x)]
Uniform Hashing [Pagh-Pagh’08]
Efficient construction of highly independent
functions that are easily evaluated
• In the spirit of Cuckoo hashing
Let ℋ: 𝒟 → 𝒮, 𝒢: 𝒟 → ℛ and ℱ: 𝒮 → ℛ be function families.
Given 𝑓1, 𝑓2 ∈ ℱ, ℎ1, ℎ2 ∈ ℋ and 𝑔 ∈ 𝒢, define
𝒫𝒫(𝑥) = 𝑓1(ℎ1(𝑥)) ⊕ 𝑓2(ℎ2(𝑥)) ⊕ 𝑔(𝑥)
• 𝑓1, 𝑓2: truly independent, realized as tables T1 and T2 (the cuckoo hashing part)
• 𝑔: overflow
…Uniform Hashing [Pagh-Pagh’08]
When U=2𝑘 :
• ℋ and 𝒢 are 𝑘-wise independent families
• ℱ is the family of all functions from 𝒮 to ℛ,
then the family
𝒫𝒫(ℋ, 𝒢, ℱ) = (ℱ ∘ ℋ) ⊕ (ℱ ∘ ℋ) ⊕ 𝒢
is 𝑂(𝑞/2^𝑘)-indistinguishable from a random function by any 𝑞-query, non-adaptive distinguisher.
BHKN 2013: Show that this actually holds for adaptive distinguishers.
Constructing 𝑣 via t-wise
independence
• Suppose the cuckoo hash functions are known
• From each query the adversary learns whether 𝑣(𝑥) = 𝑣(𝑥′)
– Therefore: t-wise independence suffices
• Construction requires 𝑡 log(1/𝜀) bits (using Pagh-Pagh)
Can save an extra log(1/𝜀) factor:
Instead of computing 𝑣(𝑥) in one shot, compute it bit by bit as needed, as long as equality holds.
The adversary learns on average the value of a single function!
Implementation
• Bloom filters are used when performance is crucial.
• We want an implementation that is:
– Fast: use a simple heuristic hash function.
– Secure: use a secure PRF.
How can we satisfy the two simultaneously?
GGM: 𝐹_𝑆(𝑥) = 𝐺_{𝑠𝑛}(⋯(𝐺_{𝑠2}(𝐺_{𝑠1}(𝑆)))⋯)
AES-NI
• Stands for: AES-New Instructions.
• Provides hardware acceleration for
encryption/decryption of AES.
• Embedded in most modern CPUs.
• Few cycles per byte with AES/ECB (≈ 1 + 𝜖 for large buffers).
Source: www.legitreviews.com
AES as a PRF
• Assumption: the block cipher behaves as a PRF/PRP.
• Reuse the randomness:
– Split the 128-bit output of (pseudo) random bits.
• Worst-case to average-case reduction.
Conclusion: a Bloom filter that is:
1. Secure.
2. “As fast as” any other implementation.
Construction: 𝑥 → PRF(𝑥) → Bloom filter
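Reusing the randomness means that one 128-bit PRF output can feed several hash positions. The sketch below shows only the splitting step. As a loudly stated assumption, keyed BLAKE2b with a 16-byte digest stands in for AES, because the Python standard library has no AES; the function names `prf128` and `bloom_positions` are hypothetical.

```python
import hashlib


def prf128(key: bytes, x: bytes) -> bytes:
    """Stand-in PRF with a 128-bit output (keyed BLAKE2b), used here in place of AES."""
    return hashlib.blake2b(x, key=key, digest_size=16).digest()


def bloom_positions(key: bytes, x: bytes, k: int, size: int):
    """Derive k <= 4 array positions from a single PRF call by splitting its output."""
    if not 1 <= k <= 4:
        raise ValueError("one 128-bit block yields at most four 32-bit chunks")
    out = prf128(key, x)
    # Each 32-bit chunk of the output plays the role of one hash value.
    return [int.from_bytes(out[4 * i:4 * i + 4], "big") % size for i in range(k)]
```

With hardware AES the single PRF call would cost a few cycles per byte, so the keyed filter pays roughly one block-cipher call per operation.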
Experiments
• Implementation based on linear probing
– Memory utilization: 𝛼 = 0.6
• Compared with different hash instantiations.
[Figure: running time in microsec per 10^5 operations (y-axis, 0.0 to 10.0) against # of insert + query operations (x-axis: 100,000; 1,000,000; 10,000,000; 100,000,000). Series: AES, CityHash, Jenkins, Murmur3, Spooky, Pearson, Blake2, This work.]
Open Problems
1. Can cryptographic primitives help streaming algorithms in the adversarial model?
– For example, consider computing the median.
2. Show that randomized streaming algorithms in the adversarial model must use crypto.
3. Construct streaming algorithms which are secure in the pan-private-adversarial model.
Other Examples of Adaptive/Active
Adversary
• Timing attacks on dictionaries with universal hashing
Lipton and Naughton 1993
• Sketching in an adversarial environment [MNS11]
• Hardt and Woodruff [HW13]: linear sketch
algorithms with an adversary that can adaptively
choose the inputs according to previous evaluations
of the sketch.
Thanks!