The Power of File-Injection Attacks on Searchable Encryption

All Your Queries Are Belong to Us:
The Power of File-Injection Attacks
on Searchable Encryption
Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou
University of Maryland
What is Searchable Encryption?
[Figure: a client sends a search query for a keyword to a server.]
An Example of Searchable Encryption
Index (keyword: files containing it):
k1: F1, F4, F2
k2: F3, F6, F4, F2
k3: F5, F1
Files: F1, F2, F3, F4, F5, F6
An Example of Searchable Encryption
[Figure: the same index and files, with the search token for k1 shown as PRF_sk(k1); the index entry for k1 (F1, F4, F2) is labeled PRF_sk(k1).]
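The token-based lookup on this slide can be sketched as follows. This is a minimal illustration, not the construction of any particular scheme: HMAC-SHA256 stands in for the PRF, the function names `token`, `build_index`, and `search` are hypothetical, and the file lists are left unencrypted for brevity.

```python
import hashlib
import hmac

def token(sk: bytes, keyword: str) -> bytes:
    # Assumption: HMAC-SHA256 plays the role of PRF_sk.
    return hmac.new(sk, keyword.encode(), hashlib.sha256).digest()

def build_index(sk: bytes, inverted: dict) -> dict:
    # The server-side index is keyed by tokens, never by plaintext keywords.
    return {token(sk, kw): list(ids) for kw, ids in inverted.items()}

def search(index: dict, tok: bytes) -> list:
    # The server looks up the token and returns the matching file IDs.
    return index.get(tok, [])

sk = b"\x01" * 32
index = build_index(sk, {
    "k1": ["F1", "F4", "F2"],
    "k2": ["F3", "F6", "F4", "F2"],
    "k3": ["F5", "F1"],
})
result = search(index, token(sk, "k1"))
```

Because the PRF is deterministic, two searches for the same keyword send the same token, and the returned file IDs are visible to the server.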
An Example of Searchable Encryption
[Figure: the same index (k1: F1, F4, F2; k2: F3, F6, F4, F2; k3: F5, F1); a new file F7 is added to the files F1 ... F6.]
Leakage of Searchable Encryption
[Figure: index after F7 is added (k1: F1, F4, F2, F7; k2: F3, F6, F4, F2; k3: F5, F1) over files F1 ... F7, annotated "search k1 on new files!", "deterministic!", and "file access patterns!".]
Leakage of Searchable Encryption
• Search pattern leakage.
• Access pattern leakage.
Leaked by all efficient searchable encryption schemes.
• No Forward Privacy.
All SE schemes except [CM05, SPS14] do not have forward privacy.
Goal of Our Work
• What semantic information does this leakage actually reveal?
• We explore a new class of attacks that is devastating for query privacy.
Attacks on Searchable Encryption
• Islam et al. (IKK12) proposed a query recovery attack.
• Cash et al. (CGPR15) proposed another attack with higher success probability.
Both assume the server knows all or most of the client’s files in plaintext.
Attack Model: File-injection Attack
[Figure: client, server holding files F1 ... F6 (F3 and F5 shown separately), and a search query for keyword k.]
First proposed in CGPR15, but not used for query recovery attacks.
Binary Search Attack
[Figure: three injected files (File 1, File 2, File 3) over an 8-keyword universe k0 ... k7; for the example query, the search result on the injected files is 0, 1, 0.]
• Only inject 14 files for a universe of 10,000 keywords.
• Inject before seeing the queries (non-adaptive).
• Can recover all queries with probability 1.
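A minimal sketch of the binary search attack, assuming the layout in which injected file i contains every keyword whose index has bit i set (so each file holds about half of the universe). The function names are hypothetical; `returned` models the access-pattern leakage on the injected files.

```python
def build_injected_files(keywords):
    # One file per bit of the keyword index; file i holds the keywords
    # whose index has bit i set.
    n_bits = max(1, (len(keywords) - 1).bit_length())
    return [
        {kw for idx, kw in enumerate(keywords) if (idx >> bit) & 1}
        for bit in range(n_bits)
    ]

def recover_keyword(keywords, returned):
    # returned[i] is True if injected file i appeared in the search result.
    idx = sum(1 << bit for bit, hit in enumerate(returned) if hit)
    return keywords[idx] if idx < len(keywords) else None

keywords = [f"k{i}" for i in range(10000)]
files = build_injected_files(keywords)

# Simulate the leakage for a query on "k4242": which injected files match.
returned = [("k4242" in f) for f in files]
recovered = recover_keyword(keywords, returned)
```

The files are built before any query is seen, which matches the non-adaptive setting on the slide.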
Threshold Countermeasure
Limitation of the attack: long injected files (|K|/2 keywords each).
Countermeasure: filter out all files that contain more than T keywords.
Enron data set: 30,109 files, universe of 5,000 keywords
Only 3% of files have more than T=200 keywords.
Enron email dataset. https://www.cs.cmu.edu/~./enron/. Accessed: 2015-12-14.
Modifying the Attack
[Figure: an injected file over keywords k0 ... k7 is split into two shorter files, File 1 (k0 ... k3) and File 2 (k4 ... k7).]
• Replace each file of |K|/2 keywords with |K|/(2T) files of T keywords each.
• Inject 131 files for |K|=5,000 and T=200.
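The splitting step can be sketched as below. This shows only the naive replacement of one long file by chunks of at most T keywords; it does not attempt to reproduce the optimized file count of 131 quoted on the slide. The names `split_file` and `long_file_matched` are hypothetical.

```python
def split_file(keywords_in_file, T):
    # Split one long injected file into files of at most T keywords each.
    kws = sorted(keywords_in_file)
    return [set(kws[i:i + T]) for i in range(0, len(kws), T)]

def long_file_matched(chunk_hits):
    # The original long file would have matched the query
    # if and only if at least one of its chunks matches.
    return any(chunk_hits)

long_file = set(range(2500))          # one file with |K|/2 keywords, |K| = 5,000
chunks = split_file(long_file, 200)   # T = 200

query_keyword = 1234
chunk_hits = [query_keyword in c for c in chunks]
matched = long_file_matched(chunk_hits)
```

Every chunk passes the threshold filter, and together the chunks leak the same one bit as the original file.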
Attacks with Partial File Leakage
The server learns a portion of the client’s files in plaintext.
(For example, announcement and alert emails broadcast to many people.)
Attacks with Partial File Leakage
Frequency of a token/keyword: (# of files containing it) / (total # of files)
[Figure: each keyword k1 ... k5 has an estimated frequency f*(k1) ... f*(k5); the token t has an exact frequency f(t). Keywords with f*(k) ≈ f(t) form the candidate universe, on which the binary search attack is run.]
Adaptive, applies to SE schemes with no forward privacy.
The server does not always succeed, but can determine whether the attack fails.
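The candidate-universe step can be sketched as follows, assuming the server estimates keyword frequencies from the leaked files and keeps the keywords whose estimate is closest to the token's exact frequency. The function names and the `size` parameter are hypothetical; the binary search attack would then be run over the returned candidates.

```python
def estimate_frequencies(leaked_files, universe):
    # f*(k): fraction of leaked plaintext files that contain keyword k.
    total = len(leaked_files)
    return {k: sum(k in f for f in leaked_files) / total for k in universe}

def candidate_universe(est_freq, token_freq, size):
    # Keep the `size` keywords whose estimated frequency f*(k)
    # is closest to the token's exact frequency f(t).
    ranked = sorted(est_freq, key=lambda k: abs(est_freq[k] - token_freq))
    return ranked[:size]

leaked = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
est = estimate_frequencies(leaked, ["a", "b", "c", "d"])

# The server observes that token t matches half of all files: f(t) = 0.5.
candidates = candidate_universe(est, 0.5, 2)
```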
Attacks with Partial File Leakage
Refer to our paper for an attack that recovers multiple tokens.
Experimental Methodology
• Enron data set with 30,109 emails.
• Stem words in the emails (remove -able, -ing etc.).
• Remove stop words (“to”, “you” etc.).
• Extract keywords (in total 77,000).
• Choose top 5,000 with highest frequency as the universe.
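The preprocessing steps above can be sketched as follows. This is an illustration only: the stop-word list and the suffix-stripping stemmer are toy stand-ins covering just the examples named on the slide, "frequency" is assumed to mean the number of emails containing a keyword, and all names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"to", "you"}      # toy list; a real list is much longer
SUFFIXES = ("able", "ing")      # toy stemmer; a real stemmer handles far more

def stem(word):
    # Strip one known suffix if enough of the word remains.
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def extract_keywords(text):
    # Stem every word, then drop stop words.
    words = re.findall(r"[a-z]+", text.lower())
    return {s for s in (stem(w) for w in words) if s not in STOP_WORDS}

def build_universe(emails, size):
    # Count, for each keyword, how many emails contain it; keep the top `size`.
    doc_freq = Counter()
    for email in emails:
        doc_freq.update(extract_keywords(email))
    return [kw for kw, _ in doc_freq.most_common(size)]

emails = ["Meeting to you", "meeting tomorrow", "call you"]
universe = build_universe(emails, 1)
```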
Experimental Results: Recover 1 Query
U = 5,000, T = 200, number of injected files = 9
Experimental Results: Recover 100 Queries
U = 5,000, T = 200, number of injected files <= 40
Extensions to Conjunctive SE
• Search files with keywords k1, k2, … kd.
• Ideal leakage: only leak the intersection of their search results.
(No existing scheme achieves ideal leakage.)
Extensions to Conjunctive SE
• Inject n files, each containing L keywords chosen randomly and independently
from the universe.
• Find the injected files that are in the search result of a conjunctive
query, and take the intersection of their keywords.
• For L = (1/2)^{1/d} |K| and n > 2d log|K|, the intersection is the conjunctive
query with high probability.
Extensions to Conjunctive SE
L = (1/2)^{1/d} |K|, n = (2+e) d log|K|
For any conjunctive query with keywords k1, k2, … kd:
1. Nearly half of the injected files are in its search result.
• Pr[all keywords in one injected file] ≈ (L/|K|)^d = 1/2
• n' ≈ n/2 = (1+e/2) d log|K|
2. No other keyword is in the intersection.
• Pr[a keyword in all n' files] = (L/|K|)^{n'} = (1/2)^{(1/d) · d(1+e/2) log|K|} = 1/|K|^{1+e/2}
• Pr[no other keyword in all n' files] = (1 − 1/|K|^{1+e/2})^{|K|−d} → 1
For |K| = 5000, d = 3, n = 110, the attack succeeds with probability 0.97.
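The conjunctive attack can be simulated with a short sketch. It assumes each injected file is a uniformly random subset of L keywords drawn without replacement, and it models the leakage as the set of injected files containing every query keyword. The function names are hypothetical, and the small parameters are chosen only to keep the example fast.

```python
import random

def inject_random_files(universe_size, d, n, rng):
    # Each file holds L = (1/2)^(1/d) * |K| keywords (assumed: no repeats).
    L = round(universe_size * 0.5 ** (1.0 / d))
    return [frozenset(rng.sample(range(universe_size), L)) for _ in range(n)]

def server_view(files, query):
    # Access-pattern leakage: injected files that contain all query keywords.
    return [i for i, f in enumerate(files) if query <= f]

def recover_conjunction(files, matched_ids):
    # Intersect the keyword sets of the injected files in the search result.
    matched = [files[i] for i in matched_ids]
    if not matched:
        return None  # no injected file matched; nothing can be inferred
    return frozenset.intersection(*matched)

rng = random.Random(1)
files = inject_random_files(universe_size=500, d=2, n=80, rng=rng)
query = frozenset({3, 7})
matched_ids = server_view(files, query)
recovered = recover_conjunction(files, matched_ids)
```

When at least one injected file matches, the recovered set always contains the query keywords; the attack succeeds when it contains nothing else.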
Extensions to Conjunctive SE
Two other attacks; refer to our paper for more details.
Discussions on Potential Countermeasures
• Semantic filter. Does not work!
• Search result padding. Does not work!
• File ID shuffling and file length padding. Partially works for static SE.
• Batched updates. Partially works.
Conclusions
• File-injection attacks are devastating for query privacy in SE.
• Is it a satisfactory tradeoff between efficiency and leakage for
existing SE?
• Future research:
Reduce or eliminate access pattern leakage.
Explore new directions such as interactive protocols or multi-server settings.
• Forward privacy.