Challenges in Realizing Secure Cloud Storage

Secure Deduplication of Encrypted Data
without Additional Independent Servers
Jian Liu1, N. Asokan1,2, Benny Pinkas3
1Aalto University, 2University of Helsinki, 3Bar-Ilan University
Outline
• Challenges for the current deduplication schemes
• Our protocol
• Evaluation
2
Deduplication
[Figure: a client uploads a file F to the cloud storage server c; with server-side deduplication the client always sends F and the server discards duplicates, with client-side deduplication the server tells the client to skip the upload if it already stores F]
• Server-side deduplication vs. client-side deduplication
• Client-side deduplication enables an online brute-force attack by a corrupt client: the "file already exists" response confirms guesses of predictable files
• Countermeasure: randomized threshold

D. Harnik, B. Pinkas, A. Shulman-Peleg. Side channels in cloud services: deduplication in cloud storage. IEEE Security & Privacy, 8(6):40-47, 2010.
4
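For illustration, here is a minimal sketch of the randomized-threshold countermeasure described by Harnik et al., assuming the server draws a per-file threshold uniformly from [2, B] and reveals the client-side deduplication short-cut only once that many copies have been uploaded; B and all names below are illustrative, not taken from the slides.

```python
import secrets

B = 20  # upper bound on the random threshold; an illustrative deployment parameter

class DedupServer:
    """Minimal sketch of client-side deduplication with a randomized threshold."""

    def __init__(self):
        self.copies = {}      # file hash -> number of upload requests seen so far
        self.threshold = {}   # file hash -> random per-file threshold in [2, B]

    def should_client_upload(self, file_hash: str) -> bool:
        # Draw the per-file threshold the first time this file is seen.
        t = self.threshold.setdefault(file_hash, 2 + secrets.randbelow(B - 1))
        count = self.copies.get(file_hash, 0) + 1
        self.copies[file_hash] = count
        # The deduplication short-cut ("no need to upload") is revealed only after
        # the file has been requested at least t times, so a single probe no longer
        # tells a corrupt client whether the file is already stored.
        return count < t
```

A corrupt client that uploads a guessed file once cannot distinguish "file absent" from "file present but still below its threshold", which blunts the existence side channel at the cost of a few redundant uploads.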
Deduplication on Encrypted Data
[Figure: two clients encrypt the same file F under different keys KA and KB, so the cloud storage server c cannot deduplicate the resulting ciphertexts]
• Convergent encryption: e.g., K = h(F)

J. R. Douceur, et al. Reclaiming space from duplicate files in a serverless distributed file system. In ICDCS, pages 617-624. IEEE, 2002.
5
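A minimal sketch of convergent encryption (K = h(F)): because the key, and here also the nonce, are derived from the file, identical files produce identical ciphertexts. Deriving a deterministic nonce from the file is an illustrative choice, not something the slide specifies, and the sketch requires the third-party 'cryptography' package.

```python
import hashlib
# Requires the third-party 'cryptography' package, used only to make the sketch runnable.
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def convergent_encrypt(file_bytes: bytes):
    """Toy convergent encryption: key (and nonce) are derived from the file itself."""
    key = hashlib.sha256(file_bytes).digest()                      # K = h(F)
    nonce = hashlib.sha256(b"nonce" + file_bytes).digest()[:16]    # deterministic IV (illustrative)
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return key, encryptor.update(file_bytes) + encryptor.finalize()

# Two clients holding the same file derive the same key and the same ciphertext,
# so the storage server can deduplicate without ever seeing the plaintext.
k1, c1 = convergent_encrypt(b"same file contents")
k2, c2 = convergent_encrypt(b"same file contents")
assert k1 == k2 and c1 == c2
```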
Deduplication on Encrypted Data
[Figure: with convergent encryption both clients derive the same key K = h(F) and thus the same ciphertext for F, so the storage server c can deduplicate]
• Convergent encryption: e.g., K = h(F)
• Offline brute-force attack by a corrupt storage server
6
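The offline brute-force attack is easy to state in code: a corrupt storage server holding a convergent ciphertext of a predictable file simply re-encrypts candidate plaintexts and compares. The candidate dictionary below is hypothetical, and convergent_encrypt is the sketch above.

```python
def brute_force(stored_ciphertext: bytes, candidate_files):
    """Offline attack: re-encrypt each guess with its convergent key and compare."""
    for guess in candidate_files:
        _, c = convergent_encrypt(guess)      # convergent_encrypt from the sketch above
        if c == stored_ciphertext:
            return guess                      # the predictable file has been recovered
    return None

# Hypothetical dictionary of predictable files (e.g., a form letter with a guessed field).
_, target = convergent_encrypt(b"salary letter: 52,000 EUR")
assert brute_force(target, [b"salary letter: %d,000 EUR" % i for i in range(40, 80)]) is not None
```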
DupLESS: Independent Key Server
M. Bellare, S. Keelveedhi, and T. Ristenpart. DupLESS: server-aided encryption for deduplicated storage. In USENIX Security, pages 179-194. USENIX Association, 2013.

[Figure: each client runs an oblivious PRF with an independent key server on its file (FA or FB) to obtain its encryption key (KA or KB) before uploading to the storage server c]
• KB = KA iff FA = FB
7
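DupLESS instantiates the oblivious PRF with blind RSA signatures. The toy sketch below (tiny primes, ad-hoc hash-to-group mapping, illustrative names) shows why the key server never sees which file was queried, yet equal files still yield equal keys.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA parameters for the key server (illustrative only; far too small to be secure).
P, Q = 10007, 10009
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))     # key server's secret exponent

def hash_to_group(f: bytes) -> int:
    # Ad-hoc mapping of the file into Z_N; a real OPRF hashes into the group more carefully.
    return int.from_bytes(hashlib.sha256(f).digest(), "big") % N

def client_key_for(f: bytes) -> bytes:
    x = hash_to_group(f)
    r = secrets.randbelow(N - 2) + 2
    while gcd(r, N) != 1:                      # blinding factor must be invertible mod N
        r = secrets.randbelow(N - 2) + 2
    blinded = x * pow(r, E, N) % N             # client -> key server: x * r^E (hides x)
    signed = pow(blinded, D, N)                # key server -> client: (x * r^E)^D = x^D * r
    unblinded = signed * pow(r, -1, N) % N     # client strips r and is left with x^D
    assert pow(unblinded, E, N) == x           # the blind signature can be verified
    return hashlib.sha256(unblinded.to_bytes(32, "big")).digest()

# Equal files yield equal keys, so ciphertexts of equal files still deduplicate,
# while the key server only ever sees blinded values.
assert client_key_for(b"same file") == client_key_for(b"same file")
```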
DupLESS: Independent Key Server
[Figure: the same oblivious-PRF setup, but now the corrupt storage server poses as a client and queries the key server with guessed files]
• Per-client rate limit at the key server: 825,000 requests/week
• Online brute-force attack by a corrupt storage server: it can query the key server for keys of guessed files, limited only by the rate limit
• Who will run the independent key server?
8
Threat Model
• Online brute-force attack by a corrupt client
• Offline brute-force attack by a corrupt storage server
• Online brute-force attack by a corrupt storage server
9
Outline
• Challenges for the current deduplication schemes
• Our protocol
• Evaluation
10
Design Goals
• Deployable in commercial settings
– require no independent servers
– maximize deduplication effectiveness
  Deduplication percentage = 1 − (size of stored files)/(total size of uploaded files)
• Secure
• Usable
– minimize performance overhead
12
Idea: Oblivious Key Sharing
• How do we design a secure oblivious key sharing protocol?
• How does Bob decide which Alices to run the protocol with?
• How do we ensure that the performance overhead is minimal?

[Figure: Alice holds FA and its key KA, Bob holds FB; after oblivious key sharing Bob obtains KB such that KB = KA iff FA = FB]
13
Protocol Overview
[Figure: protocol overview. Each client sends a 13-bit short hash of its file (FA or FB) to the storage server c; clients whose short hashes match run oblivious key sharing (one party additionally holds a keypair PK_A/SK_A used in the key-delivery step), and the uploader obtains KB with KB = KA iff FA = FB]
14
Protocol Overview
[Figure: as in the previous slide, both clients send 13-bit short hashes of their files and run oblivious key sharing via the storage server c]
• Randomized threshold (against the online brute-force attack by a corrupt client)
• Per-file rate limit = 825,000 / 2^13 ≈ 100
15
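For intuition, here is the arithmetic behind the per-file rate limit, together with one hypothetical way to compute a 13-bit short hash; the slides do not specify the actual short-hash construction.

```python
import hashlib

def short_hash_13(file_bytes: bytes) -> int:
    """Hypothetical 13-bit short hash: the top 13 bits of SHA-256(F)."""
    return int.from_bytes(hashlib.sha256(file_bytes).digest()[:2], "big") >> 3

# A 13-bit short hash partitions files into 2^13 = 8192 buckets, and the slide sets
# the per-file rate limit to DupLESS's weekly per-client budget spread over them:
print(2 ** 13, 825_000 // 2 ** 13)   # 8192 100
```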
Password Authenticated Key Exchange (PAKE)
[Figure: Alice holds password PWA and Bob holds password PWB; they run PAKE and obtain keys k'A and k'B such that k'A = k'B iff PWA = PWB]
• Explicit vs. implicit authentication: with implicit authentication a password mismatch simply yields unrelated keys rather than an explicit failure
16
Password Authenticated Key Exchange (PAKE)
• Alice holds PWA, Bob holds PWB; MA and MB are fixed public group elements, g a generator, RA and RB fresh random exponents
• Alice → Bob: g^RA · MA^PWA
• Bob → Alice: g^RB · MB^PWB
• KA = (g^RB · MB^PWB / MB^PWA)^RA
• KB = (g^RA · MA^PWA / MA^PWB)^RB
• KA = KB = g^(RA·RB) iff PWA = PWB

M. Abdalla and D. Pointcheval. Simple password-based encrypted key exchange protocols. In A. Menezes, editor, CT-RSA, volume 3376 of LNCS, pages 191-208. Springer, 2005.
17
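A runnable sketch of the exchange above over a toy group; the modulus, generator, public elements M_A, M_B and the integer password encoding are all illustrative choices, not the paper's parameters.

```python
import secrets

# Illustrative group: arithmetic modulo a Mersenne prime, with made-up public elements.
P = 2 ** 127 - 1
g = 3
M_A, M_B = 5, 7   # fixed public elements; in a real instantiation their discrete logs are unknown

def pake_keys(pw_a: int, pw_b: int):
    R_A = secrets.randbelow(P - 2) + 1
    R_B = secrets.randbelow(P - 2) + 1
    msg_a = pow(g, R_A, P) * pow(M_A, pw_a, P) % P   # Alice -> Bob: g^RA * MA^PWA
    msg_b = pow(g, R_B, P) * pow(M_B, pw_b, P) % P   # Bob -> Alice: g^RB * MB^PWB
    # Each side divides out the mask computed with its *own* password, then exponentiates.
    K_A = pow(msg_b * pow(pow(M_B, pw_a, P), -1, P) % P, R_A, P)
    K_B = pow(msg_a * pow(pow(M_A, pw_b, P), -1, P) % P, R_B, P)
    return K_A, K_B

K_A, K_B = pake_keys(12345, 12345)
assert K_A == K_B        # matching passwords: both sides hold g^(RA*RB)
K_A, K_B = pake_keys(12345, 54321)
assert K_A != K_B        # mismatch: the keys are unrelated (with overwhelming probability)
```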
Oblivious Key Sharing Protocol
[Figure: Bob (the uploader, holding FB) and one or more Alices (checkers, each holding FA and its key KA) are matched by the storage server c via hashes of their files, H(FA) and H(FB); each Bob-Alice pair runs PAKE with a hash of its file as the password, obtaining k'A and k'B such that k'A = k'B iff FA = FB; a key-delivery step then gives Bob KB, with KB = KA on a match, and Bob uploads E(KB, FB)]

The per-file rate limit bounds the number of protocol runs by prioritizing popular files.
18
Oblivious Key Sharing Protocol
Bob (the uploader) holds FB and a keypair (pk, sk) of an additively homomorphic encryption scheme; Alice (the checker) holds FA and its key KA. After PAKE (with H(FA), H(FB) as passwords), they hold k'A and k'B, each split into left and right halves.

• Bob → server: k'BL and CB = Enc(pk, k'BR + r), where r is a random blinding value chosen by Bob
• Alice → server: k'AL and CA = Enc(pk, KA + k'AR)
• Server:
  if k'AL = k'BL:  e = CA − CB = Enc(pk, KA − r)   (since k'AR = k'BR on a match)
  else:            e = Enc(pk, r') for a fresh random r'
• Bob: KB = Dec(sk, e) + r, then uploads E(KB, FB)
19
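To make the key-delivery step concrete, here is a sketch with a toy Paillier cryptosystem standing in for the additively homomorphic scheme. The slides do not name a particular scheme, and all parameters and helper names below are illustrative.

```python
import secrets
from math import gcd

# Toy Paillier: additively homomorphic encryption with illustrative, insecure parameters.
P, Q = 10007, 10009
N, N2 = P * Q, (P * Q) ** 2
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)

def enc(m):
    r = secrets.randbelow(N - 1) + 1
    while gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2      # generator g = N + 1

def dec(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N

def hom_sub(c1, c2):
    return c1 * pow(c2, -1, N2) % N2                   # Enc(a) "minus" Enc(b) = Enc(a - b mod N)

def key_delivery(kB_L, kB_R, kA_L, kA_R, K_A):
    """One uploader/checker pair, following the slide's message flow."""
    r = secrets.randbelow(N)                  # Bob's blinding value
    C_B = enc((kB_R + r) % N)                 # Bob -> server:   k'_BL, Enc(k'_BR + r)
    C_A = enc((K_A + kA_R) % N)               # Alice -> server: k'_AL, Enc(K_A + k'_AR)
    if kA_L == kB_L:                          # server compares the left PAKE halves only
        e = hom_sub(C_A, C_B)                 # Enc(K_A + k'_AR - k'_BR - r) = Enc(K_A - r) on a match
    else:
        e = enc(secrets.randbelow(N))         # no match: Bob ends up with a random key
    return (dec(e) + r) % N                   # Bob: K_B = Dec(sk, e) + r

# Same file => same PAKE key halves => Bob recovers Alice's key; otherwise he gets a random key.
assert key_delivery(11, 22, 11, 22, K_A=1234) == 1234
```

The server only ever handles ciphertexts and the left halves of the PAKE keys, so it learns whether the two left halves matched but not KA itself.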
Outline
• Challenges for the current deduplication schemes
• Our protocol
• Evaluation
20
Proof approach
Execution in the real model is computationally indistinguishable from execution in the ideal model
• Assume:
– ideal model: a TTP implementing the ideal functionality
– the hash function is modeled as a random oracle RO
– the PAKE protocol is modeled as an oracle PAKE
• Construct a simulator in the ideal model that can simulate the protocol execution in the real model
https://eprint.iacr.org/2015/455.pdf
21
Security Evaluation
• DupLESS: 2^e > 825,000 × x
• Our protocol: 2^e > 2^13 × 100 × y

e: min-entropy of a predictable file
x: number of sybils
y: number of potential owners
22
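For intuition: log2(825,000) ≈ 19.7, while log2(2^13 × 100) = 13 + log2(100) ≈ 19.6, so both bounds require roughly 20 bits of file min-entropy plus log2(x) or log2(y) respectively, which is why the security guarantee is described as comparable to DupLESS.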
Deduplication Effectiveness Evaluation
[Figure: file popularity (number of upload requests per file ID, log-log scale) in the media and enterprise datasets]

• "Media" dataset
– Proxy: Android app popularity dataset
– 7,396,235 "uploads"
– 178,396 distinct files
• "Enterprise" dataset
– Proxy: Debian Popularity Contest
– 217,927,332 "uploads"
– 143,949 distinct files
• No file-size information, so:
  Deduplication percentage = 1 − (# stored files)/(# total uploads)
24
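As a quick check of the best case: perfect deduplication stores only the distinct files, giving 1 − 178,396/7,396,235 ≈ 97.6% for the media dataset and 1 − 143,949/217,927,332 ≈ 99.93% for the enterprise dataset.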
Deduplication Effectiveness vs. Rate-limit
[Figure: deduplication percentage (96.5–100%) vs. rate-limit split RLu(RLc) from 10(90) to 90(10), for the media and enterprise datasets, compared against perfect deduplication]

RLu: rate limit for Bob (the uploader)
RLc: rate limit for Alice (the checker)
25
Deduplication Effectiveness vs. Offline Rate
If no checkers are online, deduplication fails for that upload
→ How does this affect the overall deduplication effectiveness?

[Figure: deduplication percentage (93–100%) vs. offline rate (0–0.9) for the media and enterprise datasets]

With 50% of users offline, deduplication effectiveness degrades by < 2%
26
Performance Evaluation
• Node.js for the Web framework, Redis for the database
• SHA-256, AES-256-CBC, GMP Library (C)
[Figure: time usage (milliseconds, log scale) and bandwidth usage (bytes, log scale) vs. file size (2^0 to 2^16 KB) for our protocol (with 30 PAKE runs), uploading encrypted files, and uploading plain files]
27
Recap: Design Goals
• Deployable in commercial settings
– require no independent servers
– maximize deduplication effectiveness
  Deduplication percentage = 1 − (# stored files)/(# total uploads)
• Secure
• Usable
– minimize performance overhead
28
Summary
• First single-server scheme that supports both deduplication and strong encryption
• Does not require any independent servers
• Deduplication effectiveness close to perfect
• Security guarantees comparable to previous work
• Minimal overhead for large files
http://tinyurl.com/close-wp2
29
Real-life dataset

[Figure: file popularity (number of upload requests per file ID, log-log scale) and deduplication percentage (50–53%) vs. rate-limit split RLu(RLc), compared against perfect deduplication]

• Department file server: 18,956,831 uploads, 9,003,530 distinct files
• File-size information is available and used
30
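A rough sanity check by file count: 1 − 9,003,530/18,956,831 ≈ 52.5%, which falls in the 50–53% range plotted above (the curves themselves are computed using file sizes).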
Proof – corrupt uploader
[Figure: simulation of a corrupt uploader. The uploader's hash (h) and short-hash (sh) queries are answered by the random oracle RO, and its PAKE runs by the PAKE oracle, yielding {k'_bi}; given F and K, on receiving {k'_biL, Enc(pk, k'_biR + r)} the simulator picks a random index j: if k'_bjL is the right value it sends Enc(pk, K − r), otherwise it sends Enc(pk, r')]
31
Proof – corrupt checker
[Figure: the corrupt party's hash query (h) is answered by the random oracle and its PAKE run by the PAKE oracle (yielding k); it sends k'_aL, Enc(pk, K + k'_aR); if k'_aL is the right value, the simulator sends K, otherwise it sends a random value]
32
Proof – corrupt server
[Figure: simulation of a corrupt server. The simulator sends the server the short hash sh, the checkers' messages {k'_aiL, Enc(pk, K + k'_aiR)} and the uploader's messages {k'_biL, Enc(pk, x_i)}; the server replies with Enc(pk, y); if y = (k'_ajR + K) − x_j, the simulator sends E(F), otherwise it sends a random string]
33
Limitations
• Proof approach considers only one protocol run
• But the system allows/requires multiple runs
– may lead to additional attacks
– e.g., multiple colluding Bobs upload the same file and compare keys
• Correctness for multiple runs requires complex proofs
– requires the UC framework or differential privacy
– current work
34