Poster: Privacy-Preserving String Search for Genome Sequences

Poster: Privacy-Preserving String Search for Genome Sequences
using Fully Homomorphic Encryption
Yu Ishimaki1
1Waseda University,
3Japan Science and
Kana Shimizu1,2
Koji Nuida2,3
Hayato Yamana1
Tokyo, Japan 2National Institute of Advanced Industrial Science and Technology(AIST), Tokyo, Japan
Technology Agency (JST) PRESTO Researcher
Introduction ( Secure Genome Search Protocol )
=
Genome Sequence
Does Bob have a typical
pattern for the disease?
Privacy protection for genome sequences is indispensable
Individual ID
search!!
G T G
(normal)
(normal)
(normal)
(disease)
4
1 2 3 4 5 6 ...
A T C G T G. . .
Bob
User(Client)
..
..
Server
30%
1
T
T
A
G
..
..
2
G
A
C
C
..
..
3
A
C
A
T
..
..
4
G
G
T
G
..
..
5
T
A
T
T
..
..
6
A
G
C
G
...
...
...
...
..
..
User only knows the statistics
▶ Server does not learn the user’s query
▶
Previous Work using Additively Homomorphic Encryption ( PBWT-sec[1] [4])
f1 f2 f3
Efficient searchable data structure(PBWT)[2]
A
2
Query= G T G
1
2
3
4
5
6
...
1
x
T G A G T A . . . → x 34
T A C G A G ...
x1
x2
A C A T T C ...
x1
x2
x3
x4
2
x2
x3
x4
x1
3
x3
x1
x2
x4
4
x1
x2
x4
x3
5
6
x2
x1
x4
x3
x1
x3
x2
x4
1
...
...
...
...
...
→
3
G C T G T G ...
original database
2
4
sorted order
3
4
5
6
C C T
C A G
G T G
A A G
T
G ...
A
T
T
A ...
G ...
C ...
PBWT
...
f0
C
g0
G
T
v (4-th column) 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 3 4 4 4 4
g1
f1
v (5-th column) 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 4
g2
f2
v (6-th column) 0 0 1 1 1 1 1 1 1 1 2 3 3 4 4 4 4 4 4 4
f 3 {g3
▶ Search interval ( f i , gi ] = (v[ f i−1 ], v[gi−1 ]]
f 3 = v[v[v[ f 0]]]
g3 = v[v[v[g0]]]
▶ v is pre-computed from PBWT
▶ String search = looking-up an element of v
g1 g2 g3
Look-up v[. . . v[t] . . .] by Recursive OT to protect "v" [1]
Look-up v[t] by Oblivious Transfer (OT) to protect "t"
c=
v = [v[1], . . . , v[t], . . . , v[n]]
Query: q = [q[1] = 0, . . . , q[t] = 1, . . . , q[n] = 0]
⊕n
c=
i=1 (Enc (q[i]) ⊗ v[i]) = v[t]
⊕n
i=1 (Enc (q[i])
⊗ (v[i] + r)
mod n )
Set the next query
q = [q[1] = 0, . . . , q[(v[t] + r)
mod n] = 1, . . . , q[n] = 0]
r -left permutation
Enc⃗(q) = [Enc (q[1]) = 0, . . . , Enc (q[v[t]] = 1), . . . , Enc (q[n]) = 0]
Again compute c
PBWT-sec only counts # of prefix match (advanced statistics and wildcard search are not available) because it uses additively HE
FHE is desired!
Our Proposed Method adopting Fully Homomorphic Encryption
Polynomial-CRT packing[3]
Look-up v[t] by 2D representation
Enc (q1 )
0 ... 1 ... 0
encrypt one vector into one ciphertext
20400 ⊕ 00001 = 20401
20400 ⊗ 10201 = 20800
Perm ( 2 0 4 5 3 , 2) = 4 5 3 2 0
▶
Utilize element-wise computation
v1
..
v t0
..
vℓ
1 . . . t1 . . . ℓ
v[t]
server’s vector
Experimental Result
{
Enc (q0 )
c=
t 0 = t/ℓ
t 1 = t mod ℓ
0.
.
0.
.
1.
.
0
⊕ℓ
i=1 (
Enc (q1 ) ⊗ Perm (Enc (q0 ),i) ⊗vi )




0. . . 1. . . 0 iff i = t 0

Yield 

 0. . . 0. . . 0 (all zero vector) otherwise

c = 0. . . v[t]. . . 0
(t 0 + t 1 )
mod ℓ -th
▶
Get v[t] !
Temporal masking r is omitted here
Conclusion and Future Work
Conclusion
Implemented secure genome search protocol with FHE, only
10 times slower than PBWT-sec.
Future Work
Extend the functionality
conducting wild card search
calculating more advanced statistics
[1] K. Shimizu, K. Nuida and S. Rätsch: Efficient Privacy-Preserving String Search and an Application in Genomics,
Bioinformatics, doi:10.1093/bioinformatics/btw050, 2016.
[2] R. Durbin: Efficient haplotype matching and storage using the Positional Burrows-Wheeler Transform (PBWT),
Bioinformatics, Vol. 30, No. 9, pp. 1266-1272, 2014.
Figure 1: Runtime
Figure 2: Transfered Data Size
[3] N. P. Smart and F. Vercauteren: Fully homomorphic SIMD operations, Designs, Codes and Cryptography, Vol. 71, No.
1, pp. 57-81, 2014.
[4] PBWT-sec. https://github.com/iskana/PBWT-sec/, accessed on 2016-4-1.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.