GPU and CPU Parallelization of Honest-but-Curious Secure Two-Party Computation
Nathaniel Husted, Steve Myers, abhi shelat, Paul Grubbs
Secure Two-party Computation
Alice and Bob want to compute a public
function of their private inputs.
Disease Database
Alice
Bob
Secure Two-party Computation
Alice
Bob
X
Y
F(X,Y) => Alice & Bob
Alice provides X. Bob provides Y. F(X,Y) is correctly calculated without
Bob learning X or Alice learning Y.
Yao’s Garbled Circuits [Yao1986]
F(X,Y)
[Figure: Boolean circuit computing F(X,Y). Inputs: X, Y, and a constant 0. Gates: XOR 0, XOR 1, AND 2, AND 3, OR 4, producing the outputs.]
I’m going to discuss the current fastest solution for
processing Yao’s Garbled Circuits.
Yao’s Garbled Circuits [Yao1986]
F(X,Y) = X + Y
[Figure: Boolean circuit with inputs X, Y, and a constant 0. Gates: XOR 0, XOR 1, AND 2, AND 3, OR 4, producing the outputs.]
Wires in Yao’s Garbled Circuits [Yao1986]
• Alice must use random labels (𝜆) for wire values instead of 0’s and
1’s.
Wire 0 ($W_0$): Label 0 ($\lambda_0^0$), Label 1 ($\lambda_0^1$)
Yao’s Garbled Circuits [Yao1986]
F(X,Y) = X + Y
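[Figure: the circuit (inputs X, Y, and a constant 0; gates XOR 0, XOR 1, AND 2, AND 3, OR 4; outputs) with wires annotated by random label values, e.g. Label 0 = 0xF1F1 and Label 1 = 0xABAB. Other label values shown: 0x1212, 0x1234, 0xCC1C, 0x1112, 0x4321, 0x9932, 0x6753, 0x9B3F, 0x93FA, 0x8843.]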
Encrypting Gates in Yao’s Garbled Circuits
[Yao1986]
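Gate 2 ($G_2$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
AND GATE
$W_0$            $W_1$            Output
$\lambda_0^0$    $\lambda_1^0$    $\lambda_2^0$
$\lambda_0^0$    $\lambda_1^1$    $\lambda_2^0$
$\lambda_0^1$    $\lambda_1^0$    $\lambda_2^0$
$\lambda_0^1$    $\lambda_1^1$    $\lambda_2^1$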
Encrypting Gates in Yao’s Garbled Circuits
[Yao1986]
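• Notation shortcut: $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0) = Enc_{\lambda_0^0}(Enc_{\lambda_1^0}(\lambda_2^0))$
Gate 2 ($G_2$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
AND GATE
$W_0$            $W_1$            Output
$\lambda_0^0$    $\lambda_1^0$    $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0)$
$\lambda_0^0$    $\lambda_1^1$    $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^0)$
$\lambda_0^1$    $\lambda_1^0$    $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^0)$
$\lambda_0^1$    $\lambda_1^1$    $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1)$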
Garbling Gates in Yao’s Garbled Circuits
[Yao1986]
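Gate 2 ($G_2$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
AND GATE
Inputs ($W_0$, $W_1$)    Encrypted Entry
0,0                      $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0)$
0,1                      $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^0)$
1,0                      $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^0)$
1,1                      $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1)$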
Garbling Gates in Yao’s Garbled Circuits
[Yao1986]
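Gate 2 ($G_2$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
AND GATE
$\pi_0, \pi_1$    Encrypted Entry
0,0               $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^0, \pi_2^0)$
0,1               $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1, \pi_2^1)$
1,0               $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0, \pi_2^0)$
1,1               $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^0, \pi_2^0)$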
Yao’s Garbled Circuits [Yao1986]
F(X,Y) = X + Y
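[Figure: the circuit (inputs X, Y, and a constant 0; gates XOR 0, XOR 1, AND 2, AND 3, OR 4; outputs) with wires annotated by random label values, e.g. Label 0 = 0xF1F1 and Label 1 = 0xABAB. Other label values shown: 0x1212, 0x1234, 0xCC1C, 0x1112, 0x4321, 0x9932, 0x6753, 0x9B3F, 0x93FA, 0x8843. The encrypted truth table of the OR gate is overlaid on the circuit:]
OR GATE
$W_0$            $W_1$            Output
$\lambda_0^0$    $\lambda_1^0$    $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0)$
$\lambda_0^0$    $\lambda_1^1$    $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^0$    $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^1$    $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1)$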
Alice sends the generated circuit to Bob.
• Alice sends ALL garbled truth tables to Bob.
[Figure: Alice sends the garbled truth tables of the XOR, AND, and OR gates to Bob. Each table is indexed by the permutation bits $\pi_0, \pi_1$. The encrypted entries appear both in the abstract form $Enc_{\lambda_0,\lambda_1}(\lambda_2, \pi_2)$ and in the hash-based form $H(\lambda_0 \| \lambda_1 \| 0) \oplus (\lambda_2 \| \pi_2)$.]
Sent over the network…
Bob evaluates the circuit.
• Evaluation is the reverse of generation.
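Gate 3 ($G_3$): AND
$W_0$: $\lambda_0$ = 0xCC1C, $\pi_0$ = 0x1
$W_1$: $\lambda_1$ = 0x1234, $\pi_1$ = 0x0
$W_2$: $\lambda_2$ = ??, $\pi_2$ = ??
AND GATE
$\pi_0, \pi_1$    Encrypted Entry
0,0               $Dec_{\lambda_0,\lambda_1}(Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^0, \pi_2^0))$
0,1               $Dec_{\lambda_0,\lambda_1}(Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1, \pi_2^1))$
1,0               $Dec_{\lambda_0,\lambda_1}(Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0, \pi_2^0))$
1,1               $Dec_{\lambda_0,\lambda_1}(Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^0, \pi_2^0))$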
Bob evaluates the circuit.
• Evaluation is the reverse of generation.
Gate 3 ($G_3$): AND
$W_0$: $\lambda_0$ = 0xCC1C, $\pi_0$ = 0x1
$W_1$: $\lambda_1$ = 0x1234, $\pi_1$ = 0x0
$W_2$: $\lambda_2$ = ??, $\pi_2$ = ??
AND GATE
$\pi_0, \pi_1$    Encrypted Entry
0,0               ⊥
0,1               ⊥
1,0               $\lambda_2^0, \pi_2^0$    (ENTRY TO DECODE)
1,1               ⊥
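The generation and evaluation of a single AND gate can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the 10-byte label length, the 4-byte gate serial encoding, and the helper names (`new_wire`, `pad`, `garble_and`, `evaluate`) are all assumptions. The pad follows the hash-based form from the slides, $H(\lambda_0 \| \lambda_1 \| \text{serial}) \oplus (\lambda_2 \| \pi_2)$, with SHA1 as $H$, and Bob uses the permutation bits he holds to pick the one entry to decode.

```python
import hashlib
import secrets

LABEL_BYTES = 10  # assumed label length; the slides do not specify one


def new_wire():
    """Return ((label0, pbit0), (label1, pbit1)) for one wire."""
    p = secrets.randbits(1)
    return ((secrets.token_bytes(LABEL_BYTES), p),
            (secrets.token_bytes(LABEL_BYTES), p ^ 1))


def pad(label_a: bytes, label_b: bytes, gate_id: int) -> bytes:
    """H(label_a || label_b || gate_id), truncated to LABEL_BYTES + 1 bytes."""
    digest = hashlib.sha1(label_a + label_b + gate_id.to_bytes(4, "big")).digest()
    return digest[:LABEL_BYTES + 1]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def garble_and(w0, w1, w2, gate_id: int) -> dict:
    """Alice: build the garbled AND table, indexed by the input permutation bits."""
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            (la, pa), (lb, pb) = w0[a], w1[b]
            out_label, out_pbit = w2[a & b]
            table[(pa, pb)] = xor(pad(la, lb, gate_id), out_label + bytes([out_pbit]))
    return table


def evaluate(table: dict, in0, in1, gate_id: int):
    """Bob: decrypt the single entry selected by the permutation bits he holds."""
    (la, pa), (lb, pb) = in0, in1
    plain = xor(pad(la, lb, gate_id), table[(pa, pb)])
    return plain[:LABEL_BYTES], plain[LABEL_BYTES]


w0, w1, w2 = new_wire(), new_wire(), new_wire()
table = garble_and(w0, w1, w2, gate_id=3)
# Bob holds the labels for W0 = 1 and W1 = 0, so he recovers W2's label for 0.
out_label, out_pbit = evaluate(table, w0[1], w1[0], gate_id=3)
print(out_label == w2[0][0], out_pbit == w2[0][1])
```

Because the table is indexed by permutation bits rather than by truth values, Bob learns which row to decrypt without learning which bit each label encodes.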
Other security models for Yao’s Garbled
Circuits
• Malicious-Leaks-A-Bit [Huang2013]
• Benefits:
• Attacker can analyze results and lie in the protocol.
• Only requires one extra Generation and Evaluation.
• Drawbacks:
• Leaks 1-bit of output.
• Fully Malicious [Lindell2013]
• Benefits:
• Leaks no information to the attacker.
• Drawbacks:
• Requires Alice to generate between 60 and 130 circuits. Bob must evaluate about half and verify the
rest.
• NOTE: Our methods can work with either of these models!
Brief survey of garbled circuit systems
System                                 CPU Based?   GPU Based?   OT Extension?   Parallel?         Bottleneck      Security Model
Our Work                               Yes*         Yes          Yes*            Yes               Communication   Honest-but-curious, malicious leaks a bit, (Fully Malicious)
Huang et al.                           Yes          No           Yes             No                Processing      Honest-but-curious, malicious leaks a bit
Kreuter et al. [Kreuter2013]           Yes          No           Yes             Super computers   Communication   Fully Malicious
Frederiksen et al. [Frederiksen201]    No           Yes          Yes*            Single GPUs       Communication   Fully Malicious
Contributions to Garbled Circuit Optimization
1. A method for accurately comparing garbled circuit systems with
very different circuit formats.
2. A method for generating all gates in a circuit at once.
3. A method for reducing the number of calculations for each gate
garbling.
4. A scalable generation method that can be combined with other
best-in-class implementations.
Fast Garbled Circuit Processing With GPUs
• GPUs are highly parallel Single Instruction Multiple Data (SIMD)
processors.
• We can use every “core” on the GPU to process a gate.
• But the SIMD parallelism requires protocol modifications.
Generating all gates at once allows high throughput but requires protocol modification.
• The Free XOR Technique [Kolesnikov2008]
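Wire 0: Label 0 ($\lambda_0^0$), Label 1 ($\lambda_0^1$) $= \lambda_0^0 \oplus R$
Wire 1: $\lambda_1^0$, $\lambda_1^1 = \lambda_1^0 \oplus R$
Gate 0 ($G_0$): XOR
$\lambda_2^0 = \lambda_0^0 \oplus \lambda_1^0$
$\lambda_2^1 = \lambda_0^0 \oplus \lambda_1^0 \oplus R$
Gate 2 ($G_2$): AND
$R$: Randomly Generated Constant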
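A minimal Python sketch of the Free XOR relations on this slide, under assumed parameters (10-byte labels, helper names `free_xor_wire` and `free_xor_gate` are hypothetical). Each wire's 1-label is its 0-label XORed with the global constant $R$, and the XOR gate's output labels are derived from the input labels, so the gate needs no garbled truth table.

```python
import secrets

LABEL_BYTES = 10  # assumed label length


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# R: randomly generated constant shared by every wire in the circuit.
R = secrets.token_bytes(LABEL_BYTES)


def free_xor_wire():
    """Input wire: random 0-label, 1-label = 0-label XOR R."""
    label0 = secrets.token_bytes(LABEL_BYTES)
    return label0, xor(label0, R)


def free_xor_gate(w_a, w_b):
    """Output wire of an XOR gate: its 0-label is the XOR of the input 0-labels."""
    out0 = xor(w_a[0], w_b[0])
    return out0, xor(out0, R)


w0, w1 = free_xor_wire(), free_xor_wire()
w2 = free_xor_gate(w0, w1)

# Evaluation: Bob simply XORs the two labels he holds.
for a in (0, 1):
    for b in (0, 1):
        assert xor(w0[a], w1[b]) == w2[a ^ b]
```

Note that the output labels depend on the input labels, which forces gates to be generated in topological order.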
Generating all gates at once allows high throughput but requires protocol modification.
• Our modified Free XOR technique
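Wire 0: Label 0 ($\lambda_0^0$), Label 1 ($\lambda_0^1$) $= \lambda_0^0 \oplus R$
Wire 1: $\lambda_1^0$, $\lambda_1^1 = \lambda_1^0 \oplus R$
Gate 0 ($G_0$): XOR
XOR Offset: $\lambda_0^0 \oplus \lambda_1^0 \oplus \lambda_2^0$
Output wire 2: $\lambda_2^0$, $\lambda_2^1 = \lambda_2^0 \oplus R$
Gate 2 ($G_2$): AND
$R$: Randomly Generated Constant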
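The modified technique can be sketched in the same style. The reading assumed here is that every wire, including the XOR gate's output wire, receives an independently generated 0-label, and the gate stores the offset $\lambda_0^0 \oplus \lambda_1^0 \oplus \lambda_2^0$ so that Bob can map the XOR of his input labels onto the output label. Label length and function names are illustrative assumptions.

```python
import secrets

LABEL_BYTES = 10  # assumed label length


def xor(*parts: bytes) -> bytes:
    """XOR any number of equal-length byte strings."""
    out = bytes(len(parts[0]))
    for p in parts:
        out = bytes(x ^ y for x, y in zip(out, p))
    return out


R = secrets.token_bytes(LABEL_BYTES)  # global random constant


def new_wire():
    """Every wire is generated independently of every other wire."""
    label0 = secrets.token_bytes(LABEL_BYTES)
    return label0, xor(label0, R)


def xor_offset(w_a, w_b, w_out) -> bytes:
    """Alice: the only per-gate data an XOR gate needs."""
    return xor(w_a[0], w_b[0], w_out[0])


def eval_xor(label_a: bytes, label_b: bytes, offset: bytes) -> bytes:
    """Bob: XOR the two held labels with the gate's offset."""
    return xor(label_a, label_b, offset)


w0, w1, w2 = new_wire(), new_wire(), new_wire()
offset = xor_offset(w0, w1, w2)

for a in (0, 1):
    for b in (0, 1):
        assert eval_xor(w0[a], w1[b], offset) == w2[a ^ b]
```

Since no wire label depends on another wire's label, all wires and all gates can be processed in one data-parallel pass, at the cost of sending one offset per XOR gate.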
Benefits of increased Throughput
Benchmarking Machines
Name              CPU                     GPU
Tie (DARPA)       -                       Tesla K20, 0.71 GHz
EC2 (Amazon)      -                       Tesla S2050, 1.15 GHz
Kreuter et al.    Xeon E5506, 2.13 GHz    -
Garbling Truth Tables in practice
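Gate 2 ($G_2$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
AND GATE GARBLED TRUTH TABLE
$\pi_0, \pi_1$    Encrypted Entry
0,0               $\mathrm{SHA1}(\lambda_0^1 \| \lambda_1^0 \| 2) \oplus (\lambda_2^0 \| \pi_2^0)$
0,1               $\mathrm{SHA1}(\lambda_0^1 \| \lambda_1^1 \| 2) \oplus (\lambda_2^1 \| \pi_2^1)$
1,0               $\mathrm{SHA1}(\lambda_0^0 \| \lambda_1^0 \| 2) \oplus (\lambda_2^0 \| \pi_2^0)$
1,1               $\mathrm{SHA1}(\lambda_0^0 \| \lambda_1^1 \| 2) \oplus (\lambda_2^0 \| \pi_2^0)$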
Reducing calculations required per-gate
provided benefits over other GPU systems.
SHA1 Counts
Random Wire Label (per wire)    1 SHA1
Garbled Truth Table             4 SHA1
XOR Offset                      0 SHA1
• But recall there are three wires for every gate in the circuit…
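As a rough worked example of these counts, the total number of SHA1 calls for generation can be tallied as below. The function name and the circuit sizes are hypothetical; only the per-item costs (1 per wire, 4 per garbled truth table, 0 per XOR offset) come from the table.

```python
def sha1_calls(num_wires: int, num_tt_gates: int, num_xor_gates: int) -> int:
    """Total SHA1 invocations, using the per-item counts from the table."""
    per_wire, per_tt_gate, per_xor_gate = 1, 4, 0
    return (num_wires * per_wire
            + num_tt_gates * per_tt_gate
            + num_xor_gates * per_xor_gate)


# Hypothetical circuit: 7 wires, 3 truth-table gates, 2 XOR gates.
print(sha1_calls(num_wires=7, num_tt_gates=3, num_xor_gates=2))  # → 19
```

The wire term grows with every gate in the circuit, XOR gates included, which is why the cost of generating a wire label is worth reducing.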
Inputs and Outputs of SHA1
Buckets holding inputs: 1 2 3 4 … 15 16
Buckets holding algorithm state: A B C D E
Pre-computing SHA1 intermediate values
Inputs for random wire values: Seed Seed Seed Seed … 0x0 … Wire ID Wire ID
Pre-computing SHA1 intermediate values
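Buckets holding inputs: Seed Seed Seed Seed … 0x0 … Wire ID Wire ID
The first 14 buckets (seed and zero padding) are the only buckets used during the first 14 rounds, and they are common to all wires. Only the last two buckets (Wire ID) differ per wire.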
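The pre-computation idea can be sketched with a plain Python SHA1 compression function. The sketch assumes the raw compression function is applied to one 16-word block with the wire ID in the last two words and no padding, which is one reading of the bucket layout on the slide. The seed value and the names `precompute` and `finish` are hypothetical. The first 14 rounds read only words 0 to 13, so their result is computed once and reused for every wire.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def rounds(state, w, start, stop):
    """Run SHA1 rounds start..stop-1 on the working state (A, B, C, D, E)."""
    a, b, c, d, e = state
    for t in range(start, stop):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (rotl(a, 5) + (f & MASK) + e + k + w[t]) & MASK, a, rotl(b, 30), c, d
    return a, b, c, d, e


def expand(block_words):
    """Extend the 16 input words to the 80-word message schedule."""
    w = list(block_words)
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress(block_words):
    """Full SHA1 compression of one 16-word block from the standard IV."""
    state = rounds(IV, expand(block_words), 0, 80)
    return tuple((h + s) & MASK for h, s in zip(IV, state))


def precompute(prefix_words):
    """State after the first 14 rounds, which read only words 0..13."""
    return rounds(IV, list(prefix_words), 0, 14)


def finish(midstate, prefix_words, wire_id_words):
    """Resume at round 14 once the two per-wire words are known."""
    w = expand(list(prefix_words) + list(wire_id_words))
    state = rounds(midstate, w, 14, 80)
    return tuple((h + s) & MASK for h, s in zip(IV, state))


# Sanity check of compress() against hashlib on a manually padded one-block message.
msg = b"abc"
block = msg + b"\x80" + bytes(55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")
assert struct.pack(">5I", *compress(struct.unpack(">16I", block))) == hashlib.sha1(msg).digest()

seed_words = [0x01234567, 0x89ABCDEF, 0x0BADF00D, 0xFEEDFACE]  # hypothetical seed
prefix = seed_words + [0] * 10   # 14 words shared by every wire
mid = precompute(prefix)         # computed once for the whole circuit
for wire_id in range(4):
    id_words = [wire_id >> 32, wire_id & MASK]
    assert finish(mid, prefix, id_words) == compress(prefix + id_words)
```

In this sketch each wire still pays for the message-schedule expansion and the remaining 66 rounds; the saving is the 14 shared rounds.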
Benefits of SHA1 pre-computation
Benchmarking Machines
Name            GPU                      GPU Cores
Tie (DARPA)     Tesla K20, 0.71 GHz      2496
EC2 (Amazon)    Tesla S2050, 1.15 GHz    448
Current and On-Going Work
• Now implement the PCF2 circuit format developed by Kreuter et al.
• Working on additional circuit optimizations on top of those provided
by the PCF2 compiler.
• Provide a full scale solution from honest-but-curious to fully malicious
processing.
• Multiple GPUs
• Super computers
• Experiments and source code are available upon request.
Questions?
Extra Slide Matter
Using GPUs we show the fastest single
machine garbled circuit generator
• XOR Gates: ~ 60.2 Million Gates Per Second
• TT Gates: ~34.1 Million Gates Per Second
1. Alice will generate the Yao’s circuit.
• Alice must construct the circuit using a series of Boolean gates with
two input wires and one output wire.
• Each gate has a serial number and garbled truth table.
Gate 0 ($G_0$): AND, input wires $W_0$ and $W_1$, output wire $W_2$
Wires in Yao’s Garbled Circuits [Yao1986]
• Alice must use random labels (𝜆) for wire values instead of 0’s and
1’s.
• Alice must use permutation bits (p-bits; 𝜋) to signify the label choice.
Wire 0 ($W_0$)
Label 0 ($\lambda_0^0$) = 0xA1B2, P-bit 0 ($\pi_0^0$) = 0x1
Label 1 ($\lambda_0^1$) = 0x192F, P-bit 1 ($\pi_0^1$) $= \pi_0^0 \oplus 1$ = 0x0
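A minimal Python sketch of how such a wire could be generated. The `Wire` type, the `new_wire` helper, and the 10-byte label length are assumptions for illustration (the 16-bit values on the slide are only for display). The two labels are independent random strings, and the second permutation bit is always the complement of the first.

```python
import secrets
from dataclasses import dataclass

LABEL_BYTES = 10  # assumed label length


@dataclass(frozen=True)
class Wire:
    label0: bytes  # label encoding the value 0
    label1: bytes  # label encoding the value 1
    pbit0: int     # permutation bit attached to label0
    pbit1: int     # permutation bit attached to label1 (pbit0 XOR 1)


def new_wire() -> Wire:
    """Draw two independent random labels and a random permutation bit."""
    pbit0 = secrets.randbits(1)
    return Wire(
        label0=secrets.token_bytes(LABEL_BYTES),
        label1=secrets.token_bytes(LABEL_BYTES),
        pbit0=pbit0,
        pbit1=pbit0 ^ 1,
    )


w0 = new_wire()
print(w0.label0.hex(), w0.pbit0)
print(w0.label1.hex(), w0.pbit1)
```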
Encrypting Gates in Yao’s Garbled Circuits
[Yao1986]
• How Alice creates garbled truth tables in two steps
• Step 1: Create Encrypted Truth Table
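Gate 2 ($G_2$): AND, Serial #: 2
$W_0$: $\lambda_0^0$ = 0xA1B2, $\pi_0^0$ = 0x1; $\lambda_0^1$ = 0x192F, $\pi_0^1$ = 0x0
$W_1$: $\lambda_1^0$ = 0x428F, $\pi_1^0$ = 0x0; $\lambda_1^1$ = 0xADC1, $\pi_1^1$ = 0x1
$W_2$: $\lambda_2^0$ = 0xA1B2, $\pi_2^0$ = 0x0; $\lambda_2^1$ = 0x192F, $\pi_2^1$ = 0x1
AND GATE (Step 1 output)
$Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0, \pi_2^0)$
$Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^0, \pi_2^0)$
$Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^0, \pi_2^0)$
$Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1, \pi_2^1)$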
Encrypting Gates in Yao’s Garbled Circuits
[Yao1986]
• How Alice creates garbled truth tables in two steps
• Step 1: Create Encrypted Truth Table
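Gate 1 ($G_1$): XOR, input wires $W_0$ and $W_1$, output wire $W_2$
XOR GATE
$W_0$            $W_1$            Output
$\lambda_0^0$    $\lambda_1^0$    $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0)$
$\lambda_0^0$    $\lambda_1^1$    $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^0$    $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^1$    $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^0)$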
Encrypting Gates in Yao’s Garbled Circuits
[Yao1986]
• How Alice creates garbled truth tables in two steps
• Step 1: Create Encrypted Truth Table
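Gate 4 ($G_4$): OR, input wires $W_0$ and $W_1$, output wire $W_2$
OR GATE
$W_0$            $W_1$            Output
$\lambda_0^0$    $\lambda_1^0$    $Enc_{\lambda_0^0,\lambda_1^0}(\lambda_2^0)$
$\lambda_0^0$    $\lambda_1^1$    $Enc_{\lambda_0^0,\lambda_1^1}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^0$    $Enc_{\lambda_0^1,\lambda_1^0}(\lambda_2^1)$
$\lambda_0^1$    $\lambda_1^1$    $Enc_{\lambda_0^1,\lambda_1^1}(\lambda_2^1)$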
A basic overview of the Yao’s protocol
• Assumptions:
• Security Model: Honest but Curious
• Process:
1. Alice will generate the garbled circuit.
2. Alice sends the generated circuit to Bob.
3. Bob will use Oblivious Transfer to obtain the wire labels for his inputs from Alice.
4. Bob will evaluate the circuit.
5. Bob sends the output to Alice.
Yao’s Garbled Circuits under an Honest-but-Curious Security Model
1. Alice generates wire labels and garbled truth tables for all wires and
gates in a circuit.
2. Alice sends the garbled truth tables to Bob.
3. Bob obtains the wire labels for his inputs from Alice using Oblivious Transfer.
4. Bob evaluates the circuit.
5. Bob sends the output to Alice.
Both parties can analyze data at all steps of this protocol but must
perform all steps.
Bob performs Oblivious Transfer to obtain
the wire labels for his inputs from Alice
[Figure: Alice supplies $S_0$ and $S_1$ to the Oblivious Transfer; Bob receives $S_a$.]
So how fast can we process garbled circuits?
• XOR Gates: ~ 60.2 Million Gates Per Second
• TT Gates: ~34.1 Million Gates Per Second