Hardware Implementation of Cryptography - ARCHI 2017

ARCHI 2017, Nancy, France — March 6–10, 2017
Hardware Implementation of (Elliptic Curve) Cryptography
Jérémie Detrey
CARAMBA team, LORIA
INRIA Nancy – Grand Est, France
[email protected]
[Logo: CARAMBA team logo, typeset as an obfuscated C program; its embedded comment reads: cc caramba.c; echo f3 f2 f1 f0 p | ./a.out]
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography

Cryptography
[Figure: Alice and Bob exchange messages over a public channel, with Eve and Mallory in between]
▶ Alice and Bob want to communicate using a public channel (e.g., Internet)
  • ... but Eve is listening (passive attack: eavesdropping)
  • ... and Mallory is interfering (active attack: tampering, forgery, replay, etc.)
▶ Cryptography: how to prevent such attacks, and ensure
  • confidentiality (Who can read the message?) → encryption
  • integrity (Was the message modified?) → cryptographic hash functions
  • authenticity (Who sent the message?) → message authentication code (MAC), signature
  • ... and many others: non-repudiation, zero-knowledge proof, secret sharing, etc.
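
As a rough illustration of the integrity and authenticity mechanisms listed above, the following sketch uses Python's standard `hashlib` and `hmac` modules. The message contents, the key value, and the helper name `verify` are assumptions made for the example. Note that a bare hash only detects accidental changes: Mallory can recompute it, which is why authenticity needs a keyed MAC (or a signature).

```python
import hashlib
import hmac

message = b"Transfer 100 EUR to Bob"
tampered = b"Transfer 900 EUR to Bob"
shared_key = b"hypothetical shared secret"  # assumption: agreed upon beforehand

# Integrity: the digest changes as soon as the message is modified.
digest = hashlib.sha256(message).hexdigest()
tampered_digest = hashlib.sha256(tampered).hexdigest()

# Authenticity: the tag can only be computed by someone holding the key.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(key: bytes, msg: bytes, received_tag: bytes) -> bool:
    """Recompute the MAC and compare it with the received tag."""
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)
```

Here `verify(shared_key, message, tag)` accepts the original message, while a modified message or a wrong key is rejected.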

Cryptographic layers
▶ A complete cryptosystem implementation relies on many layers:
  • protocol (OpenPGP, TLS, SSH, etc.)
  • cryptographic mechanisms (encryption, hashing, signature, etc.)
  • cryptographic primitives (AES, RSA, ECDH, etc.)
  • arithmetic and logic operations (CPU / ASIP instruction set)
  • logic circuits (registers, multiplexers, adders, etc.)
  • logic gates (NOT, NAND, etc.) and wires
  • transistors
▶ When designing a cryptoprocessor, the hardware/software partitioning can be tailored to the application’s requirements
▶ All top layers (esp. the blue and green ones) might lead to critical vulnerabilities if poorly implemented!
  ⇒ a cryptosystem is no more secure than its weakest link
▶ In this lecture, we will mostly focus on the green layers

Which target platforms?
▶ Cryptography should be available everywhere:
  • on desktop PCs and laptops
    → 64-bit Intel or AMD CPUs with SIMD instructions (SSE / AVX)
  • on smartphones
    → low-power 32- or 64-bit ARM CPUs, maybe with SIMD (NEON)
  • on wireless sensors
    → tiny 8-bit microcontrollers (such as Atmel AVRs)
  • on smart cards and RFID chips
    → custom cryptoprocessor (ASIC or ASIP) with dedicated hardware for cryptographic operations
▶ Other possible target platforms, mostly for cryptanalytic computations:
  • clusters of CPUs
  • GPUs (graphics processors)
  • FPGAs (reconfigurable circuits)
  ⇒ In such cases, implementation security is usually less critical

Efficient and secure implementation?
▶ Many possible meanings for efficiency:
  • fast? → low latency or high throughput?
  • small? → low memory / code / silicon usage?
  • low power?... or low energy?
  ⇒ Identify constraints according to application and target platform
▶ Secure against which attacks?
  • protocol attacks? (POODLE, FREAK, LogJam, etc.)
  • cryptanalysis? (weak cipher, small keys, etc.)
  • timing attacks?
  • power or electromagnetic analysis?
  • fault attacks? [See A. Tisserand’s talk]
  • cache attacks?
  • branch-prediction attacks?
  • etc.
  ⇒ Possible attack scenarios depend on the application
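
To make the "timing attacks" item more concrete, here is a minimal sketch contrasting an early-exit comparison with one that always processes every byte. The function names are hypothetical. Python itself gives no hard constant-time guarantee, so this only illustrates the idea; in real Python code the standard `hmac.compare_digest` is the usual choice.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: the running time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch occurs early
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Accumulate all byte differences so that every byte is always processed."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0


secret_tag = bytes(range(16))
guess = bytes(range(15)) + b"\xff"
```

Both functions return the same results; the difference lies only in how much the running time reveals about the position of the first wrong byte, which an attacker could exploit to guess a secret tag byte by byte.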

Some references
Alfred Menezes, Paul van Oorschot, and Scott Vanstone, Handbook of Applied Cryptography. Chapman & Hall / CRC, 1996. http://www.cacr.math.uwaterloo.ca/hac/
Francisco Rodríguez-Henríquez, Arturo Díaz Pérez, Nazar Abbas Saqib, and Çetin Kaya Koç, Cryptographic Algorithms on Reconfigurable Hardware. Springer, 2006.
Proceedings of the CHES workshop and of other crypto conferences.
Ian F. Blake, Gadiel Seroussi, and Nigel P. Smart, Elliptic Curves in Cryptography. London Mathematical Society 265, Cambridge University Press, 1999.
Darrel Hankerson, Alfred Menezes, and Scott Vanstone, Guide to Elliptic Curve Cryptography. Springer, 2004.
Henri Cohen and Gerhard Frey (editors), Handbook of Elliptic and Hyperelliptic Curve Cryptography. Chapman & Hall / CRC, 2005.

Outline
▶ Some encryption mechanisms
▶ Elliptic curve cryptography
▶ Scalar multiplication
▶ Elliptic curve arithmetic
▶ Finite field arithmetic

Symmetric encryption
[Figure: Alice and Bob share a secret key $K$; Alice sends $C = \mathrm{Enc}_K(M)$ to Bob over the public channel]
▶ Alice wants to send a confidential message $M$ to Bob
  • they decide upon a shared secret key $K$ (might be tricky!)
  • encrypt message using shared key: $C = \mathrm{Enc}_K(M)$
  • decrypt ciphertext using shared key: $M = \mathrm{Dec}_K(C)$
▶ Block cipher:
  • split message $M$ into $n$-bit blocks (e.g., $n = 128$ bits)
  • encryption/decryption primitive: iterated keyed permutation $\{0, 1\}^n \to \{0, 1\}^n$
  • requires a mode of operation to combine the blocks
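
The block-cipher bullets above can be sketched with a toy example. The code below builds an iterated keyed permutation on 128-bit blocks as a Feistel network whose round function is a truncated SHA-256; this construction, the round count, and all function names are assumptions chosen for illustration, not a vetted cipher and not taken from the lecture. It then compares two ways of combining blocks: independent per-block encryption (ECB) and chaining (CBC).

```python
import hashlib

BLOCK = 16          # n = 128 bits
HALF = BLOCK // 2
ROUNDS = 8          # arbitrary choice for this toy permutation


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round_fn(key: bytes, rnd: int, half: bytes) -> bytes:
    # Toy round function: hash of (key, round index, half block), truncated.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def enc_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round_fn(key, rnd, right))
    return left + right


def dec_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_fn(key, rnd, left)), left
    return left + right


def split_blocks(msg: bytes) -> list:
    assert len(msg) % BLOCK == 0, "toy example: padding is not handled"
    return [msg[i:i + BLOCK] for i in range(0, len(msg), BLOCK)]


def ecb_encrypt(key: bytes, msg: bytes) -> bytes:
    # Each block is encrypted independently of the others.
    return b"".join(enc_block(key, b) for b in split_blocks(msg))


def cbc_encrypt(key: bytes, iv: bytes, msg: bytes) -> bytes:
    out, prev = [], iv
    for b in split_blocks(msg):
        prev = enc_block(key, _xor(b, prev))  # chain with previous ciphertext
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ct: bytes) -> bytes:
    out, prev = [], iv
    for c in split_blocks(ct):
        out.append(_xor(dec_block(key, c), prev))
        prev = c
    return b"".join(out)


key, iv = b"k" * 16, bytes(16)
msg = b"A" * 32  # two identical plaintext blocks
ecb_ct = ecb_encrypt(key, msg)
cbc_ct = cbc_encrypt(key, iv, msg)
```

With two identical plaintext blocks, the ECB ciphertext blocks are identical too, whereas chaining hides the repetition. This is one reason why the choice of mode matters as much as the primitive.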

AES [Daemen & Rijmen, 2001]
[Figure: AES encryption data path from message M to ciphertext C: initial round key k0, then rounds of SubBytes (S-boxes), ShiftRows, MixColumns and round keys k1 to k9, and a final round of SubBytes and ShiftRows with round key k10; the round keys are derived from K by the key schedule]
▶ Advanced Encryption Standard
▶ Key sizes: 128, 192 or 256 bits
▶ Block size: 128 bits
▶ Substitution–permutation network
  • SubBytes: nonlinear substitution on bytes
  • ShiftRows & MixColumns: mainly wires, plus a few XORs
▶ 10, 12, or 14 rounds (depending on key size)
▶ Low-area version (1 S-box): 20 cycles / round, 2.5 to 5 kGE
▶ Parallel version (20 S-boxes): 1 cycle / round, 20 to 35 kGE
▶ Fully unrolled version (200 S-boxes): 1 cycle / block, at least 200 kGE
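
Since the S-box count drives the area figures above, it may help to see what one S-box computes. The sketch below derives the AES S-box from its definition in the AES standard: inversion in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by a fixed affine map. The helper names are hypothetical, and the bit-by-bit arithmetic is written for readability, not as a model of an efficient hardware S-box.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce as soon as the degree reaches 8
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse computed as a^254; 0 is mapped to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(a: int) -> int:
    """SubBytes on one byte: field inversion followed by the affine map."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox(a) for a in range(256)]
```

The resulting table is a permutation of the 256 byte values; for instance `SBOX[0x00]` is `0x63` and `SBOX[0x53]` is `0xED`, matching the published table.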

Symmetric encryption
▶ Stream cipher:
  • generate a pseudorandom keystream $Z$ using a PRNG initialized by the key $K$ and a random initialization vector (IV)
  • use $Z$ to mask the message: $C = M \oplus Z$ and $M = C \oplus Z$ ($\oplus$ is XOR)
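
The masking step can be sketched in a few lines. As a stand-in for the keystream generator, this example uses SHAKE-256 from Python's `hashlib` on the concatenation of key and IV; that choice, like the function names, is an assumption made for illustration and not a recommendation of a specific stream cipher.

```python
import hashlib
import os


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in PRNG (assumption): SHAKE-256 expanded to the requested length."""
    return hashlib.shake_256(key + iv).digest(length)


def xor_mask(data: bytes, z: bytes) -> bytes:
    """Apply the mask: the same operation encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, z))


key = os.urandom(16)
iv = os.urandom(16)
message = b"attack at dawn"

z = keystream(key, iv, len(message))
ciphertext = xor_mask(message, z)   # C = M xor Z
recovered = xor_mask(ciphertext, z)  # M = C xor Z
```

Because encryption is a plain XOR, reusing the same key and IV for two messages would give an eavesdropper the XOR of the two plaintexts, which is why a fresh IV is needed for each message.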

Trivium [De Cannière & Preneel, 2005]
[Figure: Trivium state diagram: 288-bit circular shift register with its tap positions and the keystream output $z_i$]
▶ Part of the eSTREAM portfolio (low-area hardware ciphers)
▶ Key size: 80 bits
▶ IV size: 80 bits
▶ 288-bit circular shift register, plus a few XOR and AND gates
▶ Serial version:
  • 1 keystream bit / clock cycle
  • 2.6 kGE
▶ Parallel version:
  • up to 64 bits / clock cycle
  • 4.9 kGE
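
A bit-level software model shows how little logic the serial version needs: three XORs for the output, three ANDs, and a shift. The sketch below follows the state update of the published Trivium specification as I understand it. The function name and the convention of passing key and IV as lists of bits (in the specification's K1..K80 and IV1..IV80 order) are assumptions, so matching official test vectors would require care with bit and byte ordering.

```python
def trivium_keystream(key_bits, iv_bits, n_bits):
    """Return n_bits keystream bits from 80 key bits and 80 IV bits (lists of 0/1)."""
    assert len(key_bits) == 80 and len(iv_bits) == 80
    # s[0] is unused so that indices match the 1-based numbering of the specification.
    s = [0] * 289
    s[1:81] = key_bits              # first register: key, then zeros
    s[94:174] = iv_bits             # second register: IV, then zeros
    s[286] = s[287] = s[288] = 1    # third register: zeros, then three ones

    def step():
        t1 = s[66] ^ s[93]
        t2 = s[162] ^ s[177]
        t3 = s[243] ^ s[288]
        z = t1 ^ t2 ^ t3            # keystream bit
        t1 ^= (s[91] & s[92]) ^ s[171]
        t2 ^= (s[175] & s[176]) ^ s[264]
        t3 ^= (s[286] & s[287]) ^ s[69]
        # Shift each register by one position and feed in the new bits.
        s[1:94] = [t3] + s[1:93]
        s[94:178] = [t1] + s[94:177]
        s[178:289] = [t2] + s[178:288]
        return z

    for _ in range(4 * 288):        # initialization rounds, output discarded
        step()
    return [step() for _ in range(n_bits)]


zero_key = [0] * 80
zero_iv = [0] * 80
ks = trivium_keystream(zero_key, zero_iv, 64)
```

In this update rule a freshly inserted bit is not read again for at least 64 steps, which is what allows a hardware implementation to produce up to 64 keystream bits per clock cycle by replicating the small combinational logic.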

Public-key encryption
[Figure: Alice encrypts $M$ with Bob's public key and sends $C = \mathrm{Enc}_{PK_B}(M)$; Bob holds the key pair $(PK_B, SK_B)$]
▶ Agreeing on a shared secret key $K$ over a public channel is difficult
▶ Use public-key cryptography:
  • Bob generates a public/secret key-pair $(PK_B, SK_B)$
  • Alice retrieves Bob’s public key $PK_B$
  • encryption only uses the public key: $C = \mathrm{Enc}_{PK_B}(M)$
  • but decryption requires the secret key: $M = \mathrm{Dec}_{SK_B}(C)$
▶ Security: computing $SK_B$ from $PK_B$ should be difficult
  ⇒ rely on (supposedly) “hard” number-theoretic problems
  • integer factorization (RSA)
  • discrete logarithm problem in finite fields (ElGamal, DSA, etc.)
  • discrete logarithm problem in elliptic curves (elliptic curve cryptography)
    → harder problem, and thus requires smaller keys (e.g., 256 vs. 3072 bits for a comparable security level)
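
The roles of the public and secret keys can be illustrated with textbook RSA on deliberately tiny numbers. The primes, exponent, and function names below are assumptions chosen for the example; real deployments use moduli of thousands of bits together with a padding scheme, neither of which is shown here.

```python
from math import gcd

# Bob's key generation (toy parameters, trivially factorable).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # secret exponent (needs Python 3.8+)

public_key = (n, e)            # PK_B, published by Bob
secret_key = (n, d)            # SK_B, kept by Bob


def encrypt(m: int, pk) -> int:
    modulus, exponent = pk
    return pow(m, exponent, modulus)


def decrypt(c: int, sk) -> int:
    modulus, exponent = sk
    return pow(c, exponent, modulus)


m = 65
c = encrypt(m, public_key)     # Alice only needs PK_B
```

Recovering `d` from `(n, e)` is easy here only because `n = 3233` can be factored by hand; the security of the scheme rests on factoring being infeasible at realistic sizes.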
A primer on elliptic curves
I Let us consider a field $K$ (e.g., $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{F}_p$, etc.)
I An elliptic curve $E$ defined over $K$ is given by an equation of the form
$E : y^2 = x^3 + Ax + B$,
with parameters $A, B \in K$
I The set of $K$-rational points of $E$ is defined as
$E(K) = \{(x, y) \in K \times K \mid (x, y) \text{ satisfy } E\} \cup \{\mathcal{O}\}$
$\mathcal{O}$ is called the “point at infinity”
I Additive group law: $E(K)$ is an abelian group
• addition via the “chord and tangent” method
• $\mathcal{O}$ is the neutral element
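The chord-and-tangent rule can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it works in affine coordinates over a prime field $\mathbb{F}_p$ with $p > 3$, represents the point at infinity by `None`, and assumes its inputs are reduced modulo p and lie on the curve. The function name `ec_add` and the test points are hypothetical choices, applied to the toy curve $y^2 = x^3 + x + 7$ over $\mathbb{F}_{17}$ used in these slides.

```python
# Affine point addition on E : y^2 = x^3 + A*x + B over F_p (p prime, p > 3).
# The point at infinity O is represented by None (a convention chosen here).
# Requires Python 3.8+ for pow(x, -1, p) (modular inverse).

def ec_add(P, Q, A, p):
    """Return P + Q using the chord-and-tangent rule."""
    if P is None:          # O + Q = Q
        return Q
    if Q is None:          # P + O = P
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None        # vertical line: P + (-P) = O (includes doubling a point with y = 0)
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p   # slope of the tangent at P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # slope of the chord through P and Q
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

A, p = 1, 17               # toy curve y^2 = x^3 + x + 7 over F_17 (B is not needed by the formulas)
P = (5, 1)
print(ec_add(P, P, A, p))          # (6, 12)
print(ec_add(P, (6, 12), A, p))    # (8, 0)
print(ec_add(P, (5, 16), A, p))    # None, i.e. O, since (5, 16) = -P
```

Note that the parameter B never appears in the addition formulas; it only determines which points belong to the curve.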
Elliptic curves and group law
[Figure: plot of the curve $E/\mathbb{R} : y^2 = x^3 - 3x + 1$ (x from −4 to 4, y from −6 to 6), with the point at infinity $\mathcal{O}$, two points $P$ and $Q$, and their sum $P + Q$]
[Figure: points of $E/\mathbb{F}_{17} : y^2 = x^3 + x + 7$ on the grid 0..16 × 0..16, with the point at infinity $\mathcal{O}$, two points $P$ and $Q$, and their sum $P + Q$]
Scalar multiplication and discrete logarithm
$E/K : y^2 = x^3 + Ax + B$
I If $K$ is a finite field $\mathbb{F}_q$, then $E(K)$ is a finite abelian group
• let $P \in E(\mathbb{F}_q)$, with $P \neq \mathcal{O}$
• consider $2P = P + P$, then $3P = P + P + P$, etc.
• since $E(\mathbb{F}_q)$ is finite, take the smallest $\ell > 0$ such that $\ell P = \mathcal{O}$
• define $G$ as $G = \{\mathcal{O}, P, 2P, 3P, \dots, (\ell - 1)P\}$
• $G$ is a cyclic subgroup of $E(\mathbb{F}_q)$, of order $\ell$, and $P$ is a generator of $G$
I The scalar multiplication in base $P$ gives an isomorphism between $\mathbb{Z}/\ell\mathbb{Z}$ and $G$:
$\exp_P : \mathbb{Z}/\ell\mathbb{Z} \to G, \quad k \mapsto kP = \underbrace{P + P + \dots + P}_{k \text{ times}}$
I The inverse map is the so-called discrete logarithm (in base $P$):
$\operatorname{dlog}_P = \exp_P^{-1} : G \to \mathbb{Z}/\ell\mathbb{Z}, \quad Q \mapsto k \text{ such that } Q = kP$
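As a rough illustration of these definitions, the sketch below enumerates the subgroup generated by a point and recovers a discrete logarithm by exhaustive search. It is a toy under stated assumptions: the curve $y^2 = x^3 + x + 7$ over $\mathbb{F}_{17}$ from the earlier slides, the point $P = (5, 1)$ (an arbitrary choice), `None` standing for the point at infinity, and hypothetical helper names `ec_add`, `subgroup` and `dlog`.

```python
# Toy enumeration of G = {O, P, 2P, ...} and brute-force discrete logarithm.
# O is represented by None; requires Python 3.8+ for pow(x, -1, p).

def ec_add(P, Q, A, p):
    """Affine chord-and-tangent addition on y^2 = x^3 + A*x + B over F_p."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def subgroup(P, A, p):
    """Return [O, P, 2P, ..., (l-1)P]; the length of the list is the order l of P."""
    G, T = [None], P
    while T is not None:
        G.append(T)
        T = ec_add(T, P, A, p)
    return G

def dlog(Q, P, A, p):
    """Exhaustive-search discrete logarithm: the k in Z/lZ with Q = kP."""
    return subgroup(P, A, p).index(Q)

A, p = 1, 17
G = subgroup((5, 1), A, p)
print(len(G), G)                     # 6 [None, (5, 1), (6, 12), (8, 0), (6, 5), (5, 16)]
print(dlog((6, 5), (5, 1), A, p))    # 4
```

The exhaustive search takes a number of steps proportional to $\ell$, which is only feasible here because the toy group has order 6.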
Towards elliptic curve cryptography
I Scalar multiplication can be computed in polynomial time: $(k, P) \mapsto kP$
I Under a few conditions, the discrete logarithm can only be computed in exponential time (as far as we know): $Q = kP \mapsto k$
I That’s a one-way function ⇒ public-key cryptography!
• secret key: an integer $k$ in $\mathbb{Z}/\ell\mathbb{Z}$
• public key: the point $kP$ in $G \subseteq E(\mathbb{F}_q)$
Example protocol: EC Diffie–Hellman key exchange
I Alice and Bob want to establish a secure communication channel
I How can they decide upon a shared secret key over a public channel?
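[Diagram: Alice and Bob both know the public base point $P$. Alice picks a secret $a$ and sends $aP$ to Bob; Bob picks a secret $b$ and sends $bP$ to Alice. Alice computes $a(bP)$ and Bob computes $b(aP)$, so both hold the shared point $abP$.]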
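The exchange can be sketched end to end on a toy curve. Everything below is an assumption made for illustration: the curve $y^2 = x^3 + x + 7$ over $\mathbb{F}_{17}$, the base point $P = (5, 1)$ of order 6, fixed secrets `a = 2` and `b = 5` (a real implementation would draw them at random from a group of roughly 256-bit order), and the helper names `ec_add` and `scalar_mult`. It offers no security whatsoever.

```python
# Toy EC Diffie-Hellman on y^2 = x^3 + x + 7 over F_17 (insecure, illustration only).
# O is represented by None; requires Python 3.8+ for pow(x, -1, p).

def ec_add(P, Q, A, p):
    """Affine chord-and-tangent addition on y^2 = x^3 + A*x + B over F_p."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mult(k, P, A, p):
    """Compute kP by scanning the bits of k from most to least significant."""
    T = None
    for bit in bin(k)[2:]:
        T = ec_add(T, T, A, p)        # double
        if bit == "1":
            T = ec_add(T, P, A, p)    # add
    return T

A, p = 1, 17
P = (5, 1)                        # public base point
a, b = 2, 5                       # Alice's and Bob's secrets (fixed here for reproducibility)

QA = scalar_mult(a, P, A, p)      # Alice sends QA = aP
QB = scalar_mult(b, P, A, p)      # Bob sends QB = bP
KA = scalar_mult(a, QB, A, p)     # Alice computes a(bP)
KB = scalar_mult(b, QA, A, p)     # Bob computes b(aP)
print(QA, QB, KA, KB)             # (6, 12) (5, 16) (6, 5) (6, 5)
```

Each party performs two scalar multiplications, and only $aP$ and $bP$ travel over the public channel.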
Central operation: the scalar multiplication
I Elliptic curve Diffie–Hellman (ECDH):
• Alice: $Q_A \leftarrow aP$ and $K \leftarrow aQ_B$ (2 scalar mults)
• Bob: $Q_B \leftarrow bP$ and $K \leftarrow bQ_A$ (2 scalar mults)
I Elliptic curve Digital Signature Algorithm (ECDSA):
• Alice (KeyGen): $Q_A \leftarrow aP$ (1 scalar mult)
• Alice (Sign): $R \leftarrow kP$ (1 scalar mult)
• Bob (Verify): $R' \leftarrow uP + vQ_A$ (2 scalar mults)
I etc.
I Other important operations might be required, such as pairings
I Several algorithmic and arithmetic layers:
• scalar multiplication
• elliptic curve arithmetic (point addition, point doubling, etc.)
• finite field arithmetic (addition, multiplication, inversion, etc.)
Available implementations
I There already exist several free-software, open-source implementations of ECC
(or of useful layers thereof):
• at the protocol level:
GnuPG, OpenSSL, GnuTLS, OpenSSH, cryptlib, etc.
• at the cryptographic primitive level:
RELIC, NaCl (Ed25519), crypto++, etc.
• at the curve arithmetic level: PARI, Sage (not for crypto!)
• at the field arithmetic level: MPFQ, GF2X, NTL, GMP, etc.
I Available open-source hardware implementations of ECC:
• implementation of NaCl’s crypto_box (Curve25519 + Salsa20 + Poly1305)
in 29.3 to 32.6 kGE [Hutter et al., 2015]
• PAVOIS project: ECC cryptoprocessor designed to evaluate algorithmic and
arithmetic protections against side-channel attacks [See A. Tisserand’s talk]
Outline
I Some encryption mechanisms
I Elliptic curve cryptography
I Scalar multiplication
I Elliptic curve arithmetic
I Finite field arithmetic
Scalar multiplication
I Given $k$ in $\mathbb{Z}/\ell\mathbb{Z}$ and $P$ in $G \subseteq E(\mathbb{F}_q)$, we want to compute
$kP = \underbrace{P + P + \dots + P}_{k \text{ times}}$
I Size of $\ell$ (and $k$) for crypto applications: from 250 to 500 bits
I Repeated addition, in $O(k)$ complexity, is out of the question!
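To put a number on this, here is a back-of-the-envelope estimate. The 256-bit scalar size and the rate of $10^{12}$ point additions per second are assumptions chosen for illustration only:

$$k \approx 2^{256} \approx 1.16 \times 10^{77} \text{ point additions}, \qquad \frac{1.16 \times 10^{77}}{10^{12} \text{ additions/s}} \approx 1.16 \times 10^{65} \text{ s} \approx 3.7 \times 10^{57} \text{ years}.$$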
Double-and-add algorithm
I Available operations on $E(\mathbb{F}_q)$:
• point addition: $(Q, R) \mapsto Q + R$
• point doubling: $Q \mapsto 2Q = Q + Q$
I Idea: iterative algorithm based on the binary expansion of $k$
• start from the most significant bit of $k$
• double current result at each step
• add $P$ if the corresponding bit of $k$ is 1
• same principle as binary exponentiation
Double-and-add algorithm
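I Denoting by $(k_{n-1} \dots k_1 k_0)_2$, with $n = \lceil \log_2 \ell \rceil$, the binary expansion of $k$:
function scalar-mult(k, P):
    T ← O
    for i ← n − 1 downto 0:
        T ← 2T
        if k_i = 1:
            T ← T + P
    return T
I Example: $k = 431 = (110101111)_2$
$T = (((((P \cdot 2 + P) \cdot 2^2 + P) \cdot 2^2 + P) \cdot 2 + P) \cdot 2 + P) \cdot 2 + P = 431P$
(successive values of $T$: $\mathcal{O}$, $P$, $2P$, $3P$, $6P$, $12P$, $13P$, $26P$, $52P$, $53P$, $106P$, $107P$, $214P$, $215P$, $430P$, $431P$)
I Complexity in $O(n) = O(\log_2 \ell)$ operations over $E(\mathbb{F}_q)$:
• $n - 1$ doublings, and
• $n/2$ additions on average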
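The pseudocode above translates almost line for line into Python. The sketch below is written against an abstract group given by `add`, `dbl` and a neutral element `zero` (all hypothetical parameter names), and it counts only the operations that do not involve the neutral element. To keep it short and checkable, the integers under addition stand in for the curve points, so that "kP" with P = 1 is simply k; this stand-in is an assumption of the example, not part of the algorithm.

```python
# Left-to-right double-and-add over an abstract additive group, with operation counts.

def scalar_mult(k, P, add, dbl, zero):
    """Return (kP, number of doublings, number of additions).

    Doubling the neutral element and adding P to it are treated as free,
    so the counts match the n - 1 doublings quoted for an n-bit scalar.
    """
    T, doublings, additions = zero, 0, 0
    for i in range(k.bit_length() - 1, -1, -1):   # most significant bit first
        if T != zero:
            T = dbl(T)
            doublings += 1
        if (k >> i) & 1:
            if T == zero:
                T = P                              # O + P = P costs nothing
            else:
                T = add(T, P)
                additions += 1
    return T, doublings, additions

# Stand-in group: integers under addition, with P = 1 and neutral element 0.
def int_add(x, y):
    return x + y

def int_dbl(x):
    return 2 * x

print(scalar_mult(431, 1, int_add, int_dbl, 0))   # (431, 8, 6)
```

For k = 431, a 9-bit scalar with seven bits set, this gives 8 doublings and 6 additions, in line with the trace of the example. Passing curve-point routines for `add` and `dbl` and the point at infinity for `zero` yields the elliptic curve version.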
Windowed method
I Consider 2w -ary expansion of k: i.e., split k into w -bit chunks
I Precompute 2P, 3P, . . . , (2w − 1)P:
• 2w −1 − 1 doublings, and
• 2w −1 − 1 additions
I Example with w = 3: k = 431 = (110 101 111)2 = (657)23
T = (6P · 23 + 5P) · 23 + 7P = 431P
I Complexity:
• n − w doublings, and
• (1 − 2−w )n/w additions on average
I Select w carefully so that precomputation cost does not become predominant
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
25 / 43
Windowed method
▶ Consider the $2^w$-ary expansion of $k$, i.e., split $k$ into $w$-bit chunks
▶ Precompute $2P, 3P, \ldots, (2^w - 1)P$:
  • $2^{w-1} - 1$ doublings, and
  • $2^{w-1} - 1$ additions
▶ Example with $w = 3$: $k = 431 = (110\,101\,111)_2 = (657)_{2^3}$
  $T = (6P \cdot 2^3 + 5P) \cdot 2^3 + 7P = 431P$
▶ Complexity:
  • $n - w$ doublings, and
  • $(1 - 2^{-w})\,n/w$ additions on average
▶ Select $w$ carefully so that the precomputation cost does not become predominant
▶ Sliding window variant: half as many precomputations
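The fixed-window method can be sketched as follows. This is only an illustration under a simplifying assumption: the group is replaced by the plain integers (with P = 1, the point jP is just the integer j), so that the result and the operation counts are easy to check. The function name `windowed_scalar_mult` and its parameters `add`, `dbl` and `zero` are hypothetical and do not come from the lecture.

```python
def windowed_scalar_mult(k, P, w, add, dbl, zero):
    """Fixed-window (2^w-ary) scalar multiplication, most significant chunk first."""
    # Precompute table[j] = j*P for j = 0 .. 2^w - 1:
    # even multiples by doubling, odd multiples by adding P.
    table = [zero, P]
    for j in range(2, 1 << w):
        table.append(dbl(table[j // 2]) if j % 2 == 0 else add(table[j - 1], P))

    # Split k into w-bit chunks, least significant chunk first.
    mask = (1 << w) - 1
    chunks = []
    while k > 0:
        chunks.append(k & mask)
        k >>= w
    if not chunks:
        return zero

    # Start from the top chunk, then do w doublings and at most one addition per chunk.
    T = table[chunks[-1]]
    for c in reversed(chunks[:-1]):
        for _ in range(w):
            T = dbl(T)
        if c != 0:
            T = add(T, table[c])
    return T


# Stand-in group (assumption): integers under addition, with operation counters.
counts = {"dbl": 0, "add": 0}

def int_add(a, b):
    counts["add"] += 1
    return a + b

def int_dbl(a):
    counts["dbl"] += 1
    return 2 * a

result = windowed_scalar_mult(431, 1, 3, int_add, int_dbl, 0)
print(result, counts)
```

For k = 431 and w = 3, the sketch uses 3 doublings and 3 additions for the precomputation, then 6 doublings and 2 additions in the main loop, which agrees with the n − w = 6 doublings given above for n = 9.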
Security issues
▶ Back to the double-and-add algorithm:
    function scalar-mult(k, P):
      T ← O
      for i ← n − 1 downto 0:
        T ← 2T
        if k_i = 1:
          T ← T + P
      return T
▶ At step $i$, the point addition $T \leftarrow T + P$ is computed if and only if $k_i = 1$
  • careful timing analysis will reveal the Hamming weight of the secret $k$
  • simple power analysis (SPA) will leak the bits of $k$
  [Figure: power consumption over time, annotated with the key bits 1 0 0 1 1 0 1 0 0 1 read off the trace]
▶ Use the double-and-add-always algorithm, i.e., add a dummy addition when $k_i = 0$?
        if k_i = 1:
          T ← T + P
        else:
          Z ← T + P
  • the result of the point addition is used if and only if $k_i = 1$
  ⇒ vulnerable to fault attacks [See A. Tisserand's lecture]
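To make the leak concrete, here is a small sketch that logs the sequence of group operations performed by double-and-add and then reads the key back from that sequence alone. It is a toy model, not a real power analysis: the operation log stands in for what an attacker would infer from a trace, the group is replaced by the integers (P = 1), and the names `double_and_add_trace` and `recover_bits` are made up for this example.

```python
def double_and_add_trace(k, n):
    """Double-and-add on the integers (P = 1), logging 'D' and 'A' operations."""
    T, trace = 0, []
    for i in range(n - 1, -1, -1):
        T = 2 * T
        trace.append("D")
        if (k >> i) & 1:          # the addition only happens when k_i = 1
            T = T + 1
            trace.append("A")
    return T, "".join(trace)


def recover_bits(trace):
    """Rebuild the key bits from the operation log: 'DA' means 1, a lone 'D' means 0."""
    bits, i = [], 0
    while i < len(trace):
        if trace[i:i + 2] == "DA":
            bits.append(1)
            i += 2
        else:
            bits.append(0)
            i += 1
    return bits


k = 0b1001101001                  # the bit pattern shown in the figure
T, trace = double_and_add_trace(k, 10)
print(trace)
print(recover_bits(trace))
```

The number of 'A' entries is the Hamming weight of k, which is the quantity that total running time alone would already reveal.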
The Montgomery ladder
▶ Algorithm proposed by Montgomery in 1987:
    function scalar-mult(k, P):
      T_0 ← O
      T_1 ← P
      for i ← n − 1 downto 0:
        if k_i = 1:
          T_0 ← T_0 + T_1
          T_1 ← 2T_1
        else:
          T_1 ← T_0 + T_1
          T_0 ← 2T_0
      return T_0
▶ Properties:
  • perform one addition and one doubling at each step
  • ensure that both results are used in the next step
  • loop invariant: $T_1 = T_0 + P$
▶ Example: $k = 19 = (10011)_2$
  successive states $(T_0, T_1)$: $(O, P)$, $(P, 2P)$, $(2P, 3P)$, $(4P, 5P)$, $(9P, 10P)$, $(19P, 20P)$
  $T_0 = P \cdot 2^2 + 5P + 10P = 19P$
  $T_1 = (P \cdot 2 + P + 2P) \cdot 2^2 = 20P$
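The ladder and its invariant can be checked with a short sketch. As an assumption made purely for readability, the group is again replaced by the integers (P = 1), so each state can be compared with the k = 19 example; the function name `montgomery_ladder` and the parameters `add`, `dbl` and `zero` are illustrative.

```python
def montgomery_ladder(k, n, P, add, dbl, zero):
    """Montgomery ladder over the n bits of k; returns k*P and the list of states."""
    T0, T1 = zero, P
    states = [(T0, T1)]
    for i in range(n - 1, -1, -1):
        # Both branches perform exactly one addition and one doubling.
        # The right-hand sides are evaluated with the old values of T0 and T1.
        if (k >> i) & 1:
            T0, T1 = add(T0, T1), dbl(T1)
        else:
            T0, T1 = dbl(T0), add(T0, T1)
        states.append((T0, T1))
    return T0, states


result, states = montgomery_ladder(19, 5, 1, lambda a, b: a + b, lambda a: 2 * a, 0)
print(result)
print(states)

# Loop invariant T1 = T0 + P, checked on the integer stand-in where P = 1.
print(all(t1 - t0 == 1 for t0, t1 in states))
```

The `if` on a key bit is kept here for readability. An implementation aiming at side-channel resistance would typically replace it with a branch-free conditional swap, which this sketch does not attempt.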
Outline
▶ Some encryption mechanisms
▶ Elliptic curve cryptography
▶ Scalar multiplication
▶ Elliptic curve arithmetic
▶ Finite field arithmetic
Addition and doubling
[Figure: chord-and-tangent group law on the curve $E$, with $O$ the point at infinity.
 Addition: the line $L_{P,Q}$ through $P$ and $Q$ meets $E$ at a third point $R'$; the line $L_{R',O}$ through $R'$ and $O$ meets $E$ again at $R = P + Q$.
 Doubling: the tangent $L_{P,P}$ at $P$ meets $E$ at $R'$; the line $L_{R',O}$ meets $E$ again at $R = 2P$.]
Addition and doubling formulae
$E/\mathbb{F}_q : y^2 = x^3 + Ax + B$
▶ Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q) \in E(\mathbb{F}_q) \setminus \{O\}$ (affine coordinates)
▶ The opposite of $P$ is $-P = (x_P, -y_P)$
▶ If $P \neq -Q$, then $P + Q = R = (x_R, y_R)$ with
  \[
    x_R = \lambda^2 - x_P - x_Q
    \quad\text{and}\quad
    y_R = \lambda (x_P - x_R) - y_P,
  \]
  where
  \[
    \lambda =
    \begin{cases}
      \dfrac{y_Q - y_P}{x_Q - x_P} & \text{if } P \neq Q \text{ (addition), or} \\[2ex]
      \dfrac{3x_P^2 + A}{2y_P}     & \text{if } P = Q \text{ (doubling)}
    \end{cases}
  \]
▶ Cost (number of multiplications, squarings, and inversions in $\mathbb{F}_q$):
  • addition: 2M + 1S + 1I
  • doubling: 2M + 2S + 1I
  ⇒ field inversion is expensive!
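The affine formulae above translate almost line by line into code. The sketch below assumes a toy curve, $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ with the point (3, 6), chosen only because it is small enough to check by hand; these parameters and the names `ec_add` and `is_on_curve` are not from the lecture. The field inversion is done with Python's built-in `pow(x, -1, p)` (Python 3.8 or later).

```python
p, A, B = 97, 2, 3      # toy parameters (assumption), far too small for real use
O = None                # the point at infinity


def is_on_curve(pt):
    """Check y^2 = x^3 + A*x + B over F_p (O is on the curve by convention)."""
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % p == 0


def ec_add(P1, P2):
    """Affine addition/doubling; each call costs one inversion in F_p."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                   # P2 = -P1
    if P1 == P2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % p, -1, p) % p   # doubling
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p          # addition
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


P = (3, 6)
P2 = ec_add(P, P)
P3 = ec_add(P2, P)
print(P2, P3, is_on_curve(P2), is_on_curve(P3))
```

Every call pays for one modular inversion, which is the cost the slide singles out and the motivation for inversion-free coordinate systems.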
Other coordinate systems
$E/\mathbb{F}_q : y^2 = x^3 + Ax + B$
▶ One can use other coordinate systems which provide more efficient formulae
▶ Projective coordinates: points $(X : Y : Z)$ with $(x, y) = (X/Z, Y/Z)$
  $E/\mathbb{F}_q : Y^2 Z = X^3 + AXZ^2 + BZ^3$
  • idea: get rid of the inversion over $\mathbb{F}_q$ by using $Z$ as the denominator
  • addition: 12M + 2S
  • doubling: 7M + 5S
▶ Jacobian coordinates: points $(X : Y : Z)$ with $(x, y) = (X/Z^2, Y/Z^3)$
  $E/\mathbb{F}_q : Y^2 = X^3 + AXZ^4 + BZ^6$
  • addition: 12M + 4S
  • doubling: 4M + 6S
▶ And many others: modified Jacobian coordinates, López–Dahab (over $\mathbb{F}_{2^n}$), etc.
▶ Explicit-Formulas Database (by Bernstein and Lange): http://hyperelliptic.org/EFD/
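As an illustration of how Jacobian coordinates avoid inversions, the sketch below doubles a point without any division and converts back to affine coordinates only at the end. It uses one standard set of Jacobian doubling formulas for short Weierstrass curves, not necessarily the exact formulas behind the operation counts above, and it assumes a toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$; the names `jacobian_double` and `to_affine` are illustrative.

```python
p, A, B = 97, 2, 3      # toy parameters (assumption), for illustration only


def jacobian_double(X, Y, Z):
    """Inversion-free doubling of (X : Y : Z), where x = X/Z^2 and y = Y/Z^3."""
    YY = Y * Y % p                           # S
    S = 4 * X * YY % p                       # M
    ZZ = Z * Z % p                           # S
    M = (3 * X * X + A * ZZ * ZZ) % p        # S + S + M
    X3 = (M * M - 2 * S) % p                 # S
    Y3 = (M * (S - X3) - 8 * YY * YY) % p    # M + S
    Z3 = 2 * Y * Z % p                       # M
    return X3, Y3, Z3


def to_affine(X, Y, Z):
    """Convert back to (x, y) with a single field inversion."""
    z_inv = pow(Z, -1, p)
    z_inv2 = z_inv * z_inv % p
    return X * z_inv2 % p, Y * z_inv2 * z_inv % p


# Double the affine point (3, 6), embedded as (3 : 6 : 1), twice in a row.
D1 = jacobian_double(3, 6, 1)
D2 = jacobian_double(*D1)
print(to_affine(*D1), to_affine(*D2))
```

Counting the operations marked in the comments, and treating multiplications by small constants as free, this doubling uses 4 multiplications and 6 squarings. The two successive doublings need only one inversion in total, at the final conversion.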
Outline
▶ Some encryption mechanisms
▶ Elliptic curve cryptography
▶ Scalar multiplication
▶ Elliptic curve arithmetic
▶ Finite field arithmetic
Implementing finite field arithmetic
▶ Group law over $E(\mathbb{F}_q)$ requires:
  • additions / subtractions over $\mathbb{F}_q$
  • multiplications / squarings over $\mathbb{F}_q$
  • a few inversions over $\mathbb{F}_q$
▶ Typical finite fields $\mathbb{F}_q$:
  • prime field $\mathbb{F}_p$, with $p$ an $n$-bit prime and $n$ between 250 and 500
  • binary field $\mathbb{F}_{2^n}$, with $n$ prime and between 250 and 500 (... still secure?)
▶ What we have at our disposal:
  • basic integer arithmetic (addition, multiplication)
  • left and right shifts
  • bitwise logic operations (bitwise NOT, AND, etc.)
▶ ... on $w$-bit words:
  • $w = 32$ or 64 on CPUs
  • $w = 8$ or 16 on microcontrollers
  • a bit more flexibility in hardware (but integer arithmetic with $w > 64$ bits is hard!)
  ⇒ elements of $\mathbb{F}_q$ represented using several words
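To give a feel for what "several words" means in practice, the following sketch splits a field element into $w$-bit words and counts how many are needed for different word sizes. The prime $2^{255} - 19$ is used only as a familiar example of a field size in the stated range; it and the helper names `to_words` and `from_words` are assumptions of this example.

```python
def to_words(a, n, w):
    """Split an n-bit integer into ceil(n / w) words of w bits, least significant first."""
    k = -(-n // w)                      # ceil(n / w) using integer arithmetic
    mask = (1 << w) - 1
    return [(a >> (i * w)) & mask for i in range(k)]


def from_words(words, w):
    """Recombine words (least significant first) into a single integer."""
    return sum(word << (i * w) for i, word in enumerate(words))


p = 2**255 - 19                         # example prime (assumption), n = 255 bits
a = p - 1                               # some element of F_p

for w in (8, 16, 32, 64):
    words = to_words(a, 255, w)
    print(w, len(words), from_words(words, w) == a)
```

On a 64-bit CPU such an element fits in 4 words, while an 8-bit microcontroller needs 32, so every field operation becomes a loop over words.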
Multiprecision representation
I Consider A ∈ FP , with P an n-bit prime
• represent A as an integer modulo P
• split A into k = dn/w e w -bit words (or limbs), ak−1 , ..., a1 , a0 :
A = ak−1 2(k−1)w + · · · + a1 2w + a0
I Addition of A and B ∈ FP :
• right-to-left word-wise addition
• need to propagate carry
+
c
a3
a2
a1
a0
b3
b2
b1
b0
r3
r2
r1
r0
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
34 / 43
Multiprecision representation
I Consider A ∈ FP , with P an n-bit prime
• represent A as an integer modulo P
• split A into k = dn/w e w -bit words (or limbs), ak−1 , ..., a1 , a0 :
A = ak−1 2(k−1)w + · · · + a1 2w + a0
I Addition of A and B ∈ FP :
• right-to-left word-wise addition
• need to propagate carry
+
c
a3
a2
a1
a0
b3
b2
b1
b0
r3
r2
r1
r0
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
34 / 43
Multiprecision representation
I Consider A ∈ FP , with P an n-bit prime
• represent A as an integer modulo P
• split A into k = dn/w e w -bit words (or limbs), ak−1 , ..., a1 , a0 :
A = ak−1 2(k−1)w + · · · + a1 2w + a0
I Addition of A and B ∈ FP :
• right-to-left word-wise addition
• need to propagate carry
• might need reduction modulo P: compare then subtract (in constant time!)
+
c
?
≥
a3
a2
a1
a0
b3
b2
b1
b0
r3
r2
r1
r0
P
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
34 / 43
Multiprecision representation
I Consider A ∈ FP , with P an n-bit prime
• represent A as an integer modulo P
• split A into k = dn/w e w -bit words (or limbs), ak−1 , ..., a1 , a0 :
A = ak−1 2(k−1)w + · · · + a1 2w + a0
I Addition of A and B ∈ FP :
• right-to-left word-wise addition
• need to propagate carry
• might need reduction modulo P: compare then subtract (in constant time!)
+
c
?
≥
a3
a2
a1
a0
b3
b2
b1
b0
r3
r2
r1
r0
p3
p2
p1
p0
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
34 / 43
Multiprecision representation
I Consider A ∈ FP , with P an n-bit prime
• represent A as an integer modulo P
• split A into k = dn/w e w -bit words (or limbs), ak−1 , ..., a1 , a0 :
A = ak−1 2(k−1)w + · · · + a1 2w + a0
I Addition of A and B ∈ FP :
• right-to-left word-wise addition
• need to propagate carry
• might need reduction modulo P: compare then subtract (in constant time!)
+
c
−
a3
a2
a1
a0
b3
b2
b1
b0
r3
r2
r1
r0
p3
p2
p1
p0
r30
r20
r10
r00
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
34 / 43
Multiprecision representation
▶ Consider $A \in \mathbb{F}_P$, with $P$ an $n$-bit prime
• represent $A$ as an integer modulo $P$
• split $A$ into $k = \lceil n/w \rceil$ $w$-bit words (or limbs), $a_{k-1}, \ldots, a_1, a_0$:
$A = a_{k-1} 2^{(k-1)w} + \cdots + a_1 2^w + a_0$
▶ Addition of $A$ and $B \in \mathbb{F}_P$:
• right-to-left word-wise addition
• need to propagate carry
• might need reduction modulo $P$: compare then subtract (in constant time!)
• lazy reduction: if $kw > n$, do not reduce after each addition
[Figure: 4-word addition of $(a_3, a_2, a_1, a_0)$ and $(b_3, b_2, b_1, b_0)$ with carry $c$ propagated right to left, giving $(r_3, r_2, r_1, r_0)$; subtracting $(p_3, p_2, p_1, p_0)$ then gives $(r'_3, r'_2, r'_1, r'_0)$]
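The addition steps above can be sketched in software as follows. This is only an illustration under stated assumptions: the word size `W = 64` and all function names are hypothetical choices, not taken from the slides. The final step always computes the subtraction and then selects the result with a mask, which mimics the "compare then subtract in constant time" pattern; Python integers themselves are not constant-time, so only the structure carries over to hardware.

```python
W = 64                      # assumed word size w, in bits
MASK = (1 << W) - 1

def to_limbs(x, k):
    """Split x into k w-bit limbs, least significant first."""
    return [(x >> (W * i)) & MASK for i in range(k)]

def from_limbs(limbs):
    """Recombine limbs (least significant first) into an integer."""
    return sum(l << (W * i) for i, l in enumerate(limbs))

def add_limbs(a, b):
    """Right-to-left word-wise addition; returns (sum limbs, carry out)."""
    r, c = [], 0
    for ai, bi in zip(a, b):
        t = ai + bi + c
        r.append(t & MASK)
        c = t >> W
    return r, c

def sub_limbs(a, b):
    """Right-to-left word-wise subtraction; returns (difference limbs, borrow out)."""
    r, borrow = [], 0
    for ai, bi in zip(a, b):
        t = ai - bi - borrow
        r.append(t & MASK)
        borrow = 1 if t < 0 else 0
    return r, borrow

def add_mod(a, b, p):
    """(a + b) mod p for a, b < p, all given as k limbs."""
    s, carry = add_limbs(a, b)
    d, borrow = sub_limbs(s, p)          # always compute s - p
    # The true sum s + carry * 2^(kw) is >= p iff carry == 1 or borrow == 0.
    keep_d = carry | (1 - borrow)
    mask = -keep_d & MASK                # all ones if we keep d, else zero
    return [(di & mask) | (si & ~mask & MASK) for si, di in zip(s, d)]
```

For example, with $P = 2^{255} - 19$ on $k = 4$ limbs, `from_limbs(add_mod(to_limbs(P - 1, 4), to_limbs(1, 4), to_limbs(P, 4)))` evaluates to `0`.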
MP multiplication
▶ Multiplication of $A$ and $B \in \mathbb{F}_P$:
• schoolbook method: $k^2$ $w$-by-$w$-bit products
• subquadratic algorithms (e.g., Karatsuba) when $k$ is large
• final product fits into $2k$ words → requires reduction modulo $P$ (see later)
• should run in constant time (for fixed $P$)!
[Figure: schoolbook multiplication of $(a_3, a_2, a_1, a_0)$ by $(b_3, b_2, b_1, b_0)$; the 16 partial products $a_i b_j$ are accumulated column by column into the 8-word result $(r_7, \ldots, r_0)$]
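A minimal sketch of the schoolbook method, assuming 64-bit words and little-endian limb lists (both hypothetical choices made for illustration). The loop bounds depend only on $k$, so the number of word products is always $k^2$, whatever the operand values.

```python
W = 64                      # assumed word size w, in bits
MASK = (1 << W) - 1

def to_limbs(x, k):
    """Split x into k w-bit limbs, least significant first."""
    return [(x >> (W * i)) & MASK for i in range(k)]

def from_limbs(limbs):
    """Recombine limbs (least significant first) into an integer."""
    return sum(l << (W * i) for i, l in enumerate(limbs))

def mul_schoolbook(a, b):
    """Product of two k-limb operands as 2k limbs (least significant first)."""
    k = len(a)
    r = [0] * (2 * k)
    for j in range(k):                 # one row of partial products per limb of b
        carry = 0
        for i in range(k):             # k w-by-w-bit products per row
            t = r[i + j] + a[i] * b[j] + carry   # always fits in 2w bits
            r[i + j] = t & MASK
            carry = t >> W
        r[j + k] = carry               # top limb of this row (not written before)
    return r
```

The 2k-limb result still has to be reduced modulo $P$ before it is a valid element of $\mathbb{F}_P$.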
MP modular reduction
▶ Given an integer $A < P^2$ (on $2k$ words), compute $R = A \bmod P$
▶ Easy case: $P$ is a pseudo-Mersenne prime $P = 2^n - c$ with $c$ “small” (e.g., $c < 2^w$)
• then $2^n \equiv c \pmod{P}$
• split $A$ wrt. $2^n$: $A = A_H 2^n + A_L$
• compute $A' \leftarrow c \cdot A_H + A_L$ (one $1 \times k$-word multiplication)
• rinse & repeat (one $1 \times 1$-word multiplication)
• final subtraction might be necessary
▶ Examples: $P = 2^{255} - 19$ (Curve25519) or $P = 2^{448} - 2^{224} - 1$ (Ed448-Goldilocks)
[Figure: $A_L + c \cdot A_H$ gives $A' = A'_H 2^n + A'_L$; $A'_L + c \cdot A'_H$ gives $A''$; subtracting $P$ from $A''$ if needed yields $A \bmod P$]
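The folding procedure can be sketched for the Curve25519 prime as follows. The function name is hypothetical, and Python integers stand in for the word arrays a hardware design would use. After the first fold $A' < 20 \cdot 2^{255}$, so $A'_H < 20$; after the second fold $A'' < 2^{255} + 361 < 2P$, which is why a single conditional subtraction is enough here.

```python
N = 255
C = 19
P = (1 << N) - C            # 2^255 - 19, the Curve25519 prime from the slide

def reduce_pseudo_mersenne(a):
    """Return a mod P for 0 <= a < P^2, using 2^N = C (mod P)."""
    low_mask = (1 << N) - 1
    # First fold: A' = c * A_H + A_L, where A_H has at most N bits.
    a = C * (a >> N) + (a & low_mask)
    # Second fold ("rinse & repeat"): A'_H is now only a few bits wide.
    a = C * (a >> N) + (a & low_mask)
    # Final subtraction: a < 2P at this point. Hardware would compute d
    # unconditionally and select between a and d without branching.
    d = a - P
    return d if d >= 0 else a
```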
MP modular reduction: general case
▶ Idea: find quotient $Q = \lfloor A/P \rfloor$, then take remainder as $A - QP$
• Euclidean division is way too expensive!
• since $P$ is fixed, precompute $1/P$ with enough precision
▶ Barrett reduction:
• precompute $P' = \lfloor 2^{2kw}/P \rfloor$ ($k + 1$ words)
• given $A < P^2$, get the $k + 1$ most significant words $A_H \leftarrow \lfloor A/2^{(k-1)w} \rfloor$
• compute $\tilde{Q} \leftarrow \lfloor A_H \cdot P' / 2^{(k+1)w} \rfloor$ (one $(k + 1) \times (k + 1)$-word multiplication)
• compute $\tilde{A} \leftarrow (\tilde{Q} \cdot P) \bmod 2^{(k+1)w}$ (one $k \times k$-word short multiplication)
• compute remainder $R \leftarrow (A - \tilde{A}) \bmod 2^{(k+1)w}$
• since $Q - 2 \le \tilde{Q} \le Q$, at most two final subtractions
[Figure: Barrett reduction for $k = 4$: $A_H = (a_7, \ldots, a_3)$ times $P' = (p'_4, \ldots, p'_0)$, whose upper words give $\tilde{Q}$; $\tilde{Q}$ times $(p_3, \ldots, p_0)$ gives $\tilde{A}$; subtracting the low words of $\tilde{A}$ from those of $A$ gives $R = (r_4, \ldots, r_0)$]
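A minimal sketch of the Barrett steps listed above, assuming 64-bit words; the function names are hypothetical and Python integers replace the word arrays. The shifts and the reductions modulo $2^{(k+1)w}$ correspond to picking word ranges, which costs nothing in hardware.

```python
W = 64                      # assumed word size w, in bits

def barrett_setup(p):
    """Return (k, P') with k the word length of p and P' = floor(2^(2kw) / p)."""
    k = (p.bit_length() + W - 1) // W
    return k, (1 << (2 * k * W)) // p

def barrett_reduce(a, p, k, p_prime):
    """Return a mod p for 0 <= a < p^2."""
    m = 1 << ((k + 1) * W)                    # 2^((k+1)w)
    a_h = a >> ((k - 1) * W)                  # k+1 most significant words of A
    q = (a_h * p_prime) >> ((k + 1) * W)      # quotient estimate, Q-2 <= q <= Q
    a_t = (q * p) % m                         # short product: low k+1 words only
    r = (a - a_t) % m                         # low k+1 words of A - q*P
    # r is (A mod P), (A mod P) + P or (A mod P) + 2P, all below 2^((k+1)w).
    for _ in range(2):                        # always two compare-and-subtract steps
        r -= p if r >= p else 0
    return r
```

Working modulo $2^{(k+1)w}$ is safe because the true value $A - \tilde{Q} P$ is below $3P$, which fits in $k + 1$ words.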
MP modular reduction: general case
▶ Montgomery reduction (REDC): like Barrett, but on the least significant words
• requires $P$ odd (on $k$ words) and $A < 2^{kw} P$
• precompute $P' \leftarrow (-P^{-1}) \bmod 2^{kw}$ (on $k$ words)
• given $A$, compute $K \leftarrow (A \cdot P') \bmod 2^{kw}$ (one $k \times k$-word short multiplication)
• compute $\tilde{A} \leftarrow K \cdot P$ (one $k \times k$-word multiplication)
• compute remainder $R \leftarrow (A + \tilde{A})/2^{kw}$
• at most one extra subtraction
▶ $\mathrm{REDC}(A)$ returns $R = (A \cdot 2^{-kw}) \bmod P$, not $A \bmod P$!
• represent $X \in \mathbb{F}_P$ in Montgomery representation: $\hat{X} = (X \cdot 2^{kw}) \bmod P$
• if $Z = (X \cdot Y) \bmod P$, then
$\mathrm{REDC}(\hat{X} \cdot \hat{Y}) = (X \cdot Y \cdot 2^{kw}) \bmod P = \hat{Z}$
→ that’s the so-called Montgomery multiplication
• conversions:
$\hat{X} = \mathrm{REDC}(X \cdot (2^{2kw} \bmod P))$ and $X = \mathrm{REDC}(\hat{X} \cdot 1)$
• Montgomery representation is compatible with addition / subtraction in $\mathbb{F}_P$
⇒ do all computations in Montgomery repr. instead of converting back and forth
▶ REDC can be computed iteratively (one word at a time) and interleaved with the computation of $\hat{X} \cdot \hat{Y}$
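The REDC steps and the conversions can be sketched as follows, assuming 64-bit words. All function names are hypothetical, and the precomputation of $P'$ by Hensel lifting is one possible choice for this sketch, not something the slides prescribe. This is the non-interleaved form that reduces a full $2k$-word product at once.

```python
W = 64                      # assumed word size w, in bits

def mont_setup(p):
    """Return (nwords, R, P') with R = 2^(kw) and P' = (-p^-1) mod R; p must be odd."""
    nwords = (p.bit_length() + W - 1) // W
    r = 1 << (nwords * W)
    inv = 1                                   # p * inv = 1 (mod 2) because p is odd
    bits = 1
    while bits < nwords * W:                  # each step doubles the number of valid bits
        inv = (inv * (2 - p * inv)) % r
        bits *= 2
    return nwords, r, (-inv) % r

def redc(a, p, nwords, p_prime):
    """Return a * 2^(-kw) mod p for 0 <= a < 2^(kw) * p."""
    r_mask = (1 << (nwords * W)) - 1
    K = ((a & r_mask) * p_prime) & r_mask     # K = (A * P') mod 2^(kw), short product
    t = (a + K * p) >> (nwords * W)           # low k words of A + K*P are all zero
    return t - p if t >= p else t             # t < 2P: at most one extra subtraction

def mont_mul(x_hat, y_hat, p, nwords, p_prime):
    """Montgomery multiplication: REDC of the product of two Montgomery forms."""
    return redc(x_hat * y_hat, p, nwords, p_prime)

def to_mont(x, p, nwords, r, p_prime):
    """X -> X * 2^(kw) mod p, via REDC(X * (2^(2kw) mod p))."""
    return redc(x * ((r * r) % p), p, nwords, p_prime)

def from_mont(x_hat, p, nwords, p_prime):
    """X_hat -> X, via REDC(X_hat * 1)."""
    return redc(x_hat, p, nwords, p_prime)
```

A typical use converts the inputs once with `to_mont`, chains `mont_mul` and ordinary modular additions, and calls `from_mont` only on the final result.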
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
A2
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
A2
S
A4
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
2
A
S2
A8
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
2
A
S2
A9
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
2
A
S2
A9
A11
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
39 / 43
MP field inversion
I Given A ∈ F∗P , compute A−1 mod P
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: U and V such that UA + VP = gcd(A, P) = 1
• then UA ≡ 1 (mod P) and A−1 ≡ U (mod P)
• fast, but running time depends on A
⇒ requires randomization of A to protect against timing attacks
I Fermat’s little theorem:
• we know that AP−1 ≡ 1 (mod P), whence AP−2 ≡ A−1 (mod P)
• precompute short sequence of squarings and multiplications for fast
exponentiation of A
• example: P = 2255 − 19 in 11M and 254S [Bernstein, 2006]
A
S
2
A
S2
A9
A11
S
Jérémie Detrey — Hardware Implementation of (Elliptic Curve) Cryptography
5
A2 −1
39 / 43
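MP field inversion
I Given $A \in \mathbb{F}_P^*$, compute $A^{-1} \bmod P$
I Extended Euclidean algorithm:
• compute Bézout’s coefficients: $U$ and $V$ such that $UA + VP = \gcd(A, P) = 1$
• then $UA \equiv 1 \pmod{P}$ and $A^{-1} \equiv U \pmod{P}$
• fast, but running time depends on $A$
⇒ requires randomization of $A$ to protect against timing attacks
I Fermat’s little theorem:
• we know that $A^{P-1} \equiv 1 \pmod{P}$, whence $A^{P-2} \equiv A^{-1} \pmod{P}$
• precompute short sequence of squarings and multiplications for fast exponentiation of $A$
• example: $P = 2^{255} - 19$ in 11M and 254S [Bernstein, 2006], where $S^n$ denotes $n$ successive squarings and each $\times$ costs one multiplication:
$A \xrightarrow{S} A^2 \xrightarrow{S^2} A^8 \xrightarrow{\times A} A^9 \xrightarrow{\times A^2} A^{11} \xrightarrow{S} A^{22} \xrightarrow{\times A^9} A^{2^5-1}$
$A^{2^5-1} \xrightarrow{S^5,\ \times A^{2^5-1}} A^{2^{10}-1} \xrightarrow{S^{10},\ \times A^{2^{10}-1}} A^{2^{20}-1} \xrightarrow{S^{20},\ \times A^{2^{20}-1}} A^{2^{40}-1}$
$A^{2^{40}-1} \xrightarrow{S^{10},\ \times A^{2^{10}-1}} A^{2^{50}-1} \xrightarrow{S^{50},\ \times A^{2^{50}-1}} A^{2^{100}-1} \xrightarrow{S^{100},\ \times A^{2^{100}-1}} A^{2^{200}-1}$
$A^{2^{200}-1} \xrightarrow{S^{50},\ \times A^{2^{50}-1}} A^{2^{250}-1} \xrightarrow{S^5,\ \times A^{11}} A^{2^{255}-21} = A^{P-2}$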
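Both inversion strategies can be compared in a short Python sketch. The names `inv_euclid`, `inv_fermat_25519`, `mul`, `sqr_n` and the `ops` counter are hypothetical helpers introduced for this example; the chain follows the exponents listed above for $P = 2^{255} - 19$, and the counter is only there to check the 11M + 254S cost.

```python
P = 2**255 - 19
ops = {"M": 0, "S": 0}                 # operation counter (illustration only)

def inv_euclid(A, P):
    """Extended Euclid: the number of iterations depends on A (not constant time)."""
    r0, r1 = P, A % P
    u0, u1 = 0, 1                      # invariant: u_i * A = r_i (mod P)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    assert r0 == 1                     # gcd(A, P) = 1
    return u0 % P

def mul(x, y):
    ops["M"] += 1
    return x * y % P

def sqr_n(x, n):
    """n successive squarings (S^n in the slide's notation)."""
    for _ in range(n):
        ops["S"] += 1
        x = x * x % P
    return x

def inv_fermat_25519(A):
    """A^(P-2) mod P with a fixed sequence of operations, independent of A."""
    a2 = sqr_n(A, 1)                   # A^2
    a9 = mul(sqr_n(a2, 2), A)          # A^8 * A
    a11 = mul(a9, a2)
    t5 = mul(sqr_n(a11, 1), a9)        # A^22 * A^9 = A^(2^5 - 1)
    t10 = mul(sqr_n(t5, 5), t5)        # A^(2^10 - 1)
    t20 = mul(sqr_n(t10, 10), t10)
    t40 = mul(sqr_n(t20, 20), t20)
    t50 = mul(sqr_n(t40, 10), t10)
    t100 = mul(sqr_n(t50, 50), t50)
    t200 = mul(sqr_n(t100, 100), t100)
    t250 = mul(sqr_n(t200, 50), t50)
    return mul(sqr_n(t250, 5), a11)    # A^(2^255 - 21) = A^(P - 2)

A = 0x1234567890ABCDEF1234567890ABCDEF
inv = inv_fermat_25519(A)
assert inv * A % P == 1
assert inv == inv_euclid(A, P)
assert ops == {"M": 11, "S": 254}
```

The Fermat variant always executes the same operations, which is what makes it attractive against timing attacks; the Euclidean loop length varies with the input.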
The Residue Number System (RNS)
I Let $B = (m_1, \ldots, m_k)$ be a tuple of $k$ pairwise coprime integers
• typically, the $m_i$’s are chosen to fit in a machine word ($w$ bits)
• pseudo-Mersenne primes allow for easy reduction modulo $m_i$: $m_i = 2^w - c_i$, with small $c_i$
• write $M = \prod_{i=1}^{k} m_i$ and, for all $i$, $M_i = M / m_i$
I Let $A < M$ be an integer
• represent $A$ as the tuple $\vec{A} = (a_1, \ldots, a_k)$ with $a_i = A \bmod m_i = |A|_{m_i}$, for all $i$
→ that is the RNS representation of $A$ in base $B$
• given $\vec{A} = (a_1, \ldots, a_k)$, retrieve the unique corresponding integer $A \in \mathbb{Z}/M\mathbb{Z}$ using the Chinese remainder theorem (CRT):
$A = \left| \sum_{i=1}^{k} \left| a_i \cdot M_i^{-1} \right|_{m_i} \cdot M_i \right|_M$
I If $M > P$, we can represent elements of $\mathbb{F}_P$ in RNS
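The conversion to RNS and the CRT reconstruction can be sketched as follows. The helper names `to_rns` and `from_rns` and the toy base are assumptions made for this example: the moduli have the form $2^8 - c_i$ and are pairwise coprime, but not all of them are prime.

```python
from math import prod

def to_rns(A, base):
    """RNS representation of A: one residue per modulus."""
    return [A % m for m in base]

def from_rns(residues, base):
    """CRT reconstruction: A = | sum_i |a_i * M_i^(-1)|_{m_i} * M_i |_M."""
    M = prod(base)
    total = 0
    for a_i, m_i in zip(residues, base):
        M_i = M // m_i
        total += (a_i * pow(M_i, -1, m_i) % m_i) * M_i
    return total % M

# Toy base with w = 8: m_i = 2^8 - c_i for c_i in (1, 3, 5, 9).
base = [255, 253, 251, 247]
A = 123456789                     # must satisfy A < M = prod(base)
vec_A = to_rns(A, base)
assert from_rns(vec_A, base) == A
```

`pow(M_i, -1, m_i)` computes the modular inverse and requires Python 3.8 or later.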
RNS arithmetic
I Let $\vec{A} = (a_1, \ldots, a_k)$ and $\vec{B} = (b_1, \ldots, b_k)$
• add., sub. and mult. can be performed in parallel on all “channels”:
$\vec{A} \pm \vec{B} = (|a_1 \pm b_1|_{m_1}, \ldots, |a_k \pm b_k|_{m_k})$
$\vec{A} \times \vec{B} = (|a_1 \times b_1|_{m_1}, \ldots, |a_k \times b_k|_{m_k})$
• native parallelism: suited to SIMD instructions and hardware implementation
[Figure: independent channels operating in parallel; channel $i$ computes $r_i = |a_i \times b_i|_{m_i}$ (drawn for $k = 8$)]
I Limitations:
• operations are computed in $\mathbb{Z}/M\mathbb{Z}$: beware of overflows! (we need $M > P^2$)
• RNS modular reduction has quadratic complexity $O(k^2)$
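The channel-wise operations and the overflow caveat can be illustrated with a small Python sketch. The helper names and the toy 8-bit base are assumptions for this example; the second product deliberately exceeds $M$ to show the wrap-around modulo $M$.

```python
from math import prod

def to_rns(A, base):
    return [A % m for m in base]

def from_rns(residues, base):
    M = prod(base)
    return sum((a * pow(M // m, -1, m) % m) * (M // m)
               for a, m in zip(residues, base)) % M

def rns_add(a, b, base):
    """Channel-wise addition: each channel is independent of the others."""
    return [(x + y) % m for x, y, m in zip(a, b, base)]

def rns_mul(a, b, base):
    """Channel-wise multiplication, one word-sized product per channel."""
    return [(x * y) % m for x, y, m in zip(a, b, base)]

base = [255, 253, 251, 247]       # toy base, pairwise coprime
M = prod(base)

# Product below M: the RNS result is the true integer product.
X, Y = 50000, 60000
ok = from_rns(rns_mul(to_rns(X, base), to_rns(Y, base), base), base)
assert X * Y < M and ok == X * Y

# Product above M: the result silently wraps around modulo M.
X2, Y2 = 70000, 60000
wrapped = from_rns(rns_mul(to_rns(X2, base), to_rns(Y2, base), base), base)
assert X2 * Y2 > M and wrapped == (X2 * Y2) % M and wrapped != X2 * Y2
```

This is why the slide asks for $M > P^2$: the product of two elements of $\mathbb{F}_P$ must fit before any reduction modulo $P$ takes place.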
RNS Montgomery reduction
I Requires two RNS bases $B_\alpha = (m_{\alpha,1}, \ldots, m_{\alpha,k})$ and $B_\beta = (m_{\beta,1}, \ldots, m_{\beta,k})$ such that $M_\alpha > P$, $M_\beta > P$, and $\gcd(M_\alpha, M_\beta) = 1$
I RNS base extension algorithm (BE) [Kawamura et al., 2000]
• given $\vec{X}_\alpha$ in base $B_\alpha$, $\mathrm{BE}(\vec{X}_\alpha, B_\alpha, B_\beta)$ computes $\vec{X}_\beta$, the repr. of $X$ in base $B_\beta$
• similarly, $\mathrm{BE}(\vec{X}_\beta, B_\beta, B_\alpha)$ computes $\vec{X}_\alpha$ in base $B_\alpha$
• similar to RNS modular reduction → $O(k^2)$ complexity
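A base extension can be sketched in Python as below. This is not the algorithm of [Kawamura et al., 2000], which estimates the CRT correction term with a cheap approximation; as a simplifying assumption, the sketch computes that term exactly with big integers. The name `base_extend` and the two toy bases are hypothetical. The nested sum (k source channels for each of the k target moduli) is where the $O(k^2)$ cost comes from.

```python
from math import prod

def base_extend(x_src, src, dst):
    """Residues of X in base `dst`, given its residues in base `src` (X < prod(src))."""
    M = prod(src)
    M_i = [M // m for m in src]
    # xi_i = |x_i * M_i^(-1)|_{m_i}, computed independently on each source channel
    xi = [x * pow(Mi, -1, m) % m for x, Mi, m in zip(x_src, M_i, src)]
    # CRT: X = sum_i xi_i * M_i - q * M, with 0 <= q < k.
    # Assumption: q is computed exactly here; Kawamura et al. approximate it.
    q = sum(s * Mi for s, Mi in zip(xi, M_i)) // M
    return [(sum(s * (Mi % mj) for s, Mi in zip(xi, M_i)) - q * (M % mj)) % mj
            for mj in dst]

# Two toy bases with gcd(M_alpha, M_beta) = 1.
base_a = [255, 253, 251, 247]
base_b = [256, 241, 239, 233]
X = 1234567
X_a = [X % m for m in base_a]
X_b = base_extend(X_a, base_a, base_b)
assert X_b == [X % m for m in base_b]
assert base_extend(X_b, base_b, base_a) == X_a
```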
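RNS Montgomery reduction
I Data flow (the original figure draws it channel by channel for $k = 4$), with $p'_{\alpha,i}$ the channels of $\overrightarrow{(-P^{-1})}_\alpha$ and $m'_{\beta,i}$ the channels of $\overrightarrow{(M_\alpha^{-1})}_\beta$:
• in base $B_\alpha$: $\vec{K}_\alpha = \vec{A}_\alpha \times \overrightarrow{(-P^{-1})}_\alpha$, i.e., $k_{\alpha,i} = |a_{\alpha,i} \times p'_{\alpha,i}|_{m_{\alpha,i}}$
• base extension: $\vec{K}_\beta = \mathrm{BE}(\vec{K}_\alpha, B_\alpha, B_\beta)$
• in base $B_\beta$: $\vec{T}_\beta = \vec{A}_\beta + \vec{K}_\beta \times \vec{P}_\beta$, i.e., $t_{\beta,i} = |a_{\beta,i} + k_{\beta,i} \times p_{\beta,i}|_{m_{\beta,i}}$ (in base $B_\alpha$, $T \equiv 0 \pmod{M_\alpha}$, so every channel of $\vec{T}_\alpha$ is zero)
• in base $B_\beta$: $\vec{R}_\beta = \vec{T}_\beta \times \overrightarrow{(M_\alpha^{-1})}_\beta$, i.e., $r_{\beta,i} = |t_{\beta,i} \times m'_{\beta,i}|_{m_{\beta,i}}$
• base extension: $\vec{R}_\alpha = \mathrm{BE}(\vec{R}_\beta, B_\beta, B_\alpha)$
I Result is $(\vec{R}_\alpha, \vec{R}_\beta) \equiv (A \cdot M_\alpha^{-1}) \pmod{P}$
I See also the hybrid position–residues number system [Bigou & Tisserand, 2016]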
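The whole reduction can be traced in a short Python sketch. Several things here are assumptions made for the example: the function names, the toy bases, the modulus P = 10^9 + 7, and an exact big-integer base extension standing in for the approximate one of [Kawamura et al., 2000]. With an exact extension the result lies in $[0, 2P)$, so the sketch picks $M_\beta > 2P$.

```python
from math import prod

def base_extend(x_src, src, dst):
    """Exact base extension (assumption: big-integer CRT correction term)."""
    M = prod(src)
    M_i = [M // m for m in src]
    xi = [x * pow(Mi, -1, m) % m for x, Mi, m in zip(x_src, M_i, src)]
    q = sum(s * Mi for s, Mi in zip(xi, M_i)) // M
    return [(sum(s * (Mi % mj) for s, Mi in zip(xi, M_i)) - q * (M % mj)) % mj
            for mj in dst]

def rns_montgomery_reduce(A_a, A_b, P, base_a, base_b):
    """Return (R_a, R_b) with R = A * M_alpha^(-1) (mod P), for A < P * M_alpha."""
    M_a = prod(base_a)
    # Constants that would be precomputed once per (P, B_alpha, B_beta).
    neg_P_inv_a = [(-pow(P, -1, m)) % m for m in base_a]
    P_b = [P % m for m in base_b]
    M_a_inv_b = [pow(M_a, -1, m) for m in base_b]

    K_a = [a * c % m for a, c, m in zip(A_a, neg_P_inv_a, base_a)]
    K_b = base_extend(K_a, base_a, base_b)
    T_b = [(a + k * p) % m for a, k, p, m in zip(A_b, K_b, P_b, base_b)]
    R_b = [t * c % m for t, c, m in zip(T_b, M_a_inv_b, base_b)]
    R_a = base_extend(R_b, base_b, base_a)
    return R_a, R_b

base_a = [255, 253, 251, 247]
base_b = [256, 241, 239, 233]
P = 10**9 + 7                        # toy prime with 2P < M_beta
X, Y = 123456789, 987654321          # both below P
A = X * Y                            # A < P^2 < P * M_alpha
R_a, R_b = rns_montgomery_reduce([A % m for m in base_a],
                                 [A % m for m in base_b], P, base_a, base_b)

# Check against plain integer arithmetic (R is recovered by brute force over [0, 2P)).
M_a = prod(base_a)
expected = A * pow(M_a, -1, P) % P
candidates = [r for r in (expected, expected + P)
              if [r % m for m in base_b] == R_b]
assert len(candidates) == 1
assert [candidates[0] % m for m in base_a] == R_a
```

Every step except the two base extensions is a single channel-wise operation, which is why the extensions dominate the cost.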
A bit of shameless advertising...
Journées Codage & Cryptographie 2017
April 23–28 in La Bresse (Vosges)
Abstract submission: until March 8
Registration: until April 3
https://jc2-2017.inria.fr/
See you very soon in the Vosges!