Design of block ciphers

Joan Daemen
STMicroelectronics and Radboud University
University of Zagreb
Zagreb, Croatia, March 23, 2016
Outline
1 Data Encryption Standard
2 Wide Trail Strategy
3 Rijndael
4 Lessons learnt from 17 years of public scrutiny
5 Conclusions
What modern block ciphers look like
Iterated block cipher:
Data path: transforms P to C
iteration of a round function that
…shall be non-linear
…depends on a round key
Key schedule
generates round keys from cipher key K
Substitution-permutation network
Round function with two (or three) layers
substitution layer: lookup tables applied in parallel to blocks
S-boxes
non-linear: S(x ⊕ y) ≠ S(x) ⊕ S(y)
permutation layer: moves bits to different S-box positions
either key-dependent S-boxes or third layer of key addition
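The three layers can be sketched on a toy 16-bit state. This is only an illustration: the 4-bit S-box, the bit permutation PERM and the names spn_round, substitute and permute are assumptions, not taken from the slides.

```python
# Toy 16-bit SPN round: key addition, four 4-bit S-boxes, bit permutation.
# SBOX and PERM are illustrative choices (assumptions), not from the slides.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
# Bit i of the S-box layer output moves to bit position PERM[i].
PERM = [(4 * i) % 15 if i < 15 else 15 for i in range(16)]

def substitute(state):
    """Apply the same S-box in parallel to the four nibbles of the state."""
    out = 0
    for nibble in range(4):
        out |= SBOX[(state >> (4 * nibble)) & 0xF] << (4 * nibble)
    return out

def permute(state):
    """Move each bit to a different S-box position."""
    out = 0
    for i in range(16):
        out |= ((state >> i) & 1) << PERM[i]
    return out

def spn_round(state, round_key):
    """Third layer first: XOR the round key, then substitute, then permute."""
    return permute(substitute(state ^ round_key))

# The S-box is non-linear: S(1 xor 2) differs from S(1) xor S(2).
print(SBOX[1 ^ 2] != SBOX[1] ^ SBOX[2])
```

Each layer is a bijection, so the round as a whole is a permutation of the 16-bit state for any round key.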
But what do we want our block cipher to achieve?
Formal security notion: PRP security
PRP: "Pseudorandom Permutation"
Infeasible to distinguish B[K], with K secret and random, from a random permutation
Covers all use cases where key is fixed and secret
Data Encryption Standard
Data encryption standard
Standard by and for US government
By National Institute of Standards and Technology (NIST), at the time the National Bureau of Standards (NBS)
Designed by IBM in collaboration with NSA
1977: Federal Information Processing Standard (FIPS) 46
complete block cipher specification
block length: 64 bits, key length: 56 bits
no design rationale
freely usable
Massively adopted by banks and industry worldwide
Dominated symmetric crypto for more than 20 years
The Feistel structure
State: left half L and right half R
Alternation of involutions
apply F to R_i and add to L_i
swap left and right
omit swap in last round
B^-1 similar to B
same sequence of operations
round keys in reversed order
no need for F^-1
used in DES
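A minimal sketch of the Feistel structure, assuming a toy 16-bit block with 8-bit halves. The round function toy_f is hypothetical and deliberately not invertible; decryption only reverses the round key order.

```python
def toy_f(right, round_key):
    """Hypothetical round function F (an assumption); it need not be invertible."""
    x = (right ^ round_key) & 0xFF
    return ((x * x + 0x5A) ^ (x >> 3)) & 0xFF

def feistel(block, round_keys):
    """Run the Feistel rounds over a 16-bit block with the given round keys."""
    left, right = block >> 8, block & 0xFF
    last = len(round_keys) - 1
    for i, rk in enumerate(round_keys):
        left ^= toy_f(right, rk)       # apply F to R_i and add to L_i
        if i != last:                  # omit the swap in the last round
            left, right = right, left
    return (left << 8) | right

def feistel_encrypt(block, round_keys):
    return feistel(block, round_keys)

def feistel_decrypt(block, round_keys):
    # Same sequence of operations, round keys in reversed order, no F^-1.
    return feistel(block, round_keys[::-1])

keys = [0x3C, 0xA5, 0x0F, 0x99]        # arbitrary example round keys
c = feistel_encrypt(0x1234, keys)
print(feistel_decrypt(c, keys) == 0x1234)   # True
```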
Data encryption standard: overview
[Figure: DES data path and key schedule]
DES algorithmic structure
Data path
16-round Feistel
Initial (IP) and final permutations (FP):
no cryptographic significance
historical, due to addressing in hardware implementation
Key schedule
8 bits thrown away in permuted choice 1 (PC1)
remaining 56-bit string
split in two 28-bit strings
rotated for each round over 1 or 2 bits
48-bit round key obtained with PC2 of these 56 bits
each round key bit is just a cipher key bit
Data encryption standard: F-function
4 layers:
expansion E: from 32 to 48 bits
bitwise round key addition
substitution: 8 different 6-to-4 bit non-linear S-boxes
permutation P: moving nearby bits to remote positions
clearly hardware-oriented
[Figure: DES F-function on a 32-bit input: expansion E, round key RK addition, S-boxes S1 to S8, permutation P]
Trouble for DES: Weak Keys
What happens if the cipher key is all-zero?
all round keys are all-zero
all rounds are the same
cipher and its inverse are the same
Same is true for an all-one cipher key
And two more keys due to symmetry in key schedule
Weak key K_w:
DES[K_w] ◦ DES[K_w] = I
Also 6 semi-weak key pairs (K_1, K_2)
DES[K_1] ◦ DES[K_2] = I
Mostly of academic interest
Trouble for DES: Complementation Property
What happens if we complement the cipher input?
flip all bits in key
flip all bits in plaintext
In first round
input to F complemented so output of E complemented
round key also complemented so input to S-boxes unaffected
output of F unaffected
Output of first round is simply complemented
Repeat this until you reach the ciphertext
Complementation property:
DES[K](P) = C ⇔ DES[~K](~P) = ~C, with ~X the bitwise complement of X
Reduces effective key length from 56 bits to 55 bits
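The argument above does not depend on DES details. A toy Feistel cipher shows the same property, assuming only what the argument uses: round keys are selections of cipher key bits, and the round key is XORed in before the non-linear step. The cipher below (rotl8, round_keys, f, encrypt) is a made-up 16-bit example, not DES.

```python
MASK8 = 0xFF

def rotl8(x, r):
    """Rotate an 8-bit value left by r positions (0 <= r < 8)."""
    return ((x << r) | (x >> (8 - r))) & MASK8

def round_keys(key, rounds=8):
    # Each round key bit is just a cipher key bit (here: rotations of the key).
    return [rotl8(key, (3 * i) % 8) for i in range(rounds)]

def f(right, rk):
    # The round key is XORed in before the only non-linear step,
    # so complementing both inputs leaves the output unchanged.
    x = right ^ rk
    return ((x * 0x1D + 0x37) ^ (x >> 2) ^ ((x * x) >> 3)) & MASK8

def encrypt(block, key):
    """Toy 16-bit Feistel cipher with an 8-bit key (illustration only)."""
    left, right = block >> 8, block & MASK8
    rks = round_keys(key)
    for i, rk in enumerate(rks):
        left ^= f(right, rk)
        if i != len(rks) - 1:
            left, right = right, left
    return (left << 8) | right

p, k = 0x1234, 0x5B
# Complementing key and plaintext complements the ciphertext.
print(encrypt(p ^ 0xFFFF, k ^ 0xFF) == encrypt(p, k) ^ 0xFFFF)   # True
```

In an exhaustive search, one encryption therefore tests both K and ~K, which is where the lost bit of effective key length comes from.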
Intermezzo: statistical attacks (simplified)
Distinguisher Ω over r − 1 rounds, for (almost) all keys
property Ω should not be present in a random permutation
Online part: collect many couples (P_i, C_i)
Offline part: recover part k_a of last round key
make a guess k_a′ for k_a
compute a_i from C_i
if not Ω, k_a′ is wrong
Assumption: wrong k_a′ destroys Ω
variants and optimizations …
[Figure: key schedule and data path from P to C under key K; the distinguisher covers the rounds from P to the intermediate value a, and the key part k_a links a to C. Differential variant: ∆p to ∆a with DP(∆p, ∆a). Linear variant: u_p to u_a with C^2(u_p, u_a)]
Trouble for DES: Differential Cryptanalysis (DC)
Statistical attack published by [Biham and Shamir, 1990]
Distinguishing property: differential (∆p, ∆a)
difference ∆p leads to difference ∆a with probability DP(∆p, ∆a)
Online phase: collect many pairs (P_i, C_i), (P_i*, C_i*) with P_i ⊕ P_i* = ∆p
Offline phase: per key guess k_a compute ∆a from C_i and C_i*
Required # pairs proportional to 1/DP(∆p, ∆a)
Biham-Shamir broke DES with 2^47 chosen plaintexts: 1000 TeraByte
13-round differential with DP 2^-47.2
guessing part of keys of last 2 rounds
many subtleties and optimizations
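For a single S-box, DP can be computed by exhaustive counting. The sketch below assumes an illustrative 4-bit S-box (not a DES S-box); differential_probability and max_dp are hypothetical helper names.

```python
# Illustrative 4-bit S-box (an assumption, not one of the DES S-boxes).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def differential_probability(sbox, delta_in, delta_out):
    """Fraction of inputs x for which S(x) xor S(x xor delta_in) == delta_out."""
    n = len(sbox)
    hits = sum(1 for x in range(n)
               if sbox[x] ^ sbox[x ^ delta_in] == delta_out)
    return hits / n

def max_dp(sbox):
    """Largest DP over all non-zero input differences."""
    n = len(sbox)
    return max(differential_probability(sbox, din, dout)
               for din in range(1, n) for dout in range(n))

dp = differential_probability(SBOX, 0xB, 0x2)
print(dp)        # 0.5
print(1 / dp)    # the required number of pairs scales with 1/DP
```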
DC of DES: origin of fatal differential
Difference propagation over multiple rounds
called characteristic or differential trail Q
differential probability of a trail DP(Q) is the product over the rounds
special case: iterative trails
important: # active S-boxes in trail
for this trail Q from ∆p to ∆a: DP(∆p, ∆a) ≈ DP(Q)
[Figure: DES F-function (E, RK, S1 to S8, P) showing difference propagation]
Trouble for DES: Linear Cryptanalysis (LC)
Statistical attack published by Matsui 1992
Distinguishing property: correlation C(u_p, u_a)
between sum of plaintext bits u_p^T p and sum of output bits u_a^T a
correlation C(u_p, u_a) ranges between −1 and +1
Online phase: collect many couples (P_i, C_i)
Offline phase: per key guess k_a compute a_i from C_i
Required # couples proportional to 1/C^2(u_p, u_a) = 1/LP(u_p, u_a)
High correlation C(u_p, u_a) constructed using linear trail
dual of differential trails, propagation of masks u rather than difference patterns ∆
used trail has few active S-boxes
Matsui broke DES with 2^43 known plaintexts: 64 TeraByte
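The correlation between an input parity and an output parity of an S-box can be counted directly. This sketch assumes an illustrative 4-bit S-box (not a DES S-box); parity and correlation are hypothetical helper names.

```python
# Illustrative 4-bit S-box (an assumption, not one of the DES S-boxes).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def parity(x):
    """Sum modulo 2 of the bits of x."""
    return bin(x).count("1") & 1

def correlation(sbox, mask_in, mask_out):
    """Correlation between the parities u_in^T x and u_out^T S(x)."""
    n = len(sbox)
    agree = sum(1 for x in range(n)
                if parity(x & mask_in) == parity(sbox[x] & mask_out))
    return 2 * agree / n - 1

c = correlation(SBOX, 0x6, 0xB)
lp = c * c
print(c, lp)     # 0.5 0.25
print(1 / lp)    # the required number of couples scales with 1/LP
```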
LC of DES: origin of fatal correlation
Correlations over F
If output mask is active in a single S-box only
Its parity has high correlation with parity for some input mask
Correlation contribution of a trail is the product of the correlations of its rounds
[Figure: DES F-function (E, RK, S1 to S8, P) showing mask propagation]
Trouble for DES: the short key
Exhaustive key search: about 3.6 × 10^16 trials on average (2^55, half of the 2^56 keys)
More than 15 years ago: “software” cracking
about 10,000 workstations
500,000 trials per second per workstation
expected time: 7,200,000 seconds: about 83 days
applied in cracking RSA labs DES challenge, June 1997
Cracking using dedicated hardware
COPACOBANA RIVYERA (2008)
costs about $10,000
board with 128 Spartan-3 5000 FPGAs
finds a DES key in less than a day
Short DES key is real-world concern!
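The software-cracking figures can be checked with a few lines of arithmetic. The only added assumption is that a search covers half the key space on average.

```python
keys = 2 ** 56                         # DES key space
expected_trials = keys // 2            # on average half the keys are tried
rate = 10_000 * 500_000                # workstations x trials per second each
seconds = expected_trials / rate
days = seconds / 86_400

print(f"{expected_trials:.2e} trials")   # about 3.6e16
print(f"{seconds:.2e} seconds")          # about 7.2e6
print(f"{days:.1f} days")                # about 83 days
```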
The solution: Triple DES (FIPS 46-2 and 46-3)
Double DES allows meet-in-the-middle attacks
Three variants of Triple-DES
3-key: 168-bit key, only option allowed by NIST
2-key: 112-bit key
K3 = K1
still massively deployed by banks worldwide
1-key: 56-bit key
K3 = K2 = K1
falls back to single DES thanks to inverse in middle
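Why double encryption does not double the security can be seen on a toy cipher. The sketch assumes a made-up 16-bit Feistel cipher with 8-bit keys (toy_encrypt and toy_decrypt are not DES); the attack tabulates the middle value from the plaintext side and matches it from the ciphertext side, for about 2 × 2^8 cipher calls instead of 2^16.

```python
def _f(x, rk):
    x ^= rk
    return ((x * 0x1D + 0x37) ^ (x >> 2) ^ ((x * x) >> 3)) & 0xFF

def _round_keys(key):
    # Toy key schedule (an assumption): rotations of the 8-bit key.
    return [((key << r) | (key >> (8 - r))) & 0xFF for r in (0, 3, 6, 1)]

def _feistel(block, rks):
    left, right = block >> 8, block & 0xFF
    for i, rk in enumerate(rks):
        left ^= _f(right, rk)
        if i != len(rks) - 1:
            left, right = right, left
    return (left << 8) | right

def toy_encrypt(block, key):
    return _feistel(block, _round_keys(key))

def toy_decrypt(block, key):
    return _feistel(block, _round_keys(key)[::-1])

def meet_in_the_middle(pairs):
    """Return all (k1, k2) consistent with C = E[k2](E[k1](P)) for every pair."""
    (p0, c0), rest = pairs[0], pairs[1:]
    middle = {}
    for k1 in range(256):                       # forward half: 2^8 encryptions
        middle.setdefault(toy_encrypt(p0, k1), []).append(k1)
    candidates = []
    for k2 in range(256):                       # backward half: 2^8 decryptions
        for k1 in middle.get(toy_decrypt(c0, k2), []):
            if all(toy_encrypt(toy_encrypt(p, k1), k2) == c for p, c in rest):
                candidates.append((k1, k2))
    return candidates

k1, k2 = 0x3A, 0xC5
pairs = [(p, toy_encrypt(toy_encrypt(p, k1), k2)) for p in (0x0123, 0x4567, 0x89AB)]
print((k1, k2) in meet_in_the_middle(pairs))    # True
```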
Wide Trail Strategy
Wide trail strategy
a rationale for designing the data path round function
main objectives:
absence of differential trails with high DP
absence of linear trails with high LP
how? See next slides
Trails in substitution-permutation networks
active S-box: non-zero input difference
DP(Q) is the product over its active S-boxes: ∏_i DP(Sbox_i)
Assuming independence of propagation in active S-boxes
NOT the wide trail strategy
Naive approach to cipher design
DP(Q) = ∏_i DP(Sbox_i) ≤ ∏_i DPmax(Sbox_i)
DC/LC of DES and SPN: few active S-boxes per round
decrease DP(Q) (or LP(Q)): S-boxes with low maximum DP/LP
For given S-box width b, lower bounds on these maxima:
DPmax(Sbox) ≥ 2^(1-b)
LPmax(Sbox) ≥ 2^-b
Consensus in nineties: bigger S-boxes are better!
Implementation complexity of S-boxes strongly increases with size
software: lookup tables of size 2^b
hardware: increase of combinatorial logic
Principle of the wide trail strategy
Instead of big S-boxes have many active S-boxes
Ensure that any trail has many active S-boxes
multiple active S-boxes per round
Separate layers for nonlinearity and diffusion
nonlinear layer: S-boxes with some maximum DP and LP
diffusion layer: ensures many S-boxes per trail
Origin
introduced in the early nineties for lightweight bit-oriented designs
particular flavor became mainstream after Rijndael was chosen as AES
Introducing mixing layers and their branch number
A new type of transformation in the round function: mixing layer
linear
goal: each output bit depends on multiple input bits
Desired properties of mixing layer:
avalanche: few active bits at input → many at output
avalanche of inverse: few at output → many at input
Branch number B of a mixing layer
minimum total number of active bits at input and output, over all non-zero inputs
two types: linear and differential
relative to a state partition in bits, bytes, or …
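For a small mixing layer the (differential) branch number can be found by brute force. The sketch below works at bit level on a hypothetical 4 × 4 binary matrix MIX; apply_matrix, weight and branch_number are illustrative names.

```python
def apply_matrix(rows, x):
    """Binary matrix times vector: output bit i is the parity of rows[i] & x."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y

def weight(x):
    """Number of active (non-zero) bits."""
    return bin(x).count("1")

def branch_number(rows, n_in):
    """Minimum of active input bits plus active output bits over non-zero inputs."""
    return min(weight(x) + weight(apply_matrix(rows, x))
               for x in range(1, 1 << n_in))

# Hypothetical example: each output bit is the XOR of the three other input bits.
MIX = [0b1110, 0b1101, 0b1011, 0b0111]
IDENTITY = [0b0001, 0b0010, 0b0100, 0b1000]

print(branch_number(MIX, 4))        # 4
print(branch_number(IDENTITY, 4))   # 2: no diffusion at all
```

The same brute-force definition applies to byte-level partitions, but the search space then grows too large for exhaustive enumeration and coding-theory arguments are used instead.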
Wide trail: first-order approach
Round function with three layers
round key addition
nonlinearity: n S-boxes of b bits (say b = 8)
mixing
Use a mixing layer with highest possible branch number
value: B = n + 1
defines a maximum distance separable code: MDS matrix
# active S-boxes per two rounds ≥ B
Strategy adopted in SHARK [SHARK, FSE 1996]
b = 8, n = 8 so block length is 64
9 active S-boxes per 2 rounds
first-order: locally optimizes worst-case 2-round behaviour
First-order approach illustrated
[Figure: layers of parallel S-boxes alternating with MDS mixing layers]
The trouble with the first-order approach
Mappings with high branch numbers are expensive
software: n look-up tables with 2^8 entries of 8n bits each
hardware: # gates per bit grows quickly as a function of n
Instead of expensive big S-boxes,
…we now have expensive big MDS matrices
Second-order approach
Round function with four layers
round key addition
nonlinearity: n S-boxes
mixing: m parallel MDS mappings each taking n/m bytes
dispersion: moving bytes/bits around
Use an optimal dispersion layer
moves bytes in MDS block to all different MDS blocks
# active S-boxes per four rounds ≥ B^2
Strategy adopted in Square [Square, FSE 1997]
b = 8, m = 4, n = 16 so block length is 128
25 active S-boxes per 4 rounds
Second-order approach illustrated
Rijndael
The start of the AES competition
January 1997: NIST announces the AES initiative
replacement of DES
open call for block cipher proposals
…and for analysis, comparisons, etc.
September 1997: official request for proposals
faster than Triple-DES
128-bit blocks, 128-, 192- and 256-bit keys
specs, reference and optimized code, test vectors
design rationale and preliminary analysis
patent waiver
Vincent Rijmen and I decided to submit a variant of Square
Most important change: multiple key and block lengths
We called it Rijndael
The AES competition
First round: August 1998 to August 1999
15 candidates at 1st AES conference in Ventura, California
analysis presented at 2nd AES conf. in Rome, March 1999
NIST narrowed down to 5 finalists using this analysis
Second round: August 1999 to summer 2000
analysis presented at 3rd AES conf. in New York, April 2000
NIST selected winner using this analysis
Criteria
security margin
efficiency in software and hardware
key agility
simplicity
NIST motivated their choice with two solid reports
Rijndael
Block cipher with block and key lengths ∈ {128, 160, 192, 224, 256}
set of 25 block ciphers
AES fixes block length to 128 and limits key length to multiples of 64
Simple round function with four steps (like Square)
all rounds are identical
…except for the round keys
…and omission of mixing layer in last round
parallel and symmetric
Key schedule
Expansion of cipher key to round key sequence
Recursive procedure that can be done in-place
Manipulates bytes with simple operations in GF(2^8)
The non-linear layer: SubBytes
Single S-box with two layers
y = x^-1 in GF(2^8), or more exactly y = x^254
max LP = max DP = 2^-6 [Nyberg, Eurocrypt 1993]
Affine mapping: multiplication by 8 × 8 matrix over GF(2), plus a constant
to complicate the algebraic expressions
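The two layers of the S-box can be sketched as follows. The field polynomial 0x11B and the affine constant 0x63 are not stated on the slide; they are taken from the AES specification (FIPS 197). The function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B, from FIPS 197)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inverse(x):
    """Compute x^254 by square-and-multiply; this maps 0 to 0."""
    result, base, exponent = 1, x, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result

def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF

def sub_byte(x):
    """Inversion followed by the affine mapping, written with byte rotations."""
    b = gf_inverse(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

print(hex(sub_byte(0x00)), hex(sub_byte(0x01)), hex(sub_byte(0x53)))
# 0x63 0x7c 0xed
```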
The mixing layer: MixColumns
Single MDS mapping applied to columns: B = 5
Multiplication by a 4 × 4 circulant matrix in GF(2^8)
Elements: 1, 1, x and x + 1
circulant MDS matrix with the simplest elements
Inverse has more complex elements
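A sketch of MixColumns on one column, assuming the AES field polynomial 0x11B and the row (x, x + 1, 1, 1) as the first row of the circulant matrix, as in FIPS 197. xtime and mix_column are illustrative names.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mix_column(col):
    """Multiply a 4-byte column by the circulant matrix with first row (x, x+1, 1, 1)."""
    out = []
    for i in range(4):
        a, b, c, d = (col[(i + j) % 4] for j in range(4))
        # x*a + (x+1)*b + c + d, with addition being XOR
        out.append(xtime(a) ^ (xtime(b) ^ b) ^ c ^ d)
    return out

print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']

# One active input byte activates all four output bytes: 1 + 4 = 5 = B.
print(mix_column([0x01, 0x00, 0x00, 0x00]))   # [2, 1, 1, 3]
```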
The dispersion layer: ShiftRows
Each row is shifted by a different amount
Different shift offsets for higher block lengths
Round key addition: AddRoundKey
Key schedule: 192-bit key, 128-bit block example
k_0 k_1 k_2 k_3 | k_4 k_5 k_6 k_7 | k_8 k_9 k_10 k_11 | k_12 k_13 k_14 k_15 | …
Round key 0 | Round key 1 | Round key 2 | …
k_6n = k_(6n-6) ⊕ f(k_(6n-1))
k_i = k_(i-6) ⊕ k_(i-1), i ≠ 6n
Rijndael: summary
# rounds: 6 + max(ℓ_k, ℓ_b), with ℓ_k the key length and ℓ_b the block length in 32-bit words
last round has no MixColumns to make inverse similar to cipher
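The round-count rule is easy to evaluate for the AES parameters. rijndael_rounds is an illustrative helper name.

```python
def rijndael_rounds(key_bits, block_bits=128):
    """Number of rounds: 6 + max(key length, block length), lengths in 32-bit words."""
    return 6 + max(key_bits // 32, block_bits // 32)

for key_bits in (128, 192, 256):
    print(key_bits, rijndael_rounds(key_bits))
# 128 10
# 192 12
# 256 14
```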
Rijndael: some distinguishing features
Bounds on trails:
minimum 25 active S-boxes in any 4-round trail
DP(Q) ≤ 2^-150, LP(Q) ≤ 2^-150
Symmetric and simple
too simple to be secure?
no successful attacks up to now
Inverse is different and slightly more expensive
Table-lookup implementation:
4 Kbytes of table
1 table-lookup + 1 XOR per byte per round
inverse uses different tables
No integer arithmetic
as opposed to IDEA, SHA-1, SHA-2
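The 4-round trail bound follows from figures given earlier in the talk: branch number B = 5 for MixColumns, at least B^2 active S-boxes per four rounds, and a maximum DP and LP of 2^-6 per S-box. A short check of that arithmetic, with illustrative variable names:

```python
from math import log2

branch_number = 5                      # MixColumns
active_sboxes = branch_number ** 2     # lower bound per 4 rounds
sbox_max = 2 ** -6                     # max DP = max LP of the S-box

trail_bound = sbox_max ** active_sboxes
print(active_sboxes)                   # 25
print(log2(trail_bound))               # -150.0
```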
Lessons learnt from 17 years of public scrutiny
Heavily (crypt-)analyzed
Many papers on AES and Rijndael
IACR Crypto DB search on AES: about 16900 hits
IACR Crypto DB search on Rijndael: about 1490 hits
Not all are equally relevant or interesting
We discuss some instructive attacks:
square attack
algebraic attacks
biclique attacks
The Square attack
Consider Square, predecessor of Rijndael
DP of 4-round differential trails ≤ 2^-150
LP of 4-round linear trails ≤ 2^-150
So we thought 6 rounds would be sufficient
Knudsen invents Square attack [Square, FSE 1997]
input sets: constant in some and complete in other bytes
properties decay only slowly through steps of the round
broke up to 6 rounds of Square (or Rijndael)
Lesson learnt: interpret trail bounds with caution
later even stronger attacks, e.g., impossible differentials
helped by byte-alignment and strong local diffusion
remedy: just add rounds
Algebraic attacks
Algebraic attack on AES [Courtois, Pieprzyk, Asiacrypt 2002]
consider the cipher as a set of algebraic equations
simple equations due to multiplicative inverse, e.g. xy = 1
Another suspect property [Murphy, Robshaw, Crypto 2002]
embedding AES in 8 times larger structure BES
BES is fully linear in GF(2^8) except multiplicative inverse
Both turned out to be false alarms
ambitions of algebraic attacks have been adjusted since
tools in statistical attacks
attacks exploiting low degree of round function
Biclique attack
[Khovratovich, Bogdanov, Rechberger, 2011]
first academic single-key attacks on full AES
covers all three AES key lengths
Speed-up of exhaustive keysearch by a small factor (2 or 3)
large (2^88) to modest (2^40) data complexity
reducing proportion of AES to be computed for each key
by combination of sophisticated tools and techniques
Lessons learnt
no practical threat and unlikely it can be improved
some say this does not even qualify as an academic attack
almost all modern ciphers are vulnerable to biclique or similar attacks
protecting against them is possible but very costly: huge # rounds
Conclusions
Modern block ciphers iterate a simple round function
DES:
historically important as test bed for attacks
3-key Triple-DES still reasonably secure today
Wide trail strategy: inspired by linear and differential cryptanalysis
Rijndael is fully based on the wide trail strategy
AES (Rijndael) is the dominant block cipher nowadays
omnipresent, even dedicated instructions in Intel CPUs (AES-NI)
inspired many other block cipher and permutation designs
17 years of public scrutiny: no threatening attacks
Thanks for your attention!