Parallel Analysis of the Rijndael
Block Cipher
Philip Brisk, Adam Kaplan, Majid Sarrafzadeh
Embedded & Reconfigurable Systems Lab
Computer Science Department
IASTED-PDCS November, 2003
Outline
• Introduction
• Background Material
• Analysis of the Rijndael Cipher
• Concluding Remarks
Parallel Models of Computation
and Cryptography
• Achieving optimal performance of
cryptographic algorithms is imperative!
• Goal: Understand how to accelerate
performance by studying cryptography
under parallel models of computation.
What can we Learn from Parallel
Models of Computation?
• Identification of performance bottlenecks.
• How to design efficient cryptographic
hardware.
• Techniques to improve future algorithms.
Outline
• Introduction
• Background Material
– Cost Model
– Prefix Sum Computation
• Analysis of the Rijndael Cipher
• Concluding Remarks
Cost Model
• n : problem size
• t(n) : number of steps
• p(n) = N > 1 : number of processors
• c(n) : cost
• s(n) : speedup
$$c(n) = t(n) \cdot p(n)$$
$$s(n) = \frac{t(n)\big|_{p(n)=1}}{t(n)\big|_{p(n)=N}}$$
Cost Optimality
• Cost ≡ the number of steps executed collectively by all processors.
• An algorithm is cost-optimal on a parallel model of computation if:
$$c(n)\big|_{p(n)=N} = O\left(t(n)\big|_{p(n)=1}\right)$$
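As a worked illustration of these definitions (a hypothetical example, not one taken from the slides), the Python sketch below counts steps for adding n values: N processors first add their own blocks of about n/N values, then combine the N partial sums pairwise in ⌈log2 N⌉ steps. The function names are illustrative. Comparing N = n with N ≈ n / log n contrasts a fast but non-cost-optimal choice with a cost-optimal one.

```python
import math

def sequential_steps(n: int) -> int:
    """Steps for one processor to add n values."""
    return n - 1

def parallel_steps(n: int, N: int) -> int:
    """Steps for N processors: local block sums, then a pairwise tree combine."""
    block = math.ceil(n / N)                            # values held by the busiest processor
    local = block - 1                                   # sequential additions inside one block
    combine = math.ceil(math.log2(N)) if N > 1 else 0   # levels of the combining tree
    return local + combine

def cost(n: int, N: int) -> int:
    """c(n) = t(n) * p(n)."""
    return parallel_steps(n, N) * N

def speedup(n: int, N: int) -> float:
    """s(n) = t(n) with 1 processor / t(n) with N processors."""
    return sequential_steps(n) / parallel_steps(n, N)

if __name__ == "__main__":
    n = 1024
    for N in (n, n // int(math.log2(n))):
        ratio = cost(n, N) / sequential_steps(n)
        print(f"N={N:5d}  t={parallel_steps(n, N):3d}  cost/seq={ratio:.2f}  s={speedup(n, N):.1f}")
```

With N = n the cost is roughly log n times the sequential step count, while with N ≈ n / log n the ratio stays bounded by a small constant, at the price of a smaller speedup.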
Prefix Sum Computation
• P – a set of N processors: {P1, …, PN}
• Processor Pi holds a value ai.
• For each processor Pi, compute the sum Si:
$$S_i = \sum_{k=1}^{i} a_k$$
Algorithm (sequential, with S0 = 0):
for i = 1 to N
   Si = ai + Si-1
• Addition can be generalized to any binary associative operation.
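A direct transcription of the sequential algorithm, generalized to an arbitrary associative operator `op`, is sketched below in Python (the name `prefix_scan` is illustrative). The recurrence Si = ai + Si-1 is written as S_{i-1} op a_i so that left-to-right order is preserved even for operators that are associative but not commutative.

```python
from operator import add, xor
from typing import Callable, List, TypeVar

T = TypeVar("T")

def prefix_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Sequential inclusive scan: S_i = a_1 op a_2 op ... op a_i (N - 1 uses of op)."""
    result: List[T] = []
    for a in values:
        # S_1 = a_1; afterwards S_i = S_{i-1} op a_i
        result.append(a if not result else op(result[-1], a))
    return result

print(prefix_scan([3, 6, 1, 4], add))   # [3, 9, 10, 14]
print(prefix_scan([3, 6, 1, 4], xor))   # [3, 5, 4, 0]
```

Python's `itertools.accumulate(values, op)` computes the same sequence; the explicit loop simply mirrors the slide's one-processor algorithm.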
Prefix Sum Computation
• Meijer and Akl [1987] described a solution using a binary tree of processors.
[Figure (animated over several build steps): a binary tree of processors computes the prefix sums of the leaf values 3, 6, 1, 4, ending with 3, 9, 10, 14 at the leaves.]
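The tree idea can be illustrated with the standard up-sweep/down-sweep scan over an implicit binary tree. This Python sketch is offered as a generic tree-based scan, not necessarily the exact Meijer and Akl procedure; it assumes the input length is a power of two and counts each tree level as one parallel step, since all updates within a level are independent.

```python
from operator import add, xor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def tree_scan(values: List[T], op: Callable[[T, T], T], identity: T) -> Tuple[List[T], int]:
    """Inclusive scan on an implicit binary tree; returns (prefixes, parallel steps)."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    tree = list(values)
    steps = 0

    # Up-sweep: each internal node accumulates the combined value of its subtree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):      # independent updates within one level
            tree[i] = op(tree[i - d], tree[i])
        steps += 1
        d *= 2

    # Down-sweep: pass prefixes of left siblings toward the leaves (exclusive scan).
    tree[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left = tree[i - d]
            tree[i - d] = tree[i]                 # left child gets the parent's prefix
            tree[i] = op(tree[i], left)           # right child: parent prefix, then left subtree
        steps += 1
        d //= 2

    # Exclusive -> inclusive: every leaf folds in its own value (one more step).
    inclusive = [op(tree[i], values[i]) for i in range(n)]
    return inclusive, steps + 1

print(tree_scan([3, 6, 1, 4], add, 0))   # ([3, 9, 10, 14], 5)
```

The step count is 2 log2 N + 1, so with N processors the time is O(log N), but the cost N · O(log N) exceeds the O(N) sequential time.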
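A Cost-Optimal Prefix Sum
• To achieve cost optimality:
$$t(n) = O\left(\frac{n}{N}\right) + O(\log N), \qquad p(n) = N = O\left(\frac{n}{\log n}\right)$$
• With this choice of N, t(n) = O(log n) and c(n) = t(n) p(n) = O(n), matching the sequential algorithm.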
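One standard way to reach these bounds is Brent-style blocking, sketched below in Python; the slides do not spell out the construction, so the three-phase structure and the helper names are assumptions. Each of N ≈ n / log n processors scans its own block, the block totals are scanned (on a processor tree in the parallel model; sequentially here for brevity), and each processor then folds the total of all earlier blocks into its own results.

```python
import math
from itertools import accumulate
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")

def cost_optimal_procs(n: int) -> int:
    """Choose N ~ n / log2(n) processors (at least 1)."""
    return max(1, n // max(1, int(math.log2(n)))) if n > 1 else 1

def blocked_scan(values: List[T], op: Callable[[T, T], T], num_procs: int) -> List[T]:
    """Inclusive scan organized as N blocks of about n/N values each."""
    if not values:
        return []
    size = math.ceil(len(values) / num_procs)            # ~n/N values per processor
    blocks = [values[i:i + size] for i in range(0, len(values), size)]

    # Phase 1 (parallel over blocks, ~n/N steps): local scan of each block.
    local = [list(accumulate(block, op)) for block in blocks]
    # Phase 2 (O(log N) steps on a tree of processors; sequential here): scan block totals.
    totals = list(accumulate((scan[-1] for scan in local), op))
    # Phase 3 (parallel over blocks, ~n/N steps): prepend the total of all earlier blocks.
    result = list(local[0])
    for k in range(1, len(local)):
        result.extend(op(totals[k - 1], x) for x in local[k])
    return result

n = 1000
print(cost_optimal_procs(n), blocked_scan([3, 6, 1, 4], add, 2))
```

Phases 1 and 3 take O(n/N) = O(log n) steps and phase 2 takes O(log N) steps, which is where t(n) = O(log n) with O(n / log n) processors comes from.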
Outline
• Introduction
• Background Material
• Analysis of the Rijndael Cipher
• Concluding Remarks
The Rijndael Cipher
• The cipher iterates in a series of rounds.
– Each round requires a key.
• Using the same key in every round is not secure.
• Providing a separate key for every round as input is impractical.
• A key schedule uses the original key to compute a new key for each round.
The Rijndael Cipher
Key Schedule
– Key Expansion
• Expands the original key analogously to prefix-sum computation.
– Round Key Selection
• Divides the expanded key between the rounds of the cipher.
Round Transformation
– 4 sub-transformations applied during each round:
• ByteSub
• ShiftRow
• MixColumn
• AddRoundKey
The Rijndael Cipher: Parameters
• Nb – Block length: number of 32-bit columns in the state (4, 6, or 8)
• Nk – Key length: number of 32-bit words in the key (4, 6, or 8)
• Nr – Number of rounds (10, 12, or 14)
• The key and state are represented as 2-dimensional arrays of bytes.
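The three parameters are linked: in the Rijndael specification the round count is Nr = max(Nb, Nk) + 6, and the key schedule must produce Nb × (Nr + 1) words. The small Python sketch below (function names are illustrative) tabulates this for a few block and key sizes.

```python
def num_rounds(nb: int, nk: int) -> int:
    """Rijndael round count: Nr = max(Nb, Nk) + 6, with Nb, Nk in {4, 6, 8}."""
    if nb not in (4, 6, 8) or nk not in (4, 6, 8):
        raise ValueError("Nb and Nk must be 4, 6, or 8 (32-bit words)")
    return max(nb, nk) + 6

def expanded_key_words(nb: int, nk: int) -> int:
    """Number of 32-bit words the key schedule must produce: Nb x (Nr + 1)."""
    return nb * (num_rounds(nb, nk) + 1)

for block_bits, key_bits in [(128, 128), (128, 192), (128, 256), (256, 256)]:
    nb, nk = block_bits // 32, key_bits // 32
    print(f"block={block_bits} key={key_bits}: Nb={nb} Nk={nk} "
          f"Nr={num_rounds(nb, nk)} words={expanded_key_words(nb, nk)}")
```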
Representation of the State
• The state is represented by a 4 x Nb array of bytes (Nb = 4, 6, or 8), shown here for Nb = 4:

   a0,0  a0,1  a0,2  a0,3
   a1,0  a1,1  a1,2  a1,3
   a2,0  a2,1  a2,2  a2,3
   a3,0  a3,1  a3,2  a3,3
   (4 rows, Nb columns)
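In the Rijndael/AES specification an input block is loaded into the state column by column, so byte r + 4c of the block lands in row r, column c. A minimal Python sketch of that mapping (helper names are hypothetical):

```python
from typing import List

State = List[List[int]]  # state[row][col]: 4 rows x Nb columns of bytes

def bytes_to_state(block: bytes) -> State:
    """Fill the 4 x Nb state column by column: state[r][c] = block[r + 4c]."""
    if len(block) % 4 or len(block) // 4 not in (4, 6, 8):
        raise ValueError("block must be 16, 24, or 32 bytes")
    nb = len(block) // 4
    return [[block[r + 4 * c] for c in range(nb)] for r in range(4)]

def state_to_bytes(state: State) -> bytes:
    """Inverse mapping: read the state back out column by column."""
    nb = len(state[0])
    return bytes(state[r][c] for c in range(nb) for r in range(4))

s = bytes_to_state(bytes(range(16)))
print(s[0])   # first row holds bytes 0, 4, 8, 12
```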
The ByteSub Transformation
• Apply an S-box (an 8-bit lookup table) to every byte of the state: each ai,j is replaced by bi,j = S-box(ai,j).
• The S-box takes the multiplicative inverse in GF(2^8) and then applies an affine transformation over GF(2), where x holds the bits of the inverse of ai,j and y holds the bits of bi,j:
$$\begin{bmatrix} y_0\\ y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\\ y_7 \end{bmatrix} = \begin{bmatrix} 1&0&0&0&1&1&1&1\\ 1&1&0&0&0&1&1&1\\ 1&1&1&0&0&0&1&1\\ 1&1&1&1&0&0&0&1\\ 1&1&1&1&1&0&0&0\\ 0&1&1&1&1&1&0&0\\ 0&0&1&1&1&1&1&0\\ 0&0&0&1&1&1&1&1 \end{bmatrix} \begin{bmatrix} x_0\\ x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6\\ x_7 \end{bmatrix} \oplus \begin{bmatrix} 1\\ 1\\ 0\\ 0\\ 0\\ 1\\ 1\\ 0 \end{bmatrix}$$
• 1 processor: t(n) = O(Nb)
• 4 x Nb processors: t(n) = O(1)
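The following Python sketch builds the 8-bit lookup table and applies it to a state. It assumes the Rijndael reduction polynomial x^8 + x^4 + x^3 + x + 1 and computes the inverse as a^254 (with 0 mapped to 0); `gf_mul`, `sbox_entry`, `SBOX`, and `byte_sub` are illustrative names. Because every output byte depends only on its own input byte, the 4 × Nb lookups are independent, which is why 4 × Nb processors finish in one step.

```python
from typing import List

State = List[List[int]]

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def sbox_entry(a: int) -> int:
    """Inverse of a in GF(2^8) (as a**254, so 0 -> 0), then the affine transformation."""
    inv, p, e = 1, a, 254
    while e:
        if e & 1:
            inv = gf_mul(inv, p)
        p = gf_mul(p, p)
        e >>= 1
    # The circulant 8x8 affine matrix equals x ^ rotl(x,1) ^ rotl(x,2) ^ rotl(x,3) ^ rotl(x,4);
    # the constant vector (1,1,0,0,0,1,1,0), read as y0..y7, is the byte 0x63.
    y = inv
    for s in range(1, 5):
        y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return y ^ 0x63

SBOX = [sbox_entry(a) for a in range(256)]   # the 8-bit lookup table

def byte_sub(state: State) -> State:
    """b[i][j] = SBOX[a[i][j]]; the 4 x Nb lookups do not depend on each other."""
    return [[SBOX[a] for a in row] for row in state]

print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))
```

A software implementation would normally hard-code the 256-entry table; generating it here keeps the link to the inverse-plus-affine definition visible.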
The Shift-Row Transformation
• Shift each row of the state cyclically to the left by a constant offset (row i moves by i positions in this Nb = 4 example):

   a0,0 a0,1 a0,2 a0,3        b0,0 b0,1 b0,2 b0,3
   a1,0 a1,1 a1,2 a1,3   →    b1,1 b1,2 b1,3 b1,0
   a2,0 a2,1 a2,2 a2,3        b2,2 b2,3 b2,0 b2,1
   a3,0 a3,1 a3,2 a3,3        b3,3 b3,0 b3,1 b3,2
          State                      State
• 1 processor: t(n) = O(Nb)
• 4 x Nb processors: t(n) = O(1)
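A minimal Python sketch of the row rotation, assuming the offsets from the Rijndael specification (rows 1, 2, 3 shift by 1, 2, 3 for Nb = 4 or 6, and by 1, 3, 4 for Nb = 8); the names `SHIFT_OFFSETS` and `shift_row` are illustrative. Each output byte is a copy from one fixed input position, so 4 × Nb processors can perform all copies in a single step.

```python
from typing import List

State = List[List[int]]

# Offsets for rows 0..3, indexed by Nb (row 0 is never shifted).
SHIFT_OFFSETS = {4: (0, 1, 2, 3), 6: (0, 1, 2, 3), 8: (0, 1, 3, 4)}

def shift_row(state: State) -> State:
    """Rotate row r of the 4 x Nb state left by its offset."""
    nb = len(state[0])
    offsets = SHIFT_OFFSETS[nb]
    return [row[off:] + row[:off] for row, off in zip(state, offsets)]

s = [[10 * r + c for c in range(4)] for r in range(4)]
print(shift_row(s))
```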
The Mix-Column Transformation
• Apply to each column in the state: column (a0,j, a1,j, a2,j, a3,j) is mapped to (b0,j, b1,j, b2,j, b3,j) by a 4x4 byte matrix.
$$\begin{bmatrix} b_0\\ b_1\\ b_2\\ b_3 \end{bmatrix} = \begin{bmatrix} 02&03&01&01\\ 01&02&03&01\\ 01&01&02&03\\ 03&01&01&02 \end{bmatrix} \begin{bmatrix} a_0\\ a_1\\ a_2\\ a_3 \end{bmatrix}$$
(matrix entries are hexadecimal bytes; additions and multiplications are performed in GF(2^8))
• 1 processor: t(n) = O(Nb)
• O(Nb) processors: t(n) = O(1)
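The matrix product can be computed with only shifts and XORs: multiplying by 02 is a left shift followed by a conditional reduction (often called `xtime`), and multiplying by 03 is (02 · a) XOR a. The Python sketch below uses illustrative names and applies the transformation to one column and then to every column of a state; the Nb columns are independent, matching the O(Nb)-processor, O(1)-time model.

```python
from typing import List

def xtime(a: int) -> int:
    """Multiply a byte by 02 in GF(2^8) (reduce by x^8 + x^4 + x^3 + x + 1)."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def mul3(a: int) -> int:
    """Multiply a byte by 03: (02 * a) XOR a."""
    return xtime(a) ^ a

def mix_column(col: List[int]) -> List[int]:
    """b = M a with rows (02 03 01 01), (01 02 03 01), (01 01 02 03), (03 01 01 02)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

def mix_columns(state: List[List[int]]) -> List[List[int]]:
    """Apply mix_column to each of the Nb columns independently."""
    nb = len(state[0])
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(nb)]
    return [[cols[c][r] for c in range(nb)] for r in range(4)]

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

The input column (db, 13, 53, 45) is a widely used MixColumns test value; working through the table by hand gives (8e, 4d, a1, bc).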
The Add-Round-Key Transformation
• XOR each state byte with the corresponding round-key byte: bi,j = ai,j XOR ki,j.
• 1 processor: t(n) = O(Nb)
• 4 x Nb processors: t(n) = O(1)
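A short Python sketch of the key addition, plus a hypothetical lock-step step counter for the byte-wise transformations (ByteSub, ShiftRow, AddRoundKey) under the slides' model, where each processor handles one byte per step. The names are illustrative; the counter simply evaluates ⌈4·Nb / P⌉ for P processors, giving O(Nb) steps for P = 1 and O(1) steps for P = 4 × Nb.

```python
import math
from typing import List

State = List[List[int]]

def add_round_key(state: State, round_key: State) -> State:
    """b[i][j] = a[i][j] XOR k[i][j] for every byte of the 4 x Nb state."""
    return [[a ^ k for a, k in zip(srow, krow)] for srow, krow in zip(state, round_key)]

def lockstep_steps(nb: int, processors: int) -> int:
    """Hypothetical step count for a byte-wise transformation of a 4 x Nb state
    when each of `processors` processors handles one byte per step."""
    return math.ceil(4 * nb / processors)

print(lockstep_steps(4, 1), lockstep_steps(4, 16))
```

Since XOR is its own inverse, applying `add_round_key` twice with the same key restores the original state.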
The Round Transformation
Initial Key Addition:
   State ← AddRoundKey(State, RoundKey[0])
For i = 1 to Nr - 1
   State ← ByteSub(State)
   State ← ShiftRow(State)
   State ← MixColumn(State)
   State ← AddRoundKey(State, RoundKey[i])
Final Round:
   State ← ByteSub(State)
   State ← ShiftRow(State)
   State ← AddRoundKey(State, RoundKey[Nr])
The Round Transformation
• Sequential Model
p(n) = 1
t(n) = O(Nb x Nr)
• Fully Parallel Model
p(n) = O(Nb)
t(n) = O(Nr)
s(n) = O(Nb)
c(n) = O(Nb x Nr)
We have achieved
cost-optimality!
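The step counts on this slide can be reproduced by listing the sequence of sub-transformations. The Python sketch below follows the Rijndael specification, which starts with an AddRoundKey using round key 0 before the Nr rounds. The unit costs are assumptions chosen to mirror the slides' models: 4 × Nb steps per sub-transformation on one processor and 1 step with 4 × Nb processors. Function names are illustrative.

```python
from typing import List, Tuple

def round_schedule(nr: int) -> List[Tuple[str, int]]:
    """Sequence of (sub-transformation, round index) applied during encryption."""
    ops: List[Tuple[str, int]] = [("AddRoundKey", 0)]                   # initial key addition
    for i in range(1, nr):                                              # rounds 1 .. Nr-1
        ops += [("ByteSub", i), ("ShiftRow", i), ("MixColumn", i), ("AddRoundKey", i)]
    ops += [("ByteSub", nr), ("ShiftRow", nr), ("AddRoundKey", nr)]     # final round, no MixColumn
    return ops

def round_steps(nb: int, nr: int, parallel: bool) -> int:
    """Hypothetical step count: each sub-transformation costs 4*Nb steps on one
    processor (O(Nb)) and 1 step with 4*Nb processors (O(1))."""
    per_op = 1 if parallel else 4 * nb
    return per_op * len(round_schedule(nr))

seq, par = round_steps(4, 10, False), round_steps(4, 10, True)
print(seq, par, seq // par)   # the ratio is 4 * Nb, i.e. O(Nb) speedup
```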
Key Expansion Algorithm
For j = 1 to Nk
   W[j] = (Key[4j-4], Key[4j-3], Key[4j-2], Key[4j-1])
For j = Nk+1 to Nb x (Nr+1)
   temp = W[j-1]
   if ((j-1) mod Nk = 0)
      temp = SubByte(RotByte(temp)) XOR Rcon[(j-1)/Nk]
   else if (Nk > 6 and (j-1) mod Nk = 4)
      temp = SubByte(temp)
   W[j] = W[j-Nk] XOR temp
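The Python sketch below implements this key expansion for Nk in {4, 6, 8}, assuming Nb = 4 by default. It uses 0-based word indices, so its `j` is one less than the j in the pseudocode, and it builds the Rcon values (01, 02, 04, and so on) by repeated multiplication by x in GF(2^8). The helper names are illustrative, and the S-box is regenerated so the block is self-contained. The sample key is the AES-128 example key from FIPS-197, Appendix A.1.

```python
from typing import List

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r

def sbox(a: int) -> int:
    """SubByte on one byte: inverse in GF(2^8) (a**254, 0 -> 0), then the affine map."""
    inv, p, e = 1, a, 254
    while e:
        if e & 1:
            inv = gf_mul(inv, p)
        p = gf_mul(p, p)
        e >>= 1
    y = inv
    for s in range(1, 5):
        y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return y ^ 0x63

def sub_word(w: int) -> int:
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(sbox(b) for b in w.to_bytes(4, "big")), "big")

def rot_word(w: int) -> int:
    """RotByte: (a0, a1, a2, a3) -> (a1, a2, a3, a0)."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF

def expand_key(key: bytes, nb: int = 4) -> List[int]:
    """Return the Nb*(Nr+1) expanded-key words W[0..] (0-based)."""
    nk = len(key) // 4
    if len(key) % 4 or nk not in (4, 6, 8):
        raise ValueError("key must be 16, 24, or 32 bytes")
    nr = max(nb, nk) + 6
    w = [int.from_bytes(key[4 * j:4 * j + 4], "big") for j in range(nk)]
    rcon = 0x01                                   # Rcon for the first block boundary
    for j in range(nk, nb * (nr + 1)):
        temp = w[j - 1]
        if j % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)             # next Rcon value
        elif nk > 6 and j % nk == 4:
            temp = sub_word(temp)
        w.append(w[j - nk] ^ temp)
    return w

w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(w), hex(w[4]), hex(w[-1]))
```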
Key Expansion Algorithm on a
Uniprocessor (Sequential) Machine
Basic Algorithm Structure (1 processor):
For j = 1 to Nk                      (Nk iterations)
   {…}
For j = Nk+1 to Nb x (Nr+1)          (Nb x (Nr + 1) - Nk iterations)
   {…}
Total: Nb x (Nr + 1) iterations, so t(n) = O(Nb x Nr)
Key Expansion Algorithm on a
Parallel Machine
• The loop-carried dependence appears to render
the algorithm impossible to parallelize…
For j = Nk+1 to Nb x (Nr+1)
temp = W[j-1]
…
W[j] = W[j-Nk] XOR temp
• … Observe that XOR is a binary associative operation.
• This algorithm is simply a variant of Prefix Sum with XOR instead of +.
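The prefix-sum structure can be made explicit by unrolling the recurrence one Nk-word block at a time. For Nk ≤ 6, word m of block k equals T_k XOR (the prefix XOR of the first m + 1 words of block k - 1), where T_k is the transformed last word of block k - 1. The Python sketch below checks this identity against the word-by-word loop. It is an illustration under stated assumptions: it covers only Nk ≤ 6 (the Nk > 6 branch adds a second SubByte in mid-block), and `g` is a hypothetical stand-in for SubByte(RotByte(·)) XOR Rcon. Because T_k passes the previous block's last word through this nonlinear step, the sketch still produces blocks in order; the prefix-XOR part is the piece that lies inside each block.

```python
from itertools import accumulate
from operator import xor
from typing import Callable, List

Transform = Callable[[int, int], int]   # g(word, block index), hypothetical placeholder

def expand_sequential(first: List[int], total: int, g: Transform) -> List[int]:
    """Word-by-word recurrence (0-based, Nk <= 6):
    W[j] = W[j-Nk] XOR (g(W[j-1], j // Nk) if j % Nk == 0 else W[j-1])."""
    nk = len(first)
    w = list(first)
    for j in range(nk, total):
        temp = g(w[j - 1], j // nk) if j % nk == 0 else w[j - 1]
        w.append(w[j - nk] ^ temp)
    return w

def expand_blockwise(first: List[int], total: int, g: Transform) -> List[int]:
    """Same words; inside block k, W[k*Nk + m] = T_k XOR prefix_xor(previous block)[m]."""
    nk = len(first)
    w = list(first)
    k = 1
    while len(w) < total:
        prev = w[(k - 1) * nk : k * nk]
        t = g(w[k * nk - 1], k)                           # nonlinear step on the previous block
        block = [t ^ p for p in accumulate(prev, xor)]   # prefix XOR within the block
        w.extend(block[: total - len(w)])
        k += 1
    return w

def toy_g(word: int, k: int) -> int:
    """Hypothetical stand-in for SubByte(RotByte(word)) XOR Rcon[k]; any function works."""
    return ((word * 2654435761) ^ k) & 0xFFFFFFFF

first = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]
print(expand_sequential(first, 44, toy_g) == expand_blockwise(first, 44, toy_g))
```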
Key Expansion Algorithm
• To compute the prefix sum cost-optimally:
$$p(n) = O\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right), \qquad t(n) = O(\log Nb + \log Nr)$$
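These bounds follow from the cost-optimal prefix sum by taking the problem size to be the number of expanded-key words, n = Nb × (Nr + 1):

```latex
n = Nb\,(Nr + 1) = \Theta(Nb \times Nr), \qquad
\log n = \log Nb + \log (Nr + 1) = \Theta(\log Nb + \log Nr)

p(n) = O\!\left(\frac{n}{\log n}\right) = O\!\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right), \qquad
t(n) = O(\log n) = O(\log Nb + \log Nr)
```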
Round Key Selection
• Words W[Nb x i + 1] through W[Nb x (i+1)] form the round key for round i (i = 0, …, Nr):
W[1..Nb]
W[Nb+1..2Nb]
…
W[Nb x Nr + 1..Nb x (Nr+1)]
• Can be interleaved with the Key Expansion phase with no additional overhead.
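A Python sketch of round key selection, using 0-based indices so that round i takes words Nb·i through Nb·(i+1) - 1. The streaming variant illustrates the interleaving claim: a round key is emitted as soon as its Nb words exist, so selection adds no extra pass over the expanded key. Both function names are illustrative.

```python
from typing import Iterable, Iterator, List

def select_round_keys(w: List[int], nb: int, nr: int) -> List[List[int]]:
    """Round i (i = 0..Nr) uses words W[Nb*i .. Nb*(i+1)-1] (0-based)."""
    if len(w) < nb * (nr + 1):
        raise ValueError("expanded key too short")
    return [w[nb * i : nb * (i + 1)] for i in range(nr + 1)]

def round_keys_streaming(words: Iterable[int], nb: int) -> Iterator[List[int]]:
    """Yield each round key as soon as its Nb words have been produced."""
    buf: List[int] = []
    for word in words:
        buf.append(word)
        if len(buf) == nb:
            yield buf
            buf = []

keys = select_round_keys(list(range(44)), nb=4, nr=10)
print(len(keys), keys[0], keys[-1])
```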
Key Schedule
• Sequential Algorithm
$$p(n) = 1, \qquad t(n) = O(Nb \times Nr)$$
• Parallel (Prefix-Sum) Algorithm
$$p(n) = O\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right), \qquad t(n) = O(\log Nb + \log Nr)$$
The Rijndael Cipher:
Sequential Model
• Key Schedule: $p(n) = 1$, $t(n) = O(Nb \times Nr)$
• Round Transformation: $p(n) = 1$, $t(n) = O(Nb \times Nr)$
• Overall: $p(n) = 1$, $t(n) = O(Nb \times Nr)$
The Rijndael Cipher:
Parallel Model
• Key Schedule: $p(n) = O\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right)$, $t(n) = O(\log Nb + \log Nr)$
• Round Transformation: $p(n) = O(Nb)$, $t(n) = O(Nr)$
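The Rijndael Cipher:
Parallel Model
Altogether:
$$t(n) = O(Nr), \qquad p(n) = O\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right), \qquad s(n) = O(Nb)$$
$$c(n) = t(n)\,p(n) = O\left(\frac{Nb \times Nr^2}{\log Nb + \log Nr}\right), \text{ which is not } O(Nb \times Nr)$$
This model does NOT yield a cost-optimal solution!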
Achieving Cost Optimality with a
Parallel Model of Computation
• Reduce the number of processors from $p(n) = O(Nb)$ to $p(n) = O\left(\frac{Nb}{\log Nb}\right)$.
• The Round Transformation then requires $O(\log Nb)$ time per round, so $t(n) = O(Nr \log Nb)$ over all Nr rounds.
• The Key Schedule requires time $t(n) = O(Nr \log Nb)$.
Achieving Cost Optimality
• Final Results:
$$t(n) = O(Nr \log Nb), \qquad p(n) = O\left(\frac{Nb}{\log Nb}\right)$$
• Speedup and Cost:
$$c(n) = O(Nb \times Nr), \qquad s(n) = O\left(\frac{Nb}{\log Nb}\right)$$
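The cost and speedup follow from the cost model, taking the sequential time as Θ(Nb × Nr) and treating the bounds as tight:

```latex
c(n) = t(n)\,p(n) = O(Nr \log Nb) \cdot O\!\left(\frac{Nb}{\log Nb}\right) = O(Nb \times Nr)

s(n) = \frac{t(n)\big|_{p(n)=1}}{t(n)\big|_{p(n)=N}}
     = \frac{\Theta(Nb \times Nr)}{\Theta(Nr \log Nb)}
     = \Theta\!\left(\frac{Nb}{\log Nb}\right)
```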
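Summary of Results
• Fastest Model
$$p(n) = O\left(\frac{Nb \times Nr}{\log Nb + \log Nr}\right), \quad t(n) = O(Nr), \quad s(n) = O(Nb), \quad c(n) = O\left(\frac{Nb \times Nr^2}{\log Nb + \log Nr}\right)$$
• Cost-Optimal Model
$$p(n) = O\left(\frac{Nb}{\log Nb}\right), \quad t(n) = O(Nr \log Nb), \quad s(n) = O\left(\frac{Nb}{\log Nb}\right), \quad c(n) = O(Nb \times Nr)$$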
Outline
• Introduction
• Background Material
• Analysis of the Rijndael Cipher
• Concluding Remarks
Concluding Remarks
• First theoretical study of the parallelism
inherent in the Rijndael AES.
• The fastest parallel model was not cost-optimal; some speedup was sacrificed in order to achieve cost optimality.