White Paper
Low-Speed Rijndael Encryption/Decryption
Processors (Advanced Encryption Standard (AES))
Introduction
The Altera® low-speed Rijndael encryption/decryption processors implement the Rijndael encryption or decryption
algorithms, and are optimized for Altera FLEX™ 10KE and APEX® 20K devices.
The cores are parameterized for key and block sizes. All combinations of key (128, 192, and 256 bits) and block
lengths (128, 192, and 256 bits) are supported.
A test case generator included with the core converts a file containing key and block data into a simulation file for
the core.
The Rijndael Algorithm
The Rijndael algorithm processes an encryption or decryption operation in a number of rounds: 10, 12, or 14,
depending on the key and block sizes. The key is expanded into a much larger keyspace, which holds one block-sized
round key for each round plus one for the initial key addition (176 bytes for a 128-bit block and 10 rounds, as listed
in Table 4). The block and keyspace are stored as matrices of four rows, with Nb columns for the block and Nk
columns for the key.
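The round count and keyspace size can be worked out from Nb and Nk. The following Python sketch is illustrative only and is not part of the core package. The function names are hypothetical, and the formulas are assumptions taken from the Rijndael specification rather than from this paper: rounds = max(Nb, Nk) + 6, and one round key of 4 × Nb bytes per round plus one for the initial key addition. The results agree with the Rounds and Key Space columns of Table 4.

```python
def rounds(nb: int, nk: int) -> int:
    """Number of Rijndael rounds for Nb block columns and Nk key columns."""
    return max(nb, nk) + 6


def key_space_bytes(nb: int, nk: int) -> int:
    """Expanded key size in bytes: one round key of 4 * Nb bytes per round,
    plus one more for the initial key addition."""
    return 4 * nb * (rounds(nb, nk) + 1)


if __name__ == "__main__":
    # Print the round count and keyspace size for every supported combination.
    for nb in (4, 6, 8):
        for nk in (4, 6, 8):
            print(nb, nk, rounds(nb, nk), key_space_bytes(nb, nk))
```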
An encryption round consists of the following operations (decryption reverses the order of operations for each
round):
■ Byte-by-byte finite field substitution
■ Matrix index shuffling
■ Matrix multiply over a finite field
■ Key addition
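Two of the operations listed above, the matrix multiply over a finite field and the key addition, are short enough to sketch in software. The Python below is a minimal model of the arithmetic only; it does not describe how the core implements these steps in hardware. The function names are hypothetical, and the field polynomial (0x11B) and matrix coefficients (02 03 01 01) are assumptions taken from the Rijndael specification, not from this paper.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (that is, by 2) in GF(2^8), reducing by the
    Rijndael polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def mix_column(col):
    """Multiply one 4-byte column by the fixed matrix whose rows are
    rotations of (02, 03, 01, 01)."""
    return [
        gf_mul(col[i], 2) ^ gf_mul(col[(i + 1) % 4], 3)
        ^ col[(i + 2) % 4] ^ col[(i + 3) % 4]
        for i in range(4)
    ]


def add_round_key(col, key_col):
    """Key addition: byte-wise XOR of a state column with a round-key column."""
    return [s ^ k for s, k in zip(col, key_col)]
```

For example, mix_column([0xdb, 0x13, 0x53, 0x45]) returns [0x8e, 0x4d, 0xa1, 0xbc]. Because key addition is an XOR, applying add_round_key twice with the same key column restores the original column, which is why decryption can reuse the same operation.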
Ports and Parameters
Table 1 shows the parameters, which set the key and block sizes.
Table 1 Parameters
Parameter   Description
---------   -----------
Nb          As defined in the Rijndael specification: the number of columns in the block. Nb = 4 corresponds
            to a 128-bit block, Nb = 6 is for 192 bits, and Nb = 8 is for 256 bits. Any combination of Nb and
            Nk can be chosen.
Nk          As defined in the Rijndael specification: the number of columns in the key. Nk = 4 corresponds
            to a 128-bit key, Nk = 6 is for 192 bits, and Nk = 8 is for 256 bits. Any combination of Nb and
            Nk can be chosen.
WP-LWSPDRIJNDL-1.0
Date: June 2002
Altera Corporation
Low-Speed Rijndael Encryption/Decryption Processors
Table 2 shows the input signals.
Table 2 Input Signals
Signal Name     Description
-----------     -----------
SYSCLK          SYSCLK is the main system clock.
RESET           The core is asynchronously reset when the RESET signal is asserted high. This prepares the
                core to receive another plaintext (in the case of an encryptor) or ciphertext (in the case of
                a decryptor) input. The RESET signal does not clear the previously calculated key expansion,
                which can be re-used. If a different key is needed from the one used in the previous cycle, a
                new key must be loaded and calculated first.
DATALOAD        When this signal is asserted, data on the DATAIN[] bus is latched into the core.
GO              When all data for an encryption or decryption operation has been loaded into the core, GO is
                asserted to start data processing. GO must be held high until processing is complete
                (indicated by ENCDONE or DECDONE, depending on the core).
DATAIN[8..1]    Input bytes are written into the core using this bus when the DATALOAD input is set high.
                A total of 4 × Nb bytes are written into the core during an input cycle.
ENCADD[5..1]    Data bytes written into the core must be accompanied by an address, on ENCADD[] (for an
DECADD[5..1]    encryption core) or DECADD[] (for a decryption core).
KEYLOAD         When this signal is asserted, data on the KEYIN[] bus is latched into the core.
KEYIN[8..1]     Key bytes are written into the core using this bus when the KEYLOAD input is asserted.
                A total of 4 × Nk bytes are written into the core during an input cycle. After a new key is
                loaded, the keyspace must be calculated. Once the keyspace is calculated, it does not have to
                be calculated again for any subsequent encryption or decryption operation, until the key
                needs to be changed.
KEYCALC         Once a new key is loaded, the KEYCALC signal is asserted and held until the key calculation
                is completed (indicated by KEYDONE).
Table 3 shows the output signals.
Table 3 Output Signals
Signal Name     Description
-----------     -----------
KEYDONE         The core asserts KEYDONE when the key space has been calculated. At this point, data can be
                loaded into the core and processed.
ENCDONE         The core asserts ENCDONE (encryption cores), or DECDONE (decryption cores), when the
DECDONE         encryption or decryption operation is complete.
ENCOUT[8..1]    Encrypted or decrypted bytes can be read out of this bus, once processing is complete.
DECOUT[8..1]    ENCADD[] or DECADD[] are used to read out the bytes in the desired sequence.
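The byte counts in Table 2 can be checked against the width of the address bus. The Python sketch below is illustrative only, with hypothetical function names. It assumes that the 5-bit ENCADD[]/DECADD[] bus selects one byte location per address value, which the paper implies but does not state.

```python
ADDRESS_BITS = 5  # width of ENCADD[5..1] and DECADD[5..1]


def bytes_per_load(columns: int) -> int:
    """Bytes written for one block (columns = Nb) or one key (columns = Nk)."""
    if columns not in (4, 6, 8):
        raise ValueError("Nb and Nk must be 4, 6, or 8")
    return 4 * columns


def fits_address_bus(columns: int, address_bits: int = ADDRESS_BITS) -> bool:
    """True if every byte of the block or key has its own address value.
    Assumption: one byte location per address."""
    return bytes_per_load(columns) <= 2 ** address_bits
```

Under that assumption, the largest case (Nb = 8 or Nk = 8, 32 bytes) uses all 32 values of the 5-bit address.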
Core Interface
The following figures illustrate the relationship between control and data signals needed to load a key, calculate a
keyspace, load plaintext, perform an encryption operation, and read the ciphertext out. The example is for an
encryption processor with Nb = 4 and Nk = 4. The signal relationships are the same for any parameter combination,
but the processing times quoted apply only to this parameter combination.
Figure 1 shows the loading of a key into the core. Once the key is loaded, the KEYCALC control can be asserted.
Figure 1: Loading a Key
(Signals shown: SYSCLK, RESET, ENCADD, KEYLOAD, KEYIN[], KEYCALC)
The key space calculation requires 190 clock cycles.
Figure 2 shows the loading of plaintext into the core. Once the data loading is complete, GO is used to start the
encryption processing.
Figure 2: Loading Plaintext
(Signals shown: SYSCLK, RESET, DATALD, GO, DATAIN[], ENCADD)
The encryption operation requires 260 clock cycles.
The ciphertext can then be read out (see Figure 3). There is a two-cycle latency between the address input and the
ciphertext output.
Figure 3: Ciphertext Output
(Signals shown: SYSCLK, RESET, DATALD, GO, DATAIN[], ENCADD)
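The order of operations shown in Figures 1 to 3 can be summarized as a list of control steps. The Python sketch below is a hypothetical host-side model, not a simulation of the core. It assumes that addresses count up from 0, that one byte is transferred per step, and that key bytes are addressed on ENCADD[] (Figure 1 lists ENCADD among the key-loading signals). The cycle counts are the ones quoted in the text for Nb = 4 and Nk = 4.

```python
KEY_CALC_CLOCKS = 190  # key space calculation, Nb = 4, Nk = 4
ENCRYPT_CLOCKS = 260   # encryption operation, Nb = 4, Nk = 4
READ_LATENCY = 2       # cycles from address input to ciphertext output


def encryption_steps(key: bytes, plaintext: bytes):
    """Return the ordered (phase, signals) steps for one encryption with a new key."""
    steps = []
    # 1. Load the key, one byte per step, with KEYLOAD asserted.
    for addr, byte in enumerate(key):
        steps.append(("load_key", {"KEYLOAD": 1, "ENCADD": addr, "KEYIN": byte}))
    # 2. Assert KEYCALC and hold it until the core asserts KEYDONE.
    steps.append(("calc_key", {"KEYCALC": 1, "wait_for": "KEYDONE",
                               "clocks": KEY_CALC_CLOCKS}))
    # 3. Load the plaintext, one byte per step, with DATALOAD asserted.
    for addr, byte in enumerate(plaintext):
        steps.append(("load_data", {"DATALOAD": 1, "ENCADD": addr, "DATAIN": byte}))
    # 4. Assert GO and hold it until the core asserts ENCDONE.
    steps.append(("encrypt", {"GO": 1, "wait_for": "ENCDONE",
                              "clocks": ENCRYPT_CLOCKS}))
    # 5. Read the ciphertext; each byte appears READ_LATENCY cycles after its address.
    for addr in range(len(plaintext)):
        steps.append(("read", {"ENCADD": addr, "latency": READ_LATENCY}))
    return steps
```

For subsequent blocks under the same key, steps 1 and 2 are skipped, because RESET does not clear the calculated key expansion.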
Compiling the Core
The core is optimized for Altera FLEX 10KE and APEX 20K devices. It must be compiled into a device that
supports dual-port RAM. For best results, the core should be compiled with a synthesis style that supports carry
chains.
Tables 4 and 5 show resource requirements and performance for the encryption and decryption cores, respectively,
with all parameter combinations. The number of clocks required for key and code generation includes loading the key
or text information into the core.
Table 4 Area and Performance of RJENCAA
Nb   Nk   LEs   EAB/ESB   Rounds   Key Gen. Rounds   Key Space (bytes)   Key Gen Clocks   Code Gen Clocks
--   --   ---   -------   ------   ---------------   -----------------   --------------   ---------------
4    4    516   5         10       10                176                 196              276
4    6    516   5         12       8                 208                 232              328
4    8    526   5         14       7                 240                 270              380
6    4    546   5         12       19                312                 358              432
6    6    546   5         12       12                312                 366              432
6    8    559   5         14       11                360                 406              498
8    4    549   5         14       29                480                 538              620
8    6    547   5         14       20                480                 544              620
8    8    561   5         14       14                480                 508              620
Table 5 Area and Performance of RJDECAA
Nb   Nk   LEs   EAB/ESB   Rounds   Key Gen. Rounds   Key Space (bytes)   Key Gen Clocks   Code Gen Clocks
--   --   ---   -------   ------   ---------------   -----------------   --------------   ---------------
4    4    593   5         10       10                176                 196              276
4    6    593   5         12       8                 208                 232              328
4    8    601   5         14       7                 240                 270              380
6    4    621   5         12       19                312                 358              432
6    6    621   5         12       12                312                 366              432
6    8    635   5         14       11                360                 406              498
8    4    628   5         14       29                480                 538              620
8    6    627   5         14       20                480                 544              620
8    8    644   5         14       14                480                 508              620
In a 10KE-1 device, the cores typically achieve an fMAX of 78 MHz.
In a 20KE-1 device, the cores typically achieve an fMAX of 95 MHz.
Core Performance
The core performance (in Mbit/s) is calculated from the clock frequency and the number of clocks per block.
Because the keyspace is calculated only once per key, it does not affect the throughput, except for the first
operation with a new key.
Performance (Mbit/s) = fMAX (MHz) × (bits per block)/(code clocks per block)
In the case of Nb = 4 and Nk = 4, in a 20KE-1 device:
Performance = 95 × 128/276
Performance = 44 Mbit/s
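The same calculation can be written as a short Python sketch, which also shows how the one-time key calculation affects throughput when only a few blocks are processed per key. The function names are hypothetical. The clock counts are the Nb = 4, Nk = 4 values from Table 4 (276 code generation clocks, 196 key generation clocks), and the amortized formula assumes that key setup and block processing do not overlap.

```python
def throughput_mbps(fmax_mhz: float, block_bits: int, code_clocks: int) -> float:
    """Steady-state throughput in Mbit/s once the keyspace has been calculated."""
    return fmax_mhz * block_bits / code_clocks


def throughput_with_key_setup(fmax_mhz: float, block_bits: int, code_clocks: int,
                              key_clocks: int, blocks: int) -> float:
    """Average throughput in Mbit/s over `blocks` blocks, including one key setup.
    Assumption: key setup and block processing run back to back, with no overlap."""
    total_bits = block_bits * blocks
    total_clocks = key_clocks + code_clocks * blocks
    return fmax_mhz * total_bits / total_clocks


if __name__ == "__main__":
    print(throughput_mbps(95, 128, 276))
    print(throughput_with_key_setup(95, 128, 276, 196, blocks=1))
```

With a single block per key, the average falls to about 25.8 Mbit/s. As the number of blocks per key grows, it approaches the steady-state figure of about 44 Mbit/s.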
Testing the Core
A utility included with the core package reads a key and plaintext sequence from a file, encrypts the plaintext, and
decrypts the ciphertext back into plaintext. It uses the plaintext and ciphertext to generate test cases for the Altera
tools, for the encryption and decryption processors respectively.
Example Test – Standard Sequence
The data format in the test source file must be byte wide hexadecimal, from ‘00’ to ‘FF’.
First create a testing file foo.txt, containing the following data:
12 13 23 34 45 56 67 78
89 ff ee dd ea a1 b1 b2
0a 0b 0c 0d 44 55 66 77
1a 1b 1c 1d 1 2 3 4
The utility parses the data into a key, followed by plaintext.
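The parsing step can be sketched in Python as follows. This is not the utility's source code, which is not shown in this paper. The function name is hypothetical, and the sketch assumes that the file holds whitespace-separated hexadecimal bytes, with the key bytes first and the plaintext bytes immediately after.

```python
SAMPLE = """\
12 13 23 34 45 56 67 78
89 ff ee dd ea a1 b1 b2
0a 0b 0c 0d 44 55 66 77
1a 1b 1c 1d 1 2 3 4
"""


def parse_test_data(text: str, key_bits: int, block_bits: int):
    """Split whitespace-separated hex bytes into (key, plaintext)."""
    values = [int(token, 16) for token in text.split()]
    if any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("every value must be a byte from 00 to FF")
    key_len, block_len = key_bits // 8, block_bits // 8
    if len(values) < key_len + block_len:
        raise ValueError("not enough bytes for the requested key and block sizes")
    key = bytes(values[:key_len])
    plaintext = bytes(values[key_len:key_len + block_len])
    return key, plaintext


if __name__ == "__main__":
    key, plaintext = parse_test_data(SAMPLE, 128, 128)
    print("Key:", key.hex())
    print("Plaintext:", plaintext.hex())
```

Single-digit entries such as 1 are read as one byte (01), matching the way the example's last four values appear in the plaintext.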
Now, run the utility:
FIVE 128 128
(Running the utility without parameters returns the parameter list – number of key bits, number of block bits.)
The utility returns:
Key: 121323344556677889ffeeddeaa1b1b2
Plaintext: 0a0b0c0d445566771a1b1c1d01020304
Ciphertext: 69bf969db6c11b1ad3c66461f743a09f
Plaintext: 0a0b0c0d445566771a1b1c1d01020304
The utility simultaneously generates testcases for the RJENCAA.TDF (encryption) and RJDECAA.TDF
(decryption) cores.
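The two arguments given to the utility (number of key bits, number of block bits) correspond to the Nk and Nb parameters of Table 1, at 32 bits per column. The Python sketch below shows that mapping. The function names are hypothetical and are not part of the utility or the core package.

```python
def columns_from_bits(bits: int) -> int:
    """Convert a key or block size in bits to a column count (Nk or Nb)."""
    if bits not in (128, 192, 256):
        raise ValueError("size must be 128, 192, or 256 bits")
    return bits // 32  # each column holds four bytes


def core_parameters(key_bits: int, block_bits: int) -> dict:
    """Map the utility's arguments (key bits, block bits) to core parameters."""
    return {"nk": columns_from_bits(key_bits), "nb": columns_from_bits(block_bits)}
```

For the example above, core_parameters(128, 128) gives nk = 4 and nb = 4, which are the default values in the top-level wrappers in Appendix A.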
Now, run the testcases.
In the simulator window, choose the Inputs/Outputs option (File menu), select the vector file RJENCAA., and start the
simulation.
After simulation is complete, the ciphertext bytes are read out, left to right, on the ENCOUT[] bus.
The simulation of the decryption core is handled identically.
Appendix A—Top Level Wrapper
Unencrypted top level wrappers for both cores (TOP_LEVEL_RJENCAA.TDF and
TOP_LEVEL_RJDECAA.TDF, respectively) are provided, to make it easier to instantiate the cores. The source
code for the two wrappers follows.
TOP_LEVEL_RJENCAA.TDF
FUNCTION rjencaa (sysclk, reset, datain[8..1], keyin[8..1], encadd[5..1],
                  dataload, keyload, keycalc, go)
    RETURNS (keydone, encdone, encout[8..1]);

PARAMETERS
(
    nb = 4,
    nk = 4
);

subdesign top_level_rjencaa
(
    sysclk, reset                           : INPUT;
    datain[8..1], keyin[8..1], encadd[5..1] : INPUT;
    dataload, keyload                       : INPUT;
    keycalc, go                             : INPUT;
    keydone, encdone                        : OUTPUT;
    encout[8..1]                            : OUTPUT;
)
BEGIN
    (keydone, encdone, encout[8..1]) = rjencaa (sysclk, reset, datain[8..1],
                                                keyin[8..1], encadd[5..1],
                                                dataload,
                                                keyload, keycalc, go)
                                       WITH (nb=nb,nk=nk);
END;
TOP_LEVEL_RJDECAA.TDF
FUNCTION rjdecaa (sysclk, reset, datain[8..1], keyin[8..1], decadd[5..1],
                  dataload, keyload, keycalc, go)
    RETURNS (keydone, decdone, decout[8..1]);

PARAMETERS
(
    nb = 4,
    nk = 4
);

subdesign top_level_rjdecaa
(
    sysclk, reset                           : INPUT;
    datain[8..1], keyin[8..1], decadd[5..1] : INPUT;
    dataload, keyload                       : INPUT;
    keycalc, go                             : INPUT;
    keydone, decdone                        : OUTPUT;
    decout[8..1]                            : OUTPUT;
)
BEGIN
    (keydone, decdone, decout[8..1]) = rjdecaa (sysclk, reset, datain[8..1],
                                                keyin[8..1], decadd[5..1],
                                                dataload, keyload, keycalc, go)
                                       WITH (nb=nb,nk=nk);
END;
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
Copyright © 2002 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device
designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and
service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders.
Altera products are protected under numerous U.S. and foreign patents and pending applications, mask work rights, and copyrights. Altera warrants
performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make
changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain
the latest version of device specifications before relying on any published information and before placing orders for products or services.