White Paper
High-Speed Rijndael Encryption/Decryption
Processors (Advanced Encryption Standard (AES))
Introduction
The Altera® high-speed Rijndael encryption/decryption processors are optimized for Altera APEX™ 20K devices.
The current versions of the high-speed cores support block lengths of 128 bits and key lengths of 128 bits only. Two
versions of the encryption/decryption cores are supported; one has 128-bit parallel inputs and outputs, and the other
has a 16-bit wide bus (multi-cycle access) for all inputs and outputs.
A test case generator is included with the core, which converts a file containing key and block data into a simulation
file for the core.
Core List
The four cores are:
- RJPECAA: Encryptor with 128-bit wide I/O.
- RJPDCAA: Decryptor with 128-bit wide I/O.
- RJPESAA: Encryptor with 16-bit wide bus I/O.
- RJPDSAA: Decryptor with 16-bit wide bus I/O.
The naming convention is as follows:
RiJndael Parallel (Encryptor/Decryptor) (Core/System) AA
where:
- Parallel: high-speed core (as opposed to the low-speed Hammercores by Altera Rijndael processors)
- Encryptor/Decryptor: E for an encryptor, D for a decryptor
- Core: C, a core with 128-bit wide I/O
- System: S, a core with a multi-cycle access bus
- AA: version
The Rijndael Algorithm
The Rijndael algorithm processes an encryption or decryption operation in a number of rounds; 10 rounds are used when
the block and key lengths are both 128 bits. The key is expanded into a much larger keyspace, whose size is the number
of block bits times the number of rounds (128 × 10 = 1,280 bits of round keys derived from the original key). The block
and each 128-bit portion of the keyspace are stored in byte format as a matrix of four rows; the 128-bit block and key
lengths give four columns.
An encryption round consists of the following operations (decryption reverses the order of operations for each
round):
- Byte by Byte Finite Field Substitution
- Matrix Index Shuffling
- Matrix Multiply over a Finite Field
- Key Addition
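The four round operations listed above can be illustrated in software. The following is a minimal Python sketch of 128-bit Rijndael encryption, not a description of how the cores are implemented. All function names are this sketch's own. It assumes the standard AES details that the list above does not spell out: an initial key addition with the original key before the first round, and no matrix multiply in the tenth round.

```python
# Illustrative AES-128 (Rijndael, 128-bit block and key) encryption.
# Function names are hypothetical; they are not taken from the cores.

def xtime(a):
    """Multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """Finite field substitution table: inverse in GF(2^8), then affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):         # XOR with four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def sub_bytes(s):
    """Byte by byte finite field substitution."""
    return [SBOX[b] for b in s]

def shift_rows(s):
    """Matrix index shuffling: byte i is in row i % 4, column i // 4;
    row r rotates left by r columns."""
    return [s[(i % 4) + 4 * ((i // 4 + i % 4) % 4)] for i in range(16)]

def mix_columns(s):
    """Matrix multiply over the finite field, one column at a time."""
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out

def add_key(s, k):
    """Key addition (bytewise XOR)."""
    return [x ^ y for x, y in zip(s, k)]

def expand_key(key):
    """Return the original key followed by the 10 derived round keys."""
    round_keys = [list(key)]
    rcon = 1
    for _ in range(10):
        prev = round_keys[-1]
        t = [SBOX[b] for b in prev[13:16] + prev[12:13]]  # rotate, substitute
        t[0] ^= rcon
        nxt = []
        for i in range(16):
            nxt.append(prev[i] ^ (t[i] if i < 4 else nxt[i - 4]))
        round_keys.append(nxt)
        rcon = xtime(rcon)
    return round_keys

def encrypt_block(plain, key):
    keys = expand_key(key)
    state = add_key(list(plain), keys[0])     # initial key addition
    for rnd in range(1, 11):
        state = shift_rows(sub_bytes(state))
        if rnd < 10:                          # last round skips the multiply
            state = mix_columns(state)
        state = add_key(state, keys[rnd])
    return bytes(state)

key = bytes(range(16))
plain = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block(plain, key).hex())
```

The key and plaintext used here are the AES-128 example vector from FIPS-197 Appendix C.1, for which the published ciphertext is 69c4e0d86a7b0430d8cdb78070b4c55a. The substitution table is computed at start-up rather than stored as a literal, which keeps the sketch short at the cost of speed.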
WP-HGHSPDRIJNDL-1.0
Date: June 2002
Altera Corporation
High-Speed Rijndael Encryption/Decryption Processors
Ports and Parameters
The ports for all four cores, RJPECAA, RJPDCAA, RJPESAA, and RJPDSAA are described separately.
There are no parameters for any of the four cores.
RJPECAA Ports
Table 1 RJPECAA Input Signals
Signal               Description
SYSCLK               SYSCLK is the main system clock. After loading (using LOAD), 11 clocks are
                     required to compute the encryption result.
RESET                The core is asynchronously reset when the RESET signal is asserted high. The
                     core does not need to be reset between encryption cycles.
LOAD                 When this signal is asserted, data on the PLAIN[16..1][8..1] and
                     KEYIN[16..1][8..1] buses is latched into the core. This signal needs to be
                     asserted for only one clock, and must be de-asserted before the CALC signal
                     is asserted.
CALC                 When CALC is asserted, processing begins. CALC must be held high until
                     processing is complete (indicated by the ENCDONE signal). The keyspace
                     calculation is performed at the same time as the encryption operation.
PLAIN[16..1][8..1]   Plaintext input bytes are written into the core using this bus when the LOAD
                     input is asserted.
KEYIN[16..1][8..1]   Key input bytes are written into the core using this bus when the LOAD input
                     is asserted.
Table 2 RJPECAA Output Signals
Signal               Description
ENCDONE              The core asserts ENCDONE when the encryption is complete.
CIPHER[16..1][8..1]  Ciphertext bytes are output on this bus when the ENCDONE signal is asserted.
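The control sequence described in Tables 1 and 2 (LOAD for one clock, then CALC held high until ENCDONE) can be sketched as a list of per-clock stimulus values. This Python sketch is only an illustration: the function name is hypothetical, and it assumes that ENCDONE arrives after the nominal 11 clocks of CALC. A real testbench should hold CALC until ENCDONE is actually observed.

```python
# Sketch of the RJPECAA control sequence as (cycle, LOAD, CALC) tuples.
# rjpecaa_stimulus is a hypothetical helper, not part of the core package.

def rjpecaa_stimulus(calc_clocks=11):
    """Return stimulus for one encryption: one LOAD clock, then CALC clocks."""
    vectors = [(0, 1, 0)]                 # cycle 0: latch PLAIN and KEYIN
    for cycle in range(1, calc_clocks + 1):
        vectors.append((cycle, 0, 1))     # LOAD de-asserted, CALC held high
    return vectors

for cycle, load, calc in rjpecaa_stimulus():
    print(f"cycle {cycle:2d}: LOAD={load} CALC={calc}")
```

LOAD and CALC are never high in the same cycle, which reflects the requirement that LOAD be de-asserted before CALC is asserted.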
RJPDCAA Ports
Table 3 RJPDCAA Input Signals
Signal               Description
SYSCLK               SYSCLK is the main system clock.
RESET                The core is asynchronously reset when the RESET signal is asserted high. The
                     core does not need to be reset between decryption cycles.
LOADKEY              When this signal is asserted, data on the KEYIN[16..1][8..1] bus is latched
                     into the core. Once a key has been loaded and the keyspace calculated,
                     neither step has to be repeated until the key is changed. The signal must be
                     de-asserted before CALCKEY is asserted.
CALCKEY              When this signal is asserted, the keyspace is calculated from the key that
                     was previously loaded with LOADKEY. CALCKEY must remain high until the
                     KEYDONE signal is asserted. While a key is being loaded or calculated, the
                     core cannot decrypt data. A keyspace must be calculated before any
                     decryption operation can start. A total of 10 clock cycles are required to
                     calculate a new keyspace.
LOAD                 When this signal is asserted, data on the CIPHER[16..1][8..1] bus is latched
                     into the core. This signal needs to be asserted for only one clock, and must
                     be de-asserted before the CALC signal is asserted.
CALC                 When CALC is asserted, processing begins. CALC must be held high until
                     processing is complete (indicated by the DECDONE signal).
CIPHER[16..1][8..1]  Ciphertext input bytes are written into the core using this bus when the
                     LOAD input is asserted.
KEYIN[16..1][8..1]   Key input bytes are written into the core using this bus when the LOADKEY
                     input is asserted.
Table 4 RJPDCAA Output Signals
Signal               Description
KEYDONE              The core asserts KEYDONE when the keyspace calculation is complete.
DECDONE              The core asserts DECDONE when the decryption is complete.
PLAIN[16..1][8..1]   Decrypted plaintext bytes are output on this bus when the DECDONE signal is
                     asserted.
RJPESAA Ports
Table 5 RJPESAA Input Signals
Signal               Description
SYSCLK               SYSCLK is the main system clock.
RESET                The core is asynchronously reset when the RESET signal is asserted high. The
                     core does not need to be reset between encryption cycles.
LOADKEY              When this signal is asserted, data on the KEYIN[16..1] bus is latched into
                     the staging registers around the core. Once a key has been loaded, it does
                     not have to be loaded again until the key is changed. A total of eight
                     16-bit key portions are loaded. A new key can be staged while the previous
                     key and plaintext are still being processed by the core.
LOADTEXT             When this signal is asserted, data on the PLAIN[16..1] bus is latched into
                     the staging registers around the core. A total of eight 16-bit plaintext
                     portions are loaded. New plaintext can be staged while the previous key and
                     plaintext are still being processed by the core.
INITCORE             When this signal is asserted, both the plaintext and the key in the staging
                     registers are latched into the core. The input staging registers are
                     independent of the core, and no transfer takes place from the input staging
                     registers until the INITCORE signal is asserted.
CALCCORE             This signal enables the encryption operation, and must be held high until
                     the operation is complete, as indicated by the ENCDONE signal. The keyspace
                     calculation is performed at the same time as the encryption operation.
UNLOAD               Once encryption is complete, asserting UNLOAD shifts the result out of the
                     output staging registers, starting with the least significant word. The
                     encryption core can be unloaded while a new key and plaintext are being
                     input, and while an encryption operation is taking place. Note: The output
                     staging registers are loaded automatically with ENCDONE.
PLAIN[16..1]         Plaintext input words are written into the staging registers using this bus
                     when the LOADTEXT input is asserted. The least significant word is loaded
                     first.
KEYIN[16..1]         Key input words are written into the staging registers using this bus when
                     the LOADKEY input is asserted. The least significant word is loaded first.
Table 6 RJPESAA Output Signals
Signal               Description
ENCDONE              The core asserts ENCDONE when the encryption operation is complete. The
                     output staging registers are automatically loaded with the encryption
                     result.
CIPHER[16..1]        Encrypted words are shifted out of this bus, least significant word first,
                     when the UNLOAD signal is asserted.
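The 16-bit bus transfers described in Tables 5 and 6 move each 128-bit value as eight words, least significant word first. The Python sketch below shows one way to split and reassemble a block in that order. The helper names are hypothetical, and the sketch assumes the 16 bytes are read as a single big-endian 128-bit integer; the byte-to-word mapping actually used by the core should be checked against the generated test vectors.

```python
# Split a 128-bit block into eight 16-bit words, least significant word
# first, and rebuild it. Helper names and the big-endian byte order are
# assumptions made for illustration.

def to_words_lsw_first(block):
    """Return eight 16-bit words from a 16-byte block, LSW first."""
    if len(block) != 16:
        raise ValueError("block must be exactly 16 bytes")
    value = int.from_bytes(block, "big")
    return [(value >> (16 * i)) & 0xFFFF for i in range(8)]

def from_words_lsw_first(words):
    """Rebuild the 16-byte block from eight words given LSW first."""
    if len(words) != 8:
        raise ValueError("expected eight 16-bit words")
    value = 0
    for i, word in enumerate(words):
        value |= (word & 0xFFFF) << (16 * i)
    return value.to_bytes(16, "big")

block = bytes.fromhex("0a0b0c0d445566771a1b1c1d01020304")
words = to_words_lsw_first(block)
print([f"{w:04x}" for w in words])
```

With this ordering, the first word presented on PLAIN[16..1] for the block above would be 0304 and the last would be 0a0b.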
RJPDSAA Ports
Table 7 RJPDSAA Input Signals
Signal               Description
SYSCLK               SYSCLK is the main system clock.
RESET                The core is asynchronously reset when the RESET signal is asserted high. The
                     core does not need to be reset between decryption cycles.
LOADKEY              When this signal is asserted, data on the KEYIN[16..1] bus is latched into
                     the staging registers around the core. Once a key has been loaded, it does
                     not have to be loaded again until the key is changed. A total of eight
                     16-bit key portions are loaded. A new key can be staged while the previous
                     key and ciphertext are still being processed by the core.
LOADTEXT             When this signal is asserted, data on the CIPHER[16..1] bus is latched into
                     the staging registers around the core. A total of eight 16-bit ciphertext
                     portions are loaded. New ciphertext can be staged while the previous key and
                     ciphertext are still being processed by the core.
INITKEY              When this signal is asserted, the key in the staging registers is latched
                     into the core. The key staging registers are independent of the core, and no
                     transfer takes place from the key staging registers until the INITKEY signal
                     is asserted.
CALCKEY              This signal enables the keyspace calculation operation, and must be held
                     high until the keyspace has been calculated, as indicated by the KEYDONE
                     output. A keyspace must be calculated before any decryption operation can
                     take place, but once a keyspace has been calculated, it can be reused an
                     unlimited number of times without recalculation. A total of 10 clock cycles
                     are required to calculate a new keyspace.
INITCORE             When this signal is asserted, the ciphertext in the staging registers is
                     latched into the core. The input staging registers are independent of the
                     core, and no transfer takes place from the input staging registers until the
                     INITCORE signal is asserted.
CALCCORE             This signal enables the decryption operation, and must be held high until
                     the operation is complete, as indicated by the DECDONE signal.
UNLOAD               Once decryption is complete, asserting UNLOAD shifts the result out of the
                     output staging registers, starting with the least significant word. The
                     decryption core can be unloaded while new ciphertext is being input, and
                     while a decryption operation is taking place. Note: The output staging
                     registers are loaded automatically with DECDONE.
CIPHER[16..1]        Ciphertext input words are written into the staging registers using this bus
                     when the LOADTEXT input is asserted. The least significant word is loaded
                     first.
KEYIN[16..1]         Key input words are written into the staging registers using this bus when
                     the LOADKEY input is asserted. The least significant word is loaded first.
Table 8 RJPDSAA Output Signals
Signal               Description
KEYDONE              The core asserts KEYDONE when a keyspace calculation is complete.
DECDONE              The core asserts DECDONE when the decryption operation is complete. The
                     output staging registers are automatically loaded with the decryption
                     result.
PLAIN[16..1]         Decrypted words are shifted out of this bus, least significant word first,
                     when the UNLOAD signal is asserted.
Compiling the Core
The cores are optimized for Altera APEX 20K devices because they require a large amount of memory, especially the
decryptor cores.
Table 9 shows the resource requirements and fMAX of the four cores.
Table 9 Core Summary
Core       LEs    ESBs   fMAX (MHz)
RJPECAA    820    20     78
RJPESAA    1211   20     78
RJPDCAA    1437   28     60
RJPDSAA    1694   28     60
Core Performance
The core performance (in Mbits/s) can be calculated as follows. Because the keyspace needs to be calculated only once
per key, it does not affect the throughput, except for the first operation with a new key.
Performance = fMAX × (128 bits/block) / (11 clocks/block) ≈ fMAX × 11.64
For the encryptors (fMAX = 78 MHz), the throughput is 907 Mbps.
For the decryptors (fMAX = 60 MHz), the throughput is 698 Mbps.
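The throughput figures can be reproduced from the fMAX values in Table 9. The short Python sketch below applies the formula above; the function name is hypothetical, and the results are truncated to whole Mbps to match the figures quoted in the text.

```python
# Throughput in Mbps = fMAX (MHz) x 128 bits per block / 11 clocks per block.
# throughput_mbps is a hypothetical helper used only for this illustration.

def throughput_mbps(fmax_mhz, block_bits=128, clocks_per_block=11):
    return fmax_mhz * block_bits / clocks_per_block

# fMAX values taken from Table 9.
for name, fmax in [("RJPECAA", 78), ("RJPESAA", 78),
                   ("RJPDCAA", 60), ("RJPDSAA", 60)]:
    print(f"{name}: {int(throughput_mbps(fmax))} Mbps")
```

At 78 MHz the exact value is about 907.6 Mbps, and at 60 MHz about 698.2 Mbps.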
Testing the Core
A utility is included with the core package that encrypts a key and plaintext sequence stored in a file, and decrypts
the ciphertext back into plaintext. It uses the plaintext and ciphertext to generate test cases for the Altera tools, for the
encryption and decryption cores respectively.
Example Test – Standard Sequence
The data in the test source file must be byte-wide hexadecimal values, from ‘00’ to ‘FF’.
First create a testing file foo.txt, containing the following data:
12 13 23 34 45 56 67 78 89 ff ee dd ea a1 b1 b2
0a 0b 0c 0d 44 55 66 77 1a 1b 1c 1d 1 2 3 4
The utility will parse the data into a key, followed by plaintext.
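The parsing step can be pictured with a short Python sketch. This is not the utility's source code; the function name is hypothetical, and the sketch assumes that tokens are separated by whitespace, that the first 16 bytes form the key and the next 16 the plaintext, and that single-digit tokens such as 1 are read as 01.

```python
# Sketch of parsing a test file into a key and a plaintext block.
# parse_test_data is a hypothetical helper, not the shipped utility.

def parse_test_data(text):
    """Return (key, plaintext) as 16-byte values from hex byte tokens."""
    values = [int(token, 16) for token in text.split()]
    if len(values) != 32:
        raise ValueError("expected 32 byte values (key, then plaintext)")
    if any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("each value must be in the range 00 to FF")
    return bytes(values[:16]), bytes(values[16:])

SAMPLE = """\
12 13 23 34 45 56 67 78 89 ff ee dd ea a1 b1 b2
0a 0b 0c 0d 44 55 66 77 1a 1b 1c 1d 1 2 3 4
"""

key, plaintext = parse_test_data(SAMPLE)
print("Key:      ", key.hex())
print("Plaintext:", plaintext.hex())
```

For the sample data this yields the key 121323344556677889ffeeddeaa1b1b2 and the plaintext 0a0b0c0d445566771a1b1c1d01020304.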
Now, run the utility:
SIX
(There are no input parameters for the utility)
The utility returns:
Key: 121323344556677889ffeeddeaa1b1b2
Plaintext: 0a0b0c0d445566771a1b1c1d01020304
Ciphertext: 69bf969db6c11b1ad3c66461f743a09f
Plaintext: 0a0b0c0d445566771a1b1c1d01020304
At the same time, the utility generates test cases for the RJPESAA.TDF (encryption) and RJPDSAA.TDF
(decryption) cores.
Now, run the test cases.
With the Quartus™ II software in simulator mode, open the vector file RJPESAA.VEC and start the simulation.
After simulation is complete, the ciphertext is read out (left to right bytes) on the CIPHER[16..1] bus.
The simulation of the decryption core is handled identically.
Appendix A—Top Level Wrapper
Unencrypted top-level wrappers for all four cores (TOP_LEVEL_RJPECAA.TDF,
TOP_LEVEL_RJPESAA.TDF, TOP_LEVEL_RJPDCAA.TDF, and TOP_LEVEL_RJPDSAA.TDF) are
provided to make it easier to instantiate the cores. The source code for one of the wrappers is shown below:
TOP_LEVEL_RJPESAA.TDF
FUNCTION rjpesaa (sysclk, reset, loadtext, loadkey, unload, initcore,
                  calccore, plain[16..1], key[16..1])
    RETURNS (cipher[16..1], encdone);

SUBDESIGN top_level_rjpesaa
(
    sysclk, reset             : INPUT;
    loadtext, loadkey, unload : INPUT;
    initcore, calccore        : INPUT;
    plain[16..1]              : INPUT;
    key[16..1]                : INPUT;
    cipher[16..1]             : OUTPUT;
    encdone                   : OUTPUT;
)
BEGIN
    (cipher[16..1], encdone) = rjpesaa (sysclk, reset, loadtext, loadkey, unload,
                                        initcore, calccore, plain[16..1], key[16..1]);
END;
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
Copyright © 2002 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device
designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and
service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders.
Altera products are protected under numerous U.S. and foreign patents and pending applications, mask work rights, and copyrights. Altera warrants
performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make
changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any
information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain
the latest version of device specifications before relying on any published information and before placing orders for products or services.