Dynamic High-Performance Multi-Mode Architectures for AES Encryption
Eric Swankoski, Naval Research Lab
Vijay Narayanan, Penn State University
MAPLD 2005 / B103

Background & Motivation
Bandwidth and throughput capabilities of modern optical networks are skyrocketing
Protecting transmitted data is becoming more and more critical
Current encryption architectures generally aren’t capable of keeping up with high-speed environments
SEU effects are rarely, if ever, considered

Plan of Attack: FPGA Encryption
Algorithm: Advanced Encryption Standard (AES)
  – Supports multiple key lengths
  – Supports multiple encryption modes
  – Supports multiple levels of pipelining
Target Architecture: Xilinx FPGAs
  – Can be adapted to ASIC devices
  – Virtex-II, Virtex-4
Target Performance: 60+ gigabits per second
  – Requires both inner-round and outer-round pipelining

The AES Algorithm
10 rounds of encryption for 128-bit operands
Four basic operations:
  – SubBytes: 8-bit substitution (16 parallel operations per round)
  – ShiftRows: Byte reordering and rotation (4 parallel operations per round)
  – MixColumns: Polynomial multiplication (4 parallel operations per round)
  – AddRoundKey: Simple 128-bit XOR

Optimizing for Performance
Exploit all possible parallelism
Alternative byte substitution methods
  – 1 cycle for a lookup-based substitution
  – 5 cycles for a mathematical transformation
Utilize pipelining
  – Outer-Round: 1 cycle per round
  – Inner-Round:
      4 cycles per round (lookup-based byte substitution)
      8 cycles per round (pipelined byte substitution)

Combinatorial Byte Substitution
Actual mathematical transformation
Conventional implementation cannot be pipelined
  – Simple (atomic) 8x8 lookup table
Smaller than lookup table
Faster than lookup table
  – Utilizes five-stage pipeline
All internal operands are four bits wide

Encryption Round Diagram
Atomic S-Box:
  – 40 Pipeline Stages
Combinatorial S-Box:
  – 76 Pipeline Stages
  – Needs a constant stream to be effective
Parallel Key Scheduling
  – No performance penalty
Offline Key Scheduling
  – Precomputed keys can be stored in registers

Counter (CTR) Mode
Effectively converts AES into a stream cipher
High security – similar to CBC
Supports inner-round and outer-round pipelining
No error propagation – errors are completely isolated

Cipher Block Chaining (CBC) Mode
Most secure – no patterns are observed
Cannot be pipelined
100% downstream corruption resulting from data loss or single-event upsets (SEUs) during encryption
  – Errors are isolated during decryption

Electronic Codebook (ECB) Mode
Supports full pipelining
No error propagation – errors are completely isolated
Least secure – identical input gives identical output
  – Patterns observable in video and image data

Staggered CBC Mode
Pipelined with Output Feedback
Each encrypted block n depends on itself and the block (n – x), where x is the latency of the pipeline
Maintains security while mitigating some error propagation problems

More Challenges
Error-Tolerant Encryption
Maintaining High Security
Maintaining High Performance

Error-Tolerant Encryption
Are errors acceptable?
  – Possibly, but better to assume not
How do the multiple modes of encryption deal with upsets?
Is there a benefit to triple modular redundancy (TMR)?
  – Is it what we expect?
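For readers less familiar with TMR, the sketch below shows bitwise 2-of-3 majority voting over three redundant copies of a 128-bit value, which is the usual way a triplicated datapath masks a single upset. It is an illustration only and is not taken from the design in these slides; the name `tmr_vote` and the example block value are assumptions made for the example.

```python
MASK_128 = (1 << 128) - 1  # keep results within one 128-bit block


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies (hypothetical helper)."""
    return ((a & b) | (a & c) | (b & c)) & MASK_128


# Hypothetical 128-bit block value, used only for illustration.
block = 0x00112233445566778899AABBCCDDEEFF

# Model a single-event upset as one flipped bit in one copy.
upset_copy = block ^ (1 << 37)

# One corrupted copy is outvoted by the two good copies.
single_upset_masked = tmr_vote(block, upset_copy, block) == block

# Two copies upset at the same bit position defeat the voter.
double_upset_masked = tmr_vote(upset_copy, upset_copy, block) == block

print(single_upset_masked, double_upset_masked)
```

The second case is a reminder that voting masks a single faulty copy per bit position; it does not help if two copies fail identically.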

Error-Tolerant Encryption
CTR and ECB encryption isolate errors
  – Transmission integrity largely preserved even without SEU mitigation
TMR can ensure 100% transmission integrity
  – TMR REQUIRED for CBC encryption

Error-Tolerant Encryption
Image 1: Error-Free Plaintext Image
  – Before Encryption / After Decryption
  – CTR, ECB, or CBC with mitigation
Image 2: Decrypted Plaintext Image
  – One corrupted block
  – CTR or ECB without mitigation
Image 3: Decrypted Plaintext Image
  – One block corrupted during encryption
  – CBC without mitigation

Maintaining High Security
How do the multiple modes of encryption affect security?
Is physical protection of the key necessary?
  – Depends on the environment
How is throughput affected by increased security?
  – Hopefully, not at all…

Maintaining High Security
ECB-encrypted image has observable patterns
CTR/CBC/SCBC encryption looks like random noise

Maintaining High Security
Physical Key Protection
  – Not required in aerospace applications
Power Analysis / Soft Attacks
  – Countermeasures not mode specific
Throughput Effects
  – ECB & CTR far outperform CBC
  – Why is CBC an official mode?
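The difference between ECB output with observable patterns and CTR output that looks like noise can be reproduced in a few lines. The sketch below is a toy, not AES: since the Python standard library has no AES, it substitutes a small Feistel permutation built from SHA-256 as the 128-bit block function. All names (`toy_encrypt_block`, `ecb_like`, `ctr_like`) and the 8-byte nonce plus 8-byte counter layout are assumptions made for the example.

```python
import hashlib

BLOCK = 16  # bytes in one 128-bit block


def _round_fn(key: bytes, rnd: int, half: bytes) -> bytes:
    """Round function for the toy Feistel network (stand-in, NOT AES)."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """4-round Feistel permutation on 16 bytes; a stand-in for AES."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round_fn(key, rnd, right)))
        left, right = right, mixed
    return left + right


def ecb_like(key: bytes, data: bytes) -> list:
    """ECB: every block is encrypted independently with the same key."""
    return [toy_encrypt_block(key, data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]


def ctr_like(key: bytes, nonce: bytes, data: bytes) -> list:
    """CTR: encrypt nonce||counter and XOR the result with the data."""
    out = []
    for n, i in enumerate(range(0, len(data), BLOCK)):
        keystream = toy_encrypt_block(key, nonce + n.to_bytes(8, "big"))
        out.append(bytes(p ^ k for p, k in zip(data[i:i + BLOCK], keystream)))
    return out


key = b"k" * 16            # hypothetical key
nonce = b"n" * 8           # hypothetical nonce
plaintext = b"A" * (4 * BLOCK)  # four identical plaintext blocks

print(len(set(ecb_like(key, plaintext))))         # distinct ECB blocks
print(len(set(ctr_like(key, nonce, plaintext))))  # distinct CTR blocks
```

With four identical input blocks, the ECB-style output collapses to a single distinct block while the CTR-style output does not repeat, which is the effect visible in the encrypted images. Because CTR only XORs a keystream, applying `ctr_like` to its own output with the same key and nonce recovers the plaintext.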

System-Level Diagram
Supports ECB, CTR, CBC, and SCBC modes
Supports two types of TMR
  – System: triplicates all control, key hardware, and mode logic
  – Encryption: triplicates only encryption and key scheduling hardware

Performance Results – Virtex-4

Byte Substitution  Key Scheduling  Area   Frequency  Throughput (CTR, ECB, SCBC)  Throughput (CBC)
ROM                Online          3588   339.5 MHz  43.5 Gbps                    1.088 Gbps
ROM                Offline         2827   446.8 MHz  57.2 Gbps                    1.430 Gbps
Combinatorial      Online          13651  519.2 MHz  66.5 Gbps                    700.0 Mbps
Combinatorial      Offline         10912  519.2 MHz  66.5 Gbps                    700.0 Mbps

Key Scheduling
  – Offline uses precomputed and stored keys (compile or design time)
  – Online uses dynamically computed keys (run time)
Significant performance improvement for combinatorial byte substitution in pipelined mode
Virtex-II Pro performs better with ROM implementation (56.42 & 60.35 Gbps)
Better CBC performance achieved through other architectures

Lessons Learned
Don’t try to over-optimize FPGA code
  – Returns diminish quickly
  – Sometimes less is more
Know your synthesis tool
  – Now why did it do THAT?
Check your system’s memory
  – RAM does fail at inopportune times… ESPECIALLY if it has a lifetime warranty

Lessons Learned
Over-optimization
  – In a highly pipelined FPGA design, routing plays a MAJOR role in the clock frequency
      70%-80% of the total delay
      What would work in an ASIC (or in theory, or on paper…) might actually make things worse
      Manual floorplanning and P&R might help, but usually provide minimal (if any) improvement
Moral?
  – Try reducing the pipeline depth as well as increasing it; it just might help!
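
As a sanity check on the Virtex-4 performance results, the pipelined throughput figures are consistent with one 128-bit block completing per clock cycle, so throughput is 128 bits times the clock frequency. The sketch below reproduces that arithmetic. The second function is an assumed model, not something stated in the slides: it treats CBC as finishing one block per full pass through a 40-stage pipeline. That comes close to the two ROM rows (about 1.086 and 1.430 Gbps), but the same model with 76 stages does not reproduce the 700.0 Mbps combinatorial figure, so it should not be read as the authors' method. Function names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits


def pipelined_gbps(freq_mhz: float) -> float:
    """Throughput when one block completes every clock cycle."""
    return BLOCK_BITS * freq_mhz * 1e6 / 1e9


def feedback_gbps(freq_mhz: float, latency_cycles: int) -> float:
    """Assumed model: one block per full pipeline traversal (feedback mode)."""
    return pipelined_gbps(freq_mhz) / latency_cycles


# Clock frequencies taken from the Virtex-4 results table.
rows = [("ROM, online", 339.5), ("ROM, offline", 446.8), ("Combinatorial", 519.2)]

for label, freq in rows:
    print(f"{label}: {pipelined_gbps(freq):.1f} Gbps pipelined")

# 40-stage atomic S-Box pipeline, ROM rows only.
for label, freq in rows[:2]:
    print(f"{label}: {feedback_gbps(freq, 40):.3f} Gbps with 40-cycle feedback")
```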