Asynchronous Datapath Design
• Adders
• Comparators
• Multipliers
• Registers
• Completion Detection
• Bus
• Pipeline
• …

Read Reading 3: Delay-Insensitive Adders

Asynchronous Adder Design
• Motivation
• Background: Sync and Async adders
• Delay-insensitive carry-lookahead adders
• Complexity Analysis
• Conclusions

Motivation
• Integer addition is one of the most important operations in digital computer systems.
• Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath.
• In ARM processors the figure even reaches 80%.
• The performance of processors is significantly influenced by the speed of their adders.

Background
• Adders: synchronous or asynchronous
    synchronous adders: worst-case performance
    asynchronous adders: average-case performance
• For example:
    Ripple-Carry Adders (synchronous): O(n)
    Carry-Completion Sensing Adders (asynchronous): O(log n)

Background: Binary Addition
       Worst case       Best case
        00000001         00000000
      + 11111111       + 00000000
      ----------       ----------
    S   00000000     S   00000000
    C   11111111     C   00000000
      ----------       ----------
       100000000        000000000
• Adders can exhibit average-case behavior.

Background
• Ripple-Carry Adders:
• One-stage full adder:
• Logic complexity: O(n)
• Time complexity: O(n)

Background
• Carry-Sensing Completion Detection (CSCD) Adders (asynchronous version of RCA):
• One-stage CSCD Adder:
• Carry-Sensing Completion Detection Adders:
    Logic complexity: O(n)
    Time complexity: O(log n)

Background
• Delay-Insensitive Ripple-Carry Adders (DIRCA, the DI version of RCA):
• One-stage DIRCA:
• DIRCA Adders:
    Logic complexity: O(n)
    Time complexity: O(log n)
• One of the most robust adders

Background
• Completion detection for asynchronous adders:

Background
• DI adder vs. Bundling Constraint adder:

Carry-Lookahead Adders
• RCA requires n stage-propagation delays.
• For high-speed processors, this scheme is undesirable.
• One way to improve adder performance is to compute the carries in parallel.
• That is why Carry-Lookahead Adders (CLA) were introduced.
• CLAs:
    Logic complexity: O(n)
    Time complexity: O(log n)

Carry-Lookahead Adders
• A module:
• B module:

DI Carry-Lookahead Adders
• Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented by using delay-insensitive codes.
  1. Dual-rail signaling: inputs, sums, and carry bits
       a. No data:  A1=0 A0=0
       b. Valid 0:  A1=0 A0=1
       c. Valid 1:  A1=1 A0=0
       d. Illegal:  A1=1 A0=1
  2. One-hot code: internal signals
       a. No data: 000
       b. 001
       c. 010
       d. 100

QDI Carry-Lookahead Adders
• DI C module (CLA A module):
  1. Internal signals: one-hot code, k, g, p
  2. Input and sum bits: dual-rail signals

QDI Carry-Lookahead Adders
• DI D module (CLA B module):
  1. Internal signals: one-hot code, K, G, P
  2. Carry bits: dual-rail signals

DI Carry-Lookahead Adders
• If A3 = B3, then C3 is a carry-kill or a carry-generate (k3, g3).
• K3,2 and G3,2 can be used to speed up the carry computation too.

Speeding Up DICLA
• Idea: send the carry-generate and carry-kill signals to every stage that needs this information, so that carries can be computed immediately.
• D module with speed-up circuitry

Speeding Up DICLA
• General form:
• D module with speed-up circuitry
    for carry-kill:
    for carry-generate: $g_{j-1} + g_{j-2}p_{j-1} + \dots + g_0 p_1 p_2 \cdots p_{j-1}$
• This is in fact the full carry-lookahead scheme.

Speeding Up DICLA
• Problems of the full carry-lookahead scheme:
    practical limitations on fan-in and fan-out, an irregular structure, and many long wires
    logic complexity increases more than linearly
• Solution: use the properties of the tree-like structure
• New speed-up circuitry:
• SP focuses on the root node of a subtree.
• All leftmost root nodes of its right subtree

Power of Speed-up Circuitry
• x: carry chain
• x′ in the right subtree
• x − x′ in the left subtree

Power of Speed-up Circuitry
• Without speed-up circuitry

Power of Speed-up Circuitry
• With speed-up circuitry

Optimization:
• Simplified D module
• Simplified D′ module
• Better logic complexity
• Delay-insensitive again

Complexity Analysis
• DICLASP
• Logic complexity: Θ(n)
• Time complexity: Θ(log log n)
• Best area-time efficiency: Θ(n log log n)

Complexity Analysis
• CMOS: C module
• CMOS: SD module
• CMOS: SD′ module

SPICE Simulation:
The SPICE simulation contains two parts:
• Random number inputs: 10000 randomly generated input pairs
• Statistical data: running examples on a 32-bit ARM emulator

SPICE Simulation:
• Random number input distribution

SPICE Simulation:
• SPICE simulation results: random number inputs
• Speedup:
    DIRCA vs. RCA: 6.39
    DICLASP vs. CLA: 2.64

SPICE Simulation:
• Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator

SPICE Simulation:
• Dynamic traces
• 83.92% of instructions: |carry chain| < 17

SPICE Simulation:
• SPICE simulation results: dynamic traces
• Average computation time:
    DIRCA: 9.61 ns
    DICLASP: 5.25 ns
• Speedup:
    DIRCA vs. RCA: 4.1
    DICLASP vs. CLA: 2.2

Conclusion
• DICLASP
    Best area-time efficiency: Θ(n log log n)
    Correctness: no adder is more robust than DICLASP.
    Cost (logic complexity): no parallel adder is cheaper than DICLASP (Θ(n)).
    Speed (time complexity): no adder is better than DICLASP (Θ(log log n)).
    Suitable for VLSI implementation.
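
The average-case argument that runs through these slides (asynchronous adders finish when the longest carry chain has settled, not after the worst-case n stages) can be checked with a small software experiment. The sketch below is illustrative only and is not the SPICE setup used in the slides: the function names `longest_carry_chain` and `average_longest_chain`, and the convention of counting the generating stage as the first stage of a chain, are assumptions made here. It classifies each bit position as kill, generate, or propagate and measures how far a carry ripples.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of stages a single carry travels through when adding a + b.

    Assumed convention: the generating stage counts as stage 1 of the chain.
    """
    longest = 0
    current = 0  # length of the carry chain currently in flight (0 = no carry)
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            # generate: a new carry starts at this stage
            current = 1
        elif ai ^ bi:
            # propagate: an incoming carry (if any) moves one stage further
            current = current + 1 if current else 0
        else:
            # kill: any incoming carry is absorbed here
            current = 0
        longest = max(longest, current)
    return longest


def average_longest_chain(n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


if __name__ == "__main__":
    # Worst and best cases from the Binary Addition slide (8-bit operands).
    print(longest_carry_chain(0b00000001, 0b11111111, 8))
    print(longest_carry_chain(0b00000000, 0b00000000, 8))
    # Average case for random operands at several word widths.
    for width in (8, 16, 32, 64):
        print(width, average_longest_chain(width))
```

Running the script for increasing widths shows the average chain length growing far more slowly than the word width, which is the behavior that completion-detecting adders such as CSCD and DIRCA exploit. Uniformly random operands are only one input model; the dynamic traces reported in the slides have a different carry-chain distribution.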