A 670ps, 64bit Dynamic Low-Power Adder Design Ramchan Woo*, Se-Joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory Department of Electrical Engineering and Computer Science Korea Advanced Institute of Science and Technology (KAIST) Ramchan Woo, ISCAS 2000 1 Outline • Introduction • Adder Design – Fast ‘P’ Generation – Carry Propagation Architecture and Circuit Implementation • Adder Summary – Worst-case Waveforms – Features • Conclusions Ramchan Woo, ISCAS 2000 2 Introduction • Adders in High-Speed Microprocessor – High-Speed Data Path – Effective Address Calculation in Load/Store Unit ADD/SUB MUL MAC MEM Address Base + Effective Address TLB Physical Address Cache Memory MEM Address Offset Ramchan Woo, ISCAS 2000 3 Introduction • Speeding-up the Adder in deep-submicron process technology ISLPE1994 DEC (CMOS) Speed 8ns – Fast ‘P’ Generation in Carry Look Ahead – Separated Carry Propagation Path – Efficient Carry Grouping to Extract More Parallelism – Simple Dynamic Circuits JSSC1996 Mitsubishi (BiCMOS) 3ns 2ns 1ns ISSCC2000 IBM (SOI) 0.18 ISCAS2000 This Work (CMOS) 0.25 ISSCC1996 HP (CMOS) 0.5 0.6 Technology (um) Adder Speed Comparison Ramchan Woo, ISCAS 2000 4 Fast P Generation • Fast P Generation in Carry Look Ahead A CLK B P A B P Conventional XOR ‘P’ (XORP) P = A⊕B G = A•B Proposed Dynamic-OR ‘P’ (DORP) P=A+B G = A•B Ramchan Woo, ISCAS 2000 5 Fast P Generation • DORP and XORP Simulation Results FO2 250 200 200 Circuit Delay (ps) Circuit Delay (ps) FO1 250 150 100 50 0 0.18 0.25 0.35 0.6 0.8 1 Process (um) 150 100 50 0 0.18 0.25 0.35 0.6 0.8 1 Process (um) FO4 Circuit Delay (ps) 350 XORP 300 250 DORP 200 150 100 DORP is 30% Faster than XORP 50 0 0.18 0.25 0.35 0.6 0.8 1 @ 0.25µm Process (um) Ramchan Woo, ISCAS 2000 6 Separated Carry Propagation Path • Conventional Carry Propagation Path – Large Capacitance Loading in PG and Carry Path – Increased Circuit Complexity in Carry Propagate Path PG Circuit Carry Path 16bis or more SUM Stage PG Circuit Carry Propagate Intermediate Carry Signals Ramchan Woo, ISCAS 2000 7 Separated Carry Propagation Path • Proposed Carry Propagation Path – Reduced Capacitance Loading – Simple Circuits in Carry Path Separation Buffer 0 1 Only 3bits ! PG Circuit Simple Carry Path Conditional Carry Generation 16bits SUM Stage PG Circuit Carry Propagate Group Carry Signals Intermediate Carry Signals Ramchan Woo, ISCAS 2000 8 Efficient Carry Grouping • Parallel Quaternary-tree form of Group Carry Selection A[0], B[0] PG PG PG GPGG PG GPGG GGPGGG GGP[15] GGG[15] GGPGGG GGP[31] GGG[31] Contain P,G Information of A[16], B[16] ~ A[31], B[31] GGPGGG GGP[47] GGG[47] Contain P,G Information of A[32], B[32] ~ A[47], B[47] GPGG Contain P,G Information of A[0], B[0] ~ A[15], B[15] PG PG PG GPGG PG A[63], B[63] First Level Ramchan Woo, ISCAS 2000 9 Efficient Carry Grouping Conditional Carry Block from 1st Level GGP[15] GGG[15] 0 GGC GC GGC[15] carry propagate GGP[31] GGG[31] GC GC GC 1 to SUM Stage GC[19] GC[23] GC[27] GC[31] GGC GGC[31] GGP[47] GGG[47] GGC GC[59] GGC[47] Only 3bits are indirectly connected to SUM stage Second Level Ramchan Woo, ISCAS 2000 10 Conditional Sum Select • 4bit Ripple-Carry SUM Blocks with Conditional Sum Select C[59] C[55] C[47] C[51] P[63:60] G[63:60] P[51:48] G[51:48] SUM 0 SUM 0 SUM 0 SUM 0 SUM 1 SUM 1 SUM 1 SUM 1 SUM51:48_0 SUM63:60_1 SUM63:60_0 MUX SUM[63:60] from Second Level of Carry Propagation MUX MUX SUM51:48_1 MUX SUM[51:48] Each SUM generation is changed because of Dynamic-OR ‘P’ SUM[i] = G[i] • C[i − 1] + G′[i] • (C[i − 1] ⊕ P[i]) Ramchan Woo, ISCAS 2000 11 Floorplan and Carry-Path 63 47 1stGPGG 1stGPGG 1stGPGG condGC condGC condGC MUX condGC MUX condGC MUX condGC MUX genSUM genSUM genSUM genSUM genSUM MUX MUX 15 31 1stGPGG 1stGPGG 1stGPGG 1stGPGG 1stGPGG 1stGPGG 1stGPGG 1stGPGG 0 1stGPGG 1stGPGG 1stGPGG 2ndGPGG 2ndGPGG 2ndGPGG condGC condGC condGC condGC condGC condGC MUX condGC MUX condGC MUX condGC MUX genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM MUX MUX MUX MUX 1stGPGG condGC condGC condGC condGC condGC condGC MUX condGC MUX condGC MUX condGC MUX MUX MUX MUX genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM genSUM MUX MUX MUX MUX MUX MUX MUX MUX MUX PG generation Carry Propagation Path Conditional Carry Block Sum Stage MUX g Carry passes only 8 Blocks Ramchan Woo, ISCAS 2000 12 Circuit Implementation • Simple Dynamic CMOS Circuits CLK P[3] P[2] P[1] P[0] GP CLK GC GGP[i-1] GGG[i-1] P[3] P[2] GG G[3] G[2] GC[i-16] P[1] G[1] GP/GG Generation G[0] GC Generation 25% Speed-Up due to Small Junction Capacitance Ramchan Woo, ISCAS 2000 13 Adder Summary Delay Results Power Results Average Current = ~40mA @ 2.5V Process 0.25µm 1-poly 5-metal CMOS Supply Volatge 2.5V Adder Delay 670ps Power Dissipation 100mW @ 500MHz CLK Cycle Transistor Count 5460 Active Adder Area 0.16mm2 (1284µm x 130µm) Chip Size 0.84mm2 (1400µm x 600µm) Adder Features Ramchan Woo, ISCAS 2000 14 Layout and Die-Photo Layout of Adder Test Chip Die-Photo Chip Size : 0.84mm2 (1400µm x 600µm) Ramchan Woo, ISCAS 2000 15 Conclusions • A 64bit Dynamic Adder is designed and fabricated using 0.25µm CMOS technology. • Dynamic-OR ‘P’ Scheme yields 30% Speed-up in PG-stage. • Separated Carry Propagation Path yields 25% Performance Improvement by using Simple Dynamic Circuits. • Adder shows 670ps Worst-case Delay and 100mW Power Consumption @ 500MHz Clock Cycle. Ramchan Woo, ISCAS 2000 16
© Copyright 2026 Paperzz