Ramchan Woo, ISCAS 2000 1

A 670ps, 64bit Dynamic
Low-Power Adder Design
Ramchan Woo*, Se-Joong Lee and Hoi-Jun Yoo
Semiconductor System Laboratory
Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology (KAIST)
Ramchan Woo, ISCAS 2000
1
Outline
• Introduction
• Adder Design
– Fast ‘P’ Generation
– Carry Propagation Architecture and Circuit
Implementation
• Adder Summary
– Worst-case Waveforms
– Features
• Conclusions
Ramchan Woo, ISCAS 2000
2
Introduction
• Adders in High-Speed Microprocessor
– High-Speed Data Path
– Effective Address Calculation in Load/Store Unit
ADD/SUB
MUL
MAC
MEM Address
Base
+
Effective
Address
TLB
Physical
Address
Cache
Memory
MEM Address
Offset
Ramchan Woo, ISCAS 2000
3
Introduction
• Speeding-up the Adder in deep-submicron
process technology
ISLPE1994
DEC
(CMOS)
Speed
8ns
– Fast ‘P’ Generation in
Carry Look Ahead
– Separated Carry
Propagation Path
– Efficient Carry Grouping to
Extract More Parallelism
– Simple Dynamic Circuits
JSSC1996
Mitsubishi
(BiCMOS)
3ns
2ns
1ns
ISSCC2000
IBM
(SOI)
0.18
ISCAS2000
This Work
(CMOS)
0.25
ISSCC1996
HP
(CMOS)
0.5
0.6
Technology
(um)
Adder Speed Comparison
Ramchan Woo, ISCAS 2000
4
Fast P Generation
• Fast P Generation in Carry Look Ahead
A
CLK
B
P
A
B
P
Conventional XOR ‘P’
(XORP)
P = A⊕B
G = A•B
Proposed Dynamic-OR ‘P’
(DORP)
P=A+B
G = A•B
Ramchan Woo, ISCAS 2000
5
Fast P Generation
• DORP and XORP Simulation Results
FO2
250
200
200
Circuit Delay (ps)
Circuit Delay (ps)
FO1
250
150
100
50
0
0.18
0.25
0.35
0.6
0.8
1
Process (um)
150
100
50
0
0.18
0.25
0.35
0.6
0.8
1
Process (um)
FO4
Circuit Delay (ps)
350
XORP
300
250
DORP
200
150
100
DORP is 30% Faster than XORP
50
0
0.18
0.25
0.35
0.6
0.8
1
@ 0.25µm
Process (um)
Ramchan Woo, ISCAS 2000
6
Separated Carry Propagation Path
• Conventional Carry Propagation Path
– Large Capacitance Loading in PG and Carry Path
– Increased Circuit Complexity in Carry Propagate
Path
PG Circuit
Carry
Path
16bis or more
SUM
Stage
PG Circuit
Carry
Propagate
Intermediate
Carry
Signals
Ramchan Woo, ISCAS 2000
7
Separated Carry Propagation Path
• Proposed Carry Propagation Path
– Reduced Capacitance Loading
– Simple Circuits in Carry Path
Separation
Buffer
0
1
Only 3bits !
PG Circuit
Simple
Carry
Path
Conditional
Carry
Generation
16bits
SUM
Stage
PG Circuit
Carry
Propagate
Group
Carry
Signals
Intermediate
Carry
Signals
Ramchan Woo, ISCAS 2000
8
Efficient Carry Grouping
• Parallel Quaternary-tree form of Group
Carry Selection
A[0], B[0]
PG
PG
PG
GPGG
PG
GPGG
GGPGGG
GGP[15]
GGG[15]
GGPGGG
GGP[31]
GGG[31]
Contain P,G Information of
A[16], B[16] ~ A[31], B[31]
GGPGGG
GGP[47]
GGG[47]
Contain P,G Information of
A[32], B[32] ~ A[47], B[47]
GPGG
Contain P,G Information of
A[0], B[0] ~ A[15], B[15]
PG
PG
PG
GPGG
PG
A[63], B[63]
First Level
Ramchan Woo, ISCAS 2000
9
Efficient Carry Grouping
Conditional
Carry Block
from 1st
Level
GGP[15]
GGG[15]
0
GGC
GC
GGC[15]
carry
propagate
GGP[31]
GGG[31]
GC
GC
GC
1
to SUM
Stage
GC[19]
GC[23]
GC[27]
GC[31]
GGC
GGC[31]
GGP[47]
GGG[47]
GGC
GC[59]
GGC[47]
Only 3bits are indirectly
connected to SUM stage
Second Level
Ramchan Woo, ISCAS 2000
10
Conditional Sum Select
• 4bit Ripple-Carry SUM Blocks with
Conditional Sum Select
C[59]
C[55]
C[47]
C[51]
P[63:60] G[63:60]
P[51:48] G[51:48]
SUM
0
SUM
0
SUM
0
SUM
0
SUM
1
SUM
1
SUM
1
SUM
1
SUM51:48_0
SUM63:60_1
SUM63:60_0
MUX
SUM[63:60]
from Second Level of
Carry Propagation
MUX
MUX
SUM51:48_1
MUX
SUM[51:48]
Each SUM generation is changed because of Dynamic-OR ‘P’
SUM[i] = G[i] • C[i − 1] + G′[i] • (C[i − 1] ⊕ P[i])
Ramchan Woo, ISCAS 2000
11
Floorplan and Carry-Path
63
47
1stGPGG
1stGPGG
1stGPGG
condGC
condGC
condGC
MUX
condGC
MUX
condGC
MUX
condGC
MUX
genSUM
genSUM
genSUM
genSUM
genSUM
MUX
MUX
15
31
1stGPGG
1stGPGG
1stGPGG
1stGPGG
1stGPGG
1stGPGG
1stGPGG
1stGPGG
0
1stGPGG
1stGPGG
1stGPGG
2ndGPGG
2ndGPGG
2ndGPGG
condGC
condGC
condGC
condGC
condGC
condGC
MUX
condGC
MUX
condGC
MUX
condGC
MUX
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
MUX
MUX
MUX
MUX
1stGPGG
condGC
condGC
condGC
condGC
condGC
condGC
MUX
condGC
MUX
condGC
MUX
condGC
MUX
MUX
MUX
MUX
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
genSUM
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
MUX
PG generation
Carry
Propagation
Path
Conditional
Carry Block
Sum
Stage
MUX
g
Carry passes only 8 Blocks
Ramchan Woo, ISCAS 2000
12
Circuit Implementation
• Simple Dynamic CMOS Circuits
CLK
P[3] P[2] P[1] P[0]
GP
CLK
GC
GGP[i-1]
GGG[i-1]
P[3]
P[2]
GG
G[3]
G[2]
GC[i-16]
P[1]
G[1]
GP/GG Generation
G[0]
GC Generation
25% Speed-Up due to Small Junction Capacitance
Ramchan Woo, ISCAS 2000
13
Adder Summary
Delay Results
Power Results
Average Current = ~40mA @ 2.5V
Process
0.25µm 1-poly 5-metal
CMOS
Supply Volatge
2.5V
Adder Delay
670ps
Power Dissipation
100mW @ 500MHz CLK
Cycle
Transistor Count
5460
Active Adder Area
0.16mm2 (1284µm x
130µm)
Chip Size
0.84mm2 (1400µm x
600µm)
Adder Features
Ramchan Woo, ISCAS 2000
14
Layout and Die-Photo
Layout of Adder Test Chip
Die-Photo
Chip Size : 0.84mm2 (1400µm x 600µm)
Ramchan Woo, ISCAS 2000
15
Conclusions
• A 64bit Dynamic Adder is designed and fabricated using
0.25µm CMOS technology.
• Dynamic-OR ‘P’ Scheme yields 30% Speed-up in PG-stage.
• Separated Carry Propagation Path yields 25% Performance
Improvement by using Simple Dynamic Circuits.
• Adder shows 670ps Worst-case Delay and 100mW Power
Consumption @ 500MHz Clock Cycle.
Ramchan Woo, ISCAS 2000
16