An Exact Polynomial Time Algorithm for Clock Tree Sizing for Register Files
Alexander Berkovich, Lawrence D. Gonzales, Ataur Patwary
Intel Corporation
Rupesh S. Shelar
Synopsys Inc.
Agenda
• Introduction
• Clock Trees for Register Files
• Sizing Algorithm
• Experimental Results & Conclusion
Background
• Memory hierarchy: on-chip, DRAM, SSDs, (X-point), hard drives
• On-chip memory: latch/flop, Register files, SRAM
• Register files for fast on-chip storage
• Employed in processors meant for low-power and high-performance applications
• Have been around for decades
Register Files
• Contain bit-cells arranged in rows and columns
• Bit-cell optimized for area, power, speed in each technology
• Circuitry for read-/write address decoding
• Rail-to-rail voltage swing compared to SRAM
• Typically, 8-transistor cells for 1R1W
• Bit-cells are larger
• Use static domino latch instead of sense amplifiers
• Commercial tools available for layout of different configurations
Synchronous Operation
• Most circuits are synchronous
• Register files also accessed in the same fashion
• Requires clock tree sizing considering races
• Commercial tools avoid races by employing self-timed circuits
• Comes at the cost of additional area, power
• May not be desirable for high volume processors
Why clock trees for register files are important
• Clock trees consume significant power, in general
• True for register files also, as all read/write accesses go through clock network
• Bit-cells highly optimized for each technology
• Layout is also optimized using regular arrangement in rows & columns
• Clock trees for RFs have additional constraints, apart from skew, slew
• Races between read & precharge, read & write
• Tedious, time-consuming to size manually
High-level View
[Figure: layout summary of a 48-entry x 4-bit array. Labels shown: MCLK, RCB, ENTRY 0 to ENTRY 47, write address protection latches with 3:8 decoder, read address protection latches with 3:8 decoder, WRDECC and RDDECC decoder cells.]
Clock Tree Topology
• Clock cell instance naming convention (for this example only)
  – G = ganged NAND, B = buffer, R = RDWL, P0 (P1) = L0 (L1) precharge, A = AND, O = OR
  – Numbers in suffix denote the reverse topological order in which solutions are created
[Figure: clock tree topology from MCLK through RCB for INSTANCE[0] to INSTANCE[A]. Cells shown: GR2, BR2, GR1, BR1 driving RDWL[N:0]; BP01 driving L0PCH; BP14, BP13, AP11, OP11, BP12, BP11 towards L1PCH.]
• Specs annotated on the figure
  – MCLK: Hsw default, mclk rise - 55 ar
  – Absolute spec 0: -27 ar
  – Absolute spec 2 (RDWL[N]): 27 ar
  – L0PCH, relative spec 0: R L0PCH 0.05 BR RDWL[N:0]
  – L0PCH, relative spec 1: F L0PCH 5ps AF RDWL[N:0]
  – L1PCH, relative spec 0: R L1PCH 0.05 BR INSTANCE[A:0]/RDBLL0
  – L1PCH, relative spec 1: F L1PCH 5ps AF INSTANCE[A:0]/RDBLL0
Relative Timing Constraints between L0PCH and RDWL
[Figure: Level0 bitlines (Level0 Bitline[0], Level0 Bitline[1], rdbll0) with memcell[0] to memcell[7], RDWL[0] to RDWL[7], and the Level0 precharge clock (left). Waveforms mark the earliest and latest rise and fall of RDWL against the earliest and latest L0PCH, labelled ENR, EFR, LNF, LFF for both RDWL and L0PCH.]
• Relative timing constraint between RDWL & L0PCH
• Rise transition: earliest (latest) L0PCH rises before the earliest of the earliest (latest) RDWL receivers
  – ENR: Earliest among nearests
  – EFR: Earliest among farthests
• Fall transition: latest (earliest) L0PCH falls after the latest of the latest (earliest) RDWL receivers
  – LNF: Latest among nearests
  – LFF: Latest among farthests
Additional Constraints for RF Clock Tree Sizing
• Absolute required arrival time at a specific entry
• Relative timing constraints for races between
• L0PCH and RDWL
• L1PCH and RDBL
• RDWL and WRWL
• Approach
• Size one RDWL to the absolute arrival time
• Let the rest of the RDWLs fall as they may
• Size L0PCH, L1PCH, WRWL for relative timing constraints
Overall Algorithm
• Two-stage approach
  – Stage 1: Size RDWLs and L0PCH
    • Relative timing constraints for L0PCHs will be met
  – Run timing analysis to find arrival times for RDBLs and RDWLs
  – Generate constraints for absolute arrival time requirements for L1PCHs and WRWLs
  – Stage 2: Size L1PCH, WRWLs considering those constraints
  – Iterate to account for perturbations of RDWL timing during Stage 2
• Dynamic programming for clock tree sizing considering races
• Specifically, will show how bottom-up solution generation is carried out
• Most critical step
• Also, dominant one from time complexity perspective
Example
• Assume each clock buffer has 3 strengths
  – Power = 3 unit, delay = 22 unit
  – Power = 2 unit, delay = 24 unit
  – Power = 1 unit, delay = 26 unit
• Generate solutions in reverse topological order
• A solution associated with a clock buffer is characterized by a 10-element vector (EFR, LFF, ENR, LNF, RR_spc, RF_spc, rise-cell-delay, fall-cell-delay, cell-power, total-power)
[Figure: clock tree topology with its timing specs, repeated from the Clock Tree Topology slide]
Bottom-up Solution Generation
• Assume
  – Required arrival time at the specified entry is 27 unit
  – L0PCH has to rise/fall before/after RDWL by ≥ 5 unit
• Initialize the solutions at leaves, i.e., the receivers of BR1[0:7] and BP01, to (27, 27, 27, 27, 27, 27, 0, 0, 0, 0)
• Create solutions in the following order:
  – BR1, GR1, BP01, BR2, GR2, RCB
  – BP12, OP11, AP11, BP13, BP14, RCB
Bottom-up Solution Generation (CNTD.)
• 3 steps:
  – Combine solutions at the output
  – Consider sizes for the current clock buffer to create solutions
  – Prune inferior solutions
Bottom-up Solution Generation: Combining Solutions at BR1
• Assume max and min wire-delay to be 6 and 2 unit
• Combine solutions at the output of BR1
  – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)
  – For 2nd and 3rd iteration, RDWL[i] may have different wire-delays, so use the appropriate ones for each BR1[i]
Bottom-up Solution Generation: Considering Choices for BR1
• For 1st iteration, assume slope; use actual values for later ones
• 3 solutions corresponding to 3 choices
  – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3)
  – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2)
  – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1)
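The combine and sizing steps at BR1 can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the tuple layout follows the 10-element vector from the example, while the helper names `through_wire` and `through_cell`, the use of plain tuples, and the choice of 6 unit as the wire-delay applied to the spec entries (picked to match the numbers on the slide) are assumptions made for illustration.

```python
# Vector layout from the example:
# (EFR, LFF, ENR, LNF, RR_spc, RF_spc,
#  rise_cell_delay, fall_cell_delay, cell_power, total_power)

def through_wire(sol, max_wd, min_wd, spec_wd):
    """Move required arrival times from the receivers to the driver output.

    Farthest-receiver entries (EFR, LFF) use the max wire-delay,
    nearest-receiver entries (ENR, LNF) use the min wire-delay, and the
    spec entries use spec_wd (hypothetical parameter name).
    """
    efr, lff, enr, lnf, rr, rf, *rest = sol
    return (efr - max_wd, lff - max_wd, enr - min_wd, lnf - min_wd,
            rr - spec_wd, rf - spec_wd, *rest)

def through_cell(sol, delay, power):
    """Create the solution at the input of a buffer of the given size."""
    times = tuple(t - delay for t in sol[:6])
    return times + (delay, delay, power, sol[9] + power)

leaf = (27, 27, 27, 27, 27, 27, 0, 0, 0, 0)
at_output = through_wire(leaf, max_wd=6, min_wd=2, spec_wd=6)

sizes = [(3, 22), (2, 24), (1, 26)]  # (power, delay) choices from the example
br1 = [through_cell(at_output, delay, power) for power, delay in sizes]

for s in br1:
    print(s)
```

The three printed tuples correspond to Solutions 1 to 3 above; none of them shares its (RR_spc, RF_spc) pair with another, which is why pruning at BR1 removes nothing.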
Bottom-up Solution Generation: Pruning at BR1
• No solution with same RR_spc, RF_spc, but higher power
• So, keep all 3 solutions
  – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3)
  – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2)
  – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1)
Bottom-up Solution Generation: Combining at GR1
• Assuming max and min wire-delay to be 6 and 2 unit, and the wire-delay for the specific entry to be 4 unit, we get the following solutions as the result of combine:
  – Solution 1: (-1-6, -1-6, 3-2, 3-2, -1-4, -1-4, 22, 22, 3, 3*4) = (-7, -7, 1, 1, -5, -5, 22, 22, 3, 12)
  – Solution 2: (-3-6, -3-6, 1-2, 1-2, -3-4, -3-4, 24, 24, 2, 2*4) = (-9, -9, -1, -1, -7, -7, 24, 24, 2, 8)
  – Solution 3: (-5-6, -5-6, -1-2, -1-2, -5-4, -5-4, 26, 26, 1, 1*4) = (-11, -11, -3, -3, -9, -9, 26, 26, 1, 4)
Bottom-up Solution Generation: Considering Choices for GR1
• 3 solutions at the output, (-7, -7, 1, 1, -5, -5, 22, 22, 12, 12), (-9, -9, -1, -1, -7, -7, 24, 24, 8, 8), (-11, -11, -3, -3, -9, -9, 26, 26, 4, 4), and 3 choices for GR1 (Power = 3, 2, 1 unit) result in 9 solutions as follows (√ = kept, X = pruned):
√ • Solution 1 (Power=3 unit): (-7-22, -7-22, 1-22, 1-22, -5-22, -5-22, 22, 22, 3, 12+3) = (-29, -29, -21, -21, -27, -27, 22, 22, 3, 15)
√ • Solution 2 (Power=3 unit): (-31, -31, -23, -23, -29, -29, 22, 22, 3, 11)
√ • Solution 3 (Power=3 unit): (-33, -33, -25, -25, -31, -31, 22, 22, 3, 7)
X • Solution 4 (Power=2 unit): (-7-24, -7-24, 1-24, 1-24, -5-24, -5-24, 24, 24, 2, 12+2) = (-31, -31, -23, -23, -29, -29, 24, 24, 2, 14)
X • Solution 5 (Power=2 unit): (-33, -33, -25, -25, -31, -31, 24, 24, 2, 10)
√ • Solution 6 (Power=2 unit): (-35, -35, -27, -27, -33, -33, 24, 24, 2, 6)
X • Solution 7 (Power=1 unit): (-7-26, -7-26, 1-26, 1-26, -5-26, -5-26, 26, 26, 1, 12+1) = (-33, -33, -25, -25, -31, -31, 26, 26, 1, 13)
X • Solution 8 (Power=1 unit): (-35, -35, -27, -27, -33, -33, 26, 26, 1, 9)
√ • Solution 9 (Power=1 unit): (-37, -37, -29, -29, -35, -35, 26, 26, 1, 5)
Bottom-up Solution Generation: Considering Choices for GR1 (CNTD.)
• Pruning the solutions with same RR_spc, RF_spc but higher power results in the following:
- Solution 1 (Power=3 unit): (-29, -29, -21, -21, -27, -27, 22, 22, 3, 15)
- Solution 2 (Power=3 unit): (-31, -31, -23, -23, -29, -29, 22, 22, 3, 11)
- Solution 3 (Power=3 unit): (-33, -33, -25, -25, -31, -31, 22, 22, 3, 7)
- Solution 6 (Power=2 unit): (-35, -35, -27, -27, -33, -33, 24, 24, 2, 6)
- Solution 9 (Power=1 unit): (-37, -37, -29, -29, -35, -35, 26, 26, 1, 5)
• Note how the optimum-power solutions are preserved
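The enumeration and pruning at GR1 can be checked mechanically. The sketch below is illustrative only: the function name `size_and_prune` and the dictionary keyed on (RR_spc, RF_spc) are assumptions, and the output solutions are reduced to the six timing entries plus total power, since those are the only fields the pruning rule uses.

```python
from itertools import product

# (EFR, LFF, ENR, LNF, RR_spc, RF_spc, total_power) at the output of GR1,
# copied from the example.
outputs = [
    (-7, -7, 1, 1, -5, -5, 12),
    (-9, -9, -1, -1, -7, -7, 8),
    (-11, -11, -3, -3, -9, -9, 4),
]
sizes = [(3, 22), (2, 24), (1, 26)]  # (power, delay) choices for GR1

def size_and_prune(outputs, sizes):
    """Try every size on every output solution, then keep only the
    lowest-power solution for each (RR_spc, RF_spc) pair."""
    best = {}
    for (power, delay), out in product(sizes, outputs):
        times = tuple(t - delay for t in out[:6])
        sol = times + (delay, delay, power, out[6] + power)
        key = (sol[4], sol[5])
        if key not in best or sol[9] < best[key][9]:
            best[key] = sol
    # Report in order of decreasing required arrival time at the spec entry.
    return sorted(best.values(), key=lambda s: s[4], reverse=True)

kept = size_and_prune(outputs, sizes)
for s in kept:
    print(s)
```

Nine candidates are created and five survive, matching Solutions 1, 2, 3, 6, and 9 in the list above.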
Bottom-up Solution Generation: Creating Solutions at BP01
• Assume max and min wire-delay to be 6 and 2 unit
• Combine solutions at the output of BP01
  – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)
• BP01 is skewed, so let's assume fall is 18 unit slower and that it has 3 sizes with power and rise/fall delays as follows
  – 3 unit, 44/62 unit
  – 2 unit, 48/66 unit
  – 1 unit, 52/70 unit
Bottom-up Solution Generation: Creating Solutions at BP01
• 3 solutions at BP01 as follows:
  – Solution 1: (21-44, 21-62, 25-44, 25-62, 21-44, 21-62, 44, 62, 3, 3) = (-23, -41, -19, -37, -23, -41, 44, 62, 3, 3)
  – Solution 2: (21-48, 21-66, 25-48, 25-66, 21-48, 21-66, 48, 66, 2, 2) = (-27, -45, -23, -41, -27, -45, 48, 66, 2, 2)
  – Solution 3: (21-52, 21-70, 25-52, 25-70, 21-52, 21-70, 52, 70, 1, 1) = (-31, -49, -27, -45, -31, -49, 52, 70, 1, 1)
• Pruning is trivial, so all the solutions are preserved in this case
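Because BP01 is skewed, the rise and fall entries of the vector move by different amounts. The sketch below reproduces the three BP01 solutions; it assumes, as the arithmetic on the slide implies, that EFR, ENR, and RR_spc take the rise delay while LFF, LNF, and RF_spc take the fall delay. The function name `through_skewed_cell` is hypothetical.

```python
def through_skewed_cell(sol, rise_delay, fall_delay, power):
    """Create the solution at the input of a cell whose rise and fall
    delays differ. Vector layout:
    (EFR, LFF, ENR, LNF, RR_spc, RF_spc,
     rise_cell_delay, fall_cell_delay, cell_power, total_power)
    """
    efr, lff, enr, lnf, rr, rf = sol[:6]
    return (efr - rise_delay, lff - fall_delay,
            enr - rise_delay, lnf - fall_delay,
            rr - rise_delay, rf - fall_delay,
            rise_delay, fall_delay, power, sol[9] + power)

# Combined solution at the output of BP01, from the example.
at_output = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)

# (power, rise delay, fall delay) for the three BP01 sizes.
bp01_sizes = [(3, 44, 62), (2, 48, 66), (1, 52, 70)]

bp01 = [through_skewed_cell(at_output, rise, fall, power)
        for power, rise, fall in bp01_sizes]

for s in bp01:
    print(s)
```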
Bottom-up Solution Generation: Combining Solutions at fanout of BR2
• Update the required arrival times for solutions from BP01 and GR1
• Assume wire-delay from BR2 to BP01 to be 4 unit. So, the solutions from L0PCH path are:
  – Sol. 1: (-27, -45, -23, -41, -27, -45, 22, 31, 3, 3)
  – Sol. 2: (-31, -49, -27, -45, -31, -49, 24, 33, 2, 2)
  – Sol. 3: (-35, -53, -31, -49, -35, -53, 26, 35, 1, 1)
• Assume wire-delay from BR2 to GR1 to be 6 unit. So, the solutions from RDWL path are:
  – Sol. 1: (-35, -35, -27, -27, -33, -33, 22, 22, 3, 15)
  – Sol. 2: (-37, -37, -29, -29, -35, -35, 22, 22, 3, 11)
  – Sol. 3: (-39, -39, -31, -31, -37, -37, 22, 22, 3, 7)
  – Sol. 6: (-41, -41, -33, -33, -39, -39, 24, 24, 2, 6)
  – Sol. 9: (-43, -43, -35, -35, -41, -41, 26, 26, 1, 5)
Bottom-up Solution Generation: Combining Solutions at fanout of BR2
• Combine solutions from L0PCH path with those from RDWL path iff (∆ = 5 unit)
  – Rise transition: EFR_L0PCH ≥ EFR_RDWL + 5 && ENR_L0PCH ≥ ENR_RDWL + 5
  – Fall transition: LFF_L0PCH ≤ LFF_RDWL - 5 && LNF_L0PCH ≤ LNF_RDWL - 5
• 15 possible combinations:
  – Sol. 1_L0PCH && Sol. 1_RDWL: invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 1_L0PCH && Sol. 2_RDWL: valid
  – Sol. 1_L0PCH && Sol. 3_RDWL: valid
  – Sol. 1_L0PCH && Sol. 6_RDWL: invalid (LFF_L0PCH ≤ LFF_RDWL - 5 false)
  – Sol. 1_L0PCH && Sol. 9_RDWL: invalid (LFF_L0PCH ≤ LFF_RDWL - 5 false)
  – Sol. 2_L0PCH && Sol. 1_RDWL: invalid (EFR_L0PCH ≥ EFR_RDWL + 5 false)
  – Sol. 2_L0PCH && Sol. 2_RDWL: invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 2_L0PCH && Sol. 3_RDWL: invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 2_L0PCH && Sol. 6_RDWL: valid
  – Sol. 2_L0PCH && Sol. 9_RDWL: valid
  – Sol. 3_L0PCH cannot be combined with any solution from RDWL, since ENR_L0PCH ≥ ENR_RDWL + 5 is false for all five
• Out of 15, only 4 combinations are valid (that's why the problem is more difficult than the one without relative timing constraints)
• Also, the combinations of GR1, BR1, and BP01 should be considered simultaneously (an LR-like algorithm is at a disadvantage)
• For valid combinations, add total powers and keep RATs due to the solutions from RDWL path
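The compatibility test at the fanout of BR2 is a direct translation of the rise and fall conditions above. The following sketch enumerates all 15 pairings; the function name `compatible` and the reduced 5-field tuples are assumptions for illustration, while the numbers are taken from the preceding slide.

```python
DELTA = 5  # required separation between L0PCH and RDWL, in units

# (EFR, LFF, ENR, LNF, total_power), keyed by solution number.
l0pch = {
    1: (-27, -45, -23, -41, 3),
    2: (-31, -49, -27, -45, 2),
    3: (-35, -53, -31, -49, 1),
}
rdwl = {
    1: (-35, -35, -27, -27, 15),
    2: (-37, -37, -29, -29, 11),
    3: (-39, -39, -31, -31, 7),
    6: (-41, -41, -33, -33, 6),
    9: (-43, -43, -35, -35, 5),
}

def compatible(pch, wl, delta=DELTA):
    """Return True if an L0PCH solution may be paired with an RDWL solution."""
    p_efr, p_lff, p_enr, p_lnf, _ = pch
    w_efr, w_lff, w_enr, w_lnf, _ = wl
    rise_ok = p_efr >= w_efr + delta and p_enr >= w_enr + delta
    fall_ok = p_lff <= w_lff - delta and p_lnf <= w_lnf - delta
    return rise_ok and fall_ok

# Map each valid (L0PCH, RDWL) pair to the sum of the two total powers.
valid = {
    (i, j): pch[4] + wl[4]
    for i, pch in l0pch.items()
    for j, wl in rdwl.items()
    if compatible(pch, wl)
}

for pair, power in valid.items():
    print(pair, power)
```

Only 4 of the 15 pairings pass, which illustrates why the sizes on the two branches cannot be chosen independently of each other.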
Bottom-up Solution Generation: Combining Solutions at fanout of BR2
• 4 valid combinations:
  – Sol. 2: (-37, -37, -29, -29, -35, -35, 22, 22, 3, 11+3)
  – Sol. 3: (-39, -39, -31, -31, -37, -37, 22, 22, 3, 7+3)
  – Sol. 6: (-41, -41, -33, -33, -39, -39, 24, 24, 2, 6+2)
  – Sol. 9: (-43, -43, -35, -35, -41, -41, 26, 26, 1, 5+2)
• Consider 3 sizes for BR2 to create 12 possible solutions and then prune, and continue sizing GR2, RCB
• At the root, choose the solution whose RAT is the closest to that of grid clock arrival
• Next, perform timing analysis and apply the sizing for L1PCH path with absolute arrival time requirement, and by sizing RCB to the same delay from L0PCH/RDWL sizing iteration (Why?)
• In the next iteration, size L0PCH and RDWL paths…
• Typically, in 3 iterations sizes converge or do not change much
Time Complexity & Limitations
• Time complexity: O(NB²), N = number of clock buffers in the tree; B = maximum number of size choices for a clock buffer
• Since only part of the clock tree is sized (remaining instantiations get the same sizes), runtimes are fast, in minutes, excluding timing analysis/extraction, etc.
• Limitation:
• Inaccuracies in delays due to actual slopes - typically, diminish in 3 iterations
Experimental Results
• Algorithm implemented on top of internal timer
• Employed for clock tree sizing of ~100 register file blocks in 22 nm
• Runs in minutes on these blocks
• Saves ~22% of clock power over traditional sizing
Conclusion
• Algorithm results in power savings and productivity improvement
• Has been proliferated for use in advanced technologies
• Has also been incorporated in proprietary RF automation flow
• Resulted in wide use across microprocessors, from low-power to high-performance applications
• Applicable to sizing for races in general, as long as those can be captured using timing constraints
Acknowledgments
• Authors would like to thank several of their colleagues for review and
comments
Questions