An Exact Polynomial Time Algorithm for Clock Tree Sizing for Register Files

Alexander Berkovich, Lawrence D. Gonzales, Ataur Patwary, Intel Corporation
Rupesh S. Shelar, Synopsys Inc.

Agenda
• Introduction
• Clock Trees for Register Files
• Sizing Algorithm
• Experimental Results & Conclusion

Background
• Memory hierarchy: on-chip, DRAM, SSDs, (X-point), hard drives
• On-chip memory: latch/flop, register files, SRAM
• Register files for fast on-chip storage
• Employed in processors meant for low power & high performance applications
• Have been around for decades

Register Files
• Contain bit-cells arranged in rows and columns
• Bit-cell optimized for area, power, speed in each technology
• Circuitry for read-/write address decoding
• Rail-to-rail voltage swing compared to SRAM
• Typically, 8-transistor cells for 1R1W
• Bit-cells are larger
• Use static domino latch instead of sense amplifiers
• Commercial tools available for layout of different configurations

Synchronous Operation
• Most circuits are synchronous
• Register files also accessed in the same fashion
• Requires clock tree sizing considering races
• Commercial tools avoid races by employing self-timed circuits
• Comes at the cost of additional area, power
• May not be desirable for high volume processors

Why clock trees for register files are important
• Clock trees consume significant power, in general
• True for register files also, as all read/write accesses go through clock network
• Bit-cells highly optimized for each technology
• Layout is also optimized using regular arrangement in rows & columns
• Clock trees for RFs have additional constraints, apart from skew, slew
• Races between read & precharge, read & write
• Tedious, time-consuming to size manually

Agenda
• Introduction
• Clock Trees for Register Files
• Sizing Algorithm
• Experimental Results & Conclusion

High-level View
[Figure: layout summary of a 48-entry x 4-bit array (ENTRY 0 to ENTRY 47), showing MCLK, the RCBs, and the read and write address protection latches with 3:8 decoders]

Clock Tree Topology
[Figure: clock tree from MCLK through RCB to the RDWL[N:0], L0PCH, and L1PCH branches of INSTANCE[A:0], with cells BR1, GR1, BP01, BR2, GR2, BP11, BP12, OP11, AP11, BP13, BP14. Annotated specs: "Relative spec 0: R L0PCH 0.05 BR RDWL[N:0]"; "Relative spec 1: F L0PCH 5ps AF RDWL[N:0]"; "Relative spec 0: R L1PCH 0.05 BR INSTANCE[A:0]/RDBLL0"; "Relative spec 1: F L1PCH 5ps AF INSTANCE[A:0]/RDBLL0"; "Absolute spec 0: -27 ar"; "Absolute spec 2: 27 ar"; "MCLK Hsw default mclk rise - 55 ar"]
• Clock cell instance naming convention (for this example only)
• Numbers in suffix denote the reverse topological order in which solutions are created
• Legend: G = ganged NAND, B = buffer, R = RDWL, P0 (P1) = L0 (L1) precharge, A = AND, O = OR

Relative Timing Constraints between L0PCH and RDWL
[Figure: rdbll0 columns with Level0 Bitline[0] and Bitline[1], memcell[0] to memcell[7] driven by RDWL[0] to RDWL[7], and the Level0 precharge clock (left); marks the earliest and latest rise and fall points ENR, EFR, LNF, and LFF for both RDWL and L0PCH]
• Relative timing constraint between RDWL & L0PCH
• Rise transition: earliest (latest) L0PCH rises before earliest of earliest (latest) RDWL receivers
  – ENR: earliest among nearests
  – EFR: earliest among farthests
• Fall transition: latest (earliest) L0PCH falls after latest of latest (earliest) RDWL receivers
  – LNF: latest among nearests
  – LFF: latest among farthests

Additional Constraints for RF Clock Tree Sizing
• Absolute required arrival time at a specific entry
• Relative timing constraints for races between
  – L0PCH and RDWL
  – L1PCH and RDBL
  – RDWL and WRWL
• Approach
  – Size one RDWL to the absolute arrival time
  – Let the rest of the RDWLs fall as they may
  – Size L0PCH, L1PCH, WRWL for relative timing constraints

Agenda
• Introduction
• Clock Trees for Register Files
• Sizing Algorithm
• Experimental Results & Conclusion

Overall Algorithm
• Two stage approach
  – Stage 1: Size RDWLs and L0PCH (relative timing constraints for L0PCHs will be met)
  – Run timing analysis to find arrival times for RDBLs and RDWLs
  – Generate constraints for absolute arrival time requirements for L1PCHs and WRWLs
  – Stage 2: Size L1PCH, WRWLs considering those constraints
  – Iterate to account for perturbations of RDWL timing during Stage 2
• Dynamic programming for clock tree sizing considering races
  – Specifically, will show how bottom-up solution generation is carried out
  – Most critical step
  – Also, the dominant one from a time complexity perspective

Example
[Figure: clock tree topology]
• Assume each clock buffer has 3 strengths
  – Power = 3 units, delay = 22 units
  – Power = 2 units, delay = 24 units
  – Power = 1 unit, delay = 26 units
• Generate solutions in reverse topological order
• A solution associated with a clock buffer is characterized by a 10-element vector (EFR, LFF, ENR, LNF, RR_spc, RF_spc, rise-cell-delay, fall-cell-delay, cell-power, total-power)

Bottom-up Solution Generation
[Figure: clock tree topology]
• Assume
  – Required arrival time at the specified entry is 27 units
  – L0PCH has to rise/fall before/after RDWL by ≥ 5 units
• Initialize the solutions at leaves, i.e., the receivers of BR1[0:7] and BP01, to (27, 27, 27, 27, 27, 27, 0, 0, 0, 0)
• Create solutions in the following order:
  – BR1, GR1, BP01, BR2, GR2, RCB
  – BP12, OP11, AP11, BP13, BP14, RCB

Bottom-up Solution Generation (CNTD.)
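The setup of the worked example (the 10-element solution vector, the leaf initialization, and the reverse topological order) can be written down as a small Python sketch. This is only an illustration, not the authors' implementation: the field names, the `leaf` helper, and the `FANOUT` dictionary are assumptions, and `FANOUT` covers only the RDWL/L0PCH subtree as read off the worked example (RCB drives GR2, GR2 drives BR2, BR2 drives GR1 and BP01, GR1 drives BR1).

```python
from graphlib import TopologicalSorter
from typing import NamedTuple


class Solution(NamedTuple):
    """10-element solution vector; field names are illustrative."""
    efr: int              # EFR: earliest among farthests (rise)
    lff: int              # LFF: latest among farthests (fall)
    enr: int              # ENR: earliest among nearests (rise)
    lnf: int              # LNF: latest among nearests (fall)
    rr_spc: int           # RR_spc: rise required time, specified entry
    rf_spc: int           # RF_spc: fall required time, specified entry
    rise_cell_delay: int
    fall_cell_delay: int
    cell_power: int
    total_power: int


def leaf(required_arrival: int) -> Solution:
    """Leaf solution: all six timing fields start at the required arrival time."""
    r = required_arrival
    return Solution(r, r, r, r, r, r, 0, 0, 0, 0)


# Assumed fanout of each cell in the RDWL/L0PCH subtree (driver -> driven cells).
FANOUT = {
    "RCB": {"GR2"},
    "GR2": {"BR2"},
    "BR2": {"GR1", "BP01"},
    "GR1": {"BR1"},
}

# TopologicalSorter emits a node only after the nodes it maps to, so mapping
# each driver to its fanout yields a leaves-first (reverse topological) order.
order = list(TopologicalSorter(FANOUT).static_order())

print(leaf(27))
print(order)
```

Any order in which every cell appears after all the cells it drives is acceptable; the sequence BR1, GR1, BP01, BR2, GR2, RCB used in the example is one such order.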
• 3 steps:
  – Combine solutions at the output
  – Consider sizes for the current clock buffer to create solutions
  – Prune inferior solutions

Bottom-up Solution Generation: Combining Solutions at BR1
[Figure: clock tree topology]
• Assume max and min wire delay to be 6 and 2 units
• Combine solutions at the output of BR1
  – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)
  – For 2nd and 3rd iteration, RDWL[i] may have different wire delays, so use the appropriate ones for each BR1[i]

Bottom-up Solution Generation: Considering choices for BR1
[Figure: clock tree topology]
• For 1st iteration, assume a slope; use actual values for later ones
• 3 solutions corresponding to 3 choices
  – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3)
  – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2)
  – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1)

Bottom-up Solution Generation: Pruning at BR1
[Figure: clock tree topology]
• No solution with same RR_spc, RF_spc, but higher power
• So, keep all 3 solutions
  – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3)
  – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2)
  – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1)

Bottom-up Solution Generation: Combining at GR1
[Figure: clock tree topology]
• Assuming max and min wire delay to be 6 and 2 units, and the wire delay for the specific entry to be 4 units, we get the following solutions as the result of combine:
  – Solution 1: (-1-6, -1-6, 3-2, 3-2, -1-4, -1-4, 22, 22, 3, 3*4) = (-7, -7, 1, 1, -5, -5, 22, 22, 3, 12)
  – Solution 2: (-3-6, -3-6, 1-2, 1-2, -3-4, -3-4, 24, 24, 2, 2*4) = (-9, -9, -1, -1, -7, -7, 24, 24, 2, 8)
  – Solution 3: (-5-6, -5-6, -1-2, -1-2, -5-4, -5-4, 26, 26, 1, 1*4) = (-11, -11, -3, -3, -9, -9, 26, 26, 1, 4)

Bottom-up Solution Generation: Considering Choices for GR1
• 3 solutions at the output, (-7, -7, 1, 1, -5, -5, 22, 22, 12, 12), (-9, -9, -1, -1, -7, -7, 24, 24, 8, 8), (-11, -11, -3, -3, -9, -9, 26, 26, 4, 4), and 3 choices for GR1 (Power = 3, 2, 1 units) result in 9 solutions as follows (√ = kept, X = pruned):
  √ Solution 1 (Power=3 unit): (-7-22, -7-22, 1-22, 1-22, -5-22, -5-22, 22, 22, 3, 12+3) = (-29, -29, -21, -21, -27, -27, 22, 22, 3, 15)
  √ Solution 2 (Power=3 unit): (-31, -31, -23, -23, -29, -29, 22, 22, 3, 11)
  √ Solution 3 (Power=3 unit): (-33, -33, -25, -25, -31, -31, 22, 22, 3, 7)
  X Solution 4 (Power=2 unit): (-7-24, -7-24, 1-24, 1-24, -5-24, -5-24, 24, 24, 2, 12+2) = (-31, -31, -23, -23, -29, -29, 24, 24, 2, 14)
  X Solution 5 (Power=2 unit): (-33, -33, -25, -25, -31, -31, 24, 24, 2, 10)
  √ Solution 6 (Power=2 unit): (-35, -35, -27, -27, -33, -33, 24, 24, 2, 6)
  X Solution 7 (Power=1 unit): (-7-26, -7-26, 1-26, 1-26, -5-26, -5-26, 26, 26, 1, 12+1) = (-33, -33, -25, -25, -31, -31, 26, 26, 1, 13)
  X Solution 8 (Power=1 unit): (-35, -35, -27, -27, -33, -33, 26, 26, 1, 9)
  √ Solution 9 (Power=1 unit): (-37, -37, -29, -29, -35, -35, 26, 26, 1, 5)

Bottom-up Solution Generation: Considering Choices for GR1 (CNTD.)
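The nine GR1 candidates, and the pruning rule the slides apply to them (drop a solution when another has the same RR_spc and RF_spc at lower total power), can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the `Solution` field names and the helpers `apply_sizes` and `prune` are hypothetical, and rise and fall cell delays are taken to be equal for GR1, as in the example.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    """10-element solution vector; field names are illustrative."""
    efr: int
    lff: int
    enr: int
    lnf: int
    rr_spc: int
    rf_spc: int
    rise_cell_delay: int
    fall_cell_delay: int
    cell_power: int
    total_power: int


# (power, delay) choices per clock buffer, from the Example slide.
SIZES = [(3, 22), (2, 24), (1, 26)]

# The three combined solutions at the output of GR1, as listed on the slide.
GR1_OUTPUTS = [
    Solution(-7, -7, 1, 1, -5, -5, 22, 22, 12, 12),
    Solution(-9, -9, -1, -1, -7, -7, 24, 24, 8, 8),
    Solution(-11, -11, -3, -3, -9, -9, 26, 26, 4, 4),
]


def apply_sizes(outputs, sizes=SIZES):
    """Create one candidate per (size, output solution) pair."""
    candidates = []
    for power, delay in sizes:
        for s in outputs:
            candidates.append(Solution(
                s.efr - delay, s.lff - delay, s.enr - delay, s.lnf - delay,
                s.rr_spc - delay, s.rf_spc - delay,
                delay, delay, power, s.total_power + power,
            ))
    return candidates


def prune(candidates):
    """Keep the lowest total power for each distinct (RR_spc, RF_spc)."""
    best = {}
    for s in candidates:
        key = (s.rr_spc, s.rf_spc)
        if key not in best or s.total_power < best[key].total_power:
            best[key] = s
    # Order by decreasing RR_spc, i.e., slide Solutions 1, 2, 3, 6, 9.
    return sorted(best.values(), key=lambda s: -s.rr_spc)


candidates = apply_sizes(GR1_OUTPUTS)
kept = prune(candidates)
for s in kept:
    print(tuple(s))
```

Because each (RR_spc, RF_spc) pair keeps only its cheapest solution, the number of solutions carried upward stays bounded by the number of distinct required times, not by the number of size combinations.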
• Pruning the solutions with same RR_spc, RF_spc but higher power results in the following:
  – Solution 1 (Power=3 unit): (-29, -29, -21, -21, -27, -27, 22, 22, 3, 15)
  – Solution 2 (Power=3 unit): (-31, -31, -23, -23, -29, -29, 22, 22, 3, 11)
  – Solution 3 (Power=3 unit): (-33, -33, -25, -25, -31, -31, 22, 22, 3, 7)
  – Solution 6 (Power=2 unit): (-35, -35, -27, -27, -33, -33, 24, 24, 2, 6)
  – Solution 9 (Power=1 unit): (-37, -37, -29, -29, -35, -35, 26, 26, 1, 5)
• Note how optimum power solutions are preserved

Bottom-up Solution Generation: Creating Solutions at BP01
[Figure: clock tree topology]
• Assume max and min wire delay to be 6 and 2 units
• Combine solutions at the output of BP01
  – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)
• BP01 is skewed, so let’s assume fall is 18 units slower and that it has 3 sizes with rise/fall delays as follows
  – Power = 3 units, rise/fall delay = 44/62 units
  – Power = 2 units, rise/fall delay = 48/66 units
  – Power = 1 unit, rise/fall delay = 52/70 units

Bottom-up Solution Generation: Creating Solutions at BP01
[Figure: clock tree topology]
• 3 solutions at BP01 as follows:
  – Solution 1: (21-44, 21-62, 25-44, 25-62, 21-44, 21-62, 44, 62, 3, 3) = (-23, -41, -19, -37, -23, -41, 44, 62, 3, 3)
  – Solution 2: (21-48, 21-66, 25-48, 25-66, 21-48, 21-66, 48, 66, 2, 2) = (-27, -45, -23, -41, -27, -45, 48, 66, 2, 2)
  – Solution 3: (21-52, 21-70, 25-52, 25-70, 21-52, 21-70, 52, 70, 1, 1) = (-31, -49, -27, -45, -31, -49, 52, 70, 1, 1)
• Pruning is trivial, so all the solutions are preserved in this case

Bottom-up Solution Generation: Combining Solutions at fanout of BR2
[Figure: clock tree topology]
• Update the required arrival times for solutions from BP01 and GR1
• Assume wire delay from BR2 to BP01 to be 4 units. So, the solutions from L0PCH path are:
  – Sol. 1: (-27, -45, -23, -41, -27, -45, 22, 31, 3, 3)
  – Sol. 2: (-31, -49, -27, -45, -31, -49, 24, 33, 2, 2)
  – Sol. 3: (-35, -53, -31, -49, -35, -53, 26, 35, 1, 1)
• Assume wire delay from BR2 to GR1 to be 6 units.
So, the solutions from RDWL path are:
  – Sol. 1: (-35, -35, -27, -27, -33, -33, 22, 22, 3, 15)
  – Sol. 2: (-37, -37, -29, -29, -35, -35, 22, 22, 3, 11)
  – Sol. 3: (-39, -39, -31, -31, -37, -37, 22, 22, 3, 7)
  – Sol. 6: (-41, -41, -33, -33, -39, -39, 24, 24, 2, 6)
  – Sol. 9: (-43, -43, -35, -35, -41, -41, 26, 26, 1, 5)

Bottom-up Solution Generation: Combining Solutions at fanout of BR2
• Combine solutions from L0PCH path with those from RDWL path iff (∆ = 5 units)
  – Rise transition: EFR_L0PCH ≥ EFR_RDWL + 5 && ENR_L0PCH ≥ ENR_RDWL + 5
  – Fall transition: LFF_L0PCH ≤ LFF_RDWL - 5 && LNF_L0PCH ≤ LNF_RDWL - 5
• 15 possible combinations:
  – Sol. 1 (L0PCH) && Sol. 1 (RDWL): invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 1 (L0PCH) && Sol. 2 (RDWL): valid
  – Sol. 1 (L0PCH) && Sol. 3 (RDWL): valid
  – Sol. 1 (L0PCH) && Sol. 6 (RDWL): invalid (LFF_L0PCH ≤ LFF_RDWL - 5 false)
  – Sol. 1 (L0PCH) && Sol. 9 (RDWL): invalid (LFF_L0PCH ≤ LFF_RDWL - 5 false)
  – Sol. 2 (L0PCH) && Sol. 1 (RDWL): invalid (EFR_L0PCH ≥ EFR_RDWL + 5 false)
  – Sol. 2 (L0PCH) && Sol. 2 (RDWL): invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 2 (L0PCH) && Sol. 3 (RDWL): invalid (ENR_L0PCH ≥ ENR_RDWL + 5 false)
  – Sol. 2 (L0PCH) && Sol. 6 (RDWL): valid
  – Sol. 2 (L0PCH) && Sol. 9 (RDWL): valid
  – Sol. 3 (L0PCH) cannot be combined with any solution from RDWL, since ENR_L0PCH ≥ ENR_RDWL + 5 is false for all of them
• Out of 15, only 4 combinations are valid (that’s why the problem is more difficult than that without relative timing constraints)
• Also, the combinations of GR1, BR1, and BP01 should be considered simultaneously (an LR-like algorithm is at a disadvantage)
• For valid combinations, add total powers and keep RATs due to the solutions from RDWL path

Bottom-up Solution Generation: Combining Solutions at fanout of BR2
• 4 valid combinations:
  – Sol. 2: (-37, -37, -29, -29, -35, -35, 22, 22, 3, 11+3)
  – Sol. 3: (-39, -39, -31, -31, -37, -37, 22, 22, 3, 7+3)
  – Sol. 6: (-41, -41, -33, -33, -39, -39, 24, 24, 2, 6+2)
  – Sol. 9: (-43, -43, -35, -35, -41, -41, 26, 26, 1, 5+2)
• Consider 3 sizes for BR2 to create 12 possible solutions and then prune, and continue sizing GR2, RCB
• At the root, choose the solution whose RAT is the closest to that of grid clock arrival
• Next, perform timing analysis and apply the sizing for L1PCH path with absolute arrival time requirement, and by sizing RCB to the same delay from L0PCH/RDWL sizing iteration (Why?)
• In the next iteration, size L0PCH and RDWL paths…
• Typically, in 3 iterations sizes converge or do not change much

Time Complexity & Limitations
• Time complexity: O(N B^2), N = number of clock buffers in the tree; B = maximum number of size choices for a clock buffer
• Since only part of the clock tree is sized (remaining instantiations get the same sizes), runtimes are fast, in minutes, excluding timing analysis/extraction, etc.
• Limitation:
  – Inaccuracies in delays due to actual slopes; these typically diminish within 3 iterations

Agenda
• Introduction
• Clock Trees for Register Files
• Sizing Algorithm
• Experimental Results & Conclusion

Experimental Results
• Algorithm implemented on top of internal timer
• Employed for clock tree sizing of ~100 register file blocks in 22 nm
• Runs in minutes on these blocks
• Saves ~22% of clock power over traditional sizing

Conclusion
• Algorithm results in power savings and productivity improvement
• Has been proliferated for use in advanced technologies
• Has also been incorporated in proprietary RF automation flow
• Resulted in wide use across microprocessors, from low power to high performance applications
• Applicable to sizing for races in general, as long as those can be captured using timing constraints

Acknowledgments
• Authors would like to thank several of their colleagues for review and comments

Questions
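As a supplement to the step that combines solutions at the fanout of BR2, the relative-timing check with ∆ = 5 units can be sketched in Python using the L0PCH-path and RDWL-path solutions from the worked example. This is a hedged illustration, not the authors' implementation: the `Solution` field names and the helpers `compatible` and `merge` are assumptions, and the dictionary keys are simply the solution numbers used on the slides.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    """10-element solution vector; field names are illustrative."""
    efr: int
    lff: int
    enr: int
    lnf: int
    rr_spc: int
    rf_spc: int
    rise_cell_delay: int
    fall_cell_delay: int
    cell_power: int
    total_power: int


DELTA = 5  # required margin between L0PCH and RDWL, in the example's units

# Solutions seen at the output of BR2, keyed by their slide numbers.
L0PCH = {
    1: Solution(-27, -45, -23, -41, -27, -45, 22, 31, 3, 3),
    2: Solution(-31, -49, -27, -45, -31, -49, 24, 33, 2, 2),
    3: Solution(-35, -53, -31, -49, -35, -53, 26, 35, 1, 1),
}
RDWL = {
    1: Solution(-35, -35, -27, -27, -33, -33, 22, 22, 3, 15),
    2: Solution(-37, -37, -29, -29, -35, -35, 22, 22, 3, 11),
    3: Solution(-39, -39, -31, -31, -37, -37, 22, 22, 3, 7),
    6: Solution(-41, -41, -33, -33, -39, -39, 24, 24, 2, 6),
    9: Solution(-43, -43, -35, -35, -41, -41, 26, 26, 1, 5),
}


def compatible(p: Solution, r: Solution, delta: int = DELTA) -> bool:
    """Rise and fall conditions between an L0PCH solution p and an RDWL solution r."""
    rise_ok = p.efr >= r.efr + delta and p.enr >= r.enr + delta
    fall_ok = p.lff <= r.lff - delta and p.lnf <= r.lnf - delta
    return rise_ok and fall_ok


def merge(l0pch, rdwl):
    """Keep the RDWL-path RATs and add total powers for every valid pair."""
    merged = {}
    for i, p in l0pch.items():
        for j, r in rdwl.items():
            if compatible(p, r):
                merged[(i, j)] = r._replace(total_power=r.total_power + p.total_power)
    return merged


valid = merge(L0PCH, RDWL)
for (i, j), s in sorted(valid.items()):
    print(f"L0PCH Sol. {i} + RDWL Sol. {j}: {tuple(s)}")
```

All 15 pairs are examined here, and only the pairs that satisfy both the rise and the fall conditions are carried upward, which is why the choices on the two paths have to be considered together.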