Slide 1: Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions
Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego
International Symposium on Physical Design, March 27, 2012

Slide 2: Outline
– Background and Motivation
– Benchmark Generation
– Experimental Framework and Results
– Conclusions and Ongoing Work

Slide 3: Gate Sizing in VLSI Design
Gate sizing:
– Essential for power, delay and area optimization
– Tunable parameters: gate width, gate length and threshold voltage
– The sizing problem appears in all phases of the RTL-to-GDS flow
Common heuristics/algorithms:
– LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...
Two open questions:
1. Which heuristic is better?
2. How suboptimal is a given sizing solution?
A systematic and quantitative comparison is required.

Slide 4: Suboptimality of Sizing Heuristics
Eyecharts* (CHAIN, STAR, MESH):
– Built from three basic topologies, optimally sized with DP
– Allow suboptimalities to be evaluated
– Not realistic: Eyechart circuits have a different topology from real designs, with large depth (650 stages) and a small Rent parameter (0.17)
More realistic benchmarks are required, along with an automated generation flow.
* Gupta et al., "Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics", DAC 2010.

Slide 5: Our Work: Realistic Benchmark Generation with Known Optimal Solutions
1. Propose benchmark circuits with known optimal solutions.
2. The benchmarks resemble real designs in gate count, path depth, Rent parameter and net degree.
3. Assess the suboptimality of standard gate sizing approaches.
[Figure: automated benchmark generation flow: real design → extract characteristic parameters → netlist generator (construct chains, find optimal solution, connect chains keeping the optimal solution) → benchmark circuit with known optimal solution]

Slide 6: Outline (section: Benchmark Considerations and Generation)

Slide 7: Benchmark Considerations
– Realism vs. tractability of analysis: opposing goals.
– To construct realistic benchmarks, use design characteristic parameters: number of primary ports, path depth, and fanin/fanout distributions.
– Example (JPEG encoder): path depth 72, average net degree 1.84, Rent parameter 0.72; fanin distribution: 25% 1-input, 60% 2-input, 15% 3-or-more-input gates. [Figure: fanin/fanout distribution histogram]
– To enable known optimal solutions: library simplification as in Gupta et al. 2010 (slew-independent library).

Slide 8: Benchmark Generation
Input parameters:
1. Timing budget T
2. Data-path depth K
3. Number of primary ports N
4. Fanin and fanout distributions fid(i), fod(j)
Constraints:
– T must be larger than the minimum delay of a K-stage chain.
– Fanins and fanouts must balance: Σ_{i=1}^{I} i · fid(i) = Σ_{o=1}^{O} o · fod(o) (the total number of fanin ports equals the total number of fanout connections).
Generation flow:
1. Construct N chains with depth K.
2. Attach connection cells (C of them).
3. Connect the chains.
The result is a netlist with N·K + C cells.

Slides 9-10: Benchmark Generation: Construct Chains
[Figure: N chains of K stages, gate(1,1) through gate(N,K); arranged vs. random fanin/fanout assignment]
1. Construct N chains, each with depth K (N·K cells).
2. Assign a gate instance to each position according to fid(i).
3. Assign the number of fanouts of each output port according to fod(o).
Assignment strategies: arranged and random.
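As a concrete illustration of the chain-construction step on Slides 9-10, below is a minimal Python sketch. It assumes the fanin/fanout distributions are given as probability tables; the helper name build_chains, the sampling details and the example fod values are illustrative assumptions, not the authors' implementation.

```python
import random

def build_chains(N, K, fid, fod, strategy="random", seed=0):
    """Construct N chains of depth K (N*K gate positions).

    fid, fod: dicts mapping a fanin/fanout count to its probability,
    i.e. the paper's fid(i) and fod(o) distributions.
    Returns, per chain, a list of (fanin, fanout) counts per stage.
    Illustrative sketch only: the real generator also instantiates
    library gates and balances total fanins against total fanouts.
    """
    rng = random.Random(seed)

    def sample(dist):
        # Draw a value from a discrete distribution {value: probability}.
        r, acc = rng.random(), 0.0
        for value, prob in sorted(dist.items()):
            acc += prob
            if r <= acc:
                return value
        return max(dist)

    chains = []
    for _ in range(N):
        fanins = [sample(fid) for _ in range(K)]
        fanouts = [sample(fod) for _ in range(K)]
        if strategy == "arranged":
            # Arranged assignment: larger fanin toward later stages,
            # larger fanout toward earlier stages (improves connectivity).
            fanins.sort()
            fanouts.sort(reverse=True)
        chains.append(list(zip(fanins, fanouts)))
    return chains

# Example: 10 chains of depth 20, using the JPEG-like fanin distribution
# from Slide 7; the fanout distribution here is a made-up placeholder.
chains = build_chains(N=10, K=20,
                      fid={1: 0.25, 2: 0.60, 3: 0.15},
                      fod={1: 0.55, 2: 0.30, 3: 0.15},
                      strategy="arranged")
```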
Slide 11: Benchmark Generation: Find Optimal Solution with DP
[Figure: connection cells attached to the open fanouts of chains 1..N]
1. Attach connection cells to all open fanouts; they are used later to connect the chains while keeping the optimal solution.
2. Perform dynamic programming with timing budget T; the optimal solution is achievable with a slew-independent library.

Slide 12: Benchmark Generation: Solving a Chain Optimally (Example)
[Figure and tables: a three-stage inverter chain (INV1, INV2, INV3) with delay budget Dmax = 8. A small library table gives, for each of two sizes, delay as a function of load (3 or 6), leakage power and input capacitance. Per-stage DP tables list, for each remaining delay budget, the minimum leakage power and the corresponding size; at the input stage, budget 8 yields total leakage 20 with size 1 versus 25 with size 2.]
The optimized chain uses sizes 2, 1 and 1.
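In the spirit of the DP illustrated on Slides 11-12, the following Python sketch sizes a single chain optimally under a slew-independent model. The library format, the delay functions, the function name size_chain and the numbers are illustrative assumptions (the paper's flow additionally handles per-stage gate types and the attached connection cells); the state used here, (stage, remaining delay budget, size of the driven gate), is one straightforward way to capture the load dependence shown in the slide tables.

```python
from functools import lru_cache

# Hypothetical slew-independent library: per size, leakage power,
# input capacitance, and delay as a simple function of output load.
# (Toy values for illustration, not the paper's characterized library.)
LIB = {
    1: {"leak": 5,  "cap": 1, "delay": lambda load: 1 + load},
    2: {"leak": 10, "cap": 2, "delay": lambda load: 1 + load // 2},
}

def size_chain(num_stages, budget, output_load):
    """Return (min total leakage, sizes from input to output stage) for a
    chain of identical gates, or (None, None) if the budget is infeasible."""

    @lru_cache(maxsize=None)
    def best(stage, remaining, downstream_size):
        # Process stages from the output toward the input: the load of the
        # gate at `stage` is the input capacitance of the size chosen for
        # the gate it drives (or the primary output load for the last stage).
        if stage < 0:
            return (0, ())
        load = output_load if downstream_size is None else LIB[downstream_size]["cap"]
        result = None
        for s, cell in LIB.items():
            d = cell["delay"](load)
            if d > remaining:
                continue  # this size would violate the delay budget
            sub = best(stage - 1, remaining - d, s)
            if sub[0] is None:
                continue  # no feasible sizing for the upstream stages
            cand = (sub[0] + cell["leak"], sub[1] + (s,))
            if result is None or cand[0] < result[0]:
                result = cand
        return result if result is not None else (None, None)

    return best(num_stages - 1, budget, None)

# Example: a 3-stage chain with delay budget 8 driving a load of 6.
# With the toy library above this returns power 25 and sizes (2, 1, 2).
power, sizes = size_chain(num_stages=3, budget=8, output_load=6)
```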
Slide 13: Benchmark Generation: Connect Chains
[Figure: a connection cell c connected to an open fanin of a gate g in another chain (arrival times a_c, a_g; connection w_c,g); unconnected ports tied to VDD]
1. Run STA and find the arrival time of each gate.
2. Connect each connection cell to an open fanin port, but only if the timing constraints are still satisfied, so that the connection cells do not change the optimal chain solution.
3. Tie unconnected ports to logic high or low.

Slide 14: Benchmark Generation: Generated Netlist
– Generated output: a benchmark circuit of N·K + C cells with a known optimal solution.
– Chains are connected to each other in various topologies.
[Figure: schematic of a generated netlist (N = 10, K = 20)]

Slide 15: Outline (section: Experimental Framework and Results)

Slide 16: Experimental Setup
Delay and power model (library):
– LP: linear increase in power (gate-sizing context)
– EP: exponential increase in power (Vt or gate-length context)
Heuristics compared:
– Two commercial tools (Blaze MO, Cadence Encounter)
– UCLA sizing tool
– UCSD sensitivity-based leakage optimizer
Realistic benchmarks: six open-source designs.
Suboptimality calculation: Suboptimality = (power_heuristic - power_opt) / power_opt

Slide 17: Generated Benchmark: Complexity
– Complexity (suboptimality) of generated benchmarks: chain-only vs. connected-chain topologies.
[Figure: suboptimality (0% to 20%) of the greedy heuristic and a commercial tool on chain-only vs. connected-chain benchmarks, per (library, N, K) configuration]
– Chain-only: avg. 2.1% suboptimality; connected-chain: avg. 12.8%.

Slide 18: Generated Benchmark: Connectivity
Problem complexity and circuit connectivity:
1. Arranged assignment improves connectivity (larger fanin in later stages, larger fanout in earlier stages).
2. Random assignment improves the diversity of topologies.

  arranged  random  unconnected  suboptimality
  100%      0%      0.00%        2.60%
  75%       25%     0.00%        6.80%
  50%       50%     0.25%        10.30%
  25%       75%     0.75%        11.20%
  0%        100%    17.00%       7.70%

Slide 19: Suboptimality w.r.t. Parameters
[Figure: suboptimality (8% to 14%) and runtime (minutes) of Comm, Greedy and SensOpt vs. the number of chains (40 to 640) and vs. the number of stages (20 to 100)]
– The total number of paths increases significantly with N and K.

Slide 20: Suboptimality w.r.t. Parameters (2)
[Figure: suboptimality and runtime (minutes) of Comm, Greedy and SensOpt vs. average net degree (1.2 to 2.4) and vs. timing constraint (0.4 ns to 1.1 ns)]

Slide 21: Generated Realistic Benchmarks
Target benchmarks:
– SASC, SPI, AES, JPEG, MPEG (from OpenCores)
– EXU (from OpenSPARC T1)
Characteristic parameters of real and generated benchmarks:

                               real designs              generated data
  design  depth  #instances  Rent param.  net degree  Rent param.  net degree
  SASC    20     624         0.858        2.06        0.865        2.06
  SPI     33     1092        0.880        1.81        0.877        1.80
  EXU     31     25560       0.858        1.91        0.814        1.90
  AES     23     23622       0.810        1.89        0.820        1.88
  JPEG    72     141165      0.721        1.84        0.831        1.84
  MPEG    33     578034      0.848        1.59        0.848        1.60

Slide 22: Suboptimality of Heuristics
Suboptimality with respect to the known optimal solutions for the generated realistic benchmarks.
[Figure: suboptimality of Comm1, Comm2, Greedy and SensOpt on eyechart, SASC, SPI, AES, EXU, JPEG and MPEG, with the EP and LP libraries; Greedy results for MPEG are missing]
– EP library (Vt-swap context): up to 52.2%, avg. 16.3%.
– LP library (gate-sizing context): up to 43.7%, avg. 25.5%.

Slide 23: Comparison with Real Designs
Suboptimality versus one specific heuristic (SensOpt):
[Figure: suboptimality of Comm1, Comm2 and Greedy relative to SensOpt on SASC, SPI, AES, EXU, JPEG and MPEG, for (a) real designs with a real delay/leakage library (TSMC 65 nm) and (b) our generated benchmarks]
– The actual suboptimality will be greater.
– Sources of discrepancy: simplified delay model, reduced library set, ...

Slide 24: Conclusions
– A new benchmark generation technique for gate sizing: constructs realistic circuits with known optimal solutions.
– Our benchmarks enable a systematic and quantitative study of common sizing heuristics.
– Common sizing methods are suboptimal on realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing).
– http://vlsicad.ucsd.edu/SIZING/

Slide 25: Ongoing Work
– Analyze discrepancies between real and artificial benchmarks.
– Handle more realistic delay models: use a realistic delay library in the context of realistic benchmarks, with tight upper bounds.
– Alternate approach to netlist generation: (1) cut nets in a real design and find the optimal solution; (2) reconnect the nets while keeping the optimal solution.

Slide 26: Thank you