
Construction of Realistic Gate Sizing Benchmarks
With Known Optimal Solutions
Andrew B. Kahng, Seokhyeong Kang
VLSI CAD LABORATORY, UC San Diego
International Symposium on Physical Design
March 27th, 2012
-1-
Outline
• Background and Motivation
• Benchmark Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work
-2-
Gate Sizing in VLSI Design
• Gate sizing
  – Essential for power, delay and area optimization
  – Tunable parameters: gate width, gate length and threshold voltage
  – The sizing problem appears in all phases of the RTL-to-GDS flow
• Common heuristics/algorithms
  – LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...
1. Which heuristic is better?
2. How suboptimal is a given sizing solution?
⇒ A systematic and quantitative comparison is required
-3-
Suboptimality of Sizing Heuristics
• Eyecharts* – chain, star and mesh topologies
  – Built from three basic topologies, optimally sized with DP – allows suboptimality to be evaluated
  – Not realistic: Eyechart circuits have a different topology from real designs – large depth (650 stages) and small Rent parameter (0.17)
• More realistic benchmarks are required, along with an automated generation flow
* Gupta et al., "Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics", DAC 2010.
-4-
Our Work: Realistic Benchmark Generation w/ Known Optimal Solution
1. Propose benchmark circuits with known optimal solutions
2. The benchmarks resemble real designs
   – Gate count, path depth, Rent parameter and net degree
3. Assess suboptimality of standard gate sizing approaches

Automated benchmark generation flow:
  real design → extract characteristic parameters → netlist generator (construct chains → find optimal solution → connect chains keeping the optimal solution) → benchmark circuit w/ known optimal solution
-5-
Outline
• Background and Motivation
• Benchmark Considerations and Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work
-6-
Benchmark Considerations
• Realism vs. tractability of analysis – opposing goals
• To construct realistic benchmarks: use design characteristic parameters (extraction sketch below)
  – # primary ports, path depth, fanin/fanout distribution
  – Example (JPEG Encoder): path depth 72, avg. net degree 1.84, Rent parameter 0.72; fanin distribution: 25% 1-input, 60% 2-input, 15% 3+-input
• To enable known optimal solutions
  – Library simplification as in Gupta et al. 2010: slew-independent library
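
A minimal sketch of extracting two of these characteristic parameters from a flattened netlist. The data layout (gate → fanin nets, net → sink pins) and the definition of net degree as sinks per net are assumptions for illustration, not the authors' extraction tool:

  from collections import Counter

  # Hypothetical sketch: characteristic-parameter extraction.
  def fanin_distribution(gate_fanins):
      # fraction of gates with each fanin count
      counts = Counter(len(nets) for nets in gate_fanins.values())
      total = sum(counts.values())
      return {fanin: n / total for fanin, n in sorted(counts.items())}

  def average_net_degree(net_sinks):
      # net degree counted as sinks per net (an assumption)
      return sum(len(sinks) for sinks in net_sinks.values()) / len(net_sinks)

  # toy example: three gates, two nets
  print(fanin_distribution({"g1": ["n1"], "g2": ["n1", "n2"], "g3": ["n1", "n2"]}))
  print(average_net_degree({"n1": ["g2", "g3"], "n2": ["g3"]}))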
-7-
Benchmark Generation
• Input parameters
  1. timing budget T
  2. depth of data path K
  3. number of primary ports N
  4. fanin and fanout distributions fid(i), fod(o)
• Constraints (a small check sketch follows this slide)
  – T must be larger than the minimum delay of a K-stage chain
  – Σ_{i=1..I} i·fid(i) = Σ_{o=1..O} o·fod(o)  (total fanins must equal total fanouts)
• Generation flow
  1. construct N chains with depth K
  2. attach connection cells (C)
  3. connect chains
⇒ netlist with N*K + C cells
-8-
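A minimal sketch of the input-parameter checks implied by the constraints above. Function and parameter names are illustrative; min_stage_delay approximates the fastest K-stage chain as K times the fastest single-stage delay:

  # Hypothetical sketch: validate benchmark-generation inputs.
  def check_parameters(T, K, N, fid, fod, min_stage_delay):
      # T must exceed the minimum achievable delay of a K-stage chain
      if T <= K * min_stage_delay:
          raise ValueError("timing budget T is tighter than the fastest K-stage chain")
      # total fanins and total fanouts must balance so every port can be connected
      fanins = sum(i * p for i, p in fid.items())
      fanouts = sum(o * p for o, p in fod.items())
      if abs(fanins - fanouts) > 1e-9:
          raise ValueError("fanin and fanout distributions do not balance")

  # example with an illustrative, balanced distribution (both sums equal 1.9)
  check_parameters(T=1.0, K=20, N=10,
                   fid={1: 0.25, 2: 0.60, 3: 0.15},
                   fod={1: 0.30, 2: 0.50, 3: 0.20},
                   min_stage_delay=0.02)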
Benchmark Generation: Construct Chains
[figure: N chains, each a series of K stages; gate(n,k) denotes the gate at stage k of chain n]
1. Construct N chains, each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(o)
⇒ Assignment strategy: arranged or random (a construction sketch follows this slide)
-9-
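A minimal sketch of steps 1-3: build N chains of depth K and draw each stage's fanin/fanout count from the given distributions. Gate types are identified only by their fanin count here, and sizes are left unset because they are chosen later by the DP; all names and the data layout are illustrative:

  import random

  # Hypothetical sketch: chain construction from fid / fod distributions.
  def construct_chains(N, K, fid, fod, rng=random.Random(0)):
      fanin_vals, fanin_probs = zip(*sorted(fid.items()))
      fanout_vals, fanout_probs = zip(*sorted(fod.items()))
      chains = []
      for n in range(N):
          fanins = rng.choices(fanin_vals, fanin_probs, k=K)
          fanouts = rng.choices(fanout_vals, fanout_probs, k=K)
          chains.append([{"name": f"gate_{n}_{k}",
                          "fanin": fanins[k],
                          "fanout": fanouts[k],
                          "size": None} for k in range(K)])
      return chains

  # e.g. construct_chains(N=10, K=20, fid={1: .25, 2: .6, 3: .15}, fod={1: .3, 2: .5, 3: .2})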
Benchmark Generation: Construct Chains
[figure: arranged vs. random assignment of fanins/fanouts along a chain]
1. Construct N chains, each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(o)
⇒ Assignment strategy: arranged or random (a sketch of both follows this slide)
-10-
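A minimal sketch of the two assignment strategies for one chain's sampled fanin/fanout counts. The ordering rule for "arranged" is taken from the connectivity slide later in the deck (larger fanins toward later stages, larger fanouts toward earlier stages); the code is illustrative only:

  import random

  # Hypothetical sketch: order one chain's fanin/fanout counts per stage.
  def assign_arranged(fanins, fanouts):
      # larger fanins at later stages, larger fanouts at earlier stages
      return sorted(fanins), sorted(fanouts, reverse=True)

  def assign_random(fanins, fanouts, rng=random.Random(0)):
      fanins, fanouts = list(fanins), list(fanouts)
      rng.shuffle(fanins)
      rng.shuffle(fanouts)
      return fanins, fanouts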
Benchmark Generation: Find Optimal Solution with DP
[figure: connection cells attached to the open fanouts of each chain]
1. Attach connection cells to all open fanouts
   – to connect chains while keeping the optimal solution
2. Perform dynamic programming with timing budget T
   – the optimal solution is achievable with a slew-independent library
-11-
Benchmark Generation: Solving a Chain Optimally (Example)
• Three-stage inverter chain INV1 → INV2 → INV3 with delay budget Dmax = 8 and output load 6
• Simplified slew-independent library:

  size     input cap   leakage power   delay @ load 3   delay @ load 6
  Size 1       3             5                3                 4
  Size 2       6            10                1                 2

• DP proceeds stage by stage: for each delay budget and each possible load, store the minimum total leakage power of the stages processed so far and the size chosen at the current stage

  Stage 1 (power of stage 1 only)
    Load = 3: budget 1 → power 10, size 2; budget 2 → 10, size 2; budgets 3-8 → 5, size 1
    Load = 6: budget 2 → power 10, size 2; budget 3 → 10, size 2; budgets 4-8 → 5, size 1

  Stage 2 (cumulative power of stages 1-2)
    Load = 3: budget 3 → 20, size 2; budget 4 → 15, size 1; budget 5 → 15, size 2; budgets 6-8 → 10, size 1
    Load = 6: budget 4 → 20, size 2; budget 5 → 15, size 1; budget 6 → 15, size 2; budgets 7-8 → 10, size 1

  Stage 3 (cumulative power, load = 6, budget = Dmax = 8)
    size 1 → power 20; size 2 → power 25

• OPTIMIZED CHAIN: INV1 = size 2, INV2 = size 1, INV3 = size 1 (total leakage power 20, delay 8)
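
A minimal sketch of the chain DP under a slew-independent library, using the example numbers above. The code is illustrative, not the authors' implementation, but it reproduces the result on this slide:

  # size -> input cap, leakage power, delay per driven load (from the table above)
  LIB = {
      1: {"cap": 3, "leak": 5,  "delay": {3: 3, 6: 4}},
      2: {"cap": 6, "leak": 10, "delay": {3: 1, 6: 2}},
  }

  def size_chain(num_stages, dmax, output_load, lib=LIB):
      # table[(budget, load)] = min total leakage of stages 1..k whose combined
      # delay fits in `budget` when stage k drives `load`
      prev, picks = None, []
      for k in range(1, num_stages + 1):
          loads = [output_load] if k == num_stages else sorted(c["cap"] for c in lib.values())
          table, pick = {}, {}
          for load in loads:
              for budget in range(1, dmax + 1):
                  for size, cell in lib.items():
                      d = cell["delay"][load]
                      if d > budget:
                          continue  # this size cannot meet the budget
                      if k == 1:
                          power = cell["leak"]
                      else:
                          rest = prev.get((budget - d, cell["cap"]))
                          if rest is None:
                              continue  # upstream stages cannot fit the leftover budget
                          power = cell["leak"] + rest
                      if (budget, load) not in table or power < table[(budget, load)]:
                          table[(budget, load)] = power
                          pick[(budget, load)] = size
          prev = table
          picks.append(pick)
      # trace back the size choices from the last stage at the full budget
      sizes, budget, load = [], dmax, output_load
      for k in range(num_stages, 0, -1):
          size = picks[k - 1][(budget, load)]
          sizes.append(size)
          budget -= lib[size]["delay"][load]
          load = lib[size]["cap"]
      return prev[(dmax, output_load)], sizes[::-1]

  # Reproduces the slide: minimum leakage 20 with sizes [2, 1, 1] for INV1..INV3.
  print(size_chain(num_stages=3, dmax=8, output_load=6))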
-12-
Benchmark Generation: Connect Chains
[figure: connection cells driving open fanin ports across chains; unconnected ports tied to VDD]
1. Run STA and find the arrival time of each gate
2. Connect each connection cell to an open fanin port
   – connect only if timing constraints are satisfied
   – connection cells do not change the optimal chain solution
3. Tie unconnected ports to logic high or low
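
A minimal sketch of this connection step, assuming each gate already carries an arrival time from STA and each open fanin port carries a required time derived from the timing budget; names, the dict layout and the fixed wire delay are illustrative assumptions:

  # Hypothetical sketch: connect connection cells to open fanin ports without
  # disturbing each chain's optimal sizing. A connection is made only if the
  # cell's signal arrives before the receiving port's required time.
  def connect_chains(connection_cells, open_fanins, wire_delay=0.0):
      connections = []  # (driving connection cell, receiving fanin port)
      for cell in connection_cells:
          target = next((g for g in open_fanins
                         if cell["arrival"] + wire_delay <= g["required"]), None)
          if target is not None:
              connections.append((cell["name"], target["name"]))
              open_fanins.remove(target)
      # any fanin ports left open are tied to logic high (or low)
      tied = [g["name"] for g in open_fanins]
      return connections, tied

  # example with illustrative arrival/required times
  cells = [{"name": "c1", "arrival": 0.2}, {"name": "c2", "arrival": 0.9}]
  fanins = [{"name": "g7.B", "required": 0.5}, {"name": "g3.A", "required": 0.3}]
  print(connect_chains(cells, fanins))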
-13-
Benchmark Generation: Generated Netlist
⇒ Generated output:
  – benchmark circuit of N*K + C cells w/ known optimal solution
[figure: schematic of a generated netlist (N = 10, K = 20); chains are connected to each other in various topologies]
-14-
Outline
• Background and Motivation
• Benchmark Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work
-15-
Experimental Setup
• Delay and power model (library)
  – LP: linear increase in power – gate-sizing context
  – EP: exponential increase in power – Vt or gate-length context
• Heuristics compared
  – Two commercial tools (BlazeMO, Cadence Encounter)
  – UCLA sizing tool
  – UCSD sensitivity-based leakage optimizer
• Realistic benchmarks: six open-source designs
• Suboptimality calculation
  Suboptimality = (power_heuristic - power_opt) / power_opt
-16-
Generated Benchmark - Complexity
• Complexity (suboptimality) of the generated benchmarks: chain-only vs. connected-chain topologies
[figure: suboptimality of a greedy heuristic and a commercial tool on chain-only vs. connected benchmarks, per [library]-[N]-[K] configuration]
  – Chain-only: avg. 2.1% suboptimality
  – Connected-chain: avg. 12.8% suboptimality
-17-
Generated Benchmark - Connectivity
• Problem complexity and circuit connectivity
  1. Arranged assignment: improves connectivity (larger fanin – later stage, larger fanout – earlier stage)
  2. Random assignment: improves diversity of topology

  arranged   random   unconnected   subopt.
    100%       0%        0.00%       2.60%
     75%      25%        0.00%       6.80%
     50%      50%        0.25%      10.30%
     25%      75%        0.75%      11.20%
      0%     100%       17.00%       7.70%
-18-
Suboptimality w.r.t. Parameters
• For different numbers of chains
[figure: suboptimality (8%-14%) and runtime (min, log scale) of Comm, Greedy and SensOpt vs. number of chains (40, 80, 160, 320, 640)]
• For different numbers of stages
[figure: suboptimality (8%-14%) and runtime (min, log scale) of Comm, Greedy and SensOpt vs. number of stages (20, 40, 60, 80, 100)]
⇒ Total # paths increases significantly with N and K
-19-
Suboptimality w.r.t. Parameters (2)
• For different average net degrees
[figure: suboptimality (up to ~120%) and runtime (min, log scale) of Comm, Greedy and SensOpt vs. average net degree (1.2-2.4)]
• For different delay constraints
[figure: suboptimality (up to ~25%) and runtime (min, log scale) of Comm, Greedy and SensOpt vs. timing constraint (0.4-1.1 ns)]
-20-
Generated Realistic Benchmarks
• Target benchmarks
  – SASC, SPI, AES, JPEG, MPEG (from OpenCores)
  – EXU (from OpenSPARC T1)
• Characteristic parameters of real and generated benchmarks

                          real designs              generated
  design   data depth   #instances   Rent param.   net degree   Rent param.   net degree
  SASC         20             624       0.858          2.06         0.865         2.06
  SPI          33            1092       0.880          1.81         0.877         1.80
  EXU          31           25560       0.858          1.91         0.814         1.90
  AES          23           23622       0.810          1.89         0.820         1.88
  JPEG         72          141165       0.721          1.84         0.831         1.84
  MPEG         33          578034       0.848          1.59         0.848         1.60
-21-
Suboptimality of Heuristics
• Suboptimality w.r.t. known optimal solutions for the generated realistic benchmarks
[figure: suboptimality of Comm1, Comm2, Greedy and SensOpt on eyechart, SASC, SPI, AES, EXU, JPEG and MPEG, with the EP and LP libraries]
  – EP library (Vt-swap context): up to 52.2%, avg. 16.3%
  – LP library (gate-sizing context): up to 43.7%, avg. 25.5%
  * Greedy results for MPEG are missing
-22-
Comparison w/ Real Designs
• Suboptimality versus one specific heuristic (SensOpt)
  – Real designs with a real delay/leakage library (TSMC 65nm): actual suboptimality will be greater!
  – Suboptimality from our benchmarks
[figure: Comm1, Comm2 and Greedy relative to SensOpt on SASC, SPI, AES, EXU, JPEG and MPEG, for the real-design / TSMC 65nm case and for our generated benchmarks]
⇒ Discrepancy: simplified delay model, reduced library set, ...
-23-
Conclusions
• A new benchmark generation technique for gate sizing ⇒ construct realistic circuits with known optimal solutions
• Our benchmarks enable systematic and quantitative study of common sizing heuristics
• Common sizing methods are suboptimal for realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing)
• http://vlsicad.ucsd.edu/SIZING/
-24-
Ongoing Work
• Analyze discrepancies between real and artificial benchmarks
• Handle more realistic delay models
  – Use a realistic delay library in the context of realistic benchmarks with tight upper bounds
• Alternate approach for netlist generation
  – (1) cut nets in a real design and find the optimal solution ⇒ (2) reconnect the nets, keeping the optimal solution
-25-
Thank you
-26-