Variability-Driven Formulation for
Simultaneous Gate Sizing and
Post-Silicon Tunability Allocation
Vishal Khandelwal and Ankur Srivastava
Department of Electrical and Computer Engineering
University of Maryland College Park
http://www.ece.umd.edu/~vishalk
Introduction
Process variations cause significant spread in design
performance in sub-90nm technologies
Impacts yield and reliability
It is necessary to explicitly consider the impact of
process variations on design parameters
Several statistical analysis and optimization techniques
have been proposed to improve timing/power yields
Handling Process Variations
Process Variations
  Design-Time Optimization
    Statistical Gate Sizing
      [Davoodi, DAC’06]
      [Sapatnekar, DAC’05]
      [Zhou, ICCAD’05]
    Statistical Buffer Insertion
      [He, ISPD’06]
      [Davoodi, ICCD’05]
      [Wong, ICCAD’05]
      [Khandelwal, ICCAD’03]
  Post-Fabrication Tunability
    Post-Silicon Tunable Clock-Tree Buffers
      [Chen, ICCAD’05]
      [Mahoney, ISSC’05]
      [Takahashi, 2003]
      [Tam, JSSC’00]
    Adaptive Body-Biasing
      [Kim, ISLPED’03]
      [Orshansky, ICCAD’06]
Traditional Gate Sizing
Gate size: si
Minimize area, or power
Subject to:
meeting a delay constraint at the output
size constraints
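[Figure: timing graph from source node 0 to sink node n; gate i has delay $d_i$, arrival time $t_i$ at its input and $t_j$ at its output; the sink must meet $T_{cons}$]
Minimize Area, Power, …
$t_0 = 0$
$t_j \ge d_i(s) + t_i$
$t_n \le T_{cons}$
$s_{min,i} \le s_i \le s_{max,i}$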
[Fishburn, Dunlop 1985]
[Sapatnekar,1993]
Traditional Gate Sizing
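Posynomial Gate Delay Expression
[Fishburn, Dunlop 1985]
[Sapatnekar,1993]
$d_i = a_{i0} + \sum_{j \in FO(i)} a_{ij} \frac{s_j}{s_i}$
Minimize Area, Power, …
$t_0 = 0$
$t_j \ge d_i(s) + t_i$
$t_n \le T_{cons}$
$s_{min,i} \le s_i \le s_{max,i}$
Convex Formulation, substituting $s_i = e^{x_i}$
Minimize Area, Power, …
$t_0 = 0$
$\left(a_{i0} + \sum_{j} a_{ij} e^{x_j - x_i}\right) - t_j + t_i \le 0$, where the bracketed term is $d_i(x)$
$t_n \le T_{cons}$
$s_{min,i} \le e^{x_i} \le s_{max,i}$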
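The change of variables can be checked numerically. The sketch below is only an illustration: the three-gate fanout, the coefficient values and the helper names are hypothetical, not taken from the slides. It evaluates the posynomial delay and its exponential form, then tests the midpoint convexity inequality in the $x$ variables.

```python
import math

def posynomial_delay(i, sizes, a0, a, fanout):
    """d_i = a_i0 + sum over j in FO(i) of a_ij * s_j / s_i."""
    return a0[i] + sum(a[(i, j)] * sizes[j] / sizes[i] for j in fanout[i])

def exp_delay(i, x, a0, a, fanout):
    """Same delay after substituting s_i = exp(x_i)."""
    return a0[i] + sum(a[(i, j)] * math.exp(x[j] - x[i]) for j in fanout[i])

# Hypothetical example: gate 0 drives gates 1 and 2.
fanout = {0: [1, 2], 1: [], 2: []}
a0 = {0: 1.0, 1: 0.8, 2: 0.9}
a = {(0, 1): 0.5, (0, 2): 0.3}

sizes_a = {0: 1.0, 1: 2.0, 2: 4.0}
sizes_b = {0: 4.0, 1: 1.0, 2: 2.0}
x_a = {g: math.log(s) for g, s in sizes_a.items()}
x_b = {g: math.log(s) for g, s in sizes_b.items()}
x_mid = {g: 0.5 * (x_a[g] + x_b[g]) for g in x_a}  # midpoint in log-size space

d_a = exp_delay(0, x_a, a0, a, fanout)
d_b = exp_delay(0, x_b, a0, a, fanout)
d_mid = exp_delay(0, x_mid, a0, a, fanout)

# Convexity in x implies d(midpoint) <= average of the endpoint delays.
print(d_a, d_b, d_mid, d_mid <= 0.5 * (d_a + d_b))
```

A single midpoint check does not prove convexity; it only illustrates why the exponential form is convenient for a convex solver.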
Effects of Process Variations
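[Figure: transistor cross-section showing the n+ regions, effective channel length $L_{eff}$ and oxide thickness $T_{ox}$]
$\omega = \{L_{eff}, T_{ox}, \ldots\}$: set of random variables with arbitrary distributions
$d_i(s, \omega) = a_{i0}(\omega) + \sum_{j \in Fanout(i)} a_{ij}(\omega) \frac{s_j}{s_i}$
Delay of each gate becomes a random variable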
Statistical Gate Sizing
[Davoodi, DAC’06]
[Sapatnekar, DAC’05]
[Zhou, ICCAD’05]
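To make the "delay becomes a random variable" point concrete, here is a minimal Monte Carlo sketch. It assumes, purely for illustration, that the variation acts as a single Gaussian scale factor on all coefficients of one gate; the slides place no such restriction on the distributions, and the function name and numbers are hypothetical.

```python
import random
import statistics

def gate_delay_sample(s_i, fanout_sizes, a0_nom, a_nom, sigma, rng):
    """Draw one sample of the gate delay a_i0 + sum_j a_ij * s_j / s_i.

    Assumption: the process variation is modeled as one Gaussian scale
    factor shared by every coefficient of the gate (clipped at zero).
    """
    scale = max(0.0, rng.gauss(1.0, sigma))
    load = sum(a_ij * s_j / s_i for a_ij, s_j in zip(a_nom, fanout_sizes))
    return scale * (a0_nom + load)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [gate_delay_sample(1.0, [2.0], 1.0, [0.5], 0.1, rng)
           for _ in range(10000)]

nominal = 1.0 + 0.5 * 2.0 / 1.0  # delay with no variation
print(nominal, statistics.mean(samples), statistics.stdev(samples))
```

The sample mean stays near the nominal delay while the standard deviation shows the spread that statistical sizing has to account for.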
Post-Silicon Tunable (PST) Clock Tree Buffers
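[Figure: clock tree with tunable buffers B1 to B7 driving flip-flops FF1 to FF8; B1 is the root and drives B2 and B3; B2 drives B4 (FF1, FF2) and B5 (FF3, FF4); B3 drives B6 (FF5, FF6) and B7 (FF7, FF8)]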
Tunable clock buffers can introduce extra slack into
critical paths after fabrication
Design Overhead
Area, Clock-Tree Power
[Chen, ICCAD’05]
[Mahoney, ISSC’05]
[Takahashi, 2003]
[Tam, JSSC’00]
Post-Silicon Tunable Clock Tree Buffers
[Figure: clock tree with tunable buffers B1 to B7 driving flip-flops FF1 to FF8]
Let Dij be the delay of the longest path between flip-flops i
and j
$T_i + D_{ij} \le T_{clk} + T_j - T_{set}$
Consider Flip-Flops 2 and 7: Tune buffers to change
clock-skew
$(T_2 + T_1^{Buf} + T_2^{Buf} + T_4^{Buf}) + D_{27} \le (T_7 + T_1^{Buf} + T_3^{Buf} + T_7^{Buf}) + T_{clk} - T_{set}$
$0 \le T_i^{Buf} \le Max_i^{Buf}$
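The flip-flop 2 to flip-flop 7 constraint can be sketched as a slack computation. The clock paths follow the buffers named in the constraint; the numeric values and helper names are hypothetical. The example also shows that tuning the shared root buffer B1 moves both clock arrivals equally and so leaves the slack unchanged.

```python
def clock_arrival(base, path, tuning):
    """Clock arrival at a flip-flop: base arrival plus tunable buffer delays."""
    return base + sum(tuning[b] for b in path)

def setup_slack(Ti, path_i, Tj, path_j, Dij, Tclk, Tset, tuning):
    """Slack of (Ti + buffers) + Dij <= (Tj + buffers) + Tclk - Tset."""
    launch = clock_arrival(Ti, path_i, tuning)
    capture = clock_arrival(Tj, path_j, tuning)
    return (capture + Tclk - Tset) - (launch + Dij)

# Clock paths taken from the FF2/FF7 constraint on this slide.
paths = {2: ["B1", "B2", "B4"], 7: ["B1", "B3", "B7"]}
no_tuning = {b: 0.0 for b in ["B1", "B2", "B3", "B4", "B7"]}

# Hypothetical numbers: the path is 0.3 too slow without tuning.
args = dict(Ti=0.0, path_i=paths[2], Tj=0.0, path_j=paths[7],
            Dij=10.2, Tclk=10.0, Tset=0.1)

slack_before = setup_slack(tuning=no_tuning, **args)
slack_after = setup_slack(tuning=dict(no_tuning, B7=0.3), **args)   # delay capture clock
slack_shared = setup_slack(tuning=dict(no_tuning, B1=0.5), **args)  # shared buffer cancels

print(slack_before, slack_after, slack_shared)
```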
Optimization Objective: Tunability Cost
Metric to capture the overhead due to PST buffers in the
design
Silicon Area
Clock-Tree Power
$TunabilityCost = \sum_{i \in \text{PST Buffers}} Max_i^{Buf}$
Optimization Objective: Binning Yield Loss
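[Figure: delay distribution $f_T(t)$ versus delay $t$, with a loss function applied to delays beyond $T_{cons}$]
$BinningYieldLoss\ (BYL) = \int_{T_{cons}}^{\infty} Loss(t)\, f_T(t)\, dt$
Convex loss function Q(.)
$BYL = \int_{T_{cons}}^{\infty} Q(t - T_{cons})\, f_T(t)\, dt$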
[V. Zolotov, DAC’04]
[D. Blaauw, GLSVLSI’05]
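A sample-based estimate of BYL can be sketched as follows. The quadratic loss and the Gaussian delay distribution are assumptions chosen only for illustration; the slides require just that Q be convex and make no distributional assumption.

```python
import random

def loss_q(excess):
    """Hypothetical convex loss: quadratic in the delay excess over Tcons."""
    return excess * excess

def binning_yield_loss(delays, t_cons, q=loss_q):
    """Sample average of Q(t - Tcons), counting only samples with t > Tcons."""
    return sum(q(t - t_cons) for t in delays if t > t_cons) / len(delays)

rng = random.Random(1)  # fixed seed so the run is repeatable
delays = [rng.gauss(10.0, 0.5) for _ in range(20000)]  # assumed delay samples

byl_tight = binning_yield_loss(delays, 10.0)  # constraint at the mean delay
byl_loose = binning_yield_loss(delays, 11.0)  # constraint two sigma above
print(byl_tight, byl_loose)
```

Relaxing the timing constraint lowers the estimate, since fewer samples fall past $T_{cons}$ and those that do exceed it by less.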
Problem Statement
Given a sequential design with a synthesized PST clock-tree (known buffer locations), perform simultaneous
Statistical gate sizing
PST buffer tuning range determination
Such that Binning Yield Loss and Tunability Cost are
minimized
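[Figure: combinational logic from node 0 to node n (gate i with delay $d_i$, timing constraint $T_{cons}$) together with the PST clock tree, buffers B1 to B7 driving flip-flops FF1 to FF8]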
Two-Stage Formulation
Gate Size: $x$, Tuning Buffer Range: $r$
First Stage:
Minimize $(BYL(x, r) + TunabilityCost(r) + GateSizes)$
$T_i + D_{ij} \le T_{clk} + T_j - T_{set} \quad \forall\, FlipFlops(i, j)$
$t_p + d_q(x, \omega_0) \le t_q \quad \forall\, p \in fanin(q)$
$t_q \le D_{ij} \quad \forall\, FlipFlops(i, j),\ \forall\, q \in fanin(FlipFlop\ j)$
$x_{min} \le x \le x_{max}$
$0 \le r \le Max^{Buf}$
1. Deterministic constraints:
meeting timing requirement assuming no variations
$\omega_0 = \{l_{eff,0}, t_{ox,0}, \ldots\}$
2. Capturing variability in objective
Second Stage Formulation
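[Figure: delay distribution $f_T(t)$ with the loss $Q$ applied beyond $T_{cons}$; the resulting loss value is denoted $v$]
$V(x, r, \omega) = \begin{cases} Q(D_{ij}(x, r, \omega) - T_{cons}) & D_{ij} > T_{cons} \\ 0 & \text{otherwise} \end{cases}$
$BYL(x, r) = \int_{T_{cons}}^{\infty} Q(D_{ij}(x, r, \omega) - T_{cons})\, f_T(t)\, dt = \int_{0}^{\infty} v\, f_V(v)\, dv = E[V(x, r, \omega)]$
No Statistical Timing Analysis scheme exists to estimate the timing distribution of a circuit given gate sizes and tuning buffer ranges
Each sample of variability requires a different amount of tuning for maximum timing yield
Given a solution to the first stage problem and a variability sample: $(x_0, r_0, \omega)$
Second Stage:
$v(x_0, r_0, \omega) = \text{Minimize} \sum_{FF(i, j)} Q(T_{ij}^{viol})$
$(T_i + \sum_{k \in C_i} T_k^{Buf}) + D_{ij}(x_0, \omega) \le T_{clk} + (T_j + \sum_{k \in C_j} T_k^{Buf}) - T_{set} + T_{ij}^{viol} \quad \forall\, FlipFlop(i, j)$
$t_p + d_q(x_0, \omega) \le t_q \quad \forall\, p \in fanin(q)$
$t_q \le D_{ij}(x_0, \omega) \quad \forall\, FlipFlop(i, j),\ \forall\, q \in fanin(FlipFlop(j))$
$T_{ij}^{viol} \ge 0 \quad \forall\, FlipFlop(i, j)$
$0 \le T^{Buf} \le r_0 \quad \forall\, \text{PST Buffer}$
($C_i$: clock-tree buffers on the clock path to flip-flop $i$)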
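For a single flip-flop pair the second-stage problem has a closed form, which the sketch below uses in place of a general solver. The assumptions are: one pair only, a loss Q that is nondecreasing with Q(0) = 0 (the identity is used here), Gaussian path-delay samples, and hypothetical function names and numbers. Buffers shared by both clock paths cancel, so only the tuning range available on the capture side matters.

```python
import random

def second_stage_value(Ti, Tj, Dij, Tclk, Tset, r_capture_only, q=lambda v: v):
    """Minimum Q(T_viol) for one flip-flop pair and one variability sample.

    r_capture_only: tuning ranges of buffers that lie only on the capture
    clock path. The best setup tuning keeps launch-only buffers at 0 and
    pushes these buffers to their maximum.
    """
    best_shift = sum(r_capture_only)
    viol = (Ti + Dij + Tset) - (Tclk + Tj + best_shift)
    return q(max(0.0, viol))

def expected_violation(delay_samples, r_capture_only, Tclk=10.0, Tset=0.1):
    """Sample average of the second-stage value, an estimate of E[V]."""
    vals = [second_stage_value(0.0, 0.0, d, Tclk, Tset, r_capture_only)
            for d in delay_samples]
    return sum(vals) / len(vals)

rng = random.Random(2)  # fixed seed so the run is repeatable
delay_samples = [rng.gauss(9.8, 0.3) for _ in range(5000)]  # assumed D_ij samples

ev_no_tuning = expected_violation(delay_samples, [0.0])
ev_tuned = expected_violation(delay_samples, [0.3])
print(ev_no_tuning, ev_tuned)
```

With several flip-flop pairs sharing buffers the tuning choices interact, and the full formulation needs a convex solver rather than this shortcut.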
Convex Problem
THEOREM: The proposed two-stage stochastic programming
formulation is convex
PROOF:
Detailed proof omitted for brevity
First stage constraints are convex
First stage objective is convex if BYL(x,r) is convex
$BYL(x, r) = E[V(x, r, \omega)] = \int_{0}^{\infty} v\, f_V(v)\, dv = \int V(x, r, \omega)\, f(\omega)\, d\omega$
Need to show each sample $V(x, r, \omega)$ is convex
From second stage formulation one can show that $V(x, r, \omega)$
is convex
Kelley’s Cutting Plane Algorithm
Iteratively solve first and second stage formulation
Given a solution to the first stage formulation, we use
the method of finite differences to generate a lower bound to
BYL from the second stage formulation
$BYL(x, r) \ge \alpha_k + \langle \beta_k, (x, r) \rangle$
Add this constraint to the first stage formulation at each iteration
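The iteration can be illustrated on a one-dimensional convex function. This is a toy sketch, not the implementation behind the slides: the objective, the interval and all names are hypothetical, and the piecewise-linear model is minimized by enumerating cut intersections instead of calling a convex solver. Each cut is an affine lower bound whose slope comes from central finite differences.

```python
def slope_fd(f, x, h=1e-6):
    """Central finite-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def model_min(cuts, lo, hi):
    """Minimize max_k (a_k + b_k * x) over [lo, hi].

    The minimum of a convex piecewise-linear function lies at an interval
    endpoint or at an intersection of two cuts, so those are the candidates.
    """
    candidates = [lo, hi]
    for k, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[k + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if lo <= x <= hi:
                    candidates.append(x)
    value = lambda x: max(a + b * x for a, b in cuts)
    x_star = min(candidates, key=value)
    return x_star, value(x_star)

def kelley(f, lo, hi, tol=1e-6, max_iter=100):
    """Kelley's cutting-plane method for a convex f on [lo, hi]."""
    x, cuts = lo, []
    best_x, best_f = x, f(x)
    for _ in range(max_iter):
        fx, g = f(x), slope_fd(f, x)
        if fx < best_f:
            best_x, best_f = x, fx
        cuts.append((fx - g * x, g))       # cut: f(y) >= (fx - g*x) + g*y
        x, lower = model_min(cuts, lo, hi)  # next iterate and lower bound
        if best_f - lower <= tol:           # stop when the bounds meet
            break
    return best_x, best_f, len(cuts)

best_x, best_f, n_cuts = kelley(lambda x: (x - 1.0) ** 2, -2.0, 5.0)
print(best_x, best_f, n_cuts)
```

In the two-stage setting the role of `f` would be played by the sampled BYL estimate, and each cut would be added to the first-stage problem as an extra constraint.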
Shortest-Path Constraints
Inherently non-convex in nature
$T_i + D_{ij}^{short} \ge T_j + T_{hold} \quad \forall\, FlipFlop(i, j)$
Approximate gate delay using a linear approximation
(lower bound)
$D_{ij}^{short} \le \sum_{\text{gates } m \text{ on path } p} d_m^{lin}$ for each path $p$
$d_m^{lin} = a_m^0 + a_m^1 x_m + \sum_{n \in fanout(m)} b_n x_n$
The two-stage stochastic programming formulation can
be modified to consider shortest path constraints
Experimental Results
Implemented the framework in SIS using
MOSEK to solve the convex formulation
Used CAPO to place the netlist to get spatially
correlated gate delays
[Figure: gates i and j at placement coordinates $(x_i, y_i)$ and $(x_j, y_j)$]
Assumed 15% Vth variation in 90nm
technology node [Predictive Technology Model]
Synthesized the PST clock-tree using the
technique proposed in [Chen et al., ICCAD’05]
Experimental Results
Experimental Comparison – ISCAS benchmarks
[Chen]:
Nominal gate sizing
PST clock-tree generation using [Chen et al., ICCAD’05]
Sensitivity:
Retain PST clock-tree location and range
Sensitivity-driven statistical gate sizing algorithm
– Size the gate with maximum yield gain greedily (iterative)
– Similar in spirit to [Zhou ICCAD’05, Zolotov DAC’05]
Stochastic:
Retain PST clock-tree buffer locations
Proposed simultaneous gate sizing and post-silicon tunability
allocation algorithm
BYL, Area and Tuning Range Comparison
[Figure: three bar charts comparing [Chen], Sensitivity and Stochastic on s344, s382, s400, s526 and s635: Binning Yield Loss, Area (Logic Gates) Comparison, and Tuning Range Comparison]
Timing Yield Loss Comparison
[Figure: bar chart of Timing Yield Loss for [Chen], Sensitivity and Stochastic on s344, s382, s400, s526 and s635]
Average Timing Yield Loss
[Chen]       0.22
Sensitivity  0.19
Stochastic   0.03
Runtime Comparison
[Figure: bar chart of Runtime for Sensitivity and Stochastic on s344, s382, s400, s526 and s635]
Number of Iterations
Technique    s344  s382  s400  s526  s635
Sensitivity  24    40    18    15    109
Stochastic   7     19    13    14    7
Summary and Future Work
Variability-driven framework for simultaneous gate sizing
and post-silicon tunability allocation to minimize binning yield loss and tunability cost
Efficient stochastic programming based scheme to solve
the formulation
No assumptions about parameter distribution or their
correlations
Need to develop a statistical timing analysis scheme that
can consider the effect of post-silicon tunability
Thank You!