Variability-Driven Formulation for
Simultaneous Gate Sizing and
Post-Silicon Tunability Allocation
Vishal Khandelwal and Ankur Srivastava
Department of Electrical and Computer Engineering
University of Maryland College Park
http://www.ece.umd.edu/~vishalk
Introduction
• Process variations cause significant spread in design performance in sub-90nm technologies
  – They impact yield and reliability
• It is necessary to explicitly consider the impact of process variations on design parameters
• Several statistical analysis and optimization techniques have been proposed to improve timing/power yields
Handling Process Variations
Process Variations
• Design-Time Optimization
  – Statistical Gate Sizing: [Davoodi, DAC’06], [Sapatnekar, DAC’05], [Zhou, ICCAD’05]
  – Statistical Buffer Insertion: [He, ISPD’06], [Davoodi, ICCD’05], [Wong, ICCAD’05], [Khandelwal, ICCAD’03]
• Post-Fabrication Tunability
  – Post-Silicon Tunable Clock-Tree Buffers: [Chen, ICCAD’05], [Mahoney, ISSC’05], [Takahashi, 2003], [Tam, JSSC’00]
  – Adaptive Body-Biasing: [Kim, ISLPED’03], [Orshansky, ICCAD’06]
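Traditional Gate Sizing
• Gate size: $s_i$
• Minimize area or power
• Subject to:
  – meeting a delay constraint $T_{cons}$ at the output
  – size constraints
[Figure: timing graph with nodes 0, i, j, n, arrival times $t_i$ and $t_j$, gate delay $d_i$, and output constraint $T_{cons}$]
$$
\begin{aligned}
\text{Minimize}\quad & \text{Area, Power, \dots} \\
\text{subject to}\quad & t_0 = 0 \\
& t_j \ge d_i(s) + t_i \\
& t_n \le T_{cons} \\
& s_{\min,i} \le s_i \le s_{\max,i}
\end{aligned}
$$
[Fishburn, Dunlop 1985], [Sapatnekar, 1993]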
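Traditional Gate Sizing
• Posynomial gate delay expression [Fishburn, Dunlop 1985], [Sapatnekar, 1993]:
$$ d_i = a_{i0} + \frac{\sum_{j \in FO(i)} a_{ij}\, s_j}{s_i} $$
• Original formulation:
$$
\begin{aligned}
\text{Minimize}\quad & \text{Area, Power, \dots} \\
\text{subject to}\quad & t_0 = 0 \\
& t_j \ge d_i(s) + t_i \\
& t_n \le T_{cons} \\
& s_{\min,i} \le s_i \le s_{\max,i}
\end{aligned}
$$
• Convex formulation after the substitution $s_i = e^{x_i}$:
$$
\begin{aligned}
\text{Minimize}\quad & \text{Area, Power, \dots} \\
\text{subject to}\quad & t_0 = 0 \\
& \underbrace{\Big(a_{i0} + \sum_{j} a_{ij}\, e^{x_j - x_i}\Big)}_{d_i(x)} - t_j + t_i \le 0 \\
& t_n \le T_{cons} \\
& s_{\min,i} \le e^{x_i} \le s_{\max,i}
\end{aligned}
$$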
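The convexity claimed for the substituted delay can be spot-checked numerically. The following is a minimal sketch, not the authors' code: the three-gate netlist, the coefficient values, and the helper names `gate_delay` and `midpoint_gap` are all hypothetical. It evaluates $d_i(x) = a_{i0} + \sum_j a_{ij} e^{x_j - x_i}$ and compares the midpoint delay with the average of the endpoint delays.

```python
import math

def gate_delay(x, i, a0, a, fanout):
    """d_i(x) = a_i0 + sum over j in FO(i) of a_ij * exp(x_j - x_i), with s = exp(x)."""
    return a0[i] + sum(a[(i, j)] * math.exp(x[j] - x[i]) for j in fanout[i])

# Hypothetical example: gate 0 drives gates 1 and 2 (coefficients are made up).
a0 = {0: 1.0}
a = {(0, 1): 0.5, (0, 2): 0.8}
fanout = {0: [1, 2]}

def midpoint_gap(xa, xb):
    """Average of the endpoint delays minus the midpoint delay (>= 0 for a convex d)."""
    xm = [(u + v) / 2 for u, v in zip(xa, xb)]
    avg = (gate_delay(xa, 0, a0, a, fanout) + gate_delay(xb, 0, a0, a, fanout)) / 2
    return avg - gate_delay(xm, 0, a0, a, fanout)

xa = [0.0, 1.0, 0.5]   # log-sizes x_i = ln(s_i)
xb = [2.0, 0.0, 1.5]
print(midpoint_gap(xa, xb))   # non-negative, as convexity requires
```

A single midpoint test does not prove convexity; it only illustrates the property that follows from each term being an exponential of a linear function of x.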
Effects of Process Variations
[Figure: MOS transistor cross-section showing $T_{ox}$, $L_{eff}$, and the n+ regions]
• $\omega = \{L_{eff}, T_{ox}, \dots\}$: set of random variables with arbitrary distributions
$$ d_i(s, \omega) = a_{i0}(\omega) + \frac{\sum_{j \in Fanout(i)} a_{ij}(\omega)\, s_j}{s_i} $$
• Delay of each gate becomes a random variable
• Statistical Gate Sizing: [Davoodi, DAC’06], [Sapatnekar, DAC’05], [Zhou, ICCAD’05]
Post-Silicon Tunable (PST) Clock Tree Buffers
[Figure: clock tree with tunable buffers B1–B7 driving flip-flops FF1–FF8]
• Tunable clock buffers can introduce extra slack into critical paths after fabrication
• Design overhead
  – Area, clock-tree power
[Chen, ICCAD’05], [Mahoney, ISSC’05], [Takahashi, 2003], [Tam, JSSC’00]
Post-Silicon Tunable Clock Tree Buffers
[Figure: clock tree with tunable buffers B1–B7 driving flip-flops FF1–FF8]
• Let $D_{ij}$ be the delay of the longest path between flip-flops i and j:
$$ T_i + D_{ij} \le T_{clk} + T_j - T_{set} $$
• Consider flip-flops 2 and 7: tune buffers to change the clock skew
$$ (T_2 + T_1^{Buf} + T_2^{Buf} + T_4^{Buf}) + D_{27} \le (T_7 + T_1^{Buf} + T_3^{Buf} + T_7^{Buf}) + T_{clk} - T_{set} $$
$$ 0 \le T_i^{Buf} \le Max_i^{Buf} $$
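The effect of tuning on the setup constraint can be illustrated with a small slack calculator. This is a sketch under stated assumptions: the clock paths for flip-flops 2 and 7 are read off the inequality on this slide, while the function names and every numeric value (clock period, setup time, path delay, tuning amounts) are hypothetical.

```python
# Clock paths taken from the slide's inequality for flip-flops 2 and 7.
CLOCK_PATH = {
    2: ["B1", "B2", "B4"],
    7: ["B1", "B3", "B7"],
}

def arrival(ff, base, tune):
    """Clock arrival at flip-flop ff: nominal arrival plus tunable delay on its path."""
    return base[ff] + sum(tune[b] for b in CLOCK_PATH[ff])

def setup_slack(i, j, d_ij, base, tune, t_clk, t_set):
    """Slack of T_i + D_ij <= T_clk + T_j - T_set; a negative value is a violation."""
    return (t_clk + arrival(j, base, tune) - t_set) - (arrival(i, base, tune) + d_ij)

base = {2: 0.0, 7: 0.0}                       # hypothetical nominal clock arrivals
no_tuning = {b: 0.0 for b in ["B1", "B2", "B3", "B4", "B7"]}
with_b7 = dict(no_tuning, B7=1.5)             # delay the capture clock via B7
with_b1 = dict(no_tuning, B1=2.0)             # B1 is shared, so it cancels out

print(setup_slack(2, 7, 10.5, base, no_tuning, t_clk=10.0, t_set=0.5))  # -1.0
print(setup_slack(2, 7, 10.5, base, with_b7, t_clk=10.0, t_set=0.5))    # 0.5
print(setup_slack(2, 7, 10.5, base, with_b1, t_clk=10.0, t_set=0.5))    # -1.0
```

Only buffers that sit on one clock path but not the other change the skew between the two flip-flops, which is why tuning B1 leaves the slack unchanged here.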
Optimization Objective: Tunability Cost
• Metric to capture the overhead due to PST buffers in the design
  – Silicon area
  – Clock-tree power
$$ TunabilityCost = \sum_{i \in PST\ Buffers} Max_i^{Buf} $$
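Optimization Objective: Binning Yield Loss
[Figure: delay density $f_T(t)$ versus delay t, with the loss function shown for delays beyond $T_{cons}$]
$$ BinningYieldLoss\ (BYL) = \int_{T_{cons}}^{\infty} Loss(t)\, f_T(t)\, dt $$
• Convex loss function Q(.)
$$ BYL = \int_{T_{cons}}^{\infty} Q(t - T_{cons})\, f_T(t)\, dt $$
[V. Zolotov, DAC’04], [D. Blaauw, GLSVLSI’05]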
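The BYL integral can be approximated by a sample average over circuit delays. The sketch below is illustrative only: the quadratic loss `quadratic_loss` is an assumed choice of convex Q (the slides do not specify one), and the Gaussian delay samples are synthetic stand-ins for a real delay distribution.

```python
import random

def binning_yield_loss(delay_samples, t_cons, q):
    """Sample-average estimate of the integral of Q(t - Tcons) f_T(t) over t > Tcons."""
    late = (t - t_cons for t in delay_samples if t > t_cons)
    return sum(q(u) for u in late) / len(delay_samples)

def quadratic_loss(u):
    """Assumed convex loss Q(u) = u^2 (hypothetical choice)."""
    return u * u

# Small deterministic example: losses are 0, 0, 1, 4, so the average is 1.25.
samples = [9.0, 10.0, 11.0, 12.0]
byl_small = binning_yield_loss(samples, 10.0, quadratic_loss)

# Synthetic Monte Carlo example with a fixed seed (distribution is made up).
rng = random.Random(0)
mc_samples = [rng.gauss(10.0, 0.5) for _ in range(10000)]
byl_mc = binning_yield_loss(mc_samples, 10.5, quadratic_loss)
print(byl_small, byl_mc)
```

Unlike a plain timing yield loss, which counts every late sample equally, this metric penalizes a sample more the further its delay exceeds $T_{cons}$.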
Problem Statement
Given a sequential design with a synthesized PST clock tree (known buffer locations), perform simultaneous
• Statistical gate sizing
• PST buffer tuning range determination
such that Binning Yield Loss and Tunability Cost are minimized
[Figure: combinational timing graph (nodes 0, i, n, gate delay $d_i$, constraint $T_{cons}$) together with a PST clock tree of buffers B1–B7 driving flip-flops FF1–FF8]
Two-Stage Formulation
• Gate size: x, tuning buffer range: r
First stage:
$$
\begin{aligned}
\text{Minimize}\quad & BYL(x, r) + TunabilityCost(r) + \sum GateSizes \\
\text{subject to}\quad & T_i + D_{ij} \le T_{clk} + T_j - T_{set} && \forall\, FlipFlops\ (i, j) \\
& t_p + d_q(x, \omega_0) \le t_q && \forall\, p \in fanin(q),\ \forall\, FlipFlops\ (i, j) \\
& t_q \le D_{ij} && \forall\, q \in fanin(FlipFlop\ j),\ \forall\, FlipFlops\ (i, j) \\
& x_{min} \le x \le x_{max} \\
& 0 \le r \le Max^{Buf}
\end{aligned}
$$
1. Deterministic constraints: meeting the timing requirement assuming no variations, $\omega_0 = \{l_{eff0}, t_{ox0}, \dots\}$
2. Capturing variability in the objective
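Second Stage Formulation
[Figure: delay density $f_T(t)$ with loss Q beyond $T_{cons}$, mapped to the loss variable v]
$$ V(x, r, \omega) = \begin{cases} Q(D_{ij}(x, r, \omega) - T_{cons}) & D_{ij} \ge T_{cons} \\ 0 & \text{otherwise} \end{cases} $$
$$ BYL(x, r) = \int_{T_{cons}}^{\infty} Q(D_{ij}(x, r, \omega) - T_{cons})\, f_T(t)\, dt = \int v\, f_V(v)\, dv = E[V(x, r, \omega)] $$
• No statistical timing analysis scheme exists to estimate the timing distribution of a circuit given gate sizes and tuning buffer ranges
• Each sample of variability requires a different amount of tuning for maximum timing yield
• Given a solution to the first stage problem and a variability sample, $(x_0, r_0, \omega)$:
Second stage:
$$
\begin{aligned}
v(x_0, r_0, \omega) = \text{Minimize}\quad & \sum_{FF\ (i, j)} Q(T_{ij}^{viol}) \\
\text{subject to}\quad & \Big(T_i + \sum_{k \in C_i} T_k^{Buf}\Big) + D_{ij}(x_0, \omega) \le T_{clk} + \Big(T_j + \sum_{k \in C_j} T_k^{Buf}\Big) - T_{set} + T_{ij}^{viol} && \forall\, FlipFlop(i, j) \\
& t_p + d_q(x_0, \omega) \le t_q && \forall\, p \in fanin(q),\ \forall\, FlipFlop(i, j) \\
& t_q \le D_{ij}(x_0, \omega) && \forall\, q \in fanin(FlipFlop(j)),\ \forall\, FlipFlop(i, j) \\
& T_{ij}^{viol} \ge 0 && \forall\, FlipFlop(i, j) \\
& 0 \le T^{Buf} \le r_0 && \forall\, PST\ Buffer
\end{aligned}
$$
Here $C_i$ denotes the set of PST buffers on the clock path to flip-flop i.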
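For a single flip-flop pair the second-stage problem can be solved by inspection, which makes the role of the tuning range easier to see. The sketch below is a simplified special case, not the full formulation: it handles one pair only, assumes Q is non-decreasing so that minimizing $Q(T^{viol})$ is the same as minimizing $T^{viol}$, and uses hypothetical numbers and a hypothetical function name `min_violation`. The coupled multi-pair problem needs a convex solver.

```python
def min_violation(t_i, t_j, d_ij, path_i, path_j, r, t_clk, t_set):
    """Smallest T_viol >= 0 for one flip-flop pair, plus a tuning that achieves it.

    Buffers shared by both clock paths cancel in the setup constraint, so only
    capture-only buffers (on path_j but not on path_i) can remove a violation;
    launch-only buffers are left at 0.
    """
    capture_only = [b for b in path_j if b not in path_i]
    tune = {b: 0.0 for b in path_i + path_j}
    needed = (t_i + d_ij) - (t_clk + t_j - t_set)   # violation with no tuning
    for b in capture_only:
        if needed <= 0.0:
            break
        step = min(r[b], needed)    # use no more tuning than the range r allows
        tune[b] = step
        needed -= step
    return max(needed, 0.0), tune

path_2 = ["B1", "B2", "B4"]   # clock path of the launching flip-flop
path_7 = ["B1", "B3", "B7"]   # clock path of the capturing flip-flop
r0 = {"B1": 2.0, "B2": 2.0, "B3": 0.4, "B4": 2.0, "B7": 0.3}   # hypothetical ranges
viol, tune = min_violation(0.0, 0.0, 10.5, path_2, path_7, r0, t_clk=10.0, t_set=0.5)
print(viol, tune)   # about 0.3 of violation remains: the ranges are too small
```

With these values the untuned violation is 1.0, the capture-only buffers B3 and B7 can absorb at most 0.7, and the remainder is what the loss Q would be charged for this sample. Enlarging the range of B7 removes the violation at a higher tunability cost, which is the trade-off the first stage optimizes.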
Convex Problem
THEOREM: The proposed two-stage stochastic programming formulation is convex
PROOF (detailed proof omitted for brevity):
• First stage constraints are convex
• First stage objective is convex if BYL(x, r) is convex
$$ BYL(x, r) = E[V(x, r, \omega)] = \int_0^{\infty} v\, f_V(v)\, dv = \int V(x, r, \omega)\, f_\omega(\omega)\, d\omega $$
• Need to show that each sample $V(x, r, \omega)$ is convex
• From the second stage formulation one can show that $V(x, r, \omega)$ is convex
Kelley’s Cutting Plane Algorithm
• Iteratively solve the first and second stage formulations
• Given a solution to the first stage formulation, we use the method of finite differences to generate a lower bound on BYL from the second stage formulation
$$ BYL(x, r) \ge \alpha_k + \langle \beta_k, (x, r) \rangle $$
• Add this constraint to the first stage formulation at each iteration
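The cutting-plane loop can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: `toy_byl` is a made-up convex function standing in for BYL, the master problem is solved by enumerating cut intersections (which only works in one dimension) instead of calling a convex solver, and the names `finite_diff` and `kelley_1d` are hypothetical.

```python
def finite_diff(f, x, h=1e-6):
    """Central finite-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def kelley_1d(f, lo, hi, tol=1e-6, max_iter=100):
    """Minimize a convex f on [lo, hi] with Kelley's cutting-plane method."""
    cuts = []                      # (slope, intercept): f(z) >= slope * z + intercept
    x = lo
    best_x, best_f = x, f(x)
    for it in range(1, max_iter + 1):
        fx, g = f(x), finite_diff(f, x)
        if fx < best_f:
            best_x, best_f = x, fx
        cuts.append((g, fx - g * x))            # new linear lower bound at x

        def model(z):
            """Piecewise-linear lower model: the maximum over all cuts."""
            return max(s * z + c for s, c in cuts)

        # In 1-D the model's minimum lies at an endpoint or where two cuts cross.
        candidates = [lo, hi]
        for a in range(len(cuts)):
            for b in range(a + 1, len(cuts)):
                (s1, c1), (s2, c2) = cuts[a], cuts[b]
                if abs(s1 - s2) > 1e-12:
                    z = (c2 - c1) / (s1 - s2)
                    if lo <= z <= hi:
                        candidates.append(z)
        x = min(candidates, key=model)
        if best_f - model(x) <= tol:            # upper bound meets lower bound
            return best_x, best_f, it
    return best_x, best_f, max_iter

def toy_byl(x):
    """Made-up convex stand-in for BYL; minimum value 0.5 at x = 1.5."""
    return (x - 1.5) ** 2 + 0.5

x_star, f_star, iters = kelley_1d(toy_byl, 0.0, 4.0)
print(x_star, f_star, iters)
```

Each iteration adds one linear lower bound, so the model tightens from below while the best evaluated point gives an upper bound. In the slides' setting the variable is the pair (x, r) and each function evaluation requires solving the second stage over the variability samples.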
Shortest-Path Constraints
• Inherently non-convex in nature
$$ T_i + D_{ij}^{short} \ge T_j + T_{hold} \quad \forall\, FlipFlop(i, j) $$
• Approximate gate delay using a linear approximation (lower bound)
$$ D_{ij}^{short,p} = \sum_{\text{gates } m \text{ on path } p} d_m^{lin}, \qquad d_m^{lin} = a_m^0 + a_m^1 x_m + \sum_{n \in fanout(m)} b_n x_n $$
• The two-stage stochastic programming formulation can be modified to consider shortest path constraints
Experimental Results
• Implemented the framework in SIS using MOSEK to solve the convex formulation
• Used CAPO to place the netlist to get spatially correlated gate delays
• Assumed 15% Vth variation in the 90nm technology node [Predictive Technology Model]
• Synthesized the PST clock tree using the technique proposed in [Chen et al., ICCAD’05]
[Figure: two placed gates i and j with coordinates $(x_i, y_i)$ and $(x_j, y_j)$]
Experimental Results
• Experimental comparison on ISCAS benchmarks
• [Chen]:
  – Nominal gate sizing
  – PST clock-tree generation using [Chen et al., ICCAD’05]
• Sensitivity:
  – Retain PST clock-tree location and range
  – Sensitivity-driven statistical gate sizing algorithm
    – Size the gate with maximum yield gain greedily (iterative)
    – Similar in spirit to [Zhou ICCAD’05, Zolotov DAC’05]
• Stochastic:
  – Retain PST clock-tree buffer locations
  – Proposed simultaneous gate sizing and post-silicon tunability allocation algorithm
BYL, Area and Tuning Range Comparison
[Figure: three bar charts comparing [Chen], Sensitivity, and Stochastic on benchmarks s344, s382, s400, s526, and s635: Binning Yield Loss, Area (Logic Gates), and Tuning Range]
Timing Yield Loss Comparison
[Figure: bar chart of Timing Yield Loss for [Chen], Sensitivity, and Stochastic on benchmarks s344, s382, s400, s526, and s635]

Technique      Average Timing Yield Loss
[Chen]         0.22
Sensitivity    0.19
Stochastic     0.03
Runtime Comparison
[Figure: bar chart of Runtime for Sensitivity and Stochastic on benchmarks s344, s382, s400, s526, and s635]

Number of Iterations
Technique      s344   s382   s400   s526   s635
Sensitivity    24     40     18     15     109
Stochastic     7      19     13     14     7
Summary and Future Work
• Variability-driven framework for simultaneous gate sizing and post-silicon tunability allocation to minimize binning yield loss and tunability cost
• Efficient stochastic programming based scheme to solve the formulation
• No assumptions about parameter distributions or their correlations
• Need to develop a statistical timing analysis scheme that can consider the effect of post-silicon tunability
Thank You!