Timing Problems for DSM

Timing Issues for DSM
R. Brayton
U.C. Berkeley
12/5/97
Caveats
This talk is about a work in progress
Much of the work is roughly described
with the idea of just communicating the
general thrust.
Many details remain to be decided and
currently several algorithms are being
programmed for experimental purposes.
We are just in the middle of many studies
and depending on their results, the
direction of the project may change.
Tau97
Outline
Introduction - DSM project at Berkeley
Our timing abstraction and motivation
Timing driven placement (wireplanning)
slicing approach
programming approach
matching approach
Iterated logic decomposition
Logic rip-up and re-route
Technology aspects
Overview
Two levels of approach
electrical and technology level
logic level using timing abstraction
Electrical level used to ensure realism
predict technology dimensions
place and wire transistors to create leaf cells
using Cadence’s LAS tool or CADABRA
extract parasitics using SPACE or FASTCAP
simulate using SPICE with advanced BSIM model
Overview
Logic level works with a timing abstraction
(to be explained)
we need to be sure that abstraction is correct
(thus electrical experiments)
Currently cross-talk noise effects on
timing are ignored
Immediate goal is to build combinational
logic macros that meet timing constraints
sequential circuits can
be handled similarly
Macro Problem Statement
Given:
•rectangular area, inputs and outputs on perimeter.
•required times on outputs, arrival times on inputs.
•set of logic functions to be synthesized
[Figure: rectangular macro with inputs a, b, c, d (arrival times A) and outputs f, g (required times R) on its perimeter]
(possibly pin locations can be somewhat flexible)
Find: Logic decomposition of the functions that can be:
•placed and wired in the given area
•meeting the timing constraints.
Some Facts
As dimensions shrink, gate delays
decrease and wire delays increase
in the limit all delays are in the wires.
On a net, by a combination of buffer
insertion and wire sizing:
delay of net from root to any leaf can be
made linear in the Manhattan distance from
root to leaf.
12/5/97
Tau97
7
Linear Delay
By buffer insertion
spacing is determined by resistance and
capacitance of the line and the buffers
an optimal number of optimally sized buffers
makes the delay linear
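The linearity claim can be checked with a small Elmore-delay sketch. Everything here is an assumption made for illustration: the function names, the arbitrary-unit parameter values, and the simplification that only the number of buffers is optimized while the buffer size stays fixed.

```python
import math

def segment_delay(length, r, c, r_buf, c_buf):
    """Elmore delay of one buffer driving a wire of `length` into the next buffer."""
    wire_r = r * length          # total wire resistance of the segment
    wire_c = c * length          # total wire capacitance of the segment
    return r_buf * (wire_c + c_buf) + wire_r * (wire_c / 2 + c_buf)

def buffered_delay(length, n_segments, r, c, r_buf, c_buf):
    """Delay of a line split into n_segments equal buffered segments."""
    return n_segments * segment_delay(length / n_segments, r, c, r_buf, c_buf)

def best_buffered_delay(length, r, c, r_buf, c_buf):
    """Minimum delay over the integer number of segments.

    The delay is convex in the segment count, so the integer optimum is the
    floor or the ceiling of the continuous optimum.
    """
    k_cont = length * math.sqrt(r * c / (2 * r_buf * c_buf))
    candidates = {max(1, math.floor(k_cont)), max(1, math.ceil(k_cont))}
    return min(buffered_delay(length, k, r, c, r_buf, c_buf) for k in candidates)

# Hypothetical parameters in arbitrary units (not taken from the talk).
R, C, R_BUF, C_BUF = 1.0, 1.0, 10.0, 10.0

unbuffered_ratio = segment_delay(2000, R, C, R_BUF, C_BUF) / segment_delay(1000, R, C, R_BUF, C_BUF)
buffered_ratio = best_buffered_delay(2000, R, C, R_BUF, C_BUF) / best_buffered_delay(1000, R, C, R_BUF, C_BUF)
print(f"doubling length: unbuffered x{unbuffered_ratio:.2f}, buffered x{buffered_ratio:.2f}")
```

Doubling the length roughly quadruples the unbuffered delay but only doubles the buffered one, which is the behaviour the slide relies on.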
Linear Wire Delay Model
for a Net
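[Figure: a rectilinear net drawn in the x-y plane]
$$\mathrm{delay}(x, y) = \tau \Big( \sum_{\text{horizontal path segments}} |x_i - x_j| + \sum_{\text{vertical path segments}} |y_k - y_l| \Big)$$
where $\tau$ is the delay per unit wire length.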
Delay is made linear by buffer insertion and wire
and buffer sizing
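A minimal sketch of this model, assuming a path is given as a list of corner points and that the delay per unit length is a single constant (called `tau` here; both the name and the numbers are illustrative):

```python
def path_delay(points, tau):
    """Linear wire delay along a rectilinear path given as a list of (x, y) corners."""
    total = 0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 != x1 and y0 != y1:
            raise ValueError("each segment must be horizontal or vertical")
        total += abs(x1 - x0) + abs(y1 - y0)   # one of the two terms is zero
    return tau * total

# Root at (0, 0), leaf at (5, 4), routed with two bends.
example_delay = path_delay([(0, 0), (3, 0), (3, 4), (5, 4)], tau=2)
print(example_delay)
```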
Timing Abstraction:
Linear Delay Model (LDM)
Delay is a linear function of the Manhattan
distance, independent of the logic it
meets along a path.
$f = f(a, b, c)$
[Figure: inputs a, b, c and output f]
Since f depends on b, $\tau(D_M(b, f))$, the linear delay over the
Manhattan distance $D_M(b, f)$, is the minimum delay
that can be on any path from b to f.
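The lower bound gives a quick necessary condition for a macro to be feasible at all: every input that an output depends on must be able to reach it in time along a shortest route. The sketch below assumes a constant delay per unit length `tau` and made-up pin data; the data layout and function names are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def violated_pairs(inputs, outputs, support, tau):
    """Return (input, output) pairs that cannot meet timing under the LDM.

    inputs:  name -> ((x, y), arrival time)
    outputs: name -> ((x, y), required time)
    support: output name -> set of input names the output depends on
    """
    bad = []
    for out, (out_pos, required) in outputs.items():
        for inp in sorted(support[out]):
            in_pos, arrival = inputs[inp]
            if arrival + tau * manhattan(in_pos, out_pos) > required:
                bad.append((inp, out))
    return bad

inputs = {"a": ((0, 0), 0), "b": ((0, 6), 2), "c": ((0, 3), 0)}
outputs = {"f": ((8, 3), 12)}
support = {"f": {"a", "b", "c"}}
print(violated_pairs(inputs, outputs, support, tau=1))
```

Here b arrives at time 2 and sits 11 units from f, so no decomposition can meet the required time of 12 on that pair.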
Caveat
So far we are not considering the effect of
cross-talk noise on delay
[Figure: a victim wire running next to an aggressor wire]
Victim can be slowed by aggressor
if transitions are opposing
Common Divisors May
Cause Paths to Stray
$f = \tilde{f}(h(a, b), c)$
$g = \tilde{g}(h(a, b), c)$
[Figure: inputs a, b, c, common divisor h, and outputs f, g]
But in this example,
the longest path is
not increased
Example Where Longest Path
Must be Increased
$f = \tilde{f}(h(a, b), a, b)$
$g = \tilde{g}(h(a, b), a, b)$
[Figure: inputs a, b, common divisor h, and outputs f, g]
Any divisor h(a,b)
common to both
f and g cannot be
placed without
increasing the longest
path
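The effect can be reproduced with a small search, under the assumption that delay is proportional to Manhattan distance. The pin coordinates are invented to mimic the situation on the slide (a and b on one diagonal, f and g on the other); they are not taken from the talk.

```python
from itertools import product

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def longest_direct(inputs, outputs):
    """Longest input-to-output distance when no divisor is on the path."""
    return max(manhattan(i, o) for i in inputs for o in outputs)

def longest_through(h, inputs, outputs):
    """Longest input-to-output distance for paths forced through divisor h."""
    return max(manhattan(i, h) + manhattan(h, o) for i in inputs for o in outputs)

def best_divisor_position(inputs, outputs, grid):
    """Grid point for h that minimizes the longest path through h."""
    return min(grid, key=lambda h: longest_through(h, inputs, outputs))

pins_in = [(0, 0), (4, 4)]     # a, b (hypothetical positions)
pins_out = [(0, 4), (4, 0)]    # f, g (hypothetical positions)
grid = list(product(range(5), repeat=2))

best = best_divisor_position(pins_in, pins_out, grid)
print(longest_direct(pins_in, pins_out), best, longest_through(best, pins_in, pins_out))
```

Even at the best grid position the paths through h are twice as long as the direct ones, so sharing h necessarily lengthens the longest path in this configuration.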
Problem 1: Timing Driven
Point Placement
Given: Area, Arrival and Required times, pin
positions, and a decomposition (netlist)
Find: Point placement that satisfies all timing
constraints.
No consideration of areas required to implement
logic gates
Areas of gates can be approximated by count of
literals in factored form
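As a rough illustration of the literal-count estimate, the sketch below counts variable occurrences in a factored-form string. The textual syntax (identifiers separated by spaces or operators, a trailing apostrophe for complement) is an assumption made for this example.

```python
import re

def literal_count(factored_form):
    """Count literals: every occurrence of a variable, complemented or not."""
    return len(re.findall(r"[A-Za-z_]\w*'?", factored_form))

print(literal_count("a b + a c"))            # sum-of-products form
print(literal_count("a (b + c)"))            # same function, factored
print(literal_count("(a + b')(c + d) + e"))
```

Factoring reduces the count for the same function, which is why the factored form is the better proxy for gate area.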
Pure Point Placement
[Figure: point placement with inputs a, b, c, outputs f, g, and a congested area]
Problem 2: Placement with
Area Constraints
Areas are flexible. Leaf cell “gates” remain to
be built. Gate types remain to be determined
(PLAs, domino, PTL, etc.)
Three experimental “wireplanning” approaches
slicing
programming
matching
Slicing Approach
Use simulated annealing to get point
placement
cost function for SA is derived by doing a
delay trace through the placed points
After SA, derive slicing structure from
point placement
Use flexibility of areas for final placement
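A delay trace through placed points could look like the sketch below. It assumes wire-only delays proportional to Manhattan distance (constant `tau`), an acyclic netlist, and a cost equal to the total lateness at the outputs; the talk does not specify the actual cost function, so that choice and all names are illustrative.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def delay_trace(positions, fanins, input_arrival, tau):
    """Propagate arrival times through a placed, acyclic netlist."""
    arrival = {}

    def visit(node):
        if node not in arrival:
            if node in input_arrival:
                arrival[node] = input_arrival[node]
            else:
                arrival[node] = max(
                    visit(src) + tau * manhattan(positions[src], positions[node])
                    for src in fanins[node]
                )
        return arrival[node]

    for node in positions:
        visit(node)
    return arrival

def timing_cost(arrival, required):
    """Total amount by which outputs miss their required times (0 if all met)."""
    return sum(max(0, arrival[out] - t) for out, t in required.items())

positions = {"a": (0, 0), "b": (0, 4), "h": (2, 2), "f": (5, 2)}
fanins = {"h": ["a", "b"], "f": ["h"]}
arrival = delay_trace(positions, fanins, {"a": 0, "b": 1}, tau=1)
print(arrival["f"], timing_cost(arrival, {"f": 6}))
```

An annealer would move the internal points (here only h) and re-evaluate `timing_cost` after each move.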
Slicing Approach
Hypothesis: Can make slicing so that distances are not
perturbed too much from point placement
Distances are now estimated as center-to-center
Manhattan distance
Once we get the slicing structure,
we need to build logic in the allocated blocks
LDM implies that we can build the logic so that
delay < distance across logic sub-block
Programming Approach
Get initial point placement with a
force-directed method (or SA)
force points apart to provide space for areas
this gives relative point positions
Distribute slacks using zero slack
distribution
Formulate and solve LP
LP Formulation
• Distributed slacks give bound on wire lengths, d_ij
• Assume aspect ratio given for each “gate”
• Point placement gives relative positions
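$$\max \alpha$$
subject to:
$$x_i - x_j + y_j - y_i \le d_{ij} \quad \text{if } i \text{ is right of } j,\ j \text{ is above } i, \text{ and } i \text{ is connected to } j$$
$$x_i - x_j \ge \alpha \left( \frac{w_i + w_j}{2} \right) \quad \text{if } i \text{ is right of } j$$
$$y_j - y_i \ge \alpha \left( \frac{h_i + h_j}{2} \right) \quad \text{if } j \text{ is above } i$$
All areas scaled by $\alpha$ to guarantee feasibility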
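The sketch below does not solve the LP. It only evaluates the constraints for one fixed placement: it computes the largest scale factor that the separation constraints allow and checks the wire-length bounds. The scale factor is called `alpha` here as an assumed name, and the data layout and numbers are likewise hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def max_scale(pos, size, right_of, above):
    """Largest alpha allowed by the separation constraints at fixed positions.

    pos:      name -> (x, y) centre
    size:     name -> (w, h)
    right_of: list of (i, j) meaning i is right of j
    above:    list of (j, i) meaning j is above i
    """
    bounds = []
    for i, j in right_of:
        bounds.append(2 * (pos[i][0] - pos[j][0]) / (size[i][0] + size[j][0]))
    for j, i in above:
        bounds.append(2 * (pos[j][1] - pos[i][1]) / (size[i][1] + size[j][1]))
    return min(bounds)

def wire_bounds_met(pos, wire_bound):
    """Check the centre-to-centre Manhattan length of each connection against d_ij."""
    return all(manhattan(pos[i], pos[j]) <= d for (i, j), d in wire_bound.items())

pos = {"u": (0, 0), "v": (4, 3)}
size = {"u": (2, 2), "v": (2, 4)}
alpha = max_scale(pos, size, right_of=[("v", "u")], above=[("v", "u")])
print(alpha, wire_bounds_met(pos, {("v", "u"): 8}))
```

In the LP itself the positions are variables as well, so a solver would trade off spreading the gates apart (larger scale) against the wire-length bounds.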
Matching Approach
Divide area into minimum size squares
Label each square with functions that it
can contain without violating timing
[Figure: area divided into squares, with inputs a, b, c and outputs f, g, h on the perimeter; example square labels: fg/abc, gh/bc, fh/ac]
Matching Approach
Each logic “gate” fans out to set of
primary outputs (fg) and fans in from set
of primary inputs (abc)
Thus a gate is labeled, say, fg/abc
Each gate is given an area (number of literals in factored form)
Want to match gates to squares so that
square’s capacity is not violated.
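A minimal sketch of the labeling and assignment steps, under several assumptions: delay is a constant `tau` times Manhattan distance, a square is represented by its centre, and a simple greedy pass stands in for the matching step (it is not a true bipartite matching and is not the authors' algorithm). All names and numbers are illustrative.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def fits_timing(square, gate, inputs, outputs, tau):
    """True if every fan-in to fan-out path through `square` meets timing."""
    fanout, fanin = gate
    return all(
        inputs[i][1] + tau * (manhattan(inputs[i][0], square)
                              + manhattan(square, outputs[o][0])) <= outputs[o][1]
        for i in fanin for o in fanout
    )

def label_squares(squares, gates, inputs, outputs, tau):
    """square -> names of gates it can hold without violating timing."""
    return {sq: [name for name, gate in gates.items()
                 if fits_timing(sq, gate, inputs, outputs, tau)]
            for sq in squares}

def greedy_assign(gates, labels, area, capacity):
    """Place the most constrained gates first, respecting square capacity."""
    remaining = {sq: capacity for sq in labels}
    options = {g: [sq for sq in labels if g in labels[sq]] for g in gates}
    assignment = {}
    for g in sorted(gates, key=lambda name: len(options[name])):
        for sq in options[g]:
            if remaining[sq] >= area[g]:
                assignment[g] = sq
                remaining[sq] -= area[g]
                break
    return assignment

inputs = {"a": ((0, 0), 0), "b": ((0, 2), 0)}        # name -> (position, arrival)
outputs = {"f": ((4, 0), 6), "g": ((4, 2), 6)}       # name -> (position, required)
gates = {"g1": ({"f", "g"}, {"a", "b"}), "g2": ({"f"}, {"a"})}   # fanout / fanin
squares = [(1, 1), (3, 1), (2, 5)]

labels = label_squares(squares, gates, inputs, outputs, tau=1)
assignment = greedy_assign(gates, labels, area={"g1": 3, "g2": 2}, capacity=3)
print(labels)
print(assignment)
```

Gates that cannot be placed are simply left out of the result, which signals that the decomposition or the square size needs to change.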
Iterated Decomposition
Given: netlist and current placement
Select divisor that can be placed, still
satisfying timing constraints
[Figure: placement before and after extracting a divisor: smaller areas, some paths longer]
Iterated Decomposition
Choose divisor that maximally decreases
a combination of size and delay
Algorithm:
Get initial decomposition (say minimum area)
Selectively duplicate nodes and adjust outputs
Collapse local trees
Global timing driven placement
Do {
  select “best” divisor
  locally adjust placement
  (reset global placement after k divisors)
} Until area constraints are met
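The control flow of the loop can be sketched as below. Only the loop structure comes from the slide; every callable is a hypothetical placeholder, the early exit when no divisor is available is an added safeguard, and the preparation steps (initial decomposition, duplication, collapsing) are assumed to have been done already.

```python
def iterated_decomposition(netlist, global_place, select_best, local_adjust,
                           area_ok, k):
    """Do { select, adjust, periodic global re-placement } until area is met."""
    placement = global_place(netlist)
    selected = 0
    while True:
        divisor = select_best(netlist, placement)
        if divisor is None:               # nothing left to extract (added safeguard)
            break
        netlist, placement = local_adjust(netlist, placement, divisor)
        selected += 1
        if selected % k == 0:             # reset global placement after k divisors
            placement = global_place(netlist)
        if area_ok(netlist):
            break
    return netlist, placement

# Toy stand-ins: a "netlist" is an area plus the savings of candidate divisors.
def toy_global_place(netlist):
    return {"area_seen": netlist["area"]}

def toy_select_best(netlist, placement):
    return max(netlist["savings"], default=None)

def toy_local_adjust(netlist, placement, divisor):
    savings = list(netlist["savings"])
    savings.remove(divisor)
    return {"area": netlist["area"] - divisor, "savings": savings}, placement

final_netlist, final_placement = iterated_decomposition(
    {"area": 10, "savings": [3, 2, 1]},
    toy_global_place, toy_select_best, toy_local_adjust,
    area_ok=lambda n: n["area"] <= 6, k=2,
)
print(final_netlist["area"], final_placement)
```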
Fast Local Adjustment
With slicing method, can insert new
divisor into slicing structure, get new
placement and do delay trace efficiently.
So we can accurately reflect area change as
it affects delay
With LP method, can also solve fast.
Just need inequalities where areas may
overlap
Comments
After k divisors selected and placed, re-do
global placement to better reflect all
divisors
i.e. do total timing driven placement on new
netlist
Selective duplication and collapsing can
be done to improve timing during the
iteration.
We are experimenting with how to choose this
selective collapsing
Rewiring
To alleviate timing further, rewiring can be
done
Can use SPFDs since exact logic in “gate”
is somewhat irrelevant.
SPFDs allow one wire to replace another
Gives more flexibility
than redundancy addition
and removal
Uses the fact that the logic in the blue box
can be changed
Technology Studies
Guess at process dimensions for DSM
“strawman” 0.25 um process
shrink to get 0.18 um, ..., 0.05 um processes
Design and layout different complex “gates”
Use Cadence’s LAS tool or Cadabra tool
Extract parasitics using SPACE or FASTCAP
Simulate with SPICE and Hu’s advanced
BSIM model
Verify LDM
Strawman 0.05 um Process
Interconnect
• 9 metal layers
• Copper wires and vias
• Polyimide dielectric (k=2)
• H/W = 2 for all layers except M9
• M9 kept same as 0.25 um process
• Insulator thickness = 0.7 um
[Figure, not to scale: metal cross-sections labeled H/W = 2.5/2.0, 2.4/1.2, 1.6/0.8, 0.6/0.3, 0.14/0.07]
First Six Layers of Metal
Approximately to scale
Design and Extract Flow
[Figure: design and extract flow chart. Recoverable labels:]
• Front end: netlist decomposition (test.blif, test.blifmv), manual wireplanning
• Design styles: hand design, standard cell, domino, pass transistor logic
• Layout: LAS or Cadabra, with technology file, constraint file, test.verilog (format?), producing test.gds
• Extraction and simulation: test.gds, SPACE (3D), SPICE
• Per-process inputs (0.25 um, 0.18 um, 0.10 um, 0.05 um): interconnect technology parameters, transistor models
Acknowledgements
Richard Newton
Alberto Sangiovanni
Ralph Otten
Wilsin Gosti
Amit Narayan
Philip Chong
Mukul Prasad
Amit Mehrotra
Sunil Khatri
Ravi Gunturi
Subarna Sinha
Hiroshi Murata
IBM, Motorola, Intel,
Fujitsu, Cadence
SRC