Timing Issues for DSM
R. Brayton
U.C. Berkeley
12/5/97
Caveats
This talk is about a work in progress
Much of the work is roughly described
with the idea of just communicating the
general thrust.
Many details remain to be decided and
currently several algorithms are being
programmed for experimental purposes.
We are just in the middle of many studies
and depending on their results, the
direction of the project may change.
12/5/97
Tau97
Outline
Introduction - DSM project at Berkeley
Our timing abstraction and motivation
Timing driven placement (wireplanning)
slicing approach
programming approach
matching approach
Iterated logic decomposition
Logic rip-up and re-route
Technology aspects
Overview
Two levels of approach
electrical and technology level
logic level using timing abstraction
Electrical level used to ensure realism
predict technology dimensions
place and wire transistors to create leaf cells
using Cadence’s LAS tool or CADABRA
extract parasitics using SPACE or FASTCAP
simulate using SPICE with advanced BSIM model
Overview
Logic level works with a timing abstraction
(to be explained)
we need to be sure that abstraction is correct
(thus electrical experiments)
Currently, cross-talk noise effects on
timing are ignored
Immediate goal is to build combinational
logic macros that meet timing constraints
sequential circuits can
be handled similarly
Macro Problem Statement
Given:
•rectangular area, inputs and outputs on perimeter.
•required times on outputs, arrival times on inputs.
•set of logic functions to be synthesized
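(possibly pin locations can be somewhat flexible)
[Figure: rectangle with inputs a, b, c, d, each with an arrival time A, and outputs f, g, each with a required time R, on the perimeter]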
Find: Logic decomposition of the functions that can be:
•placed and wired in the given area
•meeting the timing constraints.
Some Facts
As dimensions shrink, gate delays
decrease and wire delays increase
in the limit all delays are in the wires.
On a net, by a combination of buffer
insertion and wire sizing:
delay of net from root to any leaf can be
made linear in the Manhattan distance from
root to leaf.
Linear Delay
By buffer insertion
spacing is determined by resistance and
capacitance of the line and the buffers
optimum number of optimally sized buffers
makes the delay linear
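The effect of buffer insertion can be sketched with a simple Elmore delay model. This is only an illustration: the function names and the numeric values for wire resistance, wire capacitance, buffer output resistance and buffer input capacitance are hypothetical, the buffer size is held fixed (buffer sizing is not modeled), and the number of segments is treated as continuous.

```python
import math

def segment_delay(length, r, c, r_buf, c_buf):
    """Elmore delay of one buffer driving `length` of wire into the next buffer.

    r, c: wire resistance and capacitance per unit length (assumed values).
    r_buf, c_buf: buffer output resistance and input capacitance (assumed values).
    """
    return r_buf * (c * length + c_buf) + r * length * (c * length / 2 + c_buf)

def optimal_spacing(r, c, r_buf, c_buf):
    """Buffer spacing that minimizes delay per unit length in this model."""
    return math.sqrt(2 * r_buf * c_buf / (r * c))

def buffered_delay(total_length, r, c, r_buf, c_buf):
    """Delay of a line cut into optimally spaced segments (segment count continuous)."""
    spacing = optimal_spacing(r, c, r_buf, c_buf)
    return (total_length / spacing) * segment_delay(spacing, r, c, r_buf, c_buf)

def unbuffered_delay(total_length, r, c, r_buf, c_buf):
    """Delay of the same line driven by a single buffer, with no repeaters."""
    return segment_delay(total_length, r, c, r_buf, c_buf)

# Hypothetical parameters, chosen only to make the trend visible.
R_WIRE = 1.0        # ohm per um
C_WIRE = 0.2e-15    # farad per um
R_BUF = 1000.0      # ohm
C_BUF = 2e-15       # farad
PARAMS = (R_WIRE, C_WIRE, R_BUF, C_BUF)
```

In this model, doubling the line length doubles the buffered delay, while the unbuffered delay grows faster than linearly because of the term proportional to the square of the length.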
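Linear Wire Delay Model for a Net
[Figure: a net routed with horizontal and vertical segments in the x-y plane]
$$\mathrm{delay}(x, y) \propto \sum_{\text{horizontal path segments}} |x_i - x_j| + \sum_{\text{vertical path segments}} |y_k - y_l|$$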
Delay is made linear by buffer insertion and wire
and buffer sizing
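As a minimal sketch of this model, the delay along a rectilinear route can be computed by summing segment lengths. The function name and the `delay_per_unit` constant are assumptions made for illustration; the slides do not give a value for the proportionality constant.

```python
def path_delay(points, delay_per_unit=1.0):
    """Delay along a rectilinear path given as a list of (x, y) corner points.

    Each consecutive pair of points is one horizontal or vertical segment,
    so |dx| + |dy| is that segment's length.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += abs(x1 - x0) + abs(y1 - y0)
    return delay_per_unit * total

# A direct route from (0, 0) to (3, 4) and a route that detours through y = -1.
DIRECT = [(0, 0), (3, 0), (3, 4)]
DETOUR = [(0, 0), (0, -1), (3, -1), (3, 4)]
```

The direct route has a length equal to the Manhattan distance between its endpoints; any detour can only add to it.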
Timing Abstraction: Linear Delay Model (LDM)
Delay is a linear function of the Manhattan
distance, independent of the logic it
meets along a path.
$f = f(a, b, c)$
[Figure: inputs a, b, c and output f placed in the plane]
Since f depends on b, the delay for the Manhattan distance $D_M(b, f)$
is the minimum delay that can occur on any path
from b to f.
Caveat
So far we are not considering the effect of
cross-talk noise on delay
[Figure: a victim wire running next to an aggressor wire]
Victim can be slowed by aggressor
if transitions are opposing
Common Divisors May Cause Paths to Stray
$f = \tilde{f}(h(a, b), c)$
$g = \tilde{g}(h(a, b), c)$
[Figure: placement of a, b, c, the divisor h, and outputs f, g]
But in this example,
the longest path is
not increased
Example Where Longest Path Must be Increased
$f = \tilde{f}(h(a, b), a, b)$
$g = \tilde{g}(h(a, b), a, b)$
[Figure: placement of a, b, the divisor h, and outputs f, g]
Any divisor h(a,b)
common to both
f and g cannot be
placed without
increasing longest
path
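The two divisor examples can be illustrated with a small brute-force search under the linear delay model. Everything here is a sketch under stated assumptions: the pin coordinates are invented, all arrival times are zero, delay equals Manhattan distance, and the divisor is tried at every integer grid point of a 10 x 10 region.

```python
from itertools import product

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def longest_path(direct_pairs, divisor_pos=None, divisor_inputs=(), divisor_outputs=()):
    """Longest input-to-output path length; paths through the divisor bend at its position."""
    lengths = [manhattan(s, t) for s, t in direct_pairs]
    if divisor_pos is not None:
        lengths += [manhattan(i, divisor_pos) + manhattan(divisor_pos, o)
                    for i in divisor_inputs for o in divisor_outputs]
    return max(lengths)

def best_with_divisor(direct_pairs, divisor_inputs, divisor_outputs, grid):
    """Smallest longest-path value over all candidate divisor positions."""
    return min(longest_path(direct_pairs, h, divisor_inputs, divisor_outputs) for h in grid)

# Hypothetical pin positions: a and b in opposite corners, f and g in the other two.
A, B, F, G = (0, 0), (10, 10), (0, 10), (10, 0)
C = (30, 5)                                  # a distant third input
GRID = list(product(range(11), repeat=2))    # candidate positions for h

# Case 1: f and g use h(a, b) and c. The far input c dominates the longest path.
before_1 = longest_path([(A, F), (B, F), (C, F), (A, G), (B, G), (C, G)])
after_1 = best_with_divisor([(C, F), (C, G)], [A, B], [F, G], GRID)

# Case 2: f and g use h(a, b), a and b. No position of h keeps all paths shortest.
before_2 = longest_path([(A, F), (B, F), (A, G), (B, G)])
after_2 = best_with_divisor([(A, F), (B, F), (A, G), (B, G)], [A, B], [F, G], GRID)
```

With these coordinates, the paths through h stray in both cases, but only in the second case does the longest path grow, because there is no slower path to hide the detour behind.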
Problem 1: Timing Driven Point Placement
Given: Area, Arrival and Required times, pin
positions, and a decomposition (netlist)
Find: Point placement that satisfies all timing
constraints.
No consideration of areas required to implement
logic gates
Areas of gates can be approximated by count of
literals in factored form
Pure Point Placement
[Figure: point placement with inputs a, b, c and outputs f, g; the placed points cluster in a congested area]
Problem 2: Placement with Area Constraints
Areas are flexible. Leaf cell “gates” remain to
be built. Gate types remain to be determined
(PLAs, domino, PTL, etc.)
Three experimental “wireplanning” approaches
slicing
programming
matching
Slicing Approach
Use simulated annealing to get point
placement
cost function for SA is derived by doing a
delay trace through the placed points
After SA, derive slicing structure from
point placement
Use flexibility of areas for final placement
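A delay trace through placed points might look like the following sketch, which could serve as the cost evaluated inside simulated annealing. The function names, the total-violation cost and the example netlist are assumptions; the slides do not specify the exact cost function. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def delay_trace(pos, fanins, arrival, delay_per_unit=1.0):
    """Propagate arrival times through a placed netlist under the linear delay model.

    pos: node -> (x, y); fanins: node -> list of driving nodes;
    arrival: primary input -> arrival time.
    Every node that is not a primary input must have at least one fanin.
    """
    at = dict(arrival)
    for node in TopologicalSorter(fanins).static_order():
        if node in at:          # primary input, arrival time is given
            continue
        at[node] = max(at[u] + delay_per_unit * manhattan(pos[u], pos[node])
                       for u in fanins[node])
    return at

def timing_cost(pos, fanins, arrival, required, delay_per_unit=1.0):
    """Sum of required-time violations at the outputs (0 when all constraints are met)."""
    at = delay_trace(pos, fanins, arrival, delay_per_unit)
    return sum(max(0.0, at[o] - required[o]) for o in required)

# Hypothetical example: inputs a, b feed node h, which feeds output f.
POS = {"a": (0, 0), "b": (4, 0), "h": (2, 1), "f": (2, 5)}
FANINS = {"h": ["a", "b"], "f": ["h"]}
ARRIVAL = {"a": 0.0, "b": 1.0}
REQUIRED = {"f": 7.0}
```

An annealer would perturb `POS`, re-run `timing_cost`, and accept or reject the move based on the change in cost.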
Slicing Approach
Hypothesis: the slicing can be made so that distances are not
perturbed too much from the point placement
Distances are currently estimated
as center-to-center Manhattan distance
Once we get the slicing structure,
we need to build logic in the allocated blocks
LDM implies that we can build the logic so that
delay < distance across logic sub-block
Programming Approach
Get initial point placement with a force-directed
method (or SA)
force points apart to provide space for areas
this gives relative point positions
Distribute slacks using zero-slack
distribution
Formulate and solve LP
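LP Formulation
Distributed slacks give bounds on wire lengths, $d_{ij}$
Assume aspect ratio given for each “gate”
Point placement gives relative positions
Maximize the scale factor $\alpha$:
$$\max \alpha$$
subject to:
$$x_i - x_j + y_j - y_i \le d_{ij} \quad \text{if } i \text{ is right of } j,\ j \text{ is above } i, \text{ and } i \text{ is connected to } j$$
$$x_i - x_j \ge \alpha \, \frac{w_i + w_j}{2} \quad \text{if } i \text{ is right of } j$$
$$y_j - y_i \ge \alpha \, \frac{h_i + h_j}{2} \quad \text{if } j \text{ is above } i$$
All areas are scaled by $\alpha$ to guarantee feasibility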
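The constraints of this formulation can be checked for a fixed candidate placement with a few lines of code. This sketch is not an LP solver: it assumes positions are already given, uses hypothetical names throughout, and writes the wire-length bound with absolute values, which matches the signed form when the relative positions are fixed.

```python
def max_scale(pos, size, right_of, above):
    """Largest scale factor for which no two ordered gates overlap at fixed positions.

    pos: gate -> (x, y) center; size: gate -> (width, height).
    right_of: pairs (i, j) meaning i is right of j.
    above: pairs (j, i) meaning j is above i.
    """
    ratios = []
    for i, j in right_of:
        ratios.append((pos[i][0] - pos[j][0]) / ((size[i][0] + size[j][0]) / 2))
    for j, i in above:
        ratios.append((pos[j][1] - pos[i][1]) / ((size[i][1] + size[j][1]) / 2))
    return min(ratios)

def wire_bounds_hold(pos, wire_bound):
    """True if every connected pair is within its slack-derived length bound."""
    return all(abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]) <= d
               for (i, j), d in wire_bound.items())

# Hypothetical two-gate example: i is right of j, and j is above i.
POS = {"i": (4.0, 0.0), "j": (0.0, 3.0)}
SIZE = {"i": (2.0, 2.0), "j": (2.0, 2.0)}
RIGHT_OF = [("i", "j")]
ABOVE = [("j", "i")]
```

A real implementation would hand the same inequalities to an LP solver with the positions and the scale factor as variables; a scale factor of at least 1 would mean the gates fit at full size.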
Matching Approach
Divide area into minimum size squares
Label each square with functions that it
can contain without violating timing
[Figure: area divided into squares labeled, for example, fg/abc, gh/bc and fh/ac, with inputs a, b, c and outputs f, g, h on the perimeter]
Matching Approach
Each logic “gate” fans out to a set of
primary outputs (fg) and fans in from a set
of primary inputs (abc)
Thus a gate is labeled, say, fg/abc
Each gate is given an area (number of literals in factored form)
We want to match gates to squares so that
no square’s capacity is violated.
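The labeling and capacity idea can be sketched as follows. The slides do not say which matching algorithm is used, so a greedy first-fit assignment stands in for it here; the function names, the admissibility test (arrival time plus Manhattan distance through the square's center must not exceed the required time) and the example data are all assumptions.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def square_admits(center, gate_inputs, gate_outputs, pin_pos, arrival, required,
                  delay_per_unit=1.0):
    """True if a gate placed at `center` violates no input-to-output timing constraint."""
    return all(
        arrival[i] + delay_per_unit * (manhattan(pin_pos[i], center)
                                       + manhattan(center, pin_pos[o])) <= required[o]
        for i in gate_inputs for o in gate_outputs)

def first_fit(gates, squares, capacity, pin_pos, arrival, required):
    """Greedy stand-in for matching: largest gates first, first admissible square with room.

    gates: name -> (inputs, outputs, area); squares: name -> center.
    Returns gate -> square, or None if some gate cannot be placed.
    """
    remaining = {s: capacity for s in squares}
    assignment = {}
    for g, (ins, outs, area) in sorted(gates.items(), key=lambda kv: -kv[1][2]):
        for s, center in squares.items():
            if remaining[s] >= area and square_admits(center, ins, outs,
                                                      pin_pos, arrival, required):
                assignment[g] = s
                remaining[s] -= area
                break
        else:
            return None
    return assignment

# Hypothetical data: one input a, one output f, two candidate squares.
PINS = {"a": (0, 0), "f": (4, 0)}
ARRIVAL = {"a": 0.0}
REQUIRED = {"f": 4.0}
SQUARES = {"s1": (2, 0), "s2": (2, 3)}
```

Unlike a true matching, first-fit can fail on instances that do have a valid assignment, so it only illustrates the admissibility and capacity checks.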
Iterated Decomposition
Given: netlist and current placement
Select divisor that can be placed, still
satisfying timing constraints
[Figure: placement before and after extracting a divisor; afterwards areas are smaller and some paths are longer]
Iterated Decomposition
Choose divisor that maximally decreases
size delay
Algorithm:
Get initial decomposition (say, minimum area)
Selectively duplicate nodes
and adjust outputs
Collapse local trees
Global timing driven placement
Do {
select “best” divisor
locally adjust placement
(reset global placement
after k divisors)
} Until area constraints are met
Fast Local Adjustment
With the slicing method, we can insert a new
divisor into the slicing structure, get a new
placement and do a delay trace efficiently.
So we can accurately reflect area change as
it affects delay
With the LP method, the problem can also be solved quickly.
We just need the inequalities where areas may
overlap
Comments
After k divisors selected and placed, re-do
global placement to better reflect all
divisors
i.e. do total timing driven placement on new
netlist
Selective duplication and collapsing can
be done to improve timing during the
iteration.
We are experimenting with how to choose this
selective collapsing
Rewiring
To relieve timing further, rewiring can be
done
Can use SPFDs (sets of pairs of functions to be distinguished) since the exact logic in a “gate”
is somewhat irrelevant.
SPFDs allow one wire to replace another
Gives more flexibility
than redundancy addition
and removal
Uses the fact that the logic in the blue box
can be changed
Technology Studies
Guess at process dimensions for DSM
“strawman” 0.25 µm process
shrink to get 0.18 µm, ..., 0.05 µm processes
Design and lay out different complex “gates”
Use Cadence’s LAS tool or Cadabra tool
Extract parasitics using SPACE or FASTCAP
Simulate with SPICE and Hu’s advanced
BSIM model
Verify LDM
Strawman 0.05 µm Process
•9 metal layers
•Copper wires and vias
•Polyimide dielectric (k=2)
•H/W = 2 for all layers except M9
•M9 kept same as 0.25 µm process
•Insulator thickness = 0.7 µm
[Figure: interconnect cross-section, not to scale, with layers labeled H/W = 2.5/2.0, 2.4/1.2, 1.6/0.8, 0.6/0.3 and 0.14/0.07]
First Six Layers of Metal
Approximately to scale
Design and Extract Flow
[Figure: flow diagram with the following elements]
manual wireplanning and netlist decomposition (test.blif, test.blifmv)
design styles: hand design, standard cell, domino, pass transistor logic
LAS or Cadabra, with technology file (format?), constraint file, test.verilog and test.gds
SPACE (3D) extraction from test.gds
SPICE simulation
interconnect technology parameters for 0.25 µm, 0.18 µm, 0.10 µm and 0.05 µm
transistor models for 0.25 µm, 0.18 µm, 0.10 µm and 0.05 µm
Acknowledgements
Richard Newton
Alberto Sangiovanni
Ralph Otten
Wilsin Gosti
Amit Narayan
Philip Chong
Mukul Prasad
Amit Mehrotra
Sunil Khatri
Ravi Gunturi
Subarna Sinha
Hiroshi Murata
IBM, Motorola, Intel,
Fujitsu, Cadence
SRC