BonnCell
Automatic Layout of Leaf Cells
Stefan Hougardy, Tim Nieberg, Jan Schneider
Research Institute for Discrete Mathematics
University of Bonn
January 24, 2013
Outline
• Introduction
• BonnCell Placement
• BonnCell Routing
• Status and Results
The Leaf Cell Layout Problem
Input: A netlist, the cell image
Goal: A “good” solution
• Place the FETs
  • Choose number of fingers
  • Decide to swap FETs
  • Obey all placement rules
  • Guarantee routability
• Routing
  • Find an LVS-clean routing
  • Should be almost DRC-clean
  • Minimize M2 usage
Leaf Cell Layout – Overview
• Leaf cell layout is used in highly optimized arrays
• So far has been done manually
• Example: a next generation Core SRAM
  • Expected time for first pass layout: 3 months or more
  • Actual time needed: 1.5 months with BonnCell
• By now, BonnCell is used worldwide
  • IBM Design Centers in Germany, USA, Israel, India
Main Features of BonnCell
• Generates 1-dimensional and 2-dimensional cell layouts
• Interleaving of stacks
• Instances can be non-dual, non-series-parallel, non-planar
• Unequal number of P and N devices possible
• Dynamic placement and dynamic folding
  • Folding of multiple FETs
  • First time done by an automatic tool
• We do not enforce and-stack-clustering
• Routing does not rely on structured placement
• DRC rules are taken into account during routing
Previous Work
Bar-Yehuda, Feldman, Pinter, Wimer (1989)
“Depth-First Search and Dynamic Programming Algorithms for Efficient CMOS Cell Generation”
• Similar basic approach as BonnCell (also branch & bound)
• Similar target function
• Much more restricted:
  • Strictly 1-dimensional
  • All FETs have exactly 1 finger
  • Only very structured routings are considered
  • No optimum is found due to simple flipping strategy
Previous Work
Iizuka, Ikeda, Asada (2006)
“Exact Minimum-Width Multi-Row Transistor Placement for Dual and Non-Dual CMOS Cells”
• Different approach: Transformation to CNF, applying a SAT solver
• Similar flow: Increase cell size until feasibility is reached
• Supports 2-dimensional cells
• Much more restricted:
  • Strictly 1-dimensional (i.e. no interleaving)
  • All FETs have exactly 1 finger
  • Routability is hardly addressed
Leaf Cell Placement
in BonnCell
Leaf Cell Placement Problem
Given an image and a netlist, containing
• FETs and nets,
assign to each FET
• a location,
• a number of fingers, and
• a swap status,
so that the placement
• meets all placement constraints,
• is routable, and
• is optimal.
[Figure: example leaf cell placement]
Anatomy of a Leaf Cell
Cross Section of a FET
• A leaf cell spans 3 metal layers: PC, M1, M2
• Gates are on PC
• Source and drain are on M1
[Figure: Cross section of a FET, showing the layers M2, V1, M1, CA, PC, and RX]
Placing a FET
Placing a FET allows many degrees of freedom
• x-coordinate
• y-coordinate
• Number of fingers
• Swap status
[Figure: FET placement examples]
Which Placement Constraints?
Depending on the FETs’ properties, they can/must be placed in various ways:
• With a gap
• Abutting
• Overlapping
Plus more technology-dependent placement constraints
[Figure: placement examples]
What is Optimality?
Optimality in BonnCell
• Our quality measure is a quadruple (h, ggNL, wNL, σ) with:
  • h := cell height in #tracks
  • ggNL := sum of sizes of nets’ gate intervals
    (where “gate interval” is the interval between bottommost and topmost gate)
  • wNL := weighted sum of sizes of net intervals
    (where “net interval” is the interval between bottommost and topmost terminal)
  • σ := free routing space
    (i.e. free spaces on every track summed up)
• “Optimal” placement is the placement with the lexicographically best quality measure
• Measure proved to be good in practice
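A lexicographic comparison of this quadruple can be sketched with plain tuple ordering. This is only an illustration: the class and function names are hypothetical, and the direction of each component (smaller h, ggNL and wNL are better, larger σ is better) is an assumption, since the slide does not state it.

```python
from typing import NamedTuple


class Quality(NamedTuple):
    """Hypothetical container for the quadruple (h, ggNL, wNL, sigma)."""
    h: int        # cell height in tracks
    ggnl: int     # sum of sizes of the nets' gate intervals
    wnl: float    # weighted sum of sizes of net intervals
    sigma: int    # free routing space


def sort_key(q: Quality):
    # Assumption: h, ggNL and wNL are minimized, sigma is maximized.
    # Negating sigma lets ordinary tuple comparison do the lexicographic work.
    return (q.h, q.ggnl, q.wnl, -q.sigma)


def best_placement(candidates):
    """Return the candidate with the lexicographically best quality."""
    return min(candidates, key=sort_key)


candidates = [
    Quality(15, 40, 120.0, 30),
    Quality(15, 38, 150.0, 10),
    Quality(16, 10, 10.0, 99),
]
best = best_placement(candidates)
```

Here the second candidate wins: it ties on h with the first and has the smaller ggNL, so the later components are never consulted.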
Algorithm Overview
• Assume that FETs are placed in 2 stacks
• BonnCell Placement runs in 2 phases:
  • Phase 1: Compute an optimal placement with the restriction that both stacks can be divided by a vertical line
  • Phase 2: Compute an optimal placement without the restriction
  • On a timeout, the best solution found so far is returned
1-Stack Algorithm
1-Stack Subroutine
• Important subroutine of the main algorithm
• Find one (or all) track-minimal placements of a single stack
• Can be solved by simple enumeration
[Figure: placement of a single stack]
2-Stack Flow
2-Stack Flow – Phase 1
• Assume that both stacks are divided by a vertical line
• Step 1: Choose x coordinate for this line
• Step 2: Find single track-minimal 1-stack placements for both stacks independently
• Step 3: For all track-minimal placements of “bigger” stack
  • ... find all placements of “smaller” stack
  • ... which do not exceed the big stack’s height
  • ... and evaluate their quality
  • If a placement is best so far, save it
2-Stack Flow
[Figure: Optimal placement after phase 1.]
2-Stack Flow
2-Stack Flow – Phase 2
• Do not assume the vertical line and allow interleaved placements
• Problem: Height of optimal solution is unknown
  • Solution: Start to run phase 2 with cell height 0
  • Run phase 2 for increasing height as long as no solution exists
• Step 1: For all placements of “bigger” stack
  • ... find all legal placements of “smaller” stack
  • ... which do not exceed the big stack’s height
  • ... and evaluate their quality
  • If a placement is best so far, save it
• In case of time-out, return best result so far (including phase 1)
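The "increase the height until a solution exists" loop of phase 2 can be sketched as follows. The callback `place_at_height` and the bound `max_height` are hypothetical stand-ins for the actual phase 2 search, which is not shown here.

```python
def phase2_search(place_at_height, max_height):
    """Try cell heights 0, 1, 2, ... and return the first feasible result.

    place_at_height(h) is a hypothetical callback that returns a placement
    for cell height h, or None if no legal placement of that height exists.
    """
    for height in range(max_height + 1):
        placement = place_at_height(height)
        if placement is not None:
            # The first feasible height is also the minimum height.
            return height, placement
    return None


# Toy stand-in: pretend that placements exist from height 7 upwards.
result = phase2_search(lambda h: f"placement@{h}" if h >= 7 else None, 20)
```

Because heights are tried in increasing order, the first height that admits a placement is minimal, which matches the height component being the leading entry of the quality measure.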
2-Stack Flow
[Figure: Optimal placement after phase 2.]
Branch & Bound
• The 1-Stack subroutine can be implemented recursively
  • StackPlacer(S, k) places stack S, assuming that the lower k FETs are already placed
• A lower bound lb can be computed in every step
  • lb := cur height + remaining fingers
  • Stop extending the stack if lb > best height
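The recursion and the bound lb := cur height + remaining fingers can be illustrated on a toy model. Everything about the model is an assumption made for this sketch and not BonnCell's actual rule set: a FET with f fingers occupies f tracks, two neighbouring FETs abut without a gap only if the touching nets agree, and otherwise one gap track is inserted. The sketch looks for a single height-optimal stack, so it prunes on lb ≥ best height rather than lb > best height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fet:
    name: str
    source: str
    drain: str
    fingers: int


def end_nets(fet, swapped):
    """Return (bottom_net, top_net) of a FET placed in a vertical stack."""
    bottom, other = (fet.drain, fet.source) if swapped else (fet.source, fet.drain)
    # With an even number of fingers the FET ends on the net it started on.
    top = other if fet.fingers % 2 == 1 else bottom
    return bottom, top


def stack_placer(fets):
    """Toy branch & bound: minimum stack height and one order achieving it."""
    best = {"height": float("inf"), "order": None}

    def extend(placed, remaining, cur_height, top_net):
        # Lower bound from the slide: current height plus remaining fingers.
        lb = cur_height + sum(f.fingers for f in remaining)
        if lb >= best["height"]:
            return  # cannot improve on the best stack found so far
        if not remaining:
            best["height"], best["order"] = cur_height, list(placed)
            return
        for i, fet in enumerate(remaining):
            rest = remaining[:i] + remaining[i + 1:]
            for swapped in (False, True):
                bottom, top = end_nets(fet, swapped)
                # Toy rule: abutting is free, mismatching nets cost one gap track.
                gap = 0 if top_net is None or top_net == bottom else 1
                placed.append((fet.name, swapped))
                extend(placed, rest, cur_height + gap + fet.fingers, top)
                placed.pop()

    extend([], tuple(fets), 0, None)
    return best["height"], best["order"]


height, order = stack_placer([
    Fet("A", "x", "y", 1),
    Fet("B", "y", "z", 1),
    Fet("C", "z", "x", 2),
])
```

In this example the three FETs can be chained without any gap, so the bound of 4 tracks is attained. The bound is valid in the toy model because gaps can only add to the finger count.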
Branch & Bound – Improvements
Improvements
• Use the number of nets accessing an odd number of remaining FETs in the lower bound computation
• In a first step, look for a single height-optimal solution by bounding on lb ≥ best height
  • Initialize upper bound with optimal value
  • Additionally, only look for solutions with a specific structure
• More bounding possibilities in 2-Stack flow
  • Netlength-based bounding of secondary stack
  • Smart pruning for primary stack
  • Look-ahead netlength estimation
• Some DRC rules can be implemented as additional bounding steps
Runtime Analysis
[Figure: two example cell placements]
9 FETs, 15 tracks
• Estimated runtime: 3 years
• BonnCell runtime: 0.8 sec. (7,782,302 branch & bound nodes)
28 FETs, 22 tracks
• Estimated runtime: 1030 years
• BonnCell runtime: 38 seconds (83,198,650 branch & bound nodes)
Leaf Cell Routing
in BonnCell
Leaf Cell Routing Problem
Input: Placed leaf cell with net set N = {T1, ..., Tn}
Output: Packing of Steiner trees for Tk, k = 1, ..., n, subject to various constraints (DRC, limited M2 availability, ...)
[Figure: placed leaf cell → routed leaf cell]
Leaf Cell Routing Problem
Our approach:
• Construct half-track routing grid
• Block edges for gates, RX, ...
⇒ G = (V, E) with Tk ⊂ V for all k
Route by MIP-based constraint generation approach
• Identify each edge usage (per net) with variable
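The grid construction can be pictured with a small sketch. It is heavily simplified and rests on assumptions: a single routing layer, integer coordinates standing for half-track steps, and a hypothetical list of blocked edges in place of the real gate and RX blockages.

```python
def build_grid(width, height, blocked=()):
    """Build a grid graph G = (V, E); one coordinate step is one half track.

    `blocked` is a hypothetical collection of vertex pairs whose connecting
    edge must not be used (e.g. because a gate occupies it).
    """
    blocked = {frozenset(edge) for edge in blocked}
    vertices = [(x, y) for x in range(width) for y in range(height)]
    edges = []
    for (x, y) in vertices:
        # Connect each vertex to its right and upper neighbour.
        for (nx, ny) in ((x + 1, y), (x, y + 1)):
            if nx < width and ny < height:
                if frozenset({(x, y), (nx, ny)}) not in blocked:
                    edges.append(((x, y), (nx, ny)))
    return vertices, edges


V, E = build_grid(3, 2)
V_blocked, E_blocked = build_grid(3, 2, blocked=[((0, 0), (1, 0))])
```

A per-net edge variable would then be attached to every edge that remains in E.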
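Packing Steiner Trees (s.t. add/l constraints)
Core MIP formulation to be solved on G = (V, E):
\begin{align*}
\min\ & \sum_{(i,j) \in E} c_{ij}\, x_{ij} \\
\text{s.t.}\ & x_{ij} - \sum_{k=1}^{n} x_{ij}^{k} = 0 && \forall (i,j) \in E \\
& x_{ij} \le 1 && \forall (i,j) \in E \\
& \sum_{(i,j) \in E \,\mid\, i \in W,\ j \notin W} x_{ij}^{k} \ge 1 && \forall W \subset V,\ W \cap T_k \ne \emptyset,\ (V \setminus W) \cap T_k \ne \emptyset,\ \forall k \\
& x_{ij}^{k} \in \{0, 1\} && \forall (i,j) \in E,\ \forall k
\end{align*}
• Note: third set of constraints (Steiner Cut Inequalities) has exponential cardinality
• Gives edge-disjoint Steiner tree packing [Grötschel et al. 97]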
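To see why the Steiner cut inequalities are exponential in number, one can enumerate them by brute force on a tiny graph. The sketch below is illustrative only; it treats edges as undirected (an edge crosses the cut if exactly one endpoint lies in W), which is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def violated_cuts(vertices, edges_used, terminals):
    """Yield every vertex set W that separates terminals of one net
    but is crossed by none of the edges the net uses."""
    vertices = list(vertices)
    terminals = set(terminals)
    for size in range(1, len(vertices)):
        for subset in combinations(vertices, size):
            W = set(subset)
            # Only cuts with terminals on both sides induce an inequality.
            if not (W & terminals) or not (terminals - W):
                continue
            crossing = sum(1 for (i, j) in edges_used if (i in W) != (j in W))
            if crossing < 1:
                yield W


path = [0, 1, 2, 3]
connected = list(violated_cuts(path, [(0, 1), (1, 2), (2, 3)], {0, 3}))
broken = list(violated_cuts(path, [(0, 1), (2, 3)], {0, 3}))
```

The enumeration visits every proper subset of V, so it is only usable for toy instances; this is the reason such constraints are typically added on demand.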
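Packing Steiner Trees (s.t. add/l constraints)
Add constraints to ensure connectivity of net k ∈ {1, ..., n}, r ∈ Tk
• e.g. flow-based LP formulation for single Steiner tree [Goemans et al. 93]
\begin{align*}
S_{x^k f} &:= \{ (x^k, f) \mid f^t(\delta^+(i)) - f^t(\delta^-(i)) = \mathrm{rhs}_i,\ i \in V,\ t \in T_k \} \\
f_{ij}^{t} &\le x_{ij}^{k} && \forall \{i,j\} \in E,\ \forall t \in T_k \\
f_{e}^{t} &\ge 0 && \forall e \in A,\ \forall t \in T_k
\end{align*}
\[
\mathrm{rhs}_i :=
\begin{cases}
1 & i = r \\
-1 & i = t \\
0 & i \in V \setminus \{t, r\}
\end{cases}
\]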
Design Rules
Extend core MIP so that solution satisfies additional constraints
• Distance rules between different nets (minspace, interlayer via, ...)
• Samenet rules (samenet minspace, minarea, ...)
• ...
Overall, there are 80+ constraints induced per (edge, net) pair
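Design Rules – Example
Line-End ⇐⇒ Polygonal edge between two convex corners closer than $t_{LE} \in \mathbb{N}$
Line-End requires additional spacing for OPC
[Figure: wire end with edge variables $x_n$, $x_w$, $x_e$, $x_s$ and $x_{e_i'}$]
In this example, $x_w - (x_n + x_e + x_s) + x_{e_i'} \le 1$ forces $x_{e_i'}$ to zero whenever $x_w = 1$ and $x_n = x_e = x_s = 0$.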
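The effect of the line-end inequality can be checked by enumerating binary values. In this sketch the fifth variable of the inequality is called `x_blocked`, a hypothetical name chosen for readability.

```python
def line_end_ok(x_w, x_n, x_e, x_s, x_blocked):
    """Evaluate the line-end inequality from the example for 0/1 inputs."""
    return x_w - (x_n + x_e + x_s) + x_blocked <= 1


def allowed_blocked_values(x_w, x_n, x_e, x_s):
    """Values the fifth variable may still take for fixed neighbour usage."""
    return [v for v in (0, 1) if line_end_ok(x_w, x_n, x_e, x_s, v)]


line_end = allowed_blocked_values(1, 0, 0, 0)   # wire ends here
continued = allowed_blocked_values(1, 1, 0, 0)  # wire continues to the north
unused = allowed_blocked_values(0, 0, 0, 0)     # western edge not used at all
```

Only the first case, where the western edge is used and none of the three continuing edges is, leaves 0 as the single admissible value; in the other cases the constraint is not binding.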
Constraint Generation
Solution Approach
• Due to high number of constraints (of which many are nonbinding), we use constraint generation:
  • Start with a limited but useful subset of constraints
  • Solve MIP
  • Check all constraints for feasibility
    • All feasible: Optimal solution found
    • Not all feasible: Add (some) violated constraints to MIP and resolve
Efficient constraint checking:
• Net connectivity ⇒ New cuts or flow formulation
• DRC violations ⇒ Wire-dependent constraints
Add/l Benefit: MIP infeasible ⇒ Placement unroutable!
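The loop above can be sketched generically. This is a toy under stated assumptions: a brute-force enumeration over 0/1 vectors stands in for the MIP solver, constraints are plain Python predicates, and exactly one violated constraint is added per round. None of these names come from BonnCell.

```python
from itertools import product


def solve_bruteforce(costs, constraints):
    """Stand-in for a MIP solver: cheapest 0/1 vector satisfying all
    constraints, returned as (value, x), or None if infeasible."""
    best = None
    for x in product((0, 1), repeat=len(costs)):
        if all(c(x) for c in constraints):
            value = sum(ci * xi for ci, xi in zip(costs, x))
            if best is None or value < best[0]:
                best = (value, x)
    return best


def constraint_generation(costs, initial, lazy):
    """Solve with a subset of constraints and add violated ones on demand."""
    active, pending = list(initial), list(lazy)
    while True:
        result = solve_bruteforce(costs, active)
        if result is None:
            return None  # infeasible even for the relaxed model
        _, x = result
        violated = [c for c in pending if not c(x)]
        if not violated:
            return result  # all constraints hold: optimal for the full model
        active.append(violated[0])
        pending.remove(violated[0])


result = constraint_generation(
    costs=[1, 2, 3],
    initial=[lambda x: x[0] + x[1] + x[2] >= 1],
    lazy=[lambda x: x[0] == 0, lambda x: x[1] + x[2] >= 2],
)
```

A `None` result corresponds to the remark on the slide: if the model is infeasible with only a subset of the constraints, it is infeasible with all of them.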
Example BonnCell Routing
cab_ccache_aoi_lclk
[Figure: BonnCell result (basic routing flow)]
Running time: 324 seconds
Additional Features
Many features beyond basic functionality
• Spend additional tracks to improve routability
• Place within fixed cell height
• Multi-dimensional placement & routing
• Folding multiple FETs
• GUI integration into Cadence environment
• Pin placement w/o routing
• User-defined constraints like fixations and maximum/minimum widths
• Several modes for external pins
• Powerful Postprocessing heuristics
Results & Outlook
Designer vs. BonnCell
[Figure: cell layout]
Designer placement: 22 tracks
BonnCell placement: 15 tracks
Routing runtime: 14:33
Results on 22 nm testbed

                        Placement       Routing
Cell  |F|  |N|  |T|     h     time      time
   1    9   12   57    15     0:04      2:55
   2    7   10   32    10     0:01      0:12
   3    6    9   41    10     0:01      0:13
   4   12   13   57    14     0:43     27:45
   5    6   10   25     7     0:01      0:08
   6   14   16   39    10     0:01      0:33
   7    4    9   47    11     0:01      3:26
   8    5    9   34    13     0:01      0:23
   9   15   23   47    14     0:01      1:49
  10    8   12   46    11     0:01     38:27
  11    2    6   32     8     0:00      2:10
  12    2    6   56    14     0:01      5:08
  13   16   16   51    14     0:01     14:33
  14    5    8   16     4     0:00      2:24

|F| number of FETs, |N| number of nets, |T| number of terminals (active gates, contacts, and external pins), h height in tracks, time is [mm:ss]
Additional BonnCell Applications
• Cell Tuning
  • resizing transistors
  • timing optimization
  • optimize pin positions
  • create different versions of same cell
• Postoptimization of Cell Libraries
• Library Design
• Evaluation of early Design Decisions:
  • how many tracks should be used?
  • how much M2 will be needed?
BonnCell 22nm Results on Library

cell type   # in library   % area reduction   # used∗   % area reduction
aoi21                156               7.69        56               0.79
aoi22                 80               5.99        55               6.15
buff                  96               2.89         0                  —
invert                92               0.00        66               0.00
latch                 48               2.12        17               1.72
nand2                176              12.88        58               5.28
nand3                132              15.26        50               8.97
nand4                 56               7.49        39               7.73
nor2                 160              10.17        52               1.77
nor3                 116              15.65        44              10.12
oai21                148               6.24        53               0.62
oai22                 80               6.32        54               6.98
xnor2                 80              12.36        54              12.03
xor2                  80              11.20        57              10.78
other                  9               5.14         2               4.13
all                 1509               8.91       657               6.05

∗ = used on a testbed of current chips.
Future Work
14 nm version
• 14 nm placement (done)
• 14 nm routing (WIP)
  • Mix of MIP-based and combinatorial approach
  • Many technology changes and new design rule types
Thank you!