slides - Sung Kyu Lim

Placement
ECE6133
Physical Design Automation of VLSI Systems
Prof. Sung Kyu Lim
School of Electrical and Computer Engineering
Georgia Institute of Technology
Placement
• The process of arranging the circuit components on a layout surface.
• Inputs: A set of fixed modules, a netlist.
• Goal: Find the best position for each module on the chip according to
appropriate cost functions.
– Considerations: routability/channel density, wirelength, cut size,
performance, thermal issues, I/O pads.
1
2
1
3
3
5
5
6
1
5
2
D
B
C
A
E
F
G
H
Density = 2 (2 tracks required)
7
3
8
4
6
8
4
8
7
2
7
6
A
B
C
D
E
F
G
H
4
wirelength = 10
wirelength = 12
Shorter wirelength, 3 tracks required.
Estimation of Wirelength
• Semi-perimeter method: Half the perimeter of the bounding rectangle
that encloses all the pins of the net to be connected. Most widely used
approximation!
• Complete graph: Since #edges in a complete graph ( n(n−1)
) is
2
P
of tree edges (n − 1), wirelength ≈ n2 (i,j)∈net dist(i, j).
n
×
2
#
• Minimum chain: Start from one vertex and connect to the closest one,
and then to the next closest, etc.
• Source-to-sink connection: Connect one pin to all other pins of the
net. Not accurate for uncongested chips.
• Steiner-tree approximation: Computationally expensive.
• Minimum spanning tree
4
10
7
8
7
8
3
3
3
3
4
semi−perimeter len = 11
complete graph len * 2/n = 17.5
chain len = 14
8
10
7
4
3
3
3
4
source−to−sink len = 17
Steiner tree len = 12
Spanning tree len = 13
Placement Methods
• Constructive methods
–
–
–
–
Cluster growth algorithm
Force-directed method
Algorithm by Goto
Min-cut based method
• Iterative improvement methods
– Pairwise exchange
– Simulated annealing: Timberwolf
– Genetic algorithm
• Analytical methods
– Gordian, Gordian-L
Min-Cut Placement
• Breuer, “A class of min-cut placement algorithms,” DAC-77.
• Quadrature: suitable for circuits with high density in the center.
• Bisection: good for standard-cell placement.
• Slice/Bisection: good for cells with high interconnection on the periphery.
3a
2a
3b
1
3c
2b
3d
3a
1
3b
4a
n/2
2
4b
10a 9a10b8 10c 9b 10d
6a 5a 6b 4 6c 5b 6d
C2
n/4
C1
n/4
C2
n/2
C1
1
2
3
4
5
6
7
n/k
n/k
C1
n/4
n/4
n/2
n/2
quadrature
n/2
bisection
(k−1)n/k
slice/bisection
n/k
C2
(k−2)n/k
Algorithm for Min-Cut Placement
Algorithm: Min Cut Placement(N, n, C)
/* N : the layout surface */
/* n: # of cells to be placed */
/* n0 : # of cells in a slot */
/* C: the connectivity matrix */
1
2
3
4
5
6
7
8
begin
if (n ≤ n0 ) then PlaceCells(N, n, C);
else
(N1 , N2 ) ← CutSurface(N );
(n1 , C1 ), (n2 , C2 ) ← Partition(n, C);
Call Min Cut Placement(N1 , n1 , C1 );
Call Min Cut Placement(N2 , n2 , C2 );
end
Quadrature Placement Example
• Apply K-L heuristic to partition + Quadrature Placement: Cost C1 = 4, C2L = C2R = 2,
etc.
P
Q
8
4
2
7
1
5
R
14
Q1
16
Q2
15
Q3
12
3
13
9
6
11
10
P
C4a
2,4,5,7
8,12,13,14
2
4
8
14
5
7
12
13
1
9
11
16
3
6
10
15
C2 C2
1,3,6,9
10,11,15,16
C1
Q
C4b
R
C3a
C1
C3b
O1
C4a
C2
O2
C4b
O3
Min-Cut Placement with Terminal Propagation
• Dunlop & Kernighan, “A procedure for placement of standard-cell VLSI
circuits,” IEEE TCAD, Jan. 1985.
• Drawback of the original min-cut placement: Does not consider the
positions of terminal pins that enter a region.
– What happens if we swap {1, 3, 6, 9} and {2, 4, 5, 7} in the previous
example?
prefer to have them in R1
S
S
L1
L1
R1
L2
R2
R
L2
Terminal Propagation
• We should use the fact that s is in L1 !
dummy cell
center
L1
s
p
L2
Lower cost
R1
L1
R2
L2
s
p
R1
R2
higher cost
P will stay in R1 for the rest of partitioning!
• When not to use p to bias partitioning? Net s has cells in many groups?
minimum rectilinear
Steiner tree
p2
p
p1
p
R
h/3
h/3
h
h
L
p3
Don’t use p to bias the
solution in either direction!
Use p!
G
Terminal Propagation Example
• Partitioning must be done breadth-first, not depth-first.
a
S
b
c
a
C1
b
L
a
d
d
C1
C1
p1
b
R L
c
b
c
d
C1
S
a
L1
L1
a
b
R1
L2
c
d
R2
c
b
a
d
R1
R
c
d
unbiased partition
of R
L2
with terminal
propagation
without terminal
propagation
R2
Creating Rows
• Terminal propagation reduce overall area by ~30%
• Creating rows
– Choose α and β preferably to balance row to balance row length
(during re-arrangement )
Row 1
Row 2
Row 3
Row 4
C1 C2 C3
cells in C1→ row1
cells in C3→ row1
cells in C2
Row 1
Row 2
C2
α+β=1
α
β
Creating Rows
• Example
– Partitioning of circuit into 32 groups
– Each group is either assigned to a single row or divided into 2 rows
1
1
1,2
1,2
2
1
1,2
1,2
2
2,3
2,3
2,3
2,3
3
3,4
4
5
5
3,4
4
5
5
3
3
3,4
4
4,5
5
3,4
4
4,5
5
a four-row
standard cell
design
Experimental Results
• CMOS Chip with 453 nets and 412 cells
• Manual solution
– track density=147; feedthroughs=184
• Automated solution
–
–
–
–
–
without terminal propagation: t.d.=313; f.t.=591
(t.d. reduced to 235 by iterative interchanges)
with terminal propagation: t.d.=186; f.t.=182
(t.d. reduced to 152 by iterative interchanges)
Iterative Interchange Refinement is helpful
• The program is in production use as part of an automatic
placement system in AT&T Bell Lab.
– Solutions within 10% of the best hand layout
Remarks on Min-cut Placement
• Also implemented F-M partitioning method
– Much faster but solutions appeared to be not as good as K-L
• Use Simulated Annealing to do partitioning
– Much slower. If restricted to a reasonable CPU time, solutions are
of similar quality of those by F-M method. Easy to implement
• Seeking an elegant way to force some cells to be in
particular positions
• Investigate other algorithms for terminal propagation
– Terminal propagation is the bottleneck of CPU time
Mincut Placement

Perform quadrature mincut onto 4 × 4 grid
Start with vertical cut first
undirected graph model w/ k-clique weighting
thin edges = weight 0.5, thick edges = weight 1
Practical Problems in VLSI Physical Design
Mincut Placement (1/12)
Cut 1 and 2

First cut has min-cutsize of 3 (not unique)
Both cuts 1 and 2 divide the entire chip
Practical Problems in VLSI Physical Design
Mincut Placement (2/12)
Cut 3 and 4

Each cut minimizes cutsize
Helps reduce overall wirelength
Practical Problems in VLSI Physical Design
Mincut Placement (3/12)
Cut 5 and 6

16 partitions generated by 6 cuts
HPBB wirelength = 27
Practical Problems in VLSI Physical Design
Mincut Placement (4/12)
Recursive Bisection

Start with vertical cut
Perform terminal propagation with middle third window
Practical Problems in VLSI Physical Design
Mincut Placement (5/12)
Cut 3: Terminal Propagation

Two terminals are propagated and are “pulling” nodes
Node k and o connect to n and j: p1 propagated (outside window)
Node g connect to j, f and b: p2 propagated (outside window)
Terminal p1 pulls k/o/g to top partition, and p2 pulls g to bottom
Practical Problems in VLSI Physical Design
Mincut Placement (6/12)
Cut 4: Terminal Propagation

One terminal propagated
Node n and j connect to o/k/g: p1 propagated
Node i and j connect to e/f/a: no propagation (inside window)
Terminal p1 pulls n and j to right partition
Practical Problems in VLSI Physical Design
Mincut Placement (7/12)
Cut 5: Terminal Propagation

Three terminals propagated
Node i propagated to p1, j to p2, and g to p3
Terminal p1 pulls e and a to left partition
Terminal p2 and p3 pull f/b/e to right partition
Practical Problems in VLSI Physical Design
Mincut Placement (8/12)
Cut 6: Terminal Propagation

One terminal propagated
Node n and j are propagated to p1
Terminal p1 pulls o and k to left partition
Practical Problems in VLSI Physical Design
Mincut Placement (9/12)
Cut 7: Terminal Propagation

Three terminals propagated
Node j/f/b propagated to p1, o/k to p2, and h/p to p3
Terminal p1 and p2 pull g and l to left partition
Terminal p3 pull l and d to right partition
Practical Problems in VLSI Physical Design
Mincut Placement (10/12)
Cut 8 to 15

16 partitions generated by 15 cuts
HPBB wirelength = 23
Practical Problems in VLSI Physical Design
Mincut Placement (11/12)
Comparison

Quadrature vs recursive bisection + terminal propagation
Number of cuts: 6 vs 15
Wirelength: 27 vs 23
Practical Problems in VLSI Physical Design
Mincut Placement (12/12)
Analytical Placement
• Gordian package:
– GORDIAN: Gordian: VLSI Placement by Quadratic
Programming and slicing Optimization: J. M. Kleinhans, G.Sigl,
F.M. Johannes, K.J. Antreich, IEEE TCAD, 1991
– GORDIAN-L: Analytical Placement: A Linear or a Quadratic
Objective Function?: G. Sigl, K. Doll, F.M. Johannes, DAC91
• Gordian: A Quadratic Placement Approach
– Global optimization: solves a sequence of quadratic programming
problems
– Partitioning: enforces the non-overlap constraints
i=0
i=29
i=58
i=87
Adaptec1 Stats
• Circuit stats
–
–
–
–
–
# cells/nets/pins
chip size
bin size
# placement bins
Average bin occupancy
210,863/219,687/19,205
6000um × 6000um
50um × 50um
120 × 120
210K/1202 =14.6 gates/bin
• Wirelength result (HPBB)
–
–
–
–
iteration 0
iteration 29
iteration 58
iteration 87
34,069,060
46,352,680
80,783,336
98,111,904
Overview of Gordian Package
Procedure Gordian
l:=1;
global-optimize(l);
while (there exists |Ml|>k)
for each r є R(l)
partition(r, r’, r”);
l++;
setup-constraints(l);
global-optimize(l);
repartition(l);
final-placement(l);
endprocedure
Problem Definition
connection to
other modules
y
module u
lvu
net node v
pin vu (xuv, yuv)
(xu, yu)
(avu, bvu) = offset from center of u
(xv, yv)
x
Squared wire length of net v
Lv =
2
2
[(
x
−
x
)
+
(
y
−
y
)
]
∑ uv v
uv
v
u∈M v
x uv = x u + avu , yuv = yu + bvu
Cost Function
• Minimize the following:
1
φ = ∑ Lv wv
2 v∈N
φ ( x , y ) = X T CX + d Tx X + Y T CY + d Ty Y
φ ( x ) = X T CX + d T X
Constraints
• The center of gravity constraints
–
–
–
–
At level l, chip is divided into q (≤ 2l ) regions
For region p, the center coordinates: (up, vp)
Mp: set of modules in region p
Matrix from for all regions
constraint :
∑F x
u∈ M p
u
u
= up
∑F
u∈ M p
u
 Fi / ∑ Fi if i ∈ M p

i∈ M p
Al X = u l , a iu = 
0 otherwise
Problem Formulation
(uρ’, vρ’)
D
E
A B C D E F
F
A
B
C
A( l
M M M M M M M

ρ
* * * 0 0 0
)= 
ρ ' 0 0 0 * * *

M M M M M M M
(uρ, vρ)
Linearly constrained Quadratic Programmin g problem
LQP : minm {Φ( x ) = X T CX + d T X such that A l X = u l }
x∈ R
G
M

L
L

M
Solution Method
• Algebraic Manipulation
Aq×m = [ Dq×q Bq×( m − q ) ]
X d is dependent variable
Xd 
[ DB ]  = u, where
X i is independent variable
 Xi 
X d = − D −1 BX i + D −1 u
 − D −1 B 
 D −1 u 
X =
 Xi + 
 = ZX + x0
 I 
 0 
• Unconstrained Quadratic Programming problem
– Solved by Conjugate-Gradient method
T
T
T
T
UQP : min
{
Ψ
(
x
)
=
X
Z
CZX
+
C
X
}
,
where
C
= CX 0 + d
i
i
i
m −q
xi ∈R
3 Types of Quadratic Programming (QP) Problems:
o Positive Definite Hessian Matrix (Bowl):
• One optimal objective value
• Convex
• Easy: One Point
o Semi‐definite Hessian Matrix (Trough):
• Line of optimal objective values
• Convex
• Moderate: Any point on line
o Indefinite Hessian Matrix (Saddle):
• Optimal is on the boundaries.
• Non‐Convex
• NP Hard
Hessian Matrix
o Second order partial derivatives
o Describes local curvature
o Generalization of Laplacian
o Trace (Hessian) = Laplacian
GORDIAN Quadratic Programming (QP) Problem:
o Here C is Positive Definite.
o Hence LQP or UQP are Convex
o Unique optimal X.
Hence, GORDIAN QP always finds a global optimal
Thanks to: Anirudha Kurhade, Fall 2016
Partitioning
• Recursive partitioning is needed
– to resolve module overlap in global placement
– global placement problem will be solved again with two
additional center_of_gravity constraints
Cp(a)
M p → ( M p' , M p'' )
x u' ≤ x u'' u'∈ M p' and u' '∈ M p''
α=
∑F / ∑F
u '∈ M p '
u
u∈ M p
u
cut value : C p (α ) =
≈ 0.5
∑w
v∈ N C
40
30
20
10
v
0
0.0
0.25
0.5
0.75
1.0
Repartitioning
• Module exchange after each cut to improve cut size
– terminal propagation using global placement positions
• Repartitioning
– to ‘undo’ the mistake made at the previous level:
Procedure repartition(l)
if overlap exists
for each r∈R(l-1)
merge-regions(r, r’, r’’);
partition(r, r’, r’’);
setup-constraints(l);
global-optimize(l);
endif
Summary of Gordian
module coordinates
Partitioning of
module set and
dissection of
placement region
Global
Optimization
minimization of
wire length
position constraints
module
coordinates
Final
Placement
adoption of style
dependent
constraints
Regions
with ≤ k
modules
Complexity: space = O(m), time = O(m1.5 log2m)
Final placement: standard cell, macro-cell & SOG
Experimental Results
Comparison of Results for Standard Cell Blocks
Area After Routing/mm2
Circuit
scb1
scb2
scb3
scb4
scb5
scb6
scb7
scb8
scb9
CPU-time scb8
CPU-time scb9
ratio
GORDIAN
2.7
5.8
15.7
14.0
10.6
11.3
16.4
51.7
54.0
120s
135s
1
Min-Cut
3.1
5.3
25.6
16.9
11.3
12.7
20.2
89.2
98.6
366s
440s
:3
Annealing
2.6
5.0
9.1
13.2
10.9
12.8
19.8
59.5
80.0
39851s
34709s
:300
GORDIAN Placement

Perform GORDIAN placement
Uniform area and net weight, area balance factor = 0.5
Undirected graph model: each edge in k-clique gets weight 2/k
Practical Problems in VLSI Physical Design
GORDIAN Placement (1/21)
IO Placement

Necessary for GORDIAN to work
Practical Problems in VLSI Physical Design
GORDIAN Placement (2/21)
Adjacency Matrix

Shows connections among movable nodes
Among nodes a to j
Practical Problems in VLSI Physical Design
GORDIAN Placement (3/21)
Pin Connection Matrix

Shows connections between movable nodes and IO
Rows = movable nodes, columns = IO (fixed)
Practical Problems in VLSI Physical Design
GORDIAN Placement (4/21)
Degree Matrix

Based on both adjacency and pin connection matrices
Sum of entries in the same row (= node degree)
Practical Problems in VLSI Physical Design
GORDIAN Placement (5/21)
Laplacian Matrix

Degree matrix minus adjacency matrix
Practical Problems in VLSI Physical Design
GORDIAN Placement (6/21)
Fixed Pin Vectors

Based on pin connection matrix and IO location
Y-direction is defined similarly
Practical Problems in VLSI Physical Design
GORDIAN Placement (7/21)
Fixed Pin Vectors (cont)
Practical Problems in VLSI Physical Design
GORDIAN Placement (8/21)
Fixed Pin Vectors (cont)
Practical Problems in VLSI Physical Design
GORDIAN Placement (9/21)
Level 0 QP Formulation

No constraint necessary
Practical Problems in VLSI Physical Design
GORDIAN Placement (10/21)
Level 0 Placement

Cells with real dimension will overlap
Practical Problems in VLSI Physical Design
GORDIAN Placement (11/21)
Level 1 Partitioning

Perform level 1 partitioning
Obtain center locations for center-of-gravity constraints
Practical Problems in VLSI Physical Design
GORDIAN Placement (12/21)
Level 1 Constraint
Practical Problems in VLSI Physical Design
GORDIAN Placement (13/21)
Level 1 LQP Formulation
Practical Problems in VLSI Physical Design
GORDIAN Placement (14/21)
Level 1 Placement
Practical Problems in VLSI Physical Design
GORDIAN Placement (15/21)
Verification

Verify that the constraints are satisfied in the left partition
Practical Problems in VLSI Physical Design
GORDIAN Placement (16/21)
Level 2 Partitioning

Add two more cut-lines
This results in p1={c,d}, p2={a,b,e}, p3={g,j}, p4={f,h,i}
Practical Problems in VLSI Physical Design
GORDIAN Placement (17/21)
Level 2 Constraint
Practical Problems in VLSI Physical Design
GORDIAN Placement (18/21)
Level 2 LQP Formulation
Practical Problems in VLSI Physical Design
GORDIAN Placement (19/21)
Level 2 Placement

Clique-based wiring is shown
Practical Problems in VLSI Physical Design
GORDIAN Placement (20/21)
Summary

Center-of-gravity constraint
Helps spread the cells evenly while monitoring wirelength
Removes overlaps among the cells (with real dimension)
Practical Problems in VLSI Physical Design
GORDIAN Placement (21/21)
Linear vs. Quadratic Objective
A
fixed
b
a
B
movable
g
C
fixed
Quadratic objective function
g
A B
fixed movable
C
fixed
Linear objective function
φq = lα2 + l β2 + lγ2 = 2( l − lγ ) 2 + lγ2
2
3
φq' = −4( l − lγ ) + 2lγ = 0 → lγ = l
lγ
1
lα = l β = l = , φl = lα + l β + lγ
3
2
Linear vs. Quadratic Objective
• Quadratic objective function
– tends to make very long net shorter than linear objective function
– lets short nets become slightly longer
A
row2
B
row1
row1
row3
row4
Linear objective function
row2
A
B
row3
row4
Quadratic objective function
Optimizing Linear Objective
• Global Placement with linear objective function
φq = ∑
2
(
x
−
x
)
∑ uv v → quadratic objective function
φl = ∑
∑x
v∈ N u∈ M v
v∈ N u∈ M v
uv
− xv
→ linear objective function
• Trick
– use quadratic programming to minimize linear objective function
( x uv − xv ) 2
( x uv − xv ) 2
φl = ∑ ∑
=∑∑
x uv − xv
g uv
v∈ N u∈ M v
v∈ N u∈ M v
g uv = x uv − xv , gv =
∑x
u∈ M v
uv
− xv
Analytical Placement Results
wire length /mm
400
circuit primary 2
Gordian
GordianL
300
200
100
number of pins of a net
17
14
12
10
8
6
4
2
0
Figure: Sum of wire lengths versus #pins
Analytical Placement Results
Quadratic objective function
Linear objective function
(a) Global placement with 1 region
Analytical Placement Results
Quadratic objective function
Linear objective function
(b) Global placement with 4 regions
Analytical Placement Results
Quadratic objective function
Linear objective function
(c) Final placements

Download Report

slides - Sung Kyu Lim

Paperzz.com

Your Paperzz