New Global Objectives for Timing

Analytical Minimization of Signal Delay
in VLSI Placement
Andrew B. Kahng and Igor L. Markov
UCSD, Univ. of Michigan
http://www.eecs.umich.edu/~imarkov
IBM technical contact: Paul Villarrubia
Outline
• Background: Global Placement for VLSI
– wirelength minimization
– delay minimization
• Contribution
– minimization objective
– “generic” minimization algorithm: outer loop and inner loop
– empirical results
• Futures
VLSI Global Placement
• Find locations for standard cells
• Standard cells placed in rows, without overlap
• Minimize wirelength, “routing congestion”
• Minimize clock cycle
• Key abstractions:
– standard cells → rectangular outlines
– netlist → weighted hypergraph (signal nets → hyperedges)
– signal delay → function of cell locations (interconnect dominates)
A VLSI Global Placement Example
[Figure: a bad placement and a good placement]
Netlist Hypergraph and Timing Graph
• Two signal nets: 3 pins (light blue) and 4 pins (light green)
• Ovals: hyperedges
• Red edges: timing graph edges
Top-Down Global Placement
• Placement blocks represent cells and layout area
– single block at the start, driven by recursive (min-cut) bipartitioning
– each pass: number of blocks doubles, size of blocks halves
– end case: several cells in a tiny region
• Intuition: many cells can operate in parallel.
Partitioning finds “independent” groups of cells
Analytical Global Placement
• Find a continuous placement (locations == reals)
• Efficient optimizations when nonconvex constraints are
relaxed (e.g., cells are allowed to overlap)
• Represent multi-pin hyperedges by sets of edges
– minimize total weighted “wirelength” of all edges
[Figure: points P1 and P2]
Popular objectives:
• Linear (Manhattan) WL = $w_{12} (|x_1 - x_2| + |y_1 - y_2|)$
• Quadratic “squared” WL = $w_{12} ((x_1 - x_2)^2 + (y_1 - y_2)^2)$
Constraints: fixed vertices and/or “region constraints”
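A minimal sketch of the two objectives for a single two-pin edge. The function names and the sample coordinates are illustrative assumptions, not part of the original slides.

```python
def linear_wl(p1, p2, w=1.0):
    """Weighted Manhattan length of one two-pin edge."""
    (x1, y1), (x2, y2) = p1, p2
    return w * (abs(x1 - x2) + abs(y1 - y2))


def quadratic_wl(p1, p2, w=1.0):
    """Weighted squared Euclidean length of one two-pin edge."""
    (x1, y1), (x2, y2) = p1, p2
    return w * ((x1 - x2) ** 2 + (y1 - y2) ** 2)


p1, p2 = (0.0, 0.0), (3.0, 4.0)      # hypothetical cell locations
print(linear_wl(p1, p2, w=2.0))      # 2 * (3 + 4)  = 14.0
print(quadratic_wl(p1, p2, w=2.0))   # 2 * (9 + 16) = 50.0
```

The quadratic form is smooth everywhere, while the linear form has kinks where coordinates coincide, which is one reason both appear as popular choices.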
Analytical Placement Alone is Not Enough
• Many cells overlap
• Must “spread” the placement
• IBM CPlace and XQ
– Remove overlap (computational geometry)
– CPlace combines min-cut with analytical techniques
Timing-Driven Placement
• Cycle time is set by the maximum path delay, not total path delay (!)
– max(x,y,...) is not differentiable
– framework: pin-based timing graph
• Analytical approaches allow cell overlaps
– Cell overlaps are resolved later
• Main difficulty: cannot enumerate signal paths
• Signal paths implicitly defined by device types
– signal path sources, sinks == I/O pins and storage elements
• Timing constraints also implicitly defined
– “actual arrival times” (AATs) at sources
– “required arrival times” (RATs) at sinks
– source-sink path constraint: path delay ≤ RAT@sink - AAT@source
Implicit Analysis of Path Constraints
• Static Timing Analysis (STA) methodology
– forward topological traversal in timing graph → AAT@every_pin
– similar backward traversal → RAT@every_pin
– slack@pin is given by RAT@pin - AAT@pin
– negative slacks → violated timing constraints
• STA-based and STA-inspired placement methods
– slacks → net weights for HPWL minimization
• top-down placement to maximize negative slack (Marek-Sadowska/Lin 86)
– note: STA requires edge delays (e.g., from placement)
– delay budgets
• zero-slack (Hauge, Nair and Yoffa 86)
• iterative min-max (Shragowitz et al. 90/92)
• limit-bumping (Frankle 92)
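The STA traversals described on this slide can be sketched as follows. This is a toy illustration on a four-vertex timing graph: the graph, the delays, and the function name `static_timing_analysis` are assumptions made for the example. It assumes every non-source vertex has a predecessor and every non-sink vertex has a successor.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def static_timing_analysis(edge_delay, aat_sources, rat_sinks):
    """Return (AAT, RAT, slack) per vertex of a timing DAG."""
    preds, succs, nodes = {}, {}, set()
    for (u, v), d in edge_delay.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
        nodes.update((u, v))
    deps = {v: [u for u, _ in preds.get(v, [])] for v in nodes}
    order = list(TopologicalSorter(deps).static_order())

    # Forward traversal: latest arrival over all incoming edges.
    aat = dict(aat_sources)
    for v in order:
        if v not in aat:
            aat[v] = max(aat[u] + d for u, d in preds[v])

    # Backward traversal: earliest requirement over all outgoing edges.
    rat = dict(rat_sinks)
    for u in reversed(order):
        if u not in rat:
            rat[u] = min(rat[v] - d for v, d in succs[u])

    slack = {v: rat[v] - aat[v] for v in nodes}
    return aat, rat, slack


# Hypothetical timing graph: source "a", sink "d".
edge_delay = {("a", "b"): 2, ("a", "c"): 1, ("b", "d"): 3, ("c", "d"): 1}
aat, rat, slack = static_timing_analysis(edge_delay, {"a": 0}, {"d": 4})
print(sorted(slack.items()))  # [('a', -1), ('b', -1), ('c', 2), ('d', -1)]
```

The negative slacks on a, b, and d mark the violated path a-b-d (delay 5 against a constraint of 4); vertex c lies only on a path that meets its constraint.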
Motivations For Novelty
• Many promising techniques available
– net reweighting
– delay budgeting
– others
• Existing frameworks have weaknesses
– speed/scalability
– loss or ignorance of input information
• delay budgeting algorithms tend to ignore fixed locations, obstacles
– optimization of “wrong” global objectives (e.g., average wirelength)
The Dimensionless Path-Timing Objective
• For path $\pi$ consider edge $e \in \pi$
• Dimensionless Path-Timing Objective (DPO)
$\phi = \max_\pi \{ t_\pi / c_\pi \} = \max_\pi \{ (\sum_{e \in \pi} d_e) / c_\pi \}$
• Where
– $c_\pi$ is path constraint
– $t_\pi$ is path delay
– $d_e = d_{ij}(x_i, y_i, x_j, y_j)$ is edge delay
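To make the definition concrete, the sketch below evaluates the DPO on a tiny timing graph by brute-force path enumeration. Enumeration is exponential in general and is used here only because the graph is a toy; the graph, the delays, and the function names are assumptions. Each path constraint is taken as RAT@sink - AAT@source and is assumed positive.

```python
def enumerate_paths(succs, node, sinks, prefix=()):
    """Yield every path (as a tuple of vertices) that ends at a sink."""
    prefix = prefix + (node,)
    if node in sinks:
        yield prefix
    for nxt in succs.get(node, ()):
        yield from enumerate_paths(succs, nxt, sinks, prefix)


def dpo(edge_delay, aat_sources, rat_sinks):
    """Max over source-sink paths of (path delay) / (path constraint)."""
    succs = {}
    for u, v in edge_delay:
        succs.setdefault(u, []).append(v)
    worst = 0.0
    for source, aat in aat_sources.items():
        for path in enumerate_paths(succs, source, rat_sinks):
            t = sum(edge_delay[e] for e in zip(path, path[1:]))  # path delay
            c = rat_sinks[path[-1]] - aat                        # path constraint
            worst = max(worst, t / c)
    return worst


# Hypothetical timing graph: source "a", sink "d".
edge_delay = {("a", "b"): 2, ("a", "c"): 1, ("b", "d"): 3, ("c", "d"): 1}
print(dpo(edge_delay, {"a": 0}, {"d": 4}))  # 1.25, from path a-b-d: 5 / 4
```

A value above 1 means at least one path exceeds its constraint; here the path a-b-d does.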
DPO: Properties
$\phi = \max_\pi \{ t_\pi / c_\pi \} = \max_\pi \{ (\sum_{e \in \pi} d_e) / c_\pi \}$
• $\phi \le 1$ ⇔ all timing constraints are satisfied
• Convex when edge delay models are convex
• Min DPO ⇔ max slack when all $c_\pi$ are equal
• Max slack can be reduced to min DPO
– add two new vertices: the source and the sink
– connect the source to former sources
– connect the sink to former sinks
– use constant edge delay models
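The equal-constraint case can be checked in one line. Writing the common constraint as $c > 0$, the DPO as $\phi$, and the slack of a path as $c - t_\pi$ (notation assumed here for illustration), the worst slack is

```latex
\min_\pi \,(c - t_\pi) \;=\; c - \max_\pi t_\pi \;=\; c\,\Bigl(1 - \max_\pi \frac{t_\pi}{c}\Bigr) \;=\; c\,(1 - \phi)
```

Since $c$ is a positive constant, the worst slack grows exactly when $\phi$ shrinks, so minimizing the DPO and maximizing the worst slack select the same placements.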
Criticalities: “Multiplicative Slacks”
• By analogy with slack, define criticalities
$\kappa_i = \max_{\pi \ni v} \{ t_\pi / c_\pi \}$ for vertex $v = v_i$
$\kappa_{ij} = \max_{\pi \ni e} \{ t_\pi / c_\pi \}$ for edge $e = e_{ij}$
• Criticalities are multiplicative versions of slack
• DPO and criticalities quickly computable
– STA + postprocessing
• Vertex criticalities → cells on critical paths
– can be used by the proposed top-down timing-driven placement flow
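As an illustration of the definitions, the sketch below computes vertex and edge criticalities on a toy timing graph by walking every source-sink path. This brute-force walk is not the "STA + postprocessing" computation mentioned on the slide; it only shows what the quantities mean. The graph, the delays, and the function name are assumptions.

```python
def criticalities(edge_delay, aat_sources, rat_sinks):
    """Return (vertex_crit, edge_crit): the worst delay/constraint ratio
    over all source-sink paths through each vertex and each edge."""
    succs = {}
    for u, v in edge_delay:
        succs.setdefault(u, []).append(v)

    vertex_crit, edge_crit = {}, {}
    stack = [((s,), 0) for s in aat_sources]  # (partial path, delay so far)
    while stack:
        path, t = stack.pop()
        last = path[-1]
        if last in rat_sinks:
            # Path constraint: RAT@sink - AAT@source (assumed positive).
            ratio = t / (rat_sinks[last] - aat_sources[path[0]])
            for v in path:
                vertex_crit[v] = max(vertex_crit.get(v, 0.0), ratio)
            for e in zip(path, path[1:]):
                edge_crit[e] = max(edge_crit.get(e, 0.0), ratio)
        for nxt in succs.get(last, ()):
            stack.append((path + (nxt,), t + edge_delay[(last, nxt)]))
    return vertex_crit, edge_crit


# Hypothetical timing graph: source "a", sink "d", constraint 4.
edge_delay = {("a", "b"): 2, ("a", "c"): 1, ("b", "d"): 3, ("c", "d"): 1}
vertex_crit, edge_crit = criticalities(edge_delay, {"a": 0}, {"d": 4})
print(sorted(vertex_crit.items()))
# [('a', 1.25), ('b', 1.25), ('c', 0.5), ('d', 1.25)]
```

The largest criticality equals the DPO (1.25 here), and the vertices that attain it are the cells on the most critical path.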
Generic Minimization of DPO
• Reduce DPO to a simpler objective: $\max_{ij} w_{ij} d_{ij}$
– maximal weighted edge delay
– use “reweighting iterations”
• One reweighting iteration
– assume a placement
– compute edge criticalities
– compute new edge weights $w_{ij}$
– minimize $\max_{ij} w_{ij} d_{ij}$
• (New weights: $w_{ij}' = \kappa_{ij} / d_{ij}$, with $\kappa_{ij}$ the edge criticality, where $\mu = \max_{ij} w_{ij} d_{ij}$)
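A small sketch of the weight-update step only, assuming the update reads "new weight = edge criticality / current edge delay" (some symbols on the slide were lost in extraction, so this reading is an assumption). The criticalities and delays are hypothetical values for a four-edge timing graph, and the min-max placement step that would follow is not shown.

```python
import math


def reweight(edge_crit, edge_delay):
    """New edge weights: criticality divided by current edge delay."""
    return {e: edge_crit[e] / edge_delay[e] for e in edge_delay}


# Hypothetical current edge delays and edge criticalities.
edge_delay = {("a", "b"): 2.0, ("a", "c"): 1.0, ("b", "d"): 3.0, ("c", "d"): 1.0}
edge_crit = {("a", "b"): 1.25, ("a", "c"): 0.5, ("b", "d"): 1.25, ("c", "d"): 0.5}

weights = reweight(edge_crit, edge_delay)

# At the current placement each weighted delay equals the edge criticality,
# so the maximal weighted edge delay equals the largest criticality.
max_weighted = max(weights[e] * edge_delay[e] for e in edge_delay)
print(math.isclose(max_weighted, max(edge_crit.values())))  # True
```

Under this reading, the simpler objective starts each iteration at the current DPO value, and any placement that lowers the maximal weighted edge delay tightens the implied per-edge delay bounds.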
Properties of Reweighting
• Theorem 1. If $\mu = \max_{ij} w_{ij} d_{ij}$ does not increase at a
particular iteration, all timing constraints must be
satisfied.
• Theorem 2. A re-weighting iteration either decreases
DPO, or leaves it unchanged.
• Reweighting upper-bounds $d_{ij}$ because $w_{ij} d_{ij} \le \mu$
⇒ can interpret reweighting as delay rebudgeting
• Youssef and Shragowitz used $w_{ij} = \kappa_{ij}$ in 1990/92
– [interpretation of their iterative MiniMax]
– no iterations with placement: ignore fixed pad locations
Optimization of Maximal Edge Delay
• Must consider particular edge delay models
– popular choices: linear and quadratic
• Theorem 3. 2-dim max edge delay can be reduced to
the 1-dim case with twice as many vertices
• [“Inlined” implementation: no new graph]
$\max_{km} a_{km} |t_k - t_m|$
$\max_{km} b_{km} (t_k - t_m)^2$
• Theorem 4. Let $b_{km} = a_{km}^2$ ⇒ minimizers coincide
⇒ Linear and quadratic WL are numerically equivalent!
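Theorem 4 can be checked numerically on a one-dimensional toy instance. The sketch below places one free vertex between two fixed vertices and compares the minimizers of the linear and quadratic max-edge objectives when each quadratic weight is the square of the linear one. The instance and the grid search are assumptions chosen for illustration, not the solver used in the work.

```python
# One free vertex at position t; each entry is
# (position of a fixed neighbor, linear edge weight a).
edges = [(0.0, 1.0), (10.0, 3.0)]


def max_linear(t):
    """max over edges of a * |t - p|"""
    return max(a * abs(t - p) for p, a in edges)


def max_quadratic(t):
    """max over edges of b * (t - p)^2 with b = a^2"""
    return max(a * a * (t - p) ** 2 for p, a in edges)


grid = [i / 100 for i in range(1001)]     # candidate positions 0.00 .. 10.00
best_lin = min(grid, key=max_linear)
best_quad = min(grid, key=max_quadratic)
print(best_lin, best_quad)  # 7.5 7.5
```

The agreement is expected: with $b = a^2$ each quadratic term is the square of the corresponding linear term, and squaring preserves order on nonnegative numbers, so the two maxima are minimized at the same point.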
Top-Down Placement Framework
• Top-down placement done in passes
• In one pass
– split every previously existing block
• Cell-to-block assignments
– viewed as region constraints
– gradually refine, converge to cell locs
• Assume we analytically minimized signal delay
• ⇒ have cell locations ⇒ can compute edge delays
• ⇒ can perform Static Timing Analysis
• ⇒ know which cells lie on critical paths
• Use delay-minimizing cell locs when splitting blocks
Empirical Validation
• We combined min-max placement with
recursive min-cut bisection (Capo → CapoT)
• Implemented minimization of edge delay objectives:
– Length as delay
– Squared length as delay
– Quadratic RC delay
– MST-based Elmore delay (using
• Evaluated
– Internal evaluators (after placement): sanity check
– Industry timing analyzer
• Compared to an industry placer on 4 test-cases
– Won on three test-cases (by slack computed with industry STA)
Results of Quadratic, Linear
and Min-Max Placement
Conclusions and Ongoing Work
• New timing-driven placement framework
– can potentially be combined with budgeting or reweighting
– expected to be successful enough on its own
– leverages mincut placement
– relies on a novel analytical delay minimization
• Dimensionless Path-timing Objective (DPO)
– novel global timing objective; generalizes slack optimization
• New minimization algorithms
– reweighting iteration: reduction to simpler MAX-based objective
– MAX-based objective can be minimized very quickly
• Ongoing work in the context of timing-driven flows
Future Work
• Observation (how the proposed method works)
– a classic placement approach is split into stages
– a new timing optimization is performed between those stages
– most critical wires/gates are found first
(traditionally: placement is found first)
⇒ Try other types of optimizations during placement
– routing of timing-critical nets
• better delay estimation
• early cross-talk detection?
– sizing of timing-critical drivers
– buffer insertion for timing-critical nets
– early detection of dangerous cross-talk
Faster and cheaper ICs