Partitioning

Simulated Evolution Algorithm for MultiObjective VLSI Netlist
Bi-Partitioning
Sadiq M. Sait, Aiman El-Maleh, Raslan Al Abaji
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
http://www.kfupm.edu.sa
27th May, ISCAS-2003, Bangkok, Thailand
Outline






Introduction
Problem Formulation
Cost Functions
Proposed Approaches
Experimental results
Conclusion
2
VLSI Technology Trends
Design Characteristics
0.06M
2MHz
6um
SPICE
Simulation
0.13M
12MHz
1.5um
CAE
Systems,
Silicon
compilation
1.2M
50MHz
0.8um
HDLs,
Synthesis
3.3M
200MHz
0.6um
Top-Down
Design,
Emulation
7.5M
333MHz
0.25um
Cycle-based
simulation,
Formal
Verification
Key CAD Capabilities
The challenges to sustain such a fast growth to achieve giga-scale
integration have shifted in a large degree, from the process of
manufacturing technologies to the design technology.
3
The VLSI Chip in 2006
Technology
Transistors
Logic gates
Size
Clock
Chip I/O’s
Wiring levels
Voltage
Power
Supply current
0.1 um
200 M
40 M
520 mm2
2 - 3.5 GHz
4,000
7-8
0.9 - 1.2
160 Watts
~160 Amps
Performance
Power consumption
Noise immunity
Area
Cost
Time-to-market
Tradeoffs!!!
4
VLSI Design Cycle
VLSI design process is carried out at a number of levels.
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging Testing and Debugging
5
Physical Design
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
Physical
design
converts
a
circuit
description
(behavioral/structural), into a geometric description. This
description is used to manufacture a chip.
6
Why do we need Partitioning ?
 Decomposition of a complex system into smaller
subsystems.
 Each subsystem can be designed independently
speeding up the design process (divide-and conquerapproach).
 Dividing a complex IC into a number of functional
blocks, each of them designed by one or a team of
engineers.
 The partitioning scheme has to minimize the
interconnections between subsystems.
7
Levels of Partitioning
System
System Level Partitioning
PCBs
Board Level Partitioning
Chip Level Partitioning
Chips
Subcircuits
/ Blocks
8
Classification of Partitioning Algorithms
Partitioning Algorithms
Group Migration
1. Kernighan-Lin
2. FiducciaMattheyeses
(FM)
3. Multilevel Kway
Partitioning
Iterative Heuristics
1.
Simulated
annealing
2.
Simulated
evolution
3.
Tabu Search
4.
Genetic
Performance
Driven
Others
1. Lawler et al.
1. Spectral
2. Vaishnav
3. Choi et al.
4. Jun’ichiro et
al.
2. Multilevel
Spectral
9
Related previous Works
1999 Two low power oriented techniques based on simulated annealing
(SA) algorithm by choi et al.
1969 A bottom-up approach for delay optimization (clustering) was
proposed by Lawler et al.
1998 A circuit partitioning algorithm under path delay constraint is
proposed by jun’ichiro et al. The proposed algorithm consists of the
clustering and iterative improvement phases.
1999 Enumerative partitioning algorithm targeting low power is
proposed in Vaishnav et al. Enumerates alternate partitioning and
selects a partitioning that has the same delay but less power
dissipation. (not feasible for huge circuits.)
10
Motivation
Need for Power optimization
 Portable devices
 Power consumption is a hindrance in further integration
 Increasing clock frequency
Need for delay optimization
 In current sub micron design wire delay tend to dominate
gate delay.
 Larger die size imply long on-chip global routes, which
affect performance
 Optimizing delay due to off-chip capacitance
11
Objective

Design a class of iterative algorithms for VLSI multi-objective
partitioning.

Explore partitioning from a wider angle and consider circuit delay,
power dissipation and interconnect in the same time, under a given
balance constraint
Objectives
 Power cost is optimized
 Delay cost is optimized
 Cutset cost is optimized
Constraint
 Balanced partitions to a certain tolerance degree (10%)
12
Problem formulation

The circuit is modeled as a hypergraph H(V,E),
where V ={v1,v2,v3,… vn} is a set of modules (cells).

And E = {e1, e2, e3,… ek} is a set of hyperedges.
Being the set of signal nets, each net is a subset of
V containing the modules that the net connects.

A two-way partitioning of a set of nodes V is to
determine two subsets VA and VB such that VA U VB
= V and VA VB = 
13
Cutset




Based on hypergraph model H = (V, E)
Cost 1: c(e) = 1 if e spans more than 1 block
Cutset = sum of hyperedge costs
Efficient gain computation and update
cutset = 3
14
Delay Model
Metal 1
Metal 2
C2
C7
C3
SE1
C1
C4
Cut Line
CoffChip
C5
SE2
C6
Delay ( Pi) 
 Delay (cell )   Delay (net )
cellPi
netPi
Objective : Max Delay ( Pi) 
path : SE1  C1C4C5SE2.
PiP
Delay  = CDSE1 + CDC1+ CDC4+ CDC5+ CDSE2
CDC1 = BDC1 + LFC1 * ( Coffchip + CINPC2+ CINPC3+ CINPC4)
15
Power
The average dynamic power consumed by CMOS logic gate in a
synchronous circuit is given by:
Ni is the number of output
2
Vdd
gate transition per cycle
average
Load
Pi
 0.5 
 Ci
 Ni
(Switching Probability)
Tcycle
CiLoad  Cibasic  Ciextra
load capacitance = Load Capacitances before Partitioning + load due to off chip capacitance
objective:  N i
Total Power dissipation of a Circuit:
P   
2
dd


iv
V
  Cibasic  Ciextra  N i
Tcycle i
16
Unifying Objectives by Fuzzy logic
Weighted Sum Approach




 Delay _ Cost _ of _ Circuit 
 CutsetCost 
 Power 
  Wc 
cos t  Wd * 
  Wp 
Maxdelay
MaxPower 
 MaxCutest 






1. Problems in choosing weights.
2. Need to tune for every circuit.
Imprecise values of the objectives
best represented by linguistic terms that are basis of fuzzy
algebra
 Conflicting objectives
 Operators for aggregating function
17
Fuzzy logic for Multi-objective function
1. The cost to membership mapping
2. Linguistic fuzzy rule for combining the membership values in
an aggregating function
3. Translation of the linguistic rule in form of appropriate fuzzy
operators
4. And-like operators: Min operator  = min (1, 2)
• And-like OWA: = * min (1,2) + ½ (1-) (1+ 2)
• Or-like operators Max operator  = max (1, 2)
• Or-like OWA: = * max (1,2) + ½ (1-) (1+ 2)
Where  is a constant in range [0,1]
18
Membership functions
Where Oi and Ci are lower bound and actual cost of objective “i”
 i(x) is the membership of solution x in set “good ‘i’ ”
gi is the relative acceptance limit for each objective.
19
Fuzzy linguistic rule
A good partitioning can be described by the following fuzzy
rule
IF solution has
small cutset AND
low power AND
short delay AND
good Balance.
THEN it is a good solution
20
Fuzzy cost function
The above rule is translated to AND-like OWA
1
 ( x)    min  C ,  P ,  D ,  B  1     C   P   D   B 
4
 (x)
Represent the total Fuzzy fitness of the solution, our aim
is to Maximize this fitness
C ,  P ,  D ,  B Respectively (Cutset, Power, Delay, Balance)
Fitness
21
Simulated Evolution
Algorithm Simulated evolution
Begin
Start with an initial feasible Partition S
Repeat
Evaluation :
Evaluate the Gi (goodness) of all modules
Selection :
For each Vi (cell) DO
begin
if Random Rm > Gi then select the cell
End For
Allocation: For each selected Vi (cell) DO
begin
Move the cell to destination Block.
End For
Until Stopping criteria is satisfied.
Return best solution.
End
22
Cut goodness
d i  wi
gci 
di
gc5 
3 2
3
 0.33
di: set of all nets, Connected and not cut.
wi : set of all nets, Connected and cut.
23
Power Goodness
k
gpi 
k
 S  j V    S  j U 
j 1
j
I
j
j 1
i
k
 S  j V 
j 1
j
I
Vi is the set of all nets connected
and Ui is the set of all nets
connected and cut.
0.7  0.4
gp5 
 0.428
0.7
24
Delay Goodness
gd i 
K i  Li
Ki
Ki: is the set of cells in all paths passing
by cell i.
Li: is the set of cells in all paths passing by
cell i and are not in same block as i.
52
gd 5 
 0.6
5
gd 4 
53
 0.4
5
25
Final selection Fuzzy rule
IF Cell ‘i’ is near its optimal Cut-set goodness as
compared to other cells
AND
AND
near its optimal power goodness
compared to other cells
near its optimal net delay goodness as compared
to other cells
OR T(max)(i) is much smaller than Tmax
THEN it has a high goodness.
26
Fuzzy Goodness
Tmax :delay of most critical path in
current iteration.
T(max)(i) :delay of longest path
traversing cell i.
Xpath= Tmax / T(max)(i)
1
g i   i ( x)    min  iC ,  iP ,  iD  1     iC   iP   iD 
3
 iC ,  iP ,  iD
Respectively (Cutset, Power, Delay ) goodness.
27
Experimental Results
ISCAS 85-89 Benchmark Circuits
28
SimE versus Tabu Search & GA against time
Circuit: s13207
29
Experimental Results: SimE versus TS and GA
SimE results were better than TS and GA, with faster execution time.
30
Conclusion :Re-write this

The present work successfully addressed the
important issue of reducing power and delay
consumption in VLSI circuits.

The present work successfully formulate and
provide solutions to the problem of multi-objective
VLSI partitioning.

TS partitioning algorithm outperformed GA in terms
of quality of solution and execution time.
31
Thank you
32