
Solution Space Smoothing Method
and its Application
Dong Sheqin
Hong Xianlong
董社勤 洪先龙
Department of Computer Science and Technology, Tsinghua
University, Beijing,100084 P.R. China
1
Outline
- Principle of Search Space Smoothing
- VLSI Placement based on SSS
- Local Smoothing & Global Smoothing
- VLSI Placement based on P-SSS
- Experimental Results
- Applications: TSP, Temporal planning, FPGA Floorplanning
2
NP-hard Problems and Optimization Algorithms
- An NP-hard problem has a complicated search space; a greedy search strategy easily gets stuck in one of its deep canyons, unable to climb out and reach the global energy minimum.
- To avoid getting stuck at a local minimum, there are commonly two types of approaches: 1) introduce complex moves; 2) introduce a mechanism that allows the search to climb over energy barriers (e.g., the simulated annealing algorithm).
3
History of Search Space Smoothing (SSS)
Jun Gu and Xiaofei Huang proposed a third approach: smoothing the solution space itself.
- They applied this method to the classical NP-hard problem TSP, using the smoothing function below, where d̄ is the average normalized inter-city distance:

  d_ij(α) = d̄ + (d_ij − d̄)^α   if d_ij ≥ d̄
  d_ij(α) = d̄ − (d̄ − d_ij)^α   if d_ij < d̄,   with α ≥ 1
4
Principle of Search Space Smoothing

By Search Space Smoothing, the rugged terrain of the search space of an NP-hard problem is smoothed, so the original problem instance is transformed into a series of gradually simplified problem instances. The solutions of the simplified instances are used to guide the search on the more complicated ones; finally, the original problem is solved at the end of the series.
5
Principle of Search Space Smoothing
[Figure: an example of one-dimensional solution space smoothing. A stack of smoothed solution spaces 1, 2, …, n lies above the original solution space; the minimum solution of solution space i becomes the initial starting point in solution space i+1, guiding the search from the initial point in the original space toward its minimum solution.]
6
Formal Description of the SSS Algorithm
//Initialization
α ← α0; x ← x0;
//Search
while (α ≥ 0) do begin
  H(α) ← DoSmooth(α, H);
  for (some iterations) do begin
    x' ← NeighborhoodTransition(x);
    if (Acceptable(α, x, x')) then begin
      x ← x';
    end;
  end;
  α ← NewAlpha(α);
end;
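As a minimal sketch, the pseudocode above can be written as a runnable Python skeleton. The parameter names and the greedy acceptance rule used here are placeholders for the problem-specific pieces (smoothing function, neighborhood move), not part of the original algorithm description:

```python
import random

def sss(x0, alpha0, smooth, neighbor, inner_iters=100, alpha_step=1.0):
    """Generic Search Space Smoothing: solve a series of smoothed
    instances, feeding each result in as the next starting point."""
    x, alpha = x0, alpha0
    while alpha >= 0:
        h = smooth(alpha)            # smoothed objective H^(alpha)
        for _ in range(inner_iters):
            x_new = neighbor(x)
            if h(x_new) <= h(x):     # greedy acceptance on the smoothed space
                x = x_new
        alpha -= alpha_step          # NewAlpha: reduce smoothing strength
    return x
```

For instance, minimizing (x − 3)² over the integers with ±1 moves converges to x = 3 regardless of the starting point, since every surviving intermediate solution seeds the next (less smoothed) stage.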
7
History of Search Space
Smoothing (SSS)

Johannes Schneider investigated this method thoroughly for the traveling salesman problem and pointed out that "the advantage (of search space smoothing) over the latter one (simulated annealing: SA) is that a certain amount of computational effort usually provides a better result than in the case of SA".
- Schneider's analytic and experimental results also showed that combining SSS with SA is infeasible.
8
History of Search Space Smoothing (SSS)
[Figure-only slide: illustration omitted]
9
History of Search Space Smoothing (SSS)
[Figure-only slide: illustration omitted]
10
Summary of the Principle of SSS
- To solve the original problem instance Pi, SSS first transforms Pi into a series of problem instances Pi0, Pi1, Pi2, Pi3, Pi4, ….
- In the series, each problem instance is a gradually smoothed approximation of the previous one.
- Obviously Pi0 is similar to Pi1, Pi1 is similar to Pi2, and so on; in some sense, the "distance" between Pi0 and Pi1 is smaller than the "distance" between Pi0 and Pi2.
- Because adjacent problem instances are highly similar, the optimal solution of Pi1 is very close to the optimal solution of Pi0 in some sense.
11
Outline
- Principle of Search Space Smoothing
- VLSI Placement based on SSS
- Local Smoothing & Global Smoothing
- VLSI Placement based on P-SSS
- Experimental Results
- Applications: TSP, Temporal planning, FPGA Floorplanning
12
Problem of VLSI Placement
- An SoC is composed of IPs and macro building blocks.
- The first step in the physical design of an SoC is constraint-driven floorplanning and placement.
[Figure: an example SoC with CPU, PLA, I/O, ROM/RAM, and A/D blocks]
13
How to smooth the search space for a Placement instance — an example
- Incremental optimization
[Figures (a) and (b)]
From an optimal placement of building blocks that all have the same size (a), we can easily obtain the optimal placement after the size of only one block is changed (b).
14
How to smooth the search space for a Placement instance — an example
- Placement instance smoothing (the optimal solutions of adjacent instances must be very close to each other)
[Figure: placement instances Pi, Pi0, Pi1, Pi2, Pi3]
15
The basic smoothing function

The first placement instance is computed by setting every block to the average size and scaling the pins accordingly:

  w̄ = (1/n) · Σ_{i=1..n} w_i,   h̄ = (1/n) · Σ_{i=1..n} h_i
  PinX_j ← PinX_j · (w̄ / w_i),   PinY_j ← PinY_j · (h̄ / h_i)

The successive placement instances are computed with smoothing parameter α ≥ 1 by the formulas below:

  w_i(α) = w̄ + (w_i − w̄)^α   if w_i ≥ w̄
  w_i(α) = w̄ − (w̄ − w_i)^α   if w_i < w̄
  h_i(α) = h̄ + (h_i − h̄)^α   if h_i ≥ h̄
  h_i(α) = h̄ − (h̄ − h_i)^α   if h_i < h̄
  PinX_j ← PinX_j · (w_i(α) / w_i),   PinY_j ← PinY_j · (h_i(α) / h_i)
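A minimal sketch of this smoothing function in Python, assuming the block sizes have been normalized so that deviations from the mean lie in [0, 1] (otherwise an exponent α > 1 would magnify deviations rather than shrink them); pin scaling is omitted for brevity:

```python
def smooth_sizes(sizes, alpha):
    """Pull block widths and heights toward their mean.

    sizes: list of (w, h) pairs, normalized so |v - mean| <= 1.
    alpha >= 1 controls smoothing strength: alpha = 1 recovers the
    original instance; larger alpha pushes every block toward the
    average block.
    """
    n = len(sizes)
    w_bar = sum(w for w, _ in sizes) / n
    h_bar = sum(h for _, h in sizes) / n

    def pull(v, v_bar):
        # v(alpha) = v_bar + (v - v_bar)^alpha  if v >= v_bar
        #            v_bar - (v_bar - v)^alpha  otherwise
        if v >= v_bar:
            return v_bar + (v - v_bar) ** alpha
        return v_bar - (v_bar - v) ** alpha

    return [(pull(w, w_bar), pull(h, h_bar)) for w, h in sizes]
```

With alpha = 2, blocks of widths 0.2 and 0.4 (mean 0.3) move to roughly 0.29 and 0.31 — deviations shrink from 0.1 to 0.01, while alpha = 1 leaves the instance unchanged.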
16
Outline
Principle of Search Space Smoothing
 VLSI Placement based on SSS
 Local Smoothing & Global
Smoothing
 VLSI Placement based on P-SSS
 Experimental Results
 Applications: TSP, Temporal
planning, FPGA Floorplanning

17
Global Smoothing
18
Global Smoothing

Consider a placement instance with 4 modules of sizes 2 × 1.8, 2 × 1.8, 1.8 × 0.5, 1.8 × 0.5, numbered 1, 2, 3, 4. The global optimal solution is depicted in Fig. 1(a); we name it solution-1. Another solution, depicted in Fig. 1(b), is not globally optimal; we name it solution-2. If we place modules 1, 2, 3, 4 in the BSG of Fig. 1(c) at positions (2,2), (3,2), (2,1), (3,1), we get a local minimum (solution-2). With the greedy search described above, there is no way to reach the global optimal solution from this local minimum. However, if we "smooth" the placement instance so that the four modules have the same size, solution-1 and solution-2 both become global optimal solutions of the "smoothed" instance. Note that the original instance (2 × 1.8, 2 × 1.8, 1.8 × 0.5, 1.8 × 0.5) can be smoothed to the instance (2 × 1.8, 2 × 1.8, 2 × 1.8, 2 × 1.8) or to (1.8 × 0.5, 1.8 × 0.5, 1.8 × 0.5, 1.8 × 0.5). The latter two "smoothed" instances are similar to the original instance, and their solution spaces are similar to the original solution space, but the number of local minima is reduced in the "smoothed" solution spaces.
19
Global Smoothing

Definition 1: Suppose the neighborhood of a solution s₀ is N(s₀). s₀ is a local minimum iff

  ∀ s_i ∈ N(s₀): H(s₀) ≤ H(s_i)

After smoothing with parameter α, we say the local minimum s₀ is eliminated iff

  ∃ s_i ∈ N(s₀): H^(α)(s₀) > H^(α)(s_i)
20
Local effect of the smoothing operation

We first randomly choose tens of different solutions; for each solution s_i, we randomly select 1000 other solutions s_j (j = 1, 2, …, 1000) within its neighborhood N(s_i). For every α, we calculate the energy (area, for the placement problem) differences:

  ΔH_ij^(α) = H^(α)(s_j) − H^(α)(s_i),   j = 1, 2, …, 1000

Then the root-mean-square (RMS) value of all 1000 energy differences is:

  RMS(α) = sqrt( (1/1000) · Σ_{j=1..1000} (ΔH_ij^(α))² )
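The RMS measurement can be sketched as follows; the smoothed energy function `h_alpha` and the neighbor sampler are placeholders supplied by the caller:

```python
import math

def rms_energy_diff(h_alpha, s_i, sample_neighbor, n_samples=1000):
    """Root-mean-square of smoothed energy differences between a
    solution s_i and n_samples random neighbors, as used to measure
    the local effect of the smoothing operation."""
    diffs = [h_alpha(sample_neighbor(s_i)) - h_alpha(s_i)
             for _ in range(n_samples)]
    return math.sqrt(sum(d * d for d in diffs) / n_samples)
```

A small RMS at large α indicates that smoothing has flattened the landscape around s_i; as α shrinks, the RMS grows back toward its value on the original space.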
21
Local effect of the
smoothing operation
22
Local Smoothing

Definition 2: Suppose the neighborhood of a solution s₀ is N(s₀) = {s₁, s₂, …, s_|N(s₀)|}, with |N(s₀)| denoting the size of the neighborhood, and define the vector of energy differences

  V_{s₀}(i) = H(s_i) − H(s₀),   i = 1, 2, …, |N(s₀)|

Then the local smoothness around s₀ can be described by

  LS = ‖V_{s₀}‖ / |N(s₀)|
23
How to make use of the Global Smoothing effect in SSS

For greedy local search, suppose there are two solutions s_i and s_j within a neighborhood, and let ΔH_ij = H(s_j) − H(s_i). Then the probability of accepting the transition is

  A_ij = 1   if ΔH_ij ≤ 0
  A_ij = 0   if ΔH_ij > 0

The greedy strategy is effective for the global smoothing effect, which changes the sign of the energy difference of some pairs of solutions; but it is completely insensitive to the local smoothing effect.
24
How to make use of the Local Smoothing effect

In simulated annealing, the Metropolis algorithm is used to reach a quasi-equilibrium state at a given temperature t:

  A_ij(t) = 1                 if ΔH_ij ≤ 0
  A_ij(t) = exp(−ΔH_ij / t)   if ΔH_ij > 0

Obviously ΔH_ij / t ≤ ΔH_ij for t ≥ 1, so ΔH_ij / t can be viewed as the smoothed result of the energy difference under control parameter t.
25
How to make use of the Local Smoothing effect
- A local search that is sensitive to both the global and the local smoothing effects would lead to a better result.
- An acceptance function with a smooth Metropolis-like transition of the acceptance probability from 1 to 0 should be introduced; for convergence, the local search should degenerate to a greedy algorithm in the original, un-smoothed search space.
26
A Local search that can make use of local smoothing

A local search with a proper acceptance probability can make use of both the global smoothing effect and the local smoothing effect:

  A_ij(α) = 1                           if ΔH_ij(α) ≤ 0
  A_ij(α) = exp(−ΔH_ij(α) / (K·α))      if ΔH_ij(α) > 0

where K is a scaling constant. As α → 0, the acceptance rule degenerates to the greedy rule on the original search space.
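A sketch of this acceptance function in Python; the default value of the scaling constant `k` and the explicit α ≤ 0 cutoff are assumptions of this sketch:

```python
import math
import random

def accept_prob(delta_h, alpha, k=1.0):
    """Acceptance probability sensitive to both smoothing effects:
    always accept improving moves; accept worsening moves with
    probability exp(-delta_h / (k * alpha))."""
    if delta_h <= 0:
        return 1.0
    if alpha <= 0:
        return 0.0              # alpha -> 0 degenerates to greedy search
    return math.exp(-delta_h / (k * alpha))

def acceptable(delta_h, alpha, k=1.0, rng=random.random):
    """Stochastic accept/reject decision based on accept_prob."""
    return rng() < accept_prob(delta_h, alpha, k)
```

At large α, uphill moves are accepted often (strong local smoothing); as α shrinks, the probability curve sharpens until only downhill moves survive.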
27
Outline
- Principle of Search Space Smoothing
- VLSI Placement based on SSS
- Local Smoothing & Global Smoothing
- VLSI Placement based on Probability Search Space Smoothing
- Experimental Results
- Applications: TSP, Temporal planning, FPGA Floorplanning
28
Algorithm: Probability-SSS()
- STEP 1: Create the initial placement instance according to the smoothing function.
- STEP 2: Use a local search with the probability acceptance function to search for a solution of the initial placement instance. The result is the starting solution.
- STEP 3: α ← NewAlpha(α); apply the smoothing function to the previous solution to produce a new placement instance.
- STEP 4: Use a local search with the probability acceptance function to search for a solution of the new placement instance. The result is the current solution.
- STEP 5: If α = 0, stop; the current solution is the final solution. Otherwise, using the current solution, go to STEP 3.
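The five steps can be sketched as a driver loop; `smooth_instance` and `local_search` are placeholders for the smoothing function and the probability-acceptance local search, and the linear α schedule is an assumption of this sketch:

```python
def probability_sss(instance, alpha0, smooth_instance, local_search,
                    alpha_step=1.0):
    """P-SSS: solve a series of decreasingly smoothed placement
    instances, seeding each search with the previous solution."""
    alpha = alpha0
    smoothed = smooth_instance(instance, alpha)            # STEP 1
    solution = local_search(smoothed, start=None)          # STEP 2
    while alpha > 0:
        alpha = max(0.0, alpha - alpha_step)               # STEP 3: NewAlpha
        smoothed = smooth_instance(instance, alpha)
        solution = local_search(smoothed, start=solution)  # STEP 4
    return solution                                        # STEP 5: alpha == 0
```

Note that the final iteration runs at α = 0, i.e., on the original, un-smoothed instance, so the returned solution is a solution of the original problem.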
29
Outline
- Principle of Search Space Smoothing
- VLSI Placement based on SSS
- Local Smoothing & Global Smoothing
- VLSI Placement based on Probability Search Space Smoothing
- Experimental Results
- Applications: TSP, Temporal planning, FPGA Floorplanning
30
Experimental Results

Table I: Comparison of solution quality and run-time.
Area (mm²) / Time (sec) among ECBL (on Sparc 20), Fast-SP (on Ultra-I), O-tree (on Ultra 60), B*-tree (on Ultra-I), TCG (on Ultra 60), and Probability-SSS (on V880).

Case             | ECBL      | Fast-SP  | O-tree    | B*-tree   | TCG       | P-SSS
Ami33            | 1.192/73  | 1.205/20 | 1.242/119 | 1.27/3417 | 1.20/306  | 1.170/31
P-SSS vs. others | 1.8%      | 2.9%     | 5.7%      | 7.8%      | 2.5%      | 0
Ami49            | 36.70/117 | 36.50/31 | 37.73/526 | 36.8/4752 | 36.77/434 | 36.08/64
P-SSS vs. others | 1.6%      | 1.2%     | 4.3%      | 1.9%      | 1.9%      | 0

Table II: Minimum / average distribution: O-tree (Sun Ultra 1) vs. P-SSS (Sun V880) for simultaneous area and wire-length optimization.

Circuit | O-tree Area (mm²) | O-tree Wire (mm) | P-SSS Area (mm²) | P-SSS Wire (mm)
Ami33   | 1.26 / 1.34       | 51.6 / 59.8      | 1.221 / 1.242    | 31.34 / 39.94
Ami49   | 39.1 / 42.0       | 671 / 777        | 37.60 / 38.18    | 675.2 / 789.7
31
Experimental Results: placement
example ami33(1)- area usage is 98.85%
32
Experimental Results: placement
example ami49 - area usage is 98.85%
33
Outline
- Principle of Search Space Smoothing
- VLSI Placement based on SSS
- Local Smoothing & Global Smoothing
- VLSI Placement based on Probability Search Space Smoothing
- Experimental Results
- Applications: TSP, Temporal planning, FPGA Floorplanning
34
Application: Using P-SSS to solve TSP
[Figure: solution quality (excess over the optimal solution)]
35
Application: FPGA Temporal
Planning using P-SSS
3D-BSSG representation
36
Experimental Results
- Cost function: Φ = Volume + β · Wirelength
- Temporal precedence requirements, which describe the temporal ordering among modules, must also be satisfied by our algorithm.
- Two groups of experiments are performed on the 3D-MCNC benchmarks.
37
Experimental Results

In the first experiment, our objective is to compare P-SSS with G-SSS as the quantity of precedence constraints varies.

[Figure: volume usage (%) of P-SSS and G-SSS versus the quantity of precedence constraints, ranging from 2 to 22]

Conclusions from this experiment:
1. Increasing the number of precedence constraints decreases the quality of the search.
2. Combining SSS with the Metropolis algorithm makes it more powerful than combining it with a greedy algorithm as the local search method.
38
Experimental Results
- Experimental results on all circuits of 3D-MCNC.
- Conclusion: the P-SSS algorithm improves over the G-SSS algorithm in both volume and wirelength.
39
Experimental Results

In the second experiment, using the same benchmarks and the same constraints, we respectively execute the Simulated Annealing approach and the P-SSS algorithm on two kinds of representations: 3D-subTCG and Sequence Triplet (ST).
40
Best Results of 3D-ami49: Volume usage is 84.9%
41
Application:
Heterogeneous FPGA
Floorplanning Based on Instance
Augmentation
42
Instance Augmentation
- Instance Augmentation (IA) is a new stochastic optimization method, which has shown great ability in constrained floorplanning, such as fixed-outline floorplanning [Rong Liu, ISCAS05].
- Floorplanning for heterogeneous FPGAs can be regarded as a constrained floorplanning problem:
  – fixed-outline, since the size of the device is fixed;
  – each module's requirement for every kind of resource must be satisfied.
- Therefore, we apply IA to the heterogeneous floorplanning problem.
43
Overview
- Start from a sub-instance of the given instance; that is, first floorplan a subset of the given modules.
- Simulated annealing or greedy local search may be adopted to find feasible solutions of a specific instance.
- When a feasible solution of the sub-instance is found, augment it by inserting modules (called down-casting).
- If no feasible solution of the current sub-instance is found, "shrink" it by removing a module (called up-casting).
[Figure: illustration of so-called instance augmentation]
44
Overview
Note: a solution is feasible iff all the modules of the instance are placed on the device and their requirements for every kind of resource are fulfilled.
[Figure: main flow of Instance Augmentation]
45
Some Details
- Once a sub-instance is augmented to a bigger one, an initial solution of the bigger instance is generated by inserting the new module into the feasible solution of the sub-instance. For example: (abcd, badc) -> (aebcd, baedc).
- Different inserting positions and realizations of the module are tried to find a better insertion. Experiments show that many feasible solutions can be obtained directly in this way.
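A minimal sketch of down-casting candidate generation on a sequence pair, assuming modules are single characters as in the example above; evaluating the candidates and trying different module realizations are omitted:

```python
from itertools import product

def downcast(seq_pair, module):
    """Enumerate candidate insertions of a new module into a
    sequence pair, yielding every (position-in-s1, position-in-s2)
    combination. E.g. ('abcd', 'badc') with 'e' yields, among
    others, ('aebcd', 'baedc')."""
    s1, s2 = seq_pair
    for i, j in product(range(len(s1) + 1), range(len(s2) + 1)):
        yield s1[:i] + module + s1[i:], s2[:j] + module + s2[j:]
```

The floorplanner would score each of the (n+1)² candidates and keep the best feasible one as the initial solution of the augmented instance.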
46
Some Details
- When the current instance is "shrunk" to a smaller one, an initial solution of the smaller instance is generated by removing the module from the feasible solution of the sub-instance. For example: (aebcd, baedc) -> (abcd, badc).
- To avoid the algorithm getting stuck in a local minimum, more than one module may be removed when "shrinking" the current instance.
47
Some Details
- Either simulated annealing or greedy local search can be used to search for a feasible solution of the current instance.
- This work adopts simulated annealing.
- Since the initial solutions often have good quality, the simulated annealing used here has:
  – a very low starting temperature;
  – few iterations at each temperature.
48
Some Details
- Inserting small modules disrupts the floorplan less than inserting large modules.
- Therefore, we sort the modules by their resource requirements in descending order and insert them in this order.
- Experiments show that inserting modules in this order yields higher success ratios.
49
Problem Definition
- Heterogeneous FPGA device
  – Instead of being composed of uniform CLBs, modern FPGA devices contain more heterogeneous logical resources.
  – Xilinx's Virtex II and Spartan 3 families are typical heterogeneous FPGA devices.
[Figure: simplified architecture of Xilinx's XC3S5000, which is composed of CLBs, RAMs, and Multipliers]
50
Problem Definition
- Heterogeneous FPGA floorplanning
  – A module in FPGA floorplanning is associated with a resource requirement vector r = (nc, nr, nm), indicating that the module requires nc CLBs, nr RAMs, and nm Multipliers.
  – Given a set of modules and their connections, the objective of floorplanning is to place and shape each module inside the chip so as to fulfill its resource requirements, ensure that no modules overlap each other, and optimize a given cost.
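A sketch of the resource-requirement check, using a hypothetical grid model of the device (cell types 'C' for CLB, 'R' for RAM, 'M' for Multiplier); this is an illustration, not Xilinx's actual column layout:

```python
def region_resources(region, device):
    """Count resources of each kind inside a rectangular region.

    device: list of strings, one per row, each cell 'C', 'R', or 'M'.
    region: (x, y, w, h) in grid cells.
    """
    x, y, w, h = region
    counts = {'C': 0, 'R': 0, 'M': 0}
    for row in device[y:y + h]:
        for cell in row[x:x + w]:
            counts[cell] += 1
    return counts

def satisfies(region, device, requirement):
    """Check a module's requirement vector r = (nc, nr, nm) against
    the resources available in its assigned region."""
    nc, nr, nm = requirement
    got = region_resources(region, device)
    return got['C'] >= nc and got['R'] >= nr and got['M'] >= nm
```

A floorplanner would call `satisfies` for every module's region; a solution is feasible only when every such check passes and no regions overlap.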
51
Experimental Results
- Target device: Xilinx's XC3S5000.
- Testing data are generated with different numbers of modules and resource requirements.

Results on different testing data:

# of modules | CLB-rate | RAM-rate | MUL-rate | time (sec) | success-rate
21           | 72%      | 88%      | 86%      | 0.1        | 100%
23           | 83%      | 81%      | 81%      | 0.1        | 100%
37           | 94%      | 75%      | 75%      | 2.4        | 100%
50           | 78%      | 78%      | 76%      | 0.6        | 100%
100          | 78%      | 79%      | 77%      | 1.7        | 91%
52
Experimental Results
[Figure: floorplan of 20 modules, obtained in 1.2 sec; the resource utilization is 100%]
[Figure: a resultant floorplan of 50 modules]
53
54