Computational Optimization and Applications, 23, 321–340, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Mixed Heuristic for Circuit Partitioning
C. GIL∗
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería,
La Cañada de San Urbano s/n, 04120 Almería, SPAIN
J. ORTEGA
Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada,
Campus de Fuentenueva, Granada, SPAIN
M.G. MONTOYA AND R. BAÑOS
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería,
La Cañada de San Urbano s/n, 04120 Almería, SPAIN
Abstract. As general-purpose parallel computers are increasingly being used to speed up different VLSI applications, the development of parallel algorithms for circuit testing, logic minimization and simulation, HDL-based
synthesis, etc. is currently a field of increasing research activity. This paper describes a circuit partitioning algorithm which mixes Simulated Annealing (SA) and Tabu Search (TS) heuristics. The goal of such an algorithm is to
obtain a balanced distribution of the target circuit among the processors of the multicomputer, so that a parallel
CAD application for Test Pattern Generation achieves good efficiency. The results obtained indicate that the
proposed algorithm outperforms both pure Simulated Annealing and pure Tabu Search. Moreover, the usefulness of
the algorithm in providing a balanced workload distribution is demonstrated by the efficiency results obtained by
a topological partitioning parallel test-pattern generator in which the proposed algorithm has been included. An
extended algorithm that works with general graphs has also been included, in order to compare our approach with
other state-of-the-art algorithms.
Keywords: circuit partitioning, optimisation, parallel test pattern generation, simulated annealing, Tabu Search
1. Introduction
The circuit partitioning problem arises in many VLSI applications [2, 24]. Given the increasing complexity of VLSI circuits and the NP-complete [11] character of many VLSI CAD problems, a "divide and conquer" approach is attractive both for solving these problems in reasonable time through parallel processing and for handling, on distributed-memory multiprocessors, arbitrarily large circuits that may not fit in the memory of a standard workstation.
The usefulness of parallel processing to speed up the resolution of VLSI CAD problems
and to address circuit storage problems has been considered in the recent literature on
circuit testing, logic synthesis, cell placement, etc. [5, 7].
In this way, circuit partitioning has become an important preliminary step in VLSI CAD applications [7]. It arises when trying to exploit the concurrency in the target circuit (data parallelism) instead of the concurrency of the algorithm (functional parallelism) [24]. In any parallel application, the distribution of the workload among the processors is an important factor for the efficient use of the parallel computer. For some applications it is difficult to provide a graph model for the processing and communication volumes corresponding to the tasks of a program, which reduces the usefulness of graph-based workload distribution procedures. In such circumstances, a dynamic load-balancing procedure is required [25]. However, the testing application we are interested in is usually based on applying a given procedure to the different circuit elements (logic gates and the connections between them). Thus, as the data structure of the algorithm is defined by the corresponding netlist, it is relatively easy to describe the program by a graph. The volume of processing associated with each node corresponds to applying the algorithm to the circuit elements allocated to a given processor, and the communication cost results from transferring data between processors to which interconnected subcircuits have been allocated. Because of these characteristics, efficient circuit partitioning algorithms are very useful, as they allow a balanced distribution of the workload among processors. Particularly interesting are the algorithms that can be applied to irregular and sparse graphs, as these are the graphs normally associated with digital circuits.
∗ Author to whom correspondence should be addressed.
In this paper we present a procedure for circuit partitioning in the context of parallel
test pattern generation. In moderate time, the algorithm finds a partitioning of the circuit
graph that minimises the overall parallel run time of the test generation process. This
implies both maximizing the concurrency among processors and minimizing the
communication overhead, so the objective function we have used takes these two
objectives into account simultaneously.
Several approaches for circuit partitioning have been reported [3, 4, 8, 10, 15, 17–19, 21,
22, 27, 29, 31]. They can be classified as combinatorial or move-based approaches [10, 17,
22], approaches based on geometric representations [15, 18], multilevel and hierarchical
clustering [8, 19, 21, 31], and hybrid schemes [4] that combine different types of approaches
in order to exploit the advantages of each.
The procedure proposed here belongs to the class of move-based approaches and is also
a hybrid scheme, in which the solution is built iteratively from an initial solution by applying
a move or transformation to the current solution. The set of possible transformations that
can be applied to a given solution defines the neighbourhood structure of the solution space,
which is explored by repeatedly moving from the current solution to a neighbouring one.
Move-based procedures are simple to describe and implement and are therefore the most
frequently used, in some cases together with a multilevel approach.
The move-based procedures include iterative improvement methods [10, 22, 27], which
move from the current solution to the best solution in its neighbourhood, and stochastic hill-descending procedures such as those based on Simulated Annealing (SA) [1], Tabu Search
(TS) [16], and Genetic Algorithms (GA) [26], which allow moves towards solutions
worse than the current one in order to escape from local minima.
Iterative improvement algorithms, such as the Kernighan-Lin (KL) [22] and
Fiduccia-Mattheyses (FM) [10] algorithms for graph bipartitioning and [19, 21, 27, 31] for partitioning
into multiple blocks, are widely applied and have almost become standards, and their results
are frequently used for comparison with other methods. The stochastic hill-descending
procedures indicated above are metaheuristics that let the user control the time
complexity by trading solution quality for speed. These metaheuristics are therefore
very suitable when a good solution is needed in a limited amount of time. This
is the case when the partitioning algorithm is used to distribute the load in a parallel
program, because a long partitioning time limits the overall efficiency.
SA is used in [29] to find near optimal solutions to the problem of partitioning a digital
combinational circuit for pseudo-exhaustive testing. An algorithm for circuit partitioning
based on the TS metaheuristic, also applied to pseudo-exhaustive testing, was reported in
[3]. Nevertheless, the goals of this application are different from those considered in the
present paper. In [3], the partitioning problem involves the division of the target circuit into
non-overlapping subcircuits with no more than a given number of inputs each, and is subject
to some connectivity constraints. In our case the number of inputs to each subcircuit is not
limited.
In [20], SA is compared with iterative algorithms and found to outperform the
KL algorithm for geometric and random graphs. However, it is suggested there that multiple
runs of KL with random initial solutions would be better than SA for the kind of graphs
that correspond to circuit netlists. SA is not usually seen as an effective approach for VLSI
applications because its computing times are quite large; at low temperatures, for example,
many candidate solutions are explored and rejected before an improved solution is accepted.
TS, instead, uses a tabu list to avoid cycling near local optima and to enable moves towards
worse solutions. Thus, it is usually argued that TS explores the solution space
more efficiently than SA because it does not waste time in previously visited regions of the
solution space. In any case, these approaches need not be seen as opposed, and they
can be combined to obtain an improved procedure without the drawbacks characteristic of
each method.
Thus, hybrid methods have been proposed, such as [4], in which a heuristic that mixes Tabu
Search and genetic algorithms is applied to the circuit partitioning problem and a classification of hybrid algorithms is also provided. A hybrid method that allows the temperature
parameter to be strategically manipulated, rather than progressively diminished, has been
shown to perform better than standard SA approaches [9]. The algorithm
proposed here can also be considered a hybrid heuristic that adds elements of Tabu
Search to a Simulated Annealing algorithm; it is therefore termed the Mixed Simulated Annealing
and Tabu Search algorithm (MSATS).
A large number of graph partitioning schemes exist, and they differ in the edge-cut quality produced, run time, degree of parallelism and applicability to certain kinds
of graphs. It is often unclear which scheme is better in which scenario. In
[28], these properties are categorized for some graph partitioning algorithms, and in [30]
the edge-cut quality of some graph partitioning packages is analyzed on different
benchmarks.
In the following, Section 2 gives a more precise definition of the circuit partitioning
problem and describes the cost function optimized in this paper. Section 3 describes
the proposed algorithm (MSATS), along with the experimental results for circuit
partitioning in the context of parallel test pattern generation. Section 4
extends the original MSATS to work with graphs that are not restricted to directed acyclic
graphs (combinational circuits), and, finally, Section 5 presents the conclusions of the paper.
2. The circuit partitioning problem
The circuit partitioning problem consists of finding a decomposition of the target circuit
into non-overlapping subcircuits with at least one logic gate in each subcircuit. Among the
objectives that the desired partitioning may satisfy are (i) minimization
of the number of cuts, (ii) minimization of the number of subcircuits, and (iii)
minimization of the deviation in the number of elements (inputs, logic gates, outputs and
fanout points) assigned to each partition.
Criterion (i) corresponds to minimizing the communication cost, since cutting a line
usually implies passing data between the processors to which the subcircuits connected by
the cut line have been assigned. Criterion (ii) is used when the goal is to determine the
partition that consumes the fewest resources (processors, in this case). Finally, criterion (iii) corresponds to obtaining subcircuits of similar sizes in order to get a balanced workload
distribution among processors. As our goal is to use all the available processors in the
multicomputer to generate the test patterns in parallel while keeping all of them busy during the whole run time, the number of subcircuits is fixed to the number of available processors in the machine, and the objectives correspond to
criteria (i) and (iii). This means obtaining subcircuits of similar sizes, to balance the
workload of each processor (considered proportional to the number of nodes), and minimizing the number of cuts. A mathematical formulation of the problem is
given below.
Let G = (X, A) be the directed acyclic graph associated with a combinational circuit C,
where X denotes the set of components (inputs, logic gates and outputs) and A the set of
lines used for signal propagation. The nodes of X can thus be classified as inputs, logic gates
and outputs of circuit C, so that X = E ∪ P ∪ O is the union of three disjoint sets: the set of inputs E,
the set of logic gates P, and the set of outputs O. Figure 1 shows an example circuit
with its graph representation.
[Figure: (a) a circuit with inputs e1 to e8, logic gates p1 to p7 and outputs o1, o2; (b) its directed graph]
Figure 1. Representation of a combinational circuit with 7 logic gates (a) and the directed graph associated with it (b).
The problem is to find a partition of $X$ into a fixed number $K$ of subsets $X_k$ ($k = 1, \dots, K$)
such that each induced subgraph satisfies the following conditions:
1. $X = \bigcup_{k=1}^{K} X_k$ and $X_k \cap X_h = \emptyset$, $\forall k \neq h$, $(k, h) \in \{1, \dots, K\}^2$.
2. $p_k = |X_k \cap P| \neq 0$, $\forall k = 1, \dots, K$.
3. $L_i \le p_k \le L_u$, $k = 1, \dots, K$, with $L_i = n/K - \theta \cdot n/K$ and $L_u = n/K + \theta \cdot n/K$, where $n = |P|$; $n/K$ is the number of gates that each subcircuit should contain to obtain a partition of similar-sized subcircuits, and $\theta$ is the tolerated imbalance, expressed as a proportion of $n/K$ (perfect balance). In the first part of this work (MSATS), $\theta$ has been set to values between 0.2 and 0.3; in the second part (RMSATS) it has been set to 0.03.
4. $G_k(X_k, A_k)$ is a connected graph, $\forall k = 1, \dots, K$.
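The four conditions can be checked mechanically. The Python sketch below is illustrative and is not the authors' code; it assumes the circuit graph is given as an undirected adjacency dictionary `adj` (so condition 4 is read as weak connectivity of the directed subgraph), a partition `part` mapping every node to a subcircuit index from 0 to K-1, and the hypothetical helper names `bounds`, `is_connected` and `is_feasible`.

```python
from collections import deque


def bounds(n, K, theta):
    """Return (L_i, L_u) = (n/K - theta*n/K, n/K + theta*n/K)."""
    ideal = n / K
    return ideal - theta * ideal, ideal + theta * ideal


def is_connected(nodes, adj):
    """BFS restricted to `nodes`; edge direction is ignored (weak connectivity)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_feasible(part, X, P, adj, K, theta):
    """Check conditions 1-4 for a partition `part` (node -> 0..K-1)."""
    # Condition 1: every node of X is assigned to exactly one subcircuit
    # (a dict cannot assign a node twice, so only coverage is checked).
    if set(part) != set(X) or any(not 0 <= k < K for k in part.values()):
        return False
    L_i, L_u = bounds(len(P), K, theta)
    for k in range(K):
        X_k = [x for x in X if part[x] == k]
        p_k = sum(1 for x in X_k if x in P)
        if p_k == 0:                      # condition 2
            return False
        if not L_i <= p_k <= L_u:         # condition 3
            return False
        if not is_connected(X_k, adj):    # condition 4
            return False
    return True
```

For a chain e1, p1, p2, p3, p4, o1 with K = 2 and θ = 0.25, the bounds are L_i = 1.5 and L_u = 2.5, so the split {e1, p1, p2} / {p3, p4, o1} is feasible, whereas a first subcircuit holding only gate p1 violates condition 3.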
In this way, the problem can be formulated as a combinatorial optimization problem:
$$\text{minimize } c(s) \ (\text{or maximize } c(s)), \quad \text{subject to } s \in S$$
where $S$ is a discrete set of feasible solutions and $c(s)$ is the cost or objective function. Thus,
solving a combinatorial optimization problem means finding the best (optimal) solution
among a finite or countably infinite number of possible solutions. In [29], formulations
of the partitioning problem as an integer linear programming problem and as a quadratic
transportation problem are provided. Here, given a circuit graph $G = (X, A)$, the cost function
to minimize is defined as:
$$c(s) = \alpha \cdot \mathrm{n\_cuts}(s) + \beta \cdot \sum_{k=1}^{K} 2^{\mathrm{deviation}(k)} \qquad (1)$$
where $\mathrm{deviation}(k)$ is the amount by which the number of gates in subcircuit $G_k$ lies outside the bounds $L_i$ and $L_u$:
$$\mathrm{deviation}(k) = \max\{0,\ |X_k \cap P| - L_u,\ L_i - |X_k \cap P|\}$$
$\mathrm{n\_cuts}(s)$ is the number of cuts of solution $s$, where $s$ is any solution to the circuit partitioning problem, feasible or not, i.e. whether or not it verifies condition 3 above. Thus, whenever, for all $k = 1, \dots, K$, the deviation of the number of gates of $G_k$ from $n/K$ does not exceed $\theta \cdot n/K$, the solution is feasible and its cost is $c(s) = \alpha \cdot \mathrm{n\_cuts}(s) + \beta \cdot K$, since $\mathrm{deviation}(k) = 0$ for $k = 1, \dots, K$. The second term in (1) penalizes deviations from the feasible solution space, and its magnitude is determined by the constant $\beta$. Nevertheless, depending on the relative magnitude of $\alpha$ and $\beta$, a transition to a solution $s$ in which the number of gates of some subcircuit deviates from $n/K$ by more than $\theta \cdot n/K$ can still reduce the cost function if the reduction in the number of cuts is sufficiently large. For example, if $\alpha = A \cdot \beta$, a transition from a feasible solution to a solution with one cut fewer changes the cost by $\beta \left[ \left( \sum_{k=1}^{K} 2^{\mathrm{deviation}(k)} - K \right) - A \right]$, so it still reduces the cost function whenever $\left( \sum_{k=1}^{K} 2^{\mathrm{deviation}(k)} \right) - K$ is less than the factor $A$.
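Cost (1) is straightforward to evaluate. A minimal sketch, assuming that a cut is a line (edge) whose two endpoints lie in different subcircuits and that edges are given as node pairs; the names `deviation`, `n_cuts` and `cost` mirror the symbols above but are otherwise hypothetical, and this is not the authors' implementation.

```python
def deviation(p_k, L_i, L_u):
    """deviation(k) = max{0, p_k - L_u, L_i - p_k}."""
    return max(0, p_k - L_u, L_i - p_k)


def n_cuts(part, edges):
    """Number of lines whose two endpoints lie in different subcircuits."""
    return sum(1 for u, v in edges if part[u] != part[v])


def cost(part, edges, P, K, theta, alpha, beta):
    """Cost (1): alpha * n_cuts(s) + beta * sum_k 2**deviation(k)."""
    ideal = len(P) / K
    L_i, L_u = ideal - theta * ideal, ideal + theta * ideal
    gates = [0] * K                       # p_k for k = 0..K-1
    for g in P:
        gates[part[g]] += 1
    penalty = sum(2 ** deviation(p, L_i, L_u) for p in gates)
    return alpha * n_cuts(part, edges) + beta * penalty


# Chain e1 -> p1 -> p2 -> p3 -> p4 -> o1 split into K = 2 subcircuits.
edges = [("e1", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "o1")]
P = {"p1", "p2", "p3", "p4"}
balanced = {"e1": 0, "p1": 0, "p2": 0, "p3": 1, "p4": 1, "o1": 1}
skewed = {"e1": 0, "p1": 0, "p2": 1, "p3": 1, "p4": 1, "o1": 1}
print(cost(balanced, edges, P, K=2, theta=0.25, alpha=10, beta=1))  # 12
```

Both splits have one cut. In the skewed split each subcircuit deviates by 0.5 gates from the bounds, so the penalty is 2·√2 ≈ 2.83 instead of K = 2; with A = α/β = 10, moving from a feasible solution to such a solution would still pay off if it removed one cut, since 2.83 − 2 < 10.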
The proposed cost function does not take into account the connection topology of the
multicomputer on which the parallel program is executed. In the present paper, the communication cost is assumed to depend only on the volume of communication between
the processors, i.e. all processors are considered to be equidistant. This
assumption holds in commercial architectures that use specific hardware or
a software layer for message routing, providing homogeneous latency between processors.
Moreover, the specific characteristics of the cost function do not influence the evaluation of
the proposed algorithm or its comparison with SA and TS because, as indicated in Section
1, these are metaheuristics that do not depend on the characteristics of the cost function
considered.
The following sections describe the procedure proposed to solve this combinatorial optimization problem. As the procedure mixes
Simulated Annealing and Tabu Search techniques, a brief introduction to their notation and
terminology is given first.
3. The MSATS algorithm
As has been said, move-based procedures for solving combinatorial optimization problems,
such as the partitioning problem, implement a local search in the solution space S, starting
from an initial solution s0 ∈ S. At each iteration, a heuristic is used to obtain a new solution
in the neighbourhood N(s) of the current solution s by applying transformations, or moves, to s. Every feasible solution s̄ ∈ N(s) is evaluated according to the cost
function c(s̄) to be optimized, which determines a change in the value of the cost function,
move_value = c(s̄) − c(s). The basic local search approach corresponds to the so-called
hill-descending algorithms, in which a monotone sequence of improving solutions is examined until a local optimum is found. Hill-descending algorithms always stop at the first
local optimum. To avoid this drawback, several metaheuristics have been proposed in the
literature, such as Simulated Annealing [1] and Tabu Search [16]. These use mechanisms
that allow moves which increase the cost of the current solution in an attempt to escape
from local optima. Simulated Annealing and Tabu Search have been widely applied to
circuit partitioning and many other combinatorial optimization problems. The MSATS
(Mixed Simulated Annealing Tabu Search) procedure described in this section is a hybrid
method that combines both metaheuristics in order to outperform the results provided by
each of them.
At each iteration of MSATS, admissible moves are applied to the current solution allowing transitions that increase the cost function as in Simulated Annealing. When a move
increasing the cost function is accepted, the reverse move should be forbidden during some
iterations in order to avoid cycling, as in Tabu Search. The restrictions in the admissible
moves are implemented by using a short term memory function which determines how long
a tabu restriction will be enforced and the admissible moves at each iteration.
The MSATS algorithm is shown in figure 2. It adds the characteristics of the search
implemented by Simulated Annealing to the features of Tabu Search, whose search
concentrates on specific zones according to the history and the best moves
applied. The result is a powerful algorithm, as the results given in Subsection 3.1 show.
Figure 2. Algorithm MSATS for circuit partitioning.
In MSATS, the temperature t is used as a parameter to control the probability of accepting a
new solution, as in Simulated Annealing. At a given temperature, only the solutions which
are selected by using the SA cooling schedule are considered as candidates to produce a
transition. Thus, a certain randomness is introduced into a pure TS, in order to explore zones
of the solution space that do not appear as very promising at first. As the algorithm also has
the characteristics of a Tabu Search, it avoids the cycles around local minima, allowing a
more efficient exploration of the solution space without revisiting solutions, as may occur
in a pure SA.
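Since the body of Figure 2 is not reproduced here, the following is only a hedged sketch of how an SA acceptance test can be combined with a tabu check. It assumes the standard Metropolis criterion exp(−move_value/t), no aspiration criterion, and moves described as (gate, source, destination) tuples; `msats_accept` and `try_move` are hypothetical names.

```python
import math
import random


def msats_accept(move, move_value, t, tabu, rng):
    """Tabu check followed by a Metropolis-style SA acceptance test."""
    if move in tabu:
        return False                      # forbidden by the short-term memory
    if move_value <= 0:
        return True                       # non-worsening moves are taken
    if t <= 0:
        return False                      # frozen: no uphill moves
    return rng.random() < math.exp(-move_value / t)


def try_move(move, move_value, t, tabu, rng):
    """Accept or reject a move; an accepted uphill move makes its reverse tabu."""
    if not msats_accept(move, move_value, t, tabu, rng):
        return False
    gate, src, dst = move
    if move_value > 0:
        tabu.add((gate, dst, src))        # forbid sending the gate straight back
    return True


rng = random.Random(0)
tabu = set()
# A tiny uphill move at a very high temperature is always accepted.
print(try_move(("p5", 1, 2), 1e-9, 1e9, tabu, rng), tabu)  # True {('p5', 2, 1)}
```

At high temperatures almost every uphill move passes the Metropolis test and the tabu check rarely intervenes; at low temperatures uphill moves are rarely accepted, so few reverse moves ever become tabu.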
Two different initial solutions, s1 and s2, have been used in our experiments. They are
obtained by fast algorithms that assign n/K nodes to each partition. The initial solution s1
is obtained by an algorithm named Input Partitioning, in which the circuit graph is traversed
in a depth-first way starting from the inputs. Solution s2 is provided by an algorithm called
Output Partitioning which processes the graph in a depth-first manner from the output
nodes. These partitioning algorithms have been applied for circuit partitioning in parallel
logic simulations and they take O(n) time and obtain partitions in which strongly connected
components of the graph are assigned to the same partition. As shown in Subsection 3.1,
the quality of the best solution found with MSATS (and also with SA and with TS) is
sometimes affected by the choice of one or another initial solution. Nevertheless, the best
initial solution depends on the specific circuit considered and on the control parameters. In
many cases both initial solutions provide similar results. Figure 3 shows two examples of
initial solutions s1 (figure 3(a)) and s2 (figure 3(b)) for the circuit of figure 1.
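Input Partitioning and Output Partitioning are only described in words above. The following is a minimal sketch under the assumption that "assigning n/K nodes to each partition" means cutting the depth-first order of the gates into K consecutive blocks of about n/K gates; `succ` maps each node to its successors, and passing the outputs together with a predecessor map gives the Output Partitioning variant. The function names are hypothetical, and both functions run in time linear in the size of the graph.

```python
def dfs_order(starts, succ):
    """Iterative depth-first preorder, trying the start nodes in turn."""
    order, seen = [], set()
    for s in starts:
        stack = [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            order.append(u)
            # Push successors reversed so they are visited in listed order.
            stack.extend(reversed(succ.get(u, [])))
    return order


def sequential_partition(starts, succ, P, K):
    """Give each of the K subcircuits about n/K consecutive gates of the DFS order.

    Non-gate nodes (inputs, outputs) take the block index of the next gate
    in the order (the last block if no gate follows).
    """
    n = len(P)
    part, gates_seen = {}, 0
    for u in dfs_order(starts, succ):
        part[u] = min(gates_seen * K // n, K - 1)
        if u in P:
            gates_seen += 1
    return part


succ = {"e1": ["p1"], "p1": ["p2"], "p2": ["p3"], "p3": ["p4"], "p4": ["o1"]}
pred = {"o1": ["p4"], "p4": ["p3"], "p3": ["p2"], "p2": ["p1"], "p1": ["e1"]}
P = {"p1", "p2", "p3", "p4"}
s1 = sequential_partition(["e1"], succ, P, K=2)   # Input Partitioning
s2 = sequential_partition(["o1"], pred, P, K=2)   # Output Partitioning
```

On this chain, s1 groups {e1, p1, p2} and {p3, p4, o1}, while s2 starts from o1 and groups {o1, p4, p3} and {p2, p1, e1}.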
In the application of MSATS to the partitioning problem, the neighbourhood N(s) of the
current solution s contains all solutions s̄ that can be obtained from s by transferring a single
gate from one subcircuit (the source subcircuit) to another (the destination subcircuit).
The transferred gate must belong to the boundary of its subcircuit, i.e. it must be connected
to at least one gate belonging to a different subcircuit; such gates are called boundary gates.
Moreover, the destination subcircuit must be one of the subcircuits connected to the transferred
gate. In this way, condition 4 described in Section 2 is verified during the process.
[Figure: the circuit of figure 1(a) divided into subcircuits 1, 2 and 3, with lines labelled c1 and c2, for (a) s1 and (b) s2]
Figure 3. A partition example using the circuit in figure 1(a), for K = 3, with the s1 initial solution (a) and the s2 initial solution (b).
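The neighbourhood structure can be sketched as follows, assuming an undirected adjacency dictionary `adj` and a partition dictionary `part`. The boundary test follows the definition above (a neighbouring gate in another subcircuit), while `destination` implements the rule, stated in the next paragraph, of allocating a selected gate to the subcircuit with the most lines connected to it; breaking ties by first occurrence is an assumption of this sketch, and both names are hypothetical.

```python
from collections import Counter


def boundary_gates(part, adj, P):
    """Gates connected to at least one gate of a different subcircuit."""
    return {g for g in P
            if any(v in P and part[v] != part[g] for v in adj[g])}


def destination(g, part, adj):
    """Other subcircuit with the most lines to gate g (None if there is none)."""
    counts = Counter(part[v] for v in adj[g] if part[v] != part[g])
    if not counts:
        return None
    return max(counts, key=counts.get)


adj = {"e1": ["p1"], "p1": ["e1", "p2"], "p2": ["p1", "p3"],
       "p3": ["p2", "p4"], "p4": ["p3", "o1"], "o1": ["p4"]}
P = {"p1", "p2", "p3", "p4"}
part = {"e1": 0, "p1": 0, "p2": 0, "p3": 1, "p4": 1, "o1": 1}
print(sorted(boundary_gates(part, adj, P)))   # ['p2', 'p3']
```

Restricting destinations to subcircuits that already touch the gate is what keeps the receiving subcircuit connected after the move.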
As can be seen in figure 2, the set of solutions explored at a given temperature is defined
by selecting the boundary gates according to their level in the circuit. The set of
boundary gates may change during the process as the solution s changes. For example, because of
a move of another boundary gate, a boundary gate may cease to be a boundary gate.
Conversely, a non-boundary gate becomes a boundary gate if it is connected to a boundary
gate that is moved to a different subcircuit. In the first case the gate will not be selected,
whereas a new boundary gate can be selected in subsequent moves at the present
temperature. In any case, as each gate can be selected only once at a given temperature, the
number of solutions explored at that temperature is finite. When a boundary gate is selected,
it is allocated to the subcircuit having the most lines connected to it. Whenever
this move is accepted and increases the cost function, the inverse move, which
returns the gate to its initial subcircuit, is included in the short-term memory to tag it as
a tabu move. The move is kept in the short-term memory during the next iteration
of the while loop because, after one iteration, the present solution is quite likely to have
changed enough with respect to the solution to which the move was applied.
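The selection order and the lifetime of tabu moves can be sketched as follows. This is an interpretation, not the code of Figure 2: it assumes that "kept during the next iteration of the while loop" means a tabu move lives for the rest of the current iteration plus the next one, that `levels` maps each gate to its circuit level, and that `is_boundary` and `try_move` are hypothetical callbacks (the latter would apply the destination rule and the acceptance test).

```python
class ShortTermMemory:
    """Tabu moves live for the current and the next outer-loop iteration."""

    def __init__(self):
        self.current = set()    # moves made tabu during this iteration
        self.previous = set()   # moves made tabu during the previous iteration

    def add(self, move):
        self.current.add(move)

    def is_tabu(self, move):
        return move in self.current or move in self.previous

    def next_iteration(self):
        self.previous, self.current = self.current, set()


def sweep(levels, is_boundary, try_move):
    """Select boundary gates by increasing level, each at most once.

    is_boundary(g) is re-evaluated before every selection, so gates that
    become boundary gates after a move can still be picked at this temperature.
    """
    selected = []
    while True:
        pending = [g for g in levels if g not in selected and is_boundary(g)]
        if not pending:
            return selected
        g = min(pending, key=levels.get)
        selected.append(g)
        try_move(g)
```

Because every gate enters `selected` at most once, the loop terminates after at most as many selections as there are gates, which is the finiteness argument given above.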
Figure 4 shows an example of boundary-gate moves. After selecting a boundary
gate, the algorithm determines whether it can be moved according to the problem constraints.
If a move of the selected gate is allowed, the gate is allocated to the subcircuit with the most gates
connected to it. In this example the algorithm begins with p1 and p2 (figure 4(a)). As these
gates are not boundary gates, they are not selected, and the algorithm proceeds with gate
p3. Gate p3 is a boundary gate, but it cannot be moved because moving it would leave
only one gate in subcircuit 2, and each subcircuit must have at least 2 gates. The same happens
with gate p4. Gate p5 is a boundary gate that can be moved, and it is allocated to subcircuit
2. In this case the number of cuts remains the same, because one cut appears and another
cut disappears. Like p3 and p4, gate p6 cannot be moved, and finally p7 is not a boundary
gate. Thus, the circuit after the first iteration of the while loop is the one shown in figure 4(b).
[Figure: subcircuits 1, 2 and 3 of the example circuit, with lines labelled c1 and c2, (a) before and (b) after moving gate p5]
Figure 4. An example of moving the boundary gate p5 from subcircuit 1 (a) to subcircuit 2 (b).
MSATS stops when one of the following conditions is verified: (i) the temperature is
equal to a final value (in this paper, a final temperature of zero has been used),
(ii) the number of moves applied without improving the best solution found
so far (n_failures) reaches a maximum bound of consecutive iterations (max_failures), or
(iii) the number of iterations reaches the value max_iterations (figure 2).
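The three stopping conditions translate into a single predicate. In this hypothetical sketch the defaults come from Subsection 3.1 (max_failures = 0.25 · max_iterations = 250, max_iterations = 1000) and from the final temperature of zero; `t <= t_final` is used instead of an equality test to avoid comparing floating-point values exactly, which is a choice of this sketch.

```python
def msats_should_stop(t, n_failures, iteration,
                      t_final=0.0, max_failures=250, max_iterations=1000):
    """True when any of the stopping conditions (i)-(iii) holds."""
    return (t <= t_final                       # (i) final temperature reached
            or n_failures >= max_failures      # (ii) too many non-improving moves
            or iteration >= max_iterations)    # (iii) iteration budget used up


print(msats_should_stop(3.7, 12, 1000))   # True, by condition (iii)
```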
At high temperatures, MSATS behaves almost like a pure SA, because most transitions
are accepted and it is very unlikely that a transition included in the
tabu list will need to be selected. At low temperatures, on the other hand, an SA rarely
selects solutions that increase the cost, so the effect of Tabu Search is also small. Thus, the difference
between the behaviour of MSATS and that of a pure SA due to the use of tabu moves is most
important at intermediate temperatures. The TS elements included in MSATS
reduce the number of iterations needed to reach a solution of a given quality and,
in the end, allow MSATS to improve the quality of the solution obtained, as can
be seen from the results provided in Subsection 3.1.
3.1. Experimental results
Next, we summarize the results obtained with the MSATS algorithm. The algorithm
was programmed in C and executed on a Silicon Graphics Power Challenge XL. The
circuits used to evaluate the performance of the hybrid algorithm are the ISCAS’85 circuits [6],
as they are common benchmarks in the context of test pattern generation. The first
seven rows of Table 1 (c432, . . . , c6288) show the basic characteristics of these circuits.
The parameters are set to their best values according to previous experimental
results: max_failures is set to 0.25 · max_iterations, and tfactor to 0.999. The
value of max_iterations is usually 1000, and the initial temperature t0 is 10. With
these values, the cost function reaches a stable final value by the end of the 1000 iterations.
In most cases, increasing the number of iterations does not yield better
solutions unless tfactor is also increased. However, this also increases
the run time and, except for extremely large circuits, the new solution is not
much better.
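The parameter values above can be related with a short calculation. The sketch assumes a geometric schedule in which the temperature is multiplied by tfactor once per iteration, and a Metropolis acceptance probability exp(−Δc/t); Figure 2 may schedule the cooling differently, so the numbers are only indicative.

```python
import math

T0 = 10.0                 # initial temperature t0 (Subsection 3.1)
TFACTOR = 0.999           # cooling factor tfactor (Subsection 3.1)
MAX_ITERATIONS = 1000     # max_iterations (Subsection 3.1)
MAX_FAILURES = int(0.25 * MAX_ITERATIONS)


def temperature(i, t0=T0, tfactor=TFACTOR):
    """Assumed geometric schedule: t_i = t0 * tfactor**i."""
    return t0 * tfactor ** i


t_end = temperature(MAX_ITERATIONS)
print(MAX_FAILURES, round(t_end, 2))          # 250 3.68
# Probability of accepting a move that worsens the cost by one unit.
print(round(math.exp(-1 / T0), 3), round(math.exp(-1 / t_end), 3))
```

Under this assumption the temperature is still about 3.68 after 1000 iterations, so a final temperature of exactly zero would not be reached and the run would end through conditions (ii) or (iii); a move that worsens the cost by one unit would still be accepted with a probability of roughly 0.76, compared with about 0.90 at t0.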
Table 2 shows the best results obtained by Simulated Annealing, Tabu Search and
MSATS, compared with the initial solution s1, for partitions into K = 2, 4, 8, 16 and 32
subcircuits. The row Cuts reduction (%) gives the average reduction in the number of
cuts obtained by MSATS with respect to the best of the solutions provided by SA and TS in
each case. As can be seen, MSATS obtains better results than TS and SA
in most cases, especially as the circuit size increases. In these circuits,
the neighbourhood of a given solution is large, and the effect of considering tabu transitions
is more important. Table 2 also compares the computing times of MSATS, TS and
SA: the times for MSATS are similar to those of Simulated Annealing and
lower than those of Tabu Search.
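The text does not spell out the formula behind the Cuts reduction (%) row. One plausible reading, assumed here, is the relative reduction with respect to the better of SA and TS, averaged over the cases considered; a negative value then means that MSATS produced more cuts. The numbers below are made up for illustration and are not taken from Table 2.

```python
def cut_reduction(sa_cuts, ts_cuts, msats_cuts):
    """Percentage reduction of MSATS cuts relative to the better of SA and TS."""
    best = min(sa_cuts, ts_cuts)
    return 100.0 * (best - msats_cuts) / best


def average_reduction(runs):
    """Mean of cut_reduction over several (sa, ts, msats) triples."""
    return sum(cut_reduction(*r) for r in runs) / len(runs)


print(cut_reduction(50, 40, 36))   # 10.0
print(cut_reduction(10, 12, 11))   # -10.0
```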
With respect to a pure TS algorithm, MSATS accepts fewer solutions that
increase the cost function. Thus, although MSATS executes a similar number of iterations
to TS, each iteration consumes more time in the TS algorithm, so TS takes
longer to stop. As is shown in figure 4, although a TS algorithm is able to reach solutions
with small values of the cost function, it spends a long time trying to improve those solutions
without reaching the stop condition, which is determined by the number of moves without
an improvement in the cost function and by a maximum number of iterations.

Table 1. Benchmark graphs (V: number of vertices; E: number of edges; max, min, avg: maximum, minimum and average vertex degree, with avg = 2E/V).

Graph | V | E | max | min | avg | File size (KB)
c432 | 292 | 432 | 10 | 1 | 2.96 | 3
c499 | 334 | 499 | 13 | 1 | 2.99 | 4
c880 | 594 | 880 | 9 | 1 | 2.96 | 6
c1355 | 878 | 1355 | 13 | 1 | 3.09 | 10
c3540 | 2320 | 3540 | 17 | 1 | 3.05 | 31
c5315 | 3414 | 5315 | 16 | 1 | 3.11 | 49
c6288 | 3936 | 6288 | 17 | 1 | 3.2 | 58
add20 | 2395 | 7462 | 123 | 1 | 6.23 | 63
data | 2851 | 15093 | 17 | 3 | 10.59 | 140
3elt | 4720 | 13722 | 9 | 3 | 5.81 | 136
uk | 4824 | 6837 | 3 | 1 | 2.83 | 70
add32 | 4960 | 9462 | 31 | 1 | 3.82 | 90
whitaker3 | 9800 | 28989 | 8 | 3 | 5.92 | 294
crack | 10240 | 30380 | 9 | 3 | 5.93 | 297
wing_nodal | 10937 | 75488 | 28 | 5 | 13.80 | 768
fe_4elt2 | 11143 | 32818 | 12 | 3 | 5.89 | 341
vibrobox | 12328 | 165250 | 120 | 8 | 26.81 | 1679
4elt | 15606 | 45878 | 10 | 3 | 5.88 | 501
fe_sphere | 16386 | 49152 | 6 | 4 | 6.00 | 540
cti | 16840 | 48232 | 6 | 3 | 5.73 | 532
memplus | 17758 | 54196 | 573 | 1 | 6.10 | 536
cs4 | 22499 | 43858 | 4 | 2 | 3.90 | 506
wing | 62032 | 121544 | 4 | 2 | 3.92 | 1482
fe_tooth | 78136 | 452591 | 39 | 3 | 11.58 | 5413
598a | 110971 | 741934 | 26 | 5 | 13.37 | 9030
fe_ocean | 143437 | 409593 | 6 | 1 | 5.71 | 5242
wave | 156317 | 1059331 | 44 | 3 | 13.55 | 13479
Furthermore, MSATS is able to obtain solutions with a more balanced distribution of gates
among the different subcircuits. For example, in the partition of c432 into two subcircuits,
MSATS provides subcircuits of 76 and 77 gates respectively, while TS obtains subcircuits
with 65 and 88 gates, and SA subcircuits with 70 and 83.
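To relate the c432 figures to the tolerance θ, the relative imbalance max_k |p_k − n/K| / (n/K) can be computed, taking n = 153 as the sum of the two reported subcircuit sizes (an assumption of this sketch; `imbalance` is a hypothetical helper).

```python
def imbalance(sizes):
    """Largest relative deviation of a subcircuit size from n/K."""
    ideal = sum(sizes) / len(sizes)
    return max(abs(p - ideal) for p in sizes) / ideal


# Gate counts for c432 with K = 2, as reported in the text.
for name, sizes in [("MSATS", (76, 77)), ("TS", (65, 88)), ("SA", (70, 83))]:
    print(f"{name}: {100 * imbalance(sizes):.1f}%")
```

This gives about 0.7% for MSATS, 15.0% for TS and 8.5% for SA. With θ = 0.2 all three splits satisfy condition 3, so the difference lies in how far inside the tolerance each solution sits.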
3.2. MSATS in a parallel test pattern generator
The MSATS algorithm has been used as a first step in a parallel test-pattern generator
[12, 14]. It starts by applying MSATS to partition the circuit under test, and after
Table 2. A summary of results with the best obtained solutions and the times for the algorithms: Initial solution
(s1), Simulated Annealing (SA), Tabu Search (TS) and MSATS.
K =2
Circuit
K =8
K = 16
K = 32
Cuts
Sec.
Cuts
Sec.
Cuts
Sec.
Cuts
Sec.
Cuts
Sec.
s1
SA
TS
MSATS
Cuts reduction (%)
66
21
22
20
0,2
2,0
6,1
2,1
106
45
41
40
0,3
2,2
9,2
2,3
158
68
62
61
0,4
2,8
11,2
2,8
186
93
87
85
0,5
3,9
13,6
3,7
212
142
130
128
0,6
7,1
15,3
6,9
c499
s1
SA
TS
MSATS
Cuts reduction (%)
41
17
18
16
c880
s1
SA
TS
MSATS
Cuts reduction (%)
44
19
19
17
c1355
s1
SA
TS
MSATS
Cuts reduction (%)
60
28
38
17
c1908
s1
SA
TS
MSATS
Cuts reduction (%)
123
55
50
35
c3540
s1
SA
TS
MSATS
Cuts reduction (%)
116
67
96
46
c5315
s1
SA
TS
MSATS
Cuts reduction (%)
188
61
88
50
s1
SA
TS
MSATS
Cuts reduction (%)
60
48
46
46
c432
c6288
Algorithm
K =4
5
3
0,2
2,2
7,2
2,1
94
58
51
54
0,3
3,0
18,1
3,1
100
41
43
37
0,4
5,3
21,3
5,6
137
45
46
45
0,3
2,3
11,4
2,4
143
86
79
78
0,4
3,8
28,5
3,9
155
98
89
80
0,5
6,0
34,6
6,1
240
97
94
70
−5
6
11
210
74
76
71
0,8
15,0
49,2
15,1
333
156
179
132
1,1
20,9
80,3
21,4
332
162
172
160
30
0,7
9,5
60,2
9,5
298
125
120
108
0,9
16,2
83,2
16,9
535
250
259
221
1,4
28,1
115,3
28,4
1,5
22,5
93,4
22,9
496
350
309
174
222
129
129
129
0,7
8,1
39,4
7,9
351
98
98
98
1,8
30,8
145,7
30,6
25
225
172
175
168
0,7
6,3
42,3
6,1
300
189
185
179
0,9
11,2
47,8
10,2
394
156
148
128
1,2
20,9
76,4
20,1
771
375
377
298
1,9
25,8
116,4
25,8
44
654
301
402
286
2,3
35,1
168,5
34,2
671
450
550
355
0,7
7,5
19,5
7,2
3
0,8
8,5
53,3
8,2
4
0
450
168
156
125
5
0,6
4,1
17,1
3,9
0
0,9
11,5
68,5
10,9
334
315
320
301
2
10
1,1
16,4
61,9
15,7
14
1,2
12,9
78,5
12,1
629
234
225
205
1,5
26,5
90,7
25,4
1023
478
498
455
2,3
33,4
140,3
31,6
939
542
532
407
3,1
52,3
203,6
48,7
24
2,9
44,8
211,8
42,6
22
947
745
774
524
3,8
70,5
290,6
66,9
30
20
12
2
155
135
157
102
0,6
5,1
35,7
4,9
10
16
24
205
132
132
119
26
15
32
0,5
3,2
15,2
3,1
11
0
0,6
8,0
54,2
8,2
3
2
10
40
0
2
1,4
20,2
84,9
19,0
9
21
1,9
42,8
124,6
40,6
5
5
Figure 5. Speedup with some ISCAS circuits for test pattern generation (plot of speedup versus number of processors, from 2 to 16, for circuits c432, c499 and c880).
this, each processor receives one of the resulting subcircuits. All the processors can then concurrently apply the test generation algorithm described in [13] to determine the test patterns for the stuck-at faults at the nodes of their respective subcircuits. Communication between processors is needed to complete the determination of the test equation at each node, and the amount of communication grows with the number of cuts between subcircuits.
Thus, one way to demonstrate the performance of MSATS is to examine how the speedup of the parallel test generator increases as the number of processors grows. If the speedup grows proportionally to the number of processors or, in other words, if the efficiency remains roughly constant, the performance of MSATS is adequate according to the conditions given in Section 2. This behaviour is observed in Figure 5, which shows the speedup obtained for different circuits of the ISCAS set as the number of processors grows. Speedup results for the ISCAS’85 circuits are given in Table 3. The parallel test generator was run on an Intel Paragon multicomputer. In Table 3, the columns labelled S_K give the speedups obtained with K processors, and the columns labelled C_K give the number of cuts produced when the circuit is partitioned into K subcircuits.
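Parallel efficiency is conventionally defined as E_K = S_K / K, so the condition above (speedup growing in proportion to the number of processors) is equivalent to E_K staying roughly constant. As a small aid for reading Table 3, the sketch below computes E_K from the speedups reported for three of the circuits; the dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Speedups S_K copied from Table 3 (K = number of processors).
SPEEDUPS = {
    "c432": {2: 1.69, 4: 3.70, 8: 6.23, 12: 8.76, 16: 10.34},
    "c499": {2: 1.78, 4: 3.85, 8: 6.34, 12: 8.20, 16: 11.43},
    "c880": {2: 1.82, 4: 3.87, 8: 7.05, 12: 9.34, 16: 12.56},
}


def efficiency(speedups: dict[int, float]) -> dict[int, float]:
    """Return the parallel efficiency E_K = S_K / K for each processor count K."""
    return {k: s / k for k, s in speedups.items()}


if __name__ == "__main__":
    for circuit, s in SPEEDUPS.items():
        row = "  ".join(f"E{k}={e:.2f}" for k, e in sorted(efficiency(s).items()))
        print(f"{circuit}: {row}")
```

The same function applies unchanged to the remaining rows of Table 3.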
As indicated above, the cost function used in this paper does not model characteristics of the target multicomputer, such as the interconnection topology or the relative costs of communication and computation. This is justified because, in many cases, communication delays are similar for all the processors in the machine, owing either to suitable dedicated hardware or to a software layer for
Table 3. Speedups, number of cuts and fault coverages obtained with the parallel test-pattern generator.
Circuit  Faults  S2    C2   S4    C4   S8    C8   S12   C12  S16    C16  Coverage
c432     864     1.69  20   3.70  40   6.23  61   8.76  72   10.34  85   98%
c499     998     1.78  16   3.85  54   6.34  78   8.20  96   11.43  119  99%
c880     1760    1.82  17   3.87  37   7.05  80   9.34  98   12.56  129  100%
c1355    2710    1.90  17   2.93  45   6.25  70   9.45  81   11.65  98   98%
c1908    3816    1.97  35   2.50  71   6.86  108  9.67  113  10.26  125  98%
c3540    7080    1.78  46   3.67  132  6.40  221  8.30  240  10.20  298  98%
c6288    12570   1.67  46   2.75  102  6.53  301  8.50  327  10.45  355  97%
routing. Moreover, simple cost functions are more suitable when, as in this case, the speed of the optimization algorithm is important. Nevertheless, the performance obtained by the test pattern generator shows that the model provided by our cost function is good enough.
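To make the two ingredients of such a simple cost function concrete, the following sketch combines the number of cut edges with the total deviation of the subcircuit sizes from the balanced size n/K. The actual cost function is the one defined in Section 2; the weighted-sum form, the weight `alpha`, the absolute-deviation measure and the function name `partition_cost` are assumptions for illustration only. As in the text, nothing about network topology or communication costs enters the function.

```python
from collections import Counter


def partition_cost(edges, part, k, alpha=1.0):
    """Hypothetical cost: number of cut edges plus alpha times the total
    deviation of the part sizes from the balanced size n/k.

    edges: iterable of (u, v) pairs; part: dict mapping vertex -> part index.
    """
    # An edge is cut when its endpoints lie in different parts.
    cuts = sum(1 for u, v in edges if part[u] != part[v])
    sizes = Counter(part.values())
    balanced = len(part) / k
    # Parts that received no vertex count as size 0 (Counter default).
    deviation = sum(abs(sizes[p] - balanced) for p in range(k))
    return cuts + alpha * deviation
```

For example, splitting the path 0-1-2-3 as {0, 1} and {2, 3} gives one cut and no deviation, i.e. a cost of 1.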
4. Extensions to other partitioning problems
As discussed above, the MSATS algorithm was proposed as the first step of a parallel test pattern generator. Its field of application was thus restricted to combinational circuits or their equivalent representation as directed acyclic graphs. A further requirement was that the partitions must be contiguous (Section 2, condition 4).
Therefore, to compare this technique with other approaches, we extended the original MSATS to create RMSATS (Refinement of MSATS), which can be applied to graphs without the above restrictions. The main differences with respect to MSATS are as follows:
1. The input format used to read and store the graph was adapted to that of the public domain benchmarks [30] currently used to compare different partitioning packages.
2. The initial solution was modified slightly. MSATS uses an input (output) partitioning algorithm in which the circuit is traversed depth-first, starting from the inputs (outputs). In general graphs this strategy is not possible, because primary inputs (outputs) do not exist or at least are not specified. We therefore implemented a growing algorithm that performs a breadth-first traversal of the graph, starting from a randomly chosen vertex.
3. We added a refinement step at the end of the algorithm, in which the balance objective is optimised greedily without worsening the primary objective (edge cut).
4. For comparison with other approaches, the imbalance tolerance factor θ was fixed at 0.03, whereas in MSATS it was set in the range 0.2 to 0.3. This is because it was considered more important to reduce the cut weight (interprocessor communication) considerably, even to the detriment of node balance.
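The growing initial solution in item 2 can be sketched as breadth-first region growing. Starting from a randomly chosen vertex, vertices are assigned to the current part in breadth-first order until the part holds about n/K vertices, and the next part is then grown from another unassigned vertex. The target size ceil(n/K), the restart rule for exhausted regions and the name `grow_partition` are assumptions for illustration, not details taken from RMSATS.

```python
import math
import random
from collections import deque


def grow_partition(adj, k, seed=None):
    """Breadth-first growing of k parts.

    adj: dict vertex -> list of neighbours (undirected graph).
    Returns a dict vertex -> part index in range(k).
    """
    rng = random.Random(seed)
    target = math.ceil(len(adj) / k)  # assumed size of each grown part
    part = {}
    unassigned = set(adj)
    for p in range(k):
        if not unassigned:
            break
        size = 0
        queue = deque([rng.choice(sorted(unassigned))])  # random seed vertex
        while size < target and unassigned:
            if not queue:
                # The current region is exhausted (e.g. disconnected graph):
                # continue growing from another free vertex.
                queue.append(rng.choice(sorted(unassigned)))
            v = queue.popleft()
            if v not in unassigned:
                continue  # already assigned via another path
            unassigned.discard(v)
            part[v] = p
            size += 1
            queue.extend(u for u in adj[v] if u in unassigned)
    return part
```

Because every part is filled up to ceil(n/K) vertices, all vertices are assigned, but the last part may be smaller (for n = 10 and K = 4 the sizes are 3, 3, 3 and 1). In this sketch, such imbalance would be left to the later optimisation phases.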
4.1. Experimental results
For the RMSATS experiments, we used the ISCAS’85 benchmarks adapted to the new format, together with public domain benchmarks currently used to compare different approaches. None of the graphs have vertex or edge weights. Table 1 lists all the graphs with their sizes, the maximum, minimum and average degree of the vertices (number of adjacent vertices) and the size in KB. The test graphs were chosen to form a representative sample of small to large real-life problems, and include 2D and 3D examples of nodal graphs and dual graphs, 3D semi-structured graphs, and other non-mesh-like graphs obtained from combinational circuits (ISCAS and add32).
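Table 1 characterises each graph by its maximum, minimum and average vertex degree. Assuming the benchmarks are stored in the unweighted Chaco/METIS text format (a header line "n m" followed by one line of 1-based neighbour indices per vertex, with '%' marking comment lines), these statistics could be computed as follows. The function names are illustrative, and weighted variants of the format are deliberately rejected.

```python
def parse_graph(text):
    """Parse an unweighted Chaco/METIS graph file into a 0-based adjacency dict."""
    lines = [ln for ln in text.splitlines() if not ln.startswith("%")]
    header = lines[0].split()
    n, m = int(header[0]), int(header[1])
    if len(header) > 2 and int(header[2]) != 0:
        raise ValueError("weighted graph formats are not handled in this sketch")
    adj = {}
    for i in range(n):
        # A vertex without neighbours has an empty line (possibly missing at EOF).
        fields = lines[i + 1].split() if i + 1 < len(lines) else []
        adj[i] = [int(tok) - 1 for tok in fields]
    # Every undirected edge is listed once in each endpoint's line.
    if sum(len(neigh) for neigh in adj.values()) != 2 * m:
        raise ValueError("edge count does not match the header")
    return adj


def degree_stats(adj):
    """Return (maximum, minimum, average) vertex degree."""
    degrees = [len(neigh) for neigh in adj.values()]
    return max(degrees), min(degrees), sum(degrees) / len(degrees)
```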
We compared the results of RMSATS with those produced by a public domain partitioning package, METIS 4.0 [21]. The algorithms chosen were pmetis (Multilevel recursive
bisection) and kmetis (Multilevel k-way partitioning).
A large number of experiments were carried out to tune the parameters of the algorithm, including the initial temperature t0 and the temperature reduction factor tfactor. The appropriate value of tfactor depends on the size of the graph and on the run time of the algorithm. For graphs ranging from small to large, we chose tfactor between 0.999 and 0.9999, respectively, with an initial temperature t0 = 2. This initial temperature is lower than in MSATS because the graphs are larger, and a lower temperature allows the search to explore the solution space without wasting time on solutions that only worsen the objective function.
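Under the usual geometric cooling schedule, which the name tfactor suggests but the text does not spell out, the temperature after n reductions is t0 · tfactor^n. The sketch below uses this assumption, together with a hypothetical stopping temperature, to show how strongly the choice between 0.999 and 0.9999 affects the length of an annealing run: the number of temperature reductions grows by a factor of ln(0.999)/ln(0.9999) ≈ 10.

```python
import math


def temperature(t0, tfactor, n):
    """Temperature after n geometric reductions: t0 * tfactor**n."""
    return t0 * tfactor ** n


def reductions_needed(t0, tfactor, t_final):
    """Smallest n with temperature(t0, tfactor, n) <= t_final.

    Assumes 0 < tfactor < 1 and 0 < t_final < t0.
    """
    return math.ceil(math.log(t_final / t0) / math.log(tfactor))


if __name__ == "__main__":
    t0 = 2.0        # initial temperature reported for RMSATS
    t_final = 0.01  # hypothetical stopping temperature, not taken from the paper
    for tfactor in (0.999, 0.9999):
        print(tfactor, reductions_needed(t0, tfactor, t_final))
```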
Table 4 shows the results obtained for five values of K (number of partitions): the total weight of cut edges and the imbalance produced by the RMSATS, growing (the initial solution of RMSATS), pmetis and kmetis algorithms.
It is important to note that graph partitioning algorithms can usually find higher quality partitions if the balancing constraint is relaxed slightly. Indeed, some public domain partitioning packages, such as JOSTLE and METIS, have a built-in, although adjustable, imbalance tolerance of 3%. For this reason, we used the same value in RMSATS, although the algorithm additionally allows a deviation factor (Eq. (1)) of 2%, so the total imbalance tolerance is at worst 5%. After the refinement step, however, this imbalance is reduced to 0% in some cases. Note also that, because pmetis uses recursive bisection and thus produces partitions with nearly perfect balance, its cut weight is worse than that achieved by kmetis and RMSATS in most cases.
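The refinement step (item 3 in Section 4) optimises balance greedily without worsening the edge cut. The paper does not give the exact rule, so what follows is only one plausible sketch under stated assumptions: unit edge weights (none of the benchmarks are weighted); imbalance measured as the percentage by which the largest part exceeds n/K (Eq. (1) may define the deviation differently); and moves that take a vertex out of the largest part into a part at least two vertices smaller, accepted only if the edge cut does not increase. Every accepted move strictly reduces the sum of squared part sizes, so the loop terminates.

```python
from collections import Counter


def edge_cut(adj, part):
    """Number of edges whose endpoints are in different parts (unit weights)."""
    return sum(1 for v in adj for u in adj[v] if v < u and part[u] != part[v])


def imbalance_pct(part, k):
    """Assumed measure: % by which the largest part exceeds the ideal size n/k."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return 100.0 * (max(sizes[p] for p in range(k)) - ideal) / ideal


def cut_delta(adj, part, v, target):
    """Change in the edge cut if vertex v moves to part `target`."""
    delta = 0
    for u in adj[v]:
        if part[u] == part[v]:
            delta += 1  # this edge would become cut
        elif part[u] == target:
            delta -= 1  # this edge would stop being cut
    return delta


def greedy_balance(adj, part, k):
    """Greedily move vertices out of the largest part without increasing the cut."""
    part = dict(part)
    while True:
        sizes = Counter(part.values())
        largest = max(range(k), key=lambda p: sizes[p])
        best = None  # (cut change, vertex, target part)
        for v, p in part.items():
            if p != largest:
                continue
            for q in range(k):
                if sizes[q] < sizes[largest] - 1:  # move must improve balance
                    d = cut_delta(adj, part, v, q)
                    if d <= 0 and (best is None or d < best[0]):
                        best = (d, v, q)
        if best is None:
            return part
        _, v, q = best
        part[v] = q
```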
The results given in Table 4 show that RMSATS reduces the total weight of cut edges in many cases, even for the larger graphs. On the other hand, RMSATS run times were longer in every case, by one to three orders of magnitude. Nevertheless, they were of the order of minutes, or several hours in the worst case (other algorithms take weeks to reach the best cut-edge quality [30]). Moreover, as indicated in [30], some applications prefer higher quality solutions for the cut-edge objective even at the cost of a slight increase in imbalance or in the run time of this phase, because this reduces the total time required by the parallel application (i.e. it reduces interprocessor communication time). Parallel test pattern generation is one such application.
It is not easy to identify a trend in the results, other than that RMSATS performs particularly well on most graphs as K increases (i.e. for 16 or 32 partitions), and performs badly on add32.
Table 4. Number of cuts (Cuts) and imbalance (Desv.) obtained by different algorithms (pmetis, kmetis, growing and RMSATS) for all the benchmarks in Table 1 and different numbers of partitions (K).
K =2
gra
c432
c499
c880
Algorithm
c3540
c5315
c6288
add20
uk
K =4
Desv.
Cuts
K =8
Desv.
Cuts
K = 16
Desv.
Cuts
K = 32
Desv.
Cuts
Desv.
pmetis
36
0
61
0
80
1
119
10
171
21
kmetis
39
1
60
3
87
1
147
37
179
64
growing
98
0
291
0
363
0
397
0
415
0
RMSATS
33
1
56
1
76
3
103
0
147
0
pmetis
23
0
61
1
92
5
140
5
188
5
kmetis
20
4
62
2
92
3
167
20
211
211
growing
109
0
268
0
387
0
440
0
473
0
RMSATS
20
4
55
5
83
5
131
5
185
1
pmetis
24
0
60
1
100
1
153
5
211
8
24
kmetis
c1355
Cuts
44
3
60
3
93
2
152
2
219
growing
135
0
331
0
596
0
746
0
802
0
RMSATS
19
1
50
5
84
4
132
3
198
0
pmetis
24
0
59
0
101
0
134
4
204
6
kmetis
22
1
68
2
97
4
144
9
218
13
growing
305
0
627
0
974
0
1173
0
1262
0
RMSATS
28
2
59
4
81
5
115
4
184
4
pmetis
75
0
147
0
289
0
384
1
566
2
kmetis
92
3
184
3
264
3
417
3
577
2
growing
300
0
952
0
1890
0
2783
0
3079
0
RMSATS
66
0
141
2
218
4
329
5
490
4
pmetis
87
0
164
0
250
0
388
1
554
1
kmetis
77
1
159
3
273
3
387
4
554
3
growing
531
0
1316
0
2896
0
3980
0
4426
0
RMSATS
83
0
169
3
229
4
326
5
483
5
pmetis
153
0
257
0
384
0
504
1
632
2
kmetis
161
3
265
1
408
1
507
3
661
2
growing
1297
0
3065
0
4214
0
4913
0
5194
0
RMSATS
134
0
246
5
326
2
416
5
561
5
pmetis
725
0
1292
0
1907
0
2504
1
3008
3
kmetis
719
3
1257
3
1857
5
2442
7
3073
74
growing
1549
0
2811
0
3696
0
4577
0
4940
0
RMSATS
701
3
1493
5
1769
5
2157
5
2466
4
pmetis
23
0
67
0
101
0
189
0
316
1
kmetis
36
3
64
3
98
2
189
3
302
3
growing
97
0
221
0
469
0
970
0
1971
0
RMSATS
30
0
69
1
89
4
161
5
259
5
(Continued on next page.)
Table 4. (Continued.)
K =2
gra
add32
Algorithm
fe 4elt2
4elt
Cs4
cti
Desv.
Cuts
Desv.
Cuts
K = 32
Desv.
Cuts
Desv.
21
0
42
0
81
0
128
0
288
1
28
2
44
3
102
3
206
3
352
15
1067
0
2968
0
5166
0
5866
0
6169
0
41
1
119
5
191
5
285
5
349
5
pmetis
108
0
231
0
388
0
665
0
1093
0
kmetis
97
1
213
2
403
2
651
2
1096
2
growing
239
0
724
0
1487
0
3015
0
6147
0
RMSATS
pmetis
90
218
0
0
201
480
4
0
340
842
5
0
567
1370
5
0
959
2060
5
1
2
kmetis
233
2
454
2
806
3
1350
3
2080
growing
300
0
770
0
1493
0
2988
0
6018
0
RMSATS
227
2
427
5
727
5
1168
5
1818
5
135
0
406
0
719
0
1237
0
1891
0
3
whitaker3 pmetis
crack
Cuts
K = 16
pmetis
RMSATS
data
Desv.
K =8
kmetis
growing
3elt
Cuts
K =4
kmetis
133
3
446
3
769
3
1200
3
1824
growing
304
0
728
0
1642
0
3449
0
7019
0
RMSATS
128
0
385
5
687
5
1093
5
1683
5
pmetis
187
0
382
0
773
0
1255
0
1890
0
3
kmetis
225
1
408
3
809
3
1218
3
1882
growing
399
0
935
0
2048
0
4196
0
8403
0
RMSATS
184
0
363
2
685
4
1098
5
1689
5
pmetis
130
0
359
0
654
0
1152
0
1787
0
kmetis
132
0
398
3
684
3
1149
3
1770
3
growing
244
0
814
0
1823
0
3685
0
7550
0
RMSATS
130
0
349
0
622
4
1004
5
1663
5
pmetis
154
0
406
0
635
0
1056
0
1769
0
kmetis
225
0
344
2
614
3
1099
3
1784
3
growing
735
0
1848
0
4205
0
8684
0
17445
0
RMSATS
187
0
437
3
649
5
945
5
1534
5
pmetis
414
0
1154
0
1746
0
2538
0
3579
0
kmetis
410
0
1173
3
1677
3
2521
3
3396
3
0
27983
growing
1277
0
3489
0
7672
0
15685
RMSATS
377
0
1018
1
1484
4
2245
4.6
3056
4.9
0
pmetis
334
0
1113
0
2110
0
3181
0
4605
0
kmetis
395
2
1132
1
2130
2
3451
3
4713
3
growing
1698
0
4446
0
9548
0
19402
0
34918
0
RMSATS
588
1
1177
5
1853
5
2875
5
4059
5
(Continued on next page.)
Table 4. (Continued.)
K =2
gra
memplus
fe sphere
wing nodal
wing
fe ocean
598a
wave
Desv.
Cuts
K = 16
Desv.
Cuts
K = 32
Cuts
Desv.
Cuts
pmetis
6337
0
10559
0
13110
0
14942
0
17303
0
kmetis
6453
3
10483
3
12615
3
14604
6
16821
6
Desv.
Cuts
Desv.
growing
13433
0
23063
0
27957
0
30818
0
33417
0
RMSATS
8570
4
12438
5
13640
5
15301
5
16269
5
440
0
872
0
1330
0
2030
0
2913
0
2
pmetis
kmetis
444
0
903
3
1306
3
2012
2
2842
growing
472
0
1342
0
2914
0
6050
0
12294
0
RMSATS
386
0
774
4
1226
4
1726
5
2511
5
1820
0
4000
0
6070
0
9290
0
13237
0
3
pmetis
kmetis
1855
0
4355
2
6337
3
9465
3
12678
growing
5368
0
14118
0
29960
0
39425
0
44408
0
RMSATS
1723
0
3816
5
5419
5
8392
5
11741
5
950
0
2086
0
3205
0
4666
0
6700
0
3
pmetis
909
1
1943
3
3120
3
4652
3
6613
growing
2785
0
7062
0
15375
0
31486
0
63456
0
RMSATS
1353
0
2069
5
2877
5
4254
5
5938
5
12427
0
21471
0
28177
0
37441
0
46112
0
3
pmetis
kmetis
11952
1
23141
2
29640
3
38673
3
45613
growing
27835
0
64754
0
88603
0
105714
0
118650
0
RMSATS
11604
0
20500
5
30572
5
36487
5
44678
5
505
0
2039
0
4516
0
9613
0
14613
0
3
pmetis
kmetis
fe tooth
K =8
Algorithm
kmetis
vibrobox
K =4
536
0
2194
2
5627
2
10253
3
16604
growing
2376
0
7938
0
17855
0
37430
0
76065
0
RMSATS
464
0
2349
1
5481
5
9051
5
13923
5
pmetis
4292
0
8577
0
13653
0
19346
0
29215
0
kmetis
4262
2
7835
3
13544
3
20455
3
28572
3
growing
7785
0
32262
0
71076
0
145539
0
218804
0
RMSATS
4086
0
11817
5
13152
5
21562
5
26981
5
pmetis
2504
0
8533
0
17276
0
28922
0
44760
0
kmetis
2533
0
8495
1
17137
3
28647
3
44398
3
growing
26129
0
69388
0
145234
0
181053
0
243981
0
RMSATS
3881
0
12603
5
22734
5
27976
5
40896
5
pmetis
9493
0
23032
0
34795
0
48106
0
72404
0
kmetis
9655
0
21682
2
33146
3
48183
3
67860
3
growing
32380
0
85289
0
184382
0
381418
0
514427
0
RMSATS
12744
0
21390
5
38522
5
51920
5
64983
5
5. Conclusions
In this paper we have developed a new algorithm for the circuit partitioning problem in the framework of parallel circuit testing. More specifically, the algorithm has been included in a parallel test pattern generator based on partitioning the circuit under test. The problem is formulated as a combinatorial optimization problem using a cost function that combines the number of cuts and the deviation from a balanced distribution of the gates among the subcircuits.
The new algorithm, called MSATS, has been proposed to obtain good solutions in a short time. It reduces the possibility of cycles in the search process by applying Tabu Search features to a Simulated Annealing algorithm. The results show that MSATS outperforms the TS and SA algorithms when applied to the same cost function. To compare MSATS with other state-of-the-art approaches, an extended version has also been presented, and the results show that it is comparable in terms of solution quality, with run times that, although longer, remain acceptable.
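The following sketch illustrates, in general terms, how Tabu Search features can be grafted onto Simulated Annealing. It is an annealing loop over single-vertex moves in which recently moved vertices are kept in a short tabu list and may not move again until they leave it, which discourages the search from immediately undoing its own moves. It is not the MSATS procedure of Section 2: the move type, the tabu tenure, the default schedule and the `cost` argument (any function of a partition, such as cut count plus an imbalance penalty) are assumptions.

```python
import math
import random
from collections import deque


def sa_with_tabu(part, k, cost, t0=2.0, tfactor=0.999, steps=5000, tenure=7, seed=None):
    """Simulated annealing over single-vertex moves with a FIFO tabu list.

    part: dict vertex -> part index in range(k); cost: function of a partition dict.
    Returns the best partition found and its cost.
    """
    if k < 2:
        raise ValueError("need at least two parts")
    rng = random.Random(seed)
    part = dict(part)
    vertices = list(part)
    current = cost(part)
    best, best_cost = dict(part), current
    tabu = deque(maxlen=tenure)  # recently moved vertices may not move yet
    for step in range(steps):
        t = max(t0 * tfactor ** step, 1e-12)  # geometric cooling, floored above 0
        v = rng.choice(vertices)
        if v in tabu:
            continue
        old = part[v]
        new = rng.randrange(k - 1)
        if new >= old:
            new += 1  # uniform choice among the other k - 1 parts
        part[v] = new
        candidate = cost(part)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            tabu.append(v)
            if current < best_cost:
                best, best_cost = dict(part), current
        else:
            part[v] = old  # rejected: undo the move
    return best, best_cost
```

Recomputing the full cost after every move keeps the sketch short; a practical implementation would evaluate the cost change incrementally.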
Acknowledgments
We thank Dr. Inmaculada García for critically reading the manuscript and making several
useful remarks. This work has been supported by project TIC2000-1348 (CICYT, Spain).
References
1. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines. A Stochastic Approach to Combinatorial
Optimization and Neural Computing, John Wiley & Sons: New York, 1990.
2. C.J. Alpert and A. Kahng, “Recent developments in netlist partitioning: A survey,” Integration: The VLSI
Journal, vol. 19, no. 1/2, pp. 1–81, 1995.
3. A.A. Andreatta and C.C. Ribeiro, “A graph partitioning heuristic for the parallel pseudo-exhaustive logical
test of VLSI combinational circuits,” Annals of Operations Research, vol. 50, pp. 1–36, 1994.
4. S. Areibi and A. Vannelli, “Advanced search technique for circuit partitioning,” in DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, vol. 16, pp. 77–98, 1993.
5. P. Banerjee, Parallel Algorithms for VLSI Computer Aided Design, Prentice Hall: Englewood Cliffs, NJ, 1994.
6. F. Brglez and H. Fujiwara, “A neutral netlist of ten combinational benchmark circuits and a target translator in FORTRAN,” in Proc. IEEE Int. Symp. Circuits Syst., Special Session ATPG, 1985.
7. J.A. Chandy and P. Banerjee, “A parallel circuit-partitioned algorithm for timing-driven standard cell placement,” Journal of Parallel and Distributed Computing, vol. 57, pp. 64–90, 1999.
8. J. Cong and M. Smith, “A parallel bottom-up clustering algorithm with applications to circuit partitioning in
VLSI design,” in Proc. ACM/IEEE Design Automation Conference, 1993, pp. 755–760.
9. K.A. Dowsland, “Simulated annealing,” in Modern Heuristic Techniques for Combinatorial Problems,
C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 20–69.
10. C. Fiduccia and R. Mattheyses, “A linear time heuristic for improving network partitions,” in Proc. 19th IEEE
Design Automation Conference, pp. 175–181, 1982.
11. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness,
W.H. Freeman: San Francisco, 1979.
12. C. Gil and J. Ortega, “A parallel test pattern generator based on Reed-Muller spectrum,” in Euromicro Workshop on Parallel and Distributed Processing, IEEE, pp. 199–204, 1997.
13. C. Gil and J. Ortega, “Algebraic test-pattern generation based on the Reed-Muller spectrum,” IEE Proceedings - Computers and Digital Techniques, vol. 145, no. 4, pp. 308–316, 1998.
14. C. Gil, J. Ortega, and M.G. Montoya, “Parallel VLSI test in a shared memory multiprocessor,” Concurrency: Practice and Experience, vol. 12, no. 5, pp. 311–326, 2000.
15. J. Gilbert, G. Miller, and S. Teng, “Geometric mesh partitioning: Implementation and experiments,” in Proceedings of International Parallel Processing Symposium, 1995.
16. F. Glover and M. Laguna, “Tabu Search,” in Modern Heuristic Techniques for Combinatorial Problems, C.R.
Reeves (Ed.), Blackwell: London, 1993, pp. 70–150.
17. T. Goehring and Y. Saad, “Heuristic algorithms for automatic graph partitioning,” Technical Report UMSI94-29. University of Minnesota Supercomputing Institute, 1994.
18. S.W. Hadley, B.L. Mark, and A. Vannelli, “An efficient eigenvector approach for finding netlist partitions,” IEEE Trans. on Computer-Aided Design, vol. 11, no. 7, pp. 885–892, 1992.
19. B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graphs,” in Proceedings Supercomputing ’95, ACM Press, 1995.
20. D.S. Johnson, C.R. Aragon, and L.A. McGeoch, “Optimization by simulated annealing: An experimental evaluation, Part I: Graph partitioning,” Operations Research, vol. 37, pp. 865–892, 1989.
21. G. Karypis and V. Kumar, “Multilevel K-way partitioning scheme for irregular graphs,” Journal of Parallel
and Distributed Computing, vol. 48, no. 1, pp. 96–129, 1998.
22. B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” The Bell Sys. Tech. Journal, pp. 291–307, 1970.
23. R.H. Klenke, R.D. Williams, and J.H. Aylor, “Parallel-processing techniques for automatic test pattern generation,” IEEE Computer, pp. 71–84, 1992.
24. V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing. Design and Analysis of
Algorithms, The Benjamin/Cummings Publishing, 1994.
25. V. Kumar, A. Grama, and V.N. Rao, “Scalable load balancing techniques for parallel computers,” Journal of Parallel and Distributed Computing, vol. 22, pp. 60–79, 1994.
26. C.R. Reeves, “Genetic algorithms,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves
(Ed.), Blackwell: London, 1993, pp. 151–196.
27. L.A. Sanchis, “Multiple-way network partitioning with different cost functions,” IEEE Trans. on Comp.,
vol. 42, no. 12, pp. 1500–1504, 1993.
28. K. Schloegel, G. Karypis, and V. Kumar, “Graph partitioning for high performance scientific simulations,” in CRPC Parallel Computing Handbook, Morgan Kaufmann: San Mateo, CA, 2000.
29. I. Shperling and E.J. McCluskey, “Circuit segmentation for pseudo-exhaustive testing via simulated annealing,” in International Test Conference IEEE, 1987.
30. A.J. Soper, C. Walshaw, and M. Cross, “A combined evolutionary search and multilevel optimisation approach
to graph partitioning,” Mathematics Research Report 00/IM/58, University of Greenwich, 2000.
31. C. Walshaw and M. Cross, “Mesh partitioning: A multilevel balancing and refinement algorithm,” SIAM J.
Sci. Comput., vol. 22, no. 1, pp. 63–80, 2000.