Computational Optimization and Applications, 23, 321–340, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Mixed Heuristic for Circuit Partitioning

C. GIL∗
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería, La Cañada de San Urbano s/n, 04120 Almería, SPAIN

J. ORTEGA
Departamento de Arquitectura y Tecnología de Computadores, Universidad de Granada, Campus de Fuentenueva, Granada, SPAIN

M.G. MONTOYA AND R. BAÑOS
Departamento de Arquitectura de Computadores y Electrónica, Universidad de Almería, La Cañada de San Urbano s/n, 04120 Almería, SPAIN

∗ Author to whom correspondence should be addressed.

Abstract. As general-purpose parallel computers are increasingly being used to speed up different VLSI applications, the development of parallel algorithms for circuit testing, logic minimization and simulation, HDL-based synthesis, etc. is currently a field of increasing research activity. This paper describes a circuit partitioning algorithm which mixes Simulated Annealing (SA) and Tabu Search (TS) heuristics. The goal of such an algorithm is to obtain a balanced distribution of the target circuit among the processors of the multicomputer, allowing a parallel CAD application for Test Pattern Generation to provide good efficiency. The results obtained indicate that the proposed algorithm outperforms both a pure Simulated Annealing and a pure Tabu Search. Moreover, the usefulness of the algorithm in providing a balanced workload distribution is demonstrated by the efficiency results obtained by a topological partitioning parallel test-pattern generator in which the proposed algorithm has been included. An extended algorithm that works with general graphs has also been included in order to compare our approach with other state-of-the-art algorithms.

Keywords: circuit partitioning, optimisation, parallel test pattern generation, simulated annealing, Tabu Search

1. Introduction

The circuit partitioning problem arises in many VLSI applications [2, 24]. Given the increasing complexity of VLSI circuits and the NP-complete [11] character of many VLSI CAD problems, a “divide and conquer” approach is attractive both for solving these problems in reasonable periods of time by parallel processing and for handling, on distributed memory multiprocessors, arbitrarily large circuits that may not fit in the memory of standard workstations. The usefulness of parallel processing to speed up the resolution of VLSI CAD problems and to address the circuit storage problems has been considered in the recent literature on circuit testing, logic synthesis, cell placement, etc. [5, 7]. Circuit partitioning has thus become an important preliminary step in VLSI CAD applications [7]. It appears when trying to exploit the concurrency in the target circuit (data parallelism) instead of exploiting the concurrency of the algorithm (functional parallelism) [24].

In any parallel application, the workload distribution among the processors is an important factor for efficient use of the parallel computer. For some applications it is difficult to provide a graph model for the processing and communication volumes corresponding to the tasks of a program, and the usefulness of graph-based procedures for workload distribution is reduced. In such circumstances, a dynamic load-balancing procedure is required [25]. However, the testing application we are interested in is usually based on applying a given procedure to the different circuit elements (logic gates and connections between them). Thus, as the data structure of the algorithm is defined by the corresponding netlist, it is relatively easy to describe the program by a graph.
The volume of processing associated with each node is that corresponding to the application of the algorithm to the elements of the circuit allocated to a given processor, and the communication cost results from transferring data between processors to which interconnected subcircuits have been allocated. Due to these characteristics, efficient algorithms for circuit partitioning are very useful because they allow a balanced distribution of the workload among processors. Particularly interesting are those algorithms that can be applied to irregular and sparse graphs, as these are the graphs normally associated with digital circuits.

In this paper we present a procedure for circuit partitioning in the context of parallel test pattern generation. In a moderate time, the algorithm is able to find a partitioning of the circuit graph so that the overall parallel run time of the test generation process is minimised. This implies both maximizing the concurrency of the processors and minimizing the communication overhead, so the objective function that we have used takes these two objectives into account simultaneously.

Several approaches for circuit partitioning have been reported [3, 4, 8, 10, 15, 17–19, 21, 22, 27, 29, 31]. They can be classified as combinatorial or move-based approaches [10, 17, 22], approaches based on geometric representations [15, 18], multilevel and hierarchical clustering [8, 19, 21, 31], and hybrid schemes [4] that combine different types of approaches to exploit the advantages of each. The procedure proposed here belongs to the class of move-based approaches, in which the solution is built iteratively from an initial solution by applying a move or transformation to the current solution, and it is also a hybrid scheme. The set of possible transformations that can be applied to a given solution defines the neighbourhood structure of the solution space, which is explored by repeatedly moving from the current solution to a neighbouring one.
These move-based procedures are simple to describe and implement, and thus they are the most frequently used, in some cases together with a multilevel approach. The move-based procedures include iterative improvement methods [10, 22, 27], which move from the current solution to the best solution in its neighbourhood, and stochastic hill-descending procedures such as those based on Simulated Annealing (SA) [1], Tabu Search (TS) [16], and Genetic Algorithms (GA) [26], which allow movements towards solutions worse than the current one in order to escape from local minima. Iterative improvement algorithms such as those of Kernighan-Lin (KL) [22] and Fiduccia-Mattheyses (FM) [10] for graph bipartitioning, and [19, 21, 27, 31] for partitioning into multiple blocks, are widely applied and have almost become standards, their results frequently being used for comparison with other methods. The stochastic hill-descending procedures indicated above are metaheuristics that allow the user to control the time complexity by trading solution quality for speed. Thus, these metaheuristics are very suitable when it is important to have a good solution in a limited amount of time. This is the situation when the partitioning algorithm is used to distribute the load in a parallel program, because a long partitioning time will limit the efficiency.

SA is used in [29] to find near optimal solutions to the problem of partitioning a digital combinational circuit for pseudo-exhaustive testing. An algorithm for circuit partitioning based on the TS metaheuristic, also applied to pseudo-exhaustive testing, was reported in [3]. Nevertheless, the goals of this application are different from those considered in the present paper.
In [3], the partitioning problem involves the division of the target circuit into non-overlapping subcircuits with no more than a given number of inputs each, and is subject to some connectivity constraints. In our case the number of inputs to each subcircuit is not limited.

In [20], SA is compared with iterative algorithms and is found to outperform the KL algorithm for geometric and random graphs. However, it is suggested that multiple runs of KL with random initial solutions would be better than SA for the kind of graphs that correspond to circuit netlists. SA is not usually seen as an effective approach for VLSI applications because the computing times are quite large. For example, at low temperatures, many candidate solutions are explored and rejected before an improved solution is accepted. TS, in contrast, uses a tabu list to avoid cycling near local optima and to enable moves towards worse solutions. Thus, it is usually argued that TS is able to explore the solution space more efficiently than SA because it does not waste time in previously visited regions of the solution space.

In any case, these approaches need not be seen as opposed and can be combined to obtain an improved procedure without the drawbacks characteristic of each method. Thus, hybrid methods have been proposed, such as [4], in which a heuristic that mixes Tabu Search and genetic algorithms is applied to the circuit partitioning problem, and a classification of hybrid algorithms is also provided. A hybrid method that allows the temperature parameter to be strategically manipulated, rather than progressively diminished, has been shown to yield an improved performance over standard SA approaches [9]. The algorithm proposed here can also be considered a hybrid heuristic that adds elements of Tabu Search to a Simulated Annealing algorithm; thus it is termed the Mixed Simulated Annealing and Tabu Search algorithm (MSATS).
Thus, a large number of graph partitioning schemes exist, and they differ in the edge-cut quality produced, run time, degree of parallelism and applicability to certain kinds of graphs. Often it is not clear which scheme is better under which scenarios. In [28], these properties are categorized for some graph partitioning algorithms, and in [30] the edge-cut quality is analyzed with different benchmarks for some graph partitioning packages.

In the following, Section 2 gives a more precise definition of the circuit partitioning problem and describes the cost function optimized in this paper. The description of the proposed algorithm (MSATS) is provided in Section 3, along with the experimental results for circuit partitioning in the context of parallel test pattern generation. Section 4 extends the original MSATS to work with graphs that are not restricted to directed acyclic graphs (combinational circuits), and finally Section 5 presents the conclusions of the paper.

2. The circuit partitioning problem

The circuit partitioning problem consists of finding a decomposition of the target circuit into non-overlapping subcircuits with at least one logical gate in each subcircuit. Among the different objectives that may be satisfied by the desired partitioning are (i) the minimization of the number of cuts, (ii) the minimization of the number of subcircuits, and (iii) the minimization of the deviation in the number of elements (inputs, logical gates, outputs and fanout points) assigned to each partition. Criterion (i) corresponds to minimizing the communication cost, since cutting a line usually implies passing data between the processors to which the subcircuits connected by the cut line have been assigned. Criterion (ii) is used when the goal is to determine the partition consuming the fewest resources (processors in this case).
Finally, criterion (iii) corresponds to obtaining subcircuits of similar sizes in order to get a balanced workload distribution among processors. As our goal is to use all the available processors in the multicomputer to generate the test patterns in parallel while trying to keep all the processors working throughout the run time, the number of subcircuits is fixed to be equal to the number of available processors in the machine, and the objectives correspond to criteria (i) and (iii). This means obtaining subcircuits with similar sizes to balance the workload of the processors (considered as proportional to the number of nodes), and minimizing the number of cuts.

In the following, a mathematical formulation of the problem is provided. Let $G = (X, A)$ be the directed acyclic graph associated with a combinational circuit $C$, where $X$ denotes the set of components (inputs, logical gates, and outputs) and $A$ the set of lines used for signal propagation. The nodes of $X$ can be classified as inputs, logical gates and outputs of circuit $C$. Thus $X$ is the union of three disjoint sets: the set of inputs $E$, the set of logical gates $P$ (nodes), and the set of outputs $O$. Figure 1 shows an example circuit with its graph representation.

Figure 1. Representation of a combinational circuit with 7 logic gates (a) and the directed graph associated with it (b).

The problem is to find a partition of $X$ into a fixed number $K$ of subsets $X_k$ ($k = 1, \ldots, K$) such that each induced subgraph satisfies the following conditions:

1. $X = \bigcup_{k=1}^{K} X_k$ and $X_k \cap X_h = \emptyset$, $\forall k \neq h$, $(k, h) \in \{1, \ldots, K\}^2$
2. $p_k = |X_k \cap P| \neq 0$, $\forall k = 1, \ldots, K$
3. $L_i \le p_k \le L_u$, with $L_i = n/K - (n/K)\,\theta$ and $L_u = n/K + (n/K)\,\theta$, $k = 1, \ldots, K$; where $n = |P|$; $n/K$ represents the number of gates that should be included in each subcircuit to obtain a partition of similar sized subcircuits; and $\theta$ is the parameter representing the proportion of gates that is tolerated as an imbalance with respect to $n/K$, or perfect balance. In the first part of this work (MSATS), $\theta$ has been set to values between 0.2 and 0.3, and to 0.03 in the second part (RMSATS).
4. $G_k(X_k, A_k)$ is a connected graph, $\forall k = 1, \ldots, K$.

In this way, the problem can be formulated as a combinatorial optimization problem, which means: minimize $c(s)$ (or maximize $c(s)$), subject to $s \in S$, where $S$ is a discrete set of feasible solutions and $c(s)$ is the cost or objective function. Thus, solving a combinatorial optimization problem implies finding the best or optimal solution among a finite or countably infinite number of possible solutions. In [29] the formulation of the partitioning problem as an integer linear programming problem and as a quadratic transportation problem is provided. Here, given a circuit graph $G = (X, A)$, the cost function to minimize is defined as:

$$c(s) = \alpha \cdot \mathit{n\_cuts}(s) + \beta \cdot \sum_{k=1}^{K} 2^{\mathit{deviation}(k)} \tag{1}$$

where $\mathit{deviation}(k)$ is the amount by which the number of gates in the subcircuit $G_k$ falls outside the bounds $L_u$ or $L_i$,

$$\mathit{deviation}(k) = \max\{0,\ |X_k \cap P| - L_u,\ L_i - |X_k \cap P|\},$$

$\mathit{n\_cuts}(s)$ is the number of cuts of the solution, and $s$ is any solution to the circuit partitioning problem, feasible or not, i.e. verifying condition 3 above or not. Thus, whenever for all $k = 1, \ldots, K$, in a given partition $s$, the deviation in the number of gates of $G_k$ with respect to $n/K$ does not exceed $\theta \cdot n/K$, the solution is feasible and the cost is $c(s) = \alpha \cdot \mathit{n\_cuts}(s) + \beta \cdot K$, since $\mathit{deviation}(k) = 0$ ($k = 1, \ldots, K$). The second term in (1) penalizes the deviation from the feasible solution space and its magnitude is determined by the constant $\beta$.
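As an illustration, the cost function (1) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function and parameter names (`balance_bounds`, `deviation`, `cost`, `gates_per_subcircuit`) are hypothetical and are not taken from the authors' implementation, and the number of cuts is assumed to be computed elsewhere and passed in.

```python
def balance_bounds(n_gates, K, theta):
    """Lower and upper bounds L_i, L_u on the number of gates per subcircuit."""
    ideal = n_gates / K                      # n/K, the perfectly balanced size
    return ideal - ideal * theta, ideal + ideal * theta


def deviation(p_k, L_i, L_u):
    """Amount by which a subcircuit with p_k gates falls outside [L_i, L_u]."""
    return max(0, p_k - L_u, L_i - p_k)


def cost(n_cuts, gates_per_subcircuit, theta, alpha, beta):
    """Cost (1): alpha * n_cuts + beta * sum over k of 2 ** deviation(k).

    gates_per_subcircuit is a hypothetical list [p_1, ..., p_K] holding the
    number of gates assigned to each subcircuit.
    """
    K = len(gates_per_subcircuit)
    n_gates = sum(gates_per_subcircuit)
    L_i, L_u = balance_bounds(n_gates, K, theta)
    penalty = sum(2 ** deviation(p, L_i, L_u) for p in gates_per_subcircuit)
    return alpha * n_cuts + beta * penalty
```

For instance, with 100 gates, K = 2 and θ = 0.2 the bounds are 40 and 60. A 50/50 split then pays only the constant term β · K, whereas a 38/62 split has a deviation of 2 in both subcircuits and pays β · (2² + 2²) = 8β.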
Nevertheless, according to the relative magnitude of α and β, a transition to a solution s that causes a deviation higher than θ · n/K in the number of gates of any subcircuit still reduces the cost function if the reduction in the number of cuts is sufficiently high. For example, if $\alpha = A \cdot \beta$ is set, a transition from a feasible solution to a solution with a reduction of one in $\mathit{n\_cuts}(s)$ will still produce a reduction in the cost function whenever $\left(\sum_{k=1}^{K} 2^{\mathit{deviation}(k)}\right) - K$ is less than the factor $A$.

The proposed cost function does not take into account the connection topology of the multicomputer where the parallel program is executed. In the present paper the communication cost has been assumed to depend only on the volume of communications between the processors, thus considering the distance between all processors to be equal. This assumption does hold in commercial architectures that use specific hardware or a software layer for message routing, providing homogeneous latency between processors. Moreover, the specific characteristics of the cost function do not influence the evaluation of the proposed algorithm, or the comparison with SA and TS because, as indicated in Section 1, these are metaheuristics which do not depend on the characteristics of the cost function considered.

In the following sections, the procedure proposed to solve this combinatorial optimization problem is described. As the procedure is based on a mixing of Simulated Annealing and Tabu Search techniques, a brief introduction to the notation and terminology is first given.

3. The MSATS algorithm

As has been said, move-based procedures for solving combinatorial optimization problems, such as the partitioning problem, implement a local search in the solution space S starting from an initial solution s0 ∈ S.
At each iteration, a heuristic is used to obtain a new solution s̄ in the neighbourhood, N(s), of the current solution s, by applying transformations, or moves, to s. Every feasible solution, s̄ ∈ N(s), is evaluated according to the cost function c(s̄) to be optimized, thus determining a change in the value of the cost function, move_value = c(s̄) − c(s).

The basic local search approach corresponds to the so-called hill-descending algorithms, in which a monotone sequence of improving solutions is examined until a local optimum is found. Hill-descending algorithms always stop at the first local optimum. To avoid this drawback, several metaheuristics have been proposed in the literature, such as Simulated Annealing [1] and Tabu Search [16]. These use mechanisms that allow moves which increase the cost of the current solution in an attempt to escape from local optima. Simulated Annealing and Tabu Search have been widely applied to circuit partitioning and many other combinatorial optimization problems.

The MSATS (Mixed Simulated Annealing Tabu Search) procedure described in this section is a hybrid method that takes advantage of both metaheuristics to outperform the results provided by each. At each iteration of MSATS, admissible moves are applied to the current solution, allowing transitions that increase the cost function as in Simulated Annealing. When a move increasing the cost function is accepted, the reverse move is forbidden during some iterations in order to avoid cycling, as in Tabu Search. The restrictions on the admissible moves are implemented by using a short term memory function which determines how long a tabu restriction will be enforced and which moves are admissible at each iteration. The MSATS algorithm is shown in figure 2.
It adds the characteristics of the search implemented by Simulated Annealing to the features of Tabu Search, which correspond to a search that centres more on specific zones according to the history and the best movements applied. Thus, a powerful algorithm is provided, as the results given in Subsection 3.1 show.

Figure 2. Algorithm MSATS for circuit partitioning.

In MSATS, the temperature t is used as a parameter to control the probability of accepting a new solution, as in Simulated Annealing. At a given temperature, only the solutions which are selected by using the SA cooling schedule are considered as candidates to produce a transition. Thus, a certain randomness is introduced into a pure TS, in order to explore zones of the solution space that do not appear very promising at first. As the algorithm also has the characteristics of a Tabu Search, it avoids cycles around local minima, allowing a more efficient exploration of the solution space without revisiting solutions, as may occur in a pure SA.

Two different initial solutions, s1 and s2, have been used in our experiments. They are obtained by fast algorithms that assign n/K nodes to each partition. The initial solution s1 is obtained by an algorithm named Input Partitioning, in which the circuit graph is traversed in a depth-first way starting from the inputs. Solution s2 is provided by an algorithm called Output Partitioning, which processes the graph in a depth-first manner from the output nodes. These partitioning algorithms have been applied for circuit partitioning in parallel logic simulations; they take O(n) time and obtain partitions in which strongly connected components of the graph are assigned to the same partition. As shown in Subsection 3.1, the quality of the best solution found with MSATS (and also with SA and with TS) is sometimes affected by the choice of one or another initial solution.
Nevertheless, the best initial solution depends on the specific circuit considered and on the control parameters. In many cases both initial solutions provide similar results. Figure 3 shows two examples of initial solutions s1 (figure 3(a)) and s2 (figure 3(b)) for the circuit of figure 1.

Figure 3. A partition example using the circuit in figure 1(a), for K = 3, with the s1 initial solution (a) and the s2 initial solution (b).

In the application of MSATS to the partitioning problem, the neighbourhood N(s) of the current solution s contains all solutions s̄ which may be obtained from s by transferring only one gate from one subcircuit (source subcircuit) to another subcircuit (destination subcircuit). The gate that is transferred must belong to the boundary of the corresponding subcircuit, which contains all the gates connected to at least one gate belonging to a different subcircuit. These gates are called boundary gates. Moreover, the destination subcircuit must be one of the subcircuits connected to the gate transferred. In this way, condition 4 described in Section 2 is verified during the process.

As can be seen in figure 2, the set of solutions explored at a given temperature is defined by selecting the boundary gates according to their level in the circuit. Of course, the set of boundary gates might change during the process as the solution s changes. For example, due to the move of another boundary gate, a boundary gate may cease to be one. Furthermore, a non-boundary gate can become a boundary gate if it is connected to a boundary gate which is moved to a different subcircuit.
In the first case, the gate will not be selected, and in the case of a new boundary gate, it can be selected in subsequent moves at the present temperature. In any case, as each gate can be selected only once at a given temperature, the number of solutions explored at this temperature is finite. When a boundary gate is selected it is allocated to the subcircuit having the most lines connected to this boundary gate. Whenever this move is accepted and implies an increase in the cost function, the inverse move, which returns the gate to its initial subcircuit, is included in the short term memory to tag it as a tabu move. The move is kept in the short term memory only during the next iteration of the while loop because, after one iteration, it is quite likely that the present solution has changed enough with respect to the solution to which the move was applied.

Figure 4 shows an example of boundary gate movements. After selecting the boundary gate, the algorithm determines whether it can be moved according to the problem constraints. If a move of the selected gate is allowed, the gate is allocated to the subcircuit with most gates connected to it. In this example the algorithm begins with p1 and p2 (figure 4(a)). As these gates are not boundary gates, they are not selected, and the algorithm proceeds with gate p3. Gate p3 is a boundary gate but it cannot be moved because moving it would leave only one gate in subcircuit 2, and each subcircuit must have at least 2 gates. The same happens with gate p4. Gate p5 is a boundary gate that can be moved, and it is allocated to subcircuit 2. In this case, the number of cuts remains the same because one cut appears and one cut disappears. Like p3 and p4, gate p6 cannot be moved, and finally p7 is not a boundary gate. Thus, the circuit after the first iteration of the while loop is shown in figure 4(b).
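The move selection described above can be sketched as follows. This is a minimal Python sketch and not the algorithm of figure 2: the names (`count_cuts`, `boundary_destination`, `msats_pass`, `tabu`) are hypothetical, the acceptance test is assumed to be the standard Metropolis rule of Simulated Annealing, the gates are assumed to be supplied already ordered by level, and only the destination side of the connectivity condition is enforced (a gate moves to a subcircuit it is connected to); whether the source subcircuit stays connected is not checked.

```python
import math
import random


def count_cuts(part, adj):
    """Number of lines whose two end gates lie in different subcircuits."""
    return sum(1 for g in adj for nb in adj[g] if g < nb and part[g] != part[nb])


def boundary_destination(gate, part, adj):
    """Foreign subcircuit with most lines to `gate`; None if not a boundary gate."""
    counts = {}
    for nb in adj[gate]:
        if part[nb] != part[gate]:
            counts[part[nb]] = counts.get(part[nb], 0) + 1
    if not counts:
        return None
    return max(sorted(counts), key=counts.get)  # ties go to the lowest index


def msats_pass(gates_by_level, part, adj, cost_fn, t, tabu, min_gates=1, rng=random):
    """One pass over the gates at temperature t; `part` is updated in place.

    Returns the final cost and the set of inverse moves to forbid next pass.
    """
    sizes = {}
    for k in part.values():
        sizes[k] = sizes.get(k, 0) + 1
    current = cost_fn(part)
    next_tabu = set()
    for gate in gates_by_level:                # each gate is tried at most once
        src = part[gate]
        dest = boundary_destination(gate, part, adj)
        if dest is None or (gate, dest) in tabu or sizes[src] <= min_gates:
            continue
        part[gate] = dest                      # tentative move
        new_cost = cost_fn(part)
        delta = new_cost - current             # move_value
        # Assumed Metropolis rule: always accept improvements, accept a
        # worsening move with probability exp(-delta / t).
        if delta <= 0 or (t > 0 and rng.random() < math.exp(-delta / t)):
            sizes[src] -= 1
            sizes[dest] += 1
            current = new_cost
            if delta > 0:
                next_tabu.add((gate, src))     # forbid the inverse move
        else:
            part[gate] = src                   # rejected: undo the move
    return current, next_tabu
```

In this sketch the returned set would be passed as `tabu` to the next call, so that each inverse move stays forbidden for exactly one iteration, mirroring the short term memory described above. The `cost_fn` argument can be any function of the partition, for example `lambda p: count_cuts(p, adj)` or a cost that also includes the balance penalty.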
Figure 4. An example of moving the boundary gate p5 from subcircuit 1 (a) to subcircuit 2 (b).

MSATS stops when one of the following conditions is verified: (i) the temperature reaches a final value (in this paper a final temperature equal to zero has been used), (ii) the number of moves applied without improving the best solution found so far (n_failures) reaches a maximum bound of consecutive iterations (max_failures), or (iii) the number of iterations reaches the value max_iterations (figure 2).

At high temperatures, MSATS behaves almost as a pure SA because most of the transitions are accepted, and it is very unlikely to need to select one of the transitions included in the tabu list. On the other hand, at low temperatures the solutions that increase the cost are rarely selected in an SA and the effect of Tabu Search is also small. Thus, the difference in the behaviour of MSATS with respect to a pure SA due to the use of tabu moves is more important at intermediate temperatures. The TS elements included in MSATS reduce the number of iterations needed to reach a solution of a given quality and therefore, in the end, MSATS is able to improve the quality of the solution obtained, as can be seen from the results provided in Subsection 3.1.

3.1. Experimental results

Next, we summarize the results obtained by using the MSATS algorithm. The algorithm was programmed in C and executed on a Power Challenge XL (Silicon Graphics). The circuits used to evaluate the performance of the hybrid algorithm are the ISCAS’85 circuits [6], as they are common benchmark circuits in the context of test pattern generation. The first seven rows of Table 1 (c432, . . . , c6288) show the basic characteristics of these circuits.
The values of the parameters are set to their best values according to previous experimental results; thus max_failures is set to 0.25 * max_iterations, and tfactor to 0.999. The value of max_iterations is usually taken as 1000, and the initial temperature, t0, as 10. For these values, the cost function reaches a stable final value at the end of the 1000 iterations. In most cases, an increase in the number of iterations does not yield better solutions unless the value of tfactor is increased. Nevertheless, this also implies an increase in the run time and, except for extremely large circuits, the new solution is not a great improvement.

Table 2 shows the best results obtained by Simulated Annealing, Tabu Search and MSATS compared with the initial solution s1 for partitions (K) of 2, 4, 8, 16 and 32 subcircuits. The row cuts reduction indicates the average of the reduction in the number of cuts obtained by MSATS with respect to the best solution of those provided by SA and TS in each case. As can be seen, the results obtained by MSATS are better than the results obtained with TS and SA in most cases, especially when the circuit size increases. In these circuits, the neighbourhood of a given solution is large, and the effect of considering tabu transitions is more important.

Moreover, Table 2 compares the computing times for MSATS, TS and SA. As can be seen, the times for MSATS are similar to those of Simulated Annealing and lower than those of Tabu Search. With respect to a pure TS algorithm, MSATS accepts a smaller number of solutions increasing the cost function. Thus, although the number of iterations executed by MSATS is similar to that of TS, an iteration consumes more time in the TS algorithm, so TS takes longer to stop. As is shown in figure 4, although a TS algorithm is able to reach solutions with small values in the cost function, it takes a long time trying to improve those solutions without reaching the stop condition, determined by the number of moves without obtaining an improvement in the cost function and a maximum number of iterations. Furthermore, MSATS is able to obtain solutions with a more balanced distribution of gates among the different subcircuits. For example, in the partition of c432 into two subcircuits, MSATS provides subcircuits of 76 and 77 gates respectively, while TS obtains subcircuits with 65 and 88 gates, and SA subcircuits with 70 and 83.

Table 1. Benchmark graphs.

Graph        V        E         max deg  min deg  avg deg  File size (KB)
c432         292      432       10       1        2.96     3
c499         334      499       13       1        2.99     4
c880         594      880       9        1        2.96     6
c1355        878      1355      13       1        3.09     10
c3540        2320     3540      17       1        3.05     31
c5315        3414     5315      16       1        3.11     49
c6288        3936     6288      17       1        3.2      58
add20        2395     7462      123      1        6.23     63
data         2851     15093     17       3        10.59    140
3elt         4720     13722     9        3        5.81     136
uk           4824     6837      3        1        2.83     70
add32        4960     9462      31       1        3.82     90
whitaker3    9800     28989     8        3        5.92     294
crack        10240    30380     9        3        5.93     297
wing_nodal   10937    75488     28       5        13.80    768
fe_4elt2     11143    32818     12       3        5.89     341
vibrobox     12328    165250    120      8        26.81    1679
4elt         15606    45878     10       3        5.88     501
fe_sphere    16386    49152     6        4        6.00     540
cti          16840    48232     6        3        5.73     532
memplus      17758    54196     573      1        6.10     536
cs4          22499    43858     4        2        3.90     506
wing         62032    121544    4        2        3.92     1482
fe_tooth     78136    452591    39       3        11.58    5413
598a         110971   741934    26       5        13.37    9030
fe_ocean     143437   409593    6        1        5.71     5242
wave         156317   1059331   44       3        13.55    13479

Table 2. A summary of results with the best solutions obtained and the times for the algorithms: initial solution (s1), Simulated Annealing (SA), Tabu Search (TS) and MSATS.

Circuit  Algorithm  K = 2        K = 4        K = 8        K = 16       K = 32
                    Cuts  Sec.   Cuts  Sec.   Cuts  Sec.   Cuts  Sec.   Cuts  Sec.
c432     s1         66    0,2    106   0,3    158   0,4    186   0,5    212   0,6
         SA         21    2,0    45    2,2    68    2,8    93    3,9    142   7,1
         TS         22    6,1    41    9,2    62    11,2   87    13,6   130   15,3
         MSATS      20    2,1    40    2,3    61    2,8    85    3,7    128   6,9

c499 s1 SA TS MSATS Cuts reduction (%) 41 17 18 16 c880 s1 SA TS MSATS Cuts reduction (%) 44 19 19 17 c1355 s1 SA TS MSATS Cuts reduction (%) 60 28 38 17 c1908 s1 SA TS MSATS Cuts reduction (%) 123 55 50 35 c3540 s1 SA TS MSATS Cuts reduction (%) 116 67 96 46 c5315 s1 SA TS MSATS Cuts reduction (%) 188 61 88 50 c6288 s1 SA TS MSATS Cuts reduction (%) 60 48 46 46 5 3 0,2 2,2 7,2 2,1 94 58 51 54 0,3 3,0 18,1 3,1 100 41 43 37 0,4 5,3 21,3 5,6 137 45 46 45 0,3 2,3 11,4 2,4 143 86 79 78 0,4 3,8 28,5 3,9 155 98 89 80 0,5 6,0 34,6 6,1 240 97 94 70 −5 6 11 210 74 76 71 0,8 15,0 49,2 15,1 333 156 179 132 1,1 20,9 80,3 21,4 332 162 172 160 30 0,7 9,5 60,2 9,5 298 125 120 108 0,9 16,2 83,2 16,9 535 250 259 221 1,4 28,1 115,3 28,4 1,5 22,5 93,4 22,9 496 350 309 174 222 129 129 129 0,7 8,1 39,4 7,9 351 98 98 98 1,8 30,8 145,7 30,6 25 225 172 175 168 0,7 6,3 42,3 6,1 300 189 185 179 0,9 11,2 47,8 10,2 394 156 148 128 1,2 20,9 76,4 20,1 771 375 377 298 1,9 25,8 116,4 25,8 44 654 301 402 286 2,3 35,1 168,5 34,2 671 450 550 355 0,7 7,5 19,5 7,2 3 0,8 8,5 53,3 8,2 4 0 450 168 156 125 5 0,6 4,1 17,1 3,9 0 0,9 11,5 68,5 10,9 334 315 320 301 2 10 1,1 16,4 61,9 15,7 14 1,2 12,9 78,5 12,1 629 234 225 205 1,5 26,5 90,7 25,4 1023 478 498 455 2,3 33,4 140,3 31,6 939 542 532 407 3,1 52,3 203,6 48,7 24 2,9 44,8 211,8 42,6 22 947 745 774 524 3,8 70,5 290,6 66,9 30 20 12 2 155 135 157 102 0,6 5,1 35,7 4,9 10 16 24 205 132 132 119 26 15 32 0,5 3,2 15,2 3,1 11 0 0,6 8,0 54,2 8,2 3 2 10 40 0 2 1,4 20,2 84,9 19,0 9 21 1,9 42,8 124,6 40,6 5 5

Figure 5. Speedup with some ISCAS circuits for test pattern generation. (Plot of Speedup against Processors, with curves for c432, c499 and c880.)

3.2. MSATS in a parallel test pattern generator

The MSATS algorithm has been used as a first step in a parallel test-pattern generator [12, 14]. It starts by applying MSATS for partitioning the circuit under test, and after this each processor receives one of the subcircuits obtained.
Thus, all the processors can concurrently apply the test generation algorithm described in [13] to determine the test patterns for the stuck-at faults in the nodes of their corresponding subcircuits. Communication between processors is needed to complete the determination of the test equation in each node, and it grows with the number of cuts among subcircuits. Thus, one way to demonstrate the performance of MSATS is to consider the increase in the speedup provided by the parallel test generator as the number of processors grows. If the speedup grows proportionally to the number of processors or, in other words, if the efficiency remains roughly constant, the performance of MSATS is adequate according to the conditions given in Section 2. This behaviour is observed in figure 5, which shows the speedup obtained for different circuits of the ISCAS set as the number of working processors grows. Speedup results for the ISCAS’85 circuits are provided in Table 3. The parallel test generator was run on an Intel Paragon multicomputer. In Table 3, the speedups obtained with K processors are given in the columns labeled SK, and the numbers of cuts produced when the circuit is partitioned into K subcircuits are given in the columns labeled CK.

Table 3. Speedups, number of cuts and fault coverages obtained with the parallel test-pattern generator.
Circuit   Faults   S2     C2   S4     C4    S8     C8    S12    C12   S16     C16   Coverage
c432      864      1.69   20   3.70   40    6.23   61    8.76   72    10.34   85    98%
c499      998      1.78   16   3.85   54    6.34   78    8.20   96    11.43   119   99%
c880      1760     1.82   17   3.87   37    7.05   80    9.34   98    12.56   129   100%
c1355     2710     1.90   17   2.93   45    6.25   70    9.45   81    11.65   98    98%
c1908     3816     1.97   35   2.50   71    6.86   108   9.67   113   10.26   125   98%
c3540     7080     1.78   46   3.67   132   6.40   221   8.30   240   10.20   298   98%
c6288     12570    1.67   46   2.75   102   6.53   301   8.50   327   10.45   355   97%

As has been indicated, the cost function used in this paper does not model characteristics of the multicomputer to be used, such as the interconnection topology and the relative communication and computation costs. This is justified because in many cases the communication delays are similar for all the processors in the machine, due either to suitable dedicated hardware or to a software layer for routing. Also, simple cost functions are more suitable when, as in this case, the speed of the optimization algorithm is important. Nevertheless, the performance obtained by the test pattern generator shows that the model provided by our cost function is good enough.

4. Extensions to other partitioning problems

As discussed above, the MSATS algorithm was proposed as a first step in the application of parallel test pattern generation. Its field of application was thus restricted to combinational circuits or their equivalent representation as acyclic directed graphs. A further requirement was that the partitions must be contiguous (Section 2, condition 4). Therefore, to compare this technique with other approaches, we extended the original MSATS to create RMSATS (Refinement of MSATS), which can be applied to graphs without the above restrictions. The main differences with respect to MSATS are as follows:

1. The input format used to read and store the graph was adapted to the public domain benchmarks currently used [30] to compare different packages.

2. The initial solution was modified slightly. In MSATS, an input (output) partitioning algorithm is used in which the circuit is traversed depth-first starting from the inputs (outputs). However, in the new graph configuration this strategy is not possible, as primary inputs (outputs) are not available, or at least are not specified as such.
Therefore, we sought alternatives and implemented a growing algorithm in which a breadth-first traversal of the graph is performed, beginning at a randomly chosen vertex.

3. We included a refinement step at the end of the algorithm. In this step, the balance objective is optimised greedily, without worsening the first objective (edge cut).

4. For comparison with other approaches, the factor θ (imbalance tolerance) was fixed at 0.03, while in MSATS it was fixed in the range of 0.2 to 0.3. The larger value was used in MSATS because it was considered more important to reduce the cut weight considerably (interprocessor communication), to the detriment of node balance.

4.1. Experimental results

For the RMSATS experimental results, we used the ISCAS’85 benchmarks adapted to the new format, together with public domain benchmarks currently used to compare different approaches. None of the graphs have vertex or edge weights. Table 1 lists all the graphs, their sizes, the maximum, minimum and average degree of the vertices (number of adjacent vertices) and the file size in KB. The test graphs were chosen to form a representative sample of small to large scale real-life problems, and include both 2D and 3D examples of nodal graphs, dual graphs, 3D semi-structured graphs and other non-mesh graphs such as combinational circuits (ISCAS and add32). We compared the results of RMSATS with those produced by a public domain partitioning package, METIS 4.0 [21]. The algorithms chosen were pmetis (multilevel recursive bisection) and kmetis (multilevel k-way partitioning).

A large number of experiments were carried out to adjust different parameters of the algorithm, including the initial temperature t0 and tfactor. The values for tfactor depend on the size of the graph and the run time of the algorithm. From small to large graphs, we chose a tfactor in the range of 0.999 to 0.9999 respectively, with an initial temperature of 2.
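The effect of tfactor on the length of the annealing run can be illustrated with a short sketch. It assumes a geometric cooling rule, in which the temperature is multiplied by tfactor after every step, and the usual Metropolis acceptance test; neither is spelled out in this section, and the stopping temperature t_min and the function names are hypothetical, chosen only for illustration.

```python
import math


def temperature(t0: float, tfactor: float, step: int) -> float:
    """Temperature after `step` updates of an assumed geometric schedule."""
    return t0 * tfactor ** step


def cooling_steps(t0: float, tfactor: float, t_min: float) -> int:
    """Count the updates needed for the temperature to fall below t_min.

    t_min is a hypothetical stopping temperature used only for this example.
    """
    if not 0.0 < tfactor < 1.0:
        raise ValueError("tfactor must lie strictly between 0 and 1")
    if t0 <= 0.0 or t_min <= 0.0:
        raise ValueError("temperatures must be positive")
    steps, t = 0, t0
    while t >= t_min:
        t *= tfactor  # one geometric cooling update
        steps += 1
    return steps


def acceptance_probability(delta: float, t: float) -> float:
    """Metropolis rule (assumed): always accept improvements, and accept a
    cost increase `delta` with probability exp(-delta / t)."""
    if delta <= 0.0:
        return 1.0
    return math.exp(-delta / t)


if __name__ == "__main__":
    # t0 = 2 and the two tfactor extremes quoted in the text.
    for tfactor in (0.999, 0.9999):
        print(tfactor, cooling_steps(2.0, tfactor, 0.01))
```

Under these assumptions, moving tfactor from 0.999 to 0.9999 lengthens the cooling phase roughly tenfold, which is consistent with reserving the slower schedule for the larger graphs.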
This initial temperature is lower than in MSATS because the graphs are larger, and a lower temperature lets the search explore the solution space without wasting time on solutions that only worsen the objective function.

Table 4 shows the results obtained for five values of K (number of partitions): the total weight of cut edges and the imbalance for the pmetis, kmetis, growing (initial solution of RMSATS) and RMSATS algorithms. It is important to note that graph partitioning algorithms can usually find higher quality partitions if the balancing constraint is relaxed slightly. Indeed, some of the public domain partitioning packages, such as JOSTLE and METIS, have a built-in, although adjustable, imbalance tolerance of 3%. For this reason, we used the same value in RMSATS, although the algorithm allows a deviation factor (Eq. (1)) of 2%. Thus, the total imbalance tolerance is at worst 5%. However, after the refinement step, this imbalance is reduced to 0% in some cases. Note, too, that since pmetis uses recursive bisection, and thus produces partitions with perfect balance, its cut weight is worse than that achieved by kmetis and RMSATS in most cases.

The results given in Table 4 show that RMSATS improves the total weight of cut edges in many cases, even in larger graphs. On the other hand, run times were longer in every case with RMSATS, by one to three orders of magnitude. Nevertheless, run times were of the order of minutes, or several hours in the worst case (other algorithms take weeks for the best cut edge quality [30]). Thus, as indicated in [30], some applications prefer better quality solutions for the cut edge objective even at the cost of a slight increase in the imbalance or in the run time of this phase, since this implies a decrease in the total time required by a parallel application (i.e. a reduction in interprocessor communication time), as in parallel test pattern generation.
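To make the quantities reported in Table 4 concrete, the following sketch builds a growing-style initial partition by breadth-first traversal from a random vertex and then evaluates its number of cut edges and its imbalance. It is only an illustration under stated assumptions: the graph is an unweighted, undirected adjacency list, parts are filled in BFS order up to an even quota, and imbalance is taken as the percentage by which the largest part exceeds the ideal size n/K. The function names and these details are hypothetical and are not taken from the RMSATS implementation.

```python
import random
from collections import deque


def grow_partition(adj, k, seed=0):
    """Assign vertices to k parts in breadth-first order (assumed scheme)."""
    n = len(adj)
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # Even quotas: the first n % k parts receive one extra vertex.
    quota = [n // k + (1 if p < n % k else 0) for p in range(k)]
    part = [-1] * n
    remaining = set(range(n))
    queue = deque()
    current, filled = 0, 0
    while remaining:
        if not queue:
            # Start, or restart on a disconnected graph, at a random vertex.
            queue.append(rng.choice(sorted(remaining)))
        v = queue.popleft()
        if part[v] != -1:
            continue  # vertex was queued twice and is already assigned
        while filled == quota[current]:
            current, filled = current + 1, 0  # current part is full
        part[v] = current
        filled += 1
        remaining.remove(v)
        queue.extend(u for u in adj[v] if part[u] == -1)
    return part


def edge_cut(adj, part):
    """Number of undirected edges whose endpoints lie in different parts."""
    return sum(1 for u in range(len(adj)) for v in adj[u]
               if u < v and part[u] != part[v])


def imbalance_percent(part, k):
    """Percentage by which the largest part exceeds the ideal size n / k
    (an assumed definition of the imbalance measure)."""
    sizes = [part.count(p) for p in range(k)]
    ideal = len(part) / k
    return 100.0 * (max(sizes) / ideal - 1.0)


if __name__ == "__main__":
    # Path graph 0-1-2-...-7 as a tiny stand-in for a benchmark graph.
    path = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]
    p = grow_partition(path, 2, seed=1)
    print(p, edge_cut(path, p), imbalance_percent(p, 2))
```

On an 8-vertex path split into two parts, for instance, the grown parts have four vertices each, so the imbalance is 0% and one or two edges are cut, depending on the random starting vertex.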
It is not easy to identify a trend in the results obtained, other than that RMSATS does particularly well in most graphs when the value of K is large (i.e. 16 or 32 partitions), and does badly in the case of add32.

Table 4. Number of cuts (Cuts) and imbalance (Desv.) for different algorithms (pmetis, kmetis, growing and RMSATS) with all the benchmarks in Table 1 for different numbers of partitions (K).
                      K = 2         K = 4         K = 8         K = 16        K = 32
Graph       Algorithm Cuts    Desv. Cuts    Desv. Cuts    Desv. Cuts    Desv. Cuts    Desv.
c432        pmetis    36      0     61      0     80      1     119     10    171     21
c432        kmetis    39      1     60      3     87      1     147     37    179     64
c432        growing   98      0     291     0     363     0     397     0     415     0
c432        RMSATS    33      1     56      1     76      3     103     0     147     0
c499        pmetis    23      0     61      1     92      5     140     5     188     5
c499        kmetis    20      4     62      2     92      3     167     20    211     211
c499        growing   109     0     268     0     387     0     440     0     473     0
c499        RMSATS    20      4     55      5     83      5     131     5     185     1
c880        pmetis    24      0     60      1     100     1     153     5     211     8
c880        kmetis    44      3     60      3     93      2     152     2     219     24
c880        growing   135     0     331     0     596     0     746     0     802     0
c880        RMSATS    19      1     50      5     84      4     132     3     198     0
c1355       pmetis    24      0     59      0     101     0     134     4     204     6
c1355       kmetis    22      1     68      2     97      4     144     9     218     13
c1355       growing   305     0     627     0     974     0     1173    0     1262    0
c1355       RMSATS    28      2     59      4     81      5     115     4     184     4
c3540       pmetis    75      0     147     0     289     0     384     1     566     2
c3540       kmetis    92      3     184     3     264     3     417     3     577     2
c3540       growing   300     0     952     0     1890    0     2783    0     3079    0
c3540       RMSATS    66      0     141     2     218     4     329     5     490     4
c5315       pmetis    87      0     164     0     250     0     388     1     554     1
c5315       kmetis    77      1     159     3     273     3     387     4     554     3
c5315       growing   531     0     1316    0     2896    0     3980    0     4426    0
c5315       RMSATS    83      0     169     3     229     4     326     5     483     5
c6288       pmetis    153     0     257     0     384     0     504     1     632     2
c6288       kmetis    161     3     265     1     408     1     507     3     661     2
c6288       growing   1297    0     3065    0     4214    0     4913    0     5194    0
c6288       RMSATS    134     0     246     5     326     2     416     5     561     5
add20       pmetis    725     0     1292    0     1907    0     2504    1     3008    3
add20       kmetis    719     3     1257    3     1857    5     2442    7     3073    74
add20       growing   1549    0     2811    0     3696    0     4577    0     4940    0
add20       RMSATS    701     3     1493    5     1769    5     2157    5     2466    4
uk          pmetis    23      0     67      0     101     0     189     0     316     1
uk          kmetis    36      3     64      3     98      2     189     3     302     3
uk          growing   97      0     221     0     469     0     970     0     1971    0
uk          RMSATS    30      0     69      1     89      4     161     5     259     5
add32       pmetis    21      0     42      0     81      0     128     0     288     1
add32       kmetis    28      2     44      3     102     3     206     3     352     15
add32       growing   1067    0     2968    0     5166    0     5866    0     6169    0
add32       RMSATS    41      1     119     5     191     5     285     5     349     5
3elt        pmetis    108     0     231     0     388     0     665     0     1093    0
3elt        kmetis    97      1     213     2     403     2     651     2     1096    2
3elt        growing   239     0     724     0     1487    0     3015    0     6147    0
3elt        RMSATS    90      0     201     4     340     5     567     5     959     5
data        pmetis    218     0     480     0     842     0     1370    0     2060    1
data        kmetis    233     2     454     2     806     3     1350    3     2080    2
data        growing   300     0     770     0     1493    0     2988    0     6018    0
data        RMSATS    227     2     427     5     727     5     1168    5     1818    5
whitaker3   pmetis    135     0     406     0     719     0     1237    0     1891    0
whitaker3   kmetis    133     3     446     3     769     3     1200    3     1824    3
whitaker3   growing   304     0     728     0     1642    0     3449    0     7019    0
whitaker3   RMSATS    128     0     385     5     687     5     1093    5     1683    5
crack       pmetis    187     0     382     0     773     0     1255    0     1890    0
crack       kmetis    225     1     408     3     809     3     1218    3     1882    3
crack       growing   399     0     935     0     2048    0     4196    0     8403    0
crack       RMSATS    184     0     363     2     685     4     1098    5     1689    5
fe_4elt2    pmetis    130     0     359     0     654     0     1152    0     1787    0
fe_4elt2    kmetis    132     0     398     3     684     3     1149    3     1770    3
fe_4elt2    growing   244     0     814     0     1823    0     3685    0     7550    0
fe_4elt2    RMSATS    130     0     349     0     622     4     1004    5     1663    5
4elt        pmetis    154     0     406     0     635     0     1056    0     1769    0
4elt        kmetis    225     0     344     2     614     3     1099    3     1784    3
4elt        growing   735     0     1848    0     4205    0     8684    0     17445   0
4elt        RMSATS    187     0     437     3     649     5     945     5     1534    5
cs4         pmetis    414     0     1154    0     1746    0     2538    0     3579    0
cs4         kmetis    410     0     1173    3     1677    3     2521    3     3396    3
cs4         growing   1277    0     3489    0     7672    0     15685   0     27983   0
cs4         RMSATS    377     0     1018    1     1484    4     2245    4.6   3056    4.9
cti         pmetis    334     0     1113    0     2110    0     3181    0     4605    0
cti         kmetis    395     2     1132    1     2130    2     3451    3     4713    3
cti         growing   1698    0     4446    0     9548    0     19402   0     34918   0
cti         RMSATS    588     1     1177    5     1853    5     2875    5     4059    5
memplus     pmetis    6337    0     10559   0     13110   0     14942   0     17303   0
memplus     kmetis    6453    3     10483   3     12615   3     14604   6     16821   6
memplus     growing   13433   0     23063   0     27957   0     30818   0     33417   0
memplus     RMSATS    8570    4     12438   5     13640   5     15301   5     16269   5
fe_sphere   pmetis    440     0     872     0     1330    0     2030    0     2913    0
fe_sphere   kmetis    444     0     903     3     1306    3     2012    2     2842    2
fe_sphere   growing   472     0     1342    0     2914    0     6050    0     12294   0
fe_sphere   RMSATS    386     0     774     4     1226    4     1726    5     2511    5
wing_nodal  pmetis    1820    0     4000    0     6070    0     9290    0     13237   0
wing_nodal  kmetis    1855    0     4355    2     6337    3     9465    3     12678   3
wing_nodal  growing   5368    0     14118   0     29960   0     39425   0     44408   0
wing_nodal  RMSATS    1723    0     3816    5     5419    5     8392    5     11741   5
wing        pmetis    950     0     2086    0     3205    0     4666    0     6700    0
wing        kmetis    909     1     1943    3     3120    3     4652    3     6613    3
wing        growing   2785    0     7062    0     15375   0     31486   0     63456   0
wing        RMSATS    1353    0     2069    5     2877    5     4254    5     5938    5
vibrobox    pmetis    12427   0     21471   0     28177   0     37441   0     46112   0
vibrobox    kmetis    11952   1     23141   2     29640   3     38673   3     45613   3
vibrobox    growing   27835   0     64754   0     88603   0     105714  0     118650  0
vibrobox    RMSATS    11604   0     20500   5     30572   5     36487   5     44678   5
fe_ocean    pmetis    505     0     2039    0     4516    0     9613    0     14613   0
fe_ocean    kmetis    536     0     2194    2     5627    2     10253   3     16604   3
fe_ocean    growing   2376    0     7938    0     17855   0     37430   0     76065   0
fe_ocean    RMSATS    464     0     2349    1     5481    5     9051    5     13923   5
fe_tooth    pmetis    4292    0     8577    0     13653   0     19346   0     29215   0
fe_tooth    kmetis    4262    2     7835    3     13544   3     20455   3     28572   3
fe_tooth    growing   7785    0     32262   0     71076   0     145539  0     218804  0
fe_tooth    RMSATS    4086    0     11817   5     13152   5     21562   5     26981   5
598a        pmetis    2504    0     8533    0     17276   0     28922   0     44760   0
598a        kmetis    2533    0     8495    1     17137   3     28647   3     44398   3
598a        growing   26129   0     69388   0     145234  0     181053  0     243981  0
598a        RMSATS    3881    0     12603   5     22734   5     27976   5     40896   5
wave        pmetis    9493    0     23032   0     34795   0     48106   0     72404   0
wave        kmetis    9655    0     21682   2     33146   3     48183   3     67860   3
wave        growing   32380   0     85289   0     184382  0     381418  0     514427  0
wave        RMSATS    12744   0     21390   5     38522   5     51920   5     64983   5

5. Conclusions

In this paper we have developed a new algorithm for the circuit partitioning problem in the framework of parallel circuit testing. More specifically, it has been included in a parallel test pattern generator based on partitioning the circuit under test.
The problem is formulated as a combinatorial optimization problem by using a cost function comprising the contribution of the number of cuts and the deviation with respect to a balanced distribution of the gates among the different subcircuits. The new algorithm, called MSATS, has been proposed to obtain good solutions in a small amount of time. It reduces the possibility of cycles in the search process by applying the Tabu Search characteristics to a Simulated Annealing algorithm. The results provided show that MSATS outperforms the TS and SA algorithms when applied to the same cost function. To compare with other state-of-the-art approaches, an extended version has been included, and the results show that this technique is comparable in terms of solution quality and even run time.

Acknowledgments

We thank Dr. Inmaculada Garcı́a for critically reading the manuscript and making several useful remarks. This work has been supported by project TIC2000-1348 (CICYT, Spain).

References

1. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines. A Stochastic Approach to Combinatorial Optimization and Neural Computing, John Wiley & Sons: New York, 1990.
2. C.J. Alpert and A. Kahng, “Recent developments in netlist partitioning: A survey,” Integration: The VLSI Journal, vol. 19, no. 1/2, pp. 1–81, 1995.
3. A.A. Andreatta and C.C. Ribeiro, “A graph partitioning heuristic for the parallel pseudo-exhaustive logical test of VLSI combinational circuits,” Annals of Operations Research, vol. 50, pp. 1–36, 1994.
4. S. Areibi and A. Vannelli, “Advanced search technique for circuit partitioning,” in DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 16, pp. 77–98, 1993.
5. P. Banerjee, Parallel Algorithms for VLSI Computer Aided Design, Prentice Hall: Englewood Cliffs, NJ, 1994.
6. F. Brglez and H. Fujiwara, “Neutral netlist of ten combinational benchmark circuits and a target translator in FORTRAN,” in Proc. IEEE Int. Symp.
Circuits Syst., Special Session ATPG, 1985.
7. J.A. Chandy and P. Banerjee, “A parallel circuit-partitioned algorithm for timing-driven standard cell placement,” Journal of Parallel and Distributed Computing, vol. 57, pp. 64–90, 1999.
8. J. Cong and M. Smith, “A parallel bottom-up clustering algorithm with applications to circuit partitioning in VLSI design,” in Proc. ACM/IEEE Design Automation Conference, 1993, pp. 755–760.
9. K.A. Dowsland, “Simulated annealing,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 20–69.
10. C. Fiduccia and R. Mattheyses, “A linear time heuristic for improving network partitions,” in Proc. 19th IEEE Design Automation Conference, pp. 175–181, 1982.
11. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman: San Francisco, 1979.
12. C. Gil and J. Ortega, “A parallel test pattern generator based on Reed-Muller spectrum,” in Euromicro Workshop on Parallel and Distributed Processing, IEEE, pp. 199–204, 1997.
13. C. Gil and J. Ortega, “Algebraic test-pattern generation based on the Reed-Muller spectrum,” IEE Proceedings Computers and Digital Techniques, vol. 145, no. 4, pp. 308–316, 1998.
14. C. Gil, J. Ortega, and M.G. Montoya, “Parallel VLSI test in a shared memory multiprocessors,” Concurrency: Practice and Experience, vol. 12, no. 5, pp. 311–326, 2000.
15. J. Gilbert, G. Miller, and S. Teng, “Geometric mesh partitioning: Implementation and experiments,” in Proceedings of International Parallel Processing Symposium, 1995.
16. F. Glover and M. Laguna, “Tabu Search,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 70–150.
17. T. Goehring and Y. Saad, “Heuristic algorithms for automatic graph partitioning,” Technical Report UMSI94-29, University of Minnesota Supercomputing Institute, 1994.
18. S.W. Hadley, B.L. Mark, and A.
Vannelli, “An efficient eigenvector approach for finding netlist partitions,” IEEE Trans. on Computer-Aided Design, vol. 11, no. 7, pp. 885–892, 1992.
19. B. Hendrickson and R. Leland, “A multilevel algorithm for partitioning graphs,” in Proceedings Supercomputing ’95, ACM Press, 1995.
20. D.S. Johnson, C.R. Aragon, and L.A. McGeoch, “Optimization by simulated annealing: An experimental evaluation, Part I: Graph partitioning,” Operations Research, vol. 37, pp. 865–892, 1989.
21. G. Karypis and V. Kumar, “Multilevel K-way partitioning scheme for irregular graphs,” Journal of Parallel and Distributed Computing, vol. 48, no. 1, pp. 96–129, 1998.
22. B.W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” The Bell Sys. Tech. Journal, pp. 291–307, 1970.
23. R.H. Klenke, R.D. Williams, and J.H. Aylor, “Parallel-processing techniques for automatic test pattern generation,” IEEE Computer, pp. 71–84, 1992.
24. V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing. Design and Analysis of Algorithms, The Benjamin/Cummings Publishing, 1994.
25. V. Kumar, A. Grama, and V.N. Rao, “Scalable load balancing techniques for parallel computers,” Journal of Parallel and Distributed Computing, vol. 22, pp. 60–79, 1994.
26. C.R. Reeves, “Genetic algorithms,” in Modern Heuristic Techniques for Combinatorial Problems, C.R. Reeves (Ed.), Blackwell: London, 1993, pp. 151–196.
27. L.A. Sanchis, “Multiple-way network partitioning with different cost functions,” IEEE Trans. on Comp., vol. 42, no. 12, pp. 1500–1504, 1993.
28. K. Schloegel, G. Karypis, and V. Kumar, “Graph partitioning for high performance scientific simulations,” in CRPC Parallel Computing Handbook, Morgan Kaufmann: San Mateo, CA, 2000.
29. I. Shperling and E.J. McCluskey, “Circuit segmentation for pseudo-exhaustive testing via simulated annealing,” in International Test Conference IEEE, 1987.
30. A.J. Soper, C. Walshaw, and M.
Cross, “A combined evolutionary search and multilevel optimisation approach to graph partitioning,” Mathematics Research Report 00/IM/58, University of Greenwich, 2000.
31. C. Walshaw and M. Cross, “Mesh partitioning: A multilevel balancing and refinement algorithm,” SIAM J. Sci. Comput., vol. 22, no. 1, pp. 63–80, 2000.