MACRo 2015 - 5th International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics

Synchronization and Load Distribution Strategies for Parallel Implementations of P-graph Optimizer

Anikó BARTOS1, Botond BERTÓK2
1 Department of Computer Science and Systems Technology, Faculty of Information Technology, University of Pannonia, Veszprem, Hungary, e-mail: [email protected]
2 Department of Computer Science and Systems Technology, Faculty of Information Technology, University of Pannonia, Veszprem, Hungary, e-mail: [email protected]

Manuscript received January 12, 2015, revised February 9, 2015.

Abstract: Process Network Synthesis aims at determining the optimal or n-best process structures of a production system, as well as the optimal volumes of the constituting operating units. In the P-graph framework, algorithm ABB provides the n-best structurally different process networks. It is widely applied to the optimal design of manufacturing systems as well as business processes and supply chains; thus its efficient implementation is essential in practice. The present work introduces a novel cooperative parallel implementation of algorithm ABB and compares it to former load distribution strategies.

Keywords: P-graph, search strategy, parallel computing, optimization

1. Process Network Synthesis

Process Network Synthesis (PNS) was introduced formally by Friedler and Fan in 1992 [1]. The problem can be formulated as an MILP; however, incorporating application-field-specific logical implications into the decision procedures may lead to faster optimization software than a general MILP solver, and results in more practical solutions. The logical implications rely on an unambiguous graphical representation of the process structures by P-graphs [2].
2. Search strategy

Numerous search strategies exist to find the optimal solution of a PNS problem without necessarily enumerating each of the structures that can be constructed from the building blocks, called operating units, given in the problem definition [2-4]. The accelerated branch-and-bound or ABB algorithm reduces both the number and the complexity of the subproblems to be visited during the search by logical implications, due to the mathematical foundations of the P-graph framework [2-4]. These foundations include an unambiguous graph representation of the structural properties of process networks by P-graphs, and the expression of the necessary conditions for process networks to be structurally or combinatorially feasible by axioms.

DOI: 10.1515/macro-2015-0030

The algorithm ABB follows the branch-and-bound technique with disjoint branches. Contrary to general-purpose solvers, algorithm ABB provides the n-best suboptimal structures or flowsheets in addition to the optimal one, where n is given by the user before executing the algorithm. A structure is considered suboptimal if it does not include a better substructure [5]. Algorithm ABB constructs a process structure in the retrosynthetic direction, i.e., backward from the products to the raw materials. Decisions are made on the sets of operating units producing a product or an intermediate material which is consumed. After each decision, logical implications are applied to reduce the number and size of the subproblems. In the worst case, algorithm ABB visits every structurally feasible structure.

Figure 1: Search tree.

The algorithm InsideOut [4] differs from the original ABB in the sense that decisions are made only on those operating units appearing with nonzero volume in the continuous relaxation of the actual problem. In the worst case, algorithm InsideOut visits only those structures that are feasible according to the relaxed continuous model of the problem.
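To fix ideas, the n-best bounding scheme described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: branch, is_solution, bound, and objective are hypothetical callbacks standing in for the P-graph-specific procedures (axiom-based reduction, analysis of the relaxed model), and the container of open subproblems is a simple stack.

```python
def insideout_search(root, n_best, branch, is_solution, bound, objective):
    """Sketch of an n-best branch-and-bound loop with disjoint branches.

    `branch`, `is_solution`, `bound`, and `objective` are hypothetical
    callbacks standing in for the P-graph-specific procedures."""
    open_subproblems = [root]     # container of open subproblems (stack)
    solutions = []                # best solutions found so far, at most n_best
    cutoff = float("inf")         # bound: value of the worst retained solution
    while open_subproblems:
        sub = open_subproblems.pop()
        if bound(sub) >= cutoff:  # bounding: cannot beat the stored solutions
            continue
        if is_solution(sub):
            solutions.append(sub)
            solutions.sort(key=objective)
            del solutions[n_best:]             # revise the solution container
            if len(solutions) == n_best:
                cutoff = objective(solutions[-1])
        else:
            # disjoint branches: include or exclude a selected operating unit
            open_subproblems.extend(branch(sub))
    return solutions
```

Once n solutions are stored, the cutoff equals the worst retained objective value, so any subproblem whose bound cannot beat it is discarded without branching.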
It is called InsideOut since, in the original ABB, the combinatorial part controls the search and the analysis of the relaxed model has lower priority, while in InsideOut the search is controlled according to the relaxed model, and logical implications are then executed by combinatorial analysis. The algorithm InsideOut examines a subproblem in each of its iterations until the container of open subproblems becomes empty. If the subproblem of interest is not a solution, it is branched, i.e., two subproblems are generated by the inclusion or exclusion of a selected operating unit. Both newly generated subproblems are examined for feasibility and, if feasible, included in the storage of subproblems. If the subproblem is a solution, it is analyzed whether it is better or worse than the previously stored solutions. Both the container of open subproblems and the container of solutions are revised according to the new solution. Fig. 1 shows a binary tree illustrating the steps of the search. Each of its vertices represents a subproblem. The search goes top-down. The width and depth of the search tree depend on the problem to be solved and the search strategy followed. The search time is expected to be reduced by a parallel implementation of the algorithm. The forthcoming sections present the parallel realization of algorithm InsideOut and the best parameter settings for its efficient execution.

3. Cooperative Parallel Implementation of Algorithm InsideOut

A parallel version of algorithm ABB (ABBP) for multi-thread execution was introduced by Varga et al. in 1995, following the master-slave synchronization scheme [3]. The topology of the parallel computing processing elements at that time limited the realization of the information flow. In that work the slaves form a ring topology, as shown in Fig. 2.
Figure 2: Ring topology for the master-slave implementation of algorithm ABB.

In each block, the transfer and the multiplexer receive and send the messages. It is easy to see that sending a message from the master to the last slave is slow, because the information has to go through all the slaves to arrive at the last one. Similarly, when one of the middle slaves sends data back to the master, it has to travel a long path, because none of the slaves in the middle is connected to the master directly. Furthermore, the master's only task is to control the slaves' work.

During the last decade, more and more cores have appeared in processors with equal access to the same memory. This has enabled the creation of a parallel algorithm with flexible topology and load distribution. The two major problems of a parallel implementation are how to balance the loads of the cores and, at the same time, how to minimize the need for communication between them. With frequent communication, the communication time can slow down the algorithm; but if communication is rare, unnecessary calculations are made due to the lack of information. The reason is the following. A so-called bounding procedure helps eliminate useless subproblems, i.e., it can be proven that subproblems with a worse estimated objective value than the bound never yield better solutions than the ones already available. However, in parallel execution, while one thread finds a solution resulting in an update of the bound, the other threads still analyze their own subproblems, and can only eliminate some of them once they receive information about the updated bound.

Figure 3: Architecture of the shared-memory implementation for the parallel execution of algorithm InsideOut.

The parallelization requires some modifications to the simple InsideOut algorithm. Instead of a single subproblem container, each thread has its own (Fig. 3).
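One way to realize the shared bound of the architecture in Fig. 3 is to protect it with a lock, so that a solution found by one thread tightens the pruning of all the others at their next bound check. The class below is an illustrative sketch under that assumption, not the paper's actual code; the names SharedBound, try_update, and prunable are hypothetical.

```python
import threading

class SharedBound:
    """Illustrative sketch of a shared bound whose access is limited to a
    single thread at a time, as required by the shared-memory scheme."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cutoff = float("inf")

    def try_update(self, value):
        # Called when a thread finds a solution with objective `value`;
        # returns True if the bound was actually tightened.
        with self._lock:
            if value < self._cutoff:
                self._cutoff = value
                return True
            return False

    def prunable(self, estimate):
        # A subproblem whose estimated objective value is worse than the
        # bound never yields a better solution and can be discarded.
        with self._lock:
            return estimate >= self._cutoff
```

Between two calls to prunable, a thread may still work on subproblems another thread's solution has already made useless, which is exactly the stale-bound effect described above.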
If such a local subproblem container becomes empty, the thread sends a request to the others. There is a common storage, let us call it the postbox, where the threads put subproblems addressed to the others. When a thread accepts a request, it shares a subproblem, i.e., it deletes the subproblem from its own subproblem container and puts it into the postbox. When a thread receives a subproblem, it moves it from the postbox to its own container. It is important that only one thread can modify the postbox at a time. The solution container and the bounds are shared as well, i.e., their access is also limited to a single thread at a time. A thread shares a subproblem only if the number of subproblems in its local container exceeds a predefined minimum. Fig. 4 depicts the state diagram illustrating the behavior of each thread during the cooperative search. The search ends when the number of requests equals the number of threads, i.e., every local subproblem container has become empty.

Figure 4: Control logic of the cooperative parallel implementation of algorithm InsideOut.

Figs. 5 and 6 show the computation times of the parallel InsideOut algorithm for the test problems. Fig. 5 illustrates the computation time required to determine a single optimal solution, executing the algorithm non-parallel and on up to four threads. In Fig. 6 the test problems and algorithms are the same, but the generation of the 10 best optimal and suboptimal solutions is required. The results show that parallelization increases the running time when moving from the non-parallel version to a single-thread run, but with more than one thread the algorithm accelerates. The acceleration is greater when the algorithm solves more complex problems; e.g., problem_2 and problem_3 are more complex than the other two.

Figure 5: Decrease of the computation time on multiple threads when generating the optimal solution.

Figure 6: Decrease of the computation time on multiple threads when generating the 10 best solutions.

The search tree of the parallel algorithm executed on four threads is illustrated in Fig. 7. Different colors are assigned to different threads. In this execution a thread shared a subproblem only if it was requested and there were more than two subproblems in the local container. It is easily noticeable that the threads are equally loaded.

Figure 7: Multithread search tree.

4. Parameter settings

Further acceleration can be achieved when the parameter settings are optimal. The first parameter is minimum_remaining_subproblem, which sets how many subproblems are required to remain in the local storage before sending a subproblem to another thread.

Figure 8: Computation time with different minimum_remaining_subproblem values.

Fig. 8 shows the expected results: the sooner a subproblem is shared, the more the overall computation time is reduced. If the value is higher, it causes waiting, and the algorithm becomes slower and slower. In Fig. 8, '1' means that more than one subproblem has to remain in the thread's own stack, i.e., if there are two subproblems, the thread shares one of them when one of the other threads is waiting for it. The other question of the subproblem-sharing strategy is what to share. Two options are the LocalNext and GlobalNext subproblems. The differences between these two strategies are represented in the search trees in Figures 9 and 10.
Figure 9: With the GlobalNext sharing strategy the algorithm shares a subproblem from a higher level.

Figure 10: With the LocalNext sharing strategy the algorithm shares a subproblem from a lower level.

Each thread performs a depth-first search. The green part has already been discovered by a thread, and the purple denotes the open subproblems available in the thread's own storage. With the GlobalNext sharing strategy (Fig. 9), the thread shares the subproblem available at the highest level of the search tree. Using the LocalNext sharing strategy, the thread shares the subproblem from the lowest discovered level of the search tree (Fig. 10).

Figure 11: Computation time with the LocalNext and GlobalNext sharing strategies.

The results show (Fig. 11) that if a smaller number of solutions is required, the LocalNext sharing strategy is more effective; however, if numerous solutions are requested, the algorithm should use the GlobalNext sharing strategy to decrease the running time. What counts as a small number depends on the size of the problem. This is explained by the fact that the LocalNext strategy accelerates the depth-first search, while the GlobalNext strategy facilitates the breadth-first search.

Figure 12: Computation time (sec) with worse, better, and average parameter settings.

Fig. 12 illustrates that if the parameters are set improperly, the running time may even double. The difference between the worst and best results demonstrates how important it is to choose the parameters properly.
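The two strategies can be pictured on a thread's container of open subproblems, ordered from the shallowest to the deepest discovered level of its search tree: GlobalNext hands over the subproblem from the highest level, LocalNext the one from the lowest, and nothing is shared while the container holds no more than minimum_remaining subproblems. The function below is an illustrative sketch; share_subproblem and the deque-based container are assumptions, not the authors' actual data structure.

```python
from collections import deque

def share_subproblem(local, strategy, minimum_remaining=1):
    """Pick a subproblem to put into the postbox, or None if too few remain.

    `local` is a thread's deque of open subproblems, ordered from the
    shallowest (front) to the deepest (back) level of its search tree.
    """
    if len(local) <= minimum_remaining:
        return None                  # keep at least this many subproblems
    if strategy == "GlobalNext":
        return local.popleft()       # share from the highest tree level
    if strategy == "LocalNext":
        return local.pop()           # share from the lowest discovered level
    raise ValueError("unknown sharing strategy: %s" % strategy)
```

Popping from the front hands over a large unexplored subtree, which pays off when many solutions are requested, while popping from the back gives away only a small deep subproblem and preserves the thread's own depth-first dive, matching the behavior reported in Fig. 11.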
The results in Table 1 show how much acceleration can be achieved by the parallelization on different problems, for different required numbers of best structures, when the parameter settings are optimal. It can be seen that the acceleration is higher for the larger problems (60-70%), because for the smaller ones the original algorithm is already very fast.

Table 1: Computation time and acceleration on different problems and different required numbers of best structures

Problem name            Number of solutions   Original time (s)   4-thread time (s)   Acceleration
Denmark 3.in                     1                   0.0                 0.0               0%
                                10                   0.1                 0.1               0%
                               100                   0.3                 0.2              33%
route_vp_307_2auto.in            1                  98.3                41.8              57%
                                10                 169.2               101.0              40%
                               100                 884.2               418.3              52%
SNS2_v4.in                       1                   0.0                 0.0               0%
                                10                   8.2                 1.9              77%
                               100                 133.3                48.6              64%
Example72.in                     1                   1.5                 0.6              60%
                                10                   7.5                 4.2              44%
                               100                  39.0                28.9              26%
Average:                                                                                  38%

5. Conclusion

A cooperative shared-memory parallel implementation of the InsideOut algorithm for process network synthesis has been presented herein. Initial tests show that the loads of the threads can be balanced with a relatively low frequency of communication, and that executing the algorithm on more processor cores is faster than on a single one. The results also show that the more complex a problem is, the more its solution can be accelerated by parallel execution. The proposed algorithm has numerous parameters to be fine-tuned for different fields of application, e.g., the minimal number of subproblems in the local subproblem containers before sharing tasks with other threads, or the subproblem sharing strategy.
Acknowledgements

Publication of this paper has been supported by the European Union and Hungary and co-financed by the European Social Fund through the project TÁMOP-4.2.2.C-11/1/KONV-2012-0004 - National Research Center for Development and Market Introduction of Advanced Information and Communication Technologies.

References

[1] Friedler, F., Tarjan, K., Huang, Y. W., Fan, L. T., "Combinatorial Algorithms for Process Synthesis", Computers Chem. Engng., 16, S313-S320 (1992).
[2] Friedler, F., Varga, J. B., Feher, E., Fan, L. T., "Combinatorially Accelerated Branch-and-Bound Method for Solving the MIP Model of Process Network Synthesis", Nonconvex Optimization and Its Applications: State of the Art in Global Optimization, Computational Methods and Applications, 609-626 (1996).
[3] Varga, J. B., Friedler, F., Fan, L. T., "Parallelization of the Accelerated Branch-and-Bound Algorithm of Process Synthesis: Application in Total Flowsheet Synthesis", Acta Chimica Slovenica, 42, 15-20 (1995).
[4] Illés, T., Nagy, Á., "Sufficient optimality criteria for linearly constrained, separable concave minimization problems", Journal of Optimization Theory and Applications, 125(3), 559-575 (2005).
[5] Bertok, B., Barany, M., Friedler, F., "Generating and Analyzing Mathematical Programming Models of Conceptual Process Design by P-graph Software", Industrial & Engineering Chemistry Research, 52(1), 166-171 (2013).