EvCA`96 - PRELIMINARY INFORMATION SLIP

NIKITIN A., NIKITINA L. THE GA BASED APPROACH TO OPTIMIZATION OF PARALLEL CALCULATIONS IN LARGE PHYSICS PROBLEMS.
The GA Based Approach to Optimization of Parallel Calculations in Large
Physics Problems.
Dr. Andrey Nikitin
Dr. Ludmila Nikitina
Moscow State University
[email protected]
ABSTRACT.
The parallelization of computational algorithms for large physics problems on
multiprocessor computer systems containing a thousand or more processor elements
requires special tools. In the present work, an approach based on a Genetic
Algorithm (GA) is proposed. Experimental results and the influence of the GA parameters
on the convergence of the method are given. The approach may be used for real-time
parallelization.
1. INTRODUCTION.
Solving the parallelization problem for a multiprocessor computer system
containing a thousand or more processor elements requires a huge number of
operations. The problem is NP-complete, so the traditional algorithm of
exhaustive search cannot be applied for this purpose.
The Genetic Algorithm (GA) [1] is a stochastic search technique based on ideas
adopted from nature. GAs have been successfully applied to many NP-complete
problems, such as the Travelling Salesman Problem and the mapping problem [5].
The existing methods do not allow exploiting high-level parallelism or taking into
account the performance of the communication channels between processor elements.
In the present work, an approach based on the Genetic Algorithm is proposed. The
method may be applied to computer systems with a huge number of processor elements
to utilize high-level parallelism as well as operation-level parallelism.
The parallelization problem is formulated here as the optimisation of a partitioning
of a graph into subgraphs.
2. BACKGROUND.
The criterion for distributing nodes between subgraphs is the minimization of the
running time of the algorithm on the system together with the time required for data
transfer between processor elements. Each subgraph thus corresponds to one processor
element.
The computational algorithm can be represented as a weighted graph [3]. The
functional that estimates the running time of the algorithm and of the data transfer is
determined by a given partition R. This functional is described below.
A program is represented by an acyclic directed weighted graph $G_a = \langle S, C \rangle$. The
nodes of the graph are the statements of the algorithm, and the edges represent the
information exchange between them. C is the vector of node weights (calculation costs) and
S is the adjacency matrix of the graph. An acyclic graph can be decomposed into layers;
the nodes belonging to the same layer can be executed simultaneously. Let h be the
number of layers. The decomposition into layers is described by a matrix $H_{h \times n}$, where $h_{kj}$ is
1 if the j-th node belongs to the k-th layer and 0 otherwise.
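The layer decomposition can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say which layering is used, so the sketch places every node in the earliest layer allowed by its predecessors, and the function name `layer_matrix` and the list-of-lists representation are hypothetical.

```python
from collections import deque


def layer_matrix(S):
    """Build H (h x n) for a DAG given by adjacency matrix S.

    S[l][m] != 0 means there is an edge from node l to node m.
    Assumption: each node goes to the earliest possible layer.
    """
    n = len(S)
    indegree = [sum(1 for l in range(n) if S[l][m]) for m in range(n)]
    layer = [0] * n
    queue = deque(j for j in range(n) if indegree[j] == 0)
    visited = 0
    while queue:
        l = queue.popleft()
        visited += 1
        for m in range(n):
            if S[l][m]:
                # A node must come after all of its predecessors.
                layer[m] = max(layer[m], layer[l] + 1)
                indegree[m] -= 1
                if indegree[m] == 0:
                    queue.append(m)
    if visited != n:
        raise ValueError("the graph contains a cycle")
    h = max(layer) + 1 if n else 0
    return [[1 if layer[j] == k else 0 for j in range(n)] for k in range(h)]


# Diamond-shaped graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
S = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
H = layer_matrix(S)
```

For this small graph, nodes 1 and 2 end up in the same layer and can therefore be executed simultaneously.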
A multiprocessor computer system is represented by a weighted graph $G_c = \langle \Pi, \Lambda \rangle$.
The nodes of the graph are processor elements, and $\pi_i$ is the performance of the i-th processor
element. There is an edge between the i-th and j-th nodes if there is a link between the
corresponding processors, and $\Lambda_{p \times p}$ is the communication performance matrix.
We use the following notation:
n is the number of statements in the algorithm.
p is the number of processor elements in the computer.
$R_{p \times n}$ is the partition matrix: $r_{il}$ equals 1 if node l belongs to the i-th subgraph and
0 otherwise.
We use a matrix $G_{h \times p}$ whose element $g_{ki}$ equals the total weight of the nodes belonging to
the i-th subgraph and the k-th layer of the algorithm graph:
\[ g_{ki}(R) = \sum_{l=1}^{n} h_{kl} \, r_{il} \, c_l . \]
The cost of the data transfer between processor elements is defined by the matrix $\Delta_{p \times p}$:
\[ \delta_{ij} = \sum_{l=1}^{n} \sum_{m=1}^{n} r_{il} \, r_{jm} \, s_{lm} . \]
It is now possible to define the functional J(R):
\[ J(R) = \frac{\pi_0}{U_A} \left( \sum_{k=1}^{h} \max_{1 \le i \le p} \frac{g_{ki}}{\pi_i} + \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \delta_{ij} L_{ij} \right) \]
where:
$\pi_0$ is the performance of the processor element, and $\pi_i$ is that of the i-th processor element.
$U_A$ is the total weight of the algorithm graph.
The matrix $L_{p \times p}$ gives the minimal distance between the j-th and i-th processors. If both
communicating algorithm nodes are allocated to the same processor, the distance is zero.
To achieve maximum performance, we need to find the partition matrix R that
minimizes J(R):
\[ J(R) \to \min . \]
The GA carries out this optimisation process and builds the matrix R; J(R) is used
as the fitness function.
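A minimal Python sketch of evaluating J(R) from the definitions above may help to fix the notation. It rests on one assumption that should be checked against the original formula: the normalising factor (the reference processor performance divided by the total weight of the algorithm graph) is taken to multiply both the computation term and the communication term. The function name `evaluate_j` and the argument names `perf`, `perf0` and `L` are hypothetical, and matrices are plain lists of lists.

```python
def evaluate_j(R, H, c, S, perf, perf0, L):
    """Evaluate the functional J(R) for a partition matrix R (p x n).

    H: layer matrix (h x n), c: node weights, S: adjacency matrix (n x n),
    perf: performance of each processor, perf0: reference performance,
    L: minimal distance between processors (p x p).
    Assumption: perf0 / U_A scales the sum of both terms.
    """
    p, n, h = len(R), len(c), len(H)
    total_weight = sum(c)  # U_A

    # g[k][i]: total weight of the nodes of layer k placed on processor i
    g = [[sum(H[k][l] * R[i][l] * c[l] for l in range(n)) for i in range(p)]
         for k in range(h)]
    # Each layer takes as long as its slowest processor.
    compute = sum(max(g[k][i] / perf[i] for i in range(p)) for k in range(h))

    def transfer_cost(i, j):
        # Weight of edges from nodes on processor i to nodes on processor j
        return sum(R[i][l] * R[j][m] * S[l][m]
                   for l in range(n) for m in range(n))

    comm = sum(transfer_cost(i, j) * L[i][j]
               for i in range(p - 1) for j in range(i + 1, p))
    return perf0 / total_weight * (compute + comm)


# Diamond-shaped graph 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 on two processors
S = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
H = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
c = [1, 2, 2, 1]
L = [[0, 1], [1, 0]]
split = [[1, 1, 0, 1], [0, 0, 1, 0]]       # node 2 on the second processor
sequential = [[1, 1, 1, 1], [0, 0, 0, 0]]  # everything on one processor
j_split = evaluate_j(split, H, c, S, [1, 1], 1, L)
j_seq = evaluate_j(sequential, H, c, S, [1, 1], 1, L)
```

With this normalisation, placing the whole graph on a single reference processor gives J = 1, so values below 1 indicate a gain from parallel execution.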
3. IMPLEMENTATION DETAILS.
The genetic algorithm can be characterized by the following parameters:
- coding strategy,
- population size,
- initialisation strategy,
- selection technique,
- crossover mechanism,
- mutation mechanism,
- strategy for offspring inclusion.
Below, we briefly describe techniques used in this implementation.
Coding strategy - the partition is encoded by a chromosome, represented as a byte
string whose length is equal to the number of nodes in the algorithm graph. The number
stored at the i-th position (gene) of a chromosome is the number of the processor element on
which the i-th node is placed.
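The relation between a chromosome and the partition matrix R can be illustrated with a short Python sketch. The function name `chromosome_to_partition` is hypothetical, and processor elements are assumed to be numbered from 0.

```python
def chromosome_to_partition(chromosome, p):
    """Decode a byte-string chromosome into a partition matrix R (p x n).

    chromosome[l] is the (0-based) processor number of node l.
    """
    n = len(chromosome)
    R = [[0] * n for _ in range(p)]
    for node, processor in enumerate(chromosome):
        if not 0 <= processor < p:
            raise ValueError("gene refers to a non-existent processor")
        R[processor][node] = 1
    return R


# Four nodes on two processors: node 2 is placed on processor 1.
R = chromosome_to_partition(bytes([0, 0, 1, 0]), p=2)
```

Every column of R contains exactly one 1, so any byte string with genes in the range 0..p-1 is a valid partition and no repair step is needed after crossover or mutation.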
Population size - for the algorithm presented in this paper, there are no restrictions
on the number of chromosomes in the population.
Initialisation strategy - the initial elements can be chosen randomly or generated by a
modified weighted random procedure.
Selection technique - two methods of choosing an individual for reproduction are
used and compared:
- roulette wheel: the probability that individual i with fitness f_i is used as a parent
is proportional to its relative fitness in the population;
- linear rank selection: the probability that individual i with fitness f_i is used as a
parent is based on its rank in the population as a whole.
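The two selection schemes can be sketched in Python as follows. Two assumptions are made that the text does not specify: since J(R) is minimised, the roulette wheel uses 1/J as the fitness of an individual, and linear rank selection uses weights proportional to the rank (the worst individual has rank 1). The function names are hypothetical.

```python
import random


def roulette_select(population, costs, rng=random):
    """Pick one parent with probability proportional to 1 / J.

    Assumption: costs are positive values of J, so a lower J gives a
    higher fitness.
    """
    weights = [1.0 / cost for cost in costs]
    return rng.choices(population, weights=weights, k=1)[0]


def linear_rank_select(population, costs, rng=random):
    """Pick one parent with probability proportional to its rank.

    Assumption: the worst individual (largest J) has rank 1 and the best
    has rank len(population).
    """
    worst_first = sorted(range(len(population)), key=lambda i: costs[i],
                         reverse=True)
    weights = [0] * len(population)
    for rank, index in enumerate(worst_first, start=1):
        weights[index] = rank
    return rng.choices(population, weights=weights, k=1)[0]


population = [bytes([0, 0, 1]), bytes([0, 1, 1]), bytes([1, 1, 1])]
costs = [1.0, 2.0, 4.0]  # values of J; the first individual is the best
parent = linear_rank_select(population, costs, random.Random(0))
```

Rank selection depends only on the ordering of the individuals, so it is insensitive to the scale of J, whereas the roulette wheel reacts to the actual fitness values.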
Crossover mechanism - we consider three crossover operators:
- one-point crossover (1-PC): a pair of chromosomes representing the selected
individuals is cut at a randomly chosen point and the right-hand genes are
swapped between the chromosomes;
- two-point crossover (2-PC): a pair of chromosomes representing the selected
individuals is cut at two randomly chosen points and the genes lying between
those two points are exchanged between the chromosomes;
- uniform crossover: each gene is picked at random from either of the
two parent chromosomes.
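The three operators listed above can be sketched in Python on byte-string chromosomes. The function names are hypothetical, the chromosomes are assumed to have equal length of at least 2, and the uniform operator is assumed to return two complementary offspring (the text only says that each gene comes from either parent).

```python
import random


def one_point(a, b, rng=random):
    """Swap the right-hand parts of a and b after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def two_point(a, b, rng=random):
    """Exchange the genes lying between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform(a, b, rng=random):
    """Take each gene from either parent with equal probability."""
    mask = [rng.random() < 0.5 for _ in a]
    first = bytes(x if keep else y for x, y, keep in zip(a, b, mask))
    second = bytes(y if keep else x for x, y, keep in zip(a, b, mask))
    return first, second


parent_a = bytes([0] * 8)
parent_b = bytes([1] * 8)
child_1, child_2 = one_point(parent_a, parent_b, random.Random(1))
```

Because a gene is simply a processor number, all three operators always produce valid partitions.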
Mutation mechanism - a node is moved to a randomly chosen subgraph, or two nodes
are exchanged between subgraphs.
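Both mutation variants can be sketched in Python as follows. The function names `move_mutation` and `swap_mutation` are hypothetical, and processor elements are assumed to be numbered from 0 to p-1.

```python
import random


def move_mutation(chromosome, p, rng=random):
    """Move one randomly chosen node to a randomly chosen subgraph."""
    genes = bytearray(chromosome)
    node = rng.randrange(len(genes))
    genes[node] = rng.randrange(p)
    return bytes(genes)


def swap_mutation(chromosome, rng=random):
    """Exchange the subgraphs of two randomly chosen nodes."""
    genes = bytearray(chromosome)
    i, j = rng.sample(range(len(genes)), 2)
    genes[i], genes[j] = genes[j], genes[i]
    return bytes(genes)


original = bytes([0, 1, 2, 0, 1, 2])
moved = move_mutation(original, p=3, rng=random.Random(0))
swapped = swap_mutation(original, rng=random.Random(0))
```

The swap variant keeps the number of nodes per processor unchanged, while the move variant can change the load of two processors at once.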
Strategy for offspring inclusion - after each reproduction, a decision must be
made whether or not an offspring should replace its parent. We assume that the better of
the offspring and the better of the parents are included.
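The inclusion rule can be written as a small Python sketch. The function name `include_offspring` is hypothetical, and `cost` stands for any callable that returns J for a chromosome (lower is better).

```python
def include_offspring(parents, offspring, cost):
    """Keep the better parent and the better offspring (lower cost wins)."""
    best_parent = min(parents, key=cost)
    best_child = min(offspring, key=cost)
    return [best_parent, best_child]


# Toy cost for illustration only: the sum of the genes.
survivors = include_offspring(
    parents=[bytes([0, 1]), bytes([1, 1])],
    offspring=[bytes([0, 0]), bytes([1, 0])],
    cost=sum,
)
```

Keeping the better parent means that the best individual of a mating pair is never lost, while the better offspring still enters the population.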
4. RESULTS.
The method was investigated on a real problem with a known exact solution: the
self-consistent simulation of negative central shear discharges. The graph has 1563 nodes.
All results are averaged over 100 runs. The functional E defines the relative accuracy and
is used to study the GA:
\[ E = \frac{J - J_{ex}}{J_{ex}} \times 100\% \]
where J is the solution found by the GA and
$J_{ex}$ is the exact solution of the parallelization problem (calculated analytically).
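The accuracy measure and its averaging over runs can be sketched in Python as follows. The function names are hypothetical and the numbers in the example are made up for illustration; they are not results from the experiments.

```python
def relative_error(j_found, j_exact):
    """Relative error E in percent of a GA solution against the exact one."""
    return (j_found - j_exact) / j_exact * 100.0


def mean_relative_error(j_values, j_exact):
    """Average E over several independent GA runs."""
    return sum(relative_error(j, j_exact) for j in j_values) / len(j_values)


# Illustrative values only: two runs against an exact solution of 1.0
average_e = mean_relative_error([1.2, 1.4], j_exact=1.0)
```

Since the GA minimises J and the exact solution is the minimum, E is non-negative and reaches zero only when the exact partition is found.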
Firstly, the influence of the population size on the convergence of the GA was studied.
The results are presented in Figure 1. Figure 2 helps to choose the population size
for a predefined precision. Larger populations show better results in the high-precision case.
[Figure 1: relative error E (%) versus iteration number t, one curve per population size (10, 30, 50, 100, 500, 1000).]
Figure 1. Influence of the population size on the convergence of the GA.
[Figure 2: number of iterations versus population size (10, 30, 50, 100, 500, 1000), one curve per precision level (75%, 55%, 45%).]
Figure 2. Influence of the population size on the number of iterations needed to achieve a
predefined precision (55%, 45%, 75%).
Mutation brings new information into the chromosomes and prevents the loss of diversity.
Figures 3 and 4 present the results of comparing different mutation schemes. The random
movement of nodes shows better results than the exchange of nodes between subgraphs.
[Figure 3: relative error E (%) versus iteration number t, one curve per mutation operator (Swap, Flip).]
Figure 3. Convergence of the method for different mutation schemes.
[Figure 4: relative error E (%) versus iteration number t, one curve per mutation probability (0%, 0.01%, 1%, 2%, 5%, 10%).]
Figure 4. Convergence of the method for different mutation rates.
After that, the selection strategies were compared (Figure 5). The rank and tournament
selection schemes demonstrate the best results. The GA with rank selection converges to the
exact solution slightly faster.
[Figure 5: relative error E (%) versus iteration number t, one curve per selection operator (Rank, Roulette, Tournament, DS, SRS, Uniform).]
Figure 5. Comparison of selection strategies.
In our last experiment, we analysed the influence of the crossover operator on the
performance of the method. The experiment was carried out with the crossover probability
varying from 0 to 1.0 and the mutation rate varying from 0.001 to 0.5. Figure 6 shows the
results of this comparison. Each operator was applied with the probability that gave the
best results. Uniform crossover gives better results than the other operators.
[Figure 6: relative error E (%) versus iteration number t, one curve per crossover operator (1pt, 2pts, Uni, EvenOdd, PartialMatch).]
Figure 6. Comparison of crossover operators.
[Figure 7: relative error E (%) versus iteration number t, one curve per crossover probability (0%, 20%, 60%, 80%, 90%, 100%).]
Figure 7. Performance of the algorithm for different crossover rates.
CONCLUSION.
An approach to solving the parallelization problem on the basis of the Genetic
Algorithm is suggested. A numerical study of this method was carried out. The GA showed fast
convergence and may be used for real-time parallelization.
REFERENCES.
[1] D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine
Learning, Addison-Wesley Publishing Company, 1989
[2] A.V. Nikitin, Optimization of Modular Associative Memory, Computational
Mathematics and Modeling, 1999, Vol. 10, No. 4, pp. 405-412
[3] N.M. Ershov, A.M. Popov, Optimization of Parallel Computations by
Monte-Carlo Method, In: Proceedings of PaCT-93, Obninsk, Russia, 1993, Vol. 3
[4] G. von Laszewski, Intelligent Structural Operators for the K-way Graph
Partitioning Problem, In: Proceedings of the 4-th ICGA, 1991
[5] T. Kalinowski, Solving the Mapping Problem with Genetic Algorithm on the
MasPar-1, In: Proceedings of MPCS, Ischia, Italy, 1994