The Impact of Changing the Way the Fitness Function Is Implemented in an Evolutionary Algorithm for the Design of Shapes

Andrés Gómez de Silva Garza
Computer Engineering Department, Instituto Tecnológico Autónomo de México (ITAM), Río Hondo #1, Col. Progreso-Tizapán, 01080 México, D.F., México
[email protected]

Abstract. Evolutionary algorithms (EAs) have been used in many ways for design and other creative tasks. One of the main elements of these algorithms is the fitness function used by the algorithm to evaluate the quality of the potential solutions it proposes. The fitness function guides, constrains, and biases the algorithm's search for an acceptable solution. In this paper we explore the degree to which the fitness function and its implementation affect the search process in an evolutionary algorithm. To do this, the reliability and speed of the algorithm, as well as the quality of the designs it produces, are measured for different fitness function implementations.

Keywords: Evolutionary algorithms, fitness function, evolutionary design

1 Introduction

Evolutionary algorithms (EAs) are general-purpose search methods that operate on the individuals in a population of potential solutions to a problem [4]. An EA evolves these potential solutions through a series of generations until some convergence criterion is reached. Because EAs embody such a general-purpose search method, it is no surprise that they have often been applied to the support of design and other creative tasks. Two compendia of example systems and applications can be found in [1] and [2].

In many of the examples included in these compendia, the fitness function is not automated. In some of the approaches a user is asked to determine the fitness function by tweaking the values of parameters provided through system interface controls. In others, votes are gathered from a large set of users to rank the solutions proposed by the systems, eliminating the need for a fitness function to be explicitly programmed. In our work, by contrast, we are concerned with the traditional EA method of having a completely preprogrammed, autonomous fitness function integrated into the EA's operation.

The fitness function analyzes the characteristics of the solutions generated by the EA according to how well they match the desired characteristics. These desired characteristics are usually expressed in general terms, and the fitness of a particular solution is determined by measuring the solution against the general pattern. Like any computational algorithm, the fitness evaluation function can be implemented in multiple ways. All of them are equivalent in the sense of describing the same general type of algorithmic solution. However, the specific way in which the fitness function is expressed or implemented could affect the performance of the EA. This paper describes a set of experiments designed to measure this phenomenon.

It is assumed that the reader has a passing knowledge of how a generic EA operates. If this is not the case, the reader is pointed to [4] as a possible source of background knowledge on EAs.
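For concreteness, the skeleton of such a generic EA with a preprogrammed, autonomous fitness function can be sketched as follows. This is a minimal illustrative example on bit strings, not ShEvolver's actual code; all class names, operators, and parameter values here are our own assumptions.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal generational EA sketch (illustrative only): evolves bit strings
// toward all-ones, with a fitness function that returns a value in [0, 1].
public class SimpleEA {
    static final Random RNG = new Random(42);
    static final int POP_SIZE = 50, GENOME_LEN = 20, MAX_GENS = 200;
    static final double MUTATION_RATE = 0.01, TARGET_FITNESS = 0.95;

    // Preprogrammed, autonomous fitness function: fraction of ones.
    static double fitness(boolean[] g) {
        int ones = 0;
        for (boolean b : g) if (b) ones++;
        return (double) ones / g.length;
    }

    static boolean[] randomGenome() {
        boolean[] g = new boolean[GENOME_LEN];
        for (int i = 0; i < GENOME_LEN; i++) g[i] = RNG.nextBoolean();
        return g;
    }

    // Binary tournament selection: the fitter of two random individuals.
    static boolean[] select(List<boolean[]> pop) {
        boolean[] a = pop.get(RNG.nextInt(pop.size()));
        boolean[] b = pop.get(RNG.nextInt(pop.size()));
        return fitness(a) >= fitness(b) ? a : b;
    }

    // One-point crossover followed by per-bit mutation.
    static boolean[] offspring(boolean[] p1, boolean[] p2) {
        boolean[] child = new boolean[GENOME_LEN];
        int cut = RNG.nextInt(GENOME_LEN);
        for (int i = 0; i < GENOME_LEN; i++) {
            child[i] = (i < cut) ? p1[i] : p2[i];
            if (RNG.nextDouble() < MUTATION_RATE) child[i] = !child[i];
        }
        return child;
    }

    public static void main(String[] args) {
        List<boolean[]> pop = new ArrayList<>();
        for (int i = 0; i < POP_SIZE; i++) pop.add(randomGenome());
        for (int gen = 0; gen < MAX_GENS; gen++) {
            double best = 0;
            for (boolean[] g : pop) best = Math.max(best, fitness(g));
            if (best >= TARGET_FITNESS) {              // quality convergence
                System.out.println("Converged in generation " + gen);
                return;
            }
            List<boolean[]> next = new ArrayList<>();  // next generation
            for (int i = 0; i < POP_SIZE; i++)
                next.add(offspring(select(pop), select(pop)));
            pop = next;
        }
        System.out.println("Generation limit reached"); // time convergence
    }
}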
The EA used in the experiments described in this paper is called ShEvolver (a contraction of "shape evolver"). Here we describe ShEvolver briefly to set the context for the experiments we report. A more complete description of ShEvolver can be found in [3], from which a few sections of this paper were taken or adapted (with the permission of the Association for Computing Machinery, ACM, the copyright holder).

The domain of ShEvolver is the evolution of shapes that consist of configurations of colored unit squares. This kind of domain and algorithm can be applied, for example, to the design of bathroom/kitchen tile patterns. The genetic alphabet used by ShEvolver is such that each gene represents the execution of a "move-and-place" action: moving from an initial location, by a distance of one unit square, to a destination location in any one of eight possible directions, and placing a colored unit square there (in one of four possible colors). A genotype is a sequence of such moves that, when followed in succession, produces a configuration of colored unit squares on the "canvas" (initially empty) that the system is working on.
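To make the genotype-to-shape mapping concrete, decoding might look like the following sketch. The class and method names are hypothetical illustrations of the scheme just described, not ShEvolver's actual source; in particular, starting the cursor at the origin and letting later placements overwrite earlier ones are assumptions of this sketch.

import java.util.HashMap;
import java.util.Map;

// Decodes a sequence of "move-and-place" genes into a shape: each gene moves
// the cursor one unit square in one of eight directions and places a unit
// square of one of four colors at the new location.
public class GenotypeDecoder {
    // Offsets for the eight compass directions N, NE, E, SE, S, SW, W, NW.
    static final int[] DX = { 0, 1, 1, 1, 0, -1, -1, -1 };
    static final int[] DY = { 1, 1, 0, -1, -1, -1, 0, 1 };

    public static class Gene {
        final int direction; // 0..7
        final int color;     // 0..3
        public Gene(int direction, int color) {
            this.direction = direction;
            this.color = color;
        }
    }

    // Returns the canvas as a map from grid coordinates to color. Later
    // placements overwrite earlier ones at the same cell (an assumption).
    public static Map<String, Integer> decode(Gene[] genotype) {
        Map<String, Integer> canvas = new HashMap<>();
        int x = 0, y = 0;                     // cursor starts at the origin
        for (Gene g : genotype) {
            x += DX[g.direction];             // move one unit square...
            y += DY[g.direction];
            canvas.put(x + "," + y, g.color); // ...and place a colored square
        }
        return canvas;
    }

    public static void main(String[] args) {
        Gene[] genome = { new Gene(2, 0), new Gene(2, 1), new Gene(0, 2) };
        System.out.println(decode(genome)); // e.g. {1,0=0, 2,0=1, 2,1=2}
    }
}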
Fig. 1. Some shapes that cannot (shapes 1-3), and some that can (shapes 4-7), be produced using ShEvolver

Because there are only eight possible directions of movement available to ShEvolver, some shapes cannot be described in it: shapes in which the sides of adjacent unit squares only partially overlap, shapes whose unit squares do not fit into a regular grid, and shapes that have diagonal edges/borders. Fig. 1 illustrates this, as well as giving examples of shapes that can be produced by ShEvolver. As can be seen in shape 6 in the figure, the shapes that ShEvolver produces may include holes (which could be taken to represent a fifth color, black).

2 Fitness Evaluation

Thirty-six fitness evaluation functions (criteria) have been implemented in ShEvolver, labeled c1-c36. Some of them measure geometric characteristics of shapes, some measure color-related features, and others combine aspects of both. An example of the first type is c4, which measures the bumpiness of a shape, where bumpiness means that the outer edges of the shape are jagged (caused by diagonal adjacencies between the unit squares forming the edges) rather than smooth (caused by horizontal or vertical adjacencies between those unit squares). An example of the second type is c12, which measures the greenness of a shape: the percentage of its unit squares that are green. An example of the third type is c17, which measures the degree to which x-shaped sub-shapes of a given color occur within a shape.

As an example, Fig. 2 shows the algorithm, written in Java, used to measure the bumpiness of a shape (evaluation function c4) and shows how it applies to three small shapes, a, b, and c. In all three cases, a fitness value of 1 is achieved only if the shape has the maximum possible bumpiness given its size (the number of unit squares it is composed of). Otherwise a fitness value between 0 and 1 is assigned to the shape, representing its degree of bumpiness. The degree of bumpiness is calculated by counting the number of angles along the edges of the shape and dividing by its perimeter.

// Calculates the degree of bumpiness of a shape s:
public double c4(Shape s) {
    return (double) numberOfAngles(s) / (double) perimeter(s);
}

a) Bumpiness: 4/4 = 1    b) Bumpiness: 4/6 = 2/3 ≈ 0.67    c) Bumpiness: 8/8 = 1

Fig. 2. The algorithm for calculating the bumpiness of a shape (evaluation criterion c4) and three examples

Each of the evaluation functions in ShEvolver produces a value between 0 (meaning the absence of whatever is being measured) and 1 (meaning the maximum possible amount of whatever is being measured). ShEvolver is designed so that multiple fitness evaluation functions can be applied to the shapes produced by the system before a global fitness value is assigned. When the user decides to use multiple evaluation functions, each one is given the same weight/importance, i.e., the global fitness function consists of a linear combination of the individual evaluation functions.

For example, assume that in a given scenario the fitness function consists of a linear combination of two evaluation criteria, c1 and c2, and that each of these criteria was used to evaluate a shape s. If c1(s) is the fitness value of s when evaluated using c1, and c2(s) is its fitness value when evaluated using c2, then the global, normalized fitness value F for s is calculated as F(s) = (c1(s) + c2(s)) / 2. If three evaluation criteria (c1, c2, and c3) had been used to evaluate s, then F(s) = (c1(s) + c2(s) + c3(s)) / 3, and so on. The result is always a value of F between 0 and 1, inclusive, calculated in such a way that each of the components (individual evaluation criteria) that make up F is given the same importance as the rest.

ShEvolver is set up so that its EA tries to maximize the global fitness value of the shapes it generates (though it could easily be adapted to minimize said value). If a shape ends up with a global fitness value of 0.75, this can be interpreted as meaning that its quality, according to whatever evaluation criteria were used to measure it, is 75%.

Given this basic framework, two versions of ShEvolver were produced, A and B. In ShEvolver A, each of the individual fitness evaluation functions was enhanced by incorporating a verification procedure that checks that the shape meets minimum and maximum width and height requirements. The enhanced version of the algorithm shown in Fig. 2 is the one shown in Fig. 3. The value returned by the algorithm is identical to the value the original version would return, unless the constraints on the dimensions of the shape are not met, in which case the algorithm returns a fitness value of 0.

// Calculates the degree of bumpiness of a shape s:
public double c4(Shape s) {
    int w = getWidth(s), h = getHeight(s);
    double result;
    result = (double) numberOfAngles(s) / (double) perimeter(s);
    if (w < MIN_WIDTH || w > MAX_WIDTH || h < MIN_HEIGHT || h > MAX_HEIGHT)
        result = 0;
    return result;
}

Fig. 3. The modified version in ShEvolver A of the algorithm from Fig. 2

In ShEvolver B, the fitness evaluation functions were left unmodified, but two new functions, c37 and c38, were written and added into the mix. c37 assigns a 1 or a 0 to a shape depending on whether or not its width falls between a minimum and a maximum acceptable value; no values between 0 and 1 are allowed. c38 follows an analogous procedure, but analyzes the height instead of the width.
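As an illustration, c37 and c38 might be written as follows, together with the equal-weight combination described above. These fragments are our reconstruction from the description in the text, in the same style as Figs. 2 and 3 (whose Shape type and getWidth, getHeight, MIN_WIDTH, MAX_WIDTH, MIN_HEIGHT, and MAX_HEIGHT names are assumed here); ShEvolver's actual code may differ.

// Binary width criterion (ShEvolver B): 1 if the width is acceptable, else 0.
public double c37(Shape s) {
    int w = getWidth(s);
    return (w >= MIN_WIDTH && w <= MAX_WIDTH) ? 1.0 : 0.0;
}

// Binary height criterion (ShEvolver B): 1 if the height is acceptable, else 0.
public double c38(Shape s) {
    int h = getHeight(s);
    return (h >= MIN_HEIGHT && h <= MAX_HEIGHT) ? 1.0 : 0.0;
}

// Global fitness: equal-weight linear combination of the active criteria,
// e.g. F(s) = (ci(s) + c37(s) + c38(s)) / 3 when running ShEvolver B.
public double globalFitness(double[] criterionValues) {
    double sum = 0;
    for (double v : criterionValues) sum += v;
    return sum / criterionValues.length;  // always in [0, 1]
}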
When running ShEvolver B, three evaluation functions were used and combined linearly, as described above, to assign a fitness value to each shape produced by the EA: one of c1-c36, plus c37 and c38.

From the above descriptions it is apparent that ShEvolver A and ShEvolver B are not very different from each other. In ShEvolver A, satisfying the constraints on the dimensions of a shape can be considered to contribute 50% of the global fitness value of that shape (the other 50% being due to whichever original evaluation criterion is being used at a given moment). In ShEvolver B, satisfying the constraints on the dimensions of a shape can be considered to contribute 66.7% of the global fitness value (33.3% for width and 33.3% for height), with the final 33.3% being due to whichever original evaluation criterion is being used at a given moment.

3 Experimental Design and Results

In this section we describe an experiment that was designed to compare the individual evaluation functions in ShEvolver. The experiment was repeated for each of two versions, one using ShEvolver A and one using ShEvolver B. Each version was run through two stages.

Stage 1 consisted of several runs. In each run the initial population of the EA was generated at random. The convergence criterion used in each run was a combination of two factors: the EA would stop as soon as it produced the first individual with a fitness value of at least 0.95 (convergence due to achieving a minimum acceptable quality) or as soon as it had reached 5000 evolutionary cycles (convergence due to reaching a maximum acceptable time limit), whichever came first. The limit of 5000 generations was designed to halt the EA when no individual with a fitness of at least 95% was found within a reasonable amount of time, instead of permitting the EA to continue potentially forever, with the possibility of never converging. All other EA parameters, such as population size and crossover and mutation rates, remained fixed across all runs. The minimum and maximum dimension requirements were also fixed across all runs and were set at 5 for both width and height; that is, the EA was being forced to produce shapes measuring exactly 5x5 unit squares, in addition to satisfying whatever other evaluation criterion was active (see below).

In Stage 1 of the experiment we performed 50 runs of ShEvolver using each of the 36 individual evaluation functions, one at a time, for a total of 50x36 = 1800 runs. The variables measured for each run were the number of generations (G) that the EA went through before convergence and the average fitness (AF) of the individuals in the population at the time of convergence. The best design produced at the end of each run was also recorded.

The reason for performing 50 runs for each experimental condition was to filter out any unusual results that may have been due to the random nature of many of the decision points in an EA, such as the initial makeup of its population or the selection of crossover points. By obtaining the mean behavior over 50 runs, general trends in the overall set of results can be observed irrespective of any quirks present in individual runs. For each set of 50 runs corresponding to each evaluation function, the general trends that were focused on were the mean values of G and AF, as well as the reliability of the EA. The reliability is the percentage of runs in which convergence occurred because a shape of sufficient quality had been produced (quality convergence), as opposed to reaching the maximum allowable number of EA generations (time convergence). Given that the experiment consists of comparing different fitness functions, the different reliability values obtained for the different sets of 50 runs can also be used to measure the relative strictness of the fitness functions: fitness functions that are easier to satisfy, for whatever reason, will result in higher reliability values than fitness functions for which it is more difficult to produce high-quality shapes.
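A sketch of how one such set of runs might be tallied is shown below. The per-run data here are illustrative placeholders, not experimental results; AF would be averaged over the runs in the same way as G.

// Tallies one experimental condition (one evaluation function, 50 runs).
// g[i] is the generation at which run i stopped; qualityConv[i] is true if
// run i stopped because an individual reached the 0.95 fitness threshold.
public class Stage1Tally {
    public static void main(String[] args) {
        int[] g = { 12, 5000, 73, /* ...one entry per run... */ 41 };
        boolean[] qualityConv = { true, false, true, /* ... */ true };

        double sumG = 0;
        int hits = 0;
        for (int i = 0; i < g.length; i++) {
            sumG += g[i];
            if (qualityConv[i]) hits++;
        }
        System.out.println("mean G      = " + sumG / g.length);
        System.out.println("reliability = " + (double) hits / g.length);
        // e.g. 48 quality convergences out of 50 runs gives 48/50 = 0.96,
        // which is how Table 1 below reports reliability.
    }
}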
The experimental results given in this section show the results for evaluation functions c4, c12, and c17, although the experiments took into account all 36 evaluation functions, as mentioned above. Tables 1 (reliability), 2 (mean EA generation at which convergence occurred), and 3 (mean fitness of the individuals in the population when convergence occurred) show the results of performing Stage 1 of the experiment both for ShEvolver A and for ShEvolver B. Focusing on the middle column of any table shows how the different evaluation functions compare with each other in ShEvolver A; focusing on the right-most column permits a similar comparison in ShEvolver B. Focusing on any one of the three lower rows of any table shows how the influence of a given evaluation function differs between ShEvolver A and ShEvolver B.

Table 1. Comparison of EA reliability when using evaluation functions c4, c12, and c17 in ShEvolver A and ShEvolver B.

Reliability    ShEvolver A     ShEvolver B
c4             48/50 = 0.96    50/50 = 1
c12            50/50 = 1       50/50 = 1
c17            0               0

Table 2. Comparison of mean EA generation in which convergence occurred when using evaluation functions c4, c12, and c17 in ShEvolver A and ShEvolver B.

Mean convergence    ShEvolver A    ShEvolver B
c4                  272.42         7.9
c12                 186.96         53.08
c17                 5000           5000

Table 3. Comparison of mean fitness of the individuals in the EA population at the time convergence occurred when using evaluation functions c4, c12, and c17 in ShEvolver A and ShEvolver B.

Mean fitness at convergence    ShEvolver A    ShEvolver B
c4                             0.896          0.638
c12                            0.900          0.922
c17                            0.002          0.681

An analysis of the results shown in Tables 1-3 leads to the following observations:

- Table 1: The intuition that the constraints imposed by some evaluation functions are easier to satisfy than others is confirmed by the fact that c12 had a reliability of 1 in both ShEvolver A and ShEvolver B (very easy to satisfy), c17 had a reliability of 0 in both (very difficult to satisfy), and c4 had a reliability of 0.96 in ShEvolver A but 1 in ShEvolver B (somewhat easy to satisfy).

- Table 2: Comparing the ShEvolver A column with the ShEvolver B column indicates that, when convergence is generally due to quality (i.e., under high-reliability conditions), ShEvolver B converges much more quickly than ShEvolver A (272.42/7.9 ≈ 34 times quicker for c4 and 186.96/53.08 ≈ 3.5 times quicker for c12), whereas under low-reliability conditions (c17) there does not seem to be any difference between ShEvolver A and ShEvolver B in when convergence occurs.
- Analyzing the ShEvolver A column of Table 2 seems to confirm the conclusion reached in the Table 1 observation: c12 is an evaluation function that is very easy to satisfy, c17 is very difficult to satisfy, and c4 is somewhat easy to satisfy. However, the ShEvolver B column would suggest that c4 is easier to satisfy than c12. Thus, even tiny differences between comparative experimental scenarios (ShEvolver A vs. ShEvolver B) can produce sets of results that lead to different interpretations.

- Table 3: The mean fitness of the population for c17 in the ShEvolver A column, 0.002, shows that at least a few individuals in the population had non-zero fitness values, even though none of them were good enough to make the EA converge due to quality in any of the runs (as seen from the mean convergence value of 5000 for c17 in Table 2). The results in the ShEvolver A column make it difficult to distinguish c4 from c12 (the mean population fitness values they produced were very similar: 0.896 and 0.900, respectively). However, from the ShEvolver B column one could conclude that c4 is much more difficult to satisfy than c12 (the mean fitness at convergence was only 0.638 for c4 but 0.922 for c12), and in fact that c4 is slightly more difficult to satisfy than c17 (which produced a mean fitness value of 0.681), in direct contradiction with all other indications that c17 is very difficult to satisfy (both in absolute terms and relative to the other evaluation functions analyzed).

In Stage 2 of the experiment, each of the 1800 best designs produced in the runs of Stage 1 was evaluated using each of the 36 individual fitness functions in ShEvolver. The purpose of this stage was to get a better feel for how the evaluation functions compare when used to analyze a set of shapes most of which were not "attuned" to the functions: only 50/1800 = 2.8% of the shapes evaluated by a given function were produced by an EA using that same function as its fitness function.
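This cross-evaluation amounts to averaging a 36 x 1800 matrix of fitness values along its rows. A minimal sketch follows, in the style of the earlier fragments; the method name and argument types are hypothetical stand-ins for ShEvolver's criteria and its set of 1800 designs.

// Stage 2 cross-evaluation sketch: apply every criterion to every design and
// average each criterion's scores over the whole design set.
public static double[] crossEvaluate(
        java.util.List<java.util.function.ToDoubleFunction<Shape>> criteria,
        java.util.List<Shape> designs) {
    double[] meanFitness = new double[criteria.size()];
    for (int c = 0; c < criteria.size(); c++) {
        double sum = 0;
        for (Shape s : designs)
            sum += criteria.get(c).applyAsDouble(s);
        meanFitness[c] = sum / designs.size();  // one bar in Fig. 4 / Fig. 5
    }
    return meanFitness;
}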
Fig. 4 shows the mean fitness values for the 1800 shapes when evaluated by each of the fitness evaluation functions c1-c36 in ShEvolver A, and Fig. 5 shows the corresponding results for ShEvolver B. In both figures the horizontal axis corresponds to the different evaluation functions used and the vertical axis is the mean fitness value awarded by the corresponding function to the entire set of 1800 designs.

Fig. 4. Comparison of applying c1-c36 to 1800 designs in ShEvolver A

Fig. 5. Comparison of applying c1-c36 to 1800 designs in ShEvolver B

As can be observed from comparing Fig. 4 with Fig. 5, the overall behavior in the two graphs is the same as far as which evaluation functions can be interpreted to be very easy, very difficult, or somewhat easy to satisfy. This lends further credence to the earlier classification of c4, c12, and c17 into these three categories. However, the mean fitness over the entire set of 1800 designs is clearly higher (approximately twice as high) in ShEvolver B than in ShEvolver A, for all evaluation functions. This shows very clearly that even a slight difference in the way that fitness evaluation is implemented can cause a great difference in the performance of an EA. We saw this before, in Tables 1-3, when measuring EA search parameters such as reliability and speed of convergence, and we see it again here, in Fig. 4 and Fig. 5, when measuring the quality of the designs produced by the EA.

In fact, we created a third variation of the system, ShEvolver C. In ShEvolver C we combined the functionality of c37 and c38 from ShEvolver B into one new fitness evaluation function, c39, which assigns a fitness value of 1 to a shape only if it satisfies the minimum and maximum width and height requirements at the same time. We then ran the same experiment as described above using ShEvolver C, always using as a global fitness function a linear combination of two evaluation functions: one of c1-c36, plus c39.
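Our reconstruction of c39, in the style of the earlier fragments, is shown below; only its described behavior is taken from the text, and the helper and constant names are again those assumed from Fig. 3.

// Combined dimension criterion (ShEvolver C): 1 only if both the width and
// the height of the shape fall within the acceptable ranges, else 0.
public double c39(Shape s) {
    int w = getWidth(s), h = getHeight(s);
    boolean ok = w >= MIN_WIDTH && w <= MAX_WIDTH
              && h >= MIN_HEIGHT && h <= MAX_HEIGHT;
    return ok ? 1.0 : 0.0;
}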
In theory the results should have been similar to those for ShEvolver A, because this scenario, like ShEvolver A, is one in which satisfying the constraints on the dimensions of a shape (through c39) contributes 50% of the global fitness value, the other 50% being due to whichever original evaluation criterion is in use at a given moment. However, the experiment could not be completed: the computer always ran out of heap space (memory) while running it, even though several attempts were made. We tried again after setting the time convergence criterion to a maximum of 500 generations instead of 5000, assuming that this would use less memory, but the results were the same. For some reason, in ShEvolver C the shapes being created by the EA always grew in size instead of being constrained to the 5x5 dimensions imposed by the size-related fitness criterion, whereas in ShEvolver A and ShEvolver B the size constraints were met by the shapes and the system never ran out of memory.

4 Summary and Discussion

In this paper we described ShEvolver, an EA used for evolving shapes that consist of configurations of colored unit squares. In ShEvolver, several fitness evaluation functions have been implemented. Some of these functions focus on geometric features of the designs they analyze, some on color-related features, and some on a combination of both.

An experiment was designed and performed in order to evaluate ShEvolver's evaluation functions in several ways. First, evaluation functions were compared with each other under the same experimental conditions, and it was found that they can be classified according to whether they impose constraints that are very easy, somewhat easy, or very difficult to satisfy. Then some of the evaluation functions were combined in slightly different ways, and the new combinations were again run under the same experimental conditions. It was found in this way that even small differences in how an EA evaluates the solutions it proposes can cause large differences in the EA's performance. This effect was observed when measuring several EA parameters: its reliability, its mean convergence time, the mean fitness value of its population at convergence, and the quality of the solutions it produced. In one of the experimental scenarios tested, a slight difference in how the evaluation functions were combined was also the difference between the EA being able to complete its task or not (because the computer ran out of memory in the middle of the experiment).

In a typical paper describing a given EA, the way that fitness evaluation is performed is usually merely described, without any explanation of why it is the "best" way of doing it or whether alternative algorithms/implementations might be feasible or were tested. This leads us to suspect that in most cases alternatives may not even have been considered, and that the implementation used was chosen merely because it was the first one that came to mind for the researcher involved in the project, or because it has worked in the past for the same or other researchers (whether the application domain was similar or not), or because the researcher is comfortable with it, or for other such reasons. However, as we have seen in our experiments, even minor variations in the way that fitness evaluation is implemented can cause large differences in the performance of an EA.

What this indicates is that, when using an EA, making a few preliminary trials that evaluate alternative fitness evaluation implementations might enable the construction of more robust, more efficient, and more reliable EAs, capable of producing better solutions. If, instead, an EA is simply endowed with the first implementation that comes to mind, without considering alternatives, there is no guarantee that one has achieved the "best" possible EA for that domain. Taking these lessons into account can have a major impact on whether a particular EA will be able to successfully provide support for creativity in a given domain. Further work would be needed to determine whether general principles or guidelines could be proposed to suggest how to implement the fitness evaluation function under different application domains and operating conditions when using EAs for creativity support.

Acknowledgements. This work has been supported by Asociación Mexicana de Cultura, A.C.

References

1. Bentley, P. (ed.): Evolutionary Design by Computers. Morgan Kaufmann, San Francisco, CA (1999)
2. Bentley, P., Corne, D.W. (eds.): Creative Evolutionary Systems. Morgan Kaufmann, San Francisco, CA (2002)
3. Gómez de Silva Garza, A.: Exploring the Sensitivity to Representation of an Evolutionary Algorithm for the Design of Shapes. In: Proceedings of the Eighth ACM International Conference on Creativity and Cognition (C&C '11), pp. 259-267, Atlanta, GA (2011). http://dl.acm.org/citation.cfm?doid=2069618.2069661
4. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA (1998)