A Model for the Emergence of Adaptive Subsystems

Bulletin of Mathematical Biology (2002) 00, 1–30
doi:10.1006/bulm.2002.0315
Available online at http://www.idealibrary.com on
H. DOPAZO∗
Departamento de Biologı́a,
Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires,
Pabellón 2, Ciudad Universitaria,
1428 Buenos Aires,
Argentina
E-mail: [email protected]
M. B. GORDON
Laboratoire Leibniz-IMAG,
46, ave. Félix Viallet,
38031 Grenoble Cedex,
France
R. PERAZZO
Departamento de Fı́sica,
Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires,
Pabellón 1, Ciudad Universitaria,
1428 Buenos Aires,
Argentina
S. RISAU-GUSMAN
Zentrum für Interdisziplinäre Forschung,
Universität Bielefeld,
Wellenberg 1,
D-33615, Bielefeld,
Germany
We investigate the interaction of learning and evolution in a changing environment.
A stable learning capability is regarded as an emergent adaptive system evolved by
natural selection of genetic variants. We consider the evolution of an asexual population. Each genotype can have ‘fixed’ and ‘flexible’ alleles. The former express
themselves as synaptic connections that remain unchanged during ontogeny and
the latter as synapses that can be adjusted through a learning algorithm. Evolution
is modelled using genetic algorithms and the changing environment is represented
by two optimal synaptic patterns that alternate a fixed number of times during the
‘life’ of the individuals. The amplitude of the change is related to the Hamming
distance between the two optimal patterns and the rate of change to the frequency
with which both exchange roles. This model is an extension of that of Hinton and
Nowlan, in which the fitness is given by a probabilistic measure of the Hamming
distance to the optimum. We find that two types of evolutionary pathways are
possible depending upon how difficult (costly) it is to cope with the changes of
the environment. In one case the population loses the learning ability, and the
individuals inherit fixed synapses that are optimal in only one of the environmental
states. In the other case a flexible subsystem emerges that allows the individuals to
adapt to the changes of the environment. The model helps us to understand how an
adaptive subsystem can emerge as the result of the tradeoff between the exploitation
of a congenital structure and the exploration of the adaptive capabilities practised
by learning.
© 2002 Published by Elsevier Science Ltd on behalf of Society for Mathematical Biology.

∗ Corresponding address: Bioinformatica, CNIO, c/Melchor Fernandez Almagro 3,
Madrid 28029, Spain. E-mail: [email protected]
0092-8240/02/000001 + 30 $35.00/0
1. INTRODUCTION
Survival and reproduction of living beings depend upon the access to scarce
resources that are in general distributed irregularly in space and time. In order
to survive and reproduce, the individuals have to adapt to the environment and
this is achieved by developing complex behavioural patterns. Adaptation may take
place across generations or within the lifespan of an individual. In the former case
natural selection acts upon the genetic variation within a gene pool. In the latter, it
is usually referred to as learning and takes place during ontogeny. Such a capacity
to learn, that is, to change behaviour on the basis of past experience, is acquired
because it contributes positively to the individuals' reproductive success.
Both solutions stem from the same evolutionary process, through the generation
of variations and the selection of the fittest alternative, involving the interplay of
two nested adaptive systems (Edelman, 1987).
The interaction of learning and evolution was put into a Darwinian framework
by Baldwin. In his seminal paper (Baldwin, 1896) Baldwin addressed the question
of whether it is the adaptive capabilities of the individuals that guide evolution.
Baldwin’s argument is as follows: bearing a learning ability may help individuals
of a population to survive in conditions where individuals lacking it are eliminated
by natural selection. This means that populations built by individuals capable of
learning would survive while mutations accumulate so that the function that had to
be learned in former generations becomes congenital (Baldwin, 1896).
Several models have been developed to study the Baldwin effect (Hinton and
Nowlan, 1987; Fontanari and Meir, 1990; Ackley and Littman, 1992; French and
Messinger, 1994; Ansel, 1999; Dopazo et al., 2001). The general conclusion is that
learning and biological evolution do indeed interact but such interplay is far from
having a unique outcome.
As far as the indirect transcription of environmental data into genetic information is concerned, the Baldwin effect entails a kind of quandary. On the one hand
if learning is very efficient there is no selection pressure to fix information from
the environment: the more efficient is learning, the less effective is the transcription. On the other hand, if learning bears an excessive cost, the inherited plasticity
becomes useless and the Baldwin effect may never take place.
Learning can be regarded as a common feature used to respond to the challenges
that adaptive systems have to face during evolution. Let us mention a few of these
as considered by Frank (1996). Such responses appear as the result of the interplay
of nested adaptive systems, and their evolution therefore falls within a group of
problems related to the Baldwin effect.
One response is the construction of ‘simple rules’ to generate complex phenotypes. This arises when the genetic information is not enough to encode most of
the detailed constitutive phenotypical information, such as the one that goes into,
say, the ‘wiring’ of the central nervous system of higher vertebrates. An example
of such a rule is the Hebbian process of stabilizing synaptic connections between
neurons through their repeated mutual stimulation. This has been extensively
mentioned in the literature as an example of a process that can lead to long-term
potentiation and the fixation of memories. This adaptive process is a particular
case of simple ‘generative rules’ that are able to construct complex phenotypical
patterns to face a changing environment.
A second response is the emergence of an ‘adaptive subsystem’. Frequently offspring have to face new environments and therefore new threats. It is then useless
for one generation to transfer to the next the best information to survive in its own
environment. Adaptation is possible if an adaptive subsystem is available that generates variations in response to environmental stimuli and later selects the fittest
alternatives. An example of this is the immune system of vertebrates that ‘learns’
to recognize the self from the foreign using the mechanism of clonal selection.
A third possible response is a balance between ‘open’ systems such as the immune
system that shapes itself during ontogeny, and ‘closed’ or ‘wired’ systems that are
a part of the constitutive information of the living being. Both exist, for example,
within the central nervous system: while one is an automatic reflex, the other corresponds
to an acquired behaviour. Any living being that undergoes an adaptation
process bears a cost: there are costs due to the time involved in adaptation, due to
mistakes made during that stage, or simply because there are organic limitations
until the moment of reproductive maturity. There is always a balance between
the exploration of new possible responses and the exploitation of successful solutions that have been already acquired.
There are two common ingredients to all the above examples. One is learning
under different forms. We have mentioned the fixation of memories through a
Hebbian process, the adaptation of the immune system to the biotic environment
through clonal selection, and the balance between automatic reflexes and acquired
behaviours. The second key ingredient that is always present is a changing environment. In a fixed environment, neither learning nor the genesis and the subsequent
exploitation of a flexible subsystem are required. It is much ‘cheaper’ from an evolutionary point of view to ‘hardwire’ into the genetic information all the features
that are relevant for survival, avoiding the need to pay the overhead of a plastic
phenotype that requires a costly adaptation during ontogeny.
The purpose of this paper is to develop a working model to investigate the origin
of adaptive subsystems as a consequence of the interplay of learning and evolution
in a changing environment. The kinds of question that we address are: What
types of environmental challenge favour learning abilities? What types of challenge
lead a genetic system to spawn a subsystem of variation and selection? It
is clear that, for instance, the emergence of an adaptive subsystem has to be tuned
in some way to the amplitude or to the rate of changes in the environment (Ansel,
1999). To put this into different words, the learning capability should be considered as the reflected image of the challenges arising from the changing environment
faced through the evolutionary process.
If one considers the transcription of environmental data into genetic information as in the Baldwin effect, the frequency of environmental changes gives rise
to several situations. For example, it can be that the information is only partially
transferred because it is too costly to cope with the changes, or it may be that the
variability of the environment is the feature that is extracted and transcribed into
the genetic information under the form of an adaptive subsystem.
In Section 2 we introduce the GHN model, in which we generalize the model
used by Hinton and Nowlan (1987) to include a changing environment. The individuals are characterized by a genetic information that specifies the connections of
an idealized neural network. The strengths of the connections can take only two
values (±1), and may be either inherited, encoded by corresponding ‘fixed alleles’,
or plastic, encoded by ‘flexible alleles’. The latter can change during the lifetime of
the individual. The environment is represented by a binary string, and is assumed
to change between two different configurations during the lifetime of each generation. Each string represents the optimal synaptic connections of a network that
would be perfectly adapted to the corresponding environment. The model allows
us therefore to consider environmental changes that differ both in ‘amplitude’ (the
number of bits in which the two optimal connections differ from each other) and
in frequency (the number of changes that occur in the environment during the lifetime of the individual). It also allows us to investigate how the learning (adaptive)
capability of the individuals is tuned to the environmental changes.
In Section 3 we present a statistical approximation to the fitness function of the
GHN model, that helps understanding its behaviour. In Section 4 we present the
numerical methods that we use to simulate the evolutionary process with a genetic
algorithm (GA). The results are presented and discussed in Section 5. We show
that there are two possible regimes depending upon the difficulties of adapting
to the changes of the environment. When the difficulty of such learning task is
low enough, a stable learning or adaptive subsystem emerges, that is tuned to the
changing environment. If that is not the case, the individuals have one of the possible configurations of the environment ‘hardwired’ into their genomes. We discuss
these possible evolutionary pathways in terms of the features of the fitness land-
scape of the model. In Section 6 we provide a further generalization of the GHN
model to the case of a population of perceptrons as in Dopazo et al. (2001). This is
done with the purpose of discussing the evolutionary consequences of the presence
of epistatic effects other than those that are attributed to learning. The conclusions
are drawn in Section 7.
2. THE GHN MODEL
Our model, a generalization of the one proposed by Hinton and Nowlan (1987)
(hereafter H & N) for the Baldwin effect, considers a population of individuals,
each one having a neural network with L connections or synapses whose strengths
$\vec{w} = (w_1, \ldots, w_L)$ result from the expression of L genes. Like H & N, we
consider three possible alleles for each locus in the genotype, labelled −1, +1 and
? respectively. The alleles −1 and +1 express themselves through two types of
connections—say inhibitory (−1) and excitatory (+1)—that remain fixed during
ontogeny.† The alleles ? express themselves by adaptive or flexible synaptic connections that undergo changes during the lifetime of the individuals. This process
is hereafter loosely called learning (Hinton and Nowlan, 1987).‡ Fixed and flexible
alleles are inherited by the next generation. Notice that what is transmitted to the
offspring is not the acquired value of each flexible synapse during the ontogeny,
but only the information that the corresponding synapse is flexible. Thus, this is
not a Lamarckian inheritance process.
H & N assumed that there is a single optimal phenotype, well adapted to a fixed
environment, and that the individuals devote a time period of G ‘days’ to learn
it. The ‘learning protocol’ is the following: each ‘day’ the individual performs a
random assignment of all its flexible synapses to be either +1 or −1 with equal
probability. This process stops either if the optimal connection scheme is found,
or if the maximal allowed learning time G is over. The fitness of an individual is a
decreasing function of the number of trials needed to find the optimal connections.
If the right combination is not found within the G allowed trials, the search process
is stopped and a minimum fitness equal to 1 is assigned to the individual.
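The H & N search protocol described above can be sketched in a few lines (a minimal illustration in Python; the code and the function name `learn` are ours, not from the paper):

```python
import random

def learn(genotype, optimum, G):
    """Random search over the flexible ('?') loci, as in Hinton & Nowlan.

    genotype: list of +1, -1 or '?' (flexible) alleles.
    optimum:  list of +1/-1 optimal connection strengths.
    G:        maximal number of learning trials ('days').
    Returns the number of the trial on which the optimum was found,
    or None if it was not found within G trials.
    """
    # A fixed allele that disagrees with the optimum can never be repaired,
    # so the search is bound to fail; we shortcut that case.
    if any(a != w for a, w in zip(genotype, optimum) if a != '?'):
        return None
    flexible = [i for i, a in enumerate(genotype) if a == '?']
    for trial in range(1, G + 1):
        # Each 'day' every flexible synapse is set to +1 or -1 at random.
        guess = list(genotype)
        for i in flexible:
            guess[i] = random.choice([+1, -1])
        if guess == optimum:
            return trial
    return None
```

With Q flexible loci the per-trial success probability is 2^{-Q}, so the expected number of trials grows as 2^Q; this is the quantity that enters the probabilistic treatment of Section 3.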
In our model, we assume that the environment oscillates periodically between
two different states, so that during the life of an individual the optimal synaptic
connections change 2F times. The extension to more than two different environment
states is straightforward. The allowed learning time for each environment
state is T = G/(2F) trials. Notice that, since the total learning time G is fixed,
any increase of the frequency of environmental changes, F, reduces the number of
allowed learning trials.
The optimal synaptic connections corresponding to each environment are represented
by vectors of dimension L, labelled $\vec{w}_1$ and $\vec{w}_2$ respectively, which differ
from each other only in the first L_v loci, the remaining L_f = L − L_v being equal.
For the sake of concreteness we take all the synaptic strengths in $\vec{w}_1$ to be +1,
while the first L_v values of $\vec{w}_2$ are −1. Namely

$$\vec{w}_1 = (\overbrace{1, 1, \ldots, 1}^{L_v}, \overbrace{1, 1, \ldots, 1}^{L_f}),
\qquad
\vec{w}_2 = (\underbrace{-1, -1, \ldots, -1}_{L_v}, \underbrace{1, 1, \ldots, 1}_{L_f}). \quad (1)$$

† We denote the genotypes with bold characters, and the corresponding synaptic strengths with normal characters.
‡ In Section 6 we consider a case where learning is a more appropriate term.
In the present model, as well as in that of H & N, learning is the only way for
the individuals to increase their fitness, which depends on the learning proficiency.
The fitness of an individual is defined by
$$\phi = \frac{1}{2F}\sum_{i=1}^{2F} \phi_i, \quad (2)$$

where

$$\phi_i = 1 + (L - 1)\left(1 - \frac{t_i}{T}\right)\Theta(T - t_i) \quad (3)$$
is the partial fitness acquired in period i. The latter depends on t_i, the number of
trials used in that period to find the corresponding optimal weights. Due to the
Heaviside function§ in equation (3), the partial fitness is strictly positive and takes
values between 1 and L. If in period i the optimum is not found within the allowed
number of trials T the contribution φi to the total fitness in this period is reduced
to its minimum value. The total fitness of the individual, given by equation (2),
is an average over all partial fitnesses, equation (3). As the total learning time
G ≡ 2F T is kept constant, one expects the learning success of the individuals to
decrease with the total number 2F of environmental changes.
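Equations (2) and (3), together with the alternating-environment protocol, can be combined into a short routine that evaluates the lifetime fitness of a single genotype (a sketch in Python; the code is ours and the names are illustrative):

```python
import random

def lifetime_fitness(genotype, w1, w2, F, T):
    """Total fitness, eq. (2): average of the partial fitnesses over 2F periods.

    The optimal connections alternate between w1 and w2; in each period the
    individual performs up to T random-search trials over its '?' loci, and
    eq. (3) converts the number of trials used into a partial fitness.
    """
    L = len(genotype)
    flexible = [i for i, a in enumerate(genotype) if a == '?']
    total = 0.0
    for period in range(2 * F):
        optimum = w1 if period % 2 == 0 else w2
        t_used = None
        # The optimum is reachable only if every fixed allele matches it.
        if all(a == w for a, w in zip(genotype, optimum) if a != '?'):
            for t in range(1, T + 1):
                guess = list(genotype)
                for i in flexible:
                    guess[i] = random.choice([+1, -1])
                if guess == optimum:
                    t_used = t
                    break
        # eq. (3): phi_i = 1 + (L-1)(1 - t_i/T) if found, else the minimum 1
        phi_i = 1.0 if t_used is None else 1.0 + (L - 1) * (1.0 - t_used / T)
        total += phi_i
    return total / (2 * F)
```

A genotype hardwired to w1, for instance, scores 1 + (L−1)(1 − 1/T) in every w1 period and the minimum 1 in every w2 period, which for large T reproduces the fixed-genotype value discussed in Section 3.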
One difference with the models discussed by H & N and Dopazo et al. (2001) is
that here the fitness of an individual not only depends on the number of each kind
of allele, but also upon their particular location in the genotype.
By introducing a variable and a fixed part in equation (1) we aim at representing
a situation in which some features of the environment change while others remain
constant. For instance, the colour of the food may change while its smell, size
or location may remain instead constant. As a result, the neural network of the
individuals must adapt in order to still recognize what is edible from what it is not,
despite the environmental change.
Note that any individual having even a single −1 allele in the last L_f loci of the
genome has the minimal possible fitness of 1, because it will be unable to find the
optimal connection weights in spite of its learning abilities. Individuals whose
§ Θ(x) = 1 for x ≥ 0; Θ(x) = 0 for x < 0.
genotype does not contain any ? are unable to learn, and their fitness is determined
at birth. If the synaptic weights match $\vec{w}_1$ or $\vec{w}_2$, a maximum partial fitness of L is
reached in the corresponding environment, while it is equal to the minimum value
of 1 during the rest of the time. All other genetic epistatic effects are neglected
in this model (as well as in that of H & N). A variant of the present GHN model,
in which the fitness is a smoother function of the allelic composition due to such
effects, is discussed extensively in Section 6.
As we explain below, in the simulation of the evolutionary process each new
population is obtained from the preceding generation through a selection process
in which each individual leaves a number of descendants proportional to its total
fitness. Alleles −1, +1 or ? of the offspring are randomly mutated to either of
the two other possible states with a small probability pmut , before they begin the
learning process.
3. A PROBABILISTIC APPROXIMATION OF THE GHN MODEL
We first present some analytic results, obtained through a probabilistic treatment
developed by Fontanari and Meir (1990) and also used by Dopazo et al. (2001),
which provide a framework for interpreting the numerical simulations of our
model, explained in detail in the next section. This approach is valid in the
limit of large populations, as well as large values of T and L_v.
3.1. The genotype mean fitness. We estimate the fitness of each individual¶
through the mean value ϕ of φ, equation (2), averaged over all the possible outcomes
of the T allowed trials per environment state. The genome is specified by
the numbers P_{v(f)}, Q_{v(f)} and R_{v(f)} of alleles 1, ? and −1 respectively, in the first
L_v (the last L_f) loci.‖ The average, ϕ(P_v, Q_v, R_v, P_f, Q_f, R_f), is calculated as in
Fontanari and Meir (1990). One gets

$$\varphi(P_v > 0, Q_v, R_v = 0, P_f, Q_f, R_f = 0)
= \varphi(P_v = 0, Q_v, R_v > 0, P_f, Q_f, R_f = 0)
= \frac{1}{2}\left[1 + L - (L - 1)\,\frac{1 - (1 - 2^{-Q_f - Q_v})^T}{2^{-Q_f - Q_v}\,T}\right]; \quad (4)$$

$$\varphi(P_v = 0, Q_v = 0, R_v = L_v, P_f, Q_f, R_f = 0)
= \varphi(P_v = L_v, Q_v = 0, R_v = 0, P_f, Q_f, R_f = 0) = \frac{L + 1}{2}; \quad (5)$$

¶ We borrow the term individual from the parlance of the numerical simulation. Actually, we make no distinction between individual and genotype.
‖ In fact, only four of the six parameters are independent, as they must fulfil the two relations P_{v(f)} + Q_{v(f)} + R_{v(f)} = L_{v(f)}.
$$\varphi(P_v = 0, Q_v = L_v, R_v = 0, P_f, Q_f, R_f = 0)
= L - (L - 1)\,\frac{1 - (1 - 2^{-Q_f - L_v})^T}{2^{-Q_f - L_v}\,T}. \quad (6)$$
Any other values of the variables P_v, Q_v, R_v, P_f, Q_f, R_f yield the minimum
fitness, φ = 1, so that also ϕ = 1. The genotype mean fitness ϕ cannot easily be
depicted in sequence space, because it depends on four parameters. Since it is
minimal and completely flat in very large subspaces, the evolutionary process is
expected to take place mainly within the restricted region where ϕ > 1.
The same situation arises in the original H & N model, in which the sequence space
is parametrized by three numbers, related to those in the present model through
P = P_v + P_f, Q = Q_v + Q_f and R = R_v + R_f. This was extensively discussed
by Fontanari and Meir (1990) and Dopazo et al. (2001). Since in that model
P + Q + R = L, all the realizable sequences lie within the triangle P + Q ≤ L
and the fitness is different from 1 only in the subspace P + Q = L.
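The three branches of the mean fitness, equations (4)–(6), are straightforward to evaluate numerically; the following transcription (our own code, with illustrative function names) can be used to explore the landscape:

```python
def mean_fitness_learner(L, Qv, Qf, T):
    """Eq. (4): fixed variable-part alleles all matching one environment,
    with Qv + Qf flexible loci; the optimum is learnable half of the time."""
    p = 2.0 ** (-(Qf + Qv))                  # per-trial success probability
    learned = L - (L - 1) * (1.0 - (1.0 - p) ** T) / (p * T)
    return 0.5 * (1.0 + learned)

def mean_fitness_fixed(L):
    """Eq. (5): fully fixed genotype matching one environment exactly."""
    return (L + 1) / 2.0

def mean_fitness_flexible(L, Lv, Qf, T):
    """Eq. (6): all Lv variable loci flexible, so both environments are learnable."""
    p = 2.0 ** (-(Qf + Lv))
    return L - (L - 1) * (1.0 - (1.0 - p) ** T) / (p * T)
```

For L = 21, L_v = 5 and Q_f = 0 these expressions reproduce the regime change of Fig. 1: at T = 30 the fixed maxima (fitness 11) beat the flexible maximum, while at T = 200 the flexible maximum wins.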
In the present case, the relevant subspaces have R_f = 0 and either R_v = 0 with
P_v ≠ 0, or R_v ≠ 0 with P_v = 0. These two regions are entirely symmetrical
with respect to each other. We therefore restrict ourselves to the landscape of the
genotype mean fitness in the subspace defined by 0 ≤ Q_f ≤ L_f and 0 ≤ Q_v ≤ L_v,
with R_f = R_v = 0. In this subspace ϕ, defined by equations (4)–(6), has three
peaks. Two of them, which correspond to individuals that are unable to learn, are
symmetrically located at (P_v, R_v) = (L_v, 0) and (P_v, R_v) = (0, L_v), with
Q_f = Q_v = 0. The corresponding synaptic connections are $\vec{w}_1$ and $\vec{w}_2$ respectively,
which are optimal in either of the two possible environments, but not in both. The
third peak corresponds to (Q_v, Q_f) = (L_v, 0) and R_f = 0. The corresponding
neural network has its first L_v synaptic connections adaptive; the last L_f are equal
to the optimal connections, which are the same in both environments. From here
on we will refer to the first two maxima as the fixed maxima and to the latter as
the flexible maximum.
The genotype mean fitness of both fixed maxima is ϕ = (L + 1)/2, whereas
that of the flexible maximum depends upon the learning time T. There is a critical
value T_c such that for T > T_c the absolute maximum of this fitness landscape
is the flexible one, while for T < T_c the highest fitness corresponds to either of
the two fixed maxima, which are degenerate. The change of regime can easily
be recognized in the fitness landscapes shown in Fig. 1. The critical value T_c is
obtained by solving the equation that results from equating the fitness of the three
peaks, namely
$$2^{-L_v - 1}\,T_c - 1 + (1 - 2^{-L_v})^{T_c} = 0. \quad (7)$$
The value of T_c scales exponentially with L_v, which may be considered a measure
of the ‘amplitude’ of the environmental variation. In fact, for large L_v, the
solution of equation (7) can be approximated by T_c ≈ 1.59 · 2^{L_v}.
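Equation (7) has a single positive root, which can be found by bisection; a sketch (our own code, with an arbitrary iteration count):

```python
def critical_T(Lv):
    """Solve eq. (7), 2^(-Lv-1) T - 1 + (1 - 2^(-Lv))^T = 0, for T > 0.

    The left-hand side is negative for small positive T and grows linearly
    for large T, so bisection on a bracketing interval suffices.
    """
    p = 2.0 ** (-Lv)
    f = lambda T: 0.5 * p * T - 1.0 + (1.0 - p) ** T
    lo, hi = 1.0, 4.0 / p          # f(lo) < 0 and f(hi) > 0 for Lv >= 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

`critical_T(5)` returns a value close to 51, consistent with the estimate used in Section 5, and for larger L_v the root approaches 1.59 · 2^{L_v}.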
The existence of two different regimes has a simple interpretation. Within this
model, if the total learning time G is kept constant, a short learning time per
Figure 1. Landscape of the genotype mean fitness ϕ given by equations (4)–(6). The
many-dimensional sequence space has been reduced to the subspace with R_f = 0 and
R_v = 0. The only two coordinates that are necessary are 0 ≤ Q_f ≤ L_f and 0 ≤ Q_v ≤ L_v.
The panels (a), (b) and (c) correspond to three different values of T, namely T = 30 < T_c,
T = 50 = T_c and T = 200 > T_c respectively. The value T_c = 50 corresponds to
L_v = 5, L_f = 16. In every point of the grid which corresponds to a realizable genotype,
the corresponding value of the mean fitness is displayed as a bar.
environment state T is equivalent to a rapidly changing environment (F large).
In such situations, the individuals do not have enough time to take advantage of
a large number of flexible alleles. Learning then becomes a burden, and the best
option is to be born at either of the two fixed maxima, thus being optimally tuned
to the environment, but only half of the time. During the other half the individuals
have minimal partial fitness.
The other regime corresponds to a large number of allowed learning trials. In this
case the individuals can fruitfully take advantage of the flexible alleles. By bearing
only flexible alleles in the first L v sites of the genome, an individual can adapt
successfully to both possible environments. The flexible maximum then becomes
the absolute maximum in the fitness landscape.
Both regimes can also be thought of as differing in the cost of learning, measured
in terms of the fraction of the lifetime that has to be devoted to learning.∗
∗ Note that within this model one assumes that reproduction takes place once the learning period
has been accomplished.
Within this interpretation, the two possible regimes can therefore be associated
with ‘expensive’ or ‘cheap’ learning. In the first case either of the fixed maxima
is the best evolutionary outcome, while in the second a flexible subsystem arises
in response to the changing environment.
In both regimes a transfer of information from the environment to the genotype
takes place. It segments into two portions of lengths L_f and L_v, thus reflecting
the nature of the environmental variation. The final population is either homogeneous,
with all the individuals having ? in the first L_v loci, or it splits into two
groups, both bearing only fixed alleles, with either L_v alleles +1 or L_v alleles −1
in the first L_v loci.
It is important to stress that the landscape in the neighbourhood of the fixed
maxima is very different from that around the flexible maximum. While the latter
peak remains rather isolated, there is a gradual road of increasing fitness towards
either of the former. This is due to the presence of the ? alleles, which play a role
similar to the one they play in the original model of H & N: they provide a suitable
‘fitness road’ to both fixed maxima, which would otherwise remain completely
isolated (a ‘needle in a haystack’ scenario).
3.2. The evolutionary process. In the case of a very large population we can
use the approach introduced by Fontanari and Meir (1990) to obtain an analytic
description of the evolutionary process. We assume that the genetic composition
of the population corresponds to a distribution of maximum entropy (or minimal
bias). This amounts to stating that at any generation g, the fraction of genotypes
with P_{v(f)} alleles 1, R_{v(f)} alleles −1 and Q_{v(f)} alleles ? in the first L_v (last L_f) loci is

$$\Pi(g; P_v, Q_v, R_v, P_f, Q_f, R_f)
= \frac{L_v!\,(p_v(g))^{P_v}(q_v(g))^{Q_v}(r_v(g))^{R_v}}{P_v!\,Q_v!\,R_v!}
\;\frac{L_f!\,(p_f(g))^{P_f}(q_f(g))^{Q_f}(r_f(g))^{R_f}}{P_f!\,Q_f!\,R_f!}. \quad (8)$$

In (8) we have introduced the probabilities p_{v(f)}(g), q_{v(f)}(g) and r_{v(f)}(g) of the
different kinds of alleles at generation g. In the limit of an infinite population, these
can be approximated respectively by the frequencies P_{v(f)}/L_{v(f)}, Q_{v(f)}/L_{v(f)} and
R_{v(f)}/L_{v(f)}.
The evolution is described through six recursive equations which determine in
each generation the values of these six probabilities in terms of those of the preceding one. In this process one has to bear in mind that there is a mutation rate
pmut that modifies the values of the P, Q and R. The recursive equations can be
deduced in exactly the same way as in Fontanari and Meir (1990). In generation
g + 1, for p_v we obtain

$$p_v(g + 1) = p_{mut} + \frac{1 - 3 p_{mut}}{\langle \varphi(g) \rangle}
\sum \frac{P_v}{L_v}\,\Pi(g; P_v, Q_v, R_v, P_f, Q_f, R_f)\,
\varphi(P_v, Q_v, R_v, P_f, Q_f, R_f), \quad (9)$$
Table 1.

  Variable   Definition
  p_v        Probability of occurrence of a 1 in any of the first L_v loci
  q_v        Probability of occurrence of a ? in any of the first L_v loci
  r_v        Probability of occurrence of a −1 in any of the first L_v loci
  p_f        Probability of occurrence of a 1 in any of the last L_f loci
  q_f        Probability of occurrence of a ? in any of the last L_f loci
  r_f        Probability of occurrence of a −1 in any of the last L_f loci
  ϕ          Fitness of a genotype averaged over the learning trials
  ⟨ϕ⟩        Fitness averaged over the learning trials and the allele distribution
  ⟨φ⟩        Fitness averaged over the population
where the summations are extended over all the possible values of the P, Q and R,
and

$$\langle \varphi(g) \rangle = \sum \Pi(g; P_v, Q_v, R_v, P_f, Q_f, R_f)\,
\varphi(P_v, Q_v, R_v, P_f, Q_f, R_f) \quad (10)$$

is the genotype mean fitness of the population at generation g, averaged over the
distribution of alleles. The remaining equations, for the probabilities p_f(g + 1),
q_v(g + 1), q_f(g + 1), r_v(g + 1) and r_f(g + 1), have expressions similar to (9),
with P_f/L_f, Q_v/L_v, Q_f/L_f, R_v/L_v, or R_f/L_f in the place of P_v/L_v,
respectively.
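For small L the recursions can be iterated directly by enumerating all allelic compositions; the sketch below (our own code; ϕ follows equations (4)–(6), the weights follow equation (8), and the update follows equation (9)) advances the six probabilities by one generation:

```python
from itertools import product
from math import comb

def phi_bar(Pv, Qv, Rv, Qf, Rf, L, Lv, T):
    """Genotype mean fitness: eqs (4)-(6) on their supports, 1 elsewhere."""
    if Rf > 0 or (Pv > 0 and Rv > 0):
        return 1.0
    p = 2.0 ** (-(Qf + Qv))
    learned = L - (L - 1) * (1.0 - (1.0 - p) ** T) / (p * T)
    if Qv == Lv:                      # eq. (6): both environments learnable
        return learned
    return 0.5 * (1.0 + learned)      # eq. (4); eq. (5) is its Q = 0, large-T limit

def step(probs, L, Lv, T, pmut):
    """One generation of the recursions, eq. (9), for all six probabilities."""
    pv, qv, rv, pf, qf, rf = probs
    Lf = L - Lv
    num = [0.0] * 6                   # fitness-weighted allele fractions
    mean = 0.0                        # <phi(g)>, eq. (10)
    for Pv, Qv in product(range(Lv + 1), repeat=2):
        Rv = Lv - Pv - Qv
        if Rv < 0:
            continue
        for Pf, Qf in product(range(Lf + 1), repeat=2):
            Rf = Lf - Pf - Qf
            if Rf < 0:
                continue
            # eq. (8): product of two multinomial weights
            w = (comb(Lv, Pv) * comb(Lv - Pv, Qv)
                 * pv ** Pv * qv ** Qv * rv ** Rv
                 * comb(Lf, Pf) * comb(Lf - Pf, Qf)
                 * pf ** Pf * qf ** Qf * rf ** Rf)
            f = w * phi_bar(Pv, Qv, Rv, Qf, Rf, L, Lv, T)
            mean += f
            fractions = (Pv / Lv, Qv / Lv, Rv / Lv, Pf / Lf, Qf / Lf, Rf / Lf)
            for k, frac in enumerate(fractions):
                num[k] += f * frac
    return [pmut + (1.0 - 3.0 * pmut) * n / mean for n in num]
```

Iterating `step` from any initial composition drives r_f towards its mutational floor, since any genotype with R_f > 0 has the minimum fitness.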
The preceding equations have been used to determine the curves in Fig. 2, corresponding to the two different regimes reached with an allowed number of learning
trials above and below Tc . We do not detail here the analytic calculations any
further, as the results are easily obtained by iteration, starting from any initial composition of the population. As a summary we give in Table 1 a complete list of the
quantities that are considered in the analytic approach, as well as in the numerical
simulations described in the next section.
4. THE NUMERICAL TREATMENT OF THE GHN MODEL
The evolutionary process is simulated using a GA (Goldberg, 1989; Mitchell,
1996) which, in contrast to the statistical approach discussed above, deals with
finite populations of N individuals. Throughout this article we have used
N = 10 000.
Each individual or genotype is represented by a string of L alleles −1, +1 or ?
encoding for the synaptic connections of the corresponding neural network, to be
−1, +1 or flexible respectively.
The genomes in the initial population are generated at random, with a probability q for each allele of being ?, and 1 − q of being fixed. Among the latter, we
Figure 2. (a) ⟨φ_as⟩ as a function of T for two different initial populations, obtained with
p_v = 0.225, q_v = 0.55, p_f = 0.38, q_f = 0.55 (full circles) and p_v = 0.045, q_v = 0.91,
p_f = 0.91, q_f = 0.045 (empty circles), respectively. The full curves correspond to the
statistical formulation of Section 3. (b) Asymptotic probabilities of the different alleles (as
defined in Table 1) plotted as a function of T.
select at random, with probability 1/2, one of the two possible environments, and
the individual is attributed fixed alleles that match the selected environment (i.e.,
that encode the corresponding optimal synapses) with probability p. Since
both environments are selected with equal probability, there is no bias in the initial
composition of the population.
At each generation g, the adaptive synapses of each individual are determined
using the learning scheme described in Section 2, a process that allows one to
evaluate the individual’s fitness through equation (2).
The successive generations are determined through selection and mutation (Goldberg, 1989). First, all the members of the population are ranked by their fitness and
each individual leaves descendants that are identical to it with a probability proportional to its fitness. Mutations are introduced by changing each allele at random
into one of the two other possibilities, with equal probability p_mut. As usual, p_mut
has to be properly tuned: it should introduce variations at a rate large enough
to produce changes in the composition of the population, but sufficiently small
to make possible the fixation of successful mutations. In all the simulations we
have used p_mut = 0.005.
Among the rules of GAs it is usual to consider the cross-over operation (Goldberg,
1989; Mitchell, 1996), which mimics sexual reproduction. It is an important
source of variation and usually speeds up the evolutionary process. We will,
however, not use it here because it adds no relevant conceptual ingredient to the
present model.
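One generation of the GA just described, selection proportional to fitness followed by per-locus mutation, can be sketched as follows (our own code; `random.choices` performs the fitness-proportional sampling with replacement):

```python
import random

ALLELES = (-1, +1, '?')

def next_generation(population, fitnesses, p_mut):
    """Fitness-proportional reproduction followed by mutation.

    population: list of genotypes (lists over ALLELES);
    fitnesses:  matching list of positive fitness values;
    p_mut:      probability of mutating an allele into each of the two
                other states (total mutation probability 2 * p_mut).
    """
    N = len(population)
    # Selection: every slot of the new population is filled by a copy of a
    # parent drawn with probability proportional to its fitness.
    parents = random.choices(population, weights=fitnesses, k=N)
    offspring = []
    for parent in parents:
        child = list(parent)
        for i, allele in enumerate(child):
            if random.random() < 2.0 * p_mut:
                child[i] = random.choice([a for a in ALLELES if a != allele])
        offspring.append(child)
    return offspring
```

Cross-over could be added here as a third step, but, as noted above, it is omitted in the present model.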
We monitor the evolutionary process through the probabilities already introduced
in the analytic approach of Section 3, and defined in Table 1. Probabilities are here
estimated through averages of the number of different kinds of alleles over the
finite population obtained in the numerical simulation. We also calculate ⟨φ(g)⟩, the fitness of the individuals in generation g averaged over the population. This quantity has a large dispersion in the first stages of the evolution, but is expected to converge to the value ⟨ϕ(g)⟩ given by equation (10) when the population reaches its optimum: either the one where the majority of individuals have a genotype close to the adaptive maximum, or the mixed population of fixed alleles. The results of the simulations are discussed in the next section.
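The probabilities used for this monitoring can be estimated by direct counting over the finite population; a minimal sketch (the tuple encoding of genotypes and the dictionary keys are our conventions, not the authors'):

```python
from collections import Counter

def allele_probabilities(population, Lv=5):
    """Fractions of +1, '?' and -1 alleles, computed separately over the
    first Lv loci (subscript v) and the remaining loci (subscript f).
    Genotypes are tuples over the alphabet {1, -1, '?'}."""
    head = Counter(a for g in population for a in g[:Lv])
    tail = Counter(a for g in population for a in g[Lv:])
    nh = max(sum(head.values()), 1)
    nt = max(sum(tail.values()), 1)
    return {"pv": head[1] / nh, "qv": head["?"] / nh, "rv": head[-1] / nh,
            "pf": tail[1] / nt, "qf": tail["?"] / nt, "rf": tail[-1] / nt}
```

For instance, a population consisting only of the flexible-optimum genotype (all ? in the first Lv loci, all +1 afterwards) gives qv = 1 and pf = 1.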
5. RESULTS
5.1. The effect of the learning time T . In order to analyse the effects of a change
in the number of learning trials we compare the evolution of different populations
in which the individuals are allowed to search during increasing learning times T .
This is equivalent to comparing populations with the same learning time G but
evolving in environments that change with decreasing frequency F. We restrict
ourselves to environments characterized by L_v = 5 and L_f = 16 [see equation (1)]. Other values of L_v and L_f do not introduce any qualitative changes in the results; only the time scales of the different regimes described here are modified.
We first consider the statistical formulation of the model, described in Section 3. The evolution is obtained by running the recursive equations given in equation (9) for as many generations as necessary to reach a population with a stationary composition, and we then calculate the average fitness of the whole population ⟨ϕ∞⟩. The control parameters of these simulations are the probabilities of fixed and flexible alleles in the composition of the initial population. For T < Tc ≃ 51 the system converges to a population in which the value of ⟨ϕ∞⟩ grows very slowly as a function of T. But for T > 51 we find that two different behaviours are possible. If the initial probability of flexible alleles is small, the system converges to the same kind of population as for T < 51. On the other hand, if the initial probability of flexible
alleles is large enough, the population stabilizes at a value of ⟨ϕ∞⟩ that grows much faster with T. As the populations in the statistical model are infinite, all the systems are expected to converge to populations with the largest value of ⟨ϕ∞⟩, depicted in the upper branch. The fact that some systems converge to a lower value of fitness is an artifact of the calculation of the recursions, equation (9): it is the lack of precision that causes certain systems to get 'stuck' in the lower branch. These two behaviours correspond to the two different regimes discussed in Section 3. The values of ⟨ϕ∞⟩ as a function of T are displayed as continuous lines in Fig. 2(a). The value Tc ≃ 51 agrees with the solution of equation (7) for the values of L_v and L_f considered here.
We next compare these results with the numerical simulation of the evolutionary process as described in Section 4. We let the population evolve for 400 generations. The (approximate) asymptotic value ⟨φ_as⟩ of the fitness of each population, determined by averaging the fitness of all the individuals over the last 100 generations, is represented by symbols in the same Fig. 2(a), as a function of T. The different symbols correspond to two different choices of p and q in the initial population. The numerical results are thus seen to agree with the probabilistic estimates to a good degree of accuracy: both approaches yield the same two asymptotic regimes.
The structure of the genotypes in the population is given in Fig. 2(b), where we plot the probabilities listed in Table 1 (estimated by the fraction of the corresponding alleles in the simulated populations, also averaged over the last 100 generations), as a function of T. The value of r_f has not been represented because it is vanishingly small for all T. On the other hand, p_f always remains close to unity. This is so because in all the considered environments the last L_f synaptic weights are +1, and any individual having even a single −1 allele in this part of the genome is severely penalized.
In the limit of small T only individuals with alleles encoding fixed synaptic strengths close to either $\vec{w}_1$ or $\vec{w}_2$ have an appreciable fitness. Bearing too many ? is a disadvantage because the individuals do not have enough time to learn. The population therefore tends to be composed of individuals as similar as possible to either of the two fixed maxima, with no flexible synapses. The corresponding values of p_v and r_v for T < Tc are not well defined, because the evolutionary process splits the population into two arbitrary fractions, each one similar to one of the two fixed optima: p_v = 0, r_v = 1, or p_v = 1, r_v = 0. We discuss this point in greater detail in the next section.
The structure of the population for large g changes drastically at T ≃ Tc. The populations corresponding to the upper branch are composed of individuals with a mixture of flexible and fixed alleles. The former are located in the first L_v loci while the latter are all +1 and occupy the last L_f loci. Correspondingly, the value of q_v undergoes a drastic variation at T ≃ Tc, from q_v ≃ 0.1 to q_v ≃ 1, while p_v and r_v drop to zero. This corresponds to the transition from the fixed to the flexible optimum.
Figure 3. GHN model in the regime of high learning cost. Evolution of the mean fitness and allele probabilities as a function of the number of generations g, for a population with T = 30 and initial conditions specified by p = 0.5, q = 0.25 and r = 0.25.
The small discrepancy between ⟨φ_as⟩ and its analytical counterpart ⟨ϕ∞⟩ is due to the presence, in the simulations of the finite-size population, of a small fraction of ? in this part of the genome (p_f ≃ 0.1) that has not yet been eliminated.
Simulations with different values of p and q defining the initial population scatter the points between the two branches in different ways. However, the existence of the two distinct regimes is robust against such changes in the initial conditions. When the initial population has a large fraction of flexible alleles [as, for example, for p_v = 0.045, q_v = 0.91, depicted in Fig. 2(a)], a clear change can be observed at T = Tc from one evolutionary branch to the other. For initial populations with fewer flexible alleles, the change is not so drastic. In the latter case, and close to the critical value Tc, the two evolutionary branches are seen to coexist. This is an effect of the finiteness of both the size of the population and the number of generations considered in the numerical simulations. The events in the lower branch have to be considered as truly metastable populations. Except for genetic drift effects, and given an infinite number of generations, all the results that appear in the lower branch are expected to move to the upper one, which corresponds to the absolute maximum of the fitness landscape. Indeed, the fact that the lower branch fades off as T grows beyond Tc is an indication that a larger fraction of numerical simulations converge to the flexible optimum.
5.2. The rigid optimum. In this section we discuss the evolutionary process in
the regime of short learning times T . This corresponds to a situation in which
individuals face a difficult learning task because they have few learning trials to
adapt to the changing environment. We consider as an example the case of L_v = 5, L_f = 16 and a number of learning trials T = 30 < Tc ≃ 51. In Fig. 3
we show the values of the different variables (listed in Table 1) obtained in the
numerical simulation of the evolutionary process, as a function of the number of
generations g.
The average fitness of the population is seen to grow rapidly, approaching an asymptotic value of ⟨φ⟩ ≃ 9. Within the first ∼25 generations the −1 alleles are eliminated from the last L_f loci of the genotype because their presence gives rise to a
minimal fitness. The ? alleles are eliminated from the same part somewhat more
gradually. This happens because an individual with fewer flexible alleles needs
less time to learn and has therefore a higher fitness. The rise in the average fitness
also corresponds to the elimination of individuals with a mixture of fixed alleles in
the first L_v loci of the genotype. In fact, any individual bearing a combination of 1 and −1 in these sites also has minimal fitness, because it cannot adapt to either of the two possible environments. The elimination of ? follows for the same reasons
that hold for the last L f sites of the genotype. Within the settings considered in
this example, the individuals that survive after ∼200 generations have the maximum possible partial fitness, but only for half of the time. When the environment
changes, these individuals fail completely to adapt to the new situation.
Each time the environment changes all the individuals have to engage in a learning process. Since having too many ? requires a long search time, which implies a
low fitness, the ? become a burden. Thus, the flexible alleles facilitate the evolution
towards a fixed optimum but are progressively eliminated. This effect is stronger
for lower values of T (or larger frequency F): the fewer the allowed learning trials,
the less probable it is to find the optimal connection scheme. For T < Tc the flexible alleles play the same role as in the H & N model (Hinton and Nowlan, 1987;
Maynard Smith, 1987): they guide the evolution towards a population in which
the individuals have imprinted in their genotypes what previous generations had to
learn, but finally they lose their learning ability (Dopazo et al., 2001).
The initial population is chosen in such a way that its individuals have the same
probability of resembling both reference strings. This symmetry is broken during
the evolutionary process due to random mutations. As a consequence, after the first ∼75 generations the population is essentially split into two different subpopulations having either only alleles 1 or only alleles −1 in the first L_v loci of the genome. This follows (see Fig. 3) from the fact that p_v and r_v remain approximately linked to each other, verifying p_v + r_v = constant ≃ 0.90 ≃ 1 − q_v. Individuals with the
same number of fixed alleles have the same fitness, provided that these are either
all 1 or all −1 in the first L v loci, and only 1 in the last L f . Both subpopulations can therefore coexist in a stable situation. In the example shown in Fig. 3,
the subsequent ∼200 generations of the evolutionary process only give rise to a
partial replacement of one subpopulation at the expense of the other as a result of
the minute balance of different distributions of the best-fit individuals within each
subpopulation. This process takes place with no appreciable change in the total
average fitness. The symmetry imposed in the initial population that is broken during the evolution is, of course, restored if the evolutionary process is averaged over
an ensemble of equivalent initial conditions.
Figure 4. GHN model in the regime of long learning time. Evolution of the mean fitness and gene frequencies. Plot of p_v, q_v, r_v, p_f, q_f, r_f and ⟨φ⟩ as a function of the number of generations g, for a population with T = 112 and initial conditions specified by p = 0.5, q = 0.25 and r = 0.25.
5.3. The flexible optimum. In this section we discuss the evolutionary process
in the regime of a long learning time or, equivalently, an easy learning task. We
consider as an example the case of L_v = 5, L_f = 16 and a number of learning trials T = 112 > Tc ≃ 51.†† In Fig. 4 we show the results of the numerical
simulation of the evolutionary process with the same conventions as in Fig. 3.
In the first place one notes that the average fitness of the population grows by
steps, remaining for an appreciable number of generations in the lowest ‘plateau’.
This is reached after a few generations (g ∼ 20) and is similar to that of Fig. 3. The
second plateau, attained around generation g ∼ 230, implies a further evolutionary
change in the composition of the population.
The first drastic increase of the fitness is produced at the same time that r f drops
to 0 and the population is arbitrarily split into two fractions that can adapt to either
environment, as described in the previous section. Individuals with any mixture of
alleles 1 and −1 in the first L v and the last L f loci of the genotype are therefore
eliminated. At the same time the number of flexible alleles in the last L f loci
shrinks (but does not disappear), in a process similar to the one discussed in the
previous section.
The second plateau is reached when q_v → 1, corresponding to changes in the first L_v loci that go one step further than what has been discussed so far. The drastic increase in q_v is associated with an equally drastic drop of p_v and r_v. The population thus departs from both rigid maxima. The process taking place in the first L_v loci is the opposite of the one occurring in the last L_f sites: fixed alleles are eliminated at the expense of flexible ones. The surviving
†† The choice of the value T = 112 is a matter of practical convenience, to better display the main features of the evolutionary process.
Figure 5. Fraction of the population corresponding to each possible value of the fitness
(solid squares), for two different generations. For the subpopulations associated with each
fitness bin we also show the allele probabilities (listed in Table 1), restricted to the corresponding subpopulation.
individuals then tend to have only ? in the first L_v loci. Such adaptable individuals rapidly take over, owing to the much higher fitness that stems from their ability to fit both possible configurations of the environment.
The abrupt change in the structure of the population is shown in Figs 5 and 6,
where we display the fraction of individuals in the population, distributed according to their fitness. Within the subpopulation associated with each fitness bin, we
also show the probabilities listed in Table 1.
The first plateau corresponds to a population that is split into two fractions, each one capable of adapting to one of the two possible environments but not to the other. This is similar to what has already been discussed in the previous section. In the
example considered in our simulations, this takes place between generations g ≃ 20 and g = 225. Since both fractions have similar fitnesses, the distribution of the population according to fitness is concentrated in a single peak located at ⟨φ⟩ ≃ 11 (see Fig. 5). This persists until g ∼ 227, when the first 'superfit genotype', with Q_v = L_v, appears. Since the learning time T is large, this individual has a high probability of adapting to both environments. Such highly fit individuals have a high probability of leaving descendants, and they end up taking over the whole population. The precise moment at which this transition takes place depends upon the occurrence of a random event, and therefore the length of the first plateau of fitness may vary significantly from one numerical experiment to another. It is also expected to increase, on average, with the 'amplitude' L_v of the environmental change.
In the simulations, after ∼270 generations the population is composed of four clearly separated subpopulations (see Fig. 6). The main, highly fit group is in the neighbourhood of the flexible optimum. Two other, rather smaller, groups both have the same intermediate fitness of φ ≃ 11 and are well adapted to one of the
Figure 6. Fraction of the population corresponding to each possible value of the fitness
(solid squares), for two different generations. For the subpopulations associated with each
fitness bin we also show the allele probabilities (listed in Table 1), restricted to the corresponding subpopulation.
two possible environments. The last group has a minimal fitness of 1. This pattern
is easy to understand. All the minor peaks (at φ = 1 and 11) correspond to
sub-populations whose genotypes are only one mutation away from the flexible
optimum, and are in equilibrium with it by the competing processes of random
mutations and selection.
5.4. The population fitness landscape. We have shown two typical examples of
the evolutionary process within the GHN model leading respectively to the flexible
and rigid optima. Both situations can be put into a single comprehensive picture
by introducing a population fitness landscape. In order to do so we use the same
simplification explained for Fig. 1. However, instead of using the numbers of alleles Q_v and Q_f as free parameters, we use the fractions of such alleles in the population, denoted q_v and q_f. The same considerations concerning symmetries are still valid in this case.
In Fig. 7 we show examples of the population fitness landscapes and the evolutionary paths for learning times T above and below the critical value Tc. These are evaluated by setting up a grid of values 0 ≤ q_v ≤ 1 and 0 ≤ q_f ≤ 1. For each node of the grid, a random population of 10 000 individuals is generated with a composition determined only by the corresponding probabilities of each kind of allele. Equation (2) is used to obtain the fitness of each individual, which is then used to calculate the corresponding average fitness. In the figure we plot the contours of equal average fitness. Note that in this case, at variance with Fig. 1 where the number of alleles can only take integer values, the surface is smooth because any point in the (q_v, q_f) plane corresponds to a realizable population with a given average fitness.
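The construction of this landscape can be sketched as follows. This is a hedged illustration: `fitness` stands in for equation (2), which is not reproduced here, and the grid resolution and sample size are arbitrary choices (the text uses 10 000 individuals per node).

```python
import random

def random_population(qv, qf, Lv=5, Lf=16, size=1000):
    """Genotypes whose first Lv loci are '?' with probability qv (otherwise
    +1 or -1 with equal probability), and whose last Lf loci are '?' with
    probability qf.  This realizes one node (qv, qf) of the grid of Fig. 7."""
    def allele(q):
        return "?" if random.random() < q else random.choice((1, -1))
    return [tuple(allele(qv) for _ in range(Lv)) +
            tuple(allele(qf) for _ in range(Lf)) for _ in range(size)]

def landscape(fitness, steps=11):
    """Average fitness on a regular (qv, qf) grid."""
    grid = {}
    for i in range(steps):
        for j in range(steps):
            qv, qf = i / (steps - 1), j / (steps - 1)
            pop = random_population(qv, qf)
            grid[(qv, qf)] = sum(fitness(g) for g in pop) / len(pop)
    return grid
```

A contour plot of `grid` over the (qv, qf) plane then yields the kind of smooth surface shown in Fig. 7.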
Figure 7. Population fitness landscapes of the GHN model, showing the levels of equal fitness projected onto the (q_v, q_f) plane. Left panel: T < Tc; right panel: T > Tc. The evolutionary paths discussed in the previous sections are depicted as a line on top of the contour plot of the fitness landscapes. The insets show a magnification of the evolutionary path in the neighbourhood of the fixed optimum.
With the conventions of this plot, the maximum located at (qv , q f ) = (1, 0)
corresponds to a population where all individuals are equal to the flexible optimum.
The fixed maximum with pv = 1 corresponds to (qv , q f ) = (0, 0). The effect of
increasing the learning time T can easily be recognized. For T > Tc there is only a
single absolute optimum in the fitness function, which corresponds to the flexible
maximum. For T < Tc only one of the two degenerate local optima having no
flexible alleles can be shown in this plot.
Two examples of evolutionary paths, obtained with the same initial conditions
as in the preceding sections, are shown as paths on top of the fitness landscape.‡‡
The way in which the population approaches the fixed optima is clearly depicted
in these examples. In order to appreciate the speed of the evolutionary process we
draw one symbol for each generation along the evolutionary path.
In the initial stages the evolutionary path is nearly orthogonal to the contour lines. Flat regions of the landscapes, which correspond to a mild selection pressure, are associated with the plateaux displayed in Figs 3 and 4. A detail of the evolutionary paths in the neighbourhood of the fixed maximum (q_v, q_f) = (0, 0) is presented as an inset in both panels. In these regions of very low selection pressure, selection and mutation produce a wandering of the evolutionary path that is similar to a random walk.
‡‡ Although these paths adjust to the principal features of the landscape, they are not expected to fit precisely onto that surface. The average fitness obtained in the numerical simulations shows the influence of random mutations, which change the composition of the population and cannot be accounted for in the average fitness landscape.
In the left panel, corresponding to T < Tc , the population remains confined to
a small neighbourhood of the fixed optimum. In the right panel, where T > Tc ,
the beginning of the evolutionary path is entirely similar. However, after some
time wandering near the fixed maxima, the system finds an exit path towards the
flexible optimum. This is the signature that a ‘superfit genotype’ has appeared in
the population. The rapid take-over of these highly fit individuals gives rise to a
path that seems to jump from the fixed maximum to the flexible one bridging a
local depression in the fitness surface. This happens because the path does not lie
exactly onto the average surface (see footnote). In fact the average fitness of the
population along the evolutionary path is a never decreasing function.
6. THE GENERALIZED PERCEPTRON MODEL (GP)
In this section we present an extension of the GHN model, along the same lines as the one presented by Dopazo et al. (2001). In this framework, we consider that the neural network of each individual is a perceptron (Minsky and Papert, 1969) of L synaptic weights $\vec{w} \equiv (w_1, w_2, \ldots, w_L)$. We restrict our model by considering only 'Ising perceptrons', in which w_k = ±1. This assumption is consistent with those of the GHN model described before.
The inputs of the network are assumed to convey data from the environment. These inputs are represented by vectors $\vec{x} = (x_1, \ldots, x_L)$. We also assume that the environment provides the 'classification' of the input patterns into one of two possible groups, labelled by y ∈ {+1, −1}. Such classification is assumed to be given by two reference perceptrons, which represent the two possible states of the environment, whose synaptic connections are given by equation (1). To be specific, we assume that the class of each pattern depends on the weighted sum of the input vector as follows:
vector as follows:
yn = sign(w
E n · xE),
(11)
where n is an integer number that denotes the environment’s state. In turn, the class
given to input xE by the individual’s perceptron, of weights w,
E is
y = sign(w
E · xE).
(12)
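Equations (11) and (12) amount to a thresholded scalar product; a minimal sketch (the function name and the tie-breaking convention y = +1 for a zero sum are our assumptions, the text leaves that case unspecified):

```python
def classify(w, x):
    """y = sign(w . x), as in equations (11) and (12)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1   # tie broken towards +1 by convention

# An Ising reference perceptron (weights +/-1) acting as the environment:
w_ref = (1, 1, -1, 1, -1)
x = (0.3, -1.2, 0.5, 2.0, -0.7)
y_env = classify(w_ref, x)   # the 'correct' class for this input pattern
```

Flipping every weight of the reference perceptron reverses the class of every input with a nonzero weighted sum, which is the systematic misclassifier discussed below equation (13).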
At variance with the GHN model, the one considered here brings in epistatic
effects that are not exclusively due to learning. This is so because it includes
the processing made by the network considered as an additional phenotypic trait.
Such processing means that even if a perceptron is not identical to the optimum, it
may nevertheless succeed in classifying properly some fraction of the patterns [see
Dopazo et al. (2001) for an extensive discussion of this point]. With reference to
equation (12), y may be different from yn , the ‘correct’ class associated with each
environment. The fitness of each individual depends upon its ability to properly
classify new input patterns, which are different from the examples that have been
used during the learning stage.
Although within the GHN model a stable, flexible subsystem is seen to appear,
this process may involve an unreasonably large number of generations because the
final approach to the flexible optimum can only take place as a consequence of the
random appearance of a ‘superfit genotype’. As we shall see, this process can be
greatly accelerated by the introduction of the additional epistatic effects implied in
the more complex phenotypic structure that appears in the GP model.
As in the GHN model, we assume that the alleles encode for fixed or flexible
synaptic weights. The latter can be adapted during ontogeny through a learning
protocol. However, instead of searching at random for the optimal values of the adaptive weights, as in the H & N model, we consider a more realistic learning scenario in which these weights are determined by learning from examples.
This may be gauged by the probability of generalization p_g, which is the probability that an input pattern is correctly classified by the (trained) perceptron.∗∗ It
is well known from the literature on neural networks (Hertz et al., 1991; Dopazo et al., 2001) that p_g is a function of the normalized overlap between the individual perceptron weight vector $\vec{w}$ and that of the reference perceptron associated with the environment state, $\vec{w}_n$,
$$p_g(\vec{w}, \vec{w}_n) = 1 - \frac{1}{\pi}\arccos\frac{\vec{w}\cdot\vec{w}_n}{|\vec{w}\,||\vec{w}_n|}. \qquad (13)$$
In general, the overlaps in equation (13) depend upon the number of examples used
for learning, and also upon the learning algorithm. In the best case, the perceptron
is identical to that associated with the environment, the normalized overlap is 1,
and the probability of generalization is p_g = 1. A different situation arises when the weight vector $\vec{w}$ of the individual is orthogonal to that of the reference. In such a case p_g = 0.5, meaning that the classification is correct only by chance. The worst case arises when the normalized overlap is −1; this extreme case corresponds to an individual that systematically misclassifies the input patterns.
By way of illustration, consider the case in which the perceptron classifies sensory input concerning, say, a prey as edible or not. Each component of an L-dimensional input pattern refers to the quality of some particular attribute of the prey (smell, colour, size, taste, etc.). The prey may change some of its attributes, e.g.,
colour or size, in different ‘seasons’ of the year, while others remain unaltered. In
each season there is an optimal classification of prey into edible or not and this
is provided by the current reference perceptron. This changes for the following
season. In order to be consistent with the conventions used for the GHN model we
define the individual fitness as the sum of the partial classification performances of
the individual in all the environment states. We choose the same normalization as
∗∗ The use of p_g to define the fitness implies a statistical average over many sets of input patterns. In this sense a fitness defined using p_g should truly be considered an average genotype fitness.
for the GHN model. We thus write
$$\varphi = \frac{1}{2F}\sum_{i=1}^{2F}\left[1 + (L-1)\,p_g(\vec{w}, \vec{w}_i)\right]. \qquad (14)$$
In the following, we restrict ourselves to the case of only one environmental change (F = 1), and to two limiting learning protocols.† In the one that we call random learning, the synaptic connections are chosen at random and are left unchanged during the whole 'season'. In the other, that we call perfect learning, all the adaptive synapses are optimally assigned to either w_k = 1 or w_k = −1. These two limiting cases provide extreme bounds within which the results of any other learning protocol fall.
If we consider the case of individuals with no flexible alleles, having P_{v(f)} alleles 1 and R_{v(f)} alleles −1 in the first L_v (last L_f) loci of the genome respectively, the scalar products entering equation (13) are
$$\frac{\vec{w}\cdot\vec{w}_1}{|\vec{w}\,||\vec{w}_1|} = \frac{(P_v - R_v) + (P_f - R_f)}{L}, \qquad \frac{\vec{w}\cdot\vec{w}_2}{|\vec{w}\,||\vec{w}_2|} = \frac{(-P_v + R_v) + (P_f - R_f)}{L}. \qquad (15)$$
Since we have assumed that the neural networks are Ising perceptrons, it always holds that $|\vec{w}| = \sqrt{L}$. For perceptrons with flexible alleles, the values of P_{v(f)} and R_{v(f)} that enter the scalar products [see equation (13)] depend upon how the adaptation protocol assigns the Q_{v(f)} flexible weights. For the case of 'random learning', half of them are on average assigned to +1 and the other half to −1.
Therefore the individual fitness is
$$\varphi_{rnd} = 1 + (L-1)\left[1 - \frac{1}{2\pi}\left(\arccos\frac{P_v - R_v + P_f - R_f}{L} + \arccos\frac{-P_v + R_v + P_f - R_f}{L}\right)\right]. \qquad (16)$$
† Although we do not consider any particular learning scheme, we mention here how it may proceed. Together with each change of the environment one may assume that an individual undergoes a session of 'batch' learning [see e.g., Hertz et al. (1991)]. This may be thought of as tasting several potential prey to determine which are acceptable. During this process only the flexible synapses are adapted. With the present conventions each component of the M training inputs, x_k^µ (k = 1, …, L; µ = 1, …, M), can be selected at random. Their 'correct' classifications are provided by the environment reference perceptron through equation (11). The adjustable weights of the individual are changed so as to minimize the number of mistakes. Next, each individual is asked to classify new, randomly generated input patterns until the 'season' is over. This procedure is repeated each time the environment changes. Thus, each 'season' is partly devoted to training and partly to classifying new inputs. There is a compromise here, because a longer learning time, and therefore a better classification performance, entails a shorter time devoted to testing new examples, or equivalently to finding edible food.
In 'perfect learning', when the current environment is $\vec{w}_1$, all the synapses encoded by the alleles ? are set to w_k = 1, while if the current environment is $\vec{w}_2$, the adaptive weights in the first L_v loci are assigned to −1 and those in the last L_f loci to 1. The corresponding fitness therefore is
$$\varphi_{prf} = 1 + (L-1)\left[1 - \frac{1}{2\pi}\left(\arccos\frac{P_v + Q_v - R_v + P_f + Q_f - R_f}{L} + \arccos\frac{-P_v + Q_v + R_v + P_f + Q_f - R_f}{L}\right)\right]. \qquad (17)$$
As an example of the effects of the perceptron processing, note that even in the unfavourable circumstance in which P_f = Q_f = Q_v = 0 and P_v = R_v, where the last L_f loci of the genome contain only −1 and there is a mixture of −1 and +1 in the first L_v loci, the values of ϕ_rnd and ϕ_prf are larger than 1.
It is practical to consider the reduced subspace in which R_v = R_f = 0. Making use of the fact that P_v + Q_v = L_v and P_f + Q_f = L − L_v, the two above expressions for the fitness can be written in terms of the variables Q_f and Q_v, namely
$$\varphi_{rnd} = 1 + (L-1)\left[1 - \frac{1}{2\pi}\left(\arccos\frac{L - Q_v - Q_f}{L} + \arccos\frac{L_f - L_v + Q_v - Q_f}{L}\right)\right] \qquad (18)$$
$$\varphi_{prf} = 1 + (L-1)\left[1 - \frac{1}{2\pi}\arccos\frac{L - 2L_v + 2Q_v}{L}\right]. \qquad (19)$$
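Equations (18) and (19) are easy to evaluate numerically. In this sketch, equation (19) is implemented with the numerator L − 2L_v + 2Q_v, the sign obtained by substituting R_v = R_f = 0 into equation (17), which is also the form consistent with the maximum lying at Q_v = L_v; the function names are ours.

```python
import math

Lv, Lf = 5, 16
L = Lv + Lf

def phi_rnd(Qv, Qf):
    """Random-learning fitness of equation (18)."""
    a1 = math.acos((L - Qv - Qf) / L)
    a2 = math.acos((Lf - Lv + Qv - Qf) / L)
    return 1 + (L - 1) * (1 - (a1 + a2) / (2 * math.pi))

def phi_prf(Qv):
    """Perfect-learning fitness of equation (19); independent of Qf."""
    return 1 + (L - 1) * (1 - math.acos((L - 2 * Lv + 2 * Qv) / L) / (2 * math.pi))
```

With these values phi_prf attains its maximum L = 21 at Qv = Lv, whereas phi_rnd scores the fixed point (Qv, Qf) = (0, 0) above the flexible one (Lv, 0), in line with the comparison of the two optima made in the text.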
In Fig. 8 we illustrate the landscapes associated with ϕ_prf and ϕ_rnd, following the same conventions as Fig. 1, i.e. restricting the plot to the subspace in which R_f = R_v = 0. The absolute maximum of ϕ_rnd is attained within this subspace when Q_f = Q_v = 0, corresponding to the fixed maximum with P_v = L_v and P_f = L_f. This has its symmetric counterpart outside the subspace, for R_v = L_v and P_f = L_f. On the other hand, the maximum of ϕ_prf is found on the set of points Q_v = L_v, ∀Q_f. This is consistent with the fact that equation (19) is independent of Q_f. All the points of this set correspond to genotypes in which all the first L_v loci are occupied by flexible alleles, therefore corresponding to individuals with a flexible subsystem tuned to the changes of the environment.
A comparison of both fitnesses shows that the flexible optimum corresponds to the absolute maximum of ϕ_prf but is, in addition, a local minimum of ϕ_rnd. The GP model therefore confirms the results of the GHN model, in the sense that the emergence of an adaptive subsystem critically depends upon the effectiveness of the adaptive protocol that prevails during ontogeny. If learning is only partially effective, a point can be reached at which it may be worthwhile, from an evolutionary point of view, to give up the possibility of following the changes of the environment and exploit the possibility of being fully adapted to one of its possible configurations.
Figure 8. GP model. Mean fitness landscapes using the two extreme learning protocols. The plot is made with the same conventions as Fig. 1. Panel (a) corresponds to perfect learning and panel (b) to random learning. Note the different scales on the vertical axes of the two panels. The calculation has been performed for L_v = L_f = 10 in order to better display the slopes in panel (b). Stagnation is revealed by the absence of any slope along the Q_f direction in panel (a).
The fact that the maximum of ϕ_prf is attained on a set of points and not at a single point can be understood as follows. For perfect learning, all ? in the last L_f loci of the genome are always optimally assigned. It is therefore equally effective to have an allele 1 as a ?, and there is consequently no selection pressure to replace the latter by the former. This situation gives rise to a stagnation of the Baldwin transcription of environmental data into the genotype, the same as found and discussed by Dopazo et al. (2001). It is worth stressing that such stagnation is absent for the process of replacing fixed alleles by flexible ones in the first L_v loci of the genotype, i.e. for the process of evolving a flexible subsystem. We therefore find within the GP model the same two families of evolutionary pathways as in the GHN model. For a difficult learning task the evolutionary path breaks the initial symmetry with respect to both environmental configurations, thus leading to a mixed population in which there are individuals that resemble either possible environment. For an easy learning task the evolutionary path preserves the original symmetry, leading instead to a population that resembles the flexible optimum and therefore bears no bias with respect to the possible environments.
There is, however, an important difference in the nature of the evolutionary paths
for the GP model as compared with the GHN model. The landscapes for ϕ_prf
and ϕ_rnd are always smooth as a consequence of the epistatic effects brought in by
the processing of the perceptron. The evolutionary paths are therefore expected to
always lead gradually to the corresponding optima, either fixed or flexible, without
the abrupt changes displayed in Figs 3 or 4. We have found that the evolutionary
process in the GHN has two distinct steps. The first is governed by the Baldwin
effect, which acts primarily upon the last L_f alleles. The second consists of an almost
random search for the rather isolated flexible optimum. This latter situation should
be considered an artifact of the GHN model. When richer epistatic effects are
brought in, as in the GP model, a smoother fitness landscape is found in which a
'needle in a haystack' search may never arise. The evolutionary process then
involves a gradual and cumulative allelic substitution without
the need of relying on the random occurrence of a 'superfit genotype'.
7. CONCLUSIONS
We have presented two working models to discuss the emergence of adaptive
subsystems and their relationship with the Baldwin effect in a changing environment.
One is a generalization of the well known framework developed by Hinton
and Nowlan (1987); the second is an extension of the perceptron model discussed
by Dopazo et al. (2001), in which the individuals are endowed with a simple
neural network having a limited processing ability.
We considered a changing environment that is represented as two reference states
that exchange roles a fixed number of times during the 'life' of the individuals. To
fix ideas, some features of a potentially edible prey may change with the season of the
year, while the others remain constant. Individuals are therefore faced with the
problem of learning which prey are edible each time that the environment changes.
We encode the environment in two strings of information bits that represent the
weights of two hypothetical neural networks. Each one is assumed to discriminate
perfectly which prey is edible when it is given the corresponding attributes as inputs.
We identify the 'amplitude' of the changes with the Hamming
distance between the reference strings, while the 'frequency' of such variation is
identified with the number of times that both strings exchange roles during the life
of the individual. The model can easily be extended to more complex situations in
which there are several reference strings. However, the case considered here has
all the necessary ingredients to capture the essential consequences of adaptation in
a changing environment.
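As an illustration, the environmental encoding just described can be sketched as follows. This is a minimal toy version; the function and parameter names are ours, and the two reference strings are taken to differ maximally on the first L_v loci, so that the amplitude equals L_v:

```python
import random

def make_environment(L_v, L_f, seed=0):
    """Two reference weight strings ('seasons') that agree on the last
    L_f loci (the constant features) and differ on the first L_v loci
    (the changing features)."""
    rng = random.Random(seed)
    constant = [rng.choice((1, -1)) for _ in range(L_f)]
    changing = [rng.choice((1, -1)) for _ in range(L_v)]
    env_a = changing + constant
    env_b = [-w for w in changing] + constant
    return env_a, env_b

def hamming(a, b):
    """The 'amplitude' of the environmental change."""
    return sum(x != y for x, y in zip(a, b))

env_a, env_b = make_environment(L_v=10, L_f=10)
# The two strings exchange roles a fixed number of times (the
# 'frequency' of the change) during each individual's life.
```

With this choice, hamming(env_a, env_b) equals L_v, and the last L_f loci encode the features of the environment that never change.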
The genotype of each individual involves fixed and flexible alleles. These express
themselves respectively as fixed and flexible synaptic connections of a hypothetical
neural network. The presence of flexible alleles in the genome enables the individual
to attempt a searching (learning) process that tries to match its synapses to the
current environment. This search has to be repeated every time the environment
changes. A greater fitness is associated with a shorter searching process, and
a minimal one is attributed to an individual that is unable to find the current
environment in a prescribed number of learning trials (Hinton and Nowlan, 1987).
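A minimal Hinton–Nowlan-style sketch of this search (our own illustration with hypothetical names; `max_trials` plays the role of the prescribed number of learning trials):

```python
import random

def learn(genotype, environment, max_trials=1000, seed=1):
    """Guess the flexible ('?') loci at random each trial until the
    phenotype matches the current environment. Returns the number of
    trials used, or None if the search fails."""
    # A fixed allele that contradicts the environment can never match.
    if any(g != '?' and g != e for g, e in zip(genotype, environment)):
        return None
    rng = random.Random(seed)
    flexible = [i for i, g in enumerate(genotype) if g == '?']
    for trial in range(1, max_trials + 1):
        if all(rng.choice((1, -1)) == environment[i] for i in flexible):
            return trial
    return None

def fitness(trials_used, max_trials=1000):
    """Greater fitness for shorter searches; minimal fitness on failure."""
    if trials_used is None:
        return 1.0
    return 1.0 + (max_trials - trials_used) / max_trials
```

For example, against the environment [1, -1, 1, 1], an individual [1, '?', 1, '?'] with two flexible loci matches with probability 1/4 per trial and so almost surely succeeds well within 1000 trials, whereas an individual carrying a wrong fixed allele never does and receives the minimal fitness.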
The situation is such that an individual that has mostly correct fixed alleles is
highly successful in matching one of the two possible environmental configurations
but performs very badly when the environment changes. Such an individual can thus be thought
of as succeeding in identifying its prey during one season of the year but starving
during the next. On the other hand, an individual with flexible alleles that is able to
match the changing features of the environment through learning is in a position
to adapt well in both circumstances, but always has to spend a fraction of its life
learning (searching) how to match the current environment.
A statistical estimate (Fontanari and Meir, 1990) can be made of the fitness
associated with this learning process, which is related to the 'amplitude' and the
'rate of change' of the environment. When the changes occur too fast, when they
have a large amplitude, or equivalently when the number of learning
trials is too small, the (reproductive) cost of learning is high. On the other hand,
when the frequency of environmental change or its amplitude is low, or when a
large number of learning trials is allowed, the learning cost is low.
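The dependence of the learning cost on the amplitude and on the number of trials can be made explicit with a back-of-the-envelope estimate in the spirit of the one cited above (the formula is ours, assuming independent uniform guesses): a single random assignment of p flexible binary loci matches the environment with probability 2^(-p), so the chance of succeeding at least once in G trials is 1 - (1 - 2^(-p))^G.

```python
def success_probability(p_flexible, trials):
    """Probability that at least one of `trials` independent random
    assignments of p_flexible binary loci matches the environment."""
    per_trial = 0.5 ** p_flexible
    return 1.0 - (1.0 - per_trial) ** trials

# Large amplitude (many flexible loci) and few trials: learning almost
# surely fails, i.e. the reproductive cost of learning is high.
high_cost = success_probability(20, 100)    # of order 1e-4
# Small amplitude or many trials: learning is essentially free.
low_cost = success_probability(5, 1000)     # essentially 1
```

The sharp dependence on p_flexible is what makes the two regimes discussed below so distinct.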
The initial population is chosen with minimum bias, i.e. having the same resemblance
to either of the reference strings. In spite of this, the symmetry is not preserved
during evolution due to random mutations. The two regimes mentioned
above are associated respectively with evolutionary processes whose pathways
lead to populations in which the above symmetry is broken or recovered.
In the case of a high learning cost the best evolutionary strategy is to give up
flexible alleles in favour of those that match either of the two possible environmental
reference configurations. The resulting individuals have a very high fitness only
during half of the time. The final population has lost the initial symmetry with
respect to both environmental strings.
In the low learning cost regime, evolution transcribes into the genetic information
of the individual the fact that some features of the environment remain
unchanged while the rest are constantly changing. The latter features are matched
by a string of flexible alleles. The resulting population recovers the initial symmetry
because its individuals bear no bias in favour of either environment. When the
process of allelic substitution is not able to cope with a high rate of environmental
change, natural selection preserves genetic plasticity in such a way that the
possibility of learning is tuned to those features of the environment that are not constant.
In either case the information that the environment has a changing and a fixed
part is properly transcribed into the genotypes of the population.
In the high-cost regime the balance between the ‘exploration’ of the different
alternatives through learning and the ‘exploitation’ of a ‘hard wired’ (or ‘closed’)
solution is decided in favour of the latter (Frank, 1996). The resulting population
avoids the risk of misadaptation due to a poor learning performance in both environmental states and prefers an inherited perfect adaptation to one of these states.
On the other hand, when learning has a low cost the evolutionary strategy is to
end in a population of individuals with a stable, adaptive subsystem tuned to the
changing environment. The solution is then to 'explore' with an 'open' system geared
to adapt to all possible environmental conditions, which is more efficient at all
times. In this case the emergence of a stable adaptive subsystem is enhanced
by selection.
We also present an alternative model, the GP model, with the aim of introducing
different epistatic effects, such as those that can be attributed to a complex phenotype.
Within this framework it is possible to analyse two limiting cases in which the
fitness function can be formulated analytically. In the one that we call 'random
learning', the flexible alleles are randomly assigned to either 1 or −1 each time
the environment changes. This mimics the case of a very difficult (costly) learning
process. The other extreme situation can instead be considered to represent a
vanishing learning cost. In this case, which we call 'perfect learning', the flexible
alleles are always optimally tuned. The fitness landscape for 'random learning' has
its largest value in coincidence with the fixed maxima.
The evolutionary process within the GP model for 'random learning' involves the
symmetry-breaking pathways that we have mentioned above. The fitness landscape
for 'perfect learning' contains instead a single absolute maximum that corresponds
to the emergence of an adaptive subsystem. The corresponding evolutionary pathway
therefore preserves the original symmetry. The choice between the
two alternatives only depends upon the reproductive cost that has to be incurred
by the individuals that engage in a learning process, i.e. on the effectiveness
of the learning protocol. However, an important difference is found with respect
to the GHN model. The flexible optimum never appears isolated, thanks to the
epistatic effects involved in the GP model. The approach to it is therefore expected
to take place through gradual, cumulative allelic substitution and not by the random
appearance of a 'superfit genotype' as in the GHN model. Within 'perfect learning'
we find in addition a stagnation similar to that of the Baldwin transcription process
discussed by Dopazo et al. (2001).
In both the GHN and the GP models, with difficult learning tasks an accelerated
transcription of environmental features into genetic information—the classical
interpretation of the Baldwin effect—takes place in two ways. On the one
hand, the flexible alleles that were originally allocated to the last L_f loci of the
genome are replaced by 1, thus encoding the (constant) environmental features. On
the other hand, individuals with a mixture of 1 and −1 in the first L_v loci are
eliminated. These processes are accelerated at the expense of the flexible alleles,
which progressively disappear. If these were not present, the evolutionary process
would take place through an inefficient random walk in sequence space.
Within the GHN model the process leading to the emergence of a flexible subsystem
follows a different pattern. During the first stage of the evolution the population
tends to resemble either of the fixed reference connection patterns. Once this
situation is reached the selection pressure becomes very low and random mutations
produce essentially no improvement in the average fitness of the population. This
process may continue for a significant time (in fact it scales exponentially with the
'amplitude' L_v of the environmental changes) until a 'superfit genotype' is found
by accidental mutation. This triggers a second evolutionary stage in which the
population is rapidly driven towards the flexible optimal genotype. The nature of
the evolutionary process within the GP model is different. All epistatic effects that
are brought in by the processing of the perceptron give rise to a smooth fitness
landscape that has no isolated maxima. The evolutionary process therefore always
entails a gradual and cumulative allelic substitution. The exponential search time
is therefore dramatically reduced.
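The contrast can be quantified with a crude estimate (our own illustration, not the paper's calculation): finding an isolated 'needle' among the 2^L configurations of L binary loci by blind random search takes of order 2^L trials, whereas an adaptive walk on a smooth, graded landscape (flip a random locus, keep the flip only if it increases the number of matched loci) behaves like a coupon-collector process and needs only about L·H_L steps:

```python
def needle_search_time(L):
    """Expected number of trials for blind random search to hit one
    specific configuration of L binary loci."""
    return 2.0 ** L

def smooth_climb_time(L):
    """Expected steps of a single-locus-flip adaptive walk on a graded
    landscape, starting from the all-mismatched configuration:
    L * (1 + 1/2 + ... + 1/L), the coupon-collector sum."""
    return L * sum(1.0 / k for k in range(1, L + 1))

# For L = 20 the rugged (GHN-like) search is already four orders of
# magnitude slower than the gradual (GP-like) climb.
ratio = needle_search_time(20) / smooth_climb_time(20)
```

The exponential-versus-quasi-linear gap is the quantitative content of the 'needle in a haystack' remark above.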
It is amusing that there is a close analogy between the occurrence of the two
regimes of high and low learning difficulty, as pictured through the behaviour of
the population in different fitness landscapes, and physical systems that undergo a
first-order phase transition. In physical systems, the equivalent of the fitness is
(the negative of) the free energy, a quantity that is minimal when the system is in
equilibrium at a finite temperature. The latter introduces noise, playing a role that
is in some sense similar to that of random mutations. The microscopic
state of the system is therefore a random variable, just as the composition
of the population evolving through the GA. When the free energy presents two or
more minima, the system may get trapped in one of higher free energy, from which
it can only escape through the random modifications of the microscopic variables
allowed by the temperature. In physics, this sudden change is called a first-order
phase transition. This process bears strong similarities to the way in which the
flexible optimum is approached in the GHN model: the final stage of evolution is
essentially driven by the occurrence of favourable mutations.
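The analogy can be made concrete with a standard Metropolis walk in an asymmetric double well (a generic textbook sketch, not taken from the paper; the potential and all parameters are ours). The walker starts in the metastable (higher) minimum and can only reach the global minimum through thermal noise, just as the population can only leave a local fitness optimum through random mutations:

```python
import math
import random

def escapes_metastable_well(T=0.4, steps=50000, seed=2):
    """Metropolis dynamics in V(x) = x^4 - 2x^2 + 0.5x, which has a
    metastable minimum near x = +1 and a global minimum near x = -1.
    Returns True if the walker, started at x = +1, ever crosses into
    the global well (x < -0.5) within `steps` moves."""
    V = lambda x: x**4 - 2.0 * x**2 + 0.5 * x
    rng = random.Random(seed)
    x = 1.0                            # trapped in the higher minimum
    for _ in range(steps):
        x_new = x + rng.uniform(-0.3, 0.3)
        dV = V(x_new) - V(x)
        # Thermal noise: uphill moves accepted with Boltzmann probability.
        if dV <= 0.0 or rng.random() < math.exp(-dV / T):
            x = x_new
        if x < -0.5:
            return True
    return False
```

At a sufficiently high 'temperature' (the analogue of the mutation rate) the escape happens within a modest number of steps; as T goes to zero the walker stays trapped, just as a population without mutations would remain stuck at the local optimum.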
To summarize, we find a twofold situation. For a high learning
difficulty the traditional Baldwin effect is in full operation. The population ends
up resembling either of the two reference strings, giving up genetic flexibility.
For a low learning difficulty, a flexible optimum is always attained. In the GHN
model the final stages of evolution are governed by a random search for a rather
isolated optimal configuration pattern. This may cast doubt upon the robustness
of such an evolutionary outcome. However, the emergence of a flexible subsystem
does not necessarily depend on the random emergence of a 'superfit genotype'.
The GP model helps one to understand how the epistatic effects introduced by the
(phenotypic) processing ability of the perceptron ease the emergence of a flexible
subsystem, giving rise to a gradual approach to the same optimal genotype
through cumulative allelic substitutions. In this case natural selection acting upon
the genetic basis of behavioural traits originates a fine-tuned adaptive subsystem
able to cope with the uncertainties of a changing environment.
ACKNOWLEDGEMENTS
MBG and RP acknowledge support from the ZIF (Bielefeld, Germany), where
part of this work was performed, within the Research Group 'The
Sciences of Complexity: From Mathematics to Technology to a Sustainable World'.
HD, SR-G, MBG and RP acknowledge financial support from the EU research
contract ARG/B7-3011/94/97. HD and RP hold a UBA research contract, UBACYT PS021, 1998/2000/2001. MBG is a member of the CNRS.
REFERENCES
Ackley, D. and M. Littman (1992). Interactions between learning and evolution, in Artificial Life II, C. G. Langton, C. Taylor, J. Farmer and S. Rasmussen (Eds), Redwood City, CA: Addison-Wesley.
Ancel, L. W. (1999). A quantitative model of the Simpson–Baldwin effect. J. Theor. Biol. 196, 197–209.
Baldwin, J. M. (1896). A new factor in evolution. Am. Nat. 30, 441–451.
Dopazo, H., M. Gordon, R. P. J. Perazzo and S. Risau-Gusman (2001). A model for the interaction of learning and evolution. Bull. Math. Biol. 63, 117–134.
Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection, Oxford: Oxford University Press.
Fontanari, J. F. and R. Meir (1990). The effect of learning on the evolution of asexual populations. Complex Syst. 4, 401–414.
Frank, S. A. (1996). The design of natural and artificial adaptive systems, in Adaptation, M. R. Rose and G. V. Lauder (Eds), New York: Academic Press.
French, R. and A. Messinger (1994). Genes, phenes and the Baldwin effect: learning and evolution in a simulated population, in Artificial Life IV, R. Brooks and P. Maes (Eds), Cambridge, MA: MIT Press.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Redwood City, CA: Addison-Wesley.
Hertz, J., A. Krogh and R. G. Palmer (1991). Introduction to the Theory of Neural Computation, Redwood City, CA: Addison-Wesley.
Hinton, G. E. and S. J. Nowlan (1987). How learning can guide evolution. Complex Syst. 1, 495–502.
Maynard Smith, J. (1987). When learning guides evolution. Nature 329, 761–762.
Minsky, M. and S. Papert (1969). Perceptrons, Cambridge, MA: MIT Press.
Mitchell, M. (1996). An Introduction to Genetic Algorithms, Cambridge, MA: MIT Press.
Received 13 February 2002 and accepted 1 August 2002