A Reduced Markov Model of GAs without the
Exact Transition Matrix
Cheah C. J. Moey and Jonathan E. Rowe
School of Computer Science,
University of Birmingham,
Birmingham B15 2TT, Great Britain
{ccm, jer}@cs.bham.ac.uk
Abstract. Modelling a finite population genetic algorithm (GA) as a
Markov chain can quickly become unmanageable since the number of
population states increases rapidly with the population size and search
space size. One approach to resolving this issue is to “lump” similar
states together, so that a tractable Markov chain can be produced. A
paper by Spears and De Jong in [1] presents an algorithm that can be
used to lump states together, thus compressing the probability transition
matrix. However, to obtain a lumped model, one needs to calculate the
exact transition matrix before the algorithm can be applied to it. In this
paper, we explore the possibility of producing a reduced Markov model
without the need to first produce the exact model. We illustrate this
approach using the Vose model and Spears' lumping algorithm on the
Onemax problem with a selection-mutation GA.
1 Introduction
A Markov process is a stochastic process in which the current state is determined by random events and depends on the state at the previous time step. The genetic algorithm is therefore a Markov process, because the state of the current population is determined by a series of evolutionary events, such as selection, crossover and mutation, acting upon the previous population.
Markov chain models of GAs have been widely used by Davis and Principe [2],
De Jong et al. [3], He and Yao [4], Nijssen and Bäck [5], Rees and Koehler [6],
and Wright and Zhao [7] to analyse GAs. Although it is possible to derive some
general conclusions about GAs from this work, it is often impractical to perform
any calculations with these models, since the number of possible populations
(and thus states of the system) grows enormously with the population size and
string length. If the population size is N and the string length is , then the
number of distinct populations is
N + 2 − 1
.
(1)
N
Therefore, if we are using a standard binary encoding with string length 8 and a population size of 10, we will have on the order of 10^17 different possible population states. Obviously, trying to perform calculations with such a large number of states is impractical.
Fortunately, there has been some interest in finding ways to reduce the number of states in Markov chain models by "lumping" similar states together. This makes the transition matrix of a Markov chain model more tractable by greatly reducing its dimensionality. In this paper, we consider a lumping
technique proposed by Spears and De Jong in [1] and [8]. Naturally, lumping cannot typically be done without loss of information, and so only approximate results can be obtained from such simplified models. Spears and De Jong present empirical evidence that their method can produce good approximate results, though this has not been proven analytically.
Even though it is possible to obtain a low-dimensional Markov chain model, we still need to calculate the exact transition matrix first, before the compression algorithm proposed by Spears can be applied. So, if we are to model a GA, say with a string length of 8 and a population size of 10, we still need to calculate a matrix of dimension 10^17 × 10^17 before Spears' compression algorithm can be used. Therefore, it is desirable to obtain the compressed transition matrix directly, without the need to calculate the exact transition matrix beforehand.
In this paper, we will explore a possible way of obtaining the compressed
transition matrix of a Markov chain model of a GA without the need to calculate
the exact transition matrix.
2 Modelling Work

2.1 Notations and Terminology
Let ℓ denote the string length. If a fixed-length standard binary encoded string is used, then the number of possible strings in the search space is n = 2^ℓ. A finite population of size N can be represented using an incidence vector: if the number of possible strings is n, the vector v ∈ ℕ^n represents a population in which v_k is the number of copies of string k. So we require $\sum_k v_k = N$.

For the infinite population model, the representation of a population vector will be independent of the population size N. A population vector p ∈ ℝ^n represents a population in which p_k is the proportion of string k. Hence each p_k lies in the range [0, 1]. Clearly, we require $\sum_k p_k = 1$. Throughout this paper, the symbols p, q will represent population vectors, and u, v will represent incidence vectors. Given an incidence vector v, we can calculate the corresponding population vector by setting p = v/N.
The set of all possible vectors whose components sum to 1 is known as the simplex (or the n-simplex) and is denoted by

$$\Lambda = \left\{ p \in \mathbb{R}^n : \forall k,\ p_k \ge 0,\ \sum_{i=0}^{n-1} p_i = 1 \right\} . \tag{2}$$

Obviously, all real populations correspond to points within the simplex. However, not all points in the simplex correspond to finite populations, since the components of the corresponding population vectors must be rational numbers.
2.2 The Vose Infinite Population Model
In this paper, we will consider a selection-mutation only GA with fitness proportional selection and bitwise mutation (no crossover). In the Vose infinite population model, the random heuristic map G is an operator, composed of genetic operators such as selection and mutation, that maps a point in the simplex to another point in the simplex. For a mutation and selection only GA, the random heuristic map is defined as G = U ◦ F, where U : Λ → Λ describes mutation and F : Λ → Λ describes selection. According to the theory of infinite population GAs by Vose [9], a mutation and selection only GA has a fixed point in the simplex. The action of G is given by

$$\mathcal{G}(p) = \mathcal{U} \circ \mathcal{F}(p) = \mathcal{U}(\mathcal{F}(p)) = \frac{U\,\mathrm{diag}[f]\,p}{f^{T} p} \tag{3}$$
where p is the population vector. The fitness proportional selection operator F is given by

$$\mathcal{F}(p) = \frac{\mathrm{diag}[f]\,p}{f^{T} p}$$

where f is the fitness function (expressed as a vector, f_k = f(k)), diag[f] is the diagonal matrix of f, and f^T p is the average fitness of the population p. The operator U is given by

$$\mathcal{U}(p) = U p$$

where U is the mutation matrix, with U_{i,j} being the probability that item j mutates to item i. The sequence p, G(p), G²(p), . . . , Gᵗ(p) as t → ∞ will converge
to a fixed point in the simplex. If q is the fixed point of G, then

$$\mathcal{G}(q) = \frac{U\,\mathrm{diag}[f]\,q}{f^{T} q} = q, \qquad U\,\mathrm{diag}[f]\,q = (f^{T} q)\,q \tag{4}$$

where q is an eigenvector of the matrix U diag[f] and (f^T q) is the average fitness of the population q. By the Perron-Frobenius theorem, there is exactly one eigenvector in the simplex and it corresponds to the leading eigenvalue. We therefore refer to this as the "leading fixed point" of G.
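As a concrete illustration, the leading fixed point can be approximated by simply iterating G; a minimal sketch, using an assumed toy mutation matrix and fitness vector (not taken from the paper):

```python
import numpy as np

def G(p, U, f):
    """Random heuristic map G = U ∘ F for a selection-mutation GA."""
    sel = (f * p) / (f @ p)      # F(p) = diag[f] p / (f^T p)
    return U @ sel               # U(p) = U p

# Toy example: n = 2 strings, symmetric bitwise mutation with rate 0.1
U = np.array([[0.9, 0.1],
              [0.1, 0.9]])      # column j: mutation probabilities of string j
f = np.array([1.0, 2.0])        # assumed fitness vector

p = np.full(2, 0.5)             # start at the simplex centre
for _ in range(1000):           # p, G(p), G^2(p), ... -> fixed point q
    p = G(p, U, f)

q = p                           # approximate leading fixed point
assert np.allclose(G(q, U, f), q, atol=1e-10)
```

Equivalently, q can be obtained as the leading eigenvector of U diag[f], normalised to sum to 1.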
Although G gives the expected next population distribution in the infinite
population limit, it can be extended to calculate the transition matrix of a
Markov chain for finite populations [10]. It is known that for the simple genetic
algorithm, the transition matrix is calculated by the multinomial distribution
sampling on G(p), as described by Vose in [9]. The probability of transiting to a state u = N G(p) in the next generation, given a current population v = N p, is given by

$$\Pr[u \mid v] = N! \prod_i \frac{\Pr[i \mid v]^{u_i}}{u_i!} \tag{5}$$
where Pr[i|v] is the probability that string i is generated in the next generation (that is, Pr[i|v] = G(p)_i), and the product is over all strings.¹ For a selection-mutation GA, we have

$$\Pr[i \mid v] = \frac{\sum_{j} U_{i,j}\, v_j\, f(j)}{\sum_{j} v_j\, f(j)} \tag{6}$$
where U_{i,j} is the probability that string j mutates to i, and f(j) is the fitness of string j. From the preceding sections, we know that the transition matrix Q, where Q_{u,v} = Pr[u|v], will become intractable as N and n increase in size. Therefore, it is desirable to reduce Q to a much smaller set of states. The following section describes a method of reducing Q using an algorithm from Spears and De Jong [1].
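Equations (5) and (6) translate directly into code. The following sketch (the helper names are ours) enumerates all incidence vectors and builds the exact matrix Q column by column for small n and N; it is exactly this construction that becomes infeasible for realistic parameters:

```python
import numpy as np
from itertools import combinations_with_replacement
from math import factorial

def all_populations(n, N):
    """All incidence vectors v with sum(v) = N over n strings."""
    pops = []
    for c in combinations_with_replacement(range(n), N):
        v = [0] * n
        for k in c:
            v[k] += 1
        pops.append(tuple(v))
    return pops

def next_gen_probs(v, U, f):
    """Equation (6): Pr[i|v] for a selection-mutation GA."""
    w = U @ (np.asarray(v, dtype=float) * f)  # sum_j U_ij v_j f(j)
    # columns of U sum to 1, so w.sum() equals sum_j v_j f(j)
    return w / w.sum()

def transition_matrix(n, N, U, f):
    """Exact Nix-Vose matrix: Q[u, v] = Pr[u|v] of equation (5)."""
    pops = all_populations(n, N)
    Q = np.zeros((len(pops), len(pops)))
    for col, v in enumerate(pops):
        p = next_gen_probs(v, U, f)
        for row, u in enumerate(pops):
            pr = float(factorial(N))
            for i in range(n):
                pr *= p[i] ** u[i] / factorial(u[i])
            Q[row, col] = pr
    return pops, Q
```

Each column of Q is a multinomial distribution over next-generation populations, so the column sums are 1.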
2.3 The Spears Aggregated Markov Model
Assume we have a finite set E of states, and it is partitioned into a disjoint union
of sets. We have a transition matrix Q for the Markov process on E. We attempt
to define a transition matrix on the sets of the partition, which will approximate
Q. Given aggregates A, B ⊂ E, we would like to define Pr[A|B].
Let us begin by assuming that the value Pr[a|b] is constant for all b ∈ B. Then it is relatively easy to calculate the probability that we end up in the subset A, since it is simply the sum of the probabilities of ending up in any state contained in A:

$$\Pr[A \mid B] = \sum_{a \in A} \Pr[a \mid B] . \tag{7}$$
This equation is exact. However, in general, it is impossible to assign a consistent value to Pr[a|B], since the probability of ending up in a state a ∈ E will depend on which state in B we are currently in. This is where we need an approximation. A first approximation would be simply to average the probabilities over the set B. Nevertheless, this does not work well in general, because some states in B may be much more likely to occur than others. Therefore, following Spears and De Jong, we use a weighted average over those states. We estimate the likelihood of being in a given state b ∈ B by the probability of ending up in b given a random previous state. That is, we set the weight of state b to be

$$w_b = \sum_{e \in E} \Pr[b \mid e] . \tag{8}$$
¹ One of the reviewers has pointed out that the Nix and Vose formalism of equation (5) is also known in the mathematical biology community as the Wright-Fisher model in [11] and [12].
Consequently, we can now estimate Pr[a|B] by setting

$$\Pr[a \mid B] = \frac{\sum_{b \in B} w_b \Pr[a \mid b]}{\sum_{b \in B} w_b} . \tag{9}$$

Putting everything together, we can now estimate the transition between aggregated states A and B as

$$\Pr[A \mid B] = \frac{\sum_{a \in A} \sum_{b \in B} w_b \Pr[a \mid b]}{\sum_{b \in B} w_b} . \tag{10}$$
Finally, we need to decide on how to create the aggregated states, so that
the exact transition matrix Q can be compressed. Spears and De Jong in [1]
only allow states to be lumped together if the corresponding columns of the
transition matrix Q are similar (that is, are close, when considered as vectors).
Hence, in order to obtain the compressed matrix, they first generate the entire
exact transition matrix and then compare similarities between its columns.
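Given an exact transition matrix Q (columns indexed by the current state) and a partition of its states, equations (8)-(10) reduce to a weighted column average. A minimal sketch, with our own function name:

```python
import numpy as np

def lump(Q, partition):
    """Spears-De Jong style lumping via equations (8)-(10).

    Q[a, b] = Pr[a|b] (columns are current states);
    partition is a list of index lists covering all states.
    """
    w = Q.sum(axis=1)                    # w_b = sum_e Pr[b|e]   (eq. 8)
    m = len(partition)
    R = np.zeros((m, m))
    for jB, B in enumerate(partition):
        wB = w[B]
        for iA, A in enumerate(partition):
            # Pr[A|B] = sum_{a in A} sum_{b in B} w_b Pr[a|b] / sum_b w_b
            R[iA, jB] = (Q[np.ix_(A, B)] @ wB).sum() / wB.sum()
    return R
```

Because each column of Q sums to 1, each column of the lumped matrix R also sums to 1, so R is again a valid (approximate) transition matrix.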
2.4 Calculating the Aggregated Markov Model Directly
From our earlier work [13], we know that it is possible to aggregate population states based on GA semantics, such as average fitness and best fitness, and still obtain a good reduction in the number of states with a reasonable loss in accuracy. However, the exact transition matrix Q still needs to be generated before any aggregation strategy in [13] can be applied to it. This is the same problem that is encountered by Spears and De Jong in [1]. In this paper, we use an aggregation approach similar to that of [13], but instead of aggregating based on GA semantics, we aggregate based on the fixed points of the Vose infinite population model. We show that with this approach we can aggregate finite population states into a reduced Markov chain model, using the lumping algorithm of Spears and De Jong, without the need to calculate the exact transition matrix Q.
The idea is based on the hypothesis that a finite population GA will spend
most of its time in the vicinity of a fixed-point of G (see [9]). We will accordingly
focus our attention on populations that are within a given distance of the leading
fixed-point. Those populations that are not within the specified radius of this
fixed-point (the majority) will all be lumped together into a state called “Rest”.
The reduced model therefore has (α + 1) states, where α is the number of states within the given distance of the leading infinite population fixed point (IPFP); all states outside that radius are lumped together into the Rest state. If the hypothesis is correct, then the GA will spend only a little
time in this lumped state, and so the errors that are introduced should not be
significant.
Let p be the leading IPFP of a selection-mutation GA. We generate a finite
set E of all possible population states and calculate the distance between each
state in E from p. The distance between any two vectors is given by

$$D(x, y) = \frac{\sqrt{\sum_i |x_i - y_i|^2}}{\sqrt{2}} \tag{11}$$

where D(x, y) is the (normalised) distance between the two vectors. The algorithm for modelling a GA using the (α + 1)-state Markov model above is as follows:
1. Generate a set E of all possible populations.
2. Identify populations x that are near p, using D(x, p) < ε.
3. All populations not near p are in a state called Rest.
4. Exactly calculate the transition probabilities of those states near p.
5. Exactly calculate Pr[Rest|i] by making sure the column sum is 1, where i is a state near p.
6. Allocate an array w, which is of size |Rest|.
7. Let Denominator = 0.
8. For each state b in Rest:
   a) Let w_b = 0.
   b) For each e in E, set w_b = w_b + Pr[b|e].
   c) Denominator = Denominator + w_b.
9. For each i near p:
   a) Let Numerator = 0.
   b) For each b in Rest, set Numerator = Numerator + (w_b × Pr[i|b]).
   c) Calculate Pr[i|Rest] = Numerator/Denominator.
10. Calculate Pr[Rest|Rest] by making sure the column sum is 1.
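The steps above can be sketched as follows; the helper names are ours, and equation (5) is evaluated on the fly so the full matrix Q is never stored (although the weights of step 8 still require visiting every pair of states):

```python
import numpy as np
from itertools import combinations_with_replacement
from math import factorial

def all_populations(n, N):
    """Step 1: all incidence vectors v with sum(v) = N over n strings."""
    pops = []
    for c in combinations_with_replacement(range(n), N):
        v = [0] * n
        for k in c:
            v[k] += 1
        pops.append(tuple(v))
    return pops

def g_map(v, U, f):
    """Equation (6): Pr[.|v] for a selection-mutation GA."""
    w = U @ (np.asarray(v, dtype=float) * f)
    return w / w.sum()

def pr(u, p, N):
    """Equation (5): multinomial Pr[u|v], with p = Pr[.|v]."""
    out = float(factorial(N))
    for ui, pi in zip(u, p):
        out *= pi ** ui / factorial(ui)
    return out

def reduced_model(U, f, N, q, eps):
    """Steps 1-10: the (alpha+1)-state model around fixed point q."""
    n = len(f)
    E = all_populations(n, N)                                 # step 1
    d = lambda v: np.linalg.norm(np.asarray(v) / N - q) / np.sqrt(2)
    near = [v for v in E if d(v) < eps]                       # step 2
    rest = [v for v in E if d(v) >= eps]                      # step 3
    G = {v: g_map(v, U, f) for v in E}
    a = len(near)
    R = np.zeros((a + 1, a + 1))                              # last = Rest
    for j, v in enumerate(near):
        for i, u in enumerate(near):
            R[i, j] = pr(u, G[v], N)                          # step 4
        R[a, j] = 1.0 - R[:a, j].sum()                        # step 5
    w = {b: sum(pr(b, G[e], N) for e in E) for b in rest}     # steps 6-8
    denom = sum(w.values())
    for i, u in enumerate(near):                              # step 9
        R[i, a] = sum(w[b] * pr(u, G[b], N) for b in rest) / denom
    R[a, a] = 1.0 - R[:a, a].sum()                            # step 10
    return near, R
```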
3 Results
We show results of our approach using the Onemax problem. The fitness function is defined as

$$f(x) = 1 + \sum_i [x_i = 1]$$
where x is a bit string and [expr] is an expression operator that evaluates to 1 if expr is true, and to 0 otherwise. The mutation matrix U, whose entries give the probability that a string with j ones mutates to a string with i ones, is given by

$$U_{i,j} = \sum_{k=0}^{n-j} \sum_{l=0}^{j} [\,j + k - l = i\,] \binom{n-j}{k} \binom{j}{l} \mu^{k+l} (1 - \mu)^{n-k-l}$$

where µ is the mutation probability; this matrix is described further in [14]. The modelling error is measured by comparing the leading eigenvector of the lumped model with an average of 30 GA runs, and is calculated using equation (11).
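The unitation mutation matrix above can be computed directly; a minimal sketch (the function name is ours), with a column-sum check, since each column of U must be a probability distribution:

```python
import numpy as np
from math import comb

def unitation_mutation_matrix(n, mu):
    """U[i, j]: probability that a string with j ones (out of n bits)
    mutates to one with i ones, under bitwise mutation rate mu."""
    U = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        for k in range(n - j + 1):           # zeros flipped to ones
            for l in range(j + 1):           # ones flipped to zeros
                i = j + k - l                # resulting number of ones
                U[i, j] += (comb(n - j, k) * comb(j, l)
                            * mu ** (k + l) * (1 - mu) ** (n - k - l))
    return U

# Each column of U sums to 1
U = unitation_mutation_matrix(4, 0.06)
assert np.allclose(U.sum(axis=0), 1.0)
```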
We show examples of modelling the Onemax problem described in the previous paragraphs using different parameters, such as string length and population size. The reason for using the Onemax problem is that we can dramatically reduce the size of our search space from n = 2^ℓ to n = ℓ + 1, which
Fig. 1. Onemax with ℓ = 2 and µ = 0.1 (panels (a)-(d))
further helps us to reduce the size of our finite population state space. This has been suggested in [15]. Figure 1 shows results of modelling the Onemax problem with a string length of 2 and a mutation rate of 0.1 (i.e. ≈ 1/4), and figure 2 shows results with a string length of 4 and a mutation rate of 0.06.
For comparison purposes, we calculate the steady-state probability of being in the Rest state, and we record the number of occurrences of that state once its corresponding real GA has reached its steady-state behaviour. In this paper, all real GAs have a run length of 10 000 generations and we assume they reach their steady-state behaviour after 200 generations. Data points in figures 1(c) and 2(c) are averaged over 30 GA runs. Figures 1(a) and 2(a) show the number of states that are near p as a function of ε for various population sizes. Finally, figures 1(d) and 2(d) show the error between the model and its corresponding real GA. Error bars are shown where appropriate.
4 Conclusion
In this paper, we have shown an interesting way to aggregate the states of a
finite population GA based on the fixed point of the infinite population model.
The method rests fundamentally on two hypotheses:
1. That the finite population GA spends most of its time near a fixed-point of
the operator G.
2. That the Spears and De Jong lumping algorithm produces a reduced transition matrix which is a good approximation to the original.
Fig. 2. Onemax with ℓ = 4 and µ = 0.06 (panels (a)-(d))
Our method enables us to calculate a reduced Markov model without having to calculate the exact transition matrix. Although we have a reduced model of (α + 1) states, α will still increase as a function of N, ℓ, and ε (i.e. the radius). Alternatively, we can lump all population states into a two-state Markov model consisting of a Near state and a Rest state, where the Near state contains the population states within the closed ball of radius ε around the fixed point of the GA. We have done so, but at the cost of somewhat larger errors.
From Figure 1 and Figure 2, we can see that Pr[Rest] decreases for fixed radii when the population size increases. This is interesting because the emerging picture is that, as the population size increases, the long-term behaviour of the GA studied in this paper spends less of its time in the Rest state and more of its time within the closed ball around the fixed point. In the infinite population limit, the GA will converge to this fixed point.
Further work will attempt to model a selection-mutation only GA using fixed
points as “states” of Markov models as suggested in [9] by Vose. For problems
such as trap functions and Royal Road functions, the fixed-points outside the
simplex will also come into play.
Acknowledgements
The authors would like to thank the anonymous reviewers for their many useful
comments that were helpful in improving the paper.
References
1. Spears, W.M., De Jong, K.A.: Analyzing GAs using Markov Chains with Semantically Ordered and Lumped States. In: Proceedings of Foundations of Genetic
Algorithms, Morgan Kaufmann Publishers (1996)
2. Davis, T.E., Principe, J.C.: A Markov chain framework for the simple genetic algorithm. Evolutionary Computation 1 (1993) 269–288
3. De Jong, K.A., Spears, W.M., Gordon, D.F.: Using Markov Chains to Analyze
GAFOs. In Whitley, L.D., Vose, M.D., eds.: Proceedings of Foundations of Genetic
Algorithms 3. Morgan Kaufmann, San Francisco, CA (1995) 115–137
4. He, J., Yao, X.: From an Individual to a Population: An Analysis of the First Hitting Time of Population-Based Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation 6 (2002) 495–511
5. Nijssen, S., Bäck, T.: An Analysis of the Behaviour of Simplified Evolutionary
Algorithms on Trap Functions. IEEE Transactions on Evolutionary Computation
7 (2003) 11–22
6. Rees, J., Koehler, G.J.: An Investigation of GA Performance Results for Different
Cardinality Alphabets. In Davis, L.D., De Jong, K., Vose, M.D., Whitley, L.D.,
eds.: Evolutionary Algorithms. Springer, New York (1999) 191–206
7. Wright, A.H., Zhao, Y.: Markov chain models of genetic algorithms. In: GECCO-99
(Genetic and Evolutionary Computation Conference). Volume 1., Morgan Kaufmann Publishers (1999) 734–741
8. Spears, W.M.: A Compression Algorithm for Probability Transition Matrices.
SIAM Journal on Matrix Analysis and Applications 20 (1998) 60–77
9. Vose, M.D.: The Simple Genetic Algorithm: Foundations and Theory. The MIT
Press, Cambridge, MA (1999)
10. Nix, A.E., Vose, M.D.: Modelling genetic algorithms with Markov chains. Annals
of Mathematics and Artificial Intelligence 5 (1992) 79–88
11. Wright, S.: Evolution in Mendelian populations. Genetics 16 (1931) 97–159
12. Fisher, R.A.: The genetical theory of natural selection. Oxford: Clarendon Press
(1930)
13. Moey, C.C.J., Rowe, J.E.: Population Aggregation Based on Fitness. Natural
Computing 3 (2004) 5–19
14. Rowe, J.E.: Population fixed-points for functions of unitation. In Reeves, C., Banzhaf, W., eds.: Foundations of Genetic Algorithms. Volume 5. Morgan Kaufmann Publishers (1998)
15. Spears, W.M.: Aggregating Models of Evolutionary Algorithms. In Angeline, P.J.,
Michalewicz, Z., Schoenauer, M., Yao, X., Zalzala, A., eds.: Proceedings of the
Congress on Evolutionary Computation. Volume 1., IEEE Press (1999) 631–638