Evolution under Partial Information

Duc-Cuong Dang
Per Kristian Lehre
ASAP Research Group
School of Computer Science
University of Nottingham
[email protected]
[email protected]
ABSTRACT
Complete and accurate information about the quality of candidate solutions is not always available in real-world optimisation. It is often prohibitively expensive to evaluate a candidate solution on more than a few test cases, or the evaluation mechanism itself is unreliable. While evolutionary algorithms are popular optimisation methods, their theoretical understanding is lacking for the case of partial information.

This paper initiates runtime analysis of evolutionary algorithms where only partial information about fitness is available. Two scenarios are investigated. In partial evaluation of solutions, only a small amount of information about the problem is revealed in each fitness evaluation. We formulate a model that makes this scenario concrete for pseudo-Boolean optimisation. In partial evaluation of populations, only a few individuals in the population are evaluated, and the fitness values of the other individuals are missing or incorrect. For both scenarios, we prove that given a set of specific conditions, non-elitist evolutionary algorithms can optimise many functions in expected polynomial time even when vanishingly little information is available. The conditions imply a small enough mutation rate and a large enough population size. The latter emphasises the importance of populations in evolution.
1. INTRODUCTION

In the past decades, there have been significant advances in the theoretical understanding of Evolutionary Algorithms (EAs) and their behaviour on optimisation problems. A large number of results on the time complexity of EAs have been rigorously proved [14]. Most of those analyses assume that the true fitness value of a solution can be obtained by calling an evaluation procedure. The time complexity is then analysed and measured as the number of such calls.
In real-world optimisation problems, it often happens that complete and accurate information about the quality of candidate solutions is either unavailable or prohibitively expensive to obtain. Optimisation problems with noise, dynamic objective functions or stochastic data, generally known as optimisation under uncertainty [6], are examples of the former. Expensive evaluations, on the other hand, often occur in engineering, especially in structural and engine design. For example, to determine the fitness of a solution, the solution may have to be put through a real experiment or a simulation, which can be time- and resource-consuming, or even require the collection and processing of a large amount of data. Such complex tasks give rise to the use of surrogate-model methods [5] to assist EAs, where a cheap and approximate procedure fully or partially replaces the expensive evaluations. The full replacement is equivalent to the case of unavailability. We summarise this kind of problem as optimisation relying only on imprecise, partial or incomplete information about the problem.
While EAs have been widely and successfully applied in this challenging area of optimisation [5, 6], only a few rigorous theoretical studies have been dedicated to fully understanding the behaviour of EAs in such environments. Especially for discrete optimisation, the studies are often limited to the case of imprecise information, and only extreme configurations of EAs are considered. In [2, 3], the inefficiency of the (1+1) EA was rigorously demonstrated for noisy and dynamic optimisation of the OneMax function. The reason behind this inefficiency is that in such environments it is very difficult for the algorithm to decide whether one solution is better than another, i.e. it will very often choose the wrong candidate. This effect could be reduced by having a population, which the (1+1) EA lacks. It has been shown in [11] that with a very large or infinite population, noise has no effect on EAs. However, the analysis presented in [11] is based on approximations which are not rigorous (though they make sense for infinite populations). Therefore, the exact order of the required population size cannot be specified.
Categories and Subject Descriptors
F.2 [Theory of Computation]: Analysis of Algorithms
and Problem Complexity
General Terms
Theory, Algorithms, Complexity
Keywords
Partial Evaluation, Runtime Analysis, Non-elitism
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from [email protected].
GECCO’14, July 12–16, 2014, Vancouver, BC, Canada.
Copyright 2014 ACM 978-1-4503-2662-9/14/07 ...$15.00.
http://dx.doi.org/10.1145/2576768.2598375.
Algorithm 1 Population Selection-Variation Algorithm
Require: Finite search space X, and initial population P0 ∼ Unif(X^λ).
1: for t = 0, 1, 2, . . . until termination condition met do
2:    for i = 1 to λ do
3:       Sample It(i) ∈ [λ] according to psel(Pt).
4:       x := Pt(It(i)).
5:       Sample x' according to pmut(x).
6:       Pt+1(i) := x'.
7:    end for
8: end for
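For concreteness, the scheme of Algorithm 1 can be sketched in Python. This is our own illustrative sketch, not code from the paper; the binary tournament selection and bitwise mutation used later in the paper are supplied as example operators, and all function and parameter names are our own.

```python
import random

def algorithm1(n, lam, p_sel, p_mut, generations):
    """Population Selection-Variation scheme (Algorithm 1): each offspring
    is created independently by selection followed by mutation, and the
    generations do not overlap (non-elitism)."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(lam)]  # P0 ~ Unif(X^lam)
    for _ in range(generations):
        pop = [p_mut(pop[p_sel(pop)]) for _ in range(lam)]
    return pop

def tournament_select(pop, f=sum):
    """Binary tournament: return the index of the fitter of two uniform
    picks, breaking ties uniformly at random."""
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    if f(pop[i]) == f(pop[j]):
        return random.choice((i, j))
    return i if f(pop[i]) > f(pop[j]) else j

def bitwise_mutation(x, chi=1.0):
    """Flip each bit independently with probability chi/n."""
    n = len(x)
    return [b ^ (random.random() < chi / n) for b in x]

random.seed(0)
final = algorithm1(n=20, lam=40, p_sel=tournament_select,
                   p_mut=bitwise_mutation, generations=50)
```

With f = sum, this instance runs on OneMax; the population returned after a few dozen generations is typically concentrated near the optimum.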
In this paper, we initiate the runtime analysis of evolutionary algorithms where only partial or incomplete information about fitness is available. Two scenarios are investigated: (i) in partial evaluation of solutions, only a small amount of information about the problem is revealed in each fitness evaluation; we formulate a model that makes this scenario concrete for pseudo-Boolean optimisation; (ii) in partial evaluation of populations, only a few individuals in the population are evaluated, and the fitness values of the other individuals are missing. For both scenarios, we rigorously prove that given a set of specific conditions, non-elitist evolutionary algorithms can optimise many functions in expected polynomial time even with very little information available. The conditions imply a small enough mutation rate and a large enough population size. The latter emphasises the importance of having a population and of understanding the population size required to overcome a hazardous environment of incomplete or imprecise information. The remainder of the paper is organised as follows. The next section introduces the generalised model of optimisation and summarises our main tool for analysing EAs with non-elitist populations, which is presented in a companion paper. In Section 3, the scenario of partial evaluation of solutions is presented with specific examples of pseudo-Boolean optimisation. Partial evaluation of populations is discussed and analysed in Section 4, and finally some conclusions are drawn.
2. ANALYTICAL TOOLS

For any positive integer n, define [n] := {1, 2, . . . , n}. The Hamming distance is denoted by H(·, ·). The natural logarithm is denoted by ln(·), and the logarithm to base 2 by log(·). For a bitstring x of length n, define |x|₁ := Σ_{i=1}^n x_i. Given a fully defined function f(x), the generalised optimisation problem of f(x) is defined as follows.

Definition 1. In the problem of optimising f(x) knowing only Fc(x) (where c is a positive parameter), the evaluation of a bitstring x returns Fc(x) instead of f(x). For c = 1, F1(x) = f(x) and it is a static/classical optimisation problem, shortly called optimising f(x). For c < 1, Fc(x) is a random variable parameterised by c and x.

An illustrative example of optimisation with imprecise information is the noisy case. In [3], Fc(x) has probability c of being equal to f(x); otherwise, with probability 1 − c, it equals the fitness of a solution picked at random from the neighbourhood of x with radius 1 in Hamming distance. Our examples of Fc(x) for optimisation with incomplete information are given in the next section. The runtime is still defined in terms of the discovery of a true optimal solution.

Definition 2. Assuming maximisation, the runtime of an algorithm A to optimise f(x) knowing only Fc(x) is the number of times Fc(x) is sampled by A until a solution x* with f(x*) = max{f(x)} is conceived by A for the first time.

We focus on when A finds a true optimal solution for the first time and ignore how A recognises such an achievement. The algorithms analysed in this paper belong to the algorithmic scheme of Algorithm 1. In each step t, the scheme generates a new population Pt+1 based on the current one Pt. Each individual of Pt+1 is generated independently by picking one parent from Pt using a selection mechanism, denoted psel, and then mutating it using a variation operator, denoted pmut. For purely analytical purposes, It(i) is used to denote the sorting index i of Pt, e.g. Pt(It(i)) returns the (i/λ)-ranked individual of Pt.

Note that the populations do not overlap, hence the scheme is non-elitist. It also makes sense to assume that psel is monotone, in the following sense. Denote by psel(i | P) the probability that individual i of P is selected by psel. A selection mechanism psel is f-monotone if for all P ∈ X^λ and all pairs i, j ∈ [λ] it holds that psel(i | P) ≥ psel(j | P) ⟺ f(P(i)) ≥ f(P(j)). We then define the cumulative selection probability of psel as the probability of selecting an individual with fitness at least as good as that of the γ-ranked individual. Formally,

Definition 3. The cumulative selection probability β associated with selection mechanism psel is defined for all γ ∈ (0, 1] and P ∈ X^λ by

β(γ, P) := Σ_{y∈P} psel(y | P) · [f(y) ≥ f(P(I(⌈γλ⌉)))].

Algorithm 1 has been investigated in a string of papers [7, 10, 8, 1]. The algorithm has exponential expected runtime when the selective pressure is low relative to the mutation rate [7, 10]. Conversely, when the cumulative selection probability is sufficiently high, upper bounds on the expected runtime can be derived with a fitness-level technique [8]. The following theorem, recently shown in [1], is a further improvement which provides tighter upper bounds when β(γ) ≥ γ(1 + ε) for arbitrarily small ε > 0.

Theorem 4 ([1]). Given a function f : X → R and an f-based partition (A1, . . . , A_{m+1}), let T be the number of function evaluations until Algorithm 1 with an f-monotone selection mechanism psel obtains an element in A_{m+1} for the first time. If there exist parameters p0, s1, . . . , sm, s* ∈ (0, 1], γ0 ∈ (0, 1) and δ > 0 such that

(C1) pmut(y ∈ A+_j | x ∈ A_j) ≥ s_j ≥ s* for all j ∈ [m],
(C2) pmut(y ∈ A_j ∪ A+_j | x ∈ A_j) ≥ p0 for all j ∈ [m],
(C3) β(γ, P) p0 ≥ (1 + δ)γ for all P ∈ X^λ and γ ∈ (0, γ0),
(C4) λ ≥ (2/a) ln(16m/(a c ε s*)) with a = δ²γ0/(2(1 + δ)), ε = min{δ/2, 1/2} and c = ε⁴/24,

then

E[T] ≤ (2/(cε)) ( mλ(1 + ln(1 + cλ)) + (p0/((1 + δ)γ0)) Σ_{j=1}^m 1/s_j ).

In the theorem, A+_j is short notation for ∪_{k>j} A_k. In this paper, we focus on the special case of Algorithm 1 where the search space X is the set of bitstrings of length n, the selection operator psel is binary tournament selection, and the mutation operator pmut is bitwise mutation, where each bit is flipped with probability χ/n. Note that the parameter χ is not necessarily constant with respect to n. The following simplifications and auxiliary results will be useful.
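For binary tournament selection on a population with pairwise-distinct fitness values, the cumulative selection probability β of Definition 3 has a simple closed form, which the following sketch (our own illustration, not from the paper) computes and checks.

```python
import math

def beta_tournament(gamma, lam):
    """Exact cumulative selection probability beta(gamma) of binary tournament
    selection on a population of size lam with pairwise-distinct fitness values:
    an individual of the best ceil(gamma*lam) ranks is selected iff at least one
    of the two uniform picks falls in that portion."""
    k = math.ceil(gamma * lam)    # size of the best gamma-portion
    g = k / lam
    return 1.0 - (1.0 - g) ** 2   # = g(2 - g)

# When gamma*lam is an integer, beta(gamma) = gamma(2 - gamma), which exceeds
# gamma(1 + delta) whenever delta <= 1 - gamma, in the spirit of condition (C3).
for k in (10, 25, 50):
    gamma = k / 100
    assert abs(beta_tournament(gamma, 100) - gamma * (2 - gamma)) < 1e-12
```

Under partial information, ties between partial evaluations reduce this advantage; quantifying the remaining advantage is exactly the purpose of Lemmas 8-10 below.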
Corollary 5. Given any constant γ0 ∈ (0, 1), if 1/δ ∈ poly(m) and 1/s* ∈ poly(m), then there exists a constant b such that condition (C4) is satisfied with

(C4') λ ≥ (b/δ²) ln(m).

Proof. For γ0 a constant and δ ∈ (0, 1), we remark that 1/ε = O(1/δ), 1/c = O(1/δ⁴) and 1/a = (2/γ0)(1/δ² + 1/δ) = O(1/δ²). Since 1/δ ∈ poly(m) and 1/s* ∈ poly(m), there must exist a constant d > 0 such that m^d ≥ 16m/(a c ε s*). Thus it suffices to choose b := 8d/γ0 so that

(b/δ²) ln(m) = (4d/γ0)(1/δ² + 1/δ²) ln(m) ≥ (4/γ0)(1/δ² + 1/δ) ln(m^d) = (2/a) ln(m^d) ≥ (2/a) ln(16m/(a c ε s*)).

Then any λ ≥ (b/δ²) ln(m) satisfies (C4).

Corollary 6. Given any constant γ0 ∈ (0, 1), for any δ ∈ (0, 1), the expected runtime of Algorithm 1 satisfying conditions (C1)-(C4) is

O( (1/δ⁵) ( mλ(1 + ln(1 + δ⁴λ)) + Σ_{j=1}^m 1/s_j ) ).

Proof. For δ ∈ (0, 1), we have ε = min{δ/2, 1/2} = δ/2. Hence 2/(cε) = 48/ε⁵ = 1536/δ⁵. In addition, we have p0/((1 + δ)γ0) ≤ 1/γ0, so by Theorem 4

E[T] ≤ (1536/δ⁵) ( mλ(1 + ln(1 + δ⁴λ/384)) + (1/γ0) Σ_{j=1}^m 1/s_j )

and the result follows.

We also use drift analysis [9] to prove the inefficiency of the (1+1) EA under partial/incomplete information. The following theorem, a corollary of Theorem 2.3 in [4], was presented in [9].

Theorem 7 (Hajek's theorem [9]). Let {Xt}_{t≥0} be a stochastic process over some bounded state space S ⊂ [0, ∞), and let Ft be the filtration generated by X0, . . . , Xt. Given a(n) and b(n) depending on a parameter n such that b(n) − a(n) = Ω(n), define T := min{t | Xt ≥ b(n)}. If there exist positive constants λ, ε, D such that

(L1) E[Xt+1 − Xt | Ft, Xt > a(n)] ≤ −ε,
(L2) (|Xt+1 − Xt| | Ft) ≺ Y with E[e^{λY}] ≤ D, where ≺ denotes stochastic domination,

then there exists a positive constant c such that

Pr(T ≤ e^{cn} | X0 < a(n)) ≤ e^{−Ω(n)}.

Informally, if the progress (toward state b(n)) becomes negative from state a(n) (L1) and big jumps are very rare (L2), then E[T] is exponential, e.g. by Markov's inequality,

E[T | X0 < a(n)] ≥ Pr(T > e^{cn} | X0 < a(n)) e^{cn} ≥ (1 − e^{−Ω(n)}) e^{cn} = e^{Ω(n)}.

3. PARTIAL EVALUATION OF PSEUDO-BOOLEAN FUNCTIONS

We consider pseudo-Boolean functions over bitstrings of length n. It is well known that for any pseudo-Boolean function f : {0, 1}ⁿ → R, there exists a set S := {S1, . . . , Sk} of k subsets Si ⊆ [n] with associated weights W := {w1, . . . , wk}, wi ∈ R, such that

f(x) = Σ_{i=1}^k wi Π_{j∈Si} xj.    (1)

For example, we have k = n and Si := {i} for linear functions, of which OneMax is the particular case where wi = 1 for all i ∈ [n]. For LeadingOnes, we also get k = n and wi = 1, but Si := [i]. In a classical optimisation problem, full access to (S, W) is guaranteed, so f(x) can be determined exactly from Equation (1).

In an optimisation problem under incomplete or partial information, the access to (S, W) is random, and thus often incomplete. There are many ways to define the randomisation; in this paper we focus on the following model:

Fc(x) := Σ_{i=1}^k wi Ri Π_{j∈Si} xj with Ri ∼ Bin(1, c).    (2)

Informally, each subset Si has probability c of being taken into account in the evaluation. We consider classical functions, such as OneMax and LeadingOnes, as typical examples. As mentioned in the previous section, Fc(x) is used instead of f(x) in each partial evaluation.

We first look at the case of OneMax; the optimisation problem is then denoted OneMax(c). Literally, each 1-bit has only probability c of being added up in a fitness evaluation. We will rigorously prove that for 1/c ∈ poly(n), a population-based EA can optimise OneMax(c) in expected polynomial time. On the other hand, the classical (1+1) EA needs exponential time in expectation for any constant c < 1. For simplicity, we restrict psel in our analysis to binary tournament selection. The algorithm is detailed in Algorithm 2.

Algorithm 2 EAs (2-Tournament, Partial Information)
1: Sample P0 ∼ Unif(X^λ), where X = {0, 1}ⁿ.
2: for t = 0, 1, 2, . . . until termination condition met do
3:    for i = 1 to λ do
4:       Sample two parents x, y ∼ Unif(Pt).
5:       fx := Fc(x) and fy := Fc(y).
6:       if fx > fy then
7:          z := x
8:       else if fx < fy then
9:          z := y
10:      else
11:         z ∼ Unif({x, y})
12:      end if
13:      Flip each bit position in z with probability χ/n.
14:      Pt+1(i) := z.
15:   end for
16: end for
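The partial-evaluation model of Equation (2) for OneMax(c) and LeadingOnes(c), together with one generation of Algorithm 2, might be sketched as follows. This is an illustrative sketch under our own naming; it is not the authors' implementation.

```python
import random

def onemax_partial(x, c):
    """F_c for OneMax(c): each 1-bit is counted with probability c (Equation (2))."""
    return sum(1 for b in x if b == 1 and random.random() < c)

def leadingones_partial(x, c):
    """F_c for LeadingOnes(c): each prefix product x_1 * ... * x_i (value 1 only
    for prefixes of 1-bits) contributes w_i = 1 with probability c."""
    total = 0
    for b in x:
        if b == 0:
            break
        if random.random() < c:
            total += 1
    return total

def algorithm2_generation(pop, c, chi, fc=onemax_partial):
    """One generation of Algorithm 2: a 2-tournament decided by two fresh
    partial evaluations, then bitwise mutation with rate chi/n."""
    n, lam = len(pop[0]), len(pop)
    next_pop = []
    for _ in range(lam):
        x, y = random.choice(pop), random.choice(pop)
        fx, fy = fc(x, c), fc(y, c)   # only partial information is revealed
        z = x if fx > fy else y if fy > fx else random.choice((x, y))
        next_pop.append([b ^ (random.random() < chi / n) for b in z])
    return next_pop

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(30)] for _ in range(20)]
pop = algorithm2_generation(pop, c=0.2, chi=0.1)
```

Note that with c = 1 both partial evaluators coincide with the true fitness, recovering the classical setting of Definition 1.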
Theorem 4 requires lower bounds on β(γ) and δ so that the necessary condition on the mutation rate χ/n can be established. The value of β(γ) depends on the probability that the fitter of two individuals x and y is selected in lines 6-12 of Algorithm 2. Formally, for any x and y with f(x) > f(y), we want a lower bound on the probability that the algorithm selects z = x.

Lemma 8. For any two bitstrings x and y of length n with f(x) > f(y), we have Pr(z = x) ≥ (1/2)(1 + c Pr(X = Y)), where X and Y are independent and identically distributed random variables with distribution Bin(f(y), c).

Proof. Each of the f(x) 1-bits of x has probability c of being counted, so Fc(x) ∼ Bin(f(x), c). Similarly, Fc(y) ∼ Bin(f(y), c). Noting that f(x) = f(y) + (f(x) − f(y)), we can decompose Fc(x) and Fc(y) further. Let X, Y and ∆ be independent random variables such that X ∼ Bin(f(y), c), Y ∼ Bin(f(y), c) and ∆ ∼ Bin(f(x) − f(y), c); then Fc(x) = X + ∆ and Fc(y) = Y.

By the law of total probability,

Pr(z = x) ≥ Pr(X > Y) + Pr(X = Y)(Pr(∆ > 0) + Pr(∆ = 0)/2)
= Pr(X > Y) + Pr(X = Y)(Pr(∆ ≥ 0) − Pr(∆ = 0)/2)
= Pr(X > Y) + Pr(X = Y)(1 − (1 − c)^{f(x)−f(y)}/2)
≥ Pr(X > Y) + Pr(X = Y)(1 − (1 − c)/2).

The last inequality is due to (1 − c) ∈ (0, 1) and f(x) − f(y) ≥ 1. Because X and Y are identically distributed and independent, we have Pr(X < Y) = Pr(X > Y), and also Pr(X > Y) + Pr(X = Y) + Pr(X < Y) = 1. These two facts imply Pr(X > Y) = (1 − Pr(X = Y))/2. Substituting into the previous bound on Pr(z = x),

Pr(z = x) ≥ (1 − Pr(X = Y))/2 + Pr(X = Y)(1 − (1 − c)/2) = (1/2)(1 + c Pr(X = Y)).

Corollary 9. For any two bitstrings x and y of length n with f(x) > f(y), we have

Pr(z = x) ≥ (1/2)( 1 + (64c/81)/(6√((n − 1)c(1 − c)) + 1) ).

Proof. From Lemma 8, we apply Lemma 21 to X, Y ∼ Bin(n − 1, c) (in the worst case, f(y) = n − 1) with d = 3 (arbitrarily chosen).

Lemma 10. For any γ ∈ (0, 1), the cumulative selection probability of Algorithm 2 is at least

β(γ) ≥ γ( 1 + (1 − γ)(64c/81)/(6√((n − 1)c(1 − c)) + 1) ).

Proof. Without loss of generality, assume for the two inputs x and y of the tournament selection that f(x) ≥ f(y). Recall that β(γ) is the probability of selecting an individual with fitness at least equal to the fitness of the γ-ranked individual, i.e. one belonging to the best γ-portion of the population. Therefore, it suffices that either both x and y are picked from this portion, or at least x is picked from the portion and then wins the tournament:

β(γ) ≥ γ² + 2γ(1 − γ) Pr(z = x) = γ(1 + (1 − γ)(2 Pr(z = x) − 1)).

From Corollary 9, we have

2 Pr(z = x) − 1 ≥ 2 · (1/2)( 1 + (64c/81)/(6√((n − 1)c(1 − c)) + 1) ) − 1 = (64c/81)/(6√((n − 1)c(1 − c)) + 1).

So

β(γ) ≥ γ( 1 + (1 − γ)(64c/81)/(6√((n − 1)c(1 − c)) + 1) ).

Corollary 11. For any c ∈ (0, 1) and any constant γ0 ∈ (0, 1), there exists a constant a ∈ (0, 1) such that β(γ) ≥ γ(1 + 2δ) for all γ ∈ (0, γ0], where δ = min{ac, a√(c/n)}.

Proof. From Lemma 10, for all γ ∈ (0, γ0] we have

β(γ) ≥ γ( 1 + uc/(v√(nc) + 1) ),

where u = (1 − γ0)64/81 and v = 6.

If c ≤ 1/n, then nc ≤ 1 and uc/(v√(nc) + 1) ≥ uc/(v + 1).

If c > 1/n, then nc > 1 and

uc/(v√(nc) + 1) > uc/(v√(nc) + √(nc)) = uc/((v + 1)√(nc)) = (u/(v + 1))√(c/n).

The statement now follows by choosing a = u/(2(v + 1)).

Theorem 12. Given c ∈ (0, 1) such that 1/c ∈ poly(n), there exist constants a and b such that Algorithm 2 with χ = δ/3 and λ = b ln(n)/δ², where δ = min{ac, a√(c/n)}, optimises OneMax(c) in expected time

O( n ln(n)/c⁷ ) if c ≤ 1/n, and
O( n^{9/2} ln(n)/c^{7/2} ) if c > 1/n.

Proof. Remark that Corollary 11 and 1/c ∈ poly(n) imply 1/δ ∈ poly(n), δ ∈ (0, 1) and χ < 1/3. We use the partition Aj := {x ∈ {0, 1}ⁿ | |x|₁ = j} to analyse the runtime. The probability of improving a solution at fitness level j by mutation is lower bounded by the probability that a single 0-bit is flipped and no other bit is flipped,

(n − j)(χ/n)(1 − χ/n)^{n−1} > (1 − j/n) χ (1 − χ/n)ⁿ.

It follows from Theorem 18 and χ < 1/3 that (1 − χ/n)ⁿ ≥ 1 − χ > 2/3. Therefore χ(1 − χ/n)ⁿ > (δ/3)(2/3) = 2δ/9, and it suffices to choose the parameters sj := (1 − j/n)(δ/9) and s* := δ/(9n) so that (C1) is satisfied. In addition, (1 − χ/n)ⁿ is the probability of not flipping any bit in the mutation, hence picking p0 = 1 − χ satisfies (C2).

It now follows from Corollary 11 that for all γ ∈ (0, γ0],

β(γ)p0 ≥ γ(1 + 2δ)(1 − χ) = γ(1 + 2δ)(1 − δ/3) = γ(1 − δ/3 + 2δ − 2δ²/3) ≥ γ(1 − δ/3 + 2δ − 2δ/3) = γ(1 + δ).

Therefore, (C3) is satisfied with the given value of δ. Because 1/δ ∈ poly(n), also 1/s* ∈ poly(n), and by Corollary 5 there exists a constant b such that condition (C4) is satisfied with λ = (b/δ²) ln(n). All conditions are satisfied, and by Corollary 6 the expected optimisation time is

O( (1/δ⁵)( mλ(1 + ln(1 + δ⁴λ)) + Σ_{j=1}^m 1/sj ) ) = O( (n ln(n)/δ⁷)(1 + ln(1 + δ² ln(n))) + (n/δ⁶) Σ_{j=1}^n 1/j ).

By the definition of δ, it follows that δ = O(1/√n), so 1 + ln(1 + δ² ln(n)) = O(1). Furthermore, (n/δ⁶) Σ_{j=1}^n 1/j = O(n ln(n)/δ⁶) is dominated by the first term n ln(n)/δ⁷. Hence the optimisation time is E[T] = O(n ln(n)/δ⁷). The theorem follows by noting that δ = ac if c ≤ 1/n, and δ = a√(c/n) otherwise.
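The bound of Lemma 8, Pr(z = x) ≥ (1/2)(1 + c Pr(X = Y)), can be probed numerically. The following Monte Carlo sketch (our own illustration, not from the paper) estimates the probability that the truly fitter individual wins the 2-tournament of Algorithm 2 when both are judged by partial evaluations.

```python
import random

def win_probability(fx, fy, c, trials=20000):
    """Estimate Pr(z = x) when the 2-tournament of Algorithm 2 compares
    F_c(x) ~ Bin(fx, c) against F_c(y) ~ Bin(fy, c), with fx > fy the true
    fitness values and ties broken uniformly (counted as half a win)."""
    wins = 0.0
    for _ in range(trials):
        sample_x = sum(random.random() < c for _ in range(fx))
        sample_y = sum(random.random() < c for _ in range(fy))
        if sample_x > sample_y:
            wins += 1.0
        elif sample_x == sample_y:
            wins += 0.5
    return wins / trials

random.seed(2)
p = win_probability(fx=30, fy=25, c=0.3)
```

Even for small c the estimate stays strictly above 1/2, which is the advantage that Lemmas 8-10 convert into the lower bound on β(γ) used in Theorem 12.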
We now consider partial evaluation of LeadingOnes, giving the optimisation problem LeadingOnes(c). Literally, each product Π_{j=1}^i xj, i ∈ [n], has the same probability c of contributing to the fitness value in each evaluation. The following result holds for this function.

Theorem 13. For any c ∈ (0, 1) where 1/c ∈ poly(n), there exist constants a and b such that Algorithm 2 with χ = δ/3 and λ = b ln(n)/δ², where δ = min{ac, a√(c/n)}, optimises LeadingOnes(c) in expected time

O( n ln(n)/c⁷ ) if c ≤ 1/n, and
O( n^{9/2} ln(n)/c^{7/2} + n⁵/c³ ) if c > 1/n.

Proof. Conditions (C2)-(C4) are shown exactly as in the proof of Theorem 12. For condition (C1), use the canonical partition Aj := {x ∈ {0, 1}ⁿ | LeadingOnes(x) = j}. The probability of improving a solution at fitness level j by mutation is lower bounded by the probability that the leftmost 0-bit is flipped and no other bit is flipped,

(χ/n)(1 − χ/n)^{n−1} ≥ (1/n) χ(1 − χ) > 2δ/(9n).

We therefore choose sj = s* = δ/(9n), and the runtime is

E[T] = O( (1/δ⁵)( (n ln(n)/δ²)(1 + ln(1 + δ² ln(n))) + n²/δ ) ) = O( n ln(n)/δ⁷ + n²/δ⁶ ).

The result now follows by noting that δ = ac if c ≤ 1/n, and δ = a√(c/n) otherwise.

We have shown that non-elitist EAs with populations of polynomial size can optimise OneMax(c) and LeadingOnes(c) in expected polynomial runtime, measured precisely in partial evaluation calls, for very small c, i.e. 1/c ∈ poly(n). In particular, for any constant c ∈ (0, 1), the non-elitist EAs optimise LeadingOnes(c) in O(n⁵) and OneMax(c) in O(n^{9/2} ln(n)). We now show that the classical (1+1) EA, summarised in Algorithm 3, already requires exponential runtime on OneMax(c) for any constant c < 1.

Algorithm 3 (1+1) EA (Partial Information)
1: Sample x0 ∼ Unif({0, 1}ⁿ).
2: for t = 0, 1, 2, . . . until termination condition met do
3:    x' := xt.
4:    Flip each bit position in x' with probability 1/n.
5:    if Fc(x') ≥ Fc(xt) then
6:       xt+1 := x'.
7:    else
8:       xt+1 := xt.
9:    end if
10: end for

Theorem 14. For any constant c ∈ (0, 1), the expected optimisation time of the (1+1) EA on OneMax(c) is e^{Ω(n)}.

Proof. We use Theorem 7 with Xt being the Hamming distance from 0ⁿ (the all-zero bitstring) to xt, i.e. Xt := H(0ⁿ, xt), and b(n) := n. The use of Fc(x) implies that the algorithm can accept degraded solutions with fewer 1-bits than the current solution. This typically happens when no bit position of x' is flipped except a single 1-bit, and the evaluation of x' does not recognise the change; we denote such an event by E. To analyse Pr(E), let Ei denote the event that the specific bit position i is flipped and no others, and that Fc(x') does not evaluate position i. Then

Pr(Ei) = (1 − c)(1/n)(1 − 1/n)^{n−1} ≥ (1 − c)/(ne).

Assuming there are currently j 1-bits in xt, the events Ei for those bits are non-overlapping. In addition, conditioned on one of those events Ei occurring (thus with probability j Pr(Ei)), let X denote the number of the remaining 1-bits of xt that are evaluated, and Y the number of evaluated 1-bits in x'. Then X and Y are independent and identically distributed binomial variables, so Pr(Y ≥ X) ≥ 1/2. We get

Pr(E) ≥ j · ((1 − c)/(ne)) · Pr(Y ≥ X) ≥ j(1 − c)/(2ne).

The drift is then

E[Xt+1 − Xt | Ft] ≤ −1 · j(1 − c)/(2ne) + (n − j) · (1/n).

We have a negative drift when j(1 − c)/(2ne) ≥ (n − j)/n, i.e. when

j ≥ n/(1 + (1 − c)/(2e)) = n( 1 − (1 − c)/(2e + 1 − c) ).

From this, for any constant ε ∈ (0, 1 − c), there exists a(n) := n( 1 − (1 − c − ε)/(2e + 1 − c) ) so that

E[Xt+1 − Xt | Ft, Xt > a(n)] ≤ −ε/(2e).

For n sufficiently large, we have b(n) − a(n) = Ω(n), and condition (L1) of Theorem 7 is satisfied. Condition (L2) holds trivially by the property of the Hamming distance: |H(xt, 0ⁿ) − H(xt+1, 0ⁿ)| ≤ H(xt, xt+1) ≤ H(xt, x'). So

(|Xt − Xt+1| | Ft) ≺ Z, where Z := H(xt, x').

Then Z ∼ Bin(n, 1/n) and E[e^{λZ}] ≤ e for λ = ln(2). We also remark that if each bit position of x0 is initialised with probability 1/2, then X0 is highly concentrated near n/2 ≤ a(n). It then follows from Theorem 7 that the (1+1) EA requires expected exponential runtime to optimise OneMax(c).
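The accept-degraded-solutions behaviour that drives Theorem 14 is easy to observe in simulation. The following sketch of Algorithm 3 (our own illustration, with hypothetical names) counts how often an offspring with strictly fewer true 1-bits is accepted, i.e. occurrences of events like E in the proof above.

```python
import random

def onemax_partial(x, c):
    """F_c for OneMax(c): each 1-bit is counted with probability c."""
    return sum(1 for b in x if b == 1 and random.random() < c)

def one_plus_one_ea_partial(n, c, steps):
    """Algorithm 3: a (1+1) EA that compares two independent partial
    evaluations, so it may accept offspring with fewer true 1-bits."""
    x = [random.randint(0, 1) for _ in range(n)]
    accepted_worse = 0
    for _ in range(steps):
        x_new = [b ^ (random.random() < 1.0 / n) for b in x]
        if onemax_partial(x_new, c) >= onemax_partial(x, c):
            if sum(x_new) < sum(x):
                accepted_worse += 1   # a true degradation slipped through
            x = x_new
    return x, accepted_worse

random.seed(3)
x, worse = one_plus_one_ea_partial(n=50, c=0.5, steps=2000)
```

Over a run of a few thousand steps, degradations are accepted regularly; it is this steady backward pressure, formalised as negative drift, that yields the exponential lower bound.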
4. PARTIAL EVALUATION OF POPULATIONS

In the previous section, we saw that EAs with populations do not need complete information, but only a small amount of information about the problem in each solution evaluation, to discover a true optimal solution in expected polynomial runtime. One might wonder whether the same holds for the information contained within a population.

We first motivate how such a lack of information about the population can arise. In real-life applications, the evaluation of solutions can be both time-consuming (e.g. requiring extensive simulation) and inaccurate. We associate a probability 1 − r with the event that the quality of a solution is not available to the algorithm. We model such a scenario in Algorithm 4 by assuming that the outcome of any comparison between two individuals is only available to the algorithm with probability r.

Algorithm 4 EAs (2-Tournament, Partially Eval. Pop.)
1: Sample P0 ∼ Unif(X^λ), where X = {0, 1}ⁿ.
2: for t = 0, 1, 2, . . . until termination condition met do
3:    for i = 1 to λ do
4:       Sample two parents x, y ∼ Unif(Pt).
5:       z := argmax{f(x), f(y)} with probability r, otherwise z := x.
6:       Flip each bit position in z with probability χ/n.
7:       Pt+1(i) := z.
8:    end for
9: end for

Unlike Algorithm 2, the algorithm in this section uses complete information about the problem to evaluate solutions. However, the evaluation of individuals in the parent population is not systematic but random. Two individuals x and y are sampled uniformly at random from the population as parents (line 4). With probability r, the individuals are evaluated and the fitter one is selected (line 5); otherwise the parent x is chosen arbitrarily, without regard to fitness. As before, an efficient implementation is possible, but this does not change the outcome of our analysis. The analysis is more straightforward than in the previous section.

Lemma 15. For any r ∈ (0, 1) and constant γ0 ∈ (0, 1), there exists a constant a ∈ (0, 1) such that Algorithm 4 satisfies β(γ) ≥ γ(1 + 2δ) for all γ ∈ (0, γ0), where δ = ar.

Proof. An individual among the best γ-portion of the population is selected if either (i) a uniformly chosen individual is taken as parent (with probability 1 − r) and this individual belongs to the best γ-portion, or (ii) the tournament selection happens (with probability r) and at least one of the selected parents belongs to the γ-portion:

β(γ) ≥ γ(1 − r) + r(1 − (1 − γ)²) = γ(1 + (1 − γ)r).

So for all γ ∈ (0, γ0], β(γ) ≥ γ(1 + (1 − γ0)r), and the statement follows by choosing a = (1 − γ0)/2.

The lemma can be generalised to k-tournament; however, larger tournament sizes may cause overhead in the average number of evaluations per generation. This number is a random variable following Bin(kλ, r). Hence, we focus on k = 2. Similarly to the previous section, we consider r small, i.e. 1/r ∈ poly(n). The following theorem holds for the runtime on OneMax.

Theorem 16. There exist constants a and b such that Algorithm 4, under the conditions r ∈ (0, 1) and 1/r ∈ poly(n), with χ = δ/3 and λ = b ln(n)/δ² where δ = ar, optimises OneMax in expected time O(n ln(n)/r⁷).

Proof. We use the same partition as in Theorem 12, and the proof idea is similar. We first remark that by Lemma 15 and 1/r ∈ poly(n), we have 1/δ ∈ poly(n), δ ∈ (0, 1) and χ < 1/3. The probability of improving a solution at fitness level j by mutation is lower bounded by

(n − j)(χ/n)(1 − χ/n)^{n−1} > (1 − j/n)(2δ/9).

So we can pick sj := (1 − j/n)(δ/9), s* := δ/(9n) and p0 = 1 − χ so that (C1) and (C2) are satisfied. It now follows from Lemma 15 that for all γ ∈ (0, γ0],

β(γ)p0 ≥ γ(1 + 2δ)(1 − χ) = γ(1 + 2δ)(1 − δ/3) ≥ γ(1 + δ).

Therefore, (C3) is satisfied with the given value of δ. Because 1/δ ∈ poly(n), also 1/s* ∈ poly(n), and by Corollary 5 there exists a constant b such that condition (C4) is satisfied with λ = (b/δ²) ln(n). All conditions are satisfied, and by Corollary 6 the expected optimisation time is

O( (1/δ⁵)( (n ln(n)/δ²)(1 + ln(1 + δ² ln(n))) + (n/δ) Σ_{j=1}^n 1/j ) ) = O( n ln(n)/δ⁷ ).

The result now follows by noting that δ = ar.

When the parameter r is small, the fitness function is evaluated only a few times per generation. More precisely, the fitness function is evaluated 2N times, where N ∼ Bin(λ, r). The proof of Theorem 4 pessimistically assumes that the fitness function is evaluated 2λ times per generation. A more sophisticated analysis taking this into account may lead to tighter bounds than those above.

We can also use Algorithm 4 to optimise LeadingOnes. The following theorem holds for its runtime.

Theorem 17. There exist constants a and b such that Algorithm 4, under the conditions r ∈ (0, 1) and 1/r ∈ poly(n), with χ = δ/3 and λ = b ln(n)/δ² where δ = ar, optimises LeadingOnes in expected time O(n ln(n)/r⁷ + n²/r⁶).

Proof. We use the same approach as in the proof of Theorem 13, including the partition. So we have the same sj = s* = δ/(9n), and the expected runtime for any r ∈ (0, 1) with 1/r ∈ poly(n) is

O( n ln(n)/δ⁷ + n²/δ⁶ ) = O( n ln(n)/r⁷ + n²/r⁶ ).
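One generation of Algorithm 4 can be sketched as follows (our own illustrative implementation; the evaluation counter is our addition, included to mirror the observation that the fitness function is evaluated 2N times per generation with N ∼ Bin(λ, r)).

```python
import random

def algorithm4_generation(pop, r, chi, f=sum):
    """One generation of Algorithm 4: with probability r the 2-tournament is
    decided by the true fitness; otherwise the first parent is taken unseen."""
    n, lam = len(pop[0]), len(pop)
    next_pop, evaluations = [], 0
    for _ in range(lam):
        x, y = random.choice(pop), random.choice(pop)
        if random.random() < r:
            evaluations += 2              # both parents are fully evaluated
            z = x if f(x) >= f(y) else y
        else:
            z = x                         # comparison outcome unavailable
        next_pop.append([b ^ (random.random() < chi / n) for b in z])
    return next_pop, evaluations

random.seed(4)
pop = [[random.randint(0, 1) for _ in range(25)] for _ in range(30)]
pop, evals = algorithm4_generation(pop, r=0.2, chi=0.1)
```

For small r, `evals` is far below the pessimistic 2λ assumed in the proof of Theorem 4, which is exactly the slack a more refined analysis could exploit.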
Our results have shown that EAs do not need the fitness values of all individuals in their populations to optimise a function efficiently. Another interpretation is that competition between every individual before reproduction is not necessary for efficient evolution. Note that non-elitism can be considered a way of reducing the competitiveness within populations. These observations have many analogies with evolution in biology.

5. CONCLUSIONS

We have analysed rigorously the runtime of evolutionary algorithms under circumstances where only partial information about fitness is available. Two scenarios have been studied. In partial evaluation of solutions, only a small amount of information about the problem is revealed in each fitness evaluation. In partial evaluation of populations, only a few individuals in the population are evaluated, and the fitness values of the other individuals are missing. For both scenarios, we have proved that non-elitist evolutionary algorithms satisfying a set of specific conditions can optimise many functions in expected polynomial time even when very little information is available. The conditions imply a small enough mutation rate and/or a large enough population size. The importance of populations in evolution is emphasised by these results. Future work should evaluate the advantages of populations in optimisation under imprecision and uncertainty, such as in noisy or stochastic optimisation.

Acknowledgements

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 618091 (SAGE) and from the British Engineering and Physical Science Research Council (EPSRC) grant no EP/F033214/1 (LANCS).

6. REFERENCES

[1] D.-C. Dang and P. K. Lehre. Refined upper bounds on the expected runtime of non-elitist populations from fitness-levels. To appear in the Proceedings of the 16th Annual Conference on Genetic and Evolutionary Computation (GECCO'14), Vancouver, Canada, 2014.
[2] S. Droste. Analysis of the (1+1) EA for a dynamically bitwise changing OneMax. In Proceedings of the 2003 International Conference on Genetic and Evolutionary Computation, GECCO'03, pages 909-921, Berlin, Heidelberg, 2003. Springer-Verlag.
[3] S. Droste. Analysis of the (1+1) EA for a noisy OneMax. In Proceedings of the 2004 International Conference on Genetic and Evolutionary Computation, GECCO'04, pages 1088-1099, Berlin, Heidelberg, 2004. Springer-Verlag.
[4] B. Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502-525, 1982.
[5] Y. Jin. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation, 1(2):61-70, 2011.
[6] Y. Jin and J. Branke. Evolutionary optimization in uncertain environments - a survey. IEEE Transactions on Evolutionary Computation, 9(3):303-317, 2005.
[7] P. K. Lehre. Negative drift in populations. In Proceedings of the 11th International Conference on Parallel Problem Solving from Nature, PPSN'10, pages 244-253, Berlin, Heidelberg, 2010. Springer-Verlag.
[8] P. K. Lehre. Fitness-levels for non-elitist populations. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO'11, pages 2075-2082, New York, NY, USA, 2011. ACM.
[9] P. K. Lehre. Drift analysis. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO'12, pages 1239-1258, New York, NY, USA, 2012. ACM.
[10] P. K. Lehre and X. Yao. On the impact of mutation-selection balance on the runtime of evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 16(2):225-241, 2012.
[11] B. L. Miller and D. E. Goldberg. Genetic algorithms, selection schemes, and the varying effects of noise. Evolutionary Computation, 4:113-131, 1996.
[12] D. S. Mitrinović. Elementary Inequalities. P. Noordhoff Ltd, Groningen, 1964.
[13] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[14] P. S. Oliveto, J. He, and X. Yao. Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results. International Journal of Automation and Computing, 4(3):281-293, 2007.

APPENDIX

Theorem 18 (Bernoulli's inequality [12]). For any integer n ≥ 0 and any real number x ≥ −1, it holds that (1 + x)ⁿ ≥ 1 + nx.

Theorem 19 (Jensen's inequality [12]). For any function f(x) convex on [α, β] and ai ∈ [α, β] with i ∈ [n],

f( (1/n) Σ_{i=1}^n ai ) ≤ (1/n) Σ_{i=1}^n f(ai).

Theorem 20 (Chebyshev's inequality [13]). For any random variable X with finite expected value µ and finite non-zero variance σ², it holds that Pr(|X − µ| ≥ dσ) ≤ 1/d² for any d > 0.

Lemma 21. Let X and Y be independent and identically distributed discrete random variables with finite expected value µ and finite non-zero variance σ². Then

Pr(X = Y) ≥ (1 − 1/d²)²/(2dσ + 1) for any d ≥ 1.

Proof. For any k, ℓ ∈ R with ⌈k⌉ ≤ ⌊ℓ⌋, we have

Pr(X = Y) = Σ_{i∈Z} Pr(X = i)² ≥ Σ_{i=⌈k⌉}^{⌊ℓ⌋} Pr(X = i)² ≥ ( Σ_{i=⌈k⌉}^{⌊ℓ⌋} Pr(X = i) )² / (⌊ℓ⌋ − ⌈k⌉ + 1) ≥ Pr(k < X < ℓ)² / (⌊ℓ⌋ − ⌈k⌉ + 1).

The second inequality is due to Theorem 19 with the convex function f(x) = x². It suffices to pick k = µ − dσ and ℓ = µ + dσ, so that ⌊ℓ⌋ − ⌈k⌉ + 1 ≤ 2dσ + 1 and, by Theorem 20,

Pr(X = Y) ≥ Pr(|X − µ| < dσ)² / (2dσ + 1) = (1 − Pr(|X − µ| ≥ dσ))² / (2dσ + 1) ≥ (1 − 1/d²)² / (2dσ + 1).
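Lemma 21's lower bound on the collision probability of two i.i.d. random variables can be checked numerically. The following sketch (our own, not from the paper) compares the exact Pr(X = Y) for independent binomials against the bound with d = 3, the choice made in Corollary 9.

```python
from math import comb, sqrt

def binom_pmf(n, p, i):
    """Exact binomial probability mass Pr(Bin(n, p) = i)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def collision_probability(n, p):
    """Exact Pr(X = Y) for independent X, Y ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, i) ** 2 for i in range(n + 1))

def lemma21_bound(n, p, d=3.0):
    """Lemma 21: (1 - 1/d^2)^2 / (2*d*sigma + 1) with sigma = sqrt(n p (1-p))."""
    sigma = sqrt(n * p * (1 - p))
    return (1 - 1 / d**2) ** 2 / (2 * d * sigma + 1)

# The exact collision probability dominates the bound of Lemma 21.
for n, p in [(10, 0.5), (50, 0.3), (200, 0.1)]:
    assert collision_probability(n, p) >= lemma21_bound(n, p)
```

With d = 3 the prefactor (1 - 1/d²)² equals 64/81, recovering the constant that appears in Corollary 9 and Lemma 10.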