Foundations of a mathematical theory of darwinism

Journal of Mathematical Biology manuscript No.
(will be inserted by the editor)
Charles J. K. Batty · Paul Crewe · Alan
Grafen · Richard Gratwick
Received: date / Accepted: date
Abstract This paper pursues the ‘formal darwinism’ project of Grafen, whose
aim is to construct formal links between dynamics of gene frequencies and optimization programmes, in very abstract settings with general implications for
biologically relevant situations. A major outcome is the definition, within wide
assumptions, of the ubiquitous but problematic concept of ‘fitness’. This paper
is the first to present the project for mathematicians. Within the framework of
overlapping generations in discrete time and no social interactions, the current
model shows links between fitness maximization and gene frequency change in
a class-structured population, with individual-level uncertainty but no uncertainty in the class projection operator, where individuals are permitted to observe and condition their behaviour on arbitrary parts of the uncertainty. The
results hold with arbitrary numbers of loci and alleles, arbitrary dominance
and epistasis, and make no assumptions about linkage, linkage disequilibrium
or mating system. An explicit derivation is given of Fisher’s Fundamental
Theorem of Natural Selection in its full generality.
Keywords Formal darwinism · reproductive value · fitness maximization ·
Price Equation
Mathematics Subject Classification (2010) 28B99 · 49N99 · 60J99 ·
92D15
This work is part of the ‘Formal Darwinism’ project funded by St John’s Research Centre,
St John’s College, Oxford, grant to CJKB and AG.
Charles J. K. Batty · Paul Crewe · Alan Grafen · Richard Gratwick
St John’s College, Oxford, OX1 3JP
Tel.: +44(0)1865 280146
E-mail: [email protected]
1 Introduction
This is the first paper to present the formal darwinism project for mathematicians. It presents all the results of Grafen (2002, 2006b) completely anew, but
also combines into a single model all their separate features, correcting some
errors and making explicit all needed assumptions. While the earlier papers
have relevant discussion of the biological motivation and background, from
a mathematical point of view, the current paper should be regarded as the
starting point of the formal darwinism project.
The first explicit links of this kind between gene dynamics and optimization were made by Grafen (2002), for a population evolving in discrete time
with non-overlapping generations, considering the effects of uncertainty (e.g.
climate). The starting point for this work is the Price Equation, which records
how the frequency of a given allele changes from the parent generation to the
offspring generation. Grafen (2006b) considers the more complicated case of
a population with a class structure, e.g. age, ploidy, etc., and considers the
problem of rigorously defining Fisher’s notion of reproductive value (Fisher
1930), still in discrete time, and without the effects of uncertainty. As part
of this Grafen derives versions of the Price Equation for a class-structured
population. Given that this equation is the starting point for proving links
to optimization in Grafen (2002), it seems natural to investigate what corresponding optimization results we can derive in the case of a class-structured
population.
This paper provides analogous results in the class-structured case to the
fundamental links proved in Grafen (2002). To avoid the apparently substantial and more sophisticated problems inherent in a Markov process including
uncertainty (by which a general expression for reproductive value and hence
our maximand would be defined) we have to make some assumptions so that
there is no population-level effect of uncertainty, i.e. that the total offspring
distribution over classes is independent of the state of nature. A current line
of investigation in the formal darwinism project is to allow uncertainty to interact more substantially with the population over time. Neither do we allow
any social interactions between individuals; the full extension of the project in
the context of inclusive fitness, begun in Grafen (2006a), is the next challenge.
As a preliminary to proving the links to optimization, we re-derive the Price
Equation for a class-structured population, following the original derivation
in Grafen (2006b), but making some changes for clarity and correctness. Our
general results are illustrated by an examination of the case of a finite age-structured population, which has traditionally been analyzed by means of
so-called Leslie matrices.
The final contribution we make is to provide (in discrete time) an explicit
statement, and a complete and rigorous derivation, of Fisher’s celebrated Fundamental Theorem of Natural Selection (Fisher 1930), placing the theorem
within our wider optimization programme, which seems to formalize mathematically its original and natural conceptual context.
2 Biological motivation
It has been a frequent and recurrent theme in biology since Darwin that natural
selection has a tendency to result in the maximization of something. Herbert
Spencer’s phrase ‘survival of the fittest’ led to the term ‘fitness’ being used
for whatever that relevant quantity might be. Many significant advances in
biology, as well as a number of recurrent debates, hinge on what exactly fitness
is, who or what should be regarded as doing the maximizing, and how the
maximization can be formalized. These ideas can be found and pursued in
textbooks such as Davies et al (2012).
Many empirical biologists today base research projects and paradigms on
the idea that natural selection leads to individual organisms acting as if maximizing their ‘inclusive fitness’, subject to the physical, physiological and informational constraints on the development and behaviour of an individual.
An intermediate level of biological theory, represented by the theory of evolutionarily stable strategies (Maynard Smith 1982) and inclusive fitness theory (originated by Hamilton 1964), simply assumes that individuals act as
if maximizing their Darwinian fitness (roughly, lifetime number of offspring)
or their inclusive fitness (a more sophisticated concept that recognizes social
behaviour), and studies topics of interest on that basis. However, the most
fundamental level of biological theory, mathematical population genetics, has
long been resistant to the idea that any useful maximization principle can be
derived from the known processes of gene frequency change, which are modelled using difference or differential equations.
From a mathematical point of view, the obvious candidate principles are
those familiar to students of dynamical systems, such as Lyapunov functions
and gradient functions, and the general conclusion of the literature is that
only under rather special and not very interesting conditions are population
genetic systems of a kind that will admit these kinds of functions (Ewens
2004). The main premiss of the current paper is that the empirical biologist’s
individual-based idea of fitness-maximization can be done justice only in a
more sophisticated setting, in which the equations of motion are taken as fundamental in representing gene frequency change, an optimization program is
constructed from those equations of motion in which the implicit decision-taker
is the individual organism, and then links are proved between the equilibrium
concepts of the equations of motion on the one hand, and of the optimization
program on the other. The maximand of the optimization program plays the
role of fitness, and a major interest lies in the nature of that maximand, and
in how tightly the maximand is defined by the structures imposed on it. The
stronger the links between the equilibrium concepts, the more constrained is
the nature of the maximand, and so the more precisely the concept of fitness
is defined.
In the present work, the population of individuals is assumed to be divided
into classes, such as sexes and/or size and/or age. There are discrete overlapping generations, so that in a formal sense an individual may have a special
kind of asexual offspring that is itself surviving to the next period, as well
as contribute gametes to new individuals. The population may be finite or
infinite, as may the set of classes. An individual takes an action in each period
which affects the offspring it leaves in the next period. So does its class, and
the ‘state of nature’, which is drawn from a set of possible outcomes of ‘actions
by nature’, with a given probability distribution. Thus the offspring distribution across classes produced by an individual depends on its own action and
on uncertainty, but we impose the restriction that it does not depend on the
actions of others — social behaviour is therefore excluded. A further restriction is that the uncertainty is assumed not to affect the class demographics,
that is, the projection operator from classes to classes has no stochastic component. The action taken by an individual depends on its phenotype, which
we assume is determined by its genotype, and also by informational cues that
make it possible for an individual to condition its choice of action on aspects
of the uncertainty.
The aim is to prove a fitness-maximization principle at a very high level
of generality. So far as gene frequencies are concerned, we employ the covariance selection mathematics of Price (1970), though not the generalization in
Price (1972a), and very weak assumptions, which allow us not to say anything
about mating systems, linkage, or some other potential complications. On the
optimization side, we consider the equilibrium concepts of an optimization
program in which an individual organism is the decision-taker. The program
represents a sophisticated individual, who has a prior distribution over all the
relevant uncertainties, and who updates this distribution in the light of information received. For an individual to solve this program implies that the prior
distribution is correct, and that the updating is optimal Bayesian. The links
are of two kinds. Three results make an assumption related to how individuals
fare in the optimization program, and draw conclusions about gene frequency
change. A fourth result makes an assumption about gene frequency change,
and draws a conclusion that individuals each solve the optimization program.
The first explicit mathematical fitness-maximization principle in biology
was the Fundamental Theorem of Natural Selection of Fisher (1930), and
the current work can be viewed as an extension and generalization of this
theorem. Fisher’s theorem has been notoriously hard to understand, and his
arguments are famously opaque. The early rejection (endorsed and reviewed
by Ewens 1979) by mathematical population geneticists of the theorem must
now be read in the context of the exposition of the theorem and proofs by
Price (1972b). The current view in that discipline, as represented by Ewens
(2004), is that the fundamental theorem is true, and Fisher’s proofs are valid
give or take some minor typographical and other errors (see Lessard (1997)
for a careful modern derivation), but that the meaning of the theorem remains
obscure. Recent papers focus on the meaning (Okasha 2008; Ewens 2011) but
come to no firm conclusion. The present paper contributes to the debate on
fitness-maximization by generalizing the Fundamental Theorem of Natural
Selection to include arbitrary classes rather than just age classes, and to allow
for arbitrary uncertainty at an individual level; and by making fully explicit
the nature of the optimization in the conceptual scheme. Our presentation
should indicate that interest in the theorem is not merely historical: whereas
much of the literature on the theorem seems wholly motivated by solving the
enigma of what one man meant by one theorem, and the result is left to
stand alone and justify itself, we, contrastingly, demonstrate that it sits very
naturally alongside related results about the optimizing behaviour of natural
selection.
The high level of generality sought in this paper and in the formal darwinism project more generally is important for two reasons. First, the fewer
the assumptions, the more widely the framework will apply as a meta-model:
that is, the assumptions of other models will not contradict those made here.
Our results then offer an optimization interpretation of the results of those
previous models. Application as a meta-model also explains one significance
of dealing with infinite populations, which are admittedly hard to find in nature. The assumptions we do make are so weak as to be what population
geneticists call ‘dynamically insufficient’ (Lewontin 1974), that is, they use a
lot of information about the parental generation and calculate only one piece
of information about the offspring generation, so one cannot ‘crank the handle’ and repeat the process. One virtue is that the framework can apply as a
meta-model to models with different detailed assumptions about mating systems and genetic architecture. Second, biologists today read Darwin (1859)
and agree with his conclusions for the reasons he gave. If Darwin’s arguments
are generally valid, then there must be a formal framework in which they can
be expressed. Darwin did not concern himself with continuous versus discrete
time, ploidy levels, or whether generations were overlapping or not, and his
arguments apply regardless: so should ours. This encompassing of Darwin’s
argument within a formal framework will reduce the scope for misunderstanding and misinterpretation. The broad aim of the paper and the project is
to justify a fitness-maximization principle for understanding the outcome of
natural selection, which will involve defining fitness, in as wide a setting as
possible. Indeed, Fisher’s theorem is best viewed as showing that the change
in the mean of a quantity equals the variance of that same quantity, and it
is natural to regard that quantity as what increases under natural selection
wherever possible. Significantly, that quantity, whose exact nature is too technical to explain in this section, is intimately related to the maximand in the
optimization programme of our Theorems 2 to 5.
3 Notation and concepts
We set out to define an extremely general model of a biological population
that has an arbitrary class-structure, with genotypes in an arbitrary set, and
phenotypes in an arbitrary set. Each class may have its own ploidy (number of
haploid sets in the genome). There will be arbitrary environmental uncertainty
that, together with its phenotype, affects the number of offspring each individual has. The way phenotype affects offspring number is important. Each
individual possesses partial knowledge of the uncertainty (it observes a ‘cue’),
and the phenotype dictates how the value of that cue in turn determines an
action. The action, together with the whole uncertainty and an individual’s
class, determines its number of offspring. No restrictions are placed on how
genotype determines phenotype. Thus, the model is extremely general: its chief
restrictions are that it is in discrete time, the environmental uncertainty does
not affect the class-to-class projection at a population level, and an individual’s
offspring number is not affected by phenotypes other than its own. Formally,
this model is a simultaneous generalization of two previous papers (Grafen
2002, 2006b), using a single consistent mathematical argument that replaces
less sophisticated, and in places erroneous, arguments.
We firstly outline the very basic structure of how we shall describe a class-structured population reproducing under the effects of environmental uncertainty. Our aim is generality, and so we impose only the minimum mathematical structure required, and we make only the weakest assumptions necessary to
ensure the discussion makes sense. This has the consequence that some technical effort has to be made to attain results which are unremarkable in more
familiar settings, for example being able to exchange the order of integration.
Since our aim is mathematical rigour, it is appropriate to demonstrate that
these results can be formally justified, and that our conclusions are indeed
valid in this generality.
Let $(I, \mathcal{I}, \mu_I)$ and $(\Omega, \mathcal{O}, \nu)$ be probability spaces, representing the population and states of nature respectively. We remark that no assumption is
made on the cardinality of $I$ or $\Omega$. For notational ease, we use subscripts for
functions from $I$ and superscripts for functions from $\Omega$. Since context shall not
always make things clear, we shall throughout the paper fully notate spaces of
measurable and integrable functions, including the relevant domain, σ-algebra,
measure, and co-domain, e.g. $L^1(I, \mathcal{I}, \mu_I; \mathbb{R})$ is the space of functions from $I$
to $\mathbb{R}$, integrable with respect to the measure $\mu_I$ on the σ-algebra $\mathcal{I}$.
Let $X$ be a compact Hausdorff space equipped with the usual Borel σ-algebra $\mathcal{X}$, and let $\chi : I \to X$ be measurable. $X$ is the space of classes, and
thus $\chi$ is the map allocating each individual to a class. We let $\sigma(\chi)$ denote
the σ-algebra on $I$ generated by $\chi$, i.e. generated by the set of pre-images
$\{\chi^{-1}(Y) : Y \in \mathcal{X}\}$.
We let $d : I \to \mathbb{N}$ denote the ploidy of the individuals, and we assume that
$d \in L^1(I, \sigma(\chi), \mu_I; \mathbb{N})$, so in particular $d$ is measurable with respect to $\sigma(\chi)$.
A ploidy-weighted probability measure $\tilde\mu_I$ on $(I, \mathcal{I})$ is then defined by
$$\tilde\mu_I(J) = \left( \int_I d_i \, \mu_I(di) \right)^{-1} \int_J d_i \, \mu_I(di).$$
Evidently a set $J \in \mathcal{I}$ is $\mu_I$-null if and only if it is $\tilde\mu_I$-null. Expectations and
covariances over $I$ shall always be taken with respect to this ploidy-weighted
measure $\tilde\mu_I$. Expectations with respect to $i \in I$ or $\omega \in \Omega$ shall be denoted $E_I$
and $E_\Omega$ respectively, and $C_I$ shall denote the corresponding covariance over $I$.
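As an illustration only, the ploidy weighting can be sketched for a finite population, where the integrals reduce to sums. The toy population, the names `ploidy_weighted` and `expectation`, and the allele counts below are hypothetical and are not taken from the paper; the sketch assumes each individual carries a point mass under $\mu_I$.

```python
from fractions import Fraction


def ploidy_weighted(mu, ploidy):
    """Return the ploidy-weighted measure for a finite population.

    mu     -- dict: individual -> probability mass under mu_I
    ploidy -- dict: individual -> ploidy d_i (a positive integer)
    """
    total = sum(ploidy[i] * mu[i] for i in mu)  # finite analogue of the integral of d over I
    return {i: ploidy[i] * mu[i] / total for i in mu}


def expectation(f, measure):
    """Expectation of f (dict: individual -> value) under a finite measure."""
    return sum(f[i] * measure[i] for i in measure)


# Hypothetical population: two haploid males, two diploid females, equal mass under mu_I.
mu = {i: Fraction(1, 4) for i in ("m1", "m2", "f1", "f2")}
d = {"m1": 1, "m2": 1, "f1": 2, "f2": 2}
mu_tilde = ploidy_weighted(mu, d)

# Within-individual frequency of a focal allele: m1 carries it, f1 is heterozygous.
allele_share = {"m1": Fraction(1), "m2": Fraction(0), "f1": Fraction(1, 2), "f2": Fraction(0)}
population_frequency = expectation(allele_share, mu_tilde)
```

In this toy case the weighted expectation counts haploid sets rather than individuals: two copies of the allele among six haploid sets give a population frequency of 1/3, which is the kind of average the ploidy-weighted measure is designed to produce.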
We assume individuals to produce offspring as measures over classes, i.e.
as elements of $M(X)$, the space of signed finite measures on $(X, \mathcal{X})$, equipped
with the usual norm and Borel σ-algebra. This is a Banach space, and thus
we may in principle integrate functions taking values in this space, using the
Bochner integral. We let M+ (X) denote the subset of positive measures, in
which the offspring distributions will of course lie. (We do not demand that
these offspring distributions are probability measures.)
The technical work of section 4 below is devoted to deriving a Price equation in this context, in the style of Grafen (2006b), but with the added generalization to environmental uncertainty. The generality of the set-up described
here means that the manipulations required of population, class, and uncertainty are rather delicate, and the argument must be pursued with great care.
The argument given here provides both a generalization of the corresponding
Price equation of Grafen (2006b), and a more sensitive handling of the difficult
mathematical structures used.
We now begin to articulate the decision structure for individuals, so that
our model is of an individual in possession of some (possibly none and possibly
complete) information about environmental uncertainty. In order to move towards regarding individuals as facing the same decision, we also define a space
for how the environmental uncertainty can affect an individual. This allows
us to have a function, which is the same function for all individuals, to map
from action and uncertainty (and class) to number of offspring. This reduction
from the whole population to a single implicit decision taker is a key theme of
the development. The set of possible phenotypes available to each individual
is also defined as a central part of the decision structure for an individual.
Following Grafen (2002), we further suppose we have measurable spaces
R, U , and A, where we shall not notate the associated σ-algebras. R denotes
the observable local environment on which individual behaviour can be conditioned, U denotes the set of chance events from the point of view of the
individual that are determined by the state of nature ω ∈ Ω, which may
represent events experienced by the whole population and events affecting
individuals separately (we thus combine in one notation what was notated
separately in Grafen (2002)). A denotes the space of actions which may be
taken and which in turn (partly) determine the offspring produced.
A measurable function is understood to be a function between measurable
spaces such that pre-images of elements of the target space σ-algebra are elements of the domain σ-algebra. All subspaces, product spaces, and function
spaces shall be equipped with the usual σ-algebras generated under these operations; in particular function spaces are equipped with the smallest σ-algebra
which makes each evaluation map measurable (see Kechris 1995, §10.B). We
shall let $Q \subseteq A^R$ consist of the measurable functions $q : R \to A$ and $\mathcal{Q}$ denote
the induced σ-algebra on $Q$.
For each individual i ∈ I let Si be a subset of the class of measurable
functions from R to A (‘strategies’). Thus each individual has a set of possible ways to react to any given local environment. We suppose the following
functions are measurable:
– r : I × Ω → R, representing the information about the local environment
available to individual i in state of nature ω;
– u : I×Ω → U , uncertainty, in principle affecting both individuals separately
and the population as a whole;
– a : I × R → A, the phenotype or strategy, specifying the action taken by
individual i in local environment r; moreover we assume that ai ∈ Si for all
i ∈ I, i.e. that the realized phenotype is indeed an admissible phenotype;
and
– w : A × X × U → M+ (X), the offspring distribution, depending on action,
class, and chance events.
A purely technical assumption we shall require in the proofs is that for all
$\tilde I \in \mathcal{I}$ of full $\tilde\mu_I$-measure and all $E \in \mathcal{X} \times \mathcal{Q}$ (the product σ-algebra on $X \times Q$),
$$\Big\{ x \in X : \big( \{x\} \times \{ a_i : i \in \tilde I \cap \chi^{-1}(\{x\}) \} \big) \cap E \neq \emptyset \Big\} \in \mathcal{X}, \tag{1}$$
i.e. this set is a measurable subset of $X$. Thus the set of those classes which
contain some individual playing one of a given measurable set of strategies is
itself measurable. This is somewhat reminiscent of the assumptions used in
measurable selection theorems, e.g. that of Kuratowski-Ryll-Nardzewski (see
Wagner 1977), and its role will be in precisely this kind of context. We note
that it is satisfied trivially if the set of classes $X$ is finite or countable.
We now condense our notation a little: we define $\tilde w \in \prod_{(i,\omega) \in I \times \Omega} M_+(X)^{S_i}$
by
$$\tilde w_i^\omega(q) = d_i^{-1} \, w(q(r_i^\omega), \chi_i, u_i^\omega),$$
for $q \in S_i$. So $\tilde w_i^\omega(S_i) \subseteq M_+(X)$ is the set of all possible offspring distributions
per haploid set of individual $i$ in state of nature $\omega$, when considering all admissible strategies for that individual. Note that the partial maps $\omega \mapsto \tilde w_i^\omega(q)$ for
fixed individual $i \in I$ and strategy $q \in S_i$ and $i \mapsto \tilde w_i^\omega(a_i)$ for fixed $\omega \in \Omega$ are
measurable (see Rudin 1966, Theorem 7.5). The need to take averages in the
offspring generation, which is described (only) in terms of measures, means we
need some theory of vector integration. We use that of the Bochner integral,
which is the most powerful, and bears the strongest resemblance to the more
familiar Lebesgue integral (see Diestel and Uhl 1977). We make some important assumptions about the offspring distribution, the first three of which are
purely technical and simply guarantee that we may pursue our argument in
great generality:
– that the function $(i, \omega) \mapsto \tilde w_i^\omega(a_i)$ is strongly measurable;
– that the function $\omega \mapsto \tilde w_i^\omega(q)$ is Bochner integrable for all $i \in I$ and $q \in S_i$;
– that the function $i \mapsto \tilde w_i^\omega(a_i)$ is Bochner integrable for all $\omega \in \Omega$;
– that the total offspring distribution $W^\omega := E_I[\tilde w_i^\omega(a_i)]$ does not in fact
depend on $\omega$; we therefore denote it simply by $W$; and
– that $\mu_X := \chi_\# \tilde\mu_I \ll W$. This is the assertion that classes may not be abandoned except by $\tilde\mu_I$-null sets of individuals: a positive distribution of the
parental generation on some set of classes $Y$ implies a positive distribution
on $Y$ of the offspring.
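The last two assumptions can be made concrete in a finite toy model, sketched below under stated assumptions: two individuals, two classes and two states of nature, with all names and numbers hypothetical. Measures on the finite class set are represented as lists of masses, so the Bochner integral over individuals reduces to a weighted sum of vectors.

```python
from fractions import Fraction as F

# Hypothetical ploidy-weighted masses and class assignments (chi) of two individuals.
mu_tilde = {"i1": F(1, 2), "i2": F(1, 2)}
chi = {"i1": 0, "i2": 1}

# w_tilde[omega][i] = per-haploid-set offspring measure of i, as [mass on class 0, mass on class 1].
# Individual outcomes depend on the state of nature; the population total need not.
w_tilde = {
    "wet": {"i1": [F(2), F(0)], "i2": [F(0), F(1)]},
    "dry": {"i1": [F(0), F(1)], "i2": [F(2), F(0)]},
}


def total_offspring(w_omega, weights, n_classes):
    """Finite analogue of W^omega = E_I[w_tilde_i^omega(a_i)]."""
    return [sum(weights[i] * w_omega[i][x] for i in weights) for x in range(n_classes)]


def class_distribution(classes, weights, n_classes):
    """Pushforward of the individual weights under the class map chi."""
    return [sum(w for i, w in weights.items() if classes[i] == x) for x in range(n_classes)]


def absolutely_continuous(mu, reference):
    """On a finite set, mu << reference iff mu vanishes wherever reference does."""
    return all(reference[x] > 0 for x in range(len(mu)) if mu[x] > 0)


W_by_state = {omega: total_offspring(w, mu_tilde, 2) for omega, w in w_tilde.items()}
W = W_by_state["wet"]
mu_X = class_distribution(chi, mu_tilde, 2)
```

Here each individual's offspring measure changes with the state of nature, yet the population total is the same in both states, and every class occupied by parents receives positive offspring mass.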
We record here two simple facts about Bochner integration which we shall
need. The first is essentially a version of Fubini’s theorem, stating that integrating a function with respect to an average measure is the same as averaging
over the integrals over individual measures.
Lemma 1 Let $(Y, \mathcal{Y}, \mu_Y)$ be a measure space, and let $(Z, \mathcal{Z})$ be a measurable
space. Let $m : Y \to M_+(Z)$ be Bochner integrable, $\bar m := \int_Y m(y) \, \mu_Y(dy) \in M_+(Z)$, and $f : Z \to [0, \infty]$ be measurable.
Then the function
$$Y \ni y \mapsto \int_Z f(z) \, m(y)(dz)$$
is measurable, and
$$\int_Z f(z) \, \bar m(dz) = \int_Y \left( \int_Z f(z) \, m(y)(dz) \right) \mu_Y(dy). \tag{2}$$
Remark 1 We do not assume that the function f is integrable; thus (2) also
applies in the case that the integrals are infinite. The result will not extend
in this generality to arbitrary real-valued (i.e. possibly negative-valued) functions, since in this case the argument would involve the indefinite term ∞−∞.
Proof The measurability result can be seen by returning to the definition of
the integral, and using the measurability of the map taking measures to their
evaluation on some fixed set, and the algebra of measurable functions.
Equation (2) can be seen by routine approximation of the integral by simple
functions.
□
Lemma 2 Let $(Y, \mathcal{Y}, \mu_Y)$ be a finite measure space, $(Z, \mathcal{Z})$ be a measurable
space, $m : Y \to M(Z)$ be Bochner integrable, $E \in \mathcal{Z}$.
Then $y \mapsto m(y)(E)$ is integrable and
$$\left( \int_Y m(y) \, \mu_Y(dy) \right)(E) = \int_Y m(y)(E) \, \mu_Y(dy).$$
Proof This is a trivial consequence of the fact that $m \mapsto m(E)$ is a bounded
linear operator from $M(Z)$ to $\mathbb{R}$, and the result of Hille (Diestel and Uhl
1977, Chapter 2, Theorem 2.6) that Bochner integration commutes with closed
operators. □
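When $Y$ and $Z$ are finite, both lemmas reduce to exchanging the order of two finite sums, which the following sketch checks numerically. It is a sanity check under that finiteness assumption, not a substitute for the proofs; the index sets, masses and the function `f` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical finite index space Y with measure mu_Y, and measures m(y) on Z = {z1, z2}.
mu_Y = {"y1": F(1, 3), "y2": F(2, 3)}
m = {
    "y1": {"z1": F(1), "z2": F(2)},
    "y2": {"z1": F(3), "z2": F(0)},
}
f = {"z1": F(5), "z2": F(7)}  # a non-negative function on Z


def integrate(func, measure):
    """Integral of func against a finite measure given as point masses."""
    return sum(func[z] * measure[z] for z in measure)


# Averaged measure m_bar: the finite analogue of the Bochner integral of m over Y.
m_bar = {z: sum(mu_Y[y] * m[y][z] for y in mu_Y) for z in f}

# Lemma 1: integrating f against m_bar equals averaging the integrals against each m(y).
lemma1_lhs = integrate(f, m_bar)
lemma1_rhs = sum(mu_Y[y] * integrate(f, m[y]) for y in mu_Y)

# Lemma 2: evaluating m_bar on a set E equals averaging the evaluations m(y)(E).
E = {"z1"}
lemma2_lhs = sum(m_bar[z] for z in E)
lemma2_rhs = sum(mu_Y[y] * sum(m[y][z] for z in E) for y in mu_Y)
```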
Finally, it is unusual in population genetic models to leave the connection
unspecified between genotype and phenotype, but it is one of the notable features of the covariance selection mathematics of Price (1970). We can obtain
a sufficient purchase on that connection by using two population genetic concepts. A p-score is an arbitrary weighted sum of allele frequencies, and is thus
a linear functional on the set of genotypes: by proving results for an arbitrary
p-score, we manage to say something significant about selection of genotypes
in general. If a phenotype is a real number, such as height, we can find the
p-score that best predicts the phenotype across the whole population. Sometimes it will be useful to discuss those predicted phenotypes, which are known
in biology as additive genetic values (of that given trait).
Thus we formally introduce the concepts of additive genetic value and p-scores. Suppose each individual $i \in I$ has at most $N \ge 1$ loci, and at most
$n \ge 1$ rival alleles for each locus. Then the space $\mathbb{R}^{n \times N}$ of real-valued $n \times N$ matrices $G$ can be regarded as containing all the matrices of genotypes
of individuals: the entry $g_{k,l}$ of the matrix $G$ representing the genotype of
individual $i$ is the number of alleles $k$ at locus $l$ of $i$. Let $g : I \to \mathbb{R}^{n \times N}$ be the
map assigning each individual its genotype. A p-score is a function $p : I \to \mathbb{R}$
representing an additive genetic trait (Grafen 2000), i.e. a linear combination
of allele frequencies. Hence the following definition.
Definition 1 (p-score and additive genetic trait) Let $p \in L^\infty(I, \mathcal{I}, \tilde\mu_I; \mathbb{R})$.
Then $p$ is a p-score, i.e. represents an additive genetic trait, if there exists a
linear map $\xi : \mathbb{R}^{n \times N} \to \mathbb{R}$ such that $p_i = \xi(d_i^{-1} g_i)$. Let $P$ denote the subspace
of $L^\infty(I, \mathcal{I}, \tilde\mu_I; \mathbb{R})$ comprising these p-scores; $P$ is then a finite-dimensional
subspace of $L^\infty(I, \mathcal{I}, \tilde\mu_I; \mathbb{R})$, of dimension $K \le N \times n$, say.
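A minimal sketch of Definition 1 for a single locus with two alleles follows. The function name `p_score`, the weight matrix and the example genotypes are hypothetical; the linear map $\xi$ is represented by an $n \times N$ matrix of weights applied entrywise to the ploidy-scaled genotype matrix.

```python
from fractions import Fraction


def p_score(genotype, ploidy, weights):
    """Apply a linear map xi, given as a weight matrix, to genotype / ploidy.

    genotype -- n x N nested list; genotype[k][l] counts copies of allele k at locus l
    ploidy   -- the individual's ploidy d_i
    weights  -- n x N nested list of coefficients defining xi
    """
    return sum(
        weights[k][l] * Fraction(genotype[k][l], ploidy)
        for k in range(len(genotype))
        for l in range(len(genotype[k]))
    )


# One locus (N = 1), two alleles (n = 2); xi picks out the frequency of allele 0.
frequency_of_allele_0 = [[1], [0]]

diploid_heterozygote = p_score([[1], [1]], 2, frequency_of_allele_0)
haploid_carrier = p_score([[1], [0]], 1, frequency_of_allele_0)
diploid_noncarrier = p_score([[0], [2]], 2, frequency_of_allele_0)
```

Dividing by ploidy is what allows haploid and diploid individuals to be compared on the same scale: the score is a within-individual allele frequency rather than a raw copy count.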
Using the Gram-Schmidt process, we can construct a basis for $P$ consisting
of functions $\{p_l\}_{l=1}^K$ such that
$$\int_I (p_l)_i (p_{l'})_i \, \tilde\mu_I(di) = \begin{cases} 1 & l = l' \\ 0 & \text{otherwise.} \end{cases}$$
To define the additive genetic value of an arbitrary integrable function of
individuals, we simply project onto the set of p-scores, P. We choose our
coefficients so that the kernel of this projection lies in the pre-annihilator of
the subspace P, with the usual identification of dual spaces, or rather that
the average over all individuals carrying any given allele of the additive genetic
value of a trait is equal to the same average of the trait itself.
Definition 2 (Additive genetic value) Let $f \in L^1(I, \mathcal{I}, \tilde\mu_I; \mathbb{R})$. Then the
additive genetic value of $f$, $\mathrm{agv}(f) \in P$, is given by
$$\mathrm{agv}(f) = \sum_{l=1}^{K} \left( \int_I f_i \, (p_l)_i \, \tilde\mu_I(di) \right) p_l.$$
This is then a p-score, and represents that component of $f$ which is heritable.
We have thus defined a bounded linear operator $\mathrm{agv} : L^1(I, \mathcal{I}, \tilde\mu_I; \mathbb{R}) \to P$.
Evidently then $f = \mathrm{agv}(f) + (f - \mathrm{agv}(f))$, where by choice of the $p_l$, we see
that for any $p \in P$,
$$\int_I (f_i - \mathrm{agv}(f)_i) \, p_i \, \tilde\mu_I(di) = 0. \tag{3}$$
This equation also assures us that agv does not depend on the choice of basis for $P$, for suppose two bases give two definitions, $\mathrm{agv}_1$ and $\mathrm{agv}_2$, say,
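both of which satisfy equation (3). Then for all $f \in L^1(I, \mathcal{I}, \tilde\mu_I; \mathbb{R})$, since
$\mathrm{agv}_1(f), \mathrm{agv}_2(f) \in P$, applying (3) gives that
$$\begin{aligned}
\int_I (\mathrm{agv}_1(f)_i - \mathrm{agv}_2(f)_i)^2 \, \tilde\mu_I(di)
&= \int_I (\mathrm{agv}_1(f)_i)^2 \, \tilde\mu_I(di) - 2 \int_I \mathrm{agv}_1(f)_i \, \mathrm{agv}_2(f)_i \, \tilde\mu_I(di) + \int_I (\mathrm{agv}_2(f)_i)^2 \, \tilde\mu_I(di) \\
&= \int_I f_i \, \mathrm{agv}_1(f)_i \, \tilde\mu_I(di) - \int_I f_i \, \mathrm{agv}_1(f)_i \, \tilde\mu_I(di) \\
&\quad - \int_I f_i \, \mathrm{agv}_2(f)_i \, \tilde\mu_I(di) + \int_I f_i \, \mathrm{agv}_2(f)_i \, \tilde\mu_I(di) \\
&= 0,
\end{aligned}$$
and hence that $\mathrm{agv}_1(f) = \mathrm{agv}_2(f)$ as elements of $P$.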
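The projection in Definition 2, the orthogonality property (3) and the independence of the choice of basis can all be checked numerically in a finite population. The sketch below assumes four equally weighted diploid individuals at one diallelic locus; the trait values and all function names are hypothetical, and floating-point arithmetic with a small tolerance stands in for exact integration.

```python
import math


def inner(f, g, mu):
    """Inner product of f and g under a finite probability measure mu."""
    return sum(fi * gi * w for fi, gi, w in zip(f, g, mu))


def gram_schmidt(vectors, mu, tol=1e-12):
    """Orthonormalise vectors under inner(., ., mu), dropping dependent ones."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = inner(r, b, mu)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(inner(r, r, mu))
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis


def agv(f, basis, mu):
    """Additive genetic value: projection of f onto the span of the orthonormal basis."""
    out = [0.0] * len(f)
    for b in basis:
        c = inner(f, b, mu)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out


mu = [0.25, 0.25, 0.25, 0.25]   # ploidy-weighted masses of four individuals
p_A = [0.0, 0.5, 0.5, 1.0]      # frequency of allele A within each individual
p_a = [1.0, 0.5, 0.5, 0.0]      # frequency of the rival allele a
trait = [1.0, 2.0, 4.0, 3.0]    # hypothetical phenotype values

# Two orthonormal bases of the same p-score space, built in different orders.
agv_1 = agv(trait, gram_schmidt([p_A, p_a], mu), mu)
agv_2 = agv(trait, gram_schmidt([p_a, p_A], mu), mu)
residual = [t - a for t, a in zip(trait, agv_1)]
```

Because the two allele frequencies sum to one, the p-score space here contains the constants, so the additive genetic value coincides with the least-squares regression of the trait on the frequency of allele A.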
4 Reproductive Value and the Price Equation
The Price Equation represents gene frequency change in our argument, but in
its original form does not admit uncertainty or class-structure. This section
derives a suitable Price Equation, simultaneously generalizing the uncertainty
of Grafen (2002) and the class-structure of Grafen (2006b). The Markov theory
allows us to weight over the classes to obtain a single average change in p-score
from one time period to the next. The central property needed is that two sets
of weights are the same: those used to average across classes to obtain an
average change in mean p-score on the left hand side; and those used to obtain
a single measure of reproductive success for each individual, averaging across
the classes of its offspring, to include in the right hand side. The original idea
of using the leading eigenvector of a class-to-class transition matrix goes back
to Taylor (1990, 1996), and may be detected in embryo form in the famous
sex ratio argument of Fisher (1930).
4.1 Links to Markov Theory
In this section we apply our assumption that, while uncertainty may affect
the fitnesses of individuals, it does not affect class-to-class projection at a
population level. We do this by defining a Markov process for arbitrary ω, and
then assuming that there exists an associated invariant measure that does not
depend on the choice of ω. Further work aims to remove this assumption. The
invariant measure represents the ‘reproductive value’ of a subset of classes: that
is, if we take a sufficiently distant generation and choose an allele at random,
what is the probability that its ancestor today is present in an individual that
belongs to a class in that subset?
Given all our data, we can follow the methods of Grafen (2006b) for each
state of nature ω ∈ Ω. To find an invariant measure and the appropriate
weightings for classes, we must first understand how the class distribution of
the population changes, given the pattern of offspring production described
above.
Lemma 3 Let ω ∈ Ω, and let f : I → R be such that i ↦ fi w̃iω (ai ) as a map
from I to M(X) is Bochner integrable.
Then EI [fi w̃iω (ai )] ≪ W .
Proof This is trivial using Lemma 2 since i ↦ w̃iω (ai ) maps into positive
measures. □
Therefore for such a measure EI [fi w̃iω (ai )], the Radon-Nikodym derivative
with respect to W , $\frac{d}{dW} \mathbb{E}_I[f_i \tilde{w}_i^{\omega}(a_i)]$, is defined. Applying this remark to characteristic functions of sets of individuals sharing classes, we can define a (discrete time) Markov process with state space X by defining the probability
transition function P ω : X × X → [0, 1] by
\[
P^{\omega}(x, A) = \left( \frac{d}{dW} \left( \int_{\chi^{-1}(A)} \tilde{w}_i^{\omega}(a_i) \,\tilde{\mu}_I(di) \right) \right)(x).
\]
P ω (x, A) is then well-defined W -almost everywhere, and represents the proportionate contribution by parents belonging to a class in A to offspring of
class x.
Following Rosenblatt (1971), we can use P ω (·, ·) to define a linear operator
T ω : L∞ (X, X , µX ; R) → L∞ (X, X , µX ; R) by defining
\[
(T^{\omega} f)(x) = \int_X f(y) \, P^{\omega}(x, dy).
\]
This is the average of f over all those parents contributing offspring to class x,
and therefore represents how values on classes may be traced back through a
generation. T ω is clearly well-defined since P ω (x, ·) ≪ µX , and for W -almost
every x ∈ X, we have that
\[
|(T^{\omega} f)(x)| \le \|f\|_{L^{\infty}(X, \mathcal{X}, \mu_X; \mathbb{R})} .
\]
Since µX ≪ W , we see that this exceptional set is µX -null, i.e. T ω f ∈
L∞ (X, X , µX ; R) indeed. We shall find the following alternative expression
for T ω f useful when deriving the Price equation.
Lemma 4 Let f ∈ L∞ (X, X , µX ; R).
Then
\[
(T^{\omega} f)(x) = \left( \frac{d}{dW} \left( \int_I (f \circ \chi)_i \, \tilde{w}_i^{\omega}(a_i) \,\tilde{\mu}_I(di) \right) \right)(x) \tag{4}
\]
for W -almost every x ∈ X.
Proof This follows by routine approximation of the integral by simple functions, using the vectorial version of the dominated convergence theorem (Diestel and Uhl 1977, Chapter 2, Theorem 2.3). □
Associated with any such process is the notion of an invariant measure, i.e. a
measure τ ω ∈ M+ (X) such that
\[
\int_X P^{\omega}(x, A) \, \tau^{\omega}(dx) = \tau^{\omega}(A)
\]
for all A ∈ X , or, equivalently,
\[
\int_X f(x) \, \tau^{\omega}(dx) = \int_X (T^{\omega} f)(x) \, \tau^{\omega}(dx)
\]
for all f ∈ L∞ (X, X , µX ; R). Such a weighting then appropriately balances
the reproductive outputs of classes so that average values across classes are
preserved from parental to offspring generation. We remark that since P ω (x, A)
is only well-defined for W -almost every x ∈ X, we can a priori only integrate
P ω (x, A) with respect to a measure τ ω if τ ω ≪ W . Thus we assume that any
discussion of invariant measures is restricted to such absolutely continuous
measures, so that the derivative dτ ω /dW is well-defined.
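As a purely illustrative sketch of these definitions, the Python code below specializes to a finite population with finitely many classes and a single fixed state of nature, so that measures become lists of numbers and the Radon-Nikodym derivative with respect to W becomes division by W[x]. The data `mu`, `chi`, `w` and all function names are hypothetical. Power iteration is used only as a convenient way to find an invariant measure in this example, where every entry of the transition matrix is positive; it is not a method taken from the text.

```python
def offspring_totals(mu, w, n_classes):
    """W[x]: offspring contribution to class x, averaged over all individuals."""
    return [sum(m * w_i[x] for m, w_i in zip(mu, w)) for x in range(n_classes)]

def transition_matrix(mu, chi, w, n_classes):
    """P[x][y]: share of the offspring in class x contributed by parents of class y."""
    W = offspring_totals(mu, w, n_classes)
    return [[sum(m * w_i[x] for m, c, w_i in zip(mu, chi, w) if c == y) / W[x]
             for y in range(n_classes)]
            for x in range(n_classes)]

def invariant_measure(P, iterations=2000):
    """Iterate tau[y] <- sum_x tau[x] * P[x][y], starting from the uniform measure."""
    n = len(P)
    tau = [1.0 / n] * n
    for _ in range(iterations):
        tau = [sum(tau[x] * P[x][y] for x in range(n)) for y in range(n)]
    return tau

# Hypothetical population: six equally weighted individuals in three classes.
mu = [1 / 6] * 6
chi = [0, 0, 1, 1, 2, 2]          # class of each individual
w = [[0.9, 0.3, 0.1],             # w[i][x]: offspring of individual i in class x
     [0.5, 0.2, 0.1],
     [0.4, 0.8, 0.3],
     [0.2, 0.6, 0.2],
     [0.3, 0.1, 0.7],
     [0.1, 0.2, 0.5]]

P = transition_matrix(mu, chi, w, 3)
tau = invariant_measure(P)
print(tau)
```

Each row of `P` sums to one, and the resulting `tau` satisfies the finite form of the invariance condition, namely that the sum over x of tau[x] P[x][y] equals tau[y] for every class y.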
We assume that an invariant measure exists for each ω ∈ Ω; we refer
to Grafen (2006b) for a discussion of the assumptions required to guarantee
this. We further assume that there exists an invariant measure τ ω which is in
fact independent of ω, thus we can just write τ for this invariant measure.
This weighting of classes based on reproductive output is the key ingredient
in our definition of fitness of individuals.
Given the invariant measure τ , we
can now define the ‘fitness operator’ $F^{\omega} \in \prod_{i \in I} [0, \infty]^{S_i}$ by, for i ∈ I, setting
\[
S_i \ni q \mapsto F_i^{\omega}(q) := \int_X \frac{d\tau}{dW}(x) \, \tilde{w}_i^{\omega}(q)(dx).
\]
For each individual i ∈ I, this is a function of strategy, and is an appropriately
weighted average of the individual’s offspring when playing each given strategy.
For this and the following subsection we consider a fixed element p ∈
L∞ (I, I, µ̃I ; R). We emphasize that at this stage this is an arbitrary element
of the space, and not in general a p-score, and assigns (bounded) numbers to
individuals in a manner not necessarily determined by genotype. We define
the class-average value of p by defining an X -measurable function π : X → R as
the Radon-Nikodym derivative with respect to µX of the measure on X given
by $Y \mapsto \int_{\chi^{-1}(Y)} p_i \,\tilde{\mu}_I(di)$. We see that π satisfies
\[
(\pi \circ \chi) = \mathbb{E}[p \mid \chi],
\]
as elements of L1 (I, I, µ̃I ; R). Note that properties of conditional expectations
imply that π ∈ L∞ (X, X , µX ; R), since p ∈ L∞ (I, I, µ̃I ; R).
Fix ω ∈ Ω. Since p ∈ L∞ (I, I, µ̃I ; R), the correspondingly weighted offspring function pw̃ω (a) is Bochner integrable and EI [pi w̃iω (ai )] ≪ W by
Lemma 3. We can define the class average of this weighted offspring function, π̂ ω ∈ L1 (X, X , W ; R), by
\[
\hat{\pi}^{\omega} = \frac{d}{dW} \big( \mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)] \big). \tag{5}
\]
We expand later on the precise interpretation of this value and under what
further assumptions it becomes a quantity of greater relevance, e.g. when it
becomes a useful estimate for the class average value of p in the following
generation. In the next lemma we make the observation that this offspring
class average is, like the parental class average, a bounded function.
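Lemma 5 With π̂ ω as defined above, in fact π̂ ω ∈ L∞ (X, X , W ; R).
Proof Let c > 0 and use Lemma 2 to see that
\[
\begin{aligned}
c \, W\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big)
&\le \int_{(\hat{\pi}^{\omega})^{-1}((c,\infty))} \hat{\pi}^{\omega}(x) \, W(dx) \\
&= \mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)]\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big) \\
&= \int_I p_i \, \tilde{w}_i^{\omega}(a_i)\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big) \,\tilde{\mu}_I(di) \\
&\le \|p\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \int_I \tilde{w}_i^{\omega}(a_i)\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big) \,\tilde{\mu}_I(di) \\
&= \|p\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \left( \int_I \tilde{w}_i^{\omega}(a_i) \,\tilde{\mu}_I(di) \right)\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big) \\
&= \|p\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \, W\big((\hat{\pi}^{\omega})^{-1}((c, \infty))\big).
\end{aligned}
\]
Hence if W ((π̂ ω )−1 ((c, ∞))) ≠ 0, we see that c ≤ ‖p‖L∞ (I,I,µ̃I ;R) . Similarly we
can show that
\[
-c \, W\big((\hat{\pi}^{\omega})^{-1}((-\infty, -c))\big) \ge -\|p\|_{L^{\infty}} \, W\big((\hat{\pi}^{\omega})^{-1}((-\infty, -c))\big),
\]
and hence again that if W ((π̂ ω )−1 ((−∞, −c))) ≠ 0 then c ≤ ‖p‖L∞ (I,I,µ̃I ;R) .
Combining these results and taking the contrapositive, we see that if c >
‖p‖L∞ (I,I,µ̃I ;R) , then W ((|π̂ ω |)−1 ((c, ∞))) = 0. In other words, we infer that
\[
\|\hat{\pi}^{\omega}\|_{L^{\infty}(X, \mathcal{X}, W; \mathbb{R})} \le \|p\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} .
\]
□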
4.2 The Price Equation
It turns out that the properties of the invariant measure discussed in the previous subsection are those needed to construct a suitable Price Equation. The
arguments of this subsection apply for a fixed ω ∈ Ω. We therefore suppress
the superfluous ω superscript from the notation.
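The definition of π̂ in (5), linearity of Radon-Nikodym differentiation, and
Lemma 4 imply that
\[
\hat{\pi} = \frac{d}{dW} \big( \mathbb{E}_I[p_i \tilde{w}_i(a_i)] \big)
= T\pi + \frac{d}{dW} \big( \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, \tilde{w}_i(a_i)] \big).
\]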
Using the above equation, the invariance of the class-weighting τ , and the
change of variables formula for Radon-Nikodym derivatives (Halmos 1950,
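§32, Theorem B), we see that
\[
\begin{aligned}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= \int_X \left( \frac{d}{dW} \big( \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, \tilde{w}_i(a_i)] \big) \right)(x) \, \tau(dx) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, \tilde{w}_i(a_i)](dx).
\end{aligned}
\tag{6}
\]
Note that since W and τ are positive measures, dτ/dW : X → [0, ∞). Integration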
with respect to signed measures is understood as in Rudin (1966, §6.18), that
is, as an integral with respect to the ‘polar decomposition’ of the measure, thus
it is defined in terms of a usual integral with respect to a positive measure.
Our plan is to interchange the order of integration in (6) so that the expression becomes an average over individuals of an integrand involving a weighted
average over classes of offspring measures, and thus looks rather more like an
integral involving the fitness of individuals as defined above. This manipulation is not immediate, but is permitted by the following technical lemma,
which is a similar statement to the quasi-Fubini result of Lemma 1, but now
involves in general non-positive weights on the measures w̃i (ai ), corresponding
to our desire to allow p-scores to be defined via arbitrary allelic weightings.
This makes the argument a little more delicate, so we include some details of
the proof.
Lemma 6 Let f ∈ L∞ (I, I, µ̃I ; R), g ∈ L1 (X, X , µX ; [0, ∞)), and all other
notation be as above.
Then
\[
\int_X g(x) \left( \int_I f_i \, \tilde{w}_i(a_i) \,\tilde{\mu}_I(di) \right)(dx)
= \int_I f_i \left( \int_X g(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di). \tag{7}
\]
Proof Define the weighted offspring measure Wf ∈ M(X) by Wf = EI [fi w̃i (ai )].
The result is easily seen to be true, via Lemma 2, for measurable simple
functions. Let sn : X → [0, ∞) be a sequence of measurable simple functions
such that sn (x) ↑ g(x). We easily see by the monotone convergence theorem
that for i ∈ I,
\[
\int_X g(x) \, (f_i \tilde{w}_i(a_i))(dx) = f_i \int_X g(x) \, \tilde{w}_i(a_i)(dx).
\]
Now, for i ∈ I, we have that
\[
\left| f_i \int_X s_n(x) \, \tilde{w}_i(a_i)(dx) \right| \le \|f\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \int_X g(x) \, \tilde{w}_i(a_i)(dx),
\]
where by Lemma 1,
\[
\int_I \left( \int_X g(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di) = \int_X g(x) \, W(dx) < \infty
\]
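since g ∈ L1 (X, X , W ; [0, ∞)). So $i \mapsto \int_X g(x) \, \tilde{w}_i(a_i)(dx)$ is µ̃I -integrable, and
thus by the dominated convergence theorem,
\[
\lim_{n \to \infty} \int_I f_i \left( \int_X s_n(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di)
= \int_I f_i \left( \int_X g(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di). \tag{8}
\]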
This argument shows that the limits behave as required for the right-hand side
of our required expression.
We recall from the definition of integration with respect to a signed measure
that there exists some X -measurable function φ : X → {±1} such that dWf =
φ d|Wf |, thus
\[
\int_X g(x) \, W_f(dx) = \int_X g(x) \varphi(x) \, |W_f|(dx)
\]
and
\[
\int_X s_n(x) \, W_f(dx) = \int_X s_n(x) \varphi(x) \, |W_f|(dx)
\]
for all n ≥ 1. Now we observe that for arbitrary Y ∈ X , use of Lemma 2 shows
that
\[
|W_f(Y)| \le \|f\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \, W(Y),
\]
and hence
\[
\int_X g(x) \, W_f(dx) \le \|f\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \int_X g(x) \, W(dx),
\]
since g ≥ 0. So g ∈ L1 (X, X , Wf ; [0, ∞)), and thus gφ ∈ L1 (X, X , |Wf |; R).
Since |φ(x)| = 1 for µX -almost every x ∈ X, the choice of sn implies that
\[
|s_n(x) \varphi(x)| \le |g(x) \varphi(x)| ,
\]
so we can use the dominated convergence theorem to see that
\[
\lim_{n \to \infty} \int_X s_n(x) \varphi(x) \, |W_f|(dx) = \int_X g(x) \varphi(x) \, |W_f|(dx).
\]
Hence by definition
\[
\lim_{n \to \infty} \int_X s_n(x) \, W_f(dx) = \int_X g(x) \, W_f(dx).
\]
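So, using the result on measurable simple functions and (8), we see that
\[
\begin{aligned}
\int_X g(x) \, W_f(dx)
&= \lim_{n \to \infty} \int_X s_n(x) \, W_f(dx) \\
&= \lim_{n \to \infty} \int_I \left( \int_X s_n(x) \, f_i \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di) \\
&= \int_I f_i \left( \int_X g(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di),
\end{aligned}
\]
as required. □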
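Applying this result with fi = pi − E[p|χ]i and g(x) = (dτ/dW)(x), we recall (6)
and the definition of Fi (ai ) and see now that
\[
\begin{aligned}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= \int_I (p_i - \mathbb{E}[p \mid \chi]_i) \left( \int_X \frac{d\tau}{dW}(x) \, \tilde{w}_i(a_i)(dx) \right) \tilde{\mu}_I(di) \\
&= \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, F_i(a_i)].
\end{aligned}
\tag{9}
\]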
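Identity (9) can be checked numerically in a finite setting. The Python sketch below is only an illustration under simplifying assumptions: a single state of nature, four individuals, two classes, and made-up values for `mu`, `chi`, `w` and `p`. For two classes the invariant measure is written in closed form from the off-diagonal entries of the transition matrix, which is a convenience of the example rather than part of the argument.

```python
# Hypothetical data: four equally weighted individuals in two classes (0 and 1).
mu = [0.25, 0.25, 0.25, 0.25]
chi = [0, 0, 1, 1]                                   # class of each individual
w = [[1.0, 0.5], [0.4, 0.2], [0.6, 1.2], [0.2, 0.3]]  # w[i][x]: offspring of i in class x
p = [1.0, 0.0, 0.5, 0.0]                             # arbitrary bounded values p_i
n, classes = len(mu), [0, 1]

# Total offspring measure W and the class-to-class transition matrix P[x][y].
W = [sum(mu[i] * w[i][x] for i in range(n)) for x in classes]
P = [[sum(mu[i] * w[i][x] for i in range(n) if chi[i] == y) / W[x] for y in classes]
     for x in classes]

# For two classes, sum_x tau[x] P[x][y] = tau[y] gives tau proportional to (P[1][0], P[0][1]).
total = P[1][0] + P[0][1]
tau = [P[1][0] / total, P[0][1] / total]

# Fitness: offspring weighted by the density d(tau)/dW, which is tau[x] / W[x] here.
F = [sum(tau[x] / W[x] * w[i][x] for x in classes) for i in range(n)]

# Parental class average pi and offspring class average pi_hat.
mu_X = [sum(mu[i] for i in range(n) if chi[i] == y) for y in classes]
pi = [sum(mu[i] * p[i] for i in range(n) if chi[i] == y) / mu_X[y] for y in classes]
pi_hat = [sum(mu[i] * p[i] * w[i][x] for i in range(n)) / W[x] for x in classes]

lhs = sum((pi_hat[x] - pi[x]) * tau[x] for x in classes)
rhs = sum(mu[i] * (p[i] - pi[chi[i]]) * F[i] for i in range(n))
print(lhs, rhs)
```

The two printed numbers agree up to rounding. In the same example the mean of `F` over individuals equals the total mass of `tau`, and the largest absolute value of `pi_hat` does not exceed that of `p`, in line with Lemma 5.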
We now wish to reintroduce and average over the states of nature ω ∈ Ω
which have hitherto been fixed.
Fixing both an individual i ∈ I and an admissible strategy q ∈ Si , Lemma 1
implies that ω ↦ Fiω (q) is measurable. So we may define the expected fitness
function $F \in \prod_{i \in I} [0, \infty]^{S_i}$ by
\[
F_i(q) = \mathbb{E}_{\Omega}[F_i^{\omega}(q)] ,
\]
or, equivalently, by (2),
\[
F_i(q) = \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(q)](dx). \tag{10}
\]
This is the expected fitness of an individual i ∈ I playing strategy q ∈ Si , and
the latter expression shows it to be given by an appropriately-weighted average
of the expected contributions to offspring classes, when playing strategy q.
In this general context we must check that the fitness of the individuals
with their realized phenotypes is a well-defined finite number. As above, the
map (i, ω) ↦ Fiω (ai ) is measurable, and hence by the classical Fubini Theorem (Rudin 1966, Theorem 7.8), i ↦ Fi (ai ) is measurable. Fubini’s Theorem
for Bochner integrals (see Dunford and Schwartz 1958, §III.11.9 Theorem 9)
implies that
\[
\mathbb{E}_I[\mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(a_i)]] = \mathbb{E}_{\Omega}[\mathbb{E}_I[\tilde{w}_i^{\omega}(a_i)]].
\]
Hence we note, using (10) and (2) once more, that
\[
\begin{aligned}
0 \le \int_I F_i(a_i) \,\tilde{\mu}_I(di)
&= \int_I \left( \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(a_i)](dx) \right) \tilde{\mu}_I(di) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_I[\mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(a_i)]](dx) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_{\Omega}[\mathbb{E}_I[\tilde{w}_i^{\omega}(a_i)]](dx) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_{\Omega}[W](dx) \\
&= \int_X \frac{d\tau}{dW}(x) \, W(dx) \\
&= \tau(X) \\
&< \infty.
\end{aligned}
\]
Thus i ↦ Fi (ai ) is a function in L1 (I, I, µ̃I ; R), and hence 0 ≤ Fi (ai ) < ∞ for
µ̃I -almost every i ∈ I.
The final step is to define the expected value of our class-average value
π̂ ω in the following generation. Radon-Nikodym differentiation is an isometry
from {µ ∈ M(X) : µ ≪ W } to L1 (X, X , W ; R), so the function ω ↦ π̂ ω is
strongly measurable. Moreover, for each ω ∈ Ω, we see using standard facts
about Bochner integrals (see Diestel and Uhl 1977, Chapter 2, Theorem 2.4)
that
\[
\begin{aligned}
\left\| \frac{d}{dW} \mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)] \right\|_{L^1(X, \mathcal{X}, W; \mathbb{R})}
&\le \big\| \mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)] \big\|_{M(X)} \\
&\le \mathbb{E}_I\big[ \| p_i \tilde{w}_i^{\omega}(a_i) \|_{M(X)} \big] \\
&\le \|p\|_{L^{\infty}(I, \mathcal{I}, \tilde{\mu}_I; \mathbb{R})} \, W(X),
\end{aligned}
\]
and therefore that ω ↦ π̂ ω is Bochner integrable. We may therefore define the
expected value π̂ ∈ L1 (X, X , W ; R) by
\[
\hat{\pi} = \mathbb{E}_{\Omega}[\hat{\pi}^{\omega}].
\]
Using once more that Radon-Nikodym differentiation is an isometry, we use
the result of Hille (Diestel and Uhl 1977, Chapter 2, Theorem 2.6) to see that
we can commute it with the expectation and conclude that
\[
\hat{\pi} = \mathbb{E}_{\Omega}\left[ \frac{d}{dW} \mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)] \right]
= \frac{d}{dW} \mathbb{E}_{\Omega}[\mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)]]. \tag{11}
\]
Having established these expected values of our variables, we use expression (9), linearity of expectation and Fubini’s theorem to see that the expected
difference in the class-average values calculated in the two generations can be
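related to the p-score with which we began and expected fitness:
\[
\begin{aligned}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= \mathbb{E}_{\Omega}\left[ \int_X (\hat{\pi}^{\omega}(x) - \pi(x)) \, \tau(dx) \right] \\
&= \mathbb{E}_I\big[ (p_i - \mathbb{E}[p \mid \chi]_i) \, \mathbb{E}_{\Omega}[F_i^{\omega}(a_i)] \big] \\
&= \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, F_i(a_i)] && (12) \\
&= \mathbb{E}_I[p_i \, (F_i(a_i) - \mathbb{E}[F(a) \mid \chi]_i)] , && (13)
\end{aligned}
\]
where the conditional expectation of F (a) is understood here and elsewhere
(including references to the corresponding covariance) to refer to the function
i ↦ Fi (ai ).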
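The passage from (12) to (13) can be spelled out as follows. This is a sketch of the standard argument rather than a quotation from the text; it uses only that p is bounded, that i ↦ Fi (ai ) is integrable, and that a conditional expectation given χ may be moved from one factor to the other when the remaining factor is σ(χ)-measurable.

```latex
\begin{align*}
\mathbb{E}_I\big[(p_i - \mathbb{E}[p \mid \chi]_i)\,F_i(a_i)\big]
 &= \mathbb{E}_I[p_i F_i(a_i)] - \mathbb{E}_I\big[\mathbb{E}[p \mid \chi]_i\,F_i(a_i)\big] \\
 &= \mathbb{E}_I[p_i F_i(a_i)] - \mathbb{E}_I\big[\mathbb{E}[p \mid \chi]_i\,\mathbb{E}[F(a) \mid \chi]_i\big] \\
 &= \mathbb{E}_I[p_i F_i(a_i)] - \mathbb{E}_I\big[p_i\,\mathbb{E}[F(a) \mid \chi]_i\big] \\
 &= \mathbb{E}_I\big[p_i\,(F_i(a_i) - \mathbb{E}[F(a) \mid \chi]_i)\big].
\end{align*}
```

The second and third lines each apply the defining property of conditional expectation, first with the σ(χ)-measurable factor E[p|χ] and then with the σ(χ)-measurable factor E[F (a)|χ].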
We remark that at this point, with the definitions given and properties assumed, these equations are not necessarily susceptible to intelligible interpretation, despite our suggestive notation, since they do not in general represent
the change from one generation to the next of a comparable quantity. When,
however, the function p : I → R is a p-score, i.e. represents an additive genetic
trait and is thus an allele frequency or linear combination of allele frequencies, then as discussed above p is the composition of a map on an underlying
space of genotypes and a map specifying each individual’s genotype. Thus in
principle corresponding values may be computed in the offspring generation.
Assuming perfect transmission, i.e. no mutation, fair meiosis, and no gametic
selection (Grafen 2000), precisely by definition of being an additive genetic
trait, the expected mean value by class of this trait in the offspring generation
is given by the expectation of the average over the gametic contributions (Falconer 1981). However, (11) implies that for all Y ∈ X ,
\[
\int_Y \hat{\pi}(x) \, W(dx) = \mathbb{E}_{\Omega}[\mathbb{E}_I[p_i \tilde{w}_i^{\omega}(a_i)]](Y).
\]
The right-hand side is precisely the expected average of the gametic contributions to the set of classes Y . Since π̂ ∈ L1 (X, X , W ; [0, ∞)) is the unique
element satisfying this equation for all Y ∈ X , we see that π̂ therefore represents the expected mean value by class of the additive genetic trait in the
offspring generation, W -almost everywhere. Since τ ≪ W , any discrepancies
on W -null sets are lost when weighted by reproductive value τ . Thus the equations we derived above record the change in the mean value by class of this
additive genetic trait from the parental to the offspring generation, weighted
by reproductive value. We record these remarks in the following theorem.
Theorem 1 (The Price Equation) Supposing perfect transmission, the expected change in the mean value of an additive genetic trait, weighted by reproductive value, is given by
\[
\begin{aligned}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= \mathbb{E}_I[(p_i - \mathbb{E}[p \mid \chi]_i) \, F_i(a_i)] \\
&= \mathbb{C}_I[p_i - \mathbb{E}[p \mid \chi]_i , F_i(a_i)] \\
&= \mathbb{E}_I[\mathbb{C}_I[(p, F(a)) \mid \chi]_i] \\
&= \mathbb{E}_I[p_i \, (F_i(a_i) - \mathbb{E}[F(a) \mid \chi]_i)] && (14) \\
&= \mathbb{E}_I[p_i \, (\operatorname{agv}(F(a) - \mathbb{E}[F(a) \mid \chi])_i)], && (15)
\end{aligned}
\]
where p ∈ P represents the parental values of the trait.
Proof The interpretation of the left-hand side of (15) is discussed above. The
equality of the first four expressions on the right-hand side follows by standard
properties of conditional expectations (see Billingsley 1995, Theorem 34.3).
Moving from (14) to (15) is possible in this situation because by assumption
p is a p-score, and therefore we can apply equation (3). □
Equation (15) is the final version of the Price Equation in the situation of a
class-structured population with uncertainty, when this uncertainty does not
affect the class distribution of offspring at the population level.
The Price Equation thus represents for us how a special class-weighted
mean of any arbitrary weighted sum of allele frequencies changes from this
generation to the next. The class-weights do not depend on the allele-weights,
and individual expected fitnesses on the right-hand side depend on the class-weights
but not on the allele-weights. Thus we obtain some purchase on changes in
our arbitrary space of genotypes by knowing how every arbitrary weighted
sum of allele frequencies changes. The genetic side can be linked to phenotypes at an individual level because the genetic changes depend only on the
expected individual fitnesses. These abstract connections make the most of
the phenotype-genotype separation inherent in the original covariance selection mathematics of Price (1970) and, remarkably, allow links to be made to
fitness-maximization ideas without any further specification of how genotype
determines phenotype.
4.3 Comparison with Grafen (2006b)
Before pursuing the implications of the Price Equation for fitness-maximization,
we conclude this section by observing that our approach is indeed consistent
with that of Grafen (2006b). To see this we reduce our setup to the cases examined there. Thus we suppose there is no effect of uncertainty, so Ω = {ω},
and the offspring distributions have no explicit dependence on class. Rather,
the situation Grafen considers is where the offspring distribution depends only
on the individual. We may capture this situation by considering the measure
w(i) in the notation of Grafen (2006b) to be given in our notation as w(ai ).
Let Y ∈ X . We calculate the class reproductive value of Y , using our definitions and concepts, by regarding our fitness Fi (ai ) as a Fisherian per-capita
reproductive value per ploidy at the level of the individual. So our calculation
is, using Lemma 1, change of variables in Radon-Nikodym derivatives, and the
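invariance of τ :
\[
\begin{aligned}
\int_{\chi^{-1}(Y)} F_i(a_i) \,\tilde{\mu}_I(di)
&= \int_{\chi^{-1}(Y)} \left( \int_X \frac{d\tau}{dW}(x) \, w(a_i)(dx) \right) \tilde{\mu}_I(di) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_I[(\mathbf{1}_{\chi^{-1}(Y)})_i \, w(a_i)](dx) \\
&= \int_X \frac{d}{dW} \big( \mathbb{E}_I[(\mathbf{1}_{\chi^{-1}(Y)})_i \, w(a_i)] \big)(x) \, \tau(dx) \\
&= \int_X \frac{d}{dW} \big( \mathbb{E}_I[(\mathbf{1}_Y \circ \chi)_i \, w(a_i)] \big)(x) \, \tau(dx) \\
&= \int_X T \mathbf{1}_Y(x) \, \tau(dx) \\
&= \int_X \mathbf{1}_Y(x) \, \tau(dx) \\
&= \tau(Y).
\end{aligned}
\]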
This is precisely what Grafen considers to be class reproductive value. We can,
then, for example, quickly recover Fisher’s sex ratio argument (after Grafen
2006b, §8.1). In this situation we have a space of two classes, X = {M, F }
say, representing the sexes. The assumption of equal male and female contribution to offspring is precisely the assumption that EI [(1χ−1 (M) )i w(ai )] =
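EI [(1χ−1 (F ) )i w(ai )] as measures on X, so, arguing as above,
\[
\begin{aligned}
\int_{\chi^{-1}(M)} F_i(a_i) \,\tilde{\mu}_I(di)
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_I[(\mathbf{1}_{\chi^{-1}(M)})_i \, w(a_i)](dx) \\
&= \int_X \frac{d\tau}{dW}(x) \, \mathbb{E}_I[(\mathbf{1}_{\chi^{-1}(F)})_i \, w(a_i)](dx) \\
&= \int_{\chi^{-1}(F)} F_i(a_i) \,\tilde{\mu}_I(di).
\end{aligned}
\]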
Thus already our work seems to support the interpretation of Fisher’s notion of reproductive value as an evolutionary maximand. We shall uncover
further connexions to Fisher’s work as we formalize the fitness-maximization
consequences of the Price Equation in the following sections.
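The sex ratio calculation can be illustrated with a small numerical sketch in Python. The population below is hypothetical: one male carrying weight 0.2 and two females carrying weight 0.4 each, with offspring numbers chosen so that males and females contribute equally to each offspring class. Under that assumption every row of the transition matrix is (1/2, 1/2), so the measure giving mass 1/2 to each sex is invariant; the names and numbers are illustrative only.

```python
# Hypothetical population: males are the rarer sex (total weight 0.2 against 0.8).
mu = [0.2, 0.4, 0.4]
sex = ["M", "F", "F"]
# w[i][x]: offspring of individual i in class x, chosen so that the male and
# female contributions to each offspring class coincide.
w = [{"M": 2.0, "F": 6.0}, {"M": 0.5, "F": 1.0}, {"M": 0.5, "F": 2.0}]
classes = ["M", "F"]

# Total offspring measure and the class-to-class transition matrix P[x][y].
W = {x: sum(m * w_i[x] for m, w_i in zip(mu, w)) for x in classes}
P = {x: {y: sum(m * w_i[x] for m, s, w_i in zip(mu, sex, w) if s == y) / W[x]
         for y in classes}
     for x in classes}

# Equal contribution makes every row of P equal to (1/2, 1/2), so this tau is invariant.
tau = {"M": 0.5, "F": 0.5}

# Fitness of each individual: offspring weighted by the density tau[x] / W[x].
F = [sum(tau[x] / W[x] * w_i[x] for x in classes) for w_i in w]

rv_males = sum(m * f for m, s, f in zip(mu, sex, F) if s == "M")
rv_females = sum(m * f for m, s, f in zip(mu, sex, F) if s == "F")
print(rv_males, rv_females, F)
```

The total fitness carried by each sex is the same even though the sexes are unequally represented, so the per-capita fitness of the rarer sex is higher, which is the familiar shape of Fisher's argument.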
5 Optimization
The work of this section is to construct an optimization program at the same
level of generality as the Price Equation of the previous section. The instrument will be phenotype, the constraint set will be the set of possible phenotypes, and, most significantly from a biological point of view, the maximand
will be expected fitness. This definition of what fitness-maximization means
stands in contrast to the way population geneticists have in the past attempted
to represent the biologist’s sense of fitness-maximization, namely in terms of
natural structures on dynamical systems such as Lyapunov functions and gradient functions (Ewens 2004). The contrast is immediately clear: both of those
mathematical concepts are functions from the space of gene-frequencies to the
real line, rather than from the set of possible phenotypes; it may also be noted
that working with those concepts requires dynamic sufficiency, which the current framework lacks. Grafen (2002) and Grafen (2006a) provided a parallel
optimization program including uncertainty, while Grafen (2006b) was unable
to provide one with class structure: this section provides both simultaneously.
Thus even providing an optimization program is a technical advance, and it
has biological significance in defining what fitness-maximization means.
It is worth remembering that a strategy, shortly to be defined, is a mapping from environmental cues to actions, and thus the individual is regarded
as making decisions in the face of partial knowledge about uncertainty. Solving the program will therefore imply acting as if in possession of a correct
prior distribution over the whole uncertainty, and as if performing appropriate Bayesian updating of that prior in the light of the partial information
received. The maximand is a probability-weighted arithmetic mean over the
uncertainty, and so this fitness-maximization does not exhibit the risk aversion
or bet-hedging that appears in many biological discussions of uncertainty (for
reasons best explained by Frank and Slatkin 1990): that difference arises because in this paper fitness is defined as relative to the population mean in each
given state of nature. (The recent series of papers by Frank (2011a,b, 2012a,b,c,
2013a,b) presents a modern discussion of bet-hedging and many other relevant
topics.) The advantage of the relative definition is precisely that we can operate at the current very high level of generality. The bet-hedging models are all,
in comparison, very special cases, in that they assume some definite genetic
architecture, and they do not focus on individual fitness maximization.
In moving from the population genetic model to the optimization program,
there are two steps of note. The whole population of the genetic model must
be reduced to the single implicit decision-taker of the optimization. In order
to do this, an assumption must be made to ensure that individuals within
the same class are, in some suitable sense, equivalent. Nothing assumed up to
now prevents the population being in two separate halves with quite different
selection pressures. First noticed by Grafen (2002), this kind of assumption
is an essential part of any general argument linking population genetics to
fitness maximization. Its precise form could be of interest in applications, in
determining whether fitness-maximization ideas can be applied or not. Perhaps
even more importantly, requiring mathematical rigour and proofs allowed us
to find and articulate this once-latent assumption, and also allows us to be
confident that there are no further assumptions waiting to be uncovered.
The definitions and results of the rest of the paper follow those of Grafen
(2002), but are extended to include simultaneously uncertainty and the division of the population into classes.
Definition 3 (Pairwise exchangeability) We say the assumption of pairwise exchangeability holds if
– for all measurable functions v : A × X × U → R, the function
\[
(i, q, c) \mapsto \nu\big( \{ \omega \in \Omega : v(q(r_i^{\omega}), \chi_i, u_i^{\omega}) \le c \} \big) ,
\]
mapping from I × Q × R to [0, 1], is measurable with respect to the product
σ-algebra σ(χ) × Q × B, where B denotes the usual Borel σ-algebra on R;
and
– for all x ∈ X, Sj = Sk whenever j, k ∈ χ−1 ({x}).
The significance lies in the stipulation of the σ-algebra σ(χ) on I with respect to which measurability is asserted. Technical reasons arising from the
subtleties of integration and measurability on product spaces demand that the
assumption is stated in this somewhat elaborate form, but the content of the
assertion lies in the reference to σ(χ), rather than to I. Roughly, the assertion
is that any pair of individuals of the same class have a symmetric distribution
of chance events, and the collection of admissible strategies is the same. This
is thus the natural generalization of the corresponding assumption in Grafen
(2002, §4).
This can also be read as a comment on how the class structure is defined, in
stipulating that classes are not so broad as to allow within them individuals
which face on average wholly different situations under the effects of environmental uncertainty. The question of how appropriate and workable the class
structure is arises again later (see definition 7), where a tension in the opposite
direction is revealed, towards classes not being too small. We therefore find it
more informative to retain pairwise exchangeability as an explicit assumption
rather than incorporate it more implicitly into the fundamental properties of
the class structure, and we thereby admit the possibility of a structured population in which the assumption fails. For concreteness, suppose we have a
population in which males and females have different sets of possible actions,
and in which juveniles differ in survivorship from adults in each sex. Then it
is possible that the assumption holds when we model the population with a
full age and sex class structure. However, if we modelled with only age classes
or with only sex classes, the assumption would fail. We further remark that,
as indicated in the statements, some of our results hold without this assumption. As an aside, the issues of correctness of class structure, and whether such
questions can be precisely formulated, are intriguing, particularly in relation
to the interaction of class and genotype, and warrant further attention.
The important consequence of pairwise exchangeability is captured in the
following lemma, in which we see that under this assumption, within classes,
expected fitness is a function only of strategies, not of individuals. Again, technical reasons demand that the result is stated more subtly, but the essence is
the blindness to individual differences within classes, in expectation, of fitness. This allows us to pass from a class of individuals each playing their own
strategy to a single implicit decision maker in each class.
Lemma 7 Suppose we have pairwise exchangeability.
Then the map
\[
\bigcup_{i \in I} (\{i\} \times S_i) \ni (i, q) \mapsto F_i(q),
\]
assigning an expected fitness to an individual playing a strategy admissible for
that individual, is measurable with respect to the induced product σ-algebra
σ(χ) × Q. As in the definition of pairwise exchangeability, the significance of
the statement lies in the assertion of measurability with respect to the σ-algebra
σ(χ).
Proof By (10) and the definition of the integral, it suffices to show that for
each set of classes Y ∈ X the map
\[
(i, q) \mapsto \mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(q)](Y)
\]
is measurable with respect to σ(χ) × Q. So fix a set Y ∈ X . Using Lemma 2
and, for example, Kingman and Taylor (1966, Theorem 11.4) we have for each
individual i ∈ I and strategy q ∈ Si that
\[
\mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(q)](Y) = \mathbb{E}_{\Omega}[\tilde{w}_i^{\omega}(q)(Y)]
= \int_{\mathbb{R}} \nu\big( \{ \omega \in \Omega : \tilde{w}_i^{\omega}(q)(Y) \le c \} \big) \, \mathcal{L}^1(dc), \tag{16}
\]
where the final integral is with respect to the usual one-dimensional Lebesgue
measure L 1 on R. That this final expression is measurable as a function of
(i, q) with respect to σ(χ)×Q follows by pairwise exchangeability and Fubini’s
Theorem. □
Definition 4 (Optimization program) Let
S ⊆ {s : I × R → A : s is measurable and si ∈ Si for all i ∈ I}
contain the realized phenotype allocation a and satisfy the following substitution condition: if s, t ∈ S and J ∈ I, then the function $s_{t,J} \in \prod_{i \in I} S_i$ defined
by
\[
(s_{t,J})_i(r) =
\begin{cases}
s_i(r), & i \notin J, \\
t_i(r), & i \in J
\end{cases}
\]
also lies in S. Thus substituting a different admissible specification of phenotypes on a certain subset of individuals is also admissible. Let k ∈ I. We
consider the optimization program of maximizing Fk (sk ) over all s ∈ S,
where s : k ↦ sk ∈ Sk . We say s̄ ∈ S is a solution for k in relation to S
if Fk (s̄k ) ≥ Fk (sk ) for all s ∈ S.
Remark 2 The substitution condition on the choice set S precisely stipulates
that the strategies available to any one individual do not depend on the strategies played by any other set of individuals, and is implied by our underlying
biological assumption of lack of social interaction.
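A toy version of Definition 4 may help fix ideas. In the Python sketch below, strategies are reduced to plain labels rather than maps from cues to actions, the choice set is the full product of the individual strategy sets, and the expected fitness table `F` is invented for illustration; none of these simplifications comes from the text.

```python
from itertools import product

def substitute(s, t, J):
    """Return s_{t,J}: individuals in J play their strategy from t, all others keep s."""
    return {i: (t[i] if i in J else s[i]) for i in s}

def is_solution_for(k, s_bar, choice_set, F):
    """s_bar solves the program for k if no s in the choice set gives k a higher F."""
    return all(F[k][s_bar[k]] >= F[k][s[k]] for s in choice_set)

# Hypothetical admissible strategies and expected fitnesses for three individuals.
strategy_sets = {"i1": ["hide", "forage"],
                 "i2": ["hide", "forage"],
                 "i3": ["early", "late"]}
F = {"i1": {"hide": 0.8, "forage": 1.1},
     "i2": {"hide": 0.9, "forage": 0.7},
     "i3": {"early": 1.3, "late": 1.0}}

individuals = list(strategy_sets)
# Choice set S: here, every combination of admissible strategies.
choice_set = [dict(zip(individuals, combo))
              for combo in product(*(strategy_sets[i] for i in individuals))]

a = {"i1": "hide", "i2": "hide", "i3": "early"}   # realized phenotype allocation
s_bar = {i: max(strategy_sets[i], key=F[i].get) for i in individuals}

print(is_solution_for("i1", a, choice_set, F))      # a is not optimal for i1
print(is_solution_for("i1", s_bar, choice_set, F))  # s_bar is optimal for i1
```

Because this choice set is a full product, it is closed under `substitute`, which mirrors the substitution condition; and since each individual's fitness depends only on its own strategy, a single `s_bar` can solve the program for every k at once.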
We now move on to two key concepts relating to the Price Equation. They
are carefully constructed to be weak enough to be usable despite our lack of a
model connecting phenotype to genotype, but strong enough to allow meaningful links to be made to optimization. Scope for selection almost means that
extant gene frequencies do change in expectation — in fact it says there is
a possible constructible p-score that would change in expectation, where we
allow ourselves to construct a ‘possible’ p-score by assigning an arbitrary real
number to each individual (in a measurable way, of course).
Definition 5 (Scope for selection) We say there is no scope for selection
whenever the expected change in any class average value is zero, i.e. whenever
\[
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx) = 0
\]
for all p ∈ L∞(I, I, µ̃I; R) (recall that π and π̂^ω depend on p). Thus there
is scope for selection when there are differences in fitness that could cause
an allele frequency to change. We emphasize that this definition discusses
behaviour for arbitrary essentially bounded functions p, not just p-scores.
This is not an analogue of the first part of the ESS definition of Maynard Smith
and Price (1973), nor of a first-order condition in simple optimization. Note
that the condition is that no mean p-score and so no gene frequency changes,
and that nothing is said about genotype frequencies. This is an inevitable consequence of our abstract setting, and has consequences for the interpretation
of the links to be proved later on.
The second condition considers a counter-factual case in which some of the
phenotypes in the population are replaced with a new phenotype, and asks
whether, supposing the individuals with altered phenotype each had one copy
of a new allele, that allele would spread in expectation. There is potential for
selection if there is a possible phenotype for which the answer is yes. Thus
scope for selection is about standing variation in phenotypes, and potential
for selection is about whether a new phenotype would spread if caused by a
rare dominant mutation.
Definition 6 (Potential for positive selection) Consider an alternative
set of strategies s ∈ S and a subset of individuals J ∈ I. We define a hypothetical rival set of admissible strategies ãs,J ∈ S by
\[
(\tilde{a}_{s,J})_i(r) = \begin{cases} a_i(r) & i \notin J, \\ s_i(r) & i \in J. \end{cases}
\]
Then ãs,J ∈ S represents individuals in J swapping to strategies given by
s, and indeed is itself admissible and lies in S precisely by our substitution
assumption on S.
We say there is no potential for positive selection in relation to S if for all
s ∈ S and all J ∈ I, we have
\[
E_I\big[\big(d_i^{-1}(1_J)_i - E[d^{-1} 1_J \mid \chi]_i\big) F_i((\tilde{a}_{s,J})_i)\big] \le 0.
\]
Note that the condition is satisfied trivially, with equality holding, if µ̃I (J) = 0.
We make the stipulation that the substitution of strategy on the set J affects
only the evaluation of the function F on points i ∈ J, despite the fact that
F depends by definition on W , which is in general a different measure under
this substitution.
The purpose of this definition is to enable us to discuss the possibility of a
mutant allele invading the population on some set of individuals J, and thereby
altering the strategy ai of these individuals to si . The defining inequality comes
of course from the relation (12). The function i ↦ d_i^{-1}(1_J)_i is the p-score
obtained by allocating weight 1 to the mutant allele and 0 to all others: it
therefore represents the presence of one copy of the mutant allele in precisely
those individuals i ∈ J. Thus there is potential for selection when a rare
mutant, producing with dominance some admissible phenotype, would initially
increase in density in the population. The restriction of rarity arises because
of the implicit assumption that no individual possesses more than one copy of
the mutant gene.
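The defining inequality of Definition 6 can be evaluated by hand on a small population. The sketch below is a simplified illustration, not the paper's construction: it assumes a finite population with uniform weights and constant ploidy one (so that the factor d^{-1} drops out), takes the fitness values after substitution as given numbers, and uses hypothetical function names.

```python
from fractions import Fraction


def class_mean(values, chi):
    """E[values | chi] under uniform weights on a finite population."""
    totals, counts = {}, {}
    for i, x in chi.items():
        totals[x] = totals.get(x, 0) + values[i]
        counts[x] = counts.get(x, 0) + 1
    return {i: Fraction(totals[x], counts[x]) for i, x in chi.items()}


def invasion_criterion(J, chi, fitness_after):
    """Mean over i of ((1_J)_i - E[1_J | chi]_i) * F_i after the substitution on J."""
    indicator = {i: (1 if i in J else 0) for i in chi}
    expected = class_mean(indicator, chi)
    total = sum((indicator[i] - expected[i]) * fitness_after[i] for i in chi)
    return total / len(chi)


# Assumed toy data: one class with two individuals, resident fitness 1.
chi = {0: "adult", 1: "adult"}

# A mutant in individual 0 that doubles fitness: the criterion is positive.
advantageous = invasion_criterion({0}, chi, {0: Fraction(2), 1: Fraction(1)})

# A mutant in individual 0 that halves fitness: the criterion is negative.
deleterious = invasion_criterion({0}, chi, {0: Fraction(1, 2), 1: Fraction(1)})

# A mutant carried by the whole class: the criterion is trivially zero.
whole_class = invasion_criterion({0, 1}, chi, {0: Fraction(2), 1: Fraction(2)})
```

Under these assumptions a positive value signals potential for positive selection. The zero value for a set that fills a whole class is the kind of triviality that the invadable sets of Definition 7 are designed to exclude.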
6 Links between gene dynamics and optimization
Having established the situation, we prove analogous versions of the four links to
be found in Grafen (2002). These comprise three implications for gene dynamics based on optimization assumptions, and one implication for optimization
based on assumptions about gene dynamics. As the instrument and constraint
set of the optimization are more or less copied over from the population genetic assumptions, the focus of interest is the maximand. In particular, it is
important to show that the maximand has many of the properties a biologist
would wish ‘fitness’ or ‘expected fitness’ to have. From the point of view of
its construction, it is an average over environmental uncertainty of a class-weighted sum of offspring numbers, which is biologically reasonable. From the
point of view of links to genetics, an unattainable dream would be to show
that exactly when each individual maximizes its fitness, the population genetic
system itself is in equilibrium. Our level of abstraction prevents us obtaining
such a definite result, but it is in any event untrue, for example in simple
cases such as over-dominance in a diploid population (Allison 1954). We
therefore seek to prove the strongest possible results in that direction, with the
twin aims of showing that there are close ties between fitness-maximization
and gene-frequency change, and that our definition of fitness is essentially
unique, though this latter point will not be pursued formally in the current
paper. The interpretation of the theorems will be discussed in the following
section.
Theorem 2 Suppose we have pairwise exchangeability, and suppose for some
set of individuals Ĩ ∈ I of full µ̃I-measure that a is a solution for i in relation
to S for every i ∈ Ĩ.
Then there is no scope for selection and no potential for positive selection
in relation to S.
Proof The trick is to exploit the assumption of pairwise exchangeability to
infer from the assumption of maximization that the expected fitness of the
realized phenotypes is equal to its class average. By Lemma 7 and the Doob-Dynkin Lemma (see Rao 2004, §3.1 Theorem 8), there is a measurable function
H : X × Q → [0, ∞] such that
Fi (q) = H(χi , q)
for all i ∈ I and q ∈ Si .
Fix x ∈ X and consider j, k ∈ χ−1({x}) ∩ Ĩ. Note that by pairwise exchangeability aj ∈ Sk and ak ∈ Sj, so it makes sense to evaluate Fk(aj) and
Fj (ak ). Then
Fj (aj ) = H(χj , aj ) = H(χk , aj ) = Fk (aj ) ≤ Fk (ak ),
since a is a solution for k in relation to S. Swapping the roles of j and k we get
the reverse inequality, thus Fk (ak ) = Fj (aj ). Hence for any Borel set B ⊆ R,
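\begin{align*}
\{ i \in \tilde{I} : F_i(a_i) \in B \}
&= \{ i \in \tilde{I} : (\chi_i, a_i) \in H^{-1}(B) \} \\
&= \{ i \in \tilde{I} : ( \{\chi_i\} \times \{ a_j : j \in \tilde{I} \cap \chi^{-1}(\{\chi_i\}) \} ) \cap H^{-1}(B) \ne \emptyset \} \\
&= \tilde{I} \cap \chi^{-1}\big( \{ x \in X : ( \{x\} \times \{ a_j : j \in \tilde{I} \cap \chi^{-1}(\{x\}) \} ) \cap H^{-1}(B) \ne \emptyset \} \big).
\end{align*}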
Since H is measurable, this final line is the restriction to I˜ of the χ-pre-image
of a measurable subset of X, by our assumption (1), which therefore lies in
σ(χ) by definition. Hence i ↦ Fi(ai) is measurable with respect to σ(χ) as a
map from Ĩ. By definition, i ↦ E[F(a)|χ]i is measurable with respect to σ(χ).
Furthermore, by definition of conditional expectation, and since I\Ĩ is µ̃I-null,
we see for any Y ∈ X that
\begin{align*}
\int_{\tilde{I} \cap \chi^{-1}(Y)} F_i(a_i) \, \tilde{\mu}_I(di)
&= \int_{\chi^{-1}(Y)} F_i(a_i) \, \tilde{\mu}_I(di) \\
&= \int_{\chi^{-1}(Y)} E[F(a) \mid \chi]_i \, \tilde{\mu}_I(di) \\
&= \int_{\tilde{I} \cap \chi^{-1}(Y)} E[F(a) \mid \chi]_i \, \tilde{\mu}_I(di).
\end{align*}
Since Y is arbitrary, this implies (see for example Halmos 1950, §25 Theorem E) that Fi(ai) = E[F(a)|χ]i for µ̃I-almost every i ∈ Ĩ, and hence for µ̃I-almost every i ∈ I. Thus the expression (13) is zero for any p ∈ L∞(I, I, µ̃I; R).
For the second assertion of the theorem, fix s ∈ S and J ∈ I. Note that
the first argument gives us in particular that
\[
E_I\big[\big(d_i^{-1}(1_J)_i - E[d^{-1} 1_J \mid \chi]_i\big) F_i(a_i)\big] = E_I\big[d_i^{-1}(1_J)_i \big(F_i(a_i) - E[F(a) \mid \chi]_i\big)\big] = 0,
\]
using properties of conditional expectation, and recalling that ploidy d is measurable with respect to σ(χ). So again using properties of conditional expectation, we see that, since Fi((ãs,J)i) = Fi(ai) for i ∉ J,
\begin{align}
& E_I\big[\big(d_i^{-1}(1_J)_i - E[d^{-1} 1_J \mid \chi]_i\big) F_i((\tilde{a}_{s,J})_i)\big] \notag \\
&\quad = E_I\big[d_i^{-1}\big((1_J)_i - E[1_J \mid \chi]_i\big)\big(F_i((\tilde{a}_{s,J})_i) - F_i(a_i)\big)\big] \tag{17} \\
&\quad = \Big(\int_I d_i \, \mu_I(di)\Big)^{-1} \int_I d_i^{-1}\big((1_J)_i - E[1_J \mid \chi]_i\big)\big(F_i((\tilde{a}_{s,J})_i) - F_i(a_i)\big) \, d_i \, \mu_I(di) \notag \\
&\quad = \Big(\int_I d_i \, \mu_I(di)\Big)^{-1} \int_J \big(1 - E[1_J \mid \chi]_i\big)\big(F_i((\tilde{a}_{s,J})_i) - F_i(a_i)\big) \, \mu_I(di) \notag \\
&\quad \le 0, \tag{18}
\end{align}
where the final inequality follows since E[1J|χ]i ≤ 1 µI-almost everywhere and
Fi((ãs,J)i) ≤ Fi(ai) for i ∈ Ĩ because a is a solution for i in S by assumption,
and µI(J\Ĩ) = µ̃I(J\Ĩ) = 0.
□
We must now consider on what subsets of individuals we can consider a hypothetical mutant allele invading, when discussing potential for positive selection.
Definition 7 (Invadable sets) Let
\[
\overline{\sigma(\chi)} = \{ J \in \mathcal{I} : J = K \triangle N \text{ for some } K \in \sigma(\chi) \text{ and some null set } N \in \mathcal{I} \},
\]
where K △ N = (K\N) ∪ (N\K) is the usual set-theoretic symmetric difference.
We shall say that J ∈ I of positive µ̃I-measure is invadable if there exists
$K \in \mathcal{I} \setminus \overline{\sigma(\chi)}$ of positive µ̃I-measure such that µ̃I(K\J) = 0. In particular any
set $J \in \mathcal{I} \setminus \overline{\sigma(\chi)}$ of positive µ̃I-measure is invadable. We shall further say that
our phenotype specification a : I × R → A is invadable if the set
\[
\{ i \in I : a \text{ is not a solution for } i \text{ in relation to } S \}
\]
is either µ̃I-null or contains an invadable subset.
Invadable sets and their existence are determined by the nature of the class
map χ, as the remarks below indicate. Class is intuitively intended to be a
way of dividing a large population of individuals into smaller groups: naively
one would expect each class to be populated by many individuals, in which
case invadable sets are in abundance. In the generality pursued here, where no
restriction is placed on the cardinality of either the population or the set of
classes, such informal remarks have little meaning, but the principle underlying
this discussion is that the condition of a set being invadable should not, under
a natural and useful class allocation, be a restrictive one.
The technical point of this definition is that an invadable set contains a
subset K for which the defining condition of potential for positive selection
is not trivially zero. Restricting attention to invadable sets ensures that the
presence or absence of potential for positive selection is indeed determined by
the evaluation of the maximand.
Remark 3 The crucial consequence of K not lying in $\overline{\sigma(\chi)}$ (that is, of K differing from every element of σ(χ) by more than a null set) is that then it is
not the case that E[1K|χ] = 1 µ̃I-almost everywhere on K.
Remark 4 Removing a µ̃I -null set from an invadable set leaves an invadable
set, and supersets of invadable sets are invadable.
Remark 5 We note the following special cases:
(5.1) J ∈ I is invadable if
\[
0 < \mu_I(J) < \inf \{ \mu_I(\chi^{-1}(\{x\})) : x \in X \text{ such that } \mu_I(\chi^{-1}(\{x\})) > 0 \}.
\]
(5.2) If χ−1 ({x}) is an atom of µI for all x ∈ X, then no set is invadable.
(5.3) If I is finite, and singletons have positive µI -measure, then invadable
sets exist if and only if χ is not injective.
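Case (5.3) and Remark 3 can be checked by brute force on a small example. The sketch below is an illustration under stated assumptions: the population is finite, every singleton has positive measure (so the only null set is the empty set and the completed σ-algebra consists of unions of whole classes), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp_indicator(K, chi, mu):
    """E[1_K | chi]_i: the share of the mass of i's class that lies in K."""
    class_mass, k_mass = {}, {}
    for i, x in chi.items():
        class_mass[x] = class_mass.get(x, 0) + mu[i]
        if i in K:
            k_mass[x] = k_mass.get(x, 0) + mu[i]
    return {i: Fraction(k_mass.get(chi[i], 0), class_mass[chi[i]]) for i in chi}


def is_nontrivial(K, chi, mu):
    """True if E[1_K | chi] < 1 somewhere on K, i.e. K is not a union of classes."""
    expected = cond_exp_indicator(K, chi, mu)
    return any(expected[i] < 1 for i in K)


def has_invadable_set(chi, mu):
    """Search all nonempty subsets of a small finite population."""
    individuals = list(chi)
    for size in range(1, len(individuals) + 1):
        for K in combinations(individuals, size):
            if is_nontrivial(set(K), chi, mu):
                return True
    return False


mu = {0: 1, 1: 1, 2: 1}                  # assumed integer weights
chi_shared = {0: "a", 1: "a", 2: "b"}    # individuals 0 and 1 share a class
chi_injective = {0: "a", 1: "b", 2: "c"}  # one individual per class

half = cond_exp_indicator({0}, chi_shared, mu)
```

With a shared class, the single individual 0 forms a set on which E[1_K|χ] equals 1/2 rather than 1, so invadable sets exist. With an injective class map every set is a union of classes and the search finds none.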
Theorem 3 Suppose that there exists a set of individuals Ĩ ∈ I of full µ̃I-measure such that Fi(ai) − E[F(a)|χ]i = 0 for each i ∈ Ĩ, but that the set
J = {i ∈ I : a is not a solution for i in relation to S}
is of positive µ̃I -measure.
Then there is no scope for selection. However, assuming pairwise exchangeability and that a is invadable, there is potential for positive selection in relation
to S.
Proof For the first assertion, we first observe that the corresponding assertion
of Theorem 2 in fact required only that Fi(ai) − E[F(a)|χ]i = 0 µ̃I-almost
everywhere. In that situation this was implied by optimality of a for µ̃I-almost every individual i, but here it is given explicitly by assumption. So the
same proof applies.
For the second assertion, by the assumption of invadability, we can choose
sets K, K̃ ∈ I of positive µ̃I -measure with K̃ ⊆ K such that (1K )i −E[1K |χ]i >
0 for each i ∈ K̃, and µ̃I (K\J) = 0.
We know that, since a ∈ S is not a solution in relation to S for each
i ∈ K ∩ J, we can choose s ∈ S and for each i ∈ K ∩ J a number εi > 0 such
that Fi(ai) + εi < Fi(si). Since i ↦ Fi(si) is measurable, the map i ↦ εi can
be assumed to be measurable. By the substitution assumption on S, ãs,K ∈ S.
Thus for i ∈ K ∩ J, we have by choice of s that
\[
F_i((\tilde{a}_{s,K})_i) > F_i(a_i) + \varepsilon_i. \tag{19}
\]
Since there is no scope for selection, we again have the identity (17), and so
we can argue, using (19) and the fact that µ̃I(K\J) = 0, as follows:
\begin{align*}
& E_I\big[\big(d_i^{-1}(1_K)_i - E[d^{-1} 1_K \mid \chi]_i\big) F_i((\tilde{a}_{s,K})_i)\big] \\
&\quad = \Big(\int_I d_i \, \mu_I(di)\Big)^{-1} \int_K \big((1_K)_i - E[1_K \mid \chi]_i\big)\big(F_i((\tilde{a}_{s,K})_i) - F_i(a_i)\big) \, \mu_I(di) \\
&\quad \ge \Big(\int_I d_i \, \mu_I(di)\Big)^{-1} \int_{K \cap J} \big((1_K)_i - E[1_K \mid \chi]_i\big) \varepsilon_i \, \mu_I(di) \\
&\quad \ge \Big(\int_I d_i \, \mu_I(di)\Big)^{-1} \int_{\tilde{K} \cap J} \big((1_K)_i - E[1_K \mid \chi]_i\big) \varepsilon_i \, \mu_I(di) \\
&\quad > 0,
\end{align*}
since the choice of K̃ implies that the integrand is strictly positive at each
point of K̃ ∩ J, which is a set of positive µI-measure.
□
Theorem 4 Suppose there exists a set of individuals J ∈ I of positive µ̃I-measure such that Fi(ai) − E[F(a)|χ]i ≠ 0 for each i ∈ J.
Then there is scope for selection.
Proof Define p : I → R by pi = Fi (ai ) − E[F (a)|χ]i . Then we do not know a
priori that p defines an essentially bounded function on I. However, since it is
only important for this proof that the function is non-zero on a set of positive
measure, we can truncate the function if necessary and assume without loss
of generality that p ∈ L∞ (I, I, µ̃I ; R). Then (13) implies, for this definition of
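p, that
\begin{align*}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= E_I\big[\big(F_i(a_i) - E[F(a) \mid \chi]_i\big)\big(F_i(a_i) - E[F(a) \mid \chi]_i\big)\big] \\
&\ge \int_J \big(F_i(a_i) - E[F(a) \mid \chi]_i\big)^2 \, \tilde{\mu}_I(di) \\
&> 0.
\end{align*}
□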
Theorem 5 Suppose we have pairwise exchangeability and that a is invadable, and suppose there is no scope for selection and no potential for positive
selection in relation to S.
Then there exists a set of individuals Ĩ ∈ I of full µ̃I-measure such that a
is a solution for i in relation to S for each i ∈ Ĩ.
Proof By the contrapositive to Theorem 4, since there is no scope for selection
we know that Fi (ai ) − E[F (a)|χ]i = 0 for µ̃I -almost every i ∈ I.
Given this, the contrapositive to the second assertion of Theorem 3 gives
the required result.
□
A glance at the proof of Theorem 4 prompts us to record the following important consequence of our work.
Theorem 6 (Fisher’s Fundamental Theorem of Natural Selection (Discrete Time)) Let p ∈ P be defined by
pi = agv(F (a) − E[F (a)|χ])i .
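Then
\[
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx) = \int_I \Big( \operatorname{agv}(F(a) - E[F(a) \mid \chi])_i - E_I[\operatorname{agv}(F(a) - E[F(a) \mid \chi])] \Big)^2 \, \tilde{\mu}_I(di),
\]
recalling that
\[
\pi(\chi_i) = E[\operatorname{agv}(F(a) - E[F(a) \mid \chi]) \mid \chi]_i
\]
and
\[
\hat{\pi}(x) = E^{\Omega}\Big[ \frac{d}{dW} E_I\big[\operatorname{agv}(F(a) - E[F(a) \mid \chi])_i \, \tilde{w}_i^{\omega}(a_i)\big] \Big]
\]
represent the (expected) mean values by class of agv(F(a) − E[F(a)|χ]) in the
parental and the offspring generations respectively.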
That is: the expected change in the mean of the additive genetic value of the
deviation of the expected fitness from the class mean, weighted by reproductive
value, is equal to the (unweighted) variance of the additive genetic value of the
deviation of the expected fitness from the class mean.
Proof We first recall the definition of p-scores, and observe that if we choose
a locus 1 ≤ L ≤ N, this corresponds to choosing the column (g_{k,L})_{k=1}^{n} of
the genotype matrix G = (g_{k,l}) ∈ R^{n×N}. The total number of alleles at this
locus in any individual must equal the ploidy of that individual, thus when we
consider the linear map ξL : R^{n×N} → R defined by summing the entries of the
L-th column of the matrix G = (g_{k,l}) ∈ R^{n×N}, i.e.
\[
\xi_L(G) = \sum_{k=1}^{n} g_{k,L},
\]
we see that for all i ∈ I,
ξL (gi /di ) = 1.
By definition, since ξL : Rn×N → R is linear, this—the constant function—is
a p-score. Hence we can apply (3) to see that
EI [agv(F (a) − E[F (a)|χ])i ] = EI [1 · (agv(F (a) − E[F (a)|χ])i )]
= EI [Fi (ai ) − E[F (a)|χ]i ]
= 0.
With the definition of p ∈ P made in the statement, we now apply the Price
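Equation (15) to obtain
\begin{align*}
\int_X (\hat{\pi}(x) - \pi(x)) \, \tau(dx)
&= \int_I \big( \operatorname{agv}(F(a) - E[F(a) \mid \chi])_i \big)^2 \, \tilde{\mu}_I(di) \\
&= \int_I \Big( \operatorname{agv}(F(a) - E[F(a) \mid \chi])_i - E_I[\operatorname{agv}(F(a) - E[F(a) \mid \chi])_i] \Big)^2 \, \tilde{\mu}_I(di),
\end{align*}
as required.
□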
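The statement that the change in mean equals the variance can be sanity-checked in a heavily simplified special case. The sketch below is not the general theorem: it assumes a single class, a finite haploid asexual population with uniform weights, deterministic offspring numbers whose mean is 1, offspring that inherit the parental value exactly, and an additive genetic value that is simply the identity. All names are hypothetical.

```python
from fractions import Fraction


def mean(values):
    """Unweighted mean over a finite population."""
    return sum(values) / len(values)


def change_in_mean(p, offspring):
    """Offspring-weighted mean of p minus the parental mean of p."""
    offspring_mean = sum(pi * wi for pi, wi in zip(p, offspring)) / sum(offspring)
    return offspring_mean - mean(p)


# Assumed toy population: three parents in one class, mean fitness 1.
fitness = [Fraction(1, 2), Fraction(1), Fraction(3, 2)]

# The score p is the deviation of fitness from its class mean.
p = [f - mean(fitness) for f in fitness]

# Unweighted variance of p across the parental population.
variance = mean([(pi - mean(p)) ** 2 for pi in p])

# Change in the mean of p when parent i leaves fitness[i] offspring.
delta = change_in_mean(p, fitness)
```

In this toy case both quantities come out equal, which is the Price Equation covariance of p with fitness reducing to the variance of fitness when p is the fitness deviation itself.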
7 Interpretation, significance, and context of our results
Theorems 2 to 5 are the four links between gene frequency change and optimization, parallel to the four links proved in previous work (Grafen 2002,
2006a). Theorem 2 states that if all individuals in the population solve the optimization program,
then the expected change in every gene frequency equals zero, and no phenotype, if caused by a rare dominant mutant, would cause that mutant to spread.
The strengths of this result are that it applies to all gene frequencies, whether
they affect any given trait or not, and whether they affect fitness or not; that
it therefore applies to all weighted sums of allele frequencies and so to the
additive genetic value of every quantitative trait; that fitness is a property of
the individual and in particular is the same whichever gene frequency is being
considered; that it applies to a class-structured population, with evolutionarily appropriate class-weights that are used both to aggregate the mean p-score
across classes when considering the change in mean p-score, and to provide
class-weights for offspring in the evaluation of the fitness of an individual; and
that the class-weights are the same no matter which class the parent belongs
to. Notable features of the result are that it shows that the expected change in
gene frequencies equals zero, but in any given state of nature the gene frequencies may indeed change; that it says expected gene frequencies don’t change,
but this does not imply that expected genotype frequencies do not change; that
once gene or genotype frequencies have changed from one generation to the
next, there is no guarantee, or even reason to expect, that the class-weights will
remain the same; hence the way of evaluating fitness is quite likely to change
from one period to the next. Theorem 3 is a diminished form of the previous
theorem: it supposes that each individual attains the same value of the
maximand as the mean of its class, but that not every individual solves the
optimization program. Then, while no gene frequency changes
in expectation, there is a phenotype which, if produced by a rare dominant
mutant, would spread in expectation. Theorem 4 supposes individuals attain
different values of the maximand, and merely asserts that the expected change
in each gene frequency equals its covariance with fitness. This is a restatement
of the Price Equation, and the purpose of presenting it in this form is that
the previous two theorems would hold if the maximand were replaced with a
monotonic increasing function of the maximand, and this theorem holds only
for the expected fitness itself (up to addition of a constant). So including this
theorem ensures that the links presented do constrain as strongly as possible
the nature of the maximand of an optimization program that can take part
in the set of links. The final link is Theorem 5, which reverses the direction
of inference, and is in one way the most important. It states that if there is
no expected change in gene frequency, and if there is no phenotype which,
if produced by a rare dominant mutant, would cause that mutant to spread,
then each individual solves the optimization program: it is the only theorem
to proceed from gene frequencies to optimization, and is the central result
that contributes to establishing the reasonableness of the adaptationist viewpoint, namely that we should expect organisms to be optimally adapted. Of
course, there are many questions, which cannot be considered here, about the
extent to which this result does justify adaptationism, but the very general
nature of the whole framework, and the particularities of how ‘adaptiveness’
and in particular fitness have to be defined, represent major advances in our
understanding of adaptationism and its connection to genetics.
Theorem 6 is a generalization of the Fundamental Theorem of Natural Selection of Fisher (1930), except that we remain in discrete time and do not
pass to a continuous time limit as Fisher did. The current status of the fundamental theorem in the literature is discussed by Okasha (2008) and Ewens
(2011), and the best previous technical treatment is due to Lessard (1997).
These modern authors have discussed what the theorem states, and whether
it is true, but the consensus has been that no-one knows whether it has biological significance. Its inclusion here is intended to settle that question in
the affirmative. The formal darwinism project is the first attempt to construct
a formal justification of individual fitness maximization ideas since the fundamental theorem, and this paper shows that a wide range of abstract and
perhaps arcane concepts have to be introduced to do the job properly. It turns
out that the fundamental theorem is readily expressed and proved in terms of
those concepts, in a much more explicit way than Fisher was able to prove it,
and in a more general way than more recent derivations.
Integrating our work with the important literature on the fundamental theorem (for example Price 1972b; Ewens 1989; Frank and Slatkin 1992; Edwards
1994; Lessard 1997; Okasha 2008; Ewens 2011) must remain a task for the future. Here we indicate three significant contributions of the current work to
understanding the theorem. First, Fisher has been criticized because his verbal
statement of the theorem (‘the rate of increase in fitness of any organism at
any time is equal to its genetic variance in fitness at that time’) differs from his
technical statement. However, a statement that the mean change in X equals
the variance in X is obviously closely tied to the idea of X being maximized.
Our version is of such a form, and it is not surprising that Fisher would wish the
most accessible form of his theorem to display what he clearly believed to be
its more important implication. Further, the exact sense in which the theorem
provides a maximand, in light of the technical qualifications of the theorem,
is precisely the subject of Theorems 2 to 5. Second, the reason for the discrepancy between verbal and technical statements is that so much in Fisher’s
argument has not been made explicit. We have made the whole argument explicit, and in view of the length and complexity of the argument, it is not
surprising that Fisher did not do so, perhaps from a combination of inability
(tracking classes and reproductive values and gene frequencies is notationally
cumbersome and hard work; further, most of our mathematical references are
from later than 1930) and unwillingness (the book was for a lay audience, and
it may be doubted how much of our current argument would have been appreciated even by contemporary scientific readers, of Fisher (1941), for example).
Third, some obstacles to understanding are now explained and removed. For
example, Price (1972b)’s first technical move is to assert that the left hand
side of the theorem is not in fact the change in fitness, but only that part of
the change in fitness due to changes in gene frequencies. The meaning of this
qualification has played a major role in discussions of the meaning and significance of the theorem (Okasha 2008; Ewens 2011): our version shows that this
qualification is made because, while fitness itself does not admit a result of the
form ‘change in mean of X equals variance in X’, the additive genetic value of
fitness does. This settles in a very precise technical way what the qualification
means, and provides a wholly understandable reason for Fisher to make it.
We regard the mathematical framework of this paper as a set of ideas
about which Fisher had strong and correct intuitions, but which he managed
to articulate only in small part, and about which he drew a most significant
conclusion: that population genetics could exhibit the design-making nature
of Darwinian natural selection, demonstrate that it was constantly at work in
a very general setting, and make precise what quantity was the appropriate
measure of goodness of design. Fisher rightly regarded the fundamental theorem of natural selection as providing the fundamental link between Darwin’s
argument that natural selection brought about adaptation, on the one hand,
and population genetics, on the other.
On the dust-jacket of the 1999 variorum edition of Fisher’s book, W.D.
Hamilton writes “In some ways some of us have overtaken Fisher; in many,
however, this brilliant, daring man is still far in front”. In showing exactly
how the fundamental theorem relates to fitness-maximization, and that the
full argument is even today at the boundaries of mathematical biology, we
have taken a significant step towards “catching up with Fisher”.
8 An example with an age-structured population
We present an example to show our various theorems at work, and choose
the classic case of an age-structured population for which Fisher first defined
‘reproductive value’ and proved his fundamental theorem. The results of the
current paper would have allowed us to have a sexual population and to study
sex ratio simultaneously with survival-fertility tradeoff, thus uniting Fisher’s
original uses of reproductive value. However, for simplicity, and for the historical interest of exhibiting implications of the original form of the fundamental
theorem, we have reserved that more advanced example for the future. In particular, we note that versions of the theorem that omit age structure, and so
have no need to involve reproductive value, miss out a fundamental feature of
the fundamental theorem.
We suppose I is a finite population, and our classes are K + 1 age classes
for some K ∈ N, comprising ages 0 to K. We assume the population to be
of constant ploidy one, and to be asexual. We shall consider the set of local
environments to be the set of possible amounts of some resource available to
each individual without competition, so that R = [0, ∞). Being in environment r is thus interpreted as having r resources available. These resources are to be
invested entirely in reproduction and/or survival. We shall assume each individual i ∈ I requires bi resources to produce each offspring, where bi ∈ (0, ∞).
We shall identify the phenotype of an individual with the choice the individual makes of how best to spend her resources, which, since we demand that
resources are exhausted between reproduction and survival, is the choice of
how to distribute the resources between offspring production and attempted
survival. Chance events shall represent how many of the produced offspring of
an individual will survive to the next census point, when they will be of age 0,
and whether an individual herself will survive to the next age class. Thus our
phenotype space A is given by ([0, ∞) × [0, ∞)), the first coordinate representing energy devoted to offspring, the second how much energy the individual
invests in survival.
We thus have a function r : I × Ω → [0, ∞) determining how much resource individual i ∈ I finds available in the state of nature ω ∈ Ω. The
set of admissible phenotypes for an individual i is then the set of functions
q : [0, ∞) → [0, ∞) × [0, ∞) which must satisfy, writing q = (q1 , q2 ), the relations
\begin{align}
b_i q_1(r) + q_2(r) &= r, \quad \text{and} \tag{20} \\
q_2(r) &= 0 \text{ for all } r \in [0, \infty) \text{ if } \chi_i = K. \tag{21}
\end{align}
The second condition states that no individual attempts to survive beyond
age K.
The chance events of the state of nature ω influencing the individual i are
then represented by u : I × Ω → (N ∪ {0})^{[0,∞)} × {0, 1}^{[0,∞)}, the first term
telling us, for each x ∈ [0, ∞), how many surviving offspring are produced
given the investment of x resources in offspring production, and the second
term telling us, given the individual devotes r ∈ [0, ∞) resources to survival,
whether the parent individual survives (1) or dies (0). It is reasonable to
assume that the function (u_i^ω)_2 : [0, ∞) → {0, 1} is monotone increasing: the
more effort an individual puts into survival, the more likely she is to survive;
and that (u_i^ω)_2(0) = 0: individuals do not survive if they do not try to, so in
particular no individual survives beyond age K.
Offspring are then produced as follows:
\[
\tilde{w}_i^{\omega}(a_i) = (u_i^{\omega})_1((a_i)_1(r_i^{\omega})) \, \delta_0 + (u_i^{\omega})_2((a_i)_2(r_i^{\omega})) \, \delta_{\chi_i + 1}. \tag{22}
\]
Define α_i^ω := (u_i^ω)_1((a_i)_1(r_i^ω)), the number of surviving offspring, and β_i^ω := (u_i^ω)_2((a_i)_2(r_i^ω)), which determines survival of the individual.
The offspring distribution over ages may then be captured by considering
the conditional expectations α^ω(k) := E[α_i^ω | χi = k] and β^ω(k) := E[β_i^ω | χi = k].
Supposing the parental age distribution to be given by the vector v =
(v0, ..., vK)^T, the offspring distribution, in state of nature ω, is given by the
vector w = (w0, ..., wK)^T where
\begin{align}
w_0 &= \sum_{k=0}^{K} \alpha^{\omega}(k) \, v_k, \quad \text{and} \tag{23} \\
w_k &= \beta^{\omega}(k - 1) \, v_{k-1} \quad \text{for } k \ge 1. \tag{24}
\end{align}
We assume, as in the general argument, that these coefficients are independent
of ω. In particular, then, β ω (k) is independent of ω for each k ≥ 0, and
thus may be written as β(k). Furthermore, by linearity, the situation can be
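captured by considering the coefficients α(k) := EΩ[α^ω(k)] and writing
\[
L = \begin{pmatrix}
\alpha(0) & \alpha(1) & \cdots & \cdots & \alpha(K) \\
\beta(0) & 0 & \cdots & \cdots & 0 \\
0 & \beta(1) & \cdots & \cdots & 0 \\
\vdots & \vdots & \ddots & & \vdots \\
0 & \cdots & 0 & \beta(K-1) & 0
\end{pmatrix} \tag{25}
\]
and noting that
\[
Lv = w. \tag{26}
\]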
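The structure of (25) and the projection (26) can be illustrated numerically. The following is a minimal sketch with assumed toy values for α(k) and β(k) and hypothetical function names; it uses only the Python standard library and exact rational arithmetic.

```python
from fractions import Fraction


def leslie_matrix(alpha, beta):
    """Build the (K+1) x (K+1) matrix of (25).

    alpha: fertilities alpha(0), ..., alpha(K) (first row).
    beta:  survival probabilities beta(0), ..., beta(K-1) (sub-diagonal).
    """
    n = len(alpha)
    if len(beta) != n - 1:
        raise ValueError("need exactly one fewer survival rate than fertilities")
    L = [[0] * n for _ in range(n)]
    L[0] = list(alpha)
    for k, survival in enumerate(beta):
        L[k + 1][k] = survival
    return L


def project(L, v):
    """Return w = L v, the expected age distribution one census later."""
    return [sum(entry * vk for entry, vk in zip(row, v)) for row in L]


# Assumed toy parameters: three age classes (K = 2).
alpha = [0, 1, 2]
beta = [Fraction(1, 2), Fraction(1, 2)]
L = leslie_matrix(alpha, beta)

v = [10, 4, 2]      # parental age distribution
w = project(L, v)   # offspring age distribution
```

Here the newborn class receives 0·10 + 1·4 + 2·2 = 8 individuals, and each older class receives the survivors of the class below it.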
L is then the so-called Leslie matrix associated with demographic processes
(Leslie 1945, 1948; Lewis 1942). The left eigenvector of L (for its dominant eigenvalue) then gives us the per-capita reproductive value in the sense of Fisher, an observation consistent
with the more general assertions in Grafen (2006b) that per-capita reproductive value is found as an eigenvector of the adjoint of the forward process transition operator. We therefore denote such a vector τ = (τ0, ..., τK)^T. Our definition of the fitness of an individual playing strategy q : [0, ∞) → [0, ∞) × [0, ∞)
then amounts to
\[
F_i(q) = E^{\Omega}\big[ (u_i^{\omega})_1(q_1(r_i^{\omega})) \, \tau_0 + (u_i^{\omega})_2(q_2(r_i^{\omega})) \, \tau_{\chi_i + 1} \big],
\]
in particular
\[
F_i(a_i) = E^{\Omega}\big[ \alpha_i^{\omega} \tau_0 + \beta_i^{\omega} \tau_{\chi_i + 1} \big].
\]
We define αi := EΩ[α_i^ω] and βi := EΩ[β_i^ω]. Thus
\[
F_i(a_i) = \alpha_i \tau_0 + \beta_i \tau_{\chi_i + 1}.
\]
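To see how reproductive values enter this formula, the following sketch computes a left eigenvector of a small Leslie matrix by power iteration and then evaluates αi τ0 + βi τχi+1. It is an illustration under assumptions: the parameter values are invented, chosen so that the dominant eigenvalue is 1; power iteration stands in for whatever eigenvector method one prefers; the function names are hypothetical; and the survival term is taken to be zero at the last age, in line with condition (21).

```python
def left_eigenvector(L, iterations=200):
    """Power iteration on row vectors: x <- x L, rescaled so that x[0] == 1."""
    n = len(L)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(x[i] * L[i][j] for i in range(n)) for j in range(n)]
        x = [yj / y[0] for yj in y]
    return x


def fitness(alpha_i, beta_i, age, tau):
    """alpha_i * tau_0 + beta_i * tau_{age+1}, with no survival term at the last age."""
    survival_value = tau[age + 1] if age + 1 < len(tau) else 0.0
    return alpha_i * tau[0] + beta_i * survival_value


# Assumed toy Leslie matrix with three age classes and growth rate 1.
L = [
    [0.0, 1.0, 2.0],   # fertilities alpha(0), alpha(1), alpha(2)
    [0.5, 0.0, 0.0],   # survival beta(0)
    [0.0, 0.5, 0.0],   # survival beta(1)
]

tau = left_eigenvector(L)  # reproductive values, normalized so tau[0] == 1

# Fitness of individuals whose alpha_i and beta_i equal their class means.
class_fitness = [
    fitness(0.0, 0.5, 0, tau),
    fitness(1.0, 0.5, 1, tau),
    fitness(2.0, 0.0, 2, tau),
]
```

For these parameters the iteration converges to τ = (1, 2, 2). Because the growth rate is 1, an individual with class-average fertility and survival has fitness equal to the reproductive value of its own age class, which is the eigenvector equation read row by row.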
The situation is then analyzed by considering when this number is at a maximum. Individuals of age K cannot choose to survive, thus there is no choice of
phenotype available to them; so we concentrate on individuals in age classes
k for 0 ≤ k < K. Fix such an individual i ∈ I. Then
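\begin{align*}
F_i^{\omega}(q) &= (u_i^{\omega})_1(q_1(r_i^{\omega})) \, \tau_0 + (u_i^{\omega})_2(q_2(r_i^{\omega})) \, \tau_{\chi_i + 1} \\
&= (u_i^{\omega})_1(q_1(r_i^{\omega})) \, \tau_0 + (u_i^{\omega})_2(r_i^{\omega} - b_i q_1(r_i^{\omega})) \, \tau_{\chi_i + 1}.
\end{align*}
For an admissible strategy q, define A_i^ω(q) := (u_i^ω)_1(q_1(r_i^ω)) and B_i^ω(q) := (u_i^ω)_2(q_2(r_i^ω)). Taking expectations over states of nature we see that
\begin{align*}
F_i(q) &= E^{\Omega}\big[ (u_i^{\omega})_1(q_1(r_i^{\omega})) \, \tau_0 + (u_i^{\omega})_2(r_i^{\omega} - b_i q_1(r_i^{\omega})) \, \tau_{\chi_i + 1} \big] \\
&= \int_{\{\omega \in \Omega : B_i^{\omega}(q) = 0\}} A_i^{\omega}(q) \, \tau_0 \, \nu(d\omega) + \int_{\{\omega \in \Omega : B_i^{\omega}(q) = 1\}} \big( A_i^{\omega}(q) \, \tau_0 + \tau_{\chi_i + 1} \big) \, \nu(d\omega) \\
&= \tau_0 \int_{\Omega} A_i^{\omega}(q) \, \nu(d\omega) + \tau_{\chi_i + 1} \, \nu(\{\omega \in \Omega : B_i^{\omega}(q) = 1\}).
\end{align*}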
The first summand is the expected number of offspring weighted by their
reproductive value; the second is the probability of survival, weighted by reproductive value of an individual in the next age class.
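As a purely illustrative sketch, the following Python fragment computes reproductive values for a small Leslie matrix of the form (25) and then evaluates the two summands of fitness. The numerical values of α(k) and β(k), the function names, the normalization τ_0 = 1, and the use of power iteration to find a left eigenvector for the dominant eigenvalue are all assumptions made here for illustration; none of them is taken from the general argument.

```python
# Hypothetical age-structured example with K = 2 (age classes 0, 1, 2).
ALPHA = [0.0, 1.5, 1.0]   # assumed expected offspring numbers alpha(k)
BETA = [0.5, 0.8]         # assumed survival probabilities beta(k), k < K


def leslie_matrix(alpha, beta):
    """Build L as in (25): fecundities in row 0, survival on the subdiagonal."""
    n = len(alpha)
    L = [[0.0] * n for _ in range(n)]
    L[0] = list(alpha)
    for k, b in enumerate(beta):
        L[k + 1][k] = b
    return L


def reproductive_values(L, iterations=500):
    """Power iteration for a left eigenvector tau (tau^T L = lam * tau^T).

    Assumes L is primitive so that the iteration converges; tau is
    normalised so that tau[0] == 1.
    """
    n = len(L)
    tau = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        # Row vector times matrix: new[k] = sum_j tau[j] * L[j][k].
        new = [sum(tau[j] * L[j][k] for j in range(n)) for k in range(n)]
        lam = new[0]                  # valid because tau[0] == 1 before the step
        tau = [x / lam for x in new]
    return lam, tau


def fitness(alpha_i, beta_i, age, tau):
    """F_i(a_i) = alpha_i * tau_0 + beta_i * tau_{age + 1}, for age < K."""
    offspring_term = alpha_i * tau[0]       # offspring weighted by tau_0
    survival_term = beta_i * tau[age + 1]   # survival weighted by next class
    return offspring_term + survival_term


L = leslie_matrix(ALPHA, BETA)
lam, tau = reproductive_values(L)
for age in range(len(BETA)):
    print(age, fitness(ALPHA[age], BETA[age], age, tau))
```

Here each individual is simply assigned the class-level values α(k) and β(k); in the setting of the text, α_i and β_i would instead come from the individual's own strategy.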
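Let a strategy q be fixed, and consider another strategy $\tilde{q} = (\tilde{q}_1, \tilde{q}_2)$. Recall
that the choice to be made is the value of $q_1(r_i^{\omega}) \in [0, b_i^{-1} r_i^{\omega}]$. Suppose $\tilde{q}_1 < q_1$, i.e. playing the strategy $\tilde{q}$ means having fewer offspring. Since $(u_i^{\omega})_2$ is
monotone increasing, and $B_i^{\omega}(q) = (u_i^{\omega})_2(r_i^{\omega} - b_i q_1(r_i^{\omega}))$, we then have that
$\nu(\{B_i^{\omega}(q) = 1\}) \le \nu(\{B_i^{\omega}(\tilde{q}) = 1\})$. So $F_i(\tilde{q}) \ge F_i(q)$ if and only if
\begin{align*}
\tau_0 \int_{\Omega} A_i^{\omega}(\tilde{q})\,\nu(d\omega) + \nu(\{B_i^{\omega}(\tilde{q}) = 1\})\,\tau_{\chi_i+1}
&\ge \tau_0 \int_{\Omega} A_i^{\omega}(q)\,\nu(d\omega) \\
&\quad + \nu(\{B_i^{\omega}(q) = 1\})\,\tau_{\chi_i+1},
\end{align*}
equivalently
\[
\bigl(\nu(\{B_i^{\omega}(\tilde{q}) = 1\}) - \nu(\{B_i^{\omega}(q) = 1\})\bigr)\,\tau_{\chi_i+1} \ge \tau_0 \int_{\Omega} \bigl(A_i^{\omega}(q) - A_i^{\omega}(\tilde{q})\bigr)\,\nu(d\omega).
\]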
That is, for q̃ to be at least as good a strategy, the increased chance of survival, weighted
by the reproductive value of a surviving individual, must be at least the change in the
expected number of offspring, weighted by the reproductive value of offspring.
Of course, if having more offspring leads to a lower expected number of surviving offspring (e.g. if limited resources are then shared among too many
infants), then this is trivial.
Assuming pairwise exchangeability, this then rigorously justifies our intuition for this example. Theorems 2 and 3 together assert that the population
is at an evolutionary equilibrium if and only if every individual maximizes
the value Fi (ai ). This number is the reproductive value-weighted sum of the
expected number of offspring and the survival probability of the individual.
The above analysis of the maximand then shows that evolutionary equilibrium
is attained precisely when each individual reacts to each environment in such
a way that the expected contribution to the following generation, in terms of
reproductive value, is at a maximum.
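The comparison of two strategies derived above can be checked numerically on a toy example. The sketch below assumes a finite set of states of nature, an identity function standing in for $(u_i^{\omega})_1$ and a threshold rule standing in for the 0/1-valued, monotone increasing $(u_i^{\omega})_2$. All numerical values and names are hypothetical and chosen only for illustration.

```python
# Hypothetical finite state space: (probability nu({omega}), resources r_i^omega).
STATES = [(0.2, 1.0), (0.5, 2.0), (0.3, 4.0)]
B_COST = 1.0        # b_i: assumed resource cost per offspring
TAU_0 = 1.0         # assumed reproductive value of a newborn
TAU_NEXT = 1.8      # assumed reproductive value in the next age class
THRESHOLD = 1.5     # assumed resources needed to survive


def offspring(x):
    """Stand-in for (u_i^omega)_1: offspring number equals the investment."""
    return x


def survives(x):
    """Stand-in for (u_i^omega)_2: 0/1 valued and monotone increasing."""
    return 1 if x >= THRESHOLD else 0


def summands(q1):
    """Return (expected offspring, survival probability) for strategy q1."""
    expected_offspring = sum(nu * offspring(q1(r)) for nu, r in STATES)
    survival_prob = sum(nu * survives(r - B_COST * q1(r)) for nu, r in STATES)
    return expected_offspring, survival_prob


def fitness(q1):
    """F_i(q) = tau_0 * E[A] + tau_next * nu({B = 1})."""
    a, s = summands(q1)
    return TAU_0 * a + TAU_NEXT * s


def q1(r):
    return 0.75 * r / B_COST      # three quarters of resources go to offspring


def q1_tilde(r):
    return 0.5 * r / B_COST       # fewer offspring in every state


a, s = summands(q1)
a_tilde, s_tilde = summands(q1_tilde)
lhs = (s_tilde - s) * TAU_NEXT    # gain in survival, weighted by tau_next
rhs = TAU_0 * (a - a_tilde)       # loss in offspring, weighted by tau_0
print(fitness(q1_tilde) >= fitness(q1), lhs >= rhs)
```

In this toy instance the strategy with fewer offspring raises the survival probability by 0.3 but lowers the expected number of offspring by 0.6, so with the assumed reproductive values it is not an improvement; the two printed comparisons agree, as the equivalence requires.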
This is a simple but fitting example with which to end. Fisher proved his
fundamental theorem in this setting, and concluded that in general we should
expect individuals to maximize their fitness. Here we explicitly articulate that
optimization argument in an example, and note that the fundamental theorem shows how survival and reproduction trade off in the calculation of fitness,
through the use of reproductive value. Notice that the individual is regarded
as making only the decision relevant to its current age as opposed to all life-decisions simultaneously, and that the fecundity-survival trade-off operates
through varying the chance of survival to the next age period. The value of
surviving is obtained through knowing the reproductive value of an individual
in the next age class. Thus reproductive value in an age-structured population
is the expected future reproductive value of an individual of a given age. If we
had extended the set of classes to include condition, incorporating body weight
and health, at each age, then we could have modelled more complex trade-offs in which producing more offspring reduced one’s condition, and in which
reproductive value would presumably be an increasing function of condition
within each age class.
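The recursive reading of reproductive value in the preceding paragraph can be written out for the matrix L in (25). As a sketch, assume that τ is a left eigenvector of L with eigenvalue λ > 0; the symbol λ and the componentwise expansion are introduced here for illustration and are not taken from the general argument.

```latex
\begin{align*}
\tau^{T} L = \lambda\,\tau^{T}
\quad\Longleftrightarrow\quad
\lambda\,\tau_k &= \alpha(k)\,\tau_0 + \beta(k)\,\tau_{k+1}, \qquad 0 \le k < K, \\
\lambda\,\tau_K &= \alpha(K)\,\tau_0 .
\end{align*}
```

Under this assumption, the right-hand side for k < K has the same form as $F_i(a_i) = \alpha_i\tau_0 + \beta_i\tau_{\chi_i+1}$: up to the factor λ, the reproductive value of an age class is offspring production weighted by τ_0 plus survival weighted by the reproductive value of the next age class.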
Our general results extend Fisher’s theorem by permitting an arbitrary
class-structure, explicitly incorporating uncertainty, allowing each class to have
its own ploidy and, in particular, by fully articulating the meaning of fitness-maximization. Future generalizations may further permit social behaviour,
time to be continuous or discrete, and random variation in class-to-class projection at the population level along with the demographic stochasticity that
implies.
Acknowledgements The authors wish to thank Alain Goriely and two anonymous referees
for numerous helpful comments on earlier versions of this manuscript, and the participants of
the reading seminar on Fisher (1999) held in the St John’s College Research Centre in 2012,
in particular organizer Jean-Baptiste Grodwohl, for the opportunity to explore Fisher’s work
and the topics central to this paper.
References
Allison AC (1954) Notes on sickle-cell polymorphism. Annals of Human Genetics 19:39–57
Billingsley P (1995) Probability and Measure, 3rd edn. Wiley Series in Probability and
Mathematical Statistics, John Wiley & Sons Inc., New York
Darwin CR (1859) The Origin of Species. John Murray, London
Davies NB, Krebs JR, West SA (2012) An Introduction to Behavioural Ecology. Wiley-Blackwell, London
Diestel J, Uhl JJ Jr (1977) Vector Measures. American Mathematical Society, Providence,
R.I.
Dunford N, Schwartz JT (1958) Linear Operators. I. General Theory. Interscience Publishers,
Inc., New York
Edwards AWF (1994) The fundamental theorem of natural selection. Biological Reviews
69:443–474
Ewens WJ (1979) Mathematical Population Genetics. Springer, Berlin, Heidelberg, New
York
Ewens WJ (1989) An interpretation and proof of the fundamental theorem of natural selection. Theoretical Population Biology 36:167–180
Ewens WJ (2004) Mathematical Population Genetics I. Theoretical Introduction. Springer,
Berlin, Heidelberg, New York
Ewens WJ (2011) What is the gene trying to do? British Journal of the Philosophy of Science
62:155–176
Falconer DS (1981) Introduction to Quantitative Genetics (2nd edn). Longman, London
Fisher RA (1930) The Genetical Theory of Natural Selection. Oxford University Press, see
Fisher (1999) for a version in print.
Fisher RA (1941) Average excess and average effect of a gene substitution. Annals of Eugenics 11:53–63
Fisher RA (1999) The Genetical Theory of Natural Selection. Oxford University Press, a
Variorum edition of the 1930 and 1958 editions, edited by J.H. Bennett
Frank SA (2011a) Natural selection. I. Variable environments and uncertain returns on
investment. Journal of Evolutionary Biology 24(11):2299–2309
Frank SA (2011b) Natural selection. II. Developmental variability and evolutionary rate.
Journal of Evolutionary Biology 24(11):2310–2320
Frank SA (2012a) Natural selection. III. Selection versus transmission and the levels of
selection. Journal of Evolutionary Biology 25(2):227–243
Frank SA (2012b) Natural selection. IV. The Price equation. Journal of Evolutionary Biology
25(6):1002–1019
Frank SA (2012c) Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. Journal of Evolutionary Biology 25(12):2377–2396
Frank SA (2013a) Natural selection. VI. Partitioning the information in fitness and characters by path analysis. Journal of Evolutionary Biology 26(3):457–471
Frank SA (2013b) Natural selection. VII. History and interpretation of kin selection theory.
Journal of Evolutionary Biology 26(6):1151–1184
Frank SA, Slatkin M (1990) Evolution in a variable environment. American Naturalist
136:244–260
Frank SA, Slatkin M (1992) Fisher’s Fundamental Theorem of Natural Selection. Trends in
Ecology and Evolution 7:92–95
Grafen A (2000) Developments of Price’s Equation and natural selection under uncertainty.
Proceedings of the Royal Society, Series B 267:1223–1227
Grafen A (2002) A first formal link between the Price equation and an optimisation program.
Journal of Theoretical Biology 217:75–91
Grafen A (2006a) Optimisation of inclusive fitness. Journal of Theoretical Biology 238:541–563
Grafen A (2006b) A theory of Fisher’s reproductive value. Journal of Mathematical Biology
53:15–60
Halmos PR (1950) Measure Theory. D. Van Nostrand Company, Inc., New York
Hamilton WD (1964) The genetical evolution of social behaviour. Journal of Theoretical
Biology 7:1–52
Kechris AS (1995) Classical Descriptive Set Theory, Graduate Texts in Mathematics, vol
156. Springer-Verlag, New York
Kingman JFC, Taylor SJ (1966) Introduction to Measure and Probability. Cambridge University Press, London
Leslie PH (1945) On the use of matrices in certain population mathematics. Biometrika
33(3):183–212
Leslie PH (1948) Some further notes on the use of matrices in population mathematics.
Biometrika 35(3/4):213–245
Lessard S (1997) Fisher’s fundamental theorem of natural selection revisited. Theoretical
Population Biology 52:119–136
Lewis EG (1942) On the generation and growth of a population. Sankhyā: The Indian
Journal of Statistics (1933-1960) 6(1):93–96
Lewontin RC (1974) The Genetic Basis of Evolutionary Change. Columbia University Press,
New York
Maynard Smith J (1982) Evolution and the Theory of Games. Cambridge University Press
Maynard Smith J, Price GR (1973) The logic of animal conflict. Nature 246:15–18
Okasha S (2008) Fisher’s fundamental theorem of natural selection — a philosophical analysis. British Journal of the Philosophy of Science 59:319–351
Price GR (1970) Selection and covariance. Nature 227:520–521
Price GR (1972a) Extension of covariance selection mathematics. Annals of Human Genetics
35:485–490
Price GR (1972b) Fisher’s ‘fundamental theorem’ made clear. Annals of Human Genetics
36:129–140
Rao MM (2004) Measure Theory and Integration, Monographs and Textbooks in Pure and
Applied Mathematics, vol 265, 2nd edn. Marcel Dekker Inc., New York
Rosenblatt M (1971) Markov Processes. Structure and Asymptotic Behavior. Springer-Verlag, New York
Rudin W (1966) Real and Complex Analysis. McGraw-Hill Book Co., New York
Taylor PD (1990) Allele-frequency change in a class-structured population. American Naturalist 135:95–106
Taylor PD (1996) Inclusive fitness arguments in genetic models of behaviour. Journal of
Mathematical Biology 34:654–674
Wagner DH (1977) Survey of measurable selection theorems. SIAM Journal on Control and
Optimization 15(5):859–903