Selection and Population Genetics

Evolution by natural selection can occur when three conditions are satisfied:
Variation within populations - individuals have different traits (phenotypes).
height and weight are approximately normally distributed in human populations
coat color varies from light to dark in deer mice (P. maniculatus)
Selection - traits influence fecundity and survivorship (fitness).
larger body size may be beneficial in cold environments
differences between coat color and ground color may affect deer mouse
vulnerability to avian predators
Heritability - offspring are similar to their parents.
human height is affected by hundreds of genetic variants
coat color differences are affected by variation in the expression of the
Agouti gene
Jay Taylor (ASU)
Selection
23 Feb 2017
1 / 33
Example: Beak size in the Medium Ground Finch (Geospiza fortis)
Restricted to the Galapagos Islands.
Forages mainly on seeds.
Large seeds are handled more efficiently by birds with larger bills.
Large seeds predominate following drought years (e.g., 1977).
Population genetics focuses on understanding evolution at the molecular level: how
does natural selection affect the dynamics of gene frequencies? There are two
fundamental issues.
1. To what extent do population-genetic processes influence adaptation?
How strongly does demography (population size and structure) affect
adaptation?
Does adaptation rely mainly on standing variation or on new mutations?
Have genetic systems evolved to facilitate adaptation?
2. How much impact does selection have on genetic variation?
What proportion of a genome is directly under selection?
What proportion is affected by selection at linked sites?
How robust to deviations from neutrality are methods that use genetic variation
to infer demography?
Selection in an Infinite Haploid Population
We begin by formulating a simple model of selection in a haploid population containing
two alleles, A1 and A2, where
1. Individuals with genotype $A_i$ survive to adulthood with probability $v_i$ and then give
birth to $r_i$ offspring.
2. Genotypes are transmitted without mutation.
3. Genetic drift can be ignored.
4. We let $n_i$ denote the number of $A_i$ alleles in the current generation and $p$
denote the frequency of A1.
Notice that differences in $v_i$ give rise to viability selection, while differences in $r_i$
result in fecundity selection. Their product, denoted $w_i = v_i r_i$, can be
considered the fitness of allele $A_i$.
Because we are ignoring genetic drift, the number of $A_i$ individuals alive at the
beginning of the next generation is
$$n_i' = n_i v_i r_i = n_i w_i.$$
Writing $n = n_1 + n_2$ for the total number of individuals, this shows that the frequency of A1 in the next generation is
$$p' = \frac{n_1'}{n_1' + n_2'} = \frac{n_1 w_1}{n_1 w_1 + n_2 w_2} = \frac{(n_1/n)\,w_1}{(n_1/n)\,w_1 + (n_2/n)\,w_2} = \frac{p\,w_1}{p\,w_1 + (1-p)\,w_2} = \frac{w_1}{\bar{w}}\,p,$$
where $\bar{w} = p\,w_1 + (1-p)\,w_2$ is the mean fitness of the population.
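The recursion is simple enough to check numerically. The sketch below computes one generation of haploid selection twice: once in the frequency form $p' = p\,w_1/\bar{w}$ and once in the count form $n_i' = n_i w_i$. The two results should agree. The function names are illustrative and do not come from the lecture.

```python
def next_frequency(p: float, w1: float, w2: float) -> float:
    """One generation of selection in an infinite haploid population.

    p is the frequency of A1; w1 and w2 are the allele fitnesses w_i = v_i * r_i.
    """
    w_bar = p * w1 + (1.0 - p) * w2  # mean fitness of the population
    return p * w1 / w_bar            # p' = (w1 / w_bar) * p


def next_frequency_from_counts(n1: float, n2: float, w1: float, w2: float) -> float:
    """The same update written in terms of allele counts, n_i' = n_i * w_i."""
    n1_next = n1 * w1
    n2_next = n2 * w2
    return n1_next / (n1_next + n2_next)


if __name__ == "__main__":
    # A1 starts at 10% and has a 5% fitness advantage over A2.
    print(next_frequency(0.1, 1.05, 1.0))
    print(next_frequency_from_counts(100, 900, 1.05, 1.0))
```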
This is one of the few models with selection that can be solved explicitly. Let $p(t)$
denote the frequency of A1 in generation $t$ and set $p(0) = p$. Each generation multiplies the odds $p/(1-p)$ by $w_1/w_2$, and from this one can show that
Selection in an infinite haploid model
$$p(t) = \left[1 + \frac{1-p}{p}\left(\frac{w_2}{w_1}\right)^{t}\right]^{-1}.$$
In particular, this shows that the fitter of the two alleles will spread towards fixation,
e.g., if $w_1 > w_2$, then $p(t)$ approaches 1 as time progresses. On the other hand, this
model makes two predictions that are false in real, finite populations:
The beneficial allele is never actually fixed in the population.
Any beneficial allele, no matter how rare initially, will spread.
To remedy these shortcomings, we need to formulate a model that incorporates both
selection and genetic drift.
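The following is a quick numerical check under the same assumptions (an infinite population and no mutation). Iterating the one-generation recursion should reproduce the closed form for $p(t)$. A favored allele approaches frequency 1 but never reaches it. The function names are hypothetical.

```python
def p_closed_form(p0: float, w1: float, w2: float, t: int) -> float:
    """Explicit solution p(t) = [1 + ((1 - p0)/p0) * (w2/w1)**t]**(-1)."""
    return 1.0 / (1.0 + ((1.0 - p0) / p0) * (w2 / w1) ** t)


def p_iterated(p0: float, w1: float, w2: float, t: int) -> float:
    """Apply the one-generation recursion p' = p * w1 / w_bar, t times."""
    p = p0
    for _ in range(t):
        p = p * w1 / (p * w1 + (1.0 - p) * w2)
    return p


if __name__ == "__main__":
    p0, w1, w2 = 0.001, 1.01, 1.0  # a rare allele with a 1% advantage
    for t in (0, 500, 1000, 2000):
        print(t, p_closed_form(p0, w1, w2, t), p_iterated(p0, w1, w2, t))
```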
Selection in Finite Populations
We can incorporate selection into the haploid Wright-Fisher model by relaxing the
assumption that parents are chosen uniformly at random.
1. The alleles A1 and A2 have relative fitnesses $1 + s$ and 1, respectively, where the
selection coefficient $s = s(p)$ may depend on the frequency of A1.
2. The parents of the $N$ individuals alive in generation $t + 1$ are chosen at random
and with replacement, but each A1-type individual is $(1 + s)$ times more likely to
be chosen than an A2-type individual.
3. Mutation occurs at birth, but after selection, at rates $v$ (A1 → A2) and $u$
(A2 → A1).
In this model, the expected frequency of A1 is changed by both mutation and selection:
$$E[\Delta p_t] = u(1-p) - vp + (1-u-v)\,\frac{s(p)\,p(1-p)}{1 + s(p)\,p}.$$
[Figure: simulated trajectories of p against generation (0 to 10,000) in three panels: Neutrality (N = 1000, µ = 0.0002); Directional Selection (s = 0.01); Balancing Selection (s = 0.01(0.5 − p)).]
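Below is a minimal simulation sketch of this model. It applies selection and mutation as described and then draws the next generation by binomial sampling of $N$ offspring, which is the standard Wright-Fisher sampling step. The helper names are illustrative. The three scenarios mirror the figure's panels, but a single run will not reproduce its exact trajectories. The helper `selection_mutation_step` also makes it easy to check the formula for $E[\Delta p_t]$.

```python
import numpy as np


def selection_mutation_step(p: float, s: float, u: float, v: float) -> float:
    """Deterministic part of one generation: selection, then mutation.

    A1 has relative fitness 1 + s; u is the A2 -> A1 rate and v the A1 -> A2 rate.
    """
    p_sel = (1.0 + s) * p / (1.0 + s * p)         # parents chosen with weights 1 + s and 1
    return p_sel * (1.0 - v) + (1.0 - p_sel) * u  # mutation occurs after selection


def expected_change(p: float, s: float, u: float, v: float) -> float:
    """E[Delta p] as given by the formula above."""
    return u * (1 - p) - v * p + (1 - u - v) * s * p * (1 - p) / (1 + s * p)


def wright_fisher(N, p0, s_of_p, u, v, generations, rng):
    """Simulate a haploid Wright-Fisher trajectory of the frequency of A1."""
    count = int(round(p0 * N))
    traj = np.empty(generations + 1)
    traj[0] = count / N
    for t in range(1, generations + 1):
        p = count / N
        p_next = selection_mutation_step(p, s_of_p(p), u, v)
        count = rng.binomial(N, p_next)  # binomial sampling of N offspring = drift
        traj[t] = count / N
    return traj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, mu = 1000, 0.0002
    scenarios = {
        "neutral": lambda p: 0.0,
        "directional": lambda p: 0.01,
        "balancing": lambda p: 0.01 * (0.5 - p),
    }
    for name, s_of_p in scenarios.items():
        traj = wright_fisher(N, 0.5, s_of_p, mu, mu, 10_000, rng)
        print(name, traj[-1])
```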
Fixation Probabilities of Selected Alleles
If we neglect mutation, then eventually one of the two alleles will be fixed in the
population. In this case, we would like to know how the fixation probability of an allele
depends on its fitness.
Fixation probability of a selected allele in a haploid population
If $s(p) = s$ is constant, the probability that A1 is fixed given that its initial frequency is
$p$ is approximately
$$P(\text{A1 is fixed}) = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}}.$$
Remark: This result is obtained with the help of the diffusion approximation and is
exact in that limit. For models other than the Wright-Fisher model, we need to replace
$N$ by $N_e$.
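The formula can be evaluated directly. The sketch below uses `math.expm1` for accuracy. For $s < 0$ it uses an algebraically equivalent rearrangement, which avoids overflow when $2N|s|$ is large. The function name is hypothetical.

```python
import math


def fixation_probability(p: float, s: float, N: float) -> float:
    """Diffusion approximation to P(A1 is fixed) in a haploid Wright-Fisher population.

    p: initial frequency of A1; s: constant selection coefficient of A1;
    N: population size (use N_e for models other than Wright-Fisher).
    """
    if s == 0.0:
        return p  # neutral limit: fixation probability equals the initial frequency
    a = 2.0 * N * s
    if a > 0:
        # (1 - e^{-a p}) / (1 - e^{-a}); expm1 keeps precision when a * p is small
        return math.expm1(-a * p) / math.expm1(-a)
    b = -a
    # For s < 0: (e^{b p} - 1) / (e^{b} - 1) = e^{b (p - 1)} (1 - e^{-b p}) / (1 - e^{-b}),
    # which never exponentiates a large positive number.
    return math.exp(b * (p - 1.0)) * math.expm1(-b * p) / math.expm1(-b)


if __name__ == "__main__":
    for s in (-0.01, 0.0, 0.01):
        print(s, fixation_probability(0.1, s, 100))
```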
The most important case is when a single copy of a new allele is introduced into a
population, either by mutation or immigration. In this case, the initial frequency is
$p = 1/N$ and we can use the preceding result to show that
Fixation probability of a new allele
$$P_{\text{fix}} = \frac{1 - e^{-2s}}{1 - e^{-2Ns}} \approx \begin{cases} 2s & \text{if } 1/N \ll s \ll 1, \\ 1/N & \text{if } -1/N \ll s \ll 1/N, \\ 2|s|\,e^{-2N|s|} & \text{if } -1 \ll s \ll -1/N. \end{cases}$$
In particular,
Novel beneficial mutations are likely to be lost from a population: a new allele with $s = 0.01$ is fixed only about 2% of the time.
Selection is dominated by genetic drift when $|s| < 1/N$.
Deleterious mutations can be fixed, but only if $N|s|$ is not too large.
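The sketch below compares the three approximations with the exact expression for a few parameter values. The cut-off $N|s| < 0.1$, which selects the nearly neutral regime, is an arbitrary illustrative choice and is not part of the result. Near $N|s| \approx 1$, none of the approximations is accurate.

```python
import math


def p_fix_new_allele(s: float, N: int) -> float:
    """Exact diffusion formula (1 - e^{-2s}) / (1 - e^{-2Ns}) for an allele at frequency 1/N."""
    if s == 0.0:
        return 1.0 / N
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-2 * N * s)
    b = 2 * N * abs(s)
    # rearranged so that e^{2N|s|} is never formed (avoids overflow for strong selection)
    return math.exp(2 * abs(s) - b) * math.expm1(-2 * abs(s)) / math.expm1(-b)


def p_fix_approx(s: float, N: int) -> float:
    """Leading-order approximations; the 0.1 threshold is an illustrative assumption."""
    if N * abs(s) < 0.1:   # |s| << 1/N: effectively neutral
        return 1.0 / N
    if s > 0:              # 1/N << s << 1
        return 2 * s
    return 2 * abs(s) * math.exp(-2 * N * abs(s))  # -1 << s << -1/N


if __name__ == "__main__":
    for s, N in [(0.01, 1000), (1e-5, 1000), (-0.005, 1000)]:
        print(f"s = {s}, N = {N}: exact {p_fix_new_allele(s, N):.3e}, "
              f"approx {p_fix_approx(s, N):.3e}")
```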
Selection and genetic drift
As a general rule, genetic drift reduces the efficiency of natural selection, especially in
small populations. Not only are beneficial mutations likely to be lost, but deleterious
mutations can become fixed in small populations.
[Figure: Fixation Probabilities of New Mutants. Fixation probability (log scale, 10^-10 to 1) plotted against N (10 to 10,000) for s = 0.01, 0.001, 0, −0.001 and −0.002.]
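The diffusion result can be checked against direct simulation. The following sketch is an illustration only; it is not the method used to produce the figure. It runs many independent haploid Wright-Fisher replicates, each starting from a single copy of A1, and records the fraction that fix. In a small population, a deleterious allele fixes at an appreciable rate, as the figure shows.

```python
import math

import numpy as np


def diffusion_p_fix(s: float, N: int) -> float:
    """Diffusion approximation (1 - e^{-2s}) / (1 - e^{-2Ns}) for a single new copy."""
    if s == 0.0:
        return 1.0 / N
    return math.expm1(-2 * s) / math.expm1(-2 * N * s)


def simulate_fixation(N: int, s: float, reps: int, rng: np.random.Generator) -> float:
    """Fraction of haploid Wright-Fisher replicates in which a single A1 copy fixes."""
    counts = np.ones(reps, dtype=np.int64)  # each replicate starts with one copy of A1
    active = np.ones(reps, dtype=bool)      # replicates not yet absorbed at 0 or N
    while active.any():
        p = counts[active] / N
        p_sel = (1 + s) * p / (1 + s * p)   # selection, then binomial sampling (drift)
        counts[active] = rng.binomial(N, p_sel)
        active = (counts > 0) & (counts < N)
    return float(np.mean(counts == N))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    N, s = 20, -0.05
    print("simulated:", simulate_fixation(N, s, 20_000, rng))
    print("diffusion:", diffusion_p_fix(s, N))
    # the same deleterious mutation in a larger population is almost never fixed
    print("diffusion, N = 200:", diffusion_p_fix(s, 200))
```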
Substitution Rates
The substitution rate is the rate at which new mutations are fixed in a population.
Substitution rates depend on population size, mutation and selection.
Divergence between populations or species occurs when different mutations are
fixed in these populations.
Thus, substitution rates can sometimes be estimated from divergence.
Even when selection cannot be observed directly, it can sometimes be inferred
from the effect that it has on divergence.
Substitution rates are difficult to calculate exactly. However, we can find a good
approximation if we assume that the mutation rate is low enough that each new
mutation is likely to be lost or fixed before another mutation enters the population.
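Under this assumption, we can approximate the substitution rate (per generation) by
the expression
$$\rho \approx N\mu \cdot u\!\left(\frac{1}{N}\right),$$
where $N\mu$ is the expected number of new mutations per generation, while $u(1/N)$
is the probability that any one of these is fixed in the population. Here $u(\cdot)$ denotes the fixation probability as a function of the initial frequency, not the mutation rate $u$. This approximation is accurate when $N\mu \ll 1$.
Neutral substitutions: For neutral mutations, we have
$$\rho = N\mu \cdot \frac{1}{N} = \mu,$$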
which shows that the neutral substitution rate is equal to the mutation rate and
is independent of population size.
Beneficial substitutions: If $1/N \ll s \ll 1$, then
$$\rho \approx 2N\mu s,$$
and so the beneficial substitution rate is greater than the mutation rate and
increases with population size.
Deleterious substitutions: If $s \ll -1/N$, then
$$\rho \approx 2N\mu|s|\,e^{-2N|s|},$$
and so the deleterious substitution rate is less than the mutation rate and
decreases with population size.
Moral: The substitution rate at a locus under selection is usually different from the
mutation rate and does depend on population size.
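The approximation $\rho \approx N\mu \cdot u(1/N)$ translates directly into code. The sketch below plugs in the exact diffusion fixation probability of a new allele, so the beneficial and deleterious limits can be compared with the full expression. The names are illustrative.

```python
import math


def p_fix_new_allele(s: float, N: int) -> float:
    """Diffusion fixation probability of a single new copy, (1 - e^{-2s}) / (1 - e^{-2Ns})."""
    if s == 0.0:
        return 1.0 / N
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-2 * N * s)
    b = 2 * N * abs(s)
    return math.exp(2 * abs(s) - b) * math.expm1(-2 * abs(s)) / math.expm1(-b)


def substitution_rate(mu: float, s: float, N: int) -> float:
    """rho ~ N * mu * u(1/N): new mutations per generation times their fixation probability."""
    return N * mu * p_fix_new_allele(s, N)


if __name__ == "__main__":
    mu = 1e-8
    for s in (0.001, 0.0, -0.001):
        print(s, [substitution_rate(mu, s, N) for N in (100, 1000, 10_000)])
```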
Protein-coding sequences appear to be under stronger selective constraints in
larger populations.
Selection-Mutation-Drift Balance
When selection and mutation are both incorporated into the model, neither allele will be
permanently fixed, so we instead investigate the stationary distribution of the allele
frequencies. Using diffusion theory, we can show that
Stationary distribution with mutation and selection
Provided that $N$ is sufficiently large, the stationary distribution can be approximated by
the following density:
$$\pi(p) = \frac{1}{C}\,p^{2Nu-1}(1-p)^{2Nv-1}\exp\left(2N\int_0^p s(q)\,dq\right), \quad 0 \le p \le 1,$$
where $C$ is a normalizing constant, $u$ is the mutation rate from A2 to A1 and $v$ is the rate from A1 to A2.
Depending on the sign of $s(q)$, the exponential term will either increase or decrease very
rapidly as $N$ increases. This too reflects the competing influences of drift and selection.
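Because $C$ rarely has a closed form, the density is usually normalized numerically. The sketch below does this on a grid of midpoints, which avoids the endpoints where the density may diverge, for any vectorized $s(q)$. The grid size and the balancing-selection example $s(q) = 0.01(0.5 - q)$ are illustrative choices.

```python
import numpy as np


def stationary_density(N, u, v, s_of_q, n_grid=20_000):
    """Normalised stationary density pi(p) on a midpoint grid (diffusion approximation).

    u: mutation rate A2 -> A1; v: mutation rate A1 -> A2; s_of_q: vectorised s(q).
    """
    h = 1.0 / n_grid
    p = (np.arange(n_grid) + 0.5) * h  # midpoints avoid p = 0 and p = 1
    s_vals = s_of_q(p)
    # S(p) = integral_0^p s(q) dq, approximated by a cumulative midpoint rule
    S = np.cumsum(s_vals) * h - 0.5 * s_vals * h
    log_pi = (2 * N * u - 1) * np.log(p) + (2 * N * v - 1) * np.log1p(-p) + 2 * N * S
    pi = np.exp(log_pi - log_pi.max())  # subtract the maximum to avoid overflow
    pi /= pi.sum() * h                  # dividing by the integral plays the role of C
    return p, pi


def expectation(p, pi, f):
    """Approximate E[f(p)] under the gridded density."""
    h = p[1] - p[0]
    return float(np.sum(f(p) * pi) * h)


if __name__ == "__main__":
    N, u, v = 1000, 0.0005, 0.0005
    het = lambda x: x * (1 - x)
    p, pi_neutral = stationary_density(N, u, v, lambda q: np.zeros_like(q))
    _, pi_bal = stationary_density(N, u, v, lambda q: 0.01 * (0.5 - q))
    print("neutral   E[p(1-p)] =", expectation(p, pi_neutral, het))
    print("balancing E[p(1-p)] =", expectation(p, pi_bal, het))
```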
Stationary distribution with mutation and purifying selection
$$\pi(p) = \frac{1}{C}\,p^{2Nu-1}(1-p)^{2Nv-1}\,e^{2Nsp}, \quad 0 \le p \le 1.$$
Purifying selection has two consequences:
It shifts the stationary distribution in the direction of the favored allele.
It tends to reduce the amount of variation present at the selected locus.
[Figure: stationary densities π(p) plotted against p, in two panels: 2Ns = 2 and 2Ns = 1.]
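Both consequences can be seen numerically. The sketch below normalizes the density on a grid and reports two quantities for several values of $2Ns$: the mean of $p$ and the mean of $p(1-p)$, which is half the expected heterozygosity. The choice $2Nu = 2Nv = 1$ is an assumption made for illustration.

```python
import numpy as np


def purifying_stationary_summary(theta_u, theta_v, two_N_s, n_grid=100_000):
    """Return (E[p], E[p(1-p)]) under pi(p) ∝ p^(theta_u-1) (1-p)^(theta_v-1) e^(2Nsp).

    theta_u = 2Nu (A2 -> A1), theta_v = 2Nv (A1 -> A2); two_N_s = 2Ns > 0 favours A1.
    """
    h = 1.0 / n_grid
    p = (np.arange(n_grid) + 0.5) * h  # midpoint grid on (0, 1)
    log_pi = (theta_u - 1) * np.log(p) + (theta_v - 1) * np.log1p(-p) + two_N_s * p
    w = np.exp(log_pi - log_pi.max())
    w /= w.sum()  # discrete normalisation on the grid (replaces 1/C)
    return float(np.sum(w * p)), float(np.sum(w * p * (1 - p)))


if __name__ == "__main__":
    for two_N_s in (0.0, 1.0, 2.0, 10.0):
        mean_p, mean_het = purifying_stationary_summary(1.0, 1.0, two_N_s)
        print(f"2Ns = {two_N_s}: E[p] = {mean_p:.3f}, E[p(1-p)] = {mean_het:.3f}")
```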
Purifying selection on human protein-coding genes
Variation is greatly reduced in
exons relative to introns in the
human genome.
Variation is slightly reduced in the
5’ and 3’ UTRs.
Human-chimp divergence is
proportional to human
polymorphism.
This suggests that most deleterious
variants in coding regions are
strongly selected against.
Source: Durbin et al. (2010): 1000 Genomes Project
Selection in Finite Populations: Some Implications
Adaptations that are under weak selection are less likely to evolve in populations
with small Ne . This could explain why codon bias is often more pronounced in
bacteria and single-celled eukaryotes than in vertebrates (Bulmer 1991).
Muller’s ratchet: The repeated fixation of deleterious alleles on non-recombining
chromosomes can lead to their eventual degeneration (Muller 1964).
Mutational meltdown: The fixation of deleterious alleles in very small populations
may lead to further population size declines that eventually result in extinction.
Lynch & Conery (2003) propose that many features of eukaryotic genomes (e.g.,
expansion of repetitive DNA, introns) have evolved through the accumulation of
very weakly deleterious mutations in small populations.
Purifying selection can lead to overestimates of recent divergence times
Estimates of divergence times are often affected both by fixed differences and polymorphism.
Because deleterious mutations are more likely to contribute to polymorphism than to substitutions, this can lead to overestimates of divergence times.
This bias will be greatest for recently diverged taxa.
[Figure, panels (a) to (c): rate of change (changes/site/Ma). Ho et al. (2005)]
Selection on Diploid Loci
Selection at diploid loci is complicated by two additional factors.
To predict the effects of selection at a diploid locus, we need to know the fitness
of each genotype. For example, with two alleles, A1 and A2, we need
genotype | relative fitness
A1A1 | $w_{11} = 1 + s_{11}$
A1A2 | $w_{12} = 1 + s_{12}$
A2A2 | $w_{22} = 1 + s_{22}$
In sexually reproducing taxa, the genotype frequencies are also affected by meiosis
and mating. In particular, because of segregation, there is no guarantee that the
fittest allele will spread to fixation even in an infinite population.
Marginal Fitness
In many cases, the effect of selection at a diploid locus can be predicted by determining
the marginal fitnesses of the alleles, i.e., the average fitness of an allele weighted by the
frequency with which it occurs in each diploid genotype before selection acts:
$$w_1 = P(A_1|A_1)\,w_{11} + P(A_2|A_1)\,w_{12}$$
$$w_2 = P(A_1|A_2)\,w_{12} + P(A_2|A_2)\,w_{22}$$
Here $P(A_j|A_i)$ is the conditional probability that an $A_i$ allele is contained in an $A_iA_j$
genotype:
$$P(A_j|A_i) = \frac{\mathrm{freq}(A_iA_j)}{\mathrm{freq}(A_i)},$$
where $A_iA_j$ is treated as an ordered genotype.
Under random mating, we have $P(A_j|A_i) = p_j$, where $p_1 = p$ and $p_2 = 1 - p$, and so the marginal fitnesses are:
$$w_1 = p\,w_{11} + (1-p)\,w_{12}$$
$$w_2 = p\,w_{12} + (1-p)\,w_{22}$$
Applying the haploid recursion $p' = p\,w_1/\bar{w}$ with these marginal fitnesses, which are now functions of $p$, gives
$$\Delta p = p' - p = p(1-p)\,\frac{w_1 - w_2}{\bar{w}},$$
where $\bar{w} = p\,w_1 + (1-p)\,w_2$ is the population mean fitness and
$$w_1 - w_2 = p(w_{11} - w_{12}) + (1-p)(w_{12} - w_{22}) = w_{12} - w_{22} + p(w_{11} - 2w_{12} + w_{22}).$$
Notice that this last expression depends on $p$ except when $w_{11} - 2w_{12} + w_{22} = 0$. In
other words, selection is frequency-dependent unless the fitness of the heterozygote is
equal to the average fitness of the two homozygous genotypes (additive selection).
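Here is a small sketch of the diploid recursion under random mating, using hypothetical function names. It computes the marginal fitnesses and $\Delta p$. It can also confirm that $w_1 - w_2$ does not depend on $p$ when selection is additive.

```python
def marginal_fitnesses(p, w11, w12, w22):
    """Marginal fitnesses of A1 and A2 under random mating."""
    w1 = p * w11 + (1 - p) * w12
    w2 = p * w12 + (1 - p) * w22
    return w1, w2


def delta_p(p, w11, w12, w22):
    """Change in the frequency of A1 over one generation of selection."""
    w1, w2 = marginal_fitnesses(p, w11, w12, w22)
    w_bar = p * w1 + (1 - p) * w2  # equals p^2 w11 + 2p(1-p) w12 + (1-p)^2 w22
    return p * (1 - p) * (w1 - w2) / w_bar


if __name__ == "__main__":
    # additive (w12 halfway between the homozygotes) versus A1 fully recessive
    for w in [(1.0, 0.95, 0.9), (1.0, 0.9, 0.9)]:
        print(w, [round(delta_p(p, *w), 5) for p in (0.1, 0.5, 0.9)])
```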
Selection at a diploid locus segregating two alleles can have very different consequences
depending on the relative fitnesses of the genotypes.
fitnesses | consequence
$w_{11} > w_{12} > w_{22}$ | directional selection in favor of A1
$w_{11} < w_{12} < w_{22}$ | directional selection in favor of A2
$w_{12} > \max\{w_{11}, w_{22}\}$ | overdominance: selection maintains both alleles
$w_{12} < \min\{w_{11}, w_{22}\}$ | underdominance: selection can favor either homozygote
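Iterating the recursion shows the outcomes listed in the table. In this sketch the fitness values are arbitrary examples. The function `interior_equilibrium` (a hypothetical helper) solves $w_1 = w_2$, giving $p^* = (w_{12} - w_{22})/(2w_{12} - w_{11} - w_{22})$. This equilibrium is stable under overdominance and unstable under underdominance.

```python
def iterate(p, w11, w12, w22, generations=2000):
    """Iterate p' = p * w1 / w_bar with random-mating marginal fitnesses."""
    for _ in range(generations):
        w1 = p * w11 + (1 - p) * w12
        w2 = p * w12 + (1 - p) * w22
        p = p * w1 / (p * w1 + (1 - p) * w2)
    return p


def interior_equilibrium(w11, w12, w22):
    """Frequency at which w1 = w2 (meaningful for over- or underdominance)."""
    return (w12 - w22) / (2 * w12 - w11 - w22)


if __name__ == "__main__":
    print("directional:", iterate(0.01, 1.0, 0.95, 0.9))
    print("overdominance:", iterate(0.05, 0.9, 1.0, 0.8), iterate(0.95, 0.9, 1.0, 0.8),
          "p* =", interior_equilibrium(0.9, 1.0, 0.8))
    print("underdominance:", iterate(0.45, 1.0, 0.9, 1.0), iterate(0.55, 1.0, 0.9, 1.0),
          "p* =", interior_equilibrium(1.0, 0.9, 1.0))
```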
Dominance and Directional Selection
Suppose that A1 is deleterious compared to A2 and that the selection coefficients have
the form
$$s_{11} = -s, \quad s_{12} = -hs, \quad s_{22} = 0,$$
where $s > 0$ and $h \in [0, 1]$.
The constant h is called the dominance coefficient because it quantifies the
contribution of the A1 allele to the fitness of the heterozygote. A1 is said to be
dominant if h ∈ (1/2, 1]
recessive if h ∈ [0, 1/2)
additive if h = 1/2.
In this case, the stationary distribution of the diffusion process has density
$$\pi(p) = \frac{1}{C}\,p^{4N_e u-1}(1-p)^{4N_e v-1}\,e^{-2N_e s p + (1-2h)\,2N_e s p(1-p)},$$
where, as before, $u$ is the mutation rate from A2 to A1 and $v$ is the rate from A1 to A2.
Because the exponent is a decreasing function of $h$, recessive deleterious alleles tend
to be more common than dominant deleterious alleles.
[Figure: Equilibrium Frequency of Deleterious Alleles. Frequency p (0 to 0.025) plotted against h (dominance coefficient) for 2Ns = −10, −20 and −100.]
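The effect of dominance can be explored by integrating this density numerically. The sketch below computes the mean frequency of the deleterious allele for several values of $h$. The scaled parameters $4N_e u = 4N_e v = 0.5$ and $2N_e s = 20$ are illustrative choices, not the values used for the figure.

```python
import numpy as np


def mean_deleterious_frequency(theta_u, theta_v, S, h, n_grid=200_000):
    """Mean of p under pi(p) ∝ p^(theta_u-1) (1-p)^(theta_v-1) exp(-S p + (1-2h) S p(1-p)).

    theta_u = 4 Ne u (mutation towards A1), theta_v = 4 Ne v, S = 2 Ne s > 0,
    h = dominance coefficient of the deleterious allele A1.
    """
    dx = 1.0 / n_grid
    p = (np.arange(n_grid) + 0.5) * dx  # midpoint grid avoids the endpoint singularities
    log_pi = ((theta_u - 1) * np.log(p) + (theta_v - 1) * np.log1p(-p)
              - S * p + (1 - 2 * h) * S * p * (1 - p))
    w = np.exp(log_pi - log_pi.max())
    return float(np.sum(w * p) / np.sum(w))


if __name__ == "__main__":
    for h in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(h, mean_deleterious_frequency(0.5, 0.5, 20.0, h))
```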
Selection and Genealogies
The simplicity of the neutral coalescent stems from the fact that the genealogy of
a random sample depends only on demographic events involving lineages ancestral
to the sample.
With selection, this is no longer true, since the reproductive
contribution of the ancestral lineages also depends on lineages that are not
ancestral to the sample.
In this case, we must keep track of additional information that allows us to
account for the effects of selection on the genealogy.
This can be done in two ways: with the ancestral selection graph (Krone &
Neuhauser 1997) or with a genetically structured coalescent (Hudson et al.
1988).
The Ancestral Selection Graph
The main insight underlying the ancestral selection graph (ASG) is that selection leads
to additional deaths that can be modeled backwards in time by branching events. If
there are two alleles, A1 and A2 , the graph can be simulated as follows:
Coalescent events occur at rate $\binom{n}{2}$ when there are $n$ lineages.
Branching events occur at rate $n\sigma$, where $\sigma = Ns > 0$ is the scaled selection coefficient
of A2. Each branching event produces two lineages: an incoming branch and a continuing branch.
This process is simulated until there is only one lineage (the ultimate ancestor,
UA).
The type of the UA is determined and then mutation is simulated along the graph.
The genealogy is extracted from the graph by starting at the UA and working
forward. When branch events are encountered, the incoming branch replaces the
continuing branch whenever its type is the fitter allele.
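The lineage-counting part of this recipe is easy to simulate. The sketch below tracks only the number of lineages, not the graph's topology, types or mutations. With $n$ lineages, it waits an exponential time with rate $\binom{n}{2} + n\sigma$ and then coalesces or branches in proportion to those two rates. It stops at the ultimate ancestor. Every branching must be undone by an extra coalescence, which is why a large $\sigma$ makes the ASG expensive.

```python
import random


def simulate_asg_counts(n: int, sigma: float, rng: random.Random):
    """Simulate the number of lineages in the ASG until the ultimate ancestor (UA).

    Coalescences occur at rate n(n-1)/2 and branchings at rate n*sigma.
    Returns (time to UA, number of coalescences, number of branchings).
    """
    t = 0.0
    n_coal = n_branch = 0
    while n > 1:
        coal_rate = n * (n - 1) / 2
        branch_rate = n * sigma
        total = coal_rate + branch_rate
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() < coal_rate / total:
            n -= 1                   # two lineages merge
            n_coal += 1
        else:
            n += 1                   # incoming + continuing branch
            n_branch += 1
    return t, n_coal, n_branch


if __name__ == "__main__":
    rng = random.Random(42)
    for sigma in (0.0, 1.0, 5.0):
        runs = [simulate_asg_counts(3, sigma, rng) for _ in range(2000)]
        mean_branch = sum(b for _, _, b in runs) / len(runs)
        print(f"sigma = {sigma}: mean number of branching events = {mean_branch:.2f}")
```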
Two simulations of the ASG for a sample of 3 chromosomes (Krone & Neuhauser 1997,
Fig. 5).
[Figure: FIG. 5. The ancestral selection graph without mutation events when the ultimate ancestor is (a) of type 1 and (b) of type 2. (Thick lines represent the true genealogy.)]
Remark: When σ is large, the ASG contains many events and is expensive to simulate.
Coalescents Structured by Genetic Background
Key ideas:
Think of the population as being subdivided into different genetic backgrounds
defined by the allele carried at the selected locus.
Chromosomes that carry the same allele are neutrally equivalent.
Consequently, coalescence within genetic backgrounds can be described by a
modification of Kingman’s coalescent.
To be concrete, suppose that we are interested in the genealogy at a neutral marker
locus that is linked to a locus segregating two alleles, A1 and A2, under selection. Our
goal is to describe the ancestral process
$$G_t = (n_1(t), n_2(t), p(t)),$$
where $n_i(t)$ is the number of ancestral lineages in the $A_i$ background and $p(t)$ is the
ancestral frequency of A1 at time $t$ in the past.
Changes to the ancestral process can occur through the following events:
Two A1 lineages can coalesce.
Two A2 lineages can coalesce.
Each lineage can migrate between backgrounds, through:
mutation at the selected locus;
recombination between the selected and marker loci.
The allele frequencies at the selected locus change as we go backwards in time.
[Figure: a sample genealogy traced from the present back to its common ancestor, with lineages in backgrounds 1 and 2 undergoing coalescence (coal), mutation (mut) and recombination (rec) events, shown against the frequency p of A1.]
When time is measured in units of 2Ne generations, these events will occur at the
following rates:
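Transition | Rate
two A1 lineages coalesce | $\binom{n_1}{2}\frac{1}{p}$
two A2 lineages coalesce | $\binom{n_2}{2}\frac{1}{q}$
a lineage mutates from A1 to A2 | $n_1\mu_1(q/p)$
a lineage mutates from A2 to A1 | $n_2\mu_2(p/q)$
a lineage recombines from A1 to A2 | $n_1 r q$
a lineage recombines from A2 to A1 | $n_2 r p$
Here $q = 1 - p$, and the transitions describe moves of ancestral lineages backwards in time.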
Furthermore, the ancestral allele frequencies at the selected locus will be changed by
genetic drift, mutation and selection.
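The table can be encoded as a rate function, and the next event can then be drawn Gillespie-style. For simplicity, this sketch assumes that the ancestral frequency $p$ is held fixed while the next event is drawn. In the full process, $p(t)$ itself changes through drift, mutation and selection. The parameters `mu1`, `mu2` and `r` are the $\mu_1$, $\mu_2$ and $r$ of the table, and the function names are hypothetical.

```python
import math
import random


def structured_coalescent_rates(n1, n2, p, mu1, mu2, r):
    """Event rates (time in units of 2Ne generations) for lineages in the A1/A2 backgrounds.

    n1, n2: numbers of lineages in each background; p: ancestral frequency of A1 (0 < p < 1).
    """
    q = 1.0 - p
    return {
        "coalesce in A1": math.comb(n1, 2) / p,
        "coalesce in A2": math.comb(n2, 2) / q,
        "mutate A1 -> A2": n1 * mu1 * q / p,
        "mutate A2 -> A1": n2 * mu2 * p / q,
        "recombine A1 -> A2": n1 * r * q,
        "recombine A2 -> A1": n2 * r * p,
    }


def next_event(rates, rng):
    """Draw the waiting time and the type of the next event, holding p fixed."""
    total = sum(rates.values())
    dt = rng.expovariate(total)
    event = rng.choices(list(rates), weights=list(rates.values()))[0]
    return dt, event


if __name__ == "__main__":
    rates = structured_coalescent_rates(n1=3, n2=2, p=0.25, mu1=0.1, mu2=0.1, r=0.5)
    print(rates)
    print(next_event(rates, random.Random(1)))
```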
The structured coalescent can be extended to handle selective sweeps,
frequency-dependent selection and other complicated scenarios. Dealing with
selection at multiple loci, however, is still challenging.
Unless the deleterious mutation rate is very large, the effect of purifying selection
at a single locus on the genealogy at a linked locus is fairly modest.
Purifying selection does tend to shift deleterious mutations towards the ’top’ of
the tree, i.e., they tend to be more recent than neutral mutations.
[Figures 15 and 16, from a paper coauthored by A. M. Etheridge: the effect of purifying selection on mean coalescence time. Figure 15 plots it against selection, S, for U = 0.25, 0.5 and 1, with p = 0.5 and complete linkage (R = 0). Figure 16 plots it against recombination rate, R, for U = 0.5 and p = 0.5. The vertical axis shows the decrease from the neutral value on a logarithmic scale.]