Oct22

Polymorphism
Polymorphism: when two or more alleles at a
locus exist in a population at the same time.
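Nucleotide diversity:
P = \sum_{i<j} x_i x_j p_{ij}
where x_i is the frequency of sequence i and p_ij is the proportion of
sites that differ between sequences i and j.
P considers # differences and allele frequency.

        Sequence               Freq (x)
Seq 1   G A G G T G C A A C    0.4
Seq 2   G A G G A C C A A C    0.5
Seq 3   G A G C T G G A A G    0.1

Pairwise differences (p_ij):
      1      2
2     0.2
3     0.3    0.5

P = (0.4)(0.5)(0.2) + (0.4)(0.1)(0.3) + (0.5)(0.1)(0.5) = 0.077
    (p12 term)        (p13 term)        (p23 term)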
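The worked example above can be checked with a short script. This is a minimal sketch: the function names are illustrative, and it sums x_i x_j p_ij over pairs i < j exactly as in the example.

```python
from itertools import combinations

def proportion_different(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ (p_ij)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)

def nucleotide_diversity(seqs, freqs):
    """Sum of x_i * x_j * p_ij over all pairs i < j."""
    return sum(
        freqs[i] * freqs[j] * proportion_different(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    )

seqs = ["GAGGTGCAAC", "GAGGACCAAC", "GAGCTGGAAG"]
freqs = [0.4, 0.5, 0.1]
P = nucleotide_diversity(seqs, freqs)
print(round(P, 3))  # 0.077
```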
In Theory:
Under the infinite-sites model: E(P) = 4 N_e m
= expected frequency of heterozygotes per nucleotide site,
where N_e is the effective population size and m is the mutation rate.
Nucleotide diversity is low in humans
Polymorphism is also estimated by:
K = number of segregating (variable) sites in a sample of alleles.

Example: K = 3 for these sequences (sites 3, 6, and 7 vary):
ATCCGGCTTTCGA
ATCCGAATTTCGA
ATTCGCCTTTCGA
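Counting segregating sites amounts to scanning the alignment column by column. The sketch below assumes the sequences are already aligned and of equal length. The function name is illustrative.

```python
def segregating_sites(seqs):
    """Return the 1-based positions of sites with more than one nucleotide."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos
        for pos, column in enumerate(zip(*seqs), start=1)
        if len(set(column)) > 1
    ]

sample = ["ATCCGGCTTTCGA", "ATCCGAATTTCGA", "ATTCGCCTTTCGA"]
variable = segregating_sites(sample)
K = len(variable)
print(K, variable)  # 3 [3, 6, 7]
```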
In Theory:
E(K) = 4 N_e m a
where a = 1 + 1/2 + 1/3 + ... + 1/(n-1)
and n is the number of sequences in the sample.
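The factor a depends only on the sample size, so K/a can be compared directly with P. A minimal sketch, with illustrative function names. Note that K/a computed this way is per sequence (per locus) rather than per site.

```python
def harmonic_a(n):
    """a = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))

def expected_K(theta, n):
    """Expected number of segregating sites, with theta standing for 4*N_e*m."""
    return theta * harmonic_a(n)

def theta_from_K(K, n):
    """Estimate of 4*N_e*m obtained by dividing K by a."""
    return K / harmonic_a(n)

# Three sequences with K = 3 segregating sites, as in the example above.
print(harmonic_a(3))       # 1.5
print(theta_from_K(3, 3))  # 2.0
```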
Coalescent Process
[Figure: gene tree showing the coalescence intervals t2, t3, t4, and t5.]
t_m is the time for coalescence from m to m-1 sequences.
[Figure: gene tree with branches labeled a through h.]
The genealogy of n sequences has 2(n-1) branches:
n are external branches and n-2 are internal.
How long will the coalescence process take?
Simplest case: If we pick the parental copies of two random gene copies,
the probability that the second is the same as the first is 1 / (2N).
This is the probability that two alleles coalesce in the previous generation.
It follows that 1 - 1 / (2N) is the probability that two sequences
were derived from different sequences in the preceding generation.
Therefore, the probability that 2 sequences derived from the same
ancestor 2 generations ago (grandparent) is [1 - 1 / (2N)] x [1 / (2N)].
It can be shown that the probability that two sequences were
derived from the same ancestor t generations ago is:
P(t) = [1 - 1/(2N)]^(t-1) x [1/(2N)] ~ [1/(2N)] e^(-t/(2N)) for large N
Because N is in the denominator, the probability depends on population size.
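The exact expression and its exponential approximation can be compared numerically. This is a sketch under the formula as written above. The function names and the value of N are illustrative.

```python
import math

def prob_coalesce_exact(t, N):
    """Probability that two gene copies first share an ancestor exactly t generations ago."""
    p = 1.0 / (2 * N)              # chance of coalescing in any one generation
    return (1.0 - p) ** (t - 1) * p

def prob_coalesce_approx(t, N):
    """Exponential approximation, reasonable when N is large."""
    return math.exp(-t / (2 * N)) / (2 * N)

N = 1000  # hypothetical diploid population size
for t in (1, 500, 2000):
    print(t, prob_coalesce_exact(t, N), prob_coalesce_approx(t, N))
```

For any fixed t, doubling N roughly halves the leading 1/(2N) term, which is why coalescence takes longer in larger populations.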
Consider probability of common ancestry for:

Generations ago   Prob(N=5)   Prob(N=10)
1                 0.400       0.200
2                 0.320       0.182
3                 0.256       0.162
It can be shown that the average time back to common ancestry
of a pair of genes in a diploid population is 2N generations, and the
average time back to common ancestry of all gene copies is 4N generations.
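The 2N and 4N figures can be recovered by adding up the expected coalescence intervals t_m. The sketch below assumes the standard neutral-coalescent result E[t_m] = 4N / (m(m-1)) for a diploid population, which these notes do not state explicitly. The function names are illustrative.

```python
def expected_interval(m, N):
    """Expected generations spent with m lineages before coalescing to m-1 (t_m)."""
    return 4.0 * N / (m * (m - 1))

def expected_time_to_common_ancestor(n, N):
    """Sum of expected intervals t_n, ..., t_2 for a sample of n gene copies."""
    return sum(expected_interval(m, N) for m in range(2, n + 1))

N = 1000  # hypothetical diploid population size
print(expected_time_to_common_ancestor(2, N))      # 2000.0, i.e. 2N
print(expected_time_to_common_ancestor(10000, N))  # close to 4000, i.e. 4N
```

The sum equals 4N(1 - 1/n), so it starts at 2N for a pair and approaches 4N as the sample grows to include all gene copies.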
[Figure: coalescence in a large population vs. a small population.]
Coalescence with no mutation
The average degree of relationship increases with time.
All of the gene copies in a
population can be traced back
to a single ancestral gene.
A population will eventually become monomorphic
for one allele or another, with this probability
determined by initial allele frequencies.
Coalescence with mutation
If each lineage experiences m mutations per generation,
then the number of base pair differences between two gene copies
will be #dif = 2 m t_ca, where t_ca is the time back to their
common ancestor.
If the average time to coalescence is 2N for two randomly chosen
gene copies, then #dif = 2 m (2N) = 4Nm.
Therefore, expect the average number of base pair differences
between gene copies to be greater in a larger population.
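A quick numerical illustration of the dependence on population size. The mutation rate and population sizes are made-up values chosen only to show the scaling, and the function name is illustrative.

```python
def expected_pairwise_differences(m, N):
    """Expected differences between two gene copies: 2 * m * t_ca with t_ca = 2N."""
    t_ca = 2 * N           # average time back to the common ancestor of a pair
    return 2 * m * t_ca    # mutations accumulate along both lineages

m = 1e-6                   # hypothetical mutations per lineage per generation
small = expected_pairwise_differences(m, 10_000)
large = expected_pairwise_differences(m, 1_000_000)
print(small, large)        # the larger population carries 100 times more differences
```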
Total length of branches of gene tree
I + L = J
Internal branches (I) + External branches (L) = Total time length (J)
Now consider mutation among branches during the coalescent process.
η_i + η_e = η
Mutations on internal branches (η_i) + Mutations on external branches (η_e)
= Total number of mutations in gene tree (η)
In theory: the total number of mutations (η) equals the
number of segregating sites (K).
Testing for Selective Neutrality
Tajima's Test (1989):
D = (P - K/a) / sqrt[V(P - K/a)]
The denominator, sqrt[V(P - K/a)], is a normalizing factor.
Rationale: Using the difference in estimates of polymorphism
to detect deviation from neutrality.
P and K are differentially influenced by the frequency of alleles.

                                        P vs. K/a
Few alleles at intermediate frequency   P > K/a
Many low frequency, variable alleles    P < K/a

D = 0 neutral prediction
D > 0 balancing selection
D < 0 directional selection
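Tajima's statistic can be sketched in a few lines. The variance constants below follow Tajima (1989) and are not given in these notes, so treat them as an assumption of this example. The sketch also assumes P is supplied as the mean number of pairwise differences among the n sampled sequences (a count per sequence, not a per-site proportion), and K as the number of segregating sites. The function name is illustrative.

```python
import math

def tajimas_d(n, K, P):
    """Tajima's D from sample size n, segregating sites K, mean pairwise differences P."""
    if n < 4:
        raise ValueError("need at least 4 sequences for a positive variance term")
    if K < 1:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))       # the factor a in the notes
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * K + e2 * K * (K - 1)         # estimate of V(P - K/a)
    return (P - K / a1) / math.sqrt(variance)

# Hypothetical sample: 10 sequences, 16 segregating sites.
print(tajimas_d(10, 16, 3.888889))  # about -1.45: excess of low-frequency variants
print(tajimas_d(10, 16, 8.0))       # positive: excess of intermediate-frequency variants
```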
Fu and Li's Test (1993):
D = [η_i/(a - 1) - η_e] / sqrt{V[η_i/(a - 1) - η_e]}
where η_i and η_e are the numbers of mutations on interior and
exterior branches of the gene tree.
Using the difference in # mutations in gene tree
to detect deviation from neutrality.
Rationale: In a neutral gene tree, the number of mutations on interior
branches, divided by (a - 1), is expected to equal the number of
mutations on exterior branches.

                                        η_i/(a - 1) vs. η_e
Few alleles at intermediate frequency   η_i/(a - 1) > η_e
Many low frequency, variable alleles    η_i/(a - 1) < η_e

D = 0 neutral prediction
D > 0 balancing selection
D < 0 directional selection
Gene genealogies under no selection, positive
selection, balancing selection, and background selection.
No Selection : 7 neutral
mutations accumulate since
the time of the last common
ancestor.
D=0
Consider the Effects of
Selection on Neutral Sites
Linked to a Selected Site
Positive Selection: neutral
variation at linked sites will
be eliminated (swept away)
as the advantageous allele
is quickly fixed in the
population. This process is
also called hitch-hiking.
D < 0
Balancing Selection : neutral
variation at linked sites
accumulates during the long
period of time that both
allele lineages are
maintained.
D>0
Background Selection: gene
lineages become extinct not
only by chance, but also because
selection against the deleterious
mutations to which they are linked
eliminates some gene copies.
D < 0
Problem: Background selection and hitchhiking are
contrasting processes that lead to the same pattern.
How to differentiate?
Dramatic reductions in polymorphism suggest hitchhiking.
Less dramatic reductions suggest background selection.