
Lecture 21: Tests for Departures
from Neutrality
November 9, 2012
Last Time
Introduction to neutral theory
Molecular clock
Expectations for allele frequency
distributions under neutral theory
Today
Sequence data and quantification of variation
 • Infinite sites model
 • Nucleotide diversity (π)
Sequence-based tests of neutrality
 • Ewens-Watterson Test
 • Tajima’s D
 • Hudson-Kreitman-Aguade
 • Synonymous versus Nonsynonymous substitutions
 • McDonald-Kreitman
Expected Heterozygosity with Mutation-Drift
Equilibrium under IAM
• At equilibrium, setting 4Neμ = θ:
\[ f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1} \]
• Remembering that H = 1 − f:
\[ H_e = \frac{\theta}{\theta + 1} \]
Allele Frequency Distributions
[Figure: allele frequency distributions. Black: predicted from neutral theory; white: observed (hypothetical)]
• Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time
• Comparison of the observed to the expected distribution provides evidence of departure from the Infinite Alleles model
• Depends on f, effective population size, and mutation rate
Hartl and Clark 2007
Ewens Sampling Formula
Population mutation rate, an index of the variability of the population:
\[ \theta = 4N_e\mu \]
Probability that the next allele sampled is new, given i alleles already sampled:
\[ \frac{\theta}{\theta + i} \]
Probability of sampling a new allele on the first sample:
\[ \frac{\theta}{\theta + 0} = 1 \]
Probability of observing a new allele after sampling one allele:
\[ \frac{\theta}{\theta + 1} = H_e \]
Probability of sampling a new allele on the third and fourth samples:
\[ \frac{\theta}{\theta + 2}, \qquad \frac{\theta}{\theta + 3} \]
Expected number of different alleles (k) in a sample of 2N alleles is:
\[ E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1} \]
Example: Expected number of alleles in a sample of 4:
\[ E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3} \]
Ewens Sampling Formula
\[ E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1} \]
where E(n) is the expected number of different alleles in a sample of N diploid individuals, and θ = 4Neμ.
• Predicts the number of different alleles that should be observed in a given sample size if neutrality prevails under the Infinite Alleles Model
• Small θ: E(n) approaches 1
• Large θ: E(n) approaches 2N
• θ can be estimated from the number of observed alleles for a given sample size
• Can also predict expected homozygosity (fe) under this model:
\[ f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1} \]
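Because E(n) increases steadily with θ, θ can be recovered from an observed allele count by numerically inverting the formula. The sketch below uses simple bisection; the function names and the bisection approach are assumptions for illustration, not a method stated in the lecture.

```python
def expected_num_alleles(theta: float, sample_size: int) -> float:
    """E(n) = sum over i = 0 .. sample_size - 1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(sample_size))


def estimate_theta(k_observed: float, sample_size: int) -> float:
    """Find theta such that E(n) equals k_observed, by bisection."""
    if not 1 < k_observed < sample_size:
        raise ValueError("need 1 < k_observed < sample_size")
    lo, hi = 1e-12, 1.0
    # Expand the upper bound until it brackets the observed count.
    while expected_num_alleles(hi, sample_size) < k_observed:
        hi *= 2.0
    # A fixed number of halvings is ample for double precision.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_num_alleles(mid, sample_size) < k_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    theta_hat = estimate_theta(k_observed=15, sample_size=89)
    print(f"theta estimate: {theta_hat:.3f}")
    print(f"implied f_e = 1/(theta + 1): {1.0 / (theta_hat + 1.0):.3f}")
```

The inputs in the usage lines (15 alleles in 89 chromosomes) are the counts from the Drosophila example in this lecture; the printed values are whatever the inversion yields and are not quoted from the source.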
Ewens-Watterson Test
• Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies
• Comparison of allele frequency distributions
• fe comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers
\[ f_{HW} = \sum_i p_i^2 \]
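The Hardy-Weinberg homozygosity is just the sum of squared allele frequencies. A minimal sketch, with a hypothetical set of allele counts that does not come from the lecture:

```python
def hw_homozygosity(allele_counts):
    """f_HW = sum of p_i**2, where p_i is the sample frequency of allele i."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele_counts must sum to a positive number")
    return sum((count / total) ** 2 for count in allele_counts)


if __name__ == "__main__":
    counts = [5, 3, 2]  # hypothetical counts for three alleles
    print(hw_homozygosity(counts))
```

A sample dominated by one common allele gives a value near 1, while many equally frequent alleles give a value near 1/k for k alleles.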
Ewens-Watterson Test Example
• Drosophila pseudoobscura collected from winery
• Xanthine dehydrogenase alleles
• 15 alleles observed in 89 chromosomes
• fHW = 0.366
• Generated fe by simulation: mean 0.168
Hartl and Clark 2007
How would you interpret this result?
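One way to generate a neutral distribution of fe is to simulate samples under the infinite alleles model and keep only those with the observed number of alleles. The sketch below uses an urn scheme with rejection sampling; this is an assumed approach for illustration (the lecture does not say how its simulation was done), and the θ value and replicate counts are arbitrary choices. Conditioning on the number of alleles removes the dependence on θ, so θ only affects how many draws are rejected.

```python
import random
from collections import Counter


def simulate_neutral_sample(theta, n, rng):
    """Draw n allele labels: draw i is new with probability theta/(theta+i)."""
    labels = []
    next_allele = 0
    for i in range(n):
        if rng.random() < theta / (theta + i):
            labels.append(next_allele)  # brand-new allele
            next_allele += 1
        else:
            labels.append(rng.choice(labels))  # copy an earlier draw
    return labels


def homozygosity(labels):
    """Sum of squared allele frequencies in the sample."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())


def simulate_fe(k_observed, n, theta, replicates, seed=0):
    """Homozygosity values from neutral samples with exactly k_observed alleles."""
    if not 1 <= k_observed <= n:
        raise ValueError("k_observed must lie between 1 and n")
    rng = random.Random(seed)
    kept = []
    while len(kept) < replicates:
        labels = simulate_neutral_sample(theta, n, rng)
        if len(set(labels)) == k_observed:
            kept.append(homozygosity(labels))
    return kept


if __name__ == "__main__":
    # Sample size and allele number from the example; theta is an assumption.
    values = simulate_fe(k_observed=15, n=89, theta=5.0, replicates=1000)
    mean_fe = sum(values) / len(values)
    share_above = sum(v >= 0.366 for v in values) / len(values)
    print(f"mean simulated fe: {mean_fe:.3f}")
    print(f"share of simulations with fe >= 0.366: {share_above:.3f}")
```

Comparing the observed fHW with the spread of simulated values, rather than with the mean alone, is what turns the comparison into a test.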
[Figure: axis label: Expected Homozygosity fe]
Most Loci Look Neutral According to
Ewens-Watterson Test
Hartl and Clark 2007
DNA Sequence Polymorphisms
• DNA sequence is ultimate view of standing genetic variation: no hidden alleles
• Is this really true?
• What about back mutation?
• Signatures of past evolution are contained in DNA sequence
• Neutral theory presents null model
• Departures due to:
• Selection
• Demographic events
- Bottlenecks, founder effects
- Population admixture
Sequence Alignment
• Necessary first step for comparing sequences within and between species
• Many different algorithms
• Tradeoff of speed and accuracy
Quantifying Divergence of Sequences
Nucleotide diversity (π) is the average number of
pairwise differences between sequences:
\[ \pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij} \]
where
N is the number of sequences in the sample,
pi and pj are the frequencies of sequences i and j in the sample, and
πij is the proportion of sites that differ between sequences i and j
Sample Calculation of π
[Figure: alignment of three 35-bp sequences A, B, and C]
A->B, 1 difference
A->C, 1 difference
B->C, 2 differences
\[ \pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij} = \frac{3}{2}\left[(0.33)(0.33)(1/35) + (0.33)(0.33)(1/35) + (0.33)(0.33)(2/35)\right] = 0.01867 \]
On average, there are 18.67 polymorphisms per kb between pairs of
haplotypes in the population
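The worked calculation above can be reproduced in a few lines. This sketch follows the slide's arithmetic, counting each pair of sequences (A-B, A-C, B-C) once and using the rounded frequency 0.33; the function and variable names are assumptions. Summing over ordered pairs instead, so that each pair is counted in both directions, would double the result.

```python
from itertools import combinations


def nucleotide_diversity(freqs, diffs, seq_length, n_sequences):
    """pi = N/(N-1) * sum over pairs of p_i * p_j * pi_ij.

    Each unordered pair is counted once, as in the worked example.
    diffs maps (i, j) name pairs to the number of differing sites.
    """
    weighted = 0.0
    for i, j in combinations(sorted(freqs), 2):
        pi_ij = diffs[(i, j)] / seq_length  # proportion of differing sites
        weighted += freqs[i] * freqs[j] * pi_ij
    return n_sequences / (n_sequences - 1) * weighted


freqs = {"A": 0.33, "B": 0.33, "C": 0.33}
diffs = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 2}
pi = nucleotide_diversity(freqs, diffs, seq_length=35, n_sequences=3)
print(round(pi, 5))  # 0.01867
```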
Tajima’s D Statistic
• Infinite Sites Model: each new mutation affects a new site in a sequence
\[ E(\pi) = \frac{\theta}{m}, \qquad \theta_\pi = \pi m \]
where m is the length of the sequence, and θ = 4Neμ
• Expected number of polymorphic sites in all sequences:
\[ E(S) = a_1\theta, \qquad \theta_S = \frac{S}{a_1}, \qquad a_1 = \sum_{i=1}^{n-1} \frac{1}{i} \]
where n is the number of different sequences compared
Sample Calculation of θS
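[Figure: alignment of three 35-bp sequences A, B, and C]
Two polymorphic sites: S = 2
\[ a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = \frac{1}{1} + \frac{1}{2} = 1.5 \]
\[ \theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33 \]
\[ \pi = 0.01867 \]
\[ \theta_\pi = \pi m = (0.01867)(35) = 0.65 \]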
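The θS calculation for this example can be sketched as follows; the function names are assumptions, while the inputs (S = 2, n = 3) come from the slide.

```python
def harmonic_a1(n: int) -> float:
    """a1 = sum of 1/i for i = 1 .. n-1, where n is the number of sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def theta_s(num_polymorphic_sites: int, n: int) -> float:
    """theta_S = S / a1."""
    return num_polymorphic_sites / harmonic_a1(n)


print(harmonic_a1(3))           # 1.5
print(round(theta_s(2, 3), 2))  # 1.33
```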
Tajima’s D Statistic
• Two different ways of estimating the same parameter:
\[ \theta_\pi = \pi m, \qquad \theta_S = \frac{S}{a_1} \]
• Deviation of these two indicates deviation from neutral expectations
\[ d = \theta_\pi - \theta_S \]
\[ D = \frac{d}{\sqrt{V(d)}} \]
where V(d) is the variance of d
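The slide does not spell out V(d). The sketch below assumes the standard variance estimate from Tajima (1989), which depends only on the number of sequences n and the number of polymorphic sites S; the function name and the example inputs are hypothetical. With only three sequences this variance estimate is zero, so the function requires at least four.

```python
import math


def tajimas_d(theta_pi: float, num_polymorphic_sites: int, n: int) -> float:
    """D = (theta_pi - S/a1) / sqrt(V(d)), with V(d) as in Tajima (1989).

    theta_pi is the mean number of pairwise differences per sequence
    (pi times sequence length); n is the number of sequences.
    """
    s = num_polymorphic_sites
    if n < 4:
        raise ValueError("variance estimate is zero for fewer than 4 sequences")
    if s < 1:
        raise ValueError("need at least one polymorphic site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    d = theta_pi - s / a1
    return d / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical data: 10 sequences, 16 polymorphic sites,
    # mean of 3.89 pairwise differences.
    print(f"D = {tajimas_d(theta_pi=3.89, num_polymorphic_sites=16, n=10):.3f}")
```

A negative value here reflects fewer pairwise differences than the number of polymorphic sites would predict, which is the pattern produced by an excess of rare variants.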
Tajima’s D Expectations
\[ d = \theta_\pi - \theta_S \]
• D=0: Neutrality
• D>0
  • Balancing Selection: Divergence of alleles (π) increases
  OR
  • Bottleneck: S decreases
• D<0
  • Purifying or Positive Selection: Divergence of alleles decreases
  OR
  • Population expansion: Many low frequency alleles cause low average divergence
Balancing Selection
[Figure: genealogy under balancing selection; legend: ‘balanced’ mutation, neutral mutation]
\[ d = \theta_\pi - \theta_S \]
• Should increase nucleotide diversity (π)
• Decreases polymorphic sites (S) initially
• D>0
Slide adapted from Yoav Gilad
Recent Bottleneck
\[ d = \theta_\pi - \theta_S \]
• Rare alleles are lost
• Polymorphic sites (S) more severely affected than nucleotide diversity (π)
• D>0
[Figure; label: Standard neutral model]
Positive Selection and Purifying Selection
[Figure: selective sweep and recovery of S and π through time; legend: advantageous mutation, neutral mutation]
\[ d = \theta_\pi - \theta_S \]
• Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially
• S recovers due to mutation
• π recovers slowly: insensitive to rare alleles
• D<0
Slide adapted from Yoav Gilad
Rapid Population Growth will also result in an
excess of rare alleles even for neutral loci
[Figure: genealogies through time. Standard neutral model: often two main haplotypes, some rare alleles. Rapid population size increase: most alleles are rare]
\[ \theta = 4N_e\mu, \qquad d = \theta_\pi - \theta_S \]
• Most alleles are rare
• Nucleotide diversity (π) depressed
• Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large
• D<0
Slide adapted from Yoav Gilad
How do we distinguish these two forms of
divergence (selection vs demography)?
Hudson-Kreitman-Aguade Test
Divergence between species should be of
same magnitude as variation within species
Provides a correction factor for mutation
rates at different sites
Complex goodness of fit test
Perform test for loci under selection and
supposedly neutral loci
Hudson-Kreitman-Aguade (HKA) test
                Neutral Locus   Test Locus A
Polymorphism          8               3
Divergence           20               8
8/20 ≈ 3/8
Polymorphism: Variation within species
Divergence: Variation between species
Slide adapted from Yoav Gilad
Hudson-Kreitman-Aguade (HKA) test
                Neutral Locus   Test Locus B
Polymorphism          8               3
Divergence           20              19
8/20 >> 3/19
Conclusion: polymorphism lower than expected in Test Locus B: Selective sweep?
Slide adapted from Yoav Gilad
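The ratio comparison in the two tables can be written out as a short calculation. This sketch only reproduces the polymorphism-to-divergence ratios shown on the slides; it is not the HKA goodness-of-fit test itself, and the function and variable names are assumptions.

```python
def poly_div_ratio(polymorphism: int, divergence: int) -> float:
    """Ratio of within-species polymorphism to between-species divergence."""
    if divergence <= 0:
        raise ValueError("divergence must be positive")
    return polymorphism / divergence


# (polymorphism, divergence) counts from the slides
loci = {
    "Neutral Locus": (8, 20),
    "Test Locus A": (3, 8),
    "Test Locus B": (3, 19),
}
ratios = {name: poly_div_ratio(p, d) for name, (p, d) in loci.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Test Locus A has a ratio close to the neutral locus, while Test Locus B has a much lower one, which is the pattern the slide flags as a possible selective sweep.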
[Figure: Teosinte, Maize, and Maize w/TBR mutation]
http://www.nsf.gov/news/mmg/media/images/corn-and-teosinte_h1.jpg
Mauricio 2001; Nature Reviews Genetics 2, 376
HKA Example: Teosinte Branched
• Lab exercise: test the Teosinte Branched gene for a signature of purifying selection in maize compared to its teosinte relative
• Compare to patterns of polymorphism and diversity in the Alcohol Dehydrogenase gene