Lecture 21: Tests for Departures from Neutrality
November 9, 2012

Last Time
- Introduction to neutral theory
- Molecular clock
- Expectations for allele frequency distributions under neutral theory

Today
- Sequence data and quantification of variation
- Infinite sites model
- Nucleotide diversity (π)
- Sequence-based tests of neutrality
- Ewens-Watterson Test
- Tajima’s D
- Hudson-Kreitman-Aguade
- Synonymous versus Nonsynonymous substitutions
- McDonald-Kreitman

Expected Heterozygosity with Mutation-Drift Equilibrium under IAM
At equilibrium, setting 4Neμ = θ:
$$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$$
Remembering that H = 1 - f:
$$H_e = \frac{\theta}{\theta + 1}$$

Allele Frequency Distributions
[Figure: allele frequency distributions. Black: predicted from neutral theory. White: observed (hypothetical). Hartl and Clark 2007]
- Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time
- Comparison of observed to expected distribution provides evidence of departure from the Infinite Alleles model
- Depends on f, effective population size, and mutation rate

Ewens Sampling Formula
- Population mutation rate, an index of variability of the population: θ = 4Neμ
- Probability that the next sampled allele is new, given i alleles already sampled: $\frac{\theta}{\theta + i}$
- Probability of sampling a new allele on the first sample: $\frac{\theta}{\theta + 0} = 1$
- Probability of observing a new allele after sampling one allele: $\frac{\theta}{\theta + 1} = H_e$
- Probability of sampling a new allele on the third and fourth samples: $\frac{\theta}{\theta + 2}$ and $\frac{\theta}{\theta + 3}$
- Expected number of different alleles (k) in a sample of 2N alleles:
$$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$$
- Example: expected number of alleles in a sample of 4:
$$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$$

Ewens Sampling Formula
$$E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$$
where E(n) is the expected number of different alleles in a sample of N diploid individuals, and θ = 4Neμ.
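The sum in the Ewens sampling formula is easy to evaluate numerically. The following is a minimal sketch, not part of the original slides; the function names `expected_alleles` and `expected_heterozygosity` and the θ values used are illustrative assumptions.

```python
def expected_alleles(theta, n_genes):
    """Expected number of different alleles in a sample of n_genes gene copies.

    Implements E(n) = sum_{i=0}^{n_genes-1} theta / (theta + i),
    where n_genes = 2N for N diploid individuals.
    """
    return sum(theta / (theta + i) for i in range(n_genes))


def expected_heterozygosity(theta):
    """Equilibrium heterozygosity under the infinite alleles model: theta / (theta + 1)."""
    return theta / (theta + 1.0)


if __name__ == "__main__":
    # Sample of 4 gene copies, as in the worked example on the slide.
    # The theta values are arbitrary illustrative choices.
    for theta in (0.01, 1.0, 100.0):
        print(theta, expected_alleles(theta, 4), expected_heterozygosity(theta))
```

With a very small θ the sum stays close to 1, and with a very large θ it approaches the sample size, since nearly every sampled copy is then a new allele.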
$$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$$
- Predicts the number of different alleles that should be observed in a given sample size if neutrality prevails under the Infinite Alleles Model
- Small θ: E(n) approaches 1
- Large θ: E(n) approaches 2N
- θ can be predicted from the number of observed alleles for a given sample size
- Can also predict expected homozygosity ($f_e$) under this model

Ewens-Watterson Test
- Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies
- Comparison of allele frequency distributions
- $f_e$ comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers
$$f_{HW} = \sum_i p_i^2$$

Ewens-Watterson Test Example
- Drosophila pseudoobscura collected from winery
- Xanthine dehydrogenase alleles
- 15 alleles observed in 89 chromosomes
- $f_{HW}$ = 0.366
- Generated $f_e$ by simulation: mean 0.168
- How would you interpret this result?
[Figure, axis label: Expected Homozygosity ($f_e$). Hartl and Clark 2007]

Most Loci Look Neutral According to Ewens-Watterson Test
[Figure: Hartl and Clark 2007]

DNA Sequence Polymorphisms
- DNA sequence is the ultimate view of standing genetic variation: no hidden alleles
- Is this really true? What about back mutation?
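The two quantities used in the Ewens-Watterson test slides can be sketched in a few lines. This is an illustration only: the allele counts below are hypothetical (the slides report only the summary values for the Drosophila example), and estimating θ by bisection on the Ewens sum is one possible approach chosen here for simplicity, not necessarily the method behind the published tables.

```python
def homozygosity(allele_counts):
    """Hardy-Weinberg homozygosity f_HW = sum of squared allele frequencies."""
    total = sum(allele_counts)
    return sum((c / total) ** 2 for c in allele_counts)


def expected_alleles(theta, n_genes):
    """Ewens expectation: sum_{i=0}^{n_genes-1} theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n_genes))


def estimate_theta(k_observed, n_genes, lo=1e-9, hi=1e6, iterations=200):
    """Find theta such that expected_alleles(theta, n_genes) = k_observed.

    Uses bisection, which works because the expectation increases with theta.
    The bracket [lo, hi] is an assumption that should cover typical data.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, n_genes) < k_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    print(homozygosity([5, 3, 2]))

    # Numbers from the slide: 15 alleles observed in 89 chromosomes.
    theta_hat = estimate_theta(15, 89)
    print(theta_hat, 1.0 / (theta_hat + 1.0))
```

The last printed value is the equilibrium homozygosity 1/(θ + 1) implied by the estimated θ. It is a rough point of comparison for the observed $f_{HW}$, not a replacement for the simulated distribution of $f_e$ that the test uses.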
Signatures of past evolution are contained in DNA sequence. Neutral theory presents the null model. Departures are due to:
- Selection
- Demographic events
  - Bottlenecks, founder effects
  - Population admixture

Sequence Alignment
- Necessary first step for comparing sequences within and between species
- Many different algorithms
- Tradeoff of speed and accuracy

Quantifying Divergence of Sequences
Nucleotide diversity (π) is the average number of pairwise differences between sequences:
$$\pi = \frac{N}{N-1} \sum_{i<j} p_i p_j \pi_{ij}$$
where N is the number of sequences in the sample, $p_i$ and $p_j$ are the frequencies of sequences i and j in the sample, and $\pi_{ij}$ is the proportion of sites that differ between sequences i and j.

Sample Calculation of π
[Figure: alignment of three 35-bp sequences A, B, and C]
- A->B: 1 difference
- A->C: 1 difference
- B->C: 2 differences
$$\pi = \frac{3}{2}\left[(0.33)(0.33)\left(\frac{1}{35}\right) + (0.33)(0.33)\left(\frac{1}{35}\right) + (0.33)(0.33)\left(\frac{2}{35}\right)\right] = 0.01867$$
On average, there are 18.67 polymorphisms per kb between pairs of haplotypes in the population.

Tajima’s D Statistic
Infinite Sites Model: each new mutation affects a new site in a sequence.
$$E(\pi) = \frac{\theta}{m}, \qquad \theta_\pi = m\pi$$
where m is the length of the sequence, and θ = 4Neμ.
Expected number of polymorphic sites in all sequences:
$$E(S) = a_1 \theta, \qquad \theta_S = \frac{S}{a_1}, \qquad a_1 = \sum_{i=1}^{n-1} \frac{1}{i}$$
where n is the number of different sequences compared.

Sample Calculation of θS
[Figure: alignment of sequences A, B, and C]
- Two polymorphic sites: S = 2
$$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = 1 + \frac{1}{2} = 1.5$$
$$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$$
$$\pi = 0.01867, \qquad \theta_\pi = m\pi = (0.01867)(35) = 0.65$$

Tajima’s D Statistic
Two different ways of estimating the same parameter:
$$\theta_\pi = m\pi, \qquad \theta_S = \frac{S}{a_1}$$
Deviation between these two indicates deviation from neutral expectations:
$$d = \theta_\pi - \theta_S, \qquad D = \frac{d}{\sqrt{V(d)}}$$
where V(d) is the variance of d.

Tajima’s D Expectations ($d = \theta_\pi - \theta_S$)
- D = 0: Neutrality
- D > 0:
  - Balancing selection: divergence of alleles (π) increases, OR
  - Bottleneck: S decreases
- D < 0:
  - Purifying or positive selection: divergence of alleles decreases, OR
  - Population expansion: many low frequency alleles cause low average divergence

Balancing Selection
[Figure labels: Balancing selection; 'balanced' mutation; Neutral mutation. Slide adapted from Yoav Gilad]
- Should increase nucleotide diversity (π)
- Decreases polymorphic sites (S) initially
- D > 0

Recent Bottleneck
[Figure label: Standard neutral model]
- Rare alleles are lost
- Polymorphic sites (S) more severely affected than nucleotide diversity (π)
- D > 0

Positive Selection and Purifying Selection
[Figure labels: sweep; recovery; Time; Advantageous mutation; Neutral mutation. Slide adapted from Yoav Gilad]
- Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially
- S recovers due to mutation
- π recovers slowly: insensitive to rare alleles
- D < 0

Rapid Population Growth will also result in an excess of rare alleles even for neutral loci
[Figure labels: Standard neutral model; Time; Rapid population size increase; Often two main haplotypes, some rare alleles; Most alleles are rare. Slide adapted from Yoav Gilad]
- Most alleles are rare
- Nucleotide diversity (π) depressed
- Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large
- D < 0
How do we distinguish these two forms of divergence (selection vs demography)?

Hudson-Kreitman-Aguade Test
- Divergence between species should be of the same magnitude as variation within species
- Provides a correction factor for mutation rates at different sites
- Complex goodness of fit test
- Perform test for loci under selection and supposedly neutral loci

Hudson-Kreitman-Aguade (HKA) test
                Neutral Locus   Test Locus A
Polymorphism          8               3
Divergence           20               8
- Polymorphism: variation within species
- Divergence: variation between species
- 8/20 ≈ 3/8

Hudson-Kreitman-Aguade (HKA) test
                Neutral Locus   Test Locus B
Polymorphism          8               3
Divergence           20              19
- 8/20 >> 3/19
- Conclusion: polymorphism lower than expected in Test Locus B: Selective sweep?
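The comparison in the two HKA tables above can be reproduced with simple arithmetic. The sketch below is a deliberate simplification and not the HKA goodness-of-fit statistic itself: it only scales the test locus divergence by the neutral locus polymorphism-to-divergence ratio to get a rough expected polymorphism. The function name `expected_polymorphism` is an illustrative assumption.

```python
def expected_polymorphism(neutral_poly, neutral_div, test_div):
    """Polymorphism expected at a test locus if it shared the neutral locus's
    polymorphism-to-divergence ratio (a simplification, not the HKA statistic)."""
    return test_div * neutral_poly / neutral_div


if __name__ == "__main__":
    # Counts taken from the HKA tables on the slides.
    neutral_poly, neutral_div = 8, 20
    test_loci = {"Test Locus A": (3, 8), "Test Locus B": (3, 19)}

    for name, (observed_poly, divergence) in test_loci.items():
        expected = expected_polymorphism(neutral_poly, neutral_div, divergence)
        print(name, "observed:", observed_poly, "expected:", round(expected, 2))
```

Under this simplification Test Locus A is close to its expectation, while Test Locus B has far fewer polymorphisms than its divergence would suggest, which is the pattern the slide flags as a possible selective sweep.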
Slide adapted from Yoav Gilad

HKA Example: Teosinte Branched
[Figure: Teosinte; Maize; Maize w/TBR mutation. Image: http://www.nsf.gov/news/mmg/media/images/corn-and-teosinte_h1.jpg. Mauricio 2001; Nature Reviews Genetics 2, 376]
- Lab exercise: test the Teosinte-Branched gene for a signature of purifying selection in maize compared to its Teosinte relative
- Compare to patterns of polymorphism and diversity in the Alcohol Dehydrogenase gene
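As a possible starting point for working with aligned sequences, the summary statistics from this lecture (π, S, θπ, θS, and d = θπ - θS) can be sketched as follows. The three sequences are hypothetical stand-ins built to match the pairwise differences in the sample calculation (the slide's actual alignment is not reproduced here). The sum in π runs over unordered pairs, following the slide's worked example; summing over ordered pairs would double the value. The variance V(d) is not computed, so this stops at d rather than D.

```python
from itertools import combinations

# Hypothetical 35-bp haplotypes: A-B and A-C differ at 1 site, B-C at 2 sites.
A = "A" * 35
B = A[:9] + "G" + A[10:]
C = A[:24] + "T" + A[25:]
SEQS = [A, B, C]


def count_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """pi = N/(N-1) * sum over unordered pairs of p_i * p_j * pi_ij,
    treating every sequence as a distinct haplotype with frequency 1/N."""
    n = len(seqs)
    m = len(seqs[0])
    p = 1.0 / n
    total = sum(p * p * count_differences(a, b) / m
                for a, b in combinations(seqs, 2))
    return n / (n - 1) * total


def segregating_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def a1(n):
    """a_1 = sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def theta_estimates(seqs):
    """Return (theta_pi, theta_S, d) with d = theta_pi - theta_S."""
    theta_pi = len(seqs[0]) * nucleotide_diversity(seqs)
    theta_s = segregating_sites(seqs) / a1(len(seqs))
    return theta_pi, theta_s, theta_pi - theta_s


if __name__ == "__main__":
    theta_pi, theta_s, d = theta_estimates(SEQS)
    print(nucleotide_diversity(SEQS), theta_pi, theta_s, d)
```

Because the code uses the exact frequency 1/3 rather than the rounded 0.33, its π and θπ come out slightly higher than the 0.01867 and 0.65 shown in the sample calculation. The sign of d is negative either way.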