Lect2 - UFSCar

What should we study?
• Levels of genetic variability - intrapopulational
• Population structure - interpopulational
• Geographic distribution of genetic diversity
• Taxonomic uncertainties – taxonomic and
systematic studies
• Number of species – taxonomic and ecological
approaches
Intrapopulational measures
Why Genetic Diversity
• Genetic diversity is important because it is the raw material on which selection acts; it is what allows species to respond to selective pressure.
• The majority of low-frequency alleles occur in heterozygotes, so if they are deleterious (and recessive or partially recessive), their effects may be fully or partially masked.
• Genetic diversity also plays a role in determining
IUCN categories.
• The lower the genetic diversity, the higher the perceived extinction risk.
Measuring Genetic Diversity
• Measures of genetic diversity depend on the data
analyzed.
• One set of measures focuses on heterozygosity and is based on diploid, co-dominant markers.
• Another set of measures focuses on allelic information and/or unphased diploid data.
Measures of Genetic Diversity
• Some indices implemented in Arlequin
Molecular Markers
• Sequence data
• Single Nucleotide Polymorphism (SNP) data
• Microsatellite data
• Allozyme data
• Amplified Fragment Length Polymorphism (AFLP) data
• Randomly Amplified Polymorphic DNA (RAPD) data
• Hybridization data
• Chromosomal pattern data
Sequence data
• Differences among haplotypes arise from point mutations (transitions or transversions), insertions, or deletions.
• In diploid organisms, differences are also due to
recombination.
• Models of molecular evolution for point mutations are very well studied.
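The transition/transversion distinction can be made concrete with a minimal Python sketch. It assumes two aligned, gap-free DNA sequences (insertions and deletions are not handled), and the function name `classify_differences` is hypothetical. Transitions are purine-purine (A/G) or pyrimidine-pyrimidine (C/T) changes; every other substitution is a transversion.

```python
# Nucleotide classes used to tell transitions from transversions.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_differences(seq1: str, seq2: str) -> dict[str, int]:
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transitions": 0, "transversions": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if not {a, b} <= NUCLEOTIDES:
            raise ValueError(f"unexpected character in {a!r}/{b!r}")
        if a == b:
            continue  # identical site, no substitution
        # Both bases in the same chemical class -> transition; otherwise transversion.
        is_transition = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        counts["transitions" if is_transition else "transversions"] += 1
    return counts


# First and fourth sequences of the six-sample example used later in this lecture.
print(classify_differences("AGAACTTCTG", "AAAATTTTTG"))  # {'transitions': 3, 'transversions': 0}
# A hypothetical pair with one A/G transition and one A/T transversion.
print(classify_differences("AGACT", "GGTCT"))  # {'transitions': 1, 'transversions': 1}
```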
Microsatellite data
[Figure: replication slippage. Misalignment between the growing strand and the template strand produces a +1 repeat or a -1 repeat.]
• Differences among alleles arise from unequal crossing over or from slippage during DNA replication.
• This class of markers is co-dominant, i.e. heterozygotes and both homozygous classes of individuals can be distinguished.
• Fast rate of molecular evolution.
• Their models of molecular evolution are less well understood than those for point mutations.
Allozyme data
• The properties of allozyme data are very similar to those of microsatellite data.
RFLP
• Differences among haplotypes arise from point mutations (transitions or transversions) that create or destroy restriction sites, or from insertions or deletions that change the length of restriction fragments.
• In diploid organisms, differences are also due to
recombination.
• This class of markers is generally co-dominant, i.e. heterozygous and homozygous individuals can be distinguished (unlike dominant markers such as RAPD and AFLP).
Chromosomal data
Best Markers
• In theory, the best markers are sequence markers:
• if there is sufficient variation (i.e. sufficient sequence length);
• if the differences can be phased;
• and because we have the best models of molecular evolution for these markers.
Haplotypes
Sample 1    AAAAA
Sample 2    AAAAA
Sample 3    AGAAA
Sample 4    AGAAA
Sample 5    AGAAG
Sample 6    AGAAG
Sample 7    GGAAA
Sample 8    GGAAA
Sample 9    GGGAA
Sample 10   GGGAA
Sample 11   GGGGA
Sample 12   GGGGA
Measuring Genetic Diversity
Sample 1   AGAACTTCTG
Sample 2   AGAACTTCTG
Sample 3   AGAACTTCTG
Sample 4   AAAATTTTTG
Sample 5   AAAATTTTTG
Sample 6   AAAATCTTTG
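Number of segregating sites (S): the number of polymorphic (variable) sites in the dataset. Under the infinite-sites model each segregating site reflects a single mutation, so S is also the total number of mutations observed in the dataset.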
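Counting segregating sites is a single pass over the alignment columns. The minimal Python sketch below reads the six sequences on the slide as 10-bp sequences (the space inside samples 4 and 5 is taken to be an extraction artifact); the names `SAMPLE` and `segregating_sites` are illustrative only.

```python
# The six aligned sequences from the slide.
SAMPLE = [
    "AGAACTTCTG",  # Sample 1
    "AGAACTTCTG",  # Sample 2
    "AGAACTTCTG",  # Sample 3
    "AAAATTTTTG",  # Sample 4
    "AAAATTTTTG",  # Sample 5
    "AAAATCTTTG",  # Sample 6
]


def segregating_sites(seqs: list[str]) -> list[int]:
    """Return the 0-based positions where not all sequences carry the same base."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


sites = segregating_sites(SAMPLE)
S = len(sites)
print("segregating positions (1-based):", [i + 1 for i in sites])  # [2, 5, 6, 8]
print("S =", S)  # S = 4
```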
Gene diversity: equivalent to the expected heterozygosity for diploid data. It is defined as the probability that two randomly selected sequences are different.
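The usual unbiased estimator is Ĥ = n/(n − 1) · (1 − Σ p_i²), where p_i is the frequency of haplotype i and n the sample size. The minimal sketch below applies it to the six sequences on the slide and checks it against the literal definition, the fraction of all pairs of sampled sequences that differ. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The six aligned sequences from the slide (read as 10-bp sequences).
SAMPLE = [
    "AGAACTTCTG", "AGAACTTCTG", "AGAACTTCTG",
    "AAAATTTTTG", "AAAATTTTTG", "AAAATCTTTG",
]


def gene_diversity(seqs: list[str]) -> float:
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def prob_two_differ(seqs: list[str]) -> float:
    """Fraction of all n(n-1)/2 pairs of sampled sequences that are not identical."""
    pairs = list(combinations(seqs, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


H = gene_diversity(SAMPLE)
print(f"H = {H:.4f}")  # H = 0.7333
print(f"pairwise check = {prob_two_differ(SAMPLE):.4f}")  # pairwise check = 0.7333
```

The n/(n − 1) factor is what makes the frequency formula agree exactly with the pair-counting definition when pairs are drawn without replacement.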
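Mean number of pairwise differences: the mean number of differences between all pairs of haplotypes in the sample.
$$\hat{\pi} = \frac{n}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{k} p_i\, p_j\, \hat{d}_{ij}$$
where d̂_ij = number of mutational differences between haplotypes i and j, p = haplotype (allele) frequency, k = number of haplotypes (alleles), n = sample size.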
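The frequency form of the mean number of pairwise differences can be checked numerically against a direct average of Hamming distances over all n(n − 1)/2 pairs of sequences. A minimal sketch using the six sequences on the slide (read as 10-bp sequences); function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The six aligned sequences from the slide.
SAMPLE = [
    "AGAACTTCTG", "AGAACTTCTG", "AGAACTTCTG",
    "AAAATTTTTG", "AAAATTTTTG", "AAAATCTTTG",
]


def hamming(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_direct(seqs: list[str]) -> float:
    """Average number of differences over all n(n-1)/2 pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_pairwise_from_freqs(seqs: list[str]) -> float:
    """Frequency form: n/(n-1) * sum_i sum_j p_i p_j d_ij over distinct haplotypes."""
    n = len(seqs)
    freq = {h: c / n for h, c in Counter(seqs).items()}
    total = sum(freq[hi] * freq[hj] * hamming(hi, hj) for hi in freq for hj in freq)
    return n / (n - 1) * total


print(f"{mean_pairwise_direct(SAMPLE):.4f}")  # 2.1333
print(f"{mean_pairwise_from_freqs(SAMPLE):.4f}")  # 2.1333
```

Here the three haplotypes differ by 3, 4 and 1 sites, giving 32 differences summed over 15 pairs, i.e. π̂ = 32/15.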
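Nucleotide diversity: the probability that two randomly chosen homologous sites are different.
$$\hat{\pi}_n = \frac{\frac{n}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{k} p_i\, p_j\, \hat{d}_{ij}}{L}$$
where d̂_ij = number of mutational differences between haplotypes i and j, p = haplotype (allele) frequency, k = number of haplotypes (alleles), n = sample size, L = number of loci (sites) compared.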
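Nucleotide diversity can be computed in two equivalent ways: divide the mean number of pairwise differences by the number of sites L, or average the per-site gene diversity over the L sites, which matches the "two randomly chosen homologous sites" definition directly. A minimal sketch on the six sequences from the slide (read as 10-bp sequences); function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The six aligned sequences from the slide.
SAMPLE = [
    "AGAACTTCTG", "AGAACTTCTG", "AGAACTTCTG",
    "AAAATTTTTG", "AAAATTTTTG", "AAAATCTTTG",
]


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences divided by the number of sites L."""
    L = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    return mean_diff / L


def mean_site_diversity(seqs: list[str]) -> float:
    """Average over sites of the unbiased per-site gene diversity."""
    n, L = len(seqs), len(seqs[0])
    total = 0.0
    for i in range(L):
        counts = Counter(s[i] for s in seqs)
        total += n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
    return total / L


print(f"{nucleotide_diversity(SAMPLE):.4f}")  # 0.2133
print(f"{mean_site_diversity(SAMPLE):.4f}")  # 0.2133
```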
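Measuring Genetic Diversity
• Theta: θ = 4Nµ for diploid markers, where N is the effective population size and µ the mutation rate per locus per generation. When new variants enter through migration (m) instead of, or in addition to, mutation: θ = 4Nm or θ = 4N(µ + m).
• For haploid markers: θ = 2Nµ (or 2Nm, or 2N(µ + m)).
• The all-important population genetic parameter.
• It is estimated from the number of alleles or the number of segregating (variable) nucleotide sites in a given sample.
• It quantifies the genetic diversity of a given population.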
Theta (θ) Hom
• Estimated from the expected homozygosity (Zouros, 1979; Chakraborty and Weiss, 1991) in a population at equilibrium between drift and mutation.
• Sensitive to small sample sizes and allele sizes.
• Used for microsatellite data.
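A minimal sketch of θ_H, under two stated assumptions: (i) the expected homozygosity is estimated as one minus the unbiased gene diversity, and (ii) the standard equilibrium relations are used, E(H) = 1/(1 + θ) under the infinite-alleles model (IAM) and E(H) = 1/√(1 + 2θ) under the stepwise-mutation model (SMM), the latter being the one usually applied to microsatellites. The repeat counts are hypothetical, the function names are illustrative, and Arlequin's own computation may differ in details.

```python
from collections import Counter

# Hypothetical microsatellite sample: repeat counts of n = 10 gene copies.
ALLELES = [12, 12, 12, 13, 13, 14, 14, 14, 15, 16]


def expected_homozygosity(alleles: list[int]) -> float:
    """One minus the unbiased gene diversity n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    freqs = [c / n for c in Counter(alleles).values()]
    gene_div = n / (n - 1) * (1.0 - sum(p * p for p in freqs))
    return 1.0 - gene_div


def theta_h_iam(hom: float) -> float:
    """Invert E(H) = 1 / (1 + theta) (infinite-alleles model)."""
    return 1.0 / hom - 1.0


def theta_h_smm(hom: float) -> float:
    """Invert E(H) = 1 / sqrt(1 + 2 theta) (stepwise-mutation model)."""
    return (1.0 / hom**2 - 1.0) / 2.0


hom = expected_homozygosity(ALLELES)
print(f"H = {hom:.4f}")  # H = 0.1556
print(f"theta_H (IAM) = {theta_h_iam(hom):.3f}")  # 5.429
print(f"theta_H (SMM) = {theta_h_smm(hom):.3f}")  # 20.163
```

With only ten gene copies the homozygosity estimate is noisy, and because θ is a steep function of H, both estimates swing widely with small changes in the data, consistent with the slide's warning about small samples.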
Theta (θ) S
• Estimated from the infinite-site equilibrium
relationship (Watterson, 1975) between the number
of segregating sites (S), the sample size (n) and θ for
a sample of non-recombining DNA.
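Watterson's relationship is E(S) = θ · a_n with a_n = Σ_{i=1}^{n−1} 1/i, so θ_S = S / a_n. A minimal sketch (function name hypothetical), applied to the six-sequence example shown earlier, where positions 2, 5, 6 and 8 vary (S = 4, n = 6). The result is per sequence; dividing by the number of sites L gives a per-site value.

```python
def watterson_theta(S: int, n: int) -> float:
    """theta_S = S / a_n, with a_n = 1 + 1/2 + ... + 1/(n-1)."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return S / a_n


theta_s = watterson_theta(4, 6)  # S = 4 segregating sites, n = 6 sequences
print(f"theta_S = {theta_s:.4f}")  # theta_S = 1.7518
print(f"per site = {theta_s / 10:.4f}")  # L = 10 sites
```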
Theta (θ) k
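• Estimated from the infinite-allele equilibrium relationship (Ewens, 1972) between the expected number of alleles (k), the sample size (n) and θ:
$$E(k \mid \theta, n) = \sum_{i=0}^{n-1} \frac{\theta}{\theta + i}$$
• 95% confidence limits are calculated from the probability of observing k alleles:
$$P(K = k \mid \theta, n) = \frac{|S_n^k|\,\theta^k}{S_n(\theta)}, \qquad S_n(\theta) = \theta(\theta + 1)\cdots(\theta + n - 1)$$
where |S_n^k| is the unsigned Stirling number of the first kind (a coefficient in the expansion of a factorial polynomial) and S_n(θ) is the rising factorial.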
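A minimal sketch of θ_k: solve E(k | θ, n) = k for θ by bisection (the expectation increases with θ), and evaluate the Ewens probability of each k from unsigned Stirling numbers of the first kind. The example uses k = 3 haplotypes among n = 6 sequences from the earlier slide. Function names are hypothetical; 95% limits would then be found by searching for the θ values at which the tail probabilities of this distribution reach 2.5%, which is not shown here.

```python
import math


def expected_alleles(theta: float, n: int) -> float:
    """Ewens (1972): expected number of alleles in a sample of n genes."""
    return sum(theta / (theta + i) for i in range(n))


def theta_k(k: int, n: int, rel_tol: float = 1e-12) -> float:
    """Solve expected_alleles(theta, n) == k for theta by bisection (1 < k < n)."""
    if not 1 < k < n:
        raise ValueError("need 1 < k < n for a finite, positive estimate")
    lo, hi = 0.0, 1.0
    while expected_alleles(hi, n) < k:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def stirling_first_unsigned(n: int) -> list[int]:
    """Row n of unsigned Stirling numbers of the first kind: row[k] = |S_n^k|."""
    row = [1]  # n = 0
    for m in range(n):
        # |S_{m+1}^k| = m * |S_m^k| + |S_m^{k-1}|
        row = [
            m * (row[k] if k < len(row) else 0) + (row[k - 1] if k >= 1 else 0)
            for k in range(m + 2)
        ]
    return row


def ewens_pmf(theta: float, n: int) -> dict[int, float]:
    """P(K = k | theta, n) = |S_n^k| theta^k / (theta (theta+1) ... (theta+n-1))."""
    stirling = stirling_first_unsigned(n)
    rising = math.prod(theta + i for i in range(n))
    return {k: stirling[k] * theta**k / rising for k in range(1, n + 1)}


theta_hat = theta_k(3, 6)  # k = 3 haplotypes among n = 6 sequences
print(f"theta_k = {theta_hat:.4f}")
print({k: round(p, 4) for k, p in ewens_pmf(theta_hat, 6).items()})
```

Plain floating point is adequate for small samples like this one; Stirling numbers grow very quickly, so large n would call for log-scale arithmetic.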
Theta (θ) π̂
• Estimated from the infinite-site equilibrium relationship (Tajima, 1983) between the mean number of pairwise differences (π̂) and θ: under this model E(π̂) = θ, so θ_π = π̂.
Why so many θ measures?
• Not all methods are suitable for all types of data.
• In a neutral population at mutation-drift equilibrium, all methods should ultimately yield the same estimate of θ.
• Differences between estimates can therefore be interpreted as violations of assumptions, since each method is sensitive to different assumptions.
Tajima’s D
• Tajima’s (1989) D test quantifies the discordance between the estimate of θ from the number of segregating sites and the estimate from the mean number of pairwise sequence differences.
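Tajima's D standardizes the difference θ_π − θ_S by an estimate of its standard deviation. The minimal sketch below uses the constants a1, a2, b1, b2, c1, c2, e1 and e2 as given by Tajima (1989) and applies them to the six-sequence example from the earlier slides (read as 10-bp sequences). It is an illustration, not Arlequin's implementation, and it does not compute a p-value.

```python
import math
from itertools import combinations

# The six aligned sequences from the earlier slides.
SAMPLE = [
    "AGAACTTCTG", "AGAACTTCTG", "AGAACTTCTG",
    "AAAATTTTTG", "AAAATTTTTG", "AAAATCTTTG",
]


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D = (pi - S/a1) / sqrt(e1*S + e2*S*(S-1))."""
    n, length = len(seqs), len(seqs[0])
    # Number of segregating sites.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    # Mean number of pairwise differences (theta_pi).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


D = tajimas_d(SAMPLE)
print(f"D = {D:.4f}")  # D ≈ 1.18
```

Here θ_π = 32/15 ≈ 2.13 exceeds θ_S ≈ 1.75, so D is positive.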
Fu’s Fs
• Fu’s (1997) Fs is based on the probability of observing at least the observed number of haplotypes, given a particular value of θ (estimated from the mean number of pairwise differences).
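Fu (1997) defines S′ = P(K ≥ k_obs | θ = θ_π) under the Ewens distribution, and Fs = ln(S′ / (1 − S′)). The minimal sketch below computes it for the six-sequence example from the earlier slides, taking θ_π as the mean number of pairwise differences. Function names are hypothetical, no p-value is computed, and plain floating point limits it to small samples. Negative Fs values indicate more haplotypes than expected given θ_π.

```python
import math
from itertools import combinations

# The six aligned sequences from the earlier slides.
SAMPLE = [
    "AGAACTTCTG", "AGAACTTCTG", "AGAACTTCTG",
    "AAAATTTTTG", "AAAATTTTTG", "AAAATCTTTG",
]


def stirling_first_unsigned(n: int) -> list[int]:
    """Row n of unsigned Stirling numbers of the first kind: row[k] = |S_n^k|."""
    row = [1]  # n = 0
    for m in range(n):
        row = [
            m * (row[k] if k < len(row) else 0) + (row[k - 1] if k >= 1 else 0)
            for k in range(m + 2)
        ]
    return row


def fus_fs(seqs: list[str]) -> float:
    """Fs = ln(S' / (1 - S')), with S' = P(K >= k_obs | theta = theta_pi)."""
    n = len(seqs)
    k_obs = len(set(seqs))  # observed number of distinct haplotypes
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    if theta_pi == 0:
        raise ValueError("Fs is undefined when all sequences are identical")
    stirling = stirling_first_unsigned(n)
    rising = math.prod(theta_pi + i for i in range(n))  # theta(theta+1)...(theta+n-1)
    s_prime = sum(stirling[k] * theta_pi**k for k in range(k_obs, n + 1)) / rising
    if not 0.0 < s_prime < 1.0:
        raise ValueError("S' must lie strictly between 0 and 1")
    return math.log(s_prime / (1.0 - s_prime))


fs = fus_fs(SAMPLE)
print(f"Fs = {fs:.4f}")  # Fs ≈ 1.14
```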
Differences among θ estimates
• Can have selective interpretations.
• Can have demographic interpretations.