Coalescent simulations - Section of population genetics

Evolutionary Genetics: Part 5
Coalescent simulations
S. chilense
S. peruvianum
Winter Semester 2012-2013
Prof Aurélien Tellier
FG Populationsgenetik
Color code:
Red = Important result or definition
Purple = Exercise to do
Green = Some bits of maths
Population genetics: 4 evolutionary forces
Molecular diversity is shaped by:
Random genomic processes (mutation, duplication, recombination, gene conversion)
Natural selection
Random spatial process (migration)
Random demographic process (drift)
Simulating sequence data
How to simulate?
Algorithm to generate sequence data
Set k = n, where n is the sample size
Draw the waiting time to the next event from an exponential distribution with rate k(k-1+θ)/2
With probability (k-1)/(k-1+θ) the event is a coalescent event
With probability θ/(k-1+θ) the event is a mutation
If a coalescent event occurs, choose a pair of lineages to coalesce; k then becomes k-1
If a mutation event occurs, choose a lineage to mutate; k is unchanged
Repeat until k = 1
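The algorithm above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `simulate_sample` and the representation of each lineage as the set of sampled sequences it is ancestral to are choices made here, not part of ms. Each mutation is recorded as the set of sequences that carry it, which corresponds to one segregating site if every mutation hits a new site (infinite-sites assumption). Time is measured in units where a given pair of lineages coalesces at rate 1.

```python
import random


def simulate_sample(n, theta, rng):
    """Run the event-by-event algorithm once.

    Returns (mutations, t): mutations is a list with one frozenset of
    carrier sample indices per mutation; t is the total time elapsed
    until a single lineage remains.
    """
    # Hypothetical representation: a lineage is the set of samples below it.
    lineages = [frozenset([i]) for i in range(n)]
    mutations = []
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next event, rate k(k-1+theta)/2.
        t += rng.expovariate(k * (k - 1 + theta) / 2.0)
        if rng.random() < (k - 1) / (k - 1 + theta):
            # Coalescent event: merge a random pair, k becomes k-1.
            a, b = rng.sample(range(k), 2)
            merged = lineages[a] | lineages[b]
            lineages = [l for i, l in enumerate(lineages) if i not in (a, b)]
            lineages.append(merged)
        else:
            # Mutation event: hits a random lineage, k is unchanged.
            mutations.append(rng.choice(lineages))
    return mutations, t


if __name__ == "__main__":
    muts, t = simulate_sample(n=4, theta=5.0, rng=random.Random(1))
    print("number of mutations:", len(muts), "time to common ancestor:", t)
```

Averaged over many runs, the number of mutations should approach θ(1 + 1/2 + ... + 1/(n-1)), which gives a simple check of the sketch.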
Simulations 1
What is θ?
Do you see the same numbers? Why?
4 -t 5 -T > treefile.tre
pdf(file="constant_tree.pdf")
dev.off()
Simulations 1: neutral and constant size
Simulations 2: neutral and expansion
Ancestral population size = x*N0
Time t1 of expansion, in units of 4N0 generations
Present population size = N0
Do you see a problem? What is N0?
t1 = 0.5 = time in the past at which the expansion starts
x = 0.1 = the population size in the past is 0.1*N0
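Because ms measures time in units of 4N0 generations and sizes relative to N0, absolute values have to be rescaled before they go on the command line. A minimal sketch of that arithmetic, assuming a hypothetical helper name `en_flag` and made-up example values (N0 = 10000, a size change 20000 generations ago, a past size of 1000):

```python
def en_flag(generations_ago, past_size, n0):
    """Build the ms '-eN t x' flag from absolute values.

    t is the time of the size change in units of 4*N0 generations,
    x is the past population size relative to N0.
    """
    t = generations_ago / (4.0 * n0)
    x = past_size / float(n0)
    return f"-eN {t:g} {x:g}"


if __name__ == "__main__":
    # Hypothetical numbers chosen so that t1 = 0.5 and x = 0.1.
    print(en_flag(generations_ago=20000, past_size=1000, n0=10000))
```

The same flag results from any N0 as long as the ratios are the same, which is why N0 itself has to be known or estimated separately.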
Simulations 2: neutral and expansion
4
-eN 0.5 0.1
4
-eN 0.05 0.1 -T > expansion.tre
0.5 = time at which the expansion starts in the past
0.1 = the population in the past is 0.1*N0
Simulations 2: trees of expansion
expansion.tre
pdf(file="expansion-tree.pdf")
dev.off()
Simulations 3: crash or bottleneck?
For a crash:
./ms 10 4 -t 5 -eN 0.5 5
Ancestral population size = x*N0
Time t1 of the crash, in units of 4N0 generations
Present population size = N0
Simulations 3: crash or bottleneck?
For a bottleneck:
./ms 10 4 -t 5 -eN 0.5 0.25 -eN 0.75 2
Here t1 = 0.5, x1 = 0.25, t2 = 0.75, x2 = 2
Ancestral population size = x2*N0
Time t2
Bottleneck population size = x1*N0
Time t1
Present population size = N0
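The size history implied by a series of -eN flags can be written as a step function of time into the past. The sketch below assumes a hypothetical helper `relative_size` and uses the bottleneck values t1 = 0.5, x1 = 0.25, t2 = 0.75, with x2 = 2 taken as the ancestral size.

```python
def relative_size(t, events):
    """Return N(t)/N0 at time t in the past (units of 4*N0 generations).

    events is a list of (time, x) pairs, one per '-eN time x' flag.
    Before the first event (closest to the present) the size is N0.
    """
    size = 1.0
    for time, x in sorted(events):
        if t >= time:
            size = x  # the most recent change at or before t applies
    return size


if __name__ == "__main__":
    bottleneck = [(0.5, 0.25), (0.75, 2.0)]
    for t in (0.1, 0.6, 0.9):
        print(t, relative_size(t, bottleneck))
```

A crash corresponds to a single event with x greater than 1, and an expansion to a single event with x less than 1.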
Exercise
Summarize the ms output
Then save the output in a file:
> test1.out
Now using R
Load the file:
test <- read.table("test1.out", header=FALSE)
Then draw graphs:
pdf(file="summary_neutral_constant.pdf")
hist(test[,2], main="Theta_Pi Tajima")
hist(test[,4], main="Theta_Watterson")
hist(test[,6], main="Tajima D")
dev.off()
Then do the same for an expansion, decline or bottleneck
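To see what the three histogram statistics measure, they can also be computed directly from unprocessed ms output. The following Python sketch is an illustration only: the function names `parse_ms` and `summary_stats` are assumptions made here, and the parser relies on the usual ms layout in which each replicate starts with a `//` line followed by one line of 0/1 characters per sequence. Tajima's D follows the standard 1989 formula.

```python
from math import sqrt


def parse_ms(text):
    """Return one list of 0/1 haplotype strings per replicate."""
    replicates, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("//"):
            current = []
            replicates.append(current)
        elif current is not None and line and set(line) <= {"0", "1"}:
            current.append(line)
    return replicates


def summary_stats(haplotypes, n):
    """Return (theta_pi, theta_W, Tajima's D) for n sequences."""
    s = len(haplotypes[0]) if haplotypes else 0
    if s == 0:
        return 0.0, 0.0, float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Number of sequences carrying the derived allele at each site.
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(s)]
    theta_pi = sum(2.0 * c * (n - c) for c in counts) / (n * (n - 1))
    theta_w = s / a1
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    var = (c1 / a1) * s + (c2 / (a1 ** 2 + a2)) * s * (s - 1)
    d = (theta_pi - theta_w) / sqrt(var) if var > 0 else float("nan")
    return theta_pi, theta_w, d
```

A possible use is `parse_ms(open("ms_raw.out").read())`, where `ms_raw.out` is a hypothetical file holding raw ms output, followed by one call to `summary_stats` per replicate.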
Final simulations
Using msmsplay on your computer
The command line is similar
The site frequency spectrum can be seen directly
Can you compare the site frequency spectrum with the values of Tajima's D?
Let's simulate a neutral model, an expansion and a decline
What differences do we see?
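For comparison outside msmsplay, the site frequency spectrum can be tabulated from the 0/1 haplotype lines of a simulated sample. This is a small Python sketch with a hypothetical function name; it assumes that 1 marks the derived allele, as in ms output, so the spectrum is unfolded.

```python
def site_frequency_spectrum(haplotypes):
    """Count sites by number of sequences carrying the derived allele.

    Returns a list of length n-1 whose entry i-1 is the number of sites
    at which exactly i of the n sequences carry a '1'.
    """
    n = len(haplotypes)
    spectrum = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        i = site.count("1")
        if 0 < i < n:
            spectrum[i - 1] += 1
    return spectrum


if __name__ == "__main__":
    # Toy sample of 4 sequences and 2 sites, invented for illustration.
    print(site_frequency_spectrum(["10", "10", "01", "00"]))
```

An excess of low-frequency classes in this list relative to the neutral constant-size expectation goes together with negative values of Tajima's D, and an excess of intermediate-frequency classes with positive values.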
Some data analysis
Use datasets:
Use DnaSP to calculate usual statistics:
Diversity: θW, θπ
Site frequency spectrum
Tajima's D
What do you conclude from these data?
Do you have an idea of the past demography of these populations?
Why do you need several independent loci?