Recombination and LD - Section of population genetics

Evolutionary Genetics: Part 7
Recombination – Linkage Disequilibrium
S. chilense
S. peruvianum
Winter Semester 2012-2013
Prof Aurélien Tellier
FG Populationsgenetik
Color code:
Red: important result or definition
Purple: exercise to do
Green: some bits of maths
Population genetics: 4 evolutionary forces
Four forces shape molecular diversity:
random genomic processes (mutation, duplication, recombination, gene conversion)
natural selection
random spatial process (migration)
random demographic process (drift)
Recombination
Recombination and crossing over
Physical map
Genetic map
Independent segregation (Mendel’s law)
Non-independent segregation
This is genetic linkage
Recombination rate
$\rho = \dfrac{\text{number of recombined gametes}}{\text{total number of gametes}}$
In general:
The recombination rate between two loci on different chromosomes is ρ = 0.5
The recombination rate between two loci on the same chromosome is 0 < ρ < 0.5 (approaching 0.5 for distant loci)
The recombination rate of two loci on the same chromosome increases
monotonically with distance
BUT there are recombination hotspots (or cold spots) in the genome
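As a small illustration of the definition of ρ above, the ratio can be computed directly from gamete counts. This is only a sketch: the function name `recombination_rate` and the counts are hypothetical and made up for illustration, not data from the lecture.

```python
def recombination_rate(n_recombinant: int, n_total: int) -> float:
    """Estimate rho as (recombined gametes) / (total gametes)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_recombinant <= n_total:
        raise ValueError("n_recombinant must lie between 0 and n_total")
    return n_recombinant / n_total


# Hypothetical counts (illustration only): 120 recombined gametes out of 1000.
rho = recombination_rate(120, 1000)
print(f"estimated rho = {rho:.3f}")
```

An estimate close to 0.5 is what independent segregation would give, so such a value would not by itself show that the two loci are linked.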
Recombination and crossing-over
Genetic map length - Morgan
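Model without recombination
[Figure: your two chromosomes, one inherited from your mother and one inherited from your father, each carrying loci A and B. Without recombination, both loci on the chromosome inherited from your mother come from the same grandparent (your grandfather or your grandmother).]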
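Model with recombination
[Figure: in your mother, the chromosome from your grandmother and the chromosome from your grandfather (alleles A/a and B/b) recombine, so loci A and B on the chromosome inherited from your mother can come from different grandparents. Your other chromosome is inherited from your father.]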
Model with recombination
So two loci on the same chromosome can come:
from a single parental chromosome if there is no recombination
from two parental chromosomes if there is recombination
With recombination, the chromosomes of your parents are mosaics of
pieces of chromosomes from their parents
We define ρ as the probability that a recombination event happens between the two loci in one generation
P[two loci have the same parent] = 1-ρ
Coalescence with recombination
Take one lineage
Tracing it back in time, recombination events can happen
Recombination happens with probability ρ at every generation
$P[\text{first recombination event } t \text{ generations ago}] = \rho (1-\rho)^{t-1}$
This is again a geometric distribution (exponential in continuous time)
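The geometric waiting time can be checked numerically. The sketch below assumes a constant per-generation probability ρ, as in the text; the function names are hypothetical and the value ρ = 0.01 is chosen only for illustration.

```python
def prob_first_recombination(t: int, rho: float) -> float:
    """P[first recombination event exactly t generations ago] = rho * (1 - rho)**(t - 1)."""
    if t < 1:
        raise ValueError("t counts generations and must be >= 1")
    return rho * (1.0 - rho) ** (t - 1)


def mean_waiting_time(rho: float) -> float:
    """Mean of the geometric distribution: 1 / rho generations."""
    return 1.0 / rho


rho = 0.01  # illustrative value, not taken from the lecture
generations = range(1, 5000)
total_prob = sum(prob_first_recombination(t, rho) for t in generations)
numeric_mean = sum(t * prob_first_recombination(t, rho) for t in generations)
print(f"sum of probabilities ~ {total_prob:.6f}")
print(f"numeric mean ~ {numeric_mean:.2f}, formula 1/rho = {mean_waiting_time(rho):.2f}")
```

The probabilities sum to about 1 and the mean waiting time is about 1/ρ generations, which mirrors the waiting time to coalescence, where 1/(2N) plays the role of ρ.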
Backward in time:
There can be
coalescence of two lineages
or recombination event
recombination creates two lineages backward in time: one with locus A
and the other with locus B
Coalescence with recombination
Recombination increases the number of lineages, so it can take a while to
find the MRCA
However, as the number of lineages k increases, the rate of coalescence (proportional to k(k-1)/2) increases
faster than the rate of recombination (proportional to k), so an MRCA will eventually be found
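The argument that an MRCA is eventually found can be illustrated with a small simulation of the number of lineages backward in time. This is a deliberately simplified sketch, not the full two-locus model: it assumes continuous time, that every lineage splits at rate ρ per generation, and that each pair of lineages coalesces at rate 1/(2N). The function name and parameter values are hypothetical.

```python
import random


def time_to_single_lineage(k: int, two_n: int, rho: float, rng: random.Random) -> float:
    """Simulate the number of lineages backward in time until one is left.

    Simplifying assumptions (for illustration only):
      - each pair of lineages coalesces at rate 1 / two_n per generation,
      - each lineage splits by recombination at rate rho per generation.
    Returns the elapsed time in generations.
    """
    t = 0.0
    while k > 1:
        coal_rate = k * (k - 1) / 2 / two_n   # grows like k^2
        rec_rate = k * rho                    # grows like k
        total_rate = coal_rate + rec_rate
        t += rng.expovariate(total_rate)      # waiting time to the next event
        if rng.random() < coal_rate / total_rate:
            k -= 1                            # coalescence merges two lineages
        else:
            k += 1                            # recombination creates one more lineage
    return t


rng = random.Random(1)
times = [time_to_single_lineage(5, 200, 0.001, rng) for _ in range(200)]
print(f"mean time to a single lineage ~ {sum(times) / len(times):.0f} generations")
```

Because the coalescence rate grows with k² and the recombination rate only with k, the loop always terminates, although it takes longer as ρ increases.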
Coalescence with recombination
Along the genome, each series of contiguous sites has its own coalescent tree
In fact, recombination slowly breaks the link between sites
The higher the recombination rate, the more independent the loci are
Virtually every locus has its own MRCA
If recombination rates vary along the genome, loci differ in the
number of recombination events in their trees
Coalescence without recombination
Along the genome, ONLY ONE tree for all loci
The higher the recombination, the more independent are the loci
Recombination is important, otherwise, each chromosome would be only one
data point (= one tree)
This is the case for: Y-chromosome in humans, Mitochondrial DNA,
Chloroplast DNA where there is no recombination (= one tree for all loci)
Why is this a problem if no recombination?
Understanding evolution in the genome requires independent pieces of
information about ONE evolutionary process (= different trees which come from
the same evolutionary scenario)
Information comes from the variance between loci
If all loci are linked, how can we tell which loci evolve neutrally and which
genes are under selection?
Coalescence with recombination
How far along the genome do you have to go to find a recombination event?
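define r as the per-site (per bp) recombination rate
if two sites are separated by a distance d, the recombination rate between them is ρ = rd
for a pair of lineages, the coalescence rate is 1/(2N) and the recombination rate is 2rd (either lineage can recombine)
we want at least a 50% chance that a recombination event happens before coalescence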
$$P[\text{recombination before coalescence}] = \frac{2rd}{2rd + 1/(2N)} = 1 - \frac{1}{4Nrd + 1} \geq 0.5$$
this can be simplified to $4Nrd \geq 1$, i.e. $d \geq 1/(4Nr)$
For humans, with $N_e = 10^4$ and $r = 10^{-8}$, we get d ≥ 2500 bp
In Drosophila, where $N_e = 10^6$, the distance is 100 times shorter (for the same r)
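The arithmetic above can be reproduced in a few lines. The sketch below uses the values quoted in the text (Ne = 10^4 and 10^6, r = 10^-8) and assumes the same r for both species; the function names are hypothetical.

```python
def prob_recombination_before_coalescence(n_e: float, r: float, d: float) -> float:
    """2rd / (2rd + 1/(2N)), rewritten as 4Nrd / (4Nrd + 1)."""
    x = 4.0 * n_e * r * d
    return x / (x + 1.0)


def min_distance_bp(n_e: float, r: float) -> float:
    """Smallest distance d (in bp) with at least a 50% chance of recombination
    before coalescence: d = 1 / (4 N r)."""
    return 1.0 / (4.0 * n_e * r)


r = 1e-8  # per-site recombination rate quoted in the text
for species, n_e in [("human", 1e4), ("Drosophila", 1e6)]:
    d = min_distance_bp(n_e, r)
    p = prob_recombination_before_coalescence(n_e, r, d)
    print(f"{species}: d >= {d:.0f} bp (P at this distance = {p:.2f})")
```

At the threshold distance the probability is exactly 0.5 by construction, and doubling d raises it to 2/3.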
Recombination and data
Linkage disequilibrium
Recombination in data: 4 gamete rule
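There is one rule to recognize whether recombination happened: the four-gamete rule
If all four possible gametes (e.g. AB, Ab, aB, ab) are observed at two biallelic sites, at least one
recombination event must have happened between them (assuming each mutation arose only once)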
Did recombination happen on the right or on the left of the 2nd site?
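The four-gamete rule is easy to apply to aligned haplotypes. The sketch below assumes biallelic sites coded as characters (here "0" and "1") and that each mutation arose only once; the function name and the sample haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return the pairs of sites (0-based indices) at which all four gametes are observed.

    Assumes all haplotypes have the same length and every site is biallelic.
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}  # distinct two-site gametes
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


# Made-up sample of five haplotypes at three sites (illustration only).
sample = ["000", "010", "011", "111", "101"]
print(four_gamete_pairs(sample))
```

For this sample the pairs (0, 1) and (1, 2) show all four gametes while (0, 2) does not, so recombination is inferred on both sides of the second site.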
Recombination in data: LD
Linkage Disequilibrium (LD) is measured as D
Two loci A and B with alleles A1 and A2, B1 and B2
Frequencies are: A1B1 = p11 ; A1B2 = p12 ; A2B1 = p21 ; A2B2 =p22
The A1B1 and A2B2 gametes are called coupling gametes
The A1B2 and A2B1 gametes are called the repulsion gametes
LD is a measure of the excess of coupling over repulsion gametes: $D = p_{11} p_{22} - p_{12} p_{21}$
If D>0, there are more coupling gametes than expected at equilibrium
If D<0, there are more repulsion gametes than expected
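To make the sign of D concrete, the sketch below computes D from the four gamete frequencies, using the standard definition D = p11·p22 − p12·p21 (excess of coupling over repulsion gametes). It also returns r², taken here in its usual form D² / (pA1·pA2·pB1·pB2); this normalisation is an assumption, since the slides do not spell it out. The function name and the frequencies are made up for illustration.

```python
def ld_stats(p11: float, p12: float, p21: float, p22: float) -> tuple[float, float]:
    """Return (D, r2) from the gamete frequencies of A1B1, A1B2, A2B1, A2B2."""
    total = p11 + p12 + p21 + p22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    d = p11 * p22 - p12 * p21          # coupling product minus repulsion product
    p_a1 = p11 + p12                   # frequency of allele A1
    p_b1 = p11 + p21                   # frequency of allele B1
    denom = p_a1 * (1.0 - p_a1) * p_b1 * (1.0 - p_b1)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Made-up frequencies with an excess of coupling gametes.
d, r2 = ld_stats(0.4, 0.1, 0.1, 0.4)
print(f"D = {d:.3f}, r2 = {r2:.3f}")

# At linkage equilibrium every gamete frequency is the product of its allele frequencies.
d_eq, r2_eq = ld_stats(0.25, 0.25, 0.25, 0.25)
print(f"D = {d_eq:.3f}, r2 = {r2_eq:.3f}")
```

Swapping the coupling and repulsion frequencies (0.1, 0.4, 0.4, 0.1) changes the sign of D but leaves r² unchanged.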
Recombination in data: LD
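Linkage Disequilibrium (LD) is measured as D and r² (here r² is the squared correlation between alleles at the two loci, not the per-site recombination rate r)
The change in D in a single generation is: $\Delta D = -\rho D$
After t generations:
$D_t = (1 - \rho)^t D_0$
Once again, this is a geometric function of time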
This means that the ultimate state of the population is D=0
BUT there is memory of LD in time
LD decreases away from a given site in the genome also following a
geometric function
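The decay of D over generations, D_t = (1 − ρ)^t D_0, can be tabulated for a few recombination rates to show the "memory" of LD. This is a sketch under the model in the text (recombination only, no drift, selection or migration); the function names and the values of ρ are hypothetical.

```python
import math


def ld_after(d0: float, rho: float, t: int) -> float:
    """D after t generations: D_t = (1 - rho)**t * D_0."""
    return (1.0 - rho) ** t * d0


def ld_half_life(rho: float) -> float:
    """Number of generations after which D has halved: ln(0.5) / ln(1 - rho)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - rho)


d0 = 0.25  # largest possible D when all allele frequencies are 0.5
for rho in (0.5, 0.1, 0.001):  # unlinked, loosely linked, tightly linked (illustrative)
    print(
        f"rho = {rho}: D after 10 generations = {ld_after(d0, rho, 10):.5f}, "
        f"half-life ~ {ld_half_life(rho):.1f} generations"
    )
```

Even for unlinked loci (ρ = 0.5) D only halves each generation, and for tightly linked loci it persists for hundreds of generations.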
Recombination in data: haplotypes
Linkage Disequilibrium (LD) can be seen in the presence of haplotypes
Example: (PLoS Genetics 2006)
Do you expect long or short haplotypes under recombination?
If genes can show different recombination rates, what does this
mean for haplotypes?
Length and frequency of haplotypes are important signatures to
detect deviation from neutral evolution!!!
Recombination in data
Using DnaSP
Using the TNFSF5 and the droso files
Look at the haplotypes (Generate => Haplotype Data File)
Why are haplotypes important for studying recombination? What about the
information on the distance between sites?
Can you detect recombination? Measure LD (D and r²) and count the pairs of sites
that show all four gametes (four-gamete rule)
Use Analysis => Recombination
How does LD decay with the distance between sites?