New evolutionary models for the long range
dependencies of loosely linked loci
Paul A. Jenkins
Departments of Statistics
University of Warwick
Joint work with Paul Fearnhead (Lancaster), Yun Song (Berkeley)
17 June 2015
CONFERENCE on PROBABILITY AND BIOLOGICAL EVOLUTION,
Centre International de Rencontres Mathématiques (CIRM)
Marseille-Luminy.
0 / 37
Introduction
Diffusion model
Coalescent model
Summary
The basic problem (Computing likelihoods)
For a given population genetics model, what is the probability of
observing a sample of DNA sequences randomly drawn from a
population?
Haplotype 1 =
Haplotype 2 =
Haplotype 3 =
Haplotype 4 =
Haplotype 5 =
AACT A GG......CCGTGACC......ACAGCTAT
AACT A GG......CCGT A ACC......ACAGCTAT
AACTGGG......CCGTGACC......ACAGCTAT
AACTGGG......CCGT A ACC......ACAG T TAT
AACT A GG......CCGTGACC......ACAG T TAT
Applications
Estimating evolutionary parameters: L(θ, ρ) = P(D | θ, ρ)
Ancestral inference
Disease gene mapping
1 / 37
Introduction
Diffusion model
Coalescent model
Summary
Closed-form one-locus likelihood functions
n = (n1 , . . . , nK ), where ni = number of samples with allele i.
q(n), probability of an ordered sample with configuration n.
θ = 4Nu, mutation parameter.
2 / 37
Introduction
Diffusion model
Coalescent model
Summary
Closed-form one-locus likelihood functions
n = (n1 , . . . , nK ), where ni = number of samples with allele i.
q(n), probability of an ordered sample with configuration n.
θ = 4Nu, mutation parameter.
Finite alleles, parent-independent mutation (PIM) model
Mutation transition matrix satisfies Pij = Pj .
Wright’s sampling formula (1949):
QK
θPi (θPi + 1) . . . (θPi + ni − 1)
qWSF (n) = i=1
θ(θ + 1) . . . (θ + n − 1)
2 / 37
Introduction
Diffusion model
Coalescent model
Summary
Closed-form one-locus likelihood functions
n = (n1 , . . . , nK ), where ni = number of samples with allele i.
q(n), probability of an ordered sample with configuration n.
θ = 4Nu, mutation parameter.
Finite alleles, parent-independent mutation (PIM) model
Mutation transition matrix satisfies Pij = Pj .
Wright’s sampling formula (1949):
QK
θPi (θPi + 1) . . . (θPi + ni − 1)
qWSF (n) = i=1
θ(θ + 1) . . . (θ + n − 1)
Infinite alleles model
Ewens sampling formula (1972):
Q
θK Ki=1 (ni − 1)!
qESF (n) =
θ(θ + 1) . . . (θ + n − 1)
2 / 37
Introduction
Diffusion model
Coalescent model
Summary
Crossover Recombination
Parent 1:
Child:
Parent 2:
3 / 37
Introduction
Diffusion model
Coalescent model
Summary
Crossover Recombination
Parent 1:
Child:
Parent 2:
Multi-locus models
Ancestral recombination graph (ARG)
Wright-Fisher diffusion with recombination
3 / 37
Introduction
Diffusion model
Coalescent model
Summary
Crossover Recombination
Parent 1:
Child:
Parent 2:
Multi-locus models with recombination
Obtaining an exact, analytic likelihood function under these models
has so far remained a challenging open problem, even for just two
loci.
3 / 37
Introduction
Diffusion model
Coalescent model
Summary
Problem setup
A two-locus sample configuration, c = (cij )
2
1
1
Row sums:
Column sums:
1
0
0
c A = (ci· ) = (3, 1, 1)
c B = (c·j ) = (4, 1)
4 / 37
Introduction
Diffusion model
Coalescent model
Summary
Problem setup
A two-locus sample configuration, c = (cij )
2
1
1
Row sums:
Column sums:
1
0
0
c A = (ci· ) = (3, 1, 1)
c B = (c·j ) = (4, 1)
Goal: Compute the
sampling distribution, q(c).
4 / 37
Introduction
Diffusion model
Coalescent model
Summary
Previous work
Key Idea: Asymptotic Series
Write
q(c; ρ) = q0 (c) +
(Jenkins & Song, 2009, 2010, 2012)
q1 (c) q2 (c)
+
+ ...,
ρ
ρ2
where q0 , q1 , . . . are independent of the recombination parameter,
ρ (= 4Nr ) (but implicitly depend on θA , θB ). Now recursively solve for
q0 , q1 , . . . .
5 / 37
Introduction
Diffusion model
Coalescent model
Summary
Previous work
Key Idea: Asymptotic Series
Write
q(c; ρ) = q0 (c) +
(Jenkins & Song, 2009, 2010, 2012)
q1 (c) q2 (c)
+
+ ...,
ρ
ρ2
where q0 , q1 , . . . are independent of the recombination parameter,
ρ (= 4Nr ) (but implicitly depend on θA , θB ). Now recursively solve for
q0 , q1 , . . . .
q0 (c)
q0 (c) is the exact sampling distribution when the two loci are
unlinked (ρ = ∞).
5 / 37
Introduction
Diffusion model
Coalescent model
Summary
Previous work
Key Idea: Asymptotic Series
Write
q(c; ρ) = q0 (c) +
(Jenkins & Song, 2009, 2010, 2012)
q1 (c) q2 (c)
+
+ ...,
ρ
ρ2
where q0 , q1 , . . . are independent of the recombination parameter,
ρ (= 4Nr ) (but implicitly depend on θA , θB ). Now recursively solve for
q0 , q1 , . . . .
q0 (c)
q0 (c) is the exact sampling distribution when the two loci are
unlinked (ρ = ∞).
A
B
Infinite alleles:
q0 (c) = qESF
(c A )qESF
(c B )
Finite alleles, parent-independent mutation:
A
B
q0 (c) = qWSF
(c A )qWSF
(c B )
5 / 37
Introduction
Diffusion model
Coalescent model
Summary
Previous work
Key Idea: Asymptotic Series
Write
q(c; ρ) = q0 (c) +
(Jenkins & Song, 2009, 2010, 2012)
q1 (c) q2 (c)
+
+ ...,
ρ
ρ2
where q0 , q1 , . . . are independent of the recombination parameter,
ρ (= 4Nr ) (but implicitly depend on θA , θB ). Now recursively solve for
q0 , q1 , . . . .
q0 (c)
q0 (c) is the exact sampling distribution when the two loci are
unlinked (ρ = ∞).
A
B
Infinite alleles:
q0 (c) = qESF
(c A )qESF
(c B )
Finite alleles, parent-independent mutation:
A
B
q0 (c) = qWSF
(c A )qWSF
(c B )
Key property: q0 (c) is expressible in terms of the relevant
one-locus sampling distributions.
5 / 37
Introduction
Diffusion model
Higher order terms
Coalescent model
Summary
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
6 / 37
Introduction
Diffusion model
Higher order terms
Coalescent model
Summary
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
6 / 37
Introduction
Diffusion model
Coalescent model
Higher order terms
Summary
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
6 / 37
Introduction
Diffusion model
Coalescent model
Higher order terms
Summary
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
6 / 37
Introduction
Diffusion model
Coalescent model
Higher order terms
Summary
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
6 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
Likelihood (10−15)
0
Before Padé summation
3
2
1
0
0
20
40
60
80
100
ρ
6 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
Likelihood (10−15)
0
Before Padé summation
3
2
1
0
1
0
20
40
60
80
100
ρ
6 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
2
Before Padé summation
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
Likelihood (10−15)
0
3
2
1
0
1
0
20
40
60
80
100
ρ
6 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a systematic and automatable method to
compute higher order terms: q1 (c), q2 (c), q3 (c), . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
2
3
Before Padé summation
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
Likelihood (10−15)
0
3
2
1
0
1
0
20
40
60
80
100
ρ
6 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a simple, systematic and automatable
method to compute higher order terms: q1 , q2 , q3 , . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
−15
)
3
Likelihood (10
After Padé summation
2
1
0
0
20
40
60
80
100
ρ
7 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a simple, systematic and automatable
method to compute higher order terms: q1 , q2 , q3 , . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
0
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
−15
)
3
Likelihood (10
After Padé summation
2
1
0
0
20
40
60
80
100
ρ
7 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a simple, systematic and automatable
method to compute higher order terms: q1 , q2 , q3 , . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
0
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
−15
)
3
Likelihood (10
After Padé summation
1
2
1
0
0
20
40
60
80
100
ρ
7 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a simple, systematic and automatable
method to compute higher order terms: q1 , q2 , q3 , . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
0
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
−15
)
3
Likelihood (10
After Padé summation
1
2
1
2
0
0
20
40
60
80
100
ρ
7 / 37
Introduction
Diffusion model
Coalescent model
Summary
Higher order terms
(Jenkins & Song, 2012)
We have developed a simple, systematic and automatable
method to compute higher order terms: q1 , q2 , q3 , . . ..
A technique known as Padé summation guarantees that our
asymptotic series converges exactly to the truth, for any ρ.
The method generalizes to handle missing alleles.
The method generalizes to incorporate selection at one locus.
4
0
Example
7
c = 10
2 1 , θA = θB = 0.01
(symmetric mutation).
3
)
3
−15
Likelihood (10
After Padé summation
1
2
1
2
0
0
20
40
60
80
100
ρ
7 / 37
Introduction
Diffusion model
Coalescent model
Summary
Intriguing observation
Reminder: Asymptotic expansion
q(c) = q0 (c) +
q1 (c) q2 (c)
+ ...,
+
ρ
ρ2
8 / 37
Introduction
Diffusion model
Coalescent model
Summary
Intriguing observation
Reminder: Asymptotic expansion
q(c) = q0 (c) +
q1 (c) q2 (c)
+ ...,
+
ρ
ρ2
Reminder: q0 (c)
q0 (c) = q A (c A )q B (c B ) is a simple linear combination of products of
one-locus sampling distributions, and universal—independent of the
assumed mutation model.
8 / 37
Introduction
Diffusion model
Coalescent model
Summary
Intriguing observation
Reminder: Asymptotic expansion
q(c) = q0 (c) +
q1 (c) q2 (c)
+ ...,
+
ρ
ρ2
Reminder: q0 (c)
q0 (c) = q A (c A )q B (c B ) is a simple linear combination of products of
one-locus sampling distributions, and universal—independent of the
assumed mutation model.
Observation: The same is true of q1 (c).
X cij c A
B
q1 (c) =
q (c A )q (c B ) +
q A (c A − e i )q B (c B − e j )
2
2
i,j
X ci· X c·j B
A
A
− q (c B )
q (c A − e i ) − q (c A )
q B (c B − e j ).
2
2
i
[e i =
(0 . . . , 0, 1, 0, . . . , 0)T ,
j
a unit vector with a 1 in the ith position.] 8 / 37
Introduction
Diffusion model
1
Coalescent model
2
3
Summary
4
The standard coalescent with recombination
For large recombination rates, ARGs are typically very complicated,
containing many recombination events.
9 / 37
Introduction
Diffusion model
1
Coalescent model
2
3
Summary
4
Counterintuitive
However, we in fact expect the dynamics to be easier to study for
large recombination rates, since the loci under consideration would
then be less dependent.
9 / 37
Introduction
Diffusion model
1
Coalescent model
2
3
Summary
4
Conjecture
There exists a simpler stochastic process that describes the
important dynamics of the ARG for large recombination rates, with
q1 (c) capturing its sampling distribution.
9 / 37
Introduction
Diffusion model
Coalescent model
Summary
Duality
1
Frequency
0
Time
Conjecture
Furthermore, we should be able to make a similar statement about
the Wright-Fisher diffusion, via duality.
10 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
Outline of approach
1
Start with a two-locus Moran model.
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
Outline of approach
1
Start with a two-locus Moran model.
2
Change coordinates from haplotype frequencies to marginal
allele frequencies and coefficients of linkage disequilibrium
(cf. Ohta & Kimura, 1969).
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
Outline of approach
1
Start with a two-locus Moran model.
2
Change coordinates from haplotype frequencies to marginal
allele frequencies and coefficients of linkage disequilibrium
(cf. Ohta & Kimura, 1969).
3
Suppose that ρβ = 4N β r is fixed as N → ∞, where
0 < β < 1—instead of the usual β = 1.
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new diffusion model
Goal: Derive a diffusion model which is
simple to describe, with
a closed-form sampling distribution, which
agrees with the “truth” [up to O(ρ−2 )]: q0 (c) + q1 (c)/ρ .
Outline of approach
1
Start with a two-locus Moran model.
2
Change coordinates from haplotype frequencies to marginal
allele frequencies and coefficients of linkage disequilibrium
(cf. Ohta & Kimura, 1969).
3
4
Suppose that ρβ = 4N β r is fixed as N → ∞, where
0 < β < 1—instead of the usual β = 1.
Take the diffusion limit of the fluctuations of the coordinates
about the deterministic limit.
11 / 37
Introduction
Diffusion model
Coalescent model
Summary
The Wright-Fisher diffusion
1
Frequency, X
dX = µ(X )dt + σ(X )dW ,
X = (Xij ),
i, j, ∈ {A, C, G, T }.
0
Time
The (two-locus) Wright-Fisher diffusion
X
State space: ∆ = x = (xij ) ∈ [0, 1]K ×L |
xij = 1 .
i,j
12 / 37
Introduction
Diffusion model
Coalescent model
Summary
The Wright-Fisher diffusion
1
Frequency, X
dX = µ(X )dt + σ(X )dW ,
X = (Xij ),
i, j, ∈ {A, C, G, T }.
0
Time
The (two-locus) Wright-Fisher diffusion
X
State space: ∆ = x = (xij ) ∈ [0, 1]K ×L |
xij = 1 .
i,j
Drift coefficient
ρ
µij (x) = − (xij − xi· x·j ) + (mutation terms; θA , θB )
2
12 / 37
Introduction
Diffusion model
Coalescent model
Summary
The Wright-Fisher diffusion
1
Frequency, X
dX = µ(X )dt + σ(X )dW ,
X = (Xij ),
i, j, ∈ {A, C, G, T }.
0
Time
The (two-locus) Wright-Fisher diffusion
X
State space: ∆ = x = (xij ) ∈ [0, 1]K ×L |
xij = 1 .
i,j
Drift coefficient
ρ
µij (x) = − (xij − xi· x·j ) + (mutation terms; θA , θB )
2
Diffusion coefficient:
2
σij,kl
(x) = xij (δij,kl − xkl ).
12 / 37
Introduction
Diffusion model
Coalescent model
Summary
The Wright-Fisher diffusion
1
Frequency, X
dX = µ(X )dt + σ(X )dW ,
X = (Xij ),
i, j, ∈ {A, C, G, T }.
0
Time
Sampling distribution
Y cij
q(c) = E X .
ij
i,j
Using a standard result: E[Lf (X )] = 0, we get a linear system
of equation for the moments of X .
But this system grows exponentially in the sample size.
So we need an approximation.
12 / 37
Introduction
Diffusion model
Coalescent model
Summary
How to derive this diffusion?
Classical approach
Start from a finite population model of size N.
Let N → ∞ (possibly after a rescaling of time).
Rates of mutation and recombination are assumed to be such
that they occur at O(1) in the diffusion limit.
Population size, N
Time −→
13 / 37
Introduction
Diffusion model
Coalescent model
Summary
How to derive this diffusion?
Classical approach
Start from a finite population model of size N.
Let N → ∞ (possibly after a rescaling of time).
Rates of mutation and recombination are assumed to be such
that they occur at O(1) in the diffusion limit.
Population size, N
Time −→
13 / 37
Introduction
Diffusion model
Coalescent model
Summary
How to derive this diffusion?
Classical approach
Start from a finite population model of size N.
Let N → ∞ (possibly after a rescaling of time).
Rates of mutation and recombination are assumed to be such
that they occur at O(1) in the diffusion limit.
Population size, N
Time −→
13 / 37
Introduction
Diffusion model
Coalescent model
Summary
How to derive this diffusion?
Classical approach
Start from a finite population model of size N.
Let N → ∞ (possibly after a rescaling of time).
Rates of mutation and recombination are assumed to be such
that they occur at O(1) in the diffusion limit.
Population size, N
Time −→
13 / 37
Introduction
Diffusion model
Coalescent model
Summary
1. Moran model
1
2
3
4
5
6
7
Time
Rates
Resampling
Mutation (locus A)
Mutation (locus B)
Recombination
N 2 /2
θA /2
θB /2
ρ/2
14 / 37
Introduction
Diffusion model
Coalescent model
Summary
2. Change coordinates (Ohta & Kimura, 1969)
(N)
Old system
Xij
,
i ∈ {1, 2, . . . , K }, j ∈ {1, 2, . . . , L}
15 / 37
Introduction
Diffusion model
Coalescent model
Summary
2. Change coordinates (Ohta & Kimura, 1969)
(N)
Old system
Xij
,
i ∈ {1, 2, . . . , K }, j ∈ {1, 2, . . . , L}
(N)
(N)
(N)
(N)
(N)
(N) (N)
New system
(Xi· ), (X·j ), (Dij ) , Dij := Xij − Xi· X·j .
15 / 37
Introduction
Diffusion model
Coalescent model
Summary
2. Change coordinates (Ohta & Kimura, 1969)
(N)
Old system
Xij
,
i ∈ {1, 2, . . . , K }, j ∈ {1, 2, . . . , L}
(N)
(N)
(N)
(N)
(N)
(N) (N)
New system
(Xi· ), (X·j ), (Dij ) , Dij := Xij − Xi· X·j .
Diffusion limit
#
K
θA X A (N) θA (N)
Pki Xk · − Xi·
dt + o(dt),
| X] =
2
2
k=1
"
#
L
θB X B (N) θB (N)
| X] =
Plj X·l − X·j
dt + o(dt),
2
2
l=1
"
K
ρ (N)
θA X A (N) θA (N)
(N)
| X ] = − Dij − Dij +
Pki Dkj − Dij
2
2
2
k=1
#
L
θB X B (N) θB (N)
−1
+
Plj Dil − Dij + O(N ) dt + o(dt)
2
2
"
(N)
E[∆Xi·
(N)
E[∆X·j
(N)
E[∆Dij
l=1
15 / 37
Introduction
Diffusion model
Coalescent model
Summary
3. Rescale recombination, ρ
Suppose ρβ = ρN β−1 = 4N β r is fixed as N → ∞, where 0 < β < 1.
Rescale time to capture this fast behaviour: tnew = N 1−β told .
Diffusion limit
#
K
θA X A (N) θA (N)
| X] =
Pki Xk · − Xi·
dt + o(dt),
2
2
k=1
"
#
L
θB X B (N) θB (N)
Plj X·l − X·j
dt + o(dt),
| X] =
2
2
l=1
"
K
ρ (N)
θ X A (N) θA (N)
(N)
| X ] = − Dij − Dij + A
Pki Dkj − Dij
2
2
2
k=1
#
L
θB X B (N) θB (N)
−1
+
Plj Dil − Dij + O(N ) dt + o(dt)
2
2
"
(N)
E[∆Xi·
(N)
E[∆X·j
(N)
E[∆Dij
l=1
16 / 37
Introduction
Diffusion model
Coalescent model
Summary
3. Rescale recombination, ρ
Suppose ρβ = ρN β−1 = 4N β r is fixed as N → ∞, where 0 < β < 1.
Rescale time to capture this fast behaviour: tnew = N 1−β told .
Diffusion limit
#
K
θA X A (N) θA (N)
| X] =
Pki Xk· − Xi·
dt + o(dt),
2
2
k=1
"
#
L
θB X B (N) θB (N)
Plj X·l − X·j
dt + o(dt),
| X] =
2
2
l=1
"
K
ρβ N 1−β (N)
θ X A (N) θA (N)
(N)
| X] = −
Dij − Dij + A
Pki Dkj − Dij
2
2
2
k=1
#
L
θB X B (N) θB (N)
−1
+
Plj Dil − Dij + O(N ) dt + o(dt)
2
2
"
(N)
E[∆Xi·
(N)
E[∆X·j
(N)
E[∆Dij
l=1
16 / 37
Introduction
Diffusion model
Coalescent model
Summary
3. Rescale recombination, ρ
Suppose ρβ = ρN β−1 = 4N β r is fixed as N → ∞, where 0 < β < 1.
Rescale time to capture this fast behaviour: tnew = N 1−β told .
Diffusion limit
#
K
θA X A (N) θA (N)
dt
| X] =
Pki Xk· − Xi·
+ o(dt),
2
2
N 1−β
k=1
"
#
L
dt
θB X B (N) θB (N)
Plj X·l − X·j
+ o(dt),
| X] =
2
2
N 1−β
l=1
"
K
ρβ N 1−β (N)
θ X A (N) θA (N)
(N)
| X] = −
Dij − Dij + A
Pki Dkj − Dij
2
2
2
k=1
#
L
θB X B (N) θB (N)
dt
−1
+
Plj Dil − Dij + O(N )
+ o(dt)
2
2
N 1−β
"
(N)
E[∆Xi·
(N)
E[∆X·j
(N)
E[∆Dij
l=1
16 / 37
Introduction
Diffusion model
Coalescent model
Summary
4. Seek a diffusion limit
Diffusion limit
(N)
E[∆Xi·
(N)
E[∆X·j
(N)
E[∆Dij
| X] = O
1
dt + o(dt),
N 1−β
1
dt + o(dt),
| X] = O
1−β
N
ρβ (N)
1
| X ] = − Dij + O
dt + o(dt)
2
N 1−β
17 / 37
Introduction
Diffusion model
Coalescent model
Summary
4. Seek a diffusion limit
Diffusion limit
E[∆Xi· | X ] = o(dt),
E[∆X·j | X ] = o(dt),
h ρ
i
β
E[∆Dij | X ] = − Dij dt + o(dt)
2
after N → ∞.
17 / 37
Introduction
Diffusion model
Coalescent model
Summary
4. Seek a diffusion limit
Diffusion limit
E[∆Xi· | X ] = o(dt),
E[∆X·j | X ] = o(dt),
h ρ
i
β
E[∆Dij | X ] = − Dij dt + o(dt)
2
after N → ∞.
The description is completed by finding the limiting covariance
matrix.
17 / 37
Introduction
Diffusion model
Coalescent model
Summary
4. Seek a diffusion limit
Diffusion limit
E[∆Xi· | X ] = o(dt),
E[∆X·j | X ] = o(dt),
h ρ
i
β
E[∆Dij | X ] = − Dij dt + o(dt)
2
after N → ∞.
The description is completed by finding the limiting covariance
matrix.
But—on this timescale it is 0!
17 / 37
Introduction
Diffusion model
Coalescent model
Summary
Diffusion limits
1
D
0.5
D
0
?
−0.5
−1
Time
Wright-Fisher
diffusion
18 / 37
Introduction
Diffusion model
Coalescent model
Summary
D
1
1
0.5
0.5
0
?
−0.5
D
D
Diffusion limits
0
−0.5
−1
−1
Time
Wright-Fisher
diffusion
Time
∞-population
18 / 37
Introduction
Diffusion model
Coalescent model
Summary
1
1
0.5
0.5
0.5
0
0
0
?
−0.5
D
D
1
D
D
Diffusion limits
−0.5
−1
−0.5
−1
Time
Wright-Fisher
diffusion
−1
Time
Intermediate
limit?
Time
∞-population
18 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary so far
If
(N)
(N)
(N) (N)
(N)
M (N) = (Xi· ), (X·j ), (Dij = Xij − Xi· X·j )
then
n
o
d
M (N) → M := ((Xi· (0)), (X·j (0)), (Dij (0)e−ρβ t/2 )0 : t ≥ 0 ,
as N → ∞.
This is a law-of-large-numbers result.
(Baake & Herms, 2008)
19 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary so far
If
(N)
(N)
(N) (N)
(N)
M (N) = (Xi· ), (X·j ), (Dij = Xij − Xi· X·j )
then
n
o
d
M (N) → M := ((Xi· (0)), (X·j (0)), (Dij (0)e−ρβ t/2 )0 : t ≥ 0 ,
as N → ∞.
This is a law-of-large-numbers result.
(Baake & Herms, 2008)
We really want a central limit theorem.
19 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary so far
If
(N)
(N)
(N) (N)
(N)
M (N) = (Xi· ), (X·j ), (Dij = Xij − Xi· X·j )
then
n
o
d
M (N) → M := ((Xi· (0)), (X·j (0)), (Dij (0)e−ρβ t/2 )0 : t ≥ 0 ,
as N → ∞.
This is a law-of-large-numbers result.
(Baake & Herms, 2008)
We really want a central limit theorem.
So we should be asking: what is the diffusion limit of
U (N) (t) := N (1−β)/2 [M (N) (t) − M(t)]?
19 / 37
Introduction
Diffusion model
Coalescent model
Summary
CLTs for density-dependent population processes
Theorem [Ethier & Kurtz, 1986, Ch. 11; Kang et al., 2014]
Suppose that U (N) (0) → U(0) as N → ∞, and M(t) the solution to
dM(t)
= w (M(t))
dt
exists, for some w .
20 / 37
Introduction
Diffusion model
Coalescent model
Summary
CLTs for density-dependent population processes
Theorem [Ethier & Kurtz, 1986, Ch. 11; Kang et al., 2014]
Suppose that U (N) (0) → U(0) as N → ∞, and M(t) the solution to
dM(t)
= w (M(t))
dt
exists, for some w . Then [under some regularity conditions]
20 / 37
Introduction
Diffusion model
Coalescent model
Summary
CLTs for density-dependent population processes
Theorem [Ethier & Kurtz, 1986, Ch. 11; Kang et al., 2014]
Suppose that U (N) (0) → U(0) as N → ∞, and M(t) the solution to
dM(t)
= w (M(t))
dt
exists, for some w . Then [under some regularity conditions]
d
sup |M (N) (s) − M(s)| → 0,
s≤t
20 / 37
Introduction
Diffusion model
Coalescent model
Summary
CLTs for density-dependent population processes
Theorem [Ethier & Kurtz, 1986, Ch. 11; Kang et al., 2014]
Suppose that U (N) (0) → U(0) as N → ∞, and M(t) the solution to
dM(t)
= w (M(t))
dt
exists, for some w . Then [under some regularity conditions]
d
sup |M (N) (s) − M(s)| → 0,
s≤t
(N) d
→ U, where
Z t
Z t
U(t) = U(0) +
[∇w (M(s))]U(s)ds +
σ(M(s))dW (s),
and U
0
0
and σ is such that
N
1−β
[M
(N)
Z
]t −
t
d
σ(M (N) (s))σ(M (N) (s))0 ds → 0.
0
20 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Goals
1
Identify w , which supplies the drift part of U.
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Goals
1
Identify w , which supplies the drift part of U.
2
Identify σ, which supplies the diffusion part of U.
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Goals
1
Identify w , which supplies the drift part of U.
2
Identify σ, which supplies the diffusion part of U.
3
[Check regularity requirements.]
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Goals
1
Identify w , which supplies the drift part of U.
2
Identify σ, which supplies the diffusion part of U.
3
[Check regularity requirements.]
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Goals
1
Identify w , which supplies the drift part of U.
2
Identify σ, which supplies the diffusion part of U.
3
[Check regularity requirements.]
Sketch proof.
Recall:
So:
E[∆Xi· | X ] = o(dt),
E[∆X·j | X ] = o(dt),
h ρ
i
β
E[∆Dij | X ] = − Dij dt + o(dt)
2
ρβ 0
Drift of M :
w (M) = 0, 0, − D
2
ρβ 0
(N)
(N)
Drift of M
:
w (M) = 0, 0, − D + O(N β−1 )
2
21 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
Consider: U (N) (t) = N (1−β)/2
h
,
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
,
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
Z t
+
[w (N) (M (N) (s)) − w (M(s))]ds
,
0
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
Z t
(N)
(N)
(N)
+
[w (M (s)) − w (M(s))]ds + R (t) ,
0
where R
(N)
(t) := M
(N)
(t) − M
(N)
Z
(0) −
t
w (N) (M (N) (s))ds.
0
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
Z t
(N)
(N)
(N)
+
[w (M (s)) − w (M(s))]ds + R (t) ,
0
where R
(N)
(t) := M
(N)
(t) − M
(N)
Z
t
(0) −
w (N) (M (N) (s))ds.
0
1st term
We assumed U (N) (0) → U(0) as N → ∞.
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
Z t
(N)
(N)
(N)
+
[w (M (s)) − w (M(s))]ds + R (t) ,
0
2nd term
Z t
Z
(N) t
(N)
(N)
(N)
where
R (t) [w
:= M (M(t)
(0)(M(s))]ds
−
w (N) (M (N) (s))ds.
(N) − M
N (1−β)/2
(s))
−
w
3
3
0
0
Z th
ρβ (N)
β−1
(1−β)/2
− [D (s) − D(s)] + O(N
=N
2
0
Z th
i
ρβ (N)
=
− U 3 (s) + O(N (β−1)/2 ) ds
2
0
Z t
ρβ
d
→−
U 3 (s)ds,
N → ∞.
2 0
i
) ds
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Sketch proof (cont.).
h
Consider: U (N) (t) = N (1−β)/2 [M (N) (0) − M(0)]
Z t
(N)
(N)
(N)
+
[w (M (s)) − w (M(s))]ds + R (t) ,
0
where R
(N)
(t) := M
(N)
(t) − M
(N)
Z
(0) −
t
w (N) (M (N) (s))ds.
0
3rd term
“The difference between the evolution of the Moran process and
its expectation.” Key observation: R (N) (t) is a martingale.
Appeal to the martingale CLT to characterise its limit.
In other words: we know σ(M(t)).
22 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Putting all this together:
Z
Z t
ρβ t
(N)
0
U (t) → U(0) −
(0, 0, 1) ◦ U(s)ds +
σ(M(s))dW (s) .
2 0
0
23 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Putting all this together:
Z
Z t
ρβ t
(N)
0
U (t) → U(0) −
(0, 0, 1) ◦ U(s)ds +
σ(M(s))dW (s) .
2 0
0
Apart from a (complicated, time-evolving) covariance term, Dij (t)
follows an Ornstein-Uhlenbeck process!
23 / 37
Introduction
Diffusion model
Coalescent model
Summary
Main aim
Find the diffusion limit of U (N) (t) = N (1−β)/2 [M (N) (t) − M(t)].
Putting all this together:
Z
Z t
ρβ t
(N)
0
U (t) → U(0) −
(0, 0, 1) ◦ U(s)ds +
σ(M(s))dW (s) .
2 0
0
Apart from a (complicated, time-evolving) covariance term, Dij (t)
follows an Ornstein-Uhlenbeck process!
1
0.5
D
(N)
(t) ≈ D(0)e
−ρβ t/2
D
Retracing our steps. . .
+N
(β−1)/2
U D (t).
0
−0.5
−1
Time
23 / 37
Introduction
Diffusion model
Coalescent model
Summary
Stationary distribution
Tracing our steps backwards, we can derive an approximate
stationary distribution:
1
D ∼ Normal 0, [Xi· (0)X·j (0)(δik − Xk· (0))(δjl − X·l (0))]ij,kl .
ρ
24 / 37
Introduction
Diffusion model
Coalescent model
Summary
Stationary distribution
Tracing our steps backwards, we can derive an approximate
stationary distribution:
1
D ∼ Normal 0, [Xi· (0)X·j (0)(δik − Xk· (0))(δjl − X·l (0))]ij,kl .
ρ
Sampling distribution
Tracing our steps further,
a sampling distribution:
we can
obtain
Y
Y c
qGaussian (c) = E Xij ij = E (Dij + Xi· X·j )cij = . . . .
i,j
= q0 (c) +
i,j
q1 (c)
+ ....
ρ
Introduction
Diffusion model
Coalescent model
Summary
Stationary distribution
Tracing our steps backwards, we can derive an approximate
stationary distribution:
1
D ∼ Normal 0, [Xi· (0)X·j (0)(δik − Xk· (0))(δjl − X·l (0))]ij,kl .
ρ
Sampling distribution
Tracing our steps further,
we can
obtain
a sampling distribution:
Y c
Y
qGaussian (c) = E Xij ij = E (Dij + Xi· X·j )cij = . . . .
i,j
= q0 (c) +
i,j
q1 (c)
+ ....
ρ
24 / 37
Introduction
Diffusion model
Coalescent model
Summary
Accuracy
“Truth”:
q(c) ≈ q0 (c) +
q1 (c) q2 (c)
qλ (x)
+
+ ... +
,
ρ
ρλ
ρ2
(G)
Gaussian model: q (G) (c) ≈ q0 (c) +
ρ = 100
λ
0
1
2
4
6
Type
of sum
True
Gaussian
True
Gaussian
True
Gaussian
True
Gaussian
True
Gaussian
Φ(1)
0.50
0.50
0.74
0.74
0.95
0.64
1.00
0.64
1.00
0.64
Φ(10)
0.72
0.72
0.95
0.95
1.00
0.99
1.00
0.99
1.00
0.99
(G)
q (x)
q1 (c) q2 (c)
+
+ ... + λ λ .
2
ρ
ρ
ρ
ρ = 200
Φ(100)
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
Φ(1)
0.54
0.54
0.90
0.90
1.00
0.85
1.00
0.83
1.00
0.83
Φ(10)
0.95
0.95
0.99
0.99
1.00
1.00
1.00
1.00
1.00
1.00
Φ(100)
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
25 / 37
Introduction
Diffusion model
Coalescent model
Summary
Accuracy
“Truth”:
q(c) ≈ q0 (c) +
q1 (c) q2 (c)
qλ (x)
+
+ ... +
,
ρ
ρλ
ρ2
(G)
Gaussian model: q (G) (c) ≈ q0 (c) +
ρ = 25
λ
0
1
2
4
6
Type
of sum
True
Gaussian
True
Gaussian
True
Gaussian
True
Gaussian
True
Gaussian
Φ(1)
0.39
0.39
0.51
0.51
0.59
0.50
0.83
0.51
0.89
0.49
Φ(10)
0.58
0.58
0.75
0.75
0.91
0.73
0.99
0.72
0.99
0.71
(G)
q (x)
q1 (c) q2 (c)
+
+ ... + λ λ .
2
ρ
ρ
ρ
ρ = 50
Φ(100)
1.00
1.00
0.96
0.96
0.97
0.97
1.00
1.00
1.00
0.99
Φ(1)
0.49
0.49
0.59
0.59
0.77
0.50
0.95
0.50
0.99
0.50
Φ(10)
0.63
0.63
0.84
0.84
0.98
0.86
1.00
0.80
1.00
0.79
Φ(100)
1.00
1.00
0.99
0.99
1.00
1.00
1.00
1.00
1.00
1.00
26 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
Wright-Fisher model
One could obtain the same diffusion limit starting from a
Wright-Fisher model.
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
Wright-Fisher model
One could obtain the same diffusion limit starting from a
Wright-Fisher model.
CLTs for the Wright-Fisher model have been studied extensively
by Norman (1972, 1975) and Nagylaki (1986, 1990).
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
Wright-Fisher model
One could obtain the same diffusion limit starting from a
Wright-Fisher model.
CLTs for the Wright-Fisher model have been studied extensively
by Norman (1972, 1975) and Nagylaki (1986, 1990).
Additional complication: the Wright-Fisher model in continuous
time is non-Markovian.
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
Remarks
No dependence on β in these expressions.
Reduced a difficult likelihood computation to the moments of a
Normal distribution.
This strong recombination result complements analogous
results for strong mutation and strong selection
(Feder et al., 2014; Feller, 1951; Norman, 1972, 1975; Kaplan et
al., 1988; Nagylaki, 1986, 1990; Wakeley & Sargsyan, 2009).
Wright-Fisher model
One could obtain the same diffusion limit starting from a
Wright-Fisher model.
CLTs for the Wright-Fisher model have been studied extensively
by Norman (1972, 1975) and Nagylaki (1986, 1990).
Additional complication: the Wright-Fisher model in continuous
time is non-Markovian.
Q: Are there simple, general CLTs for non-Markovian
density-dependent population processes?
27 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Question
Can we give a similar treatment to the ancestral recombination
graph?
1
2
3
4
28 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Question
Can we give a similar treatment to the ancestral recombination
graph? Yes—via a coupling argument.
1
2
3
4
28 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Question
Can we give a similar treatment to the ancestral recombination
graph? Yes—via a coupling argument.
Toy example: sample size
c = 4.
Red: Lineages ancestral to
the sample at locus B.
Time
Blue: Lineages ancestral
to the sample at locus A.
1
2
3
4
28 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Question
Can we give a similar treatment to the ancestral recombination
graph? Yes—via a coupling argument.
Toy example: sample size
c = 4.
Red: Lineages ancestral to
the sample at locus B.
Time
Blue: Lineages ancestral
to the sample at locus A.
1
2
3
4
Reminder: q0 (c)
q0 (c) = q A (c A )q B (c B ) corresponds to unlinked loci (ρ = ∞).
28 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Question
Can we give a similar treatment to the ancestral recombination
graph? Yes—via a coupling argument.
Toy example: sample size
c = 4.
Red: Lineages ancestral to
the sample at locus B.
Time
Blue: Lineages ancestral
to the sample at locus A.
1
2
3
4
Reminder: q0 (c)
q0 (c) = q A (c A )q B (c B ) corresponds to unlinked loci (ρ = ∞).
28 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
What about q1 (c)?
Consider what happens if we start to reduce ρ down from ∞.
29 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
What about q1 (c)?
Consider what happens if we start to reduce ρ down from ∞.
There is a short delay going backwards
before lineages all recombine apart.
1
2
3
4
29 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
What about q1 (c)?
Consider what happens if we start to reduce ρ down from ∞.
There is a short delay going backwards
before lineages all recombine apart.
Some lineages may recoalesce further
back in time.
1
2
3
4
29 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
What about q1 (c)?
Consider what happens if we start to reduce ρ down from ∞.
There is a short delay going backwards
before lineages all recombine apart.
Some lineages may recoalesce further
back in time.
q1 (c) represents the effects of any single
nontrivial event in the ARG that could
distinguish its sampling distribution from
that of two independent coalescent
trees.
1
2
3
4
29 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Possible “nontrivial events”
1
A coalescence prior to the first time
all lineages have recombined (T ).
T
1
2
3
4
30 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Possible “nontrivial events”
1
A coalescence prior to the first time
all lineages have recombined (T ).
2
A coalescence that would have
happened had the marginal trees
been coalescing independently, but
could not have happened in our
ARG before time T . (Call these
“prohibited coalescences”.)
T
1
2
3
4
30 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new coalescent model
Possible “nontrivial events”
1
A coalescence prior to the first time
all lineages have recombined (T ).
2
A coalescence that would have
happened had the marginal trees
been coalescing independently, but
could not have happened in our
ARG before time T . (Call these
“prohibited coalescences”.)
T
1
2
3
4
In fact, these are the only events (or nonevents) of relevance.
30 / 37
Introduction
Diffusion model
Coalescent model
Summary
Trivial event
Another “nontrivial” event?
1
First coalescence:
O(1).
1
2
3
4
31 / 37
Introduction
Diffusion model
Coalescent model
Summary
Trivial event
Another “nontrivial” event?
1
First coalescence:
2
Second coalescence:
O(1).
O(ρ−1 ).
1
2
3
4
31 / 37
Introduction
Diffusion model
Coalescent model
Summary
Trivial event
Another “nontrivial” event?
1
First coalescence:
O(1).
2
Second coalescence:
O(ρ−1 ).
3
Third coalescence:
O(ρ−1 ).
1
2
3
4
31 / 37
Introduction
Diffusion model
Coalescent model
Summary
Trivial event
Another “nontrivial” event?
1
First coalescence:
O(1).
2
Second coalescence:
O(ρ−1 ).
3
Third coalescence:
O(ρ−1 ).
Overall probability of this event is
O(ρ−2 )—i.e. negligible.
1
2
3
4
A coupling between the ARG and a pair of independent
coalescent trees can make these arguments rigorous.
31 / 37
Introduction
Diffusion model
Coalescent model
Summary
Coupling argument (outline)
T
1
2
3
1
4
F1 : Type 1 failure
2
3
F2 : Type 2 failure
4
1
2
3
4
F3 : Type 3 failure
Outline of argument
Show that:
P(F1 ) =
1 c
ρ 2
+O
1
ρ2
,
32 / 37
Introduction
Diffusion model
Coalescent model
Summary
Coupling argument (outline)
T
1
2
3
1
4
F1 : Type 1 failure
2
3
F2 : Type 2 failure
4
1
2
3
4
F3 : Type 3 failure
Outline of argument
Show that:
P(F1 ) =
P(F2 ) =
1 c
ρ 2
c
1
ρ 2
+O
,
+O
,
1
ρ2
1
ρ2
32 / 37
Introduction
Diffusion model
Coalescent model
Summary
Coupling argument (outline)
T
1
2
3
1
4
F1 : Type 1 failure
2
3
F2 : Type 2 failure
4
1
2
3
4
F3 : Type 3 failure
Outline of argument
Show that:
P(F1 ) =
P(F2 ) =
P(F3 ) =
1 c
ρ 2
c
1
ρ 2
c
1
ρ 2
+O
,
+O
,
+O
,
1
ρ2
1
ρ2
1
ρ2
32 / 37
Introduction
Diffusion model
Coalescent model
Summary
Coupling argument (outline)
T
1
2
3
1
4
F1 : Type 1 failure
2
3
4
F2 : Type 2 failure
1
2
3
4
F3 : Type 3 failure
Outline of argument
Show that:
P(F1 ) =
P(F2 ) =
P(F3 ) =
1 c
ρ 2
c
1
ρ 2
c
1
ρ 2
+O
+O
,
+O
,
1
ρ2
1
ρ2
1
ρ2
,
P(Fi ∩ Fj ) = O
1
ρ2
, i 6= j,
32 / 37
Introduction
Diffusion model
Coalescent model
Summary
Coupling argument (outline)
T
1
2
3
1
4
F1 : Type 1 failure
2
3
4
F2 : Type 2 failure
1
2
3
4
F3 : Type 3 failure
Outline of argument
Show that:
P(F1 ) =
P(F2 ) =
P(F3 ) =
1 c
ρ 2
c
1
ρ 2
c
1
ρ 2
+O
1
ρ2
,
P(Fi ∩ Fj ) = O
+O
,
+O
,
P(anyother
type of failure)
1
= O ρ2 .
1
ρ2
1
ρ2
1
ρ2
, i 6= j,
32 / 37
Introduction
Diffusion model
Coalescent model
Summary
Outline of argument (cont.)
q(c; ρ) = P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | F1{ ; ρ)
33 / 37
Introduction
Diffusion model
Coalescent model
Summary
Outline of argument (cont.)
q(c; ρ) = P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | F1{ ; ρ)
= P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | (F2 ∪ F3 ){ ; ∞)
33 / 37
Introduction
Diffusion model
Coalescent model
Summary
Outline of argument (cont.)
q(c; ρ) = P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | F1{ ; ρ)
= P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | (F2 ∪ F3 ){ ; ∞)
X cij
2
q(c | F1 ; ρ) =
c q(c − e ij ; ∞),
i,j
2
33 / 37
Introduction
Diffusion model
Coalescent model
Summary
Outline of argument (cont.)
q(c; ρ) = P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | F1{ ; ρ)
= P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | (F2 ∪ F3 ){ ; ∞)
X cij
2
q(c | F1 ; ρ) =
c q(c − e ij ; ∞),
i,j
q(c | F2 ; ρ) =
X
i
q(c | F3 ; ρ) =
X
j
2
ci·
2
c q(c A − e i ; ∞)q(c B ; ∞),
2
c·j 2
c q(c A ; ∞)q(c B − e j ; ∞),
2
33 / 37
Introduction
Diffusion model
Coalescent model
Summary
Outline of argument (cont.)
q(c; ρ) = P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | F1{ ; ρ)
= P(F1 )q(c | F1 ; ρ) + P(F1{ )q(c | (F2 ∪ F3 ){ ; ∞)
X cij
2
q(c | F1 ; ρ) =
c q(c − e ij ; ∞),
i,j
q(c | F2 ; ρ) =
X
i
q(c | F3 ; ρ) =
X
j
2
ci·
2
c q(c A − e i ; ∞)q(c B ; ∞),
2
c·j 2
c q(c A ; ∞)q(c B − e j ; ∞),
2
1
q(c | (F2 ∪ F3 ) ; ∞) =
[q(c; ∞)
1 − P(F2 ) − P(F3 )
− P(F2 )q(c | F2 ; ∞) − P(F3 )q(c | F3 ; ∞)] .
{
33 / 37
Introduction
Diffusion model
Coalescent model
Summary
Theorem.
The sampling distribution of the loose linkage coalescent is
q1 (c)
1
.
q(c) = q0 (c) +
+O
ρ
ρ2
34 / 37
Introduction
Diffusion model
Coalescent model
Summary
Theorem.
The sampling distribution of the loose linkage coalescent is
q1 (c)
1
.
q(c) = q0 (c) +
+O
ρ
ρ2
Explanation for the simple form of q1 (c)
A randomly chosen pair of haplotypes
coalesces before time T
q1 (c) =
z X cij
i,j
2
}|
A
{
B
q (c A − e i )q (c B − e j )
34 / 37
Introduction
Diffusion model
Coalescent model
Summary
Theorem.
The sampling distribution of the loose linkage coalescent is
q1 (c)
1
.
q(c) = q0 (c) +
+O
ρ
ρ2
Explanation for the simple form of q1 (c)
A randomly chosen pair of haplotypes
coalesces before time T
q1 (c) =
z X cij
i,j
2
}|
A
{
B
q (c A − e i )q (c B − e j )
34 / 37
Introduction
Diffusion model
Coalescent model
Summary
Theorem.
The sampling distribution of the loose linkage coalescent is
q1 (c)
1
.
q(c) = q0 (c) +
+O
ρ
ρ2
Explanation for the simple form of q1 (c)
Otherwise, the trees
are independent
A randomly chosen pair of haplotypes
coalesces before time T
q1 (c) =
{
z }|
c A
B
q (c A − e i )q (c B − e j ) +
q (c A )q (c B )
2
2
z X cij
i,j
}|
A
{
B
34 / 37
Introduction
Diffusion model
Coalescent model
Summary
Theorem.
The sampling distribution of the loose linkage coalescent is
q1 (c)
1
.
q(c) = q0 (c) +
+O
ρ
ρ2
Explanation for the simple form of q1 (c)
Otherwise, the trees
are independent
A randomly chosen pair of haplotypes
coalesces before time T
{
z }|
c A
B
q1 (c) =
q (c A − e i )q (c B − e j ) +
q (c A )q (c B )
2
2
i,j
X ci· X c·j − q B (c B )
q A (c A − e i ) − q A (c A )
q B (c B − e j ) .
2
2
i
j
|
{z
}
z X cij
}|
A
{
B
. . . with the restriction that no “prohibited coalescences” occur before time T
34 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new “loose linkage” coalescent
The previous decomposition picks out the important events in
the ARG.
35 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new “loose linkage” coalescent
The previous decomposition picks out the important events in
the ARG.
We can define a new coalescent process which keeps only
these events.
35 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new “loose linkage” coalescent
The previous decomposition picks out the important events in
the ARG.
We can define a new coalescent process which keeps only
these events.
35 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new “loose linkage” coalescent
The previous decomposition picks out the important events in
the ARG.
We can define a new coalescent process which keeps only
these events.
Algorithm: Loose linkage coalescent
1
With probability 1ρ 2c :
Choose a pair (uniformly) from the c haplotypes to coalesce.
Every lineage undergoes recombination until time T , with this
sole coalescence inserted randomly into the sequence of
recombinations.
Simulate the rest as two independent coalescent trees.
35 / 37
Introduction
Diffusion model
Coalescent model
Summary
A new “loose linkage” coalescent
The previous decomposition picks out the important events in
the ARG.
We can define a new coalescent process which keeps only
these events.
Algorithm: Loose linkage coalescent
1
With probability 1ρ 2c :
Choose a pair (uniformly) from the c haplotypes to coalesce.
Every lineage undergoes recombination until time T , with this
sole coalescence inserted randomly into the sequence of
recombinations.
Simulate the rest as two independent coalescent trees.
2
Otherwise:
Simulate from two independent coalescent trees conditioned not
to have any prohibited coalescences before time T , as described
earlier.
35 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
2
This structure can be exploited to derive simple approximations
to these models.
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
2
This structure can be exploited to derive simple approximations
to these models.
3
Our work also provides the first closed-form extension of Ewens
sampling formula for multilocus models.
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
2
This structure can be exploited to derive simple approximations
to these models.
3
Our work also provides the first closed-form extension of Ewens
sampling formula for multilocus models.
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
2
This structure can be exploited to derive simple approximations
to these models.
3
Our work also provides the first closed-form extension of Ewens
sampling formula for multilocus models.
Future work
Further generalizations:
More than two loci
Natural selection
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
Summary
1
Both the Wright-Fisher diffusion with recombination and the
ARG possess a deep and regular structure when the
recombination rate increases, which we have described.
2
This structure can be exploited to derive simple approximations
to these models.
3
Our work also provides the first closed-form extension of Ewens
sampling formula for multilocus models.
Future work
Further generalizations:
More than two loci
Natural selection
Better tools:
Duality between the two models?
“Separation of timescales” (cf. Möhle, 1998)
36 / 37
Introduction
Diffusion model
Coalescent model
Summary
References
Jenkins, P.A., Fearnhead, P., and Song, Y.S. (2015). “Tractable
stochastic models of evolution for weakly correlated loci.” Electronic
Journal of Probability, 20 (58): 1–26.
Jenkins, P.A. and Song, Y.S. (2012). “Padé approximants and exact
two-locus sampling distributions.” Ann. Appl. Prob., 22(2): 576–607.
Acknowledgements
Discussions with Song lab, Bob Griffiths, Charles Langley, Ben Peter,
John Pool, Nadia Singh
Simons Institute for the Theory of Computing
Isaac Newton Institute
Research supported in part by EPSRC (PAJ, PF), NIH (PAJ, YSS), Alfred
P. Sloan Research Fellowship (YSS), and a Packard Fellowship for
Science and Engineering (YSS).
37 / 37
Appendix
Covariances of the Moran model
lim (dt)−1 E[∆M (N) (τ) | M (N) (τ) = m]
dt→0
= N β−1 lim (dτ)−1 E[∆M (N) (τ) | M (N) (τ) = m] =: w (N) (m),
dτ→0
lim (dt)−1 cov[∆M (N) (τ) | M(τ) = m]
dt→0
β−1
=N
lim (dτ)−1 cov[∆M (N) (τ) | M (N) (τ) = m] =: N β−1 s(N) (m),
dτ→0
Thus, with m = (x1 , . . . , xK , y1 , . . . , yL , d11 , . . . , dKL ), we have
where
w (N) (m) = w (m) + O(N β−1 ),
0
ρβ
ρβ
w (m) = 0, . . . 0, 0, . . . 0, − d11 , . . . , − dKL ,
2 }
{z
| {z } | {z } | 2
K
L
K ×L
37 / 37
Appendix
Covariances of the Moran model (II)
s(N) (m) = s(m) + O(N −β ) is determined in a similar fashion:
sXX (m) sXY (m) sXD (m)
s(m) = sXY (m) sYY (m) sYD (m) ,
sXD (m) sYD (m) sDD (m)
where
[sXX (m)]ik = xi (δik − xk ),
[sYY (m)]jl = yj (δjl − yl ),
[sXY (m)]ij = dij ,
[sXD (m)]i,kl = dkl (δik − xi ) − xk dil ,
[sYD (m)]j,kl = dkl (δjl − yj ) − yl dkj ,
[sDD (m)]ij,kl = xi yj (δik − xk )(δjl − yl ) + dkj xi yl + dil xk yj
+ dij (xk yl − δik yl − δjl xk )
+ dkl (xi yj − δik yj − δjl xi ) + dij (δik δjl − dkl ).
37 / 37
© Copyright 2026 Paperzz