Stationary Distribution of the Linkage Disequilibrium
Coefficient r2
Wei Zhang, Jing Liu, Rachel Fewster and Jesse Goodman
Department of Statistics, The University of Auckland
December 1, 2015
Overview
1
Introduction
2
Background
3
Methodology
4
Results
5
Acknowledgements
6
References
Introduction
1
Linkage disequilibrium (LD) indicates the statistical dependence of
alleles at different loci in population genetics.
Introduction
December 1, 2015
3 / 35
Introduction
1
Linkage disequilibrium (LD) indicates the statistical dependence of
alleles at different loci in population genetics.
LD can be used to find genetic markers that might be linked to genes for
diseases and understand the evolutionary history.
Introduction
December 1, 2015
3 / 35
Introduction
1
2
Linkage disequilibrium (LD) indicates the statistical dependence of
alleles at different loci in population genetics.
r2 is a quantitative measure of LD and we are aiming to find its
stationary distribution under models for genetic drift.
Introduction
December 1, 2015
4 / 35
Introduction
1
2
3
Linkage disequilibrium (LD) indicates the statistical dependence of
alleles at different loci in population genetics.
r2 is a quantitative measure of LD and we are aiming to find its
stationary distribution under models for genetic drift.
Given some moments of the unknown distribution, the maximum
entropy (Maxent) principle can be used to approximate the density
function of r2 .
Introduction
December 1, 2015
4 / 35
Introduction
1
2
Linkage disequilibrium (LD) indicates the statistical dependence of
alleles at different loci in population genetics.
r2 is a quantitative measure of LD and we are aiming to find its
stationary distribution under models for genetic drift.
3
Given some moments of the unknown distribution, the maximum
entropy (Maxent) principle can be used to approximate the density
function of r2 .
4
The diffusion approximation is a powerful tool to compute certain
expectations at stationarity.
Introduction
December 1, 2015
4 / 35
Background
The TLD model (Liu, 2012)
Diffusion approximation
Maximum entropy principle
Background
December 1, 2015
5 / 35
Some terminologies in genetics
1
One locus is a position of a gene or significant DNA sequence on a
chromosome.
2
An allele is a variant form of a gene.
3
Diploid describes a cell or an organism that has paired chromosomes,
one from each parent.
4
A mutation is a permanent change in the DNA sequence.
5
Recombination is the production of offspring with combinations of
traits that differ from those found in either parent.
Background
The TLD model
December 1, 2015
6 / 35
Some terminologies in genetics
Background
The TLD model
December 1, 2015
7 / 35
Some terminologies in genetics
Background
The TLD model
December 1, 2015
8 / 35
The TLD model
TLD is short for ’two-locus diallelic model
with mutation and recombination’.
Background
The TLD model
December 1, 2015
9 / 35
The TLD model
TLD is short for ’two-locus diallelic model
with mutation and recombination’.
Model assumptions and notations:
The two possible alleles on each of the two loci are assumed to be
A1 , A2 and B1 , B2 , thus the four possible types of gamete are A1 B1 ,
A1 B2 , A2 B1 and A2 B2 .
Recombination rate C: Ai Bj + Am Bn ⇒ Ai Bn /Am Bj .
Equal mutation rate µ for both loci: A1 A2 and B1 B2 .
Background
The TLD model
December 1, 2015
9 / 35
The TLD model
Background
The TLD model
December 1, 2015
10 / 35
The TLD model
Table 1: The proportions of gametes in generation T and the expected proportions
in generation T + 1 in the population. N is the population size.
Generation
Background
Gamete
A1 B 1
A1 B 2
A2 B 1
A2 B2
T
x1
2N
x2
2N
x3
2N
x4
2N
T +1
φ1
φ2
φ3
φ4
The TLD model
December 1, 2015
11 / 35
The TLD model
Suppose
D(x) =
x1 x4
x2 x3
−
.
2N 2N
2N 2N
We have
x
x1
2
(1 − µ)2 +
2N
2N
x
x2
1
φ2 (x) =
(1 − µ)2 +
2N
2N
x
x3
1
φ3 (x) =
(1 − µ)2 +
2N
2N
x
x4
2
(1 − µ)2 +
φ4 (x) =
2N
2N
φ1 (x) =
Background
x3 x4 2
µ(1 − µ) +
µ − CD(x)(1 − 2µ)2
2N
2N
x4 x3 2
+
µ(1 − µ) +
µ + CD(x)(1 − 2µ)2
2N
2N
x4 x2 2
+
µ(1 − µ) +
µ + CD(x)(1 − 2µ)2
2N
2N
x3 x1 2
+
µ − CD(x)(1 − 2µ)2 .
µ(1 − µ) +
2N
2N
+
The TLD model
December 1, 2015
12 / 35
The TLD model
The transition probability of going from x = (x1 , x2 , x3 , x4 ) to
y = (y1 , y2 , y3 , y4 ) is:
pxy = P (y|x)
(2N )!
(φ1 (x))y1 (φ2 (x))y2 (φ3 (x))y3 (φ4 (x))y4 .
=
y1 !y2 !y3 !y4 !
Background
The TLD model
December 1, 2015
13 / 35
The TLD model
The transition probability of going from x = (x1 , x2 , x3 , x4 ) to
y = (y1 , y2 , y3 , y4 ) is:
pxy = P (y|x)
(2N )!
(φ1 (x))y1 (φ2 (x))y2 (φ3 (x))y3 (φ4 (x))y4 .
=
y1 !y2 !y3 !y4 !
The TLD model is an irreducible aperiodic Markov chain, thus there exists
a unique stationary distribution.
Background
The TLD model
December 1, 2015
13 / 35
Diffusion approximation
The main idea is to rescale the discrete state space and time space by
a factor related to the population size N , so that the gap between two
successive states in the new space is infinitesimal when N is large enough.
Background
Diffusion approximation
December 1, 2015
14 / 35
Diffusion approximation
The main idea is to rescale the discrete state space and time space by
a factor related to the population size N , so that the gap between two
successive states in the new space is infinitesimal when N is large enough.
Example
(2N )−1
1
2
State space {0, 1, 2, · · · , 2N } −−−−→ {0, 2N
, 2N
, · · · , 1}.
(2N )−1
Time space {0, 1, 2, · · · } −−−−→ {0δt, 1δt, 2δt, · · · }, where δt =
Background
Diffusion approximation
December 1, 2015
1
2N .
14 / 35
Diffusion approximation
When N is large enough, the new chain is approximately continuous. In the
derivation process, Taylor series expansion and the definition of derivative
are used to get two important results for the TLD model.
f 0 (x) = lim
∆x→0
Background
f (x+∆x)−f (x)
∆x
Diffusion approximation
December 1, 2015
15 / 35
The diffusion operator and master equation
The diffusion operator for the TLD model is
∂2
1
∂2
1
∂2
1
L = p (1 − p) 2 + q (1 − q) 2 +
p (1 − p) q (1 − q) + D (1 − 2p) (1 − 2q) − D2
2
∂p
2
∂q
2
∂D2
∂2
∂2
∂2
θ
∂
θ
∂
+ D (1 − 2p)
+ D (1 − 2q)
+ (1 − 2p)
+ (1 − 2q)
∂p∂q
∂p∂D
∂q∂D
4
∂p
4
∂q
∂
ρ
−D 1+ +θ
2
∂D
+D
and the master equation at stationarity is E {Lf (p, q, D)} = 0.
The master equation means the expected evolution over time of any nice
function of p, q and D is zero at stationarity.
Background
Diffusion approximation
December 1, 2015
16 / 35
The diffusion operator and master equation
The diffusion operator for the TLD model is
∂2
1
∂2
1
∂2
1
L = p (1 − p) 2 + q (1 − q) 2 +
p (1 − p) q (1 − q) + D (1 − 2p) (1 − 2q) − D2
2
∂p
2
∂q
2
∂D2
∂2
∂2
θ
∂
θ
∂
∂2
+ D (1 − 2p)
+ D (1 − 2q)
+ (1 − 2p)
+ (1 − 2q)
∂p∂q
∂p∂D
∂q∂D
4
∂p
4
∂q
∂
ρ
−D 1+ +θ
2
∂D
+D
and the master equation at stationarity is E {Lf (p, q, D)} = 0.
Here p and q are the frequencies of A1 and B1 , D = f11 − pq, f11 is
the frequency of gamete A1 B1 , ρ = 2N C, θ = 2N µ and f is any twice
continuously differentiable function with compact support.
Background
Diffusion approximation
December 1, 2015
17 / 35
A simple example
If letting f (p, q, D) = D, we can get that
ρ
Lf (p, q, D) = −D 1 + + θ
2
and
ρ
E {Lf (p, q, D)} = − 1 + + θ E (D) = 0
2
so
E (D) = 0.
Background
Diffusion approximation
December 1, 2015
18 / 35
Maximum entropy principle
Definition
Entropy is the quantitative measure of disorder in a system.
Suppose a random variable Z has K possible outcomes with probabilities
p1 , p2 , p3 , · · · , pK , the entropy is:
I (Z) = −
K
X
pi logK (pi ) .
i=1
Background
Maximum entropy principle
December 1, 2015
19 / 35
Maximum entropy principle
Definition
Entropy is the quantitative measure of disorder in a system.
Suppose a random variable Z has K possible outcomes with probabilities
p1 , p2 , p3 , · · · , pK , the entropy is:
I (Z) = −
K
X
pi logK (pi ) .
i=1
The maximum entropy principle states that the solution that maximises the
entropy is the most honest one.
Background
Maximum entropy principle
December 1, 2015
19 / 35
An example
In a coin toss experiment, suppose Pr(head)=p and Pr(tail)=1 − p, then
the entropy is:
I = −p log2 (p) − (1 − p) log2 (1 − p).
Background
Maximum entropy principle
December 1, 2015
20 / 35
An example
1.0
Relationship of the entropy and p
(0.5, 1.0)
0.6
0.2
0.4
Entropy
0.8
●
0.0
0.2
0.4
0.6
0.8
1.0
p
Background
Maximum entropy principle
December 1, 2015
21 / 35
Maximum entropy principle
The Maxent solution of an unknown probability
density function π (p) given
knowledge of n moments mi = E pi , i = 1, · · · , n is the solution π
en (p)
that maximizes:
Z
I=− π
en (p) log {e
πn (p)} dp
Ω
subject to
Z
mi =
Ω
pi π
en (p) dp for i = 0, 1, · · · , n
where Ω is the support of π.
Background
Maximum entropy principle
December 1, 2015
22 / 35
Maximum entropy principle
Considering the Lagrange function and Euler-Lagrange equation, then the
Maxent solution is:
π
en (p) = exp λ0 + λ1 p + λ2 p2 + · · · + λn pn
where λi , i = 0, · · · , n are the solutions of
(Z
n
X
2
n
exp λ0 + λ1 p + λ2 p + · · · + λn p dp −
λi mi
arg min
λ
Background
Ω
)
.
i=0
Maximum entropy principle
December 1, 2015
23 / 35
Linkage disequilibrium coefficient r2
The definition of r2 is:
r2 =
D2
p (1 − p) q (1 − q)
where p and q are the frequencies of A1 and B1 , D = f11 − pq and f11 is
the frequency of gamete A1 B1 .
Methodology
Analytic computation of the moments
December 1, 2015
24 / 35
Reformulation of the problem
Let u = 1 − 2p and v = 1 − 2q. Then the diffusion generator can be
rewritten as
L=
2
1
∂
1 − u2 1 − v 2 + Duv − D2
16
∂D2
∂2
∂2
∂2
1
∂
1
∂
1
∂
+ 4D
− 2Du
− 2Dv
− θu
− θv
−D 1+ ρ+θ
.
∂u∂v
∂D∂u
∂D∂v
2 ∂u
2 ∂v
2
∂D
∂2
∂2
1
1
1
1 − u2
+
1 − v2
+
2
∂u2
2
∂v 2
2
This reparameterization yields
r2 =
Methodology
D2
16D2
=
.
p (1 − p) q (1 − q)
(1 − u2 ) (1 − v 2 )
Analytic computation of the moments
December 1, 2015
25 / 35
Analytic computation of the moments
Note that when 0 ≤ u2 , v 2 < 1:
∞
X
1
=
u2k
1 − u2
∞
and
k=0
X
1
=
v 2l .
1 − v2
l=0
Considering the new form of r2 , it follows that when M = 1, 2 · · ·
∞
∞
X
X
E r2M = 16M
···
k1 =0
∞
X
kM =0 l1 =0
···
∞
X
n
o
E D2M u2(k1 +k2 +···+kM ) v 2(l1 +l2 +···+lM ) ,
lM =0
which can be simplified to
∞ X
∞ X
K +M −1 L+M −1
E r2M = 16M
E D2M u2K v 2L .
M −1
M −1
K=0 L=0
Methodology
Analytic computation of the moments
December 1, 2015
26 / 35
Analytic computation of the moments
Our problem now is how to compute E D2M u2K v 2L for all possible M ,
K and L.
Step 1:
Let f in the master equation be
some specific forms of un , uv, u2 v
and Dun , we can get the results
of E (un ) , E (uv) , E u2 v etc.
Step 2:
Given a value of (m, n), apply the
function f = Dk um+2−k v n+2−k
into the master equation with k ∈
{0, 1, 2, · · · , n + 2}.
A system of n + 3 linear equations is generated, whose solutions are:
E um+2 v n+2 , E Dum+1 v n+1 , E D2 um v n · · ·
Methodology
Analytic computation of the moments
December 1, 2015
27 / 35
Analytic computation of the moments
Given a M ∈ N and a truncation level `max ∈ N, we use
E r
2M
`max
2K+2L=`
X max K + M − 1L + M − 1
= 16M
E D2M u2K v 2L
M −1
M −1
K,L≥0
to approximate the M -th moment of r2 .
Methodology
Analytic computation of the moments
December 1, 2015
28 / 35
Variance of r2
Table 2: V r2 computed by our method (`max = 700)
ρ
θ
0.0125
0.0500
0.1000
0.2500
0.7500
1.2500
···
0.00
0.006119
0.03109
0.03806
0.02433
0.00685
0.003239
···
0.25
0.003412
0.01950
0.02614
0.01891
0.00608
0.002928
···
0.50
0.002271
0.01372
0.01932
0.01517
0.00538
0.002739
···
1.25
0.001062
0.00666
0.00991
0.00892
0.00403
0.002197
···
2.50
0.000526
0.00327
0.00487
0.00476
0.00263
0.001590
···
5.00
0.000244
0.00142
0.00206
0.00202
0.00141
0.000961
···
Results
Variance of r 2
December 1, 2015
29 / 35
Variance of r2
●
●
●
●
●
●
●
●
Liu (2012)
Analytic method
●
●
V(r2)
0.015
●
●
●
●
●
●
●
V(r2)
0.003
●
Liu (2012)
Analytic method
Variance of r2 with ρ=2.50
0.005
0.025
Variance of r2 with ρ=0.25
●●
●
●●
●
●
●
●
0.005
●
●
●
0.0
●
0.2
0.4
0.6
θ
0.8
1.0
0.001
●
●
●
1.2
●
●
●
●
●
0.0
0.2
0.4
0.6
θ
0.8
1.0
1.2
Figure 1: Comparison of V r2 between our analytic method and the method in
Liu(2012)
Results
Variance of r 2
December 1, 2015
30 / 35
Probability density function of r2
We can compute 50 moments in 2.5 hours with `max =
2000 on a laptop.
2
Use n = 18 moments to calculate Maxent π̃ r .
Compare moments 19, 20, . . . , 50 using π̃ r2 vs analytic method.
Maxent density of r2 using moments up to order n
Analytic moments compared with the results of Maxent
( θ = 0.125 ; ρ = 5 ) ( No.Nodes = 1000 )
( θ = 0.125 ; ρ = 5 ) ( No.Nodes = 1000 ; No.Moments = 18 )
●
300
n = 18
Analytic
Maxent
●
●
●
●
200
2.5e−06
250
●
●
●
●
●
100
~
πn(r2)
150
Moment
1.5e−06
●
●
●
●
0
50
5.0e−07
●
0.00
0.02
Results
0.04
r2
0.06
0.08
0.10
20
Probability distribution of r 2
25
●
●
●
●
●
●●
●●
●●
●●●
●●●
●●
30
35
40
Order of moment
45
December 1, 2015
50
31 / 35
Probability density function of r2
8000
( θ = 0.0125 ; ρ = 0 ) ( No.Nodes = 1000 )
n = 16
Analytic moments compared with the results of Maxent
0.0042
Maxent density of r2 using moments up to order n
( θ = 0.0125 ; ρ = 0 ) ( No.Nodes = 1000 ; No.Moments = 16 )
●
Analytic
Maxent
●
●
●
●
●
6000
0.0040
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0
0.0036
2000
~
πn(r2)
4000
Moment
0.0038
●
0e+00
2e−04
4e−04
r2
6e−04
8e−04
1e−03
20
25
30
35
40
Order of moment
45
●
●
●●
●●
●
50
Figure 2: Stationary density functions of r2 for two pairs of θ and ρ approximated
by the numerical univariate Maxent method.
Results
Probability distribution of r 2
December 1, 2015
32 / 35
Acknowledgements
I would like to sincerely and gratefully thank my supervisor Rachel Fewster
for her guidance, Dr. Jesse Goodman and Dr. Jing Liu for their useful ideas.
Acknowledgements
December 1, 2015
33 / 35
References
Y. S. Song and J. S. Song, “Analytic computation of the expectation of the linkage
disequilibrium coefficient r2 ,” Theoretical population biology, vol. 71, no. 1, pp. 49–
60, 2007.
J. Liu, Reconstruction of probability distributions in population genetics.
PhD thesis, The University of Auckland, 2012.
X. Wu, “Calculation of maximum entropy densities with application to income distribution,” Journal of Econometrics, vol. 115, no. 2, pp. 347–354, 2003.
M. Slatkin, “Linkage disequilibrium–understanding the evolutionary past and mapping the medical future,” Nature Reviews Genetics, vol. 9, no. 6, pp. 477–485, 2008.
W. G. Hill and A. Robertson, “Linkage disequilibrium in finite populations,” Theoretical and Applied Genetics, vol. 38, no. 6, pp. 226–231, 1968.
L. R. Mead and N. Papanicolaou, “Maximum entropy in the problem of moments,”
Journal of Mathematical Physics, vol. 25, no. 8, pp. 2404–2417, 1984.
References
December 1, 2015
34 / 35
Thanks!
© Copyright 2026 Paperzz