Eigen model with correlated multiple mutations and solution of error

Typeset with jpsj2.cls <ver.1.2.2b>
Full Paper
Eigen model with correlated multiple mutations and solution of error
catastrophe paradox in the origin of life
Zara Kirakosyan1 , David B. Saakian1,2,3 and Chin-Kun Hu2 ∗
1
Yerevan Physic Institute, Alikhanian Brothers St. 2, Yerevan 375036, Armenia
2
3
Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan
National Center for Theoretical Sciences:Physics Division, National Taiwan University, Taipei
10617, Taiwan
We consider the Eigen model with non-Poisson distribution of mutation number. We apply
the Hamilton-Jacobi equation method to solve the model and calculate the mean fitness. We
find that the error threshold depends on the correlation and the suggested mechanism may
give a simple solution of error catastrophe paradox in the origin of life.
KEYWORDS: Eigen model, origin of life, correlated multiple mutations, fitness function
1. Introduction
In recent decades, ideas and methods of physics have been widely used to study biological
problems from molecular level, e.g. the structure and folding of proteins,1–4 structure and
thermal properties of DNA and RNA,5–8 molecular models of biological evolution and the
origin of life,9–21 etc. In this paper, we will address an interesting problem related to the
origin of life.
In 1971, Eigen proposed an autocatalytic reaction model for the origin of life. 10, 11 The
model describes the self-replication of the molecules having some L monomers (letters). There
is some mutation probability 1 − q for any letter, when the given letter changes into another
letter in the set of alphabets. If the replication rate of some master or main sequence is
A times higher than the replication rate (fitness) of other sequences, then the correct selfreplication is possible when the mutation probability is less than a threshold value. Otherwise,
the sequence with the high replication rate will be lost. If one takes the typical experimental
mutation probability for RNA molecules,12, 14 then the maximal allowed length is about 1001000, while the length of the genome to posses a minimal set of genes15 was estimated to
be about 8000-20000.16 This is the famous error catastrophe paradox in the origin of life.
There have been many attempts to solve the paradox (see16, 21 and references there). Recently
Rajamani et al.18 proposed to solve the paradox through the slowing of replication processes
due to mutations and we proposed a mechanism with the lethal mutation and truncated
fitness function.20, 21 Besides the origin of life problem, the error catastrophe phenomenon is
∗
E-mail: [email protected]
1/10
J. Phys. Soc. Jpn.
Full Paper
important for RNA viruses which evolve near the error threshold.22
Most traditional biological evolution models use constant fitness landscapes and simple,
independent distributed mutations. Nowadays, it is well recognized that the further progress
in biological evolution is related to the dynamic fitness landscape and controlled or correlated
multiple mutations. It is the mainstream of evolution research. In the present paper, we investigate the error threshold phenomenon in case of high mutation rates, considering the general
probability distribution instead of the Poisson distribution for multiple mutations. In the later
case, multiple mutations are independent while in the former case multiple mutations are correlated. We applied Hamilton-Jacobi equation method23–25 to solve the model and calculate
the mean fitness. The error threshold could be changed due to considered correlations. The
suggested mechanism can give a simple solution of error catastrophe paradox in the origin of
life without the lethal mutation and a truncated fitness landscape.20, 21
In the rest of this section, we briefly review the Eigen model10, 11 with random multiple
mutations and its analytic solution. We also define related notations used in this paper.
In the Eigen model,10, 11 there are some types of monomers. For the sake of simplicity, we
consider two types of monomers, ±1 as in references.10, 11 There are M ≡ 2L different types
(sequences) of molecules, 0 ≤ i ≤ M − 1, and we choose the 0-th (peak)sequence with all +1
l
monomers. We denote by Si ≡ s1i , . . . , sL
i the i-th genome, where si describes the type of the
monomer at the l-th position. We denote the probability for the appearance of S i at time t
by pSi ≡ pi (t). The i-th sequence has a fitness ri .
The mutation matrix’ element Qij is the probability that an offspring produced by state
j changes to state i, and the evolution is given by the set of equations for M probabilities p i
M
−1
M
−1
X
X
dpi
=
(Qij )rj pj − pi (
rj pj ).
dt
j=0
Here pi satisfies the constraint
PM −1
i=0
(1)
j=0
pi = 1, and
Qij = q L−d(i,j) (1 − q)d(i,j) ,
with q being the mean nucleotide incorporation fidelity, and d(i, j) ≡ (L −
(2)
PL
l l
l=1 si sj )/2
being
the Hamming distance between Si and Sj .10, 11 The parameter γ = L(1 − q) describes the
P
mutation efficiency. The second term in Eq.(1) ensures the balance condition j pj = 1. It
describes the dilution of population in chemical reactor.
We define the Hamming distance d(i, 0) of the sequence Si from the peak sequence S0
through the parameter m
d(i, 0) = L
1−m
.
2
(3)
We assume that ri is a function of the Hamming distance d(0, i) or m
ri ≡ f (m)
2/10
(4)
J. Phys. Soc. Jpn.
Full Paper
where m is given by Eq.(3). We have chosen the indices i of the first L + 1 sequences to have
d(i, 0) = i, 0 ≤ i ≤ L.
On the basis of the connection between Eigen model and a quantum spin model,26 the
P
mean growth rate R = i ri pi for the whole population has been calculated in:27
R = max[e−γ(1−
√
1−k2 ]
f (k))|−1≤k≤1
(5)
The maximum is at some −1 ≤ k0 ≤ 1.
The population is focused at the Hamming distance L(1 − s)/2 from the peak sequence,
where s is defined by the equation:28
R = f (s).
(6)
We have the non-selective phase with k0 = 0, and the following expression for the mean growth
rate
R = f (0) ≡ 1,
(7)
where we have chosen f (0) = 1.
The Eigen’s error threshold is defined as a point when the expression of R jumps from
Eq.(5) with some k0 6= 0 to Eq.(7). For the single peak fitness model with f (1) = A and
f (m) = 1 for m < 1, the error threshold is at10, 11
Ae−γ = 1.
(8)
This paper is organized as follow. In section 2, we define the Eigen model with correlated
multiple mutations and derive its Hamilton-Jacobi equation (HJE).23–25 In section 3, we follow23–25 to solve the HJE and compare analytic results with numerical solutions to test the
reliability of analytic equations. In section 4, we discuss our results.
2. Eigen model with correlated multiple mutations
2.1 Definition of the model
Consider again the model given by Eq.(1), in the model with correlated multiple mutations,
the mutation matrix is modified as follow
Qij ≡ Qn = Q̂q −n (1 − q)n gn ,
(9)
where n ≡ d(i, j), gn are some coefficients decreasing with the growth of n, and Q̂ defines the
normalization of the probabilities Qij . In the original Eigen model,10, 11 gn = 1.
We consider the symmetric distribution of pi = Pl , where l = d(i, 0) is the Hamming
distance (HD) of the i-th sequence from the master sequence, so that all sequences with
the same HD from the master sequence have the same pi = Pl . From such a sequence, one
can generate a sequence Sj with HD n from Si through n1 up and n2 down mutations and
3/10
J. Phys. Soc. Jpn.
Full Paper
n = n1 − n2 . The total number of such mutations is
l!
(L − l)!
.
n1 !(l − n1 )! n2 !(L − l − n2 )!
(10)
Then we derive the following equation for Pl :
L−l
l
X
X
l!
(L − l)!
dPl
=
Qn +n Pl−n rl−n ,
dt
n1 !(l − n1 )! n2 !(L − l − n2 )! 1 2
(11)
n1 =0 n2 =0
where n1 − n2 = n
2.2 Derivation of the HJE equation
We assume the following ansatz for the Pl
Pl = exp[Lu(m, t)],
(12)
where m = 1 − 2l/L. With 1/L accuracy we replace in eq.(11):
Pl−n → Pl exp[2nu0 (m)], rl−n → rl ,
(13)
where u0 (m) = ∂u(m)/∂m. Then using the formulas
1 − m n1
l!
! ≈ ln1 = (L
) ,
(l − n1 )
2
(L − l)!
1 + m n2
≈ (L − l)n2 = (L
) ,
(L − l − n2 )!
2
(14)
L−l n1
l
X
X
l (L − l)n2
γ
dPl
0
=
Q̂gn1 +n2 ( )(n1 +n2 ) e2(n1 −n2 )u Pl rl .
dt
n1 !
n2 !
L
(15)
we obtain
n1 =0 n2 =0
We consider the case l À 1, N − l À 1. Then using an equality
∞ X
∞
∞
X
X
an1 bn2
(a + b)n
gn1 +n2 =
gn ,
n1 ! n2 !
n!
n1 =0 n2 =0
(16)
n=0
and re-scale the time t → t/L, from Eqs. (12) and (15) we obtain the HJE
∂u(m, t)
+ H(m, u0 ) = 0,
∂t
L
X
1 + m −2u0 1 − m 2u0 n gn
e
+
e )γ]
f (m)Q̂,
−H =
[(
2
2
n!
(17)
n=0
where f (m) = rl .
3. Solution of the model
Let us calculate the mean fitness following to the ideas of.23–25 We assume an asymptotic
solution
u(m, t) = kt + u0 (m).
4/10
(18)
J. Phys. Soc. Jpn.
Full Paper
Then we have an ordinary differential equation for the u0 (m):
k = −H(m, u00 ).
(19)
First we define the potential U (m),
U (m) = min[−H(m, p)]p
X
gn
= Q̂
(1 − m2 )n/2 f (m)γ n .
n!
n
(20)
To have a real value solution u0 (m) for u0 (m) in Eq. (20), we put a constraint. Therefore
k ≥ U (m), −1 ≤ m ≤ 1.
(21)
We choose R as a minimal k under the constraint Eq.(21),23 thus we have an equation for the
mean fitness:
R = max[U (m)]m .
(22)
Now consider a simple choice of gn :
g1 = 1, n ≤ 2;
gn =
1
, n > 2.
(n + 1)(n + 2)(n + 3)
(23)
Equation (9) and (23) imply that for a larger number of n ≥ 3, the probability for n mutations
is smaller than the case of independent mutiple mutation (i.e. gn = 1). Thus Eq.(23) has
the effect of reducing fitness function for a larger n, similar to that of the truncated fitness
landscape,20, 21 in which the fitness function is 0 for n larger than a critical value nc . The
reduction of fitness function for a larger n can be understood as follow. For a larger number of
n, the probability for silent mutations is reduced and the probability for missense or nonsense
mutations is increased causing the reduction of fitness function. We choose the specific form
of Eq. (23) because we can solve the model exactly with this specific form.
From the expansion of eγ , one can easily show that
1 + γ + γ 2 /2 +
∞
X
n≥3
=
∞
X
n≥0
=
γn
(n + 3)!
γn
1
1
1
1
+ (1 − ) + (1 − )γ + ( − )γ 2
(n + 3)!
3!
4!
2 5!
eγ − 1 − γ − 12 γ 2
1
1
1
1
+ (1 − ) + (1 − )γ + ( − )γ 2 .
3
γ
3!
4!
2 5!
(24)
From Eqs. (9) and (23) and the normalization of Qn , one can obtain an expression for Q̂.
Using such Q̂, and Eqs. (17) and (24), one can obtain the Hamiltonian:
−H(m, u0 ) =
f (m)γ 3
eγ − 1 − γ − 12 γ 2 + 65 γ 3 +
5/10
23 4
24 γ
+
59 5
120 γ
J. Phys. Soc. Jpn.
Full Paper
A
50
40
30
20
10
2
Fig. 1.
3
4
5
6
7
Γ
8
The critical values of A as a function of γ = L(1 − q) for the original Eigen model, Eq.(8),
and the correlated multiple mutation model of this paper, Eq. (28), are represented by solid curve
and dashed curve, respectively.
exp[G] − 1 − G − G2 /2 5 23
59 2
+ + G+
G ],
3
G
6 24
120
1 + m −2u0 1 − m 2u0
e
+
e ).
G = γ(
2
2
×[
(25)
For the mean fitness we have an expression:
R = max[
f (m)γ 3
eγ − 1 − γ − 12 γ 2 + 56 γ 3 +
ew − 1 − w −
×(
w3
p
w = γ 1 − m2 .
w2
2
+
23 4
24 γ
+
59 5
120 γ
59 2
5 23
+ w+
w )]|m ,
6 24
120
(26)
For the single peak fitness we derive for the mean fitness in the selective phase:
R=A
γ3
eγ − 1 − γ − 12 γ 2 + 65 γ 3 +
23 4
24 γ
+
59 5 .
120 γ
(27)
For the non-selective phase we again have Eq.(7). Thus we derive the error threshold condition
putting R = 1 in Eq.(27).
1=A
γ3
eγ − 1 − γ − 12 γ 2 + 65 γ 3 +
23 4
24 γ
+
59 5 .
120 γ
(28)
Figure 1 gives the comparison of the critical value of A for different values of γ for the
Eigen model by Eq.(8) and our model by Eq.(28). Figure 1 shows that for given values of A
and (1 − q), Eq. (28) gives a larger number of critical chain length than that given by Eq.(8).
In the Table 1 we compare the direct numerical results for Eqs. (1) and (9) with our
analytical solution by Eq. (26). Table 1 shows that our analytical results are very reliable.
The method used in this section can be extended to calculate the mean fitness and the
6/10
J. Phys. Soc. Jpn.
Table I.
Full Paper
A
2.5
3.
3.6
4.0
20
30
40
Rth
1.
1.1987
1.4376
1.5968
7.995
11.992
15.989
Rnum
1.0001
1.1992
1.4391
1.5990
7.995
11.992
15.989
The result for the mean fitness for the model defined by Eqs. (1), (9) and (23) with the single
peak fitness landscape. Rth is the analytical result given by Eq.(27), and Rnum is the numerical
results of the model defined by Eqs. (1), (9) and (23) with L = 200. Rth and Rnum are consistent
very well with each other.
error threshold condition for other functional form of gn . For example, instead of using Eq.
(23) for gn , one can choice gn = 1/n3 with n ≥ 3. However, in this case one should use the
numerical method to calculate the mean fitness and the phase diagram.
4. Discussion
In conclusion, we solved the Eigen model for the case of correlated mutations, with rescaling of the probability of multiple mutations, while holding ordinary expression for the probability of double mutations. We relaxed the condition of independence of mutations at different
positions of the genome. The numerics confirmed well our analytical results. There are some
experimental results supporting the hypothesis that the mutations in bacteria are not random.29–31 In32 the limited randomness of mutations has been assumed as one of the key
features of the life. We just speculated that the pre-biotic matter also possess such a property, constructed the model with simplest correlation between mutations, and estimated the
change of error threshold. In16 the minimal genome length has been estimated L = 8000,
which for 1 − q = 0.001 gives γ = 8. For the error threshold in our model we have A = 46 (if
we assume 40% lethal mutations33 we need even less A = 20), which is sometimes possible in
RNA replication reactions.34 Thus assuming a simple mechanism of attenuation of multiple
mutations, we can solve the error threshold paradox. The error threshold in Eigen model has
a deep information-theoretical meaning: it coincides with the Shannon constraint for optimal
codes to submit the encoded information through the channels with the noise. 35 It will be
very interesting to investigate the correlated noise case under information-theoretical point of
view.
Another possible application of our results are RNA viruses, evolving at high mutation
rates near the error threshold. The Poisson distribution for the number of multiple mutations
is only one of possible choices.
This work inspires some interesting problems for further study. It is well know that in the
Crow-Kimura model of biological evolution with parallel mutation-selection scheme, 9, 13, 36
only one (DNA or RNA) base is changed from one generation to the next generation. Such a
model has been extended to the biological evolution model with the general fitness function
7/10
J. Phys. Soc. Jpn.
Full Paper
and multiple mutations.37 One can further extend this model to a model with correlated
multiple mutations and study the influence of correlation on the error threshold. Recently,
analytic equations for finite-size corrections38 and finite population problem39 of the molecular
evolution models had been derived. One can try to extend such study to Eigen model with
correlated multiple mutations considered in this paper.
The extension of the Eigen model10, 11 with multiple random mutations (in Eq. (9), gn = 1)
to the Eigen model with multiple correlated mutations (in Eq. (9), gn < 1) is similar to the
extension of random percolation models40 to correlated percolation models corresponding to
the Ising model,41, 42 the Potts model,43 and a lattice model for hydrogen-bonding of water
molecules.44 In the correlated percolation models,41–44 the deviation from the random percolation models can be derived from subgraph expansions of the partition functions of the
original lattice models. It is of interest to know whether one can obtain g n in Eq. (9) from
some theory in molecular biology.
This work was supported by Academia Sinica, National Science Council in Taiwan with
Grant Number NSC 100-2112-M-001-003-MY2 and National Center for Theoretical Sciences
(Taipei Branch).
8/10
J. Phys. Soc. Jpn.
Full Paper
References
1) T. E. Creighton: Proteins: Structures and Molecular Properties (W. H. Freeman and Company,
New York, 1993).
2) K. Huang: Lectures on Statistical Physics and Protein Folding (World Scientific, Singapore, 2005).
3) S. G. Gevorkian, A. E. Allahverdyan , D. S. Gevorgyan and C.-K. Hu: EPL 95 (2011) 23001.
4) M.-C. Wu, M. S. Li, W.-J. Ma, M. Kouza, and C.-K. Hu, EPL 96 (2011) 68005.
5) C.-K. Peng, S. Buldyrev, A. Goldberger, S. Havlin, F. Sciortino, M. Simons and H. E. Stanley:
Nature 356 (1992) 168.
6) Y. S. Mamasakhlisov, S. Hayryan, V. F. Morozov, and C.-K. Hu: Phys. Rev. E, 75 (2007) 061907.
7) D. Y. Lando, A. S. Fridman, and C.-K. Hu: EPL 91 (2010) 38003.
8) C.-L. Chang, D. Y. Lando, A. S. Fridman, and C.-K. Hu: Biopolymers 97 (2012) 807.
9) J. F. Crow and M. Kimura: An Introduction to Population netics Theory (Harper Row, New York,
1970).
10) M. Eigen: Naturwissenschaften 58 (1971) 465.
11) M. Eigen, J. J. McCaskill, and P. Schuster: Adv. Chem. Phys. 75 (1989) 149.
12) T. Inoue and L. E. Orgel: J. Mol. Biol. 162 (1982) 201.
13) E. Baake, M. Baake, and H. Wagner: Phys. Rev. Lett. 78 (1997) 559.
14) W. K. Johnston, P. J. Unrau, M. S. Lawrence, M. E. Glasen, and D. P. Bartel: Science 292 (2001)
1319.
15) R. Gil, F. J. Silva, J. Pereto, and A. Moya: Mol. Biol. Rev. 68 (2004) 518.
16) A. Kun, M. Santos, and E. Szathmary: Nature Genetics 37 (2005) 1008.
17) D. B. Saakian, E. Munoz, C.-K. Hu, and M. W. Deem: Phys. Rev. E 73 (2006) 041913.
18) S. Rajamani, J. K. Ichida, T. Antal, D. A. Treco, K. Leu, M. A. Nowak, J. W. Szostak, and I. A.
Chen: J. Am. Chem. Soc. 132 (2010) 5880.
19) W. Gill: Int. J. Mod. Phys. C 22 (2011) 1293.
20) D. B. Saakian, C. K. Biebricher, and C.-K. Hu: Phys. Rev. E 79 (2009) 041905.
21) D. B. Saakian, C. K. Biebricher, and C.-K. Hu: PLoS One 6 (2011) 21904.
22) M. Stich, C. Briones, and S. C. Manrubia: BMC Evol Biol. 7 (2007) 110.
23) D. B. Saakian: J. Stat. Phys. 128 (2007) 781.
24) K. Sato and K. Kaneko: Phys. Rev. E 75 (2007) 061909.
25) D. B. Saakian, Z. Kirakosan and C.-K. Hu: Phys. Rev. E 77 (2008) 061907.
26) D. Saakian and C.-K. Hu: Phys. Rev. E 69 (2004) 021913.
27) D. B. Saakian and C. K. Hu: Proc. Natl. Acad. Sci. USA 103 (2006) 4935.
28) E. Baake and H. Wagner: Genet. Res. 78 (2001) 93.
29) L. Caporale, Darwin in the genome: molecular strate- gies in biological evolution. McGraw-Hill
Professional, (2003).
30) R. Galhardo, P. Hastings, and S. Rosenberg: Critical Reviews in Biochemistry and Molecular
Biology 42 (2007) 399.
31) O. Rando and K. Verstrepen: Cell 128 (2007) 655.
32) N. Goldenfeld and C. R. Woese: Ann. Rev. Cond. Matt. Phys. 2 (2011) 375-399.
33) R. Sanjuan, A. Moya, and S. F. Elena: Proc. Natl. Acad. Sci. USA 101 (2004) 8396.
34) T. A. Lincoln and G. F. Joyce: Science 323 (2009) 1229.
9/10
J. Phys. Soc. Jpn.
Full Paper
35) H. G. Schuster, Complex Adaptive Systems. Scator Verlag, Saarbrcken (2001).
36) D. B. Saakian and C.-K. Hu: Phys. Rev. E 69 (2004) 046121 .
37) D. B. Saakian, C.-K. Hu and H. Khachatryan: Phys. Rev. E 70 (2004) 041908.
38) Z. Kirakosyan, D. B. Saakian and C.-K. Hu: J. Stat. Phys. 144 (2011) 198.
39) D. B. Saakian, M. W. Deem and C.-K. Hu: EPL 98 (2012) 18001.
40) D. Stauffer and A. Aharony: Introduction to Percolation Theory (Taylor and Francis, 1994), revised
second ed.
41) C.-K. Hu: Physica A 119 (1983) 609.
42) C.-K. Hu: Phys. Rev. B 29 (1984) 5103.
43) C.-K. Hu: Phys. Rev. B 29 (1984) 5109.
44) C.-K. Hu: J. Phys. A: Math. Gen. 16 (1983) L321.
10/10