Typeset with jpsj2.cls <ver.1.2.2b> Full Paper Eigen model with correlated multiple mutations and solution of error catastrophe paradox in the origin of life Zara Kirakosyan1 , David B. Saakian1,2,3 and Chin-Kun Hu2 ∗ 1 Yerevan Physic Institute, Alikhanian Brothers St. 2, Yerevan 375036, Armenia 2 3 Institute of Physics, Academia Sinica, Nankang, Taipei 11529, Taiwan National Center for Theoretical Sciences:Physics Division, National Taiwan University, Taipei 10617, Taiwan We consider the Eigen model with non-Poisson distribution of mutation number. We apply the Hamilton-Jacobi equation method to solve the model and calculate the mean fitness. We find that the error threshold depends on the correlation and the suggested mechanism may give a simple solution of error catastrophe paradox in the origin of life. KEYWORDS: Eigen model, origin of life, correlated multiple mutations, fitness function 1. Introduction In recent decades, ideas and methods of physics have been widely used to study biological problems from molecular level, e.g. the structure and folding of proteins,1–4 structure and thermal properties of DNA and RNA,5–8 molecular models of biological evolution and the origin of life,9–21 etc. In this paper, we will address an interesting problem related to the origin of life. In 1971, Eigen proposed an autocatalytic reaction model for the origin of life. 10, 11 The model describes the self-replication of the molecules having some L monomers (letters). There is some mutation probability 1 − q for any letter, when the given letter changes into another letter in the set of alphabets. If the replication rate of some master or main sequence is A times higher than the replication rate (fitness) of other sequences, then the correct selfreplication is possible when the mutation probability is less than a threshold value. Otherwise, the sequence with the high replication rate will be lost. If one takes the typical experimental mutation probability for RNA molecules,12, 14 then the maximal allowed length is about 1001000, while the length of the genome to posses a minimal set of genes15 was estimated to be about 8000-20000.16 This is the famous error catastrophe paradox in the origin of life. There have been many attempts to solve the paradox (see16, 21 and references there). Recently Rajamani et al.18 proposed to solve the paradox through the slowing of replication processes due to mutations and we proposed a mechanism with the lethal mutation and truncated fitness function.20, 21 Besides the origin of life problem, the error catastrophe phenomenon is ∗ E-mail: [email protected] 1/10 J. Phys. Soc. Jpn. Full Paper important for RNA viruses which evolve near the error threshold.22 Most traditional biological evolution models use constant fitness landscapes and simple, independent distributed mutations. Nowadays, it is well recognized that the further progress in biological evolution is related to the dynamic fitness landscape and controlled or correlated multiple mutations. It is the mainstream of evolution research. In the present paper, we investigate the error threshold phenomenon in case of high mutation rates, considering the general probability distribution instead of the Poisson distribution for multiple mutations. In the later case, multiple mutations are independent while in the former case multiple mutations are correlated. We applied Hamilton-Jacobi equation method23–25 to solve the model and calculate the mean fitness. The error threshold could be changed due to considered correlations. The suggested mechanism can give a simple solution of error catastrophe paradox in the origin of life without the lethal mutation and a truncated fitness landscape.20, 21 In the rest of this section, we briefly review the Eigen model10, 11 with random multiple mutations and its analytic solution. We also define related notations used in this paper. In the Eigen model,10, 11 there are some types of monomers. For the sake of simplicity, we consider two types of monomers, ±1 as in references.10, 11 There are M ≡ 2L different types (sequences) of molecules, 0 ≤ i ≤ M − 1, and we choose the 0-th (peak)sequence with all +1 l monomers. We denote by Si ≡ s1i , . . . , sL i the i-th genome, where si describes the type of the monomer at the l-th position. We denote the probability for the appearance of S i at time t by pSi ≡ pi (t). The i-th sequence has a fitness ri . The mutation matrix’ element Qij is the probability that an offspring produced by state j changes to state i, and the evolution is given by the set of equations for M probabilities p i M −1 M −1 X X dpi = (Qij )rj pj − pi ( rj pj ). dt j=0 Here pi satisfies the constraint PM −1 i=0 (1) j=0 pi = 1, and Qij = q L−d(i,j) (1 − q)d(i,j) , with q being the mean nucleotide incorporation fidelity, and d(i, j) ≡ (L − (2) PL l l l=1 si sj )/2 being the Hamming distance between Si and Sj .10, 11 The parameter γ = L(1 − q) describes the P mutation efficiency. The second term in Eq.(1) ensures the balance condition j pj = 1. It describes the dilution of population in chemical reactor. We define the Hamming distance d(i, 0) of the sequence Si from the peak sequence S0 through the parameter m d(i, 0) = L 1−m . 2 (3) We assume that ri is a function of the Hamming distance d(0, i) or m ri ≡ f (m) 2/10 (4) J. Phys. Soc. Jpn. Full Paper where m is given by Eq.(3). We have chosen the indices i of the first L + 1 sequences to have d(i, 0) = i, 0 ≤ i ≤ L. On the basis of the connection between Eigen model and a quantum spin model,26 the P mean growth rate R = i ri pi for the whole population has been calculated in:27 R = max[e−γ(1− √ 1−k2 ] f (k))|−1≤k≤1 (5) The maximum is at some −1 ≤ k0 ≤ 1. The population is focused at the Hamming distance L(1 − s)/2 from the peak sequence, where s is defined by the equation:28 R = f (s). (6) We have the non-selective phase with k0 = 0, and the following expression for the mean growth rate R = f (0) ≡ 1, (7) where we have chosen f (0) = 1. The Eigen’s error threshold is defined as a point when the expression of R jumps from Eq.(5) with some k0 6= 0 to Eq.(7). For the single peak fitness model with f (1) = A and f (m) = 1 for m < 1, the error threshold is at10, 11 Ae−γ = 1. (8) This paper is organized as follow. In section 2, we define the Eigen model with correlated multiple mutations and derive its Hamilton-Jacobi equation (HJE).23–25 In section 3, we follow23–25 to solve the HJE and compare analytic results with numerical solutions to test the reliability of analytic equations. In section 4, we discuss our results. 2. Eigen model with correlated multiple mutations 2.1 Definition of the model Consider again the model given by Eq.(1), in the model with correlated multiple mutations, the mutation matrix is modified as follow Qij ≡ Qn = Q̂q −n (1 − q)n gn , (9) where n ≡ d(i, j), gn are some coefficients decreasing with the growth of n, and Q̂ defines the normalization of the probabilities Qij . In the original Eigen model,10, 11 gn = 1. We consider the symmetric distribution of pi = Pl , where l = d(i, 0) is the Hamming distance (HD) of the i-th sequence from the master sequence, so that all sequences with the same HD from the master sequence have the same pi = Pl . From such a sequence, one can generate a sequence Sj with HD n from Si through n1 up and n2 down mutations and 3/10 J. Phys. Soc. Jpn. Full Paper n = n1 − n2 . The total number of such mutations is l! (L − l)! . n1 !(l − n1 )! n2 !(L − l − n2 )! (10) Then we derive the following equation for Pl : L−l l X X l! (L − l)! dPl = Qn +n Pl−n rl−n , dt n1 !(l − n1 )! n2 !(L − l − n2 )! 1 2 (11) n1 =0 n2 =0 where n1 − n2 = n 2.2 Derivation of the HJE equation We assume the following ansatz for the Pl Pl = exp[Lu(m, t)], (12) where m = 1 − 2l/L. With 1/L accuracy we replace in eq.(11): Pl−n → Pl exp[2nu0 (m)], rl−n → rl , (13) where u0 (m) = ∂u(m)/∂m. Then using the formulas 1 − m n1 l! ! ≈ ln1 = (L ) , (l − n1 ) 2 (L − l)! 1 + m n2 ≈ (L − l)n2 = (L ) , (L − l − n2 )! 2 (14) L−l n1 l X X l (L − l)n2 γ dPl 0 = Q̂gn1 +n2 ( )(n1 +n2 ) e2(n1 −n2 )u Pl rl . dt n1 ! n2 ! L (15) we obtain n1 =0 n2 =0 We consider the case l À 1, N − l À 1. Then using an equality ∞ X ∞ ∞ X X an1 bn2 (a + b)n gn1 +n2 = gn , n1 ! n2 ! n! n1 =0 n2 =0 (16) n=0 and re-scale the time t → t/L, from Eqs. (12) and (15) we obtain the HJE ∂u(m, t) + H(m, u0 ) = 0, ∂t L X 1 + m −2u0 1 − m 2u0 n gn e + e )γ] f (m)Q̂, −H = [( 2 2 n! (17) n=0 where f (m) = rl . 3. Solution of the model Let us calculate the mean fitness following to the ideas of.23–25 We assume an asymptotic solution u(m, t) = kt + u0 (m). 4/10 (18) J. Phys. Soc. Jpn. Full Paper Then we have an ordinary differential equation for the u0 (m): k = −H(m, u00 ). (19) First we define the potential U (m), U (m) = min[−H(m, p)]p X gn = Q̂ (1 − m2 )n/2 f (m)γ n . n! n (20) To have a real value solution u0 (m) for u0 (m) in Eq. (20), we put a constraint. Therefore k ≥ U (m), −1 ≤ m ≤ 1. (21) We choose R as a minimal k under the constraint Eq.(21),23 thus we have an equation for the mean fitness: R = max[U (m)]m . (22) Now consider a simple choice of gn : g1 = 1, n ≤ 2; gn = 1 , n > 2. (n + 1)(n + 2)(n + 3) (23) Equation (9) and (23) imply that for a larger number of n ≥ 3, the probability for n mutations is smaller than the case of independent mutiple mutation (i.e. gn = 1). Thus Eq.(23) has the effect of reducing fitness function for a larger n, similar to that of the truncated fitness landscape,20, 21 in which the fitness function is 0 for n larger than a critical value nc . The reduction of fitness function for a larger n can be understood as follow. For a larger number of n, the probability for silent mutations is reduced and the probability for missense or nonsense mutations is increased causing the reduction of fitness function. We choose the specific form of Eq. (23) because we can solve the model exactly with this specific form. From the expansion of eγ , one can easily show that 1 + γ + γ 2 /2 + ∞ X n≥3 = ∞ X n≥0 = γn (n + 3)! γn 1 1 1 1 + (1 − ) + (1 − )γ + ( − )γ 2 (n + 3)! 3! 4! 2 5! eγ − 1 − γ − 12 γ 2 1 1 1 1 + (1 − ) + (1 − )γ + ( − )γ 2 . 3 γ 3! 4! 2 5! (24) From Eqs. (9) and (23) and the normalization of Qn , one can obtain an expression for Q̂. Using such Q̂, and Eqs. (17) and (24), one can obtain the Hamiltonian: −H(m, u0 ) = f (m)γ 3 eγ − 1 − γ − 12 γ 2 + 65 γ 3 + 5/10 23 4 24 γ + 59 5 120 γ J. Phys. Soc. Jpn. Full Paper A 50 40 30 20 10 2 Fig. 1. 3 4 5 6 7 Γ 8 The critical values of A as a function of γ = L(1 − q) for the original Eigen model, Eq.(8), and the correlated multiple mutation model of this paper, Eq. (28), are represented by solid curve and dashed curve, respectively. exp[G] − 1 − G − G2 /2 5 23 59 2 + + G+ G ], 3 G 6 24 120 1 + m −2u0 1 − m 2u0 e + e ). G = γ( 2 2 ×[ (25) For the mean fitness we have an expression: R = max[ f (m)γ 3 eγ − 1 − γ − 12 γ 2 + 56 γ 3 + ew − 1 − w − ×( w3 p w = γ 1 − m2 . w2 2 + 23 4 24 γ + 59 5 120 γ 59 2 5 23 + w+ w )]|m , 6 24 120 (26) For the single peak fitness we derive for the mean fitness in the selective phase: R=A γ3 eγ − 1 − γ − 12 γ 2 + 65 γ 3 + 23 4 24 γ + 59 5 . 120 γ (27) For the non-selective phase we again have Eq.(7). Thus we derive the error threshold condition putting R = 1 in Eq.(27). 1=A γ3 eγ − 1 − γ − 12 γ 2 + 65 γ 3 + 23 4 24 γ + 59 5 . 120 γ (28) Figure 1 gives the comparison of the critical value of A for different values of γ for the Eigen model by Eq.(8) and our model by Eq.(28). Figure 1 shows that for given values of A and (1 − q), Eq. (28) gives a larger number of critical chain length than that given by Eq.(8). In the Table 1 we compare the direct numerical results for Eqs. (1) and (9) with our analytical solution by Eq. (26). Table 1 shows that our analytical results are very reliable. The method used in this section can be extended to calculate the mean fitness and the 6/10 J. Phys. Soc. Jpn. Table I. Full Paper A 2.5 3. 3.6 4.0 20 30 40 Rth 1. 1.1987 1.4376 1.5968 7.995 11.992 15.989 Rnum 1.0001 1.1992 1.4391 1.5990 7.995 11.992 15.989 The result for the mean fitness for the model defined by Eqs. (1), (9) and (23) with the single peak fitness landscape. Rth is the analytical result given by Eq.(27), and Rnum is the numerical results of the model defined by Eqs. (1), (9) and (23) with L = 200. Rth and Rnum are consistent very well with each other. error threshold condition for other functional form of gn . For example, instead of using Eq. (23) for gn , one can choice gn = 1/n3 with n ≥ 3. However, in this case one should use the numerical method to calculate the mean fitness and the phase diagram. 4. Discussion In conclusion, we solved the Eigen model for the case of correlated mutations, with rescaling of the probability of multiple mutations, while holding ordinary expression for the probability of double mutations. We relaxed the condition of independence of mutations at different positions of the genome. The numerics confirmed well our analytical results. There are some experimental results supporting the hypothesis that the mutations in bacteria are not random.29–31 In32 the limited randomness of mutations has been assumed as one of the key features of the life. We just speculated that the pre-biotic matter also possess such a property, constructed the model with simplest correlation between mutations, and estimated the change of error threshold. In16 the minimal genome length has been estimated L = 8000, which for 1 − q = 0.001 gives γ = 8. For the error threshold in our model we have A = 46 (if we assume 40% lethal mutations33 we need even less A = 20), which is sometimes possible in RNA replication reactions.34 Thus assuming a simple mechanism of attenuation of multiple mutations, we can solve the error threshold paradox. The error threshold in Eigen model has a deep information-theoretical meaning: it coincides with the Shannon constraint for optimal codes to submit the encoded information through the channels with the noise. 35 It will be very interesting to investigate the correlated noise case under information-theoretical point of view. Another possible application of our results are RNA viruses, evolving at high mutation rates near the error threshold. The Poisson distribution for the number of multiple mutations is only one of possible choices. This work inspires some interesting problems for further study. It is well know that in the Crow-Kimura model of biological evolution with parallel mutation-selection scheme, 9, 13, 36 only one (DNA or RNA) base is changed from one generation to the next generation. Such a model has been extended to the biological evolution model with the general fitness function 7/10 J. Phys. Soc. Jpn. Full Paper and multiple mutations.37 One can further extend this model to a model with correlated multiple mutations and study the influence of correlation on the error threshold. Recently, analytic equations for finite-size corrections38 and finite population problem39 of the molecular evolution models had been derived. One can try to extend such study to Eigen model with correlated multiple mutations considered in this paper. The extension of the Eigen model10, 11 with multiple random mutations (in Eq. (9), gn = 1) to the Eigen model with multiple correlated mutations (in Eq. (9), gn < 1) is similar to the extension of random percolation models40 to correlated percolation models corresponding to the Ising model,41, 42 the Potts model,43 and a lattice model for hydrogen-bonding of water molecules.44 In the correlated percolation models,41–44 the deviation from the random percolation models can be derived from subgraph expansions of the partition functions of the original lattice models. It is of interest to know whether one can obtain g n in Eq. (9) from some theory in molecular biology. This work was supported by Academia Sinica, National Science Council in Taiwan with Grant Number NSC 100-2112-M-001-003-MY2 and National Center for Theoretical Sciences (Taipei Branch). 8/10 J. Phys. Soc. Jpn. Full Paper References 1) T. E. Creighton: Proteins: Structures and Molecular Properties (W. H. Freeman and Company, New York, 1993). 2) K. Huang: Lectures on Statistical Physics and Protein Folding (World Scientific, Singapore, 2005). 3) S. G. Gevorkian, A. E. Allahverdyan , D. S. Gevorgyan and C.-K. Hu: EPL 95 (2011) 23001. 4) M.-C. Wu, M. S. Li, W.-J. Ma, M. Kouza, and C.-K. Hu, EPL 96 (2011) 68005. 5) C.-K. Peng, S. Buldyrev, A. Goldberger, S. Havlin, F. Sciortino, M. Simons and H. E. Stanley: Nature 356 (1992) 168. 6) Y. S. Mamasakhlisov, S. Hayryan, V. F. Morozov, and C.-K. Hu: Phys. Rev. E, 75 (2007) 061907. 7) D. Y. Lando, A. S. Fridman, and C.-K. Hu: EPL 91 (2010) 38003. 8) C.-L. Chang, D. Y. Lando, A. S. Fridman, and C.-K. Hu: Biopolymers 97 (2012) 807. 9) J. F. Crow and M. Kimura: An Introduction to Population netics Theory (Harper Row, New York, 1970). 10) M. Eigen: Naturwissenschaften 58 (1971) 465. 11) M. Eigen, J. J. McCaskill, and P. Schuster: Adv. Chem. Phys. 75 (1989) 149. 12) T. Inoue and L. E. Orgel: J. Mol. Biol. 162 (1982) 201. 13) E. Baake, M. Baake, and H. Wagner: Phys. Rev. Lett. 78 (1997) 559. 14) W. K. Johnston, P. J. Unrau, M. S. Lawrence, M. E. Glasen, and D. P. Bartel: Science 292 (2001) 1319. 15) R. Gil, F. J. Silva, J. Pereto, and A. Moya: Mol. Biol. Rev. 68 (2004) 518. 16) A. Kun, M. Santos, and E. Szathmary: Nature Genetics 37 (2005) 1008. 17) D. B. Saakian, E. Munoz, C.-K. Hu, and M. W. Deem: Phys. Rev. E 73 (2006) 041913. 18) S. Rajamani, J. K. Ichida, T. Antal, D. A. Treco, K. Leu, M. A. Nowak, J. W. Szostak, and I. A. Chen: J. Am. Chem. Soc. 132 (2010) 5880. 19) W. Gill: Int. J. Mod. Phys. C 22 (2011) 1293. 20) D. B. Saakian, C. K. Biebricher, and C.-K. Hu: Phys. Rev. E 79 (2009) 041905. 21) D. B. Saakian, C. K. Biebricher, and C.-K. Hu: PLoS One 6 (2011) 21904. 22) M. Stich, C. Briones, and S. C. Manrubia: BMC Evol Biol. 7 (2007) 110. 23) D. B. Saakian: J. Stat. Phys. 128 (2007) 781. 24) K. Sato and K. Kaneko: Phys. Rev. E 75 (2007) 061909. 25) D. B. Saakian, Z. Kirakosan and C.-K. Hu: Phys. Rev. E 77 (2008) 061907. 26) D. Saakian and C.-K. Hu: Phys. Rev. E 69 (2004) 021913. 27) D. B. Saakian and C. K. Hu: Proc. Natl. Acad. Sci. USA 103 (2006) 4935. 28) E. Baake and H. Wagner: Genet. Res. 78 (2001) 93. 29) L. Caporale, Darwin in the genome: molecular strate- gies in biological evolution. McGraw-Hill Professional, (2003). 30) R. Galhardo, P. Hastings, and S. Rosenberg: Critical Reviews in Biochemistry and Molecular Biology 42 (2007) 399. 31) O. Rando and K. Verstrepen: Cell 128 (2007) 655. 32) N. Goldenfeld and C. R. Woese: Ann. Rev. Cond. Matt. Phys. 2 (2011) 375-399. 33) R. Sanjuan, A. Moya, and S. F. Elena: Proc. Natl. Acad. Sci. USA 101 (2004) 8396. 34) T. A. Lincoln and G. F. Joyce: Science 323 (2009) 1229. 9/10 J. Phys. Soc. Jpn. Full Paper 35) H. G. Schuster, Complex Adaptive Systems. Scator Verlag, Saarbrcken (2001). 36) D. B. Saakian and C.-K. Hu: Phys. Rev. E 69 (2004) 046121 . 37) D. B. Saakian, C.-K. Hu and H. Khachatryan: Phys. Rev. E 70 (2004) 041908. 38) Z. Kirakosyan, D. B. Saakian and C.-K. Hu: J. Stat. Phys. 144 (2011) 198. 39) D. B. Saakian, M. W. Deem and C.-K. Hu: EPL 98 (2012) 18001. 40) D. Stauffer and A. Aharony: Introduction to Percolation Theory (Taylor and Francis, 1994), revised second ed. 41) C.-K. Hu: Physica A 119 (1983) 609. 42) C.-K. Hu: Phys. Rev. B 29 (1984) 5103. 43) C.-K. Hu: Phys. Rev. B 29 (1984) 5109. 44) C.-K. Hu: J. Phys. A: Math. Gen. 16 (1983) L321. 10/10
© Copyright 2026 Paperzz