Efficiency of Protein Production from mRNA

Marc A. Suchard1 ([email protected]), Kenneth Lange2 ([email protected]), Janet S. Sinsheimer3 ([email protected])

Departments of Biomathematics1,2,3, Biostatistics1,3, Human Genetics1,2,3, and Statistics2, University of California, Los Angeles, CA 90095

Abstract: Adapting arguments from queuing theory, we investigate a mathematical model for protein production efficiency from mRNA. Our model involves six parameters: the mRNA length, the clearance distance a ribosome must travel from the initiation site before another ribosome can attach, the ribosomal attachment rate, the ribosomal traveling speed along the mRNA, the mRNA degradation rate, and the probability that a ribosome prematurely disengages from the mRNA. The model allows for different mechanisms of mRNA degradation; the more complicated mechanisms postulate a functional role for the mRNA poly A tail. We determine the probability generating function of the number N of fully formed proteins from a single mRNA. This function yields the moments of N exactly and the entire distribution of N numerically via the finite Fourier transform. Using biologically plausible estimates, we examine the sensitivity of protein production to the model parameters and degradation mechanisms. Model predictions are most sensitive to the degradation and attachment rates, two parameters that are poorly measured in vivo.

AMS Subject Classification: 60G55; 60K25; 92B05; 92C40

Key Words: Genetics, mRNA, translation, ribosome, Poisson process, queuing theory, finite Fourier transform.

1 Introduction

Protein production, in vivo, is a multi-step process involving (a) transcription of DNA into premessenger RNA, (b) splicing of the premessenger RNA to create messenger RNA (mRNA), (c) translation of mRNA into protein, and (d) degradation of the mRNA. Geneticists have traditionally viewed transcription as the basic determinant of the rate of protein production.
Recently, evidence has surfaced suggesting that the rate of mRNA decay is equally important in determining protein production (Jacobson and Peltz, 1999). mRNA degrades by many mechanisms, and mRNA stability is highly variable and sequence dependent. In vertebrates, the half-life of mRNA varies from approximately 30 minutes to more than 4 days (Greenberg, 1972; Hargrove et al., 1991). However, it is a mistake to assume that the rates of transcription and mRNA degradation exclusively determine the efficiency of protein production. The rate of translation certainly has an impact as well. The translation rate depends on the ribosome attachment rate and the elongation rate; this latter rate is the ratio of the number of amino acids added to the protein per second to the number of nucleotides in the message. The number of nucleotides can vary from fewer than 100 to more than 4000. Finally, the translation process can end prematurely with an incomplete protein; the probability of this happening depends on the particular mRNA sequence (Jorgensen and Kurland, 1990). Experiments have necessarily focused on a single component of protein production at a time. Mathematical models of protein production that explicitly capture both elongation and degradation, even in a simplistic manner, may aid in understanding the interactions between these complex forces. The current paper extends the previous mathematical work of Singh (1996), who focuses on the shape distribution of ribosomes along an mRNA rather than on protein production per se. The references (Beelman and Parker, 1995; Curtis et al., 1995; Jacobson and Peltz, 1996; Lewin, 1997; Mangus and Jacobson, 1999) provide background material on the underlying biology of translation from mRNA to protein. Our model is admittedly naive and purposely simplifies some of the facts in order to make mathematical progress. 
The underlying probability calculations adapt arguments from queuing theory (Karlin and Taylor, 1975; Karlin and Taylor, 1981).

2 The Model

The cartoon in Figure 1 illustrates our model for the process of mRNA translation. The model involves six parameters. The first is the length $l$ of the message, the second is the clearance distance $d < l$ that a ribosome must travel from the initiation site before another ribosome can attach to the initiation site, the third is the attachment rate $\lambda$ of a ribosome to the initiation site in the absence of blocking by an existing ribosome, the fourth is the speed $s$ at which a ribosome travels along the message, the fifth is the rate $\mu$ of message degradation, and the sixth is the probability $p$ that an attached ribosome disengages from the message before it finishes assembling its protein.

By analogy to the simple single-server queuing model of communications engineering, our model postulates that the times of ribosome attachment occur according to a Poisson process with intensity $\lambda$ in the absence of blocking. If we imagine that message degradation occurs when the message is broken, then it is conceivable that a ribosome beyond the breakpoint could still proceed to the completion of its protein. To allow for this possibility, we assume that the break occurs at a random point $U$ on the interval $[0, l]$ and that $U$ has distribution function $G(u)$.

Our aim here is to investigate the random number of complete proteins $N$ that a single message generates during its lifetime $T$. Both $N$ and $T$ are random variables. The rate of protein production boils down to the ratio of expectations $E(N)/E(T)$. In our first version of the model, $T$ is exponentially distributed with distribution function $1 - e^{-\mu t}$ and density function $\mu e^{-\mu t}$. In eukaryotes, degradation of the poly A tail of the message precipitates loss of the message.
If we imagine degradation of the poly A tail as proceeding in chunks of fixed size at Poisson times, it is reasonable to assume that $T$ has gamma density $\mu^{\beta} t^{\beta-1} e^{-\mu t}/\Gamma(\beta)$ for some positive integer $\beta$ that counts the total number of chunks.

Characterization of the distribution of $N$ is considerably harder. In deriving this distribution, we will consider an infinite sequence of independent random variables $X_1, X_2, \ldots$ with common exponential distribution $F(x) = 1 - e^{-\lambda x}$. As shown in Feller (1971) and other standard probability texts, the sum $S_n = X_1 + \cdots + X_n$ of the first $n$ of these random variables follows a gamma distribution $F^{(n)}(x)$ with density $\lambda^n x^{n-1} e^{-\lambda x}/\Gamma(n)$ and Laplace transform

$$\int_0^\infty e^{-\theta x}\, dF^{(n)}(x) = E\left(e^{-\theta S_n}\right) = \left(\frac{\lambda}{\lambda+\theta}\right)^{n}. \tag{2.1}$$

We will need the integration by parts

$$\int_0^\infty e^{-\theta x}\, dF^{(n)}(x) = e^{-\theta x} F^{(n)}(x)\Big|_0^\infty + \theta \int_0^\infty e^{-\theta x} F^{(n)}(x)\, dx = \theta \int_0^\infty e^{-\theta x} F^{(n)}(x)\, dx \tag{2.2}$$

version of formula (2.1) as well. We will also need the Laplace transform $E(e^{-\theta U}) = \widehat{dG}(\theta)$ of the breakpoint $U$. If the break always appears at the beginning or the end of the message, then $\widehat{dG}(\theta)$ equals $1$ or $e^{-\theta l}$, respectively. If $U$ is uniformly distributed on $[0, l]$, then $\widehat{dG}(\theta) = (1 - e^{-\theta l})/(\theta l)$.

Figure 1: Cartoon model of mRNA translation. (Labels in the figure: ribosomal subunit, attachment $\lambda$ at the initiation site near the 5' end, clearance distance $d$, translocation speed $s$, premature termination $p$, growing polypeptide, full length polypeptide, $\beta$ poly A tail chunks, degradation $\mu$ by nuclease, and mRNA length $l$.)

Our strategy will be to calculate the right-tail probability $\Pr(N \ge n)$ by conditioning on the values $t$ and $u$ of $T$ and $U$. To this end, we define the time $t_d = d/s$ for a ribosome to reach the clearance distance. The key to calculation of $E(N)$ in the model is the simple observation that for $n > 0$ the event $N \ge n$ is equivalent to the event

$$\sum_{i=1}^{n-1} (X_i + t_d) + X_n + u/s \le t. \tag{2.3}$$

In other words, the $n$th ribosome must attach and travel beyond the breakpoint before the break occurs.
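The equivalence in (2.3) can be illustrated by simulation. The following Python sketch is not part of the original analysis. It assumes p = 0, a fixed breakpoint u, and an exponentially distributed lifetime T, and the function names `proteins_from_one_message` and `simulated_mean` are hypothetical. It draws the attachment delays, adds the clearance lag t_d after each attachment, and counts how many ribosomes pass the breakpoint before the message breaks. The usage example takes µ = 0.01, a value chosen here only so that the simulation runs quickly.

```python
import random


def proteins_from_one_message(rng, lam, mu, s, d, u=0.0):
    """Simulate N for one message, assuming p = 0 and a fixed breakpoint u."""
    t = rng.expovariate(mu)        # message lifetime T, exponential with rate mu
    t_d = d / s                    # time needed to clear the initiation site
    clock = 0.0                    # time at which the initiation site is next free
    n = 0
    while True:
        clock += rng.expovariate(lam)   # waiting time X_n until the next attachment
        if clock + u / s > t:           # this ribosome cannot pass the breakpoint in time
            return n
        n += 1                          # event (2.3) holds for this ribosome
        clock += t_d                    # blocking lag before another attachment


def simulated_mean(reps, lam, mu, s, d, u=0.0, seed=1):
    """Average N over `reps` independently simulated messages."""
    rng = random.Random(seed)
    total = sum(proteins_from_one_message(rng, lam, mu, s, d, u) for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    # Illustrative values only; mu is inflated so each message yields few proteins.
    print(simulated_mean(20000, lam=0.5, mu=0.01, s=10.0, d=30.0))
```

Under these assumptions the simulated average should settle near the mean given in equation (2.8) for U = 0.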
In the absence of blocking, $S_n = X_1 + \cdots + X_n$ is the random time at which the $n$th ribosome attaches. The $n - 1$ summands $t_d$ represent the time lags for the first $n - 1$ ribosomes to clear the initiation site. The time $u/s$ is the travel time to the breakpoint. Assuming the probability of disengagement $p = 0$, these considerations lead to the expression

$$\Pr(N \ge n \mid T = t, U = u) = \Pr\left[\sum_{i=1}^{n-1} (X_i + t_d) + X_n + u/s \le t\right] = F^{(n)}[t - u/s - (n-1)t_d]. \tag{2.4}$$

Integrating this conditional probability against the product of the degradation density and breakpoint density yields

$$\begin{aligned}
\Pr(N \ge n) &= \int_0^l \int_0^\infty \mu e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \int_0^l e^{-\mu[u/s + (n-1)t_d]} \int_0^\infty \mu e^{-\mu v} F^{(n)}(v)\, dv\, dG(u) \\
&= \left(\frac{\lambda}{\lambda+\mu}\right)^{n} e^{-\mu(n-1)t_d} \int_0^l e^{-\mu u/s}\, dG(u) \\
&= \left(\frac{\lambda}{\lambda+\mu}\right)^{n} e^{-\mu(n-1)t_d}\, \widehat{dG}(\mu/s)
\end{aligned} \tag{2.5}$$

in view of formulas (2.1) and (2.2). If $t_l = l/s$ is the total travel time for a ribosome, then $\widehat{dG}(\mu/s)$ equals $1$, $e^{-\mu t_l}$, or $(1 - e^{-\mu t_l})/(\mu t_l)$, depending on whether $U = 0$, $U = l$, or $U$ is uniformly distributed on $[0, l]$.

Given the right-tail probability $\Pr(N \ge n)$, we can calculate the probability generating function $P(r) = \sum_{m=0}^{\infty} \Pr(N = m) r^m$ from the tail-probability generating function $Q(r) = \sum_{n=0}^{\infty} \Pr(N > n) r^n$ via the well-known formula $P(r) = 1 - (1 - r)Q(r)$ given in (Feller, 1968). Fortunately, straightforward algebra yields

$$Q(r) = \frac{\lambda\, \widehat{dG}(\mu/s)}{\lambda + \mu - \lambda e^{-\mu t_d} r} \tag{2.6}$$

for $r \in [0, 1]$, using the geometric series

$$\sum_{n=0}^{\infty} w^n = (1 - w)^{-1}. \tag{2.7}$$

Possession of $P(r)$ allows us to recover the mean

$$E(N) = P'(1) = Q(1) = \frac{\lambda\, \widehat{dG}(\mu/s)}{\mu + \lambda - \lambda e^{-\mu t_d}} \tag{2.8}$$

and variance

$$\begin{aligned}
\mathrm{Var}(N) &= P''(1) + P'(1) - P'(1)^2 \\
&= 2Q'(1) + Q(1) - Q(1)^2 \\
&= \frac{\lambda\, \widehat{dG}(\mu/s)\left\{\mu + \lambda + \lambda\left[e^{-\mu t_d} - \widehat{dG}(\mu/s)\right]\right\}}{\left(\mu + \lambda - \lambda e^{-\mu t_d}\right)^2}.
\end{aligned} \tag{2.9}$$

When $p > 0$ and the ribosome can fall off the message, we must modify the above expressions. The probability generating function for the number of fully realized proteins is given by the functional composition

$$P[p + (1-p)r] = \sum_{n=0}^{\infty} r^n \sum_{m=n}^{\infty} \Pr(N = m) \binom{m}{n} p^{m-n} (1-p)^n. \tag{2.10}$$

Here $p + (1 - p)r$ is the probability generating function of the indicator that a particular ribosome, which could have completed its protein before the message degrades, actually does so without disengaging from the message. To avoid complicating our analysis, we assume that all ribosomes successfully negotiate the blocking distance before disengaging. With this caveat, we can subscript the mean and variance by $p$ and show that

$$\begin{aligned}
E_p(N) &= (1-p)\, E_0(N) \\
\mathrm{Var}_p(N) &= p(1-p)\, E_0(N) + (1-p)^2\, \mathrm{Var}_0(N)
\end{aligned} \tag{2.11}$$

by evaluating the first two derivatives of the composite generating function $P[p + (1 - p)r]$ at $r = 1$.

In eukaryotes, degradation of the poly A tail of the message generally precipitates loss of the message. If we imagine degradation of the poly A tail as proceeding in $\beta$ chunks of fixed size at Poisson times, it is reasonable to assume that $T$ has gamma density $\mu^{\beta} t^{\beta-1} e^{-\mu t}/\Gamma(\beta)$. To recover the right-tail probability of $N$, we superscript it by $\beta$ and differentiate $\beta - 1$ times with respect to $\mu$. When $p = 0$, this tactic yields

$$\begin{aligned}
\Pr\nolimits^{\beta}(N \ge n) &= \frac{\mu^{\beta}}{\Gamma(\beta)} \int_0^l \int_0^\infty t^{\beta-1} e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \frac{(-1)^{\beta-1}\mu^{\beta}}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \int_0^l \int_0^\infty e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \frac{(-1)^{\beta-1}\mu^{\beta}}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{\Pr(N \ge n)}{\mu}\right].
\end{aligned} \tag{2.12}$$

Because differentiation commutes with summation, the generating function and $k$th moment of $N$ satisfy

$$\begin{aligned}
P^{\beta}(r) &= \frac{(-1)^{\beta-1}\mu^{\beta}}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{P(r)}{\mu}\right] \\
E^{\beta}(N^k) &= \frac{(-1)^{\beta-1}\mu^{\beta}}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{E(N^k)}{\mu}\right].
\end{aligned} \tag{2.13}$$

For small values of $\beta$, it is possible to carry out the indicated differentiations by hand or by a symbolic algebra program such as Maple. For larger values of $\beta$, it is preferable to extend $\Pr^{\beta}(N \ge n)/\mu$ and $E^{\beta}(N^k)/\mu$ to be analytic functions of the complex variable $\mu$ and recover their values by the finite Fourier transform (Henrici, 1979; Lange, 1999). The same Fourier analysis tactics work in extracting the discrete density $\Pr(N = k)$.
Indeed, any probability generating function $P(s) = \sum_{k=0}^{\infty} p_k s^k$ may be extended to the unit circle in the complex plane by setting

$$P(e^{2\pi i t}) = \sum_{k=0}^{\infty} p_k e^{2\pi i k t}. \tag{2.14}$$

This creates a periodic function in $t$ whose $k$th Fourier coefficient $p_k$ can be recovered via the finite Riemann sum

$$p_k \approx \frac{1}{n} \sum_{j=0}^{n-1} P(e^{2\pi i j/n})\, e^{-2\pi i k j/n}. \tag{2.15}$$

In practice, one evaluates this finite Fourier transform via the fast Fourier transform algorithm for some large power $n = 2^v$ of 2. For sufficiently large $v$, all of the coefficients $p_0, \ldots, p_{n-1}$ can be computed accurately. Accuracy can be checked by comparing the numerically computed mean and variance of $P(s)$ with its theoretical mean and variance.

Finally, when premature termination is possible, we replace $P^{\beta}(r)$ by $P^{\beta}[p + (1 - p)r]$ and calculate the mean and variance of $N$ by substituting $E_0^{\beta}(N)$ for $E_0(N)$ and $\mathrm{Var}_0^{\beta}(N)$ for $\mathrm{Var}_0(N)$ in equation (2.11), based on the values of $E_0^{\beta}(N)$ and $\mathrm{Var}_0^{\beta}(N) = E_0^{\beta}(N^2) - E_0^{\beta}(N)^2$ derived from equation (2.13).

Figure 2: Sensitivity of $E(N)$ to (a) the attachment rate $\lambda$ and (b) the degradation rate $\mu$. The solid line in each plot is $E(N)$ and the dashed line is $E(N) + \sqrt{\mathrm{Var}(N)}$. The model employed assumes $U = 0$, $\beta = 1$, $l = 400$, $s = 10$, $d = 30$, and $p = 0.25$. The dotted lines indicate the values of $\lambda$ and $\mu$ used in the remaining sensitivity studies.

3 Numerical Results

Our model can be used to explore mechanisms that the cell might employ for controlling protein production. Most likely control is exerted through perturbations of particularly sensitive parameters.
To explore the sensitivity of $E(N)$ and $\mathrm{Var}(N)$ to the various parameters of the model, we varied each parameter separately while holding the remaining parameters fixed at biologically plausible values (Christensen and Bourne, 1999; Greenberg, 1972; Hargrove et al., 1991; Menninger, 1976; Pavlov and Ehrenberg, 1996; Pederson, 1984; Ross, 1996; Voet and Voet, 1995). Arbitrary but plausible values include an attachment rate $\lambda$ of 0.5 per second, a length $l$ of 400 codons, a speed $s$ of 10 codons per second, a clearance distance $d$ of 30 codons, a premature detachment probability $p$ of 0.25 per ribosome, and a degradation rate $\mu$ of 0.0004 mRNA strands per second. Except where noted, we assume that the number of chunks $\beta = 1$. When all other parameters are constant, an increase in $\lambda$, $\beta$, or $s$ leads to an increase in protein production, while an increase in $d$, $\mu$, or $p$ leads to a decrease in protein production. Because experimental values of $d$ do not vary appreciably, sensitivity to $d$ is not examined here. Our model predicts that protein production is linearly proportional to $1 - p$.

The plots of $E(N)$ and $E(N) + \sqrt{\mathrm{Var}(N)}$ in Figure 2a show the sensitivity of protein production to changes in $\lambda$. Figure 2a assumes 5' exonuclease activity ($U = 0$). The sensitivity of protein production to changes in $\lambda$ is almost identical under 3' exonuclease activity ($U = l$) and uniformly distributed endonuclease activity. Values of $\lambda$ less than 1.0 per second lead to the greatest control. Once $\lambda$ exceeds 2.0, the expectation $E(N)$ is relatively insensitive to changes in $\lambda$.

Figure 2b illustrates the sensitivity of protein production to changes in $\mu$ for $U = 0$. When $\mu$ is less than 0.0005 mRNA strands per second, protein production is very sensitive to changes in $\mu$. For example, increasing $\mu$ from 0.0001 to 0.0002 roughly halves $E(N)$. The two other models for the distribution of $U$ lead to virtually identical sensitivities of $E(N)$ and $\mathrm{Var}(N)$ to changes in $\mu$.
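The sensitivity statements above can be reproduced directly from equations (2.8), (2.9), and (2.11). The Python sketch below is an illustration, not the code used to produce Figure 2. It assumes β = 1, and the function names and the string labels for the three breakpoint models are assumptions made here.

```python
import math


def breakpoint_transform(theta, l, model="5prime"):
    """Laplace transform E[exp(-theta * U)] of the breakpoint U."""
    if model == "5prime":            # U = 0
        return 1.0
    if model == "3prime":            # U = l
        return math.exp(-theta * l)
    if model == "uniform":           # U uniform on [0, l]
        return (1.0 - math.exp(-theta * l)) / (theta * l)
    raise ValueError("unknown breakpoint model: " + model)


def moments(lam=0.5, mu=0.0004, s=10.0, d=30.0, l=400.0, p=0.25, model="5prime"):
    """Mean and variance of N for beta = 1 from (2.8), (2.9), and (2.11)."""
    g = breakpoint_transform(mu / s, l, model)
    b = math.exp(-mu * d / s)                 # exp(-mu * t_d)
    denom = mu + lam - lam * b
    mean0 = lam * g / denom                                   # equation (2.8)
    var0 = lam * g * (mu + lam + lam * (b - g)) / denom ** 2  # equation (2.9)
    mean_p = (1.0 - p) * mean0                                # equation (2.11)
    var_p = p * (1.0 - p) * mean0 + (1.0 - p) ** 2 * var0
    return mean_p, var_p


if __name__ == "__main__":
    for mu in (0.0001, 0.0002, 0.0004):
        mean, var = moments(mu=mu)
        print(mu, mean, mean + math.sqrt(var))
```

With the default arguments, which are the plausible values listed above, the mean comes out near 375 proteins per message, and doubling µ from 0.0001 to 0.0002 roughly halves it, in line with the text.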
When the speed $s$ is low, say less than 15 codons per second, small changes in $s$ can also lead to relatively large changes in $E(N)$ (Figure 3a). In this parameter regime, the model of mRNA degradation matters. If $U = 0$, then $\lim_{s \to 0} E(N) = (1 + \mu/\lambda)^{-1}$, while if $U > 0$, $\lim_{s \to 0} E(N) = 0$.

Our model predicts that protein production is very insensitive to message length when $\mu$ is restricted to biologically plausible values. Examination of equation (2.8) reveals that $E(N)$ does not depend on $l$ when $U = 0$. When $U = l$ or $U$ is uniformly distributed on $(0, l)$, protein production decays very slowly to 0. For example in Figure 3b, $E(N)$ drops by less than 10% as $l$ increases from 400 to 4000 codons. When the product $\mu l$ is small, $E(N)$ is approximately equal to $(1 - p)\lambda/\mu$ under our three models for $U$. In other words, protein production does not depend on any length dependent parameters.

Figure 3: Sensitivity of $E(N)$ to (a) the speed $s$ and (b) the mRNA length $l$. The solid line in each plot is $E(N)$ and the dashed line is $E(N) + \sqrt{\mathrm{Var}(N)}$. The model employed assumes $U = 0$, $\beta = 1$, $\lambda = 0.5$, $\mu = 0.0004$, $d = 30$, and $p = 0.25$. The dotted lines indicate the values of $s$ and $l$ used in the remaining sensitivity studies.

The effect of $\beta$ on protein production is examined in Figures 4 and 5. The probability distribution of $N$ is shown in Figure 4 for $\beta = 1$ and $\beta = 5$. In general, the distribution of $N$ looks very much like a discrete version of a gamma distribution with shape parameter $\beta$ and scale parameter $\mu$. When the per chunk degradation rate $\mu$ is held constant, protein production increases as $\beta$ increases (Figure 5). If instead $\mu/\beta$ is held constant, then $E(N)$ is invariant and $\mathrm{Var}(N)$ decreases as $\beta$ is increased (Figure 5).
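The distribution shown in Figure 4 for β = 1 can be approximated by the finite Fourier inversion (2.15). The Python sketch below is a minimal illustration under stated assumptions: β = 1, U = 0 by default (so the breakpoint transform `g` equals 1), and a hand-written radix-2 FFT standing in for a library routine. All function names are hypothetical, and n = 2^14 is an arbitrary choice that appears large enough for the default parameters.

```python
import cmath


def fft(x):
    """Recursive radix-2 transform X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n).

    The length of x must be a power of 2.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def pgf(r, lam=0.5, mu=0.0004, s=10.0, d=30.0, p=0.25, g=1.0):
    """P[p + (1 - p) r] for beta = 1, built from Q(r) in (2.6)."""
    w = p + (1.0 - p) * r                      # thinning by premature detachment
    q = lam * g / (lam + mu - lam * cmath.exp(-mu * d / s) * w)
    return 1.0 - (1.0 - w) * q                 # P = 1 - (1 - w) Q(w)


def pmf_by_fft(n=2 ** 14, **params):
    """Approximate Pr(N = k) for k = 0..n-1 via the Riemann sum (2.15)."""
    samples = [pgf(cmath.exp(2j * cmath.pi * j / n), **params) for j in range(n)]
    return [c.real / n for c in fft(samples)]


if __name__ == "__main__":
    probs = pmf_by_fft()
    mean = sum(k * pk for k, pk in enumerate(probs))
    print(sum(probs), mean)   # compare with 1 and with E_p(N) from (2.11)
```

As the text suggests, accuracy can be judged by comparing the numerically recovered mean and variance with their closed-form values; if they disagree, a larger power of 2 for n is the first thing to try.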
Most available biological estimates of $\mu$ measure global mRNA degradation and must be adjusted for the appropriate number of poly A chunks $\beta$ for use in our model.

Figure 4: Probability density of $N$ (a) with a single 3' poly A chunk ($\beta = 1$) and (b) with five 3' poly A chunks ($\beta = 5$). The horizontal axis is the number of proteins $N$ and the vertical axis is the discrete probability mass. The model employed assumes $U = 0$, $l = 400$, $\lambda = 0.5$, $\mu = 0.0004$, $s = 10$, $d = 30$, and $p = 0.25$. Note that the function is discrete and defined only on the integers.

Figure 5: Effects of increasing the number of 3' poly A chunks $\beta$ on $E(N)$ and $\sqrt{\mathrm{Var}(N)}$. The panels are titled "Constant Degradation Rate" and "Adjusted Degradation Rate"; each plots $E(N)$ against the number of chunks $\beta$. The circles are $E(N)$ and the error bars are $E(N) \pm \sqrt{\mathrm{Var}(N)}$. The first plot assumes the rate of single chunk degradation is measurable and equals $\mu = \hat{\mu}$. The last plot assumes only the global mRNA degradation rate ($\hat{\mu}$) is measurable and $\mu \approx \beta\hat{\mu}$. The model employed assumes $U = 0$, $l = 400$, $\lambda = 0.5$, $\mu = 0.0004$, $s = 10$, $d = 30$, and $p = 0.25$. Note that scales differ between plots.

4 Conclusions

The simple model presented here predicts that control of protein production is primarily a function of the degradation rate $\mu$ and to a lesser extent of the attachment rate $\lambda$. Regulation of degradation can be made more precise by increasing the number of poly A chunks $\beta$ while keeping the mean degradation time $E(T)$ fixed. This tends to decrease $\mathrm{Var}(N)$ but has little impact on $E(N)$. Fine control of protein production is almost certain to be evolutionarily advantageous. The length $l$ of the message is not a determinant of $E(N)$.

An important shortcoming of the translation model we develop is the independence between ribosome attachment at the initiation site and translational termination. In eukaryotic cells, the poly A tail can associate with the 5' end of the mRNA through a protein complex involving poly-A-binding protein and eukaryotic translation initiation factor 4G (Wells et al., 1998), circularizing the mRNA. Researchers hypothesize that the circularization co-locates recently terminated ribosomal subunits near the initiation site, enhancing re-entry of the subunits into the process.

The model also involves some further mathematical compromises and several microscopic parameters that are difficult to measure. For example, it would be more realistic to model premature detachment as a Poisson process with rate $\nu$ per unit length. This yields the detachment probability $p = 1 - e^{-\nu l}$. $E(N)$ would then depend on $l$. Unfortunately, the attachment rate $\lambda$ and the number of chunks $\beta$ are poorly determined. Our estimate of $\lambda$ is little more than a guess, and it falls in the range where $E(N)$ is sensitive to small changes in $\lambda$. To obtain better experimental estimates of these parameters, it might be worth looking at genes controlled by the same operon. Message would be churned out at the same rate, but the sizes of the messages and poly A tails would vary.

As the technology of DNA microarrays improves, it will be possible to use these devices to gain insight into cellular control of mRNA expression levels. Coupled with better measurement of protein concentration levels over time, microarray experiments will lay the groundwork for better, more realistic models of protein production.

Acknowledgments

This research was supported in part by USPHS grants GM53275 (KL), MH59490 (KL, JSS), and CA16042 (MAS, JSS) and an Alfred P. Sloan Research Fellowship (MAS).

References

Beelman, C. and Parker, R. (1995). Degradation of mRNA in Eukaryotes. Cell, 81:179–183.

Christensen, A. and Bourne, C. (1999). Shape of large bound polysomes in cultured fibroblasts and thyroid cells. Anatomical Record, 255:116–129.

Curtis, D., Lehmann, R., and Zamore, P. (1995). Translational regulation in development. Cell, 81:171–178.

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, New York, third edition.

Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York, second edition.

Greenberg, J. (1972). High stability of messenger RNA in growing cultured cells. Nature, 240:102–104.

Hargrove, J., Hulsey, M., and Beale, E. (1991). The kinetics of mammalian gene expression. BioEssays, 13:667–674.

Henrici, P. (1979). Fast Fourier transform methods in computational complex analysis. SIAM Review, 21:481–527.

Jacobson, A. and Peltz, S. (1996). Interrelationships of the pathways of mRNA decay and translation in eukaryotic cells. Annual Review of Biochemistry, 65:693–739.

Jacobson, A. and Peltz, S. (1999). Tools for turnover: methods for analysis of mRNA stability in eukaryotic cells. Methods: Companion to Methods in Enzymology, 17:1–2.

Jorgensen, F. and Kurland, C. (1990). Processing errors of gene expression in Escherichia coli. Journal of Molecular Biology, 215:511–521.

Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes. Academic Press, New York, second edition.

Karlin, S. and Taylor, H. (1981). A Second Course in Stochastic Processes. Academic Press, New York.

Lange, K. (1999). Numerical Analysis for Statisticians. Springer-Verlag, New York.

Lewin, B. (1997). Genes VI. Oxford University Press, Oxford, United Kingdom.

Mangus, D. and Jacobson, A. (1999). Linking mRNA turnover and translation: assessing the polyribosomal association of mRNA decay factors and degradative intermediates. Methods: Companion to Methods in Enzymology, 17:28–37.

Menninger, J. (1976). Peptidyl-transfer RNA dissociates during protein synthesis from ribosomes of E. coli. Journal of Biological Chemistry, 251:3392–3398.

Pavlov, M. and Ehrenberg, M. (1996). Rate of translation of natural mRNAs in an optimized in vitro system. Archives of Biochemistry and Biophysics, 328:9–16.

Pederson, S. (1984). Escherichia coli ribosomes translate in vivo with variable rate. EMBO Journal, 3:2895–2898.

Ross, J. (1996). Control of messenger RNA stability in higher Eukaryotes. Trends in Genetics, 12:171–175.

Singh, U. (1996). Polyribosome dynamics: size-distribution as a function of attachment, translocation and release of ribosomes. Journal of Theoretical Biology, 179:147–159.

Voet, D. and Voet, J. (1995). Biochemistry. John Wiley and Sons, New York, second edition.

Wells, S., Hillner, P., Vale, R., and Sachs, A. (1998). Circularization of mRNA by eukaryotic translation initiation factors. Molecular Cell, 2:135–140.