Efficiency of protein production from mRNA

Marc A. Suchard1 ([email protected])
Kenneth Lange2 ([email protected])
Janet S. Sinsheimer3 ([email protected])
Departments of Biomathematics1,2,3, Biostatistics1,3,
Human Genetics1,2,3, and Statistics2
University of California, Los Angeles, CA 90095
Abstract: Adapting arguments from queuing theory, we investigate a mathematical model for protein production efficiency from mRNA. Our model involves six parameters: the mRNA length, the clearance distance
a ribosome must travel from the initiation site before another ribosome can attach, the ribosomal attachment
rate, the ribosomal traveling speed along the mRNA, the mRNA degradation rate, and the probability that
a ribosome prematurely disengages from the mRNA. The model allows for different mechanisms of mRNA
degradation; the more complicated mechanisms postulate a functional role for the mRNA poly A tail. We
determine the probability generating function of the number N of fully formed proteins from a single mRNA.
This function yields the moments of N exactly and the entire distribution of N numerically via the finite
Fourier transform. Using biologically plausible estimates, we examine the sensitivity of protein production to
the model parameters and degradation mechanisms. Model predictions are most sensitive to the degradation
and attachment rates, two parameters which are poorly measured in vivo.
AMS Subject Classification: 60G55; 60K25; 92B05; 92C40
Key Words: Genetics, mRNA, translation, ribosome, Poisson process, queuing theory, finite Fourier transform.
1 Introduction
Protein production, in vivo, is a multi-step process involving (a) transcription of DNA into premessenger
RNA, (b) splicing of the premessenger RNA to create messenger RNA (mRNA), (c) translation of mRNA
into protein, and (d) degradation of the mRNA. Geneticists have traditionally viewed transcription as the
basic determinant of the rate of protein production. Recently, evidence has surfaced suggesting that the
rate of mRNA decay is equally important in determining protein production (Jacobson and Peltz, 1999).
mRNA degrades by many mechanisms, and mRNA stability is highly variable and sequence dependent. In
vertebrates, the half-life of mRNA varies from approximately 30 minutes to more than 4 days (Greenberg,
1972; Hargrove et al., 1991). However, it is a mistake to assume that the rates of transcription and mRNA
degradation exclusively determine the efficiency of protein production. The rate of translation certainly has
an impact as well. The translation rate depends on the ribosome attachment rate and the elongation rate;
this latter rate is the ratio of the number of amino acids added to the protein per second to the number of
nucleotides in the message. The number of nucleotides can vary from fewer than 100 to more than 4000.
Finally, the translation process can end prematurely with an incomplete protein; the probability of this
happening depends on the particular mRNA sequence (Jorgensen and Kurland, 1990).
Experiments have necessarily focused on a single component of protein production at a time. Mathematical models of protein production that explicitly capture both elongation and degradation, even in a simplistic
manner, may aid in understanding the interactions between these complex forces. The current paper extends
the previous mathematical work of Singh (1996), who focuses on the shape distribution of ribosomes along
an mRNA rather than on protein production per se. The references (Beelman and Parker, 1995; Curtis et al.,
1995; Jacobson and Peltz, 1996; Lewin, 1997; Mangus and Jacobson, 1999) provide background material on
the underlying biology of translation from mRNA to protein. Our model is admittedly naive and purposely
simplifies some of the facts in order to make mathematical progress. The underlying probability calculations
adapt arguments from queuing theory (Karlin and Taylor, 1975; Karlin and Taylor, 1981).
2 The Model
The cartoon in Figure 1 illustrates our model for the process of mRNA translation. The model involves
six parameters. The first is the length l of the message, the second is the clearance distance d < l that
a ribosome must travel from the initiation site before another ribosome can attach to the initiation site,
the third is the attachment rate λ of a ribosome to the initiation site in the absence of blocking by an
existing ribosome, the fourth is the speed s that a ribosome travels along the message, the fifth is the rate
µ of message degradation, and the sixth is the probability p that an attached ribosome disengages from the
message before it finishes assembling its protein. By analogy to the simple single-server queuing model of
communications engineering, our model postulates that the times of ribosome attachment occur according
to a Poisson process with intensity λ in the absence of blocking. If we imagine that message degradation
occurs when the message is broken, then it is conceivable that a ribosome beyond the breakpoint could still
proceed to the completion of its protein. To allow for this possibility, we assume that the break occurs at a
random point U on the interval [0, l] and that U has distribution function G(u).
Our aim here is to investigate the random number of complete proteins N that a single message generates
during its lifetime T. Both N and T are random variables. The rate of protein production boils down to
the ratio of expectations E(N)/E(T). In our first version of the model, T is exponentially distributed with
distribution function $1 - e^{-\mu t}$ and density function $\mu e^{-\mu t}$. In eukaryotes, degradation of the poly A tail of the
message precipitates loss of the message. If we imagine degradation of the poly A tail as proceeding in chunks
of fixed size at Poisson times, it is reasonable to assume that T has gamma density $\mu^\beta t^{\beta-1} e^{-\mu t}/\Gamma(\beta)$
for some positive integer β that counts the total number of chunks.
Characterization of the distribution of N is considerably harder. In deriving this distribution, we will consider an infinite sequence of independent random variables $X_1, X_2, \ldots$ with common exponential distribution
$F(x) = 1 - e^{-\lambda x}$. As shown in Feller (1971) and other standard probability texts, the sum $S_n = X_1 + \cdots + X_n$
of the first n of these random variables follows a gamma distribution $F^{(n)}(x)$ with density $\lambda^n x^{n-1} e^{-\lambda x}/\Gamma(n)$ and
Laplace transform
\[
\int_0^\infty e^{-\theta x}\, dF^{(n)}(x) = E(e^{-\theta S_n}) = \left(\frac{\lambda}{\lambda+\theta}\right)^n .
\tag{2.1}
\]
We will need the integration by parts
\[
\frac{1}{\theta}\int_0^\infty e^{-\theta x}\, dF^{(n)}(x)
= \frac{1}{\theta}\, e^{-\theta x} F^{(n)}(x)\Big|_0^\infty + \int_0^\infty e^{-\theta x} F^{(n)}(x)\, dx
= \int_0^\infty e^{-\theta x} F^{(n)}(x)\, dx
\tag{2.2}
\]
version of formula (2.1) as well. We will also need the Laplace transform $\widehat{dG}(\theta) = E(e^{-\theta U})$ of the breakpoint
U. If the break always appears at the beginning or the end of the message, then $\widehat{dG}(\theta)$ equals 1 or $e^{-\theta l}$,
respectively. If U is uniformly distributed on [0, l], then $\widehat{dG}(\theta) = (1 - e^{-\theta l})/(\theta l)$.

Figure 1: Cartoon model of mRNA translation. Labels in the figure: 5' end, initiation site, attachment (λ), clearance distance (d), ribosomal subunits, translocation (s), growing polypeptide, premature termination (p), full length polypeptide, mRNA of length l, β poly A tail chunks, nuclease, and degradation (µ).

Our strategy will be to calculate the right-tail probability Pr(N ≥ n) by conditioning on the values t
and u of T and U . To this end, we define the time td = d/s for a ribosome to reach the clearance distance.
The key to calculation of E(N ) in the model is the simple observation that for n > 0 the event N ≥ n is
equivalent to the event
\[
\sum_{i=1}^{n-1} (X_i + t_d) + X_n + u/s \le t.
\tag{2.3}
\]
In other words, the nth ribosome must attach and travel beyond the breakpoint before the break occurs.
In the absence of blocking, Sn = X1 + · · · + Xn is the random time at which the nth ribosome attaches.
The n − 1 summands td represent the time lags for the first n − 1 ribosomes to clear the initiation site.
The time u/s is the travel time to the breakpoint. Assuming the probability of disengagement p = 0, these
considerations lead to the expression
\[
\Pr(N \ge n \mid T = t, U = u)
= \Pr\left[\sum_{i=1}^{n-1} (X_i + t_d) + X_n + u/s \le t\right]
= F^{(n)}[t - u/s - (n-1)t_d].
\tag{2.4}
\]
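
The event (2.3) lends itself to a direct simulation check. The following Python sketch is not part of the original analysis: it assumes p = 0, an exponential lifetime T, and a breakpoint fixed at u, and it uses a deliberately large degradation rate µ = 0.05 (an arbitrary choice made only to keep the run short). The function names are illustrative. The simulated mean is compared with the closed-form mean derived below in formula (2.8), where a fixed breakpoint u gives the transform $e^{-\mu u/s}$.

```python
import math
import random

def simulate_n(lam, mu, s, d, u, rng):
    """Draw one value of N for p = 0, exponential lifetime T, fixed breakpoint u."""
    t = rng.expovariate(mu)        # message lifetime T
    t_d = d / s                    # time for a ribosome to clear the initiation site
    clock = rng.expovariate(lam)   # attachment time of the first ribosome
    n = 0
    # Event (2.3): a ribosome counts if it passes the breakpoint before time t.
    while clock + u / s <= t:
        n += 1
        clock += t_d + rng.expovariate(lam)  # blocking lag plus next waiting time
    return n

def mean_closed_form(lam, mu, s, d, u):
    """E(N) from formula (2.8); a fixed breakpoint u has transform exp(-mu*u/s)."""
    return lam * math.exp(-mu * u / s) / (mu + lam - lam * math.exp(-mu * d / s))

rng = random.Random(2024)
# mu = 0.05 is an assumption for speed, not a biologically plausible value.
params = dict(lam=0.5, mu=0.05, s=10.0, d=30.0, u=0.0)
draws = [simulate_n(rng=rng, **params) for _ in range(20000)]
sim_mean = sum(draws) / len(draws)
exact_mean = mean_closed_form(**params)
print(f"simulated mean {sim_mean:.3f}, closed-form mean {exact_mean:.3f}")
```

Agreement between the two numbers to within Monte Carlo error is only a consistency check of the conditioning argument, not a validation of the model itself.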
Integrating this conditional probability against the product of the degradation density and breakpoint density
yields
\[
\begin{aligned}
\Pr(N \ge n) &= \int_0^l \int_0^\infty \mu e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \int_0^l e^{-\mu[u/s + (n-1)t_d]} \int_0^\infty \mu e^{-\mu v} F^{(n)}(v)\, dv\, dG(u) \\
&= \left(\frac{\lambda}{\lambda+\mu}\right)^n e^{-\mu(n-1)t_d} \int_0^l e^{-\mu u/s}\, dG(u) \\
&= \left(\frac{\lambda}{\lambda+\mu}\right)^n e^{-\mu(n-1)t_d}\, \widehat{dG}(\mu/s)
\end{aligned}
\tag{2.5}
\]
in view of formulas (2.1) and (2.2). If $t_l = l/s$ is the total travel time for a ribosome, then $\widehat{dG}(\mu/s)$
equals 1, $e^{-\mu t_l}$, or $(1 - e^{-\mu t_l})/(\mu t_l)$, depending on whether U = 0, U = l, or U is uniformly distributed on [0, l].
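
As a small illustration of formula (2.5), the sketch below evaluates the breakpoint transform for the three degradation mechanisms and the resulting right-tail probability. The function names and the mechanism labels "start", "end", and "uniform" are hypothetical conveniences, and the parameter values are the plausible ones quoted in Section 3; the code assumes p = 0 and an exponential lifetime.

```python
import math

def dg_hat(theta, l, mechanism):
    """Laplace transform E[exp(-theta*U)] of the breakpoint U (theta > 0)."""
    if mechanism == "start":      # U = 0
        return 1.0
    if mechanism == "end":        # U = l
        return math.exp(-theta * l)
    if mechanism == "uniform":    # U uniform on [0, l]
        return -math.expm1(-theta * l) / (theta * l)
    raise ValueError(f"unknown mechanism: {mechanism}")

def tail_prob(n, lam, mu, s, d, l, mechanism):
    """Pr(N >= n) for n >= 1 from formula (2.5)."""
    t_d = d / s
    return ((lam / (lam + mu)) ** n
            * math.exp(-mu * (n - 1) * t_d)
            * dg_hat(mu / s, l, mechanism))

for mech in ("start", "end", "uniform"):
    probs = [tail_prob(n, 0.5, 0.0004, 10.0, 30.0, 400.0, mech) for n in (1, 100, 1000)]
    print(mech, [round(q, 4) for q in probs])
```

The mechanism enters only through the single factor `dg_hat(mu / s, l, mechanism)`, which is why the three mechanisms give tail probabilities that differ by a constant multiple.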
Given the right-tail probability Pr(N ≥ n), we can calculate the probability generating function
$P(r) = \sum_{m=0}^\infty \Pr(N = m)\, r^m$ from the tail-probability generating function $Q(r) = \sum_{n=0}^\infty \Pr(N > n)\, r^n$ via the
well-known formula
\[
P(r) = 1 - (1 - r)Q(r)
\tag{2.6}
\]
given in (Feller, 1968). Fortunately, straightforward algebra using the geometric series $\sum_{n=0}^\infty w^n = (1 - w)^{-1}$
yields
\[
Q(r) = \frac{\lambda\, \widehat{dG}(\mu/s)}{\lambda + \mu - \lambda e^{-\mu t_d} r}
\tag{2.7}
\]
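for r ∈ [0, 1]. Possession of P(r) allows us to recover the mean
\[
E(N) = P'(1) = Q(1) = \frac{\lambda\, \widehat{dG}(\mu/s)}{\mu + \lambda - \lambda e^{-\mu t_d}}
\tag{2.8}
\]
and variance
\[
\begin{aligned}
\mathrm{Var}(N) &= P''(1) + P'(1) - P'(1)^2 \\
&= 2Q'(1) + Q(1) - Q(1)^2 \\
&= \frac{\lambda\, \widehat{dG}(\mu/s)\,\{\mu + \lambda + \lambda[e^{-\mu t_d} - \widehat{dG}(\mu/s)]\}}{(\mu + \lambda - \lambda e^{-\mu t_d})^2}.
\end{aligned}
\tag{2.9}
\]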
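
Formulas (2.8) and (2.9) are simple enough to evaluate directly. The sketch below is an illustration only: the function name is hypothetical, the argument `g` stands for the transform value $\widehat{dG}(\mu/s)$, and the inputs are the plausible parameter values quoted in Section 3 with U = 0 and p = 0.

```python
import math

def mean_var(lam, mu, s, d, g):
    """E(N) and Var(N) from formulas (2.8) and (2.9); g is dG-hat(mu/s)."""
    c = lam * math.exp(-mu * d / s)   # lambda * exp(-mu * t_d)
    denom = mu + lam - c
    mean = lam * g / denom
    var = lam * g * (mu + lam + c - lam * g) / denom ** 2
    return mean, var

# g = 1.0 corresponds to a break at the start of the message (U = 0).
mean, var = mean_var(lam=0.5, mu=0.0004, s=10.0, d=30.0, g=1.0)
print(f"E(N) = {mean:.2f}, sd(N) = {math.sqrt(var):.2f}")
```

With these inputs the mean is roughly 500 proteins per message before any allowance for premature termination, and the standard deviation is of the same order as the mean.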
When p > 0 and the ribosome can fall off the message, we must modify the above expressions. The
probability generating function for the number of fully realized proteins is given by the functional composition
\[
P[p + (1 - p)r] = \sum_{n=0}^\infty r^n \sum_{m=n}^\infty \Pr(N = m) \binom{m}{n} p^{m-n} (1 - p)^n .
\tag{2.10}
\]
Here p + (1 − p)r is the probability generating function of the indicator that a particular ribosome that could have completed
its protein before the message degrades actually does so without disengaging from the message. To avoid
complicating our analysis, we assume that all ribosomes successfully negotiate the blocking distance before
disengaging. With this caveat, we can subscript the mean and variance by p and show that
\[
\begin{aligned}
E_p(N) &= (1 - p)\, E_0(N) \\
\mathrm{Var}_p(N) &= p(1 - p)\, E_0(N) + (1 - p)^2\, \mathrm{Var}_0(N)
\end{aligned}
\tag{2.11}
\]
by evaluating the first two derivatives of the composite generating function P[p + (1 − p)r] at r = 1.
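
A minimal sketch of formula (2.11) follows. The function name and the numerical inputs are illustrative assumptions, chosen so that the result is easy to verify by hand: a count with equal mean and variance (as for a Poisson variable) keeps that property after independent thinning.

```python
def thin_moments(mean0, var0, p):
    """Mean and variance of N under formula (2.11), given the p = 0 moments."""
    mean_p = (1.0 - p) * mean0
    var_p = p * (1.0 - p) * mean0 + (1.0 - p) ** 2 * var0
    return mean_p, var_p

# Illustrative inputs only: mean 100, variance 100, detachment probability 0.25.
print(thin_moments(100.0, 100.0, 0.25))
```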
In eukaryotes, degradation of the poly A tail of the message generally precipitates loss of the message.
If we imagine degradation of the poly A tail as proceeding in β chunks of fixed size at Poisson times, it is
reasonable to assume that T has gamma density $\mu^\beta t^{\beta-1} e^{-\mu t}/\Gamma(\beta)$. To recover the right-tail probability
of N, we superscript it by β and differentiate β − 1 times with respect to µ. When p = 0, this tactic yields
\[
\begin{aligned}
{\Pr}^{\beta}(N \ge n) &= \frac{\mu^\beta}{\Gamma(\beta)} \int_0^l \int_0^\infty t^{\beta-1} e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \frac{(-1)^{\beta-1} \mu^\beta}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \int_0^l \int_0^\infty e^{-\mu t} F^{(n)}[t - u/s - (n-1)t_d]\, dt\, dG(u) \\
&= \frac{(-1)^{\beta-1} \mu^\beta}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{\Pr(N \ge n)}{\mu}\right].
\end{aligned}
\tag{2.12}
\]
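Because differentiation commutes with summation, the generating function and kth moment of N satisfy
\[
\begin{aligned}
P^{\beta}(r) &= \frac{(-1)^{\beta-1} \mu^\beta}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{P(r)}{\mu}\right] \\
E^{\beta}(N^k) &= \frac{(-1)^{\beta-1} \mu^\beta}{\Gamma(\beta)} \frac{d^{\beta-1}}{d\mu^{\beta-1}} \left[\frac{E(N^k)}{\mu}\right].
\end{aligned}
\tag{2.13}
\]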
For small values of β, it is possible to carry out the indicated differentiations by hand or by a symbolic algebra
program such as Maple. For larger values of β, it is preferable to extend Prβ (N ≥ n)/µ and Eβ (N k )/µ to be
analytic functions of the complex variable µ and recover their values by the finite Fourier transform (Henrici,
1979; Lange, 1999).
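
One way to carry out the numerical differentiation is sketched below in Python. It approximates Cauchy's integral formula for the derivative by an equally spaced Riemann sum on a circle around µ, which is the finite Fourier transform idea in its simplest form. The circle radius µ/2, the 64 sample points, and the function names are assumptions of this sketch, not choices documented in the text; the example treats only the mean (k = 1) with p = 0 and U = 0.

```python
import cmath
import math

def nth_derivative(f, x, k, radius, n=64):
    """Approximate the kth derivative of an analytic f at x by a Riemann sum
    for Cauchy's integral formula on a circle of the given radius around x."""
    total = 0j
    for j in range(n):
        w = cmath.exp(2j * cmath.pi * j / n)
        total += f(x + radius * w) * w ** (-k)
    return (math.factorial(k) * total / (n * radius ** k)).real

def gamma_mean(beta, lam, mu, s, d):
    """E^beta(N) from formula (2.13) with k = 1, p = 0, and U = 0."""
    t_d = d / s

    def f(z):
        # E(N)/mu from formula (2.8), extended to complex values z of mu.
        return lam / (z * (z + lam - lam * cmath.exp(-z * t_d)))

    # Radius mu/2 keeps the circle away from the singularity at z = 0.
    deriv = nth_derivative(f, mu, beta - 1, radius=mu / 2)
    return (-1) ** (beta - 1) * mu ** beta / math.gamma(beta) * deriv

for beta in (1, 2, 5):
    print(beta, round(gamma_mean(beta, 0.5, 0.0004, 10.0, 30.0), 1))
```

A convenient sanity check is the case d = 0, where E(N)/µ reduces to λ/µ² and formula (2.13) gives βλ/µ exactly, the attachment rate times the mean lifetime β/µ.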
The same Fourier analysis tactics work in extracting the discrete density Pr(N = k). Indeed, any
probability generating function $P(s) = \sum_{k=0}^\infty p_k s^k$ may be extended to the boundary of the unit disk in
the complex plane by setting
\[
P(e^{2\pi i t}) = \sum_{k=0}^\infty p_k e^{2\pi i k t}.
\tag{2.14}
\]
This creates a periodic function in t whose kth Fourier coefficient $p_k$ can be recovered via the finite Riemann
sum
\[
p_k \approx \frac{1}{n} \sum_{j=0}^{n-1} P(e^{2\pi i j/n})\, e^{-2\pi i k j/n}.
\tag{2.15}
\]
In practice, one evaluates this finite Fourier transform via the fast Fourier transform algorithm for some
large power $n = 2^v$ of 2. For sufficiently large v, all of the coefficients $p_0, \ldots, p_{n-1}$ can be computed
accurately. Accuracy can be checked by comparing the numerically computed mean and variance of P(s)
with its theoretical mean and variance.
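
The recovery of the $p_k$ from formula (2.15) can be sketched as follows. This is an illustration under stated assumptions rather than the computation used for the figures: it takes U = 0 and β = 1, uses the plausible parameter values of Section 3, picks $n = 2^{13}$ arbitrarily, and relies on a short textbook recursive radix-2 FFT written here so that the example needs only the standard library.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT: returns sum_j x[j] * exp(-2*pi*i*j*k/n) for each k."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def pgf(r, lam=0.5, mu=0.0004, s=10.0, d=30.0, p=0.25):
    """P[p + (1-p)r] for U = 0 and beta = 1, built from formulas (2.6) and (2.7)."""
    w = p + (1.0 - p) * r
    q = lam / (lam + mu - lam * math.exp(-mu * d / s) * w)
    return 1.0 - (1.0 - w) * q

n = 2 ** 13  # assumed large enough that the tail beyond n is negligible
samples = [pgf(cmath.exp(2j * cmath.pi * j / n)) for j in range(n)]
probs = [c.real / n for c in fft(samples)]   # formula (2.15)
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum(k * k * pk for k, pk in enumerate(probs)) - mean ** 2
print(f"total mass {sum(probs):.6f}, mean {mean:.2f}, sd {math.sqrt(var):.2f}")
```

Comparing `mean` and `var` with the values given by formula (2.11) is the accuracy check described above; a visible mismatch would suggest that n is too small.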
Finally, when premature termination is possible, we replace $P^{\beta}(r)$ by $P^{\beta}[p + (1 - p)r]$ and calculate the
mean and variance of N by substituting $E_0^{\beta}(N)$ for $E_0(N)$ and $\mathrm{Var}_0^{\beta}(N)$ for $\mathrm{Var}_0(N)$ in equation (2.11),
based on the values of $E_0^{\beta}(N)$ and $\mathrm{Var}_0^{\beta}(N) = E_0^{\beta}(N^2) - E_0^{\beta}(N)^2$ derived from equation (2.13).

Figure 2: Sensitivity of E(N) to the attachment rate λ and degradation rate µ. Panel (a) shows sensitivity to the attachment rate λ and panel (b) shows sensitivity to the degradation rate µ; the vertical axis in both panels is E(N). The solid line in each plot is E(N) and the dashed line is $E(N) + \sqrt{\mathrm{Var}(N)}$. The model employed assumes U = 0, β = 1, l = 400, s = 10, d = 30, and p = 0.25. The dotted lines indicate the values of λ and µ used in the remaining sensitivity studies.

3 Numerical Results
Our model can be used to explore mechanisms that the cell might employ for controlling protein production.
Most likely control is exerted through perturbations of particularly sensitive parameters. To explore the
sensitivity of E(N ) and Var(N ) to the various parameters of the model, we varied each parameter separately
while holding the remaining parameters fixed at biologically plausible values (Christensen and Bourne, 1999;
Greenberg, 1972; Hargrove et al., 1991; Menninger, 1976; Pavlov and Ehrenberg, 1996; Pederson, 1984; Ross,
1996; Voet and Voet, 1995). Arbitrary but plausible values include an attachment rate λ of 0.5 per second,
a length l of 400 codons, a speed s of 10 codons per second, a clearance distance d of 30 codons, a premature
detachment probability p of 0.25 per ribosome, and a degradation rate µ of 0.0004 mRNA strands per
second. Except where noted, we assume that the number of chunks β = 1. When all other parameters are
constant, an increase in λ, β, or s leads to an increase in protein production, while an increase in d, µ,
or p leads to a decrease in protein production. Because experimental values of d do not vary appreciably,
sensitivity to d is not examined here. Our model predicts that protein production is linearly proportional to
1 − p.
The plots of E(N) and $E(N) + \sqrt{\mathrm{Var}(N)}$ in Figure 2a show the sensitivity of protein production to
changes in λ. Figure 2a assumes 5’ exonuclease activity (U = 0). The sensitivity of protein production to
changes in λ is almost identical under 3’ exonuclease activity (U = l) and uniformly distributed endonuclease
activity. Values of λ less than 1.0 per second lead to the greatest control. Once λ exceeds 2.0, the expectation
E(N ) is relatively insensitive to changes in λ.
Figure 2b illustrates the sensitivity of protein production to changes in µ for U = 0. When µ is less
than 0.0005 mRNA strands per second, protein production is very sensitive to changes in µ. For example,
increasing µ from 0.0001 to 0.0002 roughly halves E(N ). The two other models for the distribution of U
lead to virtually identical sensitivities of E(N ) and Var(N ) to changes in µ.
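
The sensitivities to µ and λ described above can be tabulated from formulas (2.8) and (2.11). The sketch below assumes U = 0 and β = 1 with the parameter values listed earlier in this section; the function name and the particular grid of rates are illustrative choices.

```python
import math

def expected_proteins(lam, mu, s=10.0, d=30.0, p=0.25):
    """(1 - p) * E(N) from formulas (2.8) and (2.11) with U = 0 and beta = 1."""
    return (1.0 - p) * lam / (mu + lam - lam * math.exp(-mu * d / s))

# Sweep the degradation rate at lambda = 0.5 per second.
for mu in (0.0001, 0.0002, 0.0004, 0.0008, 0.0016):
    print(f"mu = {mu:.4f}  E(N) = {expected_proteins(0.5, mu):8.1f}")

# Sweep the attachment rate at mu = 0.0004 per second.
for lam in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"lambda = {lam:4.2f}  E(N) = {expected_proteins(lam, 0.0004):8.1f}")
```

Doubling µ from 0.0001 to 0.0002 halves the computed E(N) almost exactly, while doubling λ from 2.0 to 4.0 raises it by less than 10 percent.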
When the speed s is low, say less than 15 codons per second, small changes in s can also lead to relatively
large changes in E(N) (Figure 3a). In this parameter regime, the model of message degradation matters. If
U = 0, then $\lim_{s \to 0} E(N) = (1 + \mu/\lambda)^{-1}$, while if U > 0, $\lim_{s \to 0} E(N) = 0$.
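
The two limits can be checked numerically from formula (2.8) by letting s shrink. The sketch below assumes p = 0 and the parameter values of this section; the function name, the mechanism labels, and the grid of speeds are illustrative.

```python
import math

def mean_p0(s, mechanism, lam=0.5, mu=0.0004, d=30.0, l=400.0):
    """E(N) from formula (2.8) with p = 0 for a given speed s and breakpoint law."""
    theta = mu / s
    if mechanism == "start":          # U = 0
        g = 1.0
    elif mechanism == "end":          # U = l
        g = math.exp(-theta * l)
    elif mechanism == "uniform":      # U uniform on [0, l]
        g = -math.expm1(-theta * l) / (theta * l)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return lam * g / (mu + lam - lam * math.exp(-theta * d))

for s in (10.0, 1.0, 0.1, 0.01):
    row = [mean_p0(s, m) for m in ("start", "end", "uniform")]
    print(f"s = {s:5.2f}", [f"{v:.4g}" for v in row])
```

In this calculation the approach to 0 is much slower for a uniformly distributed breakpoint than for a break at the end of the message, because the uniform transform decays only like s/(µl).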
Our model predicts that protein production is very insensitive to message length when µ is restricted to
biologically plausible values. Examination of equation (2.8) reveals that E(N ) does not depend on l when
U = 0. When U = l or U is uniformly distributed on (0, l), protein production decays very slowly to 0.
For example in Figure 3b, E(N ) drops by less than 10% as l increases from 400 to 4000 codons. When the
product µl is small, E(N ) is approximately equal to (1 − p)λ/µ under our three models for U . In other
words, protein production does not depend on any length dependent parameters.

Figure 3: Sensitivity of E(N) to the speed s and the mRNA length l. Panel (a) shows sensitivity to speed and panel (b) shows sensitivity to mRNA length; the vertical axis in both panels is E(N). The solid line in each plot is E(N) and the dashed line is $E(N) + \sqrt{\mathrm{Var}(N)}$. The model employed assumes U = 0, β = 1, λ = 0.5, µ = 0.0004, d = 30, and p = 0.25. The dotted lines indicate the values of s and l used in the remaining sensitivity studies.

The effect of β on protein production is examined in Figures 4 and 5. The probability distribution
of N is shown in Figure 4 for β = 1 and β = 5. In general, the distribution of N looks very much like
a discrete version of a gamma distribution with shape parameter β and scale parameter µ. When the
per-chunk degradation rate µ is held constant, protein production increases as β increases (Figure 5). If
instead µ/β is held constant, then E(N) is invariant and Var(N) decreases as β is increased (Figure 5).
Most available biological estimates of µ measure global mRNA degradation and must be adjusted for the
appropriate number of poly A chunks β for use in our model.

Figure 4: Probability density of N with a single 3' poly A chunk (β = 1) and with five 3' poly A chunks (β = 5). Panel (a) shows β = 1 and panel (b) shows β = 5; the horizontal axis is the number of proteins N and the vertical axis is the discrete probability mass. The model employed assumes U = 0, l = 400, λ = 0.5, µ = 0.0004, s = 10, d = 30, and p = 0.25. Note that the function is discrete and defined only on the integers.

4 Conclusions
The simple model presented here predicts that control of protein production is primarily a function of the
degradation rate µ and to a lesser extent of the attachment rate λ. Regulation of degradation can be made
more precise by increasing the number of poly A chunks β while keeping the mean degradation time E(T )
fixed. This tends to decrease Var(N ) but has little impact on E(N ). Fine control of protein production is
almost certain to be evolutionarily advantageous. The length l of the message is not a determinant of E(N).
An important shortcoming of the translation model we develop is the independence between ribosome
attachment at the initiation site and translational termination. In eukaryotic cells, the poly A tail can
associate with the 5' end of the mRNA through a protein complex involving poly-A-binding protein and
eukaryotic translation initiation factor 4G (Wells et al., 1998), circularizing the mRNA. Researchers hypothesize that the circularization co-locates recently terminated ribosomal subunits near the initiation site,
enhancing re-entry of the subunits into the process.

Figure 5: Effects of increasing the number of 3' poly A chunks β on E(N) and Var(N). The two panels are titled "Constant Degradation Rate" and "Adjusted Degradation Rate"; in both, the horizontal axis is the number of chunks β and the vertical axis is E(N). The circles are E(N) and the error bars are $E(N) \pm \sqrt{\mathrm{Var}(N)}$. The first plot assumes the rate of single chunk degradation is measurable and equals µ = µ̂. The last plot assumes only the global mRNA degradation rate (µ̂) is measurable and µ ≈ βµ̂. The model employed assumes U = 0, l = 400, λ = 0.5, µ = 0.0004, s = 10, d = 30, and p = 0.25. Note that scales differ between plots.

The model also involves some further mathematical compromises and several microscopic parameters
that are difficult to measure. For example, it would be more realistic to model premature detachment as
a Poisson process with rate ν per unit length. This yields the detachment probability p = 1 − e−νl . E(N )
would then depend on l. Unfortunately, the attachment rate λ and the number of chunks β are poorly
determined. Our estimate of λ is little more than a guess that makes E(N ) sensitive to small changes in λ.
To obtain better experimental estimates of these parameters, it might be worth looking at genes controlled
by the same operon. Message would be churned out at the same rate, but the sizes of the messages and
poly A tails would vary. As the technology of DNA microarrays improves, it will be possible to use these
devices to gain insight into cellular control of mRNA expression levels. Coupled with better measurement
of protein concentration levels over time, microarray experiments will lay the groundwork for better, more
realistic models of protein production.
Acknowledgments
This research was supported in part by USPHS grants GM53275 (KL), MH59490 (KL, JSS), and CA16042
(MAS, JSS) and an Alfred P. Sloan Research Fellowship (MAS).
References
Beelman, C. and Parker, R. (1995). Degradation of mRNA in Eukaryotes. Cell, 81:179–183.
Christensen, A. and Bourne, C. (1999). Shape of large bound polysomes in cultured fibroblasts and thyroid
cells. Anatomical Record, 255:116–129.
Curtis, D., Lehmann, R., and Zamore, P. (1995). Translational regulation in development. Cell, 81:171–178.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, New York,
third edition.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley, New York,
second edition.
Greenberg, J. (1972). High stability of messenger RNA in growing cultured cells. Nature, 240:102–104.
Hargrove, J., Hulsey, M., and Beale, E. (1991). The kinetics of mammalian gene expression. BioEssays,
13:667–674.
Henrici, P. (1979). Fast Fourier transform methods in computational complex analysis. SIAM Review,
21:481–527.
Jacobson, A. and Peltz, S. (1996). Interrelationships of the pathways of mRNA decay and translation in
eukaryotic cells. Annual Review of Biochemistry, 65:693–739.
Jacobson, A. and Peltz, S. (1999). Tools for turnover: methods for analysis of mRNA stability in eukaryotic
cells. Methods: Companion to Methods in Enzymology, 17:1–2.
Jorgensen, F. and Kurland, C. (1990). Processing errors of gene expression in Escherichia coli. Journal of
Molecular Biology, 215:511–521.
Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes. Academic Press, New York, second
edition.
Karlin, S. and Taylor, H. (1981). A Second Course in Stochastic Processes. Academic Press, New York.
Lange, K. (1999). Numerical Analysis for Statisticians. Springer-Verlag, New York.
Lewin, B. (1997). Genes VI. Oxford University Press, Oxford, United Kingdom.
Mangus, D. and Jacobson, A. (1999). Linking mRNA turnover and translation: assessing the polyribosomal
association of mRNA decay factors and degradative intermediates. Methods: Companion to Methods in
Enzymology, 17:28–37.
Menninger, J. (1976). Peptidyl-transfer RNA dissociates during protein synthesis from ribosomes of E. coli.
Journal of Biological Chemistry, 251:3392–3398.
Pavlov, M. and Ehrenberg, M. (1996). Rate of translation of natural mRNAs in an optimized in vitro system.
Archives of Biochemistry and Biophysics, 328:9–16.
Pederson, S. (1984). Escherichia coli ribosomes translate in vivo with variable rate. EMBO Journal, 3:2895–
2898.
Ross, J. (1996). Control of messenger RNA stability in higher Eukaryotes. Trends in Genetics, 12:171–175.
Singh, U. (1996). Polyribosome dynamics: size-distribution as a function of attachment, translocation and
release of ribosomes. Journal of Theoretical Biology, 179:147–159.
Voet, D. and Voet, J. (1995). Biochemistry. John Wiley and Sons, New York, second edition.
Wells, S., Hillner, P., Vale, R. and Sachs, A. (1998). Circularization of mRNA by eukaryotic translation
initiation factors. Molecular Cell, 2:135–140.