Mapping the translation initiation landscape of an S. cerevisiae gene

YGENO-08524; No. of pages: 11; 4C:
Genomics xxx (2013) xxx–xxx
Contents lists available at SciVerse ScienceDirect
Genomics
journal homepage: www.elsevier.com/locate/ygeno
Methods
Mapping the translation initiation landscape of an S. cerevisiae gene
using fluorescent proteins
Tuval Ben-Yehezkel a,b,1, Hadas Zur c,1, Tzipy Marx b, Ehud Shapiro a,b, Tamir Tuller d,e,⁎
a
Computer Science & Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
Computer Science, Tel Aviv University, Tel Aviv, Israel
d
Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
e
The Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel
b
c
a r t i c l e
i n f o
Article history:
Received 13 March 2013
Accepted 17 May 2013
Available online xxxx
Keywords:
S. cerevisiae
Translation
Out-of-frame codons
Genomics
Scanning model
5′UTR
a b s t r a c t
Accurate and efficient gene expression requires that protein translation initiates from mRNA transcripts with
high fidelity. At the same time, indiscriminate initiation of translation from multiple ATG start-sites per transcript has been demonstrated, raising fundamental questions regarding the rate and rationale governing alternative translation initiation. We devised a sensitive fluorescent reporter assay for monitoring alternative
translation initiation. To demonstrate it, we map the translation initiation landscape of a Saccharomyces
cerevisiae gene (RMD1) with a typical ATG sequence context profile. We found that up to 3%–5% of translation
initiation events occur from alternative out-of-frame start codons downstream of the main ATG. Initiation
from these codons follows the ribosome scanning model: initiation rates from different start sites are determined by ATG order, rather than their context strength. Genomic analysis of S. cerevisiae further supports the
scanning model: ATG codons downstream rather than upstream of the main ATG tend to have higher context
scores.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Gene translation is a central metabolic and regulatory process in all
living organisms. A central open question in this field relates to the fidelity and regulation of translation initiation [1–3].
In eukaryotes, canonical initiation of translation is believed to involve scanning of an mRNA transcript with the pre-initiation complex
until a start codon, which is typically an AUG codon with the ‘correct’
surrounding sequence context, is approached [4–6]. At this point the
large ribosomal subunit associates and elongation commences.
According to the first AUG-rule, AUG codons closer to the 5′UTR end
of the transcript have a higher probability for initiating translation [7].
However, several deviations from this canonical model have been
reported: for example, initiations from non-AUG codons (ACG/CUG/
GUG) [7–10]; specifically, it was also shown for a specific yeast gene,
ALA1, that initiation from all single codons resulting from a single mutation of AUG is possible [8]. Recent studies have demonstrated that
mammalian genes with non-AUG N-terminal extensions evolve under
⁎ Corresponding author at: Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel.
E-mail address: [email protected] (T. Tuller).
1
Equal contribution.
strong purifying selection [11] and suggest that frequently non-AUG codons induce translation initiation [9,10]. There are also reported cases of
leaky scanning where AUG codons with sub-optimal contexts are
skipped and translation initiates at a downstream AUG [7]. Finally,
there are known mechanisms of translation initiation from internal ribosome entry sites (abbreviated as IRES). An IRES is a nucleotide sequence that enables translation initiation from the middle of an
mRNA transcript [12–14], potentially not involving mRNA scanning.
Recently, studies based on the ribosomal profiling approach have suggested that, in mammals, translation initiation events and programmed
frame shifts may occur downstream from the main START codon
[9,15,16]. In yeast however, it has been suggested that leaky scanning
rarely occurs [7], and cases of translation initiation at any measurable
rate from alternative codons in native yeast genes that are not in the
frame of the main ORF [17,18] have not, to our knowledge, been reported.
In this study we devised a modular fluorescent reporter assay that
experimentally monitors the rate at which translation initiates at alternative ATG codons, thereby accurately mapping the translation initiation landscape from genes with START codons not in the reading
frame of the main ORF. We applied it to the yeast RMD1 gene and
show that translation initiates from alternative ATG codons downstream of the main ATG and not in the frame of the main ORF, following a specific mechanistic model of translation initiation. Finally, we
discuss the advantages and disadvantages of our approach and possible ramifications related to the reported results.
0888-7543/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.ygeno.2013.05.003
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
2
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
3
2. Results
2.1. The context score distribution of AUG codons in the yeast genome
We devised a statistical score for ATG start codons (named context
score [19]) based on their similarity to the sequence context of ATG
START codons of highly expressed genes (Material and methods) in
the genome of Saccharomyces cerevisiae, in which higher context
scores denote similarity to the ATG context of highly expressed
genes (Material and methods); the approach is similar to comparing
the AUG context to the Kozak consensus, but is more sensitive as it includes specific weights for each nucleotide distribution (details in the
Material and methods section). Surprisingly, our analysis of the S.
cerevisiae genome reveals that the transcripts of this organism are
enriched with alternative ATG codons with relatively high context
scores (Material and methods), both before and after the conventional START ATG (Figs. 1A–B; see also Supplementary Figs. 2–9). Specifically, when considering the nucleotides at a distance of less than 50
codons upstream (in the 5′UTR) or downstream from the ORF's conventional START codon, which is the region along the transcript likely
to participate in translation initiation, S. cerevisae's 5861 protein coding genes include a total of 16,635 alternative ATG codons in the three
reading frames, of which 12,484 are out of frame.
Although the abundance of high context scores, alternative ATG codons near the START codon (Figs. 1A–C) and not in the frame of the
main ORF emphasize the discriminatory power of the translation machinery in initiating translation, it also raises the hypothesis that translation may frequently initiate from such alternative START ATGs. The
fact that the context score of the ATG codons upstream of the beginning
of the main ORF tends to be significantly lower than context scores of
ATG codons downstream of the beginning of the main ORF (Figs. 2A–
B) supports the scanning model and our hypothesis above.
2.2. A novel fluorescent reporter assay determines the rate at which
translation initiates from alternative ATG codons not in the frame of
the main ORF in vivo
We provide initial experimental evidence for the hypothesis that
translation may frequently initiate from alternative ATG codons not in
the frame of the main ORF in S. cerevisiae by measuring the protein output from these codons in an S. cerevisiae gene. The mechanism of translation initiation by transcript scanning suggests that if alternative
initiation occurs, it is most likely to commence from those relatively
close to the main START ATG, potentially resulting in previously unknown short peptides [7]. These peptides are experimentally challenging to detect and quantify in the native endogenous context because of
their infrequency and variability, rendering analyses using ad hoc
methods complicated.
To overcome these problems we devised an experimental setup
that enables a sensitive and well controlled analysis of alternative
translation initiation events from a different frame relatively to the
reading frame of the main ORF (Fig. 3) from any gene (that fulfills
some additional conditions that will be further discussed) using a
unified methodology. Specifically, a set of variants of a gene is generated, each containing its first ~ 150 nucleotides (which is related to
the length of polypeptide needed to fill the ribosomal exit tunnel
Fig. 2. Distribution of ATG scores of alternative ATG codons before and after the beginning
of the main ORF when considering the log of the absolute context scores (A.) and the log of
the relative context scores (B.). As can be seen both relative (KS test, p-val = 4.9 ∗ 10−67,
empirical p-value b 0.001) and absolute (KS test, p-val = 2.3 ∗ 10−7, empirical
p-value b 0.001) ATG context scores are higher (more efficient) downstream of the
START codon (relatively to the ones upstream of the START codon), supporting the scanning model. The values of the context scores include various ‘outliers’ due to the nature of
this score (Material and methods); this is the reason that the mean absolute context score
of the 5′UTR does not appear in A.
[20]) fused to a downstream Yellow Florescent Protein (YFP) reporter
gene, and their fluorescent output is recorded.
We assume that the alternative initiation codons that we want to
study have a +1 frame shift compared to the frame of the main ORF;
cases with alterative initiation events from codons with a +2 frame
shift compared to the frame of the main ORF can be analyzed in a similar
manner.
The first variant, named ‘Main’, has the first 150 nt of the wild-type
(WT) gene fused to the YFP reporter, and thus estimates the protein
levels due to translation initiation solely from the main ATG codon or
other ATG codons in-frame (Fig. 3(1)). The second variant, named
‘Alt’, also contains the WT gene, but fused to the YFP reporter with a
+1 nt frame shift, in order to measure the protein levels only due to
translation initiation from all the alternative ATGs of that frame
(Fig. 3(2)). The third variant, named ‘TTG’, was constructed from the
‘Alt’ variant by replacing all its alternative ATGs of the +1 frame with
TTG (encoding Leucine). Its purpose is to estimate an upper bound on
Fig. 1. The number and distribution of the context score of alternative and main START ATG codons (see also Supplementary Figs. 2–9). A. The number and distribution of alternative
ATGs 150 nt upstream (in the 5′UTR) and downstream from the start ATGs according to their log-absolute context score. The context scores of the four alternative ATGs at the beginning of the gene RMD1 are marked by red dashed bars; as can be seen ALT4 has a higher context score than the other three, and the context scores of ALT1-3 appear in the middle
of the distribution. B. The number and distribution of the 16,635 alternative ATGs according to their log-relative context score, which is the log of their context score divided by the
context score of the main START ATG of the gene they reside in (see Material and methods section); scores larger or smaller than 0 (vertical blue bar) denote alternative ATGs with
scores higher or lower than the score of their main START ATG, respectively. The distribution of the alternative ATGs in-frame is in red while that of alternative ATGs out-of-frame is in
green. The relative context scores of the ALT1-4 are marked by pink dashed bars. As can be seen, the relative context scores of ALT1-3 appear in the middle of the distribution. C. The
distribution of the log-context scores of main START ATGs in S. cerevisiae. The context score of the gene RMD1 appears in the middle of the distribution and is marked by a red bar. The
ATG codons used to learn the context score PSSM (Material and methods) were omitted from the figure. (For interpretation of the references to color in this figure legend, the reader
is referred to the web version of this article.)
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
4
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
Fig. 3. Wet experiment to estimate the translation initiation from ATG codons downstream of the main START ATG. The figure includes all the variants that were analyzed in the
experiment.
the noise and to control for the possibility that the measured protein
levels of the ‘Alt’ variant are due to a frame shift in translation following
initiation from the traditional main START ATG of the gene (Fig. 3(3)
and Material and methods). The fourth variant named ‘None’, contains
a promoter-less YFP reporter gene in order to estimate the lower
bound on the noise, e.g. due to auto-florescence of S. cerevisiae
(Fig. 3(4)). Note that the ‘None’ and ‘TTG’ variants are still expected to
be very similar and specifically, the lower and upper bounds can be
very close or identical, as they both model noise.
Finally, in order to estimate the translation initiation rate from each
of the four alternative ATGs we collectively measured using the ‘Alt’
strain, we generated four additional strains, named ‘Alt1’, ‘Alt2’, ‘Alt3’,
and ‘Alt4’ (Figs. 3(5)–(8)); Each of these four strains was identical to
the ‘TTG’ strain except that in each of them one of the out-of-frame alternative ATGs was mutated back to ATG and the latter three remained
TTG. Taken together, a dedicated version of the aforementioned set of
reporter variants can accurately map the translation initiation landscape of any gene.
2.3. RMD1 as a model gene for studying alternative translation initiation
We analyzed the S. cerevisiae genome to find a model gene for
studying alternative initiation of translation from codons not in the
frame of the main ORF. We searched for genes with main and alternative ATG context scores that are common in S. cerevisiae, as evident by
the distribution of ATG context scores in the S. cerevisiae genome
(Fig. 1). We excluded genes whose alternative ATG codons that we
aimed to study are far, more than 50 codons, upstream or downstream from the beginning of the ORF, since initiation rarely occurs
there naturally. Finally, in order to maximally exploit the accuracy
of our approach we (1) searched for genes that had all the studied alternative ATGs in the same reading frame (and out-of-frame compared to the main ORF), (2) excluded genes that had additional
in-frame ATGs within 150 nt of the main ATG (to isolate the effect
of the main START codon on initiation) and (3) excluded genes that
had stop codons in the 5′UTR and/or the first 150 nt of the main
ORF in all frames to prevent re-initiation [21,22]; our aim was to
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
study initiation and not re-initiation; re-initiation occurs when the ribosome finishes translating an ORF (by approaching a STOP codon)
and starts translating a new one downstream of it (by approaching
the next START codon); by screening against genes with no stop codons we control for this phenomenon (see Fig. 4 for an illustration
of the selection process).
The gene RMD1 (YDL001W), encoding a cytoplasmatic protein required for meiosis, which contains 4 alternative out-of-frame ATGs,
was selected for experimental analysis. Fig. 1 shows that the context
score of RMD1's four alternative ATGs closest to the main ATG is similar to the mean context score of alternative ATGs in the S. cerevisiae
genome, both in terms of their absolute values (Fig. 1A) and in
terms of their relative values compared to the context score of their
main ATG (Fig. 1B). The context score of RMD1's main ATG is also similar to the mean genomic value of main ATGs (Fig. 1C). Finally, the alternative ATG load of RMD1 (4 sites downstream of the main ATG) is
also not unusual, as 31% of S. cerevisiae genes harbor at least four alternative ATGs downstream of their main ATG.
We generated the complete set of 8 strains of the RMD1 gene
(‘Main’, ‘Alt’, ‘TTG’, ‘No’, ‘Alt1’, ‘Alt2’, ‘Alt3’ and ‘Alt4’) in order to (1)
Fig. 4. Genes that can be studied with our approach (A.) and genes that are unsuitable
or irrelevant for initiation (B.). A. We can study genes that include START codons that
are frame shifted relative to the main ORF; these codons can be at the 5′UTR or the beginning of the ORF, as START codons more than 50 codons after the beginning of the
ORF typically do not initiate translation. B. Cases that may not be studied with our approach or are not relevant include: 1) alternative START codons interleaved with STOP
codons (this case may be studied if the aim is to understand reinitiation) — in this case
the ribosome may start translating one of the alternative START codons, terminate
translation, and re-initiate translation; 2) alternative START codons in the same
frame of the main ORF; 3)alternative START codons that are more than 50 codons
downstream of the beginning of the main ORF.
5
experimentally determine whether translation initiates at its
out-of-frame alternative ATG codons, (2) provide a quantitative measure of the process and (3) gain mechanistic insight into the process
of translation initiation at these sites.
2.4. Alternative translation initiation from ATGs not in the frame of the
main ORF in RMD1
We cultured the complete set of strains and recorded their fluorescence in log-phase using a plate-reader (details in the Material and
methods section). Our results demonstrate that the combined protein
output due to initiation from the four out-of-frame ATGs (i.e. the ATGs
not in the frame of the main ORF) of the ‘ALT’ strain is 3.3% (with STD
of only 0.59%) of the output due to initiation from the main ATG of
the ‘Main’ strain (Material and methods, Fig. 5). Initiation from TTG codons of the ‘TTG’ strain was orders of magnitudes lower than that of the
‘Alt’ strain. Specifically, a statistical analysis demonstrates that the protein levels resulting from the out-of-frame alternative ATGs of the ‘Alt’
strain are significantly higher than from the TTG codons of the ‘TTG’
strain (p = 1.3∙10−33, empirical p-value b 0.001), and that the ‘TTG’
strain is very similar (less than 0.2% difference) but significantly higher
than the promoter-less ‘None’ strain (p = 1.3∙10−22, empirical
p-value = 0.006). Thus, our results show for the first time, that in S.
cerevisiae significant levels of translation initiation can occur from
out-of-frame ATG codons downstream of the main START ATG, and to
a much lesser extent possibly even from non-ATG codons, or from
out-of-frame ATG codons.
Interestingly, individual analyses of the out-of-frame ‘Alt1-4’ variants
show that the protein levels due to initiations from the alternative ATG
codon closest to the main ATG of the ORF (‘Alt1’ strain) were significantly
higher than the other three strains (ALT2–4), which did not exhibit protein levels higher than the noise (‘TTG’ strain). Protein levels due to initiation from ALT1 were estimated to be 5% (with STD of only 0.13%) of
the protein levels of the ‘Main’ strain (significantly higher than the ALT
variant; p-value = 4.8 ∗ 10−32; empirical p-value b 0.001; Material
and methods, Fig. 5), suggesting that the cumulative translation initiation rate from several alternative ATG codons is not additive, and is probably related to the context and positions of the alternative ATGs as
previously suggested [7,17,23]. Only the alternative ATG codon closest
to the main START exhibited significant amounts of translation initiation,
suggesting that either a small fraction of ribosomes scanning the 5′-UTR
of RMD1 mRNAs scan through the main START and initiate translation
from the next closest ATG they encounter in the scanning process; or,
as we discuss later, it is also possible, at least partially, that the alternative
initiation is due to cases of 5′ truncated mRNA transcripts.
In fact, the ATG context score (Material and methods) of the first
out-of-frame alternative ATG codon was the lowest of the four, and
not extremely or unusually high (Figs. 1A–B; the scores of the alternative ATGs 1–4 after normalizing by the score of the Main START are
0.5, 0.76, 0.59, and 10.6 respectively). This suggests that translation
initiation from these out-of-frame codons is determined to a larger
extent by their location along the transcript than their ATG context
score. This result broadens the classical mechanistic assumption of
the scanning model, that translation initiates at the first ATG with a
sufficiently high context score, regardless of the context score of
downstream ATGs, as it demonstrates that a small fraction of the
pre-initiation complexes may scan through an ATG with a relatively
good context score (the main ATG), and start from a downstream
out-of-frame ATG even if its context score is worse.
3. Discussion
We describe a novel approach for monitoring translation initiation
from alternative out-of-frame ATG codons. We demonstrate the
method by implementing it on the model S. cerevisiae gene RMD1.
Previous papers have suggested based on the ribosomal profiling
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
6
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
Fig. 5. Expression levels of all the variants analyzed in this study and corresponding p-values and empirical p-value (pe, Material and methods). Inset: the expression levels of
‘Main’, ‘ALT1’, ‘ALT’, and ‘TTG’ after subtracting noise.
approach that in mammals there are alternative initiation events near
the beginning of the START codon of the main ORF [9,16]. Here we
used an approach which is based on estimation of protein levels and
is, therefore, complementary to the ribosomal profiling approach,
both providing an additional novel layer of experimental validation
for this phenomenon, and testing whether it occurs in relatively simple eukaryotic translation systems.
Our computational analyses of alternative START sites in the S.
cerevisiae genome (Figs. 1A, B and 2) both emphasize the discriminatory
power exhibited by the translation machinery in initiating translation at
specific START sites, and raise questions regarding the actual fidelity of
the process. Specifically, there is a global signal of lower median scores
of upstream alternative sites than the score of downstream sites; this is
in accordance with what the scanning model assumes — binding of the
ribosomal pre-initiation complex to the first ATG downstream of the
cap site with a strong enough context sequence score. However, this
signal is not very strong and coherent: first, the correlation coefficient
between the context score and ribosomal density is significant but not
very high (0.19; see Material and methods section); second, as can be
seen in Figs. 1–2, the rather wide range of context scores for the main
START sites overlaps that of alternative sites. Thus, we expect, specifically in genes with lower context scores for the main ATG, higher usage of
potential alternative start codons. There are several additional factors
other than the context score that may favor the main ATG site over
the one before or after it: First, as mentioned, the position of the
codon is important — ATG codons closer to the 5′end have higher probability to initiate translation; second, ATG codons in unstructured parts
of the mRNA molecule should be recognized more efficiently [24,25];
and third, strong mRNA structure downstream of an ATG codon should
help stabilize the pre-initiation complex and thus promote initiation
from the codon [26,27]. We believe that additional such signals will be
discovered in the future.
Our experimental mapping of the translation initiation landscape of
the model RMD1 gene suggests that alternative initiation of translation
is not only more frequent than previously thought, but also involves the
translation of many previously unknown short peptides with relatively
low protein levels due to initiation from out-of-frame ATGs, raising
questions about how these peptides are related to the functionality of
the cell. For example, it has been recently discovered that small peptides
switch the transcriptional activity of shavenbaby during Drosophila embryogenesis [28]. There are also known cases of alternative translation
initiation in various organisms, where initiation from an alternative
in-frame ATG codon results in functional longer isoforms that affect
the translocation of the protein [17,18,29–32]. Thus, it is possible that
the numerous low-abundance short peptides generated by alternative
initiation from out-of-frame ATGs have yet unknown regulatory roles
in S. cerevisiae.
While results from the model RMD1 gene cannot be readily generalized to the entire transcriptome of S. cerevisiae, its fairly common (1) absolute values of alternative ATG context scores (see Fig. 1A), (2) relative
values of alternative ATG context scores compared to their respective
‘Main’ ATG (see Fig. 1B), as well as (3) its relatively average main ATG
context score compared to other Main ATGs in the genome (see
Fig. 1C), bolsters the notion that other S. cerevisiae transcripts may also
exhibit similar behavior.
Our analyses show that the lengths of peptides potentially generated from the out-of-frame and in-frame alternative ATG codons we identified, are within the range of naturally occurring peptides expressed
from the S. cerevisiae genome (see Material and methods).
In addition, the reported signal of translation from ‘out-of-frame’
ATG codons is rather low, less than 5%, but is based on a long list of
normalizations and statistical tests (Material and methods), and is
significant based on accepted statistical standards. It is possible that
the reported signal is an upper bound and the actual signal of alternative translation is even lower than reported; this possibility emphasizes the fidelity of translation initiation. On the other hand, it is
possible that future implementations of the method reported here
on other genes will yield a much stronger signal of translation form
out-of-frame ATG codons. Thus, further studies that will include implementation of our methods on a large set of genes in S. cerevisiae
and/or other organisms will shed additional light on the content
and abundance of such out-of-frame translated proteins.
It is important to remember that all biological methods, even the
very common ones, usually include biases. For example, it is known
that DNA chips that have been used in thousands of studies in recent
years include biases [33], and various aspects of the recent ribosomal
profiling approach apparently include biases [9,16,34]. Our approach
can estimate protein levels and translation rates due to alternative
START codons that are downstream (as was demonstrated in the current study) but also upstream of the beginning of the ORF, which are
not in the same frame of the main ORF. However, in some cases it may
be difficult to determine the exact mechanism that causes the alternative initiation. In the case of alternative initiation for a codon that
is downstream of the beginning of the ORF, if the analyzed protein
levels are low it is possible that this signal is due to skipping of the
main start codon (i.e. leaky scanning). However, it is also possible,
at least partially, that the alternative initiation is due to cases of 5′
truncated mRNA transcripts; this could occur by cap-independent
translation of degradation intermediates, cryptic promoters or aberrant splicing/processing; such truncated transcripts may not include
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
the main START and thus translation from these transcripts does not
involve leaky scanning of their main START. It is important to mention
that currently, the typical abundance of such 5′ truncated mRNA transcripts is unknown; if their relative abundance is less than 1% (as in
the cases of transcription elongation errors which are estimated to
be 1:10,000 [12]), this mechanism will become negligible relatively
to the leaky scanning explanation. We believe, however, that the ability to detect the alternative initiation rate is important regardless of
the biophysical mechanisms causing it.
One possible source of bias in our method is the fact that the proteins corresponding to the different constructs usually have different
N-terminals. An interesting direction for improving our approach is
by including a ‘stop–go self-cleaving peptide’ [35]. Specifically, an
augmented version of our method can include a self-cleaving peptide
that will cleave the N-terminal of the generated peptide after translation such that it will be identical for all variants. However, data from
such constructs may be problematic and/or biased in different ways.
For example, first, 2A peptides used in stop–go constructs are a significant insertional mutation (approximately 60 bp) into our constructs,
rendering them largely different from the native gene we attempt to
provide data for. Second, the 2A insertion may affect the activity and
stability of our constructs in unpredictable ways (see [36]).
Finally, cleavage efficiency of the inserted peptide is not guaranteed
to be fully efficient in constructs with different N-terminals and may
therefore skew our results or create an expression bottleneck that
masks our results (see [37,35]).
In summary, the quantification of translation initiation is experimentally challenging, fraught with potential biases, with each experimental approach having its advantages and drawbacks. Specifically
in the case of stop–go constructs, we evaluate that these may potentially insert more bias in the system compared to the relatively small
differences in the N-terminals of our constructs.
4. Material and methods
4.1. Genomic analysis
4.1.1. Coding sequences and UTRs
The coding sequences and UTRs of S. cerevisiae were downloaded
from BioMart [38], and the UTR lengths were based on the study of
[39]. The analysis regarding the ATG distribution upstream of the beginning of the ORF was based on the information of the UTR length
from [39] and included only the 5′UTR region.
7
4.1.2. ATG context score
We investigated the ATG context of the start codon and ATGs upstream and downstream from it, in a genome-wide manner, by devising
a measure based on the ATG context of the start codon, called the ATG
context score. We define an ATG context score (CS) according to the following algorithm (results are robust to versions of this procedure):
1. Select percentage of highly expressed/translated genes.
2. Calculate a position specific scoring matrix (PSSM) based on the
nucleotide context around the start codon of the selected highly
expressed genes. This PSSM represents the nucleotides necessary
for highly efficient translation (see Fig. 6).
3. Calculate the context score per ATG position according to the PSSM
for the rest of the genes.
The training set for the position specific scoring matrix (PSSM) calculation is based on the selection of the top 2% of highly expressed genes,
according to the product of ribosomal density (RD) and mRNA levels
(ML) [39–41], which represent the total flux of ribosomes over a gene.
However, similar results were achieved when selecting the highly
expressed group of genes based solely on RD or ML, and also when
based on other measures of expression, such as protein abundance etc.
We consider two large scale ribosomal density (RD) measurements (the number of ribosomes occupying the transcript divided
by its length); each generated by a different technology.
The first dataset was generated more recently by Ingolia et al. [41]
and the second by Arava et al. [40]. We average across the two RD
datasets (after normalizing each dataset by its mean), in order to
minimize experimental noise. Similar results were obtained when
we analyzed each dataset separately. We average across four datasets
of protein abundance measurements [42–44], similarly to RD. Similar
results were obtained when we analyzed each dataset separately. We
considered two large scale measurements of mRNA levels from [41
and 39].
Similarly to RD, we averaged across the two datasets of mRNA levels.
In order to avoid over fitting, we select the top 4% highly expressed
genes according to the product of RD and ML, and then randomly select
half of that group as the training set. Different percentages of training
sets sizes (up to 50%) were also tested in the same manner, and the results are robust to training set size variance.
The PSSM was calculated according to the nucleotide (C, A, T, G)
appearance probability (frequency), of the training set (2% selected
at (1)), for 6 nts before and 3 nts after the first ATG (start) codon,
based on the length of the potential optimal ATG context proposed
Fig. 6. ATG context score PSSM for S. cerevisiae. The figure includes for each position near the beginning of the ORF the distribution of the four nucleotides in highly expressed genes
(details in the Material and methods section).
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
8
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
by Hamilton et al. [45]. We achieved similar results when considering
various context lengths. Interval values between 3 and 48 nts around
the start codon were examined, for symmetric and asymmetric combinations, in jumps of 3 nts, and additionally in jumps of 1 nt from
1 nt up to 10 nts, and the results are robust to interval variance.
These intervals were tested for all the aforementioned training set
sizes in (1), and the results remain robust. The resultant PSSM appears in Fig. 6.
For each gene, the ATG context score is calculated per ATG position, according to the highly expressed genes PSSM calculated in
(2): ATGcsj = exp(∑ilog(Pij)), where j is the gene index, i the nucleotide position, Pij the probability that the ith nucleotide of the jth gene
appears in the ith position. A relative context score is calculated by normalizing by the main ATG's context score: ATGcsj = exp(∑ilog(Pij))/
exp(startATGcs). We considered up to the last 6 nt of the UTR and the
first 3 nt of the ORF. Note that a PSSM with different background weights
for different nucleotides (i.e. ATGcsj = exp(∑ilog(Pij/bij)) where bij is
the genomic background frequency over the region of the PSSM (up to
the last 6 nt of the 5′UTR and the first 3 nt of the ORF after the ATG) of
nucleotide i in gene j) did not improve the correlation between the ATG
context score and the ribosomal density (r = 0.19 [p = 6.3 ∗ 10−30]
for the first model vs. r = 0.17 [p = 7.8 ∗ 10−25] for the second
model); thus, we remained with the simpler model (Occam's razor).
Finally, when we considered a slightly different variant of the context score the results remain very similar:
Suppose that for a certain gene the nt corresponding to position i
in the PSSM is missing for a certain ATG (due to short 5′UTR). The
score of the missing positions will be computed as follows:
Let p1, p2, p3, and p4 be the distribution of the 4 nucleotides (A, C,
T, G) in the genome in this position (all genes). Let p′1, p′2, p′3, and
p′4 be the four probabilities of the PSSM in this position (based on
highly expressed genes). The expected PSSM score in such a position is: p′1 ∗ p1 + p′2 ∗ p2 + p′3 ∗ p3 + p′4 ∗ p4 (this is the mean
score of this position over all genes). This variant yields similar results to the previous one — the context scores are higher at the ORF
than at the 5′UTR (for non-normalized ATG score: KS test
p-value = 6.47 ∗ 10− 9; 5′UTR mean: − 47.8975; 5′UTR median:
− 15.1650; ORF mean: − 14.7540 ORF median: − 14.6278. For
normalized ATG score: KS test p-value: 4.88 ∗ 10− 67; 5′UTR mean:
− 34.8717; 5′UTR median: − 2.1516; ORF mean: 135.0816; ORF median: − 0.7237).
List of ATG codons, their positions and context scores appears in
Supplementary Table 2.
4.1.3. ATG Kozak score
Around 30 years ago Kozak [6] suggested that there is a specific optimal context surrounding the ATG start codon. Specifically, she identified ACCATGG as the optimal sequence for initiation by eukaryotic
ribosomes. In this case the context score is the hamming distance (or
the number of mutations) from this optimal context; thus, lower numbers denote a better score and 0 is the best possible score. In vivo, this
site is often not matched exactly on different mRNAs and it is believed
that the amount of protein synthesized from a given mRNA is dependent on the strength of the Kozak sequence. We found that few
(0.0048% in S. cerevisiae and 0.0068 in Schizosaccharomyces pombe) of
the sequences satisfy the Kozak rule precisely (with an average hamming distance of 0.67 in S. cerevisiae and S. pombe). Thus, we decided
to use the context score above; however, analysis with the KOZAK
score instead of the context score yields very similar results: The
Kozak scores for the main ATG, and ALT1-4 of the RMD1gene respectively are, 0.5, 1.0, 0.75, 0.50, and 0.75.
4.1.4. ATG out-of-frame analysis and sequence construction
There are three possible frames relative to the main ATG start
codon an alternative codon can be in: frame 0, the same frame as
the main ATG, frame 1, a 1 nt shift from the main ATG's frame, and
frame 2, a 2 nt shift from the main ATG's frame.
We consider all the alternative ATGs in a frame, such that for each
gene we consider all 3 frames, and construct the experiment sequences in the following manner:
For each gene, for each frame containing alternative ATGs, we construct a sequence composed of the last 50 codons of the 5′UTR, and
the first 50 codons of the ORF, ending at nucleotides 150, 151 and
152 respective to the frame shifts, with the GFP protein concatenated
at the end.
We examined gene candidates according to the number of alternative ATGs in the 5′UTR and ORF, the best/worst/median alternative
ATG context score; the distance of the first stop codon from the alternative ATG, the metabolic cost of the peptide defined between the alternative ATG and the first stop codon; the distance of an alternative
ATG from the start of the GFP; the % of the mRNA levels of the gene
compared to the gene TEF (YPR080W) (as this gene is known to
have a strong promoter); the number/distance of relevant STOP codons (i.e. the first stop codon after each alternative ATG), the rank
of the gene's protein levels; and the rank of the gene's RD.
4.1.5. Weighted number of ATGs
This analysis (Supplementary Figs. 3, 4, 6, 7B, 8B, 9) was performed
by weighting the mRNA levels of the transcripts assuming a total of
60,000 transcripts and based on the mRNA levels from [41]. The results
were similar to the ones reported in the main text.
4.1.6. Analysis of length distribution of peptides potentially generated
from alternative ATGs in the S. cerevisiae genome
We considered the distance of out-of-frame ATGs from STOP codons in their frame, and the distance of in-frame alternative ATGs
from the traditional START ATGs (Material and methods). Our analysis shows that of the alternative (out-of-frame and in-frame) 16,635
ATG codons, 12,484 are out-of-frame alternative ATG codons with a
mean distance of 14.83 codons to the closest STOP codon, and 4151
are alternative in-frame ATG codons with a mean distance of 26.95
codons to the closest main START ATG (see also Supplementary figures). In comparison, the mean length of annotated S. cerevisiae peptides is 496.93 with STD 383.07, and with minimal ORF length of only
17 codons (close to the values above).
4.1.7. Empirical p-value based on bootstrapping
4.1.7.1. Context score empirical p-values. Let Nu and No denote sets of
ATG (absolute or relative) context scores in the 5′UTRs and ORFs respectively; let N denote the group of ATGs that appear in Nu or in
No (|N| = |Nu| + |No|). We describe here the procedure for the medians comparisons, but the same procedure was performed for the
means comparisons.
Let d ¼ medianðNoÞ−medianðNuÞ
Repeat the following 1000 times:
1) Sample a set Nui of size |Nu| from N; the rest of the ATG scores
from N are denoted as a set Noi
2) Compute di = median(Noi) − median(Nui)
The empirical p-value is (no. of cases where di ≥ d) / 1000.
The empirical p-values for both comparisons (absolute and relative) and when considering mean or median are p b 0.001 (in none
of the random comparisons were the differences between the ORF
and the 5′UTR mean/median larger or equal to the original one).
The same procedure was applied in all cases, absolute or relative,
mean or median.
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
4.1.7.2. YFP levels empirical p-values. Let Y1 and Y2 denote sets/vectors
of YFP measurements for two strains (e.g. ‘Alt’ and ‘TTG’); let Y denote
the vector/set of YFP that appears in Y1 or in Y2 (|Y| = |Y1| + |Y2|).
We describe here the procedure for the mean comparisons.
Let d ¼ meanðY1Þ−meanðY2Þ
Repeat the following 1000 times:
1) Sample a set Y1i of size |Y1| from Y; the rest of the YFP measurements from Y are denoted as a set Y2i
2) Compute di = mean(Y1i) − mean(Y2i)
The empirical p-value is (no. of cases where di ≥ d) / 1000.
4.1.7.3. The expected relations between initiation rates and YFP levels. In
this subsection we briefly discuss the relation between translation initiation rates and protein concentration/abundance. Specifically, we will
explain why we expect that protein abundance should stand in high
positive correlation with translation initiation rates in our study.
Protein abundance levels are determined by a balance between
protein production and degradation rates. Fixing the degradation
rate, protein abundance levels will rise when the production rate is
increased. Fixing the production rate, protein abundance levels will
decrease when the degradation rate is increased.
Let ci denote the concentration of protein i and let us assume that
this protein is translated from a certain mRNA transcript with mi copies in the cell. In general, the dynamics of this process may be described by the following differential equation: dcdti ðt Þ ¼ Ri ⋅mi −Di ðci Þ.
Where Ri and D(ci) are the translation rate per mRNA molecule and
the degradation rate of protein i correspondingly. One common
choice for D(ci) is: Di(ci) = di ⋅ ci(t) where di > 0 is constant.
The steady state solution of the above differential equation is:
ss
di ⋅ css
i = R i ⋅ mi where ci is the steady state concentration of the protein i (that is measured by the YFP levels). All the variants studied in this
research have very similar ORFs with up to four point mutations and/or
indels at their beginning; the mutations are related to the ATGs and thus
expected to be related to initiation rates (and not elongation or termination rates). Thus, in our case Ri is assumed to be mostly initiation
rate, we expect that the few very specific mutations/indels introduced in
the experiment will not affect degradation rates
or mRNA levels. Thus,
d ⋅css
from the equation above we get that Ri ¼ imii and indeed Ri∝css
i .
4.2. The different variants and the rationale behind them
The analysis included eight different variants of the gene RMD1
(see also Fig. 3). Each variant containing ~ 150 nucleotides of the beginning of this gene (Fig. 3) fused with a YFP gene, and measured
their fluorescent levels:
1) The first variant (named Main) wasn't mutated compared to the
original RMD1 gene and thus estimated the protein levels due to
translation initiation solely from the main ATG codon.
2) The second variant (named ALT) was fused to the YFP with a 1 nt
frame shift in order to measure the protein levels only due to the
alternative ATGs of that frame (Fig. 3).
3) The third variant (named TTG) was engineered to estimate an
upper bound on the noise; it was reconstructed from the second
variant by replacing all the alternative ATGs with TTG.
4) The fourth variant (named None) was a YFP gene without a promoter; thus, it is a lower bound on the noise.
To estimate the initiation rate due to each one of the alternative
ATG codons separately we also generated four additional variants;
each variant was similar to the third variant above, but in this case
only three of the alternatives ATGs were mutated to TTG. We
reconstructed such a variant for each of the four alternative ATGs
and named these variants: ALT1 (the alternative ATG closest to the
9
beginning of the ORF is not mutated), ALT2, ALT3, and ALT4 (the alternative ATG furthest from the beginning of the ORF is not mutated)
respectively.
The fact that we compared the protein levels of each of the variants to the TTG variants enabled us to estimate the protein levels
due to alternative initiation events and not due to frame shifts during
translation elongation after initiation (which is not expected to occur
when mutating ATG codons to TTGs).
4.3. Molecular biology
4.3.1. Methods for DNA construction
Construction of the DNA of the YDL001W variants was performed
according to the methods described in [46].
4.3.2. Phosphorylation
300 pmol of single stranded DNA in a 50 μl reaction containing
70 mM Tris–HCl, 10 mM MgCl2, 7 mM dithiothreitol, pH 7.6 at 37 °C,
1 mM ATP & 10 units T4 Polynucleotide Kinase (NEB) was used. Reaction is incubated at 37 °C for 30 min, then at 42 °C for 10 min and
inactivated at 65 °C for 20 min.
4.3.3. Elongation
1 pmol of single stranded DNA of each progenitor in a 25 μl reaction containing 2.5 μl of Hot Start DNA Polymerase (Novagen,
71086-3) reaction according to manufacturer's guidelines was used.
Three cycles of annealing were executed for each elongation to ensure full yield of elongation.
4.3.4. PCR
All PCR reactions were performed in 96 well PCR microplates,
using KOD Hot Start DNA Polymerase (Novagen, 71086-3) according
to its protocol.
4.3.5. Digestion by Lambda exonuclease
1–5 pmol of 5′ phosphorylated DNA termini in a 30 μl reaction
containing 67 mM Glycine–KOH, 2.5 mM MgCl2, 0.01% Triton X-100,
5 mM 1,4-dithiothreitol, 5.5 units Lambda exonuclease (Epicentre)
and SYBR Green diluted 1:50,000. Thermal Cycler program is 37 °C
for 15 min, 42 °C for 10 min, enzyme inactivation at 65 °C for 10 min.
4.3.6. Chemical oligonucleotide synthesis
Oligonucleotides for all experiments were ordered from IDT with
standard desalting.
4.3.7. Minigenes
Three sequenced fragments of under 300 bp containing segments
of the YDL001W variants were ordered as IDT Minigenes.
4.3.8. DNA purification
DNA construction related DNA purification was performed with
Qiagen Minielute purification kit using standard protocols.
4.3.9. The master strain
The master strain was created by integrating into the yeast genome a cassette containing a promoter-less YFP, followed by a NAT
(Nourseothricin) resistance marker under its own promoter. The entire sequence was inserted into the his3Δ1 locus.
4.3.10. Transformations
The three YDL001W-YFP fusion variants were transformed into
the master strain using the LiAc/SS carrier DNA/PEG method following the procedures described in [47].
Cells were plated on solid agar SD-URA selective media and incubated at 30 °C for 3–4 days. Transformant colonies were handpicked
and patched on SD-URA + NAT (Werner BioAgents) agar plates.
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
10
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
Correct transformation was verified for the 3 variants by PCR amplification from the yeast's genome and gel electrophoresis. All strains
were additionally verified by sequencing.
4.3.11. YiFP Library construction
The three synthetic YFP fusion constructs were transformed into a
master strain that contained a promoter-less YFP coding sequence at
the his3Δ1 locus. Each synthetic construct contained a URA3 selection
marker under its own promoter followed by a TEF promoter, the last
30 bp of YDL001W's 5′UTR, the first 150 bp of YDL001W's ORF and
the beginning of the YFP ORF (for recombination purposes).
4.3.12. Sequencing
Several single colonies were picked manually from the plates of each
variant, the specific integration locus was PCR amplified from each
clone. Correct size amplifications were verified by gel electrophoresis.
Amplicons were sequenced in house using Sanger sequencing. Colonies
with the correct sequence were chosen for analysis.
4.3.13. Culture and fluorescence measurements
In order to measure growth and fluorescent protein expression,
the Singer colony arrayer was used to inoculate all colonies of the library into 100 μl of SD-URA media in a 384 well growth microplate
(Greiner bio-one, 781162). Following 24 h of pre-incubation, 5 μl of
the yeast cultures was diluted into 80 μl of SD complete media in a
384 well microplate, to reach a starting O.D600 of ~ 0.1–0.2.
A microplate reader (Neotec Infinite M200 monochromator) was
then set to measure the following parameters in cycles of 10 min:
Cell growth (as extracted from absorbance at 600 nm) and YFP expression (Ex. 500 Em. 540). Each cycle contained 4 min of orbital
shaking at amplitude of 3 mm. The number of cycles was set to 100
(16h) and the temperature to 30 °C.
The experiment included three plates, each plate included four
variants, and 96 wells included yeast cultures for each of the variants.
Specifically, the 3 different plates included the following variants (all
data in Supplementary Table 1):
Plate 1: ALT, ALT1, None (denoted as None1), TTG (denoted as
TTG1).
Plate 2: Main, ALT2, None (denoted as None2), TTG (denoted as
TTG2).
Plate 3: ALT3, ALT4, None (denoted as None3), TTG (denoted as
TTG4).
4.4. Statistical analysis and normalization
To gain fair comparisons between the protein levels of the different variants we performed several normalization steps. First, we removed from each measurement of each variant outliers with YFP or
OD levels that are more than 5 standard deviations from the other
measurements of the variant at the same time point. Second, for
each measurement, we removed the time points where the OD or
YFP is after the peak (see Supplementary Fig. 1). Third, as the YFP/
OD (i.e. an estimation of the fluorescent levels per cell) is generally
significantly dependent on the OD levels, we performed the following
procedure:
1) We use the TTG variant in plate 1 (TTG1), which has the lowest
mean OD levels after the first two stages, as a reference.
2) For each of the other variants we iteratively removed time points
with the largest OD until we reached a mean OD similar to the one
of the TTG1 variant.
At the end of the third step all the variants have similar mean OD.
At the next step, for each of the up to 133 measurements of the variants in the three wells, the YFP levels were normalized by dividing
them with the corresponding OD measurements to get an estimation
of the protein levels per cell. The variants related to the TTG and the
None from the three plates were concatenated respectively after normalizing the values in plates 2 and 3, such that the mean YFP/OD levels
of TTG2 and TTG3 are identical to the mean YFP/OD levels of TTG1.
The values of pairs of variants were statistically compared by a
Kolmogorov–Smirnov (KS) test.
Similarly, to estimate the ratio between the protein levels due to the
first alternative ATG and the main ATG we used the following formula:
½ðPATTG2 =PATTG1 Þ⋅ðPAAlt1 −PATTG1 Þ
ðPAMain −PATTG2 Þ
Variants ALT2, ALT3, and ALT4 (related to initiation from the second, third, and fourth ATG, respectively) did not exhibit expression
levels higher than the TTG variants in their plate. Thus, we concluded
that there is no translation initiation from these ATGs.
Finally, we also utilized the time before t = 40 as we believe that
these data are also informative (given the normalizations mentioned
above). To prove this point, we computed the correlation between
YFP/OD obtained for the strains before time 40 and the ones obtained
afterwards. For all the plates the correlations were significantly positive (e.g. a correlation of 0.8 for plate 2; p b 10−16). Thus, since the
ranking of strains based on the first and last time points is similar it
means that the initial points are informative, and using them should
improve the statistical power of the reported analyses. This is the reason we decided to use these points in the analysis.
Supplementary data to this article can be found online at http://
dx.doi.org/10.1016/j.ygeno.2013.05.003.
Acknowledgments
We thank Prof. Martin Kupiec, Mr. Ido Yofe, and Mr. Tamir
Biezuner for helpful discussions. This study was supported in part
by a fellowship from the Edmond J. Safra Center for Bioinformatics
at Tel-Aviv University.
References
[1] I.B. Lomakin, N.E. Shirokikh, M.M. Yusupov, C.U. Hellen, T.V. Pestova, The fidelity
of translation initiation: reciprocal activities of eIF1, IF3 and YciH, EMBO J. 25
(2006) 196–210.
[2] N. Sonenberg, A.G. Hinnebusch, Regulation of translation initiation in eukaryotes:
mechanisms and biological targets, Cell 136 (2009) 731–745.
[3] A.G. Hinnebusch, Microbiol. Mol. Biol. Rev. 75 (2011) 434–467.
[4] R.J. Jackson, C.U. Hellen, T.V. Pestova, The mechanism of eukaryotic translation initiation and principles of its regulation, Nat. Rev. Mol. Cell Biol. 11 (2011) 113–127.
[5] F. Sherman, The importance of mutation, then and now: studies with yeast cytochrome c, Mutat. Res. 589 (2005) 1–16.
[6] M. Kozak, Point mutations define a sequence flanking the AUG initiator codon
that modulates translation by eukaryotic ribosomes, Cell 44 (1986) 283–292.
[7] M. Kozak, Pushing the limits of the scanning mechanism for initiation of translation, Gene 299 (2002) 1–34.
[8] C.P. Chang, S.J. Chen, C.H. Lin, T.L. Wang, C.C. Wang, A single sequence context
cannot satisfy all non-AUG initiator codons in yeast, BMC Microbiol. 10 (2010)
188.
[9] N.T. Ingolia, L.F. Lareau, J.S. Weissman, Ribosome profiling of mouse embryonic
stem cells reveals the complexity and dynamics of mammalian proteomes, Cell
147 (2011) 789–802.
[10] C. Fritsch, A. Herrmann, M. Nothnagel, K. Szafranski, K. Huse, F. Schumann, S.
Schreiber, M. Platzer, M. Krawczak, J. Hampe, et al., Genome-wide search for novel
human uORFs and N-terminal protein extensions using ribosomal footprinting, Genome Res. 22 (2012) 2208–2218, http://dx.doi.org/10.1101/gr.139568.112, (Epub
132012 Aug 139569).
[11] I.P. Ivanov, A.E. Firth, A.M. Michel, J.F. Atkins, P.V. Baranov, Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences,
Nucleic Acids Res. 39 (2011) 4220–4234, http://dx.doi.org/10.1093/nar/gkr007,
(Epub 2011 Jan 4225).
[12] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of
the Cell, Garland Science, New York, 2002.
[13] J. Pelletier, N. Sonenberg, Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA, Nature 334 (1988) 320–325.
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx
[14] M. Mokrejs, V. Vopalensky, O. Kolenaty, T. Masek, Z. Feketova, P. Sekyrova, B.
Skaloudova, V. Kriz, M. Pospisek, IRESite: the database of experimentally verified
IRES structures. , (www.iresite.org) Nucleic Acids Res. 34 (2006) D125–D130.
[15] A.M. Michel, K. Roy Choudhury, A.E. Firth, N.T. Ingolia, J.F. Atkins, P.V. Baranov,
Observation of dually decoded regions of the human genome using ribosome profiling data, Genome Res. 2012 (2012) 16.
[16] S. Lee, B. Liu, S.X. Huang, B. Shen, S.B. Qian, Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution, Proc. Natl. Acad.
Sci. U. S. A. 109 (2012) E2424–E2432, (Epub 2012 Aug 2427).
[17] L.B. Slusher, E.C. Gillman, N.C. Martin, A.K. Hopper, mRNA leader length and initiation codon context determine alternative AUG selection for the yeast gene
MOD5, Proc. Natl. Acad. Sci. U. S. A. 88 (1991) 9789–9793.
[18] J.R. Pedrajas, P. Porras, E. Martinez-Galisteo, C.A. Padilla, A. Miranda-Vizuete, J.A.
Barcena, Two isoforms of Saccharomyces cerevisiae glutaredoxin 2 are expressed in
vivo and localize to different subcellular compartments, Biochem. J. 364 (2002)
617–623.
[19] H. Zur, T. Tuller, New Universal Rules of Eukaryotic Translation Initiation Fidelity,
PLoS Comput. Biol. (2013), (in press).
[20] N. Ban, P. Nissen, J. Hansen, P.B. Moore, T.A. Steitz, The complete atomic structure
of the large ribosomal subunit at 2.4 Å resolution, Science 289 (2000) 905–920.
[21] M. Kozak, Effects of intercistronic length on the efficiency of reinitiation by
eucaryotic ribosomes, Mol. Cell. Biol. 7 (1987) 3438–3445.
[22] V. Munzarova, J. Panek, S. Gunisova, I. Danyi, B. Szamecz, L.S. Valasek, Translation
reinitiation relies on the interaction between eIF3a/TIF32 and progressively folded
cis-acting mRNA elements preceding short uORFs, PLoS Genet. 7 (2012) e1002137.
[23] M. Kozak, Point mutations close to the AUG initiator codon affect the efficiency of
translation of rat preproinsulin in vivo, Nature 308 (1984) 241–246.
[24] T. Tuller, Y.Y. Waldman, M. Kupiec, E. Ruppin, Translation efficiency is determined by
both codon bias and folding energy, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 3645–3650.
[25] W. Gu, T. Zhou, C.O. Wilke, A universal trend of reduced mRNA stability near the
translation–initiation site in prokaryotes and eukaryotes, PLoS Comput. Biol. 6
(2010) 1–8.
[26] M. Kozak, Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes, Proc. Natl. Acad. Sci. U. S. A. 87 (1990) 8301–8305.
[27] T. Tuller, I. Veksler-Lublinsky, N. Gazit, M. Kupiec, E. Ruppin, M. Ziv-Ukelson,
Composite effects of gene determinants on the translation speed and density of
ribosomes, Genome Biol. 12 (2011) R110.
[28] T. Kondo, S. Plaza, J. Zanet, E. Benrabah, P. Valenti, Y. Hashimoto, S. Kobayashi, F.
Payre, Y. Kageyama, Small peptides switch the transcriptional activity of
Shavenbaby during Drosophila embryogenesis, Science 329 (2010) 336–339.
[29] S.A. Mackenzie, Plant organellar protein targeting: a traffic plan still under construction, Trends Cell Biol. 15 (2005) 548–554.
[30] S. Vagner, M.C. Gensac, A. Maret, F. Bayard, F. Amalric, H. Prats, A.C. Prats, Alternative translation of human fibroblast growth factor 2 mRNA occurs by internal
entry of ribosomes, Mol. Cell Biol. 15 (1995) 35–44.
[31] C.J. Danpure, How can the products of a single gene be localized to more than one
intracellular compartment? Trends Cell Biol. 5 (1995) 230–238.
11
[32] G. Kim, N.B. Cole, J.C. Lim, H. Zhao, R.L. Levine, Dual sites of protein initiation control the localization and myristoylation of methionine sulfoxide reductase A. J.
Biol. Chem. 285 18085–18094.
[33] J.E. Larkin, B.C. Frank, H. Gavras, R. Sultana, J. Quackenbush, Independence and reproducibility across microarray platforms, Nat. Methods 2 (2005) 337–344,
(Epub 2005 Apr 2021).
[34] A. Dana, T. Tuller, Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells, PLoS Comput. Biol. 8 (2012) e1002755.
[35] A.L. Szymczak, C.J. Workman, Y. Wang, K.M. Vignali, S. Dilioglou, E.F. Vanin, D.A.
Vignali, Correction of multi-gene deficiency in vivo using a single ‘self-cleaving’
2A peptide-based retroviral vector, Nat. Biotechnol. 22 (2004) 589–594, (Epub
2004 Apr 2004).
[36] M. Amendola, M.A. Venneri, A. Biffi, E. Vigna, L. Naldini, Coordinate dual-gene
transgenesis by lentiviral vectors carrying synthetic bidirectional promoters,
Nat. Biotechnol. 23 (2005) 108–116, (Epub 2004 Dec 2026).
[37] M.L. Donnelly, L.E. Hughes, G. Luke, H. Mendoza, E. ten Dam, D. Gani, M.D. Ryan,
The ‘cleavage’ activities of foot-and-mouth disease virus 2A site-directed mutants
and naturally occurring ‘2A-like’ sequences, J. Gen. Virol. 82 (2001) 1027–1041.
[38] J.M. Guberman, J. Ai, O. Arnaiz, J. Baran, A. Blake, R. Baldock, C. Chelala, D. Croft, A.
Cros, R.J. Cutts, et al., BioMart Central Portal: an open database network for the biological community, Database (Oxford) 18 (2011), (bar041).
[39] U. Nagalakshmi, Z. Wang, K. Waern, C. Shou, D. Raha, M. Gerstein, M. Snyder, The
transcriptional landscape of the yeast genome defined by RNA sequencing, Science 320 (2008) 1344–1349.
[40] Y. Arava, Y. Wang, J.D. Storey, C.L. Liu, P.O. Brown, D. Herschlag, Genome-wide
analysis of mRNA translation profiles in Saccharomyces cerevisiae, Proc. Natl.
Acad. Sci. U. S. A. 100 (2003) 3889–3894.
[41] N.T. Ingolia, S. Ghaemmaghami, J.R.S. Newman, J.S. Weissman, Genome-wide
analysis in vivo of translation with nucleotide resolution using ribosome profiling, Science 324 (2009) 218.
[42] S. Ghaemmaghami, W.K. Huh, K. Bower, R.W. Howson, A. Belle, N. Dephoure, E.K.
O'Shea, J.S. Weissman, Global analysis of protein expression in yeast, Nature 425
(2003) 737–741.
[43] J.R.S. Newman, S. Ghaemmaghami, J. Ihmels, D.K. Breslow, M. Noble, J.L. DeRisi,
J.S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise, Nature 441 (2006) 840–846.
[44] M.V. Lee, S.E. Topper, S.L. Hubler, J. Hose, C.D. Wenger, J.J. Coon, A.P. Gasch, A dynamic model of proteome changes reveals new roles for transcript alteration in
yeast, Mol. Syst. Biol. 7 (2011).
[45] R. Hamilton, C. Watanabe, H. De Boer, Compilation and comparison of the sequence context around the AUG start codons in Saccharomyces cerevisiae
mRNAs, Nucleic Acids Res. 15 (1987) 3581.
[46] G. Linshiz, T.B. Yehezkel, S. Kaplan, I. Gronau, S. Ravid, R. Adar, E. Shapiro, Recursive construction of perfect DNA molecules from imperfect oligonucleotides, Mol.
Syst. Biol. 4 (2008) 191.
[47] R.D. Gietz, R.A. Woods, Transformation of yeast by the Liac/SS carrier DNA/PEG
method, Methods Enzymol. 350 (2002) 87–96.
Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003