YGENO-08524; No. of pages: 11; 4C: Genomics xxx (2013) xxx–xxx Contents lists available at SciVerse ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno Methods Mapping the translation initiation landscape of an S. cerevisiae gene using fluorescent proteins Tuval Ben-Yehezkel a,b,1, Hadas Zur c,1, Tzipy Marx b, Ehud Shapiro a,b, Tamir Tuller d,e,⁎ a Computer Science & Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel Computer Science, Tel Aviv University, Tel Aviv, Israel d Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel e The Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, Israel b c a r t i c l e i n f o Article history: Received 13 March 2013 Accepted 17 May 2013 Available online xxxx Keywords: S. cerevisiae Translation Out-of-frame codons Genomics Scanning model 5′UTR a b s t r a c t Accurate and efficient gene expression requires that protein translation initiates from mRNA transcripts with high fidelity. At the same time, indiscriminate initiation of translation from multiple ATG start-sites per transcript has been demonstrated, raising fundamental questions regarding the rate and rationale governing alternative translation initiation. We devised a sensitive fluorescent reporter assay for monitoring alternative translation initiation. To demonstrate it, we map the translation initiation landscape of a Saccharomyces cerevisiae gene (RMD1) with a typical ATG sequence context profile. We found that up to 3%–5% of translation initiation events occur from alternative out-of-frame start codons downstream of the main ATG. Initiation from these codons follows the ribosome scanning model: initiation rates from different start sites are determined by ATG order, rather than their context strength. Genomic analysis of S. cerevisiae further supports the scanning model: ATG codons downstream rather than upstream of the main ATG tend to have higher context scores. © 2013 Elsevier Inc. All rights reserved. 1. Introduction Gene translation is a central metabolic and regulatory process in all living organisms. A central open question in this field relates to the fidelity and regulation of translation initiation [1–3]. In eukaryotes, canonical initiation of translation is believed to involve scanning of an mRNA transcript with the pre-initiation complex until a start codon, which is typically an AUG codon with the ‘correct’ surrounding sequence context, is approached [4–6]. At this point the large ribosomal subunit associates and elongation commences. According to the first AUG-rule, AUG codons closer to the 5′UTR end of the transcript have a higher probability for initiating translation [7]. However, several deviations from this canonical model have been reported: for example, initiations from non-AUG codons (ACG/CUG/ GUG) [7–10]; specifically, it was also shown for a specific yeast gene, ALA1, that initiation from all single codons resulting from a single mutation of AUG is possible [8]. Recent studies have demonstrated that mammalian genes with non-AUG N-terminal extensions evolve under ⁎ Corresponding author at: Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel. E-mail address: [email protected] (T. Tuller). 1 Equal contribution. strong purifying selection [11] and suggest that frequently non-AUG codons induce translation initiation [9,10]. There are also reported cases of leaky scanning where AUG codons with sub-optimal contexts are skipped and translation initiates at a downstream AUG [7]. Finally, there are known mechanisms of translation initiation from internal ribosome entry sites (abbreviated as IRES). An IRES is a nucleotide sequence that enables translation initiation from the middle of an mRNA transcript [12–14], potentially not involving mRNA scanning. Recently, studies based on the ribosomal profiling approach have suggested that, in mammals, translation initiation events and programmed frame shifts may occur downstream from the main START codon [9,15,16]. In yeast however, it has been suggested that leaky scanning rarely occurs [7], and cases of translation initiation at any measurable rate from alternative codons in native yeast genes that are not in the frame of the main ORF [17,18] have not, to our knowledge, been reported. In this study we devised a modular fluorescent reporter assay that experimentally monitors the rate at which translation initiates at alternative ATG codons, thereby accurately mapping the translation initiation landscape from genes with START codons not in the reading frame of the main ORF. We applied it to the yeast RMD1 gene and show that translation initiates from alternative ATG codons downstream of the main ATG and not in the frame of the main ORF, following a specific mechanistic model of translation initiation. Finally, we discuss the advantages and disadvantages of our approach and possible ramifications related to the reported results. 0888-7543/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ygeno.2013.05.003 Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 2 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx 3 2. Results 2.1. The context score distribution of AUG codons in the yeast genome We devised a statistical score for ATG start codons (named context score [19]) based on their similarity to the sequence context of ATG START codons of highly expressed genes (Material and methods) in the genome of Saccharomyces cerevisiae, in which higher context scores denote similarity to the ATG context of highly expressed genes (Material and methods); the approach is similar to comparing the AUG context to the Kozak consensus, but is more sensitive as it includes specific weights for each nucleotide distribution (details in the Material and methods section). Surprisingly, our analysis of the S. cerevisiae genome reveals that the transcripts of this organism are enriched with alternative ATG codons with relatively high context scores (Material and methods), both before and after the conventional START ATG (Figs. 1A–B; see also Supplementary Figs. 2–9). Specifically, when considering the nucleotides at a distance of less than 50 codons upstream (in the 5′UTR) or downstream from the ORF's conventional START codon, which is the region along the transcript likely to participate in translation initiation, S. cerevisae's 5861 protein coding genes include a total of 16,635 alternative ATG codons in the three reading frames, of which 12,484 are out of frame. Although the abundance of high context scores, alternative ATG codons near the START codon (Figs. 1A–C) and not in the frame of the main ORF emphasize the discriminatory power of the translation machinery in initiating translation, it also raises the hypothesis that translation may frequently initiate from such alternative START ATGs. The fact that the context score of the ATG codons upstream of the beginning of the main ORF tends to be significantly lower than context scores of ATG codons downstream of the beginning of the main ORF (Figs. 2A– B) supports the scanning model and our hypothesis above. 2.2. A novel fluorescent reporter assay determines the rate at which translation initiates from alternative ATG codons not in the frame of the main ORF in vivo We provide initial experimental evidence for the hypothesis that translation may frequently initiate from alternative ATG codons not in the frame of the main ORF in S. cerevisiae by measuring the protein output from these codons in an S. cerevisiae gene. The mechanism of translation initiation by transcript scanning suggests that if alternative initiation occurs, it is most likely to commence from those relatively close to the main START ATG, potentially resulting in previously unknown short peptides [7]. These peptides are experimentally challenging to detect and quantify in the native endogenous context because of their infrequency and variability, rendering analyses using ad hoc methods complicated. To overcome these problems we devised an experimental setup that enables a sensitive and well controlled analysis of alternative translation initiation events from a different frame relatively to the reading frame of the main ORF (Fig. 3) from any gene (that fulfills some additional conditions that will be further discussed) using a unified methodology. Specifically, a set of variants of a gene is generated, each containing its first ~ 150 nucleotides (which is related to the length of polypeptide needed to fill the ribosomal exit tunnel Fig. 2. Distribution of ATG scores of alternative ATG codons before and after the beginning of the main ORF when considering the log of the absolute context scores (A.) and the log of the relative context scores (B.). As can be seen both relative (KS test, p-val = 4.9 ∗ 10−67, empirical p-value b 0.001) and absolute (KS test, p-val = 2.3 ∗ 10−7, empirical p-value b 0.001) ATG context scores are higher (more efficient) downstream of the START codon (relatively to the ones upstream of the START codon), supporting the scanning model. The values of the context scores include various ‘outliers’ due to the nature of this score (Material and methods); this is the reason that the mean absolute context score of the 5′UTR does not appear in A. [20]) fused to a downstream Yellow Florescent Protein (YFP) reporter gene, and their fluorescent output is recorded. We assume that the alternative initiation codons that we want to study have a +1 frame shift compared to the frame of the main ORF; cases with alterative initiation events from codons with a +2 frame shift compared to the frame of the main ORF can be analyzed in a similar manner. The first variant, named ‘Main’, has the first 150 nt of the wild-type (WT) gene fused to the YFP reporter, and thus estimates the protein levels due to translation initiation solely from the main ATG codon or other ATG codons in-frame (Fig. 3(1)). The second variant, named ‘Alt’, also contains the WT gene, but fused to the YFP reporter with a +1 nt frame shift, in order to measure the protein levels only due to translation initiation from all the alternative ATGs of that frame (Fig. 3(2)). The third variant, named ‘TTG’, was constructed from the ‘Alt’ variant by replacing all its alternative ATGs of the +1 frame with TTG (encoding Leucine). Its purpose is to estimate an upper bound on Fig. 1. The number and distribution of the context score of alternative and main START ATG codons (see also Supplementary Figs. 2–9). A. The number and distribution of alternative ATGs 150 nt upstream (in the 5′UTR) and downstream from the start ATGs according to their log-absolute context score. The context scores of the four alternative ATGs at the beginning of the gene RMD1 are marked by red dashed bars; as can be seen ALT4 has a higher context score than the other three, and the context scores of ALT1-3 appear in the middle of the distribution. B. The number and distribution of the 16,635 alternative ATGs according to their log-relative context score, which is the log of their context score divided by the context score of the main START ATG of the gene they reside in (see Material and methods section); scores larger or smaller than 0 (vertical blue bar) denote alternative ATGs with scores higher or lower than the score of their main START ATG, respectively. The distribution of the alternative ATGs in-frame is in red while that of alternative ATGs out-of-frame is in green. The relative context scores of the ALT1-4 are marked by pink dashed bars. As can be seen, the relative context scores of ALT1-3 appear in the middle of the distribution. C. The distribution of the log-context scores of main START ATGs in S. cerevisiae. The context score of the gene RMD1 appears in the middle of the distribution and is marked by a red bar. The ATG codons used to learn the context score PSSM (Material and methods) were omitted from the figure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 4 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx Fig. 3. Wet experiment to estimate the translation initiation from ATG codons downstream of the main START ATG. The figure includes all the variants that were analyzed in the experiment. the noise and to control for the possibility that the measured protein levels of the ‘Alt’ variant are due to a frame shift in translation following initiation from the traditional main START ATG of the gene (Fig. 3(3) and Material and methods). The fourth variant named ‘None’, contains a promoter-less YFP reporter gene in order to estimate the lower bound on the noise, e.g. due to auto-florescence of S. cerevisiae (Fig. 3(4)). Note that the ‘None’ and ‘TTG’ variants are still expected to be very similar and specifically, the lower and upper bounds can be very close or identical, as they both model noise. Finally, in order to estimate the translation initiation rate from each of the four alternative ATGs we collectively measured using the ‘Alt’ strain, we generated four additional strains, named ‘Alt1’, ‘Alt2’, ‘Alt3’, and ‘Alt4’ (Figs. 3(5)–(8)); Each of these four strains was identical to the ‘TTG’ strain except that in each of them one of the out-of-frame alternative ATGs was mutated back to ATG and the latter three remained TTG. Taken together, a dedicated version of the aforementioned set of reporter variants can accurately map the translation initiation landscape of any gene. 2.3. RMD1 as a model gene for studying alternative translation initiation We analyzed the S. cerevisiae genome to find a model gene for studying alternative initiation of translation from codons not in the frame of the main ORF. We searched for genes with main and alternative ATG context scores that are common in S. cerevisiae, as evident by the distribution of ATG context scores in the S. cerevisiae genome (Fig. 1). We excluded genes whose alternative ATG codons that we aimed to study are far, more than 50 codons, upstream or downstream from the beginning of the ORF, since initiation rarely occurs there naturally. Finally, in order to maximally exploit the accuracy of our approach we (1) searched for genes that had all the studied alternative ATGs in the same reading frame (and out-of-frame compared to the main ORF), (2) excluded genes that had additional in-frame ATGs within 150 nt of the main ATG (to isolate the effect of the main START codon on initiation) and (3) excluded genes that had stop codons in the 5′UTR and/or the first 150 nt of the main ORF in all frames to prevent re-initiation [21,22]; our aim was to Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx study initiation and not re-initiation; re-initiation occurs when the ribosome finishes translating an ORF (by approaching a STOP codon) and starts translating a new one downstream of it (by approaching the next START codon); by screening against genes with no stop codons we control for this phenomenon (see Fig. 4 for an illustration of the selection process). The gene RMD1 (YDL001W), encoding a cytoplasmatic protein required for meiosis, which contains 4 alternative out-of-frame ATGs, was selected for experimental analysis. Fig. 1 shows that the context score of RMD1's four alternative ATGs closest to the main ATG is similar to the mean context score of alternative ATGs in the S. cerevisiae genome, both in terms of their absolute values (Fig. 1A) and in terms of their relative values compared to the context score of their main ATG (Fig. 1B). The context score of RMD1's main ATG is also similar to the mean genomic value of main ATGs (Fig. 1C). Finally, the alternative ATG load of RMD1 (4 sites downstream of the main ATG) is also not unusual, as 31% of S. cerevisiae genes harbor at least four alternative ATGs downstream of their main ATG. We generated the complete set of 8 strains of the RMD1 gene (‘Main’, ‘Alt’, ‘TTG’, ‘No’, ‘Alt1’, ‘Alt2’, ‘Alt3’ and ‘Alt4’) in order to (1) Fig. 4. Genes that can be studied with our approach (A.) and genes that are unsuitable or irrelevant for initiation (B.). A. We can study genes that include START codons that are frame shifted relative to the main ORF; these codons can be at the 5′UTR or the beginning of the ORF, as START codons more than 50 codons after the beginning of the ORF typically do not initiate translation. B. Cases that may not be studied with our approach or are not relevant include: 1) alternative START codons interleaved with STOP codons (this case may be studied if the aim is to understand reinitiation) — in this case the ribosome may start translating one of the alternative START codons, terminate translation, and re-initiate translation; 2) alternative START codons in the same frame of the main ORF; 3)alternative START codons that are more than 50 codons downstream of the beginning of the main ORF. 5 experimentally determine whether translation initiates at its out-of-frame alternative ATG codons, (2) provide a quantitative measure of the process and (3) gain mechanistic insight into the process of translation initiation at these sites. 2.4. Alternative translation initiation from ATGs not in the frame of the main ORF in RMD1 We cultured the complete set of strains and recorded their fluorescence in log-phase using a plate-reader (details in the Material and methods section). Our results demonstrate that the combined protein output due to initiation from the four out-of-frame ATGs (i.e. the ATGs not in the frame of the main ORF) of the ‘ALT’ strain is 3.3% (with STD of only 0.59%) of the output due to initiation from the main ATG of the ‘Main’ strain (Material and methods, Fig. 5). Initiation from TTG codons of the ‘TTG’ strain was orders of magnitudes lower than that of the ‘Alt’ strain. Specifically, a statistical analysis demonstrates that the protein levels resulting from the out-of-frame alternative ATGs of the ‘Alt’ strain are significantly higher than from the TTG codons of the ‘TTG’ strain (p = 1.3∙10−33, empirical p-value b 0.001), and that the ‘TTG’ strain is very similar (less than 0.2% difference) but significantly higher than the promoter-less ‘None’ strain (p = 1.3∙10−22, empirical p-value = 0.006). Thus, our results show for the first time, that in S. cerevisiae significant levels of translation initiation can occur from out-of-frame ATG codons downstream of the main START ATG, and to a much lesser extent possibly even from non-ATG codons, or from out-of-frame ATG codons. Interestingly, individual analyses of the out-of-frame ‘Alt1-4’ variants show that the protein levels due to initiations from the alternative ATG codon closest to the main ATG of the ORF (‘Alt1’ strain) were significantly higher than the other three strains (ALT2–4), which did not exhibit protein levels higher than the noise (‘TTG’ strain). Protein levels due to initiation from ALT1 were estimated to be 5% (with STD of only 0.13%) of the protein levels of the ‘Main’ strain (significantly higher than the ALT variant; p-value = 4.8 ∗ 10−32; empirical p-value b 0.001; Material and methods, Fig. 5), suggesting that the cumulative translation initiation rate from several alternative ATG codons is not additive, and is probably related to the context and positions of the alternative ATGs as previously suggested [7,17,23]. Only the alternative ATG codon closest to the main START exhibited significant amounts of translation initiation, suggesting that either a small fraction of ribosomes scanning the 5′-UTR of RMD1 mRNAs scan through the main START and initiate translation from the next closest ATG they encounter in the scanning process; or, as we discuss later, it is also possible, at least partially, that the alternative initiation is due to cases of 5′ truncated mRNA transcripts. In fact, the ATG context score (Material and methods) of the first out-of-frame alternative ATG codon was the lowest of the four, and not extremely or unusually high (Figs. 1A–B; the scores of the alternative ATGs 1–4 after normalizing by the score of the Main START are 0.5, 0.76, 0.59, and 10.6 respectively). This suggests that translation initiation from these out-of-frame codons is determined to a larger extent by their location along the transcript than their ATG context score. This result broadens the classical mechanistic assumption of the scanning model, that translation initiates at the first ATG with a sufficiently high context score, regardless of the context score of downstream ATGs, as it demonstrates that a small fraction of the pre-initiation complexes may scan through an ATG with a relatively good context score (the main ATG), and start from a downstream out-of-frame ATG even if its context score is worse. 3. Discussion We describe a novel approach for monitoring translation initiation from alternative out-of-frame ATG codons. We demonstrate the method by implementing it on the model S. cerevisiae gene RMD1. Previous papers have suggested based on the ribosomal profiling Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 6 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx Fig. 5. Expression levels of all the variants analyzed in this study and corresponding p-values and empirical p-value (pe, Material and methods). Inset: the expression levels of ‘Main’, ‘ALT1’, ‘ALT’, and ‘TTG’ after subtracting noise. approach that in mammals there are alternative initiation events near the beginning of the START codon of the main ORF [9,16]. Here we used an approach which is based on estimation of protein levels and is, therefore, complementary to the ribosomal profiling approach, both providing an additional novel layer of experimental validation for this phenomenon, and testing whether it occurs in relatively simple eukaryotic translation systems. Our computational analyses of alternative START sites in the S. cerevisiae genome (Figs. 1A, B and 2) both emphasize the discriminatory power exhibited by the translation machinery in initiating translation at specific START sites, and raise questions regarding the actual fidelity of the process. Specifically, there is a global signal of lower median scores of upstream alternative sites than the score of downstream sites; this is in accordance with what the scanning model assumes — binding of the ribosomal pre-initiation complex to the first ATG downstream of the cap site with a strong enough context sequence score. However, this signal is not very strong and coherent: first, the correlation coefficient between the context score and ribosomal density is significant but not very high (0.19; see Material and methods section); second, as can be seen in Figs. 1–2, the rather wide range of context scores for the main START sites overlaps that of alternative sites. Thus, we expect, specifically in genes with lower context scores for the main ATG, higher usage of potential alternative start codons. There are several additional factors other than the context score that may favor the main ATG site over the one before or after it: First, as mentioned, the position of the codon is important — ATG codons closer to the 5′end have higher probability to initiate translation; second, ATG codons in unstructured parts of the mRNA molecule should be recognized more efficiently [24,25]; and third, strong mRNA structure downstream of an ATG codon should help stabilize the pre-initiation complex and thus promote initiation from the codon [26,27]. We believe that additional such signals will be discovered in the future. Our experimental mapping of the translation initiation landscape of the model RMD1 gene suggests that alternative initiation of translation is not only more frequent than previously thought, but also involves the translation of many previously unknown short peptides with relatively low protein levels due to initiation from out-of-frame ATGs, raising questions about how these peptides are related to the functionality of the cell. For example, it has been recently discovered that small peptides switch the transcriptional activity of shavenbaby during Drosophila embryogenesis [28]. There are also known cases of alternative translation initiation in various organisms, where initiation from an alternative in-frame ATG codon results in functional longer isoforms that affect the translocation of the protein [17,18,29–32]. Thus, it is possible that the numerous low-abundance short peptides generated by alternative initiation from out-of-frame ATGs have yet unknown regulatory roles in S. cerevisiae. While results from the model RMD1 gene cannot be readily generalized to the entire transcriptome of S. cerevisiae, its fairly common (1) absolute values of alternative ATG context scores (see Fig. 1A), (2) relative values of alternative ATG context scores compared to their respective ‘Main’ ATG (see Fig. 1B), as well as (3) its relatively average main ATG context score compared to other Main ATGs in the genome (see Fig. 1C), bolsters the notion that other S. cerevisiae transcripts may also exhibit similar behavior. Our analyses show that the lengths of peptides potentially generated from the out-of-frame and in-frame alternative ATG codons we identified, are within the range of naturally occurring peptides expressed from the S. cerevisiae genome (see Material and methods). In addition, the reported signal of translation from ‘out-of-frame’ ATG codons is rather low, less than 5%, but is based on a long list of normalizations and statistical tests (Material and methods), and is significant based on accepted statistical standards. It is possible that the reported signal is an upper bound and the actual signal of alternative translation is even lower than reported; this possibility emphasizes the fidelity of translation initiation. On the other hand, it is possible that future implementations of the method reported here on other genes will yield a much stronger signal of translation form out-of-frame ATG codons. Thus, further studies that will include implementation of our methods on a large set of genes in S. cerevisiae and/or other organisms will shed additional light on the content and abundance of such out-of-frame translated proteins. It is important to remember that all biological methods, even the very common ones, usually include biases. For example, it is known that DNA chips that have been used in thousands of studies in recent years include biases [33], and various aspects of the recent ribosomal profiling approach apparently include biases [9,16,34]. Our approach can estimate protein levels and translation rates due to alternative START codons that are downstream (as was demonstrated in the current study) but also upstream of the beginning of the ORF, which are not in the same frame of the main ORF. However, in some cases it may be difficult to determine the exact mechanism that causes the alternative initiation. In the case of alternative initiation for a codon that is downstream of the beginning of the ORF, if the analyzed protein levels are low it is possible that this signal is due to skipping of the main start codon (i.e. leaky scanning). However, it is also possible, at least partially, that the alternative initiation is due to cases of 5′ truncated mRNA transcripts; this could occur by cap-independent translation of degradation intermediates, cryptic promoters or aberrant splicing/processing; such truncated transcripts may not include Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx the main START and thus translation from these transcripts does not involve leaky scanning of their main START. It is important to mention that currently, the typical abundance of such 5′ truncated mRNA transcripts is unknown; if their relative abundance is less than 1% (as in the cases of transcription elongation errors which are estimated to be 1:10,000 [12]), this mechanism will become negligible relatively to the leaky scanning explanation. We believe, however, that the ability to detect the alternative initiation rate is important regardless of the biophysical mechanisms causing it. One possible source of bias in our method is the fact that the proteins corresponding to the different constructs usually have different N-terminals. An interesting direction for improving our approach is by including a ‘stop–go self-cleaving peptide’ [35]. Specifically, an augmented version of our method can include a self-cleaving peptide that will cleave the N-terminal of the generated peptide after translation such that it will be identical for all variants. However, data from such constructs may be problematic and/or biased in different ways. For example, first, 2A peptides used in stop–go constructs are a significant insertional mutation (approximately 60 bp) into our constructs, rendering them largely different from the native gene we attempt to provide data for. Second, the 2A insertion may affect the activity and stability of our constructs in unpredictable ways (see [36]). Finally, cleavage efficiency of the inserted peptide is not guaranteed to be fully efficient in constructs with different N-terminals and may therefore skew our results or create an expression bottleneck that masks our results (see [37,35]). In summary, the quantification of translation initiation is experimentally challenging, fraught with potential biases, with each experimental approach having its advantages and drawbacks. Specifically in the case of stop–go constructs, we evaluate that these may potentially insert more bias in the system compared to the relatively small differences in the N-terminals of our constructs. 4. Material and methods 4.1. Genomic analysis 4.1.1. Coding sequences and UTRs The coding sequences and UTRs of S. cerevisiae were downloaded from BioMart [38], and the UTR lengths were based on the study of [39]. The analysis regarding the ATG distribution upstream of the beginning of the ORF was based on the information of the UTR length from [39] and included only the 5′UTR region. 7 4.1.2. ATG context score We investigated the ATG context of the start codon and ATGs upstream and downstream from it, in a genome-wide manner, by devising a measure based on the ATG context of the start codon, called the ATG context score. We define an ATG context score (CS) according to the following algorithm (results are robust to versions of this procedure): 1. Select percentage of highly expressed/translated genes. 2. Calculate a position specific scoring matrix (PSSM) based on the nucleotide context around the start codon of the selected highly expressed genes. This PSSM represents the nucleotides necessary for highly efficient translation (see Fig. 6). 3. Calculate the context score per ATG position according to the PSSM for the rest of the genes. The training set for the position specific scoring matrix (PSSM) calculation is based on the selection of the top 2% of highly expressed genes, according to the product of ribosomal density (RD) and mRNA levels (ML) [39–41], which represent the total flux of ribosomes over a gene. However, similar results were achieved when selecting the highly expressed group of genes based solely on RD or ML, and also when based on other measures of expression, such as protein abundance etc. We consider two large scale ribosomal density (RD) measurements (the number of ribosomes occupying the transcript divided by its length); each generated by a different technology. The first dataset was generated more recently by Ingolia et al. [41] and the second by Arava et al. [40]. We average across the two RD datasets (after normalizing each dataset by its mean), in order to minimize experimental noise. Similar results were obtained when we analyzed each dataset separately. We average across four datasets of protein abundance measurements [42–44], similarly to RD. Similar results were obtained when we analyzed each dataset separately. We considered two large scale measurements of mRNA levels from [41 and 39]. Similarly to RD, we averaged across the two datasets of mRNA levels. In order to avoid over fitting, we select the top 4% highly expressed genes according to the product of RD and ML, and then randomly select half of that group as the training set. Different percentages of training sets sizes (up to 50%) were also tested in the same manner, and the results are robust to training set size variance. The PSSM was calculated according to the nucleotide (C, A, T, G) appearance probability (frequency), of the training set (2% selected at (1)), for 6 nts before and 3 nts after the first ATG (start) codon, based on the length of the potential optimal ATG context proposed Fig. 6. ATG context score PSSM for S. cerevisiae. The figure includes for each position near the beginning of the ORF the distribution of the four nucleotides in highly expressed genes (details in the Material and methods section). Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 8 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx by Hamilton et al. [45]. We achieved similar results when considering various context lengths. Interval values between 3 and 48 nts around the start codon were examined, for symmetric and asymmetric combinations, in jumps of 3 nts, and additionally in jumps of 1 nt from 1 nt up to 10 nts, and the results are robust to interval variance. These intervals were tested for all the aforementioned training set sizes in (1), and the results remain robust. The resultant PSSM appears in Fig. 6. For each gene, the ATG context score is calculated per ATG position, according to the highly expressed genes PSSM calculated in (2): ATGcsj = exp(∑ilog(Pij)), where j is the gene index, i the nucleotide position, Pij the probability that the ith nucleotide of the jth gene appears in the ith position. A relative context score is calculated by normalizing by the main ATG's context score: ATGcsj = exp(∑ilog(Pij))/ exp(startATGcs). We considered up to the last 6 nt of the UTR and the first 3 nt of the ORF. Note that a PSSM with different background weights for different nucleotides (i.e. ATGcsj = exp(∑ilog(Pij/bij)) where bij is the genomic background frequency over the region of the PSSM (up to the last 6 nt of the 5′UTR and the first 3 nt of the ORF after the ATG) of nucleotide i in gene j) did not improve the correlation between the ATG context score and the ribosomal density (r = 0.19 [p = 6.3 ∗ 10−30] for the first model vs. r = 0.17 [p = 7.8 ∗ 10−25] for the second model); thus, we remained with the simpler model (Occam's razor). Finally, when we considered a slightly different variant of the context score the results remain very similar: Suppose that for a certain gene the nt corresponding to position i in the PSSM is missing for a certain ATG (due to short 5′UTR). The score of the missing positions will be computed as follows: Let p1, p2, p3, and p4 be the distribution of the 4 nucleotides (A, C, T, G) in the genome in this position (all genes). Let p′1, p′2, p′3, and p′4 be the four probabilities of the PSSM in this position (based on highly expressed genes). The expected PSSM score in such a position is: p′1 ∗ p1 + p′2 ∗ p2 + p′3 ∗ p3 + p′4 ∗ p4 (this is the mean score of this position over all genes). This variant yields similar results to the previous one — the context scores are higher at the ORF than at the 5′UTR (for non-normalized ATG score: KS test p-value = 6.47 ∗ 10− 9; 5′UTR mean: − 47.8975; 5′UTR median: − 15.1650; ORF mean: − 14.7540 ORF median: − 14.6278. For normalized ATG score: KS test p-value: 4.88 ∗ 10− 67; 5′UTR mean: − 34.8717; 5′UTR median: − 2.1516; ORF mean: 135.0816; ORF median: − 0.7237). List of ATG codons, their positions and context scores appears in Supplementary Table 2. 4.1.3. ATG Kozak score Around 30 years ago Kozak [6] suggested that there is a specific optimal context surrounding the ATG start codon. Specifically, she identified ACCATGG as the optimal sequence for initiation by eukaryotic ribosomes. In this case the context score is the hamming distance (or the number of mutations) from this optimal context; thus, lower numbers denote a better score and 0 is the best possible score. In vivo, this site is often not matched exactly on different mRNAs and it is believed that the amount of protein synthesized from a given mRNA is dependent on the strength of the Kozak sequence. We found that few (0.0048% in S. cerevisiae and 0.0068 in Schizosaccharomyces pombe) of the sequences satisfy the Kozak rule precisely (with an average hamming distance of 0.67 in S. cerevisiae and S. pombe). Thus, we decided to use the context score above; however, analysis with the KOZAK score instead of the context score yields very similar results: The Kozak scores for the main ATG, and ALT1-4 of the RMD1gene respectively are, 0.5, 1.0, 0.75, 0.50, and 0.75. 4.1.4. ATG out-of-frame analysis and sequence construction There are three possible frames relative to the main ATG start codon an alternative codon can be in: frame 0, the same frame as the main ATG, frame 1, a 1 nt shift from the main ATG's frame, and frame 2, a 2 nt shift from the main ATG's frame. We consider all the alternative ATGs in a frame, such that for each gene we consider all 3 frames, and construct the experiment sequences in the following manner: For each gene, for each frame containing alternative ATGs, we construct a sequence composed of the last 50 codons of the 5′UTR, and the first 50 codons of the ORF, ending at nucleotides 150, 151 and 152 respective to the frame shifts, with the GFP protein concatenated at the end. We examined gene candidates according to the number of alternative ATGs in the 5′UTR and ORF, the best/worst/median alternative ATG context score; the distance of the first stop codon from the alternative ATG, the metabolic cost of the peptide defined between the alternative ATG and the first stop codon; the distance of an alternative ATG from the start of the GFP; the % of the mRNA levels of the gene compared to the gene TEF (YPR080W) (as this gene is known to have a strong promoter); the number/distance of relevant STOP codons (i.e. the first stop codon after each alternative ATG), the rank of the gene's protein levels; and the rank of the gene's RD. 4.1.5. Weighted number of ATGs This analysis (Supplementary Figs. 3, 4, 6, 7B, 8B, 9) was performed by weighting the mRNA levels of the transcripts assuming a total of 60,000 transcripts and based on the mRNA levels from [41]. The results were similar to the ones reported in the main text. 4.1.6. Analysis of length distribution of peptides potentially generated from alternative ATGs in the S. cerevisiae genome We considered the distance of out-of-frame ATGs from STOP codons in their frame, and the distance of in-frame alternative ATGs from the traditional START ATGs (Material and methods). Our analysis shows that of the alternative (out-of-frame and in-frame) 16,635 ATG codons, 12,484 are out-of-frame alternative ATG codons with a mean distance of 14.83 codons to the closest STOP codon, and 4151 are alternative in-frame ATG codons with a mean distance of 26.95 codons to the closest main START ATG (see also Supplementary figures). In comparison, the mean length of annotated S. cerevisiae peptides is 496.93 with STD 383.07, and with minimal ORF length of only 17 codons (close to the values above). 4.1.7. Empirical p-value based on bootstrapping 4.1.7.1. Context score empirical p-values. Let Nu and No denote sets of ATG (absolute or relative) context scores in the 5′UTRs and ORFs respectively; let N denote the group of ATGs that appear in Nu or in No (|N| = |Nu| + |No|). We describe here the procedure for the medians comparisons, but the same procedure was performed for the means comparisons. Let d ¼ medianðNoÞ−medianðNuÞ Repeat the following 1000 times: 1) Sample a set Nui of size |Nu| from N; the rest of the ATG scores from N are denoted as a set Noi 2) Compute di = median(Noi) − median(Nui) The empirical p-value is (no. of cases where di ≥ d) / 1000. The empirical p-values for both comparisons (absolute and relative) and when considering mean or median are p b 0.001 (in none of the random comparisons were the differences between the ORF and the 5′UTR mean/median larger or equal to the original one). The same procedure was applied in all cases, absolute or relative, mean or median. Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx 4.1.7.2. YFP levels empirical p-values. Let Y1 and Y2 denote sets/vectors of YFP measurements for two strains (e.g. ‘Alt’ and ‘TTG’); let Y denote the vector/set of YFP that appears in Y1 or in Y2 (|Y| = |Y1| + |Y2|). We describe here the procedure for the mean comparisons. Let d ¼ meanðY1Þ−meanðY2Þ Repeat the following 1000 times: 1) Sample a set Y1i of size |Y1| from Y; the rest of the YFP measurements from Y are denoted as a set Y2i 2) Compute di = mean(Y1i) − mean(Y2i) The empirical p-value is (no. of cases where di ≥ d) / 1000. 4.1.7.3. The expected relations between initiation rates and YFP levels. In this subsection we briefly discuss the relation between translation initiation rates and protein concentration/abundance. Specifically, we will explain why we expect that protein abundance should stand in high positive correlation with translation initiation rates in our study. Protein abundance levels are determined by a balance between protein production and degradation rates. Fixing the degradation rate, protein abundance levels will rise when the production rate is increased. Fixing the production rate, protein abundance levels will decrease when the degradation rate is increased. Let ci denote the concentration of protein i and let us assume that this protein is translated from a certain mRNA transcript with mi copies in the cell. In general, the dynamics of this process may be described by the following differential equation: dcdti ðt Þ ¼ Ri ⋅mi −Di ðci Þ. Where Ri and D(ci) are the translation rate per mRNA molecule and the degradation rate of protein i correspondingly. One common choice for D(ci) is: Di(ci) = di ⋅ ci(t) where di > 0 is constant. The steady state solution of the above differential equation is: ss di ⋅ css i = R i ⋅ mi where ci is the steady state concentration of the protein i (that is measured by the YFP levels). All the variants studied in this research have very similar ORFs with up to four point mutations and/or indels at their beginning; the mutations are related to the ATGs and thus expected to be related to initiation rates (and not elongation or termination rates). Thus, in our case Ri is assumed to be mostly initiation rate, we expect that the few very specific mutations/indels introduced in the experiment will not affect degradation rates or mRNA levels. Thus, d ⋅css from the equation above we get that Ri ¼ imii and indeed Ri∝css i . 4.2. The different variants and the rationale behind them The analysis included eight different variants of the gene RMD1 (see also Fig. 3). Each variant containing ~ 150 nucleotides of the beginning of this gene (Fig. 3) fused with a YFP gene, and measured their fluorescent levels: 1) The first variant (named Main) wasn't mutated compared to the original RMD1 gene and thus estimated the protein levels due to translation initiation solely from the main ATG codon. 2) The second variant (named ALT) was fused to the YFP with a 1 nt frame shift in order to measure the protein levels only due to the alternative ATGs of that frame (Fig. 3). 3) The third variant (named TTG) was engineered to estimate an upper bound on the noise; it was reconstructed from the second variant by replacing all the alternative ATGs with TTG. 4) The fourth variant (named None) was a YFP gene without a promoter; thus, it is a lower bound on the noise. To estimate the initiation rate due to each one of the alternative ATG codons separately we also generated four additional variants; each variant was similar to the third variant above, but in this case only three of the alternatives ATGs were mutated to TTG. We reconstructed such a variant for each of the four alternative ATGs and named these variants: ALT1 (the alternative ATG closest to the 9 beginning of the ORF is not mutated), ALT2, ALT3, and ALT4 (the alternative ATG furthest from the beginning of the ORF is not mutated) respectively. The fact that we compared the protein levels of each of the variants to the TTG variants enabled us to estimate the protein levels due to alternative initiation events and not due to frame shifts during translation elongation after initiation (which is not expected to occur when mutating ATG codons to TTGs). 4.3. Molecular biology 4.3.1. Methods for DNA construction Construction of the DNA of the YDL001W variants was performed according to the methods described in [46]. 4.3.2. Phosphorylation 300 pmol of single stranded DNA in a 50 μl reaction containing 70 mM Tris–HCl, 10 mM MgCl2, 7 mM dithiothreitol, pH 7.6 at 37 °C, 1 mM ATP & 10 units T4 Polynucleotide Kinase (NEB) was used. Reaction is incubated at 37 °C for 30 min, then at 42 °C for 10 min and inactivated at 65 °C for 20 min. 4.3.3. Elongation 1 pmol of single stranded DNA of each progenitor in a 25 μl reaction containing 2.5 μl of Hot Start DNA Polymerase (Novagen, 71086-3) reaction according to manufacturer's guidelines was used. Three cycles of annealing were executed for each elongation to ensure full yield of elongation. 4.3.4. PCR All PCR reactions were performed in 96 well PCR microplates, using KOD Hot Start DNA Polymerase (Novagen, 71086-3) according to its protocol. 4.3.5. Digestion by Lambda exonuclease 1–5 pmol of 5′ phosphorylated DNA termini in a 30 μl reaction containing 67 mM Glycine–KOH, 2.5 mM MgCl2, 0.01% Triton X-100, 5 mM 1,4-dithiothreitol, 5.5 units Lambda exonuclease (Epicentre) and SYBR Green diluted 1:50,000. Thermal Cycler program is 37 °C for 15 min, 42 °C for 10 min, enzyme inactivation at 65 °C for 10 min. 4.3.6. Chemical oligonucleotide synthesis Oligonucleotides for all experiments were ordered from IDT with standard desalting. 4.3.7. Minigenes Three sequenced fragments of under 300 bp containing segments of the YDL001W variants were ordered as IDT Minigenes. 4.3.8. DNA purification DNA construction related DNA purification was performed with Qiagen Minielute purification kit using standard protocols. 4.3.9. The master strain The master strain was created by integrating into the yeast genome a cassette containing a promoter-less YFP, followed by a NAT (Nourseothricin) resistance marker under its own promoter. The entire sequence was inserted into the his3Δ1 locus. 4.3.10. Transformations The three YDL001W-YFP fusion variants were transformed into the master strain using the LiAc/SS carrier DNA/PEG method following the procedures described in [47]. Cells were plated on solid agar SD-URA selective media and incubated at 30 °C for 3–4 days. Transformant colonies were handpicked and patched on SD-URA + NAT (Werner BioAgents) agar plates. Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 10 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx Correct transformation was verified for the 3 variants by PCR amplification from the yeast's genome and gel electrophoresis. All strains were additionally verified by sequencing. 4.3.11. YiFP Library construction The three synthetic YFP fusion constructs were transformed into a master strain that contained a promoter-less YFP coding sequence at the his3Δ1 locus. Each synthetic construct contained a URA3 selection marker under its own promoter followed by a TEF promoter, the last 30 bp of YDL001W's 5′UTR, the first 150 bp of YDL001W's ORF and the beginning of the YFP ORF (for recombination purposes). 4.3.12. Sequencing Several single colonies were picked manually from the plates of each variant, the specific integration locus was PCR amplified from each clone. Correct size amplifications were verified by gel electrophoresis. Amplicons were sequenced in house using Sanger sequencing. Colonies with the correct sequence were chosen for analysis. 4.3.13. Culture and fluorescence measurements In order to measure growth and fluorescent protein expression, the Singer colony arrayer was used to inoculate all colonies of the library into 100 μl of SD-URA media in a 384 well growth microplate (Greiner bio-one, 781162). Following 24 h of pre-incubation, 5 μl of the yeast cultures was diluted into 80 μl of SD complete media in a 384 well microplate, to reach a starting O.D600 of ~ 0.1–0.2. A microplate reader (Neotec Infinite M200 monochromator) was then set to measure the following parameters in cycles of 10 min: Cell growth (as extracted from absorbance at 600 nm) and YFP expression (Ex. 500 Em. 540). Each cycle contained 4 min of orbital shaking at amplitude of 3 mm. The number of cycles was set to 100 (16h) and the temperature to 30 °C. The experiment included three plates, each plate included four variants, and 96 wells included yeast cultures for each of the variants. Specifically, the 3 different plates included the following variants (all data in Supplementary Table 1): Plate 1: ALT, ALT1, None (denoted as None1), TTG (denoted as TTG1). Plate 2: Main, ALT2, None (denoted as None2), TTG (denoted as TTG2). Plate 3: ALT3, ALT4, None (denoted as None3), TTG (denoted as TTG4). 4.4. Statistical analysis and normalization To gain fair comparisons between the protein levels of the different variants we performed several normalization steps. First, we removed from each measurement of each variant outliers with YFP or OD levels that are more than 5 standard deviations from the other measurements of the variant at the same time point. Second, for each measurement, we removed the time points where the OD or YFP is after the peak (see Supplementary Fig. 1). Third, as the YFP/ OD (i.e. an estimation of the fluorescent levels per cell) is generally significantly dependent on the OD levels, we performed the following procedure: 1) We use the TTG variant in plate 1 (TTG1), which has the lowest mean OD levels after the first two stages, as a reference. 2) For each of the other variants we iteratively removed time points with the largest OD until we reached a mean OD similar to the one of the TTG1 variant. At the end of the third step all the variants have similar mean OD. At the next step, for each of the up to 133 measurements of the variants in the three wells, the YFP levels were normalized by dividing them with the corresponding OD measurements to get an estimation of the protein levels per cell. The variants related to the TTG and the None from the three plates were concatenated respectively after normalizing the values in plates 2 and 3, such that the mean YFP/OD levels of TTG2 and TTG3 are identical to the mean YFP/OD levels of TTG1. The values of pairs of variants were statistically compared by a Kolmogorov–Smirnov (KS) test. Similarly, to estimate the ratio between the protein levels due to the first alternative ATG and the main ATG we used the following formula: ½ðPATTG2 =PATTG1 Þ⋅ðPAAlt1 −PATTG1 Þ ðPAMain −PATTG2 Þ Variants ALT2, ALT3, and ALT4 (related to initiation from the second, third, and fourth ATG, respectively) did not exhibit expression levels higher than the TTG variants in their plate. Thus, we concluded that there is no translation initiation from these ATGs. Finally, we also utilized the time before t = 40 as we believe that these data are also informative (given the normalizations mentioned above). To prove this point, we computed the correlation between YFP/OD obtained for the strains before time 40 and the ones obtained afterwards. For all the plates the correlations were significantly positive (e.g. a correlation of 0.8 for plate 2; p b 10−16). Thus, since the ranking of strains based on the first and last time points is similar it means that the initial points are informative, and using them should improve the statistical power of the reported analyses. This is the reason we decided to use these points in the analysis. Supplementary data to this article can be found online at http:// dx.doi.org/10.1016/j.ygeno.2013.05.003. Acknowledgments We thank Prof. Martin Kupiec, Mr. Ido Yofe, and Mr. Tamir Biezuner for helpful discussions. This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. References [1] I.B. Lomakin, N.E. Shirokikh, M.M. Yusupov, C.U. Hellen, T.V. Pestova, The fidelity of translation initiation: reciprocal activities of eIF1, IF3 and YciH, EMBO J. 25 (2006) 196–210. [2] N. Sonenberg, A.G. Hinnebusch, Regulation of translation initiation in eukaryotes: mechanisms and biological targets, Cell 136 (2009) 731–745. [3] A.G. Hinnebusch, Microbiol. Mol. Biol. Rev. 75 (2011) 434–467. [4] R.J. Jackson, C.U. Hellen, T.V. Pestova, The mechanism of eukaryotic translation initiation and principles of its regulation, Nat. Rev. Mol. Cell Biol. 11 (2011) 113–127. [5] F. Sherman, The importance of mutation, then and now: studies with yeast cytochrome c, Mutat. Res. 589 (2005) 1–16. [6] M. Kozak, Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes, Cell 44 (1986) 283–292. [7] M. Kozak, Pushing the limits of the scanning mechanism for initiation of translation, Gene 299 (2002) 1–34. [8] C.P. Chang, S.J. Chen, C.H. Lin, T.L. Wang, C.C. Wang, A single sequence context cannot satisfy all non-AUG initiator codons in yeast, BMC Microbiol. 10 (2010) 188. [9] N.T. Ingolia, L.F. Lareau, J.S. Weissman, Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes, Cell 147 (2011) 789–802. [10] C. Fritsch, A. Herrmann, M. Nothnagel, K. Szafranski, K. Huse, F. Schumann, S. Schreiber, M. Platzer, M. Krawczak, J. Hampe, et al., Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting, Genome Res. 22 (2012) 2208–2218, http://dx.doi.org/10.1101/gr.139568.112, (Epub 132012 Aug 139569). [11] I.P. Ivanov, A.E. Firth, A.M. Michel, J.F. Atkins, P.V. Baranov, Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences, Nucleic Acids Res. 39 (2011) 4220–4234, http://dx.doi.org/10.1093/nar/gkr007, (Epub 2011 Jan 4225). [12] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter, Molecular Biology of the Cell, Garland Science, New York, 2002. [13] J. Pelletier, N. Sonenberg, Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA, Nature 334 (1988) 320–325. Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003 T. Ben-Yehezkel et al. / Genomics xxx (2013) xxx–xxx [14] M. Mokrejs, V. Vopalensky, O. Kolenaty, T. Masek, Z. Feketova, P. Sekyrova, B. Skaloudova, V. Kriz, M. Pospisek, IRESite: the database of experimentally verified IRES structures. , (www.iresite.org) Nucleic Acids Res. 34 (2006) D125–D130. [15] A.M. Michel, K. Roy Choudhury, A.E. Firth, N.T. Ingolia, J.F. Atkins, P.V. Baranov, Observation of dually decoded regions of the human genome using ribosome profiling data, Genome Res. 2012 (2012) 16. [16] S. Lee, B. Liu, S.X. Huang, B. Shen, S.B. Qian, Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) E2424–E2432, (Epub 2012 Aug 2427). [17] L.B. Slusher, E.C. Gillman, N.C. Martin, A.K. Hopper, mRNA leader length and initiation codon context determine alternative AUG selection for the yeast gene MOD5, Proc. Natl. Acad. Sci. U. S. A. 88 (1991) 9789–9793. [18] J.R. Pedrajas, P. Porras, E. Martinez-Galisteo, C.A. Padilla, A. Miranda-Vizuete, J.A. Barcena, Two isoforms of Saccharomyces cerevisiae glutaredoxin 2 are expressed in vivo and localize to different subcellular compartments, Biochem. J. 364 (2002) 617–623. [19] H. Zur, T. Tuller, New Universal Rules of Eukaryotic Translation Initiation Fidelity, PLoS Comput. Biol. (2013), (in press). [20] N. Ban, P. Nissen, J. Hansen, P.B. Moore, T.A. Steitz, The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution, Science 289 (2000) 905–920. [21] M. Kozak, Effects of intercistronic length on the efficiency of reinitiation by eucaryotic ribosomes, Mol. Cell. Biol. 7 (1987) 3438–3445. [22] V. Munzarova, J. Panek, S. Gunisova, I. Danyi, B. Szamecz, L.S. Valasek, Translation reinitiation relies on the interaction between eIF3a/TIF32 and progressively folded cis-acting mRNA elements preceding short uORFs, PLoS Genet. 7 (2012) e1002137. [23] M. Kozak, Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo, Nature 308 (1984) 241–246. [24] T. Tuller, Y.Y. Waldman, M. Kupiec, E. Ruppin, Translation efficiency is determined by both codon bias and folding energy, Proc. Natl. Acad. Sci. U. S. A. 107 (2010) 3645–3650. [25] W. Gu, T. Zhou, C.O. Wilke, A universal trend of reduced mRNA stability near the translation–initiation site in prokaryotes and eukaryotes, PLoS Comput. Biol. 6 (2010) 1–8. [26] M. Kozak, Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes, Proc. Natl. Acad. Sci. U. S. A. 87 (1990) 8301–8305. [27] T. Tuller, I. Veksler-Lublinsky, N. Gazit, M. Kupiec, E. Ruppin, M. Ziv-Ukelson, Composite effects of gene determinants on the translation speed and density of ribosomes, Genome Biol. 12 (2011) R110. [28] T. Kondo, S. Plaza, J. Zanet, E. Benrabah, P. Valenti, Y. Hashimoto, S. Kobayashi, F. Payre, Y. Kageyama, Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis, Science 329 (2010) 336–339. [29] S.A. Mackenzie, Plant organellar protein targeting: a traffic plan still under construction, Trends Cell Biol. 15 (2005) 548–554. [30] S. Vagner, M.C. Gensac, A. Maret, F. Bayard, F. Amalric, H. Prats, A.C. Prats, Alternative translation of human fibroblast growth factor 2 mRNA occurs by internal entry of ribosomes, Mol. Cell Biol. 15 (1995) 35–44. [31] C.J. Danpure, How can the products of a single gene be localized to more than one intracellular compartment? Trends Cell Biol. 5 (1995) 230–238. 11 [32] G. Kim, N.B. Cole, J.C. Lim, H. Zhao, R.L. Levine, Dual sites of protein initiation control the localization and myristoylation of methionine sulfoxide reductase A. J. Biol. Chem. 285 18085–18094. [33] J.E. Larkin, B.C. Frank, H. Gavras, R. Sultana, J. Quackenbush, Independence and reproducibility across microarray platforms, Nat. Methods 2 (2005) 337–344, (Epub 2005 Apr 2021). [34] A. Dana, T. Tuller, Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells, PLoS Comput. Biol. 8 (2012) e1002755. [35] A.L. Szymczak, C.J. Workman, Y. Wang, K.M. Vignali, S. Dilioglou, E.F. Vanin, D.A. Vignali, Correction of multi-gene deficiency in vivo using a single ‘self-cleaving’ 2A peptide-based retroviral vector, Nat. Biotechnol. 22 (2004) 589–594, (Epub 2004 Apr 2004). [36] M. Amendola, M.A. Venneri, A. Biffi, E. Vigna, L. Naldini, Coordinate dual-gene transgenesis by lentiviral vectors carrying synthetic bidirectional promoters, Nat. Biotechnol. 23 (2005) 108–116, (Epub 2004 Dec 2026). [37] M.L. Donnelly, L.E. Hughes, G. Luke, H. Mendoza, E. ten Dam, D. Gani, M.D. Ryan, The ‘cleavage’ activities of foot-and-mouth disease virus 2A site-directed mutants and naturally occurring ‘2A-like’ sequences, J. Gen. Virol. 82 (2001) 1027–1041. [38] J.M. Guberman, J. Ai, O. Arnaiz, J. Baran, A. Blake, R. Baldock, C. Chelala, D. Croft, A. Cros, R.J. Cutts, et al., BioMart Central Portal: an open database network for the biological community, Database (Oxford) 18 (2011), (bar041). [39] U. Nagalakshmi, Z. Wang, K. Waern, C. Shou, D. Raha, M. Gerstein, M. Snyder, The transcriptional landscape of the yeast genome defined by RNA sequencing, Science 320 (2008) 1344–1349. [40] Y. Arava, Y. Wang, J.D. Storey, C.L. Liu, P.O. Brown, D. Herschlag, Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 3889–3894. [41] N.T. Ingolia, S. Ghaemmaghami, J.R.S. Newman, J.S. Weissman, Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling, Science 324 (2009) 218. [42] S. Ghaemmaghami, W.K. Huh, K. Bower, R.W. Howson, A. Belle, N. Dephoure, E.K. O'Shea, J.S. Weissman, Global analysis of protein expression in yeast, Nature 425 (2003) 737–741. [43] J.R.S. Newman, S. Ghaemmaghami, J. Ihmels, D.K. Breslow, M. Noble, J.L. DeRisi, J.S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise, Nature 441 (2006) 840–846. [44] M.V. Lee, S.E. Topper, S.L. Hubler, J. Hose, C.D. Wenger, J.J. Coon, A.P. Gasch, A dynamic model of proteome changes reveals new roles for transcript alteration in yeast, Mol. Syst. Biol. 7 (2011). [45] R. Hamilton, C. Watanabe, H. De Boer, Compilation and comparison of the sequence context around the AUG start codons in Saccharomyces cerevisiae mRNAs, Nucleic Acids Res. 15 (1987) 3581. [46] G. Linshiz, T.B. Yehezkel, S. Kaplan, I. Gronau, S. Ravid, R. Adar, E. Shapiro, Recursive construction of perfect DNA molecules from imperfect oligonucleotides, Mol. Syst. Biol. 4 (2008) 191. [47] R.D. Gietz, R.A. Woods, Transformation of yeast by the Liac/SS carrier DNA/PEG method, Methods Enzymol. 350 (2002) 87–96. Please cite this article as: T. Ben-Yehezkel, et al., Genomics (2013), http://dx.doi.org/10.1016/j.ygeno.2013.05.003
© Copyright 2026 Paperzz