University of Groningen

The Regulatory RNAs of Bacillus subtilis
Mars, Ruben

Document version: Publisher's PDF, also known as Version of record
Publication date: 2014
Citation for published version (APA): Mars, R. (2014). The Regulatory RNAs of Bacillus subtilis [S.l.]: [S.n.]

Chapter 4

In silico Target Profiling Reveals Small Regulatory RNA Functions in Bacillus subtilis

Ruben A. T. Mars, Pierre Nicolas, Gerhild Wachlin, Michael Hecker, Emma L. Denham, and Jan Maarten van Dijl

To be submitted

Abstract

Small regulatory RNAs (srRNAs) are bacterial post-transcriptional regulators that act by complementary base-pairing to modulate mRNA stability or translation. Studies in Gram-negative bacteria have been greatly facilitated by focusing mainly on Hfq-interacting RNAs. However, the RNA chaperone Hfq is apparently not required for srRNA regulation in Gram-positive bacteria, where many putative srRNAs have also been identified.
The present study was aimed at providing new leads for the functional analysis of 63 selected putative srRNAs from the Gram-positive soil bacterium Bacillus subtilis. This involved extensive target predictions, evolutionary conservation analyses of both srRNAs and their predicted targets, target enrichment analyses on these predictions, two expression correlation analyses computed over a 104-condition expression space, and a selection of those srRNA-mRNA pairs that are coexpressed. The validity of the various predictions was tested with experimental data on two known srRNAs, namely FsrA/S512 and RsaE/S415. We were able to retrieve the established role of FsrA/S512 in iron metabolism through our predictions and to suggest additional iron-related targets of the FsrA/S512 regulon. In addition, we experimentally show that FsrA/S512 also has a regulatory role in cells grown on iron-proficient LB medium. Implication of RsaE/S415 in the regulation of B. subtilis central metabolism was shown via deregulated expression of the 2-oxoglutarate dehydrogenase OdhA. Furthermore, conserved predicted targets of RsaE/S415 suggest that this srRNA regulates the expression of genes from the functional categories lipid utilization and biosynthesis of cofactors in organisms ranging from Staphylococcus aureus to B. subtilis. We conclude that our present data can serve as valuable leads for further functional studies on the srRNAs of B. subtilis.

Introduction

Small regulatory RNAs (srRNAs) are the regulators of a wide variety of cellular processes in bacteria. These include stress adaptation, central carbon metabolism and bacterial virulence (1, 2, 3). The srRNAs act in trans by short, imperfect, complementary base-pairing with their target messenger RNA (mRNA) molecules. Regulation via srRNAs can be faster than the regulation via protein transcription factors (4), which makes the srRNA-mediated regulation ideally suited for stress responses.
However, srRNAs can also have more subtle fine-tuning functions in the regulation of gene expression (5). These theoretical considerations imply that there is a clear niche for gene regulation at the RNA level, and this might explain why srRNA-mediated regulation is a universally occurring biological phenomenon (3, 6, 7).

The phenotypes of srRNA mutants are consequences of the deregulation of their target mRNAs. While the number of functional targets for some srRNAs may be small, others have emerged as important post-transcriptional regulatory hubs in Gram-negative bacteria, with more than 20 verified targets for GcvB in Salmonella typhimurium (8). The srRNA target identification in Gram-negative bacteria has been greatly facilitated by focusing on those RNAs that interact with the RNA chaperone Hfq. Because of this, discussions regarding regulatory RNAs in bacteria have become tightly linked to the function of Hfq. However, genes for Hfq homologues are only found in approximately half of the sequenced genomes (9). For example, the Gram-positive soil bacterium Bacillus subtilis contains an Hfq homologue (previously named YmaH), but to date this protein has proven to be dispensable for the established srRNA-mRNA interactions in this organism (10, 11, 12, 13). Nevertheless, a recent comparative transcriptome analysis revealed altered abundances for mRNAs belonging to the ResD-ResE, GerE and ComK regulons in hfq mutant cells and, more importantly for our present study, the abundance of six predicted putative srRNAs was changed in cells lacking Hfq (13). These findings suggest that Hfq is only critically involved in a small subset of the srRNA-mRNA interactions in B. subtilis. Since many putative srRNAs have been identified in B. subtilis (14, 15, 16, 17) and other Gram-positive bacteria (18, 19, 20), this raises the question of how best to study these potentially Hfq-independent srRNAs.

Experimentally characterizing srRNA targets can be complicated and laborious.
After identifying the srRNA, either by a bioinformatic (21) or an experimental approach (tiling array or RNA-sequencing) (16, 17), experimental work starts with the verification of srRNA expression. Subsequently, transcriptome and/or proteome analyses can be performed, either on an srRNA mutant, or on a strain in which the srRNA is overexpressed for a brief period (22). These analyses are likely to result in a number of mRNAs and/or proteins that show differential expression. The respective differentially expressed genes can be translationally coupled to a reporter gene to establish their mRNAs as direct targets of the srRNA (22). In such a system, the introduction of compensatory mutations in the srRNA-mRNA target interaction site usually provides the final proof for direct srRNA-mRNA regulation. A possible shortcoming of this approach is that true srRNA targets might not be expressed under the selected experimental conditions. This implies that a comprehensive view of srRNA regulation has to rely to some extent on target predictions, as was also suggested by Sharma et al. (8).

Predicting srRNA targets can be very helpful in predicting the function of an srRNA. Accordingly, several srRNA target prediction algorithms have been reported (23, 24, 25, 26) and successfully applied (e.g. (8, 27)). However, results of srRNA target predictions should be treated with caution, because the predictions usually contain only a small number of true positive targets. The limited success in predicting srRNA targets relates to the fact that a small degree of sequence complementarity can already suffice for srRNA-target regulation. Consequently, the number of predicted targets is generally too large to justify direct target verification experiments. This is especially problematic for the analysis of srRNA-mediated regulation in organisms that are genetically poorly accessible.
Additional in silico approaches can therefore be highly useful to pinpoint the most likely candidate targets from the initially large lists of predicted targets. For instance, such additional approaches may address the evolutionary conservation of predicted srRNA–target interactions (26), or they may take into account expression data for the investigated srRNAs and their predicted targets (28).

A recent large-scale transcriptome study by Nicolas et al. identified 1583 potential regulatory RNAs in the B. subtilis prototype strain 168, of which ~150 are independently expressed (17). These RNAs were identified by analyzing the transcriptome of B. subtilis 168 across 104 different growth conditions with high-resolution tiling arrays. Analysis of this extensive dataset revealed that the number of genes that are not expressed under any condition is very low (4.4%), while 85% of all coding sequences (CDS) are highly expressed under one or more experimental conditions. Only 3% of the CDS were highly expressed under every tested condition. This remarkably high expression plasticity suggests that the study by Nicolas et al. (17) had, for the first time, covered almost all transcriptionally active regions of an organism, including its regulatory RNAs. The 1583 identified RNA features were divided into multiple categories, based on their location with respect to the nearest CDS. One of these categories consists of RNA segments with their own promoter and terminator signals, which were therefore termed All-independent (All-Indep) segments (Chapter 3 of this thesis). These All-Indep segments are transcriptionally related to srRNAs, and have the potential to function as such.

The aim of the present study was to determine whether the wealth of B. subtilis gene expression data reported by Nicolas et al. (17) can be used to identify true targets of putative srRNAs amongst a large set of predicted targets.
For this purpose, we performed extensive target predictions for a selected set of putative srRNAs, analyzed the evolutionary conservation of these srRNAs and their predicted targets, computed enriched functional categories on the target predictions, performed two expression correlation analyses computed over the 104-condition expression space, and selected those srRNA-mRNA pairs that are co-expressed. The validity of results obtained through this prediction pipeline was tested by focusing on two known srRNAs of B. subtilis, namely FsrA/S512 and RsaE/S415 (10, 18). Indeed, the established role of FsrA/S512 in iron metabolism was pinpointed in our predictions, and additional iron-related candidate members of the FsrA/S512 regulon are suggested. In contrast to the previously reported data (10), our experiments strongly suggest that FsrA/S512 also has a regulatory role during the exponential growth phase on the iron-proficient Lysogeny Broth (LB) medium. The predicted involvement of RsaE/S415 in the regulation of B. subtilis central carbon metabolism was evidenced through the observed deregulation of the 2-oxoglutarate dehydrogenase OdhA in an RsaE/S415 mutant. Conserved target predictions on RsaE/S415 also suggest that the regulation of genes from the functional categories lipid utilization and biosynthesis of cofactors is a conserved function of RsaE/S415 in a range of Gram-positive bacteria, including Staphylococcus aureus. Altogether, the procedures and data presented in this study will most likely facilitate future functional studies on the putative srRNAs of B. subtilis.

Results and Discussion

General description of in silico srRNA target profiling

The conceptual outline of our srRNA target prediction and analysis pipeline is shown in Figure 1A. Two aspects are integrated in our approach. The first addresses the srRNA functions and the second addresses the srRNA targets.
To assess srRNA functions, we gathered data on the respective promoters, evolutionary conservation in different genomes, presence of open reading frames (ORFs), putative secondary structures, lengths and GC contents of a set of 63 putative srRNAs from B. subtilis. A summary of these data is provided in Table 1. The target-focused analyses were initiated with extensive srRNA target predictions, of which the results are presented in the Supplementary data file predictions and Table S1. Subsequently, five separate analyses were performed with the goal of enriching for the most likely srRNA targets, and positive hits in any one of these five analyses were used to flag the target, as exemplified in Table 2 (marked with Y). The first flag (marked ‘Conserved’) is representative of the evolutionary conservation of the srRNA target interaction, and this already strongly reduced the number of considered targets. The second flag (marked ‘Enriched’) indicates the presence of the target in an enriched functional category of the complete set of predicted targets. Specifically, this analysis can identify srRNA regulons, as is illustrated for FsrA/S512 (see Query S512 in Table 2). The third flag for B-cluster enrichment (17) (marked ‘BclusterFlag’) represents an expression correlation-related analysis based on the presence of multiple genes that share an expression profile within the set of target predictions for one srRNA (Figure 1B). This shared expression profile might thus reflect regulation by a particular putative srRNA. The fourth flag (marked ‘PeaksFlag’) represents the expression correlation with the target under an experimental condition where the putative srRNA is highly expressed compared to its baseline expression level (Figure 1B).
The fifth and final flag (marked ‘ConditionalFlag’) represents the result of a co-expression analysis that selects predicted srRNA-target interactions based on the co-expression of srRNAs and their predicted targets at a certain cut-off under at least one experimental condition. The proportion of highly significant targets that received a flag in one of the five analyses and the distribution of the number of flags per target are indicated in Figure 2. The following paragraphs give a general description of our srRNA target predictions and the subsequent in silico analyses. These are followed by descriptions of some srRNA functions predicted by our approach in the case studies section.

General analysis of independently expressed RNA segments

Functional srRNAs are often highly structured, and this RNA structure is essential for their regulatory function. In general, the secondary structure elements in an srRNA molecule protect it from being degraded by RNases (29, 30). Strong secondary structure of regulatory RNAs can therefore be an indication of a function. Nevertheless, it should be noted that the region of an srRNA that interacts with its target mRNA is on average less structured than the rest of the molecule, as has been reported for srRNAs in E. coli (31). As detailed in Chapter 3 of this thesis, the All-Indep RNA segments of B. subtilis that were identified by Nicolas et al. (17) display the strongest degree of predicted secondary structure and species-level evolutionary conservation. This might therefore point to an important regulatory function of these RNA segments, possibly as srRNAs.

Besides the prediction of secondary structure and the analysis of evolutionary conservation, we wondered whether we could identify possibly subtle roles for Hfq in the expression of the All-Indep segments of B. subtilis. As indicated above, Hfq is the central RNA chaperone in Gram-negative bacteria, but it seems to have only a very limited effect on srRNA regulation in B.
subtilis (9, 10, 12, 13). To assess a possible effect of the expression level of hfq on that of all identified RNA segments, we computed the expression correlation of hfq with the identified RNA segments in the 104-condition space. When these correlations were inspected globally per (pooled) category, the All-Indep category, which contains the putative srRNAs, displayed a significantly increased hfq correlation compared to the other categories of RNA segments (Figure 3). This seems to suggest that Hfq could somehow be involved in srRNA regulation in B. subtilis to a larger extent than thus far believed, but only in a very subtle manner. Interestingly, hfq correlations with segments from the 3’UTR category were also significantly higher than those for the 5’UTRs and the intergenic RNAs (Figure 3). Notably, 3’UTRs appear to be frequent sources of srRNAs in Gram-negative bacteria (32), and it is therefore tempting to speculate that the observed expression correlation between hfq and 3’UTRs is suggestive of 3’UTR-derived srRNAs in B. subtilis as well. Altogether, the observed expression correlations of All-Indep segments and hfq may point to a general (but very subtle) role of B. subtilis Hfq in stabilizing srRNA molecules, analogous to what has been observed in Gram-negative bacteria (33).

Selection and description of putative srRNAs

Having discovered that All-Indep segments (i.e. putative srRNAs) have the highest level of predicted secondary structure, the largest degree of evolutionary conservation, and the strongest positive correlations with hfq, we aimed at refining this set to include only the most likely srRNAs. To do this, we first excluded the All-Indep segments that were annotated as antisense RNAs (asRNA) of protein-encoding genes. For this purpose, we followed the definition of Nicolas et al., where overlaps of ≥100 nucleotides or overlaps of ≥50% of the sequence length were used as criteria to identify asRNAs (17).
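The antisense criterion quoted above (an overlap of ≥100 nucleotides, or of ≥50% of the sequence length) can be illustrated with a minimal sketch. The function name `is_antisense`, the 1-based inclusive coordinates, and the choice to measure the 50% fraction against the length of the RNA segment are assumptions made here for illustration; they are not taken from Nicolas et al. (17).

```python
def is_antisense(seg_start, seg_end, seg_strand,
                 cds_start, cds_end, cds_strand,
                 min_overlap_nt=100, min_overlap_frac=0.5):
    """Return True if a segment would be called antisense to a CDS.

    Coordinates are assumed to be 1-based and inclusive (an assumption).
    The fraction is taken relative to the segment length (an assumption).
    """
    # An antisense RNA lies on the strand opposite to the gene it overlaps.
    if seg_strand == cds_strand:
        return False
    # Length of the shared interval; zero or negative means no overlap.
    overlap = min(seg_end, cds_end) - max(seg_start, cds_start) + 1
    if overlap <= 0:
        return False
    seg_len = seg_end - seg_start + 1
    # Either threshold is sufficient on its own.
    return overlap >= min_overlap_nt or overlap / seg_len >= min_overlap_frac


# Hypothetical coordinates, for illustration only.
print(is_antisense(1, 200, "+", 101, 400, "-"))  # 100-nt overlap
print(is_antisense(1, 80, "+", 41, 400, "-"))    # 40 of 80 nt overlap
print(is_antisense(1, 200, "+", 151, 400, "-"))  # 50 of 200 nt overlap
```

Because the two thresholds are combined with a logical OR, short segments are caught by the fractional rule and long segments by the absolute rule.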
Figure 1. Overview of the srRNA function and target prediction approach. A) Overview of the approach outlined in this manuscript. The ultimate goal of the present studies was to identify potential new srRNA functions in B. subtilis. This can be approached via a focus on srRNA functions by studying srRNA deletion phenotypes, or via the direct search for true srRNA targets. In practice these two approaches often overlap, as is illustrated in the main text. B) Example plot of the condition-dependent expression of one srRNA, namely CsfG/S547. This srRNA was arbitrarily selected for illustration purposes. The B-cluster expression for the segment is included as the average of the 51 segments of this cluster. Three conditions that could qualify as peak expression conditions are indicated by the grey zones labelled peak 1 – 3. Note, however, that the actual peaks expression analysis considers at most two peak expression conditions. The data and x-axis are derived from Nicolas et al. (17).

A first motivation for eliminating the asRNAs was that these RNAs probably function in cis at the genomic location where they are transcribed. However, it should be noted that we cannot fully exclude the possibility that some of these segments might additionally have a function in trans. The second motivation for eliminating asRNAs from the subset of putative srRNAs was that the sequence of an asRNA is de facto complementary to its sense mRNA. This could lead to an undesired bias in our sequence-based target predictions. Furthermore, we excluded two segments related to type I toxin-antitoxin modules, namely BrsH/S978 (34) and as-BrsH/S977 (16), and a known protein-encoding gene. The latter gene encodes the small basic protein FbpC, but it was annotated as independent segment S834. FbpC has been implicated in FsrA/S512-mediated srRNA regulation, possibly as an RNA chaperone (10, 11, 35). Lastly, we added two known B. subtilis srRNAs to the remaining segments.
These are SR1 (12, 36, 37, 38, 39) (referred to by the name of the ORF ykzW in (17)) and CsfG/S547 (40) (annotated as antisense in (17)). Altogether, this selection of All-Indep RNAs resulted in a set of 63 segments that are hereafter referred to as putative srRNAs (Table 1).

We next compared the extent of predicted secondary structure of the selected putative srRNAs (reflected by Z-scores) to that of all sense RNA segments and all of the independently expressed RNA segments. From this comparison it followed that the selected putative srRNA segments display significantly higher levels of secondary structure (i.e. lower Z-scores) than the other sense RNAs (Figure 4A). There is also a statistically non-significant trend towards higher secondary structure of the selected segments when they are compared to the other independent segments (which are mostly asRNAs) (Figure 4B).

[Figure 2: top panel categories: Conditional, Peaks, Conserved, B-cluster, Enriched cat.; lower panel: histogram, x-axis number of flags (0 to 4), y-axis number of targets (all targets).]

Figure 2. Graphical summary of flag attribution to predicted srRNA targets. The top panel illustrates the proportion of all predicted targets (11419 with p-value ≤0.01) that received a flag in the five different analyses performed in the present study. The lower panel is a histogram of the number of flags per predicted target. All these targets with extra information are listed in Table S1. Those targets with three or more flags are presented in Table S2, and the targets with four flags are listed in Table 2.

[Figure 3: y-axis hfq correlation, all conditions; categories: 3', 5', indep, inter.]

Figure 3. Independent segments correlate most strongly with hfq/ymaH. The 104-condition expression data from Nicolas et al. (17) were used to compute the Pearson correlation between the expression pattern of all the new RNA segments and that of hfq/ymaH. Significance was tested with ANOVA followed by Tukey's HSD at 99% confidence. One star (*) indicates significance with p-value ≤ 0.05, two stars (**) indicate significance with p-value ≤ 0.01.

Table 1 (column contents, in row order)

Name: ykzW S2 S72 S111 S140 S144 S145 S181 S198 S249 S254 S275 S289 S309 S313 S326 S345 S348 S357 S415 S423 S444 S458 S462 S499 S503 S512 S547 S612 S641 S645 S653 S659 S665 S708 S717 S718 S728 S731 S796 S797 S809 S849 S857 S863 S877 S903 S907 S912 S968 S1009 S1022 S1024 S1027 S1029 S1052 S1136 S1227 S1251 S1455 S1495 S1534 S1583

Alt. name: SR1 (for ykzW); the remaining alternative names (RsaE, FsrA, CsfG, BsrE, BsrF, RnaC) are interleaved with the sigma factor entries below.

Predicted sigma factor (with Alt. name): SigA, SigEF SigK, SigEF SigA SigEF SigA, SigB, SigA Sig-, SigA, SigA SigEF, SigB SigA SigK SigA SigGF, SigA, SigB SigA SigA SigGF SigA SigA, SigA SigGF RsaE SigA SigK, SigEF SigA SigA, SigEF SigWXY SigL SigA FsrA SigA, Sig- CsfG SigGF SigA SigA SigK Sig-, SigA SigWXY, SigWXY, Sig-, SigK SigK SigA SigA BsrE SigA, SigWXY BsrF SigA, SigH SigEF SigA SigA, SigA SigA, SigB SigA, SigWXY, SigA, SigA, SigA SigK, SigA, Sig- SigEF SigA, SigA SigA SigWXY, SigK SigK, SigA, SigB SigK, SigA, SigA RnaC SigD, SigB SigK, SigA SigA, SigA SigA, SigA Sig- SigB SigK, SigA SigA SigWXY SigA Sig-

#genomes: 22 10 11 23 9 9 17 48 9 5 21 14 20 19 10 19 22 19 57 19 9 9 9 8 9 22 61 12 16 8 11 10 19 18 17 16 21 9 6 10 15 10 19 19 19 15 10 9 9 13 19 10 22 21 9 24 9 18 43 9 2 19

Sporulation / ORF: Y Y Y Y Y Y Y Y Y Y Y - Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y -

Z-score: -4.43 -1.86 0.16 -1.22 -2.57 -6.78 -0.74 -5.09 -1.24 0.12 -2.63 -2.4 -2.29 -1.73 -5.27 0.16 -3.39 -6.37 -2.62 -1.91 -3.07 -1.33 0.68 -0.06 -1.01 -4.75 -4.72 -0.79 0.31 1.12 -0.72 -3.52 -1.67 -0.67 -4.89 -2.61 -1.78 -0.25 0.13 -2.42 -3.55 -0.82 -1.55 -0.59 -0.21 -1.27 -0.42 2.18 -0.18 -0.84 -5.22 -3.81 -1.37 -3.01 -0.87 -3.53 1.41 0.24 -0.35 -0.08 -2.62 -0.84

MFEoptim: -73.9 -45.3 -28.3 -97.5 -19.8 -70.4 -115.63 -58.5 -200.7 -31.37 -39.6 -67.4 -68 -41.17 -32.19 -34.1 -77.71 -90.1 -29.3 -34.34 -136.8 -53 -88.36 -48.2 -52.2 -25.3 -41.4 -32 -44.43 -19.9 -21.3 -55 -26.9 -143.9 -55.4 -40.2 -37.1 -140.1 -24.4 -80.18 -69.4 -10.71 -27.5 -50.4 -17.96 -39.2 -33.84 -27 -36.1 -29.6 -44.72 -74.2 -60.9 -43.8 -49.9 -69.8 -3.35 -87.86 -68.6 -30.4 -128.4 -29.47

GC%: 36.5 26.1 39.6 40.6 28.6 37.0 31.7 32.8 52.4 30.1 45.9 36.4 27.9 40.3 30.6 40.9 36.5 36.7 37.3 38.0 34.3 42.7 41.4 38.7 35.6 41.2 47.6 38.1 26.2 33.5 40.9 39.7 37.3 34.4 43.5 44.4 48.7 40.9 23.5 34.9 36.8 45.3 34.1 43.0 31.5 34.8 35.1 36.8 41.2 36.2 42.1 44.2 44.1 35.7 44.4 48.2 30.8 35.8 42.6 37.2 41.4 33.3

Length: 274 234 149 367 112 216 546 198 561 176 157 269 330 176 111 154 263 297 126 187 502 204 367 212 219 102 124 134 286 161 132 179 110 628 154 135 111 499 153 505 234 75 132 200 108 201 154 171 148 152 126 217 220 154 198 220 107 388 263 223 425 198

Reference: Licht et al, 2005; Geissmann et al, 2009; Gaballa et al, 2008; Marchais et al, 2011; Saito et al, 2009; Saito et al, 2009; Schmalisch et al, 2010

The fraction of conserved sequences amongst the selected putative srRNAs was visualized in a heatmap with automatic reordering of rows (genomes) and columns (RNA segments) (Figure 5). Similar to what was observed for such a plot of all sense RNA segments (Chapter 3), three clusters of putative srRNAs with different conservation levels were distinguishable. A first cluster consists of four RNA segments, namely S1455, S198, RsaE/S415 and CsfG/S547, which are conserved in almost all considered genomes. Briefly, S1455 represents an un-annotated T-box regulatory mechanism (upstream of the threonyl-tRNA synthetase gene). S198 is the 5’ leader region of the vmlR gene encoding an ABC transporter. The S198 RNA segment was nevertheless included here since leader regions and riboswitches might also function as trans-regulatory RNAs (41). RsaE/S415 is reported as a regulator of central carbon metabolism in S. aureus (19) and also seems to function as an srRNA in B. subtilis (18). CsfG/S547 is an srRNA that is highly conserved within endospore-forming bacteria (40).
A second cluster is composed of putative srRNAs that are only conserved in the genomes of the nine included B. subtilis subspecies. The third cluster consists of putative srRNAs that display an intermediate conservation level, as they are mostly present in B. subtilis sp., B. amyloliquefaciens sp. and B. atrophaeus (~20 genomes in total) (Figure 5).

It was conceivable that the more structured srRNAs would also be more conserved. If so, this could be an indication of the importance of the level of secondary structure for regulation. To test this idea, we plotted the conservation levels of all All-Indep segments against their predicted secondary structures (Figure 6). The resulting plot illustrates that there are many highly conserved All-Indep segments (>20 genomes) with intermediate Z-scores (between -2 and 0), but these are mostly asRNAs, and their overlap with protein-encoding genes is the reason for their sequence-level conservation. The most structured All-Indep segments either belong to the selected set of putative srRNAs, or to the above-mentioned type I toxin-antitoxin systems. Furthermore, Figure 6 shows that there is no significant relationship between the conservation and Z-score for the All-Indep segments in general. However, such a relationship does exist for the selected putative srRNAs (R² of fitted linear model 0.084, p-value = 0.01).

The selection of putative srRNAs contains eight previously investigated RNA segments. The first three of these - SR1, FsrA/S512, and RsaE/S415 - will be discussed in more detail in the case studies later in this manuscript. Briefly, SR1 was identified in a computational approach, and it represented the first reported srRNA in B. subtilis (42). Following this, a series of papers by the same authors described multiple functions of this srRNA (12, 36, 37, 38, 39). The second reported srRNA, FsrA/S512, was identified through a computational prediction of Fur regulatory regions followed by a closer inspection of the respective downstream regions (10).
FsrA/S512 was subsequently shown to be a regulator of the iron-sparing response (10, 11). The third srRNA, RsaE/S415, was first identified in an expression screen of intergenic regions of S. aureus, and it was then found to be conserved in the Bacillaceae (18). The remaining five previously reported srRNAs are less well characterized. The first of these is CsfG/S547 (40). The sequence

Table 1. Selection of putative srRNAs (facing page). Name, name of segment in (17). Alt. name, alternative name for previously reported segments. Predicted sigma factor, sigma factor regulation predicted by (17). #genomes, number of Bacillus genomes for which a significant Blast hit was obtained in this study (maximum 62). Sporulation, “Y” for yes serves as an indication for segment expression exclusively under conditions of sporulation. ORF, “Y” for yes indicates that an ORF was identified in (17). Z-score, the secondary structure Z-score of the segment compared to shuffled sequences with the same length and nucleotide composition (a lower Z-score indicates stronger secondary structure). MFEoptim, RNAfold minimum free energy of the optimal structure used for the computation of secondary structure Z-scores. Length, length of the segment according to (17). GC%, GC percentage of the segment. Reference, reference marks the publication in which a segment was first identified and named; if no reference is indicated, the segment was first identified in (17).
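The Z-score defined in the Table 1 caption (native folding energy compared to shuffled sequences of the same length and nucleotide composition) can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for the RNAfold minimum free energy and has no physical meaning, simple mononucleotide shuffling is assumed (the shuffling procedure actually used is not specified here), and the number of shuffles is arbitrary.

```python
import random
import statistics

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def toy_energy(seq):
    """Hypothetical stand-in for a folding energy (NOT RNAfold).

    Scores -1 for every position that can pair with its mirror position,
    so hairpin-like sequences get lower (more negative) values.
    """
    n = len(seq)
    paired = sum((seq[i], seq[n - 1 - i]) in PAIRS for i in range(n // 2))
    return -float(paired)


def structure_zscore(seq, energy_fn=toy_energy, n_shuffles=1000, seed=0):
    """Z = (E_native - mean(E_shuffled)) / sd(E_shuffled).

    Shuffling the letters keeps length and nucleotide composition fixed.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    native = energy_fn(seq)
    shuffled = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)
        shuffled.append(energy_fn("".join(chars)))
    sd = statistics.stdev(shuffled)
    if sd == 0:
        return 0.0  # no spread among shuffles: Z-score is uninformative
    return (native - statistics.mean(shuffled)) / sd


# A hairpin-like toy sequence should score below its shuffled versions.
print(structure_zscore("GGGGGGAAAACCCCCC") < 0)
```

With a real folding routine passed as `energy_fn`, a strongly negative value would correspond to the "lower Z-score indicates stronger secondary structure" convention of Table 1.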
Query  Rank  Name  Ltag      Score  Pvalue  sRNA_start  sRNA_stop  mRNA_start  mRNA_stop  Cor    Bcluster  Conserved8  Enriched  BclusterFlag  PeaksFlag  NumberFlags  Annotation
S111   10    veg   BSU00440  -98    0.000   15          107        -73         42         0.14   B26       Y           Y         Y             Y          4            Biofilm formation
S140   82    mecB  BSU22970  -87    0.000   68          121        -34         14         0.12   B474      Y           Y         Y             Y          4            Proteolysis
S140   91    htrA  BSU12900  -86    0.000   213         254        -10         34         -0.23  B328      Y           Y         Y             Y          4            Coping with stress - heatshock
S140   115   ftsH  BSU00690  -84    0.010   8           26         26          44         -0.09  B36       Y           Y         Y             Y          4            Coping with stress - heatshock
S181   110   purT  BSU02230  -91    0.010   62          116        -75         -28        -0.29  B9        Y           Y         Y             Y          4            Biosynthesis / acquisition of nucleotides
S254   138   yxeB  BSU39610  -74    0.010   4           43         -29         13         -0.27  B49       Y           Y         Y             Y          4            Acquisition of iron
S275   38    yvaE  BSU33570  -72    0.000   105         156        -16         37         -0.28  B523      Y           Y         Y             Y          4            Resistance against toxins / antibiotics / based on similarity
S275   63    ykuC  BSU14030  -69    0.010   57          104        -4          44         0.20   B274      Y           Y         Y             Y          4            Resistance against toxins / antibiotics / based on similarity
S309   294   scoC  BSU09990  -85    0.010   154         205        -70         -18        -0.05  B28       Y           Y         Y             Y          4            Regulation of gene expression - transition state regulator
S313   92    efeM  BSU38270  -80    0.000   87          155        -73         -3         -0.42  B49       Y           Y         Y             Y          4            Acquisition of iron
S357   58    mtnE  BSU13580  -91    0.000   107         194        -68         15         -0.23  B98       Y           Y         Y             Y          4            Biosynthesis / acquisition of amino acids
S357   169   tcyA  BSU03610  -82    0.010   196         246        -75         -21        0.05   B16       Y           Y         Y             Y          4            Biosynthesis / acquisition of amino acids
S357   176   dapA  BSU16770  -82    0.010   146         209        -9          48         0.17   B41       Y           Y         Y             Y          4            Biosynthesis / acquisition of amino acids - sporulation - essential
S415   89    ndhF  BSU01830  -75    0.000   16          55         -1          36         -0.01  B56       Y           Y         Y             Y          4            Electron transport and ATP synthesis
S512   1     yxeB  BSU39610  -96    0.000   2           30         -74         -44        0.54   B49       Y           Y         Y             Y          4            Acquisition of iron
S512   3     yfiY  BSU08440  -83    0.000   3           29         -51         -24        0.45   B49       Y           Y         Y             Y          4            Acquisition of iron
S512   5     feuA  BSU01630  -82    0.000   1           25         -52         -26        0.49   B49       Y           Y         Y             Y          4            Acquisition of iron
S512   22    fhuD  BSU33320  -73    0.000   1           21         -61         -39        0.52   B49       Y           Y         Y             Y          4            Acquisition of iron
S512   47    efeU  BSU38280  -69    0.000   1           42         -64         -22        0.23   B49       Y           Y         Y             Y          4            Acquisition of iron
S512   84    feuC  BSU01610  -64    0.010   3           29         -38         -11        0.40   B49       Y           Y         Y             Y          4            Acquisition of iron
S547   5     frlD  BSU32570  -85    0.000   9           91         -31         36         0.14   B593      Y           Y         Y             Y          4            Utilization of nitrogen sources other than amino acids
S547   16    glmS  BSU01780  -78    0.000   36          70         -14         24         0.06   B53       Y           Y         Y             Y          4            Biosynthesis of cell wall components - essential
S641   125   murB  BSU15230  -95    0.000   164         227        -28         46         -0.17  B215      Y           Y         Y             Y          4            Cell envelope stress proteins - essential
S641   164   ykrA  BSU14550  -92    0.000   181         257        -65         17         0.07   B46       Y           Y         Y             Y          4            Unknown function
S641   221   salA  BSU01540  -89    0.000   160         232        -35         50         0.10   B46       Y           Y         Y             Y          4            Regulation of gene expression - transition state regulator
S645   198   phrK  BSU18920  -79    0.000   21          103        -40         38         -0.09  B28       Y           Y         Y             Y          4            Genetic competence
S659   24    glmS  BSU01780  -91    0.000   31          100        -16         50         0.06   B53       Y           Y         Y             Y          4            Biosynthesis of cell wall components - essential
S659   154   pbpD  BSU31490  -77    0.010   46          114        -34         36         -0.18  B39       Y           Y         Y             Y          4            Cell wall synthesis
S659   193   yqgS  BSU24840  -75    0.010   76          124        -35         17         -0.03  B502      Y           Y         Y             Y          4            Biosynthesis of cell wall components
S718   66    yclP  BSU03820  -72    0.000   11          38         -37         -10        -0.52  B49       Y           Y         Y             Y          4            Acquisition of iron
S797   122   lspA  BSU15450  -91    0.000   334         399        -53         15         0.26   B5        Y           Y         Y             Y          4            Protein synthesis, modification, and degradation
S797   322   ftsY  BSU15950  -83    0.010   358         409        -4          41         -0.24  B5        Y           Y         Y             Y          4            Protein synthesis, modification, and degradation - sporulation
S809   76    ykuN  BSU14150  -84    0.010   12          111        -64         43         -0.12  B49       Y           Y         Y             Y          4            Electron transport and ATP synthesis
S863   9     yfmC  BSU07520  -83    0.000   75          132        -65         -13        -0.24  B49       Y           Y         Y             Y          4            Acquisition of iron
S912   243   yyzM  BSU40939  -75    0.010   111         167        -70         -11        -0.18  B12       Y           Y         Y             Y          4            Unknown function
S968   17    pckA  BSU30560  -94    0.000   47          96         -42         3          0.09   B54       Y           Y         Y             Y          4            Carbon core metabolism
S1009  57    yusE  BSU32770  -83    0.000   50          92         -24         11         0.07   B598      Y           Y         Y             Y          4            Electron transport and ATP synthesis
S1022  121   abrB  BSU00370  -70    0.010   51          82         -10         19         0.12   B22       Y           Y         Y             Y          4            Regulation of gene expression - transition state regulator
S1022  135   adcA  BSU02850  -69    0.010   37          83         -57         -7         -0.27  B94       Y           Y         Y             Y          4            Trace metal homeostasis
S1024  60    exoA  BSU22010  -75    0.010   41          70         14          44         -0.34  B387      Y           Y         Y             Y          4            DNA replication / based on similarity
S1227  82    bacB  BSU37730  -77    0.000   13          49         -26         12         -0.20  B170      Y           Y         Y             Y          4            Biosynthesis of antibacterial compounds
S1455  1     coaE  BSU29060  -108   0.000   40          79         -30         16         0.12   B41       Y           Y         Y             Y          4            Biosynthesis of cofactors - essential
S1583  86    gapB  BSU29020  -84    0.000   3           110        -75         15         0.11   B54       Y           Y         Y             Y          4            Carbon core metabolism

Table 2. Predicted srRNA targets with four flags (facing page). Query, putative srRNA name. Rank, rank in the TargetRNA predictions for the Query. Name, name of the predicted target. Ltag, unique B. subtilis 168 locus tag of the predicted target. Score, TargetRNA_v1 (23) prediction significance score. Pvalue, TargetRNA_v1 (23) prediction p-value. sRNA_start, start coordinate of putative srRNA in the predicted target interaction. sRNA_stop, end coordinate of putative srRNA in the predicted target interaction. mRNA_start, start coordinate of the predicted target in the predicted target interaction relative to start codon. mRNA_stop, end coordinate of the predicted target in the predicted target interaction relative to start codon. Cor, Pearson correlation between putative srRNA and predicted target computed over the complete condition space from (17). Bcluster, B-cluster of the target from (17). Conserved8, flag with “Y” for yes indicating whether the predicted target interaction is conserved in 8 genomes (including B. subtilis 168) or more. Enriched, flag with “Y” for yes indicating whether the predicted target is part of an enrichment category from Table 3. BclusterFlag, flag with “Y” for yes indicating whether the predicted target is part of an enriched B-cluster from Table S3.
PeaksFlag, flag with “Y” for yes indicating whether the predicted target is a Peaks expression target from Table S4. NumberFlags, the sum of the number of flags. Annotation, shortened description of the annotation category of the predicted target. All 11419 predicted targets can be found in Table S1 and the 746 targets with three or more flags in Table S2. A B P= 0.158 secondary structure Z−score secondary structure Z−score P= 0.001 2 0 −2 −4 −6 −8 2 0 −2 −4 −6 −8 All sense Selection Indep Selection Figure 4. Selected segments display stronger predicted secondary structure than other sense RNAs A) Box plots of the secondary structure Z-score of all sense RNAs compared to the selection of putative srRNAs from Table 1. A lower Z-score indicates stronger predicted secondary structure. The selected putative srRNA segments are significantly more structured than the other sense RNAs. B) Same as in A, but the selected putative srRNAs are compared to all ‘All-Indep’ segments. There is a trend to more secondary structure in the selected putative srRNAs. P-values shown are from a Welch Two Sample t-test. 
[Figure 5: heatmap with colour key and density plot (value scale 0 to 1); columns: the selected independent segments; rows: the genomes listed below]

Genomes included in Figure 5: Bacillus_subtilis_subsp._subtilis_str._BSP1, Bacillus_subtilis_BSn5, Bacillus_subtilis_subsp._natto_BEST195, Bacillus_subtilis_QB928, Bacillus_subtilis_subsp._subtilis_str._168, Bacillus_subtilis_subsp._subtilis_str._RO.NN.1, Bacillus_subtilis_subsp._spizizenii_TU.B.10, Bacillus_sp._JS, Bacillus_subtilis_subsp._spizizenii_str._W23, Bacillus_amyloliquefaciens_subsp._plantarum_YAU_B9601.Y2, Bacillus_amyloliquefaciens_Y2, Bacillus_amyloliquefaciens_FZB42, Bacillus_amyloliquefaciens_subsp._plantarum_AS43.3, Bacillus_amyloliquefaciens_subsp._plantarum_CAU_B946, Bacillus_amyloliquefaciens_XH7, Bacillus_amyloliquefaciens_TA208, Bacillus_amyloliquefaciens_LL3, Bacillus_amyloliquefaciens_DSM_7, Bacillus_atrophaeus_1942, Bacillus_anthracis_str._Ames_Ancestor, Bacillus_cereus_Q1, Bacillus_anthracis_str._A0248, Bacillus_anthracis_str._H9401, Bacillus_anthracis_str._Ames, Bacillus_anthracis_str._Sterne, Bacillus_cereus_E33L, Bacillus_thuringiensis_BMB171, Bacillus_cereus_ATCC_10987, Bacillus_cereus_biovar_anthracis_str._CI, Bacillus_cereus_03BB102, Bacillus_cereus_AH187, Bacillus_cereus_NC7401, Bacillus_cereus_AH820, Bacillus_cereus_F837.76, Bacillus_thuringiensis_str._Al_Hakam, Bacillus_cereus_B4264, Bacillus_anthracis_str._CDC_684, Bacillus_thuringiensis_serovar_finitimus_YBT.020, Bacillus_cereus_FRI.35, Bacillus_thuringiensis_serovar_chinensis_CT.43, Bacillus_thuringiensis_Bt407, Bacillus_cereus_G9842, Bacillus_thuringiensis_HD.771, Bacillus_weihenstephanensis_KBAB4, Bacillus_thuringiensis_HD.789, Bacillus_cereus_ATCC_14579, Bacillus_thuringiensis_serovar_konkukian_str._97.27, Bacillus_thuringiensis_MC28, Bacillus_cytotoxicus_NVH_391.98, Bacillus_megaterium_QM_B1551, Bacillus_megaterium_DSM_319, Bacillus_coagulans_36D1, Bacillus_megaterium_WSH.002, Bacillus_coagulans_2.6, Bacillus_cellulosilyticus_DSM_2522, Bacillus_pseudofirmus_OF4, Bacillus_halodurans_C.125, Bacillus_selenitireducens_MLS10, Bacillus_clausii_KSM.K16, Bacillus_licheniformis_ATCC_14580, Bacillus_licheniformis_DSM_13_._ATCC_14580, Bacillus_pumilus_SAFR.032

Figure 5. Conservation heatmap of selected putative srRNAs. The proportion of sequence conservation of selected putative srRNAs in different bacilli is indicated in color code. Rows and columns were reordered automatically to illustrate the relationship between the conservation level of the putative srRNAs and the similarity between the included genomes.

[Figure 6: scatter plot; x-axis: secondary structure Z-score (-8 to 2); y-axis: number of genomes conserved (0 to 60); legend: all indep, selection]

Figure 6. Selected putative srRNAs display stronger secondary structure with higher conservation. The secondary structure Z-scores of putative srRNAs are plotted against the respective species-level conservation. There is a significant linear relationship between these two parameters for the selected putative srRNAs (R² = 0.084, p-value = 0.01).

on the opposite strand CsfG/S547 was first annotated by Barrick et al. as a cis-acting regulatory element termed the “ylbH leader” (43). However, Marchais et al. (40) have convincingly shown that the region on the opposite strand likely encodes an srRNA, and they also noted its high conservation in endospore-forming bacteria. The sporulation-specific expression of CsfG/S547 is due to regulation by SigG and SigF (40). The second segment is SurA/S653, which was identified using microarrays bearing intergenic regions of B. subtilis (44).
SurA/S653 is induced at the onset of sporulation under indirect control of Spo0A. However, this is not the only condition under which SurA/S653 is induced (17), suggesting additional regulatory functions in processes other than sporulation. Two other selected putative srRNAs, BsrE/S718 and BsrF/S728, were identified in a Northern blot analysis of the transcription of 123 intergenic regions in cells grown in LB medium (34). Lastly, RnaC/S1022 was identified, again using microarrays of intergenic regions, as a SigD-dependent srRNA expressed during exponential growth on LB medium (45).

Intriguingly, regulatory RNAs are known to be involved in various differentiation processes in eukaryotes (6). Since the main developmental pathway of B. subtilis is sporulation, we wondered whether some of the selected putative srRNAs would be predominantly expressed under sporulation-inducing conditions. To identify such putative srRNAs, we inspected the 104-condition expression profile of all segments for those with a low baseline expression (≤~9 on a log2 scale ranging from 7 to 16) under most conditions and a higher expression under sporulation-inducing conditions. This expression pattern was identified for 11 out of the 63 selected RNA segments (Table 1, “Sporulation” column). Future studies will be necessary to further define the predicted role(s) of these putative srRNAs in spore development.

It was previously shown that some srRNAs can enact (a part of) their regulatory functions via the translation of a small ORF within their sequence. Examples of this are SgrS in E. coli and SR1 in B. subtilis. SgrS regulates the ptsG mRNA via an srRNA-mRNA interaction and the peptide encoded by SgrS (SgrT) is important for the recovery of E. coli cells from glucose-phosphate stress (46). SR1 of B. subtilis binds to the mRNA of ahrC via an srRNA-mRNA interaction while the SR1P peptide regulates the gapA mRNA via binding of SR1P to GapA (37). For this reason, Nicolas et al.
(17) have assessed the presence of ORFs in all identified RNA segments of B. subtilis. From this analysis it followed that 15 out of the 63 putative srRNAs include an open reading frame (Table 1), which may be important for their (presumed) regulatory roles.

Target predictions and functional gene enrichment

We predicted targets for all independent RNA segments using the source code of TargetRNA_v1 (23, 27) with expanded settings around the 5’ end of all mRNA and new RNA segments. The included target region was set from 75 nucleotides upstream to 50 nucleotides downstream of the annotated AUG translation start codon. These expanded settings were based on reported srRNA-target interactions in which srRNAs were shown to bind at more distant locations than those covered by TargetRNA_v1’s default settings. This choice is supported by the observation that the default target region settings in TargetRNA_v2 were expanded to the region between -80 nucleotides and +20 nucleotides relative to the start codon (47). We selected TargetRNA for our predictions, because it has previously been successfully employed for srRNA target predictions (8, 27), and because it is solely based on sequence information. We preferred a solely sequence-based algorithm rather than a target prediction algorithm that takes structural and conservation elements into account, because the latter algorithms have so far only been developed and benchmarked for Gram-negative bacteria (24). To what extent srRNA-based regulation in Gram-positive and Gram-negative bacteria can be compared is unclear at this point, as is illustrated by the lower importance of Hfq in Gram-positive bacteria. To facilitate the comparisons, we performed exhaustive target predictions for every segment. This means that we predicted all possible targets up to p-value 1. The default p-value threshold for TargetRNA_v1 target predictions is ≤0.01.
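As an illustration of the expanded target region described above, the following minimal sketch extracts the window from 75 nucleotides upstream to 50 nucleotides downstream of a start codon. The function name `target_window`, the 0-based coordinate convention, and the treatment of the genome as a linear string are assumptions made for this example; they are not taken from the TargetRNA source code.

```python
# Hypothetical helper, not part of TargetRNA: extract the region around a
# start codon that would be searched for srRNA binding sites.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def target_window(genome, start_index, strand, upstream=75, downstream=50):
    """Return `upstream` nt before the start codon plus `downstream` nt
    counted from the first nucleotide of the start codon.

    genome      -- forward-strand sequence, treated as linear (assumption)
    start_index -- 0-based forward-strand index of the first nucleotide
                   of the start codon
    strand      -- "+" or "-"
    """
    if strand == "+":
        begin, end = start_index - upstream, start_index + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies at higher forward coordinates.
        begin, end = start_index - downstream + 1, start_index + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if begin < 0 or end > len(genome):
        raise ValueError("window extends beyond the sequence")
    window = genome[begin:end]
    return window if strand == "+" else reverse_complement(window)
```

With the default arguments the returned window is 125 nucleotides long; a circular genome would additionally require wrapping at the sequence ends, which is not handled here.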
In total 216248 targets with p-value ≤1 and 11419 targets with p-value ≤0.01 were predicted for the 63 selected putative srRNAs (Table S1 and Supplementary data file predictions). It is known from previously published studies that a functional srRNA-target interaction region can be very short (7). However, we noted that predicted interaction regions with prediction p-values ≤0.01 are occasionally very large, involving an average of ~70 nucleotides of the selected putative srRNAs. We therefore wondered how the length of the interaction region is related to the TargetRNA p-value. To inspect this, we plotted the predicted interaction length for multiple cut-offs of the TargetRNA p-value. As expected, the predicted interaction length is dependent on the p-value cut-off (Figure 7A). It thus seems that the average predicted interaction length with a p-value ≤0.01 is large. This is consistent with the notion that Gram-positive bacteria might on average have longer srRNA interaction regions than Gram-negative bacteria (9). Besides the length of predicted interactions, we also inspected the number of targets predicted per putative srRNA. Since the number of predicted targets is expected to increase with the length of the srRNA, we plotted the number of predicted targets against the length of the sequence. Indeed, this showed that more targets are predicted for longer sequences (R² = 0.159 with p-value <0.001 for all All-Indep points), but this relationship was undetectable when it was tested on the 63 selected putative srRNAs only (p-value 0.468) (Figure 7B). In RNA segments of around 150 nucleotides, the number of predicted targets ranges from around 20 to around 500. This wide range is interesting as it shows that factors other than sequence length contribute to the number of predicted targets. Thus, it may be that the number of predicted targets per srRNA reflects the function of the segment.
However, the wide range of predicted targets could also be an artefact caused by the presence of certain frequently occurring sequence motifs in those RNA segments with many predicted targets.

Figure 7. General description of target predictions. A) Predicted interaction length at different TargetRNA p-value cut-offs. B) Relationship between the length of an srRNA query sequence and the number of predicted targets with p-value ≤0.01. This relationship is absent for the selection of putative srRNAs.

It may be that srRNAs regulate the expression of multiple genes involved in the same functional process. Such an srRNA regulon was for instance identified for FsrA/S512, which targets transcripts from genes involved in iron metabolism (11). In fact, the presence of srRNA regulons is a common aspect of srRNA regulation in Gram-negative bacteria (26, 28, 48). The likelihood of the non-random prediction of multiple genes involved in the same functional process can be computed and expressed as a binomial p-value. To do this, we used the most recent B. subtilis gene annotation from SubtiWiki (http://subtiwiki.uni-goettingen.de) (49), selected only those predicted targets with p-values ≤0.01, and only considered those functional categories with a binomial p-value of enrichment of ≤0.05. Enrichment of minimally one functional category was observed for 46 out of the 63 putative srRNAs analysed (with a maximum of 4 categories for FsrA/S512). The enriched functional categories are listed in Table 3. Examples from this Table are FsrA/S512, S462, and RsaE/S415, which will be discussed in detail below. Similarly, we tested whether a more than random number of genes from a specific regulon were present in the target predictions. For this purpose, we again computed the binomial p-values on target predictions with p-values ≤0.01, now using the B. subtilis regulon annotation from SubtiWiki (49), and only considering regulons with a binomial p-value of enrichment of ≤0.05.
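The binomial enrichment p-value mentioned above can be sketched as follows. The text does not spell out the exact formulation, so this example assumes a one-sided upper-tail test in which the success probability is the fraction of all annotated genes that belong to the category (or regulon). The function name and the numbers in the usage lines are hypothetical.

```python
from math import comb


def binomial_enrichment_pvalue(hits, n_targets, category_fraction):
    """Upper-tail binomial probability P(X >= hits).

    hits              -- predicted targets of one srRNA in the category
    n_targets         -- all predicted targets of that srRNA (p-value <= 0.01)
    category_fraction -- fraction of all genes annotated in the category
                         (assumed null probability of drawing such a gene)
    """
    if not 0 <= hits <= n_targets:
        raise ValueError("hits must lie between 0 and n_targets")
    if not 0.0 <= category_fraction <= 1.0:
        raise ValueError("category_fraction must lie between 0 and 1")
    return sum(
        comb(n_targets, i)
        * category_fraction ** i
        * (1.0 - category_fraction) ** (n_targets - i)
        for i in range(hits, n_targets + 1)
    )


# Made-up numbers for illustration: a category with 40 of 4000 genes,
# and an srRNA with 180 predicted targets, 6 of them in that category.
p_value = binomial_enrichment_pvalue(6, 180, 40 / 4000)
is_enriched = p_value <= 0.05  # threshold used in the text
```

Because one such test is run per category and per srRNA, a correction for multiple testing could be considered when applying the 0.05 threshold; whether one was applied is not stated in the text.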
For 47 out of the 63 selected putative srRNAs, the predictions show enrichment of minimally one regulon (with a maximum number of 4 regulons for FsrA/S512). These enriched regulons are listed in Table 4. Examples from this Table are FsrA/S512, SR1, and RsaE/S415, which are to be discussed below. We also wondered whether there might be any groups of genes that are predicted to be regulated by multiple putative srRNAs. We therefore computed the binomial p-values of functional enrichment on all 11419 predicted targets of the selected putative srRNAs with p-values ≤0.01. This led to two functional categories with binomial p-values ≤0.05, namely proteins of unknown function (p-value 0.03) and SPβ prophage (p-value 0.05). Clearly, the proteins of unknown function category is not informative. However, the observed enrichment of SPβ genes as targets for regulation by multiple srRNAs is intriguing, since it is presently unclear how organisms can specifically silence genomically integrated viruses. Repressing gene expression from integrated viruses by srRNAs is a plausible mechanism, especially since srRNAs can rapidly evolve to take on new functions. Analogous to this hypothesis, it was noted from work on Salmonella that bacterial pathogens use their large number of srRNAs to integrate horizontally acquired genes into existing posttranscriptional regulatory networks (50). This is reminiscent of transcription factors that are recruited to tame foreign genes at the DNA level (50), such as Rok in B. subtilis (51). We thus suggest that srRNAs may play (or have played) a role in silencing the expression of genes from SPβ, and we propose this as a topic worth exploring.

Evolutionary target conservation as a criterion to identify candidate srRNA targets

The conservation of a predicted srRNA-target interaction might be indicative of the importance of this interaction over evolutionary time.
Thus, testing whether the regulatory interaction is conserved could be useful when trying to identify the most important targets of a particular srRNA. During the course of the present analyses, such a comparative genomic approach was reported to be successful for improving srRNA target predictions (26). In parallel to the reported approach, we established a bioinformatics pipeline to identify conserved predicted srRNA targets in B. subtilis. This pipeline predicts srRNA targets in genomes in which the sequence of the putative srRNA is conserved. Since we were mainly interested in finding actual srRNA targets in B. subtilis 168, we only considered targets also predicted in B. subtilis 168. Overall, this analysis reduced the average number of considered targets per srRNA from 181 to 29. For the selected putative srRNAs, we plotted the number of predicted targets before and after evolutionary target profiling in Figure 8. S254 and S499 were the only putative srRNAs for which no conserved targets were identified, and this was linked to their low level of conservation (Table 1, Figure 5). Target interactions that were predicted in 8 or more genomes (including B. subtilis 168) were indicated with a ‘Y’ (short for yes) flag in the column ‘Conserved8’ in the Table with all TargetRNA predictions with p-values ≤0.01 (Table S1; see Table 2 for examples). To illustrate these evolutionarily conserved predicted targets, we plotted them as a network (Figure S1).
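The conservation flag can be illustrated with a short sketch. It assumes that the per-genome predictions have already been mapped onto B. subtilis 168 locus tags (the ortholog mapping is not shown), and it uses the threshold of 8 genomes including B. subtilis 168 given in the legend of Table 2. The function name and the input layout are hypothetical and do not describe the actual pipeline.

```python
from collections import defaultdict


def conserved_flags(predictions, reference="Bacillus subtilis 168", min_genomes=8):
    """Flag srRNA-target pairs predicted in at least `min_genomes` genomes.

    predictions -- iterable of (genome, srna, target) tuples; `target` is
                   assumed to be expressed as a reference locus tag
    Returns a dict mapping (srna, target) to "Y" or "" for every pair that
    is also predicted in the reference genome.
    """
    genomes_by_pair = defaultdict(set)
    for genome, srna, target in predictions:
        genomes_by_pair[(srna, target)].add(genome)
    return {
        pair: "Y" if len(genomes) >= min_genomes else ""
        for pair, genomes in genomes_by_pair.items()
        if reference in genomes  # only targets also predicted in the reference
    }
```

Counting distinct genomes in a set means that repeated predictions of the same pair within one genome do not inflate the conservation count.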
This network also visualizes the shared targets, sizes of predicted srRNA regulons and the possible relationships between several putative srRNAs (Figure S1).

Table 3. Enriched functional categories from target predictions on the selected putative srRNAs. Query, putative srRNA name. Enriched category, enriched category of the putative srRNA computed on all B. subtilis 168 target predictions. Binomial P-value, p-value indicating the significance of the enrichment.

Query  Enriched category  Binomial P-value
S2  Information.processing...RNA.synthesis.and.degradation...RNases  0.044
S72  Information.processing...genetics...DNA.replication..based.on.similarity  0.039
S72  Information.processing...genetics...DNA.restriction..modification  0.012
S111  Metabolism...lipid.metabolism...utilization.of.lipids  0.022
S140  Information.processing...protein.synthesis..modification.and.degradation...proteolysis  0.037
S144  Cellular.processes...cell.envelope.and.cell.division...cell.division  0.041
S145  Metabolism...lipid.metabolism...biosynthesis.of.lipids  0.043
S145  Information.processing...genetics...DNA.replication..based.on.similarity  0.023
S181  Cellular.processes...transporters...transporters..other  0.006
S181  Cellular.processes...homeostasis...metal.ion.homeostasis..K..Na..Ca..Mg.  0.030
S181  Metabolism...nucleotide.metabolism...biosynthesis..acquisition.of.nucleotides  0.030
S254  Cellular.processes...homeostasis...acquisition.of.iron  0.048
S254  Information.processing...protein.synthesis..modification.and.degradation...chaperones..protein.folding  0.002
S275  Cellular.processes...transporters...transporters..other  0.018
S275  Cellular.processes...homeostasis...metal.ion.homeostasis..K..Na..Ca..Mg.  0.031
S289  Cellular.processes...homeostasis...acquisition.of.iron..based.on.similarity  0.031
S289  Metabolism...lipid.metabolism...utilization.of.lipids  0.024
S313  Cellular.processes...cell.envelope.and.cell.division...cell.wall..other..based.on.similarity  0.015
S313  Cellular.processes...transporters...transporters..other  0.038
S313  Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity  0.004
S326  Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity  0.049
S345  Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids  0.027
S345  Metabolism...nucleotide.metabolism...nucleotide.metabolism..other  0.021
S357  Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids  0.011
S415  Cellular.processes...homeostasis...trace.metal.homeostasis..Cu..Zn..Ni..Mn..Mo.  0.037
S415  Metabolism...electron.transport.and.ATP.synthesis...respiration  0.000
S423  Metabolism...additional.metabolic.pathways...iron.metabolism  0.026
S444  Cellular.processes...transporters...transporters..other  0.004
S444  Cellular.processes...homeostasis...pH.homeostasis  0.037
S444  Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity  0.002
S458  Metabolism...electron.transport.and.ATP.synthesis...regulators.of.electron.transport  0.038
S458  Metabolism...nucleotide.metabolism...nucleotide.metabolism..other  0.038
S462  Cellular.processes...cell.envelope.and.cell.division...capsule.biosynthesis.and.degradation..based.on.similarity  0.031
S462  Metabolism...nucleotide.metabolism...biosynthesis..acquisition.of.nucleotides  0.023
S503  Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis  0.033
S512  Cellular.processes...transporters...ABC.transporters  0.032
S512  Cellular.processes...homeostasis...acquisition.of.iron  0.001
S512  Metabolism...additional.metabolic.pathways...iron.metabolism  0.000
S512  Information.processing...genetics...genetic.competence  0.038
S547  Cellular.processes...transporters...ABC.transporters  0.042
S547  Metabolism...amino.acid..nitrogen.metabolism...utilization.of.nitrogen.sources.other.than.amino.acids  0.042
S547  Information.processing...genetics...genetic.competence  0.018
S641  Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis  0.006
S641  Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components  0.006
S645  Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis  0.008
S645  Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components  0.002
S645  Information.processing...genetics...genetic.competence  0.027
S653  Metabolism...lipid.metabolism...utilization.of.lipids  0.009
S653  Information.processing...RNA.synthesis.and.degradation...transcription  0.010
S659  Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis  0.023
S659  Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components  0.018
S708  Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components  0.038
S708  Information.processing...genetics...DNA.restriction..modification  0.007
S717  Metabolism...carbon.metabolism...carbon.core.metabolism  0.026
S717  Metabolism...lipid.metabolism...utilization.of.lipids  0.044
S718  Cellular.processes...homeostasis...acquisition.of.iron  0.049
S718  Metabolism...additional.metabolic.pathways...iron.metabolism  0.022
S718  Information.processing...protein.synthesis..modification.and.degradation...protein.secretion  0.038
S731  Metabolism...additional.metabolic.pathways...miscellaneous.metabolic.pathways  0.033
S731  Information.processing...protein.synthesis..modification.and.degradation...proteolysis  0.042
S796  Information.processing...protein.synthesis..modification.and.degradation...chaperones..protein.folding  0.008
S797  Metabolism...nucleotide.metabolism...nucleotide.metabolism..other  0.043
S797  Information.processing...protein.synthesis..modification.and.degradation...protein.secretion  0.004
S809  Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other  0.040
S849  Metabolism...additional.metabolic.pathways...phosphate.metabolism  0.042
S857  Information.processing...genetics...DNA.restriction..modification  0.027
S863  Cellular.processes...homeostasis...acquisition.of.iron..based.on.similarity  0.025
S863  Metabolism...additional.metabolic.pathways...iron.metabolism  0.047
S877  Information.processing...protein.synthesis..modification.and.degradation...protein.modification  0.013
S912  Cellular.processes...cell.envelope.and.cell.division...cell.wall.degradation..turnover  0.040
S912  Information.processing...genetics...DNA.repair..recombination..based.on.similarity  0.023
S968  Cellular.processes...transporters...transporters..other  0.003
S968  Metabolism...amino.acid..nitrogen.metabolism...putative.amino.acid.transporter  0.013
S968  Information.processing...RNA.synthesis.and.degradation...DEAD.box.RNA.helicases  0.008
S1009  Cellular.processes...cell.envelope.and.cell.division...cell.wall..other..based.on.similarity  0.012
S1009  Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity  0.020
S1022  Cellular.processes...cell.envelope.and.cell.division...cell.wall..other  0.015
S1022  Cellular.processes...transporters...ABC.transporters  0.019
S1024  Cellular.processes...homeostasis...trace.metal.homeostasis..Cu..Zn..Ni..Mn..Mo.  0.034
S1024  Information.processing...genetics...DNA.replication..based.on.similarity  0.047
S1136  Information.processing...protein.synthesis..modification.and.degradation...chaperone..protein.folding..based.on.similarity  0.048
S1136  Information.processing...protein.synthesis..modification.and.degradation...protein.secretion..based.on.similarity  0.029
S1227  Metabolism...additional.metabolic.pathways...miscellaneous.metabolic.pathways  0.031
S1227  Information.processing...protein.synthesis..modification.and.degradation...chaperone..protein.folding..based.on.similarity  0.030
S1455  Metabolism...additional.metabolic.pathways...biosynthesis.of.cofactors  0.031
S1455  Metabolism...additional.metabolic.pathways...miscellaneous.metabolic.pathways  0.014
S1583  Metabolism...carbon.metabolism...carbon.core.metabolism  0.024
S1583  Information.processing...genetics...DNA.repair..recombination  0.048
S198  No enrichment category  NA
S249  No enrichment category  NA
S309  No enrichment category  NA
S348  No enrichment category  NA
S499  No enrichment category  NA
ykzW  No enrichment category  NA
S612  No enrichment category  NA
S665  No enrichment category  NA
S728  No enrichment category  NA
S903  No enrichment category  NA
S907  No enrichment category  NA
S1027  No enrichment category  NA
S1029  No enrichment category  NA
S1052  No enrichment category  NA
S1251  No enrichment category  NA
S1495  No enrichment category  NA
S1534  No enrichment category  NA

Table 4. Enriched regulons from target predictions on the selected putative srRNAs. Query, putative srRNA name. Enriched regulon, enriched regulon of the putative srRNA computed on all B. subtilis 168 target predictions. Binomial P-value, p-value indicating the significance of the enrichment.

Query  Enriched regulon  Binomial P-value
S2  DnaA.regulon  0.022
S72  AdaA.regulon  0.017
S72  CssR.regulon  0.029
S111  FadR.regulon  0.005
S140  CodY.regulon  0.011
S144  CcpA.regulon  0.025
S145  Btr.regulon  0.026
S145  CcpC.regulon  0.039
S145  CitB.regulon  0.026
S181  A.box  0.023
S181  AbrB.regulon  0.025
S198  AbrB.regulon  0.034
S249  AseR.regulon  0.041
S249  DnaA.regulon  0.041
S254  DegU.regulon  0.004
S254  FadR.regulon  0.023
S254  Fur.regulon  0.039
S289  AbrB.regulon  0.014
S289  FadR.regulon  0.023
S309  AbrB.regulon  0.001
S326  A.box  0.048
S326  AbrB.regulon  0.004
S326  DegU.regulon  0.021
S345  AhrC.regulon  0.046
S345  FruR.regulon  0.009
S357  A.box  0.026
S357  AraR.regulon  0.047
S415  CcpA.regulon  0.031
S458  A.box  0.012
S499  A.box  0.012
S512  Btr.regulon  0.005
S512  CitB.regulon  0.005
S512  FsrA.regulon  0.007
S512  Fur.regulon  0.000
ykzW  CysL.regulon  0.018
S547  glmS.ribozyme  0.031
S612  FruR.regulon  0.044
S641  AbrB.regulon  0.001
S641  DegU.regulon  0.011
S659  glmS.ribozyme  0.035
S708  AhrC.regulon  0.046
S708  BltR.regulon  0.042
S717  BsdA.regulon  0.032
S718  Fur.regulon  0.043
S728  AseR.regulon  0.012
S728  CcpA.regulon  0.024
S728  DeoR.regulon  0.017
S731  FruR.regulon  0.032
S796  Abh.regulon  0.014
S796  CcpC.regulon  0.044
S796  FadR.regulon  0.018
S797  ArsR.regulon  0.021
S797  CtsR.regulon  0.030
S809  CzrA.regulon  0.035
S857  AzlB.regulon  0.018
S877  A.box  0.018
S877  CatR.regulon  0.036
S903  CysL.regulon  0.019
S907  FadR.regulon  0.025
S912  AraR.regulon  0.006
S1009  DnaA.regulon  0.039
S1022  CcpC.regulon  0.011
S1024  ComN.regulon  0.040
S1027  CsoR.regulon  0.038
S1052  AdaA.regulon  0.031
S1052  G.box  0.012
S1227  CymR.regulon  0.004
S1227  FadR.regulon  0.035
S1251  DegU.regulon  0.041
S1495  A.box  0.029
S1495  FMN.box  0.019
S1534  ArsR.regulon  0.049
S1534  GlcT.regulon  0.037
S1583  Abh.regulon  0.023
S1583  AbrB.regulon  0.023
S1583  CcpN.regulon  0.018
S275  No enriched regulon  NA
S313  No enriched regulon  NA
S348  No enriched regulon  NA
S423  No enriched regulon  NA
S444  No enriched regulon  NA
S462  No enriched regulon  NA
S503  No enriched regulon  NA
S645  No enriched regulon  NA
S653  No enriched regulon  NA
S665  No enriched regulon  NA
S849  No enriched regulon  NA
S863  No enriched regulon  NA
S968  No enriched regulon  NA
S1029  No enriched regulon  NA
S1136  No enriched regulon  NA
S1455  No enriched regulon  NA

For every putative srRNA, we computed the binomial p-value of functional enrichment on the evolutionarily conserved targets and only considered functional groups with a binomial p-value of ≤0.05. Enrichment of minimally one functional category was observed for 45 of the 63 putative srRNAs, with a maximum of 4 functional categories for S659 and RnaC/S1022 (Table 5).
In some cases, the obtained enriched categories were identical to the functional enrichment obtained only for B. subtilis 168 (compare Tables 3 and 5). We hypothesize that this provides additional information to predict the function and most likely targets of the respective srRNAs. Examples from Table 5 will be discussed in detail for FsrA/S512, S462, SR1, and RsaE/S415 in the case studies below. Altogether, we believe that true and important srRNA-target interactions are likely evolutionarily conserved and can be identified through the analyses described in the previous paragraphs. However, the detection of such conserved potential srRNA-target interactions does not automatically make them true interactions. It is conceivable that there may be other reasons, independent of srRNA regulation, for the conservation of both interacting sequences. Such phylogenetic inertia (the influence of the ancestor on the descendant) makes it impossible to compute a statistical likelihood for the relevance of the predicted conservation of srRNA-target interactions.

Co-expression analyses for improved prediction of potential srRNA-target interactions

SrRNAs can regulate their targets in a wide variety of ways, most simply divided into directly triggering degradation of srRNA-mRNA duplexes, or inhibiting the translation of the mRNA (48). It is generally believed that even when an srRNA solely inhibits translation, this will lead to some degradation of the mRNA (52), mainly because this srRNA-bound mRNA is not protected by elongating ribosomes. While there are known exceptions to this rule (30), one can take the degradation of RNA duplexes as a premise and, accordingly, it can be anticipated that for the majority of srRNA targets (small) differences in mRNA abundance will be apparent, which correlate with changes in the abundance of the respective srRNA.
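The expected co-variation between srRNA and target abundance can be quantified as a Pearson correlation across conditions, as reported in the Cor column of Table 2. The sketch below is a plain implementation of the standard formula; the function name and the expression values in the usage lines are invented for illustration and are not data from this study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Invented log2 expression values over five conditions: the target mRNA
# drops whenever the srRNA rises, giving a strongly negative correlation.
srna_profile = [8.0, 9.5, 12.0, 14.0, 9.0]
mrna_profile = [13.0, 12.1, 10.2, 9.0, 12.6]
cor = pearson(srna_profile, mrna_profile)
```

A negative coefficient would be consistent with srRNA-induced degradation of the target, whereas a positive one could reflect co-regulation or target stabilization; the coefficient alone does not distinguish direct from indirect effects.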
Thus, in transcriptome data across many different growth conditions, one might expect to see a correlation in the expression

[Figure 8: plot; y-axis: number of targets (0 to 600); x-axis: the selected putative srRNAs; legend: all targets, conserved targets]

Figure 8. Number of predicted targets and conserved predicted targets of putative srRNAs. For every putative srRNA in the selection the total number of predicted targets was plotted, together with the number of these targets that are also predicted in 8 or more Bacillus genomes (i.e. obtained a Conserved flag). For many RNA segments, especially the ones with large numbers of initially predicted targets, the conservation requirement removes the majority of considered targets.
92 Chapter 4 Query S2 S2 S140 S144 S181 S198 S198 S249 S289 S309 S309 S309 S326 S345 S345 S357 S415 S423 S458 S458 S462 S503 S512 S512 S512 ykzW ykzW S547 S641 S641 S641 S645 S645 S645 S653 S659 S659 S659 S659 S717 S728 S731 S796 S797 S809 S849 S857 S863 S863 S863 S907 S912 S912 S968 S968 S968 S1022 S1022 S1022 S1022 S1024 S1052 S1136 S1227 S1227 S1227 S1455 S1495 S1495 S1495 S1583 S1583 S1583 S72 S111 S145 S275 S313 S348 S444 S612 S665 S708 S718 S877 S903 S1009 S1027 S1029 S1251 S1534 Enriched category Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids Information.processing...RNA.synthesis.and.degradation...RNases Cellular.processes...cell.envelope.and.cell.division...cell.division Metabolism...nucleotide.metabolism...metabolism.of.signalling.nucleotides Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity Metabolism...lipid.metabolism...biosynthesis.of.lipids Information.processing...RNA.synthesis.and.degradation...transcription Information.processing...genetics...DNA.condensation..segregation Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis Cellular.processes...cell.envelope.and.cell.division...cell.division Metabolism...additional.metabolic.pathways...iron.metabolism Information.processing...genetics...DNA.replication..based.on.similarity Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other..based.on.similarity Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids Metabolism...nucleotide.metabolism...utilization.of.nucleotides Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids Metabolism...electron.transport.and.ATP.synthesis...respiration Metabolism...electron.transport.and.ATP.synthesis...respiration Metabolism...electron.transport.and.ATP.synthesis...regulators.of.electron.transport Metabolism...nucleotide.metabolism...nucleotide.metabolism..other 
Cellular.processes...cell.envelope.and.cell.division...cell.shape Information.processing...protein.synthesis..modification.and.degradation...protein.secretion Cellular.processes...transporters...ABC.transporters Cellular.processes...homeostasis...acquisition.of.iron Metabolism...additional.metabolic.pathways...iron.metabolism Metabolism...nucleotide.metabolism...utilization.of.nucleotides Metabolism...additional.metabolic.pathways...sulfur.metabolism Cellular.processes...transporters...ABC.transporters Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis Metabolism...electron.transport.and.ATP.synthesis...electron.transport..other Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis Cellular.processes...cell.envelope.and.cell.division...cell.wall..other Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components Cellular.processes...transporters...transporters..other Cellular.processes...cell.envelope.and.cell.division...cell.wall.synthesis Cellular.processes...transporters...phosphotransferase.systems Metabolism...amino.acid..nitrogen.metabolism...biosynthesis..acquisition.of.amino.acids Metabolism...additional.metabolic.pathways...biosynthesis.of.cell.wall.components Metabolism...carbon.metabolism...carbon.core.metabolism Metabolism...carbon.metabolism...utilization.of.specific.carbon.sources Information.processing...protein.synthesis..modification.and.degradation...translation Metabolism...nucleotide.metabolism...biosynthesis..acquisition.of.nucleotides Metabolism...lipid.metabolism...biosynthesis.of.lipids Information.processing...genetics...DNA.condensation..segregation Information.processing...protein.synthesis..modification.and.degradation...chaperone..protein.folding..based.on.similarity Cellular.processes...cell.envelope.and.cell.division...cell.division Cellular.processes...transporters...ABC.transporters 
Cellular.processes...homeostasis...acquisition.of.iron..based.on.similarity Metabolism...additional.metabolic.pathways...iron.metabolism Cellular.processes...transporters...transporters..other Cellular.processes...cell.envelope.and.cell.division...cell.wall.degradation..turnover Information.processing...RNA.synthesis.and.degradation...DEAD.box.RNA.helicases Cellular.processes...transporters...transporters..other Metabolism...carbon.metabolism...carbon.core.metabolism Metabolism...amino.acid..nitrogen.metabolism...putative.amino.acid.transporter Cellular.processes...cell.envelope.and.cell.division...cell.wall..other Cellular.processes...cell.envelope.and.cell.division...cell.division Information.processing...genetics...DNA.replication Information.processing...genetics...DNA.repair..recombination Information.processing...genetics...DNA.replication..based.on.similarity Information.processing...genetics...DNA.condensation..segregation Cellular.processes...cell.envelope.and.cell.division...cell.shape Cellular.processes...homeostasis...metal.ion.homeostasis..K..Na..Ca..Mg. 
Metabolism...additional.metabolic.pathways...miscellaneous.metabolic.pathways Information.processing...protein.synthesis..modification.and.degradation...chaperone..protein.folding..based.on.similarity Metabolism...additional.metabolic.pathways...biosynthesis.of.cofactors Cellular.processes...cell.envelope.and.cell.division...cell.wall..other..based.on.similarity Metabolism...nucleotide.metabolism...nucleotide.metabolism..other Metabolism...additional.metabolic.pathways...biosynthesis.of.cofactors Metabolism...carbon.metabolism...carbon.core.metabolism Information.processing...genetics...DNA.replication..based.on.similarity Information.processing...RNA.synthesis.and.degradation...RNases No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category No enrichment category 93 Binomial P-value 0.026 0.012 0.013 0.023 0.039 0.022 0.011 0.013 0.037 0.044 0.030 0.048 0.031 0.039 0.042 0.037 0.001 0.026 0.013 0.013 0.026 0.047 0.004 0.000 0.000 0.040 0.029 0.013 0.038 0.012 0.018 0.012 0.036 0.007 0.038 0.020 0.048 0.013 0.041 0.003 0.040 0.005 0.012 0.016 0.015 0.044 0.048 0.036 0.014 0.015 0.027 0.036 0.031 0.049 0.024 0.049 0.001 0.037 0.030 0.028 0.019 0.045 0.036 0.021 0.032 0.042 0.020 0.047 0.034 0.039 0.037 0.047 0.029 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA Table 5. Enriched functional categories from evolutionary conserved predicted targets Query, putative srRNA name. Enriched category, enriched category of the putative srRNA computed on the the predicted target interactions that are conserved in 8 species (and including B. subtilis 168) or more. Binomial P-value, p-value indicating the significance of the enrichment. 
In silico target profiling level of an srRNA and its actual target(s). To test this idea, we computed the Pearson correlation between all predicted srRNA-target pairs across the 104 condition space analysed by Nicolas et al. (17). However, there was no significant difference detectable when the pair-wise expression correlations of potential srRNA-target pairs with predicted TargetRNA p-values of ≤0.01 were compared to pairs predicted at a very low probability (>0.50) (Figure 9). There are thus strong pair-wise correlations both between significantly predicted targets and between virtually random srRNA-mRNA pairs. It therefore seems that, by only inspecting the pair-wise correlations in expression, no reliable indication of srRNA-mediated regulation can be obtained. To find other ways of further improving the predicted srRNA-target interactions, we decided to apply two other expression-based methods for identifying more likely srRNA targets, namely B-cluster enrichment and Peaks expression correlation. For the B-cluster enrichment analysis, we exploited a correlation analysis that was previously performed by Nicolas et al. (17). In this analysis all expressed genes and RNA segments were clustered into three types of clusters based on the pair-wise correlations of their expression levels. The most strongly co-expressed segments (with Pearson pairwise correlation coefficient ρ≥0.8) were assigned to A-clusters, and the weakly co-expressed segments (ρ=0.4) were assigned to C-clusters. Genes with an intermediate level of correlation (ρ=0.6) were clustered together in B-clusters (for an example plot of a B-cluster profile see Figure 1B). To achieve a proper tradeoff between sensitivity and selectivity, we decided to use these intermediate B-clusters for our analysis. Notably, when an srRNA has an effect on the expression level of a predicted target, this target may be found in another B-cluster. 
Likewise, if an srRNA regulates multiple targets, it may have a similar effect on the expression level of these targets. These predicted targets are thus more likely to end up in the same B-cluster. The applied B-cluster analysis therefore tests whether particular srRNAs have more predicted targets from one B-cluster than would be expected at random. This can be assessed with an enrichment analysis, as was done before for the enrichment of functional categories. Accordingly, we computed B-cluster enrichment on the set of predicted targets for every putative srRNA and used a p-value cut-off for enrichments of ≤0.05. For 50 out of the 63 selected putative srRNAs, this resulted in an enrichment of one or more B-clusters. The enriched B-clusters for every target are listed in Table S3 in the Supplementary data, together with the respective enrichment p-values. Notably, the number of false-positive B-clusters is expected to increase strongly with increasing p-value, and especially those B-clusters with very high enrichment (i.e. the B-clusters with the lowest p-values) seem relevant. All predicted targets of a particular srRNA that are part of an enriched B-cluster received a B-cluster flag (Table S1; examples are shown in Table 2). Specific examples of targets with a B-cluster flag will be discussed for FsrA/S512, SR1, and RsaE/S415 in the case studies.

Figure 9. Pearson correlation between putative srRNAs and their predicted targets at different p-value cut-offs. The expression correlations of the indicated group of potential srRNA-target pairs were computed. For each of these distributions a kernel density estimate is shown. Expression correlations between significant srRNA-target pairs are only very slightly different from control srRNA-target pairs with an interaction p-value of >0.5.
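As an illustration of the B-cluster enrichment test described above, the following minimal Python sketch computes a one-sided binomial p-value for the over-representation of each B-cluster among the predicted targets of a single srRNA. It is a sketch under stated assumptions and not the implementation used in this study: the function names, the input format, the one-sided test, and the use of the genome-wide cluster fraction as the background probability are all assumptions made here for illustration.

```python
from math import comb


def binomial_enrichment_p(k, n, background_fraction):
    """Upper-tail binomial p-value P(X >= k) for X ~ Binomial(n, background_fraction)."""
    return sum(
        comb(n, i) * background_fraction**i * (1 - background_fraction) ** (n - i)
        for i in range(k, n + 1)
    )


def enriched_clusters(target_clusters, cluster_sizes, cutoff=0.05):
    """Return {cluster: p-value} for clusters over-represented among the targets.

    target_clusters: cluster label of every predicted target of one srRNA
                     (hypothetical input format).
    cluster_sizes:   {cluster label: number of genes in that cluster genome-wide},
                     used as the assumed background distribution.
    """
    total = sum(cluster_sizes.values())
    n = len(target_clusters)
    enriched = {}
    for cluster in set(target_clusters):
        k = target_clusters.count(cluster)
        p = binomial_enrichment_p(k, n, cluster_sizes[cluster] / total)
        if p <= cutoff:  # cut-off of 0.05 as stated in the text
            enriched[cluster] = p
    return enriched


# Toy example: 5 of 10 targets fall in a cluster that holds 10% of all genes.
toy_sizes = {"B1": 100, "B2": 900}
toy_targets = ["B1"] * 5 + ["B2"] * 5
toy_result = enriched_clusters(toy_targets, toy_sizes)
```

In this toy example only the small cluster B1 passes the cut-off, because five hits among ten targets is far more than the 10% background would suggest.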
As pointed out above, the Pearson expression correlations over the complete condition space are not indicative of true srRNA-target pairs (Figure 9). On the other hand, the B-cluster analysis identified predicted srRNA targets that are significantly co-expressed in the complete condition space, perhaps due to regulation by the respective srRNA. Notably however, none of these approaches focused on the specific expression correlation between a particular srRNA molecule and its predicted target under conditions where the expression level of the putative srRNA changes. In case an srRNA is for instance specifically induced under stress conditions, further inspection of this condition would be relevant to distinguish negatively or positively correlated predicted targets. For such high expression conditions, we designed a dedicated approach with the goal of identifying correlated targets that are specific for at least one of these conditions. To this end, we used similarities in all 269 tiling array hybridizations (corresponding to 104 expression conditions) that were computed previously by Nicolas et al. (17). The expression profiles of all segments were plotted against an x-axis that was grouped according to the computed level of similarity. The resulting plots facilitated the selection of environmental conditions that are most similar to the peak conditions, but differ in the levels of putative srRNA expression (Figure 1B). This peak expression analysis was built up as follows. Firstly, high expression conditions for every putative srRNA were extracted from the compendium of the 269 hybridizations. Such high expression conditions were designated as peak srRNA expression conditions. As a control for each peak srRNA expression condition, the six tiling array hybridizations most related to the peak srRNA expression condition (i.e. the closest conditions on the x-axis of the plot in Figure 1B), where the expression of the putative srRNA was below a cut-off, were extracted. 
These were then called control for peak srRNA expression. Secondly, the correlation between the putative srRNAs and their possible targets was computed for the peak srRNA expression condition with respect to the control for peak srRNA expression (see Materials and Methods for the details of these computations). Thirdly, to identify relevant predictions, the resulting data were filtered in three ways: 1) the p-values of peak correlations had to be significant (≤0.05), 2) the TargetRNA prediction p-value was set at its default (≤0.01), and 3) only those srRNA-target pairs where the peak correlation was larger (or smaller) than the overall correlation were retained. The latter pairs were defined as those pairs where the absolute ratio of (peak correlation - overall correlation) / overall correlation was ≥1.5. This target selection approach resulted in the identification of 4305 targets shared between all 63 putative srRNAs. As was done above, these targets received a flag, called the Peaks flag (Table S1; for specific examples see Table 2). The obtained results suggest that this analysis can help to determine the conditional dependency of srRNA-target regulation. For example, if the correlation between a putative srRNA and its target is close to 0 when calculated over all 104 conditions, but is -0.4 in the Peaks analysis, the putative srRNA could be responsible for this difference. Based on such observations, detailed experiments in the relevant conditions can be designed. Examples of targets with a Peaks flag will be discussed for SR1, S462, and RsaE/S415 in the case studies below.

Previous studies on srRNA regulation have shown that the expression level of an srRNA is crucial for its mode of regulation (5, 53), and that an srRNA can already have a (small) effect when it is expressed at a similar or lower level than the target (5). However, it is believed that for strong target regulation, the srRNA should be expressed to a greater extent than the target.
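The three filters of the Peaks analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, and the handling of an overall correlation of exactly zero (where the ratio is undefined) is an assumption made here, not a rule taken from the study.

```python
def peaks_flag(peak_corr, peak_corr_p, overall_corr, target_p,
               corr_p_cutoff=0.05, target_p_cutoff=0.01, ratio_cutoff=1.5):
    """Return True if an srRNA-target pair passes the three Peaks filters.

    peak_corr:    correlation in the peak condition versus its controls
    peak_corr_p:  p-value of that peak correlation
    overall_corr: correlation over the complete condition space
    target_p:     TargetRNA prediction p-value
    """
    # Filter 1: the peak correlation itself must be significant.
    if peak_corr_p > corr_p_cutoff:
        return False
    # Filter 2: the target prediction must pass the default TargetRNA cut-off.
    if target_p > target_p_cutoff:
        return False
    # Filter 3: |(peak - overall) / overall| must reach the ratio cut-off.
    if overall_corr == 0:
        # Assumption: with a zero overall correlation, any non-zero peak
        # correlation counts as a relevant deviation.
        return peak_corr != 0
    return abs((peak_corr - overall_corr) / overall_corr) >= ratio_cutoff


# Example from the text: overall correlation close to 0, peak correlation -0.4.
flagged = peaks_flag(peak_corr=-0.4, peak_corr_p=0.01,
                     overall_corr=0.05, target_p=0.005)
```

A pair with an overall correlation of 0.5 and a peak correlation of 0.6 would not be flagged, since the relative change of 0.2 stays well below the cut-off of 1.5.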
In the afore-going paragraphs, we addressed multiple ways of selecting more likely srRNA targets, but did not yet take into account this simple requirement of srRNA regulation. By specifically assessing the expression of both the putative srRNA and its target in any experimental condition, or their co-expression, we aimed to exclude those targets that are not co-expressed at a considerable level in at least one of the 104 conditions. To achieve this, we first inspected the number of targets that remained with different cut-offs of expression levels for both the putative srRNA and their targets (Figure 10). This was done by selecting all expression conditions in which the expression level of the predicted target was higher than a cut-off ranging from 8 to 16. Similarly, all expression conditions in which the putative srRNA expression level was higher than a cut-off ranging from 10 to 16 were selected. Subsequently, it was tested for every combination of expression cut-offs whether the putative srRNA and its predicted target were still both present in at least one expression condition. As pointed out before, the complete set of predicted targets with p-values ≤0.01 comprised 11419 targets (Table S1). Figure 10 shows that increasing the stringency of the target expression level reduced the number of predicted targets faster than doing the same for the putative srRNAs. This shows that srRNAs are generally expressed at higher levels than their predicted targets. To be selective, but at the same time not remove too many targets, we selected those srRNA-target pairs with a minimum expression level for the srRNA of 12 and a minimum expression level of the target of 11. This eliminates 3026 predicted srRNA-target pairs, representing a decrease of 26.5% (Figure 10). The remaining 73.5% of the predicted srRNA-target pairs that met this threshold obtained a Conditional flag (Table S1; for specific examples see Table 2).
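The co-expression requirement behind the Conditional flag can be illustrated with a short Python sketch. The function name and the list-based input format are hypothetical, and whether the cut-offs are inclusive is an assumption made here; only the cut-off values (12 for the srRNA, 11 for the target) and the pair counts are taken from the text.

```python
def conditional_flag(srna_levels, target_levels, srna_min=12.0, target_min=11.0):
    """Return True if srRNA and target are co-expressed in at least one condition.

    srna_levels / target_levels: expression levels (log2 scale) of the srRNA and
    of its predicted target, in the same order of conditions.
    Assumption: a level equal to the cut-off counts as expressed.
    """
    return any(
        s >= srna_min and t >= target_min
        for s, t in zip(srna_levels, target_levels)
    )


# Toy profiles over three conditions: only the third condition has both the
# srRNA (12.5) and the target (11.2) above their cut-offs.
co_expressed = conditional_flag([9.0, 13.0, 12.5], [12.0, 10.0, 11.2])
never_together = conditional_flag([13.0, 9.0], [10.0, 12.0])

# Arithmetic check of the reported reduction: 3026 of 11419 pairs removed.
removed_percent = round(100 * 3026 / 11419, 1)
remaining_pairs = 11419 - 3026
```

The last two lines reproduce the reported decrease of 26.5%, which leaves 8393 srRNA-target pairs with a Conditional flag.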
It should however be noted here that, as with the other presented selection analyses, one cannot exclude the possibility that removed predicted srRNA-target pairs may eventually turn out to be real.

Figure 10. Remaining targets of putative srRNAs in co-expression analysis. The fraction of remaining targets for every combination of putative srRNA expression levels (axis range 10-16) and predicted target expression levels (axis range 8-16) is plotted (black circles). The surface of the grey circles represents the total number of predicted targets of the selection of putative srRNAs (n=11419). It follows that srRNA expression is generally higher than that of its predicted targets.

Identification of srRNA seed regions

Seed regions are conserved and accessible regions of srRNAs that directly mediate srRNA-target interaction(s). Other regions of the srRNA molecule could for instance only be important for stability. Seed regions in srRNAs from Gram-negative bacteria are often present at the 5’ end of the srRNA molecule, and this is reminiscent of eukaryotic microRNAs, which select multiple targets by short Watson–Crick pairing of a 5’ located conserved “seed” (54). The concept of seed regions is now established as a way of improving target predictions (55), and conferring srRNA regulation to heterologous targets by Hfq-binding srRNAs (54, 56). Seed regions can help to improve target predictions since the length of the query sequence is strongly reduced and, thereby, also the number of expected false-positive predictions is minimized (55). Notably, it has been shown that srRNA regulation can even be conferred to heterologous targets by placing only the seed region at the 5’ end of a synthetic RNA or srRNA scaffold (54, 56). Seed regions have not yet been reported in studies on B.
subtilis srRNAs. However, it has been reported that FsrA/S512 particularly uses a CCCCUCU sequence for target regulation, and this could thus form the core of (one of its) seed region(s) (11). Likewise, the srRNA RsaE of S. aureus (RsaE/S415 in B. subtilis) was reported to be a member of a class of srRNAs that contain a conserved UCCC sequence motif (18). Since this motif was present in many predicted srRNA-target interactions (18) it may also be part of a seed region. It has been suggested that seed regions are the generally conserved and unstructured regions of an srRNA molecule (54). Such regions could be identified from our analysis, since we have investigated the conservation of all putative srRNAs in the afore-mentioned set of 62 Bacillus genomes. These sequences are formatted such that sequence alignments can be easily made. The FASTA file with all BLAST results for the selected putative srRNAs is part of our Supplementary data file predictions. Sequence alignments can for instance be made with LocARNA (57), since LocARNA computes the conserved secondary structure and presents an alignment of this computed secondary structure (see the case studies below for examples). Another option to identify seed regions is not to take the srRNA sequence as a starting point, but its predicted targets. In this case, the seed region may be that particular region where many targets are predicted to bind. To facilitate the seed region analysis, we plotted the predicted interaction region for all targets on a single plot for every selected srRNA (these plots are part of Supplementary data file predictions; see Figure 11 where FsrA/S512 is used as an example). In these plots the accessible regions (as predicted by RNAfold) were highlighted in grey and the predicted targets were colored based on their number of flags. These plots can be visually inspected to see whether there are any preferentially predicted interacting regions. 
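The target-based search for seed regions outlined above amounts to counting, for every nucleotide of the srRNA, how many predicted interactions cover it, and then looking for stretches with high coverage. The following Python sketch shows one way to do this. It is not the plotting code used for Figure 11: the function names, the 1-based inclusive coordinate convention, and the minimum stretch length of five nucleotides are assumptions chosen for illustration.

```python
def interaction_coverage(srna_length, interactions):
    """Count, per srRNA nucleotide, the number of predicted interactions covering it.

    interactions: list of (start, end) positions on the srRNA, assumed to be
    1-based and inclusive.
    """
    coverage = [0] * srna_length
    for start, end in interactions:
        for pos in range(start - 1, end):
            coverage[pos] += 1
    return coverage


def candidate_seed_regions(coverage, min_targets, min_length=5):
    """Return 1-based inclusive (start, end) stretches covered by >= min_targets
    interactions over at least min_length consecutive nucleotides."""
    regions = []
    start = None
    # A trailing 0 acts as a sentinel that closes a region reaching the 3' end.
    for i, depth in enumerate(coverage + [0]):
        if depth >= min_targets and start is None:
            start = i
        elif depth < min_targets and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions


# Toy example: a 20-nt srRNA with four predicted interaction regions.
toy_coverage = interaction_coverage(20, [(1, 10), (3, 12), (5, 9), (15, 18)])
shared_by_two = candidate_seed_regions(toy_coverage, min_targets=2)
shared_by_three = candidate_seed_regions(toy_coverage, min_targets=3)
```

In the toy example, positions 3-10 are covered by at least two interactions and positions 5-9 by three, so both stretches would be reported as candidate regions for closer inspection.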
However, very few such seed regions are apparent in these plots (Supplementary data file predictions). The plots also provide visual clues as to the number of predicted targets and the length of the sequence. The example of this plot for FsrA/S512 will be further discussed below. Notably, a detailed analysis of possible seed regions is beyond the scope of this study. Nevertheless, the presented analyses might be helpful for the assessment of predicted seed region targets. Please note also that seed regions seem especially valuable tools to find new targets once the mode of action of an srRNA is known. In this respect one has to bear in mind that it is presently not clear whether Hfq-independent srRNAs will function as predictably as the Hfq-binding srRNAs for which seed regions have been established (54, 55).

Prediction case studies

Enrichment of functional categories in target predictions captures the established FsrA/S512 function in iron metabolism

To investigate the relevance of our target predictions and the functional gene enrichment analyses, we compared our data for FsrA/S512 with data reported on this srRNA by Gaballa et al. (10). This breakthrough paper elucidated the molecular players of the Fur-mediated iron-sparing response. It was shown that the iron-sparing response involves the conjoint action of the FsrA/S512 srRNA and three small basic proteins named FbpABC (10). It was furthermore reported that FsrA/S512 srRNA directly represses translation of the succinate dehydrogenase operon, sdhCAB. In addition, a two-dimensional (2D) gel-based proteomics analysis was used to identify other proteins of which the expression is modulated by FsrA/S512. Importantly, nine of the twelve proteins that were differentially expressed in FsrA/S512 mutant cells were independently identified through our target predictions addressing the 5’UTRs of all protein-encoding genes (via seven predicted interactions) (Figure 12A).
Nevertheless, only 4 of these, namely SdhCAB and CitB, were so far reported as members of the FsrA/S512 regulon (via two interactions) (10, 11, 35). The fact that these targets were missed in previous target predictions may relate to the extended settings around the translational start site used in our present predictions. Indeed all of the previously missed target genes of FsrA/S512 have a predicted interaction region for FsrA/S512 outside of the region addressed with the default prediction settings of TargetRNA. We therefore believe that this finding justifies our decision to expand the prediction criteria in our present target predictions. Another factor that has influenced the presently predicted interactions is that the S512 sequence used in our study starts 19 nucleotides earlier than the sequence considered in the published studies on FsrA. These nucleotides are unstructured and this would be consistent with the known properties of seed regions at the 5’ end of srRNA molecules (54) (Figure 13A). As far as we are aware, the previously used FsrA sequence start site was inferred from the distance to the Fur repressor binding site and has not been mapped experimentally. FsrA/S512 was unfortunately not identified in the RNA-seq analysis from Irnov et al. (16, 17) so this data cannot provide more information on its precise start site. Our observations therefore warrant a detailed (and possibly condition-dependent) mapping of the 5’ end of FsrA/S512. Until this has been done, we suggest that the transcripts of fhuD, feuA, yfmC, yxiB and yfiY should be considered as possible additional direct targets of FsrA/S512. Notably, the products of these five genes are all involved in iron metabolism, and thus it is well conceivable that the respective mRNAs are genuine members of the FsrA regulon. 
In accordance with the published data and our present observations, a functional gene enrichment analysis of all predicted FsrA/S512 targets shows a highly significant enrichment for the functional categories iron metabolism (p-value <0.001) and acquisition of iron (p-value 0.001) (Table 3). Furthermore, the regulon enrichment analysis for FsrA/S512 shows enrichment of the Fur, FsrA, CitB and Btr regulons (Table 4), where it has to be noted that Btr and CitB are involved in the regulation of the feuABC-ybbA operon. All these observations imply that our de novo target prediction strategy predicts FsrA/S512 to be involved in iron uptake and iron metabolism-related processes. This shows that functional enrichment of target genes is a potentially useful approach for identifying srRNA functions and regulons. Interestingly, this enrichment also holds when only conserved targets are considered (iron metabolism and acquisition of iron with p-value <0.001) (Table 5). This strongly suggests that the FsrA/S512 function is conserved in multiple Bacillus species. Six FsrA/S512 targets are also part of the list of targets with four flags (Table 2), and all of these FsrA/S512 targets are related to iron handling. These flags are also visualized in the target region plot for FsrA/S512 in Figure 11. Specifically, all putative targets with four flags are predicted to interact with the beginning of the FsrA/S512 sequence. Predicted targets with three flags potentially interact with the first, second, and third exposed (loop) regions of FsrA/S512 (Figure 11; loop regions are shaded in grey). However, many of the predicted targets share such a large amount of sequence complementarity with FsrA/S512 that they span multiple loop regions (Figure 11). No targets are predicted for the 3’ end of the FsrA/S512 sequence, but this is due to the default TargetRNA settings that remove potential terminators.
However, when FsrA/S512 targets were predicted without this default setting, they still seemed to preferentially bind to the first half of the FsrA/S512 sequence (data not shown). Altogether, our FsrA/S512 case study shows that (evolutionary) target predictions combined with a functional enrichment analysis can predict the functional process in which a particular srRNA might be involved. Based on such a predicted functional process, srRNA mutagenesis experiments can be designed to verify whether the respective srRNA mutant cells show relevant phenotypes related to the enrichment category (i.e. iron limitation in the case of FsrA/S512).

Figure 11. Predicted targets of FsrA/S512 plotted on the sequence and predicted secondary structure of this srRNA. The region of the srRNA predicted to take part in each predicted target interaction is represented by a line above the nucleotide sequence of the srRNA (lower line). The colors of these lines represent the number of flags the predicted target received in the five reported analyses. The predicted RNAfold secondary structure of the srRNA is indicated in dot-bracket notation above the nucleotide sequence. Exposed (loop) regions of five bases or larger in this predicted secondary structure are indicated with a grey zone. The plot shown here is for FsrA/S512; plots for all other selected putative srRNAs can be found in Supplementary data file predictions.

Additional predicted FsrA/S512 functions

As was discussed above, the role of FsrA/S512 in the iron-sparing response is well established. Our predictions suggest that this is the main function of FsrA/S512. The conserved enriched functional process that, at least by name, is not directly iron-related is termed ABC transporters (Table 5). However, inspection of the conserved targets responsible for this enrichment shows that all of these ABC transporters are involved in iron metabolism. Since none of the previously published studies identified ABC transporter genes as targets for FsrA/S512 (11), this suggests that the FsrA/S512 regulon is much larger than previously thought. However, inspection of the expression conditions of FsrA/S512 reveals a discrepancy with its presumed exclusive role in the iron-sparing response. In fact, the base-line expression of FsrA/S512 is quite high (10 on a log2 scale of 7-16) and it is highly expressed in many conditions that are not intuitively related to a shortage of iron (17). These for instance include growth in the exponential phase on minimal medium with pyruvate, and growth under conditions of high osmolarity. It thus may be that there are additional functions of FsrA/S512 that are condition-dependent. Condition-dependency of an srRNA regulator means that it can regulate one process under one environmental condition and another process under another environmental condition. Furthermore, the extent of target regulation by an srRNA can be dependent on the expression level of the putative target in the environmental condition, the expression level of the srRNA in the environmental condition, and possibly the condition-specific expression of srRNA-mRNA chaperones.

According to the data presented by Nicolas et al. (17), FsrA/S512 is also expressed in LB medium. To verify this observation, we fused the FsrA/S512 promoter with gfp in single copy at its native locus by chromosomal integration of the GFP reporter plasmid pBaSysBioII (58). Next, we deleted the gene for the iron-activated transcriptional repressor Fur, which is known to repress FsrA/S512 expression (10). As shown by GFP expression measurements, FsrA/S512 expression was clearly de-repressed in the fur mutant background on LB medium (Figure 12, B and C). However, the FsrA/S512 promoter was also active, albeit at a much lower level, in the parental strain (Figure 12, C and D).
In the presence of Fur, the FsrA/S512 promoter activity was most prominent in the early exponential growth phase, but it remained active at a lower level up to the late stationary phase. In addition, we determined the FsrA/S512 promoter activity by flow cytometry, showing that FsrA/S512 was homogeneously expressed throughout growth on LB (Figure 12B).

Supported by the analysis of FsrA/S512 expression in cells growing on LB, we compared the protein patterns of FsrA/S512 mutant and wild-type cells harvested during several phases of growth. This suggested differential expression of several proteins in the FsrA/S512 mutant, especially in the late exponential growth phase and during the transition from exponential to post-exponential growth (data not shown). Based on these observations, we performed a 2D gel-based proteomics analysis of samples collected during different growth phases and, in parallel, we isolated RNA, which was analyzed by tiling arrays as previously described (17). Notably, these are the first reports of experiments performed directly on an FsrA/S512 mutant, since previously reported analyses have been performed on a strain where both fur and FsrA/S512 were deleted. In the 2D PAGE analysis all protein spots that showed at least a 2-fold increase or decrease in intensity compared to the parental strain were picked and analyzed by matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry (MS) (Figures S3 and S4). In total 67 significantly differentially expressed protein spots were identified (Figure S5). We next compared these differentially expressed proteins with our target predictions. This resulted in one match, namely that of the aconitase CitB. CitB has been reported as a highly likely FsrA/S512 target (10, 11). However, this was reported under the premise that CitB would only be regulated under conditions of low iron and not in LB medium.
Notably, CitB is not expressed in the early exponential phase, when FsrA/S512 expression is most prominent on LB medium. This suggests that there could be other FsrA/S512 targets regulated in this growth phase. The other differentially expressed proteins may be the result of indirect effects of the FsrA/S512 deletion or the deregulation of one of its targets. It is also feasible that some of these differentially expressed proteins are actually FsrA/S512 targets, but that these were missed in our TargetRNA predictions. Transcriptome analyses can provide a genome-wide view of the effects of an FsrA/S512 deletion on transcript levels. We tested for significantly differentially expressed genes in the tiling array data obtained with an FsrA/S512 mutant using a linear model and found that 8.1% (468 genes) of all genes significantly changed in abundance with p-values ≤0.05, and 0.4% (24 genes) with p-values ≤0.01 (Supplementary data file transcript profiling). Thirteen of the differentially expressed genes with p-values ≤0.05 are indeed predicted FsrA/S512 targets (with TargetRNA p-values ≤0.01), but most of these are of unknown function. On average, more transcripts were expressed at a higher level in the FsrA/S512 mutant than in the parental strain (Figure 13B). This would be in line with the presumed dominant role of srRNAs in repressing their targets. However, it seems highly unlikely that the differential expression of all these genes is a direct result of the absence of FsrA/S512. Instead, it seems more likely that many of the observed differences result indirectly from the deregulation of a limited number of FsrA/S512 targets. For instance, a large part of the sulfur metabolism genes were found to be derepressed in the FsrA/S512 mutant (Figure 13B). The expression changes of these sulfur metabolism genes suggest that there is an increased demand for cysteine in the FsrA/S512 mutant (Figure S6).
Figure 12. FsrA/S512 regulon members related to iron uptake and metabolism, and FsrA/S512 promoter activity
A) Overlap between FsrA-specific differentially expressed proteins identified in the 2D PAGE analysis by Gaballa et al. (10) and our present target predictions. The sdhC and citB genes are known members of the FsrA/S512 regulon. Our predictions suggest five additional targets for FsrA/S512 with the indicated TargetRNA_v1 significance scores and coordinates of the predicted interaction. The yfmC interaction is predicted with a p-value of 0.02, the other predicted interactions have p-values ≤0.01. B) Flow cytometry histograms shown for the FsrA/S512 promoter fusion in the parental strain and the same promoter fusion in the Δfur background. Autofluorescence of the parental strain is indicated in grey. Representative data are shown. C) Promoter activity plots of the same strains used in panel B grown in a 96-well plate. Growth for both strains was identical and one growth curve is plotted. Representative data are shown. D) Same plot as in C, but without the Δfur mutation to clarify the induction pattern of the FsrA/S512 promoter in the parental background.

In an attempt to explain this phenotype, we inspected the conserved predicted targets of FsrA/S512 for sulfur-related genes. A highly conserved and significant putative target is yrvO/iscS (Figure 13, B and C; Table 6). This gene encodes the essential cysteine desulfurase, which is involved in tRNA thiolation. The enzyme activity of YrvO/IscS leads to cysteine consumption, and this suggests that elevated levels of this protein could lead to the observed derepression of sulfur metabolism genes. YrvO/IscS was not identified in the 2D PAGE analysis, but the yrvO/iscS transcript levels are significantly changed in the FsrA/S512 mutant.
However, this change is only slight, with a 0.67 log fold change decrease in the FsrA/S512 mutant compared to the parental strain (p-value < 0.04). In the case of relieved translation inhibition, transcript levels are expected to slightly increase, and not decrease as was observed in the present experiment. Therefore, it remains to be seen whether elevated YrvO/IscS levels are indeed responsible for the derepression of sulfur metabolism genes. It should be noted that an increase in the sulfur metabolism genes could also be related to an oxidative stress response, as B. subtilis uses cysteine for thiol protection after oxidative stress (59). However, no clear evidence for an oxidative stress response in cells lacking FsrA/S512 was obtained.

ClpC is another conserved predicted target of FsrA/S512 (Figure 13C, Table 6). While a change in the ClpC level was not detected in the 2D PAGE analysis, the clpC mRNA level was significantly elevated in the FsrA/S512 mutant (log2 fold change 0.88). This elevated clpC transcript level could be the result of increased protection by elongating ribosomes due to the alleviation of translational repression. Notably, ClpC is the ATPase subunit of the ATP-dependent ClpC-ClpP protease, which is involved in a wide variety of processes (60). Because of the central role of ClpC in proteolysis and differentiation, the deregulation of a protein like ClpC is expected to have a large number of consequences (61). This could thus at least partly explain the large number of changes observed in the FsrA/S512 mutant. Lastly, while the present proteomic and transcriptomic analyses underscore the general importance of FsrA/S512, the identification of such a large number of most likely indirect effects may argue against the use of ‘omics’ analyses on srRNA mutant strains, at least when the specific goal of these analyses is to identify the direct srRNA targets.

Figure 13. Conserved structure of FsrA/S512 and identification of two additional putative FsrA/S512 targets by tiling array analysis
A) Conserved structure of S512 as predicted with the IntaRNA algorithm (http://rna.informatik.uni-freiburg.de/IntaRNA/Input.jsp) (24). The corresponding sequence alignment is shown in Figure S2. Red color indicates high conservation of the predicted base-pairing interactions. The top loop, which represents the 5’ end of the S512 sequence, may correspond to the seed region identified in this manuscript. B) Scatter plot of the transcript profiling data obtained with tiling arrays for an FsrA/S512 deletion mutant and the parental strain 168. Averages from both hybridizations for the FsrA/S512 mutant and the parental strain are shown. The additional labels “clpC” and “yrvO” show possible FsrA/S512 targets as discussed in the text. “S512” indicates the hybridization of FsrA/S512, which is absent from the FsrA/S512 mutant. The downstream segments “S513, S514” are expressed at a higher level in the FsrA/S512 mutant due to readthrough transcription from the phleomycin deletion cassette (76) used to replace FsrA/S512. C) Predicted interaction of FsrA/S512 with clpC and yrvO/iscS. Predicted loop regions of FsrA/S512 are indicated in red. The coordinates of the predicted interaction of the srRNA and mRNA are shown. Notably, clpC and yrvO/iscS represent the fourth and second gene, respectively, of their cognate operons.
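To make the magnitude of the transcript changes quoted above easier to judge, the following minimal sketch converts log2 fold changes into linear mutant/parent ratios. It assumes that the 0.67 decrease reported for yrvO/iscS is on a log2 scale, like the clpC value; the function and variable names are illustrative and not part of the original analysis.

```python
def fold_change_from_log2(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear mutant/parent ratio."""
    return 2.0 ** log2_fc


# Values quoted in the text; the yrvO/iscS value is ASSUMED to be log2-scaled.
log2_changes = {"clpC": 0.88, "yrvO/iscS": -0.67}

# A positive log2 value gives a ratio above 1, a negative one a ratio below 1.
ratios = {gene: fold_change_from_log2(v) for gene, v in log2_changes.items()}

for gene, ratio in ratios.items():
    print(f"{gene}: {ratio:.2f}-fold relative to the parental strain")
```

Under this assumption, both changes stay below a 2-fold difference, which is consistent with the description of the yrvO/iscS change as slight.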
Queryname  Length  Org                                                          FracId  Rank  Name        Score  Pvalue  sRNA_start  sRNA_stop  mRNA_start  mRNA_stop  BDBH
S512       102     Bacillus_subtilis_168_uid57675                               1       31    clpC        -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_BSn5_uid62463                              1       35    BSn5_12000  -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_BSP1_uid184010                             1       29    A7A1_0120   -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_natto_BEST195_uid183001                    0.99    30    clpC        -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_QB928_uid173926                            1       33    clpC        -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_RO_NN_1_uid158879                          1       32    clpC        -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_spizizenii_TU_B_10_uid73967                0.99    40    clpC        -69    0.00    6           39         -54         -22        BSU00860
S512       102     Bacillus_subtilis_spizizenii_W23_uid51879                    0.99    124   clpC        -58    0.02    6           39         -54         -22        BSU00860

S512       102     Bacillus_subtilis_spizizenii_W23_uid51879                    0.99    2     iscSA       -86    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_JS_uid162189                                        1       15    MY9_2731    -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_168_uid57675                               1       13    iscS        -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_BSn5_uid62463                              1       12    BSn5_04550  -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_BSP1_uid184010                             1       12    A7A1_0393   -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_QB928_uid173926                            1       16    iscSA       -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_RO_NN_1_uid158879                          1       17    I33_2796    -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_spizizenii_TU_B_10_uid73967                0.99    16    GYO_2990    -75    0.00    11          69         -73         -17        BSU27510
S512       102     Bacillus_subtilis_natto_BEST195_uid183001                    0.99    70    yrvO        -64    0.01    11          45         -51         -17        BSU27510
S512       102     Bacillus_amyloliquefaciens_plantarum_AS43_3_uid183682        0.98    226   B938_12725  -53    0.05    12          45         -51         -18        BSU27510
S512       102     Bacillus_amyloliquefaciens_plantarum_CAU_B946_uid84215       0.97    228   yrvO        -53    0.06    12          45         -51         -18        BSU27510
S512       102     Bacillus_amyloliquefaciens_plantarum_YAU_B9601_Y2_uid159001  0.98    247   yrvO        -53    0.06    12          45         -51         -18        BSU27510
S512       102     Bacillus_amyloliquefaciens_Y2_uid165195                      0.98    247   yrvO        -53    0.06    12          45         -51         -18        BSU27510

Table 6. Selection of conserved predicted FsrA/S512 targets
Queryname, putative srRNA name. Length, length of the srRNA in B. subtilis. Org, name of the bacterium in which the conserved interaction was predicted. FracId, fraction of sequence identity of the srRNA query. Rank, rank in the TargetRNA predictions for the query in the respective genome. Name, name of the predicted target in the respective genome. Score, TargetRNA_v1 (23) prediction significance score. Pvalue, TargetRNA_v1 (23) prediction p-value. sRNA_start, start coordinate of putative srRNA in the predicted target interaction. sRNA_stop, end coordinate of putative srRNA in the predicted target interaction. mRNA_start, start coordinate of the predicted target in the predicted target interaction relative to start codon. mRNA_stop, end coordinate of the predicted target in the predicted target interaction relative to start codon. BDBH, unique B. subtilis 168 locus tag of the nearest protein Blast hit of the predicted target for B. subtilis 168.

Correlations and target predictions suggest additional targets for SR1
In the preceding sections, we have argued that the expression level of an srRNA is crucial for its regulation, and an srRNA is therefore expected to be upregulated (or downregulated) in its functionally relevant condition. Such induction of srRNA expression in a specific growth phase was also observed for the B. subtilis srRNA SR1. SR1 is highly expressed during conditions of gluconeogenesis when CcpA and CcpN repression of the SR1 promoter is alleviated (42). SR1 is also strongly induced when cells enter the stationary phase after growth on rich medium (42). SR1 was the first srRNA for which a function and target were reported in B. subtilis (12). A 2D PAGE analysis of SR1 mutant B.
subtilis cells revealed that three proteins under the control of AhrC, the transcriptional regulator of arginine metabolic genes, were differentially expressed. Direct pairing of SR1 and ahrC mRNA was subsequently confirmed (12). SR1 was later found to encode a 39-amino acid peptide that regulates the stability of the gapA operon, making it the only established dual-function srRNA in B. subtilis (37). Most recently, it was reported that these two SR1 functions (regulation of ahrC mRNA via srRNA-mRNA interaction and regulation of the gapA operon mediated by the SR1 peptide) are conserved in many Bacillus species (36). Upon close inspection of our target prediction dataset for SR1, four findings seemed of particular interest. Firstly, there appears to be no enriched category for the predicted SR1 targets (Table 3). However, there are two enriched categories in the conserved predictions, namely sulfur metabolism and utilization of nucleotides (Table 5). A single target, adeC, is responsible for the latter enrichment. AdeC is an adenine deaminase involved in purine salvage and interconversion, and the utilization of adenine as a nitrogen source. In the target prediction Tables, SR1-adeC has three flags: Conserved8, ConditionalFlag, and a BclusterFlag (Tables S1 and S2). The latter flag is an indication that SR1 may have an effect on the expression level of adeC. We note that the coordinates of the predicted adeC targets are remarkably conserved (Table 7). The predicted nrdE gene target (which is not a conserved predicted target) is also identified in the same B-cluster enrichment and is involved in nucleotide metabolism. NrdE is an essential protein of B. subtilis, which is involved in the synthesis of deoxyribonucleoside triphosphates. Furthermore, additional regulation of adeC and nrdE would fit with the established role of SR1 in the regulation of nitrogen metabolism (via AhrC regulation (12)).
Secondly, the extracellular neutral protease NprE is a predicted conserved target of SR1 (Table 7). The nprE target also has three flags: Conserved8, ConditionalFlag, and a PeaksFlag. The latter relates to the high SR1 expression level during the induction of sporulation (S4 condition) (17). This is one of the conditions included for SR1 in the correlation computed over high-expression conditions (Peaks expression). The correlation between SR1 and nprE is 0.01 over all conditions, but -0.48 in these peaks expression conditions. This suggests that the nprE mRNA might be destabilized by SR1 under this condition. However, the observation could also be coincidental if SR1 and nprE are both strongly regulated under this growth condition. Interestingly, NprE is again involved in nitrogen metabolism of stationary phase cells. Thirdly, in a microarray study, arginine metabolism was previously linked to methionine metabolism, suggesting a functional link between nitrogen and sulfur metabolism (62). SR1 might be part of this link, since it has an established function in nitrogen metabolism and the category sulfur metabolism is enriched in the conserved target enrichment analysis (p-value 0.03; Table 5) via the putatively conserved regulation of CysJ (Table 7). Fourthly, the sigma factor SigL is a regulator of stationary phase processes, including arginine metabolism. We noted that expression of SR1 and sigL is highly correlated (0.71) over all 104 expression conditions analysed by Nicolas et al. (17). Interestingly, this correlation was almost complete (0.98) in a glucose run-out time series (17). Both SR1 and sigL are induced when glucose runs out, but SR1 to a higher extent (17). Based on these observations, we propose that SR1 is relevant for the fine-tuning of nitrogen metabolism in the transition and stationary growth phases.
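The contrast between a correlation over all conditions and a correlation over the conditions of high srRNA expression, as discussed above for SR1 and nprE, can be sketched as follows. This is only an illustration under stated assumptions: the expression values are invented toy data, and selecting the top half of conditions by srRNA level is a hypothetical stand-in for the Peaks selection, whose exact definition is not given in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def peaks_correlation(srna, mrna, top_fraction=0.5):
    """Correlation restricted to the conditions with the highest srRNA level.

    top_fraction is a hypothetical parameter, not taken from the source.
    """
    n_top = max(3, round(len(srna) * top_fraction))
    top = sorted(range(len(srna)), key=lambda i: srna[i], reverse=True)[:n_top]
    return pearson([srna[i] for i in top], [mrna[i] for i in top])


# Toy log2 expression values over eight conditions (invented for illustration).
srna_log2 = [8.0, 8.5, 9.0, 9.5, 13.0, 14.0, 15.0, 16.0]
mrna_log2 = [10.0, 12.0, 9.0, 11.0, 13.0, 12.0, 11.0, 10.0]

overall_r = pearson(srna_log2, mrna_log2)
peaks_r = peaks_correlation(srna_log2, mrna_log2)
print(f"all conditions: {overall_r:.2f}, high-srRNA conditions: {peaks_r:.2f}")
```

In this toy example the two profiles look unrelated or weakly positive overall, while the mRNA falls as the srRNA rises within the high-expression conditions, which is the kind of pattern a peaks flag is meant to highlight.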
Queryname  Length  Org                                                     FracId  Rank  Name        Score  Pvalue  sRNA_start  sRNA_stop  mRNA_start  mRNA_stop  BDBH
ykzW       120     Bacillus_subtilis_168_uid57675                          1       26    adeC        -74    0.01    69          114        -67         -20        BSU14520
ykzW       120     Bacillus_subtilis_QB928_uid173926                       1       25    adeC        -74    0.01    69          114        -67         -20        BSU14520
ykzW       120     Bacillus_subtilis_BSn5_uid62463                         0.98    42    BSn5_19330  -72    0.01    69          114        -68         -21        BSU14520
ykzW       120     Bacillus_subtilis_BSP1_uid184010                        0.98    31    A7A1_0576   -72    0.01    69          114        -68         -21        BSU14520
ykzW       120     Bacillus_subtilis_natto_BEST195_uid183001               0.98    40    adeC        -72    0.01    69          114        -67         -20        BSU14520
ykzW       120     Bacillus_subtilis_RO_NN_1_uid158879                     0.98    39    ade         -72    0.01    69          114        -68         -21        BSU14520
ykzW       120     Bacillus_subtilis_spizizenii_W23_uid51879               0.96    53    adeC        -71    0.02    69          114        -68         -21        BSU14520
ykzW       120     Bacillus_subtilis_spizizenii_TU_B_10_uid73967           0.97    84    ade         -67    0.02    69          114        -68         -21        BSU14520

ykzW       120     Bacillus_subtilis_spizizenii_W23_uid51879               0.96    3     nprE        -89    0       7           80         -53         22         BSU14700
ykzW       120     Bacillus_subtilis_168_uid57675                          1       13    nprE        -82    0       7           75         -39         22         BSU14700
ykzW       120     Bacillus_subtilis_QB928_uid173926                       1       12    nprE        -82    0       7           75         -39         22         BSU14700
ykzW       120     Bacillus_subtilis_BSn5_uid62463                         0.98    16    BSn5_19435  -77    0       7           74         -38         22         BSU14700
ykzW       120     Bacillus_subtilis_BSP1_uid184010                        0.98    15    A7A1_0557   -77    0       7           74         -38         22         BSU14700
ykzW       120     Bacillus_subtilis_natto_BEST195_uid183001               0.98    13    nprE        -77    0       7           74         -38         22         BSU14700
ykzW       120     Bacillus_subtilis_RO_NN_1_uid158879                     0.98    14    I33_1652    -77    0       7           74         -38         22         BSU14700
ykzW       120     Bacillus_amyloliquefaciens_FZB42_uid58271               0.87    34    nprE        -67    0.02    7           80         -51         22         BSU14700
ykzW       120     Bacillus_amyloliquefaciens_plantarum_AS43_3_uid183682   0.87    38    B938_07575  -67    0.02    7           80         -51         22         BSU14700
ykzW       120     Bacillus_amyloliquefaciens_plantarum_CAU_B946_uid84215  0.87    34    npr         -67    0.02    7           80         -51         22         BSU14700
ykzW       120     Bacillus_JS_uid162189                                   0.94    67    MY9_1610    -67    0.02    7           80         -54         22         BSU14700
ykzW       120     Bacillus_subtilis_spizizenii_TU_B_10_uid73967           0.97    96    GYO_1812    -66    0.03    7           60         -30         22         BSU14700

ykzW       120     Bacillus_subtilis_168_uid57675                          1       11    cysJ        -84    0       10          82         -54         17         BSU33440
ykzW       120     Bacillus_subtilis_QB928_uid173926                       1       10    cysJ        -84    0       10          82         -45         26         BSU33440
ykzW       120     Bacillus_JS_uid162189                                   0.94    15    MY9_3390    -77    0       12          96         -55         30         BSU33440
ykzW       120     Bacillus_subtilis_spizizenii_W23_uid51879               0.96    24    cysJ        -77    0.01    12          74         -15         45         BSU33440
ykzW       120     Bacillus_amyloliquefaciens_DSM_7_uid53535               0.87    16    cysJ        -72    0.01    25          73         -41         5          BSU33440
ykzW       120     Bacillus_amyloliquefaciens_LL3_uid158133                0.87    18    cysJ        -72    0.01    25          73         -41         5          BSU33440
ykzW       120     Bacillus_amyloliquefaciens_TA208_uid158701              0.87    15    cysJ        -72    0.01    25          73         -41         5          BSU33440
ykzW       120     Bacillus_amyloliquefaciens_XH7_uid158881                0.87    21    yvgR        -72    0.01    25          73         -41         5          BSU33440
ykzW       120     Bacillus_subtilis_BSn5_uid62463                         0.98    40    BSn5_07640  -72    0.01    12          74         -15         45         BSU33440
ykzW       120     Bacillus_subtilis_BSP1_uid184010                        0.98    36    A7A1_2561   -72    0.01    12          74         -15         45         BSU33440
ykzW       120     Bacillus_subtilis_natto_BEST195_uid183001               0.98    45    yvgR        -72    0.01    12          74         -15         45         BSU33440
ykzW       120     Bacillus_subtilis_RO_NN_1_uid158879                     0.98    44    I33_3462    -72    0.01    12          74         -15         45         BSU33440
ykzW       120     Bacillus_subtilis_spizizenii_TU_B_10_uid73967           0.97    89    GYO_3656    -67    0.02    12          55         4           45         BSU33440
ykzW       120     Bacillus_halodurans_C_125_uid57791                      0.9     64    BH0609      -52    0.02    1           28         -64         -31        BSU33440

Table 7. Selection of conserved predicted SR1 targets
Same legend as for Table 6.

SigW-dependent expression of S462 is consistent with its predicted functions
S462 was first reported by Nicolas et al., who proposed this srRNA to be regulated by the three alternative sigma factors SigWXY (17). Because of the similarity in the sequence motifs recognized by these three sigma factors, the applied promoter cluster analysis was unable to distinguish the respective promoters. However, a subsequent expression analysis that specifically focused on unravelling the SigW regulon showed that S462 expression is most likely dependent on SigW (63).
The SigW regulon is expressed at a low basal level during exponential growth, but is induced in response to cell envelope stress. Such stresses can be provoked by antibiotics, alkaline shock and salt shock (63). To confirm the SigW-dependent regulation of S462, we fused the start of its sequence with gfp in single copy at its native locus using the chromosomal integration plasmid pBaSysBioII (58). As shown by GFP expression analysis, the S462 promoter was indeed expressed at low levels during exponential growth on LB medium, and GFP expression from this promoter was no longer detectable when sigW was deleted (Figure 14A). Interestingly, the S462 gene is situated next to that encoding HtrA, an important quality control protease under CssR control, and it is known that the CssR and SigW responses are linked (64). Target predictions on S462 seem consistent with a role of S462 in the SigW-dependent cell envelope stress response. In all B. subtilis target predictions the categories cell envelope and cell division - capsule biosynthesis and degradation and acquisition of nucleotides were found to be enriched (Table 3). The first category is clearly linked to the function of SigW and a related category is also enriched in the conserved target predictions (cell envelope and cell division - cell shape) (Table 5). The conserved predicted targets that are related to cell envelope processes include the essential gene for MraY (2 flags) involved in peptidoglycan precursor biosynthesis, the cell envelope stress protein YceH (2 flags) and the cell shape determinant MreD (2 flags) (Table 8). The second enriched conserved category, acquisition of nucleotides, may also be related to cell envelope stress, since Yu et al. reported a strong induction of both the SigW regulon and nucleotide metabolism genes upon exposure of B. subtilis to the cell wall-acting antibiotic fusaricidin (65).
The nucleotide metabolism genes predicted to be targeted by S462 encode the xanthine permease PbuX (2 flags), the uracil permease PyrP (3 flags), and the essential phosphoribosylpyrophosphate synthetase Prs (3 flags). Our target predictions thus suggest that S462 could be partly responsible for the link between the SigW regulon and nucleotide metabolism (65) by acting as an srRNA on one or more targets. The predicted secondary structure of S462 is weak compared to other possible structures with the same nucleotide composition and length (positive Z-score of 0.68). In addition, the

Figure 14. Promoter activity, structure, and predicted ORF of S462
A) Promoter activity plots of the S462 promoter-gfp fusion and the same fusion in combination with a sigW deletion in cells grown on LB in a 96-well plate. Promoter activity was determined by GFP fluorescence readings as described in the methods section. S462 is expressed at a relatively low level (e.g. compare with Figure 12) and S462 expression is absent in a sigW deletion mutant. The plot shows representative data. B) Conserved structure of S462 as predicted with IntaRNA (24). The corresponding sequence alignment is shown in Figure S7. Red color indicates highly conserved predicted base-pairing. Note that the structure contains many base-pairs that are not conserved. C) Predicted ORF in the S462 sequence. The indicated ORF is 59 or 61 amino acids in length depending on the ATG start codon that is used. The second start codon seems to have a better spacing with respect to the upstream putative GGAGG ribosome binding site. D) TMHMM prediction of transmembrane segments (66) in the S462 encoded peptide S462-P. Two transmembrane helices that are separated by a small interior loop are predicted with high probability.
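The Z-score quoted above for the predicted S462 structure compares the native sequence with sequences of the same composition and length. A minimal sketch of such a score is given below. It assumes that the score is computed from folding free energies of the native and of shuffled sequences, which is a common convention but is not confirmed by this section; all energy values are invented and no folding program is called.

```python
import statistics


def structure_z_score(native_energy, shuffled_energies):
    """Z-score of a native folding energy against shuffled-sequence energies.

    With folding free energies (more negative = more stable), a positive
    score means the native structure is less stable than the shuffled average.
    """
    mean = statistics.mean(shuffled_energies)
    sd = statistics.stdev(shuffled_energies)  # sample standard deviation
    return (native_energy - mean) / sd


# Hypothetical free energies in kcal/mol, for illustration only.
native = -95.0
shuffled = [-100.0, -104.0, -96.0, -98.0, -102.0]

z = structure_z_score(native, shuffled)
print(f"Z-score: {z:.2f}")
```

In practice the shuffled set would contain many more sequences than the five used here, so that the mean and standard deviation are estimated reliably.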
Queryname  Length  Org                                            FracId  Rank  Name        Score  Pvalue  sRNA_start  sRNA_stop  mRNA_start  mRNA_stop  BDBH
S462       367     Bacillus_subtilis_BSn5_uid62463                0.98    4     mraY        -111   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_BSP1_uid184010               0.99    2     A7A1_0507   -111   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_natto_BEST195_uid183001      0.98    1     mraY        -111   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_RO_NN_1_uid158879            0.97    2     mraY        -111   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_168_uid57675                 1       3     mraY        -105   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_QB928_uid173926              1       3     mraY        -105   0.00    74          172        -51         36         BSU15190
S462       367     Bacillus_subtilis_spizizenii_TU_B_10_uid73967  0.92    3     mraY        -104   0.00    71          158        -41         36         BSU15190
S462       367     Bacillus_JS_uid162189                          0.88    14    MY9_1663    -89    0.00    77          154        -35         45         BSU15190

S462       367     Bacillus_subtilis_spizizenii_TU_B_10_uid73967  0.92    2     yceH        -111   0.00    86          192        -56         36         BSU02940
S462       367     Bacillus_JS_uid162189                          0.88    20    MY9_0300    -87    0.00    90          143        -14         36         BSU02940
S462       367     Bacillus_subtilis_168_uid57675                 1       44    yceH        -84    0.01    108         142        -14         23         BSU02940
S462       367     Bacillus_subtilis_BSP1_uid184010               0.99    47    A7A1_1904   -84    0.01    108         142        -14         23         BSU02940
S462       367     Bacillus_subtilis_natto_BEST195_uid183001      0.98    33    yceH        -84    0.01    108         142        -14         23         BSU02940
S462       367     Bacillus_subtilis_QB928_uid173926              1       45    B657_02940  -84    0.01    108         142        -14         23         BSU02940
S462       367     Bacillus_subtilis_RO_NN_1_uid158879            0.97    48    yceH        -84    0.01    108         142        -14         23         BSU02940
S462       367     Bacillus_subtilis_BSn5_uid62463                0.98    235   BSn5_13050  -73    0.04    108         142        -14         23         BSU02940

S462       367     Bacillus_JS_uid162189                          0.88    5     MY9_2785    -96    0.00    122         161        -33         6          BSU28010
S462       367     Bacillus_subtilis_spizizenii_W23_uid51879      0.89    22    mreD        -84    0.00    67          96         -33         -5         BSU28010
S462       367     Bacillus_subtilis_168_uid57675                 1       66    mreD        -82    0.01    121         159        -34         6          BSU28010
S462       367     Bacillus_subtilis_BSP1_uid184010               0.99    58    A7A1_0442   -82    0.01    121         159        -34         6          BSU28010
S462       367     Bacillus_subtilis_natto_BEST195_uid183001      0.98    52    mreD        -82    0.01    121         159        -34         6          BSU28010
S462       367     Bacillus_subtilis_QB928_uid173926              1       65    mreD        -82    0.01    121         159        -34         6          BSU28010
S462       367     Bacillus_subtilis_RO_NN_1_uid158879            0.97    63    mreD        -82    0.01    121         159        -34         6          BSU28010
S462       367     Bacillus_subtilis_BSn5_uid62463                0.98    142   BSn5_04820  -76    0.02    121         159        -34         6          BSU28010

Table 8. Selection of conserved predicted S462 targets
Same legend as for Table 6.

predicted S462 structure is not highly conserved, since it contains a large number of base-pairs with low evolutionary conservation (Figure 14B). The lower level of secondary structure might be related to the peptide-encoding potential of S462. The respective ORF, S462-P (with P for peptide), is predicted to be either 59 or 61 amino acids in length due to the presence of two potential start codons, and it contains a GGAGG ribosome binding site upstream of the first possible start codon (Figure 14C). As was discussed above, S462 is under control of the cell envelope stress sigma factor SigW and most of its predicted srRNA functions also seem to be cell envelope related. We therefore wondered whether S462-P might also be linked to cell envelope processes. To test this, we used the TMHMM webserver (http://www.cbs.dtu.dk/services/TMHMM/) for prediction of transmembrane domains (66) in the S462-P sequence. Indeed, S462-P contains two significantly predicted transmembrane domains, suggesting that it is a small integral membrane protein (Figure 14D). Future studies will have to reveal whether S462 is indeed a dual-function srRNA involved in the cell envelope stress response of B. subtilis.

Conserved RsaE/S415 functions
The evolutionarily most conserved known srRNA in B. subtilis is RsaE/S415. RsaE/S415 was first identified in S. aureus through a bioinformatics screen of this organism’s intergenic regions, and the expression of RsaE/S415 was subsequently confirmed by Northern blotting (18). The authors of this study also noted its strong conservation in the Bacillaceae (18).
In a later investigation, RsaE was found to downregulate (genes for) numerous metabolic enzymes (19). In both studies, direct RsaE-target interactions were tested by gel retardation analysis. However, the results remained inconclusive for many of the putative targets. It was therefore suggested that in vivo there may be an unknown RNA chaperone required for these interactions (19). In our present study, the RsaE/S415 sequence was confirmed to be highly conserved in the included Bacillus genomes, with the core of the sequence displaying the highest conservation (Figure S8). More generally, the presence of RsaE/S415 in organisms ranging from S. aureus to B. subtilis opens up the possibility for comparative studies on the conservation of RsaE/S415 functions. Expression data from Nicolas et al. suggest high RsaE/S415 expression under many experimental conditions, being on average most prominent in the exponential growth phase (17). However, the concordance between triplicates in these expression data was sometimes low for RsaE/S415. Since the genome-wide data were highly concordant (17), this may be a functionally relevant aspect of RsaE/S415 expression. Beyond this, RsaE in S. aureus was shown to be highly induced in the transition between the exponential and stationary growth phases (18), and we therefore wondered whether the expression pattern of B. subtilis RsaE/S415 could be similar to the pattern of expression in S. aureus. To test this, we again employed the pBaSysBioII plasmid to construct a chromosomally integrated single-copy promoter gfp fusion. Analysis of this strain grown on LB medium showed a highly homogeneous promoter activity for RsaE/S415 (Figure 15, A and B). The RsaE/S415 promoter activity remained prominent throughout the exponential growth phase and dropped to levels below the detection limit in the transition and stationary growth phases (Figure 15A). The latter is in contrast to what was observed in S. aureus (18).
This shows that the pattern of expression of RsaE/S415 is probably not conserved from S. aureus to B. subtilis. Yet, we cannot exclude the possibility that the observed difference in expression patterns is caused by differences in the applied experimental conditions. Target predictions for RsaE/S415 showed enrichment of the functional categories electron transport and ATP synthesis - respiration (p-value <0.001) and trace metal homeostasis (p-value <0.04) (Table 3). Especially the significance of the first category is striking. This relates to the predicted target genes ctaC, qcrC, ndhF, scuA/ypmQ, ctaE, nasE and qcrB, all of which encode components of electron transport chains for oxidative phosphorylation or are required for cytochrome maturation. The NADH dehydrogenase gene ndhF has four flags (Table 2, Table S1). This is due to the conservation of the predicted target interaction (Table 9) and presence in the enriched respiration-specific B-cluster B56 (Table 2). The latter suggests that RsaE/S415 has an effect on the expression level of ndhF. This effect would, for instance, take place under conditions of heat stress or nitrate respiration, where ndhF is specifically expressed (17).

Figure 15. RsaE/S415 promoter activity and nisin-sensitivity of an RsaE/S415 mutant
A) Promoter activity plot for an RsaE/S415-gfp gene fusion in cells grown in LB medium in a 96-well plate. RsaE/S415 expression is high (compare to Figure 12) and prominent throughout the exponential phase. B) Flow cytometry histogram shown for the GFP production in cells carrying the RsaE/S415-gfp promoter fusion. Autofluorescence of the parental strain is indicated in grey, and representative data is shown. C) Increased sensitivity of an RsaE/S415 deletion mutant to incubation with nisin. The survival of cells challenged with nisin was assessed by live/dead staining and flow cytometry analyses.
Representative flow cytometry data indicating a shift in color spectrum upon live/dead staining is shown. A shift toward the left implies an increase in the number of cells with permeabilized membranes.

Although the expression pattern of RsaE/S415 may not be conserved from S. aureus to B. subtilis, it is conceivable that there are functional processes that have been maintained in evolution. To investigate whether this is the case, we first examined the results of an expression analysis previously reported for an RsaE mutant in S. aureus (18). In this analysis 86 differentially expressed mRNAs were identified. These mRNAs belonged to multiple categories, including genes related to lipid metabolism, cofactor metabolism, energy transport, and cell envelope biogenesis (18). Interestingly, the categories cofactor metabolism and energy transport are highly related to the enriched conserved target category electron transport and ATP synthesis - respiration of B. subtilis 168 (Table 5). To look for functional categories that are enriched beyond B. subtilis 168, we analyzed these evolutionary target predictions again without the requirement of the target also being predicted in B. subtilis 168. This gives an overview of predicted targets in the whole clade of Bacillus genomes. Remarkably, the conserved predicted targets thus retrieved are enriched for the functional categories lipid utilization (p-value <0.01), biosynthesis of cofactors (p-value = 0.01), electron transport and ATP synthesis - respiration (p-value = 0.01), and coping with hypo-osmotic stress (p-value = 0.04). It thus seems likely that there is a functional conservation of RsaE/S415 ranging from S. aureus to B. subtilis in the functional processes lipid metabolism and cofactor metabolism. We next inspected these predictions further at the level of predicted targets.
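Enrichment p-values for functional categories, such as those reported above, are often obtained from a one-sided hypergeometric test. The sketch below illustrates that calculation. The choice of test is an assumption, since the method used is not described in this section, and the gene counts are hypothetical and not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_category, n_targets, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_genome:   number of genes considered in total
    n_category: genes annotated with the functional category
    n_targets:  predicted targets drawn from the genome
    n_overlap:  predicted targets that fall in the category
    """
    upper = min(n_category, n_targets)
    total = comb(n_genome, n_targets)
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 4000 genes, 40 in the category, 100 predicted targets,
# 7 of which fall in the category (about 1 would be expected by chance).
p_value = hypergeom_enrichment_p(4000, 40, 100, 7)
print(f"enrichment p-value: {p_value:.2e}")
```

When many categories are tested for each srRNA, a correction for multiple testing would normally be applied on top of such raw p-values.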
For the process of lipid metabolism, this led to the observation that genes of the fad operon involved in acetyl-CoA metabolism were deregulated in S. aureus (fadABE operon), and that some of these genes are also conserved predicted targets in Bacillus species. In B. subtilis the specific predicted targets are fadH, fadE and fadN. The possible involvement of RsaE/S415 in acetyl-CoA metabolism is interesting, since it could perhaps partly explain the global changes in metabolism observed in an S. aureus mutant defective in RsaE/S415 (19). In addition, acetyl-CoA is a main link between central carbon metabolism and lipid metabolism. The implication of RsaE/S415 in the regulation of lipid metabolism and the predicted regulation of many membrane proteins, for instance those involved in respiration, suggested possible cell envelope changes in a Bacillus mutant of RsaE/S415. However, it is not clear whether these changes would also take place in B. subtilis 168. We therefore scrutinized our target predictions for RsaE/S415 in B. subtilis to check whether there are any possible links to cell envelope processes. Indeed, there are three predicted targets linked to cell envelope processes, namely dacB, yrpC and gcaD. DacB is a D-alanyl-D-alanine carboxypeptidase and YrpC is a glutamate racemase involved in peptidoglycan precursor biosynthesis. Both predicted targets have two flags, one for co-expression and one for peaks correlation. The third predicted target gcaD is an essential cell wall metabolism gene and contains an additional conserved flag (three flags) (Table S1, Table S2). We attempted to detect changes in the state of the cell envelope of an RsaE/S415 mutant by exposing it to nisin. Nisin is a post-translationally modified 34-amino acid polycyclic antibacterial peptide, which targets the essential Lipid II in the cell membrane to form pores in this membrane (67). A change in Lipid II abundance in the membrane will thus lead to a change in nisin sensitivity.
We analyzed nisin sensitivity by a live-dead staining assay. This live-dead stain relies on the penetration of a fluorescent dye into cells with a compromised cell membrane. Using this assay, we indeed observed an increased sensitivity to nisin of the RsaE/S415 mutant compared to its parental strain (Figure 15C). Using the same assay, we also identified an increased ethanol sensitivity of the RsaE/S415 mutant (data not shown). These observations suggest that there are indeed changes in the cell envelope architecture that are dependent on RsaE/S415. More targeted experiments are, however, required to directly link the observed phenotypes to deregulation of the predicted mRNA targets of RsaE/S415. S. aureus RsaE is probably involved in stationary phase adaptation, which is aimed at reducing enzymes from central metabolism and increasing the amino acid pool (19). The latter publication reported that in this adaptation many of the RsaE-modulated genes are also dependent on CcpA (19). CcpA is the master regulator of carbon catabolite repression in many Gram-positive bacteria, including S. aureus and B. subtilis. We therefore examined our predictions to see whether such a link with CcpA might also be present in B. subtilis, which would suggest a role for RsaE/S415 in the central carbon metabolism of B. subtilis. We did this despite the fact that RsaE/S415 is expressed in the exponential growth phase (Figure 15A), and is therefore unlikely to share the stationary phase function of S. aureus RsaE. The general target predictions for RsaE/S415 indeed show an enrichment of targets from the CcpA regulon (Table 4). This enrichment is due to predicted interactions with rsbV, citM, ccpC, ctaC, ylbP, odhA, levF, cstA, araQ, araA, acuB and licB (Table S1). The citM gene encodes an Mg2+-citrate transporter and the protein product of ccpC represses citB and citZ. This suggests a role for B.
subtilis RsaE/S415 in citrate metabolism, as was also found for S. aureus. We additionally looked for other targets involved in core carbon metabolism. Four core carbon metabolism genes are predicted targets for RsaE/S415. These genes encode the 2-oxoglutarate dehydrogenase OdhA, succinate dehydrogenase SdhC, the repressor of citB and citZ CcpC, and PdhD, which is a subunit of both the pyruvate dehydrogenase and the 2-oxoglutarate dehydrogenase complexes. These four potential targets all have two flags due to conditional expression and conservation, except for ccpC, which is not conserved. Interestingly, OdhA and PdhD can be part of the same complex, and ccpC and odhA are also part of the CcpA regulon. We therefore decided to construct a translational GFP reporter, consisting of an in-frame fusion between the first 80 amino acids of OdhA and GFP. The respective gene fusion was then placed under control of the native odhA promoter on plasmid pRM3. GFP reporter activity of this construct was identified solely in the exponential growth phase on LB medium (Figure 16A). Deregulated OdhA-GFP expression compared to the parental strain was observed when GFP activity was assayed in an RsaE/S415 mutant background. This deregulation was characterized by a >2-fold increase in maximal GFP reporter activity (Figure 16A). In addition, the reporter was active for a slightly longer period of time in the RsaE/S415 deletion background. This deregulation in GFP reporter expression was complemented by ectopic expression of RsaE/S415 under control of its native promoter from the amyE locus (Figure 16A). In fact, the RsaE/S415-complemented strain exhibited an earlier decrease in OdhA-GFP reporter activity than the parental strain (Figure 16A). Since this is the opposite of what we observed for the RsaE/S415 mutant, this observation suggests that RsaE/S415 is expressed at a (slightly) higher level from the amyE locus compared to its native genomic locus.
This has also been observed for complementation of another srRNA (data not shown). We also integrated the RsaE/S415 complementation construct in the parental strain 168 to create a strain with an additional copy of RsaE/S415. This extra RsaE/S415 copy did not affect the OdhA-GFP reporter expression compared to that of the parental strain, which may suggest that repression by RsaE/S415 is already at its maximum in the parental strain. Despite this, these observations suggest that odhA is a direct target of RsaE/S415. To verify whether odhA could indeed be a direct RsaE/S415 target, we further inspected the predicted srRNA-mRNA interaction region. The predicted odhA interaction is part of the most conserved region of the RsaE/S415 molecule (Figure S8; Figure 16B). In B. subtilis 168, the predicted interaction spans from the first codon of odhA (+3) until 57 bp after the start of the ORF. This predicted interaction region is highly conserved in related bacteria (Table 9). Notably, it has been reported that loop-exposed bases of srRNAs are more often responsible for regulation than bases in stems (31). Six loop regions of RsaE/S415 are part of the predicted interaction with odhA. The third of these loops contains the UCCC motif identified for RsaE/S415 by Geissmann et al. (18). We checked whether target predictions could help to suggest a seed region around this motif, but did not identify a preferential interaction region for RsaE/S415 (Figure S9). Nevertheless, as an effect of the three consecutive G-C base-pairs, this third loop contributes most strongly to the low free energy of the predicted interaction. With the goal of disrupting the putative interaction, we therefore introduced a point mutation in RsaE/S415 by a C to G substitution in the middle of the UCCC loop. The resulting mutant RNA was designated RsaE/S415*. In addition, we constructed a compensatory mutation in the plasmid-borne odhA-gfp reporter construct, designated odhA-gfp*.
The subsequent OdhA-GFP(*) expression analyses showed that the odhA-gfp* construct is still regulated by the wild-type RsaE/S415 (data not shown), suggesting that the interaction is too strong to be disrupted by a single base substitution. Furthermore, the mutated RsaE/S415* construct affected neither the expression of the wild-type odhA-gfp reporter nor that of the mutated odhA-gfp* reporter (data not shown). This suggests either that the srRNA is destabilized by the introduced point mutation or that the UCCC loop is not critically involved in an interaction between RsaE/S415 and odhA. Further experimental analyses are required to unravel the molecular basis for these observations and to validate odhA as a direct target of RsaE/S415. Notably, the observed deregulation of OdhA by the RsaE/S415 deletion may also be due to indirect effects, for instance other changes in core carbon metabolism or a disturbed acetyl-CoA metabolism. However, the strong and conserved predicted interaction between RsaE/S415 and odhA seems to argue for a direct srRNA-mRNA interaction.

Figure 16. OdhA is deregulated in an RsaE/S415 mutant. A) GFP reporter activity (top panel) and growth (lower panel) in cells of the indicated strains grown on LB medium in 96-well plates. All cells express a translational fusion between OdhA and GFP. Expression of the OdhA-GFP reporter is >2-fold higher in the RsaE/S415 deletion strain than in the parental strain (WT), and can be reversed by ectopic expression of RsaE/S415 under control of its native promoter. B) Conserved structure of RsaE/S415 as predicted with IntaRNA (24). The corresponding sequence alignment is shown in Figure S8. Red color indicates high conservation of secondary structure. The core structure of RsaE/S415 contains many conserved base-pairs, but the top of the structure is not conserved in all species. The blue arrow indicates the cytosine base that was changed into a guanine, and that possibly destabilized RsaE/S415. C) Predicted TargetRNA_v1 interaction between RsaE/S415 and odhA. Predicted RNAfold loop regions are indicated in red. The mutated base pair, indicated with the arrow in panel B, is marked in blue.

Table 9. Selection of conserved predicted RsaE/S415 targets. Same legend as for Table 6. For all rows, QueryName is S415 and Length is 126.

Predicted target ndhF (BDBH BSU01830)
Org | FracId | Rank | Name | Score | Pvalue | sRNA_start | sRNA_stop | mRNA_start | mRNA_stop | BDBH
Bacillus_JS_uid162189 | 1 | 73 | MY9_0188 | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_168_uid57675 | 1 | 72 | ndhF | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_BSn5_uid62463 | 1 | 75 | BSn5_12510 | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_BSP1_uid184010 | 1 | 80 | A7A1_3299 | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_natto_BEST195_uid183001 | 1 | 78 | ndhF | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_QB928_uid173926 | 1 | 61 | ndhF | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_RO_NN_1_uid158879 | 1 | 66 | I33_0230 | -75 | 0.00 | 16 | 55 | -1 | 36 | BSU01830
Bacillus_subtilis_spizizenii_W23_uid51879 | 1 | 67 | ndhF | -75 | 0.00 | 14 | 53 | -1 | 36 | BSU01830
Bacillus_atrophaeus_1942_uid59887 | 0.98 | 155 | BATR1942_19550 | -70 | 0.01 | 13 | 53 | -1 | 40 | BSU01830
Bacillus_subtilis_spizizenii_TU_B_10_uid73967 | 1 | 235 | GYO_0376 | -66 | 0.02 | 14 | 53 | -1 | 36 | BSU01830

Predicted target odhA (BDBH BSU19370)
Org | FracId | Rank | Name | Score | Pvalue | sRNA_start | sRNA_stop | mRNA_start | mRNA_stop | BDBH
Bacillus_JS_uid162189 | 1 | 52 | MY9_2119 | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_168_uid57675 | 1 | 52 | sucA | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_BSn5_uid62463 | 1 | 49 | sucA | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_natto_BEST195_uid183001 | 1 | 53 | kgd | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_QB928_uid173926 | 1 | 45 | odhA | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_RO_NN_1_uid158879 | 1 | 48 | sucA | -78 | 0.00 | 70 | 119 | 3 | 50 | BSU19370
Bacillus_subtilis_spizizenii_TU_B_10_uid73967 | 1 | 47 | sucA | -78 | 0.00 | 68 | 117 | 3 | 50 | BSU19370
Bacillus_subtilis_spizizenii_W23_uid51879 | 1 | 47 | odhA | -78 | 0.00 | 68 | 117 | 3 | 50 | BSU19370
Bacillus_amyloliquefaciens_DSM_7_uid53535 | 0.98 | 66 | odhA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_FZB42_uid58271 | 0.98 | 62 | sucA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_LL3_uid158133 | 0.98 | 73 | odhA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_plantarum_AS43_3_uid183682 | 0.98 | 65 | sucA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_plantarum_CAU_B946_uid84215 | 0.98 | 56 | sucA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_plantarum_YAU_B9601_Y2_uid159001 | 0.98 | 64 | odhA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_TA208_uid158701 | 0.98 | 67 | odhA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_XH7_uid158881 | 0.98 | 70 | sucA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_amyloliquefaciens_Y2_uid165195 | 0.98 | 68 | odhA | -73 | 0.00 | 57 | 104 | -5 | 50 | BSU19370
Bacillus_cytotoxicus_NVH_391_98_uid58317 | 0.95 | 48 | sucA | -71 | 0.00 | 12 | 39 | -45 | -20 | BSU19370
Bacillus_atrophaeus_1942_uid59887 | 0.98 | 249 | sucA | -65 | 0.02 | 74 | 115 | -5 | 46 | BSU19370
Bacillus_coagulans_2_6_uid68053 | 0.97 | 95 | BCO26_1275 | -63 | 0.01 | 42 | 69 | 17 | 44 | BSU19370
Bacillus_coagulans_36D1_uid54335 | 0.97 | 118 | Bcoa_3252 | -63 | 0.01 | 42 | 69 | 17 | 44 | BSU19370

Conclusion

A major aim of the studies presented in this chapter was to establish a bioinformatics pipeline for the prediction of srRNA regulatory functions in B. subtilis. As was exemplified with the srRNAs FsrA/S512, SR1, S462 and RsaE/S415, such predicted regulatory functions can indeed be extracted from various elements in the presented predictions, either by inspecting these elements separately or in combination. Based on the present results, we encourage a further exploration of the predicted srRNA functions in B.
subtilis as this may lead to a much deeper understanding of the mechanisms underlying srRNA regulation in B. subtilis in particular and in Gram-positive bacteria in general. For this purpose, we have provided all data files, the R code used for the data analysis, and instructions on how to browse through these predictions in the supplementary material. It is becoming increasingly clear that srRNAs may have many origins. They can be transcribed from independent promoters situated in intergenic genomic regions, but they can also originate from RNA processing events and complex control of transcriptional termination of operons or even 3’UTRs (32, 41). Nor can it be excluded that there are asRNAs that have a function in trans. This means that there are, most likely, many more putative srRNAs than the 63 that were included in our present selection. Once these are identified, the considerations for the study of putative srRNAs outlined in this chapter will also be of relevance for these srRNAs. There are currently multiple algorithms available for srRNA target predictions. Some of these have been compared for different Gram-negative bacteria (26). We used TargetRNA_v1 to provide a set of unbiased srRNA target predictions for the Gram-positive bacterium B. subtilis. This set of predictions was further evaluated with the reported analyses to identify the most likely srRNA targets. Such analyses can also be applied to other sets of target predictions, including those from other target prediction algorithms. Furthermore, it would also be useful to integrate precise knowledge on mRNA start sites in the target predictions to refine the range of predictions. This seems feasible, for instance by implementation of RNAseq data (16), but the incorporation of such data was beyond the scope of the present studies.
Lastly, we conclude from the present analyses that focusing on the identification of srRNA phenotypes to subsequently link these to the deregulation of a particular target can only be successful when the srRNA in question is sufficiently important to lead to a phenotype when deleted. The fine-tuning nature of srRNA regulation, the abundance of potentially functionally redundant srRNAs, as well as the relatively low sensitivity of phenotypic assays make it very unlikely that clear phenotypes can be observed for every srRNA mutant. For example, the S462 RNA is under control of SigW, but a SigW mutant exhibits no particular phenotype; it does so only in combination with deletions of other alternative sigma factors (68, 69). It is therefore not expected that a strong S462 phenotype will be identified. Since only mRNA levels will change upon srRNA deletion, the phenotypes of srRNA mutant strains are also expected to be less prominent than phenotypes of mutants that lack, for instance, transcription factors. Instead of focusing only on srRNA mutant phenotypes, studies on the function of putative srRNAs can also include a pulsed overexpression approach of the respective srRNAs. The differentially expressed genes that are identified upon pulse-induced srRNA overexpression can then be compared to target predictions, for instance the ones reported here. Such a combination of bioinformatics predictions and targeted experiments is expected to greatly advance our understanding of srRNA regulation in B. subtilis.

Materials & Methods

Growth conditions and strain construction

LB medium was used for all experiments and cloning. When required, E. coli media were supplemented with ampicillin (100 μg ml⁻¹) or chloramphenicol (10 μg ml⁻¹). Media for B. subtilis were supplemented with phleomycin (4 μg ml⁻¹), kanamycin (20 μg ml⁻¹), tetracycline (5 μg ml⁻¹), chloramphenicol (10 μg ml⁻¹) or spectinomycin (100 μg ml⁻¹), or combinations thereof. E. coli and B.
subtilis strains and plasmids used in this study are listed in Table S4A and oligonucleotides in Table S4B. E. coli TG1 was used for all cloning procedures. All B. subtilis strains were based on the trpC2-proficient parental strain 168 (70). B. subtilis transformations were performed as described previously (71). The isogenic FsrA/S512 and RsaE/S415 mutants were constructed according to the method described by Tanaka et al. (72). Promoter fusions for S462, FsrA/S512, and RsaE/S415 were constructed using the integrative Ligation Independent Cloning (LIC) plasmid pBaSysBioII (58). For this purpose, the start of the respective RNA segment was fused with GFP. A minimum of three clones were checked to exclude possible multi-copy integration of the pBaSysBioII plasmid. These promoter fusions were combined with deletion mutants by transformation of the respective genomic DNA. The translational OdhA-GFP reporter fusion was constructed via overlap PCR followed by gel-purification of the respective amplified DNA fragment and LIC in the plasmid pRM3 (73) (Chapter 7). RsaE/S415 was complemented in trans by integration of the complete RsaE/S415 sequence under control of its native promoter in the amyE locus. For this, RsaE/S415 was first LIC-cloned into pRMC (Mars et al.; Chapter 5).

Assaying reporter strains

GFP fluorescence and growth (600 nm) were monitored at 10 min intervals for cells grown in 96-well plates in a Biotek® Synergy 2 plate reader as previously described (58). Autofluorescence of the parental strain was subtracted and promoter activity was computed by subtracting the fluorescence of the previous time-point from that of the measured time-point (as in Botella et al. (58)). Moving average filtering (filter function in R with filter=rep(1/10, 10)) was used for smoothing of the promoter activity plots.
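The promoter activity computation described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function names are hypothetical, and the way the 10-point window is aligned to the time axis (here only positions with a full window are returned) is an assumption that may differ from the behaviour of R's filter function.

```python
def promoter_activity(fluorescence, autofluorescence):
    """Subtract parental-strain autofluorescence, then difference
    consecutive time-points (current minus previous)."""
    corrected = [f - a for f, a in zip(fluorescence, autofluorescence)]
    return [later - earlier for earlier, later in zip(corrected, corrected[1:])]


def moving_average(values, window=10):
    """Equal-weight moving average (each weight is 1/window).
    Only positions covered by a complete window are returned (assumption)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical example values, not measured data.
gfp = [100, 104, 110, 118, 130]
background = [90, 91, 92, 93, 94]
activity = promoter_activity(gfp, background)
smoothed = moving_average(activity, window=2)
```

With a 10 min sampling interval, a window of 10 points corresponds to averaging over roughly 100 min of measurements.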
For flow cytometry, cultures were grown in shake flasks, sampled in the indicated growth phase and directly analyzed in an Accuri C6 flow cytometer.

Transcriptomics and 2D PAGE

Cultures for RNA isolations from cells grown on LB were sampled in the late exponential / transition phase (OD600nm 3.2). The cells were directly harvested in killing buffer and processed as described previously (17). RNA samples were hybridized to high-density tiling arrays and analyzed as described previously (17). For 2D PAGE analyses, the cells were harvested by centrifugation, resuspended in TE-Buffer (10 mM Tris, 1 mM EDTA, pH 7.5), and mechanically disrupted using a Precellys 24 homogenizer (PeqLab, Germany; 3 x 30 s at 6.5 m s⁻¹). Protein concentrations of extracts were determined using a ninhydrin-based assay (74, 75). Three biological and two technical replicates were analyzed. 2D PAGE was performed as previously described (76). 100 μg protein was loaded onto 18 cm IPG strips (pH 4-7, GE-Healthcare). After 2D PAGE, gels were fixed with 40 % (v/v) ethanol and 10 % (v/v) acetic acid for 1 to 2 h and subsequently stained with Flamingo™. Stained gels were scanned (Typhoon 9400, GE-Healthcare) and the resulting images were analyzed and quantified employing Delta2D 4.2 software (Decodon GmbH, Germany). For all spots detected on the gel, the spot volume was assigned to proteins, exported from the software and subsequently used for calibration of 2D gels as described previously (77). Spot quantities were calculated as % volume of each spot on one gel compared to all detected spots on the gel. The spot volumes were used to discover spots with significantly changed abundance (Student's t-test). Significantly changed spots were cut from the gel, protein digestion was performed in an Ettan Spot Handling Workstation (GE Healthcare), and samples were analyzed by MALDI-TOF-MS/MS using the Proteome Analyzer 4800 (Applied Biosystems) as described in (77).
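As an illustration of the spot quantification described above, the following Python sketch computes % volume per gel and a two-sample Student's t statistic. It is not the Delta2D workflow; the function names are hypothetical, the equal-variance (pooled) form of the t-test is an assumption, and the conversion of the statistic into a p-value is left out.

```python
import math
import statistics


def percent_volume(spot_volumes):
    """Express each spot volume as a percentage of the summed volume
    of all detected spots on the same gel."""
    total = sum(spot_volumes)
    return [100.0 * v / total for v in spot_volumes]


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance (assumed form).
    Returns the statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, n_a + n_b - 2


# Hypothetical spot volumes on one gel, not measured data.
gel = percent_volume([1, 1, 2])
# Hypothetical % volumes of one spot in mutant and parental replicates.
t_value, dof = student_t([1, 2, 3], [2, 3, 4])
```

Normalizing within each gel before testing makes the comparison insensitive to differences in total staining intensity between gels.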
Nisin stress assay

Overnight cultures of cells grown in LB with the appropriate antibiotics were used to inoculate fresh LB broth at approximately a 1:100 dilution. When this pre-culture reached exponential phase, the samples were diluted to an OD600 of 0.05 in 100 ml bottles at a final volume of 5 ml LB. Growth was continued for approximately 70 min to an OD600 of 0.4–0.5. At this point nisin was added to a final concentration of 1 mM. The nisin was purchased from Biochemika (Fluka, Sigma Aldrich). After 10 min incubation with vigorous shaking, 1 ml of cells was pelleted by centrifugation for 1 min at maximal speed. The supernatant was discarded and cell pellets were gently re-suspended in 0.5 ml 0.85% NaCl prior to the addition of 1 ml live/dead stain (1:1 SYTO 9:propidium iodide; LIVE/DEAD BacLight Bacterial Viability and Counting Kit; Invitrogen). The membrane integrity of nisin-stressed cells was then determined by flow cytometry in an Accuri C6 Flow Cytometer as previously described (78).

Expression data and annotation of RNA segments

The tiling array expression data and annotation of RNA segments are available in the online supplement of Nicolas et al. (17). The re-classification of RNA segments into four groups, global conservation analyses and secondary structure predictions have been described in Chapter 3 of this thesis.

Target predictions and computation of functional enrichment

Putative srRNA targets were predicted with the program TargetRNA_v1 (23) on the B. subtilis 168 genome (Genbank: AL009126-3) near the 5’ region. The search region was defined as -75 bp; +50 bp around the start codon of the CDS or around the 5’-end of the other RNA segments reported by Nicolas et al. (17).
The additional command line arguments “-z 10000 -y 2 -l 6” were used to specify relaxed search and output criteria in order to obtain an array of predictions as complete as possible: maximum number of hits per query in output 10000 (corresponding to unlimited; default 100); no p-value cut-off (default 0.01); minimum number of consecutive base pairs in the interaction set to 6 (default 9). Enrichment of functional categories was expressed as a binomial p-value computed on the genes following the SubtiWiki annotation (specifically column “FuncName3” corresponding to the third level of the functional classification) (49). Enrichment of B-clusters within the predicted targets for a putative srRNA was computed similarly, based on the B-cluster annotation by Nicolas et al. (17). The B-clusters group genes or RNA segments with substantial similarity in expression profiles as determined by the Pearson correlation coefficient (the average pair-wise coefficient within these B-clusters is 0.6).

Evolutionarily conserved target analysis

In order to identify putative srRNA – predicted target pairs that are evolutionarily conserved, we used the 62 Bacillus genomes available in Genbank (as of January 31, 2013). On each of these genomes a BLAST search (Blastn v2.2.26 with default parameters) was conducted with the B. subtilis 168 sequence of the putative srRNA (in practice the expression segments classified as All-Indep in (17)). Genomes where a homologous segment (E-value < 0.001) was found were then subjected to a TargetRNA_v1 search with extended settings around the 5’UTR (-75 bp; +50 bp around the start codon and additional command line arguments “-z 250 -y 2 -l 6”) using as a query the sequence of the first high-scoring-pair of the first BLAST hit in that particular genome. A bidirectional best hit criterion (based on Blastp v2.2.26 with default parameters and E-value cut-off 0.001) was used to compare the predicted targets in the reference B.
subtilis 168 genome (Genbank: AL009126-3) with the predicted targets in the other genomes. The data were tabulated and filtered to only include genes that were predicted in 8 or more genomes, and were predicted in B. subtilis 168 with p-value ≤0.01. To inspect the complete predicted conserved RsaE regulon, the data for this srRNA were additionally analyzed without the latter criterion.

Target correlations under high sRNA expression (peaks expression)

We investigated the pair-wise expression correlation across biological conditions of the independent RNA segments and their predicted targets in B. subtilis 168 with two different analyses. In the first analysis, we simply computed the pair-wise Pearson correlation between a putative srRNA and a predicted target across the 269 hybridizations performed by Nicolas et al. (17). In the second analysis the Pearson correlation coefficient was computed only for the hybridizations around a peak of expression of the putative srRNA in the condition space. The goal of this was to address the problem that the correlation between srRNA and target might be strong only in a subset of conditions near the induction of the srRNA. In practice we defined the center of the peak as the hybridization in which the highest expression level was measured for the putative srRNA. We then proceeded with the step-wise aggregation of the other hybridizations starting with those whose global transcriptome is the most similar (correlated) to the hybridization selected as the center of the peak. We stopped this aggregation process after the inclusion of 6 hybridizations where the expression level of the putative srRNA is below a cut-off (quantile 75% of the distribution of the expression levels for this putative srRNA).
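The aggregation around the first expression peak, as described up to this point, can be sketched as follows. This is an illustrative Python reading of the text, not the R code supplied with the chapter: all function names are hypothetical, the quantile is computed by linear interpolation (an assumption), and the below-cut-off hybridizations that trigger the stop are kept in the peak (also an assumption).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumption)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def peak_hybridizations(srna, profiles, max_below=6, q=0.75):
    """srna[h]: srRNA expression in hybridization h.
    profiles[h]: global transcriptome vector of hybridization h.
    Returns hybridization indices in the peak, center first."""
    center = max(range(len(srna)), key=lambda h: srna[h])
    cutoff = quantile(srna, q)
    # Rank the other hybridizations by similarity to the peak center.
    others = sorted(
        (h for h in range(len(srna)) if h != center),
        key=lambda h: pearson(profiles[center], profiles[h]),
        reverse=True,
    )
    peak, below = [center], 0
    for h in others:
        peak.append(h)
        if srna[h] < cutoff:
            below += 1
            if below == max_below:  # stop after max_below low-expression inclusions
                break
    return peak


def peak_correlation(srna, target, peak):
    """Pearson correlation of srRNA and target restricted to the peak."""
    return pearson([srna[h] for h in peak], [target[h] for h in peak])
```

Restricting the correlation to the returned subset is what allows a relationship that holds only near srRNA induction to stand out against the global correlation over all hybridizations.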
For each srRNA segment, a second expression peak was also investigated when the first peak did not encompass all the hybridizations where the expression of the putative srRNA is in its upper 10% and not less than 4x its global maximum. The pair-wise Pearson correlations were then computed for the subset of hybridizations included in each peak. For each putative srRNA and in each analysis both the ranking and p-values of the pair-wise correlations were plotted.

Acknowledgements

This work was supported by the Commission of the European Union (projects LSHGCT-2006-037469 and 244093), and the transnational Systems Biology of Microorganisms (SysMO) organization (project BACELL SysMO2) through the Research Council for Earth and Life Sciences of the Netherlands Organization for Scientific Research.

Supplementary Material (available on request; email [email protected])

Figures S1 - S9
Tables S1 - S5

Supplementary data file predictions
Compressed file containing the following files (additional required files from SubtiWiki (49) or Nicolas et al. (17) are included but not listed here):
• R code for analysis and browsing through these files (R code prediction manuscript Chapter 4).
• All target predictions up to TargetRNA_v1 p-value 1 (HUGEallresTargetRNA_20120412.csv). Only open in R (too large for Excel / Open Office).
• All results peaks correlation analysis (allrespeakcor_20140902.tsv). Only open in R (too large for Excel / Open Office).
• All information on conserved targets that received a Conserved flag (ConservedTargetsSelectionCutoff8SpeciesOnlySubtilis.csv). Only open in R (too large for Excel / Open Office).
• Selection of putative srRNAs (new selection sRNAs update.csv).
• Table S1 (tp0.01subsetWithFlagsCount.csv).
• Folder containing plots like in Figure 11 for all putative srRNAs from the selection (Regregion plots folder).
• Text file with all blast results of the selection of putative srRNAs. Can be used to make alignments (lastSelectionBlastSepForAlignment.fasta).

Supplementary data file for the transcript profiling analysis of an S512 mutant using tiling arrays
Compressed file containing the following files (in addition to Figures S4, S5, and S6):
• Normalized tiling array data S512 mutant and parental strain (tilingQnormS512_and_WT.csv).
• Outcome of analysis with limma package from R (all analyzed S512.csv).
• Differentially expressed genes in the S512 mutant strain with p-value ≤ 0.01 (significantly changed targets S512 0.01.csv).
• Differentially expressed genes in the S512 mutant strain with p-value ≤ 0.05 (significantly changed targets S512 0.05.csv).
• Overlap between predicted targets (with p-value ≤ 0.01) and significantly changed targets in the expression analysis (≤ 0.05) (tilingS512only0.01_0.05.csv).

Descriptions, legends and selected Supplementary Figures

Table S1. All predicted targets with additional information. For legend see Table 2 in main text.

Table S2. Predicted targets from Table S1 with three or more flags. For legend see Table 2 in main text.

Table S3. Enriched B-clusters in predictions. Query, putative srRNA name. Bcluster, enriched B-cluster of the putative srRNAs computed on all B. subtilis 168 target predictions. pval, binomial p-value indicating the significance of the enrichment.

Table S4. Peaks targets. sRNA, putative srRNA name. Name, name of the predicted target. Ltag, unique B. subtilis 168 locus tag of the predicted target. GlobalCor, Pearson correlation between the expression level of the predicted target and the srRNA in all conditions. GlobalCorPvalue, p-value of the GlobalCor. GlobalCorRankNeg, 1 for the gene that is most negatively correlated across all hybridizations. GlobalCorRankPos, 1 for the gene that is most positively correlated across all hybridizations. iPeak, peak number (maximum two). PeakCenter, unique tag of the relevant hybridization from (17). PeakHeight, peak expression level.
PeakSize, number of hybridizations with this peak value. Cor, peak correlation between the expression level of the predicted target and the srRNA. CorPvalue, p-value of Cor. CorRankNeg, 1 for the gene that is the most negatively correlated in the peak conditions. CorRankPos, 1 for the gene that is the most positively correlated in the peak conditions. PredPval, TargetRNA_v1 (23) prediction p-value. This table contains predicted targets with p-values <0.01. PeakCorScore, the absolute difference between the global correlation (GlobalCor) and the peak correlation (Cor). This table contains pairs with a PeakCorScore of >1.5.
Table S5. Strains, plasmids, and oligonucleotides used in this study
Figure S1. Conserved target network
Only targets with a conservation flag were extracted from the table of predictions (Table S1) and formatted for making a network plot with the open source software Gephi (http://gephi.github.io).
Figure S2.
Structural LocARNA alignment of the FsrA/S512 sequence
Sequence alignment of FsrA/S512 using local nBLAST results for the set of Bacillus genomes processed for alignment and secondary structure prediction using the LocARNA algorithm (http://rna.informatik.uni-freiburg.de/LocARNA/Input.jsp) (57).
Figure S3. Representative 2D PAGE analysis of proteins in an FsrA/S512 mutant and its parental strain
Protein names of significantly changed spots identified with MALDI-TOF MS are indicated.
Figure S4. Hierarchical tree of changes observed by 2D PAGE analysis in the FsrA/S512 mutant strain compared to its parental strain 168
Tree of changes in protein abundance identified by replicate 2D PAGE analyses of an FsrA/S512 mutant strain compared to its parental strain 168.
Figure S5. Significant changes in the abundance of proteins identified by 2D PAGE analysis of an FsrA/S512 mutant compared to its parental strain
Overview of significant changes in protein abundance using data obtained through the analysis shown in Figures S3 and S4.
Figure S6. Deregulation of sulfur metabolism in an FsrA/S512 mutant
Fold changes in the expression of genes for sulfur metabolism upon deletion of FsrA/S512 are indicated on the sulfur metabolism network. Positive (mutant / parental) fold changes are plotted in red and negative fold changes in dark green. The expression of genes responsible for cysteine biosynthesis is increased. Adapted from (79).
Figure S7. Structural LocARNA alignment of the S462 sequence
Sequence alignment of S462 using local nBLAST results for the set of Bacillus genomes processed for alignment and secondary structure prediction using the LocARNA algorithm (57).
Figure S8.
Structural LocARNA alignment of the RsaE/S415 sequence
Sequence alignment of RsaE/S415 using local nBLAST results for the set of Bacillus genomes processed for alignment and secondary structure prediction using the LocARNA algorithm (57).
.((((.((((.....)))).(((((.(((((.((((((..((((...((((((.........))))))...))))..))))))...))))).......((((.....)))))))))......))))
AAAGTCGACATCTTTTGTTATCATAAGGATGTGAAATTGATCACAAACAAACATTACCCCTTTGTTTGACCGTGAAAAATTTCTCCCATCCCCTTTGTTGTCGTTAAGACATATGAAACCGCGCTT
Figure S9. Predicted targets of RsaE/S415 plotted on the RsaE/S415 sequence
For legend see Figure 11 in main text.
References
1. Gorke,B. and Vogel,J. (2008) Noncoding RNA control of the making and breaking of sugars. Genes Dev., 22, 2914-2925. 2. Liu,J.M. and Camilli,A. (2010) A broadening world of bacterial small RNAs. Curr. Opin. Microbiol., 13, 18-23. 3. Storz,G., Vogel,J. and Wassarman,K.M. (2011) Regulation by small RNAs in bacteria: Expanding frontiers. Mol. Cell, 43, 880-891. 4. Shimoni,Y., Friedlander,G., Hetzroni,G., Niv,G., Altuvia,S., Biham,O. and Margalit,H. (2007) Regulation of gene expression by small non-coding RNAs: A quantitative view. Mol. Syst. Biol., 3, 138. 5. Jost,D., Nowojewski,A. and Levine,E. (2013) Regulating the many to benefit the few: Role of weak small RNA targets. Biophys. J., 104, 1773-1782. 6. Cech,T.R. and Steitz,J.A. (2014) The noncoding RNA revolution-trashing old rules to forge new ones. Cell, 157, 77-94. 7. Waters,L.S. and Storz,G. (2009) Regulatory RNAs in bacteria. Cell, 136, 615-628. 8. Sharma,C.M., Papenfort,K., Pernitzsch,S.R., Mollenkopf,H.J., Hinton,J.C. and Vogel,J. (2011) Pervasive posttranscriptional control of genes involved in amino acid metabolism by the Hfq-dependent GcvB small RNA. Mol. Microbiol., 81, 1144-1165. 9. Jousselin,A., Metzinger,L. and Felden,B. (2009) On the facultative requirement of the bacterial RNA chaperone, Hfq. Trends Microbiol., 17, 399-405. 10.
Gaballa,A., Antelmann,H., Aguilar,C., Khakh,S.K., Song,K.B., Smaldone,G.T. and Helmann,J.D. (2008) The Bacillus subtilis iron-sparing response is mediated by a Fur-regulated small RNA and three small, basic proteins. Proc. Natl. Acad. Sci. U. S. A., 105, 11927-11932. 11. Smaldone,G.T., Revelles,O., Gaballa,A., Sauer,U., Antelmann,H. and Helmann,J.D. (2012) A global investigation of the Bacillus subtilis iron-sparing response identifies major changes in metabolism. J. Bacteriol., 194, 2594-2605. 12. Heidrich,N., Chinali,A., Gerth,U. and Brantl,S. (2006) The small untranslated RNA SR1 from the Bacillus subtilis genome is involved in the regulation of arginine catabolism. Mol. Microbiol., 62, 520-536. 13. Hammerle,H., Amman,F., Vecerek,B., Stulke,J., Hofacker,I. and Blasi,U. (2014) Impact of hfq on the Bacillus subtilis transcriptome. PLoS One, 9, e98661. 14. Saito,S., Kakeshita,H. and Nakamura,K. (2009) Novel small RNA-encoding genes in the intergenic regions of Bacillus subtilis. Gene, 428, 2-8. 15. Schmalisch,M., Maiques,E., Nikolov,L., Camp,A.H., Chevreux,B., Muffler,A., Rodriguez,S., Perkins,J. and Losick,R. (2010) Small genes under sporulation control in the Bacillus subtilis genome. J. Bacteriol., 192, 5402-5412. 16. Irnov,I., Sharma,C.M., Vogel,J. and Winkler,W.C. (2010) Identification of regulatory RNAs in Bacillus subtilis. Nucleic Acids Res., 38, 6637-6651. 17. Nicolas,P., Mader,U., Dervyn,E., Rochat,T., Leduc,A., Pigeonneau,N., Bidnenko,E., Marchadier,E., Hoebeke,M., Aymerich,S., et al. (2012) Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science, 335, 1103-1106. 18. Geissmann,T., Chevalier,C., Cros,M.J., Boisset,S., Fechter,P., Noirot,C., Schrenzel,J., Francois,P., Vandenesch,F., Gaspin,C., et al. (2009) A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation. Nucleic Acids Res., 37, 7239-7257. 19. 
Bohn,C., Rigoulay,C., Chabelskaya,S., Sharma,C.M., Marchais,A., Skorski,P., Borezee-Durant,E., Barbet,R., Jacquet,E., Jacq,A., et al. (2010) Experimental discovery of small RNAs in Staphylococcus aureus reveals a riboregulator of central metabolism. Nucleic Acids Res., 38, 6620-6636. 20. Chabelskaya,S., Gaillot,O. and Felden,B. (2010) A Staphylococcus aureus small RNA is required for bacterial virulence and regulates the expression of an immune-evasion molecule. PLoS Pathog., 6, e1000927. 21. Backofen,R. and Hess,W.R. (2010) Computational prediction of sRNAs and their targets in bacteria. RNA Biol., 7, 33-42. 22. Sharma,C.M. and Vogel,J. (2009) Experimental approaches for the discovery and characterization of regulatory small RNA. Curr. Opin. Microbiol., 12, 536-546. 23. Tjaden,B., Goodwin,S.S., Opdyke,J.A., Guillier,M., Fu,D.X., Gottesman,S. and Storz,G. (2006) Target prediction for small, noncoding RNAs in bacteria. Nucleic Acids Res., 34, 2791-2802. 24. Busch,A., Richter,A.S. and Backofen,R. (2008) IntaRNA: Efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics, 24, 2849-2856. 25. Eggenhofer,F., Tafer,H., Stadler,P.F. and Hofacker,I.L. (2011) RNApredator: Fast accessibility-based prediction of sRNA targets. Nucleic Acids Res., 39, W149-54. 26. Wright,P.R., Richter,A.S., Papenfort,K., Mann,M., Vogel,J., Hess,W.R., Backofen,R. and Georg,J. (2013) Comparative genomics boosts target prediction for bacterial small RNAs. Proc. Natl. Acad. Sci. U. S. A., 110, E3487-96. 27. Tjaden,B. (2008) TargetRNA: A tool for predicting targets of small RNA action in bacteria. Nucleic Acids Res., 36, W109-13. 28. Modi,S.R., Camacho,D.M., Kohanski,M.A., Walker,G.C. and Collins,J.J. (2011) Functional characterization of bacterial sRNAs using a network biology approach. Proc. Natl. Acad. Sci. U. S. A., 108, 15522-15527. 29. Smolke,C.D. and Keasling,J.D.
(2002) Effect of gene location, mRNA secondary structures, and RNase sites on expression of two genes in an engineered operon. Biotechnol. Bioeng., 80, 762-776. 30. Prevost,K., Desnoyers,G., Jacques,J.F., Lavoie,F. and Masse,E. (2011) Small RNA-induced mRNA degradation achieved through both translation block and activated cleavage. Genes Dev., 25, 385-396. 31. Peer,A. and Margalit,H. (2011) Accessibility and evolutionary conservation mark bacterial small RNA target-binding regions. J. Bacteriol., 193, 1690-1701. 32. Chao,Y., Papenfort,K., Reinhardt,R., Sharma,C.M. and Vogel,J. (2012) An atlas of Hfq-bound transcripts reveals 3’ UTRs as a genomic reservoir of regulatory small RNAs. EMBO J., 31, 4005-4019. 33. Vogel,J. and Luisi,B.F. (2011) Hfq and its constellation of RNA. Nat. Rev. Microbiol., 9, 578-589. 34. Saito,S., Kakeshita,H. and Nakamura,K. (2009) Novel small RNA-encoding genes in the intergenic regions of Bacillus subtilis. Gene, 428, 2-8. 35. Smaldone,G.T., Antelmann,H., Gaballa,A. and Helmann,J.D. (2012) The FsrA sRNA and FbpB protein mediate the iron-dependent induction of the Bacillus subtilis LutABC iron-sulfur containing oxidases. J. Bacteriol., 194, 2586-2593. 36. Gimpel,M., Preis,H., Barth,E., Gramzow,L. and Brantl,S. (2012) SR1--a small RNA with two remarkably conserved functions. Nucleic Acids Res., 40(22), 11659-11672. 37. Gimpel,M., Heidrich,N., Mader,U., Krugel,H. and Brantl,S. (2010) A dual-function sRNA from B. subtilis: SR1 acts as a peptide encoding mRNA on the gapA operon. Mol. Microbiol., 76, 990-1009. 38. Preis,H., Eckart,R.A., Gudipati,R.K., Heidrich,N. and Brantl,S. (2009) CodY activates transcription of a small RNA in Bacillus subtilis. J. Bacteriol., 191, 5446-5457. 39. Heidrich,N., Moll,I. and Brantl,S. (2007) In vitro analysis of the interaction between the small RNA SR1 and its primary target ahrC mRNA. Nucleic Acids Res., 35, 4331-4346. 40. Marchais,A., Duperrier,S., Durand,S., Gautheret,D. and Stragier,P.
(2011) CsfG, a sporulation-specific, small noncoding RNA highly conserved in endospore formers. RNA Biol., 8, 358-364. 41. Loh,E., Dussurget,O., Gripenland,J., Vaitkevicius,K., Tiensuu,T., Mandin,P., Repoila,F., Buchrieser,C., Cossart,P. and Johansson,J. (2009) A trans-acting riboswitch controls expression of the virulence regulator PrfA in Listeria monocytogenes. Cell, 139, 770-779. 42. Licht,A., Preis,S. and Brantl,S. (2005) Implication of CcpN in the regulation of a novel untranslated RNA (SR1) in Bacillus subtilis. Mol. Microbiol., 58, 189-206. 43. Barrick,J.E., Corbino,K.A., Winkler,W.C., Nahvi,A., Mandal,M., Collins,J., Lee,M., Roth,A., Sudarsan,N., Jona,I., et al. (2004) New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc. Natl. Acad. Sci. U. S. A., 101, 6421-6426. 44. Silvaggi,J.M., Perkins,J.B. and Losick,R. (2006) Genes for small, noncoding RNAs under sporulation control in Bacillus subtilis. J. Bacteriol., 188, 532-541. 45. Schmalisch,M., Maiques,E., Nikolov,L., Camp,A.H., Chevreux,B., Muffler,A., Rodriguez,S., Perkins,J. and Losick,R. (2010) Small genes under sporulation control in the Bacillus subtilis genome. J. Bacteriol., 192, 5402-5412. 46. Wadler,C.S. and Vanderpool,C.K. (2007) A dual function for a bacterial small RNA: SgrS performs base pairing-dependent regulation and encodes a functional polypeptide. Proc. Natl. Acad. Sci. U. S. A., 104, 20454-20459. 47. Kery,M.B., Feldman,M., Livny,J. and Tjaden,B. (2014) TargetRNA2: Identifying targets of small regulatory RNAs in bacteria. Nucleic Acids Res., 42, W124-9. 48. Beisel,C.L. and Storz,G. (2010) Base pairing small RNAs and their roles in global regulatory networks. FEMS Microbiol. Rev., 34, 866-882. 49. Mader,U., Schmeisky,A.G., Florez,L.A. and Stulke,J. (2012) SubtiWiki--a comprehensive community resource for the model organism Bacillus subtilis. Nucleic Acids Res., 40, D1278-87. 50. Papenfort,K., Podkaminski,D., Hinton,J.C. and Vogel,J.
(2012) The ancestral SgrS RNA discriminates horizontally acquired Salmonella mRNAs through a single G-U wobble pair. Proc. Natl. Acad. Sci. U. S. A., 109, E757-64. 51. Smits,W.K. and Grossman,A.D. (2010) The transcriptional regulator Rok binds A+T-rich DNA and is involved in repression of a mobile genetic element in Bacillus subtilis. PLoS Genet., 6, e1001207. 52. Kaberdin,V.R. and Blasi,U. (2006) Translation initiation and the fate of bacterial mRNAs. FEMS Microbiol. Rev., 30, 967-979. 53. Levine,E., Zhang,Z., Kuhlman,T. and Hwa,T. (2007) Quantitative characteristics of gene regulation by small RNA. PLoS Biol., 5, e229. 54. Papenfort,K., Bouvier,M., Mika,F., Sharma,C.M. and Vogel,J. (2010) Evidence for an autonomous 5’ target recognition domain in an Hfq-associated small RNA. Proc. Natl. Acad. Sci. U. S. A., 107, 20435-20440. 55. Beisel,C.L., Updegrove,T.B., Janson,B.J. and Storz,G. (2012) Multiple factors dictate target selection by Hfq-binding small RNAs. EMBO J., 31, 1961-1974. 56. Na,D., Yoo,S.M., Chung,H., Park,H., Park,J.H. and Lee,S.Y. (2013) Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nat. Biotechnol., 31, 170-174. 57. Will,S., Joshi,T., Hofacker,I.L., Stadler,P.F. and Backofen,R. (2012) LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs. RNA, 18, 900-914. 58. Botella,E., Fogg,M., Jules,M., Piersma,S., Doherty,G., Hansen,A., Denham,E.L., Le Chat,L., Veiga,P., Bailey,K., et al. (2010) pBaSysBioII: An integrative plasmid generating gfp transcriptional fusions for high-throughput analysis of gene expression in Bacillus subtilis. Microbiology, 156, 1600-1608. 59. Smits,W.K., Dubois,J.Y., Bron,S., van Dijl,J.M. and Kuipers,O.P. (2005) Tricksy business: Transcriptome analysis reveals the involvement of thioredoxin A in redox homeostasis, oxidative stress, sulfur metabolism, and cellular differentiation in Bacillus subtilis. J. Bacteriol., 187, 3921-3930. 60.
Frees,D., Savijoki,K., Varmanen,P. and Ingmer,H. (2007) Clp ATPases and ClpP proteolytic complexes regulate vital biological processes in low GC, gram-positive bacteria. Mol. Microbiol., 63, 1285-1295. 61. Elsholz,A.K., Hempel,K., Michalik,S., Gronau,K., Becher,D., Hecker,M. and Gerth,U. (2011) Activity control of the ClpC adaptor McsB in Bacillus subtilis. J. Bacteriol., 193, 3887-3893. 62. Sekowska,A., Robin,S., Daudin,J.J., Henaut,A. and Danchin,A. (2001) Extracting biological information from DNA arrays: An unexpected link between arginine and methionine metabolism in Bacillus subtilis. Genome Biol., 2, RESEARCH0019. 63. Zweers,J.C., Nicolas,P., Wiegert,T., van Dijl,J.M. and Denham,E.L. (2012) Definition of the sigma(W) regulon of Bacillus subtilis in the absence of stress. PLoS One, 7, e48471. 64. Zweers,J.C., Wiegert,T. and van Dijl,J.M. (2009) Stress-responsive systems set specific limits to the overproduction of membrane proteins in Bacillus subtilis. Appl. Environ. Microbiol., 75, 7356-7364. 65. Yu,W.B., Yin,C.Y., Zhou,Y. and Ye,B.C. (2012) Prediction of the mechanism of action of fusaricidin on Bacillus subtilis. PLoS One, 7, e50003. 66. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) Predicting transmembrane protein topology with a hidden markov model: Application to complete genomes. J. Mol. Biol., 305, 567-580. 67. Breukink,E. and de Kruijff,B. (2006) Lipid II as a target for antibiotics. Nat. Rev. Drug Discov., 5, 321-332. 68. Mascher,T., Hachmann,A.B. and Helmann,J.D. (2007) Regulatory overlap and functional redundancy among Bacillus subtilis extracytoplasmic function sigma factors. J. Bacteriol., 189, 6919-6927. 69. Kingston,A.W., Liao,X. and Helmann,J.D. (2013) Contributions of the sigma(W) , sigma(M) and sigma(X) regulons to the lantibiotic resistome of Bacillus subtilis. Mol. Microbiol., 90, 502-518. 70. 
Buescher,J.M., Liebermeister,W., Jules,M., Uhr,M., Muntel,J., Botella,E., Hessling,B., Kleijn,R.J., Le Chat,L., Lecointe,F., et al. (2012) Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science, 335, 1099-1103. 71. Kunst,F. and Rapoport,G. (1995) Salt stress is an environmental signal affecting degradative enzyme synthesis in Bacillus subtilis. J. Bacteriol., 177, 2403-2407. 72. Tanaka,K., Henry,C.S., Zinner,J.F., Jolivet,E., Cohoon,M.P., Xia,F., Bidnenko,V., Ehrlich,S.D., Stevens,R.L. and Noirot,P. (2013) Building the repertoire of dispensable chromosome regions in Bacillus subtilis entails major refinement of cognate large-scale metabolic model. Nucleic Acids Res., 41, 687-699. 73. Reilman,E., Mars,R.A., van Dijl,J.M. and Denham,E.L. (2015) The multidrug ABC transporter BmrC/BmrD of Bacillus subtilis is regulated via a ribosome-mediated transcriptional attenuation mechanism. Nucleic Acids Res., 42, 11393-11407. 74. Maass,S., Sievers,S., Zuhlke,D., Kuzinski,J., Sappa,P.K., Muntel,J., Hessling,B., Bernhardt,J., Sietmann,R., Volker,U., et al. (2011) Efficient, global-scale quantification of absolute protein amounts by integration of targeted mass spectrometry and two-dimensional gel-based proteomics. Anal. Chem., 83, 2677-2684. 75. Starcher,B. (2001) A ninhydrin-based assay to quantitate the total protein content of tissue samples. Anal. Biochem., 292, 125-129. 76. Buttner,K., Bernhardt,J., Scharf,C., Schmid,R., Mader,U., Eymann,C., Antelmann,H., Volker,A., Volker,U. and Hecker,M. (2001) A comprehensive two-dimensional map of cytosolic proteins of Bacillus subtilis. Electrophoresis, 22, 2908-2935. 77. Wolf,C., Hochgrafe,F., Kusch,H., Albrecht,D., Hecker,M. and Engelmann,S. (2008) Proteomic analysis of antioxidant strategies of Staphylococcus aureus: Diverse responses to different oxidants. Proteomics, 8, 3139-3153. 78. 
Goosens,V.J., Mars,R.A., Akeroyd,M., Vente,A., Dreisbach,A., Denham,E.L., Kouwen,T.R., van Rij,T., Olsthoorn,M. and van Dijl,J.M. (2013) Is proteomics a reliable tool to probe the oxidative folding of bacterial membrane proteins? Antioxid. Redox Signal., 18, 1159-1164. 79. Even,S., Burguiere,P., Auger,S., Soutourina,O., Danchin,A. and Martin-Verstraete,I. (2006) Global control of cysteine metabolism by CymR in Bacillus subtilis. J. Bacteriol., 188, 2184-2197.