UvA-DARE (Digital Academic Repository) Multi-omics analyses of the molecular physiology and biotechnology of Escherichia coli and Synechocystis sp. PCC6803 Borirak, O. Link to publication Citation for published version (APA): Borirak, O. (2016). Multi-omics analyses of the molecular physiology and biotechnology of Escherichia coli and Synechocystis sp. PCC6803 General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: http://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl) Download date: 18 Jun 2017 Multi-omics analyses of the molecular physiology and biotechnology of Escherichia coli and Synechocystis sp. PCC6803 Orawan Borirak Multi-omics analyses of the molecular physiology and biotechnology of Escherichia coli and Synechocystis sp. PCC6803 Orawan Borirak Copyright © 2016 Orawan Borirak All rights reserved. No part of this publication may be reproduced in any form without permission from the author, or where appropriate, of the publisher involved. The research described in this thesis was carried out in the group of Molecular Microbial Physiology and the group of Mass Spectrometry of Biomacromolecules at the Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, The Netherlands. The research was funded by the Higher Education Commission, Thailand and the University of Amsterdam. Publication of this thesis was financially supported by the Higher Education Commission, Thailand. Cover designed by Freepik.com ISBN 978-94-028-0098-2 Printed by: Ipskamp Drukkers, Enschede, The Netherlands. Multi-omics analyses of the molecular physiology and biotechnology of Escherichia coli and Synechocystis sp. PCC6803 ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit van Amsterdam op gezag van de Rector Magnificus prof. dr. D.C. van den Boom ten overstaan van een door het College voor Promoties ingestelde commissie, in het openbaar te verdedigen in de Agnietenkapel op donderdag 7 april 2016 te 14.00 uur door Orawan Borirak geboren te Ratchaburi, Thailand Promotiecommissie: Promotores: Prof. dr. K. J. Hellingwerf Universiteit van Amsterdam Prof. dr. C. G. de Koster Universiteit van Amsterdam Copromotor: Dr. L. J. de Koning Universiteit van Amsterdam Overige leden: Prof. dr. H. V. Westerhoff Universiteit van Amsterdam Prof. dr. B. Teusink Vrije Universiteit Prof. dr. A. Smilde Universiteit van Amsterdam Dr. F. Branco dos Santos Universiteit van Amsterdam Dr. M. Bekker NIZO Faculteir der Natuurwetenschappen, Wiskunde en Informatica Table of contents Chapter 1: General introduction………………………………………………………………………. Chapter 2: Molecular physiology of the dynamic regulation of Carbon Catabolite Repression in Escherichia coli………………………………………… Chapter 3: 49 Contribution of regulatory RNAs to the post-transcriptional control of Carbon Catabolite Repression in Escherichia coli………………………… Chapter 5: 31 Time-series analysis of the transcriptome and proteome of E. coli upon glucose repression………………………………………………………………… Chapter 4: 7 79 Quantitative proteomics analysis of an ethanol- and a lactateproducing mutant strain of Synechocystis sp. PCC6803..................... 101 General discussion…………………………………………………………………………. 125 Summary/Samenvatting………………………………………………………………………………………. 145 List of publications……………………………………………………………………………………………….. 153 Acknowledgements………………………………………………………………………………………………. 155 Chapter 6: Chapter 1: General introduction Orawan Borirak CHAPTER 1 1.1 A sustainable future Hand-in-hand with the start of the industrial revolution human society has initiated exploration of various forms of fossil carbon (i.e. coal, oil, and gas) as a cheap and abundantly available source of (free) energy and of bulk materials. During the past two decades it has become clear that this use of fossil carbon, besides many positive effects like a significant increase in food supply (e.g. through the availability of fertilizer) and in general life expectancy, has also led to a very significant increase in atmospheric CO2 levels (1–3). A gradual awareness is now developing, particularly because of the studies and work of the International Panel for Climate Control (IPCC), that these latter increases cause a real threat to humanity because of their consequences, like global warming, and waterlevel rises and acidification of the oceans (4). To keep these negative consequences manageable humanity will have to refrain from the use of the majority of the fossil carbon reserves that are available (5) and (gradually) change to the use of renewable energy, to be generated with e.g. windmills, photovoltaic cells, hydroelectric and tidal power plants, etc. Accordingly, the global carbon cycle may be transformed into a true ‘cycle’ again, in which phototrophs fix CO2 for food and materials, to be used in part by heterotrophs like us, to burn a part of this phototrophic biomass as our energy source. Such a development is often referred to as an evolution towards a bio-based economy. In this development a fundamental understanding of the molecular basis of life, particularly at the basic unit of life, the cell, in all its forms of expression is of utmost importance. This is important because of the need to improve their optimal functioning, both in healthcare, to fight disease, and to optimize their application in e.g. photosynthesis and biotechnology. The most rational and unifying approach for this is the application of systems biology. This system understanding is essential for elucidation and prediction of the ‘behavior’ of complex biological systems so that further development and application regarding those systems e.g. for production of drugs, chemicals, and fuels can be established. Knowledge that will be generated accordingly can be exploited for applications in metabolic engineering and synthetic biology. 1.2 The system biology approach 1.2.1 Aim The aim of a system biology approach is to provide a deep, quantitative, understanding of a biological system by integrating the knowledge obtained at the molecular level into fundamental principles and relations, i.e. a mathematical model (6). Towards this goal one needs to collect data regarding the structure and functioning of the system, including all 8 GENERAL INTRODUCTION relevant network components and their interactions. This requires application of largescale omics approaches. The information thus obtained then allows one to set up detailed mathematical models of the system under study, e.g. via hypothesis-driven mathematical modeling approaches (Figure 1). With such models an iterative process can be initiated in which system understanding can be brought to the system level of all aspects of a living cell (6–8). 1.2.2 Omics technologies - the driving force behind system biology The system biology approach has been flourishing thanks to the advance in the technologies for omics studies. These technologies include transcriptomics, proteomics, metabolomics and fluxomics, and enable researchers to observe comprehensively (relative) changes in the level of ‘all’ molecules of a particular nature, such as mRNA, protein, and metabolites, in a biological system simultaneously. Figure 1. Various system biology approaches (i.e. use of large-scale omics techniques, data-driven computational approaches and hypothesis-driven mathematical modeling) generate knowledge and understanding about a cellular system, in particular its physiology. The first two approaches primarily support identification of network components and their interactions, the latter approach is suited to further system understanding and predictions, often described as the ultimate goals of a system biology approach. Figure reproduced from (8) with permission. Transcriptomics analyses are the most straightforward of the available omics technologies. These are based on the analysis of transcript profiles, with the help of oligonucleotide microarrays. This technique by now is well-established with high 9 CHAPTER 1 throughput capacity and relatively inexpensive, to the extent that generally accepted quality guidelines have been accepted by the research community with respect to the type of experimental protocol to be used for such experiments (MIAME; see (9)). This status was achievable because of the completeness and non-ambiguity with which genome sequence data can be obtained, in combination with well-developed protocols for the isolation of cellular RNA and oligonucleotide hybridization. However, these hybridization techniques are subject to several limitations, e.g. uncertainty regarding the genome annotation, high background levels caused by cross-hybridization, and the limited dynamic range of detection (10, 11). Recently a new technique, called RNA-seq, has been introduced to analyze transcriptome profiles (10). In contrast to hybridization-based techniques, the RNA-seq technique directly determines cDNA sequences, using nextgeneration sequencing technologies (10, 12). The number of reads of a particular sequence then reflects the abundance of the corresponding mRNA. Despite several advantages over hybridization-based technologies, still many obstacles need to be overcome, before this alternative technology will have fully matured, such as library construction and the development of suitable bioinformatics tools for robust analysis of the results, and especially reduction of the costs per analysis (10). A similar situation is, however, not within reach for the other three omics techniques, proteomics, metabolomics and fluxomics. The much larger variety in chemical structure of the corresponding classes of molecules significantly complicates their comprehensive analysis. Proteomics approaches focus on obtaining comprehensive quantitative data of protein expression levels, and their changes under the influence of physico-chemical or biological perturbations in the environment of a particular organism or cellular system of interest (13). Quantitative proteomics approaches classify into two categories, i.e. relative and absolute quantifications. Relative proteomics quantitation measures relative changes in protein expression levels, by comparing those to that of a reference proteome. Data usually are expressed as ‘fold-change’, relative to the reference condition. Absolute proteomics quantitation determines exact amounts or concentrations of expressed proteins, e.g. in the units of ng/ml of plasma (14). The quantitative proteomics approaches emerged from the application of two-dimensional gel electrophoresis (2-DE), i.e. combining iso-electric focusing and SDS-poly-acrylamide electrophoresis, in the analysis of whole cell lysates, in combination with staining techniques, and fluorescence- and/or Western detection (15). A major improvement in the technique was then achieved via the application of mass spectrometry in the identification of individual proteins, via analysis of 10 GENERAL INTRODUCTION peptides derived from in-gel proteolysis of gel-spot derived protein samples. Gradually, advanced liquid chromatography hyphenated with tandem mass spectrometry (LCMS/MS) has out-competed the 2-DE approach as the separation medium, in combination with ‘in-solution proteolysis’ of the proteome prior to the separation procedure. An important limitation of this technique lies in the quantitation of the individual components of the proteome. Because of the large variability of peptide ionization efficiency as a function of peptide sequence, absolute quantification has to be achieved through independent calibration with known concentrations of synthetic, isotopically labeled, peptides of a specific sequence, such as in the AQUA (16) and QconCAT techniques (17). Thus, this approach of absolute quantitation is not available to the whole proteome, but rather limited to a small number of preselected proteins (14). Less problematic is to quantitate relative changes of the proteome by using metabolic labeling 13 15 techniques, e.g. with [ C]glucose or [ N]H4Cl (18), or with SILAC (Stable Isotope Labeling by Amino acids in Cell culture (19)), or by using chemical labeling, e.g. ICAT (Isotope Coded Affinity Tags) (20), or iTRAQ (isobaric Tags for Relative and Absolute Quantitation) (21). Relative changes in peptide concentration ratio can be analyzed directly with mass spectrometry, and hence relative changes in the proteome can be straightforwardly obtained. Another breakthrough has been achieved with the introduction of bioorthogonal labeling techniques. By allowing the time of the metabolic perturbation to coincide with the addition of an amino acid homologue that can be selectively chemically modified, it is possible to identify (a large part of) the proteome of newly synthesized proteins (22). Besides the labeling-based techniques, label-free quantitative approaches are of wide interest, and currently in use in some experiments, because no extra preparation steps prior the MS analysis are needed (14). Two label-free approaches that are currently in use are: (i) spectral-based counting techniques (e.g. spectrum count, APEX (Absolute Protein Expression), PAI (Protein Abundance Index)), and (ii) peak-intensity based methods (e.g. ion intensities) (23). Despite their overall simplicity: easy to perform, inexpensive, and an unlimited number of parallel samples can be processed, the accuracy and robustness still remain a challenge for the label-free approaches. Measurement of samples under identical conditions, with strictly standardized sample preparation procedures and a minimal fractionation, is required for proper data collection (23, 24). Hence, there is no perfect quantitative proteomics approach that would be suitable for every system yet. Thus, addressing a specific biological question and selecting the most suitable quantitative technique to answer it is crucial for obtaining reliable results in any 11 CHAPTER 1 proteomics study. A summary of some of the available quantitative proteomics approaches is shown in Table 1. Table 1. Overview of the quantitative proteomic techniques modified from (25). Dynamic range1 Proteome coverage Quantitative accuracy Nature of quantitation No. of samples Quantification to compare level Precise (<10% rsd) Precise (<10% rsd) Relative 2 MS Relative 2 or 3 MS Precise (<10% rsd) Medium (10–30% rsd) Precise Relative 2 MS Relative 2–8 MS/MS Relative/ Absolute2 2–many MS or MS/MS Metabolic labeling 15 N 1–2 Medium SILAC 1–2 Medium Chemical labeling ICAT 1–2 Poor iTRAQ 2 Medium Standard peptide 2 Poor Ion intensities (Peakintensity) 3 Good Medium (10–30% rsd) Relative Many (depends on reproducibility of chromatography) MS APEX, PAI (Spectral counting) 3 or 4 Good Poor (>30% rsd) Relative/ Absolute3 Within sample/many samples MS Medium Medium (10–30% rsd) Relative/ Absolute2 Many (depends on reproducibility of gels) Image analysis Label-free Gels 2-D gels 1 to 4, depending on dye 1 Order of magnitude 10x. 2 Absolute quantitation is possible only through relative comparison to a spiked, “known”, standard. 3 Absolute quantitation is possible only through empirical features and back-calculation using the molecular weight of the protein and total protein amount in the sample. Rsd: relative standard deviation. Metabolomics approaches aim to profile the entire change in the level of all metabolites, which are substrates, intermediates and/or end-products of metabolism, and 12 GENERAL INTRODUCTION are acted upon by enzyme networks, in a biological system, in a steady state situation or upon perturbation of the system. Commonly used techniques for quantitative metabolomics assays are: mass spectrometry (either GC-MS or LC-MS) and Nuclear Magnetic Resonance (NMR) spectroscopy (26). Three crucial steps of sample preparation, prior to the metabolomics analysis, are: (i) proper selection of growth conditions and the physiological state of the cells, (ii) the quenching step of metabolism and (iii) the metabolite extraction method. All three could remarkably influence the outcome of the experiment (26, 27). Application of MS-based metabolomics analysis is restricted to the detection of MS-ionizable metabolites, although it has a significantly lower detection limit (pM – nM) than the NMR-based quantitative metabolomics assays (> 1 µM). The major advantage of the NMR-based metabolomics technique is the reproducibility between different instruments and users (28). Despite the successful application of the various metabolomics techniques, not any of the approaches can provide a complete coverage of the entire metabolome and usually the available information is limited to a small set of known metabolites (28). Recently a novel and even more powerful approach was developed by combining high-resolution MS with NMR spectroscopy, called the MS/NMR approach. This combined MS/NMR technique was successfully applied and correctly identified a variety of different types of well-known metabolites, as well as unknown components, in a complex metabolite mixture of an Escherichia coli lysate. This approach has subsequently been named SUMMIT MS/NMR (Structures of Unknown Metabolomic Mixture components by MS/NMR) (28, 29). The major remaining challenges in quantitative metabolomics are the heterogeneity of the biological samples, systemic errors in the measurements, and sample-handling; all these three aspects still need to be further optimized (27). Fluxomics approaches concentrate on integration of the results of in vivo measurements of metabolic fluxes, in combination with the use of stoichiometric network models, to estimate the absolute fluxes through all the branches of the metabolic network of a cell. The metabolic fluxes are the end result of the interplay between gene expression, protein expression level, protein (enzyme) kinetics, and substrate/metabolite concentrations that are constrained by thermodynamic driving forces, and regulatory mechanisms. These fluxomics approaches are built upon the general quantitative metabolomics knowledge and aim at the understanding the cellular metabolic phenotype. The approaches currently used to study fluxomics make use of e.g. FBA (Flux Balance 13 13 Analysis), C-fluxomics and C-constrained FBA, as well as other approaches designed for flux measurements in terms of dynamic, stationary and semi-stationary states (30). 13 CHAPTER 1 Jointly, the omics technologies have allowed system biologists to accumulate massive amounts of data at the molecular level for various biological systems. To properly fit this massive amount of data to the corresponding biological phenomenon, another critical part of the system biology approach i.e. computational modeling needs to be further developed (25) (Figure 1). 1.2.3 Computational tools and mathematical models To make sense of the massive amount of data generated by the omics approaches (i.e. the ‘wet’ experiments), it is generally not possible to solve the relevant underlying equations analytically. Instead, system biologists use mathematical models and computer simulation to help solve them numerically (the ‘dry’ experiments) (31). There are several benefits of using mathematical models to explain and predict biological phenomena, like e.g. low cost and being less time-consuming, compared to the corresponding wet experiments. Mathematical models combined with the corresponding experimental data are well-suited to provide us with relevant pieces of information, particularly when there is discrepancy between the actual behavior measured in an experiment and the predictions from the corresponding model. Such a discrepancy indicates that the mathematical model is still missing some crucial information, and therefore can steer the design of an updated version of the model. On the other hand, novel data from wet experiments can be added to further refine the mathematical model(s) (31). The challenge for computational approaches then are two-fold: first to build up a more (detailed) model at the molecular level of a biological system through the refinement of omics data analysis, and then further navigate the follow-up experimental analyses through computational simulations with incorporation of ever-increasing detail (8). 1.3 Model organisms, i.e. systems of interest 1.3.1 Escherichia coli as a model organism Escherichia coli is a Gram-negative, facultative anaerobic, rod-shaped bacterium that is normally found in the gastrointestinal tract of a warm-blooded animal. E. coli was first identified by Theodor Escherich in 1885, and ever since was used as a model organism for both fundamental studies e.g. in biochemistry and molecular biology, and the development of new technologies, like e.g. DNA recombination, and heterologous protein production. These developments have made E. coli the most studied Gram-negative bacterium and the best understood cellular life form (32, 33). There are multiple reasons why E. coli is an ideal model organism and is used as the ‘laboratory workhorse’ for a multitude of research groups all around the world. First of all, 14 GENERAL INTRODUCTION it is a single-celled organism with a relatively small genome (i.e. ~ 4.6 million base pairs) encoding about 4,000 different proteins. These numbers are modest compared to those of the human genome which carries ~ 3 billion base pairs, encoding approximately 30,000 different proteins. Most of our present knowledge of molecular biology, such as the DNA replication mechanism, gene expression and protein synthesis, was derived from studies of this bacterium. Because of its relatively small genome, molecular genetics experiments can easily be performed and analyzed. Moreover, E. coli is able to grow well under various laboratory conditions, with a doubling time between 20 and 60 min, depending on the specific culture conditions (34). Besides that, most strains of E. coli are harmless, be it that some strains can cause disease in the gastrointestinal tract, the urinary tract and/or the central nervous system. Examples of the latter are E. coli O157:H7, enterotoxigenic E. coli (ETEC), and enteroinvasive E. coli (EIEC) (35). The complete genome sequence of many E. coli strains, such as E. coli strain K-12 substrain MG1655 (36), E. coli K-12 substrain W3110 (37), and E. coli O157:H7 (38) are now available, making further system biology studies e.g. genomics, transcriptomics, and proteomics straightforward. Because of all the reasons mentioned above it is no surprise that E. coli becomes an ideal model organism for system biology studies. Despite the massive accumulation of knowledge derived from these studies, there are still many of its basic biological aspects unresolved. A major knowledge gap resides in the regulatory network that controls metabolism, which is based on e.g. protein-metabolite interactions, post-translational modifications, and protein-protein interactions (8, 39) (for review see (40, 41)). These regulatory interactions are often not taken into account or overlooked in computational models, which then limits the predictive capability of such models (42), thus metabolic models that are integrated with a comprehensive dynamics of its regulation are still in urgent need. 1.3.2 Carbon catabolite repression (CCR) in Escherichia coli General mechanism The ability of E. coli to utilize the components of a mixture of carbon sources in a sequential manner has already been studied for many decades. A classical example of this phenomenon in E. coli is the glucose-lactose diauxic growth that was first demonstrated by Jacques Monod in 1942 (43). For E. coli, like most bacteria that can use a variety of carbon sources, however, glucose is the most preferred carbon source. Preferential utilization of glucose over other carbon sources has initially been called glucose repression and later was re-baptised as the Carbon Catabolite Repression (CCR) mechanism. CCR is one of the most extensively studied regulation mechanisms in E. coli as it is directly 15 CHAPTER 1 coupled to metabolism and hence determines growth rate. Selective use of the preferred carbon source that allows the maximum growth rate is the key survival in a competitive environment and is the driving force for the evolution of CCR in both non-pathogenic and pathogenic bacteria (44). The mechanism underlying CCR in E. coli is governed by the main glucose transporter i.e. the glucose-PTS (44, 45). Figure 2. Uptake of glucose via the glucose-PTS and its coupling to catabolism in E. coli. Glucose is phosphorylated and simultaneously translocated into the cell. When glucose is abundant, the unphosphorylatedEIIAGlc accumulates leading to inhibition of several of non-PTS transporters. This mechanism is referred as ‘inducer exclusion’. Concurrently, the process is draining the phosphorylated-EIIAGlc, resulting in deactivation of the cAMP-synthesizing adenylate cyclase (AC). Inhibition of cAMP synthesis leads to low concentrations of the cAMP-CPR complex, which then prevents cAMP-CRP-mediated activation operons required for catabolism of alternative carbon sources, generally called ‘catabolic genes’. This phenomenon is named ‘Carbon Catabolite Repression’. Besides that, synthesis of cAMP by AC is also governed by the ratio of carbon to nitrogen availability through the α-ketoglutarate (α-KG) level. The synthesis of cAMP is inhibited when the ratio of [carbon] to [nitrogen] is high, and activated when this ratio is low. Abbreviations for enzymes (in bolded green text) are as follows: Pgi, phosphoglucose isomerase; Pfk, phosphofructokinase; Fba, fructose-1,6-bisphosphate aldolase; Tpi, triose-phosphate isomerase; Gap, glyceraldehyde-3-phosphate dehydrogenase; Pgk, phosphoglycerate kinase; Pgm, phosphoglycerate mutase; Eno, enolase; Pyk, pyruvate kinase. Figure modified from (41, 45, 46). 16 GENERAL INTRODUCTION This Phosphoenolpyruvate (PEP)-dependent carbohydrate: Phosphotransferase system (PTS) is one of the three basic types of transporter systems in E. coli. The other two are the ATP-dependent transport systems and the electrochemical energy-dependent transporter systems. The PTS couples the phosphorylation of a sugar molecule to its translocation into the cell. By transferring a phosphoryl group from PEP to the carbohydrate, the phosphorylated carbohydrate is translocated across the membrane (47). PTS systems are composed of two cytoplasmic components i.e. EI and HPr and a carbohydrate-specific enzyme complex (EII) that is also called the sugar-specific PTS permease (see Figure 2). Therefore, there are many different EII complexes that are each specific for a limited range of sugars, while EI and HPr are common to all PTS sugars (45). The sugar-specific permease consists of up to four protein domains (A, B, C, D) in which at least one domain is an intrinsic membrane protein. These protein domains can be separate proteins or fused into a single polypeptide chain (47). The glucose-specific membrane protein of the glucose-PTS transporter in E .coli consists of a cytoplasmic domain and integral membrane domains. The membraneassociated complex, encoded by ptsG, is composed of an EIIB domain that is hydrophilic and hence faces the cytoplasm while the EIIC domain is an integral membrane domain. The cytoplasmic EIIA protein, encoded by crr, plays a central role in E. coli’s CCR (44). As shown in Figure 2 when glucose is absent, phosphorylated-EIIA Glc binds to and activates adenylate cyclase (AC), yielding increased levels of cAMP. High concentrations of cAMP lead to high concentrations of the cAMP-CRP complex, which then activates the expressions of catabolic genes. This phenomenon is generally termed ‘Carbon Catabolite Repression’. When glucose is present, the concentration of phosphorylated-EIIA Glc is diminished, and therefore no cAMP is synthesized, resulting in deactivation of many catabolic genes, while the level of unphosphorylated-EIIA unphosphorylated-EIIA Glc Glc is elevated. The then binds to and/or inactivates transporters and catabolic enzymes for their catabolism. This mechanism is known as ‘inducer exclusion’ (44, 45). Further study of Doucette and colleagues (2011) had revealed that accumulation of αketoglutarate (α-KG) during nitrogen limitation can directly prevent glucose uptake, by inhibition of EI. By inhibiting the first step of the PTS system, AC is indirectly deactivated Glc by the lack of phosphorylated-EIIA , resulting in reduced cAMP levels (48). In addition, You and colleagues (2013) have shown that AC activity can be directly inhibited by α-KG (which is one of carbon precursors in amino acid biosynthesis) as well as other αketoacids, such as pyruvate and oxaloacetate (46). However, this directed inhibitory effect of α-KG seems to be controversial in the light of the earlier report of Daniel and Danchin 17 CHAPTER 1 Glc (1986) that described the inhibitory effect of α-KG requires the EIIA component (49). Nevertheless, these recent studies reveal a strong link between carbon and nitrogen metabolism through the cAMP-CRP signalling complex. This cAMP-CRP mediated regulation senses not simply the carbon availability but also the balance between carbon catabolism and the capacity for anabolism (41, 46). Carbon Catabolite Repression studies based on the application of omics technologies The application of several high-throughput omics technologies has allowed an innovation in the studies of the impact of the CCR response at the molecular level. Several studies have characterized the global changes in gene expression of E. coli grown on different carbon sources by using the microarray technology that was developed late in th the 20 century. For example, Oh and colleagues (2002) have suggested that Acs (AcetylCoA synthetase) is a major responding enzyme for acetate uptake, while it was concluded that the Pta-AckA pathway is used for acetate excretion during growth on glucose based on microarray analysis (50). Subsequently, down-regulation of acs expression, triggered by CCR, was proposed to be the main cause of overflow metabolism in the form of acetate excretion (51). The acetate excretion is often observed when E. coli grows on glucose, particularly during process scale-up, and is unfavorable as it inhibits growth and diverts carbon away from the desired products. Moreover, the CCR response mediates a global change in gene expression in which gene regulation is not only controlled by cAMP-CRP-mediated regulation of gene expression and by inducer exclusion, but also through post-transcriptional-, translational-, and post-translational control mechanisms (41, 44). Post-transcriptional control mediated by small regulatory RNA (sRNA) for instance is a well known part of the CCR gene regulatory networks. For example, Spot 42 (Spf) sRNA does play a role in the CCR gene regulatory network, where its function is to counteract the global transcription factor, CRP (52). Overexpression of sRNAs, followed by transcriptomic analysis, has widely been used to identify sRNA targets. However, this technique fails to identify genes that regulated predominantly at the translation level, nor can it distinguish between direct and indirect targets of sRNA (53). The developments in mass spectrometry for the analysis of proteins have made it possible to carry out large-scale analysis of post-translational modification (PTM) by using e.g. the phosphoproteomic approach (54, 55). Several PTS components, as well as proteins involved in carbon catabolism (i.e. glycolysis), are significantly over-represented in the fraction of phosphorylated proteins of E. coli (54), indicating that protein phosphorylation 18 GENERAL INTRODUCTION also contributes significantly to the CCR response. Another PTM that was found to be highly abundant in E. coli is lysine acetylation (56). Proteomic studies of acetylated proteins in E. coli have revealed that ~70 % of the acetylated proteins is involved in metabolism and translation processes (56, 57). Lysine acetylation is well known to regulate many cellular processes in eukaryotes; however, little is known about its molecular function in prokaryotes. Subsequently, Wang and colleagues (2010) have shown that in Salmonella enterica, several proteins involved in central metabolic pathways were also shown to be extensively acetylated, and the degree of acetylation was dependent on the nature of the carbon source and highest upon the use of glucose (58). So far, commonly found modes of PTM in bacteria are acetylation and phosphorylation which in many cases are found on the same protein. This suggests a potential cross-talk between the two forms of PTM, and possibly with other PTMs like e.g. glycosylation, methylation, and ubiquitination, as well (59). Even though individual omics analysis, such as transcript profiling, are a powerful approach, especially for prediction of the function of unknown genes or poorly characterized open reading frames (50), it is becoming clear that any single omics approach may not be sufficient to provide a comprehensive view of the CCR response at the systems level (60). At the basis of cellular functioning are multiple functional units, including mRNA, protein, and metabolites, that interact with each other. This creates a very complex regulatory cascade for any living cell, as had been clearly shown in an early study of ter Kuile & Westerhoff, in which the authors showed that glycolysis is controlled at the metabolic-, the proteomic-, as well as the genomic level (61). The components of the PTS not only control carbon metabolism through CCR and inducer exclusion, and nitrogen- and phosphate metabolism, but also as we now know e.g. potassium transport (62), antibiotic resistance (63), biofilm formation (64), chemotaxis (65, 66), endotoxin production (67), and virulence expression of certain pathogens (68– 70). Despite the more than five decades of research since the discovery of the PTS, its regulatory functions as well as its cellular functioning are still far from being fully understood (see review (71)), let alone its accurate computational simulation. So far, several single- and multi-omics analyses, as well as classical molecular approaches, have been successful in contributing to the accumulation of knowledge about this system. However, the complexity underlying the molecular interactions in this system clearly warrant more detailed studies. Here we present the result of such studies, based on a multi-omics approach. 19 CHAPTER 1 1.3.3 Synechocystis sp. PCC6803 as a model organism for the design of biosolar cell factories To reduce our societies’ dependence on fossil-derived energy in favor of renewable energy, e.g. in the form of solar biofuels, it is necessary to further develop effective procedures for production of renewable energy sources. The problems aforementioned e.g. the significant rise in CO2 emission, which in turn causes e.g. acidification of the oceans and global warming, warrant an intense search for environmentally friendly and sustainable procedures for production of biofuels, as well as for chemicals so far derived from fossil fuel (72, 73). Plants, algae, and cyanobacteria are oxygenic photoautotrophs, and thus able to use the (free) energy of sunlight to fix carbon from atmospheric CO2 through a process known as photosynthesis. In oxygenic photosynthesis atmospheric CO2 is converted into sugars and subsequently into biomass. Coupled to these conversions water functions as the electron donor with simultaneous release of oxygen. Oxygenic photosynthesis accordingly makes a major contribution to the global flux of carbon. Therefore, the photosynthesisbased production of a range of (bio)chemicals has gained a lot of attention lately, because of its potential ecological friendliness and sustainability. In such applications, cyanobacteria-based production is often referred to as an even more promising process than plant- or algae-based production. As plant-derived biofuels are all produced from food crops e.g. sugarcane, corn, and soybean, this results in high costs of production, and competition with the use of land for food crops, as well as food and feed itself (i.e. firstgeneration biofuel production processes), while using biomass from agricultural waste (or from non-food crops) still faces a major challenge in the form of the efficiency of sugar extraction from lignocellulosic biomass (i.e. in second-generation biofuel production processes). Therefore, the direct conversion of energy from sunlight plus a surplus (atmospheric) CO2 into liquid fuel, using metabolically engineered photoautotrophic microorganisms (i.e. a third- or fourth-generation biofuel production process) is a promising alternative to the exploration of fossil fuels (74, 75). Summarized diagram of the first- to the fourthgeneration biofuels/chemicals production process is shown in Figure 3. Cyanobacteria have several advantages over eukaryotic algae, like e.g. higher rates of photosynthesis, faster growth, and more facile genetic engineering, thus making such applications easier to design and develop (76). 20 GENERAL INTRODUCTION Figure 3. Diagrammatic representation of 4 generations of biobased fuel- and commodity chemicals production processes. Arrows represent the carbon and energy flow between different components, catalyzed by the organisms indicated. Numbers refer to traditional crop-derived biomass (1), of which various components (2) can be processed, according to a first-, second- and third-generation process, respectively, into biofuel and/or commodity chemicals. The green arrows represent ‘direct conversion’ in third- and fourth-generation processes, to make either biodiesel with eukaryotic algae (3) or a wide range of compounds with cyanobacteria (4). Figure modified from (77). The unicellular cyanobacterium, Synechocystis sp. PCC6803 (hereafter Synechocystis) is a Gram-negative, mesophilic organism that can grow photoautotrophically, and heterotrophically on glucose (78, 79). The complete genome sequence of Synechocystis has been available since 1996 (80). It comprises ~3.6 million base pairs representing ~3,300 protein-encoding genes, of which of about half of them the function is known (80, 81). Synechocystis also displays competence for DNA uptake via natural transformation, which allows facile genetic- and metabolic engineering. Accordingly, multiple techniques for DNA recombination have been well established in this organism (82–85), as well as genetic tools for application of synthetic biology in cyanobacteria (see review (77)). Despite the available applications of genetic tools for engineering cyanobacteria to produce chemical commodities, like e.g. biodiesel, ethanol, lactic acid, ethylene, PHB 21 CHAPTER 1 (poly-β-hydroxybutyrates) (77), the metabolic alterations and physiological impacts of such modified metabolic routes in cyanobacteria are still not well characterized nor understood. One of the reasons behind this for instance, is the incomplete genome annotation (86), which then leads to only partial understanding of the metabolic reactions in cyanobacteria, as well as to little information on the aspects of gene- and metabolic regulation. Thus, further information on functional genomics i.e. transcriptomics, proteomics, and metabolomics is needed in order to obtain insight into the direct consequences of a major re-direction of the carbon flux through intermediary metabolism of engineered cyanobacteria, as well as to refine genome-scale metabolic models for these (engineered) organisms (77, 86) for a more efficient design of cyanobacterial cell factories. 1.3.4 Synechocystis-platform based ethanol and lactic acid production Deng and Coleman (1999) were the first to successfully introduce a fermentative pathway (i.e. pdc (encoding pyruvate decarboxylase) and adh (encoding alcohol dehydrogenase); see Figure 4) from the naturally ethanol-producing bacterium Zymomonas mobilis, under control of the rbcLS promoter into Synechococcus sp. PCC7942. This resulted in an ethanol -1 accumulation of ~ 5 mM (0.23 g L ) after 4 weeks of growth (87). Next, Dexter and Fu (2009) reported the production of ethanol in Synechocystis with a yield of ~10 mM (0.46 g −1 L ) after ~6 days, using a similar set of genes, but now expressed under the control of the psbA2 promoter (88). So far, the highest production of ethanol in cyanobacteria was reported by Gao and colleagues (2012). They used a strain of Synechocystis with two copies of an ethanol pathway composed of the pyruvate decarboxylase from Z. mobilis and an endogenous alcohol dehydrogenase slr1192 under control of rbc promoter, while one of the two was integrated in the genes required for biosynthesis of poly-βhydroxybutyrate (i.e. a competing pathway for pyruvate). The highest yield of ethanol −1 production reached by these investigators was 100 mM (5.5 g L ) within 26 days (89). A similar strategy for optimization of lactic acid production has been applied in Synechocystis using different sources (such as Bacillus subtilis, and Lactococcus lactis) of ldh (encoding lactate dehydrogenase) under control of different promoters, e.g. psbA2, rnpB, and trc (90, 91). The highest partitioning of carbon to lactic acid production, i.e. > 50 %, was observed in a Synechocystis strain that carried ldh from L. lactis and pk (encoding pyruvate kinase) from Enterococcus faecalis, under control of the trc promoter, with extra ldh expression from an exogenous plasmid (see Figure 4 for a summary of the relevant pathway). This strain showed that about half of the fixed CO2 was channeled into lactic 22 GENERAL INTRODUCTION acid directly instead of into biomass. However, impaired growth was also observed in both the ethanol- and the lactic acid producing strain (89, 92). Figure 4. Schematic representation of the modified central carbon metabolism of Synechocystis sp. PCC6803 strains engineered for biobased fuel- and chemicals production, as used in this study. The heterologously overexpressed pyruvate decarboxylase (PDC) and alcohol dehydrogenase (ADH) allow the direct conversion of CO2 into ethanol, driven by solar energy. Heterologous over-expression of a lactate dehydrogenase (LDH) together with overexpression of pyruvate kinase (PK) allows a major re-direction of carbon flow away from biomass, towards the production of lactic acid. Abbreviations: CBB cycle: Calvin-Benson-Bassham cycle; TCA cycle: Tricarboxylic acid cycle; RuBP: Ribulose-1,5-bisphosphate; 3PG: 3-Phosphoglycerate; 2PG: 2Phosphoglycerate; PEP: Phosphoenolpyruvate; AcCoA: Acetyl-CoA; CA: Citrate; 2OG: 2-Oxoglutarate; OA: Oxaloacetate. Such engineering may impose quite some stresses on the cyanobacteria, which is occasionally observed in the form of increased genetic instability (90, 93–95) and growth retardation of a production strain. These stresses may derive from two main factors: first the massive channeling of carbon into the end product instead of into the biomass and the corresponding metabolic rearrangements, as well as the energy burden and competition for the endogenous co-factor utilizing enzymes. Second, the product titre may reach to levels that are harmful to the cells, even though the product titres mentioned above 23 CHAPTER 1 hardly cause any stress on the wild type organisms in terms of growth retardation (96) when added to the external medium of the cells. Effects of the latter type of stress on cyanobacteria have been extensively studied through multi-omics analyses. Such analyses showed that several stress response mechanisms were up-regulated upon addition of end products like e.g. ethanol, and butanol, including heat shock proteins, and the oxidative stress response, as well as modification of the cell membrane, and cell mobility (97–100). However, fewer studies have been focused on the first type of stress i.e. a major rechanneling of intermediary metabolism in the ‘cell factories’. One consequence of the engineering of a high capacity carbon sink in cyanobacteria, however, has already been noted, i.e. an increased rate of cellular CO2 fixation (92, 101, 102). To systematically understand the molecular rearrangements coupled to the major redirection of carbon flux towards a desired product, as well as the answer to some basic questions like e.g. to which extend the capacity of cellular CO2 fixation and product formation are mutually compatible. Here, we have carried out a comparative proteomics analysis between ethanol- and lactate producing Synechocystis strain, relative to the corresponding wild type organism. We anticipate that this may help identify bottlenecks of our synthetic approach toward the construction of optimal biosolar cell factories. 1.4 Thesis outline System understanding is essential for understanding and predicting the behavior of complex biological systems. The most rational approach for this is the application of the systems biology approach. Thus, we apply functional genomics techniques that are part of the system biology approach, to study Carbon catabolite repression (CCR). This CCR is one of the most complex regulatory phenomena in E. coli. A similar approach was applied to investigate the proteomic response elicited by the massive redirection of carbon flow in Synechocystis cell factories. Chapter 1 gives an overview of the system biology approach and of some systems of interest, as well as an update of current issues regarding those systems of interest. Chapter 2 serves to establish a method that allows accurate sampling for time-series multi-omics analysis in E. coli responding to a glucose-pulse. In addition, the physiological response of this organism, in a dynamic mode, was recorded. Chapter 3 provides information on the dynamic response of the transcriptome and proteome of E. coli exerted by glucose repression, following the well established method described in Chapter 2. Moreover, based on the (lack of) correlation between those two data sets, posttranscriptionally regulated (PTR) genes involved in this response were identified. Chapter 4 concentrates on the contribution of small regulatory RNA (sRNA) to the gene regulatory 24 GENERAL INTRODUCTION circuits that function in glucose repression. A total of 46 sRNAs changed their expression significantly. In addition, in silico analysis of these sRNAs has revealed possible sRNA regulators of several of the PTR genes identified in Chapter 3. In Chapter 5 we show that the high rate of carbon partitioning observed in the ethanol-producing Synechocystis strain is accommodated by the up-regulation of several proteins involved in the initial stages of CO2 fixation, as well as by a general decrease in abundance of the translation machinery of the cells. In contrast, no significant change in the expression level of the enzymes of specific metabolic pathways was observed in the lactic acid-producing Synechocystis strain. However, two CRISPR associated proteins were surprisingly found to have significantly increased expression levels in this strain. Chapter 6 then summarizes and discusses the results obtained in this thesis in the light of the currently available knowledge from the literature. References 1. Allen MR, Frame DJ, Huntingford C, Jones CD, Lowe JA, Meinshausen M, Meinshausen N. 2009. Warming caused by cumulative carbon emissions towards the trillionth tonne. Nature 458:1163–1166. 2. Doney SC, Fabry VJ, Feely RA, Kleypas JA. 2009. Ocean Acidification: The Other CO2 Problem. Annu Rev Mar Sci 1:169–192. 3. McNeil BI, Matear RJ. 2008. Southern Ocean acidification: A tipping point at 450-ppm atmospheric CO2. 4. IPCC. 2014. Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Proc Natl Acad Sci U S A 105:18860–18864. Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Field, C.B., V.R. Barros, D.J. Dokken, K.J. Mach, M.D. Mastrandrea, T.E. Bilir, M. Chatterjee, K.L. Ebi, Y.O. Estrada, R.C. Genova, B. Girma, E.S. Kissel, A.N. Levy, S. MacCracken, P.R. Mastrandrea, and L.L. White (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA. 5. Dhillon RS, von Wuehlisch G. 2013. Mitigation of global warming through renewable biomass. Biomass 6. Kitano H. 2002. Looking beyond the details: a rise in system-oriented approaches in genetics and molecular 7. Bruggeman FJ, Westerhoff HV. 2007. The nature of systems biology. Trends Microbiol 15:45–50. 8. Heinemann M, Sauer U. 2010. Systems biology of microbial metabolism. Curr Opin Microbiol 13:337–343. 9. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Bioenergy 48:75–89. biology. Curr Genet 41:1–10. Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M. 2001. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29:365–371. 10. Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63. 11. Ozsolak F, Milos PM. 2011. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 12:87– 98. 12. Metzker ML. 2010. Sequencing technologies — the next generation. Nat Rev Genet 11:31–46. 25 CHAPTER 1 13. Anderson NL, Anderson NG. 1998. Proteome and proteomics: new technologies, new concepts, and new words. Electrophoresis 19:1853–1861. 14. Elliott MH, Smith DS, Parker CE, Borchers C. 2009. Current trends in quantitative proteomics. J Mass Spectrom 44:1637–1660. 15. O’Farrell PH. 1975. High resolution two-dimensional electrophoresis of proteins. J Biol Chem 250:4007– 4021. 16. Gerber SA, Rush J, Stemman O, Kirschner MW, Gygi SP. 2003. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc Natl Acad Sci 100:6940–6945. 17. Beynon RJ, Doherty MK, Pratt JM, Gaskell SJ. 2005. Multiplexed absolute quantification in proteomics using artificial QCAT proteins of concatenated signature peptides. Nat Methods 2:587–589. 18. Beynon RJ, Pratt JM. 2005. Metabolic Labeling of Proteins for Proteomics. Mol Cell Proteomics 4:857–872. 19. Ong S-E, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M. 2002. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics MCP 1:376–386. 20. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. 1999. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 17:994–999. 21. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ. 2004. Multiplexed Protein Quantitation in Saccharomyces cerevisiae Using Amine-reactive Isobaric Tagging Reagents. Mol Cell Proteomics 3:1154–1169. 22. Kramer G, Sprenger RR, Back J, Dekker HL, Nessen MA, van Maarseveen JH, de Koning LJ, Hellingwerf KJ, de Jong L, de Koster CG. 2009. Identification and quantitation of newly synthesized proteins in Escherichia coli by enrichment of azidohomoalanine-labeled peptides with diagonal chromatography. Mol Cell Proteomics MCP 8:1599–1611. 23. Bantscheff M, Schirle M, Sweetman G, Rick J, Kuster B. 2007. Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem 389:1017–1031. 24. Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. 2014. Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ. Mol Cell Proteomics 13:2513–2526. 25. Schulze WX, Usadel B. 2010. Quantitation in Mass-Spectrometry-Based Proteomics. Annu Rev Plant Biol 61:491–516. 26. Rabinowitz JD. 2007. Cellular metabolomics of Escherchia coli. Expert Rev Proteomics 4:187–198. 27. Noack S, Wiechert W. 2014. Quantitative metabolomics: a phantom? Trends Biotechnol 32:238–244. 28. Bingol K, Brüschweiler R. 2015. Two elephants in the room: new hybrid nuclear magnetic resonance and mass spectrometry approaches for metabolomics. Curr Opin Clin Nutr Metab Care. 29. Bingol K, Bruschweiler-Li L, Yu C, Somogyi A, Zhang F, Brüschweiler R. 2015. Metabolomics beyond spectroscopic databases: a combined MS/NMR strategy for the rapid identification of new metabolites in complex mixtures. Anal Chem 87:3864–3870. 30. Winter G, Krömer JO. 2013. Fluxomics – connecting ‘omics analysis and phenotypes. Environ Microbiol 15:1901–1916. 31. Fischer HP. 2008. Mathematical Modeling of Complex Biological Systems. Alcohol Res Health 31:49–59. 32. Hacker J, Blum-Oehler G. 2007. In appreciation of Theodor Escherich. Nat Rev Microbiol 5:902–902. 26 GENERAL INTRODUCTION 33. Neidhardt FC, Curtiss IR, Ingraham JL, Lin ECC, Low KB, Magasanik B, Reznikoff WS, Riley M, Schaechter M, Umbarger HE. 1996. Escherichia coli and Salmonella : cellular and molecular biology. ASM Press, Washington, D.C. 34. Cooper GM. 2000. The Cell: A Molecular Approach2nd edition. Sinauer Associates Inc, Washington, DC. 35. Nataro JP, Kaper JB. 1998. Diarrheagenic Escherichia coli. Clin Microbiol Rev 11:142–201. 36. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462. 37. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, Mori H, Horiuchi T. 2006. Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol Syst Biol. 38. Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T, Tanaka M, Tobe T, Iida T, Takami H, Honda T, Sasakawa C, Ogasawara N, Yasunaga T, Kuhara S, Shiba T, Hattori M, Shinagawa H. 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res Int J Rapid Publ Rep Genes Genomes 8:11–22. 39. Gerosa L, Sauer U. 2011. Regulation and control of metabolic fluxes in microbes. Curr Opin Biotechnol 22:566–575. 40. van Heeswijk WC, Westerhoff HV, Boogerd FC. 2013. Nitrogen Assimilation in Escherichia coli: Putting Molecular Data into a Systems Perspective. Microbiol Mol Biol Rev MMBR 77:628–695. 41. Chubukov V, Gerosa L, Kochanowski K, Sauer U. 2014. Coordination of microbial metabolism. Nat Rev Microbiol 12:327–340. 42. Durot M, Bourguignon P-Y, Schachter V. 2009. Genome-scale models of bacterial metabolism: reconstruction and applications. FEMS Microbiol Rev 33:164–190. 43. Monod J. 1942. Recherches sur la croissance des cultures bactériennes,. Hermann & cie, Paris. 44. Gorke B, Stulke J. 2008. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat Rev 6:613–624. 45. Deutscher J, Francke C, Postma PW. 2006. How phosphotransferase system-related protein phosphorylation regulates carbohydrate metabolism in bacteria. Microbiol Mol Biol Rev MMBR 70:939–1031. 46. You C, Okano H, Hui S, Zhang Z, Kim M, Gunderson CW, Wang Y-P, Lenz P, Yan D, Hwa T. 2013. Coordination of bacterial proteome with metabolism by cyclic AMP signalling. Nature 500:301–306. 47. Lengeler JW, Jahreis K, Wehmeier UF. 1994. Enzymes II of the phospho enol pyruvate-dependent phosphotransferase systems: Their structure and function in carbohydrate transport. Biochim Biophys Acta BBA - Bioenerg 1188:1–28. 48. Doucette CD, Schwab DJ, Wingreen NS, Rabinowitz JD. 2011. α-ketoglutarate coordinates carbon and nitrogen utilization via Enzyme I inhibition. Nat Chem Biol 7:894–901. 49. Daniel J, Danchin A. 1986. 2-Ketoglutarate as a possible regulatory metabolite involved in cyclic AMPdependent catabolite repression in Escherichia coli K12. Biochimie 68:303–310. 50. Oh MK, Rohlin L, Kao KC, Liao JC. 2002. Global expression profiling of acetate-grown Escherichia coli. J Biol Chem 277:13175–13183. 51. Valgepea K, Adamberg K, Nahku R, Lahtvee PJ, Arike L, Vilu R. 2010. Systems biology approach reveals that overflow metabolism of acetate in Escherichia coli is triggered by carbon catabolite repression of acetyl-CoA synthetase. BMC Syst Biol 4:166–0509–4–166. 27 CHAPTER 1 52. Beisel CL, Storz G. 2011. The base-pairing RNA spot 42 participates in a multioutput feedforward loop to help enact catabolite repression in Escherichia coli. Mol Cell 41:286–297. 53. Beisel CL, Storz G. 2010. Base pairing small RNAs and their roles in global regulatory networks. FEMS Microbiol Rev 34:866–882. 54. Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M. 2008. Phosphoproteome Analysis of E. coli Reveals Evolutionary Conservation of Bacterial Ser/Thr/Tyr Phosphorylation. Mol Cell Proteomics 7:299– 307. 55. Soares NC, Spät P, Krug K, Macek B. 2013. Global Dynamics of the Escherichia coli Proteome and Phosphoproteome During Growth in Minimal Medium. J Proteome Res 12:2611–2621. 56. Zhang J, Sprung R, Pei J, Tan X, Kim S, Zhu H, Liu C-F, Grishin NV, Zhao Y. 2009. Lysine Acetylation Is a Highly Abundant and Evolutionarily Conserved Modification in Escherichia Coli. Mol Cell Proteomics MCP 8:215– 225. 57. Zhang K, Zheng S, Yang JS, Chen Y, Cheng Z. 2013. Comprehensive Profiling of Protein Lysine Acetylation in Escherichia coli. J Proteome Res 12:844–851. 58. Wang Q, Zhang Y, Yang C, Xiong H, Lin Y, Yao J, Li H, Xie L, Zhao W, Yao Y, Ning Z-B, Zeng R, Xiong Y, Guan K-L, Zhao S, Zhao G-P. 2010. Acetylation of Metabolic Enzymes Coordinates Carbon Source Utilization and Metabolic Flux. Science 327:1004–1007. 59. Soufi B, Soares NC, Ravikumar V, Macek B. 2012. Proteomics reveals evidence of cross-talk between protein modifications in bacteria: focus on acetylation and phosphorylation. Curr Opin Microbiol 15:357–363. 60. Zhang W, Li F, Nie L. 2010. Integrating multiple “omics” analysis for microbial biology: application and methodologies. Microbiology 156:287–301. 61. ter Kuile BH, Westerhoff HV. 2001. Transcriptome meets metabolome: hierarchical and metabolic regulation of the glycolytic pathway. FEBS Lett 500:169–171. 62. Lee C-R, Cho S-H, Yoon M-J, Peterkofsky A, Seok Y-J. 2007. Escherichia coli enzyme IIANtr regulates the K+ transporter TrkA. Proc Natl Acad Sci U S A 104:4124–4129. 63. Nishino K, Senda Y, Yamaguchi A. 2008. CRP Regulator Modulates Multidrug Resistance of Escherichia coli by Repressing the mdtEF Multidrug Efflux Genes. J Antibiot (Tokyo) 61:120–127. 64. Beloin C, Roux A, Ghigo J-M. 2008. Escherichia coli biofilms. Curr Top Microbiol Immunol 322:249–289. 65. Lux R, Jahreis K, Bettenbrock K, Parkinson JS, Lengeler JW. 1995. Coupling the phosphotransferase system and the methyl-accepting chemotaxis protein-dependent chemotaxis signaling pathways of Escherichia coli. Proc Natl Acad Sci U S A 92:11583–11587. 66. Lux R, Munasinghe VRN, Castellano F, Lengeler JW, Corrie JET, Khan S. 1999. Elucidation of a PTS– Carbohydrate Chemotactic Signal Pathway in Escherichia coli Using a Time-resolved Behavioral Assay. Mol Biol Cell 10:1133–1146. 67. Varga J, Stirewalt VL, Melville SB. 2004. The CcpA Protein Is Necessary for Efficient Sporulation and Enterotoxin Gene (cpe) Regulation in Clostridium perfringens. J Bacteriol 186:5221–5229. 68. Müller CM, Åberg A, Straseviçiene J, Emődy L, Uhlin BE, Balsalobre C. 2009. Type 1 Fimbriae, a Colonization Factor of Uropathogenic Escherichia coli, Are Controlled by the Metabolic Sensor CRP-cAMP. PLoS Pathog 5. 69. Shelburne SA, Keith D, Horstmann N, Sumby P, Davenport MT, Graviss EA, Brennan RG, Musser JM. 2008. A direct link between carbohydrate utilization and virulence in the major human pathogen group A Streptococcus. Proc Natl Acad Sci U S A 105:1698–1703. 70. Skorupski K, Taylor RK. 1997. Cyclic AMP and its receptor protein negatively regulate the coordinate expression of cholera toxin and toxin-coregulated pilus in Vibrio cholerae. Proc Natl Acad Sci U S A 94:265– 270. 28 GENERAL INTRODUCTION 71. Deutscher J, Aké FMD, Derkaoui M, Zébré AC, Cao TN, Bouraoui H, Kentache T, Mokhtari A, Milohanic E, Joyet P. 2014. The Bacterial Phosphoenolpyruvate:Carbohydrate Phosphotransferase System: Regulation by Protein Phosphorylation and Phosphorylation-Dependent Protein-Protein Interactions. Microbiol Mol Biol Rev 78:231–256. 72. Schubert C. 2006. Can biofuels finally take center stage? Nat Biotechnol 24:777–784. 73. Scharlemann JPW, Laurance WF. 2008. How Green Are Biofuels? Science 319:43–44. 74. Angermayr SA, Hellingwerf KJ, Lindblad P, Teixeira de Mattos MJ. 2009. Energy biotechnology with cyanobacteria. Curr Opin Biotechnol 20:257–263. 75. Machado IMP, Atsumi S. 2012. Cyanobacterial biofuel production. J Biotechnol 162:50–56. 76. Quintana N, Kooy FV der, Rhee MDV de, Voshol GP, Verpoorte R. 2011. Renewable energy from Cyanobacteria: energy production optimization by metabolic pathway engineering. Appl Microbiol Biotechnol 91:471–490. 77. Wang B, Wang J, Zhang W, Meldrum DR. 2012. Application of synthetic biology in cyanobacteria and algae. Front Microbiol 3. 78. Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. 1979. Generic Assignments, Strain Histories and Properties of Pure Cultures of Cyanobacteria. Journal of General Microbiology 111:1–61. 79. Stal LJ, Moezelaar R. 1997. Fermentation in cyanobacteria1. FEMS Microbiol Rev 21:179–211. 80. Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S. 1996. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res Int J Rapid Publ Rep Genes Genomes 3:109–136. 81. Nakao M, Okamoto S, Kohara M, Fujishiro T, Fujisawa T, Sato S, Tabata S, Kaneko T, Nakamura Y. 2010. CyanoBase: the cyanobacteria genome database update 2010. Nucleic Acids Res 38:D379–D381. 82. Dzelzkalns VA, Bogorad L. 1986. Stable transformation of the cyanobacterium Synechocystis sp. PCC 6803 induced by UV irradiation. J Bacteriol 165:964–971. 83. Chauvat F, Vries LD, Ende AV der, Arkel GAV. 1986. A host-vector system for gene cloning in the cyanobacterium Synechocystis PCC 6803. Mol Gen Genet MGG 204:185–191. 84. Vermaas W. 1996. Molecular genetics of the cyanobacteriumSynechocystis sp. PCC 6803: Principles and possible biotechnology applications. J Appl Phycol 8:263–273. 85. Kufryk GI, Sachet M, Schmetterer G, Vermaas WFJ. 2002. Transformation of the cyanobacterium Synechocystis sp. PCC 6803 as a tool for genetic mapping: optimization of efficiency. FEMS Microbiol Lett 206:215–219. 86. Berla BM, Saha R, Immethun CM, Maranas CD, Moon TS, Pakrasi HB. 2013. Synthetic biology of cyanobacteria: unique challenges and opportunities. Front Microbiol 4. 87. Deng M-D, Coleman JR. 1999. Ethanol Synthesis by Genetic Engineering in Cyanobacteria. Appl Environ Microbiol 65:523–528. 88. Dexter J, Fu P. 2009. Metabolic engineering of cyanobacteria for ethanol production. Energy Environ Sci 2:857. 89. Gao Z, Zhao H, Li Z, Tan X, Lu X. 2012. Photosynthetic production of ethanol from carbon dioxide in genetically engineered cyanobacteria. Energy Environ Sci 5:9857–9865. 90. Angermayr SA, Paszota M, Hellingwerf KJ. 2012. Engineering a cyanobacterial cell factory for production of lactic acid. Appl Environ Microbiol 78:7098–7106. 29 CHAPTER 1 91. Angermayr SA, Hellingwerf KJ. 2013. On the Use of Metabolic Control Analysis in the Optimization of Cyanobacterial Biosolar Cell Factories. J Phys Chem B 117:11169–11175. 92. Angermayr SA, van der Woude AD, Correddu D, Vreugdenhil A, Verrone V, Hellingwerf KJ. 2014. Exploring metabolic engineering design principles for the photosynthetic production of lactic acid by Synechocystis sp. PCC6803. Biotechnol Biofuels 7:99–6834–7–99. eCollection 2014. 93. Kusakabe T, Tatsuke T, Tsuruno K, Hirokawa Y, Atsumi S, Liao JC, Hanai T. 2013. Engineering a synthetic pathway in cyanobacteria for isopropanol production directly from carbon dioxide and light. Metab Eng 20:101–108. 94. Jacobsen JH, Frigaard N-U. 2014. Engineering of photosynthetic mannitol biosynthesis from CO2 in a cyanobacterium. Metab Eng 21:60–70. 95. Jones PR. 2014. Genetic Instability in Cyanobacteria – An Elephant in the Room? Front Bioeng Biotechnol 2. 96. Wijffels RH, Kruse O, Hellingwerf KJ. 2013. Potential of industrial biotechnology with cyanobacteria and eukaryotic microalgae. Curr Opin Biotechnol 24:405–413. 97. Qiao J, Wang J, Chen L, Tian X, Huang S, Ren X, Zhang W. 2012. Quantitative iTRAQ LC-MS/MS proteomics reveals metabolic responses to biofuel ethanol in cyanobacterial Synechocystis sp. PCC 6803. J Proteome Res 11:5286–5300. 98. Wang J, Chen L, Huang S, Liu J, Ren X, Tian X, Qiao J, Zhang W. 2012. RNA-seq based identification and mutant validation of gene targets related to ethanol resistance in cyanobacterial Synechocystis sp. PCC 6803. Biotechnol Biofuels 5:89–6834–5–89. 99. Tian X, Chen L, Wang J, Qiao J, Zhang W. 2013. Quantitative proteomics reveals dynamic responses of Synechocystis sp. PCC 6803 to next-generation biofuel butanol. J Proteomics 78:326–345. 100. Anfelt J, Hallström B, Nielsen J, Uhlén M, Hudson EP. 2013. Using Transcriptomics To Improve Butanol Tolerance of Synechocystis sp. Strain PCC 6803. Appl Environ Microbiol 79:7419–7427. 101. Ducat DC, Silver PA. 2012. Improving Carbon Fixation Pathways. Curr Opin Chem Biol 16:337–344. 102. Lan EI, Liao JC. 2012. ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proc Natl Acad Sci U S A 109:6018–6023. 30 Chapter 2: Molecular physiology of the dynamic regulation of Carbon Catabolite Repression in Escherichia coli Orawan Borirak, Martijn Bekker, and Klaas J. Hellingwerf This chapter has been published as: Borirak O, Bekker M, Hellingwerf KJ. Molecular physiology of the dynamic regulation of carbon catabolite repression in Escherichia coli. Microbiology. 2014 Jun 1;160(Pt_6):1214–23. CHAPTER 2 2.1 Abstract Here we report on the use of the chemostat as an optimal device to create time-invariant conditions that allow accurate sampling for various omics assays in Escherichia coli, in combination with recording of the dynamics of the physiological transition in the organism under study that accompany the initiation of glucose repression. E. coli cells respond to the addition of glucose not only with the well-known transcriptional response, as was revealed through qPCR analysis of the transcript levels of key genes from the CRP regulon, but also with an increased growth rate and a transient decrease of the efficiency of its aerobic catabolism. Less than half of a doubling time is required for the organism to recover to maximal values of growth rate and efficiency. Furthermore, calculations based + - on our results show that the specific glucose uptake rate (qs) and the H /e ratio increase -1 proportionally, up to a growth rate of 0.4 h , while biomass yield on glucose (YX/s) drops during the first 15 minutes, followed by a gradual recovery. Surprisingly, the growth yields after the recovery phase show values even higher than the maximum theoretical yield. Possible explanations for these high yields are discussed. 2.2 Introduction Systems analysis of bacterial adaptation has revealed that the flexibility in the prokaryotic part of the tree of life is primarily facilitated by modulation of the rate of transcription of individual genes of an organism, as well as regulons thereof (1, 2). However, considerable uncertainty still exists on the question to what extent also post-transcriptional regulation contributes to this adaptability. To clarify this issue, well-defined experimental systems have to be made available, and characterized, that allow parallel sampling for transcript profiling and proteomics, under conditions in which a well-defined transcriptional response can be reproducibly elicited. Glucose-induced Carbon Catabolite Repression (CCR) is one of the most extensively studied processes of global transcriptional regulation in Escherichia coli (for review see (3)). During the process of glucose repression, E. coli cells lose the ability to catabolize any of a large series of sugars as their carbon- and energy source (4, 5): Increasing glucose concentrations decrease the relative phosphorylation level of Enzyme II A (glucose), which in turn activates adenylate cyclase and directly inhibits several sugar uptake systems. The increased levels of cAMP that are the result of this activation, together with the cAMPbinding protein CRP, globally modulate transcription in this enterobacterium. Many studies that have been carried out so far of the CCR use cultivation of E. coli cells in shake flasks in either a complex medium (6, 7) or in a minimal medium (8–11). The conditions generated/present in such a system are often falsely assumed to represent a quasi-steady 32 DYNAMIC REGULATION OF CCR IN E. COLI state, because in such batch cultures the E. coli cells are generating and experiencing significant changes in their physico-chemical (pH, pO2, etc.) environment during the course of such an experiment (12). As an alternative approach, other studies compared the characteristics of a wild type E. coli strain with that of (a) mutant(s) lacking e.g. the main CCR regulator CRP, for analysis of the genes regulated by the catabolite repression response. Use of such mutants, however, could mask the occurrence of minor or even major physiological responses to these physico-chemical changes. In addition, such studies of comparison of the wild type organism with selected deletion mutants suffer from the fact that the mutations may have indirect regulatory effects and preclude an analysis of the dynamics of the initiation of the glucose repression response. In this investigation, therefore, the chemostat culturing technique was selected as a tool to study omic responses related to carbon catabolite repression. This technique allows for maintenance of a chemically constant environment, in which microorganisms can be cultivated in a highly reproducible manner, while the growth rate of the organism and environmental conditions are kept constant (13, 14). The steady state condition in the chemostat provides an unbiased physiological condition in which the growth (rate) of E. coli can be fixed and can be limited by one single nutrient, like e.g. the carbon- and energy source glucose. This condition (i.e. of glucose limitation), thus creates an optimal reference condition for complete lack of CCR. Furthermore, this culturing technique allows us to use the wild type E. coli strain throughout all experimental procedures and therefore eliminates any bias that may be introduced through genetic manipulation. Chemostat culturing has been applied in many studies within the field of microbial physiology, especially for optimization of metabolic fluxes for industrial application (15). Moreover, many studies have reported that chemostat cultures can provide very reliable and reproducible omics data, that are well-interpretable in terms of altered physiology of the cell (16–19). The latter is substantially contributed to by the fact that in the chemostat the effects of environmental modulation, and of variation in growth rate, can be stringently separated (see e.g. (20, 21)). Here we report on a method for studying the CCR response in E. coli, using glucoselimited steady state growth conditions in the chemostat, as the condition in which carbon catabolite repression can be induced with addition of a saturating pulse of glucose, while simultaneously interrupting the chemostat pump (see Figure 1 for a schematic outline of the experiment). Using this approach, the growth environment is kept, though not fully, still relatively constant throughout the experiment. Furthermore, this experimental design is unique in that it neither requires the use of mutants nor an intermediate washing of the 33 CHAPTER 2 cells (6, 7, 22). The characteristics of the transcriptional response, elicited upon addition of the glucose pulse, partially concur with previously obtained data (6). However, we also uncover data that contrasts the results from previous studies. We show that these observed discrepancies are in accordance with the physiological response resulting from CCR. Future work will aim at simultaneous recording of transcriptome and proteome data in this system. Such data, together with a stringent statistical analysis, will allow us to compose a list of genes that are subject to post-transcriptional regulation during CCR. Figure 1. Schematic representation of the chemostat set up and of the experimental design selected. 2.3 Materials and Methods Bacterial strain and growth conditions Escherichia coli MG1655 was grown under glucose-limitation in 2 L chemostat vessels -1 (Applikon, The Netherlands) with a working volume of 1 L, at a dilution rate of 0.2 h . A defined minimal medium (23) was used with the following composition: 2.6 g l −1 −1 −1 −1 −1 NaH2PO4·2H2O, 0.7 g l KCl, 0.25 g l MgCl2·6H2O, 2.7 g l NH4Cl, 0.3 g l Na2SO4, 0.004 g −1 −4 −1 −3 −1 −3 l Ca2Cl2·2H2O, trace elements: 4.1×10 g l ZnO, 5.4×10 g l FeCl3·6H2O, 2.0×10 g l −4 MnCl2·4H2O, 1.7×10 −6 −1 g l −4 −5 CuCl2·2H2O, 4.8×10 CoCl2·6H2O, 6.4×10 −1 −l −l g l H3BO3, and −1 4.0×10 g l Na2MoO4·2H2O. The medium was supplemented with 3.6 g l glucose, 7.6 g −1 l nitriloacetic acid as a chelator and 3×10 34 −5 −1 g l selenite. For the reference culture, DYNAMIC REGULATION OF CCR IN E. COLI 15 NH4Cl was used as an N source, instead of NH4Cl. Temperature was controlled at 37°C and pH was maintained at 6.9 ± 0.1 by titrating with sterile 4 M NaOH. The culture was −1 aerated with 0.5 lmin water-saturated air and the stirring rate was set at 600 rpm. Pre−1 cultures were grown in the same medium, except that 26 g l sodium phosphate was used to increase its buffering capacity. Initiation of catabolite repression (glucose pulse experiment) A glucose pulse experiment was performed by directly injecting a concentrated glucose solution into the bioreactor, to achieve a final concentration of 9 g l −1 glucose. The injection volume was approximately 2.5% of the total working volume. The medium feed was stopped simultaneously and cells were harvested at 5, 15, 30, 60 and 120 minutes after this glucose pulse. Cells harvested at steady state were used as a time 0 reference sample. Analytical procedures and metabolic flux calculations in chemostat cells Bacterial dry weight, under conditions of steady state growth in the chemostat, was measured as described previously (24). The concentration of the residual glucose and of a series of fermentation products in the fermenter vessel were determined by highperformance liquid chromatography (LKB) equipped with a REZEX organic acid analysis column (Phenomenex), and a 7RI 1530 refractive index detector (Jasco). Samples were analysed at 40 °C by using 7.2 mM H2SO4 as the eluent. AZUR chromatography software was used for data integration. CO2 production and O2 consumption were measured by a Servomex CO2 analyser and a Servomex O2 analyser, respectively, through the off-gas of the bioreactor. Metabolic fluxes were calculated as described previously (25). Flux calculations during dynamical transitions Specific rates of glucose consumption (qs) and acetate production (qacetate) were calculated using the average flux analysis method as described in (26). Briefly, the time after a glucose pulse was divided into 5 different metabolic phases (periods I – V), as -1 shown in Figure 3a. Then the estimated average specific growth rate, μ (h ) and biomass -1 yield per substrate (g g ) for each period (i.e. I – V) were calculated as follows: qi = μ ΔC ΔB in which Ci is metabolite concentration i (in mM) at a given time point, while B is the -1 biomass concentration (in g l ) in the bioreactor. 35 CHAPTER 2 RNA isolation and Real-time quantitative PCR Cells were harvested from the chemostat by conventional rapid sampling (using the slight overpressure in the chemostat) directly into RNAprotect Bacterial reagent (Qiagen) for stabilization of mRNA, according to the manufacturer's instructions. This sampling method has a time resolution slightly better than 1 sec (27). Cell pellets were immediately frozen with liquid nitrogen and stored at -80°C until use. RNA was isolated by using the phenol/chloroform extraction method, modified from (28), and cleaned with the help of an RNeasy mini kit (Qiagen). 0.4 µg RNA was used to synthesize the first strand cDNA using RevertAid™ First Strand cDNA Synthesis Kits (Fermentas), following the manufacturer’s protocol. The final concentrations of cDNA and primers in a total volume of 20 µl were 20 ng and 80 nM, respectively. The primers used in this study are shown in Table S1. Realtime PCR was performed using the 7300 Real Time PCR system (Applied Biosystems) and universal cycling conditions (2 min at 50°C, 10 min at 95°C, 40 cycles of 15 s at 95°C, and 1 min at 60°C). Cycle threshold (CT) values were determined by automated threshold analysis with ABI Prism version 1.0 software. The amplification efficiencies were determined by serial dilution. Data were normalized using rimM as the reference gene. Relative changes in gene expression level were calculated with the following formula (29): CT - CT tar/t0 tar/ti 2-ΔΔCT = 2CT - CT ref/t0 ref/ti 2 where tar = target gene, t0 = sample at time 0, ti = sample at time i, ref = reference gene. 2.4 Results Validation of the experimental design Verification of the (initial) CCR-free growth condition In this study, a glucose-limited chemostat was used as a tool to generate CCR-free growth conditions. Since nearly all glucose is consumed in glucose-limited chemostats (30), the CCR should be absent, according to the currently available models for regulation of catabolism. In order to verify the lack of CCR under these conditions, E. coli was grown -1 in chemostat cultures at a growth rate of 0.2 h using the glucose-limited growth medium (that is used throughout this study) with the addition of 20 mM lactose. Immediate and complete consumption of both carbon sources would confirm the lack of CCR. Carbohydrate analysis indeed showed that during steady state growth in the chemostat, both glucose and lactose were consumed to undetectably low levels, indicating that the CCR is not operative under the steady state conditions selected (data not shown; see also (31)). 36 DYNAMIC REGULATION OF CCR IN E. COLI Validation of unrestricted oxygen availability after induction of CCR As previously reported, maintaining fully aerobic conditions during high cell densities in fast growing cultures is challenging from a technical point of view (e.g. (12)). We therefore assessed the minimally required conditions for this. After reaching the steady state of CCR-free conditions of continuous growth, the culture was perturbed with −1 addition of glucose to a final concentration of 50 mM (i.e. 9 g l ). Medium inflow into the reactor vessel was concomitantly interrupted, thus allowing the culture to grow as a batch culture whilst pH, air supply rate, stirring rate (at 400 RPM) and temperature were all kept constant. To select a proper sampling period for subsequent transcript and proteome analyses, growth, glucose consumption, and the formation of fermentation end-products were monitored after this perturbation of the culture with a glucose pulse. When using a stirring rate of 400 RPM, at 80 minutes after the glucose pulse, formate starts to be formed. Since the formate-forming enzyme, Pyruvate Formate Lyase (PFL), is rapidly inactivated by small amounts of oxygen, this indicates that after these 80 minutes the culture is no longer fully aerobic. Presumably related to this, the growth rate reaches a maximum after 100 minutes (Figure S1). Based on these results it was decided to increase the stirring rate in the chemostat culture vessel from 400 rpm to 600 rpm, to prevent lack of dissolved oxygen in the growth medium for the duration of the experiment. Indeed using a higher stirring rate resulted in lack of formation of formate and also led to continued exponential growth for the entire 120 minute period. This confirms that in this specific setup a stirring rate of > 400 RPM is required for full aerobiosis during exponential growth after a glucose pulse, confirming previous observations from (12) on the practical challenges of aerobic growth in batch culture experiments. Validation of the initiation of the CCR response at the transcriptional level To further confirm that glucose repression was successfully initiated by the glucose pulse, quantitative real-time PCR was used to measure the relative expression level of a selected set of genes. Based on previous microarray analysis (6), genes affected by addition of glucose were selected as an indicator for initiation of the CCR. PTS-glucose transporter enzyme IIBC, encoded by ptsG, and crp, the gene encoding the global transcriptional regulator CRP, were selected as they were shown to be induced and repressed by glucose, respectively. Furthermore, β-galactosidase, encoded by lacZ , was selected as a gene representing enzymes of which the expression level would not be altered significantly by glucose because of the absence of lactose. 37 CHAPTER 2 RT-qPCR results confirmed the down-regulation of the crp gene expression level, whilst the level of lacZ expression only showed a slight down-regulation, which is in agreement with the results of previous studies (Figure 2, e.g. (6)). In contrast to previous findings (6), however, the expression of ptsG was significantly repressed. This difference may have resulted from any of a large number of factors related to the culture conditions selected in this investigation. Considering that prior to the glucose pulse the glucose transporter (ptsG) will have been highly expressed, i.e. during the period when glucose is limiting, the addition of excess of glucose presumably will lead to a decrease in the expression level of the glucose transporter, because of the accumulation of glucose-6-phosphate that would otherwise occur in the cells. To verify the results obtained with RT-qPCR, a time-series monitoring of the complete transcriptome, using microarray analyses was performed (see Table 1). This transcriptome analysis of the CCR response showed that 70% of the observed regulation was in agreement with previous observations (6). The majority of the genes that are expressed differently are involved in amino acid metabolism, central carbon metabolism, and membrane proteins, including transporters and respiratory chain proteins (see Table 1). Some of the observed differences will be discussed in more detail below. Figure 2. Change in gene-expression level of selected genes from the CCR regulon upon addition of a saturating pulse (50 mM final concentration) of glucose. Data are the mean ± SD of two technical replicates. Black bars represent ptsG, the gene encoding the B and C subunits of the Enzyme II of the glucose-PTS from E. coli, while striped bars and crisscrossed bars represent lacZ, encoding β-galactosidase, and crp, encoding the global transcriptional regulator CRP, respectively. 38 DYNAMIC REGULATION OF CCR IN E. COLI Figure 3. Analysis of the dynamics of the physiological changes induced by glucose repression. Panel a: the measured parameters in this experiment were: OD600 (circles), residual glucose concentration (squares), and formation of the fermentation end-product acetate (triangles) as a function of time in a 60 minutes time window. Panel b: For further evaluation of the observations from panel a, the time window was divided into smaller periods (I – V) and for each period the rate of substrate (i.e. glucose) consumption (squares) and fermentation product (i.e. acetic acid) formation (triangles) were calculated. In addition, the average growth rate (h-1) in each period was calculated and plotted as closed circles. All data are mean ± SE from three independent experiments. For further details: see Materials and Methods. Figure 4. Calculated changes in growth yield of E. coli in response to initiation of glucose repression. Biomass yield on glucose (YX/s, open circles) is plotted against time, as well as the ratio of the specific rate of CO2 production, qCO2, over YX/s (filled squares). These values were calculated for the time intervals shown in Figure 3a. Data are mean ± SE of three independent experiments. For further details: see Materials and Methods. 39 CHAPTER 2 Table 1. Comparison of the ‘list of genes with significantly altered expression level during the first hour after glucose repression”, and the actual changes in expression level of these genes, derived from time-series monitoring of transcript profiles (and expressed as ‘Area Under the Curve’), and comparison of these results with those of (6). GN Description Regulatory protein crp Global DNA-binding transcriptional dual regulator CRP fis Global DNA-binding transcriptional dual regulator Fis Nucleotide biosynthesis and nucleotide salvage pathway adk Adenylate kinase guaA GMP synthetase nrdH Glutaredoxin-like protein nrdI Protein that stimulates ribonucleotide reduction apt Adenine phosphoribosyltransferase upp Uracil phosphoribosyltransferase Amino acid biosynthesis argG* Argininosuccinatesynthetase aroD* 3-dehydroquinate dehydratase cysK* Cysteine synthase A glnA* Glutamine synthetase gltB* Glutamate synthase, large subunit gltD* Glutamate synthase, 4Fe-4S protein, small subunit ilvB Acetolactate synthase I, large subunit ilvN Acetolactate synthase I, small subunit Central carbon metabolism aceA Isocitratelyase aceB Malate synthase A aceE Pyruvate dehydrogenase E1 component aceF Pyruvate dehydrogenase E2 component ackA* Acetate kinase A and propionate kinase 2 acnB Bifunctionalaconitatehydratase 2/2-methylisocitrate dehydratase eno Enolase fruK* Fructose-1-phosphate kinase fumA Fumaratehydratase (fumarase A), aerobic Class I gltA Citrate synthase mdh Malate dehydrogenase, NAD(P)-binding pta* Phosphate acetyltransferase pykA Pyruvate kinase II sdhA Succinate dehydrogenase, flavoprotein subunit sucA 2-oxoglutarate decarboxylase, thiamin-requiring sucC Succinyl-coasynthetase, beta subunit Membrane proteins ompX Outer membrane protein X ompC Outer membrane porin protein C Outer membrane porin F ompF (General Bacterial Porin Family) ompT* Outer membrane protease 40 AUC† Log2 (ratio‡) -23.7 57.6 -0.85 2.79 34.0 115.1 -44.0 -20.5 55.2 103.9 1.13 2.04 -1.28 -1.32 1.56 1.29 62.9 4.6 70.0 87.6 47.3 65.8 -113.7 -124.3 -0.98 -1.37 -0.80 -0.38 -0.15 -0.58 -1.26 -1.72 -103.5 -50.6 6.1 0.3 -34.7 -54.8 48.6 -115.4 -196.5 -82.0 -71.7 -15.3 -81.8 -147.9 -144.0 -135.4 -2.46 -2.28 2.12 1.62 1.40 -1.04 0.16 1.18 -1.90 -3.49 -2.06 1.46 -1.64 -2.51 -2.25 -2.42 103.7 -37.9 1.39 -0.16 -60.4 -0.82 35.8 -0.58 DYNAMIC REGULATION OF CCR IN E. COLI Table 1. Continue. GN Description Outer membrane porin nmpC (General Bacterial Porin Family) Transporter ansP L-asparagine transporter fruB* Fused fructose-specific PTS enzymes: IIA component/Hpr component gltS Glutamate transporter mgtA Magnesium transporter pheP Phenylalanine transporter pitA Phosphate transporter, low-affinity potA Polyamine transporter subunit Putrescine transporter subunit: membrane component of ABC * potH superfamily Putrescine transporter subunit: membrane component of ABC * potI superfamily proP Proline/glycine betaine transporter ptsG* Fused glucose-specific PTS enzymes: IIB component/IIC components ptsH* Hpr component of PTS system uraA Uracil transporter Respiratory protein and ATP synthase fdoG Formate dehydrogenase-O, large subunit frdC Fumarate reductase (anaerobic), membrane anchor subunit ndh Respiratory NADH dehydrogenase 2/cupric reductase nuoA NADH:ubiquinoneoxidoreductase, membrane subunit A Pyruvate dehydrogenase (pyruvate oxidase), poxB thiamin-dependent, FAD-binding cyoA Cytochrome boubiquinol oxidase subunit II cyoD Cytochrome boubiquinol oxidase subunit IV atpA* F1 sector of membrane-bound ATP synthase, alpha subunit atpC* F1 sector of membrane-bound ATP synthase, epsilon subunit atpE* F0 sector of membrane-bound ATP synthase, subunit c † The AUC values were derived from (Chapter 3). Genes representing AUC values AUC† Log2 (ratio‡) -68.8 -2.64 71.5 -129.3 35.1 37.9 70.5 40.8 60.3 2.01 1.14 1.98 1.26 2.20 1.77 1.24 -22.4 1.09 -26.8 1.30 186.1 -83.9 -30.8 183.7 1.04 1.70 1.02 2.05 -44.8 -86.9 51.9 -39.7 -1.39 -1.62 5.04 -1.08 -123.3 -1.95 35.8 34.7 -27.3 -22.6 -20.8 > 16.75 or < 0.15 0.50 0.68 0.71 0.16 -16.75 are considered significantly changed (p < 0.01). ‡ Ratio of gene expression level of wild type E. coli grown in Luria-Bertani medium, supplemented with glucose and normalized with wild type E. coli grown in Luria-Bertani medium (data from (6)). * Genes that are expressed differently between these two studies. Physiological performance of E. coli, growing at steady state under glucoselimited conditions None of the previously performed studies on CCR were able to carry out an exact physiological analysis of the corresponding CCR free conditions. Here three biological replicates of glucose-limited cultures were analysed, with growth at a dilution rate of 0.20 -1 ± 0.01 h . Neither fermentation end-products nor residual glucose is observed to be present in the growth medium in all three biological replicates. The results of carbon flux 41 CHAPTER 2 analysis and biomass measurements show that the steady state conditions generated by glucose-limited chemostats result in highly reproducible data (see Table 2). This confirms again the suitability of the selected experimental setup for studying the CCR response. Physiological performance of E. coli during a glucose pulse experiment The growth yield of bacteria is maximal when they grow in batch culture under optimal conditions (i.e. at µmax) because of the relatively low energy requirements for maintenance. However, often their growth yield is much lower than this maximal value, e.g. because the cells are subject to specific limitations (32). Likewise, the relatively low -1 growth rate in the chemostat (e.g. of 0.2 h ) will also lower growth yield (due to high fractional rate of maintenance energy consumption) than the theoretical maximal value. -1 The results displayed in Figure 4 show that at a growth rate of 0.2 h the growth yield of E. -1 coli is 0.45 g g . One hour after activation of the CCR response the growth rate of the E. coli cells has -1 increased to 0.51 ± 0.03 h (and slightly increases thereafter (Figure 3a)). The specific glucose uptake rate (qs) increases sharply during the first 30 minutes and then stays at an up-regulated level as compared to the steady state condition (Figure 3b). As expected, activation of the CCR response by addition of a glucose pulse resulted in the immediate accumulation of acetate. Such acetate production is commonly observed in E. coli growing on high-glucose-containing media even if the culture is saturated with oxygen (33). Interestingly, after a sharp increase in the specific rate of acetate formation (qacetate) after the first 15 minutes, there is a gradual decrease in this rate in the subsequent growth phases (Figure 3b). During this same period the growth yield first decreases to the minimal -1 value of 0.31 g g after 15 minutes, after which it gradually recovers and then increases to -1 a maximal value of 0.66 g g after 90 minutes (Figure 4). 2.5 Discussion The ability to access and rapidly metabolize their most preferred carbon source has a significant impact on bacteria in a competitive environment where carbon sources are -1 available in limited amounts (i.e. usually are present in the µg l range (31). The carbon catabolite repression (CCR) mechanism provides bacteria with the ability to use selected nutrients preferentially and exclusively, to ensure the highest biomass production (rate) possible (3, 5, 34). Traditionally, studies of CCR are performed in a batch culture, in combination with a peptide/amino acid-based complex medium, or a minimal medium. Then next, most often results observed during growth in the presence of different carbohydrates were compared and validated. The lack of a proper reference condition in 42 DYNAMIC REGULATION OF CCR IN E. COLI this approach has led to complicated results derived from multiple comparisons and/or a CCR-activated reference condition (6, 9–11). We here present a new experimental design that is suitable for any carbon catabolite repression study, and especially also for simultaneous transcriptome and proteome analyses. As outlined in the schematic workflow, in this approach the chemostat is used to generate a highly reproducible steady state as a starting point for the experiments, in which a carbon catabolite de-repressed state is required. This characteristic of the cells at steady state was confirmed by the ability of the E. coli cells to utilize glucose and lactose at the same time. The ability of E. coli, growing in a glucose-limited chemostat at a very low concentration of glucose, to immediately start consuming different sugars, when the latter are added, is in accordance with previous observations (31). Additionally, this carboncatabolite de-repressed condition can be established at high cell densities by proper choices regarding the composition of the defined minimal medium, in which glucose is the sole carbon source. This approach thus also allows for a well-defined route of carbon metabolism and a detailed physiological analysis of the CCR. Also, it obviates the need for use of a rich medium, with an ill-defined carbon source, which will affect the expression of selected genes (see Table 1), which then may lead to confusion about the CCR effect. Moreover, here the CCR induction was performed in a controlled environment in which all the growth-controlling factors, such as glucose availability (carbon source) and dissolved O2 concentration, are in excess, while pH and temperature are kept constant. Such a wellcontrolled culture condition allows one to monitor the uninterrupted CCR response during a 120 minute period (Figure 3a). Time-series measurements then allow one to dissect the gene- and protein-regulation cascades of CCR during a time window of at least one hour. Unlike other glucose pulse experiments (22, 35–38), CCR induction was performed here by using a higher concentration of glucose than that of the feed medium for the chemostat, creating an absolute glucose-excess condition. In this condition, the glucose uptake rate exceeds the metabolic capacity of E. coli as is evident from the lower biomass yield on glucose (i.e. YX/s; see Figure 4) and the high rate of acetate production (i.e. qacetate) during the first 15 minutes after the glucose pulse (Figure 3b). During this period expression of the ptsG was down-regulated, presumably in order to decrease the rate of glucose uptake that would be disproportional to the metabolic capacity of the cells. This down-regulation of ptsG is in agreement with the results of the time-series microarray analysis (see Table 1) and is possibly due to: (i) the lack of cAMPCRP complex and/ or (ii) degradation of the ptsG mRNA via base paring with the SgrS small RNA that is induced under glucose-phosphate stress and/or an imbalance of the glycolytic 43 CHAPTER 2 flux (39, 40). In a period of 30 minutes after the glucose pulse, the imbalance of the glycolytic flux was overcome, as the growth rate increases again to values close to the maximum growth rate of E. coli under these conditions, while the ptsG expression level recovers close to the pre-pulse level (possibly with a slight overshoot that is also visible in the glucose uptake rate (see also Figure S2)).This down-regulation of ptsG was not observed in a previous CCR study (6), which suggests that the amino acids and peptides in LB medium that was used in that investigation provide extra glycolytic intermediates, and hence higher metabolic capacity, which results in the up-regulation of ptsG. We note here that the normalized expression levels of all the selected genes in this study are subject to a slight overestimation as the ribosomal gene, rimM, which was used as the reference gene in this study, was very slightly up-regulated according to the results of the time-series microarray analysis (data not shown). -1 Interestingly, at growth rates of less than 0.4 h , there is a linear correlation between the growth rate and the glucose uptake rate, as both are increasing proportionally after -1 the glucose pulse (Figure 3b). When the growth rate exceeds 0.4 h , the glucose uptake rate saturates at a significantly higher rate as compared to the steady state conditions in the chemostat. This proportionality between the growth rate below 0.4 h -1 and the glucose uptake rate has been observed previouslyin E. coli growing in a glucose-limited -1 chemostat by varying the growth rate between 0.1-0.4 h (41). This consistency between our results and those obtained in a chemostat, confirms that our glucose pulse experiment carried out in the batch mode was as well controlled as a chemostat experiment. To assess the overall catabolic efficiency, the yield on glucose (YX/s) was calculated for each growth phase (see Figure 4). YX/s decreases during the first 15 minutes and then recovers in the next two periods, i.e. up to 90 minutes, opposite to the gradually decreasing ratio of qCO2/Yx/s. The low ratio of qCO2/Yx/s is indicative of a high catabolic efficiency, which is consistent with the increase in biomass yield. Unexpectedly, our observed YX/s is slightly higher than the maximum values typically reported in the -1 -1 literature: e.g. 0.54 g g (42) and 0.60 g g (41). However, the YX/s could surpass this value when there is a small accumulation of acetate (41), which was observed as well in our experiments after a glucose pulse. The decrease in the YX/s during the first 15 minutes may + - be due to a decrease in overall catabolic efficiency (or: the H /e stoichiometry of the electron transfer chain specifically; see further below). Time-series microarray analysis results show that the expression of the regulon expressing the NDH-I complex is downregulated in this period and subsequently recovers supporting this hypothesis (see Figure S5). 44 DYNAMIC REGULATION OF CCR IN E. COLI Calculations based on a simplified model for catabolism of E. coli (see also legend to + - Figure S3) revealed that the H /e ratio of the cells increases with the growth rate between 5 and ~45 minutes (Figure S3). This ratio increases from approximately 3 (during steady state growth) to ~4 at 30 minutes after the glucose pulse paralleled by an increase in -1 + - growth rate to 0.4 h , and to a value > 8 at the µmax. The increasing H /e ratio indicates that the NDH-I and cytochrome bo oxidase of the respiratory chain contribute more and more significantly to energy conservation in catabolism, and that the PoxB enzyme and the cytochrome bdII complex show low or no activity under these conditions. Indeed the results in Table 1 show that apparently poxB was expressed during chemostat growth, but + - is strongly repressed during CCR. At the maximal growth rate the calculated H /e stoichiometry is much higher than the theoretical maximum (i.e. 4; (43, 44), see also (45)). This overestimate may be due to multiple factors, one of which is an error in the estimate of the ATP-costs of biomass formation. Glucose-limited growth for instance may require more ATP for biomass formation than growth under glucose excess conditions, due to the decreased requirement for catabolic proteins under thermodynamically more favourable conditions of glucose excess (46). That such a transition may also be relevant during CCR is suggested by the observation that expression of pfkA decreases, and that of edd and eda increases, after the glucose pulse (data not shown). The numerical consequences of such a shift are difficult to estimate because (increased) use of the Entner-Doudoroff pathway + - will tend to increase the H /e ratio, because of the lower ATP yield of substrate-level phosphorylation of this pathway, as compared to the Embden-Meyerhof pathway. In support of the notion that faster growth of E. coli is accompanied by an increase in efficiency of biomass formation, Kayseret al., (2005) reported up to 50% increase in ○ biomass yield on ATP, for growth in glucose-limited chemostats at 28 C, and using the + - assumption of a constant the H /e stoichiometry. Both their data and ours (Figure 4) show a higher biomass yield at higher growth rates. Also when their data is used to calculate the + - H /e stoichiometry values are obtained that are higher than the theoretical maximum value of 4 (see Figure S4). The combined results suggest that there is a clear correlation between the growth rate and growth efficiency; however, for a detailed numerical interpretation of such data much more detailed physiological insight into the catabolic ‘fluxome’ of E. coli is required, in combination with a detailed analysis of the composition of the cells in terms of its macromolecular composition: higher yields at higher growth rates would be expected to be accompanied by relative increase of lipid and polysaccharide as compared to protein (and nucleic acid). 45 CHAPTER 2 2.6 Acknowledgements The authors would like to thank Dr. F. Branco dos Santos for critical reading of the manuscript. The work of O.B. was funded through a grant from the Higher Education Commission of Thailand. 2.7 Supplementary materials Supplementary data associated with this chapter can be found in the online version at http://www.microbiologyresearch.org. Alternatively it may be requested per email for personal use: [email protected]. 2.8 References 1. Balazsi G, Barabasi AL, Oltvai ZN. 2005. Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc Natl Acad Sci U S A 102:7841–7846. 2. 3. Gottesman S. 1984. Bacterial regulation: global regulatory networks. Annu Rev Genet 18:415–441. Deutscher J, Francke C, Postma PW. 2006. How phosphotransferase system-related protein phosphorylation regulates carbohydrate metabolism in bacteria. Microbiol Mol Biol Rev MMBR 70:939–1031. 4. Postma PW, Lengeler JW, Jacobson GR. 1993. Phosphoenolpyruvate:carbohydrate phosphotransferase systems of bacteria. Microbiol Rev 57:543–594. 5. Gorke B, Stulke J. 2008. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat Rev 6:613–624. 6. Gosset G, Zhang Z, Nayyar S, Cuevas WA, Saier MH Jr. 2004. Transcriptome analysis of Crp-dependent catabolite control of gene expression in Escherichia coli. J Bacteriol 186:3516–3524. 7. Khankal R, Chin J, Ghosh D, Cirino P. 2009. Transcriptional effects of CRP* expression in Escherichia coli. J Biol Eng 3:13. 8. Oh MK, Rohlin L, Kao KC, Liao JC. 2002. Global expression profiling of acetate-grown Escherichia coli. J Biol Chem 277:13175–13183. 9. Vollmer M, Nagele E, Horth P. 2003. Differential proteome analysis: two-dimensional nano-LC/MS of E. coli proteome grown on different carbon sources. J Biomol Tech JBT 14:128–135. 10. Liu M, Durfee T, Cabrera JE, Zhao K, Jin DJ, Blattner FR. 2005. Global transcriptional programs reveal a carbon source foraging strategy by Escherichia coli. J Biol Chem 280:15921–15927. 11. Silva JC, Denny R, Dorschel C, Gorenstein MV, Li GZ, Richardson K, Wall D, Geromanos SJ. 2006. Simultaneous qualitative and quantitative analysis of the Escherichia coli proteome: a sweet tale. Mol Cell Proteomics MCP 5:589–607. 12. Bekker M, Kramer G, Hartog AF, Wagner MJ, de Koster CG, Hellingwerf KJ, de Mattos MJ. 2007. Changes in the redox state and composition of the quinone pool of Escherichia coli during aerobic batch-culture growth. Microbiol Read Engl 153:1974–1980. 13. Monod J. 1949. The Growth of Bacterial Cultures. Annu Rev Microbiol 3:371–394. 14. Novick A, Szilard L. 1950. Experiments with the Chemostat on spontaneous mutations of bacteria. Proc Natl Acad Sci U S A 36:708–719. 15. Hoskisson PA, Hobbs G. 2005. Continuous culture--making a comeback? Microbiol Read Engl 151:3153– 3159. 46 DYNAMIC REGULATION OF CCR IN E. COLI 16. Wick LM, Quadroni M, Egli T. 2001. Short- and long-term changes in proteome composition and kinetic properties in a culture of Escherichia coli during transition from glucose-excess to glucose-limited growth conditions in continuous culture and vice versa. Environ Microbiol 3:588–599. 17. Piper MD, Daran-Lapujade P, Bro C, Regenberg B, Knudsen S, Nielsen J, Pronk JT. 2002. Reproducibility of oligonucleotide microarray transcriptome analyses. An interlaboratory comparison using chemostat cultures of Saccharomyces cerevisiae. J Biol Chem 277:37001–37008. 18. Boer VM, de Winde JH, Pronk JT, Piper MD. 2003. The genome-wide transcriptional responses of Saccharomyces cerevisiae grown on glucose in aerobic chemostat cultures limited for carbon, nitrogen, phosphorus, or sulfur. J Biol Chem 278:3265–3274. 19. Kolkman A, Olsthoorn MM, Heeremans CE, Heck AJ, Slijper M. 2005. Comparative proteome analysis of Saccharomyces cerevisiae grown in chemostat cultures limited for glucose or ethanol. Mol Cell Proteomics MCP 4:1–11. 20. Hayes A, Zhang N, Wu J, Butler PR, Hauser NC, Hoheisel JD, Lim FL, Sharrocks AD, Oliver SG. 2002. Hybridization array technology coupled with chemostat culture: Tools to interrogate gene expression in Saccharomyces cerevisiae. Methods San Diego Calif 26:281–290. 21. Castrillo JI, Zeef LA, Hoyle DC, Zhang N, Hayes A, Gardner DC, Cornell MJ, Petty J, Hakes L, Wardleworth L, Rash B, Brown M, Dunn WB, Broadhurst D, O’Donoghue K, Hester SS, Dunkley TP, Hart SR, Swainston N, Li P, Gaskell SJ, Paton NW, Lilley KS, Kell DB, Oliver SG. 2007. Growth control of the eukaryote cell: a systems biology study in yeast. J Biol 6:4. 22. Sunya S, Delvigne F, Uribelarrea JL, Molina-Jouve C, Gorret N. 2012. Comparison of the transient responses of Escherichia coli to a glucose pulse of various intensities. Appl Microbiol Biotechnol 95:1021–1034. 23. Evans CGT, Herbert D, Tempest DW. Chapter XIII The Continuous Cultivation of Micro-organisms: 2. Construction of a Chemostat, p. 277–327. In Methods in Microbiology. Academic Press. 24. Herbert D, Phipps PJ, Strange RE. Chapter III Chemical Analysis of Microbial Cells, p. 209–344. In Methods in Microbiology. Academic Press. 25. Alexeeva S, Hellingwerf KJ, Teixeira de Mattos MJ. 2003. Requirement of ArcA for redox regulation in Escherichia coli under microaerobic but not anaerobic or aerobic conditions. J Bacteriol 185:204–209. 26. Niklas J, Schrader E, Sandig V, Noll T, Heinzle E. 2011. Quantitative characterization of metabolism and metabolic shifts during growth of the new human cell line AGE1.HN using time resolved metabolic flux analysis. Bioprocess Biosyst Eng 34:533–545. 27. Lange HC, Eman M, van Zuijlen G, Visser D, van Dam JC, Frank J, de Mattos MJ, Heijnen JJ. 2001. Improved rapid sampling for in vivo kinetics of intracellular metabolites in Saccharomyces cerevisiae. Biotechnol Bioeng 75:406–415. 28. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ. 1979. Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry (Mosc) 18:5294–5299. 29. Livak KJ, Schmittgen TD. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods San Diego Calif 25:402–408. 30. Postma E, Scheffers WA, van Dijken JP. 1989. Kinetics of growth and glucose transport in glucose-limited chemostat cultures of Saccharomyces cerevisiae CBS 8066. Yeast Chichester Engl 5:159–165. 31. Lendenmann U, Egli T. 1995. Is Escherichia coli growing in glucose-limited chemostat culture able to utilize other sugars without lag? Microbiol Read Engl 141 ( Pt 1):71–78. 32. Schlegel HG. 1993. General Microbiology, 7th edition. Cambridge University Press, Cambridge, UK. 33. Delgado J, Liao JC. 1997. Inverse flux analysis for reduction of acetate excretion in Escherichia coli. Biotechnol Prog 13:361–367. 47 CHAPTER 2 34. Postma PW, Lengeler JW, Jacobson GR. 1993. Phosphoenolpyruvate:carbohydrate phosphotransferase systems of bacteria. Microbiol Rev 57:543–594. 35. Schaefer U, Boos W, Takors R, Weuster-Botz D. 1999. Automated sampling device for monitoring intracellular metabolite dynamics. Anal Biochem 270:88–96. 36. Buchholz A, Hurlebaus J, Wandrey C, Takors R. 2002. Metabolomics: quantification of intracellular metabolite dynamics. Biomol Eng 19:5–15. 37. Chassagnole C, Noisommit-Rizzi N, Schmid JW, Mauch K, Reuss M. 2002. Dynamic modeling of the central carbon metabolism of Escherichia coli. Biotechnol Bioeng 79:53–73. 38. Schaub J, Reuss M. 2008. In vivo dynamics of glycolysis in Escherichia coli shows need for growth-rate dependent metabolome analysis. Biotechnol Prog 24:1402–1407. 39. Vanderpool CK. 2007. Physiological consequences of small RNA-mediated regulation of glucose-phosphate stress. Curr Opin Microbiol 10:146–151. 40. Richards GR, Patel MV, Lloyd CR, Vanderpool CK. 2013. Depletion of Glycolytic Intermediates Plays a Key Role in Glucose-Phosphate Stress in Escherichia coli. J Bacteriol 195:4816–4825. 41. Kayser A, Weber J, Hecht V, Rinas U. 2005. Metabolic flux analysis of Escherichia coli in glucose-limited continuous culture. I. Growth-rate-dependent metabolic efficiency at steady state. Microbiol Read Engl 151:693–706. 42. Schulze KL, Lipe RS. 1964. Relationship between Substrate Concentration, Growth Rate, and Respiration Rate of Escherichia Coli in Continuous Culture. Arch Mikrobiol 48:1–20. 43. Puustinen A, Finel M, Virkki M, Wikstrom M. 1989. Cytochrome o (bo) is a proton pump in Paracoccus denitrificans and Escherichia coli. FEBS Lett 249:163–167. 44. Wikstrom M. 1984. Two protons are pumped from the mitochondrial matrix per electron transferred between NADH and ubiquinone. FEBS Lett 169:300–304. 45. Sharma P, Hellingwerf KJ, de Mattos MJ, Bekker M. 2012. Uncoupling of substrate-level phosphorylation in Escherichia coli during glucose-limited growth. Appl Environ Microbiol 78:6908–6913. 46. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. 2013. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proc Natl Acad Sci U S A 110:10039–10044. 48 Chapter 3: Time-series analysis of the transcriptome and proteome of E. coli upon glucose repression Orawan Borirak, Matthew D. Rolfe, Leo J. de Koning, Huub C.J. Hoefsloot, Martijn Bekker, Henk L. Dekker, Winfried Roseboom, Jeffrey Green, Chris G. de Koster and Klaas J. Hellingwerf This chapter has been published as: Borirak O, Rolfe MD, de Koning LJ, Hoefsloot HCJ, Bekker M, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. Time-series analysis of the transcriptome and proteome of Escherichia coli upon glucose repression. Biochim Biophys Acta. 2015 Oct;1854(10 Pt A):1269–79. CHAPTER 3 3.1 Abstract Time-series transcript- and protein-profiles were measured upon initiation of carbon catabolite repression in Escherichia coli, in order to investigate the extent of posttranscriptional control in this prototypical response. A glucose-limited chemostat culture was used as the CCR-free reference condition. Stopping the pump and simultaneously adding a pulse of glucose, that saturated the cells for at least one hour, was used to initiate the glucose response. Samples were collected and subjected to quantitative timeseries analysis of both the transcriptome (using microarray analysis) and the proteome 15 (through a combination of N-metabolic labelling and mass spectrometry). Changes in the transcriptome and corresponding proteome were analysed using statistical procedures designed specifically for time-series data. By comparison of the two sets of data, a total of 96 genes was identified that are post-transcriptionally regulated. This gene list provides candidates for future in-depth investigation of the molecular mechanisms involved in post-transcriptional regulation during carbon catabolite repression in E. coli, like the involvement of small RNAs. 3.2. Introduction Escherichia coli is the best studied Gram-negative bacterial species to date (1). This makes it the ideal prokaryote in which to study physiological adaptation, and the involvement of post-transcriptional regulation therein. The availability of omics-analysis techniques has, particularly in bacteria, opened up the possibility of analyzing biological function with a ‘systems approach’ (2). A simplifying assumption, however, that is often made in systems analyses is that the level of expression of a certain protein is proportional to the abundance of the corresponding mRNA. However, many studies did not find good correlation between protein- and mRNA abundance (for review see: (3)), which suggests that in systems analysis, regulation of gene expression at the post-transcriptional level must be taken into account. Indeed several examples of this type of regulation have recently been documented (4–8). For this reason here we investigate the extent of posttranscriptional regulation in a physiological response that is based on a global (i.e. genome-wide) alteration of the level of gene expression. For this, we have selected carbon catabolite repression (CCR) in E. coli (9, 10) as our model system. The mechanisms underlying the ability of E. coli to grow on a wide range of carbon sources has already been studied for decades, through both physiological and genetic studies (11, 12). E. coli’s preferred use of glucose is brought about by CCR (13): if in a batch culture of this organism the majority of the available glucose has been catabolized, metabolism is reprogrammed to prepare the organism for use of alternative carbon 50 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI sources (14). When glucose becomes available again, the cells undergo another major transition, i.e. CCR that involves both the cells transcriptional- and metabolic networks. Regulation of the CCR is complex and controlled at multiple levels. It is assumed to be predominantly regulated at the level of transcription via the ‘alarmone’ cAMP, which in turn forms a complex with the global transcriptional regulator CRP. Indeed, the cAMP-CRP complex modulates expression of many catabolic genes (10, 13), and cAMP levels in the cell are under control of the glucose-specific part of the Phosphoenol-pyruvate-dependent Phospho-Transferase system (‘glucose-PTS’), via the level of phosphorylation of Enzyme Glc IIA (10). ‘Systems’ studies of CCR generally focus on the transcriptional regulation of gene expression (11, 12, 15–17), whereas the involvement of post-transcriptional regulation in CCR has not even been investigated under steady state conditions, let alone dynamically. Because of the ‘global’ nature of the CCR response, we considered it plausible that post-transcriptional regulation would constitute a significant part of it. This expectation is strengthened by the results of several recent studies of other regulation mechanisms, in which it was shown that transcript levels often poorly correlate with the corresponding protein levels (18–21). Furthermore, a detailed time-series proteomics analysis of carbon catabolite repression, combined with transcript profiling analysis, has not yet been reported. Data derived from time-series experiments provide a richer source of information than a single time point measurement. Interpretation of cellular responses with the latter approach usually raises the question of whether the most informative time point was selected for optimal data analysis. Moreover, time-series data tend to reduce measurement noise and thus increase the accuracy of the conclusions. Here we present the results of a set of experiments that enabled us to quantify a genome-wide time-series of both the transcript- and the protein-levels in E. coli cells, subjected to a change from glucose-limiting conditions, i.e. CCR-free, to glucose-excess conditions, i.e. with CCR. This could be achieved via stopping the pump of a chemostat, with simultaneous addition of a glucose pulse that saturates the cells for a period of at least one hour (see Figure 1 for the experimental design). Physiological and molecular genetic evidence that such a glucose pulse indeed activates CCR has recently been described elsewhere (22). The aim of the current experiments was to investigate the significance of post-transcriptional control in CCR in E. coli, by comparing dynamic alterations of the transcriptome, with those of the corresponding proteome. The results of such measurements were subjected to statistical analyses, specifically designed for timeseries data. The ‘Area Under the Curve’ (AUC), representing the relative change in RNA 51 CHAPTER 3 level of a specific gene, was used to calculate the change in the amount of the transcript during the whole time-series , while a simple linear regression was used to determine the change in the amount of the corresponding protein per unit of time. Using a genome-wide comparative analysis between the transcript- and the corresponding protein-levels, we have identified 96 genes that are regulated post-transcriptionally, 51 of them with a significance level of < 0.01, and another 45 genes at the significance level of < 0.05. The discovery of this extensive involvement of post-transcriptional regulation in CCR provides a starting point for the more detailed understanding of this physiological response. Mechanisms that may contribute to this added layer of regulation are briefly discussed. 3.3 Materials and Methods Bacterial strain and growth conditions Escherichia coli MG1655 was grown under glucose-limited conditions in 2 L chemostat vessels (Applikon, The Netherlands) with a working volume of 1 liter at a dilution rate of -1 0.2 h . Culture conditions and medium composition were selected as previously described (22). Briefly, a minimal medium (23) supplemented with 20 mM nitrilo-acetic acid as a 15 chelator, 0.17 µM Na2SeO3, and 20 mM glucose was used. For a reference culture, NH4Cl 15 (98 atom % N; Sigma Aldrich) was used as the N source, instead of NH4Cl. Temperature was controlled at 37°C and pH was maintained at 6.9 ± 0.1 by titrating with sterile 4 M NaOH. The culture was aerated with 0.5 l/min water-saturated air and agitated with a propeller at 600 rpm. Pre-cultures were grown in the same medium, except that 100 mM sodium phosphate buffer was used to increase buffering capacity. After the chemostat culture reached steady state, 50 mM glucose (final concentration) was added to initiate the CCR response. Simultaneously, the medium feed was stopped and cells were harvested for further analyses by conventional rapid sampling (i.e. using a slight overpressure) at 5, 15, 30, and 60 minutes after the glucose pulse. Cells harvested from the steady state of the chemostat were used as a reference (i.e. time = 0) sample. Transcriptomic analysis RNA sample preparation RNA was isolated using the RNeasy mini kit (Qiagen) as described previously (22). The oligonucleotide microarrays (design ID 029412) (24) used in this study were obtained from Agilent Technologies (Stockport, UK). Each microarray slide (8×15k format) was based on the Agilent E. coli catalogue microarray (G4813A-020097), which covered 4,287 E. coli K-12 MG1655 genes and was supplemented by an additional 311 probes designed using eArray 52 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI (Agilent Technologies) for recently identified genes, re-annotated genes and small noncoding RNAs. Microarray analysis Isolated RNA was directly converted to fluorescently labeled cDNA as described elsewhere (25). The cDNA produced from RNA samples at 5, 15, 30 and 60 min after perturbation of the chemostat with a glucose pulse were all hybridized with cDNA produced from the steady-state sample (i.e. t = 0), which was used as the reference. Two RNA samples (from biological replicates) were obtained for each time point, and these were hybridized twice as dye-swaps (i.e. technical replicates) and thus provide four replicates in total. These choices provide sufficient data for robust statistical filtering. Quantification of the cDNA samples, microarray hybridization, and washing and scanning of the arrays, were carried out as described in the Fairplay III labeling kit (Agilent Technologies, 252009, Version 1.1). Scanning was performed with a high resolution microarray scanner (Agilent Technologies). GeneSpring GX v7.3 (Agilent Technologies) was used for data normalization and data analysis. The transcriptomic data have been deposited in ArrayExpress (Accession number E-MTAB-2398). Short Time-series Expression Miner (STEM) analysis The STEM analysis tool (26) that is integrated with Gene Ontology (GO) enrichment analysis, was used to cluster genes that show a similar temporal expression pattern by using normalized log2 ratios of the transcriptomic data. A total of 50 possible temporal gene expression profiles (out of the 81 possible) were computed. Genes were then assigned to the best-fitting profile using the STEM clustering algorithm. The significance level of each profile was calculated based on the ratio of the number of assigned genes to that profile, versus the number of expected genes to a profile using Permutation test, and corrected by Bonferroni correction as previously described (27). Proteomic analysis Protein sample preparation Cells (approximately 20 ml) were harvested from the chemostat by conventional rapid sampling (i.e. using a slight overpressure) directly into an ice-cold tube with a small volume of 50 µg/ml chloramphenicol and a 1/10 dilution of a Complete Protease Cocktail inhibitors mix (both from a concentrated stock solution and the latter mix was from Roche), and then centrifuged at 4,000 × g for 5 min at 4 °C. Cell pellets were immediately frozen with liquid nitrogen and subsequently lyophilized and stored at 80 °C until use. 53 CHAPTER 3 15 Sampled cells were mixed with cells from the N-reference culture at a 1:1 ratio based on OD600, and then suspended in an extraction buffer that consisted of 6 M urea, 0.5 mM EDTA, 2% (w/v) SDS, and complete protease inhibitors cocktail mixture (Roche) in 100 mM NH4HCO3 lysis buffer, after which the mixture was sonicated. The protein concentration was measured in all samples with the bicinchoninic acid (BCA) assay (BioRad). Next, 200 µg protein was subjected to trypsin digestion, using the gel-assisted digestion method as previously described (28). The resulting peptide mixture was then lyophilized after extraction from the gel. Strong Cation Exchange Chromatography Peptides were resuspended in 0.1% (v/v) trifluoroacetic acid (TFA) plus 50% (v/v) TM acetonitrile (ACN), and then loaded onto a PolySULPHOETHYLAspartamide column (2.1 mm ID, 10 cm length) on an Ultimate HPLC system, connected to a fraction collector (LC Packings, Amsterdam, The Netherlands). Elution (flow rate: 0.1 ml/min) was performed using solvent A; 10 mM KH2PO4, 25% (v/v) ACN, pH 2.9 and solvent B; 10 mM KH2PO4, 500 mM KCl and 25% (v/v) ACN, pH 2.9. A stepwise gradient was used of 2%, 4%, 6%, 8%, 10%, 20%, 50%, and 100% of B. The program was run for 120 min, in which the step-gradient started after 40 min and lasted for 10 min at each step. The elution was monitored via absorbance measurements at 214 nm. Accordingly, 8 separate fractions were collected. Then, these samples were lyophilized and stored at -80 °C. Before being analyzed by mass spectrometry, the samples were re-suspended in 0.1% (v/v) TFA plus 3% (v/v) ACN. Fractions eluted from 6 to 10% (v/v) of solvent B were combined, as well as those collected from 20 to 100% of solvent B. Therefore, a total of 4 fractions was generated, and subsequently desalted with a C18 reversed phase tip (Varian). LC-FT-MS/MS data acquisition, data processing and relative protein quantification For 3 biological replicates, the proteomes of the cells harvested at steady state (i.e. at t = 0 min), and of the cells harvested at t = 5, 15, 30 and 60 min after induction of CCR, were analyzed with mass spectrometry. The LC-FT-MS/MS data of each of the 4 SCX fractions of 14 the N, 15 N isotopic tryptic peptide mixture of these proteomes were acquired with an ApexUltra Fourier transform ion cyclotron resonance mass spectrometer (Bruker Daltonic, Bremen, Germany) equipped with a 7 T magnet and a nano-electrospray Apollo II DualSource™ coupled to an Ultimate 3000 (Dionex, Sunnyvale, CA, USA) HPLC system. The 14 15 60 samples, each containing 400 ng of the N, N tryptic peptide mixture, were injected as a 10 μl 0.1% (v/v) TFA aqueous solution and loaded onto the PepMap100 C18 (5-μm particle size, 100-Å pore size, 300-μm inner diameter × 5 mm length) pre-column. The 54 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI peptides were eluted via an Acclaim PepMap 100 C18 (3-µm particle size, 100-Å pore size, 75-μm inner diameter × 250 mm length) analytical column (Thermo Scientific, Etten-Leur, The Netherlands) using a linear gradient from 0.1% formic acid / 6% CH3CN / 94% H2O (v/v) to 0.1% formic acid / 40% CH3CN / 60% H2O (v/v) over a period of 120 min at a flow rate of 300 nl/min. Data-dependent Q-selected peptide ions were fragmented in the -6 hexapole collision cell at an Argon pressure of 6×10 mbar (measured at the ion gauge) and the fragment ions were detected in the FTICR cell at a resolution of up to 60.000 (m/Δm). Instrument mass calibration was better than 1 ppm over an m/z range of 250 to 1500. The MS/MS rate was about 2 Hz. This yielded more than 9000 MS/MS spectra over the 120 min LC-MS/MS chromatogram. Raw FT-MS/MS data of the 4 SCX peptide fractions were processed as multi-file (MudPIT) with the MASCOT DISTILLER program, version 2.4.3.1 (64 bits), MDRO 2.4.3.0 (MATRIX science, London, UK), including the Search toolbox and the Quantification toolbox. Peak-picking for both MS and MS/MS spectra was optimized for a mass resolution of up to 60.000 (m/Δm). Peaks were fitted to a simulated isotope distribution with a correlation threshold of 0.7, and with a minimum signal-to-noise ratio of 2. The processed data were searched with the MASCOT server program 2.3.02 against the complete E. coli K12 proteome database from the UniProt consortium (release: June, 2012; 4271 entries in total) with the redundancy removed with DBtoolkit (29). The database was complemented with its corresponding decoy data base for statistical analyses of peptide false discovery rate (FDR). Trypsin was used as the enzyme and 1 missed cleavage was allowed. Carbamidomethylation of cysteine was used as a fixed modification and oxidation of methionine as a variable modification. In addition to the search for tryptic peptides, semi-tryptic peptides were allowed in order to monitor selectivity of digestion. The peptide mass tolerance was set to 5 ppm and the peptide fragment mass tolerance was set to 0.01 Dalton. The quantification method was set to the 15 14 15 metabolic N labeling method, to enable MASCOT to identify both N and N peptides. The MASCOT MudPIT peptide identification score was set to a cut-off of 20. At this cut-off, and based on the number of assigned decoy peptide sequences, a peptide false discovery rate (FDR) of ~2% for all analyses was obtained. Using the quantification toolbox, the isotopic ratio for all identified proteins was determined as weighted average of the isotopic ratios of the corresponding light over heavy peptides. Selected critical settings were: require bold red: on, significance threshold: 0.05: Protocol type: precursor; Correction: Element 15N; Value 99.4; Report ratio L/H; Integration method: Simsons; Integration source: survey; Allow elution time shift: on; Elution time delta: 20 seconds; Std 55 CHAPTER 3 Err. Threshold: 0.15, Correlation Threshold (Isotopic distribution fit): 0.98; XIC threshold: 0.1; All charge states: on; Max XIC width: 200 seconds; Threshold type: at least homology; Peptide threshold value: 0.05; unique pepseq: on. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (30) with the dataset identifier <PXD000xxx>. Statistical analysis Transcriptome - Area Under the Curve (AUC) The transcriptomic data consist of 2 biological replicates of samples taken at 0 (i.e. the steady state), and 5, 15, 30, and 60 min after the glucose pulse, with technical duplicate measurements at all-time points (except at 30 minutes, when only a single sample was available). To characterize a change in the amount of transcript over time, we determined Area Under the Curve (AUC) of normalized log2 ratios of the transcripts as a function of time (compare (31)). For each biological replicate, 8 possible time profiles (i.e.: 2 × 2 × 1 × 2) can be constructed per gene. The AUC values of the 8 time profiles were calculated using the trapz function of MATLAB. These values were averaged and next the average AUC value of the two biological replicates was calculated to obtain the average value. This final AUC value represents the relative change in the amount of a transcript during the whole time-series. The normalized unlogged ratio of the transcript at each time point and the calculated AUC values are provided in the Supplementary Table S1. Proteome - Linear regression analysis The normalized 14 15 N/ N isotopic ratio for all proteins and for all sampling points is listed in the Supplementary Table S2. They represent the relative abundance level of a protein. The number of available data points for any protein varies between 1 and 16, as the data were generated from 3 biological replicates and 5 time points, plus 1 additional technical duplicate. Any change in the abundance of a protein is governed by the balance between its rate of production (via transcription and translation) and its rate of degradation. To estimate relative changes in protein concentration, the proteins for which at least 6 separate data points were available, the abundance was analyzed with linear regression, using the MATLAB function regress. Time was used as the explanatory variable 14 15 and the normalized protein N/ N isotopic ratio as the dependent variable. The resulting regression coefficient (i.e. the slope) represents the change of the relative amount of the protein per unit of time, which can also be interpreted as ‘a net production (with positive slope) or net degradation (i.e. when the slope is negative) rate’. A rate is considered to be 56 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI significant if the value zero (i.e. the null hypothesis) is outside the 95% confidence interval of the calculated slope. The calculated rates of change in relative protein abundance of 557 proteins, with the corresponding p-values, are provided in Supplementary Table S3. Integrated analysis of Transcriptomic and Proteomic data Calculation of the confidence region for the first null-hypothesis that transcript and corresponding protein level do not change To be able to calculate confidence regions for this null hypothesis, first the statistical properties of the measurement distribution under this null hypothesis must be derived. For the transcript level all the duplicate measurements were used to obtain a standard deviation per batch. Under the null hypothesis all the log2 ratios, the values used in the analysis, are zero. So in order to obtain the statistical distribution of the AUC values the artificial transcriptomic data were drawn from a normal distribution with zero mean and the standard deviation from the batch under consideration. Then the same procedure as described previously was used to calculate the AUC for the artificial transcriptomics data. This was done 1000 times. The obtained AUC values were fitted with a normal distribution again using the MATLAB function normfit, and the mean and standard deviation were calculated. For the proteome a normal distribution is fitted to the measurements at t = 0 for the three batches. From this fit the means and standard deviations of the relative changes in the level of a specific protein are determined using the normfit function of MATLAB. Under the null hypothesis that no changes in protein level occur during the experiment, the measurements at times 5, 15, 30 and 60 minutes come from the same distribution as the measurements from t = 0. In an artificial data file every value from the original data is replaced by a value from a normal distribution with the appropriate mean and standard deviation. This means that if the value was from batch 2 also the mean and standard deviation from batch 2 was used. If there was no value in the original data this was also the case in the artificial data. Then for an artificial data file the slopes were calculated as described above. This was repeated for 1000 artificial data files. The calculated slopes were then used to fit a normal distribution and both its mean and standard deviation were determined with normfit. With this we have the distribution of the artificial data under the null-hypothesis for both the AUC values and the slopes. Then to get a confidence region where neither the 2 transcript- nor the protein level did change significantly, the χ distribution with 2 degrees of freedom was calculated (32). This is the elliptic region (p < 0.01) as shown in Figure 6A. The outbound regions of this ellipse were then used as the significance threshold for the 57 CHAPTER 3 transcript level (AUC). This means that any gene having AUC values > 16.75 or < -16.75 was considered as significantly changed in its transcript level (with p < 0.01). The corresponding values for the ‘slope’ are: < - 0.004 and > 0.004. Use of a ‘moving average’ to identify genes with a disproportionate change in the level of its mRNA and the corresponding protein For each quantifiable protein the value for the slope (i.e. from the time-series proteomics analysis and that represents the relative change in its abundance) was averaged over 13 (= n) values, i.e. the value for the slope of the protein itself, plus the slope of the 6 proteins with the closest lower AUC value and those with the 6 nearest higher AUC values. If this averaged slope falls outside the 99% confidence interval (as determined by linear regression) for the protein under consideration, the corresponding gene is considered to be subject to post-transcriptional regulation (Table 1 and Supplementary Table S4). The same analysis was carried out with n = 11 and n = 15 (results not shown). This resulted in essentially the same list of genes subject to PTR. 3.4 Results Dynamic analysis of the physiological characteristics of E. coli cells upon glucose repression, induced in cells growing in a steady-state chemostat culture To investigate the contribution of post-transcriptional control in E. coli upon initiation of the CCR response, we first created a reference condition where no CCR is present by using a glucose-limited chemostat culture. Under these conditions no residual glucose could be detected in the chemostat cultures, which means that the glucose concentration was lower than 50 µM (22). This culture was then pulsed with an excess of glucose to initiate the CCR, simultaneously with stopping the pump of the chemostat. The derepressed nature of the cells at steady state, and the switch to the glucose-repressed state that we aimed to achieve were experimentally validated as described elsewhere (22). Briefly, growth and physiological characteristics of the glucose-repressed cells were monitored through biomass- and fermentation-product measurements, and additionally via a time-series measurement of the expression level of selected genes. The time-series gene expression analysis was performed with RT-PCR. The results showed that there was no CCR in E. coli cells growing in a glucose-limited chemostat, as such cells had the ability to immediately consume alternative carbon sources (confirmed via addition of selected sugars to the culture), while two selected genes, i.e. ptsG (encoding a PTS-glucose transporter enzyme IIBC), and crp (encoding the global transcriptional regulator CRP), were shown to be repressed at least up to 60 min after the glucose pulse. The decreased 58 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI expression level of these genes confirms that the initiation of glucose repression by the pulse of glucose was successful. The growth rate (µ) of the E. coli culture after the glucose pulse increased from 0.2 up -1 to 0.5 h after 60 min, and slightly increased thereafter up to 90 min. Sugar and organic acid measurements showed that the glucose concentration was in excess at all times during this 60 min time window (glucose concentration 38 mM 60 min after a 50 mM pulse (22)) and that acetate was the only fermentation product observed up to 120 min after the glucose pulse. From such a pulsed chemostat, cells were harvested at steady state (t = 0) and in a time-series at 5, 15, 30, and 60 min after the glucose pulse, to be used for further analyses of both the transcriptome and the proteome (Figure 1). Figure 1. Experimental design and identification strategy for post-transcriptionally regulated (PTR) genes used in this study. 59 CHAPTER 3 Quantitative transcriptomic analysis of Carbon Catabolite Repression Time-series transcriptomic data were measured using microarrays containing 4,057 protein encoding genes, 152 pseudo-genes, and 78 small non-coding RNAs and other RNA elements. Transcript levels at t = 5, 15, 30, and 60 min after the glucose pulse were normalized with the transcript level at t = 0. The average coefficient of variation among replicates at 5, 15, 30, and 60 minutes was 20%, 21%, 16%, and 31%, respectively. Significant changes in the amount of transcript over the observed time after the glucose pulse were determined using the ‘Area Under the Curve’ (AUC) approach (see Materials and Methods: “Statistical analysis”). The AUC value represents the average change in the relative amount of a transcript present during the whole time-series, and is shown in Supplementary Table S1, along with the normalized unlogged ratio of each transcript at each time point. Besides that, the Short Time-series Expression Miner (STEM) analysis tool (26) that is integrated with Gene Ontology (GO) enrichment analysis, was used to cluster genes that show a similar temporal expression pattern using normalized log2 ratios. In total, 10 significant temporal gene expression profiles were clustered (p < 0.01) as shown in Figure 2. As expected, when glucose is introduced into the glucose-limited chemostat culture, genes that express proteins involved in carbohydrate metabolism, including carbohydrate transport (GO:0008643) and carbohydrate catabolic process (GO:0016052), are downregulated. Interestingly expression of these same genes recovered after 30 min, as can be seen in Profile 0, while genes involved in cellular biosynthetic processes and cellular growth are up-regulated (e.g. DNA metabolic processes (GO:0006259), ribosomal proteins (GO:0003735), and nucleotide metabolic processes (GO:0009165)), as revealed by Profile 43, 41, and 40, respectively. Genes encoding proteins involved in RNA binding (GO:0003723) or in translation (GO:0006412) are enriched in Profile 38. Profile 49 represents the expression pattern of genes encoding proteins functioning in sulfur compound transport (GO:0072348). Phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS; GO:0009401) encoding genes, including fruB, gatB, gatC, manZ, srlB, and srlE, are clustered in Profile 19. In contrast, genes encoding carbohydrate transport (GO:0008643), especially ATP-binding cassette (ABC) transporters, including malE, malF, malG, malK, araF, and araG, follow the expression pattern of Profile 9. Profile 8 shows the expression of a set of genes that are down-regulated rapidly from the beginning onwards and remain so up to at least 15 min after initiation of the glucose pulse. Profile 8 is enriched in genes expressing proteins involved in cellular respiration (GO:0045333) and includes the NDH-I complex and some genes participating in the TCA- 60 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI Figure 2. Time-series transcriptomic analysis of E. coli upon glucose repression. Genes were clustered using Short Time-series Expression Miner (STEM) analysis tool with the abscissa representing the time scale of 5, 15, 30, and 60 minutes after the glucose pulse, and the ordinate the log2 of a gene expression level. Only significant profiles are shown here and arranged according to the significance level. The significance level (p-value) of the profile was calculated based on the number of genes assigned (right top) versus the number of genes expected and is indicated at the right-bottom. Black lines represent temporal gene expression model of the profile itself while red lines are genes that were assigned to the profile. The significant profiles which are similar to each other are grouped as a cluster of profiles, and are given the same color. 61 CHAPTER 3 cycle, i.e. acnA, fumA, sucC, and sucD, while Profile 10 displays the expression pattern of genes encoding proteins involved in aerobic respiration (GO:0009060), e.g. acnB, gltA, sdhA, sdhB, sdhC, and sdhD, that are rapidly down-regulated at first, but recover after 15 min after the glucose pulse. In contrast to most other genes involved in (cellular) respiration, ndh (encoding NADH: ubiquinone oxidoreductase II (NDH-II)) is up-regulated in a pattern that follows the expression pattern of Profile 38 (for more detail: see the STEM analysis files in Supplementary Data S1). Quantitative proteomic analysis of Carbon Catabolite Repression In parallel with the analysis of the transcriptome, quantitative time-series measurements of the proteome were carried out by using a stable-isotope labelling 15 + technique and LC-FTMS. Reference cells from a glucose-limited culture grown on NH4 , were harvested at steady state (T0ref) and then mixed equally (based on OD600) with cells 14 derived from a N culture (see Figure 1). Then, these mixed samples (i.e. t0/t0ref, t5/t0ref, t15/t0ref, t30/t0ref, and t60/t0ref) were individually processed and analysed as described in Materials and Methods: “Statistical analysis”. Three independent experiments with the 14 N cultures were performed, resulting in a total of 873 quantified proteins. Errors in the 1:1 mixing of the 14 N cell cultures, with the 15 N reference cultures, were corrected by 14 15 normalizing each dataset for all time points. This was done by setting the N/ N isotopic ratio for the TufA protein to 1. TufA has previously been used as the internal standard for corrections of protein injection between technical replicates and also for variation in protein loading after growth on different carbon sources (33). After normalization, a 14 15 2 normal distribution of the protein N/ N isotopic ratios at t = 0 of around 1 (with R = 14 15 0.99) is obtained, as shown in Figure 3. Standard deviations in the protein N/ N isotopic ratios, both before and after normalization of the three biological replicas, are within 10%, revealing the accuracy of the protein quantification. As an alternative, normalization of the protein 14 15 14 15 N/ N isotopic ratios in each data set was completed using their median value of the N/ N isotopic ratio (34). As shown in Supplementary Table S5, there is no significant difference in the results between normalization on the median values, and on the values of the TufA isotopic ratios. Nearly 80% of the quantified proteins were detected in at least two biological replicates, an observation which attests to the excellent reproducibility of the experiments (Supplementary Figure S1A). The resulting normalized 14 15 protein N/ N isotopic ratios for all time points are listed in the Supplementary Table S2. 62 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI Figure 3. Normalization of protein ratios at steady state, t=0, against TufA. Ordinate represents number of quantified proteins while abscissa indicates the protein 14 N/15N isotopic ratio. (A) Normal distribution of the protein ratios before normalization from three independent experiments. (B) Normal distribution of the normalized protein ratios. Number of quantified protein, mean value of the protein ratios, standard deviation of the protein ratios, and R2 are indicated. The abundance of a total of 557 proteins as a function of time after the glucose pulse was then subjected to linear regression analysis (see Statistical analysis). Approximately 60% of the proteins analyzed showed a slope significantly different from zero (p < 0.05). Of those, 115 (20%), and 228 (40%) proteins were significantly induced and repressed by CCR, respectively. The calculated changes in protein abundance (slopes) of 557 proteins, with the corresponding p-value and the change in transcript level (i.e. AUC value), are provided in Supplementary Table S3. Response of the proteome upon Carbon Catabolite Repression Consistent with the immediate upshift in growth rate upon addition of the pulse of glucose (22) and with the transcriptomics results (see above), the time series analysis of E. coli’s proteome revealed that the majority of the proteins who’s level is up-regulated, are ribosomal proteins (40%), and proteins involved in nucleotide- or amino acid biosynthesis (19% and 18%, respectively; Figure 4A). Also expressions of proteins involved in scavenging nucleotides and amino acids, such as the uracil permease (UraA), and the periplasmic oligopeptide-binding protein OppA were increased. Presumably, increasing amounts of iron-sulfur cluster containing proteins are required in this upshift of growth rate, because the levels of the sulfate transporter (i.e. CysA, and CysP) and of glutaredoxin-4 (GrxD) were remarkably up-regulated too. 63 CHAPTER 3 A particularly intriguing member of the group of proteins whose level is significantly upregulated by the glucose pulse is StpA, a multifunctional protein that is homologous to H-NS (35). StpA has a role in the regulation of glucose catabolism, as it represses the bgl operon (36, 37). Furthermore, it has recently been shown to display RNA chaperone activity, both in vitro and in vivo (38, 39) and is, together with H-NS, responsible for the “high expression level of essential and growth-associated genes and low levels of stressrelated and horizontally acquired genes” (40, 41). Figure 4. Quantitative time-series proteomic analysis of E. coli upon a glucose repression. Proteins with significantly changed expression level (p < 0.05) were grouped based on their cellular functions. (A) Significantly up-regulated protein categories. (B) Significantly down-regulated protein categories. A group of enzymes involved in central carbon metabolism, which includes glycolysis, the TCA cycle, the glyoxylate shunt and the pentose phosphate pathway, are the most important down-regulated proteins by CCR, in good agreement with the transcriptome data, as shown in Figure 4B. Also the alternative sugar transporters galactitol permease and mannose permease are sharply down-regulated, as well as the general PTS components EI (ptsI) and HPr (ptsH), and the glucose-specific PTS components EIIA (crr), and EIICB (ptsG), which is also in agreement with the results of the transcript analyses reported here and with the ~3.5-fold decrease in expression from a ptsG-lacZ fusion in nitrogen-limited chemostat cultures (mM residual glucose) compared to glucose-limited chemostat cultures (µM residual glucose) (42). Similarly, ATP synthase and several components of the respiratory chain (i.e. the NDH-I complex and the cytochrome bd-I terminal oxidase) are also down-regulated, while the expression level of NDH-II (which has + - a lower H /e stoichiometry than NDH-I (43, 44)) increased. Also the latter observation is consistent with the microarray results. The slope derived from the linear regression model not only indicates the direction of change in protein abundance, but also reveals the net change in the rate of production 64 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI (i.e. the rate of production minus the rate of degradation) for each protein. These changes in rate differ considerably between individual proteins and between protein categories. The most pronounced differences, after grouping of the members of the proteome of E. coli in the MultiFun (45) categories, are shown in Figure 5. Among the 3 most upregulated groups, the highest production rate observed (i.e. for the nucleotide biosynthesis group) is 12-fold higher than observed for the lowest (i.e. the ribosomal protein group). The standard deviation of the production rates in the amino acid biosynthesis group and in the nucleotide biosynthesis group is 63% and 41%, respectively. In contrast, the production rates of the ribosomal proteins show less variation: 26%. These results suggest that the production of the ribosomal proteins is strictly controlled and mutually synchronized. In contrast to the differences observed for the (MultiFun categories of the) upregulated proteins, the rates of decreased abundance of the down-regulated proteins is relatively constant (Figure 5). Decreased abundance will be due to a combination of ‘dilution’ of the protein because decreased relative rate of synthesis, plus active proteolytic degradation. The fact that the limiting rate seems to converge to a value close to the growth rate after the glucose pulse, may suggest that the former contribution may be dominant. Nevertheless, the results show that during the glucose response protein expression in E. coli is predominantly controlled at the synthesis level, considering that most of the down-regulated proteins appear to be degraded gradually and passively at a similar rate. Time-series analyses of Transcriptome vs. Proteome To identify genes whose products, in addition to CCR control, may be subject to posttranscriptional regulation, we have applied a new type of statistical analysis of the genome-wide omics data that relates the relative change in a transcript level with the change in expression of the corresponding protein that it brings about. In Figure 6A, this data has been plotted with increasing AUC values (i.e. change in relative mRNA abundance) as the explanatory variable and the relative change of the corresponding protein (i.e. the slope) as the dependent variable. In agreement with the first-order approximation, i.e. that the relative change in mRNA abundance is proportional to the relative change in protein abundance, the data points in this plot seem to be related exponentially. This is further confirmed by plotting the slopes against AUC* (= 2 Supplementary Figure S2). This linearity between the slopes and 2 AUC (AUC/55) ; see is expected as changes in both mRNA and protein levels are expressed relative to level prior to the perturbation of the cells with the pulse of glucose. Quantitative analysis of the linearity of 65 CHAPTER 3 this fit reveals that 36 % of the covariance in the relative change of the abundance of the E. coli proteome in the glucose response is defined by changing transcript levels (Supplementary Figure S2). The slope of this line holds information on the average efficiency of translation of the genes affected by glucose repression; however, a molecular interpretation of its numerical value is prevented by the fact that alteration of protein levels have not yet come to equilibrium in the time window available. Figure 5. Relative net production rates of the abundant proteins, grouped based on the MultiFun categories. Box plots show the relative net synthesis rates of the three most up-regulated protein categories, and the five most down-regulated protein categories. The horizontal bars represent the median, the first and the third quartiles, while whiskers represent minimum and maximum values. To identify genes that are subject to post-transcriptional regulation a method nevertheless was selected that is independent of the mathematical relation between AUC and slope. First the region in the plot was identified in which neither a change in the relative transcript abundance, nor in that of the corresponding protein level, can be 2 considered as significantly changed. Such a region is defined by a χ distribution with two degrees of freedom (see red and blue ellipse in Figure 6A for p < 0.01 and p < 0.05, respectively). All genes from within this ellipse were then excluded from further analysis. In the absence of further gene specific regulatory mechanisms, genes with similarly altered relative transcription level (i.e. AUC values) are predicted to also have similarly altered relative protein abundances. Neighboring genes on the AUC axis in Figure 6A, for 66 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI which this first-order hypothesis holds, are therefore expected to also have similar slope values (i.e. proportionally altered production rates/abundance). If their slope values are not similar, then the level of expression of that protein must be subject to posttranscriptional regulation. For genes of this latter class there must be a posttranscriptional mechanism that influences the relative amount of produced protein by other means than solely via the relative mRNA concentration. In order to identify all genes for which this conclusion about post-transcriptional regulation holds, we used the moving average approach as described in Materials and Methods: Statistical analysis: Genes having a correlation between the change in their relative transcript abundance and the change in the level of the corresponding protein that is statistically significantly different from the rest, are thereby identified as post-transcriptionally regulated (PTR) genes. Using this analysis, a total of 51 genes has been identified at the significance level of < 0.01 (Table 1) and they are joined by another 45 genes at the significance level of < 0.05 (Supplementary Table S4). The genes identified using this approach with the significance level of < 0.01 are shown in Figure 6B; red dot, and are listed in Table 1. Of these 51 genes, eight genes have already been reported in the literature to be regulated at the post-transcriptional level (46–50). The 51 genes can be classified into 4 groups, corresponding to the four quadrants of Figure 6C. The first two groups (Quadrants I and IV) contain genes transcriptionally activated by the glucose pulse. The abundance of their transcript increases (i.e. positive AUC); however, the production of the corresponding proteins is constrained by additional post-transcriptional control(s). The post-transcriptional regulation of genes from these quadrants can be further sub-divided into three possible control mechanisms, i.e. (i) increase in the translation rate, (ii) feedback regulation/ or inhibition of the translation process and (iii) altered mRNA stability. Enhanced translation rates were observed (Table 1) for genes in Quadrant I, like suhB, stpA, ompX, cysA, and gltB, whereas feedback regulation was observed for genes encoding the RNA polymerase α subunit (rpoA), pyridine nucleotide transhydrogenase β subunit (pntB), lysyl-tRNA synthetase (lysS), and amino acid periplasmic-binding proteins (metQ and fliY). As these genes exhibit significant increases in transcript level, but show little change in protein abundance. The increase in translation rate of ompX is possibly due to the fact that the small RNAs, CyaR and MicA, that inhibit translation of ompX (5, 50), were significantly down-regulated (see Supplementary Table S1). Inada and Nakamura, (1996) proposed that the expression of suhB, encoding inositol monophosphatase, is auto-regulated via its own translation product, by negatively modulating mRNA stability. But the underlying mechanism that 67 CHAPTER 3 controls suhB mRNA decay is unclear and whether or not inositol monophosphatase directly or indirectly modifies the activity of RNaseIII is not known (51). However, in the specific transition elicited by addition of a pulse of glucose, it is clear that the activation of suhB mRNA degradation was abolished. Figure 6. Comparison of transcript level, expressed as Area Under the curve (AUC), with the rate of change of the corresponding protein abundance, expressed as the slope of the amount of the respective protein, against time upon a glucose repression. (A) The area in which neither transcript- nor protein abundance is changed significantly, at p < 0.01 (red ellipse) and at p < 0.05 (blue ellipse). Genes from these areas are excluded from further analysis. (B) Genes for which the correlation between transcript level and protein abundance differs significantly from the average with p < 0.01 (red dot), and p < 0.05 (blue dot). These are the genes that are identified as being post-transcriptionally regulated. QI to QIV refer to a Cartesian coordinate. 68 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI Genes identified in Quadrant IV are possibly regulated by the inhibition of translation and/or the rate of mRNA turnover. Genes that may well be inhibited at the level of translation are eno, pdxB, and sspA. Notably ~35% of the genes (out of a total of 17) in this quadrant contain a Repetitive Extragenic Palindrome sequence (REP) element, within an intergenic region of a polycistronic mRNA or in the 3’UTR of a monocistronic transcript. The REP element has been reported to extend the half-life of the upstream mRNA by protecting the 3’end of the transcript from attack by a 3’ 5’ exonuclease (52, 53). In contrast to the first group of identified PTR genes (i.e. those in Quadrants I and IV), the group in Quadrant II represents genes of which transcription was repressed (i.e. negative AUC) by glucose, while the corresponding protein increased in abundance. Possible mechanisms underlying this type of regulation could be: (i) a strongly increased rate of translation or (ii) an extension of the half-life of the protein. Several genes in this quadrant have previously been shown to be regulated post-transcriptionally by changes in growth rate (46). Examples are: ggp, gltB, and rpsP. The abundance of proteins encoded by genes present in Quadrant III might be regulated by increased proteolysis. The rates of degradation of MinE and DnaJ are remarkably higher than those of the rest of the genes within this quadrant, considering the relatively small rate of change in their transcript level. Considering that these proteins are involved in cell division and DNA replication, respectively, tight regulation of their abundance is to be expected. 3.5 Discussion Catabolite/glucose repression has already been studied for many decades, but a full understanding of this regulation mechanism is still not available (54). Significantly, transcriptional regulation through the global regulator CRP and the concentration of cAMP only, cannot fully explain all the re-programming in E. coli to adjust metabolism to the availability of its preferred carbon source. In this study, the de-repressed CCR state of E. coli cells was established using the chemostat culturing technique, in combination with the induction of CCR by addition of a saturating pulse of glucose, as described elsewhere (22). The fact that cells cultured in a chemostat under glucose limitation are in the carbon catabolite de-repressed state was known from previous work (55). As shown in the Results, after switching from glucoselimited to glucose-excess conditions, expression of more than 50% of the genes of the E. coli MG1655 genome was changed significantly. 69 CHAPTER 3 Table 1. List of post-transcriptionally regulated genes identified in this study and grouped according to their cellular functions. GN Function Amino acid biosynthesis aroK† Shikimate kinase 1 cysA† Sulfate/thiosulfate import ATP-binding protein CysA Sulfite reductase [NADPH] hemoprotein β† cysI component cysPa Thiosulfate-binding protein glnA† Glutamine synthetase gltB†,b Glutamate synthase [NADPH] large chain metB† Cystathionine γ-synthase metC Cystathionine β-lyase metC metQ D-methionine-binding lipoprotein metQ fliY Cystine-binding periplasmic protein-SymR Cell division and DNA replication minE Cell division topological specificity factor nlpI Lipoprotein NlpI dnaJ Chaperone protein DnaJ Transcription rpoA DNA-directed RNA polymerase subunit α sspA Stringent starvation protein A stpA DNA-binding protein stpA yifE UPF0438 protein yifE Translation hfq Protein hfq hpf Ribosome hibernation promoting factor lysS Lysine-tRNA ligase pheT Phenylalanine-tRNA ligase β-subunit srmB† ATP-dependent RNA helicase SrmB-isrB tig Trigger factor typA GTP-binding protein typA/BipA ygfZ† tRNA-modifying protein ygfZ Ribosomal proteins rplAc 50S ribosomal protein L1 rplF 50S ribosomal protein L6 rplI 50S ribosomal protein L9 rplM 50S ribosomal protein L13 rplS 50S ribosomal protein L19 rplX 50S ribosomal protein L24 rpmA 50S ribosomal protein L27 rpmG† 50S ribosomal protein L33 rpsGd 30S ribosomal protein S7 rpsPb 30S ribosomal protein S16 Central carbon metabolism eno Enolase ppcb Phosphoenolpyruvate carboxylase talB Transaldolase B 70 Quadrant Slope AUC I I 0.0039 0.0175 24.2 80.0 I 0.0075 108.1 I I I I IV I I 0.0054 0.0033 0.0099 0.0046 -0.0045 0.0018 0.0005 102.9 87.6 47.3 124.7 34.6 56.9 55.4 III IV III -0.0051 -0.0020 -0.0051 -18.3 23.8 -6.3 I IV I I 0.0001 -0.0011 0.0260 0.0034 51.8 45.4 83.3 17.1 IV IV I II IV I I II -0.0028 -0.0080 0.0008 0.0003 -0.0032 0.0036 0.0081 0.0055 17.8 11.5 44.4 -46.3 39.8 30.7 45.2 -36.4 I I I I II I I I I I 0.0027 0.0036 0.0049 0.0025 0.0042 0.0034 0.0037 0.0058 0.0043 0.0038 60.0 72.6 40.9 60.4 -17.1 75.5 36.5 50.3 29.7 18.4 IV II IV -0.0001 0.0012 -0.0020 48.6 -19.7 47.1 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI Table 1. Continue. GN Function Quadrant Slope AUC Membrane proteins nuoC† NADH-quinone oxidoreductase subunit C/D III -0.0039 -79.7 ompXe Outer membrane protein X I 0.0255 103.7 Periplasmic substrate-binding protein subunit of potD IV -0.0014 35.1 Putrescine/ spermidine ABC transporter Uncharacterized ABC transporter ATP-binding yjjK† IV -0.0010 42.8 protein YjjK Other metabolisms PTS-dependent dihydroxyacetone kinase, dhaK† IV -0.0070 20.7 dihydroxyacetone-binding subunit dhaK pntB NAD(P) transhydrogenase subunit beta I 0.0004 51.3 recAb Protein RecA IV -0.0022 38.2 suhB Inositol-1-monophosphatase I 0.0228 42.4 pepB† Peptidase B IV -0.0039 27.3 rnr† Ribonuclease R IV -0.0029 39.4 pdxB Erythronate-4-phosphate dehydrogenase IV -0.0016 73.6 yeeX UPF0265 protein yeeX IV -0.0035 16.4 yfgD† Uncharacterized protein YfgD IV -0.0007 53.5 Post-transcriptionally regulated genes reported in aKramer et al., 2009, bValgepea et al., 2013, cBaughman and Nomura, 1983, dDean et al., 1981, and eDe Lay and Gottesman, 2009. †Gene containing a Repetitive Extragenic Palindrome sequence (REP element) within an intergenic region of an operon or 3’UTR of transcription unit. Underlined genes are additionally identified when a set of 556 proteins having at least 6 separate data points were used instead of a set of proteins having 8 separate data points (490 proteins). Quadrant refers to Figure 6B. This observed high percentage of genes with a significant change in expression level is a result of: (i) The accuracy we could achieve in the multiple measurements in the timeseries analysis, and (ii) the procedure selected to initiate the CCR. Regarding accuracy: Via selection of time-series measurements, genes that show a slight but consistent increase or decrease in expression as a function of time, are distinguished more accurately from background noise than in probing at a single time-point, which has resulted in the high proportion of statistically significantly altered gene expression levels detected. Regarding the second point, the procedure that we selected for initiation of glucose repression, this causes, in parallel to the CCR response, also an increase in growth rate of the cells (22). Therefore, a subset of the PTR genes that we have identified in this study may in fact respond to increased growth rate, rather than to glucose de/repression specifically, like e.g. some of those in Quadrant II (see also Results section). In spite of the duplicate samples taken for the mRNA quantification, and an assay in triplicate for the proteomics samples, the relative error ranges of the protein slopes are larger than those for the AUC in the transcriptome assay, as can be deduced from the shape of the ellipses in Figure 6A. Further reduction of these error ranges in this time71 CHAPTER 3 series experiment could be obtained by more frequent sampling or via the analysis of a larger number of parallel samples. The first of these options, however, would require larger chemostat vessels. The limit value of the negative slopes (implying decrease relative protein abundance) against the AUC values (Figure 6A) suggests that a limit value of the ordinate is observed at AUC values < -150 (see left-hand part of Figure 6A). The change in protein abundance of these genes is due to a combination of degradation and ‘dilution’ relative to newly synthesized protein for continued growth. The limit value of the decrease of relative -1 protein abundance after a glucose pulse is 0.008 min , which is very close to the maximal growth rate of the cells after the glucose pulse. This suggests that dilution is a more important mode of decreased protein levels than active proteolysis and supports the idea that change in protein abundance during CCR is regulated primarily via synthesis (Figures 5 and 6A) which is consistent with the characteristics of other physiological transitions reported previously (20, 46, 56, 57). Decreased expression of selected genes encoding proteins that contribute to cellular respiration was expected (58): Faster growth requires more ribosomes for anabolism, like e.g. nucleotide-synthesizing enzymes. Therefore, at higher growth rates cells rely more on ATP production through pathways with a higher net thermodynamic driving force, e.g. ATP-uncoupled pathways, which can catalyze with higher molecular turnover numbers (58). The occurrence of this shift to the use of lower efficiency/higher turnover pathways is confirmed by the observation that key glycolytic genes/enzymes, especially pfkA/PfkA and pykA/PykA, are down-regulated significantly (see Supplementary Table S1 and S4). This will allow for more expression of components involved in anabolism that then will allow faster growth (59). In contrast to this general expectation, after ~30 min glucose-repressed E. coli cells exhibit a catabolic efficiency that is higher than under glucose-limited growth conditions (22), possibly because the higher glucose concentrations drive an increased flux through the still incomplete suppression of the high-efficiency catabolic pathways. Furthermore, down-regulation of the central carbon- and energy-metabolism is unlikely to be driven by the general stress response, as the master regulator of this response (rpoS, (60)), was down-regulated significantly in parallel with oxidative-, osmotic-, starvation-, and DNA damage-stress response genes (see Supplementary Table S1). The importance of control of CCR, beyond the level of transcription, has recently also been emphasized by Geiselmann, de Jong and others (54, 61). 72 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI Significantly, recent study of Beisel and Storz (2011) has proposed a model in which the sRNA Spot 42 forms a coherent feed-forward loop with CRP and helps to regulate the expression of catabolic genes at the level of translation and/or the mRNA stability (62). More than one hundred sRNAs have been predicted computationally to exist in E. coli. Of those, only 80 sRNAs have been experimentally validated (63). To date, many sRNA targeting programs are available, e.g. CopraRNA-sRNA targeting (64), RNApredator (65), and targetRNA (66). However, identifying genes that are regulated by sRNA remains challenging as the sRNA and mRNA base-paring sequences are usually very short (7-10 base pairs) (6), which may result in many false-positive predictions. The approach we have applied in this study does not allow one to distinguish between mechanisms that a cell could use to reduce transcript (nor protein) levels, i.e. whether it is due to increased mRNA degradation or due to the inhibition of transcription. Nevertheless, the mRNA molecules in E. coli usually have a half-life of between 3 and 8 minutes (67). Therefore, genes exhibiting a very rapid decrease in their transcript level during the first 5 minutes after the glucose pulse, as those clustered in Profiles 0 and 8 (Figure 2), may be subject to one of the selective mRNA-degradation mechanisms. Other candidate mechanisms are the involvement of a riboswitch (68) or of ribosome stalling (69). Obviously, much more detailed experimental validation is necessary before all the post-transcriptional regulation mechanisms of the genes identified in this study will have been elucidated. 3.6 Acknowledgements The authors would like to thank Dr. Jeroen van der Steen for the omics data integration. The work of O.B. was funded through a grant from the Higher Education Commission of Thailand. 3.7 Supplementary materials Supplementary data associated with this chapter can be found in the online version at http://www.sciencedirect.com. Alternatively it may be requested per email for personal use: [email protected]. 3.8 References 1. Neidhardt FC, Curtiss IR, Ingraham JL, Lin ECC, Low KB, Magasanik B, Reznikoff WS, Riley M, Schaechter M, Umbarger HE. 1996. Escherichia coli and Salmonella : cellular and molecular biology. ASM Press, Washington, D.C. 2. Kitano H. 2002. Looking beyond the details: a rise in system-oriented approaches in genetics and molecular biology. Curr Genet 41:1–10. 73 CHAPTER 3 3. Maier T, Guell M, Serrano L. 2009. Correlation of mRNA and protein in complex biological samples. FEBS Lett 583:3966–3973. 4. Wadler CS, Vanderpool CK. 2007. A dual function for a bacterial small RNA: SgrS performs base pairingdependent regulation and encodes a functional polypeptide. Proc Natl Acad Sci U S A 104:20454–20459. 5. Johansen J, Eriksen M, Kallipolitis B, Valentin-Hansen P. 2008. Down-regulation of outer membrane proteins by noncoding RNAs: unraveling the cAMP-CRP- and sigmaE-dependent CyaR-ompX regulatory case. J Mol Biol 383:1–9. 6. Rice JB, Vanderpool CK. 2011. The small RNA SgrS controls sugar-phosphate accumulation by regulating 7. Beisel CL, Storz G. 2011. Discriminating tastes: physiological contributions of the Hfq-binding small RNA Spot multiple PTS genes. Nucleic Acids Res 39:3806–3819. 42 to catabolite repression. RNA Biol 8:766–770. 8. Desnoyers G, Masse E. 2012. Noncanonical repression of translation initiation through small RNA 9. Liu M, Durfee T, Cabrera JE, Zhao K, Jin DJ, Blattner FR. 2005. Global transcriptional programs reveal a recruitment of the RNA chaperone Hfq. Genes Dev 26:726–739. carbon source foraging strategy by Escherichia coli. J Biol Chem 280:15921–15927. 10. Deutscher J, Francke C, Postma PW. 2006. How phosphotransferase system-related protein phosphorylation regulates carbohydrate metabolism in bacteria. Microbiol Mol Biol Rev MMBR 70:939–1031. 11. Oh MK, Rohlin L, Kao KC, Liao JC. 2002. Global expression profiling of acetate-grown Escherichia coli. J Biol Chem 277:13175–13183. 12. Gosset G, Zhang Z, Nayyar S, Cuevas WA, Saier MH Jr. 2004. Transcriptome analysis of Crp-dependent catabolite control of gene expression in Escherichia coli. J Bacteriol 186:3516–3524. 13. Postma PW, Lengeler JW, Jacobson GR. 1993. Phosphoenolpyruvate:carbohydrate phosphotransferase systems of bacteria. Microbiol Rev 57:543–594. 14. Gorke B, Stulke J. 2008. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat Rev 6:613–624. 15. Tao H, Bausch C, Richmond C, Blattner FR, Conway T. 1999. Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181:6425–6440. 16. Gutierrez-Rios RM, Freyre-Gonzalez JA, Resendis O, Collado-Vides J, Saier M, Gosset G. 2007. Identification of regulatory network topological units coordinating the genome-wide transcriptional response to glucose in Escherichia coli. BMC Microbiol 7:53. 17. Khankal R, Chin JW, Ghosh D, Cirino PC. 2009. Transcriptional effects of CRP* expression in Escherichia coli. J Biol Eng 3:13–1611–3–13. 18. Taniguchi Y, Choi PJ, Li G-W, Chen H, Babu M, Hearn J, Emili A, Xie XS. 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329:533–538. 19. Lu P, Vogel C, Wang R, Yao X, Marcotte EM. 2007. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 25:117–124. 20. Kramer G, Sprenger RR, Nessen MA, Roseboom W, Speijer D, de Jong L, de Mattos MJ, Back J, de Koster CG. 2010. Proteome-wide alterations in Escherichia coli translation rates upon anaerobiosis. Mol Cell Proteomics MCP 9:2508–2516. 21. Nie L, Wu G, Zhang W. 2006. Correlation between mRNA and protein abundance in Desulfovibrio vulgaris: a multiple regression to identify sources of variations. Biochem Biophys Res Commun 339:603–610. 22. Borirak O, Bekker M, Hellingwerf KJ. 2014. Molecular physiology of the dynamic regulation of carbon catabolite repression in Escherichia coli. Microbiology 160:1214–1223. 74 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI 23. Evans CGT, Herbert D, Tempest DW. Chapter XIII The Continuous Cultivation of Micro-organisms: 2. Construction of a Chemostat, p. 277–327. In Methods in Microbiology. Academic Press. 24. McLean S, Begg R, Jesse HE, Mann BE, Sanguinetti G, Poole RK. 2013. Analysis of the bacterial response to Ru(CO)3Cl(Glycinate) (CORM-3) and the inactivated compound identifies the role played by the ruthenium compound and reveals sulfur-containing species as a major target of CORM-3 action. Antioxid Redox Signal 19:1999–2012. 25. Rolfe MD, Ter Beek A, Graham AI, Trotter EW, Asif HM, Sanguinetti G, de Mattos JT, Poole RK, Green J. 2011. Transcript profiling and inference of Escherichia coli K-12 ArcA activity across the range of physiologically relevant oxygen concentrations. J Biol Chem 286:10147–10154. 26. Ernst J, Bar-Joseph Z. 2006. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 7:191. 27. Ernst J, Nau GJ, Bar-Joseph Z. 2005. Clustering short time series gene expression data. Bioinforma Oxf Engl 21 Suppl 1:i159–68. 28. Han CL, Chien CW, Chen WC, Chen YR, Wu CP, Li H, Chen YJ. 2008. A multiplexed quantitative strategy for membrane proteomics: opportunities for mining therapeutic targets for autosomal dominant polycystic kidney disease. Mol Cell Proteomics MCP 7:1983–1997. 29. Martens L, Vandekerckhove J, Gevaert K. 2005. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinforma Oxf Engl 21:3584–3585. 30. Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R. 2005. PRIDE: the proteomics identifications database. Proteomics 5:3537–3545. 31. Lee MV, Topper SE, Hubler SL, Hose J, Wenger CD, Coon JJ, Gasch AP. 2011. A dynamic model of proteome changes reveals new roles for transcript alteration in yeast. Mol Syst Biol 7:514. 32. T. W. Anderson. 1984. An Introduction to Multivariate Statistical Analysis, p. 72–74. In Wiley. Wiley. 33. Silva JC, Denny R, Dorschel C, Gorenstein MV, Li GZ, Richardson K, Wall D, Geromanos SJ. 2006. Simultaneous qualitative and quantitative analysis of the Escherichia coli proteome: a sweet tale. Mol Cell Proteomics MCP 5:589–607. 34. Cox J, Mann M. 2008. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. 35. Free A, Williams RM, Dorman CJ. 1998. The StpA protein functions as a molecular adapter to mediate repression of the bgl operon by truncated H-NS in Escherichia coli. J Bacteriol 180:994–997. 36. Free A, Porter ME, Deighan P, Dorman CJ. 2001. Requirement for the molecular adapter function of StpA at the Escherichia coli bgl promoter depends upon the level of truncated H-NS protein. Mol Microbiol 42:903– 917. 37. Wolf T, Janzen W, Blum C, Schnetz K. 2006. Differential dependence of StpA on H-NS in autoregulation of stpA and in regulation of bgl. J Bacteriol 188:6728–6738. 38. Grossberger R, Mayer O, Waldsich C, Semrad K, Urschitz S, Schroeder R. 2005. Influence of RNA structural stability on the RNA chaperone activity of the Escherichia coli protein StpA. Nucleic Acids Res 33:2280–2289. 39. Rajkowitsch L, Schroeder R. 2007. Dissecting RNA chaperone activity. RNA 13:2053–2060. 40. Zhang A, Rimsky S, Reaban ME, Buc H, Belfort M. 1996. Escherichia coli protein analogs StpA and H-NS: regulatory loops, similar and disparate effects on nucleic acid dynamics. EMBO J 15:1340–1349. 41. Srinivasan R, Scolari VF, Lagomarsino MC, Seshasayee ASN. 2015. The genome-scale interplay amongst xenogene silencing, stress response and chromosome architecture in Escherichia coli. Nucleic Acids Res 43:295–308. 75 CHAPTER 3 42. Seeto S, Notley-McRobb L, Ferenci T. 2004. The multifactorial influences of RpoS, Mlc and cAMP on ptsG expression under glucose-limited and anaerobic conditions. Res Microbiol 155:211–215. 43. Matsushita K, Ohnishi T, Kaback HR. 1987. NADH-ubiquinone oxidoreductases of the Escherichia coli aerobic respiratory chain. Biochemistry (Mosc) 26:7732–7737. 44. Sharma P, Teixeira de Mattos MJ, Hellingwerf KJ, Bekker M. 2012. On the function of the various quinone species in Escherichia coli. FEBS J 279:3364–3373. 45. Serres MH, Riley M. 2000. MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Microb Comp Genomics 5:205–222. 46. Valgepea K, Adamberg K, Seiman A, Vilu R. 2013. Escherichia coli achieves faster growth by increasing catalytic and translation rates of proteins. Mol Biosyst 9:2344–2358. 47. Kramer G, Sprenger RR, Back J, Dekker HL, Nessen MA, van Maarseveen JH, de Koning LJ, Hellingwerf KJ, de Jong L, de Koster CG. 2009. Identification and quantitation of newly synthesized proteins in Escherichia coli by enrichment of azidohomoalanine-labeled peptides with diagonal chromatography. Mol Cell Proteomics MCP 8:1599–1611. 48. Baughman G, Nomura M. 1983. Localization of the target site for translational regulation of the L11 operon and direct evidence for translational coupling in Escherichia coli. Cell 34:979–988. 49. Dean D, Yates JL, Nomura M. 1981. Identification of ribosomal protein S7 as a repressor of translation within the str operon of E. coli. Cell 24:413–419. 50. De Lay N, Gottesman S. 2009. The Crp-activated small noncoding regulatory RNA CyaR (RyeE) links nutritional status to group behavior. J Bacteriol 191:461–476. 51. Inada T, Nakamura Y. 1996. Autogenous control of the suhB gene expression of Escherichia coli. Biochimie 78:209–212. 52. Newbury SF, Smith NH, Robinson EC, Hiles ID, Higgins CF. 1987. Stabilization of translationally active mRNA by prokaryotic REP sequences. Cell 48:297–310. 53. Newbury SF, Smith NH, Higgins CF. 1987. Differential mRNA stability controls relative gene expression within a polycistronic operon. Cell 51:1131–1143. 54. Kremling A, Geiselmann J, Ropers D, de Jong H. 2015. Understanding carbon catabolite repression in Escherichia coli using quantitative models. Trends Microbiol 23:99–109. 55. Lendenmann U, Egli T. 1995. Is Escherichia coli growing in glucose-limited chemostat culture able to utilize other sugars without lag? Microbiol Read Engl 141 ( Pt 1):71–78. 56. Boysen A, Moller-Jensen J, Kallipolitis B, Valentin-Hansen P, Overgaard M. 2010. Translational regulation of gene expression by an anaerobically induced small non-coding RNA in Escherichia coli. J Biol Chem 285:10690–10702. 57. Shalgi R, Hurt JA, Krykbaeva I, Taipale M, Lindquist S, Burge CB. 2013. Widespread regulation of translation by elongation pausing in heat shock. Mol Cell 49:439–452. 58. Flamholz A, Noor E, Bar-Even A, Liebermeister W, Milo R. 2013. Glycolytic strategy as a tradeoff between energy yield and protein cost. Proc Natl Acad Sci U S A 110:10039–10044. 59. Molenaar D, van Berlo R, de Ridder D, Teusink B. 2009. Shifts in growth strategies reflect tradeoffs in cellular economics. Mol Syst Biol 5:323. 60. Weber H, Polen T, Heuveling J, Wendisch VF, Hengge R. 2005. Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity. J Bacteriol 187:1591–1603. 61. Berthoumieux S, de Jong H, Baptist G, Pinel C, Ranquet C, Ropers D, Geiselmann J. 2013. Shared control of gene expression in bacteria by transcription factors and global physiology of the cell. Mol Syst Biol 9:634. 76 IDENTIFYING POST-TRANSCRIPTIONAL CONTROL UPON CCR IN E. COLI 62. Beisel CL, Storz G. 2011. The base-pairing RNA spot 42 participates in a multioutput feedforward loop to help enact catabolite repression in Escherichia coli. Mol Cell 41:286–297. 63. Raghavan R, Groisman EA, Ochman H. 2011. Genome-wide detection of novel regulatory RNAs in E. coli. Genome Res 21:1487–1497. 64. Wright PR, Richter AS, Papenfort K, Mann M, Vogel J, Hess WR, Backofen R, Georg J. 2013. Comparative genomics boosts target prediction for bacterial small RNAs. Proc Natl Acad Sci U S A 110:E3487–96. 65. Eggenhofer F, Tafer H, Stadler PF, Hofacker IL. 2011. RNApredator: fast accessibility-based prediction of sRNA targets. Nucleic Acids Res 39:W149–54. 66. Tjaden B. 2008. TargetRNA: a tool for predicting targets of small RNA action in bacteria. Nucleic Acids Res 36:W109–13. 67. Bernstein JA, Khodursky AB, Lin PH, Lin-Chao S, Cohen SN. 2002. Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci U S A 99:9697–9702. 68. Nudler E, Mironov AS. 2004. The riboswitch control of bacterial metabolism. Trends Biochem Sci 29:11–17. 69. Ito K, Chiba S, Pogliano K. 2010. Divergent stalling sequences sense and control cellular physiology. Biochem Biophys Res Commun 393:1–5. 77 Chapter 4: Contribution of regulatory RNAs to the post-transcriptional control of Carbon Catabolite Repression in Escherichia coli Orawan Borirak and Klaas J. Hellingwerf CHAPTER 4 4.1 Abstract Carbon Catabolite Repression (CCR) induces a global gene-expression response affecting many cellular processes. Involvement of small regulatory RNA (sRNA) in the cellular regulation has been well documented. Thus, a large number of sRNAs can be expected to participate in this regulatory cascade. To investigate this hypothesis, we analyzed the transcript levels of 78 sRNAs before and after a glucose pulse to glucose-limited, chemostat-grown E. coli cells, by using the global time-series transcriptomic data obtained previously. A total of 46 sRNAs were found to change their expression level significantly (p < 0.01) in response to the glucose pulse. Of those 46, 14 uncharacterized sRNAs, i.e. IsrB, RttR, C0614, C0293, C0465, PsrD, SroH, RyjB, Tpke11, Tpke70, C0719, RyfD, RyjA, and RyeA, were further investigated with respect to their possible role in CCR using bioinformatics tools. This analysis has revealed that three of these sRNA regulators, i.e. IsrB, C0293, and RyeA, may participate in the CCR regulatory network. In addition, for several of the Post-Transcriptionally Regulated (PTR) genes identified in our previous study (see Chapter 3), we have tentatively identified the corresponding regulatory sRNA(s). 4.2 Introduction The ability of Escherichia coli to grow on a large variety of carbon sources has been studied already for decades, through physiological as well as through genetic studies. Preferential utilization of glucose as a carbon source in this organism is regulated by the carbon catabolite repression (CCR) system. If glucose is depleted metabolism is transcriptionally reprogrammed to optimize E. coli for dissimilation of alternative carbon sources, in order to assure continued optimal growth (1–3). These metabolic adjustments can straightforwardly be detected via transcriptome and proteome analyses. The most basic mechanism of physiological adjustment in bacteria makes use of protein-based transcriptional regulators, many of which are ligand/metabolite gated. Nevertheless, several additional regulation mechanisms are relevant, like the involvement of higher-order nucleic acid structures (i.e. supercoiling (4), post-translational modification mechanisms like protein phosphorylation, -methylation and/or -acetylation (5) and even gene silencing (6). More recently it has become evident that also transcriptional- and posttranscriptional control exerted by regulatory RNAs makes an important contribution to this regulation in bacteria (see review (7)). The regulatory RNAs can be classified into 3 main classes. The first class involves 2 types of RNA element, usually located at the 5’ untranslated region (UTR) of the corresponding gene, which is also referred to as the leader sequence. For example attenuators exert their regulation through a change in the mRNA secondary structure upon binding of a metabolite, to permit or terminate 80 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI transcription depending on nutrient availability, as e.g. in the trp operon (8). Furthermore, riboswitches control the initiation of translation by sensing the physiological state of the cell (e.g. temperature, or availability of a specific nutrient (for review see (9)), which then affects the spatial structure (and hence function) of the (m)RNA. The second class of regulatory RNAs involves small non-coding RNAs, also known as small RNAs (hereafter sRNA(s)), typically 50-500 nucleotides long, that bind to other RNA molecules or to (a) protein(s) and regulate a cellular function (7). The sRNAs are either cis-encoded or transencoded RNAs which modulate the translation process, or mRNA stability, through baseparing with their target mRNAs. Also, expression of their target mRNAs can be modulated by the cis-encoded sRNA (element) through preventing of the process of initiation of transcription or by inducing premature transcription termination (10). Their binding is usually assisted by the Sm-like RNA chaperon, Hfq (11, 12). In an alternative mechanism the sRNA may bind to a specific protein, which then modulates activity of its protein target, as e.g. CsrB and GlmY (7). The third class of the regulatory RNAs are the recently discovered CRISPR (Clustered regularly interspaced short palindromic repeats) RNAs, which provide immunity to bacteria against bacteriophage (and plasmid) infection (reviewed in (13, 14)). The most extensively studied among the 3 classes of regulatory RNAs are the sRNAs and hence these are the main focus here. The sRNAs participate in various cellular processes such as the oxidative stress response (15), regulation of outer membrane protein (OMP) expression (16), metabolic processes like iron uptake (17), carbon utilization (18), amino sugar catabolism (19), as well as the CCR regulatory network (20). CCR is one of the most important global regulatory response mechanisms in bacteria, as it is directly linked to cell growth, affects a large fraction of the genes encoded in the genome (i.e. it is a ‘global’ response), and it affects virulence factor expression in pathogenic bacteria. Thus we anticipated that a large number of sRNAs would also be contributing to the CCR regulatory network. To investigate this hypothesis, we analyzed the transcript levels of 78 sRNAs before and after administration of a pulse of glucose to glucose-limited, chemostat-grown E. coli cells, by using the global time series transcriptomic data obtained previously ((21), see also Chapters 2 and 3). A total of 46 sRNAs were found to change their expression level significantly (p < 0.01) in response to the glucose pulse. Of those 46, 31 sRNAs were known or predicted to participate in regulation of genes involved in a stress response, OMP composition, central carbon metabolism, or a toxin-antitoxin (TA) system. We therefore analyzed the potential function(s) and mRNA targets of the remaining 14 sRNAs in more detail, using an sRNA target prediction tool, CopraRNA-sRNA Targeting (22). In 81 CHAPTER 4 addition, CopraRNA was used to evaluate possible sRNA regulators of the posttranscriptionally regulated genes (PTR genes) identified earlier (21). In total, 56 PTR genes (out of a total number of 96) were predicted to be regulated via sRNA (p < 0.01). Possible regulation mechanisms of the relevant sRNAs are further discussed in more detail below. 4.3 Materials and methods Experimental data Small RNAs transcript levels, assayed with a time-series microarray analysis were taken from (21). Briefly, the time-series transcriptome analysis was performed using Agilent DNA microarrays (Stockport, UK) which covered 4,287 Escherichia coli K-12 MG1655 genes, supplemented with 126 probes representing 78 small RNAs. The sRNA probes were designed by eArray online software (Agilent Technologies), using the primary sequence reported in (23, 24). RNA isolation and microarray analysis were carried out as previously described (21). Statistical analysis of time-series transcriptomic data The available transcriptomic data consists of 2 biological replicates of samples taken at 0 (steady state-reference condition of the cells grown under glucose limitation in the chemostat), and 5, 15, 30, and 60 min after the glucose pulse, with technical duplicate measurements at all-time points (except at 30 minutes, when only a single measurement was performed). To characterize a change in the amount of transcript over time, Area Under the Curve of normalized log2 ratios of the transcripts as a function of time was calculated. Significance analysis was carried out as described in (21). Accordingly, any gene having AUC values > 16.75 or < -16.75 was considered significantly changed in its transcript level (with a corresponding p < 0.01). Volcano plots were derived from the normalized log2 ratio of the transcripts at each time profile. The student t-test was applied for a significance test. Transcripts of which the abundance changed more than 2-fold and which had a p-value < 0.05 were considered as significantly changed. In silico analyses Putative promoter regions of 14 poorly-characterized sRNAs from the www.ecocyc.org database (25) were analysed with the bioinformatics programs BPROM-Prediction of bacterial promoters (26) and Virtual Footprint (27). RNA sequences of these 14 sRNAs (from the same database) were used to predict their target mRNAs by using the CopraRNA prediction program (22). CopraRNA prediction is based on analysis of a series of 82 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI homologous sRNA sequences. Therefore, eight organisms (at least 3 distinct organisms are required), including the organism of interest, were selected as input organisms: NC_000913 (i.e. E. coli str. K-12 substr. MG1655), NC_009792 (i.e. Citrobacter koseri ATCC BAA-895), NC_013716 (i.e. Citrobacter rodentium ICC168), NC_011740 (i.e. E. fergusonii ATCC 35469), NC_012917 (i.e. Pectobacterium carotovorum subsp. carotovorum PC1), NC_003197 (i.e. Salmonella enterica subsp. enterica serovar Typhimurium str. LT2), NC_009832 (i.e. Serratia proteamaculans 568), NC_003143 (i.e. Yersinia pestis CO92). For the isrB- and ryjA predictions, however, NC_003143 was excluded. The top 100 targets of the CopraRNA prediction have been subjected to a functional enrichment test by the integrated bioinformatics tool, DAVID, as described previously (28). Only clusters with an enrichment score ≥ 1.3 are considered statistically significant and are shown in Table 2. 4.4 Results and Discussion Effect of a glucose pulse on the expressions of 78 small non-coding RNAs in glucose-repressed Escherichia coli cells To investigate the effect of the CCR response on the expression of the 78 sRNAs previously identified (23, 24), time-series microarray analysis was performed as described elsewhere (21). Briefly, E. coli was cultivated using a glucose-limited chemostat such that cells at steady state were in a glucose de-repressed condition (29, 30), and these cells were used as reference cells. Then additional cell samples were collected in a time-series of 5, 15, 30, and 60 minutes after a saturating glucose pulse was added to the cells. The transcript level at any time-point was normalized against that of the reference condition, i.e. of cells in the glucose de-repressed steady state. Area under the Curve (AUC) and a significance analysis were performed to identify significantly changed transcripts, and those identified are shown in Figure 1a-c (for more detail: see Materials and Methods). Besides that, volcano plots derived from a conventional 2-fold cut-off and Student t-test analyses were chosen to provide an alternative overview. Genes representing a more than 2-fold change and p < 0.05 were considered as significantly changed. The volcano plots of a global transcriptomic response showed a significant global down-regulation of gene expression (Figure 1d) that is clearer than can be seen from the graph with the AUC values (Figure 1a). This global down-regulation was also observed in genes encoding transcription factors (TFs) (Figure 1e). These results are consistent with the response at the proteome level, as described earlier (21). It contrasts, however, the expression of the sRNAs, which were more equally distributed with a slightly greater range of fold-change on the upregulation side, as compared to the expression of the TFs (Figure 1f). 83 CHAPTER 4 Of the 78 sRNAs included in the array, nine sRNA (i.e. C0067, C0299, C0362, IS128, IstR_2, PsrO, RseX, SraA, and Tp2) were excluded from further analysis as their expression level was not detectable in all measurements (i.e. at 5 time-points, in 2 biological replicates, and 2 technical replicates (except at 30 minutes; see Materials and Methods). Furthermore, six sRNAs (C0343 (31), PsrN (32), SroA (33), SroC (33), SroS (33), and Tff (34)) that were later reported to be part of, or overlap with, the 5’UTR of a gene, were also excluded, as well as RyfB, that later has been shown to express the toxic peptide ShoB (35) (see Supplementary Table 1). Figure 1. Global changes in gene expression level of E. coli after 5, 15, 30, and 60 minutes after a glucose pulse. Left panel shows the overall change of the gene expression level of all genes (a), of only transcription factors (b), and of only small RNAs (c), after 60 minutes after administering the glucose pulse and expressed as Area Under the Curve (AUC). Genes representing AUC values > 16.75 or < -16.75 (dotted lines) were considered as significantly changed (p < 0.01). Right-hand panels are volcanu plots of the gene expression level of all genes (d), of only transcription factors (e), and of only small RNAs (f). Normalized log2 transcript levels were used for the xaxis and plotted against p-value, as calculated using the t-test. Genes representing more than a 2-fold change and p < 0.05 were considered as significantly changed. 84 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI In total, 45 sRNAs (out of 62) were found with a significantly changed expression level (p < 0.01). The significantly changed sRNAs were grouped according to their function, obtained from www.ecocyc.org (25), together with their expression level, and are displayed in Figure 2. Figure 2. A total of 45 sRNAs showed a significantly altered expression level. Overall changes in transcript level after 60 minutes of the glucose pulse for these 45 sRNAs were represented by their Area under the Curve (AUC) value. These sRNAs were grouped according to their biological function. As shown, Spot 42 was the most abundant transcript (AUC = 290) among the sRNAs of which the function was known. Spot 42 has been reported to be participating in the CCR gene regulatory network, where its function is to counteract the global transcription factor CRP (20). Transcription of Spot 42 is inhibited by the cAMP-CPR complex (36), thus up-regulation of Spot 42 upon the glucose pulse was anticipated. Down-regulation of CyaR (AUC = -189) that is involved in regulation of the envelop stress response, quorum sensing, and nitrogen assimilation, was also predicted as cyaR expression is subject to catabolite repression (37, 38). The down regulation of CyaR and MicA (AUC = -37), that negatively regulate ompX expression, may be the reason why we observed earlier a non-proportional increase between the OmpX protein level and its corresponding transcript level (see Chapter 3). Also down-regulation of IsrA and McaS (AUC = -22), of which of both the expression is under control of catabolite repression (10), was observed. McaS has been reported to positively regulate flhD that encodes the transcription factor involved in synthesis of flagella, while it negatively regulates csgD encoding a transcription factor regulating biofilm formation (10, 39). Down-regulation of genes involved in metabolism 85 CHAPTER 4 and transport of alternative carbon sources was expected after the glucose pulse, as well as of those for synthesis of flagella and biofilm formation. Also up-regulation of GcvB (AUC = 153), that has been shown to negatively regulate peptide- and amino acid transporters (40–42), and ChiX (AUC = 32) that represses chiP encoding chitoporin (which is a chitosugar transporter (43)) were according to expectation. The GlmZ sRNA that enhances synthesis of glmS, encoding an enzyme that catalyzes the first step in hexosamine biosynthesis, was up-regulated. However, we did not observe a significant up-regulation of glmS. This could be a result of down-regulation of GlmY (AUC = -56) that helps to stabilize an activated form (i.e. the full-length form) of GlmZ. We also observed an upregulation of MgrR (AUC = 88), that is reported to be highly expressed in rich medium and may be involved in lipopolysaccharide modification (44). The earlier transcriptome profiling showed that the glucose pulse elicited significant E down-regulation of rpoE, that encodes the alternative sigma factor σ , and rpoS that S E encodes the alternative sigma factor σ (21). The first of these, σ , regulates genes S involved in the response to cell envelop stress (45), while σ acts as the master regulator E of the general stress response in E. coli (46). Both RybB and MicA are under control of σ ; thus down-regulation of their expression level was expected. Besides that, all the known OMP-regulating sRNAs (16) detected in this study, including OmrA, MicA, and MicF, were down-regulated significantly. So far, four sRNAs, including ArcZ (AUC = -99), DsrA (AUC = 42), RprA (AUC = 62), and OxyS (AUC = 67) have been reported to be involved in the regulation of rpoS (47, 48). The first three of these sRNAs positively regulate translation of the rpoS mRNA (47), while OxyS was reported to inhibit its translation (48). However, later it was shown that OxyS was rather competing with other sRNAs, like DsrA, for Hfq, rather than directly binding to the rpoS mRNA (49). Down-regulation of rpoS expression after the glucose pulse was presumably a result of the decreased concentration of the cAMP-CPR complex that is one of the rpoS regulators (50). DsrA, RprA, and OxyS, however, may well contribute additional modes of regulation of rpoS gene expression. DsrA has also been reported to inhibit hns translation. Hns is a global transcription silencer of genes containing AT-rich sequences (51). Indeed the hns transcript level was observed to decrease significantly, even though its positive regulator, Fis (52) was up-regulated. Overexpression of RprA, as well as of DsrA, has been shown to increase acid tolerance in E. coli (53, 54). Another, related, acid-responsive sRNA that was up-regulated after the glucose pulse, is GadY (AUC = 36). GadY positively regulates gadXW, which encode transcription regulators that participate in a complex regulatory cascade responding to acid stress (55). 86 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI However, it is unlikely that the glucose pulse led to an induction of an acid stress response, as the acid-stress responsive genes, including hdeAB, gadAXW, and gadBC, were significantly down-regulated. We note here that our growth conditions were such that the pH before- and after the glucose pulse was kept constant at pH = 7 (29). Nevertheless, the glucose pulse initiated intracellular production of acetic acid at an average rate of ~1.6 mmol/g dry weight/h, starting 15 minutes after the glucose pulse ((29); Fig 4b). Thus, whether the accumulation of this acetic acid may have affected the intracellular pH, and hence still could have led to the induction of acid stress, and of expression of the related sRNAs, needs to be further investigated. Besides that, another stress-related sRNA that was up-regulated here is OxyS, which is induced by hydrogen peroxide (H2O2), and is known to be part of a regulatory network responding to oxidative stress (15). Additional independent evidence, indicating that there might be an elevated level of H2O2 after the glucose pulse, was the up-regulation of oxyR (AUC = 26) (56). However, we did not observe up-regulation of the most important genes involved in the scavenging of H2O2, i.e. katE, katG, ahpC, and ahpF, which are under control of OxyR (57). Taken together, these results suggest that the level of H2O2 in the cells did increase after the glucose pulse. However, it is unlikely that this level of H2O2 increased to such an extent that it contributed significantly to the global response of the transcriptome and proteome to the glucose pulse. Another large group of sRNAs that was significantly altered in its expression level, and that may contribute to the overall response to the glucose pulse, are the Toxin-Antitoxin (TA)-system related sRNAs. The TA systems are categorized based on the antitoxin type. Five distinct TA systems so far have been discovered (for a review see (58)). Type I TA systems are those where the antitoxin acts as an antisense RNA that binds to the toxin mRNA, which leads to inhibition of translation. Type I TA systems that are well characterized include Hok/Sok, LdrD-RdlD, ShoB-OhsC, SymE-SymR, TisB-IstR, and Ibs-Sib (35, 58). For all these systems expression of antisense RNA(s) resulted in a counteracting effect on the toxic mRNA, except for SibB which was supposed to be up-regulated (see Supplementary Table 2). SibB was down-regulated significantly, while the expression of ibsB, encoding a toxic peptide pair, was up-regulated (judging from raw data, as no AUC value could be calculated). Deliberate overexpression of Ibs proteins was toxic and caused cells to stop growing (35). However, after the glucose pulse cells continued to grow, even at a faster rate than before the glucose pulse. So it is likely that regulation of Ibs expression is beyond the repression of the Sib antisense RNA (35). 87 CHAPTER 4 Another sRNA that was initially expected to be up-regulated is SgrS, as there was a rapid down-regulation of ptsG, encoding the PTS-glucose transporter enzyme IIBC, after the glucose pulse (21, 29). This down-regulation of ptsG was contradicting a previous result, which indicated that ptsG expression was induced by glucose (59). SgrS downregulates multiple PTS genes, including ptsG and manXYZ (that encodes a mannose PTS permease under glucose-phosphate stress, or exposure to the non-metabolisable glucose analogue α-methyl glucoside (60, 61)). Thus, these results led us to believe that the glucose pulse might induce a glucose-phosphate stress. However, we did not detect any significant change in the sgrS expression level. Furthermore, it was shown that high glucose concentrations also did not lead to an induction of sgrS expression in E. coli K-12 MG1655 and JM109 (62). Taken together, these data suggest that the down-regulation of ptsG upon the glucose pulse was simply due to a lowered concentration of cAMP-CRP only, without any need for altered sgrS expression. Potential catabolite-repression controlled sRNAs among the 14 unknown small RNAs. To evaluate whether any of the sRNAs with unknown function is subject to catabolite repression, BPROM-Prediction of bacterial promoters (26) and Virtual Footprint (27) were used to examine the presence of possible transcriptional regulators among these 14 sRNAs. Default input was used for both prediction tools. The resulting predictions for potential transcription factor targets in the -150 bp region of the 5’UTR of these 14 sRNAs are shown in Table 1. As can be seen in Table 1, the promoter region of three sRNAs, i.e. IsrB, C0293, and RyeA were predicted to contain a CRP-binding site. The expression of IsrB (AUC = 314) and C0293 (AUC = 43) is likely repressed by cAMP-CRP, while RyeA expression (AUC = -134) was probably induced because of the decreasing cAMP-CRP level, judging from the expression level of these sRNAs. Indeed, the expression of IsrB, encoding the small protein AzuC (28 amino acids), was reported to be repressed by CRP (63). Initially, AzuC was predicted to be an amphipathic α-helix-containing protein (63) and later it was confirmed that this protein is bound to the cytoplasmic side of the inner membrane (64). The isrB/azuC was the most abundant transcript in the array, i.e. even higher than that of Spot 42. However, we did not detect any evidence of the expression of AzuC at the protein level, presumably due to its small size. The question whether IsrB functions only as an mRNA in expressing a protein, or whether it acts as a regulatory RNA as well, needs to be further investigated. Such a dual function of a small RNA has been demonstrated in E. coli for SgrS, and for many others in other bacteria as well (for review see (65)). SgrS inhibits 88 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI ptsG translation by blocking the ribosome binding site. In addition, it encodes a small protein, SgrT, that modulates the activity of the glucose transporter (66). Use of the CopraRNA target prediction tool (22), showed that most of the mRNA targets of IsrB function in nucleotide biosynthesis or in carbohydrate biosynthesis (see Table 2). The identified targets include ndk, adk, atpA, mraY, waaF, wecD, mpl, wecE (p < 0.01, and false discovery rate < 50%). The first two targets (i.e. adk and ndk) were previously identified as post-transcriptionally regulated (PTR) genes (21), with an unknown mechanism. Only expression of mpl has been suggested to be activated and repressed by cAMP-CRP (25) and Cra (67), respectively. Mpl is involved in re-synthesizing peptidoglycan and/or its use as an energy source (68). Altogether, these results suggest a dual function of IsrB, which, however needs to be confirmed by directed experiments. Table 1. Predicted transcription factors regulating the 14 ‘unknown-function’ sRNAs. Promoter analysis BPROM† Virtual Footprint‡ IsrB (encoding a small protein AzuC) Fnr, SoxS, Fur Crp, Mlc RttR NO Fis, OxyR C0614 NagC, GcvA Fis, GcvA, MalT C0293 Fis, NarL, PhoB3, PdhR, Crp Fis, Fur, GlnG C0465 NO ArcA, FadR, Fis, IciA, Lrp, NarL, PdhR PsrD NO GlnG SroH NO Fis, NarL RyjB NO Fis, Lrp, NarL Tpke11 (dnaK-tpke11-dnaJ operon) RpoH, TyrR Tpke70 ArgR, NagC Fis, GlnG C0719 Fis RyfD NO OxyR RyjA IHF NarL RyeA Crp MetJ, PdhR † NO: No transcription factor binding site was predicted. sRNA ‡ Only those predicted transcription factors that represent a score ≥ mean score of the Position Weight Matrix (PWM) are shown. Target prediction of C0293 showed that most of its potential targets were ribosomal genes and transporters e.g. dcuA, oppF, and yhdY (Table 2). Interestingly, the expression of dcuA, encoding a C4-dicarboxylate transporter, is under control of CRP (69). Besides that, many target mRNAs (11 out of 31, see Supplementary Table 3) showed a prediction score (p-value) of 0, indicating a perfect complementary sequence (i.e. an extremely low binding-energy score). According to the program’s instruction, this is an artifact which 89 CHAPTER 4 should be ignored. However, it has been shown that some of the cis-encoded sRNAs could act as trans-encoded sRNAs (7). Thus interpretation of these result needs to be done with caution. Whether C0293 is a cis- or trans-acting sRNA or both, needs to be further investigated. RyeA had previously been speculated to be a cis-encoded sRNA, base-paring with the pphA mRNA (70). Consistently, pphA is one of the targets predicted by CopraRNA with a pvalue = 0 (for more predicted targets see: Supplementary Table 3). Additionally, other predicted targets with partial complementary have been subjected to enrichment analysis. The functional enrichments of the predicted targets were porphyrin and chlorophyll metabolism, pentose- and glucuronate interconversion, and tetrapyrrole metabolic processes (see Table 2). To date, little is known about the function of RyeA. More evidence is needed before any conclusion regarding the function of RyeA can be drawn. Potential functions and their predicted target mRNAs of the 14 unknown small RNAs. In order to shine some light on the function of the unknown sRNAs, the CopraRNA target prediction tool (22) was used to predict possible targets of each unknown sRNA, as well as subject the predicted targets to a functional enrichment test. The prediction results have been summarized and shown in Table 2. Predicted targets with p < 0.01 are significant and of interest. However, only predicted targets with p < 0.01 and a false discovery rate < 50 % are shown in Table 2 (for more prediction results, see: Supplementary Table 3). Furthermore, interactions based on predicted targets that do not belong to a defined enrichment class should be studied in more detail. Because of the limited amount of information available on these sRNAs, detailed evaluation of the prediction results without complementary directed experiments is not feasible. However, significant changes in expression level of these unknown sRNAs upon the perturbation (i.e. the glucose pulse) do suggest a possible function. Together with the prediction results, this could be used as a guideline for designing directed experiments, under properly selected growth conditions. 90 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI Table 2. Predicted mRNA target and functional enrichment of 14 unknown small RNAs. sRNA IsrB RttR C0614 C0293 AUC 314 98 52 43 Functional- and domain enrichments† Group 1) -GO:0009152~purine ribonucleotide biosynthetic process -GO:0009150~purine ribonucleotide metabolic process -GO:0009260~ribonucleotide biosynthetic process -GO:0009259~ribonucleotide metabolic process -GO:0006164~purine nucleotide biosynthetic process -GO:0006163~purine nucleotide metabolic process -GO:0009165~nucleotide biosynthetic process -GO:0034404~nucleobase, nucleoside and nucleotide biosynthetic process -GO:0034654~nucleobase, nucleoside, nucleotide and nucleic acid biosynthetic process Group 2) -GO:0016051~carbohydrate biosynthetic process -GO:0000271~polysaccharide biosynthetic process -GO:0005976~polysaccharide metabolic process Group 1) -Membrane -Cell membrane -Cell inner membrane Group 2) -Transmembrane region -Transmembrane Group 3) -Topological domain: Cytoplasmic -Topological domain: Periplasmic Group 1) -IPR000600: ROK -Transcription/ Carbohydrate transport and metabolism Group 2) -Protein biosynthesis -GO: 0006412~translation Group 1) -KEGG pathway (ecv0310): Ribosome -GO:0006412~translation Group 2) -IPR:013026: Tetratricopeptide region -IPR011990: Tetratricopeptide-like helical Group 3) -Cell membrane -Membrane Group 4) -Antibiotic resistance -GO0046677~response to antibiotic Predicted mRNA target‡ ndk, adk, atpA mraY, waaF, wecD, mpl, wecE djlA, bacA, sdhA, dmsC, yeaQ, potH, amiD, rcnA, atpH, ypfN, glpB, bamC djlA, dmsC, yeaQ, rcnA, ypfN djlA, dmsC, potH, rcnA rpsI, fusA rpsJ, rpsI, rplP, rplD nrfG cusC, oppF, yhdY, rhmT(yfaV), dcuA, osmB, secY, yqhA rplD 91 CHAPTER 4 Table 2. Continue. sRNA C0465 PsrD SroH RyjB Tpke11 92 AUC 27 26 21 18 -24 Functional- and domain enrichments† Group 1) -FAD -Flavoprotein Group 2) -Nucleotide binding -P-loop Group 3) -KEGG pathway (ecq02010): ABC transporters Group 4) -IPR002917:GTP-binding protein, HSR1-related -Nucleotide phosphate-binding region:GTP -IPR005225:Small GTP-binding protein -GTP-binding Group 1) -IPR000150:Cof protein -IPR005834:Haloacid dehalogenase-like hydrolase Group 1) -Membrane -Cell membrane -Cell inner membrane Group 2) -Domain: ABC transmembrane type-1 -IPR000515:Binding-protein-dependent transport systems inner membrane component Group 3) -Transmembrane region -Transmembrane Group 1) Oxidoreductase GO:0055114~oxidation reduction Group 2) -Electron transport -GO:0022900~electron transport chain Group 3) -FAD -Flavoprotein Group 4) -KEGG pathway (ecq00020): Citrate cycle (TCA cycle) Group 1) -GO:0006399~tRNA metabolic process -GO:0034660~ncRNA metabolic process Group 2) -ATP-binding -Nucleotide-binding Predicted mRNA target‡ ilvB, ubiH, ilvH obgE, macB, minD macB, livM, cysU obgE, hflX zntA lplT, pnuC, pstA, rsxB pstA lplT, pnuC, pstA ydjA, gnd, sdhA, ybbO, nuoL, fdnI, tauD, cysI, aceF, rpsA, gor, nrdG sdhA, fdnI ydjA, sdhA, gor pck queA dppF, recB, dnaK, recA UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI Table 2. Continue. sRNA Tpke70 AUC -43 Functional- and domain enrichments† Group1) GO:0055114~oxidation-reduction process Group 2) Domain: ABC transporter-like (IPR003439) Conserved Site: ABC transporter, conserved site (IPR017871) Domain: AAA+ ATPase domain (IPR003593) Nucleotide phosphate-binding region:ATP Group 3) GO:006732~coenzyme metabolic process GO:0051186~cofactor metabolic process C0719 -48 Group1) KEGG pathway (efc00190): Oxidative phosphorylation Group 2) rRNA-binding KEGG pathway (ecv03010): Ribosome GO:006412~translation RyfD -58 Group 1) -KEGG pathway (ecv03010): Ribosome -Rribonucleoprotein -Ribosomal protein -Protein biosynthesis -GO:0006412~translation RyjA -94 Group 1) -KEGG pathway (ecd00650):Butanoate metabolism RyeA -134 Group 1) -KEGG pathway (ecf00860):Porphyrin and chlorophyll metabolism Group 2) -KEGG pathway (eci00040):Pentose and glucuronate interconversions Group 3) -GO:0033013~tetrapyrrole metabolic process -GO:0006778~porphyrin metabolic process -GO:0006779~porphyrin biosynthetic process -GO:0033014~tetrapyrrole biosynthetic process † Only the group representing an enrichment score ≥ 1.3 is shown. ‡ Predicted mRNA target‡ fdoI, asd, ahr, fadJ, sdhA, fixC, eutE, lpd, serA gltL, ybbA, pstB, macB, ddlA, fadD sucD, metH nuoH rpsC, rpsS, rpsK rpsK ilvB tktB Only predicted mRNA targets with p < 0.01 and false discovery rate < 50 % are shown. See Supplementary Table 3 for more predicted mRNA targets with p < 0.01. Potential sRNA regulators of PTR genes indentified in Chapter 3 Our earlier study ((21), chapter 3) has identified 96 genes of which the change in transcript level did not correlate with the corresponding change in protein level (i.e. these 93 CHAPTER 4 are known as Post-Transcriptionally Regulated (PTR) genes). Of those 96 genes, 26 genes have previously been reported to be regulated via sRNAs, identified on the basis of directed experiments and/or a computational approach. Here we have used the CopraRNA target prediction tool (22) to identify possible sRNA regulators of the remaining PTR genes. A total of 41 genes were predicted to be regulated via sRNA (see Table 3). Unfortunately, however, none of the prediction results overlapped with the previous reports. Most of these newly identified sRNAs are involved in: OMP composition, TA system, and stress response (Figure 2). The PTR genes reported in (21) were identified in an experiment in which cells were stimulated with a saturating glucose pulse. Such a glucose addition leads to a higher rate of metabolism, which in turn results in a higher growth rate of the cells (21, 29). It is therefore not surprising that among the PTR genes identified, representatives from the class of ribosome biosynthesis, metabolic process, biosynthesis of amino acids, and purine metabolism were (over)abundant. So far, only two sRNAs have been identified that are participating in these cellular processes, i.e. Spot 42, and GcvB. Indeed, 4 PTR genes, i.e. lysS, pheT, talB, and guaB were predicted to be paired with Spot 42, while 2 PTR genes, i.e. thrA and rpoA, were predicted to be regulated via GcvB. From an independent computational analysis (71) (see Table 3) it was concluded that an additional 11 PTR genes that are involved in amino acid biosynthesis, are also regulated via GcvB. Most of the sRNA regulators that were predicted to regulate the remaining PTR genes still have remained uncharacterized. Based on our analysis we now predict that IsrB sRNA is involved in the purine biosynthetic process, and that C0293 participates in regulation of ribosome biosynthesis (see Table 2). Taken together, this study shines some light on the possible involvement of a large number of sRNAs in cellular processes, besides the stress responses. Initially the stress response was believed to be a major process regulated by sRNAs, as the sRNA regulators allow cells to respond rapidly, yet reliably, to the relevant signals, and presumably more rapidly than the protein-based regulators (72). Another advantage of sRNA regulation is that it is possible to express these sRNAs at a steady but low level, without the large bursts in expression, typical for a proteinaceous regulator, which implies that their expression level can be tuned accurately and dynamically upon changes in environmental conditions (72, 73). Recently, many more studies have shown that other cellular processes are regulated via sRNA as well, e.g. by Spot 42 (20), GcvB (74), and GlmY/GlmZ (75). It would not be surprising if in the coming years many more new sRNA regulators would be discovered in the regulatory networks of the enteron E. coli. 94 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI Table 3. Potential sRNA regulators of PTR genes indentified in Chapter 3. Literature report CopraRNA prediction (p<0.01)§ Description Slope† AUC‡ 0.0039 24 ChiX 0.0075 108 RttR, RyjB cysP gltB hfq hpf lysS metC metQ minE nlpI Shikimate kinase 1 Sulfite reductase [NADPH] hemoprotein β-component Thiosulfate-binding protein Glutamate synthase [NADPH] large chain Protein hfq Ribosome hibernation promoting factor Lysine-tRNA ligase Cystathionine β-lyase metC D-methionine-binding lipoprotein metQ Cell division topological specificity factor Lipoprotein NlpI 0.0054 0.0099 -0.0028 -0.0080 0.0008 -0.0045 0.0018 -0.0051 -0.0020 103 47 18 12 44 35 57 -18 24 ompX Outer membrane protein X 0.0255 104 -0.0016 74 0.0003 0.0004 -46 51 -0.0014 35 0.0012 -0.0022 0.0049 0.0025 0.0042 0.0037 0.0058 0.0001 0.0038 -0.0011 -0.0020 0.0036 -20 38 41 60 -17 37 50 52 18 45 47 31 0.0081 0.1715 34 78 GcvB (71) 0.1385 64 GcvB (71) 0.0669 108 0.3860 88 GcvB (71) 0.0003 90 GcvB (71) GN p < 0.01 aroK cysI pdxB pheT pntB potD ppc recA rplI rplM rplS rpmA rpmG rpoA rpsPb sspA talB tig p < 0.05 adk argB argD argH artJ cysN Erythronate-4-phosphate dehydrogenase Phenylalanine-tRNA ligase β-subunit NAD(P) transhydrogenase subunit beta Periplasmic substrate-binding protein subunit of Putrescine/ spermidine ABC transporter Phosphoenolpyruvate carboxylase Protein RecA 50S ribosomal protein L9 50S ribosomal protein L13 50S ribosomal protein L19 50S ribosomal protein L27 50S ribosomal protein L33 DNA-directed RNA polymerase subunit α 30S ribosomal protein S16 Stringent starvation protein A Transaldolase B Trigger factor Adenylate kinase Acetylglutamate kinase Acetylornithine/succinyldiaminopimelate aminotransferase Argininosuccinate lyase ABC transporter arginine-binding protein 1 Sulfate adenylyltransferase subunit 1 GcvB (71) FnrS, RyjA C0293 SroH Spot42 GcvB (71) GcvB (71) PsrD ArcZ RyhB CyaR (38), MicA (76) PsrD RyeA (77) Spot42, RyeA IsrB, Tpke70 RybB RyfD GlmZ (71) PsrD (77) PsrD (77) PsrD (77) PsrD (77) PsrD (77) Tpke11 SroH RttR, OmrA C0614 GcvB PsrD (77) RprA (71) RyjA (78) RyjA Spot42, ArcZ MicA, MgrR IsrB FnrS, RyeA 95 CHAPTER 4 Table 3. Continue. Slope† AUC‡ Literature report 0.0023 -14 GcvB (71) CopraRNA prediction (p<0.01)§ RttR 0.0001 -34 GcvB (71) SgrS 0.0000 -27 RyeA (77) 0.0000 123 Spot42, RyfD 0.0852 18 RyeA 0.0012 -114 C0465, RyjA 0.0276 0.0022 0.0108 0.0001 0.0222 0.2123 0.0000 0.0018 0.0024 0.6322 23 -73 9 58 4 88 63 36 28 51 ChiX IsrB C0719 MicC OxyS PsrD SroD, C0719 C0293 C0293 0.3929 54 Phosphoserine aminotransferase 0.1026 Biosynthetic arginine decarboxylase 0.0029 Bifunctional aspartokinase/homoserine thrA 0.0000 dehydrogenase 1 tolC Outer membrane protein tolC 0.1122 † Overall change in protein abundance during the glucose pulse. 41 -18 GcvB (71) 69 GcvB (71) -24 RprA (71) GN fdhE Description matP/ycbG ndk pheA pyrC pyrG pyrH rpmB rpsC rpsJ rpsQ Protein FdhE Formate dehydrogenase-O iron-sulfur subunit Glutathione S-transferase GstB Inosine-5'-monophosphate dehydrogenase Glutamate-1-semialdehyde 2,1aminomutase Acetolactate synthase isozyme 1 large subunit Macrodomain Ter protein Nucleoside diphosphate kinase P-protein Dihydroorotase CTP synthase Uridylate kinase 50S ribosomal protein L28 30S ribosomal protein S3 30S ribosomal protein S10 30S ribosomal protein S17 secY Protein translocase subunit SecY fdoH gstB/yliJ guaB hemL ilvB serC speA PsrD (77) C0293, RyfD, RybB PsrD RttR, IsrB, RyeA, GcvB ‡ Overall change in transcript level during the glucose pulse § sRNA regulators with a target prediction of p < 0.01 and a false discovery rate < 50 % are given in bold . characters. 4.5 Acknowledgements The work of O.B. was funded through a grant from the Higher Education Commission of Thailand. 4.6 Supplementary materials Supplementary data for this chapter may be requested per email for personal use: [email protected]. 96 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI 4.7 References 1. Deutscher J. 2008. The mechanisms of carbon catabolite repression in bacteria. Curr Opin Microbiol 11:87– 93. 2. Deutscher J, Francke C, Postma PW. 2006. How phosphotransferase system-related protein phosphorylation regulates carbohydrate metabolism in bacteria. Microbiol Mol Biol Rev MMBR 70:939–1031. 3. Gorke B, Stulke J. 2008. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat Rev 6:613–624. 4. 5. Pruss GJ, Drlica K. 1989. DNA supercoiling and prokaryotic transcription. Cell 56:521–523. Walsh CT, Garneau-Tsodikova S, Gatto GJ. 2005. Protein Posttranslational Modifications: The Chemistry of Proteome Diversifications. Angew Chem Int Ed 44:7342–7372. 6. Wiedenheft B, Sternberg SH, Doudna JA. 2012. RNA-guided genetic silencing systems in bacteria and archaea. Nature 482:331–338. 7. Waters LS, Storz G. 2009. Regulatory RNAs in bacteria. Cell 136:615–628. 8. Naville M, Gautheret D. 2009. Transcription attenuation in bacteria: theme and variations. Brief Funct Genomic Proteomic 8:482–492. 9. Breaker RR. 2012. Riboswitches and the RNA world. Cold Spring Harb Perspect Biol 4. 10. Thomason MK, Fontaine F, De Lay N, Storz G. 2012. A small RNA that regulates motility and biofilm formation in response to changes in nutrient availability in Escherichia coli. Mol Microbiol 84:17–35. 11. Aiba H. 2007. Mechanism of RNA silencing by Hfq-binding small RNAs. Curr Opin Microbiol 10:134–139. 12. Brennan RG, Link TM. 2007. Hfq structure, function and ligand binding. Curr Opin Microbiol 10:125–133. 13. Sorek R, Kunin V, Hugenholtz P. 2008. CRISPR — a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat Rev Microbiol 6:181–186. 14. Sorek R, Lawrence CM, Wiedenheft B. 2013. CRISPR-Mediated Adaptive Immune Systems in Bacteria and Archaea. Annu Rev Biochem 82:237–266. 15. Altuvia S, Weinstein-Fischer D, Zhang A, Postow L, Storz G. 1997. A small, stable RNA induced by oxidative stress: role as a pleiotropic regulator and antimutator. Cell 90:43–53. 16. Vogel J, Papenfort K. 2006. Small non-coding RNAs and the bacterial outer membrane. Curr Opin Microbiol 9:605–611. 17. Massé E, Vanderpool CK, Gottesman S. 2005. Effect of RyhB Small RNA on Global Iron Use in Escherichia coli. J Bacteriol 187:6962–6971. 18. Babitzke P, Romeo T. 2007. CsrB sRNA family: sequestration of RNA-binding regulatory proteins. Curr Opin Microbiol 10:156–163. 19. Kalamorz F, Reichenbach B, März W, Rak B, Görke B. 2007. Feedback control of glucosamine-6-phosphate synthase GlmS expression depends on the small RNA GlmZ and involves the novel protein YhbJ in Escherichia coli. Mol Microbiol 65:1518–1533. 20. Beisel CL, Storz G. 2011. The base-pairing RNA spot 42 participates in a multioutput feedforward loop to help enact catabolite repression in Escherichia coli. Mol Cell 41:286–297. 21. Borirak O, Rolfe MD, de Koning LJ, Hoefsloot HCJ, Bekker M, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. Time-series analysis of the transcriptome and proteome of E. coli upon glucose repression. Biochim Biophys Acta BBA - Proteins Proteomics. 22. Wright PR, Richter AS, Papenfort K, Mann M, Vogel J, Hess WR, Backofen R, Georg J. 2013. Comparative genomics boosts target prediction for bacterial small RNAs. Proc Natl Acad Sci U S A 110:E3487–96. 23. Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EG, Margalit H, Altuvia S. 2001. Novel small RNAencoding genes in the intergenic regions of Escherichia coli. Curr Biol CB 11:941–950. 97 CHAPTER 4 24. Wassarman KM, Repoila F, Rosenow C, Storz G, Gottesman S. 2001. Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev 15:1637–1651. 25. Karp PD, Weaver D, Paley S, Fulcher C, Kubo A, Kothari A, Krummenacker M, Subhraveti P, Weerasinghe D, Gama-Castro S, Huerta AM, Muñiz-Rascado L, Bonavides-Martinez C, Weiss V, Peralta-Gil M, SantosZavaleta A, Schröder I, Mackie A, Gunsalus R, Collado-Vides J, Keseler IM, Paulsen I. 2014. The EcoCyc Database. EcoSal Plus 2014. 26. Solovyev V, Salamov A. 2011. Automatic Annotation of Microbial Genomes and Metagenomic Sequences, p. 61–78. In Metagenomics and its Applications in Agriculture, Biomedicine and Environmental Studies (Ed. R.W. Li). Nova Science Publishers. 27. Münch R, Hiller K, Grote A, Scheer M, Klein J, Schobert M, Jahn D. 2005. Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinforma Oxf Engl 21:4187–4189. 28. Wright PR, Georg J, Mann M, Sorescu DA, Richter AS, Lott S, Kleinkauf R, Hess WR, Backofen R. 2014. CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res 42:W119–W123. 29. Borirak O, Bekker M, Hellingwerf KJ. 2014. Molecular physiology of the dynamic regulation of carbon catabolite repression in Escherichia coli. Microbiol Read Engl 160:1214–1223. 30. Lendenmann U, Egli T. 1995. Is Escherichia coli growing in glucose-limited chemostat culture able to utilize other sugars without lag? Microbiol Read Engl 141 ( Pt 1):71–78. 31. Durand S, Storz G. 2010. Reprogramming of anaerobic metabolism by the FnrS small RNA. Mol Microbiol 75:1215–1231. 32. Nechooshtan G, Elgrably-Weiss M, Sheaffer A, Westhof E, Altuvia S. 2009. A pH-responsive riboregulator. Genes Dev 23:2650–2662. 33. Vogel J, Bartels V, Tang TH, Churakov G, Slagter-Jäger JG, Hüttenhofer A, Wagner EGH. 2003. RNomics in Escherichia coli detects new sRNA species and indicates parallel transcriptional output in bacteria. Nucleic Acids Res 31:6435–6443. 34. Aseev LV, Levandovskaya AA, Tchufistova LS, Scaptsova NV, Boni IV. 2008. A new regulatory circuit in ribosomal protein operons: S2-mediated control of the rpsB-tsf expression in vivo. RNA 14:1882–1894. 35. Fozo EM, Kawano M, Fontaine F, Kaya Y, Mendieta KS, Jones KL, Ocampo A, Rudd KE, Storz G. 2008. Repression of small toxic protein synthesis by the Sib and OhsC small RNAs. Mol Microbiol 70:1076–1093. 36. Polayes DA, Rice PW, Garner MM, Dahlberg JE. 1988. Cyclic AMP-cyclic AMP receptor protein as a repressor of transcription of the spf gene of Escherichia coli. J Bacteriol 170:3110–3114. 37. Johansen J, Eriksen M, Kallipolitis B, Valentin-Hansen P. 2008. Down-regulation of outer membrane proteins by noncoding RNAs: unraveling the cAMP-CRP- and sigmaE-dependent CyaR-ompX regulatory case. J Mol Biol 383:1–9. 38. De Lay N, Gottesman S. 2009. The Crp-activated small noncoding regulatory RNA CyaR (RyeE) links nutritional status to group behavior. J Bacteriol 191:461–476. 39. Jørgensen MG, Nielsen JS, Boysen A, Franch T, Møller-Jensen J, Valentin-Hansen P. 2012. Small regulatory RNAs control the multi-cellular adhesive lifestyle of Escherichia coli. Mol Microbiol 84:36–50. 40. Urbanowski ML, Stauffer LT, Stauffer GV. 2000. The gcvB gene encodes a small untranslated RNA involved in expression of the dipeptide and oligopeptide transport systems in Escherichia coli. Mol Microbiol 37:856– 868. 41. Pulvermacher SC, Stauffer LT, Stauffer GV. 2009. The small RNA GcvB regulates sstT mRNA expression in Escherichia coli. J Bacteriol 191:238–248. 98 UNDERLYING OF SRNA REGULATION UPON CCR IN E. COLI 42. Pulvermacher SC, Stauffer LT, Stauffer GV. 2009. Role of the sRNA GcvB in regulation of cycA in Escherichia coli. Microbiol Read Engl 155:106–114. 43. Plumbridge J, Bossi L, Oberto J, Wade JT, Figueroa-Bossi N. 2014. Interplay of transcriptional and small RNAdependent control mechanisms regulates chitosugar uptake in Escherichia coli and Salmonella. Mol Microbiol 92:648–658. 44. Moon K, Gottesman S. 2009. A PhoQ/P-regulated small RNA regulates sensitivity of Escherichia coli to antimicrobial peptides. Mol Microbiol 74:1314–1330. 45. Johansen J, Rasmussen AA, Overgaard M, Valentin-Hansen P. 2006. Conserved small non-coding RNAs that belong to the sigmaE regulon: role in down-regulation of outer membrane proteins. J Mol Biol 364:1–8. 46. Weber H, Polen T, Heuveling J, Wendisch VF, Hengge R. 2005. Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity. J Bacteriol 187:1591–1603. 47. Soper T, Mandin P, Majdalani N, Gottesman S, Woodson SA. 2010. Positive regulation by small RNAs and the role of Hfq. Proc Natl Acad Sci U S A 107:9602–9607. 48. Zhang A, Altuvia S, Tiwari A, Argaman L, Hengge-Aronis R, Storz G. 1998. The OxyS regulatory RNA represses rpoS translation and binds the Hfq (HF-I) protein. EMBO J 17:6061–6068. 49. Updegrove TB, Wartell RM. 2011. The influence of Escherichia coli Hfq mutations on RNA binding and sRNA•mRNA duplex formation in rpoS riboregulation. Biochim Biophys Acta 1809:532–540. 50. Lange R, Hengge-Aronis R. 1994. The cellular concentration of the sigma S subunit of RNA polymerase in Escherichia coli is controlled at the levels of transcription, translation, and protein stability. Genes Dev 8:1600–1612. 51. Lease RA, Cusick ME, Belfort M. 1998. Riboregulation in Escherichia coli: DsrA RNA acts by RNA:RNA interactions at multiple loci. Proc Natl Acad Sci U S A 95:12456–12461. 52. Falconi M, Brandi A, La Teana A, Gualerzi CO, Pon CL. 1996. Antagonistic involvement of FIS and H-NS proteins in the transcriptional control of hns expression. Mol Microbiol 19:965–975. 53. Lease RA, Smith D, McDonough K, Belfort M. 2004. The small noncoding DsrA RNA is an acid resistance regulator in Escherichia coli. J Bacteriol 186:6179–6185. 54. Bak G, Han K, Kim D, Lee Y. 2014. Roles of rpoS-activating small RNAs in pathways leading to acid resistance of Escherichia coli. MicrobiologyOpen 3:15–28. 55. Opdyke JA, Fozo EM, Hemm MR, Storz G. 2011. RNase III Participates in GadY-Dependent Cleavage of the gadX-gadW mRNA. J Mol Biol 406:29–43. 56. Chiang SM, Schellhorn HE. 2012. Regulators of oxidative stress response genes in Escherichia coli and their functional conservation in bacteria. Arch Biochem Biophys 525:161–169. 57. Park S, You X, Imlay JA. 2005. Substantial DNA damage from submicromolar intracellular hydrogen peroxide detected in Hpx- mutants of Escherichia coli. Proc Natl Acad Sci U S A 102:9317–9322. 58. Wen Y, Behiels E, Devreese B. 2014. Toxin–Antitoxin systems: their role in persistence, biofilm formation, and pathogenicity. Pathog Dis 70:240–249. 59. Gosset G, Zhang Z, Nayyar S, Cuevas WA, Saier MH Jr. 2004. Transcriptome analysis of Crp-dependent catabolite control of gene expression in Escherichia coli. J Bacteriol 186:3516–3524. 60. Vanderpool CK, Gottesman S. 2004. Involvement of a novel transcriptional activator and small RNA in posttranscriptional regulation of the glucose phosphoenolpyruvate phosphotransferase system. Mol Microbiol 54:1076–1089. 61. Rice JB, Vanderpool CK. 2011. The small RNA SgrS controls sugar-phosphate accumulation by regulating multiple PTS genes. Nucleic Acids Res 39:3806–3819. 99 CHAPTER 4 62. Negrete A, Ng W-I, Shiloach J. 2010. Glucose uptake regulation in E. coli by the small RNA SgrS: comparative analysis of E. coli K-12 (JM109 and MG1655) and E. coli B (BL21). Microb Cell Factories 9:75. 63. Hemm MR, Paul BJ, Miranda-Ríos J, Zhang A, Soltanzad N, Storz G. 2010. Small Stress Response Proteins in Escherichia coli: Proteins Missed by Classical Proteomic Studies. J Bacteriol 192:46–58. 64. Fontaine F, Fuchs RT, Storz G. 2011. Membrane Localization of Small Proteins in Escherichia coli. J Biol Chem 286:32464–32474. 65. Vanderpool CK, Balasubramanian D, Lloyd CR. 2011. Dual-function RNA regulators in bacteria. Biochimie 93:1943–1949. 66. Wadler CS, Vanderpool CK. 2007. A dual function for a bacterial small RNA: SgrS performs base pairingdependent regulation and encodes a functional polypeptide. Proc Natl Acad Sci U S A 104:20454–20459. 67. Shimada T, Yamamoto K, Ishihama A. 2011. Novel members of the Cra regulon involved in carbon metabolism in Escherichia coli. J Bacteriol 193:649–659. 68. Park JT, Uehara T. 2008. How Bacteria Consume Their Own Exoskeletons (Turnover and Recycling of Cell Wall Peptidoglycan). Microbiol Mol Biol Rev MMBR 72:211–227. 69. Golby P, Kelly DJ, Guest JR, Andrews SC. 1998. Transcriptional regulation and organization of the dcuA and dcuB genes, encoding homologous anaerobic C4-dicarboxylate transporters in Escherichia coli. J Bacteriol 180:6586–6596. 70. Brantl S. 2007. Regulatory mechanisms employed by cis-encoded antisense RNAs. Curr Opin Microbiol 10:102–109. 71. Modi SR, Camacho DM, Kohanski MA, Walker GC, Collins JJ. 2011. Functional characterization of bacterial sRNAs using a network biology approach. Proc Natl Acad Sci 108:15522–15527. 72. Mehta P, Goyal S, Wingreen NS. 2008. A quantitative comparison of sRNA-based and protein-based gene regulation. Mol Syst Biol 4:221. 73. Levine E, Hwa T. 2008. Small RNAs establish gene expression thresholds. Curr Opin Microbiol 11:574–579. 74. Sharma CM, Papenfort K, Pernitzsch SR, Mollenkopf HJ, Hinton JC, Vogel J. 2011. Pervasive posttranscriptional control of genes involved in amino acid metabolism by the Hfq-dependent GcvB small RNA. Mol Microbiol 81:1144–1165. 75. Urban JH, Vogel J. 2008. Two seemingly homologous noncoding RNAs act hierarchically to activate glmS mRNA translation. PLoS Biol 6:e64. 76. Gogol EB, Rhodius VA, Papenfort K, Vogel J, Gross CA. 2011. Small RNAs endow a transcriptional activator with essential repressor functions for single-tier control of a global stress regulon. Proc Natl Acad Sci U S A 108:12875–12880. 77. Pandey SP, Winkler JA, Li H, Camacho DM, Collins JJ, Walker GC. 2014. Central role for RNase YbeY in Hfqdependent and Hfq-independent small-RNA regulation in bacteria. BMC Genomics 15:121. 78. Silva IJ, Ortega ÁD, Viegas SC, Portillo FG, Arraiano CM. 2013. An RpoS-dependent sRNA regulates the expression of a chaperone involved in protein folding. RNA 19:1253–1265. 100 Chapter 5: Quantitative proteomics analysis of an ethanol- and a lactateproducing mutant strain of Synechocystis sp. PCC6803 Orawan Borirak, Leo J. de Koning, Aniek D. van der Woude, Huub C.J. Hoefsloot, Henk L. Dekker, Winfried Roseboom, Chris G. de Koster and Klaas J. Hellingwerf This chapter has been published as: Borirak O, de Koning LJ, van der Woude AD, Hoefsloot HCJ, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. Quantitative proteomics analysis of an ethanol- and a lactate-producing mutant strain of Synechocystis sp. PCC6803. Biotechnol Biofuels. 2015;8:111. CHAPTER 5 5.1 Abstract To explore the molecular physiological consequences of a major redirection of carbon flow in so-called cyanobacterial cell factories, quantitative whole-cell proteomics analyses were 14 15 carried out on two N-labelled Synechocystis mutant strains, relative to their N-labelled wild type counterpart. Each mutant strain overproduced one specific commodity product, i.e. ethanol or lactic acid, to such an extent that the majority of the incoming CO2 in the organism was directly converted into the product. In total 267 proteins have been identified with a significantly up- or down-regulated expression level. In the ethanol-producing mutant, which had the highest relative direct flux of carbon-to-product (> 65 %), significant up-regulation of several components involved in the initial stages of CO2 fixation for cellular metabolism was detected. Also a general decrease in abundance of the protein synthesizing machinery of the cells and a specific induction of an oxidative stress response were observed in this mutant. In the lactic acid overproducing mutant, that expresses part of the heterologous L-lactate dehydrogenase from a self-replicating plasmid, specific activation of two CRISPR associated proteins, encoded on the endogenous pSYSA plasmid, was observed. RT-qPCR was used to measure, of nine of the genes identified in the proteomics studies, also the adjustment of the corresponding mRNA level. The most striking adjustments detected in the proteome of the engineered cells were dependent on the specific product formed, with e.g. more stress caused by lactic acidthan by ethanol production. Up-regulation of the total capacity for CO2 fixation in the ethanol-producing strain was due to hierarchical- rather than metabolic regulation. Furthermore, plasmid-based expression of heterologous gene(s) may induce genetic instability. For selected, limited, number of genes a striking correlation between the respective mRNA- and the corresponding protein expression level was observed, suggesting that for the expression of these genes regulation takes place primarily at the level of gene transcription. 5.2 Introduction The steep increase in the use of fossil carbon during the past 200 years has generated worries about the increasing CO2 content of the earth’s atmosphere and its presumed consequences, like global warming and ocean acidification (1–3). For this reason it is of utmost importance to develop new methodology and systems that will allow substitution of fossil carbon for renewable carbon, preferably driven by renewable, solar, energy so that in due time the global carbon cycle can be brought close(r) to a closure. Such systems indeed are under development and are mostly referred to as solar biofuel production 102 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES systems, although much more than only liquid fuel (carriers) can be produced: For many chemical commodities and fine chemicals proof of principle for their production, based on the same approach, meanwhile has also been provided (4–7). This development has taken place through first- to fourth-generation approaches (8– 11), driven by increasing concerns about sustainability, and more and more supported by life cycle analyses (12–14). Whereas the first generation approach would use key products from the food sector, like sucrose or starch, to produce ethanol (8, 15), in the fourth generation approach – also referred to as ‘direct conversion’ – one aims at the direct conversion of CO2 into the desired product, without the need to form biomass (and all the minerals required to form it) as an obligatory intermediate (6, 7, 9–11). This latter approach preferably uses cyanobacteria, to engineer them towards highly efficient ‘production systems’ for the desired product, because these organisms combine the highest efficiency in oxygenic photosynthesis (16) with straight forward application of synthetic systems biology, both in terms of molecular genetic intervention and in terms of the required computational analyses (e.g. review (17)). Indeed for such engineering several successful examples have meanwhile been documented, like e.g. for ethanol, iso-butanol and lactic acid, with product titres ranging from several tens of mM up to 0.1 M (7, 18, 19). Engineered strains that can carry out a particular conversion efficiently, so that more than 50 % of the fixed carbon is directly channelled into product, are referred to as ‘cell factories’ for that particular substrate (7). Such engineering, however, may impose quite some stress on the producing organism, which occasionally is visible as an increased genetic instability of a production strain (7, 20–22). This stress may originate from two different factors: First the massive re-direction of the carbon flow through the cell’s intermediary metabolism may cause metabolic adjustment that may have the characteristics of as a stress response. Second, the product titre may increase to levels that change the physico-chemical conditions of the growth medium and/or the cell, such that it causes growth inhibition. The effect of the latter type of stress on cyanobacteria has already extensively investigated through transcriptomics and proteomics studies of the effect of the extracellular addition of an end product, e.g. ethanol and butanol, to batch cultures of particularly Synechocystis sp. PCC6803 (hereafter Synechocystis) (23–26). Multiple stress response mechanisms were reported upon addition of either of these end products, including up-regulation of heat shock proteins, modification of the cell membrane and cell mobility, as well as induction of the oxidative stress response (23–26). It should be noted, however, that the highest product titres so far obtained with cyanobacterial cell factories 103 CHAPTER 5 (see above) hardly cause any stress on the wild type organisms in terms of growth retardation (27). But one has to keep in mind that this type of end-product stress may differ between situations in which the stressor is produced intracellularly, or added in the extracellular compartment. Notably, a recent transcriptome study of prolonged ethanol production in Synechocystis, yielding a final level of 4.7 g/L ethanol (i.e. 2.5-fold less than the concentration of ethanol used in (23, 24) to stress the cells) showed that this product formation causes only minor changes in the level of gene expression (28). Significantly fewer studies have been published on the physiological consequences, i.e. stress, of major rechanneling of intermediary metabolism in the ‘cell factories’. One consequence of the engineering of a high capacity carbon sink in cyanobacteria, however, has already been noted, i.e. the increased rate of cellular CO2 fixation (6, 29, 30). Here we explore the consequences of this approach (i.e. engineering of a high-capacity productforming pathway into cyanobacteria) with a detailed proteomics analysis for cell factories for ethanol (with the two necessary heterologous genes integrated in the hosts’ chromosome) and lactic acid (with partial expression of the ldh gene from an exogenous plasmid). These two mutants were selected because they represent the cell factories for which we have achieved the highest carbon partitioning coefficient (between new cells and product). The results obtained show that for the ethanol-producing mutant, diverting up to 60 to 70 % of the fixed carbon into product (31), causes little notable stress response, but rather a physiological accommodation in the form of an induction of the carbon concentrating mechanism, the CO2-fixing enzyme RuBisCO, and additional enzymes involved in the Calvin cycle. The highest levels achievable of carbon partitioning into lactic acid (30) did not lead to a similar increase in abundance of Calvin cycle enzymes. This high lactate productivity required introduction of a plasmid encoded lactate dehydrogenase. In this strain this elicits next to some physiological adaptation, a significantly increased expression of CRISPR associated proteins. 5.3 Materials and Methods Bacterial strains and growth condition Wild-type Synechocystis sp. PCC6803, a glucose tolerant derivative provided by D. Bhaya, University of Stanford, USA was used as the reference strain. An engineered ethanol-producing-, and a lactic acid-producing Synechocystis sp. PCC6803 derivative were constructed as described previously (30, 31). All strains were grown in triplicate in modified liquid BG-11 medium (32) supplemented with 50 mM NaHCO3 pH 8.0 (Sigma) at 30 °C in a shaking incubator at 120 rpm (Innova 43; New Brunswick Scientific) under a 2 constant light intensity of approximately 30 μE/m /s provided by 15 W cool fluorescent 104 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES white lights (F15T8-PL/AQ; General Electric) with addition of antibiotic where appropriate. For plates, BG-11 (Cyanobacteria BG-11 freshwater solution; Sigma) was supplemented with 1.5% (w/v) agar, 5 mM glucose, 0.3% (w/v) Na2S2O2, and antibiotic where appropriate. To obtain a 15 N-reference culture, modified BG-11 medium supplemented 15 14 with Na NO3 (98% atom; Sigma) instead of Na NO3 was used. Cell density was measured with a spectrophotometer (Lightwave II, WPA). Additional information regarding the strains used in this study is provided in Additional file 1: Table S1. Organic acid analysis and total carbon fixation calculation Ethanol concentrations were determined by HPLC set-up (LKB), equipped with a REZEX organic acid analysis column (Phenomenex) and a 7RI 1530 refractive index detector (Jasco). Samples were analysed at 45 °C by using 7.2 mM H2SO4 as the eluent. AZUR chromatography software was used for data quantification. The concentration of lactic acid was determined using the Rapid assay (Megazyme), following the manufacturers’ instructions. Total carbon fixation rates, qCO2, were calculated as described previously (7) (and references therein) and shown in Figure 1b. Proteomic analysis Sample preparation Cells were harvested at mid-exponential growth phase directly into an ice-cold tube with a 1/10 dilution of a Complete protease inhibitors cocktail mixture (Roche), and then centrifuged at 4,000 rpm for 10 min at 4 °C. Cell pellets were immediately frozen with liquid nitrogen and stored at -80°C until use. Sampled cells of the three replicate cultures were mixed with cells from the wild-type 15 N-reference culture at a 1:1 ratio, based on OD730. The mixed cell pellets were then resuspended in an extraction buffer that consisted of 6 M urea, 0.5 mM EDTA, 2% (w/v) SDS and complete protease inhibitors cocktail mixture in 100 mM NH4HCO3 and transferred into a new tube containing 100-μm glass beads (Sigma). Cells were broken using a Precellys®24 bead beater (Bertin Technologies). Cell debris was removed by centrifugation at 15,000 rpm for 30 min at 4 °C. The protein concentration of all the samples was measured by BCA assay. For further analyses 500 µg of the protein extract was reduced and alkylated using 10 mM Dithiothreitol (Sigma) and 55 mM Iodoacetamide (Sigma), respectively. Pre-fractionation was performed using SDS-PAGE containing 10% (w/v) polyacrylamide (Bio-Rad). A total of 9 protein fractions was extracted from each gel and subsequently subjected to trypsin digestion (added at a 1:10 (w/w) ratio), using an in-gel digestion method modified from (33). The resulting peptide mixtures were lyophilized, re105 CHAPTER 5 suspended in 0.1 % (v/v) trifluoroacetic acid (TFA) and 50 % (v/v) CH3CN, and then loaded onto an SCX PolySULPHOETHYLAspartamide TM column (2.1 mm ID, 10 cm length) on an Ultimate 2000 HPLC system (Thermo Scientific, Etten-Leur, The Netherlands) for cleaning purpose. Elution at a flow rate of 0.4 ml/min was performed using a 5 minute linear gradient from buffer A; 10 mM KH2PO4 and 25 % (v/v) CH3CN, pH 2.9 to B; 10 mM KH2PO4, 500 mM KCl and 25 % (v/v) CH3CN, pH 2.9. The total peptide fraction was collected based on UV monitoring of the eluent at 214 nm. The collected peptide fraction was lyophilized and stored at -80°C. Prior to mass spectrometry analysis, samples were desalted using a C18 reversed phase tip (Varian). LC-FT-MS/MS data acquisition, data processing and relative protein quantification The LC-FT-MS/MS data of the 3 biological replicates per strain containing each of 9 14 15 fractions of the N, N isotopic tryptic peptide mixture were acquired with an ApexUltra Fourier transform ion cyclotron resonance mass spectrometer (Bruker Daltonic, Bremen, Germany) equipped with a 7 T magnet and a nano-electrospray Apollo II DualSource™ coupled to an Ultimate 3000 (Thermo Scientific, Etten-Leur, The Netherlands) HPLC system. The 81 samples each containing 300 ng of the 14 N, 15 N tryptic peptide mixture were injected as a 20 μl 0.1% (w/v) TFA aqueous solution and loaded onto the PepMap100 C18 (5 μm particle size, 100 Å pore size, 300 μm inner diameter, and 5 mm length) precolumn. Following injection, the peptides were eluted via an Acclaim PepMap 100 C18 (3 µm particle size, 100 Å pore size, 75 μm inner diameter, and 250 mm length) analytical column (Thermo Scientific, Etten-Leur, The Netherlands) using a linear gradient from 0.1 % formic acid / 3 % CH3CN / 97 % H2O (v/v) to 0.1 % formic acid / 35 % CH3CN / 65 % H2O (v/v) over a period of 120 min, followed by 5 min to 0.1 % formic acid / 90 % CH3CN / 10 % H2O (v/v) at a flow rate of 300 nl/min. Data-dependent Q-selected peptide ions were -6 fragmented in the hexapole collision cell at an Argon pressure of 6×10 mbar (measured at the ion gauge) and both precursor and fragment ions were detected in the FTICR cell at a resolution of up to 60.000 (m/Δm) with a maximum MS/MS rate of about 2 Hz. Instrument mass calibration was better than 1.5 ppm over the m/z range from 250 to 1500. This yielded more than 9000 MS/MS spectra over the 125 min LC-MS/MS chromatogram. Raw FT-MS/MS data of the 9 protein gel fractions were processed as multi-file (MudPIT) with the MASCOT DISTILLER program, version 2.4.3.1 (64bits), MDRO 2.4.3.0 (MATRIX science, London, UK), including the Search toolbox and the Quantification toolbox. Peak-picking for both MS and MS/MS spectra was optimized for a mass resolution of up to 60.000 (m/Δm). Peaks were fitted to a simulated isotope distribution with a correlation threshold of 0.7, and with a minimum signal to noise ratio of 2. The 106 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES processed data were searched with the MASCOT server program 2.3.02 (MATRIX science) against a complete Synechocystis sp. PCC6803 proteome database obtained from the UniProt consortium (May, 2014; 3506 entries in total), plus the appropriate additional protein sequences for the two mutant strains. The database was further complemented with its corresponding decoy data base for statistical analyses of peptide false discovery rate (FDR). Trypsin was used as the hydrolytic enzyme and 1 missed cleavage was allowed in peptide identification. Carbamidomethylation of cysteine was used as a fixed modification and oxidation of methionine as a variable modification. The peptide mass tolerance was set to 10 ppm and the peptide fragment mass tolerance was set to 0.03 15 Dalton. The quantification method was set to the metabolic N labeling method to enable MASCOT to identify both 14 N and 15 N peptides. The MASCOT MudPIT peptide identification score was set at a cut-off of 20. At this cut-off, and based on the number of assigned decoy peptide sequences, a peptide false discovery rate of ~2 % for all analyses was obtained. Using the quantification toolbox, the isotopic ratios for all identified proteins were determined as weighted average of the isotopic ratios of the corresponding light over heavy peptides. Selected critical settings were: require bold red: on, significance threshold: 0.05: Protocol type: precursor; Correction: Element 15 N; Value 99.4; Report ratio L/H; Integration method: Simsons; Integration source: survey; Allow elution time shift: on; Elution time delta: 20 seconds; Std Err. Threshold: 0.15, Correlation Threshold (Isotopic distribution fit): 0.98; XIC threshold: 0.1; All charge states: on; Max XIC width: 200 seconds; Threshold type: at least homology; Peptide threshold value: 0.05; unique peptide sequence: on. All obtained quantification results were manually screened with the spectral data. The MASCOT DISTILLER protein identification reports were exported as Microsoft Excel xlsx files and then imported into a custom made VBA software program running in Microsoft Excel. The program facilitates organization and data mining of large sets of 14 15 proteomics data and corrects for possible N/ N culture mixing errors by normalization of the isotope ratios on the median values as described elsewhere (34). The combined protein quantification results produced accordingly are listed in Additional file 1: Table S2. Statistical analysis of the protein quantifications Significant changes in protein expression level measured in the two mutants, as compared to the wild-type strain were calculated. A Normal distribution was fitted using all the proteins measured in at least one of the three replicate WT experiments. By using this Normal distribution, the significance of the change in protein expression level was calculated using the z-test and then a Bonferroni correction was applied. Quantified 107 CHAPTER 5 proteins from the ethanol-producing strain, SAA012, with a p-value < 1.21E-05 was considered as a significant change (α < 0.01), while 1.14E-05 was used as the threshold for significant change (α < 0.01) in quantified proteins from the lactic acid-producing strain, SAW041. The small difference in threshold value between the strains is caused by the fact that not the same number of proteins is identified in both strains. RNA isolation and real-time quantitative PCR Cells were harvested at the mid-exponential growth phase and then centrifuged at 4,000 rpm for 10 min at 4 °C. Cell pellets were immediately frozen with liquid nitrogen and stored at -80 °C until use. Cells were opened using a Precellys®24 bead beater and cell debris was removed by centrifugation. RNA was isolated using the RNeasy mini kit (Qiagen). The concentration of RNA was measured using a Nanodrop 1000 spectrophotometer (Thermo Scientific), while RNA quality was determined using 1% agarose gels. The RevertAid First Strand cDNA Synthesis Kit (Thermo Sciecntific) was used to synthesize cDNA. Quantitative reverse transcription-PCR (RT-qPCR) was performed with cDNA in an Applied Biosystems 7300 Real Time PCR system using Power SYBRs Green PCR Master Mix (Life Technologies). Primers were designed using Primer3 software (Life Technologies) and are listed in Additional file 1: Table S3. The relative mRNA levels of eight selected genes, i.e. cbbL, ccmK1, cpcG2, gabD, gap1, gap2, kaiA, and sll7087, were calculated using the ∆∆CT method (35), and normalized using pgl as the internal control. The statistical significance level was calculated using the t-test and corrected by the Bonferroni method. 5.4 Results and Discussion Physiological analysis of product formation, i.e. ethanol and lactic acid, on Synechocystis sp. PCC6803 In order to determine the consequences of high rates of (intracellular) product formation (i.e. of ethanol and of lactic acid) in Synechocystis, the recombinant Synechocystis strains SAA012 (31) and SAW041 (30) were selected. The ethanol-producing strain carries pyruvate decarboxylase, pdc, and alcohol dehydrogenase, adhII, from Zymomonas mobilis under control of the endogenous promoter, psbA2 (5). The lactic acid-producing strain (i.e. SAW041) harbors a lactate dehydrogenase, ldh from Lactococcus lactis, and a pyruvate kinase from Enterococcus faecalis, each under control of the strong constitutive promoter, trc2, with additional expression of the lactate dehydrogenase from an exogenous plasmid (36) (see Methods and Additional file 1: Table S1). 108 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES Figure 1. Growth, product formation, and calculated total rates of carbon fixation of the two product-forming mutants, compared to the wild-type strain Synechocystis sp. PCC6803. (a) Growth and product formation of the wild-type (WT) and the two product-forming strains (SAA012 and SAW041). (b) Total carbon fixation rate of SAA012, SAW041, and the corresponding WT Synechocystis strain plotted against the cell density of the culture. Data are mean ± SD of 3 biological replicates. The two product-forming strains both grew considerably slower than the wild type (WT) strain, which was to be expected in view of the large amount of carbon directly channeled into product. Maximum specific growth rates (µ) of SAA012 and SAW041 were reduced to 71% and 58% of the value of WT, respectively. The ethanol production observed in SAA012 was 9.88 ± 0.16 mM over 19 days, representing a maximum -1 -1 production rate of 1.33 ± 0.12 mmol l d , while in the same period the lactic acid -1 -1 accumulated up to 6.19 ± 0.12 mM, with a rate of 0.36 ± 0.01 mmol l d (see Figure 1a). From these physiological data the total carbon fixation rate (qCO2) was calculated (i.e. for conversion into biomass and into product). Due to the differences in growth rate between the strains (and the consequences this has on the degree of light saturation in the -1 cultures), the qCO2 is here plotted as a function of biomass density in gDW l (Figure 1b). During exponential growth of the cells, when OD730 < 1 (i.e. up to approximately 0.2 gDW l - 1 ), SAA012 exhibits a higher qCO2 than both WT and SAW041. The average increase in the 109 CHAPTER 5 qCO2 calculated was ~1.5-fold higher than the rate of CO2 fixation in the WT. In contrast, the lactic acid-forming strain SAW041 showed a small decrease in overall qCO2 at this cell density. Quantitative proteomic analysis in Synechocystis To investigate the physiological consequences of intracellular product formation in the form of ethanol and lactic acid on the cellular protein composition, cells were harvested at the mid-logarithmic growth phase, to minimize interference by changes in the cellular protein composition induced by the (poorly defined) factors that limit exponential growth of these cells in the transition to stationary phase. Accordingly, cells of the WT, SAA012 and SAW041 were harvested at OD730 = 0.7, 0.7, and 0.4, respectively (see Figure 1a). 15 Reference cells were harvested at OD730 = 0.7 from a wild type culture for which Na NO3 was used as the nitrogen source. Cells from three independent cultures of the WT and of each of the two productforming strains were harvested, and mixed with the reference cells based on their OD730. 14 15 14 Normalization of the N/ N ratio, to allow correction for possible errors in mixing the N 15 and N samples, was performed using the median value, as described previously (34) (see also Methods). Upon subsequent analysis of these samples with LC-FT-MS/MS, 761, 826, and 881 proteins were quantified in WT, SAA012, and SAW041 respectively. This has resulted in a total of 1039 unique protein ratios (and thus 716 and 633 proteins that were identified in both product-forming strains and among the three strains, respectively). The list of proteins and their respective ratio(s) quantified in this study are provided in Additional file 1: Table S2, together with their mascot scores. The distribution of the normalized protein abundance of the three biological replicates of WT, SAA012, and SAW041 is depicted in Figure 2. As shown, the protein isotopic ratio of the WT is normally distributed around 1, further emphasizing the reproducibility of the measurements with the reference- and the WT cultures. This allows calculation of significant changes in relative protein expression level of the two product-forming strains, by using the z-test and the protein ratio distribution of the WT as a reference, followed by Bonferroni correction. This resulted in significance thresholds for the two product-forming strains as described in: Methods; Statistical analysis of the protein quantifications. Using these boundary conditions a total number of 168 and 153 proteins, respectively, showed a significantly altered abundance in mutant strains SAA012 and SAW041. The two lists of these proteins, including their calculated p-value, can be found in Supplementary Tables 4 and 5, respectively. 110 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES Figure 2. Normal distribution of normalized 14 N/15N isotope ratios of quantified proteins in three biological replicates of the wild-type (WT) strain (a), the ethanol-producing mutant, SAA012 (b), and the lactate-producing mutant, SAW041* (c). The abscissa represents the number of quantified proteins (expressed as frequency), while the ordinate indicates the protein 14N/15N isotopic ratio (l/h). The number of quantified proteins, the mean value of the protein ratios, and the standard deviation of the protein ratios are indicated in the insets. (*Three proteins with a ratio > 8-fold change were excluded). The numbers of differentially expressed proteins listed in these tables are visualized in Figure 3. Furthermore, in order to establish the connection between the up-regulated proteins and the down-regulated proteins of the two product-forming strains, STRING v9.1 (37) plus KEGG pathway (38) enrichment analysis was used to predict the underlying protein interaction network. The protein interaction networks that have resulted from this analysis were reconstructed by Cytoscape v3.1.1.1 (39), and are shown in Figures 4 and 5, for strain SAA012 and SAW041, respectively. The results of the KEGG pathway enrichment analyses showed that in SAA012, the up-regulated pathways were: syn00710 - carbon fixation in photosynthetic organisms, and syn00480 - glutathione metabolism, while the down-regulated pathways found were syn03010 - ribosome, and syn00196 photosynthesis-antenna proteins (p < 0.05). In contrast to SAA012, the enrichment analysis of SAW041 did not reveal specific up-regulated pathways, but only down- 111 CHAPTER 5 regulated pathways. Amongst others: photosynthesis (syn00195), photosynthesis-antenna proteins (syn00196), and glycolysis/gluconeogenesis (syn00010) were identified. An overview of the results of the KEGG pathway enrichment analyses of the two mutants is provided in Additional file 1: Table S6. Some of the observed proteins with significantly altered abundances are discussed in more detail below. Figure 3. Number of differentially expressed proteins in the two mutants compared to the wild-type strain. (a) Number of significantly up-regulated proteins (α < 0.01). (b) Number of significantly down-regulated proteins (α < 0.01). Mutant SAA012: A high rate of ethanol production increases the overall rate of carbon fixation Besides possible toxicity effects of the intracellular production of ethanol, we are also interested in the physiological effect(s) of rechanneling of a major part of the intermediary metabolites into our product of interest. As reported earlier (30, 31, 40), engineering a high capacity carbon sink in a cyanobacterium is able to stimulate the overall rate of CO2 assimilation in such organisms. Accordingly, we have observed that our ethanol-producing strain, SAA012, showed a higher qCO2 than the WT strain ((7); see also Figure 1b). A remaining question is whether this higher rate of total carbon fixation is the result of an increase in the activity of ribulose-1,5-bisphosphate carboxylase (RuBisCO; e.g. by release of product inhibition) or that more RuBisCO enzyme is expressed in response to functional 112 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES expression of the ethanol biosynthetic pathway (i.e. the carbon sink). The proteomic data suggests that the higher qCO2 is not exclusively a result of the up-regulation of the RuBisCO enzyme only (which is ~1.4-fold; see Additional file 1: Table S4), because some of the downstream enzymes involved in the Calvin cycle, including phosphoglycerate kinase (Pgk), fructose-bisphosphate aldolase class 1 (Fda), fructose-1,6-bisphosphatase class 1 (Slr0952), and glucose-6-phosphate 1-dehydrogenase (Zwf), that functions in pentose phosphate pathway (PPP), are also (slightly) up-regulated (see Figure 4a). In addition, carbonic anhydrase (encoded by slr0051), and the proteins involved in the CO2 concentrating mechanism, including CcmM and CcmK1, that assist to increase the intracellular CO2 level, were also considerably increased. This leads to the conclusion that the CO2-flux in Synechocystis upon engineering of a carbon sink is regulated hierarchically, i.e. by changes of gene expression that adjust enzyme capacities (Vmax), rather than metabolically by interactions of enzymes with substrates, products, or allosteric effectors (41). Significantly, we found that phosphomethylpyrimidine synthase (ThiC) which is involved in the biosynthesis of Thiamine pyrophosphate (TTP) (42), was also up-regulated. Pyruvate decarboxylase (PDC) catalyses the first step in the metabolic pathway towards ethanol, and is one of several enzymes that use TPP as a cofactor. An increase in TPP biosynthesis can be anticipated because of the high level constitutive expression of the exogenous PDC (31). Intracellular ethanol production reduces expression of the protein synthesis machinery and the phycobilisomes, but activates an oxidative stress response A large number of ribosomal proteins, and the 10 kDa and 60 kDa chaperonins are down-regulated in the ethanol-producing strain SAA012, as is shown in Figure 4a. This presumably is the result of the growth retardation that is observed in SAA012 (see Figure 1a) as a consequence of the dramatic rechanneling of intermediary metabolites in these cells. Besides a reduction in the expression of the protein synthesis machinery, also the components of the phycobilisome light harvesting complex, including the structural genes CpcA, CpcB, CpcC1, CpcC2, and CpcG2 (Sll1471), as well as PbsA1 (heme oxygenase 1), HemE and HemF (i.e. the latter three all involved in chromophore (tetrapyrrole) biosynthesis), were decreased in relative abundance. 113 CHAPTER 5 Figure 4. Predicted interaction of differentially expressed proteins identified in the ethanol-producing mutant, SAA012. Protein interaction was analyzed and reconstructed using STRING and Cytoscape, respectively. (a) Upregulated proteins (α < 0.01). (b) Down-regulated proteins (α < 0.01). 114 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES Reduction in the abundance of the phycobilisome complexes in these cells was independently confirmed via recording of whole cell absorption spectra (Additional file 2: Figure S1). Significantly, none of the key functional components of the two photosystems were down-regulated in the ethanol-producing strain. A slight (0.7-fold) down-regulation was observed for Psb27, but this component has a role in PS-II repair. This may be a consequence of the reduction of the (size of the) phycobilisome antenna. Consistent with this, the high level ethanol-producing cells do not display an increase in their general stress response (23). In contrast, moderate induction of the oxidative stress response as evidenced by increased levels of glutathione synthetase (GshB) and hydroperoxy fatty acidreductase (Slr1992) was observed. In the lactate producing mutant SAW041 a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system is induced Unlike the ethanol-producing strain, the proteomics results obtained with the lactateproducing strain SAW041 did not show any specific group of proteins that was upregulated. This includes the group of proteins identified by Kurian et. al., 2006 (43) as proteins up-regulated upon low-pH stress. We therefore tentatively conclude that significant low-pH stress is not among the consequences of redirection of a major part of the flux of carbon in Synechocystis towards L-lactic acid. Some of the up-regulated proteins in this strain, like DapB, Asd, ArgF and PurA, function in amino acid and nucleotide biosynthesis (see Figure 5a). Furthermore, three proteins with unknown function, including serum resistance locus BrkB, Slr5088 and Slr0106, were observed to be strongly (i.e. > 8-fold) up-regulated. Slr5088, encoded on pSYSM, was recently annotated + as a short-chain dehydrogenase (44). In the SAW041 strain, the NADH/NAD ratio is presumably decreased due to the high constitutive expression of the exogenous NADHdependent lactate dehydrogenase (LDH). Thus, up-regulation of a dehydrogenase (i.e. + Slr5088) which may be an NAD -dependent enzyme, may be a consequence of this. Besides that, three distinct proteins, i.e. Sll7065, Sll7087, and Sll7090, were found to be > 2-fold up-regulated. The expression of these proteins, and their probable function, has recently been characterized (45). They have been described as ‘CRISPR2- or CRISPR3system associated proteins’. Significantly, two of them, i.e. Sll7065 (CRISPR2-associated protein csm3, Cas7) and Sll7087 (CRISPR3-associated protein Cmr4) were also quantified in SAA012, but did not show any up-regulation in that mutant. Sll7065 was even significantly down-regulated in the ethanol-producing strain. Sll7090 (CRISPR3-associated protein Cmr2, cas10) could only be quantified in SAW041. 115 CHAPTER 5 Figure 5. Predicted interaction of differentially expressed proteins identified in the lactate-producing mutant, SAW041. Protein interaction was analyzed and reconstructed using STRING and Cytoscape, respectively. (a) Upregulated proteins (α < 0.01). (b) Down-regulated proteins (α < 0.01). 116 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES Little is known about the exact function of these two CRISPR systems in Synechocystis, except that the CRISPR3 system is the most abundantly expressed CRISPR system in WT cells (45). The CRISPR/Cas systems are often referred to as adaptive immune systems in bacteria and archaea (see reviews (46, 47)), which confer resistance to horizontal gene transfer, including phage transduction, transformation, and conjugation (48). It is relevant to note that the lactic acid-producing mutant has been transformed with a self-replicative plasmid, to allow for the required amount of LDH expression (30). It is still unclear whether the CRISPR/Cas system contributes to decreased genetic stability of the SAW041 strain. Decreased production of lactic acid was occasionally observed after re-culturing this strain from a frozen stock (data not shown). An earlier study, however, on ethylene production in Synechocystis has reported a stable production of ethylene (> 6 months) using an ethylene-producing mutant that carries a similar self-replicating plasmid (49). Intracellular lactic acid production reduces photosynthesis and the level of the light harvesting phycobilisome complex The majority of the down-regulated proteins in SAW041 are involved in photosynthesis and light harvesting, as can be seen in Figure 5a (see also Additional file 2: Figure S1; whole cell absorption spectra). Therefore, a decreasing abundance of proteins involved in central metabolic pathways, including Pgk, Tal, TktA, Eno, and Gap1 does not come unexpectedly. Whether or not this is cause or consequence of the slow growth phenotype observed in this strain cannot yet be decided. Surprisingly, however, all the ribosomal proteins quantified in this strain were not measurably changed in their abundance, which contrasts the proteomics results obtained with SAA012. Circadian clock components are induced by overproduction of ethanol and lactic acid A recent proteomics study of Synechocystis sp. PCC6803 after treatment with excessive (i.e. much higher than the levels achievable by cyanobacterial overproduction) amounts of ethanol has suggested that the circadian rhythm of the cells may be affected by the ethanol treatment, as the circadian clock protein KaiB (Slr0757) was up-regulated (23). In a follow-up study on the transcriptome of these cells, using the RNA-seq technique, it was shown that slr0758, encoding the circadian clock protein KaiC, was significantly upregulated by ethanol stress (24). Although KiaB was not detected in our proteomic analysis, the circadian clock proteins KaiA and KaiC were, and were both found to be upregulated in both product-forming strains. It is still unresolved what the mechanism is behind this up-regulation and how it in turn will affect circadian physiology. Conversely, 117 CHAPTER 5 the fact that circadian physiology is important for product formation in cyanobacteria has recently elegantly been demonstrated (50). Our results suggest that the expression of the circadian clock components may be affected by a change in (a) central metabolite(s) of the cells. RT-qPCR validation of the proteomic analyses To verify and further characterize the significance of the quantitative proteomics results, nine genes were chosen for analysis with RT-qPCR (see Figure 6a, and Methods). These genes were chosen, based on their corresponding protein levels, for which a wide range of abundances was observed. This set includes genes that may play an important role in the regulation of biofuel production. Among them, three proteins did not change significantly in relative abundance, neither in SAA012, nor in SAW041 (i.e. Pgl, GabD, and Gap2). Two proteins were chosen that are similarly down-regulated (i.e. CpcG2 and Gap1) in the two mutant strains, and one protein was selected that is up-regulated in both (i.e. KaiA). Also included were CbbL and CcmK1, which were up-regulated only in SAA012, and Sll7087 (i.e. the CRISPR3-associated protein Cmr4) which was only up-regulated in SAW041. Comparative RT-qPCR analysis between the two product-forming strains and the WT strain, showed a surprisingly good correlation between the changes in mRNA expression level (Figure 6a) and the relative protein abundance (Figure 6b). The small SEM errors in these measurements further emphasize the accuracy of our quantitative proteomic data set, and suggest that the regulatory processes involved in the responses of these selected genes operate predominantly at the level of transcription with no notable involvement of post-transcriptional regulation. 118 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES Figure 6. Altered mRNA- and protein expression level of selected genes of the two product-forming mutants, SAA012 and SAW041, relative to the WT strain. (a) Relative fold change of mRNA and protein level of the selected genes. Data are mean ± SEM of 3 biological- and 3 technical-replicates. Genes that were changed significantly in their level of expression are indicated with an asterisk. (b) Changes in relative protein abundance of the selected set of genes. Proteins with significantly changed expression level are marked with an asterisk (α < 0.01). Data are mean ± SEM of at least 2 biological replicates. Proteins that are shown without error bars, i.e. GabD in SAA012 and KaiA in SAW041, were quantified in only one biological replicate. Conclusions Our study further demonstrates that introducing a high-capacity fermentative pathway in a cyanobacterium, i.e. an extra carbon sink that resulted in an increase of the overall rate of carbon fixation, up-regulated a set of proteins involved in the carbon concentrating mechanism, CO2-fixation, and the Calvin cycle, as is evident from the results obtained with the ethanol-forming strain SAA012. A similar up-regulation, however, was not observed in the lactic acid-forming strain SAW041, even though in that strain up to 50 % of the fixed carbon was converted into lactic acid (30). This is in part due to the more strongly decreased growth rate of this latter strain. Instead, we observed in this strain a strong upregulation (i.e. ~11-fold) of Slr5088, which recently was annotated as a probable shortchain dehydrogenase (44). This is presumably due to an imbalance of the intracellular + NADH/NAD ratio, caused by the over-expressed NADH-dependent LDH. Whether or not 119 CHAPTER 5 the imbalance of this redox couple is also the prime reason behind the strong growth inhibition observed in this strain has yet to be revealed. Table 1. The top-ten most up- and down-regulated proteins quantified in at least two replica’s of the SAA012and the SAW041 mutant strain relative to the WT Synechocystis sp. PCC6803, as indicated by their isotopic ratios. 14 Protein ID P74635 P73238 P74705 P74644 Q6YRW2 P74625 P73799 P73204 P73605 P72955 Protein ID Q6ZEP2 Q55877 P73238 Q6ZEB1 Q6YRT3 P74625 P73204 Q54715 P73203 Q54714 Description Uncharacterized transporter slr0753 Slr2018 protein Slr1178 protein Circadian clock protein KaiA Slr6042 protein Phycobilisome rod-core linker polypeptide CpcG Slr1259 protein Phycobilisome 32.1 kDa linker polypeptide, phycocyanin-associated, rod 2 Slr1854 protein Urease accessory protein UreG Description Slr5088 protein Slr0106 protein Slr2018 protein Uncharacterized protein sll7087 Slr6012 protein Phycobilisome rod-core linker polypeptide CpcG Phycobilisome 32.1 kDa linker polypeptide, phycocyanin-associated, rod 2 C-phycocyanin alpha chain Phycobilisome 32.1 kDa linker polypeptide, phycocyanin-associated, rod 1 C-phycocyanin beta chain 1.78 1.76 0.20 N/15N ratio SAA012-2 3.01 3.31 1.82 2.07 1.95 0.23 0.32 0.40 0.43 SAA012-1 2.87 2.72 0.44 0.45 SAW041-1 11.23 8.77 2.81 2.77 2.51 0.22 0.54 0.45 14 N/15N ratio SAW041-2 10.71 8.63 3.00 SAA012-3 3.35 2.14 2.07 0.20 0.38 0.40 0.43 0.53 SAW041-3 2.56 0.22 2.47 2.45 2.45 0.16 0.23 0.22 0.16 0.36 0.44 0.36 0.39 0.27 0.28 0.45 0.43 0.28 Furthermore, in the SAW041 strain that harbors a self-replicative plasmid, we observed a significant up-regulation of CRISPR/Cas systems (i.e. > 2-fold). It is still unclear whether these systems play a role in genetic (in)stability (7, 20, 21). It is unlikely that the CRISPR/Cas system is induced solely by an exposure to exogenous plasmid, since similar use of the self-replicative plasmid in the synthesis of 2,3-butanediol (31) and ethylene (49) did not result in a reported loss of genetic stability. Further investigations on the 120 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES functional role of the CRISPR/Cas system in cyanobacteria will be necessary before its mode of action will have been resolved. As a final point it is relevant to note many proteins that were significantly altered in their abundance (see Table 1) in the product-forming strains are either hypothetical proteins or proteins with unknown function. Thus, significantly more research on the molecular physiology of cyanobacteria is necessary for a better understanding of the mode of operation of ‘biosolar cell factories’. 5.5 Acknowledgement The work of O.B. was funded through a grant from the Higher Education Commission of Thailand. 5.6 Supplementary materials The additional file associated with this chapter can be found in the online version at http://biotechnologyforbiofuels.biomedcentral.com. Alternatively it may be requested per email for personal use: [email protected]. 5.7 References 1. Allen MR, Frame DJ, Huntingford C, Jones CD, Lowe JA, Meinshausen M, Meinshausen N. 2009. Warming 2. Doney SC, Fabry VJ, Feely RA, Kleypas JA. 2009. Ocean Acidification: The Other CO2 Problem. Annu Rev Mar caused by cumulative carbon emissions towards the trillionth tonne. Nature 458:1163–1166. Sci 1:169–192. 3. McNeil BI, Matear RJ. 2008. Southern Ocean acidification: A tipping point at 450-ppm atmospheric CO2. 4. Deng M-D, Coleman JR. 1999. Ethanol Synthesis by Genetic Engineering in Cyanobacteria. Appl Environ 5. Lindberg P, Park S, Melis A. 2010. Engineering a platform for photosynthetic isoprene production in Proc Natl Acad Sci U S A 105:18860–18864. Microbiol 65:523–528. cyanobacteria, using Synechocystis as the model organism. Metab Eng 12:70–79. 6. Lan EI, Liao JC. 2012. ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proc Natl Acad Sci U S A 109:6018–6023. 7. Angermayr SA, Paszota M, Hellingwerf KJ. 2012. Engineering a cyanobacterial cell factory for production of lactic acid. Appl Environ Microbiol 78:7098–7106. 8. Sims REH, Mabee W, Saddler JN, Taylor M. 2010. An overview of second generation biofuel technologies. Bioresour Technol 101:1570–1580. 9. Hellingwerf KJ, Teixeira de Mattos MJ. 2009. Alternative routes to biofuels: light-driven biofuel formation from CO2 and water based on the “photanol” approach. J Biotechnol 142:87–90. 10. Atsumi S, Higashide W, Liao JC. 2009. Direct photosynthetic recycling of carbon dioxide to isobutyraldehyde. Nat Biotechnol 27:1177–1180. 11. Lü J, Sheahan C, Fu P. 2011. Metabolic engineering of algae for fourth generation biofuels production. Energy Environ Sci 4:2451–2466. 121 CHAPTER 5 12. Yang J, Xu M, Zhang X, Hu Q, Sommerfeld M, Chen Y. 2011. Life-cycle analysis on biodiesel production from microalgae: Water footprint and nutrients balance. Bioresour Technol 102:159–165. 13. Diaz-Chavez RA. 2011. Assessing biofuels: Aiming for sustainable development or complying with the market? Energy Policy 39:5763–5769. 14. Gnansounou E. 2011. Assessing the sustainability of biofuels: A logic-based model. Energy 36:2089–2096. 15. Larson ED. 2006. A review of life-cycle analysis studies on liquid biofuel systems for the transport sector. Energy Sustain Dev 10:109–126. 16. Janssen M, Tramper J, Mur LR, Wijffels RH. 2003. Enclosed outdoor photobioreactors: light regime, photosynthetic efficiency, scale-up, and future prospects. Biotechnol Bioeng 81:193–210. 17. Berla BM, Saha R, Immethun CM, Maranas CD, Moon TS, Pakrasi HB. 2013. Synthetic biology of cyanobacteria: unique challenges and opportunities. Front Microbiol 4. 18. Gao Z, Zhao H, Li Z, Tan X, Lu X. 2012. Photosynthetic production of ethanol from carbon dioxide in genetically engineered cyanobacteria. Energy Environ Sci 5:9857–9865. 19. Varman AM, Xiao Y, Pakrasi HB, Tang YJ. 2013. Metabolic engineering of Synechocystis sp. strain PCC 6803 for isobutanol production. Appl Environ Microbiol 79:908–914. 20. Kusakabe T, Tatsuke T, Tsuruno K, Hirokawa Y, Atsumi S, Liao JC, Hanai T. 2013. Engineering a synthetic pathway in cyanobacteria for isopropanol production directly from carbon dioxide and light. Metab Eng 20:101–108. 21. Jacobsen JH, Frigaard N-U. 2014. Engineering of photosynthetic mannitol biosynthesis from CO2 in a cyanobacterium. Metab Eng 21:60–70. 22. Jones PR. 2014. Genetic Instability in Cyanobacteria – An Elephant in the Room? Front Bioeng Biotechnol 2. 23. Qiao J, Wang J, Chen L, Tian X, Huang S, Ren X, Zhang W. 2012. Quantitative iTRAQ LC-MS/MS proteomics reveals metabolic responses to biofuel ethanol in cyanobacterial Synechocystis sp. PCC 6803. J Proteome Res 11:5286–5300. 24. Wang J, Chen L, Huang S, Liu J, Ren X, Tian X, Qiao J, Zhang W. 2012. RNA-seq based identification and mutant validation of gene targets related to ethanol resistance in cyanobacterial Synechocystis sp. PCC 6803. Biotechnol Biofuels 5:89–6834–5–89. 25. Tian X, Chen L, Wang J, Qiao J, Zhang W. 2013. Quantitative proteomics reveals dynamic responses of Synechocystis sp. PCC 6803 to next-generation biofuel butanol. J Proteomics 78:326–345. 26. Anfelt J, Hallström B, Nielsen J, Uhlén M, Hudson EP. 2013. Using Transcriptomics To Improve Butanol Tolerance of Synechocystis sp. Strain PCC 6803. Appl Environ Microbiol 79:7419–7427. 27. Wijffels RH, Kruse O, Hellingwerf KJ. 2013. Potential of industrial biotechnology with cyanobacteria and eukaryotic microalgae. Curr Opin Biotechnol 24:405–413. 28. Dienst D, Georg J, Abts T, Jakorew L, Kuchmina E, Borner T, Wilde A, Duhring U, Enke H, Hess WR. 2014. Transcriptomic response to prolonged ethanol production in the cyanobacterium Synechocystis sp. PCC6803. Biotechnol Biofuels 7:21–6834–7–21. 29. Ducat DC, Silver PA. 2012. Improving Carbon Fixation Pathways. Curr Opin Chem Biol 16:337–344. 30. Angermayr SA, van der Woude AD, Correddu D, Vreugdenhil A, Verrone V, Hellingwerf KJ. 2014. Exploring metabolic engineering design principles for the photosynthetic production of lactic acid by Synechocystis sp. PCC6803. Biotechnol Biofuels 7:99–6834–7–99. eCollection 2014. 31. Savakis PE, Angermayr SA, Hellingwerf KJ. 2013. Synthesis of 2,3-butanediol by Synechocystis sp. PCC6803 via heterologous expression of a catabolic pathway from lactic acid- and enterobacteria. Metab Eng 20:121– 130. 122 PROTEOMICS STUDY OF SYNECHOCYSTIS-CELL FACTORIES 32. Shcolnick S, Shaked Y, Keren N. 2007. A role for mrgA, a DPS family protein, in the internal transport of Fe in the cyanobacterium Synechocystis sp. PCC6803. Biochim Biophys Acta 1767:814–819. 33. Shevchenko A, Wilm M, Vorm O, Mann M. 1996. Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal Chem 68:850–858. 34. Cox J, Mann M. 2008. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. 35. Livak KJ, Schmittgen TD. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods San Diego Calif 25:402–408. 36. Zinchenko VV, Piven IV, Melnik VA, Shestakov SV. 1999. Vectors for the complementation analysis of cyanobacterial mutants. Russ J Genet 35:228–232. 37. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. 2013. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41:D808–15. 38. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res 32:D277–D280. 39. Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T. 2012. A travel guide to Cytoscape plugins. Nat Methods 9:1069–1076. 40. Ducat DC, Avelar-Rivas JA, Way JC, Silver PA. 2012. Rerouting Carbon Flux To Enhance Photosynthetic Productivity. Appl Environ Microbiol 78:2660–2668. 41. Daran-Lapujade P, Rossell S, van Gulik WM, Luttik MAH, de Groot MJL, Slijper M, Heck AJR, Daran J-M, de Winde JH, Westerhoff HV, Pronk JT, Bakker BM. 2007. The fluxes through glycolytic enzymes in Saccharomyces cerevisiae are predominantly regulated at posttranscriptional levels. Proc Natl Acad Sci U S A 104:15753–15758. 42. Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S. 1996. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res Int J Rapid Publ Rep Genes Genomes 3:109–136. 43. Kurian D, Phadwal K, Mäenpää P. 2006. Proteomic characterization of acid stress response in Synechocystis sp. PCC 6803. PROTEOMICS 6:3614–3624. 44. Kramm A, Kisiela M, Schulz R, Maser E. 2012. Short-chain dehydrogenases/reductases in cyanobacteria. FEBS J 279:1030–1043. 45. Scholz I, Lange SJ, Hein S, Hess WR, Backofen R. 2013. CRISPR-Cas systems in the cyanobacterium Synechocystis sp. PCC6803 exhibit distinct processing pathways involving at least two Cas6 and a Cmr2 protein. PloS One 8:e56470. 46. Sorek R, Lawrence CM, Wiedenheft B. 2013. CRISPR-Mediated Adaptive Immune Systems in Bacteria and Archaea. Annu Rev Biochem 82:237–266. 47. van der Oost J, Westra ER, Jackson RN, Wiedenheft B. 2014. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol 12:479–492. 48. Marraffini LA, Sontheimer EJ. 2008. CRISPR Interference Limits Horizontal Gene Transfer in Staphylococci by Targeting DNA. Science 322:1843–1845. 49. Guerrero F, Carbonell V, Cossu M, Correddu D, Jones PR. 2012. Ethylene synthesis and regulated expression of recombinant protein in Synechocystis sp. PCC 6803. PloS One 7:e50470. 123 CHAPTER 5 50. Johnson CH, Egli M. 2014. Metabolic Compensation and Circadian Resilience in Prokaryotic Cyanobacteria. Annu Rev Biochem 83:221–247. 124 Chapter 6: General discussion Orawan Borirak CHAPTER 6 6.1 Introduction Within the framework of this thesis I studied multi-omics analyses of E. coli responding to glucose repression, and possible stresses elicited by overproduction of commodity chemicals by Synechocystis cell factories, as well as the subsequent physiological responses of these two types of cells under the selected conditions. The main aim of the designed experiments was to investigate the quantitative relationship between transcript expression level and the corresponding protein abundance. Results obtained regarding those relationships, in particular the absence of proportionality between them, have been discussed in detail in Chapter 3. Below, more general aspects of the results obtained will be discussed, with an emphasis on the physiological behaviour of the two organisms under the experimental conditions selected for the experiments. 6.2 Microbial growth rate vs. growth yield Microbial growth rates (µ) indicate the relative rate of biomass produced per unit of time, while growth yields are usually expressed in grams of biomass produced per mole of substrate consumed (Y) or per mole of ATP formed (YATP), if the relevant underlying catabolic pathways (and their ATP yield) are known. Growth rate and growth yield (or: growth efficiency) are fundamental properties for the behavior of a microorganism in an ecosystem. However, several contradictory statements in the literature regarding the relationship between the growth rate and growth yield have been made (see e.g. (1)). Recently DA Lipson, however, proposed a reconciling view of the relationship between these two parameters, by stating that both a positive and a negative relation are part of the same continuum, such that each of these two phases is specified by a specific set of the physiological conditions (Figure 1). A positive relationship between growth rate and growth yield is observed when the cells are confronted with growth-limiting factors such as nutrient limitation, starvation, physiological stresses, etc., whereas the negative relationship (also referred to as a trade-off between growth rate and growth yield) is observed when there is excess resources and/or during rapid growth (see (1) and references therein). In Chapter 2, we have presented experiments designed to allow for accurate sampling for multi-omics analyses of carbon catabolite repression in E. coli, by growing this organism in an aerobic chemostat under glucose-limited conditions and at a dilution rate -1 of 0.2 h . After steady state was established, a 50 mM pulse of glucose was administered. Biomass production, the (rate of) glucose consumption, and fermentation product formation were then monitored over a 120 minutes period after the glucose pulse (Chapter 2; Figure 3a), in addition to the transcriptome and proteome analyses (see 126 GENERAL DISCUSSION Chapter 3). We observed that as a result of the addition of glucose, the specific glucose uptake rate (qs) and the growth rate (µ) of the cells were increasing proportionally up to 30 minutes after the pulse of glucose, while the specific rate of acetate formation peaked within the first 15 minutes (Chapter 2; Figure 3b). Biomass yield on glucose (YX/s) dropped during the first 15 minutes, followed by a gradual recovery. Surprisingly, the growth yields after the recovery phase showed values even higher than what is often reported as the maximum theoretical yield (Chapter 2; Figure 4). To analyse the relationship between growth rate and growth yield under these conditions (i.e. during the onset of glucose repression) in the interpretation of Lipson they were-plotted in Figure 2. To avoid possible complications in the interpretation of these data by oxygen limitation (see: (2), supplementary materials) only the first 60 min after the glucose pulse are taken into consideration. Figure 1. Conceptual diagram showing a proposed relationship between growth rate and growth yield across a broad range of environmental conditions and ecological strategies. The dominant forces structuring the two limbs of the curve appear at top. The left limb (gray) has a hyperbolic shape based on maintenance theory. The right limb (black) is sigmoidal curve derived from (3). The dashed line at left indicates the continuation of the simpler relationship between rate and yield that might be observed in the absence of these other phenomena. Figure reproduced from (1) with permission. It can be seen that after the glucose pulse the specific glucose uptake rate increase for a period up to 30 minutes, whereas the biomass yield on sugar, after an initial decrease, increases almost proportionally with growth rate. Overall the data lead to the conclusion that glucose repression brings cells from low-, nutrient-limited growth (where maintenance costs weigh heavily on growth yield) to moderate growth rates and that 127 CHAPTER 6 (much) higher growth rates would have to be achieved to see a lowering of the yield with the increased growth rate (compare Figure 1). Presumably, the initial lowering of the yield that occurred during the first 15 minutes after the pulse is due to metabolic perturbation caused by the sudden increase in glucose concentration; the initial decrease can thus be explained by the re-direction of catabolism through pathways that yield less ATP/mol glucose (e.g. acetate formation). The measured transiently high rate of acetate production observed in this time interval is in agreement with this (Chapter 3; Figure 3b). Figure 2. Observed changes in biomass yield on glucose (Y) and a specific glucose uptake rate (qs) as a function of growth rate (µ) of E. coli in response to the glucose pulse. Closed circles and open squares represent the biomass yield on glucose (Y) and the specific glucose uptake rate (qs), respectively. These values were calculated for the time intervals shown in Chapter 2, Figure 3a. Data are mean ± SE of three independent experiments. Beyond the decreasing contribution of maintenance, also energy metabolism may change considerably during the first 60 minutes, as can be seen in Figure 3. First of all, significant up-regulation of the less efficient NADH-dehydrogenase ndh, that encodes NDH-II, and down-regulation, particularly during the first 15 minutes, of the more efficient equivalent, the NDH-I complex, encoded by nuo operon, is clearly observed. This may be due to the fact that when E. coli is shifted from glucose-limited to glucose-excess conditions, an increasing amount of NDH-II, which has a higher turnover rate (Vmax) than the NDH-I complex (4, 5), is required to transfer electrons from NADH to the quinone pool. + - However, the overall effect of these changes on the H /e stoichiometry of the electron transfer chain are difficult to estimate because at the level of the terminal oxidases, in parallel to the changes in the NADH-dehydrogenase part of the respiratory chain, the composition changes too. Figure 3a also shows that the more efficient enzyme in terms of proton over electron stoichiometry, cytochrome bo (encoded by the cyoABCDE operon) is significantly up-regulated, while a less efficient one, cytochrome bd-I (encoded by cydAB) is significantly down-regulated (note: no significant change was observed for appCB, 128 GENERAL DISCUSSION encoding cytochrome bd-II). The cytochrome- bo and bd-I oxidases have a similar Vmax (i.e. ~222 mol O2/mol cytochrome bd/s (6)) and cytochrome bo is generally expressed under oxygen saturation while cytochrome bd-I is expressed under oxygen limitation (7). If anything, the oxygen concentration during this time interval may be expected to gradually decrease (until anaerobiosis sets in after > 100 minutes (see: (2), supplementary materials)), so this adjustment of the cytochrome bd-I oxidase is in contrast to expectation. Figure 3. Time-series microarray analysis of E. coli after the glucose pulse. (A) Log2 expression levels of genes or operons encoding proteins involved in respiratory chain. Abbreviations for genes and operons: nuoABCEFGHIJKLMN, encoding NDH-I complex (open circles); ndh, encoding NDH-II (closed circles); cyoABCDE, encoding cytochrome bo (open triangles); cydAB, encoding cytochrome bd-I (open diamonds); appCB, encoding cytochrome bd-II (open squares). (B) Log2 expression levels of genes encoding proteins involved in methylglyoxal pathway and gene encoding pyruvate oxidase. Abbreviations for genes: mgsA, encoding methylglyoxal synthase (open circles); dld, encoding D-lactate dehydrogenase (open up-pointing triangles); ldhA, encoding NADdependent D-lactate dehydrogenase (open diamonds); lldD, encoding L-lactate dehydrogenase (open squares); poxB, encoding pyruvate oxidase (open down-pointing triangles). (C) Log2 expression levels of genes encoding proteins involved in EMP and ED pathways. Abbreviations for genes: edd, encoding 6-phosphogluconate dehydratase (open circles); eda, encoding KDPG aldolase (open squares); pfkA, encoding 6-phosphofructokinase. Data are the mean of two biological and two technical replicates. 129 CHAPTER 6 A consequence of these changes in the composition of the respiratory chain with respect to physiological regulation in the cells may be that the redox potential of the ubiquinone pool gradually increases with increasing growth rate because reaction rate on the reducing and oxidizing side of this pool increase and decrease, respectively. If so, this could be a preparative response towards anaerobiosis conditions that may often follow periods of high growth rate. The second major contribution to energy metabolism (or: ATP yield) of the cells is provided through substrate level phosphorylation. Although often the simplifying assumption is made that this proceeds via the Embden-Meyerhof-Parnas pathway exclusively, in fact also the Entner-Doudoroff pathway, the methylglyoxal shunt and pyruvate oxidase may contribute to glucose catabolism in E. coli (see e.g. (8) and references therein). To complicate things further, Figure 3B-C shows that the expression level of key enzymes from these pathways may vary considerably in the time window considered here. We therefore think that at this stage it is not possible to decide whether or not there is, beyond the decreasing relative contribution of maintenance energy, also a + - contribution due to an increased H /e ratio, ATP-yield per mole of glucose, or a general increase in the efficiency of anabolism of the cells. 6.3 Regulation of gene expression and metabolism in bacteria Until recently, single omic analyses such as transcriptome and proteome studies have been widely used in order to understand the regulatory mechanisms in, and behaviour of cells. Based on the central dogma, a correlation between the mRNA level and the corresponding protein abundance was generally assumed. However, several recent studies based on multi-omic analyses have shown that this correlation between the 2 mRNA- and protein levels often is rather low (R = 0.45-0.53), due to multiple factors such as the different half-lives between these two components, measurement errors, and possibly post-transcriptional regulation (9–13). Thus, estimation of the protein abundance using the mRNA expression level should be taken as a tentative indication only. In addition to this it is important to stress the importance of direct proteomics studies. At the single-cell level transcriptome and proteome analyses in E. coli have shown, at any specific point in time, a near-zero correlation. This lack of correlation was suggested to be caused by typical differences in mRNA- and protein lifetimes, as bacterial mRNAs are generally short-lived (i.e. 0.6–36 min) with widely varying lifetimes (see (11); Table S6), whereas the corresponding proteins generally have a longer lifetime than the cell division cycle. Therefore, the mRNA level at any particular time point can only reflect the transcriptional activity over a period of a few minutes, while the protein level at the same 130 GENERAL DISCUSSION time point will reflect the accumulated result of gene expression for a period that goes beyond the timescale of the cell cycle. Thus, the mRNA level averaged over a long period of time should show correlation with its corresponding protein level particularly for well expressed proteins. Also heterogeneity in the translation rate was suggested to be a major cause for this near zero correlation of the mRNA- and the corresponding protein levels at the single-cell resolution level in E. coli (11). Here, our data of the time-series analyses of the changes in the transcriptome and proteome exerted by the glucose pulse in E. coli (see Chapter 3) were in agreement with the above observation that regulation at the translation level may indeed be the main reason for the low correlation between these two sets of data observed in several studies and in ours. If the differences in lifetimes between the mRNA and protein molecules were a major cause, time-series analysis could reveal the relation between the mRNA and the protein levels. However, our data and the predicted outcome based on the central dogma (i.e. the change in mRNA level is proportional to the change in protein abundance) showed 2 even less correlation between the transcriptome and the proteome (i.e. R = 0.36) than the reported values mentioned above (see Chapter 3 and (14); Supplementary Figure S2). Carbon catabolite repression (CCR) controls the sequential use of the different carbon sources that are directly coupled to catabolism, for the production of energy and precursors for biomass production and growth in bacteria. Although this system has been extensively studied for many decades, the underlying mechanisms remain controversial (see e.g. (15)). Several models regarding the CCR phenomenon have been proposed; however, each model has its own limitations due to the different ways of approaching the problem. For example, current mechanistic models can explain some features of diauxic growth, but usually are limited to specific experimental scenarios that they have been designed for. The explanation could be that these models usually ignore regulatory connections that are unknown or so far unaccounted for, e.g. global effects due to physical constraints (i.e. molecular crowding, membrane occupancy, growth-rate dilution) and mechanisms controlling global physiological effects (i.e. general activity of the transcriptional and translational machinery, mRNA and protein stability, etc.) (16). Over the past 10 years, more and more complexity in bacterial gene regulation has been revealed via the combination of classic genetic- and biochemical studies, together with high-throughput technologies (i.e. omics approaches). The accumulated data derived from these technologies suggests that gene regulation in bacteria is more complex than previously predicted and closely resembles that in eukaryotes. Furthermore, gene regulation mechanisms, in the past often referred to as uncommon in bacteria, like for 131 CHAPTER 6 instance alternative transcripts within an operon, small RNAs (sRNAs), riboswitches, noncanonical ribosome binding motifs, leaderless mRNAs, etc., may instead be rather common (see Table 1 for bacterial transcriptome features that recall eukaryotic complexity) (for a review see (17)). Table 1. Bacterial transcriptome features that recall eukaryotic complexity (reproduced from (17)). Feature Present in eukaryotes? Present in bacteria? Small RNAs Yes Yes Complex promoter Yes, eukaryotes have many transcription Yes, but simpler than in eukaryotes regulation factors and other proteins Alternative transcripts Yes RNA processing Yes Yes, alternative promoters and terminators are present Yes, but mainly related to the regulation of degradation, with few examples of specific processing RNA splicing Yes Yes, but mainly restricted to organelles (mitochondria and chloroplasts), tRNAs and ribosomal RNAs, and there are few examples Polyadenylation Yes, polyadenylation stabilizes RNA Yes, polyadenylation destabilizes RNA Localized translation Yes, many mRNAs are transported to Yes, recent data show that mRNAs specific sites where they are translated in bacteria do not diffuse and are translated on specific nucleoid localizations Epigenetic modifications Yes Yes, but little information available Impact of chromatin and Yes, supercoiling and chromatin domains Yes, several examples have been nucleoid structure on have a major impact on transcription found transcriptional regulation Regulation of the CCR is complex and controlled at multiple levels, including the interplay between metabolism and signalling via proteins and metabolites, gene expression, and the physiological state of the cells (16, 18). Also, a recent study has revealed another layer of CCR’s post-transcriptional regulation that involves sRNAs. The Spot 42 sRNA regulates catabolic genes at the level of translation and/or mRNA stability, by forming a coherent feed-forward loop with CRP (19). Gene regulation in bacteria through regulatory RNAs has come into spotlight lately as they are key players in many cellular responses (see review (20)). In some bacteria, the number of sRNA can be as high as 10 to 20% of the total number of RNA molecules (21). In E. coli, more than one hundred 132 GENERAL DISCUSSION sRNAs have been predicted computationally; of these ~80 sRNAs have been experimentally validated (22). With the advances in the deep sequencing technique (23), discovery of new sRNAs can be anticipated in the near future. Because of these global effects on the regulation of the CCR response, we hypothesize that the post-transcriptional regulation may contribute a significant part to it. Therefore, by integrating data derived from the time-series analyses of the transcriptome and the proteome of E. coli following the glucose pulse, in combination with the assumption that genes presenting a deviating correlation between the change in the mRNA level and the corresponding protein abundance from the majority of gene/protein combinations must be influenced by a post-transcriptional regulation mechanism. Among 556 genes that were analysed, 96 genes (~17%) that are subjected to the post-transcriptional control (named here: post-transcriptionally regulated (PTR) gene) have been identified (see Chapter 3; Table 1 (p < 0.01) and Supplementary Table S4 (p < 0.05) of (14)). By comparing transcriptome and proteome data, larger fractions of genes that are post-transcriptionally regulated were reported in Bacillus subtilis during shift from glucose to malate and vice versa (i.e. ~37%) (24), and in E. coli during shift from slow growing- to fast growing cells (i.e. 56%) (25). Calculations and/or using absolute quantitative method for absolute amount of mRNA and protein copy number per cell are required in these experiments. In contrast to our method that is simpler and more robust in term of data processing. Hence, coverage of PTR genes in E. coli during shift from glucose de-repressed state to glucose repressed state can be broadening if further data processing be performed in combination with extra experiment if need be. To further investigate the involvement of sRNA regulation in the CCR response, as well as to identify possible sRNA regulators of the PTR genes identified in Chapter 3, a timeseries measurement of the 78 sRNAs reported in (26, 27) was analysed (see Chapter 4). A total of 46 sRNAs was found to change its expression level significantly (p < 0.01). The majority class of these altered sRNAs are of ‘unknown function’ (14 sRNAs), followed by ‘nutrient uptake’ and ‘metabolism related sRNAs’ (12 sRNAs) (Chapter 4; Figure 2). Spot 42 that has been reported to negatively control several catabolic genes (19), whose function was suggested to be to collaborate with the global transcriptional regulator CRP and to provide stringent control over catabolic genes (28, 29), was significantly up-regulated as expected. Similarly, up-regulation of sRNAs that negatively regulate genes for alternative catabolic routes, e.g. GcvB (30–32), ChiX (33), CsrC (34), and glmY/Z (35) are also according to expectation. Although hundreds of sRNA regulators have been identified, in various bacterial species, only a small fraction of them has been studied in detail, thus 133 CHAPTER 6 leaving a large gap in our knowledge regarding to contribution of the sRNAs in the regulation of cellular processes (29). As can be seen in our results, a large fraction of the altered sRNAs was of unknown function. This sets a limit to our interpretation and comprehensive understanding of the gene regulatory network of the CCR response. More basic research is urgently needed in order to understand the mode of action of these sRNA regulators. Therefore, finding possible mRNA targets, and the function of the 14 unknown sRNAs that were altered significantly in our study was tackled by using the sRNA target prediction tool, CopraRNA (36) (Chapter 4; Table 2), as a guide for the development of further hypotheses. The most promising candidates for further characterization and validation are IsrB, C0293, and RyeA, because their 5’UTR was predicted to contain a CRP binding site; thus they may play some role in the CCR response. Surprisingly, only the expression of IsrB, that also encodes the small protein AzuC (28 amino acids), was confirmed to be negatively regulated by CRP (37). Furthermore, the result of the CopraRNA prediction showed that most of the mRNA targets of these unknown sRNAs are involved either in carbohydrate metabolism or in regulation of the expression of membrane-associated proteins (Chapter 4; Table 2). Interestingly, isrB/azuC was the most abundant transcript in the array, i.e. even higher than Spot 42. However, small proteins are usually missed in the classical proteomics studies, due to the fact that fewer peptides are generated during their tryptic digestion. This thus makes it more difficult to identify these small proteins with confidence (37). So, no evidence for the presence of the AzuC protein was obtained here. Nevertheless, collective evidence from various bioinformatics studies has suggested that IsrB/AzuC may have a dual-function. Recently, several dualfunction RNA regulators have been discovered, like for instance RNAIII in Staphylococcus aureus, SgrS in enteric bacteria, SR1 RNA in B. subtilis, and PhrS RNA in Pseudomonas species (for review see (38)). However, little is known about to what extent these two functions overlap, as it appears that both physiologically distinct, and similar, functions of peptides encoded from sRNA regulators can be observed, and whether or not they are cooperative or competitive, as they are encoded on the same RNA molecule (29, 38). These unanswered questions warrant further research into this RNAome field. Furthermore, 41 (out of 96) PTR genes identified in Chapter 3 were predicted to be regulated by sRNA regulators (Chapter 4; Table 3), besides the 26 PTR genes that were previously reported to be regulated this way. However, none of the prediction was in agreement with the previous reports that used both direct experiment and bioinformatics approach. Nevertheless, computational predictions for mRNA targets of sRNA regulators 134 GENERAL DISCUSSION are still having high rates of both false positive and false negative predictions because not all interaction parameters are known, and based on what is known is also biased information (39) that is often used. In conclusion, the study described in Chapter 4 shows the possible contribution of a large number of sRNAs in the gene regulatory network responding to an abrupt change in glucose availability and the results described in this chapter can be used as a guideline for further study. Besides post-transcriptional controls, it gradually becomes clear that also posttranslational modifications (PTMs) play a significant role in the dynamic regulation of metabolism in bacteria. Bacterial PTMs generally are dynamic and reversible, thus allowing bacteria to respond rapidly to environmental changes, with respect to their cellular metabolic processes, and e.g. cell signalling (40, 41). Phosphorylation on serine, threonine or tyrosine (S/T/Y) side chains and acetylation on lysines are among the commonly found PTMs in central carbon metabolic enzymes (which in many cases are found on the same protein), as well as of several PTS components (42–46). Different ratios of phosphorylation of central carbon metabolic enzymes in E. coli, grown on different carbon sources, have been reported (41). Also, growth on glucose was shown to induce lysine acetylation in E. coli (47), suggesting that both phosphorylation and acetylation contribute a significant part in the regulation of metabolism and hence, in the CCR response. It has been shown that acetyl-CoA synthetase (Acs), that is involved in acetate metabolism, was modified by the Gcn5-like acetyltransferase, PatZ/YfiQ, which is + positively regulated by cAMP-CRP, and the NAD -dependent deacetylasec system (cobB) (48). Therefore, identification of other targets under control of the patZ/cobB system could provide further insight into regulation via acetylation (48). However later, protein acetylation in E. coli was shown to occur mainly through chemical acetylation by acetylphosphate (AcP), and is related to AcP metabolism (49). These reports raise the question in which condition non-enzymatic acetylation would be favored over enzymatic acetylation and vice versa. Also, the underlying mechanism that regulates these PTMs, and the question how these PTMs can influence cellular process and metabolism, still remain unanswered. Several questions regarding the overlap between phosphorylation and acetylation, e.g. how these two types of PTMs are coordinated and whether a change in phosphorylation leads to a change in acetylation or vice versa (50), as well as: how are these PTMs regulated dynamically, are still largely unknown. It is still challenging to accurately identify proteins with PTMs, due to the fact that these modified peptides are usually present in low levels. Therefore, specific enrichment processes prior MS analysis is often needed. Nevertheless, a list of proteins undergoing 135 CHAPTER 6 PTMs (i.e. phosphorylation and acetylation) is currently available (41–45), and expected to be further expanding. Thus this information can be exploited for future identification and dynamic quantification of modified proteins. We note here that our time-series proteomic data contain several proteins that are reported in literature to be phosphorylated and acetylated. However, extra data processing is needed in order to extract information on these PTMs in response to the glucose pulse applied in our studies. 6.4 Cyanobacterial cell factories Massive consumption of natural resources, especially fossil carbon, since the start of the industrial revolution, leads to the worrisome foresight of resource shortages in the near future. Moreover, the awareness of the link between an over-utilization of fossil carbon and a significant increase in the atmospheric CO2 level, that negatively affects the global environment, is now increasingly emphasized (51). Therefore, it is necessary to find alternatives for fossil energy that are sustainable and ecologically friendly. Advances in multidisciplinary approaches such as system biology, metabolic engineering, and synthetic biology are now making it possible to design organisms that can synthesize desired products, using abundantly free and clean solar energy to drive the processes. Photosynthesis-driven production of cyanobacterial biofuels/chemicals is one of the promising alternative and renewable sources. Cyanobacteria have several advantages over eukaryotic algae for industrial applications, as they present higher rates and efficiency of photosynthesis, faster growth, more facile genetic engineering, and have very simple nutrient requirements (52, 53). Multiple successful examples of biofuels/chemicals produced by cyanobacteria, at considerable product titers, have already been demonstrated (see reviews (54, 55)). However, the technology of cyanobacterial based biofuels/chemicals production is still in its infancy and not yet economically competitive with exploration of fossil carbon. Currently, the product titers obtained appear not to be high and stable enough, even for the best cyanobacterial cell factories to be used at the industrial scale (>1 M) (56, 57). This is not surprising, as the basic strategy to introduce a novel production pathway in cyanobacteria is nonphysiological condition. A number of challenges still need to be overcome in order to reach the full, promising, potential of the use of cyanobacterial cell factories for the production of chemical commodities and biofuels. Several reviews have been addressing these challenges (54, 55, 58, 59), which can be summarized into three separate categories: 1) Strain development, like e.g. improving photosynthesis, the rate of carbon fixation, target chemical tolerance, genetic stability. 136 GENERAL DISCUSSION 2) Improvement of titer/yield of the target chemical, for example via expression of high performing synthetic pathways, elimination of competing pathways, improvement of the synthetic pathway flux, e.g. via the addition of irreversible steps, and separation or combining (spatial or temporal) of reactions by using metabolic flux modeling. 3) Improvement on the scale-up processes, for example via development of a bioreactor design that allows high photosynthetic efficiency and high rates of gas exchange. The first two categories rely heavily on the knowledge derived from multidisciplinary studies and synthetic systems biology studies of the cyanobacteria. The third category requires an advance in bioreactor design that will not be the focus here (see review (59) for further details). In order to improve the yield of the target chemical, we need to understand cyanobacteria at the molecular and the cellular level, as well as the effect of the heterologously expressed pathways, and the often non-physiological end products synthesized via these pathways in the engineered cyanobacterial strains. Several studies focus on the effect of the extracellular addition of potential end products, like e.g. ethanol and butanol, at the global level of gene and protein expression (60–63). Considering that the highest product titres so far obtained with a cyanobacteria-based production system showed little stress in terms of growth inhibition, as compared to the wild type organisms, the physico-chemical stress of the titer of the end product can be modest at most (64). Other stresses, e.g. a major rechanneling of intermediary metabolism, and unbalanced cofactor usage in the engineered strain, might play more significant roles. Nevertheless, fewer studies have been published on the physiological consequences of these types of stress in the ‘cyanobacterial cell factories’. Therefore, in Chapter 5, we have carried out a comparative proteomics analysis between an ethanol- and a lactate-producing Synechocystis strain, relative to the corresponding wild type (WT) organism, in order to decipher the effect of the major rechanneling of intermediary metabolism. These two strains had been chosen because they represent examples of such a phenotype, i.e. in both strains > 50% of the fixed carbon is redirected directly into the product (65, 66). The results showed a major, distinct, metabolic rearrangement towards these 2 types of product. This is not surprising due to the fact that different strategies and pathways were used to maximize the product yield (Chapter 1; Figure 4). However, we had anticipated that this study might help to identify common bottlenecks in our synthetic approach. We found that in the ethanol- 137 CHAPTER 6 producing mutant significant up-regulation of proteins involved in the initial stages of CO2 fixation was detected (Chapter 5; Figure 4). In parallel, a decrease in the abundance of the protein synthesizing machinery of the cells and a slight induction of an oxidative stress response were observed. This result is in agreement with the phenotype observed in this strain that showed an overall increase in the CO2 fixation rate and decrease in the growth rate, as compared to the WT strain (Chapter 5; Figure 1). A similar increased rate of cellular CO2 fixation was observed in other engineered cyanobacteria with a high capacity carbon sink (66–68). The increased rate of CO2 fixation in the ethanol-producing strain might result from the increased cellular level of CO2 that is released as a by-product from the ethanol-forming pathway. Furthermore, regulation of the total capacity for CO2 fixation in the ethanol-producing strain was due to hierarchical regulation (Chapter 5; Figure 6). The same group of proteins that was significantly up-regulated in the ethanolproducing strain was not observed to be upregulated in the lactic acid-producing strain. Even though this strain showed that up to 50 % of the fixed carbon was directly rechanneled into lactic acid. Instead, we observed in this strain a strong decrease in growth rate and up-regulation of several uncharacterized/unknown proteins. While a significant down-regulation of ribosomal proteins was not observed, several central catabolic proteins and proteins involved in photosynthesis, and the light-harvesting phycobilisome complex were down-regulated significantly (Chapter 5; Figure 5). The most up-regulated protein detected in this strain is Slr5088 (i.e. ~11-fold) which recently was annotated as a probable short-chain dehydrogenase (69). This is presumably due to an + imbalance of the intracellular NADH/NAD ratio, caused by the over-expressed NADHdependent LDH. Several factors could contribute to the strong growth retardation + observed in this strain, like for example the imbalance of this NADH/NAD redox couple, and/or the overall decrease in photosynthesis. Whether or not these factors are behind the strong growth inhibition observed in this strain has yet to be revealed. It is relevant to + note that the imbalance of the NADH/NAD ratio could be overcome by employing NADPH-dependent enzymes, that have been shown to increase many product titers (55, 66, 68). Interestingly, recent metabolic modelling analyses have suggested that modification (i.e. lowering) of the ATP/NADPH utilization ratio can be used to increase product yield. Considering that the ATP/NADPH ratio required for product formation of ethanol and lactic acid is below the value estimated (based on the genome-scale stoichiometric network of Synechocystis sp. PCC 6803) for cellular growth, excess ATP must be consumed 138 GENERAL DISCUSSION or directed into other pathways in order to create a favorable condition for product synthesis (70, 71). This knowledge holds great promise towards constructing optimized cyanobacterial cell factories. In general, genetic instability is also one of the obstacles often encountered in metabolically engineered cyanobacteria. However, it is often overlooked due to the little information that is available about this aspect(72). The scarcity of information regarding the molecular mechanism of DNA acquisition, DNA recombination, and DNA repair in cyanobacteria is probably due to the fact that cyanobacteria, like e.g. Synechocystis sp. PCC6803 (hereafter Synechocystis) are naturally transformable. Thus, understanding the molecular mechanism of this process does not have high priority. However, recently PR Jones (2014) explicitly raised the question whether or not the genetic instability (indicated by loss of productivity) is an issue for development of cyanobacterial based biofuels/ chemicals production systems, because several diverse and often contradictory results were obtained (72). These diverse results were for example from J Qiao and colleagues (2012) who observed that the up-regulation of UvrA (slr1844, that is involved in DNA repair) along with several stress response proteins, upon ethanol exposure in Synechocystis (60) while D Dienst and colleagues (2014) had observed that a prolonged ethanol production in Synechocystis caused only minor changes in the transcriptome. Also, the latter authors did not report the loss of productivity nor the severe growth defect (73). This could be caused by the fact that the ethanol concentration produced in the latter study was 2.5-fold less than the concentration of ethanol used in the first study. No significant changes of proteins possibly involved in the generation of spontaneous mutations, including UvrA, RecA, and MutS, were observed in our proteomic result of both the ethanol- and the lactic acid-producing strain. Instead, we observed specifically in the lactic acid-producing mutant, a significant up-regulation of proteins involved in a CRISPR/Cas system i.e. Sll7065, Sll7087, and Sll7090. The CRISPR/cas systems are known to provide adaptive immune systems in bacteria and archaea (see reviews (74, 75)) against phage transduction, transformation, and conjugation as well as confer resistance to horizontal gene transfer (76). It is relevant to note that the lactic acid-producing mutant has been transformed with a self-replicating plasmid, to increase the level of LDH expression (66). However, other studies that have used a similar self-replicating plasmid did not report genetic instability or loss of productivity upon subculturing (65, 77). Therefore, at this stage we cannot conclude whether or not the CRISPR/Cas system contributes to the decreased genetic stability observed in our lactic-acid producing strain (see Chapter 5). The collective evidence nevertheless suggests that the severe growth 139 CHAPTER 6 defect often observed in cyanobacteria, and either caused by the exposure to a high concentration of product, or the major rechanneling of intermediary metabolism, may be a driving force to increase genetic instability. This speculation, however, needs to be further investigated. 6.5 Concluding remarks Advances in the recording and interpretation of omics-data, including transcriptomic-, proteomic-, and metabolomics data have brought us closer to the goal of the system biology approach, a comprehensive understanding of the biological systems of interest. However, for many systems several layers of gene regulation and biomolecular interactions are still missing from a completed picture, as was shown above for example in the CCR response. Little is known in bacteria about the contribution of gene regulation mediated by small non-coding RNAs, as well as about the relative contribution of posttranslational modification, to the organisms’ phenotype. This is not only true in E. coli but also in Synechocystis. In these two model organisms only a handful of reports have been published on these two aspects. Furthermore, while there are enormous amounts of data derived from omics analyses, basic biochemical knowledge of any particular gene or protein is lagging behind significantly. This, amongst others, has led to a situation in which several genes/proteins are still with an unknown function. Our results give further substance to the conclusion that also the small scale molecular studies, as opposed to the omics analyses, are important and should be strongly encouraged. 6.6 References 1. Lipson DA. 2015. The complex relationship between microbial growth rate and yield and its implications for ecosystem processes. Front Microbiol 6:615. 2. Borirak O, Bekker M, Hellingwerf KJ. 2014. Molecular physiology of the dynamic regulation of carbon catabolite repression in Escherichia coli. Microbiology 160:1214–1223. 3. Beardmore RE, Gudelj I, Lipson DA, Hurst LD. 2011. Metabolic trade-offs and the maintenance of the fittest and the flattest. Nature 472:342–346. 4. Björklöf K, Zickermann V, Finel M. 2000. Purification of the 45 kDa, membrane bound NADH dehydrogenase 5. David P, Baumann M, Wikström M, Finel M. 2002. Interaction of purified NDH-1 from Escherichia coli with 6. Bekker M, de Vries S, Ter Beek A, Hellingwerf KJ, de Mattos MJT. 2009. Respiration of Escherichia coli can of Escherichia coli (NDH-2) and analysis of its interaction with ubiquinone analogues. FEBS Lett 467:105–110. ubiquinone analogues. Biochim Biophys Acta BBA - Bioenerg 1553:268–278. be fully uncoupled via the nonelectrogenic terminal cytochrome bd-II oxidase. J Bacteriol 191:5510–5517. 140 GENERAL DISCUSSION 7. Tseng CP, Albrecht J, Gunsalus RP. 1996. Effect of microaerophilic cell growth conditions on expression of the aerobic (cyoABCDE and cydAB) and anaerobic (narGHJI, frdABCD, and dmsABC) respiratory pathway genes in Escherichia coli. J Bacteriol 178:1094–1098. 8. Sharma P, Hellingwerf KJ, de Mattos MJ, Bekker M. 2012. Uncoupling of substrate-level phosphorylation in Escherichia coli during glucose-limited growth. Appl Environ Microbiol 78:6908–6913. 9. Nie L, Wu G, Zhang W. 2006. Correlation between mRNA and protein abundance in Desulfovibrio vulgaris: a multiple regression to identify sources of variations. Biochem Biophys Res Commun 339:603–610. 10. Lu P, Vogel C, Wang R, Yao X, Marcotte EM. 2007. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 25:117–124. 11. Taniguchi Y, Choi PJ, Li G-W, Chen H, Babu M, Hearn J, Emili A, Xie XS. 2010. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329:533–538. 12. Kramer G, Sprenger RR, Nessen MA, Roseboom W, Speijer D, de Jong L, de Mattos MJ, Back J, de Koster CG. 2010. Proteome-wide alterations in Escherichia coli translation rates upon anaerobiosis. Mol Cell Proteomics MCP 9:2508–2516. 13. Maier T, Schmidt A, Güell M, Kühner S, Gavin A-C, Aebersold R, Serrano L. 2011. Quantification of mRNA and protein and integration with protein turnover in a bacterium. Mol Syst Biol 7:511. 14. Borirak O, Rolfe MD, de Koning LJ, Hoefsloot HCJ, Bekker M, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. 2015. Time-series analysis of the transcriptome and proteome of Escherichia coli upon glucose repression. Biochim Biophys Acta 1854:1269–1279. 15. Gorke B, Stulke J. 2008. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients. Nat Rev 6:613–624. 16. Kremling A, Geiselmann J, Ropers D, de Jong H. 2015. Understanding carbon catabolite repression in Escherichia coli using quantitative models. Trends Microbiol 23:99–109. 17. Güell M, Yus E, Lluch-Senar M, Serrano L. 2011. Bacterial transcriptomics: what is beyond the RNA horizome? Nat Rev Microbiol 9:658–669. 18. Berthoumieux S, de Jong H, Baptist G, Pinel C, Ranquet C, Ropers D, Geiselmann J. 2013. Shared control of gene expression in bacteria by transcription factors and global physiology of the cell. Mol Syst Biol 9:634. 19. Beisel CL, Storz G. 2011. The base-pairing RNA spot 42 participates in a multioutput feedforward loop to help enact catabolite repression in Escherichia coli. Mol Cell 41:286–297. 20. Waters LS, Storz G. 2009. Regulatory RNAs in bacteria. Cell 136:615–628. 21. Romby P, Charpentier E. 2010. An overview of RNAs with regulatory functions in gram-positive bacteria. Cell Mol Life Sci CMLS 67:217–237. 22. Raghavan R, Groisman EA, Ochman H. 2011. Genome-wide detection of novel regulatory RNAs in E. coli. Genome Res 21:1487–1497. 23. Wang Z, Gerstein M, Snyder M. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63. 24. Buescher JM, Liebermeister W, Jules M, Uhr M, Muntel J, Botella E, Hessling B, Kleijn RJ, Chat LL, Lecointe F, Mäder U, Nicolas P, Piersma S, Rügheimer F, Becher D, Bessieres P, Bidnenko E, Denham EL, Dervyn E, Devine KM, Doherty G, Drulhe S, Felicori L, Fogg MJ, Goelzer A, Hansen A, Harwood CR, Hecker M, Hubner S, Hultschig C, Jarmer H, Klipp E, Leduc A, Lewis P, Molina F, Noirot P, Peres S, Pigeonneau N, Pohl S, Rasmussen S, Rinn B, Schaffer M, Schnidder J, Schwikowski B, Dijl JMV, Veiga P, Walsh S, Wilkinson AJ, Stelling J, Aymerich S, Sauer U. 2012. Global Network Reorganization During Dynamic Adaptations of Bacillus subtilis Metabolism. Science 335:1099–1103. 141 CHAPTER 6 25. Valgepea K, Adamberg K, Seiman A, Vilu R. 2013. Escherichia coli achieves faster growth by increasing catalytic and translation rates of proteins. Mol Biosyst 9:2344–2358. 26. Argaman L, Hershberg R, Vogel J, Bejerano G, Wagner EG, Margalit H, Altuvia S. 2001. Novel small RNAencoding genes in the intergenic regions of Escherichia coli. Curr Biol CB 11:941–950. 27. Wassarman KM, Repoila F, Rosenow C, Storz G, Gottesman S. 2001. Identification of novel small RNAs using comparative genomics and microarrays. Genes Dev 15:1637–1651. 28. Beisel CL, Storz G. 2011. Discriminating tastes: physiological contributions of the Hfq-binding small RNA Spot 42 to catabolite repression. RNA Biol 8:766–770. 29. Bobrovskyy M, Vanderpool CK. 2013. Regulation of Bacterial Metabolism by Small RNAs Using Diverse Mechanisms. Annu Rev Genet 47:209–232. 30. Urbanowski ML, Stauffer LT, Stauffer GV. 2000. The gcvB gene encodes a small untranslated RNA involved in expression of the dipeptide and oligopeptide transport systems in Escherichia coli. Mol Microbiol 37:856– 868. 31. Pulvermacher SC, Stauffer LT, Stauffer GV. 2009. The small RNA GcvB regulates sstT mRNA expression in Escherichia coli. J Bacteriol 191:238–248. 32. Pulvermacher SC, Stauffer LT, Stauffer GV. 2009. Role of the sRNA GcvB in regulation of cycA in Escherichia coli. Microbiol Read Engl 155:106–114. 33. Plumbridge J, Bossi L, Oberto J, Wade JT, Figueroa-Bossi N. 2014. Interplay of transcriptional and small RNAdependent control mechanisms regulates chitosugar uptake in Escherichia coli and Salmonella. Mol Microbiol 92:648–658. 34. Romeo T, Vakulskas CA, Babitzke P. 2013. Posttranscriptional regulation on a global scale: Form and function of Csr/Rsm systems. Environ Microbiol 15:313–324. 35. Reichenbach B, Maes A, Kalamorz F, Hajnsdorf E, Görke B. 2008. The small RNA GlmY acts upstream of the sRNA GlmZ in the activation of glmS expression and is subject to regulation by polyadenylation in Escherichia coli. Nucleic Acids Res 36:2570–2580. 36. Wright PR, Richter AS, Papenfort K, Mann M, Vogel J, Hess WR, Backofen R, Georg J. 2013. Comparative genomics boosts target prediction for bacterial small RNAs. Proc Natl Acad Sci U S A 110:E3487–96. 37. Hemm MR, Paul BJ, Miranda-Ríos J, Zhang A, Soltanzad N, Storz G. 2010. Small Stress Response Proteins in Escherichia coli: Proteins Missed by Classical Proteomic Studies. J Bacteriol 192:46–58. 38. Vanderpool CK, Balasubramanian D, Lloyd CR. 2011. Dual-function RNA regulators in bacteria. Biochimie 93:1943–1949. 39. Storz G, Vogel J, Wassarman KM. 2011. Regulation by small RNAs in bacteria: expanding frontiers. Mol Cell 43:880–891. 40. Grangeasse C, Stülke J, Mijakovic I. 2015. Regulatory potential of post-translational modifications in bacteria. Front Microbiol 6. 41. Lim S, Marcellin E, Jacob S, Nielsen LK. 2015. Global dynamics of Escherichia coli phosphoproteome in central carbon metabolism under changing culture conditions. J Proteomics 126:24–33. 42. Macek B, Gnad F, Soufi B, Kumar C, Olsen JV, Mijakovic I, Mann M. 2008. Phosphoproteome Analysis of E. coli Reveals Evolutionary Conservation of Bacterial Ser/Thr/Tyr Phosphorylation. Mol Cell Proteomics 7:299– 307. 43. Zhang J, Sprung R, Pei J, Tan X, Kim S, Zhu H, Liu C-F, Grishin NV, Zhao Y. 2009. Lysine Acetylation Is a Highly Abundant and Evolutionarily Conserved Modification in Escherichia Coli. Mol Cell Proteomics MCP 8:215– 225. 142 GENERAL DISCUSSION 44. Zhang K, Zheng S, Yang JS, Chen Y, Cheng Z. 2013. Comprehensive Profiling of Protein Lysine Acetylation in Escherichia coli. J Proteome Res 12:844–851. 45. Soares NC, Spät P, Krug K, Macek B. 2013. Global Dynamics of the Escherichia coli Proteome and Phosphoproteome During Growth in Minimal Medium. J Proteome Res 12:2611–2621. 46. Soufi B, Soares NC, Ravikumar V, Macek B. 2012. Proteomics reveals evidence of cross-talk between protein modifications in bacteria: focus on acetylation and phosphorylation. Curr Opin Microbiol 15:357–363. 47. Lima BP, Antelmann H, Gronau K, Chi BK, Becher D, Brinsmade SR, Wolfe AJ. 2011. Involvement of protein acetylation in glucose-induced transcription of a stress-responsive promoter. Mol Microbiol 81:1190–1204. 48. Castaño-Cerezo S, Bernal V, Blanco-Catalá J, Iborra JL, Cánovas M. 2011. cAMP-CRP co-ordinates the expression of the protein acetylation pathway with central metabolism in Escherichia coli. Mol Microbiol 82:1110–1128. 49. Weinert BT, Iesmantavicius V, Wagner SA, Schölz C, Gummesson B, Beli P, Nyström T, Choudhary C. 2013. Acetyl-Phosphate Is a Critical Determinant of Lysine Acetylation in E. coli. Mol Cell 51:265–272. 50. Beltrao P, Bork P, Krogan NJ, van Noort V. 2013. Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 9. 51. IPCC. 2014. Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part A: Global and Sectoral Aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Field, C.B., V.R. Barros, D.J. Dokken, K.J. Mach, M.D. Mastrandrea, T.E. Bilir, M. Chatterjee, K.L. Ebi, Y.O. Estrada, R.C. Genova, B. Girma, E.S. Kissel, A.N. Levy, S. MacCracken, P.R. Mastrandrea, and L.L. White (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA. 52. Quintana N, Kooy FV der, Rhee MDV de, Voshol GP, Verpoorte R. 2011. Renewable energy from Cyanobacteria: energy production optimization by metabolic pathway engineering. Appl Microbiol Biotechnol 91:471–490. 53. Machado IMP, Atsumi S. 2012. Cyanobacterial biofuel production. J Biotechnol 162:50–56. 54. Nozzi NE, Oliver JWK, Atsumi S. 2013. Cyanobacteria as a platform for biofuel production. Synth Biol 1:7. 55. Savakis P, Hellingwerf KJ. 2015. Engineering cyanobacteria for direct biofuel production from CO2. Curr Opin Biotechnol 33:8–14. 56. Deng M-D, Coleman JR. 1999. Ethanol Synthesis by Genetic Engineering in Cyanobacteria. Appl Environ Microbiol 65:523–528. 57. Gao Z, Zhao H, Li Z, Tan X, Lu X. 2012. Photosynthetic production of ethanol from carbon dioxide in genetically engineered cyanobacteria. Energy Environ Sci 5:9857–9865. 58. Wijffels RH, Barbosa MJ, Eppink MHM. 2010. Microalgae for the production of bulk chemicals and biofuels. Biofuels Bioprod Biorefining 4:287–295. 59. Olivieri G, Salatino P, Marzocchella A. 2014. Advances in photobioreactors for intensive microalgal production: configurations, operating strategies and applications. J Chem Technol Biotechnol 89:178–195. 60. Qiao J, Wang J, Chen L, Tian X, Huang S, Ren X, Zhang W. 2012. Quantitative iTRAQ LC-MS/MS proteomics reveals metabolic responses to biofuel ethanol in cyanobacterial Synechocystis sp. PCC 6803. J Proteome Res 11:5286–5300. 61. Wang J, Chen L, Huang S, Liu J, Ren X, Tian X, Qiao J, Zhang W. 2012. RNA-seq based identification and mutant validation of gene targets related to ethanol resistance in cyanobacterial Synechocystis sp. PCC 6803. Biotechnol Biofuels 5:89–6834–5–89. 62. Tian X, Chen L, Wang J, Qiao J, Zhang W. 2013. Quantitative proteomics reveals dynamic responses of Synechocystis sp. PCC 6803 to next-generation biofuel butanol. J Proteomics 78:326–345. 143 CHAPTER 6 63. Anfelt J, Hallström B, Nielsen J, Uhlén M, Hudson EP. 2013. Using Transcriptomics To Improve Butanol Tolerance of Synechocystis sp. Strain PCC 6803. Appl Environ Microbiol 79:7419–7427. 64. Wijffels RH, Kruse O, Hellingwerf KJ. 2013. Potential of industrial biotechnology with cyanobacteria and eukaryotic microalgae. Curr Opin Biotechnol 24:405–413. 65. Savakis PE, Angermayr SA, Hellingwerf KJ. 2013. Synthesis of 2,3-butanediol by Synechocystis sp. PCC6803 via heterologous expression of a catabolic pathway from lactic acid- and enterobacteria. Metab Eng 20:121– 130. 66. Angermayr SA, van der Woude AD, Correddu D, Vreugdenhil A, Verrone V, Hellingwerf KJ. 2014. Exploring metabolic engineering design principles for the photosynthetic production of lactic acid by Synechocystis sp. PCC6803. Biotechnol Biofuels 7:99–6834–7–99. eCollection 2014. 67. Ducat DC, Silver PA. 2012. Improving Carbon Fixation Pathways. Curr Opin Chem Biol 16:337–344. 68. Lan EI, Liao JC. 2012. ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proc Natl Acad Sci U S A 109:6018–6023. 69. Kramm A, Kisiela M, Schulz R, Maser E. 2012. Short-chain dehydrogenases/reductases in cyanobacteria. FEBS J 279:1030–1043. 70. Knoop H, Steuer R. 2015. A computational analysis of stoichiometric constraints and trade-offs in cyanobacterial biofuel production. Synth Biol 47. 71. Erdrich P, Knoop H, Steuer R, Klamt S. 2014. Cyanobacterial biofuels: new insights and strain design strategies revealed by computational modeling. Microb Cell Factories 13. 72. Jones PR. 2014. Genetic Instability in Cyanobacteria – An Elephant in the Room? Front Bioeng Biotechnol 2. 73. Dienst D, Georg J, Abts T, Jakorew L, Kuchmina E, Borner T, Wilde A, Duhring U, Enke H, Hess WR. 2014. Transcriptomic response to prolonged ethanol production in the cyanobacterium Synechocystis sp. PCC6803. Biotechnol Biofuels 7:21–6834–7–21. 74. Sorek R, Lawrence CM, Wiedenheft B. 2013. CRISPR-Mediated Adaptive Immune Systems in Bacteria and Archaea. Annu Rev Biochem 82:237–266. 75. van der Oost J, Westra ER, Jackson RN, Wiedenheft B. 2014. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol 12:479–492. 76. Marraffini LA, Sontheimer EJ. 2008. CRISPR Interference Limits Horizontal Gene Transfer in Staphylococci by Targeting DNA. Science 322:1843–1845. 77. Guerrero F, Carbonell V, Cossu M, Correddu D, Jones PR. 2012. Ethylene synthesis and regulated expression of recombinant protein in Synechocystis sp. PCC 6803. PloS One 7:e50470. 144 Summary/Samenvatting SUMMARY Summary The main aim of this thesis was to investigate the physiological response of bacteria responding to physiological‐ and engineered changes in their central carbon metabolism through the application of multi‐omics analyses techniques. The level of the proteome, which is the key functional component of biological systems, and its relationship with its corresponding transcriptome, are the focus of this research. The results described in this thesis have revealed an additional significant layer of regulation of the Carbon Catabolite Repression (CCR) response (i.e. via small, regulatory RNAs) in Escherichia coli. In addition, a number of physiological‐ and stress responses, caused by overproduction of chemical commodities in Synechocystis cell factories, have been uncovered. Chapter 1 gives an overview of the system biology approach including current techniques used for data generation, i.e. omics analyses, as well as a brief introduction of applying computational models in the system biology. Next, I describe the rationale behind, and current knowledge of, the systems selected for this study, i.e. the CCR response in E. coli, and chemical commodity production in engineered Synechocystis cells. In E. coli, the CCR is directly linked to the catabolism as well as to the anabolism of the cell and hence, determines the growth rate and growth yield of the bacteria. Improved understanding of this system may help to further improve applications of this organism, e.g. in fermentation‐based production of chemical commodities using mixed carbon sources such as lignocellulose, and in finding new antimicrobial drug targets. Next, a brief history is provided of the CO2‐based production of chemical commodities, based on catalysis by light‐driven metabolism in cyanobacteria. Next, the properties of cyanobacterial strains engineered to over‐produce ethanol‐ and lactic acid, respectively are discussed, as well as current information on the stress imposed by such engineering and by these end‐products. In Chapter 2, we present a method that allows accurate sampling for time‐series multi‐ omics analyses in E. coli responding to a glucose‐pulse. A condition in which CCR is absent was generated using E. coli grown under glucose‐limitation in the chemostat. Following the glucose pulse, the induction of the CCR was confirmed using RT‐PCR. The physiological response of E. coli to such a glucose pulse was recorded dynamically. An increased growth rate and a transient decrease of the efficiency of aerobic catabolism were observed after the glucose pulse. In parallel, a maximal growth rate and recovery of the efficiency of catabolism were observed after less than half of a doubling time of the organism. Our results show that there is a positive linear relationship between the specific glucose uptake rate (qs) and the apparent H+/e‐ ratio, up to a growth rate of 0.4 h‐1, while biomass 146 SUMMARY yield on glucose (YX/s) drops during the first 15 minutes, followed by a gradual recovery. Surprisingly, the growth yields after the recovery phase show values even higher than the maximum “theoretical” yield. These combined results suggest that there is a positive correlation between the growth rate and growth efficiency; however, further investigation is needed to verify this result. Chapter 3 provides information on the dynamic response of the transcriptome and proteome of E. coli elicited by a glucose pulse, following the well established method described in Chapter 2. Changes in the transcriptome and the corresponding proteome were analysed using statistical procedures designed specifically for time‐series data. We have shown that the relationship between the transcriptome and proteome during the CCR response is low (R2 = 0.36). Subsequently, based on the lack of correlation between these two sets of data, 96 post‐transcriptionally regulated (PTR) genes involved in the glucose repression response were identified. This gene list provides candidates for future in‐depth investigation of the molecular mechanisms involved in post‐transcriptional control during the CCR response in E. coli. Based on the time‐series transcript analysis of 79 small RNAs reported in Chapter 3, in combination with currently available bioinformatics tools, in Chapter 4, we have analyzed a possible contribution of small regulatory RNA (sRNA) to the gene regulatory network of the CCR response. Of a total of 46 sRNAs the expression level changed significantly during this response. The amount of information regarding sRNA regulation in E. coli is limited and the majority of the significantly altered sRNAs observed here is of unknown function. In silico analysis of these sRNAs has revealed possible functions of 14 unknown sRNAs, as well as possible sRNA regulators of several of the PTR genes identified in Chapter 3. Also, we have provided a list of new sRNA regulators that may contribute to the CCR response in E. coli, which includes IsrB/AzuC, C0293, and RyeA. This chapter serves as a primer for further study into the field of RNAomics and bacterial physiology. In Chapter 5, we show that the high degree of carbon partitioning observed in the ethanol‐ and the lactic acid producing Synechocystis strain is accommodated by distinct responses. In the ethanol‐producing strain, increased levels of mRNA and the corresponding protein level, involved in the initial stages of CO2 fixation, along with decreased abundance of the translation machinery of the cells, is observed. This observation was in agreement with physiological observations on the ethanol‐producing strain. In contrast, no significant change in these specific groups of proteins was observed in the lactic acid‐producing strain, except for several proteins with unknown function. Notably, in this strain we did observe a significantly increased level of two CRISPR 147 SUMMARY associated proteins. These latter results emphasize how little we know about cyanobacteria in general, and especially about their stress response, genetic stability, and gene‐ and metabolic regulation. Chapter 6 then discusses the results obtained in this thesis in the broader context of the currently available knowledge from the literature. Also some future perspectives on each of the topics studied are provided. 148 SAMENVATTING Samenvatting Het hoofddoel van het onderzoek beschreven in dit proefschrift was om met multi‐omics technieken te beschrijven op welke manier bacterien reageren op natuurlijke en ge‐ engineerde veranderingen in hun centrale metabolisme, via veranderingen in hun interne‐ en externe milieu. Het hoofdaccent in de uitgevoerde studie ligt op metingen van het niveau de componenten van het proteoom van de gekozen model organismen – hun meest belangrijke functionele component – in relatie tot de niveaus van de overeenkomstige mRNAs. De resultaten die met dit onderzoek verkregen zijn en in dit proefschrift beschreven worden laten zien dat er een extra laag van moleculaire, functionele, interacties actief is in het regel‐netwerk van het proces van kataboliet repressie (KR) in Escherichia coli, nl. dat van de zogenaamde ‘kleine RNAs’ (sRNAs). Daarnaast zijn een aantal fysiologische en stress‐geinduceerde aanpassingen ontdekt in ge‐engineerde stammen van de cyanobacterie Synechocystis, die een specifieke chemische commodity verbinding overproduceren. In hoofdstuk 1 wordt een overzicht gegeven van de systeem‐biologische aanpak van onderzoek op cellulair niveau, inclusief een beschrijving van de grootschalige analyse technieken voor karakterisering van het cellulaire proteoom en transcriptoom. Daarbij wordt het nut van het gebruik van een combinatie van data verkregen met deze technieken en een computationele analyse daarvan, en van het te onderzoeken systeem, benadrukt. Vervolgens worden de twee model systemen geintroduceerd waarop dit onderzoek zich heeft toegespitst, namelijk het systeem van KR in de enterobacterie Escherichia coli, en de productie van commodity chemicalien met behulp van ge‐ engineerde Synechocystis stammen. In E. coli is de KR is direct gekoppeld aan zowel het anabolisme als het katabolisme van de cellen, en daarom is het systeem van KR een belangrijke factor die mee bepaalt hoe groot de groeisnelheid en de groeiopbrengst van de cellen zal zijn onder specifieke groeiomstandigheden. Een beter begrip van de KR response kan er toe bijdragen dat industriele processen gebaseerd op de grootschalige kweek van E. coli, bijvoorbveeld om specifieke commodity chemicalien te produceren vanuit een mengsel van verschillende suikers (zoals die bijvoorbeeld uit lignocellulose verkregen kunnen worden), efficienter kan verlopen. Mogelijk kan deze kennis ook bijdragen aan het vinden van betere targets voor het ontwikkelen van nieuwe antibiotika tegen (infecties van) enterobacterien. Vervolgens wordt in het kort de ontwikkeling geschetst van de aanpak van ‘directe conversie’, waarin een ge‐engineerde cyanobacterie en zonlicht gebruikt worden, om water en kooldioxide rechtstreeks om te zetten in commodity chemicalien. In dit verband 149 S AMENVATTING worden twee ge‐engineerde Synechocystis stamen in meer detail besproken, die meer dan 50 % van de gefixeerde kooldioxide rechtstreeks omzetten in respectievelijk ethanol en melkzuur. Ook wordt de stress (en de reactie van het organisme daarop) besproken die deze hoge mate van ombuiging van de koolstof flux in deze ge‐engineerde cyanobacterien veroorzaakt. In hoofdstuk 2 wordt een methode gepresenteerd die het mogelijk maakt monsters te nemen voor nauwkeurige, dynamische, multi‐omics analyses in E. coli, waarin de KR geinduceerd wordt door een plotselinge verhoging (tot 50 mM) van de extracellulaire glucose concentratie (i.e. een pulse glucose). Er is voor gezorgd in dit experiment dat KR niet aanwezig is voordat de glucose wordt toegevoegd, door de cellen bij relatief lage groeisnelheid in een chemostaat te kweken onder goed ge‐aereerde omstandigheden onder glucose limitatie en de initiatie van de KR door de glucose pulse is met RT‐PCR gebaseerde analyses bevestigd. De response van E. coli op een glucose pulse is geanalyseerd door samples te nemen na verschillende reactietijden. Het directe fysiologische gevolg van de toevoeging van glucose is een verhoging van de groeisnelheid van de cellen en een tijdelijke verlaging van de groeiopbrengst op glucose. De maximale groeisnelheid van het organisme wordt na ongeveer 30 minuten bereikt, terwijl de efficientie van het katabolisme van glucose in ongeveer dezelfde periode zich herstelt. Onze resultaten laten zien dat er een positieve lineaire correlatie is tussen de specifieke snelheid van glucose opname (qs) en de berekende H+/e‐ verhouding, tot de groeisnelheid tot 0.4 h‐1 is toegenomen, terwijl de biomassa opbrengst op glucose (YX/s) in de eerste 15 minuten daalt, gevolgd door een geleidelijk herstel in de volgende 45 minuten. Na een uur is de groeiopbrengst per hoeveelheid gekataboliseerd glucose gestegen tot waarden die zelfs boven de waardes liggen die meestal als maximum worden verondersteld voor deze grootheid. Deze gezamenlijke resultaten versterken het vermoeden dat er een positieve correlatie aanwezig is tussen de snelheid en de katabole efficientie van de groei van E. coli; echter dit en de moleculaire achtergrond hiervan moeten nog verder onderzocht worden. In hoofdstuk 3 worden de gegevens gepresenteerd betreffende onze analyse van de dynamiek van het transcriptoom en het proteoom van E. coli, opgewekt door initiatie van de KR, in het systeem zoals beschreven in hoofdstuk 2. De verkregen meetresultaten zijn geanalyseerd met statistische methoden die specifiek ontwikkeld zijn voor analyse van dynamische (i.e. tijd‐serie) data. Hiermee konden we laten zien dat de relatie tussen de verhouding van de hoeveelheid van de transcriptoom‐ en corresponderende proteoom componenten, die door KR veranderen, laag is (R2 = 0.36). Gebaseerd op deze lage graad 150 SAMENVATTING van correlatie tussen de componenten van deze twee data sets konden 96 post‐ transcriptioneel gereguleerde (PTR) genen, betrokken bij de KR response in E. coli, geidentificeerd worden. Deze lijst van genen kan in de toekomst gebruikt worden voor de identifikatie van genen waarvan het mechanisme van post‐transcriptionele regulatie van expressie gedurende KR verder in detail bestudeerd wordt. De transcriptoom analyse beschreven in hoofdstuk 3 bevat ook data omtrent het expressie niveau van 79 sRNAs. In hoofdstuk 4 worden deze data geanalyseerd met recent ontwikkelde bioinformatika‐gereedschappen, om de bijdrage van deze sRNAs aan het regulatie netwerk van de KR response verder in kaart te brengen. Van 46 van deze sRNAs verandert het expressive niveau in de KR response gedurende het eerste uur na toevoeging van glucose op een significante manier. De hoeveelheid informatie die tot nu toe beschikbaar is omtrent de bijdrage van sRNAs aan de regulatie van gen expressie in E. coli is beperkt en het is daarom dan ook niet verrassend dat aan de meerderheid van deze 46 sRNAs in de literatuur nog geen functie kon worden toegekend. In silico analyse van deze sRNAs heeft van 14 van hen een mogelijke functie aan het licht gebracht, samen met een aantal mogelijke sRNA regulatoren van de PTR genen die geidentificeerd waren in hoofdstuk 3. Ook heeft deze analyse een lijst opgeleverd van nieuwe sRNA regulatoren die mogelijk een bijdrage leveren aan de regulatie van de KR in E. coli; deze lijst omvat tenminste IsrB/AzuC, C0293, and RyeA. Dit hoofdstuk (4) is bedoeld als een primer voor verder onderzoek in het veld van RNAomics en de fysiologie van bacterien. In hoofdstuk 5 laten we zien dat de hoge graad van omzetting van CO2 direct naar de (commodity) producten ethanol en melkzuur, in twee ge‐engineerde Synechocystis stammen, tot heel verschillende fysiologiche aanpassingen in het proteoom van deze respectievelijke stammen leidt. In de ethanol‐producerende stam is een toegenomen expressive te zien, zowel op mRNA als op eiwit niveau, van een aantal componenten betrokken bij de initiele stadia van CO2 fixatie, samen met een afname in de expressie niveaus van de translatie machinerie van de cellen. Laatstgenoemde waarneming was in overeenstemming met resultaten van fysiologische experimenten met de ethanol‐ producerende stam. In tegenstelling hiermee werd in de melkzuur overproducerende stam waargenomen dat er geen significante veranderingen optraden in deze beide groepen van genen. Wel waren in deze stam een aantal eiwitten verhoogd tot expressie gebracht waarvan voor de meesten nog geen functie is opgehelderd. Een uitzondering in laatsgenoemde groep vormen twee eiwitten die een CRISPR‐geassocieerde functie hebben. De resultaten uit dit hoofdstuk laten ook zien hoe weinig we nog weten van cyanobacterien in zijn algemeenheid en meer specifiek van de moleculaire basis van de 151 S AMENVATTING stress response, genetische stabiliteit, en de regulatie van gen‐expressie en het metabolisme in deze organismen. Hoofdstuk 6 bespreekt vervolgens het totaal van de reultaten uit de eerdere hoofdstukken van dit proefschrift in de bredere context van wat er tot nu toe in de literatuur bekend is rond het onderzoeksthema van dit proefschrift. Ook worden in dit hoofdstuk een aantal van de perspectieven besproken die de resultaten van dit onderzoek bieden voor verder wetenschappelijk onderzoek rond dit thema. 152 List of publications LIST OF PUBLICATIONS Borirak O, de Koning LJ, van der Woude AD, Hoefsloot HCJ, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. Quantitative proteomics analysis of an ethanol- and a lactate-producing mutant strain of Synechocystis sp. PCC6803. Biotechnol Biofuels. 2015;8:111. Borirak O, Rolfe MD, de Koning LJ, Hoefsloot HCJ, Bekker M, Dekker HL, Roseboom W, Green J, de Koster CG, Hellingwerf KJ. Time-series analysis of the transcriptome and proteome of Escherichia coli upon glucose repression. Biochim Biophys Acta. 2015 Oct;1854(10 Pt A):1269–79. Borirak O, Bekker M, Hellingwerf KJ. Molecular physiology of the dynamic regulation of carbon catabolite repression in Escherichia coli. Microbiology. 2014 Jun 1;160(Pt_6):1214– 23. ‡ ‡ ‡ Borirak O , Samanphan P , Dangtip S , Kiatpathomchai W, Jitrapakdee S. Promoter motifs essential to the differential transcription of structural and non-structural genes of the white spot syndrome virus. Virus Genes. 2009 Oct;39(2):223–33. Borirak O and Hellingwerf KJ. Contribution of regulatory RNAs to the post-transcriptional control of Carbon Catabolite Repression in Escherichia coli (in preparation). ‡ These authors contributed equally to this work. 154 Acknowledgements ACKNOWLEDGEMENTS Finally, another chapter of my life, so called ‘PhD phase’ has been accomplished. I am now ready to face any challenges. Before my new life’s chapter begins, however, I would like to have an opportunity to express my gratitude and appreciation to people who help me during the time I was doing my PhD and staying in Amsterdam. First of all, I would like to thank my promotor and supervisor, Klaas who gave me an opportunity to join in the group of Molecular Microbial Physiology (MMP) at UvA. From the very first day that I arrived and you picked me up at the airport, I have learned a lot from you not exclusively in the field of science but also managing skills. I am deeply grateful for your help both in guiding me through my PhD research, and solving all the endless problem of bureaucracy. Your dedication and passion for science is inspiring. I am really fortunate to have a chance to work with you. Also, I would like to express my sincere gratitude to my second promoter, Chris who gave me deep insight into Mass Spectrometry field as well as many useful tips and suggestions on my ongoing research. I would also like to express my deepest gratitude to the Higher Education Commission, Thailand for the financial support. Also all the Thai government staffs who taking care of my scholarship. Many thanks go to Ms. Wataya Singhasa for managing all the bureaucracy concerning my scholarship in Amsterdam. Special thanks to my co‐promotor, Leo who has contributed to a significant part in my research especially in the Mass Spectrometry aspect. Without your help, enthusiasm, encouraging, and suggestion, it was impossible to complete this thesis. Also I would like to express my gratitude to Huub for your willingness to participate in my research especially in the Statistical analysis. I am deeply appreciated for your patient to my random daily questions regarding Statistics. Many thanks to Martijn who has introduced me to the lab and guided me into topic during the first year of my PhD as well as accepting being part of my thesis committee. I would also like to thank the remaining members of my thesis committee who spent their valuable time to evaluate my thesis: Prof. Dr. Hans V. Westerhoff, Prof. Dr. Bas Teusink, Prof. Dr. Age Smilde, and Dr. Filipe Branco dos Santos. My sincere gratitude goes to Dr. Matthew D. Rolfe and Prof. Dr. Jeffrey Green at Sheffield University, UK. Your cooperation regarding Microarray analysis was highly appreciated. I would also like to express my deepest thanks to Henk and Winfried who have introduced me to the Mass Spectrometry lab, giving me useful tips and suggestion regarding mass spectrometry workflow as well as helping me with every problem concerning mass spectrometry and HPLC. Talking with you two are always interesting and 156 ACKNOWLEDGEMENTS enjoyable! Also many thanks to Dennis and Jos for taking care of the MMP lab and making sure that we will never run out of necessary things. Jos, I really enjoy our conversation and picture sharing session. You are the best officemate! My gratitude goes to the remaining of my officemates as well as colleagues (from both Roeterseiland and Science Park): Kornel, Yusuke, Marijke, Aleksandra, Jeroen, Pascal, Aniek, Rosanne, Milou, Poonam, Philipp, Vinod, Que, Wei, and Parsa. Kornel and Yusuke, I really enjoy our conversation both scientific and non‐scientific topics. Many thanks go to Kornel for your friendship, and for helping me in the lab during the beginning at Roeterseiland. Without you I would never find a filter! Special thanks to Yusuke for your endless support and wonderful Japanese meal! I truly value our friendship, and I hope to see you and Airi and the kids in Japan soon! For the rest, thanks for creating such a friendly and work stimulating environment in both the office and the lab. Also big thanks to the members of Mass Spectrometry of Biomacromolecules group: Hansuk, Wishwas, Sascha, and Linli. It’s a pleasure sharing the lab space with you. Hansuk, thanks for your friendship and support. I hope we will be doing cooperation on research in the near future. Another group of people that I must mention here are the lovely members of the lunch group/coffee break: Alice, Andreas, Aleksandra, Nadine, Clemens, Philipp, Jeroen, Pascal, and Soraya, without you my life at the UvA would be boring. I truly value our conversation (even it bordered on insanity) and all the activities we have done together. Alice, Nadine, and Aleksandra thanks for your friendship, support, and countless fun activities that we have shared (I could never tell them all!). It’s been a wonderful time. I am really fortunate to meet you. Thanks for these unforgettable memories! I will really miss you and I do hope that we would manage to visit each other regularly. I would also like to thanks people who I met outside work. Uli thanks for your support and friendship. I will always remember how beautiful Freiburg is. Nol, Chayada, Weina, Paula, Jiraporn, thanks for your friendship and support. Bernd and Grzegorz, thanks for the lovely trip to Gdnask and many other things we have done together. Last but not least: ขอขอบคุณกําลังใจจากทางบ้ าน ทั ้งคุณพ่อและคุณแม่ รวมทั ้งคุณพี่ทั ้งสองคนที่คอย ให้ กําลังใจ สัง่ สอนและสนับสนุน ไม่ให้ ยอ่ ท้ อทั ้งในเรื่ องการเรี ยน และเรื่ องอื่นๆ 157
© Copyright 2026 Paperzz