Stochastic Modelling of Transcription
Tom Brown
Supervisors: Dr Andrew Angel, Prof. Jane Mellor
Project 1

Abstract
The regulation of gene expression and RNA production is a tightly controlled process that can have major negative effects on organisms if compromised. In addition to genes being transcribed into functional mRNAs, non-coding RNA is produced, which has regulatory roles in the cell. Antisense transcription has been associated with a number of key gene regulatory roles, both at the transcriptional level and once the non-coding antisense transcript has been produced. This project uses mathematical modelling to compare data from two mutant strains of Saccharomyces cerevisiae, with and without antisense transcription across the GAL1 gene. Through a stochastic model and comparison to single molecule transcript data, the key aspects of gene regulation governed by antisense transcription are highlighted. The model developed makes clear the impact antisense transcription has on the entire RNA life-cycle, from the initiation of transcription at the DNA to the degradation of RNA in the cytoplasm.

Introduction
Transcription is one of the fundamental processes underlying variation in organisms, dictating their abilities to function, adapt and survive in the face of changing environments. During transcription, the 4 DNA bases are copied to complementary RNA nucleotides. Functional mRNAs produced from coding genes are translated into functional proteins, while non-coding RNAs (ncRNAs) can have further effects on the regulation of gene expression [1]. The tightly regulated control of transcription plays vital roles in an organism's development and in its response to environmental cues. If genes are not correctly transcribed, organisms will experience irregular levels of protein and RNA, which can be lethal to cells.

The traditional view of gene regulation was that transcription factors initiate transcription, producing functional mRNA and proteins, which may themselves further influence transcription. The underlying process is now believed to be much more complex, with a number of additional factors influencing gene regulation. These include: non-coding antisense transcription, that is, transcription in the opposite orientation to the gene or 'sense' strand [2, 3]; modifications to chromatin and DNA [4]; and the act of transcription itself [5]. Failure to respond correctly to external stimuli, or incorrect recruitment of transcription factors through ncRNA or other regulatory mechanisms, can have serious clinical consequences, such as immune disorders [6] or cancer [7].

Steps involved in transcription
For a gene to be transcribed successfully, the promoter region upstream of the gene must be accessible for RNA polymerase to bind and begin the initiation phase of transcription. During initiation, RNA polymerase recognises the promoter region of the gene and, using initiation factors, binds to the promoter. The RNA polymerase recruits a number of elements of the transcription machinery to form the pre-initiation complex (PIC). Once bound, the RNA polymerase and the initiation factors separate the two DNA strands so that the template strand is exposed to the polymerase and can be read. The exposed strand then enters the transcription machinery, beginning the elongation process. The RNA polymerase progresses along the template strand, reading the single nucleotides and producing the complementary RNA strand. When the polymerase reaches the gene terminator, the completed RNA strand is released from the RNA polymerase. Several proteins act on the RNA strand as part of processing to prevent early degradation, assist nuclear export and provide a template for the translational machinery. Once mRNAs have been processed, they are exported to the cytoplasm, where they undergo protein translation and are ultimately degraded (Fig. 1) [8].

Fig. 1: A schematic of transcription and the key processes involved. During transcription, a gene promoter can be in either an active or an inactive configuration (1). Once in an active configuration, RNA polymerase binds to the promoter, initiating transcription (2). The polymerase moves along the gene during the elongation phase (3), copying the DNA bases into corresponding RNA bases. Once complete, the mRNA and polymerase dissociate from the gene (4). The mRNA is then exported to the cytoplasm (5), where it will be degraded (6).

Antisense transcription
A major focus of the work carried out here is the impact antisense transcription has on gene regulation, and in particular which effects can be detected at the mechanistic level: promoter activity, transcription initiation, polymerase elongation, nuclear export and cytoplasmic degradation.

Antisense transcription is initiated and regulated through mechanisms similar to those of sense transcription, arising from independent promoters; from bidirectional promoters, which promote transcription in both the sense and antisense directions [9, 10]; and from cryptic promoters situated within genes. In S. cerevisiae, most antisense transcripts arise from bidirectional promoters [9, 11], with some arising from cryptic promoters [12] or from the terminator of the sense gene [13].

Regulation of sense transcription
Both antisense transcripts and the act of antisense transcription itself can affect a number of stages of gene expression [14]. Sense transcription initiation can be affected through promoter competition, where the assembly of the transcription machinery at one promoter can impact its assembly at another promoter [15]. RNA polymerase transcribing in the antisense direction can block the binding site of a sense promoter, leaving polymerase unable to bind to the sense promoter. Furthermore, the act of antisense transcription can alter chromatin modifications or DNA methylation in higher eukaryotes [15, 16, 17]. In S. cerevisiae, antisense transcription arising from internal cryptic promoters has been shown to modify the chromatin of the associated sense gene directly, delaying the initiation of transcription through hypoacetylation of the nucleosomes surrounding the promoter of the sense gene [16]. Recent evidence suggests that sense and antisense transcripts could recruit histone-modifying proteins to the locus, altering the acetylation and methylation status of key histone marks. Sense and antisense transcription were found to be associated with elevated levels of histone methylation and acetylation, respectively, at the gene body and promoter [5]. The acetylation marks left by antisense transcription could allow easier passage of the sense-transcribing polymerase.

Antisense transcription could also affect elongation of sense transcription: sense RNA polymerase elongation can be halted through collision with RNA polymerase moving in the antisense direction [15]. Antisense transcription can also affect which transcript isoforms are produced from the sense locus. By altering the splice sites, it can affect mRNA processing, potentially resulting in poor nuclear export and likely degradation by the nuclear exosome [15]. There is evidence that antisense transcripts can further affect the mRNA once it has exited the RNA polymerase. Antisense transcripts binding to the 5' region of the sense transcript have been shown to increase the translational efficiency of some genes, and pairing of sense and antisense molecules can make the sense mRNA either more or less stable. This form of regulation relies on both the sense and antisense molecules being present in the cell simultaneously, which makes it less likely to occur; evidence for it has so far been reported in mammalian cells [18, 19].
The mathematical model presented here will be used to detect changes at these stages of transcription and to identify where antisense transcription most affects gene regulation. Comparing strains with and without antisense transcription across a locus will highlight which of the above mechanisms are the key means by which regulation through antisense transcription is mediated.

Fig. 2: Example data used in this study, obtained through single molecule fluorescence in situ hybridization (smFISH). For each cell, the DNA has been stained using DAPI (blue), showing the nucleus, and the mRNA molecules of the gene of interest are bound by fluorescent single-stranded DNA probes with the complementary sequence (red). Bright red foci in the nucleus show one or more mRNA transcripts undergoing transcription or awaiting nuclear export. Foci outside the nucleus were counted as cytoplasmic mRNA.

Stochastic behaviour of gene expression
Previous studies have analysed gene expression in individual S. cerevisiae [20] and mammalian cells [21]. While population-average estimates of gene expression show consistent levels of expression under similar conditions, the expression levels in individual cells vary widely. In S. cerevisiae, genes have been observed to undergo bursting expression: short periods during which the promoter region is in an active state and large numbers of RNA molecules are transcribed. As a result, the number of transcripts per cell at a single time-point varies greatly across a population [20].
Due to this variation in gene expression across cell populations, transcription is modelled here by considering the underlying transcription mechanisms of each cell, rather than modelling transcription based on population averages. Given the underlying noise in gene expression, transcription is treated as a series of stochastic reactions at the single-cell level. A model of individual cells will capture the inherent variation in gene expression across the population, and the changes in the rates of the underlying reactions involved in the life-cycle of RNA will cast light on any mechanistic differences that can be detected with and without antisense transcription.

Methods

Transcript data
Transcript data in this study were obtained through single molecule fluorescence in situ hybridization (smFISH), which allows spatial localisation of the RNA molecules in a cell at a fixed moment in time. This technique involves probing fixed cells with single-stranded DNA probes complementary to the sequence of the gene of interest, with each probe carrying a fluorescent dye [22]. Once a probe has bound to the appropriate RNA sequence, it fluoresces under laser excitation. By further staining the nucleus with DAPI (4',6-diamidino-2-phenylindole), one can determine how many transcripts are present in both the nucleus and the cytoplasm of each cell at a fixed moment in time [23] (Fig. 2).

Any transcribing RNA polymerase will result in a partial RNA transcript emerging from the transcription machinery, and probes will bind to this RNA, producing a nuclear spot on the raw image. To capture this behaviour, the model developed includes both complete and partial RNA transcripts for comparison to the smFISH data. Given this level of detail, it is possible to obtain distributions of the levels of cytoplasmic mRNA and of the number of transcripts held in the nucleus awaiting elongation termination or nuclear export.

Fig. 3: Schematic of the GAL1-GAL10 loci. The two strains of S. cerevisiae used in this study each had truncated forms of GAL1. The strain SH9 had a scrambled TATA box in its terminator, resulting in a reduction in the amount of antisense transcription across GAL1. Upstream of GAL1, GAL10 shares a bidirectional promoter with GAL1. The GAL1 antisense transcription terminates in the bidirectional promoter, whereas the GAL10 antisense, beginning inside the GAL10 open reading frame, has two forms, one terminating in the promoter and one continuing through into the sense GAL1 transcript [5].

Saccharomyces cerevisiae strains
To analyse the effect of antisense transcription on the mechanisms governing gene expression, data were collected from similar mutant strains of S. cerevisiae, one of which has reduced antisense transcription across the GAL1 locus. The strains, labelled SH9 and ADH4, both have engineered GAL1 genes, the expression of which is stimulated in galactose [24]. In the SH9 strain, the TATA box in the terminator of the GAL1 gene (TATA boxes are associated with RNA polymerase II binding [25]) is scrambled, reducing the amount of antisense transcription across the GAL1 sequence to approximately 5% of wild-type levels [5]. The ADH4 strain has an intact terminator and levels of antisense transcription across GAL1 comparable to wild-type levels but, like the SH9 strain, has a shortened GAL1 sequence, resulting in identical sense GAL1 mRNA molecules in both strains. The alterations to the GAL1 gene also affect transcription of the GAL10 gene, found upstream of GAL1 and transcribed on the opposite strand. The GAL1 and GAL10 genes share a bidirectional promoter, in which the GAL1 antisense transcript and one form of the GAL10 antisense transcript terminate [5] (Fig. 3). The reduction in antisense transcription across the GAL1 locus will therefore affect the regulation of both GAL1 and GAL10. This system presents the opportunity to identify the effect antisense transcription has on its corresponding sense gene and on any genes downstream of the antisense transcription. The GAL10 smFISH data collected were of higher quality than the GAL1 data, so the GAL10 data were used for further development of the model.

Stochastic modelling
The transcriptional system was modelled as a stochastic process with 6 core reactions, where the rate of each reaction depended on the current state of the system. The reactions were:

inactive gene → active gene (1)
active gene → inactive gene (2)
active gene → active gene + elongating Pol II (3)
elongating Pol II → nuclear transcript (4)
nuclear transcript → cytoplasmic transcript (5)
cytoplasmic transcript → ∅ (6)

where ∅ represents the loss of an mRNA molecule through cytoplasmic degradation. The rate constants of reactions (1)-(6) are denoted on, off, init, elong, exp and deg, respectively. While previous work has modelled the rates at which mRNA transcripts are transcribed, translated and degraded, these models have not taken nuclear retention into account [20, 22, 26, 27]. Some of the reactions used in this model are incorporated in the model analysed in [27]; however, the introduction of reactions for transcription elongation and nuclear export aims to reveal more about the mechanistic changes in gene regulation.

Due to the inherent noisiness of single-cell gene expression, a stochastic Gillespie algorithm was used to model the underlying reactions [28]. In this algorithm each reaction is assigned a probability of occurring based on the number of molecules in the system that could undergo that reaction. Each time-step is modelled as a memoryless Markovian process in which the system progresses based entirely on the current number and distribution of the different molecules present. The time to the next reaction is modelled as an exponential random variable whose rate, the total propensity, is set by the current state of the system. Once a reaction time has been chosen, which reaction occurs is chosen randomly, weighted by the propensities of the individual reactions:

1. At time t, compute the propensity $a_i$ of each reaction i, given the current state of the system.
2. Generate two random numbers uniformly on the interval (0, 1): $r_i \sim U(0, 1)$, i = 1, 2.
3. Compute the time to the next reaction:
$$\tau = \frac{1}{a_0}\log\frac{1}{r_1}, \qquad a_0 = \sum_{i=1}^{n} a_i \tag{7}$$
The next reaction occurs at time t + τ.
4. Compute which reaction occurs next by finding k such that
$$\frac{1}{a_0}\sum_{i=1}^{k-1} a_i \le r_2 < \frac{1}{a_0}\sum_{i=1}^{k} a_i \tag{8}$$
where k can take the value of any reaction (here numbered 1-6).
5. Update the molecule numbers according to reaction k occurring, then repeat from step 1 until $t > T_{max}$.

[Fig. 4 plots: (A) Distribution of cytoplasmic transcripts after 7 minutes in glucose; (B) Distribution of cytoplasmic transcripts after 15 minutes in glucose. Axes: number of cytoplasmic transcripts vs frequency.]
Fig. 4: Data from shutdown experiments, in which the environment of S. cerevisiae cells that had been undergoing high levels of GAL transcription in galactose was changed from galactose to glucose, which stops transcription across the GAL genes. Shown are the distributions of cytoplasmic transcripts 7 minutes (A) and 15 minutes (B) after glucose was added to the cells' environment. Over the 8 minute period, the frequency of cells containing lower numbers of transcripts increases, demonstrating the shut-down in transcription.

To count the number of nuclear and cytoplasmic transcripts in each cell at a fixed time point, the model was run for 500 time-steps to allow the system to reach a steady state (Fig. S1), and the number of full transcripts in the cytoplasm and the number of full and partial transcripts in the nucleus were counted at the end of the simulations. This method was chosen to represent the measured data as closely as possible, reflecting the single-time snapshot of the data, and ignored any partially degraded transcripts in the cytoplasm.

Determining degradation rates
The system of equations in the above form suffered from a problem of scale: if the rate at which transcripts were exported from the nucleus increased, a balance could be struck by increasing the rate at which transcripts were degraded in the cytoplasm. As a result, equally good fits to the data were found by increasing corresponding rates together, giving an infinite space of parameter fits. To reduce the search space, and correspondingly the solution space, the degradation rate for each strain was determined separately by performing further simulations.

To determine the degradation rate of the GAL10 cytoplasmic mRNA, results from a glucose shut-down experiment were modelled. In this experiment, cells were grown for 2 hours in galactose, causing expression of the GAL genes. Glucose was then introduced to the cells' environment, causing transcription of the GAL genes to shut down. After roughly 7 minutes in glucose, it was assumed that RNA polymerase had stopped binding to the promoter region and no new transcripts were produced in the nucleus [24]. Given the low number of transcripts in the nucleus, the cytoplasmic system could then be approximated as only undergoing degradation:

cytoplasmic transcript → ∅ (9)

The number of transcripts present in the cytoplasm at the 7 minute time point of the galactose to glucose shut-down was calculated, and the 15 minute data were normalised to have the same number of cells as the 7 minute data. All of the cytoplasmic transcripts were then simulated as existing in the same environment, each being degraded with the same probability, determined by the degradation rate as in the Gillespie algorithm above. For each degradation rate, the entire population underwent 8 minutes of simulated reactions, with cytoplasmic transcripts being degraded at the current degradation rate. The degradation rates were varied from 0.0001 to 0.2 min⁻¹ in steps of 0.0001 min⁻¹, and a degradation rate was recorded if the number of transcripts remaining in the simulated system after 8 minutes was the same as the number of transcripts in the normalised 15 minute data. These simulations were repeated 2,000 times, recording each instance in which a degradation rate resulted in a successful simulation. The histogram of successful degradation rates was then fitted to a Beta distribution, chosen to encompass the fixed interval of possible degradation rates and to obtain a modal value for the degradation rates. The modal value of the fitted Beta distribution, signifying the peak in successful degradation rates, was taken to be the true degradation rate for the ADH4 and SH9 strains.

[Fig. 5 plots: (A) ADH4 GAL10 degradation rates, mode: 0.045269; (B) SH9 GAL10 degradation rates, mode: 0.028703. Axes: degradation rate (min⁻¹) vs accepted frequency.]
Fig. 5: GAL10 degradation rates in the presence and absence of antisense transcription. Various degradation rates were tested against the shutdown experimental data to determine the true degradation rates in the two strains. Shown are histograms of the successful simulated GAL10 degradation rates from shut-down experiments, together with fitted Beta distributions used to determine the modal degradation rate. The calculated degradation rates for the GAL10 mRNA were 0.045 min⁻¹ (A) and 0.029 min⁻¹ (B) for the +antisense (ADH4) and -antisense (SH9) strains, respectively.

Searching the parameter space
Only the degradation rate could be calculated from the shut-down data, so to gain insight into the rates of the other 5 reactions in the system, the nuclear and cytoplasmic transcript data were used. To determine which sets of parameters created distributions that best explained the data, parameters were sampled and tested using Markov Chain Monte Carlo (MCMC) simulations via a Metropolis-Hastings algorithm, varying the 5 unknown parameters at each simulation step [29, 30]. Initial values were chosen as unbiased, equal values of 0.5 min⁻¹ and allowed to vary with equal standard deviation of 0.02 over the positive real numbers. This reflected the fact that it was not known accurately how long each reaction takes in reality, so the parameters were allowed to vary unconstrained, further allowing any inter-dependent parameters (of which there are numerous in the system, as discussed above) to vary on the same scale. At each step, new parameters were chosen by drawing random numbers from a Normal distribution with mean given by the current parameter value and a fixed standard deviation of 0.02. These sets of parameters were then used to simulate 500 time steps for 4,500 cells to determine the distributions of the numbers of nuclear and cytoplasmic transcripts, allowing the system to reach a steady state and capturing the underlying distribution (Fig. S1). A modified χ² statistic was used as a goodness-of-fit test for each simulated set of parameters:

$$\chi^2 = \frac{N_{cyt}}{N_{cyt}+N_{nuc}}\sum_{i=0}^{N_{nuc}}\frac{(O_i-E_i)^2}{E_i} + \frac{N_{nuc}}{N_{cyt}+N_{nuc}}\sum_{i=0}^{N_{cyt}}\frac{(O_i-E_i)^2}{E_i} \tag{10}$$

where the first sum runs over the nuclear distribution and the second over the cytoplasmic distribution, and $N_{nuc}$ and $N_{cyt}$ are the numbers of nuclear and cytoplasmic data points with at least 5 counts, respectively. $O_i$ represents the number of observed cells with i transcripts and $E_i$ the number of expected cells with i transcripts from the simulated model, only including those values of i where $O_i \ge 5$ and $E_i > 0$. The χ² statistic was chosen as a goodness-of-fit test to incorporate all recorded data points and to penalise greater relative differences between the distributions: the same absolute difference between the raw data and the simulated results is penalised more heavily when it represents a larger relative difference with respect to the simulated data ($E_i$). This weighted χ² statistic gave equal weight to the nuclear data and the cytoplasmic data, regardless of how many data points were included in each, to avoid overfitting one data set at the expense of the other.

This statistic was obtained at the end of each set of simulations for the sampled parameter values, and the parameters used for these simulations were accepted if a random number drawn from the interval (0.5, 1) was less than the ratio of the old statistic to the new statistic:

$$r < \frac{\chi^2_{old}}{\chi^2_{new}}, \qquad r \sim U(0.5, 1) \tag{11}$$

The new parameters were therefore always accepted if the new χ² statistic was smaller than the previously accepted statistic. If the new statistic was larger than the previous statistic but less than double it, it was accepted with probability $2\left(\chi^2_{old}/\chi^2_{new} - 0.5\right)$. If the tested parameters resulted in a statistic too large to be accepted, the new rates were rejected and the parameter values remained at the last accepted values. Once a set of parameters was accepted, it was used as the mean when choosing the sample parameters for the next step in the iteration. This method was used to ensure a thorough search of the parameter space, avoiding being restricted to local minima, while preventing exploration of superfluous areas of the parameter space in order to save computational time.
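The Gillespie steps described above, applied to reactions (1)-(6), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this project: the function names, the dictionary keys (`on`, `off`, `init`, `elong`, `exp`, `deg`), the use of simulated minutes for `t_max`, and starting every cell with an inactive gene and no transcripts are all hypothetical choices. The `snapshot` helper mimics the smFISH counting by reporting partial plus complete nuclear transcripts and complete cytoplasmic transcripts at a fixed time.

```python
import math
import random

# State: (gene_active, elongating_pol_ii, nuclear_transcripts, cytoplasmic_transcripts).
# Net change in the state caused by each of reactions (1)-(6).
STOICHIOMETRY = [
    (+1, 0, 0, 0),   # (1) inactive gene -> active gene
    (-1, 0, 0, 0),   # (2) active gene -> inactive gene
    (0, +1, 0, 0),   # (3) active gene -> active gene + elongating Pol II
    (0, -1, +1, 0),  # (4) elongating Pol II -> nuclear transcript
    (0, 0, -1, +1),  # (5) nuclear transcript -> cytoplasmic transcript
    (0, 0, 0, -1),   # (6) cytoplasmic transcript -> degraded
]


def propensities(state, rates):
    """Step 1: propensity a_i of each reaction in the current state."""
    gene, pol, nuc, cyt = state
    return [
        rates["on"] * (1 - gene),
        rates["off"] * gene,
        rates["init"] * gene,
        rates["elong"] * pol,
        rates["exp"] * nuc,
        rates["deg"] * cyt,
    ]


def gillespie(rates, t_max, rng, state=(0, 0, 0, 0)):
    """Simulate one cell from t = 0 and return its state at time t_max."""
    t, state = 0.0, list(state)
    while True:
        a = propensities(state, rates)
        a0 = sum(a)
        if a0 == 0.0:
            break                               # no reaction can fire any more
        r1 = 1.0 - rng.random()                 # step 2: r1 in (0, 1], so log(1/r1) is finite
        r2 = rng.random()
        tau = math.log(1.0 / r1) / a0           # step 3, Eq. (7)
        if t + tau > t_max:
            break                               # next event falls after the snapshot time
        t += tau
        threshold, cumulative = r2 * a0, 0.0    # step 4, Eq. (8)
        for k, a_k in enumerate(a):
            cumulative += a_k
            if threshold < cumulative:
                break
        else:                                   # guard against round-off at the top end
            k = max(i for i, a_i in enumerate(a) if a_i > 0)
        state = [s + d for s, d in zip(state, STOICHIOMETRY[k])]  # step 5
    return tuple(state)


def snapshot(rates, n_cells, t_max, seed=0):
    """smFISH-like snapshot of many independent cells."""
    rng = random.Random(seed)
    nuclear, cytoplasmic = [], []
    for _ in range(n_cells):
        _, pol, nuc, cyt = gillespie(rates, t_max, rng)
        nuclear.append(pol + nuc)               # partial + complete nuclear transcripts
        cytoplasmic.append(cyt)
    return nuclear, cytoplasmic
```

With the hypothetical rates used in the check below, the sample means should be close to the steady-state values init·on/(deg(on+off)) = 5 cytoplasmic and init·on/(elong(on+off)) + init·on/(exp(on+off)) = 3 nuclear transcripts per cell.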
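The shut-down procedure for estimating the degradation rate can be sketched as below. Assumptions: each run is a pure-death Gillespie simulation of all 7 minute cytoplasmic transcripts over the 8 minutes to the 15 minute time point, the normalised 15 minute total has been rounded to an integer `target`, and the Beta distribution is fitted by the method of moments on the interval [0, upper]. The text does not say how the Beta fit was performed, so that step is a swapped-in choice, and all function names are hypothetical.

```python
import random


def decay_survivors(n0, k, t_end, rng):
    """Pure-death Gillespie run: n0 transcripts, each degraded at rate k (per minute)."""
    n, t = n0, 0.0
    while n > 0 and k > 0:
        t += rng.expovariate(k * n)    # waiting time to the next degradation event
        if t > t_end:
            break
        n -= 1
    return n


def scan_degradation_rates(n0, target, t_end=8.0, k_step=1e-4, k_max=0.2,
                           n_repeats=2000, seed=0):
    """Record every grid rate whose simulated survivor count equals `target`."""
    rng = random.Random(seed)
    grid = [k_step * (j + 1) for j in range(int(round(k_max / k_step)))]
    accepted = []
    for _ in range(n_repeats):
        for k in grid:
            if decay_survivors(n0, k, t_end, rng) == target:
                accepted.append(k)
    return accepted


def beta_mode(samples, upper=0.2):
    """Method-of-moments Beta fit on [0, upper]; returns the mode of the fit."""
    if len(samples) < 2:
        raise ValueError("need at least two accepted rates")
    x = [s / upper for s in samples]
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    if v == 0.0:
        return samples[0]              # all accepted rates identical
    common = m * (1 - m) / v - 1       # equals a + b for the fitted Beta
    a, b = m * common, (1 - m) * common
    if a <= 1 or b <= 1:
        raise ValueError("fitted Beta has no interior mode")
    return upper * (a - 1) / (a + b - 2)
```

The defaults reproduce the grid described above (0.0001 to 0.2 in steps of 0.0001, 2,000 repeats), which is slow in pure Python; the check below uses a coarser grid and synthetic totals of 300 and 218 transcripts, chosen so that the underlying rate is about ln(300/218)/8 ≈ 0.04 min⁻¹.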
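The goodness-of-fit statistic of Eq. (10), the acceptance rule of Eq. (11) and the Gaussian proposal step can be sketched as follows. Assumptions: simulated cell counts are rescaled to the number of observed cells before computing E_i, N_nuc and N_cyt count the observed bins with at least 5 cells, and proposals at or below zero are redrawn to keep the rates positive. The `accept` function implements Eq. (11) as written; all names are hypothetical.

```python
import math
import random
from collections import Counter


def chi2_terms(observed, simulated, min_count=5):
    """One sum of Eq. (10) plus N, the number of observed bins with >= min_count cells.

    observed, simulated: per-cell transcript counts (e.g. nuclear counts).
    Assumption: simulated frequencies are rescaled to the observed cell number.
    """
    obs, sim = Counter(observed), Counter(simulated)
    scale = len(observed) / len(simulated)
    total, n_bins = 0.0, 0
    for i, o_i in obs.items():
        if o_i < min_count:
            continue
        n_bins += 1
        e_i = sim.get(i, 0) * scale
        if e_i > 0:                        # only bins with O_i >= 5 and E_i > 0
            total += (o_i - e_i) ** 2 / e_i
    return total, n_bins


def weighted_chi2(obs_nuc, sim_nuc, obs_cyt, sim_cyt):
    """Eq. (10): cross-weighting gives nuclear and cytoplasmic data equal importance."""
    s_nuc, n_nuc = chi2_terms(obs_nuc, sim_nuc)
    s_cyt, n_cyt = chi2_terms(obs_cyt, sim_cyt)
    if n_nuc + n_cyt == 0:
        return math.inf
    return (n_cyt * s_nuc + n_nuc * s_cyt) / (n_cyt + n_nuc)


def accept(chi2_old, chi2_new, rng):
    """Eq. (11): accept when r < chi2_old / chi2_new, with r ~ U(0.5, 1)."""
    if chi2_new == 0.0:
        return True
    return rng.uniform(0.5, 1.0) < chi2_old / chi2_new


def propose(params, rng, sd=0.02):
    """Normal step around the last accepted values; non-positive draws are redrawn."""
    new = {}
    for name, value in params.items():
        x = rng.gauss(value, sd)
        while x <= 0.0:
            x = rng.gauss(value, sd)
        new[name] = x
    return new
```

Because r is drawn from (0.5, 1), a ratio of at least 1 is always accepted, a ratio of at most 0.5 is never accepted, and ratios in between are accepted with probability 2(ratio − 0.5).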
An exhaustive grid search of the 5-dimensional parameter space, by contrast, would require each sampled value of each parameter to be tested against every sampled value of the other 4 parameters. For a sample space of 100 values per parameter, this would involve testing 10^10 parameter sets, requiring approximately 3,000 hours of computation time (48,000 CPU hours on 2.0 GHz Xeon Sandy Bridge processors running Red Hat Enterprise Linux).

Results

Degradation rates
Applying the degradation simulations to data from a shutdown experiment on RNA from the GAL10 gene, the degradation rates were determined for the cytoplasmic RNA in both the ADH4 and SH9 strains. Fig. 5 shows the accepted simulated degradation rates. For the GAL10 mRNA, the degradation rates were 0.045 min⁻¹ in the ADH4 strain (Fig. 5A) and 0.029 min⁻¹ in the SH9 strain (Fig. 5B). These correspond to half-lives of 15.4 minutes and 24.8 minutes. A recent study calculated the average mRNA half-life in S. cerevisiae to be approximately 11.5 minutes [31], suggesting that the values calculated from the shutdown simulations are reasonable. These degradation rates were then used as fixed values in the MCMC parameter search, keeping the degradation rate fixed and varying the other 5 parameters as explained above.

Inter-dependency of different rates
The parameter search simulations were performed using the determined degradation rates to find the sets of parameters that explained the experimental distributions (Fig. 6). By approximating the gene promoter as having reached a steady state, in which the promoter is in an active configuration with probability $\frac{\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, the entire system can be approximated as a queueing network with 3 infinite-capacity servers representing the elongation, export and degradation events. Simplifying this model to a succession of three M/M/∞ queues, meaning that arrivals follow a Poisson process, processing times are exponentially distributed (memoryless) and infinitely many servers are available to process arrivals, the mean number of transcripts at each queue can be calculated [32]. Using Mathematica [33], the stationary distribution of the cytoplasmic transcripts was calculated to be Poisson with mean $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{deg}(\mathrm{on}+\mathrm{off})}$. The stationary distribution of the nuclear transcripts was given by a sum of two Poisson-distributed variables, with means $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{elong}(\mathrm{on}+\mathrm{off})}$ and $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{exp}(\mathrm{on}+\mathrm{off})}$. This shows that the key quantities to be determined are the balance between transcription initiation and degradation, $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{deg}(\mathrm{on}+\mathrm{off})}$, and the balance of transcription in the nucleus, given by the sum $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{elong}(\mathrm{on}+\mathrm{off})} + \frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{exp}(\mathrm{on}+\mathrm{off})}$.

[Fig. 6 plots: for ADH4 (A) and SH9 (B), histograms titled Real Data - Nucleus, Simulated Data - Nucleus, Real Data - Cytoplasm and Simulated Data - Cytoplasm. Axes: number of transcripts per cell vs frequency.]
Fig. 6: Fits of the distributions of the number of nuclear and cytoplasmic transcripts obtained through smFISH experiments. Shown are the best fits to the GAL10 ADH4 (A) and SH9 (B) data using a weighted chi-square statistic placing equal importance on the nuclear and cytoplasmic data.
Given that the degradation rate can be determined from the shut-down experimental data, the model is able to determine the value of the transcription initiation frequency:

$$\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}} \tag{12}$$

Simulations were fitted to both the ADH4 and SH9 GAL10 data using the MCMC parameter search, and the best 1,000 parameter sets were recorded to reveal any patterns between the parameter values of distributions that fit the collected data. Fig. 7A shows the initiation rate, init, plotted against the inverse on-fraction, $\frac{\mathrm{on}+\mathrm{off}}{\mathrm{on}}$. The ratio between these two quantities, $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, is the transcription initiation frequency, so plotting the initiation rate against the inverse on-fraction shows this frequency as the gradient of the straight line passing through the plotted points. To determine the transcription initiation frequency for the two strains, histograms of this ratio were calculated and smoothed distributions were fitted to them, the peak of each fitted distribution indicating the modal value. Taking the modal value of each fitted distribution, the initiation frequencies were determined to be 0.20 min⁻¹ for the ADH4 +antisense strain (Fig. 7B) and 0.13 min⁻¹ for the SH9 -antisense strain (Fig. 7C).

[Fig. 7 plots: (A) Transcription initiation frequencies, initiation frequency (min⁻¹) vs 1/On frequency for ADH4 and SH9; (B) ADH4 transcription initiation, mode: 0.1978; (C) SH9 transcription initiation, mode: 0.12938. Axes for B and C: transcription initiation rate (min⁻¹) vs frequency.]
Fig. 7: Change in the transcription initiation rate in the presence and absence of antisense transcription. The best 1,000 parameter sets fitted to nuclear and cytoplasmic RNA data in the +antisense (ADH4) and -antisense (SH9) strains were taken to analyse the differences between the experimental distributions of the two strains. The ratio of the initiation rate to $\frac{\mathrm{on}+\mathrm{off}}{\mathrm{on}}$, shown by the straight-line behaviour in A, can be determined by fitting to histograms of the calculated transcription initiation frequencies $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, shown in B and C, which give the rate at which new transcripts are initiated in each strain. The determined transcription initiation frequencies were 0.20 min⁻¹ for ADH4 and 0.13 min⁻¹ for SH9.

A reduced system
While the system of 6 reactions described above captures the key aspects of transcription and transcript levels, the model has too many unknown parameter values for the data available. To determine the effect the loss of antisense transcription has on polymerase elongation and on the export rate of nuclear transcripts, the results from the simulations of the previous model were compared with those of a simpler model. The system was reduced to one undergoing only 5 reactions:

inactive gene → active gene (13)
active gene → inactive gene (14)
active gene → active gene + nuclear transcript (15)
nuclear transcript → cytoplasmic transcript (16)
cytoplasmic transcript → ∅ (17)

In this reduced model, the two nuclear reactions, transcription elongation and nuclear export, were combined into one reaction with rate nuc. In accordance with the simplified queueing format, the number of nuclear transcripts then follows a Poisson distribution with mean $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{nuc}(\mathrm{on}+\mathrm{off})}$, the equivalent of a Poisson distribution with mean $\left(\frac{1}{\mathrm{elong}} + \frac{1}{\mathrm{exp}}\right)\cdot\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$ in the full model. Performing the MCMC parameter search with this simplified system, any change in the amount of time transcripts spent in the nucleus could be determined by fitting to the nuclear transcript data. The reduced model showed a clear difference in the amount of time transcripts spent in the nucleus (Fig. 8B).
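The link between the reduced and full models can be made explicit. Equating the mean of the single nuclear Poisson distribution in the reduced model with the combined mean of the two nuclear Poisson distributions in the full model gives:

$$\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{nuc}(\mathrm{on}+\mathrm{off})} = \left(\frac{1}{\mathrm{elong}} + \frac{1}{\mathrm{exp}}\right)\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}} \;\Longrightarrow\; \frac{1}{\mathrm{nuc}} = \frac{1}{\mathrm{elong}} + \frac{1}{\mathrm{exp}} \;\Longrightarrow\; \mathrm{nuc} = \frac{\mathrm{elong}\cdot\mathrm{exp}}{\mathrm{elong}+\mathrm{exp}}$$

so 1/nuc is the mean time a transcript spends in the nucleus (elongation followed by export), and nuc corresponds to the quantity Elong × Exp / (Elong + Exp) on the horizontal axis of Fig. 8C.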
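The text does not specify how the histograms of init·on/(on+off) were smoothed before their modes were read off. As one possible reading, the sketch below takes a Gaussian kernel density estimate with the normal-reference bandwidth 1.06·σ·n^(-1/5) and returns the location of its peak on an evenly spaced grid; the kernel, the bandwidth rule, the grid size and the function name are all assumptions.

```python
import math


def kde_mode(samples, n_grid=512):
    """Location of the peak of a Gaussian kernel density estimate of `samples`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    if sd == 0.0:
        return samples[0]                       # all values identical
    h = 1.06 * sd * n ** (-1 / 5)               # normal-reference bandwidth
    lo, hi = min(samples) - 3 * h, max(samples) + 3 * h
    best_x, best_density = lo, -1.0
    for j in range(n_grid):
        x = lo + (hi - lo) * j / (n_grid - 1)
        # Unnormalised density: constant factors do not move the peak.
        density = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        if density > best_density:
            best_x, best_density = x, density
    return best_x
```

For a hypothetical list `best_sets` holding the best 1,000 parameter dictionaries, `kde_mode([p["init"] * p["on"] / (p["on"] + p["off"]) for p in best_sets])` would give a modal initiation frequency in the spirit of Fig. 7B and 7C.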
The single nuclear rate of the reduced model was mapped to the balance between the elongation and export rates in the more complex model. Fig. 8A shows the separation in rates, and Fig. 8C shows the corresponding change in time spent in the nucleus under the more complex model. Both models consistently show the ADH4 GAL10 mRNA spending less time in the nucleus. The rate at which transcripts progress from initiation to export approximately doubles compared with the SH9 GAL10 mRNA. Similar simulations were carried out with an unweighted χ² measure, which biased the fit more heavily towards the cytoplasmic data. These simulations led to identical conclusions: transcription initiation occurs faster in ADH4, and ADH4 mRNA spends less time in the nucleus and is then degraded faster in the cytoplasm (Figs. S2-S4).

[Figure 8: (A) Nuclear rates: Elong × Exp (min⁻²) against Elong + Exp (min⁻¹) for ADH4 and SH9. (B) Transcription initiation and nuclear rate: transcription initiation frequency (min⁻¹) against nuclear rate (min⁻¹). (C) Transcription initiation and nuclear rates: transcript initiation frequency (min⁻¹) against Elong × Exp / (Elong + Exp) (min⁻¹).]

Fig. 8: Change in time transcripts spent in the nucleus in the presence and absence of antisense transcription. The relationship between the polymerase elongation and nuclear export rates is shown in A. B shows the change in time transcripts spend in the nucleus when nuclear transcripts undergo only a single reaction encompassing both polymerase elongation and nuclear export. The antisense-less mutant (SH9) retains transcripts in the nucleus for longer. C shows the same change in time spent in the nucleus between the two strains, but relates this change to the polymerase elongation and nuclear export rates.

Discussion

The simulations performed demonstrate a clear difference in the underlying mechanisms governing regulation of the GAL10 gene by antisense transcription across the upstream GAL1 gene. The act of antisense transcription has been associated with histone acetylation [16, 5], a mark which has been shown to promote transcription initiation and facilitate polymerase elongation [4]. In the system of genes studied, reduced antisense transcription into the promoter of the GAL10 gene led to a reduction in transcription initiation frequency. However, this result has only been detected in one experiment, and the experiment is not large enough to yield a statistically significant result. Increased transcription initiation frequency in the presence of antisense transcription agrees with the argument that acetylated histones at the promoter region, brought about by antisense transcription, facilitate polymerase binding to the promoter. Conversely, reduced antisense transcription would leave histones with methylation marks at the promoter region, reducing polymerase binding. These methylation marks have been associated with genes expressed at lower levels. In the system studied, they would be left on histones not experiencing antisense transcription [5, 4].

In addition to the effects detected on transcription initiation at the GAL10 locus, antisense transcription or the antisense transcripts themselves also affect the time taken for GAL10 mRNA to leave the nucleus. The polymerase from GAL1 antisense transcription may continue past the GAL10 promoter. If so, the transcription machinery may continue to leave acetylation marks on histones in the open reading frame (ORF) of the gene, improving elongation efficiency [4].

It is not known what effect histone modifications have on the transcripts themselves. The acetylation or methylation status of histones at the gene promoter or in the gene body may leave marks on the mRNA that signal early nuclear export and cytoplasmic degradation. Without antisense transcription, the resulting hypoacetylation may cause the mRNA to lack these markers, slowing nuclear export and cytoplasmic degradation. This change in degradation rate is of particular interest, as the effect of antisense transcription on mRNA once it reaches the cytoplasm is not yet fully understood.

The mechanistic changes detected across the RNA life-cycle suggest that gene regulation is affected by much more than the processes at the transcriptional level. Alterations governed by antisense transcription and ncRNA contribute to gene regulation from the initiation of transcription up to the degradation of RNA in the cytoplasm. Eukaryotic cells contain complex interaction networks that govern gene regulation through non-coding transcripts. These networks augment the processes already known to control the levels of key proteins in cells. This work therefore highlights the role that ncRNA and antisense transcription play in gene regulation. With high levels of ncRNA present in mammalian cells, these results underline the importance of ncRNA, and of all forms of transcription, in gene regulation.

Future Work

One area of particular interest is the changes that occur across the GAL1 gene in the absence of GAL1 antisense transcription. Antisense transcription is believed to remove methylation marks from histones and to place acetylation marks. Its effects should therefore be seen across the entire gene, affecting polymerase elongation as well as the promoter region. Having seen the changes to the regulation of GAL10, the mechanisms involved in regulating other genes can now be closely monitored and modelled.

Further, the export rate of mRNA from the nucleus may be found using techniques similar to those used to determine the mRNA degradation rates. Again, by assuming that no more transcripts are produced, any reduction in the number of transcripts in the nucleus can be modelled as occurring through nuclear export. This would distinguish polymerase elongation from nuclear export, which the current system cannot do.

A number of other mutations in S. cerevisiae are also known to affect the transcriptional process, altering ease of promoter binding, elongation and nuclear export. The model developed here can be used to compare any set of strains for which nuclear and cytoplasmic RNA data are available. Applied to these mutants, it could help clarify their impact at the transcriptional level. In doing so, it could shed light on a largely unexplored area of gene regulation and identify which steps of the gene regulation process major regulatory proteins affect.

Acknowledgements

The author would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work. The smFISH experiments were carried out by Françoise Howe. Image analysis was initiated by Struan Murray and automated by Andrew Angel. Data analysis was carried out using MATLAB [34]. Simulation scripts and raw data can be found at: http://www.dtc.ox.ac.uk/people/14/brownt/Research/Tom_research.html

References

[1] Phillips, T. (2008) Small Non-coding RNA and Gene Expression. Nature Education, 1(1): 115.
[2] Mattick, J. Makunin, I. (2006) Non-coding RNA. Human Molecular Genetics, 15(Review Issue 1): R17-R29.
[3] David, L. et al. (2006) A high-resolution map of transcription in the yeast genome. PNAS, 103(14): 5320-5325.
[4] Zentner, G. Henikoff, S. (2013) Regulation of nucleosome dynamics by histone modifications. Nature Structural & Molecular Biology, 20(3): 259-266.
[5] Murray, S. Haenni, S. Howe, F. Fischl, H. Chocian, K. Nair, A. Mellor, J. (2015) Sense and antisense transcription are associated with distinct chromatin architectures across genes. Nucleic Acids Research, published online: June 29, 2015.
[6] Anderson, M. et al. (2002) Projection of an Immunological Self Shadow Within the Thymus by the Aire Protein. Science, 298: 1395-1401.
[7] Cox, P. Goding, C. (1991) Transcription and Cancer. British Journal of Cancer, 63(5): 651-662.
[8] Alberts, B. et al. (2014) Molecular Biology of the Cell. How Cells Read the Genome: From DNA to Protein (Garland Science, New York), pp. 299-368.
[9] Neil, H. Malabat, C. d’Aubenton-Carafa, Y. Xu, Z. Steinmetz, L. Jacquier, A. (2009) Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature, 457: 1038-1042.
[10] Sigova, A. et al. (2013) Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. PNAS, 110(8): 2876-2881.
[11] Xu, Z. et al. (2009) Bidirectional promoters generate pervasive transcription in yeast. Nature, 457: 1033-1037.
[12] Carrozza, M. et al. (2005) Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell, 123(4): 581-592.
[13] Murray, S. Serra Barros, A. Brown, D. Dudek, P. Ayling, J. Mellor, J. (2012) A pre-initiation complex at the 3’-end of genes drives antisense transcription independent of divergent sense transcription. Nucleic Acids Research, 40(6): 2432-2444.
[14] Pelechano, V. Steinmetz, L. (2013) Gene regulation by antisense transcription. Nature Reviews Genetics, 14: 880-893.
[15] Shearwin, K. Callen, B. Egan, J. (2005) Transcriptional interference - a crash course. Trends in Genetics, 21(6): 339-345.
[16] Houseley, J. Rubbi, L. Grunstein, M. Tollervey, D. Vogelauer, M. (2008) A ncRNA Modulates Histone Modification and mRNA Induction in the Yeast GAL Gene Cluster. Molecular Cell, 32: 685-695.
[17] Gupta, R. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature, 464: 1071-1076.
[18] Carrieri, C. et al. (2012) Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature, 491: 454-457.
[19] Faghihi, M. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of β-secretase. Nature Medicine, 14(7): 723-730.
[20] Zenklusen, D. Larson, D. Singer, R. (2008) Single-RNA counting reveals alternative modes of gene expression in yeast. Nature Structural & Molecular Biology, 15(12): 1263-1271.
[21] Bahar Halpern, K. et al. (2015) Bursty Gene Expression in the Intact Mammalian Liver. Molecular Cell, 58: 147-156.
[22] Raj, A. Peskin, C. Tranchina, D. Vargas, D. Tyagi, S. (2006) Stochastic mRNA Synthesis in Mammalian Cells. PLoS Biology, 4(10): 1707-1719 (e309).
[23] Trcek, T. Chao, J. Larson, D. Park, H. Zenklusen, D. Shenoy, S. Singer, R. (2012) Single-mRNA counting using fluorescent in situ hybridization in budding yeast. Nature Protocols, 7(2): 408-419.
[24] Johnston, M. Flick, J. Pexton, T. (1994) Multiple Mechanisms Provide Rapid and Stringent Glucose Repression of GAL Gene Expression in Saccharomyces cerevisiae. Molecular and Cellular Biology, 14(6): 3834-3841.
[25] Smale, S. Kadonaga, J. (2003) The RNA Polymerase II Core Promoter. Annual Review of Biochemistry, 72: 449-479.
[26] Sanchez, A. Choubey, S. Kondev, J. (2013) Stochastic models of transcription: From single molecules to single cells. Methods, 62(1): 13-25.
[27] Peccoud, J. Ycart, B. (1995) Markovian Modelling of Gene Product Synthesis. Theoretical Population Biology, 48: 222-234.
[28] Gillespie, D. (1977) Exact Stochastic Simulation of Coupled Chemical Reactions. The Journal of Physical Chemistry, 81(25): 2340-2361.
[29] Metropolis, N. Rosenbluth, A. Rosenbluth, M. Teller, A. Teller, E. (1953) Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6): 1087-1092.
[30] Hastings, W. (1970) Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1): 97-109.
[31] Eser, P. et al. (2014) Periodic mRNA synthesis and degradation co-operate during cell cycle gene expression. Molecular Systems Biology, 10(1): 717.
[32] Grimmett, G. Stirzaker, D. (2001) Probability and Random Processes. Queues (Oxford University Press, Oxford), pp. 367-370.
[33] Wolfram Research, Inc., Mathematica, Version 10.0, Champaign, IL (2014).
[34] The MathWorks, MATLAB, Version 8.5, Natick, MA (2015).