Stochastic Modelling of Transcription

Tom Brown
Supervisors: Dr Andrew Angel, Prof. Jane Mellor
Project 1
Abstract
The regulation of gene expression and RNA production is a tightly-controlled process that can have major negative effects on organisms if compromised. In addition to genes being transcribed into functional mRNAs, non-coding RNA is produced which has regulatory roles in the cell. Antisense transcription has been associated with a number of key gene regulatory roles, both at the transcriptional level and once the non-coding antisense transcript has been produced. This project compares data from two mutant strains of Saccharomyces cerevisiae, with and without antisense transcription across the GAL1 gene, via mathematical modelling. Through a stochastic model and comparison to single molecule transcript data, the key aspects of gene regulation governed by antisense transcription are highlighted. The model developed makes clear the impact antisense transcription has at every stage, from the initiation of transcription at the DNA to the degradation of RNA in the cytoplasm, affecting the entire RNA life-cycle.

Introduction
Transcription is one of the fundamental processes underlying variation in organisms, dictating their abilities to function, adapt and survive in the face of changing environments. During transcription, the 4 DNA bases are copied to complementary RNA nucleotides. Functional mRNAs produced from coding genes are translated into functional proteins, while non-coding RNAs (ncRNAs) can have further effects on the regulation of gene expression [1]. The tight control of transcription plays vital roles in an organism’s development and in its response to environmental cues. If genes are not correctly transcribed, organisms will experience irregular levels of protein and RNA, which can be lethal to cells.

The traditional view of gene regulation was that of transcription factors initiating transcription, producing functional mRNA and proteins, which may themselves further influence transcription. The underlying process is now believed to be much more complex, with a number of additional factors influencing gene regulation. These include: non-coding antisense transcription, that is, transcription in the opposite orientation to the gene, or ‘sense’, strand [2, 3]; modifications to chromatin and DNA [4]; and the act of transcription itself, which is also believed to have a role in gene regulation [5]. Without the correct responses to external stimuli, incorrect recruitment of transcription factors, through ncRNA or other regulatory mechanisms, can have serious clinical consequences, such as in immune disorders [6] or cancer [7].

Steps involved in transcription
In order for a gene to be successfully transcribed, the promoter region upstream of the gene must be accessible for the RNA polymerase to bind and begin the initiation phase of transcription. During initiation, RNA polymerase recognises the promoter region of the gene and, using initiation factors, binds to the promoter. The RNA polymerase recruits a number of elements of the transcription machinery in order to form the pre-initiation complex (PIC). Once bound, the RNA polymerase and the initiation factors separate the two DNA strands so that the template strand is exposed to the polymerase and can be read. The exposed strand then enters the transcription machinery, beginning the elongation process. The RNA polymerase progresses along the template strand, reading the single nucleotides and producing the complementary RNA strand. When the polymerase reaches the gene terminator, the completed RNA strand is released from the RNA polymerase. Several proteins act on the RNA strand as part of processing to prevent early degradation, assist nuclear export and provide a template for the translational machinery. Once mRNAs have been processed, they are exported to the cytoplasm, where they undergo protein translation and are ultimately degraded (Fig. 1) [8].

Fig. 1: A schematic of transcription and the key processes involved. During transcription, a gene promoter can be in either an active or an inactive configuration (1). Once in an active configuration, RNA polymerase binds to the promoter, initiating transcription (2). The polymerase moves along the gene during the elongation phase (3), copying the DNA bases into corresponding RNA bases. Once complete, the mRNA and polymerase dissociate from the gene (4). The mRNA is then exported to the cytoplasm (5), where it will be degraded (6).

Antisense transcription
A major focus of the work carried out here is the impact antisense transcription has on gene regulation, and in particular what effect can be detected on the mechanistic level of gene regulation regarding promoter activity, transcription initiation, polymerase elongation, nuclear export and cytoplasmic degradation. Antisense transcription is initiated and regulated through similar mechanisms to those of sense transcription, arising through independent promoters; bidirectional promoters, promoting transcription in both the sense and antisense directions [9, 10]; and cryptic promoters, situated within genes. In S. cerevisiae, most antisense transcripts arise through bidirectional promoters [9, 11], with some arising from cryptic promoters [12] or from the terminator of the sense gene [13].
Regulation of sense transcription
Both antisense transcripts and the act of antisense transcription itself can affect a number of stages of gene expression [14]. Sense transcription initiation can be affected through promoter competition, where the assembly of the transcription machinery at one promoter can impact the assembly of the transcription machinery at another promoter [15]. RNA polymerase transcribing in the antisense direction can block the binding site of a sense promoter, leaving polymerase unable to bind to the sense promoter. Furthermore, the act of antisense transcription can alter chromatin modifications or DNA methylation in higher eukaryotes [15, 16, 17]. In S. cerevisiae, antisense transcription arising from internal cryptic promoters has been shown to directly modify the chromatin of the associated sense gene, delaying the initiation of transcription through hypoacetylation of the nucleosomes surrounding the promoter of the sense gene [16]. Recent evidence suggests that sense and antisense transcripts could recruit histone modifying proteins to the locus, altering the acetylation and methylation status of key histone marks. Sense and antisense transcription were found to associate with elevated levels of histone methylation and acetylation, respectively, at the gene body and promoter [5]. The acetylation marks left by antisense transcription could allow for easier passage of the sense-transcribing polymerase.

Antisense transcription could also affect elongation of sense transcription. Sense RNA polymerase elongation can be halted through collision with RNA polymerase moving in the antisense direction [15]. Antisense transcription can also affect which transcript isoforms are produced from the sense locus. By altering the splice sites, mRNA processing can be affected, potentially resulting in poor nuclear export and likely degradation by the nuclear exosome [15].
There is evidence to suggest that antisense transcripts can further affect the mRNA once it has exited the RNA polymerase. Antisense transcripts have been shown to increase the translational efficiency of some genes by binding to the 5’ region of the sense transcript. Pairing of sense and antisense molecules can also make the sense mRNA either more or less stable. This form of regulation relies on both the sense and antisense molecules being present in the cell simultaneously, which makes it less likely to occur; evidence for it has so far been reported in mammalian cells [18, 19].
The mathematical model presented here will be used to detect any changes at these stages in transcription and to identify where antisense transcription has the greatest effect on gene regulation. The comparison between strains with and without antisense transcription across a locus will highlight which of the above mechanisms are the key means by which regulation through antisense transcription is mediated.

Fig. 2: Example data used in this study, obtained through single molecule fluorescence in situ hybridization (smFISH). For each cell, the DNA has been stained using DAPI (blue), showing the nucleus of each cell, and the mRNA molecules of the gene of interest are bound by fluorescent single-stranded DNA probes with the complementary sequence (red). Bright red foci in the nucleus show one or more mRNA transcripts undergoing transcription or awaiting nuclear export. Any foci outside of the nucleus were counted as cytoplasmic mRNA.
Stochastic behaviour of gene expression
Previous studies have analysed gene expression in individual S. cerevisiae [20] and mammalian cells [21]. While population-average estimates show consistent levels of expression under similar conditions, the expression levels in individual cells show large variation. In S. cerevisiae, genes have been observed to undergo bursting expression: short periods in which the promoter region is in an active state, during which large numbers of RNA molecules are transcribed. As a result, the number of transcripts per cell at a single time-point varies greatly across a population [20].

Due to this noisy gene expression, transcription is modelled here as a series of stochastic reactions at the single-cell level, based on the underlying transcription mechanisms of each cell rather than on population averages. A model of individual cells will capture the inherent variation in gene expression across the population. Variations in the rates of the underlying reactions involved in the life-cycle of RNA will cast light on any mechanistic changes that can be detected with and without antisense transcription.

Methods
Transcript data
Transcript data in this study were obtained through single molecule fluorescence in situ hybridization (smFISH), which allows spatial localisation of the RNA molecules in a cell at a fixed moment in time. This technique involves probing fixed cells with single-stranded DNA probes complementary to the sequence of the gene of interest, with each probe having a fluorescent dye attached [22]. Once a probe has bound to the appropriate RNA sequence, it will fluoresce under laser excitation. By further staining the nucleus with DAPI (4’,6-diamidino-2-phenylindole), one can determine how many transcripts are present in both the nucleus and the cytoplasm of each cell at a fixed moment in time [23] (Fig. 2).
Any transcribing RNA polymerase will result in a partial RNA transcript emerging from the transcription machinery; probes will bind to this RNA, producing a nuclear spot on the raw image. To capture this behaviour, the model developed includes both complete and partial RNA transcripts for comparison to the smFISH data. Given this level of detail, it is possible to obtain distributions of the number of cytoplasmic mRNAs and of the number of transcripts held in the nucleus awaiting elongation termination or nuclear export.

Saccharomyces cerevisiae strains
In order to analyse the effect of antisense transcription on the mechanisms governing gene expression, data were collected from similar mutant strains of S. cerevisiae, one of which has reduced antisense transcription across the GAL1 locus. The strains, labelled SH9 and ADH4, both have engineered GAL1 genes, the expression of which is stimulated in galactose [24]. In the SH9 strain, the TATA box in the terminator of the GAL1 gene, a sequence associated with RNA polymerase II binding [25], is scrambled, reducing the amount of antisense transcription across the GAL1 sequence to approximately 5% of wild-type levels [5]. The ADH4 strain has an intact terminator and levels of antisense transcription across GAL1 comparable to wild-type levels but, like the SH9 strain, has a shortened GAL1 sequence, resulting in identical sense GAL1 mRNA molecules in both strains. The alterations to the GAL1 gene also affect transcription of the GAL10 gene, found upstream of GAL1 and transcribed from the opposite strand. The GAL1 and GAL10 genes share a bidirectional promoter, in which the GAL1 antisense transcript and one form of the GAL10 antisense transcript terminate [5] (Fig. 3).

The reduction in antisense transcription across the GAL1 locus will affect gene regulation of both GAL1 and GAL10. This system presents the opportunity to identify the effect antisense transcription has on its corresponding sense gene and on any genes downstream of the antisense transcription. The GAL10 smFISH data collected were of higher quality than the GAL1 data, so the GAL10 data were used for further development of the model.

Fig. 3: Schematic of the GAL1-GAL10 loci. The two strains of S. cerevisiae used in this study each had truncated forms of GAL1. The strain SH9 had a scrambled TATA box in its terminator, resulting in a reduction in the amount of antisense transcription across GAL1. Upstream of GAL1, GAL10 shares a bidirectional promoter with GAL1. The GAL1 antisense transcription terminates in the bidirectional promoter, whereas the GAL10 antisense, beginning inside the GAL10 open reading frame, has two forms, one terminating in the promoter and one continuing through into the sense GAL1 transcript [5].

Stochastic modelling
The transcriptional system was modelled as a stochastic process with 6 core reactions, where the rate of each reaction depended on the current state of the system. The reactions were:
inactive gene → active gene (1)
active gene → inactive gene (2)
active gene → active gene + elongating Pol II (3)
elongating Pol II → nuclear transcript (4)
nuclear transcript → cytoplasmic transcript (5)
cytoplasmic transcript → ∅ (6)
where ∅ represents the loss of an mRNA molecule through cytoplasmic degradation. While previous work has modelled the rates at which mRNA transcripts are transcribed, translated and degraded, these models have not taken into account any nuclear retention [20, 22, 26, 27]. Some of the reactions used in this model are incorporated in the model analysed in [27]; however, the introduction of reactions involving transcription elongation and nuclear export aims to reveal more information about the mechanistic changes occurring in gene regulation.
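Reactions (1)-(6) can be simulated with the Gillespie direct method [28]. The following is a minimal Python sketch, not the code used in this work. The rate names `on`, `off`, `init`, `elong`, `exp` and `deg` follow the notation used later for the queueing approximation. The rate values in the usage note are illustrative assumptions. `t_max` (simulated minutes until the smFISH-like snapshot) stands in for the 500 time-steps used here. Each call returns the nuclear count (partial plus complete transcripts) and the cytoplasmic count for one cell.

```python
import math
import random

# State vector: [active, pol, nuc, cyt]
#   active: 1 if the promoter is active, else 0
#   pol:    elongating Pol II (partial transcripts)
#   nuc:    completed transcripts awaiting export
#   cyt:    cytoplasmic transcripts
# Each row is the state change caused by reactions (1)-(6).
STOICHIOMETRY = [
    (+1, 0, 0, 0),   # (1) inactive gene -> active gene
    (-1, 0, 0, 0),   # (2) active gene -> inactive gene
    (0, +1, 0, 0),   # (3) active gene -> active gene + elongating Pol II
    (0, -1, +1, 0),  # (4) elongating Pol II -> nuclear transcript
    (0, 0, -1, +1),  # (5) nuclear transcript -> cytoplasmic transcript
    (0, 0, 0, -1),   # (6) cytoplasmic transcript -> degraded
]


def propensities(state, rates):
    """Propensity a_i of each reaction for the current state (rates in min^-1)."""
    active, pol, nuc, cyt = state
    return [
        rates["on"] * (1 - active),
        rates["off"] * active,
        rates["init"] * active,
        rates["elong"] * pol,
        rates["exp"] * nuc,
        rates["deg"] * cyt,
    ]


def simulate_cell(rates, t_max, rng):
    """Gillespie direct method for one cell, starting from an inactive gene.

    Returns (nuclear, cytoplasmic) counts at time t_max, where the nuclear
    count includes both partial (elongating) and complete transcripts.
    """
    state = [0, 0, 0, 0]
    t = 0.0
    while True:
        a = propensities(state, rates)                 # step 1
        a0 = sum(a)  # always > 0 when the on and off rates are positive
        r1, r2 = 1.0 - rng.random(), rng.random()      # step 2 (r1 in (0, 1])
        tau = math.log(1.0 / r1) / a0                  # step 3, eq. (7)
        if t + tau > t_max:
            break  # no further reaction before the snapshot time
        t += tau
        # Step 4, eq. (8): first reaction whose cumulative share exceeds r2.
        target = r2 * a0
        chosen = max(i for i, ai in enumerate(a) if ai > 0)  # round-off guard
        cumulative = 0.0
        for i, ai in enumerate(a):
            cumulative += ai
            if target < cumulative:
                chosen = i
                break
        # Step 5: apply the state change of the chosen reaction.
        state = [s + d for s, d in zip(state, STOICHIOMETRY[chosen])]
    _, pol, nuc, cyt = state
    return pol + nuc, cyt
```

For example, `simulate_cell({"on": 1.0, "off": 1.0, "init": 2.0, "elong": 5.0, "exp": 5.0, "deg": 1.0}, 20.0, random.Random(1))` simulates one cell for 20 minutes. With these illustrative values, the cytoplasmic count averaged over a few thousand cells should be close to $\mathrm{init}\cdot\mathrm{on}/(\mathrm{deg}(\mathrm{on}+\mathrm{off})) = 1$. This holds because the propensities are linear in the state, so the mean counts obey the same balance equations as the queueing approximation.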
Due to the inherent noisiness of single-cell gene expression, a stochastic Gillespie algorithm was incorporated to model the underlying reactions [28]. In this algorithm each reaction is assigned a probability of occurring based on the number of molecules in the system that could undergo this reaction. Each time-step is modelled as a memory-less Markovian process, in which the system progresses based entirely on the current number and distribution of the different molecules present. The time to the next reaction is modelled as an exponential random variable whose rate, the total propensity $a_0$, is determined by the current state of the system. Once a reaction time has been chosen, which reaction occurs is chosen randomly, weighted by the distribution of molecules in the system:
1. At time $t$, compute the propensity $a_i$ of each reaction $i$, given the current state of the system.
2. Generate two random numbers uniformly on the interval [0, 1]: $r_i \sim U(0, 1)$, $i = 1, 2$.
3. Compute the time to the next reaction:
$$a_0 = \sum_{i=1}^{n} a_i, \qquad \tau = \frac{1}{a_0}\log\frac{1}{r_1} \tag{7}$$
The next reaction occurs at time $t + \tau$.
4. Compute which reaction occurs next by finding $k$ such that:
$$\frac{1}{a_0}\sum_{i=1}^{k-1} a_i \le r_2 < \frac{1}{a_0}\sum_{i=1}^{k} a_i \tag{8}$$
where $k$ can take the value of any reaction (here numbered 1-6).
5. Update the molecule numbers according to reaction $k$ occurring, then repeat from step 1 until $t > T_{max}$.

To count the number of nuclear and cytoplasmic transcripts in the cell at a fixed time point, the model was run for 500 time-steps to allow the system to reach a steady state (Fig. S1), and the number of full transcripts in the cytoplasm and the number of full and partial transcripts in the nucleus were counted at the end of the simulations. This method was used to represent the measured data as closely as possible, reflecting the single-time snapshot of the data, and ignored any partially degraded transcripts in the cytoplasm.

[Figure: histograms of frequency against number of cytoplasmic transcripts; panel A "Distribution of cytoplasmic transcripts after 7 minutes in glucose", panel B "Distribution of cytoplasmic transcripts after 15 minutes in glucose".]
Fig. 4: Data from shutdown experiments, in which the S. cerevisiae cells’ environment was changed from galactose to glucose after previous high levels of transcription in galactose, stopping transcription across the GAL genes. Distributions of cytoplasmic transcripts 7 minutes (A) and 15 minutes (B) after glucose was added to the cells’ environment. Over the 8 minute period, the frequency of cells containing lower numbers of transcripts increases, demonstrating the shutdown in transcription.

Determining degradation rates
The system of equations in the above form suffered from a problem of scale: if the rate at which transcripts were exported from the nucleus increased, a balance could be struck by increasing the rate at which the transcripts were degraded in the cytoplasm. This artefact of the model meant that equally good fits to the data were found by increasing corresponding rates, resulting in an infinite space of parameter fits. To reduce the search space, and correspondingly the solution space, the degradation rate for each strain was determined by performing further simulations.

In order to determine the degradation rate of the GAL10 cytoplasmic mRNA, results from a glucose shut-down experiment were modelled. In this experiment, cells were grown for 2 hours in galactose, causing expression of the
GAL genes. Glucose was then introduced to the cells’ environment, causing transcription of the GAL genes to shut down. It has been assumed that, after roughly 7 minutes in glucose, RNA polymerase had stopped binding to the promoter region and no new transcripts were produced in the nucleus [24]. Given the low number of transcripts in the nucleus, the cytoplasmic system could be approximated as only undergoing degradation:
cytoplasmic transcript → ∅ (9)
The number of transcripts present in the cytoplasm at the 7 minute time point of the galactose-to-glucose shutdown was calculated, and the 15 minute data were normalised to have the same number of cells as the 7 minute data. All of the cytoplasmic transcripts were then simulated as existing in the same environment, with each transcript degraded with the same probability, determined by the degradation rate as in the Gillespie algorithm above. For each degradation rate, the entire population underwent 8 minutes of simulated reactions, with cytoplasmic transcripts being degraded at the rate given by the current degradation rate. The degradation rates were varied from 0.0001 to 0.2 min−1 in steps of 0.0001 min−1, and a degradation rate was recorded if the number of transcripts remaining in the simulated system after 8 minutes was the same as the number of transcripts in the normalised 15 minute data. These simulations were repeated 2,000 times, recording each instance in which a degradation rate resulted in a successful simulation. The histogram of successful degradation rates was then fitted with a Beta distribution, chosen because it is supported on a fixed interval that can encompass the possible degradation rates, and used to obtain a modal value. The modal value of the fitted Beta distribution, signifying the peak in successful degradation rates, was taken to be the true degradation rate for each of the ADH4 and SH9 strains.

Searching the parameter space
Only the degradation rate could be calculated using the smFISH data available, so in order to gain some insight into the rates of the other 5 reactions in the system, the nuclear and cytoplasmic transcript data were used. To determine which sets of parameters created distributions that best explained the data, parameters were sampled and tested using Markov Chain Monte Carlo (MCMC) simulations via a Metropolis-Hastings algorithm, varying the 5 unknown parameters at each simulation step [29, 30]. The initial values were chosen as unbiased, equal values of 0.5 min−1. Because it was not known accurately how long each reaction takes in reality, the parameters were allowed to vary unconstrained over the positive real numbers, which also allowed any inter-dependent parameters, of which there are numerous in the system as discussed above, to vary on the same scale. At each step, new parameters were chosen by drawing random numbers from a Normal distribution with mean given by the current parameter value and a fixed standard deviation of 0.02. Each set of parameters was then used to simulate 500 time steps for 4,500 cells to determine the distributions of the number of nuclear and cytoplasmic transcripts, allowing the system to reach a steady state and capturing the underlying distribution (Fig. S1). A modified $\chi^2$ statistic was used as a goodness-of-fit test for each simulated set of parameters:
$$\chi^2 = \frac{N_{cyt}}{N_{cyt}+N_{nuc}}\sum_{i=0}^{N_{nuc}}\frac{(O_i - E_i)^2}{E_i} + \frac{N_{nuc}}{N_{cyt}+N_{nuc}}\sum_{i=0}^{N_{cyt}}\frac{(O_i - E_i)^2}{E_i} \tag{10}$$
where $N_{nuc}$ and $N_{cyt}$ are the numbers of nuclear and cytoplasmic data points with at least 5 counts, respectively. $O_i$ represents the number of observed cells with $i$ transcripts and $E_i$ the number of expected cells with $i$ transcripts from the simulated model, only including those values of $i$ where $O_i \ge 5$ and $E_i > 0$. The $\chi^2$ statistic was chosen to incorporate all recorded data points and to penalise greater relative differences between the distributions: the same absolute difference between the raw data and the simulated results is penalised more heavily when it is larger relative to the simulated value $E_i$.

This weighted $\chi^2$ statistic gives equal value to the nuclear data and the cytoplasmic data, regardless of how many data points are included in each, to avoid overfitting one data set at the expense of the other.

This statistic was computed at the end of each set of simulations for the sampled parameter values. The parameters used for these simulations were accepted if a random number drawn from the interval (0.5, 1) was less than the ratio of the old statistic to the new statistic:
$$r < \frac{\chi^2_{old}}{\chi^2_{new}}, \qquad r \sim U(0.5, 1). \tag{11}$$
The new parameters were therefore always accepted if the new $\chi^2$ statistic was smaller than the previously accepted statistic. If the new statistic was larger than the previous statistic, but less than double it, it was accepted with probability $2\left(\chi^2_{old}/\chi^2_{new} - 0.5\right)$. If the tested parameters resulted in a statistic that was too large to be accepted, the new rates were rejected and the parameter values
remained at the last accepted values. Once a set of parameters had been accepted, it was used as the mean when choosing the sample parameters for the next step in the iteration. This method was used to ensure a thorough search of the parameter space, avoiding being restricted to local minima, while preventing the search from exploring superfluous areas of the parameter space in order to save computational time. By comparison, an exhaustive search of the 5-dimensional parameter space would require each sampled value of each parameter to be tested against each sampled value of the other 4 parameters. For a sample space of 100 values per parameter, this would involve testing $10^{10}$ parameter sets, requiring approximately 3,000 hours of computation time (48,000 CPU hours, 2.0GHz Xeon SandyBridge, Red Hat Enterprise Linux).

[Figure: histograms of accepted frequency against degradation rate (min−1) with fitted Beta distributions; panel A "ADH4 GAL10 degradation rates, mode: 0.045269", panel B "SH9 GAL10 degradation rates, mode: 0.028703".]
Fig. 5: GAL10 degradation rates in the presence and absence of antisense transcription. Various degradation rates were tested against the shutdown experimental data to determine the true degradation rates in the two strains. Shown are histograms of the successful simulated GAL10 degradation rates from shut-down experiments and the fitted Beta distributions used to determine the modal degradation rate. The calculated degradation rates for the GAL10 mRNA were 0.045 min−1 (A) and 0.029 min−1 (B) for the +antisense (ADH4) and -antisense (SH9) strains, respectively.
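The goodness-of-fit statistic (10) and the acceptance rule (11) can be sketched in Python as below. This is an illustration under stated assumptions, not the original implementation. $N_{nuc}$ and $N_{cyt}$ are taken to be the numbers of bins that enter each sum. Non-positive proposals are redrawn, since the text only states that rates vary over the positive reals. `chi2_of` is a hypothetical user-supplied function that simulates the cells for a parameter set and returns the statistic. Rule (11) is a modified acceptance rule rather than the textbook Metropolis ratio.

```python
import random


def chi2_terms(observed, expected):
    """Return (sum of (O_i - E_i)^2 / E_i, number of bins used).

    observed[i] and expected[i] are numbers of cells with i transcripts; only
    bins with O_i >= 5 and E_i > 0 enter the sum, as in the text.
    """
    total, used = 0.0, 0
    for o, e in zip(observed, expected):
        if o >= 5 and e > 0:
            total += (o - e) ** 2 / e
            used += 1
    return total, used


def weighted_chi2(obs_nuc, exp_nuc, obs_cyt, exp_cyt):
    """Eq. (10): nuclear sum weighted by N_cyt, cytoplasmic sum by N_nuc."""
    s_nuc, n_nuc = chi2_terms(obs_nuc, exp_nuc)
    s_cyt, n_cyt = chi2_terms(obs_cyt, exp_cyt)
    if n_nuc + n_cyt == 0:
        raise ValueError("no bins satisfy O_i >= 5 and E_i > 0")
    return (n_cyt * s_nuc + n_nuc * s_cyt) / (n_cyt + n_nuc)


def accept(chi2_old, chi2_new, rng):
    """Eq. (11): accept if r < chi2_old / chi2_new with r ~ U(0.5, 1).

    Always accepts an improvement, never accepts a statistic at least twice
    the old one, and otherwise accepts with probability 2 * (ratio - 0.5).
    """
    if chi2_new == 0:
        return True
    return rng.uniform(0.5, 1.0) < chi2_old / chi2_new


def propose(current, rng, sd=0.02):
    """Normal proposal centred on the current rates.

    Assumption: non-positive draws are redrawn so that rates stay positive.
    """
    proposal = {}
    for name, value in current.items():
        x = rng.gauss(value, sd)
        while x <= 0:
            x = rng.gauss(value, sd)
        proposal[name] = x
    return proposal


def mcmc_search(initial, chi2_of, n_steps, rng):
    """Random-walk search; chi2_of(rates) must simulate cells and return eq. (10)."""
    current, chi2_current = dict(initial), chi2_of(initial)
    history = [(dict(current), chi2_current)]
    for _ in range(n_steps):
        candidate = propose(current, rng)
        chi2_candidate = chi2_of(candidate)
        if accept(chi2_current, chi2_candidate, rng):
            current, chi2_current = candidate, chi2_candidate
        history.append((dict(current), chi2_current))
    return history
```

For example, `weighted_chi2([10, 20, 3], [12, 18, 5], [50], [40])` drops the third nuclear bin (fewer than 5 observed cells) and returns 50/27, approximately 1.85.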
Results
Degradation rates
The degradation simulations were applied to data from a shutdown experiment measuring RNA from the GAL10 genes, and the degradation rates were determined for the cytoplasmic RNA in both the ADH4 and SH9 strains. Fig. 5 shows the accepted simulated degradation rates. For the GAL10 mRNA, the degradation rates were 0.045 min−1 in the ADH4 strain (Fig. 5A) and 0.029 min−1 in the SH9 strain (Fig. 5B). These correspond to half-lives of 15.4 minutes and 24.8 minutes. A recent study calculated the average mRNA half-life in S. cerevisiae to be approximately 11.5 minutes [31], suggesting that the values calculated from the shutdown simulations are of a plausible magnitude.

These determined degradation rates were then used as fixed values in the MCMC parameter search, keeping the degradation rate fixed and varying the other 5 parameters as explained above.

[Figure: histograms of frequency against number of transcripts per cell; panels "Real Data - Nucleus", "Simulated Data - Nucleus", "Real Data - Cytoplasm" and "Simulated Data - Cytoplasm" for ADH4 (A) and SH9 (B).]
Fig. 6: Fits of the distributions of the number of nuclear and cytoplasmic transcripts obtained through smFISH experiments. Shown are the best fits to the GAL10 ADH4 (A) and SH9 (B) data using a weighted chi-square statistic placing equal importance on the nuclear and cytoplasmic data.

Inter-dependency of different rates
The parameter search was performed with the degradation rates fixed at these values to find the sets of parameters that explained the experimental distributions (Fig. 6). By approximating the gene promoter as having reached a steady state, in which the promoter is in an active configuration with probability $\mathrm{on}/(\mathrm{on}+\mathrm{off})$, the entire system can be approximated as a queueing network process with 3 servers of infinite capacity representing the elongation, export and degradation events. Simplifying this model to a succession of three M/M/∞ queues, meaning that arrival and processing times are memory-less (arrivals form a Poisson process and processing times are exponentially distributed) and that there are infinitely many servers available to process arrivals, the mean number of transcripts at each queue can be calculated [32]. Using Mathematica [33], the stationary distribution of the cytoplasmic transcripts was calculated as being Poisson with mean
$$\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{deg}\,(\mathrm{on}+\mathrm{off})}.$$
The stationary distribution of the nuclear transcripts was given by a sum of two Poisson processes, with means
$$\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{elong}\,(\mathrm{on}+\mathrm{off})} \quad\text{and}\quad \frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{exp}\,(\mathrm{on}+\mathrm{off})}.$$
This shows that the key quantities to be determined are the balance between transcription initiation and degradation, given by $\mathrm{init}\cdot\mathrm{on}/(\mathrm{deg}(\mathrm{on}+\mathrm{off}))$, and the balance of transcription in the nucleus, given by the sum $\mathrm{init}\cdot\mathrm{on}/(\mathrm{elong}(\mathrm{on}+\mathrm{off})) + \mathrm{init}\cdot\mathrm{on}/(\mathrm{exp}(\mathrm{on}+\mathrm{off}))$.
Given that the degradation rate can be determined from the shut-down experimental data, the model is able to determine the value of the transcription initiation frequency:
$$\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}} \tag{12}$$
Both the ADH4 and SH9 GAL10 data were fitted using the MCMC simulations, and the best 1,000 parameter sets were recorded to reveal any patterns among the parameter values associated with distributions fitting the collected data. Fig. 7A shows the initiation rate plotted against the fraction $(\mathrm{on}+\mathrm{off})/\mathrm{on}$. The ratio between the initiation rate and $(\mathrm{on}+\mathrm{off})/\mathrm{on}$, the reciprocal of the probability that the gene is in an active configuration, gives the transcription initiation frequency (12). Plotting the initiation rate against this reciprocal on-fraction therefore shows the frequency as the gradient of the straight line passing through the plotted points. In order to determine the value of the transcription initiation frequency for the two strains, histograms of the ratio were calculated and smoothed distributions were fitted to them, whose peaks indicate the modal values. By taking the modal value of each fitted distribution, frequencies were determined to be 0.20 min−1 for the ADH4 +antisense strain (Fig. 7B) and 0.13 min−1 for the SH9 -antisense strain (Fig. 7C).

A reduced system
While the system of 6 reactions described above captures the key aspects of transcription and transcript levels, the model presents too many unknown parameter values for the data available. In order to determine the effect the loss of antisense transcription has on the elongation of the polymerase and the export rate of the nuclear transcripts, the results from the simulations of the previous model were compared with those of a simpler model. The system was reduced to one undergoing only 5 reactions:
inactive gene → active gene (13)
active gene → inactive gene (14)
active gene → active gene + nuclear transcript (15)
nuclear transcript → cytoplasmic transcript (16)
cytoplasmic transcript → ∅ (17)
In this reduced model, the two nuclear reactions, transcription elongation and nuclear export, were reduced to one reaction. In accordance with the simplified queueing format, the nuclear transcripts in this reduced model follow a Poisson distribution with mean $\mathrm{init}\cdot\mathrm{on}/(\mathrm{nuc}(\mathrm{on}+\mathrm{off}))$. This equals the full-model mean $(1/\mathrm{elong} + 1/\mathrm{exp})\cdot\mathrm{init}\cdot\mathrm{on}/(\mathrm{on}+\mathrm{off})$ when $1/\mathrm{nuc} = 1/\mathrm{elong} + 1/\mathrm{exp}$. By performing the MCMC parameter search with this simplified system, any changes in the amount of time transcripts spent in the nucleus could be determined by fitting
to the nuclear transcript data. There was a clear difference in the amount of time transcripts spent in the nucleus in the reduced model (Fig. 8B). This single parameter was mapped to the nuclear balance between elongation and export rates in the more complex model: Fig. 8A shows the separation in rates and Fig. 8C shows the corresponding change in time spent in the nucleus under the more complex model. Both models consistently show ADH4 GAL10 mRNA spending less time in the nucleus, with the rate at which transcripts go from initiation to export approximately doubling compared with SH9 GAL10 mRNA. Similar simulations were carried out with an unweighted χ² measure, resulting in a heavier bias towards fitting cytoplasmic data. Identical conclusions were drawn from these simulations: transcription initiation occurred faster in ADH4, and GAL10 mRNA in ADH4 spent less time in the nucleus and was then degraded faster in the cytoplasm (Figs. S2–S4).
[Figure 7 panels: A, "Transcription initiation frequencies", Initiation Frequency (min−1) against 1/On Frequency for ADH4 and SH9; B, "ADH4 transcription initiation, mode: 0.1978", and C, "SH9 transcription initiation, mode: 0.12938", histograms of Frequency against Transcription Initiation Rate (min−1).]
Fig. 7: Change in the transcription initiation rate in the presence and absence of antisense transcription. The best 1,000 parameter sets fitted to nuclear and cytoplasmic RNA data in the +antisense (ADH4) and −antisense (SH9) strains were taken to analyse the differences in the distributions of the experimental data from the two strains. The straight-line behaviour in A shows the initiation frequency scaling with the inverse of the on-fraction, $\frac{\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, with a gradient equal to the transcription initiation frequency $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, the rate at which new transcripts are initiated in each strain. This frequency was determined by fitting smoothed distributions to histograms of $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$, shown in B and C. The determined transcription initiation frequencies were 0.20 min−1 for ADH4 and 0.13 min−1 for SH9.
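The quantities used in this section can be illustrated with a short sketch. It is written in Python rather than the MATLAB [34] used for the actual analysis, and it is not the author's code: the function names are hypothetical, the parameter sets are synthetic (generated around an assumed initiation frequency of 0.2 min−1 rather than taken from the MCMC fits), and a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth is assumed as the smoothing step, since the smoother used is not specified. The sketch computes $\frac{\mathrm{init}\cdot\mathrm{on}}{\mathrm{on}+\mathrm{off}}$ for each parameter set, locates the mode of its smoothed histogram, recovers the same value as the gradient of initiation frequency against the inverse on-fraction, and checks that a single nuclear rate with $\frac{1}{\mathrm{nuc}} = \frac{1}{\mathrm{elong}} + \frac{1}{\mathrm{exp}}$ reproduces the mean time of elongation followed by export.

```python
import math
import random
import statistics


def effective_initiation(init, on, off):
    """Transcription initiation frequency: init * on / (on + off) (min^-1)."""
    return init * on / (on + off)


def nuclear_rate(elong, exp):
    """Reduced-model nuclear rate whose mean time 1/nuc equals 1/elong + 1/exp."""
    return elong * exp / (elong + exp)


def kde_mode(samples, grid_points=500):
    """Peak of a Gaussian kernel density estimate (assumed smoother)."""
    # Silverman's rule-of-thumb bandwidth.
    h = 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)
    lo, hi = min(samples) - 3 * h, max(samples) + 3 * h
    grid = [lo + (hi - lo) * i / grid_points for i in range(grid_points + 1)]

    def density(x):
        # An unnormalised density is enough to locate the peak.
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return max(grid, key=density)


def slope_through_origin(xs, ys):
    """Least-squares gradient of a straight line constrained through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic (hypothetical) parameter sets standing in for the best MCMC fits:
# on/off vary widely, but init is tied to them so init*on/(on+off) ~ 0.2 min^-1.
random.seed(1)
TRUE_K = 0.2
sets = []
for _ in range(1000):
    on, off = random.uniform(0.05, 1.0), random.uniform(0.05, 1.0)
    k = TRUE_K * random.lognormvariate(0.0, 0.1)
    sets.append((k * (on + off) / on, on, off))  # (init, on, off)

k_values = [effective_initiation(*p) for p in sets]
inv_on_fraction = [(on + off) / on for _, on, off in sets]
init_values = [init for init, _, _ in sets]

print(f"KDE mode of init*on/(on+off): {kde_mode(k_values):.3f}")
print(f"Gradient of init vs 1/on-fraction: "
      f"{slope_through_origin(inv_on_fraction, init_values):.3f}")

# Reduced nuclear step: elongation then export, each an exponential waiting time.
elong, exp_rate = 20.0, 5.0
nuc = nuclear_rate(elong, exp_rate)
two_step = [random.expovariate(elong) + random.expovariate(exp_rate)
            for _ in range(20000)]
print(f"1/nuc = {1 / nuc:.3f} min, simulated mean = {statistics.fmean(two_step):.3f} min")
```

A single exponential step with rate nuc matches only the mean nuclear residence time: the sum of two exponential waiting times is not itself exponential, so the reduced model trades distributional detail for one fewer parameter.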
Discussion
The simulations performed demonstrate a clear difference in the underlying mechanisms governing regulation of the GAL10 gene by antisense transcription across the upstream GAL1 gene. The act of antisense transcription has been associated with histone acetylation [16, 5], a mark which has been shown to promote transcription initiation and facilitate polymerase elongation [4]. In the system of genes studied, reduction in antisense transcription into the promoter of the GAL10 gene was accompanied by a reduction in transcription initiation frequency, although this result has only been detected in one experiment, which is not large enough to yield a statistically significant result. Increased transcription initiation frequency in the presence of antisense transcription is consistent with the argument that acetylated histones at the promoter region, brought about by antisense transcription, facilitate polymerase binding to the promoter. In particular, reduced antisense transcription would leave histones with methylation marks at the promoter region, reducing polymerase binding there. These methylated histone marks have been associated with genes experiencing lower levels of expression and, in the system studied, would be left on histones not exposed to antisense transcription [5, 4].
In addition to the effects on transcription initiation at the GAL10 locus, antisense transcription, or the antisense transcripts themselves, also affects the time taken for GAL10 mRNA to leave the nucleus. If the polymerase from GAL1 antisense transcription continues past the GAL10 promoter, the transcription machinery may continue to leave acetylation marks on histones in the ORF of the gene, improving elongation efficiency [4].
It is not known what effect histone modifications have on the transcripts themselves. It may be that the acetylation/methylation status of the histones, either at the gene promoter or in the gene body, leaves marks on the mRNA that signal early nuclear export and cytoplasmic degradation. The absence of antisense transcription, resulting in hypoacetylation, may cause the mRNA to lack these markers, slowing down nuclear export and cytoplasmic degradation. This change in degradation rate is
of particular interest as it is not yet fully understood what effect antisense transcription has on mRNA once in the cytoplasm.
The mechanistic changes detected across the RNA life-cycle suggest that gene regulation is affected by much more than the processes at the transcriptional level. Alterations governed by antisense transcription and ncRNA help to govern gene regulation from the initiation of transcription up to the degradation of RNA in the cytoplasm. The complex interaction networks in eukaryotic cells that govern gene regulation through non-coding transcripts augment the processes already known to govern levels of key proteins in cells. This work therefore highlights the influence that ncRNA and antisense transcription have on gene regulation. With high levels of ncRNA present in mammalian cells, these results underline the importance of ncRNA, and of all forms of transcription, in gene regulation.
[Figure 8 panels: A, "Nuclear rates", Elong × Exp (min−2) against Elong + Exp (min−1); B, "Transcription initiation and nuclear rate", transcription initiation frequency (min−1) against Nuclear Rate (min−1); C, "Transcription initiation and nuclear rates", transcription initiation frequency (min−1) against Elong × Exp / (Elong + Exp) (min−1); each panel shows ADH4 and SH9.]
Fig. 8: Change in time transcripts spent in the nucleus in the presence and absence of antisense transcription. The relationship between the polymerase elongation and nuclear export rates is shown in A. B shows the change in time transcripts spend in the nucleus when the nuclear transcripts undergo only a single reaction encompassing both polymerase elongation and nuclear export, with the antisense-less mutant (SH9) retaining transcripts in the nucleus for longer. C shows the same change in time spent in the nucleus between the two strains, but relates this change to the polymerase elongation and nuclear export rates.
Future Work
One area of particular interest is the changes that occur across the GAL1 gene in the absence of GAL1 antisense transcription. Since antisense transcription is believed to remove methylation marks from histones and place acetylation marks, its effects would be expected across the entire gene, affecting polymerase elongation as well as the promoter region. Having characterised the changes to the regulation of GAL10, the mechanisms involved in regulating other genes can now be monitored and modelled in the same way.
Further, through techniques similar to those used to determine the degradation rates of the mRNA, the export rate of mRNA from the nucleus may be found. Again, by assuming that no more transcripts will be produced, any reduction in the number of transcripts in the nucleus can be modelled as occurring through nuclear export. This would distinguish polymerase elongation from nuclear export, which the current system cannot separate.
There are also a number of other mutations in S. cerevisiae known to affect the transcriptional process, altering the ease of promoter binding, elongation and nuclear export. The model developed here can be used to compare any set of strains with nuclear and cytoplasmic RNA data. Applied to such mutants, it could clarify what impact these mutations have at the transcriptional level, casting light on a dark and mostly unknown area of gene regulation and revealing which steps of the gene regulation process major regulatory proteins affect.
Acknowledgements The author would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work. The smFISH experiments were carried out by Françoise Howe. Image analysis was initiated by Struan Murray and automated by Andrew Angel. Data analysis was carried out using MATLAB [34]. Simulation scripts and raw data can be found at: http://www.dtc.ox.ac.uk/people/14/brownt/Research/Tom_research.html
References
[1] Phillips, T. (2008) Small Non-coding RNA and Gene Expression. Nature Education, 1(1): 115.
[2] Mattick, J., Makunin, I. (2006) Non-coding RNA. Human Molecular Genetics, 15(Review Issue 1): R17-R29.
[3] David, L. et al. (2006) A high-resolution map of transcription in the yeast genome. PNAS, 103(14): 5320-5325.
[4] Zentner, G., Henikoff, S. (2013) Regulation of nucleosome dynamics by histone modifications. Nature Structural & Molecular Biology, 20(3): 259-266.
[5] Murray, S., Haenni, S., Howe, F., Fischl, H., Chocian, K., Nair, A., Mellor, J. (2015) Sense and antisense transcription are associated with distinct chromatin architectures across genes. Nucleic Acids Research, published online: June 29, 2015.
[6] Anderson, M. et al. (2002) Projection of an Immunological Self Shadow Within the Thymus by the Aire Protein. Science, 298: 1395-1401.
[7] Cox, P., Goding, C. (1991) Transcription and Cancer. British Journal of Cancer, 63(5): 651-662.
[8] Alberts, B. et al. (2014) Molecular Biology of the Cell. How Cells Read the Genome: From DNA to Protein, (Garland Science, New York), pp. 299-368.
[9] Neil, H., Malabat, C., d’Aubenton-Carafa, Y., Xu, Z., Steinmetz, L., Jacquier, A. (2009) Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature, 457: 1038-1042.
[10] Sigova, A. et al. (2013) Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. PNAS, 110(8): 2876-2881.
[11] Xu, Z. et al. (2009) Bidirectional promoters generate pervasive transcription in yeast. Nature, 457: 1033-1037.
[12] Carrozza, M. et al. (2005) Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell, 123(4): 581-592.
[13] Murray, S., Serra Barros, A., Brown, D., Dudek, P., Ayling, J., Mellor, J. (2012) A pre-initiation complex at the 3’-end of genes drives antisense transcription independent of divergent sense transcription. Nucleic Acids Research, 40(6): 2432-2444.
[14] Pelechano, V., Steinmetz, L. (2013) Gene regulation by antisense transcription. Nature Reviews Genetics, 14: 880-893.
[15] Shearwin, K., Callen, B., Egan, J. (2005) Transcriptional interference - a crash course. Trends in Genetics, 21(6): 339-345.
[16] Houseley, J., Rubbi, L., Grunstein, M., Tollervey, D., Vogelauer, M. (2008) A ncRNA Modulates Histone Modification and mRNA Induction in the Yeast GAL Gene Cluster. Molecular Cell, 32: 685-695.
[17] Gupta, R. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature, 464: 1071-1076.
[18] Carrieri, C. et al. (2012) Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature, 491: 454-457.
[19] Faghihi, M. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of β-secretase. Nature Medicine, 14(7): 723-730.
[20] Zenklusen, D., Larson, D., Singer, R. (2008) Single-RNA counting reveals alternative modes of gene expression in yeast. Nature Structural & Molecular Biology, 15(12): 1263-1271.
[21] Bahar Halpern, K. et al. (2015) Bursty Gene Expression in the Intact Mammalian Liver. Molecular Cell, 58: 147-156.
[22] Raj, A., Peskin, C., Tranchina, D., Vargas, D., Tyagi, S. (2006) Stochastic mRNA Synthesis in Mammalian Cells. PLoS Biology, 4(10): 1707-1719 (e309).
[23] Trcek, T., Chao, J., Larson, D., Park, H., Zenklusen, D., Shenoy, S., Singer, R. (2012) Single-mRNA counting using fluorescent in situ hybridization in budding yeast. Nature Protocols, 7(2): 408-419.
[24] Johnston, M., Flick, J., Pexton, T. (1994) Multiple Mechanisms Provide Rapid and Stringent Glucose Repression of GAL Gene Expression in Saccharomyces cerevisiae. Molecular and Cellular Biology, 14(6): 3834-3841.
[25] Smale, S., Kadonaga, J. (2003) The RNA Polymerase II Core Promoter. Annual Review of Biochemistry, 72: 449-479.
[26] Sanchez, A., Choubey, S., Kondev, J. (2013) Stochastic models of transcription: From single molecules to single cells. Methods, 62(1): 13-25.
[27] Peccoud, J., Ycart, B. (1995) Markovian Modelling of Gene Product Synthesis. Theoretical Population Biology, 48: 222-234.
[28] Gillespie, D. (1977) Exact Stochastic Simulation of Coupled Chemical Reactions. The Journal of Physical Chemistry, 81(25): 2340-2361.
[29] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E. (1953) Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6): 1087-1092.
[30] Hastings, W. (1970) Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1): 97-109.
[31] Eser, P. et al. (2014) Periodic mRNA synthesis and degradation co-operate during cell cycle gene expression. Molecular Systems Biology, 10(1): 717.
[32] Grimmett, G., Stirzaker, D. (2001) Probability and Random Processes. Queues, (Oxford University Press, Oxford), pp. 367-370.
[33] Wolfram Research, Inc. (2014) Mathematica, Version 10.0, Champaign, IL.
[34] The MathWorks (2015) MATLAB, Version 8.5, Natick, MA.