STATISTICAL PHYSICS APPROACH
TO
POST-TRANSCRIPTIONAL REGULATION
Ph.D. Thesis Project
Student: Araks Martirosyan
Supervisors: Enzo Marinari, Andrea De Martino
Universita La Sapienza, Roma, April 6, 2014
I. INTRODUCTION
In cell biology, networks are systems of interacting molecules that implement cellular functions. Nodes represent molecules, and the links between them represent interactions. At the molecular level a biological network is a mesh of chemical reactions, but the collective effect of these reactions is to enable or regulate the flow of matter, energy or information. The study of biological networks poses many interesting problems for statistical physics, e.g. a) the description and properties of the flow, b) understanding the behavior of the network as a whole through models that leave out many molecular details, c) the question of how network function and operating principles can be inferred despite limited experimental access to the details of the interactions between molecules.
Genetic regulatory circuits are one example of a biological network. Their function is to control gene expression, the process by which information from a gene (a segment of DNA) is used in the synthesis of proteins. Fig. 1 represents this process schematically. First, DNA is transcribed to pre-RNA, which is spliced to mRNA. The latter is translated by the ribosomes into amino acid sequences that fold into functioning proteins. Proteins maintain the functionality of the cell, so controlling their expression is an important task for the cell. All cellular processes that control the expression of proteins are collectively called gene regulation. Gene regulation may occur at every stage of protein production.
It is a noisy process. The functions involved are nonlinear, the processes happen on various time scales, and the wiring of the networks can be specific, depending on the position or properties of the interacting parts. The precision of these control mechanisms is therefore limited by the randomness of individual molecular events, by the metabolic cost of the signalling molecules used by the network, by constraints on the speed of signalling, etc. As evolutionary selection depends on function and not on molecular details, different wiring diagrams or even changes in the components can result in the same performance. Evolution can change the structure of the network as long as its function is preserved. One therefore expects that evolution has selected those circuits that allow cells to maximize their control power given the physical limits. In other words, we should expect the physical limitations to be translated into observable circuit properties.
The project will focus on the role of micro-RNAs (miRNA) for post-transcriptional regulation (PTR).
PTR is the control of gene expression at the RNA level, i.e. the stage between the transcription and the
translation of the gene. A miRNA is a small non-coding RNA molecule (around 22 nucleotides). Encoded
by eukaryotic nuclear DNA, miRNAs function via base-pairing with complementary sequences within
mRNA molecules, usually resulting in gene silencing via translational repression or target degradation
[1, 2].
We are going to focus on the two different instances of miRNA-mediated effects on PTR presented
below.
II. miRNA-MEDIATED INFORMATION FLOW IN SMALL GENETIC CIRCUITS
Several miRNAs can bind the same mRNA, and the same miRNA can bind several mRNAs. Target prediction algorithms and miRNA-mRNA databases provide a way to build the potential relationships between miRNAs and mRNAs. Several miRNA-mRNA databases, such as TargetScan, Miranda and MiRTarBase, store computationally predicted miRNA targets as well as a few biologically validated ones. The algorithms also provide a score for each miRNA-mRNA interaction. The databases thus provide information about the miRNA-mRNA interaction topology, which can be represented as a bipartite graph. By targeting different mRNA species with different kinetics, miRNAs can, in principle, act as the mediators of an effective interaction between the mRNAs, such that a change in the transcription level of one mRNA can result in an alteration of the levels of another mRNA (the mRNAs compete for the miRNA). Recently the so-called 'ceRNA hypothesis' (where ceRNA stands for 'competitive endogenous RNA') was suggested, according to which, in any given cell type, protein levels are effectively influenced by the levels of the different miRNA species, in ways that depend on the a priori possible couplings between miRNAs and mRNAs and on the kinetics that governs the different interactions.
Figure 1: Gene expression diagram. We see transcription of DNA to pre-RNA, splicing of pre-RNA to mRNA by removing the introns, and translation of mRNA to proteins.
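The bipartite miRNA-mRNA topology described in this section can be illustrated with a minimal Python sketch. The molecule names and scores below are purely hypothetical placeholders, not entries from TargetScan, Miranda or MiRTarBase, and the helper names `build_bipartite` and `cerna_pairs` are introduced only for this illustration. The sketch stores the scored edges, filters them by a score threshold, and lists the pairs of mRNAs that share at least one miRNA, i.e. the candidate ceRNA pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (miRNA, mRNA, score) edges; illustrative values only.
EDGES = [
    ("miR-A", "mRNA-1", 0.9),
    ("miR-A", "mRNA-2", 0.7),
    ("miR-B", "mRNA-2", 0.8),
    ("miR-B", "mRNA-3", 0.6),
]


def build_bipartite(edges, min_score=0.0):
    """Map each miRNA to the set of mRNAs it targets with score >= min_score."""
    targets = defaultdict(set)
    for mirna, mrna, score in edges:
        if score >= min_score:
            targets[mirna].add(mrna)
    return dict(targets)


def cerna_pairs(targets):
    """Map each pair of mRNAs to the set of miRNAs that target both of them."""
    shared = defaultdict(set)
    for mirna, mrnas in targets.items():
        for a, b in combinations(sorted(mrnas), 2):
            shared[(a, b)].add(mirna)
    return dict(shared)


if __name__ == "__main__":
    graph = build_bipartite(EDGES)
    for pair, mirnas in cerna_pairs(graph).items():
        print(pair, "compete for", sorted(mirnas))
```

Raising `min_score` prunes weak predicted interactions and therefore removes candidate competitor pairs; how to choose such a threshold is left open here.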
A. Simplest ceRNA Model
The smallest circuit that demonstrates competition between mRNAs, and therefore their regulation by each other, is the one in which a single miRNA targets two different mRNAs [1, 2]. Once the miRNA binds an mRNA, a complex is formed, which can later dissociate into miRNA and mRNA again.
We also include in the model the regulation of gene expression by transcription factors (TFs), which acts at the DNA level. TFs are proteins that bind DNA sequences and can either promote or block the recruitment of RNA polymerase. We will focus on the case of transcriptional activation. Schematically, this model can be represented by the network shown in Fig. 2, where we have:
- mRNA/miRNA transcription from DNA, activated by their transcription factors (TFs), with rates $b_{1,2}$ and $b_\mu$ for the first/second mRNA and the miRNA, respectively,
- complex association/dissociation with rates $k^+_{1,2}$ and $k^-_{1,2}$, respectively,
- complex decay through the catalytic channel, in which the miRNA is recycled, with rates $\kappa_{1,2}$,
- complex/miRNA/mRNA decay with rates $d_{c_{1,2}}$, $d_\mu$, $d_{1,2}$.
Mathematically, the dynamics of such a network can be represented by the following set of ordinary differential equations (ODEs):
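$$
\begin{aligned}
\frac{dm_i}{dt} &= -d_i m_i + b_i n_i - k_i^+ \mu m_i + k_i^- c_i + \xi_{m_i} - \xi_i^+ + \xi_i^-, \\
\frac{d\mu}{dt} &= -\delta \mu + \beta n_\mu - \sum_i k_i^+ \mu m_i + \sum_i (k_i^- + \kappa_i) c_i + \xi_\mu - \sum_i \xi_i^+ + \sum_i \xi_i^- + \sum_i \xi_{\kappa_i}, \\
\frac{dc_i}{dt} &= -(d_{c_i} + k_i^- + \kappa_i) c_i + k_i^+ \mu m_i + \xi_{c_i} + \xi_i^+ - \xi_i^- - \xi_{\kappa_i}, \\
\frac{dn_i}{dt} &= k_{in} f_{m_i} (1 - n_i) - k_{out} n_i + \xi_{n_i}, \\
\frac{dn_\mu}{dt} &= k_{in} f_\mu (1 - n_\mu) - k_{out} n_\mu + \xi_{n_\mu},
\end{aligned}
$$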
where $i \in \{1, 2\}$, $m_i$ stands for the concentrations of the mRNAs, $\mu$ for that of the miRNA, $c_i$ for those of the complexes, and $f_{m_i}$, $f_\mu$ for those of the TFs. $n_i$ and $n_\mu$ are the occupancies of the TF binding sites on the corresponding DNAs and vary from zero to one. $\beta$ and $\delta$ denote the miRNA transcription and decay rates ($b_\mu$ and $d_\mu$ in the list above). The last two parameters, $k_{in}$ and $k_{out}$, are the TF binding and unbinding rates, chosen to be the same for the mRNAs and the miRNA.
Figure 2: Small ceRNA network including two mRNAs (yellow), one miRNA (green), transcription factors for each mRNA and the miRNA (blue), and the complexes (pale yellow) formed through the miRNA-mRNA interaction.
We will be interested in the fluctuations of the mRNA/miRNA levels around the steady state that arise from intrinsic noise sources, i.e.
- the fact that the binding site has only two states, which switch on some characteristic timescale,
- the fact that a finite number of discrete molecules is produced at the output,
- the fact that the input concentration might itself fluctuate at the binding site location.
Analytically, we introduce this noise by adding a Langevin random force $\xi$ for each process separately. These forces have zero mean, are uncorrelated in time, and have an amplitude set by the mean rate of the corresponding process at the steady state (for a birth-death process, the sum of the mean production and decay rates):
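$$
\begin{aligned}
\langle \xi_{m_i}(t)\, \xi_{m_i}(t') \rangle &= (d_i \bar m_i + b_i \bar n_i)\, \delta(t - t'), \\
\langle \xi_\mu(t)\, \xi_\mu(t') \rangle &= (\delta \bar\mu + \beta \bar n_\mu)\, \delta(t - t'), \\
\langle \xi_{c_i}(t)\, \xi_{c_i}(t') \rangle &= d_{c_i} \bar c_i\, \delta(t - t'), \\
\langle \xi_i^-(t)\, \xi_i^-(t') \rangle &= k_i^- \bar c_i\, \delta(t - t'), \\
\langle \xi_i^+(t)\, \xi_i^+(t') \rangle &= k_i^+ \bar m_i \bar\mu\, \delta(t - t'), \\
\langle \xi_{\kappa_i}(t)\, \xi_{\kappa_i}(t') \rangle &= \kappa_i \bar c_i\, \delta(t - t'), \\
\langle \xi_{n_i}(t)\, \xi_{n_i}(t') \rangle &= \left(k_{in} f_{m_i} (1 - \bar n_i) + k_{out} \bar n_i\right) \delta(t - t'), \\
\langle \xi_{n_\mu}(t)\, \xi_{n_\mu}(t') \rangle &= \left(k_{in} f_\mu (1 - \bar n_\mu) + k_{out} \bar n_\mu\right) \delta(t - t').
\end{aligned}
$$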
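The steady state of the system is given by the following set of equations:
$$
\begin{aligned}
\bar m_i &= \frac{b_i \bar n_i + k_i^- \bar c_i}{d_i + k_i^+ \bar\mu}, &
\bar\mu &= \frac{\beta \bar n_\mu + \sum_i (k_i^- + \kappa_i)\, \bar c_i}{\delta + \sum_i k_i^+ \bar m_i}, \\
\bar c_i &= \frac{k_i^+ \bar\mu\, \bar m_i}{d_{c_i} + k_i^- + \kappa_i}, &
\bar n_{i,\mu} &= \frac{k_{in} f_{m_i,\mu}}{k_{in} f_{m_i,\mu} + k_{out}}.
\end{aligned}
$$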
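As an illustration, the steady-state equations can be solved numerically. The sketch below is one possible approach, not necessarily the one used in the project: eliminating $\bar c_i$ gives $\bar m_i = b_i \bar n_i / (d_i + k_i^+ \bar\mu\, (d_{c_i} + \kappa_i)/S_i)$ with $S_i = d_{c_i} + k_i^- + \kappa_i$, which leaves a single equation for $\bar\mu$ whose residual increases monotonically, so plain bisection on $[0, \beta \bar n_\mu/\delta]$ is sufficient. The function and argument names are assumptions made for this example, the rate values are those of the Fig. 3 caption, and the value of $b_1 \bar n_1$ is chosen arbitrarily. Strictly positive decay rates are assumed.

```python
def steady_state(b_n, beta_n, d, delta, k_on, k_off, kappa, d_c, tol=1e-12):
    """Return (m, mu, c) at the deterministic steady state.

    b_n    : effective mRNA birth rates b_i * n_i (one entry per mRNA)
    beta_n : effective miRNA birth rate beta * n_mu
    Assumes delta > 0 and d_i > 0.
    """
    # Total complex loss rate S_i = d_c_i + k_i^- + kappa_i.
    S = [dc + ko + ka for dc, ko, ka in zip(d_c, k_off, kappa)]

    def free_mrna(mu):
        # m_i after eliminating c_i from the steady-state equations.
        return [bn / (di + kp * mu * (dc + ka) / s)
                for bn, di, kp, dc, ka, s in zip(b_n, d, k_on, d_c, kappa, S)]

    def residual(mu):
        # mu * (delta + net miRNA loss through complexes) - miRNA birth rate.
        m = free_mrna(mu)
        sink = sum(kp * mi * dc / s for kp, mi, dc, s in zip(k_on, m, d_c, S))
        return mu * (delta + sink) - beta_n

    lo, hi = 0.0, beta_n / delta  # residual(lo) <= 0 <= residual(hi)
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    m = free_mrna(mu)
    c = [kp * mu * mi / s for kp, mi, s in zip(k_on, m, S)]
    return m, mu, c


# Rates from the Fig. 3 caption; TF occupancy n = k_in f / (k_in f + k_out).
N_BAR = 0.475 * 100 / (0.475 * 100 + 5)
RATES = dict(d=[1.0, 1.0], delta=0.5, k_on=[0.1, 0.1], k_off=[1e-4, 1e-4],
             kappa=[1e-3, 1e-3], d_c=[1.0, 1.0])

if __name__ == "__main__":
    for b1_n1 in (0.0, 50.0, 200.0):  # arbitrary scan values
        m, mu, c = steady_state([b1_n1, 100 * N_BAR], 100 * N_BAR, **RATES)
        print(b1_n1, m, mu)
```

Scanning `b1_n1` in this way gives the deterministic counterpart of the curves discussed for Fig. 3; fluctuations are not captured by this calculation.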
Analytic calculations, even for this small gene circuit, cannot be carried very far; computational methods such as stochastic simulations will be used to go further and reach the goal of quantifying the role of miRNA. One of them is the Gillespie algorithm (GA) [9].
The idea behind the GA is the following. Let P (R, t) be the probability that the next reaction to occur is R and that it occurs after a waiting time t. The steps to perform are then as follows:
Step 0: set up the initial amount of each molecule,
Step 1: generate a random pair (R, t) according to P (R, t),
Step 2: advance time by t and change the numbers of molecules according to reaction R,
Step 3: stop if there are no more molecules left in the system or if the defined termination time is reached, otherwise return to step 1.
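The steps above can be sketched in Python for the two-mRNA, one-miRNA circuit. This is a minimal illustration under stated assumptions, not the code used for the project: the TF occupancies are frozen at their mean values, so that transcription occurs at the constant effective rates $b_i \bar n_i$ and $\beta \bar n_\mu$ and promoter-switching noise is ignored; the dictionary keys and function names are hypothetical; the rate values are taken from the Fig. 3 caption, with $b_1 \bar n_1$ chosen arbitrarily. Step 1 uses the standard direct method: the waiting time is exponential with rate equal to the total propensity, and the reaction is chosen with probability proportional to its propensity.

```python
import random

# State vector: [m1, m2, mu, c1, c2] (free mRNAs, free miRNA, complexes).


def propensities(s, p):
    m1, m2, mu, c1, c2 = s
    a = [p["beta_n"], p["delta"] * mu]      # miRNA birth, miRNA decay
    for i, (m, c) in enumerate(((m1, c1), (m2, c2))):
        a += [p["b_n"][i],                   # mRNA_i birth
              p["d"][i] * m,                 # mRNA_i decay
              p["k_on"][i] * mu * m,         # complex association
              p["k_off"][i] * c,             # complex dissociation
              p["kappa"][i] * c,             # catalytic decay, miRNA recycled
              p["d_c"][i] * c]               # complex decay, miRNA lost
    return a


def state_changes():
    """State-change vectors, in the same order as propensities()."""
    changes = [(0, 0, 1, 0, 0), (0, 0, -1, 0, 0)]
    for i in range(2):
        def vec(dm, dmu, dc):
            v = [0] * 5
            v[i], v[2], v[3 + i] = dm, dmu, dc
            return tuple(v)
        changes += [vec(1, 0, 0), vec(-1, 0, 0), vec(-1, -1, 1),
                    vec(1, 1, -1), vec(0, 1, -1), vec(0, 0, -1)]
    return changes


def gillespie(p, t_end, t_burn=0.0, seed=0):
    """Return time-averaged copy numbers over [t_burn, t_end]."""
    rng = random.Random(seed)
    changes = state_changes()
    s, t = [0] * 5, 0.0                      # Step 0: empty initial state
    acc, acc_t = [0.0] * 5, 0.0
    while t < t_end:
        a = propensities(s, p)
        a0 = sum(a)
        # Step 1a: waiting time (infinite if nothing can happen any more).
        tau = rng.expovariate(a0) if a0 > 0 else float("inf")
        lo, hi = max(t, t_burn), min(t + tau, t_end)
        if hi > lo:                          # accumulate time-weighted state
            acc = [x + y * (hi - lo) for x, y in zip(acc, s)]
            acc_t += hi - lo
        t += tau                             # Step 2: advance time
        if t >= t_end:                       # Step 3: termination
            break
        # Step 1b: pick the reaction with probability a_j / a0, then apply it.
        j = rng.choices(range(len(a)), weights=a)[0]
        s = [x + dx for x, dx in zip(s, changes[j])]
    return [x / acc_t for x in acc]


N_BAR = 0.475 * 100 / (0.475 * 100 + 5)     # mean TF occupancy for f = 100
FIG3 = dict(b_n=[50.0, 100 * N_BAR], beta_n=100 * N_BAR, d=[1.0, 1.0],
            delta=0.5, k_on=[0.1, 0.1], k_off=[1e-4, 1e-4],
            kappa=[1e-3, 1e-3], d_c=[1.0, 1.0])

if __name__ == "__main__":
    print(gillespie(FIG3, t_end=50.0, t_burn=10.0))
```

Time averaging after a burn-in period is one simple way to estimate steady-state means from a single trajectory; estimating the noise would additionally require accumulating second moments.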
Figure 3: Amount of the free molecules of the first mRNA (black), the second mRNA (red) and the miRNA (green) at the steady state as a function of the effective birth rate of the first mRNA ($b_1 n_1$). Simulations are done using the Gillespie algorithm. Parameters are the following: $b_2 = b_\mu = 100$; $d_1 = d_2 = 1$; $d_\mu = 0.5$; $k_1^+ = k_2^+ = 0.1$; $k_1^- = k_2^- = 0.0001$; $\kappa_1 = \kappa_2 = 0.001$; $k_{out} = 5$; $k_{in} = 0.475$; $d_{c_1} = d_{c_2} = 1$; $f_\mu = f_{m_2} = 100$.
If we simulate the system long enough, it reaches its steady state. Figure 3 shows the dependence of the steady state on the effective birth rate of the first mRNA, $b_1 \bar n_1$. As the effective birth rate of the first mRNA increases, the amount of its free molecules increases, resulting in a downregulation of the free miRNA molecules. The amount of free molecules of the second mRNA therefore increases as well. This is exactly the miRNA-mediated regulation we would like to quantify.
Given data from simulations, it will be possible to estimate the noise of the system, which is crucial for understanding how it functions. Noise encodes information about the physical limits on the performance of a given biological function, and therefore tells us how precisely biological systems, in particular regulatory elements, can work. If evolution is an algorithm for finding better circuits, then the questions we ask are: how much is miRNA-mediated regulation shaped by that algorithm? How close are we to the solution? How do kinetic parameters affect the performance of the circuits? How precisely can molecular levels be controlled by such mechanisms? Which structures perform regulation better? To answer these questions we need a measure that gives us a quantitative estimate of the precision of biological functions and regulation. We would like to test whether optimization of that measure is predictive of network structure. Mutual information will be our choice [10].
B. Mutual Information
One of the physical measures that can be used to describe the efficiency of a molecular circuit designed to function as a regulatory element is the mutual information. Shannon proved [3] that the mutual information is the unique coherent quantitative measure of the concept that 'y provides information about x'. If we denote by p(x) and p(y) the probability distributions of the random variables x and y, respectively, and by p(x, y) their joint probability distribution, then the mutual information I(x; y) between them is defined as follows:
$$
I(x; y) = \int dx\, dy\; p(x, y)\, \log_2 \frac{p(x, y)}{p(x)\, p(y)}.
$$
Mutual information tells us how much information (in bits) we can extract about the variable x given y, or how much knowledge of y decreases our uncertainty about x. One can easily see that when x does not regulate y (they are independent variables), the joint probability distribution factorizes, p(x, y) = p(x)p(y), and the mutual information becomes zero, as expected. If there are I bits of mutual information between x and y, then there are $2^I$ distinguishable levels of x that can be reached by dialing the value of y.
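For discrete variables the integral in the definition becomes a sum over a joint probability table, which makes the two limiting cases above easy to check. The following Python sketch is only an illustration; the function name `mutual_information` and the toy tables are assumptions made here, and the joint table is assumed to be normalized.

```python
from math import log2


def mutual_information(joint):
    """Mutual information in bits of a discrete joint distribution.

    joint[i][j] = p(x_i, y_j); the entries are assumed to sum to one.
    """
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    info = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                       # 0 * log 0 is taken as 0
                info += pxy * log2(pxy / (px[i] * py[j]))
    return info


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]  # p(x, y) = p(x) p(y)
    four_levels = [[0.25 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(mutual_information(independent))
    print(mutual_information(four_levels))
```

In the second toy table y fixes x among four equally likely levels, which corresponds to I = 2 bits and $2^I = 4$ distinguishable levels. Estimating p(x, y) from simulated or measured data introduces binning and finite-sampling issues that this sketch does not address.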
In the case of genetic regulatory circuits, I(x; y) depends on the topology of the network and on the kinetics of the specific model. A hypothesis was therefore suggested [4, 5] that the principle of maximizing the mutual information can be predictive of the circuit structure. In some cases the signaling time or any other limitation might be more important than the noise of the channel. We should therefore define the limits of validity of the hypothesis, namely the regime in which the noise related to the constraint on the number of signaling molecules is the dominant factor.
In some particular cases in which mutual information can be measured experimentally, theoretical results can be confirmed or rejected. One such example is transcriptional regulation by transcription factors (TFs). This pathway is very well described by Tkačik et al. [4, 5]. Applying the principle of maximizing the mutual information, they were able to compute the maximal efficiency (in bits) achievable by a simple transcriptional regulatory circuit, given a pre-determined level of noise. A specific transcriptional module in the Drosophila embryo has provided experimental evidence that, at least in one specific case, real regulatory elements saturate the predicted efficiency bounds.
In this project we would like to quantify the performance of miRNA-mediated PTR circuits using the
principle of the optimization of information flow. We aim at testing the idea that control of mRNA levels
by miRNA might be more efficient than by TF, as suggested by the fine tuning that is achieved in cells
through miRNA regulation.
III. INTEGRATION OF miRNAS TO NMD REGULATION
Nonsense-mediated decay (NMD) is the second, recently discovered, mechanism of PTR that we are going to consider. Its main function is to reduce errors in gene expression by eliminating mRNA transcripts that contain premature stop codons.
Genetic information is stored in a region of the mRNA called the open reading frame (ORF). The ORF is separated from the rest of the mRNA sequence by start and stop codons (see Fig. 4). As a result of mutation, stop codons (in this case called 'premature stop codons') may occur within the ORF. On the other hand, there are proteins, the exon junction complexes (EJCs), that can detect stop codons. During pre-RNA splicing (see Fig. 1) introns are removed, exons are joined and EJCs sit on the junction points. When the ribosome reaches the start codon, it starts the translation. During translation, if the ribosome reads a premature stop codon, an EJC sitting on the mRNA after the position of the ribosome detects the error and eliminates the production of the protein encoded by that mutant gene.
The interesting case is when an EJC sits on the mRNA after the natural stop codon (Fig. 4). In this case the same mechanism represses the production of functional proteins. This pathway is exactly what is called NMD.
One of the most important proteins detected in neuronal cells that is subject to NMD is ARC. ARC is believed to play a critical role in learning and memory-related molecular processes because of its influence on synaptic activity. In particular, overexpression of ARC leads to a decrease in the amplitude of miniature excitatory postsynaptic currents (mEPSC) [6]. As NMD represses the production of ARC, it was natural to expect that blocking NMD (by EJC knockdown) would lead to a decrease in the mEPSC amplitude. But the opposite happens: the mEPSC amplitude increases, even though the concentration of ARC increases. The question is why this happens.
Figure 4: NMD diagram. On the upper frame we see a mutant gene, translation of which will be eliminated as
last EJC will detect premature stop codon. On the lower frame we see a functional gene, translation of which
will be eliminated by EJC sitting after the natural stop codon, i.e. NMD will take place.
Based on recent works [6, 7], a hypothesis was suggested that integrating miRNA-related regulation with NMD might resolve this paradox. The idea is that this is an indirect effect, which can result from the influence on mEPSC of other proteins that are also regulated by NMD or are competitors of ARC for the same miRNA. Within the scope of the project, in collaboration with the HUGEF group in Torino and the European Brain Research Institute of Rome, we are working on database analyses in order to build a ceRNA network of competitors of ARC that undergo NMD as well.
First analyses of the data show that, for human cells, there are ceRNAs for several miRNA molecules that are functionally relevant to ARC's role at the synapse. Experiments will be designed to confirm or reject the hypothesis and to help understand the importance of miRNA and NMD control in neuronal cells.
—————–
Beyond all of this, we should remember that genetic circuits are embedded in large complex networks of interactions. What happens when these small post-transcriptional mechanisms are inserted into large regulatory networks? Local indirect effects might be triggered, such as cross-talk among common targets. Mapping the knowledge about the state of the inputs to the knowledge about the state of the outputs is a challenging open problem.
[1] M. Figliuzzi, E. Marinari, A. De Martino, MicroRNAs as a selective channel of communication between
competing RNAs: a steady-state theory, Biophysical Journal, 104(5), 1203-1213, 2013.
[2] C. Bosia, A. Pagnani, R. Zecchina, Modelling Competing Endogenous RNA Networks, PLoS ONE, 8, e66609,
2013.
[3] C. E. Shannon, A Mathematical Theory of Communication, The Bell System Technical Journal, 27, 379-423,
623-656, 1948.
[4] G. Tkačik, A. M. Walczak, W. Bialek, Optimizing information flow in small genetic networks, Physical
Review E, 80(3), 031920, 2009.
[5] G. Tkačik, A. M. Walczak, Information transmission in genetic regulatory networks: a review, Journal of
Physics: Condensed Matter, 23(15), 153102, 2011.
[6] C. Giorgi, G. W. Yeo et al., The EJC Factor eIF4AIII Modulates Synaptic Strength and Neuronal Protein
Expression, Cell, 130(1), 179-191, 2007.
[7] K. Wibrand et al., MicroRNA Regulation of the Synaptic Plasticity-Related Gene Arc, PLoS ONE, 7, e41688,
2012.
[8] P. S. Swain, Efficient attenuation of stochasticity in gene expression through post-transcriptional control,
Journal of Molecular Biology, 344(4), 965-976, 2004.
[9] D. T. Gillespie, A general method for numerically simulating the stochastic time evolution of coupled chemical
reactions, Journal of Computational Physics, 22, 403-434, 1976.
[10] W. Bialek, Biophysics: Searching for Principles, Princeton University Press, 2012.