Parameter estimation methods for gene circuit modeling from time

Briefings in Bioinformatics, 16(6), 2015, 987–999
doi: 10.1093/bib/bbv015
Advance Access Publication Date: 26 March 2015
Paper
Parameter estimation methods for gene circuit
modeling from time-series mRNA data: a
comparative study
Ming Fan*, Hiroyuki Kuwahara*, Xiaolei Wang, Suojin Wang and Xin Gao
Corresponding author. Xin Gao, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division,
King Abdullah University of Science and Technology, Kingdom of Saudi Arabia. Tel.: +966-12-8080323; Fax: +966-12-8021241. E-mail: [email protected]
*These authors contributed equally to this work.
Abstract
Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because
advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively
dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and
model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In
comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less
dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments
computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to a level on par with the best solution obtained from the population-based methods
while maintaining high computational speed. These results suggest that such hybrid methods can be a promising alternative to
the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior
knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large.
Key words: parameter estimation; gene circuits; comparative study; thermodynamic-based modeling
Ming Fan is an assistant professor in the College of Life Information Science and Instrument Engineering, Hangzhou Dianzi University, China.
Hiroyuki Kuwahara is a research scientist in the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center and
the Division of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Xiaolei Wang is a PhD candidate in the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center and the Division
of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Suojin Wang is a professor in the Department of Statistics, Texas A&M University, USA.
Xin Gao is an assistant professor and the lead of the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center
and the Division of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Submitted: 8 December 2014; Received (in revised form): 9 February 2015
© The Author 2015. Published by Oxford University Press. For Permissions, please email: [email protected]
Introduction
Recent advances in biotechnology have facilitated integrative,
cross-disciplinary biological research [1, 2]. In particular, technologies enabling high-resolution, high-throughput, time-series
measurements of gene expression levels [3–8] have provided a
means of inferring kinetic models that can more accurately capture the dynamics of gene regulatory systems than ever before.
As direct measurements of kinetic parameters in gene circuit
models are rare, advances in such technologies have proved
to be essential to the reverse engineering of gene circuits.
The construction of phenomenological models via the reverse
engineering of biological systems is useful to test various competing hypotheses on the underlying molecular mechanisms
that quantitatively control the expression of genes, and they
can also be used to design perturbation experiments to refine
current knowledge [9–11].
A major challenge in the construction of quantitative biological models is parameter estimation. Given experimental data
and a fixed model structure, parameter estimation seeks to align
the dynamics of the model with the observed measurements by
fitting unknown model parameters that are constrained within
biologically relevant bounds [12, 13]. One of the most intuitive
ways to tackle this problem is to transform it into a global optimization problem and use optimization-based methods to
search for suitable parameter sets. Previous comparative studies
examined a number of optimization-based parameter estimation
methods for various biological models [14–16]. The typical focus
of those studies was the application of stochastic metaheuristics
to the parameter estimation of biological models. Commonly
used metaheuristics are based mainly on population-based optimization algorithms, including the stochastic ranking evolution
strategy (SRES) [17], differential evolution (DE) [18] and particle
swarm optimization (PSO) [19, 20]. The scatter search method
(SSM) [21, 22] was also used in parameter estimation of biological
models [19, 23]. To increase the accuracy, hybrid approaches that
combine stochastic metaheuristics and deterministic local
search algorithms were applied so as to guarantee that the solution reaches some local optimum [24, 25]. The computational requirements of these stochastic metaheuristic-based approaches
can, however, be daunting, especially when intermediate solutions produce models with widely different timescale characteristics, and the scalability becomes a challenging issue. To
improve the computational efficiency and the scalability of parameter estimation, several approaches were developed to decompose systems of ordinary differential equations (ODEs) and
reduce the search space [26–29]. Another way to increase the
computational speed is to use online filtering algorithms [30].
These include hybrid extended Kalman filters (HEKFs) [31, 32],
unscented Kalman filters (UKFs) [6, 33, 34] and particle filters
(PFs) [35–37]. While a number of parameter estimation methods
have been applied to modeling of various biological systems, it is
clear that each method has advantages and disadvantages, and
that there is no one-size-fits-all method for parameter estimation of biological models.
Here, we examined the use of several state-of-the-art parameter estimation methods, specifically in the context of the reverse engineering of gene circuits using time-series mRNA data
sets. A gene circuit is a network of genes that interact with each
other to regulate their expression. In gene circuits, transcription
initiation based on the interaction between transcription factors
and cis-regulatory elements plays a crucial role in the control of
gene expression. In transcriptional regulation, the typical behavior is often quantitatively characterized by statistical
thermodynamic models based on systems of coupled ODEs [38].
Parameter estimation of such kinetic models is particularly
challenging owing to (i) the highly nonlinear nature of gene
regulation and (ii) the noisy nature of the experimental measurements. These limitations make parameter estimation of
gene circuit models a non-convex, global optimization problem
with many local optima [39]—a computationally difficult problem. In addition, high-throughput time-series measurements
are conducted at sparse observation time points compared with
the timescale of gene expression reactions, and they are often
limited to mRNA molecules [3–6, 8, 40, 41]. Thus, while protein
abundance can be experimentally quantified, the protein
dynamics is typically treated as unobservable, which further
complicates the parameter estimation problem. In what follows, we explore both the accuracy and computational efficiency of parameter estimation methods in the context of
modeling gene circuits using synthetic and real time-series
gene expression data.
Methods
Problem setting
In this comparative study, we assume that the observations of a
gene circuit with $N$ genes, $g_1, \ldots, g_N$, are given by bulk-level
time-series mRNA data at $M$ time points, $t_1 < t_2 < \ldots < t_M$, and denote the empirical mean of each mRNA $m_i$ at time $t_j$ by $m_{ij}$. The
protein copy of each gene $g_i$, denoted by $p_i$, can be used to regulate the transcription of any gene in the gene circuit. The regulation of each gene is often modeled using four rate-limiting
reaction processes: transcription initiation, mRNA degradation,
translation initiation and protein degradation. Because bulk-level time-series mRNA data sets can only capture the average
time course of mRNA levels, we focus on a continuous-deterministic version of the gene circuit model based on a system of ODEs that captures these four rate-limiting processes as
follows:
\[
\begin{aligned}
\frac{d\hat{m}_i}{dt} &= f_i(\hat{\mathbf{p}}, \hat{m}_i, \theta_i) \equiv h_i(\hat{\mathbf{p}}, (\theta_{i2}, \ldots, \theta_{iK_i})) - \theta_{i1} \hat{m}_i, \\
\frac{d\hat{p}_i}{dt} &= \alpha_i \hat{m}_i - \beta_i \hat{p}_i, \qquad \text{for } i = 1, \ldots, N,
\end{aligned}
\tag{1}
\]
where $\hat{m}_i$ and $\hat{p}_i$ are time-dependent variables that estimate the
average concentrations of mRNA $m_i$ and protein $p_i$, respectively;
$f_i$ is the reaction rate function for the regulation of mRNA $m_i$; $\hat{\mathbf{p}}$ is
an $N$-dimensional vector whose $i$-th element is $\hat{p}_i$; $\theta_i \equiv (\theta_{i1}, \ldots, \theta_{iK_i})$
is a $K_i$-dimensional vector that represents the parameters
used in the rate equation representing the regulation of mRNA
$m_i$; $h_i$ represents transcription reaction kinetics; $\alpha_i$ and $\beta_i$ are the
parameters used in the regulation of protein $p_i$. Here, to represent transcription reaction kinetics based on the interaction
between regulatory proteins and cis-regulatory elements, we
set the function $h_i$ to be the equilibrium thermodynamics
model [38]. A brief overview of the thermodynamic-based transcriptional regulation modeling formalism is presented in
Supplementary Section S1.1.
The objective of the parameter estimation of a gene circuit
model described in Equation (1) is, thus, to search for the values
of unknown parameters in each $\theta_i$ so that the dynamics of the
model can reconstruct the observed time-series mRNA data.
Here, we examine various state-of-the-art parameter estimation algorithms in the context of gene circuit modeling with
diverse parameter estimation settings.
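To make the structure of Equation (1) concrete, the following minimal Python sketch integrates a two-gene circuit in which each gene is activated by the protein of the other. It is an illustration only and not the implementation used in this study: the Hill-type function `hill_activation` is a hypothetical stand-in for the thermodynamic model chosen for h_i, and the fixed-step Runge-Kutta integrator, the function names and all parameter values are assumptions made for this example.

```python
def hill_activation(p, vmax, k, n):
    """Hypothetical stand-in for h_i: saturating activation by a protein level p."""
    return vmax * p ** n / (k ** n + p ** n)


def circuit_rhs(state, theta, alpha, beta, regulators):
    """Right-hand side of Equation (1) for N genes.

    state      -- [m_1, ..., m_N, p_1, ..., p_N]
    theta[i]   -- (theta_i1, vmax, k, n); theta_i1 is the mRNA degradation rate
    regulators -- regulators[i] is the index of the protein that activates gene i
    """
    n_genes = len(theta)
    m, p = state[:n_genes], state[n_genes:]
    dm = [hill_activation(p[regulators[i]], *theta[i][1:]) - theta[i][0] * m[i]
          for i in range(n_genes)]
    dp = [alpha[i] * m[i] - beta[i] * p[i] for i in range(n_genes)]
    return dm + dp


def rk4(rhs, state, t_end, dt, *args):
    """Integrate from t = 0 to t_end with fixed-step fourth-order Runge-Kutta."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(state, *args)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], *args)
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], *args)
        k4 = rhs([s + dt * k for s, k in zip(state, k3)], *args)
        state = [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


# Two genes that activate each other; all values are illustrative assumptions.
theta = [(0.5, 2.0, 1.0, 1), (0.5, 2.0, 1.0, 1)]
final_state = rk4(circuit_rhs, [1.0, 1.0, 1.0, 1.0], 100.0, 0.01,
                  theta, [1.0, 1.0], [1.0, 1.0], [1, 0])
```

With these illustrative values, the mRNA and protein levels of both genes should settle near a steady state of 3, where synthesis and degradation balance in both equations. A parameter estimation method would repeatedly run such a simulation with candidate parameter values and compare the simulated mRNA levels with the observed data.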
Algorithms
In this study, we selected state-of-the-art parameter estimation
methods from three categories: population-based methods,
online methods and decomposition-based methods. Brief descriptions of the parameter estimation methods examined in
this study are given in this section, while detailed descriptions
and specific configurations are presented in Supplementary
Sections S2 and S4.
Population-based methods
Differential evolution. Differential evolution (DE) was proposed to
handle non-differentiable and nonlinear cost functions [42]. It is
a generic metaheuristic for global optimization problems. The original DE algorithm does not constrain the parameters between the upper and lower boundaries. We, therefore,
modified the algorithm to enforce parameter boundaries
(Supplementary Section S2.1).
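As a rough illustration of how a DE search can be confined to parameter boundaries, the sketch below implements the common DE/rand/1/bin scheme and clips every mutated coordinate to its lower and upper bound. Clipping is only one possible way to enforce boundaries and is an assumption here; the modification actually used in this study is the one described in Supplementary Section S2.1. The function name `bounded_de` and the default settings are likewise hypothetical.

```python
import random


def bounded_de(objective, bounds, pop_size=20, generations=200,
               weight=0.8, crossover=0.9, seed=0):
    """DE/rand/1/bin in which every mutated coordinate is clipped to [lo, hi]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = (pop[j] for j in rng.sample(others, 3))
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < crossover:
                    value = a[d] + weight * (b[d] - c[d])
                    value = min(hi, max(lo, value))  # enforce the boundaries
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one replacement
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


# Toy cost function with its minimum at (1, -2), inside the bounds.
best_params, best_cost = bounded_de(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
    [(-5.0, 5.0), (-5.0, 5.0)])
```

In a parameter estimation setting, the toy cost function would be replaced by a prediction error computed from a simulation of the gene circuit model.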
Stochastic ranking evolution strategy. Given an objective function,
the parameter estimation problem can be formulated as a constrained optimization problem. The SRES was introduced based
on the idea of soft constraints (i.e. constraints are added to the
objective function as the penalty term) [17]. SRES was ranked as
the best method in a comparative study of parameter estimation of a metabolic network model [15].
Scatter search method. The scatter search framework was first
proposed by Laguna and Martí [43] as a hybrid search method
that combines global search with local search. Later, Rodriguez-Fernandez et al. improved several steps of the original framework and proposed the SSM [19]. Their study showed that SSM
could find global optima in a complex biochemical reaction network whereas SRES and DE became trapped in the local optima.
Particle swarm optimization. PSO is based on the idea of simulating social behaviors [20, 44]. We implemented PSO as described
by Birge et al. [45].
Online methods
Hybrid extended Kalman filter. The Kalman filter is a minimum
variance estimator, which operates by propagating the state
and the covariance of a discrete-time linear system through
time [46]. The HEKF considers continuous-time, nonlinear systems with discrete-time measurements by extending the
Kalman filter. In this study, we used a variant of a constrained
HEKF, which was used by Lillacci and Khammash [32] to impose
constraints for the lower bounds of the states (Supplementary
Section S2.2).
Unscented Kalman filter. The UKF is another filtering method that
propagates the means and covariances of states in nonlinear
systems by extending the Kalman filter [46]. Here, we used a
variant of a constrained UKF [47], which can have constraints
for the boundary conditions of the states.
Particle filter. The PF [35], also known as the sequential Monte
Carlo method, is a model estimation method based on a recursive Bayesian filter with Monte Carlo sampling. In a recent comparative study [30], PF generated more accurate and consistent
results than HEKF and UKF did in estimating parameters of an
Escherichia coli heat shock response model.
Decomposition-based methods
Parameter estimation by decomposition and integration (PEDI) is
a scalable framework for parameter estimation. It was specifically
designed to estimate parameters of gene circuit models [29].
Whereas the parameter search space in typical parameter estimation methods increases exponentially as the number of unknown parameters increases, PEDI is able to reduce the search
space substantially by decomposing the gene circuit model
described in Equation (1) into rate equations at the individual
gene level (Supplementary Section S2.3). In this study, we applied
the PEDI framework to the aforementioned four population-based methods. Here, we refer to these four PEDI-based methods
as PEDI(DE), PEDI(SRES), PEDI(SSM) and PEDI(PSO).
Objective function
Because the four population-based methods treat parameter estimation as an optimization problem, they require an objective function to optimize. Following Moles
et al. [15], we defined the objective function to be the weighted
sum of the squared residuals of the levels of mRNAs over the
$M$ time points as follows:
\[
J_i = \sum_{j=1}^{M} w_i \left( \hat{m}_i(t_j) - m_{ij} \right)^2, \tag{2a}
\]
\[
J = \sum_{i=1}^{N} J_i, \tag{2b}
\]
where $w_i$, the weight for mRNA $m_i$, is given as
$w_i = 1/\max_j [(m_{ij})^2]$.
Within the PEDI framework [29], a gene circuit model based
on a system of ODEs, i.e. Equation (1), is decomposed into those
based on individual genes. Hence, when a population-based
method is used in PEDI as a parameter search module, objective
function $J_i$, Equation (2a), is used to optimize the parameter set, $\theta_i$.
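The objective function in Equations (2a) and (2b) can be written in a few lines of Python. The sketch below is a direct transcription under the assumption that simulated and observed mRNA levels are supplied as plain lists sampled at the same M time points and that each observed series contains at least one non-zero value; the function names are hypothetical.

```python
def gene_objective(simulated, observed):
    """J_i of Equation (2a): weighted sum of squared residuals for one mRNA."""
    weight = 1.0 / max(x * x for x in observed)  # w_i = 1 / max_j (m_ij)^2
    return sum(weight * (s - o) ** 2 for s, o in zip(simulated, observed))


def circuit_objective(simulated_by_gene, observed_by_gene):
    """J of Equation (2b): the sum of J_i over all genes in the circuit."""
    return sum(gene_objective(s, o)
               for s, o in zip(simulated_by_gene, observed_by_gene))


# Illustrative data for two genes (made-up numbers).
observed = [[1.0, 2.0, 4.0], [10.0, 5.0]]
simulated = [[1.0, 2.0, 2.0], [9.0, 5.0]]
total_error = circuit_objective(simulated, observed)
```

For the made-up data above, the first gene contributes (2 - 4)^2 / 16 = 0.25 and the second contributes (9 - 10)^2 / 100 = 0.01, so the total is 0.26. The normalization by the largest squared observation keeps genes with high expression levels from dominating the sum.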
Experimental settings with synthetic time-series
mRNA data
This section overviews the experimental setting of our comparative study with synthetic time-series mRNA data (detailed
descriptions are presented in Supplementary Section S1.2). To
evaluate and compare parameter estimation methods, we considered five gene circuit models (Figure 1 and Supplementary
Section S3). The experimental workflow for our comparative
study is depicted in Figure 2. The time-series mRNA data of
each gene circuit are generated by adding two levels of white
Gaussian noise, i.e. 10% and 25%, to the true mean dynamics (see
Supplementary Section S1.2 for details). We evaluated both the
original algorithm and a hybrid version of each method. The
hybrid method is a combination of the original algorithm and a
local search algorithm in which the output parameter set from
the original method is fed into the local search method as the
initial parameter set. One major advantage of this hybrid approach is that the resulting parameter set is guaranteed to reach
a local optimum (Supplementary Section S1). Because the population-based methods and the PEDI-based methods can set parameter boundaries with their constraints, we used two different
parameter ranges (i.e. a narrower one and a wider one) to test
the effects of the size of parameter search space on their performance (Supplementary Section S4). This resulted in 20 combinations of test cases for each method (i.e. five models with
two noise levels and two boundary conditions). We evaluated
each of the methods with computational efficiency and accuracy criteria. The computational efficiency was quantified based
on the method’s runtime and how fast it achieved certain levels
of accuracy, while the accuracy was quantified by using the prediction error, which we measured using Equation (2b).
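The hybrid scheme described above, in which the output of a first-stage method seeds a local search, can be sketched as follows. Both stages are simplified stand-ins chosen for brevity and are assumptions of this example: uniform random sampling takes the place of a population-based, online or PEDI-based method, and a derivative-free compass search takes the place of the local search algorithm specified in the Supplementary material. The toy error function is also hypothetical.

```python
import random


def random_search(objective, bounds, n_samples, rng):
    """Stand-in first stage: uniform sampling inside the parameter bounds."""
    best, best_val = None, float("inf")
    for _ in range(n_samples):
        cand = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val


def compass_search(objective, start, bounds, step=0.5, tol=1e-8):
    """Stand-in local stage: compass (pattern) search clipped to the bounds."""
    x, fx = list(start), objective(start)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = min(hi, max(lo, x[i] + delta))
                ft = objective(trial)
                if ft < fx:  # accept only strict improvements
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # shrink the step once no move improves the error
    return x, fx


def toy_error(params):
    """Hypothetical prediction error with its minimum at (2.0, 0.5)."""
    return (params[0] - 2.0) ** 2 + (params[1] - 0.5) ** 2


bounds = [(0.0, 10.0), (0.0, 10.0)]
coarse, coarse_error = random_search(toy_error, bounds, 50, random.Random(0))
refined, refined_error = compass_search(toy_error, coarse, bounds)
```

Because the local stage accepts only moves that lower the error, the refined solution is never worse than the seed it started from, which mirrors the motivation for the hybrid approach.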
Figure 1. Network structures (left) and the true mean trajectories of gene circuit models (right). Each mi represents the mRNA of corresponding gene gi. (A) Model 1.
(B) Model 2. (C) Model 3. (D) Model 4. (E) Model 5. In these network structures, each line with an arrow head indicates transcriptional activation, while each line with a
bar indicates transcriptional repression. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
Results
The population-based methods
To measure the performance of the population-based methods,
we ran each method three times at various parameter estimation
settings (Supplementary Section S4). In each of the runs, we
fixed the total number of searches to be 1 million. Table 1
shows the results of the run with the lowest prediction error of
each population-based method. Comparing the accuracy
performance of the original algorithms, we found that SSM most
frequently generated parameter solutions with the highest
degree of accuracy, achieving the lowest prediction error in 18
of the 20 settings (Table 1). We also observed that, in many
settings, SRES was able to achieve accuracy levels on par with
those from SSM.
Next, we examined the runtime based on 1 million searches
(Table 1 and Supplementary Section S5.1.1). From this experiment, PSO and DE emerged as the computationally most efficient methods in 9 and 7 of the 20 settings, respectively.
However, they were also the slowest in several settings, suggesting that the runtime can be easily affected by both noise
levels and parameter boundaries in a complex fashion
(Supplementary Section S5.1.1).
Figure 2. The experimental setting based on synthetic time-series data in this comparative study.
To examine how parameter solutions were improved for
each method, we first analyzed the intermediate solutions at
various search points (Supplementary Section S5.1.2). We found
that solutions from SSM had the highest level of accuracy at
most search points. Next, we analyzed the accuracy improvement rate based on computational time (Supplementary
Section S5.1.3). We found that SSM generally reaches higher
accuracy levels much more rapidly than the other population-based methods do. These results agree with those by Rodriguez-Fernandez et al. [19], which reported that SSM had higher
accuracy improvement rates compared with DE and SRES. One
exception was Model 2 in which SRES found high-quality solutions
in the wider parameter range settings more rapidly than SSM did.
This may be because the local search-based optimization had difficulty finding parameter solutions that exhibited similar
oscillatory dynamics in the large search space because such a
behavior requires intricate combinations of parameters.
By comparing the results from various configurations, we
found that the population-based methods generally resulted in
higher prediction errors in the wider parameter setting (31 of 40
cases; see Supplementary Table S6). The results from intermediate solutions also showed that the wider parameter boundary
settings resulted in higher prediction errors, and in a number of
those settings, solutions were not able to stabilize within 1 million searches (Supplementary Section S5.1.2). Given that it took
between 8 h and 2 days of computation to run each of
the population-based methods with 1 million searches, it is
clear that the computational requirement of the population-based methods can substantially increase as the parameter
search space increases.
A previous study [32] and a recent review paper [16]
suggested the usefulness of hybrid methods for parameter
estimation. We analyzed the performance of hybrid methods
based on the use of local search following the best run of each
population-based method. The results revealed that the accuracy improvements were mainly dependent on the type of
method (Table 1 and Supplementary Section S5.1.4). Overall, our
results suggest that this hybrid approach is useful in that the
accuracy improvement via the addition of a local search to the
three methods—except for SSM—more than compensates for
the cost of the runtime overhead.
The online methods
To analyze the performance of the three online methods, we ran
each method based on different initial parameter guesses
(Supplementary Section S5.2.1). The results of the prediction error
show that the accuracy levels can differ substantially depending
on the initial guesses whereby a high level of accuracy was
achieved only when the initial guesses were close to the true solution (Figure 3 and Supplementary Section S5.2.2). To examine
the accuracy further, we computed three statistical measures:
the lowest prediction error (i.e. the best), the median and the standard deviation (Table 2). From these measures, we observed a huge
disparity in the accuracy levels between the best and the median
prediction error (Supplementary Section S5.2.2).
In this experiment, we observed that some of the HEKF
and UKF runs led to numerical errors. By dissecting the problems, we found two issues based on numerical instabilities
(Supplementary Section S5.2.3). In particular, the problem from
HEKF and the follow-up experiment demonstrated that the parameter boundary constraints are essential to accurate parameter estimation when online methods are used (Supplementary
Section S5.2.3).
The runtime results show that the online methods are considerably fast (Table 2). The computationally most efficient
method was HEKF, which, regardless of the initial guess,
required the shortest runtime. UKF was also fast. While PF
with 200 particles was the slowest among the three methods,
it was still relatively fast, with most of the runtime being
<1 min.
Table 1. Comparison of the best runs from the population-based methods
Boundary range  Narrower                                Wider
Noise level     10%                 25%                 10%                 25%
Version         Original  Hybrid    Original  Hybrid    Original  Hybrid    Original  Hybrid
Model 1
DE      Error   0.02      0.02      0.07      0.07      0.02      0.02      0.07      0.07
        Time    959.39    959.40    900.19    900.19    971.05    971.06    908.12    908.13
SRES    Error   0.02      0.02      0.07      0.07      0.04      0.04      0.09      0.09
        Time    955.17    955.18    904.17    904.18    975.80    975.81    920.15    920.16
SSM     Error   0.02      0.02      0.07      0.07      0.04      0.04      0.09      0.09
        Time    916.00    916.01    919.55    919.56    1053.20   1053.21   1010.11   1010.12
PSO     Error   0.10      0.02      0.13      0.07      0.05      0.04      0.09      0.09
        Time    867.53    867.67    814.33    814.47    1037.73   1037.80   999.81    999.82
Model 2
DE      Error   0.70      0.17      0.74      0.33      4.18      0.17      3.53      0.33
        Time    2012.03   2017.97   1990.55   1997.78   1020.32   1037.32   1027.41   1046.60
SRES    Error   0.17      0.17      0.33      0.33      0.17      0.17      0.33      0.33
        Time    2684.15   2684.24   2574.35   2574.44   2179.97   2180.06   2467.70   2467.81
SSM     Error   0.17      0.17      0.33      0.33      0.17      0.17      0.33      0.33
        Time    2593.82   2593.91   2540.06   2540.16   2342.72   2342.80   2044.23   2044.35
PSO     Error   0.24      0.23      0.34      0.34      46.46     44.32     18.52     18.22
        Time    1829.19   1829.95   1852.33   1852.43   2552.82   2556.95   2363.73   2371.73
Model 3
DE      Error   0.32      0.14      0.58      0.39      9.47      0.19      7.67      0.53
        Time    1195.33   1198.20   1096.66   1100.53   2785.98   2821.26   2502.13   2533.62
SRES    Error   0.14      0.14      0.39      0.39      0.33      0.33      0.59      0.59
        Time    1089.73   1089.77   1044.15   1044.19   3345.77   3346.26   3112.10   3112.98
SSM     Error   0.14      0.14      0.38      0.38      0.14      0.14      0.38      0.38
        Time    1053.48   1053.52   1024.20   1024.24   2900.98   2901.07   2706.34   2706.45
PSO     Error   0.14      0.14      0.39      0.39      5.50      1.90      3.37      0.48
        Time    1142.32   1142.37   1029.56   1029.73   3697.71   3751.98   2811.04   2858.36
Model 4
DE      Error   0.55      0.49      0.90      0.81      1.20      0.59      1.75      0.97
        Time    830.60    831.77    750.12    751.60    835.25    838.09    645.93    649.33
SRES    Error   0.49      0.49      0.81      0.81      1.26      0.53      1.72      1.61
        Time    789.70    789.72    652.94    652.96    1029.86   1036.72   774.05    779.38
SSM     Error   0.49      0.49      0.81      0.81      0.53      0.53      0.94      0.94
        Time    815.14    815.17    654.75    654.77    934.98    935.02    786.99    787.01
PSO     Error   1.05      0.49      1.32      0.81      12.10     12.10     10.83     10.83
        Time    767.95    769.10    733.90    735.34    514.51    514.52    503.06    503.07
Model 5
DE      Error   1.88      1.08      3.96      2.80      4.36      2.79      6.87      4.58
        Time    827.46    833.00    785.80    790.83    601.26    604.26    596.08    605.27
SRES    Error   1.08      1.08      2.80      2.80      6.29      5.40      10.26     3.20
        Time    833.21    833.29    826.09    826.23    674.37    697.37    683.38    693.78
SSM     Error   1.08      1.08      2.80      2.80      1.41      1.41      3.02      3.02
        Time    817.80    817.84    803.87    803.91    771.51    771.59    707.48    707.54
PSO     Error   3.31      1.08      6.67      2.80      9.78      6.18      18.88     17.59
        Time    871.97    878.00    822.45    830.00    578.95    583.70    522.60    523.64
Error is the prediction error and Time is the runtime in minutes. The results of the hybrid approach here were based on the best solution obtained
from each of the population-based methods.
Next, we analyzed the performance of the online-based
hybrid methods (Figure 3). Our results demonstrated that they
consistently generated solutions with much higher levels of
accuracy compared with their stand-alone counterparts
(Supplementary Section S5.2.4). In particular, we observed that
the accuracy and consistency of UKF + LS, the hybrid method
based on UKF with a subsequent local search, were substantially improved compared with those of UKF. While the overhead of the subsequent local search was large considering the
very short runtime of the online methods, the overall runtime
was still frequently 100-fold faster than that of the population-based methods, indicating a strong advantage of having a subsequent local search step.
The PEDI-based methods
We evaluated the performance of PEDI-based methods by using
the four population-based methods (Supplementary Section S5.3.1).
Among the four original, stand-alone PEDI-based methods,
PEDI(SSM) turned out to be the most accurate one, scoring the
lowest prediction error in 14 of the 20 settings (Table 3).
PEDI(DE) also exhibited high levels of accuracy, achieving the
lowest prediction error in 12 settings (Table 3). This result differs
from those of the population-based methods in which DE rarely
achieved a high level of accuracy compared with the
other three. Another unexpected observation was that the accuracy levels of the PEDI-based methods in different experimental
settings were more comparable than those of the corresponding
population-based methods (Supplementary Table S6).
Figure 3. Comparison of the three online methods in terms of the prediction accuracy in various settings. In these log–log plots, prediction error (the y-axis) is measured given initial parameter guesses. The x-axis shows the ratio of each initial guess to the true parameters expressed as percentages. Here, the results of the original
online methods and the hybrid methods with 10% and 25% noise levels are shown for the five models. The discontinuities in the HEKF lines in Model 2 indicate points
with unrepresentable values. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
The runtime data show that PEDI(SRES) and PEDI(PSO) were
substantially faster than PEDI(DE) and PEDI(SSM) (Table 3). By
comparing the runtime data between the population-based
methods and the PEDI-based methods, we found that the runtime of the PEDI-based methods was much shorter, but the efficiency gain via PEDI was strongly dependent on the type of
search. PEDI(SRES) and PEDI(PSO) mostly achieved more than a
100-fold speedup compared with the population-based counterparts, whereas the others typically had low single-digit speedup
(Supplementary Section S5.3.2).
Next, we examined the quality of the intermediate solutions
(Supplementary Section S5.3.3). We found that, in all settings,
the solutions of PEDI(DE) and PEDI(SSM) stabilized after fewer
rounds of refinement than those needed for PEDI(SRES) and
PEDI(PSO). However, based on the computational time to achieve
specific accuracy levels, there were only a few settings where
PEDI(DE) and PEDI(SSM) were the fastest ones, indicating that
their refinement iterations are computationally demanding.
Analysis of the PEDI-based hybrid methods showed that all
four PEDI-methods benefited greatly from using a follow-up
local search and that they had mostly comparable levels of prediction errors (Table 3). In particular, the accuracy improvement
of PEDI(SRES) þ LS was remarkable given its runtime, which
was often >50 times faster than those of PEDI(DE) þ LS and
994
|
Fan et al.
Table 2. Comparison of prediction error and runtime (in minutes) among the online methods from 100 samples with different initial guesses
Measurements
Besta
Noise level
10%
Version
Original
Hybrid
Original
Hybrid
Original
Hybrid
Original
Hybrid
Original
Hybrid
Original
Hybrid
0.04
0.01
3.36
0.04
0.02
0.29
0.41
0.03
0.20
0.16
0.20
0.57
0.30
0.02
1.59
0.19
0.15
0.35
4.87
0.01
0.56
0.06
0.56
0.23
4.41
0.03
1.30
0.13
1.30
0.25
0.02
0.10
0.04
0.24
0.02
0.36
0.17
3.41
0.17
2.34
0.17
2.83
0.14
1.63
0.14
1.78
0.14
1.29
1.83
2.30
0.48
1.31
0.48
1.49
1.85
4.20
1.05
8.92
1.05
10.12
0.75
0.01
4.15
0.04
0.08
0.29
0.61
0.03
0.37
0.27
0.37
0.59
0.67
0.02
2.34
0.60
0.43
0.34
4.61
0.01
1.22
0.08
1.22
0.24
9.37
0.06
3.15
0.15
3.15
0.26
0.07
0.18
0.09
0.34
0.07
0.36
0.33
3.25
0.33
3.20
0.33
3.36
0.39
1.69
0.39
3.33
0.38
3.55
1.60
1.65
0.79
1.54
0.79
1.75
3.56
27.39
2.70
11.24
2.70
11.48
4.13
0.03
422.73
0.05
422.73
0.31
80.97
0.03
144.27
0.16
2366.76
1.11
47.83
0.03
83.13
0.20
81.01
0.62
31.38
0.03
26.03
0.07
46.78
0.48
53.54
0.06
50.41
0.15
52.41
0.61
0.04
0.20
2.55
0.24
2.56
0.47
22.53
4.86
26.36
2.51
52.50
4.83
0.14
13.64
0.14
3.15
0.19
13.64
5.39
2.37
0.52
2.78
4.60
2.01
10.71
4.96
1.11
17.43
16.18
5.72
34.34
0.01
483.78
0.06
419.93
0.31
70.70
0.03
126.44
0.29
2078.39
1.10
36.17
0.03
81.11
0.69
74.47
0.71
18.95
0.03
24.96
0.08
39.34
0.51
101.77
0.05
52.13
0.15
55.11
0.63
0.09
0.12
2.36
0.29
2.53
0.48
22.82
4.50
23.92
2.46
42.69
4.16
0.46
13.58
0.40
3.17
0.46
13.59
1.99
2.03
0.86
2.75
3.24
2.16
13.58
6.45
2.77
14.00
19.46
6.55
>1e4
0.01
625.65
0.01
630.90
0.01
831.25
0.01
39.79
0.03
3003.46
0.22
17.97
0.01
37.17
0.03
32.20
0.17
2568.50
0.01
804.23
0.02
28.28
0.12
>1e4
0.02
128.80
0.03
16.88
0.18
2.24
0.07
1.51
0.10
1.45
0.10
18.59
6.45
18.73
5.57
21.27
15.00
0.55
11.70
0.00
0.95
0.66
11.78
4.84
1.70
4.02
1.66
5.98
1.72
5.81
6.22
0.08
6.66
7.89
5.47
>1e4
0.01
>1e4
0.01
623.88
0.01
401.22
0.01
34.05
0.03
2652.69
0.20
14.51
0.01
196.41
0.11
29.72
0.21
135.36
0.01
445.02
0.01
22.20
0.14
>1e4
0.04
1005.23
0.02
17.04
0.19
14.64
0.11
1.36
0.12
1.42
0.10
15.81
9.47
15.48
6.82
19.15
11.56
0.91
9.05
0.06
1.12
0.48
11.07
3.81
1.71
2.52
1.30
4.64
1.50
6.31
6.48
1.69
5.09
8.13
7.54
Model 1
HEKF
UKF
PF
Model 2
HEKFb
UKF
PF
Model 3
HEKF
UKF
PF
Model 4
HEKF
UKF
PF
Model 5
HEKF
UKF
PF
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Error
Time
Median
25%
Standard deviation
10%
25%
10%
25%
The best performance for each setting is shown in bold.
(a) The best solution of each hybrid approach is based on local search using the most accurate solution of the online method among the 100 runs as the initial seed. The
runtime row in the Best field shows the runtime of the most accurate solution.
(b) Because HEKF resulted in no solutions for 23 and 16 runs in the 10% and 25% noise settings, respectively, we excluded those runs from computation of statistical
measures.
PEDI(SSM) + LS. Our results thus indicate that PEDI(SRES) + LS
is overall a well-balanced parameter estimation method with a
high level of accuracy and computational efficiency.
Our results have shown that the online- and PEDI-based
hybrid methods are capable of achieving the same level of accuracy
as the population-based hybrid methods, but with much greater
computational efficiency. To evaluate these methods more objectively,
we compared the accuracy and the consistency of the hybrid methods
with similar runtimes (see Supplementary Section S5.3.4 for details).
We found that, while the accuracy levels of solutions from PEDI-based
hybrid methods were more consistent with respect to different initial
parameter guesses than those from the online-based hybrid methods,
both were able to consistently generate parameter solutions with
high levels of accuracy, on par with those from SSM.
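As an illustration of how consistency across repeated runs can be summarized, the following minimal sketch computes the median, standard deviation and best prediction error over a set of runs, skipping runs that returned no solution (as was done for HEKF in the table notes). The function name, the use of None to mark a failed run and the example values are assumptions made for this sketch; they are not taken from the study.

```python
import statistics


def summarize_runs(errors):
    """Summarize prediction errors from repeated runs.

    `errors` holds one entry per run; None marks a run that returned
    no solution and is excluded from the statistics.
    """
    valid = [e for e in errors if e is not None]
    if not valid:
        raise ValueError("no successful runs to summarize")
    return {
        "n_used": len(valid),
        "n_failed": len(errors) - len(valid),
        "median": statistics.median(valid),
        "stdev": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "best": min(valid),  # lowest prediction error among the runs
    }


# Hypothetical prediction errors from five runs, one of which failed.
example = summarize_runs([0.4, None, 0.1, 0.3, 0.2])
```

A small standard deviation relative to the median indicates that the method is insensitive to the initial parameter guess.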
Results from the E. coli SOS response system with
microarray data
Next, we compared the performance of these parameter estimation methods using a time-series gene expression data set from
cDNA microarray experiments of the E. coli SOS response
system by Courcelle et al. [48]. The SOS response system implements a damage tolerance mechanism that senses DNA damage and regulates the transcription of SOS genes that induce
DNA repair [49, 50]. The protein LexA is a master regulator in this
gene regulatory system that, by binding DNA through its helix-turn-helix
motif as a homodimer, represses the transcription of >30 SOS
genes [51–54]. Courcelle et al.’s microarray data sets contain
fold-change data of many mRNAs that are potentially regulated
by LexA at 6 time points after UV exposure [48].
Our SOS response model describes the regulation of seven
genes that are known to be controlled by LexA (Figure 4 and
Supplementary Section S5.4.1). To reduce the effects of upstream pathways and to focus on LexA-regulated gene expression circuitry, we used the microarray data set from the E. coli
mutant with non-cleavable LexA. The parameter range was
chosen based on our prior knowledge about the SOS response
system (see Supplementary Section S5.4.2).
In this experiment, the population-based methods and the
PEDI-based methods were run three times, while the online-based
methods were run 10 times because they were fast.
Although the online methods require that the covariance matrix be given, we could not obtain this information with high
confidence. Thus, to use the online methods, we made a series
Table 3. Comparison of the best runs from the PEDI-based methods

Boundary range            Narrower                                Wider
Noise level               10%                 25%                 10%                 25%
Version                   Original  Hybrid    Original  Hybrid    Original  Hybrid    Original  Hybrid
Model 1
  PEDI(DE)    Error       0.04      0.02      0.12      0.08      0.04      0.02      0.12      0.09
              Time        333.13    333.24    327.83    327.95    340.28    340.56    328.01    328.11
  PEDI(SRES)  Error       0.04      0.02      0.12      0.08      0.04      0.04      0.12      0.10
              Time        4.48      4.60      4.53      4.64      4.45      4.51      4.52      4.58
  PEDI(SSM)   Error       0.04      0.02      0.12      0.08      0.04      0.04      0.12      0.09
              Time        353.41    353.54    359.58    359.69    357.05    357.18    355.72    355.91
  PEDI(PSO)   Error       0.07      0.02      0.12      0.08      0.07      0.04      0.12      0.09
              Time        4.10      4.24      4.19      4.31      4.16      4.23      4.20      4.44
Model 2
  PEDI(DE)    Error       0.45      0.17      3.18      0.33      0.45      0.17      3.18      0.33
              Time        858.15    862.15    877.23    882.17    871.29    877.30    882.20    890.50
  PEDI(SRES)  Error       39.23     0.17      2.20      0.33      0.53      0.17      2.02      0.33
              Time        16.27     23.92     16.46     22.30     16.56     24.59     16.59     24.35
  PEDI(SSM)   Error       0.46      0.17      3.26      0.33      0.46      0.17      3.26      0.33
              Time        971.89    975.80    981.53    986.46    976.59    982.61    978.50    986.73
  PEDI(PSO)   Error       40.76     0.17      25.25     0.33      40.76     15.63     25.25     1.49
              Time        13.79     23.04     13.95     23.39     13.87     15.69     13.99     64.07
Model 3
  PEDI(DE)    Error       0.14      0.14      0.39      0.39      0.14      0.14      0.39      0.38
              Time        895.43    897.95    914.31    918.15    900.93    903.74    917.92    920.71
  PEDI(SRES)  Error       0.30      0.14      0.43      0.39      0.16      0.15      0.43      0.39
              Time        15.38     19.60     15.30     20.08     15.06     21.56     15.30     24.86
  PEDI(SSM)   Error       0.14      0.14      0.43      0.39      0.14      0.14      0.43      0.38
              Time        1000.81   1002.51   1013.33   1017.39   1011.66   1012.02   1011.69   1015.92
  PEDI(PSO)   Error       0.55      0.14      0.66      0.39      0.55      0.33      0.66      0.59
              Time        13.90     17.99     14.01     17.74     14.07     24.06     14.02     22.97
Model 4
  PEDI(DE)    Error       0.57      0.49      1.14      0.81      0.57      0.49      1.14      0.87
              Time        386.61    389.36    391.41    392.91    402.81    404.47    393.40    395.10
  PEDI(SRES)  Error       1.35      0.49      1.70      0.81      1.07      0.59      2.08      0.97
              Time        6.36      7.38      6.36      7.72      6.39      10.53     6.35      11.75
  PEDI(SSM)   Error       0.57      0.49      0.90      0.81      0.57      0.48      0.90      0.79
              Time        443.01    444.02    446.24    447.23    444.25    445.58    445.94    446.64
  PEDI(PSO)   Error       2.42      0.49      3.52      0.81      2.42      0.59      3.52      0.97
              Time        5.54      7.24      5.67      7.41      5.63      8.42      5.70      11.00
Model 5
  PEDI(DE)    Error       1.54      1.08      3.22      2.80      1.54      1.20      3.22      2.91
              Time        781.32    785.39    789.55    795.69    810.84    818.28    791.18    803.21
  PEDI(SRES)  Error       4.25      1.08      3.54      2.80      2.52      1.62      8.51      3.96
              Time        12.81     18.64     12.82     17.79     12.88     24.46     12.94     20.42
  PEDI(SSM)   Error       1.11      1.08      3.05      2.80      1.11      1.06      3.05      2.88
              Time        875.50    879.85    887.15    893.07    881.96    886.07    887.55    901.80
  PEDI(PSO)   Error       10.26     1.08      87.95     2.80      10.26     10.23     87.95     19.92
              Time        9.85      14.93     9.37      15.65     10.05     12.23     9.41      10.65

The lowest prediction error and runtime (in minutes) for each situation are in bold. The results of the hybrid approach here were based on the best solution obtained
from each of the PEDI-based methods.
of assumptions to generate the covariance values
(Supplementary Section S5.4.3). Similar to our earlier experiments with synthetic mRNA data, the results from the best run
were compared in both original and hybrid versions. Table 4
shows the relative accuracy and speedup factor of each method
with respect to SSM, which produced the most accurate parameter solutions among the population-based methods. These results are consistent with those from the five gene circuit models
with synthetic mRNA data and demonstrate that the PEDI- and
online-based hybrid methods are capable of generating more
accurate parameter solutions in a computationally more efficient fashion than the population-based methods.
This experiment with a real cDNA microarray data set gives
strong evidence that these computationally efficient hybrid
methods are useful alternatives in estimating parameters of
gene circuit models.
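The relative accuracy and speedup factors reported in Table 4 are simple ratios: according to the note to Table 4, the prediction error and the runtime of SSM are each divided by those of the method being compared. The sketch below spells out that arithmetic. The SSM reference values (0.24 for the error and 61.51 min for the runtime) come from the table note; the function name and the example method are hypothetical.

```python
def relative_factors(ref_error, ref_runtime, error, runtime):
    """Return (accuracy, speedup) of a method relative to a reference.

    Both factors divide the reference value by the method's value,
    so a factor above 1 means the method beats the reference.
    """
    if error <= 0 or runtime <= 0:
        raise ValueError("error and runtime must be positive")
    return ref_error / error, ref_runtime / runtime


# Reference values for SSM, as stated in the note to Table 4.
SSM_ERROR = 0.24      # prediction error
SSM_RUNTIME = 61.51   # runtime in minutes

# Hypothetical method with half the error of SSM and a tenth of its runtime.
accuracy, speedup = relative_factors(SSM_ERROR, SSM_RUNTIME, 0.12, 6.151)
```

With this definition, SSM itself has an accuracy and a speedup of 1, which is why it serves as the baseline row of the table.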
Figure 4. The network structure and bulk-level mRNA dynamics of the E. coli
SOS response system. Similar to the networks in Figure 1, each line with an
arrow head indicates transcriptional activation, while each line with a bar
indicates transcriptional repression. The bulk-level time-series mRNA data are
from cDNA microarray experiments of the MG1655 lexA1(Ind-) strain by Courcelle
et al. [48]. A colour version of this figure is available at BIB online:
http://bib.oxfordjournals.org.

Table 4. Comparison of the accuracy and speedup factors among the
best run from each of the parameter estimation methods using the
E. coli SOS response model

Version                   Original  Hybrid
SSM           Accuracy    1.00      1.00
              Speedup     1.00      0.99
DE            Accuracy    0.25      0.63
              Speedup     1.06      1.05
SRES          Accuracy    0.63      0.63
              Speedup     1.01      1.00
PSO           Accuracy    0.06      0.65
              Speedup     4.27      4.10
PEDI(DE)      Accuracy    1.09      1.31
              Speedup     2.03      2.01
PEDI(SRES)    Accuracy    0.30      1.18
              Speedup     77.11     49.03
PEDI(SSM)     Accuracy    1.08      1.29
              Speedup     1.47      1.47
PEDI(PSO)     Accuracy    1.05      1.27
              Speedup     63.46     58.34
HEKF          Accuracy    0.12      0.61
              Speedup     1622.93   143.68
UKF           Accuracy    0.01      0.60
              Speedup     995.29    73.78
PF            Accuracy    0.03      0.66
              Speedup     425.08    56.66

The accuracy and speedup factors of each method were measured relative to
the performance of SSM. (The accuracy and speedup factors were computed by
dividing the prediction error and the runtime of SSM by those of each method.
These reference values are 0.24 for the error and 61.51 min for the runtime.) The
higher the value is, the better the performance is. The highest accuracy and
speedup factors in each method group are in bold.

Discussion
In this study, we analyzed the performance of several state-of-the-art
parameter estimation methods in the context of gene circuit modeling.
While previous studies compared several parameter estimation methods in the
systems biology setting [15, 19, 30], to the best of our knowledge, this study
is the first instance in which various types of parameter estimation methods
are compared specifically in the context of gene circuit modeling. In
particular, we focused on the estimation of kinetic parameters in
thermodynamic-based mRNA regulation models. Unlike parameters used in
statistical and high-level phenomenological models, these parameters are based
on underlying biophysical processes and have concrete biological meanings
[38, 55, 56]. Whereas this equilibrium thermodynamic-based formalism has been
traditionally applied to modeling prokaryotic gene regulation [38, 55, 57–59],
it has also been used successfully to model eukaryotic gene regulation [60–62]
as well as to design artificial gene circuits [63–65]. Thus, the estimation of
kinetic parameters in thermodynamic-based gene circuit models has practical
significance not just to simulating the dynamics of gene regulation, but also
to gaining quantitative insights into how underlying transcriptional
mechanisms are controlled by the interaction of regulatory proteins and
DNA binding sites in a wide range of natural and synthetic organisms.
Using such thermodynamic-based gene circuit models, we
evaluated three types of parameter estimation methods: population-based
methods, online methods and PEDI-based methods. To this end, we made
relatively realistic assumptions about the type of time-series gene
expression data available for parameter estimation of gene circuit models.
Namely, instead of assuming that time-series data be given with fine-grained
time intervals, we assumed that, as is often the case
with high-throughput gene expression data, the observation
time points of our synthetic data were sparse relative to the
timescale of the gene expression. In addition, as time-series
proteomic data with corresponding mRNA data points are
often not available, we assumed that only the population-level
mRNA data were measured, and we thus treated the time
course of transcription factors as unobservable. However, to
capture transcriptional regulation based on the interaction of
transcription factors and DNA binding sites, we included in
our gene circuit models the reaction processes for protein
regulation. We also assumed that the synthesis rate and degradation
rate constant of each regulatory protein were known.
Because such information can be deduced from the data for
mRNA and protein abundance levels as well as protein half-life
data, as measured in a eukaryotic cell [66, 67], we believe that
this assumption is not unrealistic.
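The data assumptions described in the Discussion (sparse observation times, mRNA-only measurements and known protein rate constants) can be made concrete with a small simulation. The sketch below is illustrative only: the single autorepressed gene, the repression term 1/(1 + K p) standing in for a thermodynamic promoter-occupancy model, the parameter names and values, and the multiplicative Gaussian noise are all assumptions made for this example, not the models or noise settings used in the study.

```python
import random


def derivatives(state, params):
    """Rates of change of (mRNA, protein) for one autorepressed gene."""
    m, p = state
    # Probability that the promoter is free of repressor (assumed form).
    free_promoter = 1.0 / (1.0 + params["K"] * p)
    dm = params["k_m"] * free_promoter - params["d_m"] * m
    dp = params["k_p"] * m - params["d_p"] * p  # k_p, d_p assumed known
    return (dm, dp)


def rk4_step(state, params, dt):
    """Advance the state by dt with the classical Runge-Kutta scheme."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = derivatives(state, params)
    k2 = derivatives(shifted(state, k1, dt / 2), params)
    k3 = derivatives(shifted(state, k2, dt / 2), params)
    k4 = derivatives(shifted(state, k3, dt), params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def observe_mrna(params, times, noise_level, dt=0.01, seed=0):
    """Return noisy mRNA levels at the (increasing) observation times.

    The protein trajectory is simulated but never reported, mirroring
    the assumption that transcription factors are unobservable.
    """
    rng = random.Random(seed)
    state, t, samples = (0.0, 0.0), 0.0, []
    for t_obs in times:
        while t < t_obs - 1e-12:
            step = min(dt, t_obs - t)
            state = rk4_step(state, params, step)
            t += step
        m = state[0]
        samples.append(m * (1.0 + noise_level * rng.gauss(0.0, 1.0)))
    return samples


# Hypothetical parameter values and six sparse observation times.
params = {"k_m": 2.0, "K": 0.5, "d_m": 0.2, "k_p": 1.0, "d_p": 0.1}
times = [5.0, 10.0, 20.0, 40.0, 50.0, 60.0]
clean = observe_mrna(params, times, noise_level=0.0)
noisy = observe_mrna(params, times, noise_level=0.10)
```

A parameter estimation method would then receive only `times` and `noisy`, together with the assumed-known protein rate constants, and would have to recover the remaining kinetic parameters.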
By considering various parameter estimation settings with
different noise levels and parameter boundary ranges, we
showed that, when the parameter boundary ranges were relatively small,
SRES and SSM attained the most accurate parameter solutions in a
computationally efficient fashion among the population-based methods
that we examined. With the wider parameter boundary setting, SSM
performed markedly well, but the usefulness of the population-based
methods deteriorated in general, as they required a much larger number of
searches to stabilize the prediction error and a much larger
amount of time to arrive at solutions with higher levels of accuracy.
These outcomes indicate that, should model parameters have to
be estimated within conservative boundaries owing to limited
quantitative knowledge about the underlying gene regulatory
mechanisms, the population-based methods would be computationally
expensive and possibly not as effective as the other
methods. The online methods are computationally much more
efficient alternatives, but our experiments demonstrated that
the accuracy levels of their parameter solutions were not consistent
and varied widely depending on the initial parameter
guess. At the same time, the accuracy levels of the online methods
were much lower than those of the population-based methods
unless the initial parameter guess was close to
the true value.
Unlike a previous study in which PF was reported to converge to
the global optimum and to perform the best among
online methods [30], we did not find any clear indications as to
which method generates the most accurate solutions, suggesting
a more complex picture of factors involved in the performance
of the online methods. This is surprising, particularly
because HEKF is based on a first-order Taylor expansion while
UKF is equivalent to a third-order Taylor expansion [46, 68],
which, intuitively speaking, indicates that the accuracy of UKF
is expected to be higher than that of HEKF. This unexpected
result may be because we used versions of HEKF
and UKF that impose constraints on the parameter boundaries,
which drastically improved the quality of the parameter solutions
from HEKF and UKF. The discrepancy may also have
arisen because our measurement time points were
set to be much sparser than those used by Liu
et al. [30], in which time-series data with 1000 time points were
assumed to be available. In fact, we believe that this point is
significant in the reverse engineering of gene circuits because the
time interval of time-series mRNA data is expected to be wide
[69]. One thing to note here in the use of Kalman filter-based
online methods in modeling gene circuits is that they require
the covariance matrix of mRNAs and proteins to be given.
While it is easy to generate the true covariance in synthetic data,
obtaining such information may be more involved in wet-lab
experiments, and this strict requirement may prevent these
methods from being used in many real applications.
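To make the ingredients of a constrained Kalman filter-based online method concrete, the following minimal sketch estimates a single degradation rate theta in dm/dt = k - theta m from sparse mRNA observations. It is a simplified stand-in written for illustration, not the HEKF implementation used in the study: the one-gene model, the Euler discretization, the covariance values (which, as noted above, must be supplied by the user) and the use of simple clipping to enforce the parameter bounds are all assumptions of this example, and the constrained filters cited in the text may impose bounds differently.

```python
def simulate(theta, k, times, dt=0.01):
    """Euler-integrate dm/dt = k - theta*m from m(0) = 0; return m at `times`."""
    m, t, out = 0.0, 0.0, []
    for t_obs in times:
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)
            m = m + h * (k - theta * m)
            t += h
        out.append(m)
    return out


def constrained_ekf(times, observations, k, theta0, bounds,
                    meas_var=0.05, proc_var=1e-4, dt=0.01):
    """Estimate theta with an extended Kalman filter on the state [m, theta]."""
    lo, hi = bounds
    m, theta, t = 0.0, theta0, 0.0
    p_mm, p_mt, p_tt = 1.0, 0.0, 1.0  # assumed initial covariance entries
    for t_obs, y in zip(times, observations):
        # Prediction between sparse measurements: the continuous-time model
        # and its covariance are propagated with small Euler steps.
        while t < t_obs - 1e-12:
            h = min(dt, t_obs - t)
            a = 1.0 - h * theta   # derivative of the next m w.r.t. m
            b = -h * m            # derivative of the next m w.r.t. theta
            m = m + h * (k - theta * m)
            # P <- F P F^T + Q with F = [[a, b], [0, 1]]
            p_mm, p_mt = (a * a * p_mm + 2 * a * b * p_mt + b * b * p_tt
                          + proc_var * h,
                          a * p_mt + b * p_tt)
            p_tt += proc_var * h
            t += h
        # Measurement update: only mRNA is observed, so H = [1, 0].
        s = p_mm + meas_var
        gain_m, gain_t = p_mm / s, p_mt / s
        residual = y - m
        m += gain_m * residual
        theta += gain_t * residual
        p_tt -= gain_t * p_mt
        p_mt *= 1.0 - gain_m
        p_mm *= 1.0 - gain_m
        # Constraint step: project theta back into its allowed range.
        theta = min(max(theta, lo), hi)
    return theta


# Hypothetical setting: six sparse observations generated with theta = 0.5.
times = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
data = simulate(0.5, 1.0, times)
estimate = constrained_ekf(times, data, k=1.0, theta0=1.0, bounds=(0.01, 2.0))
```

The clipping step guarantees that the returned estimate stays inside the given bounds; how close it gets to the true value depends on the initial guess and on the assumed covariances, which is the sensitivity discussed in the text.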
PEDI-based methods are also computationally efficient, and they
are designed specifically for parameter estimation of gene
circuit models from time-series mRNA data with sparse time
points [29]. Our results showed that most of the PEDI-based
methods attained parameter solutions with accuracy levels on
par with those of the population-based methods, while often being
much faster. In particular, PEDI(SRES) was shown to be a well-balanced
method, achieving high accuracy and efficiency in general.
Our experiments also demonstrated that, unlike the population-based
methods, PEDI-based methods are less dependent on the parameter
boundaries, and their solutions are largely
unaffected by an increase in the boundary range. This indicates
that PEDI-based methods would perform well even when prior
knowledge about the range of each of the kinetic parameters is
limited.
We also analyzed the performance of the hybrid approach
that combines each parameter estimation method with a subsequent
local search algorithm. Because most of the methods in
this study (specifically, all but SSM) are not guaranteed to attain
locally optimal solutions, the subsequent local search can
help these methods increase the accuracy level of parameter solutions.
In particular, we showed that this hybrid strategy improved
the solutions from the online methods by substantially
increasing their accuracy and stability. While the computational
time of the local search was often much higher than that of the
original online methods themselves, the overall runtime of the
online-based hybrid methods was still much lower than that of
the population-based methods. We also showed that all of the
PEDI-based methods benefited from the hybrid strategy and increased
the accuracy levels with low computational overhead. In particular,
we demonstrated that PEDI- and online-based hybrid methods were
capable of generating, with high computational efficiency, parameter
solutions comparable with (and in many cases more accurate than)
those from the population-based methods, and we
confirmed these results using a real microarray data set.
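The hybrid strategy can be sketched as a two-stage procedure: a fast method supplies a rough parameter estimate, and a local search refines it by minimizing the prediction error. In the example below, the rough estimate is simply a hypothetical seed, the model is a toy mRNA production and degradation curve, and the local search is a basic compass (coordinate pattern) search chosen because it needs no external libraries. None of these choices is taken from the study, whose local search algorithm is not specified in this section.

```python
import math


def model(params, t):
    """mRNA level at time t for production rate k and degradation rate d."""
    k, d = params
    return (k / d) * (1.0 - math.exp(-d * t))


def sse(params, times, data):
    """Sum of squared differences between model predictions and data."""
    return sum((model(params, t) - y) ** 2 for t, y in zip(times, data))


def compass_search(objective, start, bounds, step=0.5, tol=1e-8,
                   max_iter=10000):
    """Refine `start` by probing each coordinate in both directions.

    A move is accepted only if it lowers the objective, so the returned
    value is never worse than the value at the starting point.
    """
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            lo, hi = bounds[i]
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[i] = min(max(trial[i] + direction * step, lo), hi)
                val = objective(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0  # shrink the probe size when no move helps
    return best, best_val


# Synthetic, noise-free data from hypothetical true parameters (k, d).
times = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
data = [model((1.0, 0.5), t) for t in times]
bounds = [(0.01, 5.0), (0.01, 5.0)]

# Stage 1 (stand-in): a rough estimate, as a fast method might return.
seed = [1.5, 0.8]
seed_error = sse(seed, times, data)

# Stage 2: local refinement of the rough estimate.
refined, refined_error = compass_search(
    lambda p: sse(p, times, data), seed, bounds)
```

Because the refinement starts from the seed and accepts only improving moves, its cost is added on top of the fast method, which matches the observation that the hybrid runtime is dominated by the local search when the first stage is very fast.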
Conclusion
Parameter estimation of gene circuit models is an essential
step in discovering useful information about gene regulatory
mechanisms from transcriptomics data. Population-based
metaheuristics have traditionally been thought to be the de facto
standard for parameter estimation of biochemical kinetic models [15, 16], but their usefulness is substantially lowered when
the parameter search space widens. Our study indicates that, in
such cases, hybrid approaches based on the aforementioned
computationally efficient methods coupled with a local search
algorithm are useful alternatives to the population-based methods. Accurate and computationally efficient estimation of
kinetic parameters in gene circuit models is one key to the systematic understanding of gene regulatory systems. Thus, our
results may have substantial implications in an integrative
systems biology approach to predicting how genetic parts
interact to control gene expression and understanding how
such gene regulation can affect cellular morphology and
physiology.
Supplementary data
Supplementary data are available online at http://bib.oxfordjournals.org/.
Key Points
• The accurate estimation of kinetic parameters in detailed gene circuit models from transcriptomics data is
an essential step in integrative systems biology.
• We evaluated the performance of 22 distinct
approaches based on three types of state-of-the-art
parameter estimation methods using six gene circuit
models with various parameter estimation settings.
• We found that the usefulness of the population-based
methods, in general, deteriorated with a larger parameter search space.
• We showed that a hybrid strategy that augments computationally efficient methods with a subsequent local
search can substantially increase the accuracy of parameter solutions while still maintaining high computational efficiency.
• While population-based methods have been popular in
systematically estimating parameters of biological models, our results suggest that computationally efficient
hybrid methods are promising alternatives for effective
parameter estimation of gene circuit models.
Funding
The research reported in this publication was supported by
competitive research funding from King Abdullah University of
Science and Technology (KAUST), the Natural Science
Foundation of Zhejiang Province of China (LQ14F010011) and
the National Natural Science Foundation of China (Grant No.
61401131).
References
1. Hood L. A personal journey of discovery: developing technology and changing biology. Annu Rev Anal Chem 2008;1:1–43.
2. O’Shea P. Future medicine shaped by an interdisciplinary
new biology. Lancet 2012;379:1544–50.
3. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using
real-time RT-PCR. Nat Protoc 2006;1:1559–82.
4. Joo C, Balci H, Ishitsuka Y, et al. Advances in single-molecule
fluorescence methods for molecular biology. Annu Rev
Biochem 2008;77:51–76.
5. Raj A, van Oudenaarden A. Single-molecule approaches to
stochastic gene expression. Annu Rev Biophys 2009;38:255–70.
6. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool
for transcriptomics. Nat Rev Genet 2009;10:57–63.
7. Dhanasekaran S, Doherty TM, Kenneth J, et al. Comparison of
different standards for real-time PCR-based absolute quantification. J Immunol Methods 2010;354:34–9.
8. Materna SC, Nam J, Davidson EH. High accuracy, high-resolution prevalence measurement for the majority of
locally expressed regulatory genes in early sea urchin
development. Gene Expr Patterns 2010;10:177–84.
9. Ideker T, Galitski T, Hood L. A new approach to decoding life:
systems biology. Annu Rev Genomics Hum Genet 2001;2:343–72.
10. Kitano H. Computational systems biology. Nature 2002;420:
206–10.
11. Church GM. From systems biology to synthetic biology. Mol
Syst Biol 2005;1:2005.0032.
12. Schwartz R. Biological Modeling and Simulation: A Survey of
Practical Models, Algorithms, and Numerical Methods. The MIT
Press, 2008, Cambridge, Massachusetts, USA.
13. Beck JV, Woodbury KA. Inverse problems and parameter estimation: integration of measurements and analysis. Meas Sci
Technol 1999;9:839.
14. Mendes P, Kell D. Non-linear optimization of biochemical
pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 1998;14:869–83.
15. Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization
methods. Genome Res 2003;13:2467–74.
16. Sun J, Garibaldi JM, Hodgman C. Parameter estimation using
metaheuristics in systems biology: a comprehensive review.
IEEE/ACM Trans Comput Biol Bioinform 2012;9:185–202.
17. Runarsson T, Yao X. Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 2000;4:
284–94.
18. Storn R, Price K. Differential evolution—a simple and efficient
heuristic for global optimization over continuous spaces.
J Global Optim 1997;11:341–59.
19. Rodriguez-Fernandez M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics 2006;7:483.
20. Shi Y, Eberhart R. A modified particle swarm optimizer. In:
IEEE International Conference on Evolutionary Computation. New
York, NY: IEEE, 1998, 69–73.
21. Glover F. Heuristics for integer programming using surrogate
constraints. Decis Sci 1977;8:156–66.
22. Fleurent C, Glover F, Michelon P, et al. A scatter search approach for unconstrained continuous optimization. In:
Proceedings of 1996 IEEE International Conference on Evolutionary
Computation (ICEC’96). New York, NY: IEEE, 1996, 643–8.
23. Glover F, Laguna M, Marti R. Scatter search and path relinking: advances and applications.
In: F Glover, GA Kochenberger (ed). Handbook of Metaheuristics,
Chapter 1, Vol. 57. Boston: Springer New York, 2003, 1–35.
24. Ashyraliyev M, Jaeger J, Blom JG. Parameter estimation and
determinability analysis applied to Drosophila gap gene circuits. BMC Syst Biol 2008;2:83.
25. Liu PK, Yuh CH, Wang FS. Inference of genetic regulatory networks using S-system and hybrid differential evolution. In:
IEEE Congress on Evolutionary Computation. Hong Kong, China.
IEEE, 2008, pp. 1736–43, Piscataway, NJ, USA.
26. Koh G, Teong HFC, Clément MV, et al. A decompositional approach to parameter estimation in pathway modeling: a case
study of the Akt and MAPK pathways and their crosstalk.
Bioinformatics 2006;22:e271–80.
27. Zhan C, Yeung LF. Parameter estimation in systems biology
models using spline approximation. BMC Syst Biol 2011;5:14.
28. Jia G, Stephanopoulos GN, Gunawan R. Parameter estimation
of kinetic models from metabolic profiles: two-phase
dynamic decoupling method. Bioinformatics 2011;27:1964–70.
29. Kuwahara H, Fan M, Wang S, et al. A framework for scalable
parameter estimation of gene circuit models using structural
information. Bioinformatics 2013;29:i98–107.
30. Liu X, Niranjan M. State and parameter estimation of the heat
shock response system using Kalman and particle filters.
Bioinformatics 2012;28:1501–7.
31. Sun X, Jin L, Xiong M. Extended Kalman filter for estimation
of parameters in nonlinear state-space models of biochemical networks. PLoS One 2008;3:e3758.
32. Lillacci G, Khammash M. Parameter estimation and model selection in computational biology. PLoS Comput Biol 2010;6:
e1000696.
33. Julier SJ, Uhlmann JK. New extension of the Kalman filter to
nonlinear systems. In: AeroSense’97. International Society for
Optics and Photonics, 1997, pp. 182–93 SPIE Digital Library
(http://spie.org/). Bellingham, Washington, USA.
34. Sarkka S. On unscented Kalman filtering for state estimation
of continuous-time nonlinear systems. IEEE Trans Automat
Contr 2007;52:1631–41.
35. Sarkar P. Sequential Monte Carlo methods in practice.
Technometrics 2003;45:106.
36. Nagasaki M, Yamaguchi R, Yoshida R, et al. Genomic data assimilation for estimating hybrid functional petri net from
time-course gene expression data. Genome Inform 2006;17:46.
37. Tasaki S, Nagasaki M, Oyama M, et al. Modeling and estimation of dynamic egfr pathway by data assimilation approach
using time series proteomic data. Genome Inform 2006;17:226.
38. Shea MA, Ackers GK. The OR control system of bacteriophage
lambda: a physical-chemical model for gene regulation. J Mol
Biol 1985;181:211–30.
39. Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014;11:20130505.
40. Kulkarni MM. Digital multiplexed gene expression analysis
using the NanoString nCounter system. Curr Protoc Mol Biol
2011;Chapter 25:Unit25B.10.
41. van Oijen AM. Single-molecule approaches to characterizing
kinetics of biomolecular interactions. Curr Opin Biotechnol
2011;22:75–80.
42. Storn R. On the usage of differential evolution for function
optimization. In: M Smith, M Lee, J Keller, J Yen (eds). North
American Fuzzy Information Processing. New York, NY: IEEE,
1996, 519–23.
43. Laguna M, Martí R. Scatter Search: Methodology and
Implementations in C. Vol. 24. New York, NY: Springer, 2003.
44. Kennedy J, Eberhart R. Particle swarm optimization. In: Neural
Networks, 1995 Proceedings. IEEE International Conference on
IEEE, Perth, Australia. Vol. 4. 1995, 1942–8, Piscataway, NJ, USA.
45. Birge B. PSOt - a particle swarm optimization toolbox for use
with matlab. In: Swarm Intelligence Symposium, 2003. SIS’03.
Indianapolis, Indiana, USA. Proceedings of the 2003 IEEE.
IEEE, 2003, pp. 182–6, Piscataway, NJ, USA.
46. Simon D. Optimal State Estimation: Kalman, H Infinity, and
Nonlinear Approaches. John Wiley & Sons, 2006, New Jersey,
USA.
47. Baker SM, Poskar CH, Schreiber F, et al. An improved
constraint filtering technique for inferring hidden states and
parameters of a biological model. Bioinformatics 2013;29:
1052–9.
48. Courcelle J, Khodursky A, Peter B, et al. Comparative gene expression profiles following UV exposure in wild-type and
SOS-deficient Escherichia coli. Genetics 2001;158:41–64.
49. Radman M. SOS repair hypothesis: phenomenology of an inducible DNA repair which is accompanied by mutagenesis.
Basic Life Sci 1975;5A:355–67.
50. Michel B. After 30 years of study, the bacterial SOS response
still surprises us. PLoS Biol 2005;3:e255.
51. Brent R, Ptashne M. Mechanism of action of the lexA gene
product. Proc Natl Acad Sci USA 1981;78:4204–8.
52. Sutton MD, Smith BT, Godoy VG, et al. The SOS response: recent insights into umuDC-dependent mutagenesis and DNA
damage tolerance. Annu Rev Genet 2000;34:479–97.
53. Fernández De Henestrosa AR, Ogi T, Aoyagi S, et al.
Identification of additional genes belonging to the LexA regulon in Escherichia coli. Mol Microbiol 2000;35:1560–72.
54. Zhang APP, Pigli YZ, Rice PA. Structure of the LexA-DNA complex and implications for SOS box measurement. Nature 2010;
466:883–6.
55. Ackers GK, Johnson AD, Shea MA. Quantitative model for
gene regulation by λ phage repressor. Proc Natl Acad Sci USA
1982;79:1129–33.
56. Wang X, Kuwahara H, Gao X. Modeling DNA affinity landscape through two-round support vector regression with
weighted degree kernels. BMC Syst Biol 2014;8 (Suppl 5):S5.
57. McAdams HH, Arkin A. Simulation of prokaryotic genetic circuits. Annu Rev Biophys Biomol Struct 1998;27:199–224.
58. Arkin A, Ross J, McAdams H. Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected
Escherichia coli cells. Genetics 1998;149:1633–48.
59. Kuwahara H, Myers CJ, Samoilov MS. Temperature control of
fimbriation circuit switch in uropathogenic Escherichia coli:
quantitative analysis via automated model abstraction. PLoS
Comput Biol 2010;6:e1000723.
60. Bintu L, Buchler NE, Garcia HG, et al. Transcriptional regulation
by the numbers: models. Curr Opin Genet Dev 2005;15:116–24.
61. Chickarmane V, Troein C, Nuber UA, et al. Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput Biol
2006;2:e123.
62. Zeigler RD, Cohen BA. Discrimination between thermodynamic models of cis-regulation using transcription factor
occupancy data. Nucleic Acids Res 2014;42:2224–34.
63. Gardner TS, Cantor CR, Collins JJ. Construction of a genetic
toggle switch in Escherichia coli. Nature 2000;403:339–42.
64. Gertz J, Cohen BA. Environment-specific combinatorial cis-regulation in synthetic promoters. Mol Syst Biol 2009;5:244.
65. Nguyen NP, Myers C, Kuwahara H, et al. Design and analysis
of a robust genetic Muller C-element. J Theor Biol 2010;264:
174–87.
66. Ghaemmaghami S, Huh WK, Bower K, et al. Global analysis of
protein expression in yeast. Nature 2003;425:737–41.
67. Belle A, Tanay A, Bitincka L, et al. Quantification of protein
half-lives in the budding yeast proteome. Proc Natl Acad Sci
USA 2006;103:13004–9.
68. Wan E, Van Der Merwe R. The unscented Kalman filter for nonlinear estimation. In: Adaptive Systems for Signal Processing,
Communications, and Control Symposium 2000. Piscataway, NJ:
IEEE, 2000, 153–8.
69. Ernst J, Bar-Joseph Z. STEM: a tool for the analysis of short
time series gene expression data. BMC Bioinformatics 2006;7:
191.