Briefings in Bioinformatics, 16(6), 2015, 987–999
doi: 10.1093/bib/bbv015
Advance Access Publication Date: 26 March 2015
Paper

Parameter estimation methods for gene circuit modeling from time-series mRNA data: a comparative study

Ming Fan*, Hiroyuki Kuwahara*, Xiaolei Wang, Suojin Wang and Xin Gao

Corresponding author. Xin Gao, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia. Tel.: +966-12-8080323; Fax: +966-12-8021241. E-mail: [email protected]
*These authors contributed equally to this work.

Abstract

Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model-decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space.
Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to a level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These results suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the parameter search space very large.

Key words: parameter estimation; gene circuits; comparative study; thermodynamic-based modeling

Ming Fan is an assistant professor in the College of Life Information Science and Instrument Engineering, Hangzhou Dianzi University, China.
Hiroyuki Kuwahara is a research scientist in the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center and the Division of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Xiaolei Wang is a PhD candidate in the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center and the Division of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Suojin Wang is a professor in the Department of Statistics, Texas A&M University, USA.
Xin Gao is an assistant professor and the lead of the Structural and Functional Bioinformatics Group in the Computational Bioscience Research Center and the Division of Computer, Electrical and Mathematical Sciences and Engineering at King Abdullah University of Science and Technology.
Submitted: 8 December 2014; Received (in revised form): 9 February 2015
© The Author 2015. Published by Oxford University Press. For Permissions, please email: [email protected]

Introduction

Recent advances in biotechnology have facilitated integrative, cross-disciplinary biological research [1, 2]. In particular, technologies enabling high-resolution, high-throughput, time-series measurements of gene expression levels [3–8] have provided a means of inferring kinetic models that can more accurately capture the dynamics of gene regulatory systems than ever before. As direct measurements of kinetic parameters in gene circuit models are rare, advances in such technologies have proved to be essential to the reverse engineering of gene circuits. The construction of phenomenological models via the reverse engineering of biological systems is useful to test various competing hypotheses on the underlying molecular mechanisms that quantitatively control the expression of genes, and such models can also be used to design perturbation experiments to refine current knowledge [9–11].

A major challenge in the construction of quantitative biological models is parameter estimation. Given experimental data and a fixed model structure, parameter estimation seeks to align the dynamics of the model with the observed measurements by fitting unknown model parameters that are constrained within biologically relevant bounds [12, 13]. One of the most intuitive ways to tackle this problem is to transform it into a global optimization problem and use optimization-based methods to search for suitable parameter sets. Previous comparative studies examined a number of optimization-based parameter estimation methods for various biological models [14–16]. The typical focus of those studies was the application of stochastic metaheuristics to the parameter estimation of biological models.
Commonly used metaheuristics are based mainly on population-based optimization algorithms, including the stochastic ranking evolution strategy (SRES) [17], differential evolution (DE) [18] and particle swarm optimization (PSO) [19, 20]. The scatter search method (SSM) [21, 22] was also used in parameter estimation of biological models [19, 23]. To increase the accuracy, hybrid approaches that combine stochastic metaheuristics and deterministic local search algorithms were applied so as to guarantee that the solution reaches some local optimum [24, 25]. The computational requirements of these stochastic metaheuristic-based approaches can, however, be daunting, especially when intermediate solutions produce models with widely different timescale characteristics, and the scalability becomes a challenging issue. To improve the computational efficiency and the scalability of parameter estimation, several approaches were developed to decompose systems of ordinary differential equations (ODEs) and reduce the search space [26–29]. Another way to increase the computational speed is to use online filtering algorithms [30]. These include hybrid extended Kalman filters (HEKFs) [31, 32], unscented Kalman filters (UKFs) [6, 33, 34] and particle filters (PFs) [35–37]. While a number of parameter estimation methods have been applied to modeling of various biological systems, it is clear that each method has advantages and disadvantages, and that there is no one-size-fits-all method for parameter estimation of biological models. Here, we examined the use of several state-of-the-art parameter estimation methods, specifically in the context of the reverse engineering of gene circuits using time-series mRNA data sets. A gene circuit is a network of genes that interact with each other to regulate their expression. 
In gene circuits, transcription initiation based on the interaction between transcription factors and cis-regulatory elements plays a crucial role in the control of gene expression. In transcriptional regulation, the typical behavior is often quantitatively characterized by statistical thermodynamic models based on systems of coupled ODEs [38]. Parameter estimation of such kinetic models is particularly challenging owing to (i) the highly nonlinear nature of gene regulation and (ii) the noisy nature of the experimental measurements. These properties make parameter estimation of gene circuit models a non-convex, global optimization problem with many local optima [39], which is computationally difficult. In addition, high-throughput time-series measurements are conducted at sparse observation time points compared with the timescale of gene expression reactions, and they are often limited to mRNA molecules [3–6, 8, 40, 41]. Thus, while protein abundance can be experimentally quantified, protein dynamics are typically treated as unobservable, which further complicates the parameter estimation problem. In what follows, we explore both the accuracy and computational efficiency of parameter estimation methods in the context of modeling gene circuits using synthetic and real time-series gene expression data.

Methods

Problem setting

In this comparative study, we assume that the observations of a gene circuit with $N$ genes, $g_1, \ldots, g_N$, are given by bulk-level time-series mRNA data at $M$ time points, $t_1 < t_2 < \cdots < t_M$, and denote the empirical mean of each mRNA $m_i$ at time $t_j$ by $m_{ij}$. The protein product of each gene $g_i$, denoted by $p_i$, can regulate the transcription of any gene in the gene circuit. The regulation of each gene is often modeled using four rate-limiting reaction processes: transcription initiation, mRNA degradation, translation initiation and protein degradation.
Because bulk-level time-series mRNA data sets can only infer the average time course of mRNA levels, we focus on a continuous-deterministic version of the gene circuit model based on a system of ODEs that captures these four rate-limiting processes as follows:

\[
\begin{aligned}
\frac{d\hat{m}_i}{dt} &= f_i(\hat{\mathbf{p}}, \hat{m}_i, \boldsymbol{\theta}_i) = h_i\bigl(\hat{\mathbf{p}}, (\theta_{i2}, \ldots, \theta_{iK_i})\bigr) - \theta_{i1}\hat{m}_i, \\
\frac{d\hat{p}_i}{dt} &= \alpha_i \hat{m}_i - \beta_i \hat{p}_i, \qquad \text{for } i = 1, \ldots, N,
\end{aligned}
\tag{1}
\]

where $\hat{m}_i$ and $\hat{p}_i$ are time-dependent variables that estimate the average concentrations of mRNA $m_i$ and protein $p_i$, respectively; $f_i$ is the reaction rate function for the regulation of mRNA $m_i$; $\hat{\mathbf{p}}$ is an $N$-dimensional vector whose $i$-th element is $\hat{p}_i$; $\boldsymbol{\theta}_i = (\theta_{i1}, \ldots, \theta_{iK_i})$ is a $K_i$-dimensional vector that represents the parameters used in the rate equation representing the regulation of mRNA $m_i$; $h_i$ represents transcription reaction kinetics; $\alpha_i$ and $\beta_i$ are the parameters used in the regulation of protein $p_i$. Here, to represent transcription reaction kinetics based on the interaction between regulatory proteins and cis-regulatory elements, we set the function $h_i$ to be the equilibrium thermodynamics model [38]. A brief overview of the thermodynamic-based transcriptional regulation modeling formalism is presented in Supplementary Section S1.1.

The objective of the parameter estimation of a gene circuit model described in Equation (1) is, thus, to search for the values of unknown parameters in each $\boldsymbol{\theta}_i$ so that the dynamics of the model can reconstruct the observed time-series mRNA data. Here, we examine various state-of-the-art parameter estimation algorithms in the context of gene circuit modeling with diverse parameter estimation settings.

Algorithms

In this study, we selected state-of-the-art parameter estimation methods from three categories: population-based methods, online methods and decomposition-based methods.
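Returning briefly to Equation (1), the following minimal sketch shows how such a model can be integrated numerically for a single self-repressing gene. It is an illustration under stated assumptions, not the implementation used in this study: the Hill-type repression function stands in for the thermodynamic model of transcription kinetics (which is described only in the Supplementary material), the fourth-order Runge-Kutta integrator is a generic choice, and all function names and parameter values are hypothetical.

```python
def hill_repression(p, vmax, k, n):
    """Illustrative stand-in for the transcription kinetics: production
    rate repressed by protein level p (not the thermodynamic model)."""
    return vmax / (1.0 + (p / k) ** n)


def rhs(state, theta, alpha, beta):
    """Right-hand side of Equation (1) for one self-repressing gene.

    state = (m, p). theta = (theta1, vmax, k, n): theta1 is the mRNA
    degradation rate; the remaining entries parameterize the stand-in
    transcription function. alpha and beta are the translation and
    protein degradation rates.
    """
    m, p = state
    theta1, vmax, k, n = theta
    dm_dt = hill_repression(p, vmax, k, n) - theta1 * m
    dp_dt = alpha * m - beta * p
    return (dm_dt, dp_dt)


def rk4_step(state, dt, theta, alpha, beta):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = rhs(state, theta, alpha, beta)
    k2 = rhs(shifted(state, k1, 0.5 * dt), theta, alpha, beta)
    k3 = rhs(shifted(state, k2, 0.5 * dt), theta, alpha, beta)
    k4 = rhs(shifted(state, k3, dt), theta, alpha, beta)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(theta, alpha, beta, state0, t_end, dt):
    """Return the trajectory [(t, m, p), ...] from t = 0 to t_end."""
    steps = int(round(t_end / dt))
    state = tuple(state0)
    trajectory = [(0.0, state[0], state[1])]
    for step in range(1, steps + 1):
        state = rk4_step(state, dt, theta, alpha, beta)
        trajectory.append((step * dt, state[0], state[1]))
    return trajectory


# Hypothetical parameter values chosen only for illustration.
trajectory = simulate(theta=(0.1, 1.0, 10.0, 2.0), alpha=0.5, beta=0.05,
                      state0=(0.0, 0.0), t_end=400.0, dt=0.05)
t_final, m_final, p_final = trajectory[-1]
print(f"t={t_final:.1f}  m={m_final:.4f}  p={p_final:.4f}")
```

With these illustrative values, setting both derivatives to zero gives the steady state m = 2 and p = 20, which the simulated trajectory approaches. In a parameter estimation setting, only the mRNA column of such a trajectory would be compared with data, because protein levels are treated as unobserved.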
Brief descriptions of the parameter estimation methods examined in this study are given in this section, while detailed descriptions and specific configurations are presented in Supplementary Sections S2 and S4.

Population-based methods

Differential evolution. Differential evolution (DE) was proposed to handle non-differentiable and nonlinear cost functions [42]. It is a generic metaheuristic for global optimization problems. The original DE algorithm does not constrain the parameters between the upper and lower boundaries. We, therefore, modified the algorithm to enable parameter boundaries (Supplementary Section S2.1).

Stochastic ranking evolution strategy. Given an objective function, the parameter estimation problem can be formulated as a constrained optimization problem. The SRES was introduced based on the idea of soft constraints (i.e. constraints are added to the objective function as the penalty term) [17]. SRES was ranked as the best method in a comparative study of parameter estimation of a metabolic network model [15].

Scatter search method. The scatter search framework was first proposed by Laguna and Martí [43] as a hybrid search method that combines global search with local search. Later, Rodriguez-Fernandez et al. improved several steps of the original framework and proposed the SSM [19]. Their study showed that SSM could find global optima in a complex biochemical reaction network whereas SRES and DE became trapped in local optima.

Particle swarm optimization. PSO is based on the idea of simulating social behaviors [20, 44]. We implemented PSO as described by Birge et al. [45].

Online methods

Hybrid extended Kalman filter. The Kalman filter is a minimum variance estimator, which operates by propagating the state and the covariance of a discrete-time linear system through time [46]. The HEKF considers continuous-time, nonlinear systems with discrete-time measurements by extending the Kalman filter. In this study, we used a variant of a constrained HEKF, which was used by Lillacci and Khammash [32] to impose constraints for the lower bounds of the states (Supplementary Section S2.2).

Unscented Kalman filter. The UKF is another filtering method that propagates the means and covariances of states in nonlinear systems by extending the Kalman filter [46]. Here, we used a variant of a constrained UKF [47], which can have constraints for the boundary conditions of the states.

Particle filter. The PF [35], also known as the sequential Monte Carlo method, is a model estimation method based on a recursive Bayesian filter with Monte Carlo sampling. In a recent comparative study [30], PF generated more accurate and consistent results than HEKF and UKF did in estimating parameters of an Escherichia coli heat shock response model.

Decomposition-based methods

Parameter estimation by decomposition and integration (PEDI) is a scalable framework for parameter estimation. It was specifically designed to estimate parameters of gene circuit models [29]. Whereas the parameter search space in typical parameter estimation methods increases exponentially as the number of unknown parameters increases, PEDI is able to reduce the search space substantially by decomposing the gene circuit model described in Equation (1) into rate equations at the individual gene level (Supplementary Section S2.3). In this study, we applied the PEDI framework to the aforementioned four population-based methods. Here, we refer to these four PEDI-based methods as PEDI(DE), PEDI(SRES), PEDI(SSM) and PEDI(PSO).

Objective function

Because the four population-based methods treat parameter estimation as an optimization problem, they require an objective function to optimize. Following Moles et al. [15], we defined the objective function to be the weighted sum of the squared residuals of the levels of mRNAs over the $M$ time points as follows:

\[
J_i = \sum_{j=1}^{M} w_i \bigl[\hat{m}_i(t_j) - m_{ij}\bigr]^2, \tag{2a}
\]

\[
J = \sum_{i=1}^{N} J_i, \tag{2b}
\]

where $w_i$, the weight for mRNA $m_i$, is given as $w_i = 1/\max_j\,[(m_{ij})^2]$. Within the PEDI framework [29], a gene circuit model based on a system of ODEs, i.e. Equation (1), is decomposed into those based on individual genes. Hence, when a population-based method is used in PEDI as a parameter search module, objective function $J_i$, Equation (2a), is used to optimize the parameter set, $\boldsymbol{\theta}_i$.

Experimental settings with synthetic time-series mRNA data

This section overviews the experimental setting of our comparative study with synthetic time-series mRNA data (detailed descriptions are presented in Supplementary Section S1.2). To evaluate and compare parameter estimation methods, we considered five gene circuit models (Figure 1 and Supplementary Section S3). The experimental workflow for our comparative study is depicted in Figure 2. The time-series mRNA data of each gene circuit were generated by adding two levels of white Gaussian noise, i.e. 10 and 25%, to the true mean dynamics (see Supplementary Section S1.2 for details).

We evaluated both the original algorithm and a hybrid version of each method. The hybrid method is a combination of the original algorithm and a local search algorithm in which the output parameter set from the original method is fed into the local search method as the initial parameter set. One major advantage of this hybrid approach is that the resulting parameter set is guaranteed to reach a local optimum (Supplementary Section S1).
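The objective function of Equation (2) can be sketched in a few lines. The code below is a minimal illustration rather than the code used in this study: the function names and the example numbers are hypothetical, and the predicted values are assumed to have been obtained already by simulating the model at the measurement time points.

```python
def gene_objective(predicted, observed):
    """J_i of Equation (2a): weighted sum of squared residuals for one mRNA.

    The weight is 1 / max_j(m_ij^2), so residuals are scaled by the
    largest observed level of that mRNA.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    peak_squared = max(x * x for x in observed)
    if peak_squared == 0.0:
        raise ValueError("weight is undefined when all observations are zero")
    weight = 1.0 / peak_squared
    return sum(weight * (p - o) ** 2 for p, o in zip(predicted, observed))


def total_objective(predicted_by_gene, observed_by_gene):
    """J of Equation (2b): sum of the per-gene objectives J_i."""
    return sum(
        gene_objective(p, o)
        for p, o in zip(predicted_by_gene, observed_by_gene)
    )


# Hypothetical two-gene example with three and two time points.
observed = [[1.0, 2.0, 4.0], [10.0, 10.0]]
predicted = [[1.0, 2.0, 2.0], [9.0, 11.0]]
j_values = [gene_objective(p, o) for p, o in zip(predicted, observed)]
j_total = total_objective(predicted, observed)
print(j_values, j_total)
```

Because each weight depends only on the data for one mRNA, the per-gene terms are on a comparable scale regardless of absolute expression level. This is also why a decomposition-based search can optimize each per-gene objective separately.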
Because the population-based methods and the PEDI-based methods can set parameter boundaries with their constraints, we used two different parameter ranges (i.e. a narrower one and a wider one) to test the effects of the size of the parameter search space on their performance (Supplementary Section S4). This resulted in 20 test cases for each method (i.e. five models with two noise levels and two boundary conditions). We evaluated each of the methods against computational efficiency and accuracy criteria. The computational efficiency was quantified based on the method's runtime and how fast it achieved certain levels of accuracy, while the accuracy was quantified by using the prediction error, which we measured using Equation (2b).

Figure 1. Network structures (left) and the true mean trajectories of gene circuit models (right). Each $m_i$ represents the mRNA of the corresponding gene $g_i$. (A) Model 1. (B) Model 2. (C) Model 3. (D) Model 4. (E) Model 5. In these network structures, each line with an arrow head indicates transcriptional activation, while each line with a bar indicates transcriptional repression. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.

Results

The population-based methods

To measure the performance of the population-based methods, we ran each method three times at various parameter estimation settings (Supplementary Section S4). In each of the runs, we fixed the total number of searches to be 1 million. Table 1 shows the results of the run with the lowest prediction error of each population-based method. Comparing the accuracy performance of the original algorithms, we found that SSM most frequently generated parameter solutions with the highest degree of accuracy, achieving the lowest prediction error in 18 of the 20 settings (Table 1). We also observed that, in many settings, SRES was able to achieve accuracy levels on par with those from SSM.
Next, we examined the runtime based on 1 million searches (Table 1 and Supplementary Section S5.1.1). From this experiment, PSO and DE emerged as the computationally most efficient methods in 9 and 7 of the 20 settings, respectively. However, they were also the slowest in several settings, suggesting that the runtime can be easily affected by both noise levels and parameter boundaries in a complex fashion (Supplementary Section S5.1.1).

Figure 2. The experimental setting based on synthetic time-series data in this comparative study.

To examine how parameter solutions were improved for each method, we first analyzed the intermediate solutions at various search points (Supplementary Section S5.1.2). We found that solutions from SSM had the highest level of accuracy at most search points. Next, we analyzed the accuracy improvement rate based on computational time (Supplementary Section S5.1.3). We found that SSM generally reaches higher accuracy levels much more rapidly than the other population-based methods do. These results agree with those by Rodriguez-Fernandez et al. [19], which reported that SSM had higher accuracy improvement rates compared with DE and SRES. One exception was Model 2, in which SRES found high-quality solutions in the wider parameter range settings more rapidly than SSM did. This may be because the local search-based optimization had difficulty finding parameter solutions that exhibited similar oscillatory dynamics in the large search space, as such behavior requires intricate combinations of parameters.

By comparing the results from various configurations, we found that the population-based methods generally resulted in higher prediction errors in the wider parameter setting (31 of 40 cases; see Supplementary Table S6).
The results from intermediate solutions also showed that the wider parameter boundary settings resulted in higher prediction errors, and in a number of those settings, solutions were not able to stabilize within 1 million searches (Supplementary Section S5.1.2). Given that it took between 8 h and 2 days of computation time to run each of the population-based methods with 1 million searches, it is clear that the computational requirement of the population-based methods can substantially increase as the parameter search space increases.

A previous study [32] and a recent review paper [16] suggested the usefulness of hybrid methods for parameter estimation. We analyzed the performance of hybrid methods in which a local search was applied to the best run of each population-based method. The results revealed that the accuracy improvements were mainly dependent on the type of method (Table 1 and Supplementary Section S5.1.4). Overall, our results suggest that this hybrid approach is useful in that the accuracy improvement via the addition of a local search to the three methods other than SSM more than compensates for the cost of the runtime overhead.

The online methods

To analyze the performance of the three online methods, we ran each method based on different initial parameter guesses (Supplementary Section S5.2.1). The results of the prediction error show that the accuracy levels can differ substantially depending on the initial guesses: a high level of accuracy was achieved only when the initial guesses were close to the true solution (Figure 3 and Supplementary Section S5.2.2). To examine the accuracy further, we computed three statistical measures: the lowest prediction error (i.e. the best), the median and the standard deviation (Table 2). From these measures, we observed a huge disparity in the accuracy levels between the best and the median prediction error (Supplementary Section S5.2.2).
In this experiment, we observed that some of the HEKF and UKF runs led to numerical errors. By dissecting the problems, we found two issues caused by numerical instabilities (Supplementary Section S5.2.3). In particular, the problem from HEKF and the follow-up experiment demonstrated that the parameter boundary constraints are essential to accurate parameter estimation when online methods are used (Supplementary Section S5.2.3).

Table 1. Comparison of the best runs from the population-based methods

Each cell lists the prediction error followed by the runtime (in minutes).

Narrower boundary range
Model    Method  10% Original     10% Hybrid       25% Original     25% Hybrid
Model 1  DE      0.02 / 959.39    0.02 / 959.40    0.07 / 900.19    0.07 / 900.19
Model 1  SRES    0.02 / 955.17    0.02 / 955.18    0.07 / 904.17    0.07 / 904.18
Model 1  SSM     0.02 / 916.00    0.02 / 916.01    0.07 / 919.55    0.07 / 919.56
Model 1  PSO     0.10 / 867.53    0.02 / 867.67    0.13 / 814.33    0.07 / 814.47
Model 2  DE      0.70 / 2012.03   0.17 / 2017.97   0.74 / 1990.55   0.33 / 1997.78
Model 2  SRES    0.17 / 2684.15   0.17 / 2684.24   0.33 / 2574.35   0.33 / 2574.44
Model 2  SSM     0.17 / 2593.82   0.17 / 2593.91   0.33 / 2540.06   0.33 / 2540.16
Model 2  PSO     0.24 / 1829.19   0.23 / 1829.95   0.34 / 1852.33   0.34 / 1852.43
Model 3  DE      0.32 / 1195.33   0.14 / 1198.20   0.58 / 1096.66   0.39 / 1100.53
Model 3  SRES    0.14 / 1089.73   0.14 / 1089.77   0.39 / 1044.15   0.39 / 1044.19
Model 3  SSM     0.14 / 1053.48   0.14 / 1053.52   0.38 / 1024.20   0.38 / 1024.24
Model 3  PSO     0.14 / 1142.32   0.14 / 1142.37   0.39 / 1029.56   0.39 / 1029.73
Model 4  DE      0.55 / 830.60    0.49 / 831.77    0.90 / 750.12    0.81 / 751.60
Model 4  SRES    0.49 / 789.70    0.49 / 789.72    0.81 / 652.94    0.81 / 652.96
Model 4  SSM     0.49 / 815.14    0.49 / 815.17    0.81 / 654.75    0.81 / 654.77
Model 4  PSO     1.05 / 767.95    0.49 / 769.10    1.32 / 733.90    0.81 / 735.34
Model 5  DE      1.88 / 827.46    1.08 / 833.00    3.96 / 785.80    2.80 / 790.83
Model 5  SRES    1.08 / 833.21    1.08 / 833.29    2.80 / 826.09    2.80 / 826.23
Model 5  SSM     1.08 / 817.80    1.08 / 817.84    2.80 / 803.87    2.80 / 803.91
Model 5  PSO     3.31 / 871.97    1.08 / 878.00    6.67 / 822.45    2.80 / 830.00

Wider boundary range
Model    Method  10% Original     10% Hybrid       25% Original     25% Hybrid
Model 1  DE      0.02 / 971.05    0.02 / 971.06    0.07 / 908.12    0.07 / 908.13
Model 1  SRES    0.04 / 975.80    0.04 / 975.81    0.09 / 920.15    0.09 / 920.16
Model 1  SSM     0.04 / 1053.20   0.04 / 1053.21   0.09 / 1010.11   0.09 / 1010.12
Model 1  PSO     0.05 / 1037.73   0.04 / 1037.80   0.09 / 999.81    0.09 / 999.82
Model 2  DE      4.18 / 1020.32   0.17 / 1037.32   3.53 / 1027.41   0.33 / 1046.60
Model 2  SRES    0.17 / 2179.97   0.17 / 2180.06   0.33 / 2467.70   0.33 / 2467.81
Model 2  SSM     0.17 / 2342.72   0.17 / 2342.80   0.33 / 2044.23   0.33 / 2044.35
Model 2  PSO     46.46 / 2552.82  44.32 / 2556.95  18.52 / 2363.73  18.22 / 2371.73
Model 3  DE      9.47 / 2785.98   0.19 / 2821.26   7.67 / 2502.13   0.53 / 2533.62
Model 3  SRES    0.33 / 3345.77   0.33 / 3346.26   0.59 / 3112.10   0.59 / 3112.98
Model 3  SSM     0.14 / 2900.98   0.14 / 2901.07   0.38 / 2706.34   0.38 / 2706.45
Model 3  PSO     5.50 / 3697.71   1.90 / 3751.98   3.37 / 2811.04   0.48 / 2858.36
Model 4  DE      1.20 / 835.25    0.59 / 838.09    1.75 / 645.93    0.97 / 649.33
Model 4  SRES    1.26 / 1029.86   0.53 / 1036.72   1.72 / 774.05    1.61 / 779.38
Model 4  SSM     0.53 / 934.98    0.53 / 935.02    0.94 / 786.99    0.94 / 787.01
Model 4  PSO     12.10 / 514.51   12.10 / 514.52   10.83 / 503.06   10.83 / 503.07
Model 5  DE      4.36 / 601.26    2.79 / 604.26    6.87 / 596.08    4.58 / 605.27
Model 5  SRES    6.29 / 674.37    5.40 / 697.37    10.26 / 683.38   3.20 / 693.78
Model 5  SSM     1.41 / 771.51    1.41 / 771.59    3.02 / 707.48    3.02 / 707.54
Model 5  PSO     9.78 / 578.95    6.18 / 583.70    18.88 / 522.60   17.59 / 523.64

The lowest prediction error and runtime (in minutes) for each situation are in bold in the published table. The results of the hybrid approach here were based on the best solution obtained from each of the population-based methods.

The runtime results show that the online methods are very fast (Table 2). The computationally most efficient method was HEKF, which, regardless of the initial guess, required the shortest runtime. UKF was also fast. While PF with 200 particles was the slowest among the three methods, it was still relatively fast, with most of the runtime being <1 min.

Next, we analyzed the performance of the online-based hybrid methods (Figure 3). Our results demonstrated that they consistently generated solutions with much higher levels of accuracy compared with their stand-alone counterparts (Supplementary Section S5.2.4).
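The two-stage structure of such hybrid methods, in which the output of a fast but possibly inaccurate estimator seeds a local search, can be sketched as follows. This is a toy illustration under several assumptions, none of which come from this study: the model is a single constitutively expressed mRNA with production rate c and degradation rate d, uniform random sampling stands in for the fast first stage (it is not an online filter or PEDI), and a bounded compass (pattern) search stands in for the local search. All names and values are hypothetical.

```python
import math
import random


def model(params, times):
    """Closed-form m(t) for dm/dt = c - d*m with m(0) = 0."""
    c, d = params
    return [c / d * (1.0 - math.exp(-d * t)) for t in times]


def objective(params, times, observed):
    """Weighted squared residual in the style of Equation (2a)."""
    weight = 1.0 / max(x * x for x in observed)
    predicted = model(params, times)
    return sum(weight * (p - o) ** 2 for p, o in zip(predicted, observed))


def coarse_search(cost, bounds, samples, rng):
    """Stage 1 (stand-in for a fast estimator): uniform random sampling."""
    best, best_cost = None, math.inf
    for _ in range(samples):
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = cost(candidate)
        if value < best_cost:
            best, best_cost = candidate, value
    return best, best_cost


def local_search(cost, start, bounds, step=0.5, tol=1e-8, max_evals=20000):
    """Stage 2: bounded compass search seeded with the stage-1 output.

    Only improving moves are accepted, so the returned cost is never
    worse than the cost of the starting point.
    """
    current, current_cost = list(start), cost(start)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for direction in (1.0, -1.0):
                trial = list(current)
                trial[i] = min(hi, max(lo, trial[i] + direction * step))
                value = cost(trial)
                evals += 1
                if value < current_cost:
                    current, current_cost, improved = trial, value, True
        if not improved:
            step *= 0.5  # shrink the pattern when no move helps
    return current, current_cost


# Noise-free synthetic data from hypothetical true parameters (c, d).
times = [0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
true_params = (2.0, 0.5)
observed = model(true_params, times)
bounds = [(0.1, 10.0), (0.05, 5.0)]


def cost(params):
    return objective(params, times, observed)


rng = random.Random(0)
coarse_params, coarse_cost = coarse_search(cost, bounds, 50, rng)
refined_params, refined_cost = local_search(cost, coarse_params, bounds)
print("coarse :", coarse_params, coarse_cost)
print("refined:", refined_params, refined_cost)
```

The design point mirrored here is that the first stage only needs to land in the basin of a good solution; the refinement stage then carries most of the accuracy gain at a cost that is small relative to a long global search.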
In particular, we observed that the accuracy and consistency of UKF + LS, the hybrid method based on UKF with a subsequent local search, were substantially improved compared with those of UKF. While the overhead of the subsequent local search was large considering the very short runtime of the online methods, the overall runtime was still frequently 100-fold faster than that of the population-based methods, indicating a strong advantage of having a subsequent local search step.

Figure 3. Comparison of the three online methods in terms of the prediction accuracy in various settings. In these log–log plots, prediction error (the y-axis) is measured given initial parameter guesses. The x-axis shows the ratio of each initial guess to the true parameters expressed as percentages. Here, the results of the original online methods and the hybrid methods with 10% and 25% noise levels are shown for the five models. The discontinuities in the HEKF lines in Model 2 indicate points with unrepresentable values. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.

The PEDI-based methods

We evaluated the performance of PEDI-based methods by using the four population-based methods (Supplementary Section S5.3.1). Among the four original, stand-alone PEDI-based methods, PEDI(SSM) turned out to be the most accurate one, scoring the lowest prediction error in 14 of the 20 settings (Table 3). PEDI(DE) also exhibited high levels of accuracy, achieving the lowest prediction error in 12 settings (Table 3). This result differs from those of the population-based methods, in which DE rarely achieved a high level of accuracy compared with the other three. Another unexpected observation was that the accuracy levels of the PEDI-based methods in different experimental settings were more comparable than those of the corresponding population-based methods (Supplementary Table S6).
The runtime data show that PEDI(SRES) and PEDI(PSO) were substantially faster than PEDI(DE) and PEDI(SSM) (Table 3). By comparing the runtime data between the population-based methods and the PEDI-based methods, we found that the runtime of the PEDI-based methods was much shorter, but the efficiency gain via PEDI was strongly dependent on the type of search. PEDI(SRES) and PEDI(PSO) mostly achieved more than a 100-fold speedup compared with the population-based counterparts, whereas the others typically had a low single-digit speedup (Supplementary Section S5.3.2).

Next, we examined the quality of the intermediate solutions (Supplementary Section S5.3.3). We found that, in all settings, the solutions of PEDI(DE) and PEDI(SSM) stabilized after fewer rounds of refinement than those needed for PEDI(SRES) and PEDI(PSO). However, based on the computational time to achieve specific accuracy levels, there were only a few settings where PEDI(DE) and PEDI(SSM) were the fastest ones, indicating that their refinement iterations are computationally demanding.

Analysis of the PEDI-based hybrid methods showed that all four PEDI-based methods benefited greatly from using a follow-up local search and that they had mostly comparable levels of prediction errors (Table 3). In particular, the accuracy improvement of PEDI(SRES) + LS was remarkable given its runtime, which was often >50 times faster than those of PEDI(DE) + LS and PEDI(SSM) + LS. Our results, thus, indicate that PEDI(SRES) + LS is overall a well-balanced parameter estimation method with a high level of accuracy and computational efficiency.

Table 2. Comparison of prediction error and runtime (in minutes) among the online methods from 100 samples with different initial guesses

Each cell lists the prediction error followed by the runtime (in minutes).

Best (a)
Model    Method    10% Original     10% Hybrid       25% Original     25% Hybrid
Model 1  HEKF      0.04 / 0.01      0.02 / 0.10      0.75 / 0.01      0.07 / 0.18
Model 1  UKF       3.36 / 0.04      0.04 / 0.24      4.15 / 0.04      0.09 / 0.34
Model 1  PF        0.02 / 0.29      0.02 / 0.36      0.08 / 0.29      0.07 / 0.36
Model 2  HEKF (b)  0.41 / 0.03      0.17 / 3.41      0.61 / 0.03      0.33 / 3.25
Model 2  UKF       0.20 / 0.16      0.17 / 2.34      0.37 / 0.27      0.33 / 3.20
Model 2  PF        0.20 / 0.57      0.17 / 2.83      0.37 / 0.59      0.33 / 3.36
Model 3  HEKF      0.30 / 0.02      0.14 / 1.63      0.67 / 0.02      0.39 / 1.69
Model 3  UKF       1.59 / 0.19      0.14 / 1.78      2.34 / 0.60      0.39 / 3.33
Model 3  PF        0.15 / 0.35      0.14 / 1.29      0.43 / 0.34      0.38 / 3.55
Model 4  HEKF      4.87 / 0.01      1.83 / 2.30      4.61 / 0.01      1.60 / 1.65
Model 4  UKF       0.56 / 0.06      0.48 / 1.31      1.22 / 0.08      0.79 / 1.54
Model 4  PF        0.56 / 0.23      0.48 / 1.49      1.22 / 0.24      0.79 / 1.75
Model 5  HEKF      4.41 / 0.03      1.85 / 4.20      9.37 / 0.06      3.56 / 27.39
Model 5  UKF       1.30 / 0.13      1.05 / 8.92      3.15 / 0.15      2.70 / 11.24
Model 5  PF        1.30 / 0.25      1.05 / 10.12     3.15 / 0.26      2.70 / 11.48

Median
Model    Method    10% Original     10% Hybrid       25% Original     25% Hybrid
Model 1  HEKF      4.13 / 0.03      0.04 / 0.20      34.34 / 0.01     0.09 / 0.12
Model 1  UKF       422.73 / 0.05    2.55 / 0.24      483.78 / 0.06    2.36 / 0.29
Model 1  PF        422.73 / 0.31    2.56 / 0.47      419.93 / 0.31    2.53 / 0.48
Model 2  HEKF (b)  80.97 / 0.03     22.53 / 4.86     70.70 / 0.03     22.82 / 4.50
Model 2  UKF       144.27 / 0.16    26.36 / 2.51     126.44 / 0.29    23.92 / 2.46
Model 2  PF        2366.76 / 1.11   52.50 / 4.83     2078.39 / 1.10   42.69 / 4.16
Model 3  HEKF      47.83 / 0.03     0.14 / 13.64     36.17 / 0.03     0.46 / 13.58
Model 3  UKF       83.13 / 0.20     0.14 / 3.15      81.11 / 0.69     0.40 / 3.17
Model 3  PF        81.01 / 0.62     0.19 / 13.64     74.47 / 0.71     0.46 / 13.59
Model 4  HEKF      31.38 / 0.03     5.39 / 2.37      18.95 / 0.03     1.99 / 2.03
Model 4  UKF       26.03 / 0.07     0.52 / 2.78      24.96 / 0.08     0.86 / 2.75
Model 4  PF        46.78 / 0.48     4.60 / 2.01      39.34 / 0.51     3.24 / 2.16
Model 5  HEKF      53.54 / 0.06     10.71 / 4.96     101.77 / 0.05    13.58 / 6.45
Model 5  UKF       50.41 / 0.15     1.11 / 17.43     52.13 / 0.15     2.77 / 14.00
Model 5  PF        52.41 / 0.61     16.18 / 5.72     55.11 / 0.63     19.46 / 6.55

Standard deviation
Model    Method    10% Original     10% Hybrid       25% Original     25% Hybrid
Model 1  HEKF      >1e4 / 0.01      2.24 / 0.07      >1e4 / 0.01      14.64 / 0.11
Model 1  UKF       625.65 / 0.01    1.51 / 0.10      >1e4 / 0.01      1.36 / 0.12
Model 1  PF        630.90 / 0.01    1.45 / 0.10      623.88 / 0.01    1.42 / 0.10
Model 2  HEKF (b)  831.25 / 0.01    18.59 / 6.45     401.22 / 0.01    15.81 / 9.47
Model 2  UKF       39.79 / 0.03     18.73 / 5.57     34.05 / 0.03     15.48 / 6.82
Model 2  PF        3003.46 / 0.22   21.27 / 15.00    2652.69 / 0.20   19.15 / 11.56
Model 3  HEKF      17.97 / 0.01     0.55 / 11.70     14.51 / 0.01     0.91 / 9.05
Model 3  UKF       37.17 / 0.03     0.00 / 0.95      196.41 / 0.11    0.06 / 1.12
Model 3  PF        32.20 / 0.17     0.66 / 11.78     29.72 / 0.21     0.48 / 11.07
Model 4  HEKF      2568.50 / 0.01   4.84 / 1.70      135.36 / 0.01    3.81 / 1.71
Model 4  UKF       804.23 / 0.02    4.02 / 1.66      445.02 / 0.01    2.52 / 1.30
Model 4  PF        28.28 / 0.12     5.98 / 1.72      22.20 / 0.14     4.64 / 1.50
Model 5  HEKF      >1e4 / 0.02      5.81 / 6.22      >1e4 / 0.04      6.31 / 6.48
Model 5  UKF       128.80 / 0.03    0.08 / 6.66      1005.23 / 0.02   1.69 / 5.09
Model 5  PF        16.88 / 0.18     7.89 / 5.47      17.04 / 0.19     8.13 / 7.54

The best performance for each setting is shown in bold in the published table.
(a) The best solution of each hybrid approach is based on local search using the most accurate solution of the online method among the 100 runs as the initial seed. The runtime row in the Best field shows the runtime of the most accurate solution.
(b) Because HEKF resulted in no solutions for 23 and 16 runs in the 10% and 25% noise settings, respectively, we excluded those runs from computation of statistical measures.

Our results have shown that the online- and PEDI-based hybrid methods are capable of achieving the same level of accuracy as the population-based hybrid methods, but with much greater computational efficiency. To evaluate more objectively how these methods perform, we compared the accuracy and the consistency of the hybrid methods with similar runtime speeds (see Supplementary Section S5.3.4 for details). We found that, while the accuracy levels of solutions from PEDI-based hybrid methods were more consistent with respect to different initial parameter guesses than those from the online-based hybrid methods were, both were able to consistently generate parameter solutions with high levels of accuracy, on par with those from SSM.

Results from the E.
coli SOS response system with microarray data Next, we compared the performance of these parameter estimation methods using a time-series gene expression data set from cDNA microarray experiments of the E. coli SOS response system by Courcelle et al. [48]. The SOS response system implements a damage tolerance mechanism that senses DNA damage and regulates the transcription of SOS genes that induce DNA repair [49, 50]. Protein LexA is a master regulator in this gene regulatory system that, by binding to the helix-turn-helix motif as a homodimer, represses the transcription of >30 SOS genes [51–54]. Courcelle et al.’s microarray data sets contain fold-change data of many mRNAs that are potentially regulated by LexA at 6 time points after UV exposure [48]. Our SOS response model describes the regulation of seven genes that are known to be controlled by LexA (Figure 4 and Supplementary Section S5.4.1). To reduce the effects of upstream pathways and to focus on LexA-regulated gene expression circuitry, we used the microarray data set from the E. coli mutant with non-cleavable LexA. The parameter range was chosen based on our prior knowledge about the SOS response system (see Supplementary Section S5.4.2). In this experiment, the population-based methods and the PEDI-based methods were run three times, while the onlinebased methods were run 10 times because they were fast. Although the online methods demand that the covariance matrix be given, we could not obtain this information with a high confidence. Thus, to use the online methods, we made a series Parameter estimation methods for gene circuit modeling | 995 Table 3. 
Comparison of the best runs from the PEDI-based methods Boundary range Narrower Noise level 10% Version Original Hybrid Original Hybrid Original Hybrid Original Hybrid 0.04 333.13 0.04 4.48 0.04 353.41 0.07 4.10 0.45 858.15 39.23 16.27 0.46 971.89 40.76 13.79 0.14 895.43 0.30 15.38 0.14 1000.81 0.55 13.90 0.57 386.61 1.35 6.36 0.57 443.01 2.42 5.54 1.54 781.32 4.25 12.81 1.11 875.50 10.26 9.85 0.02 333.24 0.02 4.60 0.02 353.54 0.02 4.24 0.17 862.15 0.17 23.92 0.17 975.80 0.17 23.04 0.14 897.95 0.14 19.60 0.14 1002.51 0.14 17.99 0.49 389.36 0.49 7.38 0.49 444.02 0.49 7.24 1.08 785.39 1.08 18.64 1.08 879.85 1.08 14.93 0.12 327.83 0.12 4.53 0.12 359.58 0.12 4.19 3.18 877.23 2.20 16.46 3.26 981.53 25.25 13.95 0.39 914.31 0.43 15.30 0.43 1013.33 0.66 14.01 1.14 391.41 1.70 6.36 0.90 446.24 3.52 5.67 3.22 789.55 3.54 12.82 3.05 887.15 87.95 9.37 0.08 327.95 0.08 4.64 0.08 359.69 0.08 4.31 0.33 882.17 0.33 22.30 0.33 986.46 0.33 23.39 0.39 918.15 0.39 20.08 0.39 1017.39 0.39 17.74 0.81 392.91 0.81 7.72 0.81 447.23 0.81 7.41 2.80 795.69 2.80 17.79 2.80 893.07 2.80 15.65 0.04 340.28 0.04 4.45 0.04 357.05 0.07 4.16 0.45 871.29 0.53 16.56 0.46 976.59 40.76 13.87 0.14 900.93 0.16 15.06 0.14 1011.66 0.55 14.07 0.57 402.81 1.07 6.39 0.57 444.25 2.42 5.63 1.54 810.84 2.52 12.88 1.11 881.96 10.26 10.05 0.02 340.56 0.04 4.51 0.04 357.18 0.04 4.23 0.17 877.30 0.17 24.59 0.17 982.61 15.63 15.69 0.14 903.74 0.15 21.56 0.14 1012.02 0.33 24.06 0.49 404.47 0.59 10.53 0.48 445.58 0.59 8.42 1.20 818.28 1.62 24.46 1.06 886.07 10.23 12.23 0.12 328.01 0.12 4.52 0.12 355.72 0.12 4.20 3.18 882.20 2.02 16.59 3.26 978.50 25.25 13.99 0.39 917.92 0.43 15.30 0.43 1011.69 0.66 14.02 1.14 393.40 2.08 6.35 0.90 445.94 3.52 5.70 3.22 791.18 8.51 12.94 3.05 887.55 87.95 9.41 0.09 328.11 0.10 4.58 0.09 355.91 0.09 4.44 0.33 890.50 0.33 24.35 0.33 986.73 1.49 64.07 0.38 920.71 0.39 24.86 0.38 1015.92 0.59 22.97 0.87 395.10 0.97 11.75 0.79 446.64 0.97 11.00 2.91 803.21 3.96 20.42 2.88 901.80 19.92 10.65 
Model 1 PEDI(DE) PEDI(SRES) PEDI(SSM) PEDI(PSO) Model 2 PEDI(DE) PEDI(SRES) PEDI(SSM) PEDI(PSO) Model 3 PEDI(DE) PEDI(SRES) PEDI(SSM) PEDI(PSO) Model 4 PEDI(DE) PEDI(SRES) PEDI(SSM) PEDI(PSO) Model 5 PEDI(DE) PEDI(SRES) PEDI(SSM) PEDI(PSO) Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Error Time Wider 25% 10% 25% The lowest prediction error and runtime (in minutes) for each situation are in bold. The results of the hybrid approach here were based on the best solution obtained from each of the PEDI-based methods. of assumptions to generate the covariance values (Supplementary Section S5.4.3). Similar to our earlier experiments with synthetic mRNA data, the results from the best run were compared in both original and hybrid versions. Table 4 shows the relative accuracy and speedup factor of each method with respect to SSM, which produced the most accurate parameter solutions among the population-based methods. These results are consistent with those from the five gene circuit models with synthetic mRNA data and demonstrate that the PEDI- and online-based hybrid methods are capable of generating more accurate parameter solutions in a computationally more efficient fashion than the population-based methods generate. This experiment with a real cDNA microarray data set gives strong evidence that these computationally efficient hybrid methods are useful alternatives in estimating parameters of gene circuit models. Discussion In this study, we analyzed the performance of several state-of-the-art parameter estimation methods in the context of gene circuit modeling. 
While previous studies compared several parameter estimation methods in the systems biology setting [15, 19, 30], to the best of our knowledge, this study is the first instance in which various types of parameter estimation methods are compared specifically in the context of gene circuit modeling. In particular, we focused on the estimation of kinetic parameters in thermodynamic-based mRNA regulation models. Unlike parameters used in statistical and high-level phenomenological models, these parameters are based on underlying biophysical processes and have concrete biological meanings [38, 55, 56]. Whereas this equilibrium thermodynamic-based formalism has traditionally been applied to modeling prokaryotic gene regulation [38, 55, 57–59], it has also been used successfully to model eukaryotic gene regulation [60–62] as well as to design artificial gene circuits [63–65]. Thus, the estimation of kinetic parameters in thermodynamic-based gene circuit models has practical significance not just to simulating the dynamics of gene regulation, but also to gaining quantitative insights into how underlying transcriptional mechanisms are controlled by the interaction of regulatory proteins and DNA binding sites in a wide range of natural and synthetic organisms.

Using such thermodynamic-based gene circuit models, we evaluated three types of parameter estimation methods: population-based methods, online methods and PEDI-based methods. To this end, we made relatively realistic assumptions about the type of time-series gene expression data available for parameter estimation of gene circuit models. Namely, instead of assuming that time-series data are given at fine-grained time intervals, we assumed that, as is often the case with high-throughput gene expression data, the observation time points of our synthetic data were sparse relative to the timescale of the gene expression. In addition, as time-series proteomic data with corresponding mRNA data points are often not available, we assumed that only the population-level mRNA data were measured, and we thus treated the time course of transcription factors as unobservable. However, to capture transcriptional regulation based on the interaction of transcription factors and DNA binding sites, we included in our gene circuit models the reaction processes for protein regulation. We also assumed that the synthesis rate and degradation rate constant of each regulatory protein were known. Because such information can be deduced from the data for mRNA and protein abundance levels as well as protein half-life data, as measured in a eukaryotic cell [66, 67], we believe that this assumption is not unrealistic.

By considering various parameter estimation settings with different noise levels and parameter boundary ranges, we showed that, when the parameter boundary ranges were relatively small, SRES and SSM attained the most accurate parameter solutions in a computationally efficient fashion among the population-based methods that we examined. With the wider parameter boundary setting, SSM performed markedly well, but the usefulness of the population-based methods deteriorated in general, as they required a much larger number of searches to stabilize the prediction error and a much larger amount of time to arrive at solutions with higher levels of accuracy. These outcomes indicate that, should model parameters be estimated from a conservative boundary owing to limited quantitative knowledge about the underlying gene regulatory mechanisms, the population-based methods would be computationally expensive and possibly not as effective as the other methods.

The online methods are computationally much more efficient alternatives, but our experiments demonstrated that the accuracy levels of their parameter solutions were not consistent and varied widely depending on the initial parameter guess. At the same time, the accuracy levels of the online methods were much lower than those of the population-based methods unless the initial parameter guess was close to the true value.

Figure 4. The network structure and bulk-level mRNA dynamics of the E. coli SOS response system. Similar to the networks in Figure 1, each line with an arrow head indicates transcriptional activation, while each line with a bar indicates transcriptional repression. The bulk-level time-series mRNA data are from cDNA microarray experiments of the MG1655 lexA1(Ind-) strain by Courcelle et al. [48]. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.

Table 4. Comparison of the accuracy and speedup factors among the best run from each of the parameter estimation methods using the E. coli SOS response model

                        Original              Hybrid
Method          Accuracy   Speedup    Accuracy   Speedup
SSM                 1.00      1.00        1.00      0.99
DE                  0.25      1.06        0.63      1.05
SRES                0.63      1.01        0.63      1.00
PSO                 0.06      4.27        0.65      4.10
PEDI(DE)            1.09      2.03        1.31      2.01
PEDI(SRES)          0.30     77.11        1.18     49.03
PEDI(SSM)           1.08      1.47        1.29      1.47
PEDI(PSO)           1.05     63.46        1.27     58.34
HEKF                0.12   1622.93        0.61    143.68
UKF                 0.01    995.29        0.60     73.78
PF                  0.03    425.08        0.66     56.66

The accuracy and speedup factors of each method were measured relative to the performance of SSM. (The accuracy and speedup factors were computed by dividing the prediction error and the runtime of SSM by those of each method. These reference values are 0.24 for the error and 61.51 min for the runtime.) The higher the value is, the better the performance is. The highest accuracy and speedup factors in each method group are in bold.
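The accuracy and speedup factors reported in Table 4 are plain ratios against the SSM reference values (0.24 for the prediction error and 61.51 min for the runtime). The short sketch below illustrates that calculation only; the function name relative_factors and the error and runtime of the example method are hypothetical and are not taken from the study.

```python
def relative_factors(ref_error, ref_runtime, error, runtime):
    """Return (accuracy, speedup) of a method relative to a reference method.

    Both factors divide the reference value by the method's value, so a
    factor above 1 means the method beats the reference.
    """
    return ref_error / error, ref_runtime / runtime


SSM_ERROR = 0.24     # reference prediction error of SSM (note to Table 4)
SSM_RUNTIME = 61.51  # reference runtime of SSM in minutes (note to Table 4)

# Hypothetical method: twice the error of SSM, one tenth of its runtime.
accuracy, speedup = relative_factors(SSM_ERROR, SSM_RUNTIME,
                                     error=0.48, runtime=6.151)
print(round(accuracy, 2), round(speedup, 2))
```

Under this definition, a method with an accuracy factor below 1 and a large speedup factor is faster but less accurate than SSM, which is the pattern the original (non-hybrid) online methods show in Table 4.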
Unlike a previous study in which PF was reported to converge to the global optimum and to perform the best among online methods [30], we did not find any clear indication as to which method generates the most accurate solutions, suggesting a more complex picture of the factors involved in the performance of the online methods. This is surprising, particularly because HEKF is based on a first-order Taylor expansion while UKF is equivalent to a third-order Taylor expansion [46, 68], which, intuitively speaking, suggests that UKF should be more accurate than HEKF. This unexpected result may be because we used versions of HEKF and UKF that impose constraints on the parameter boundaries, which drastically improved the quality of the parameter solutions from HEKF and UKF. The discrepancy may also have arisen because our measurement time points were set to be much sparser than those used by Liu et al. [30], in which time-series data with 1000 time points were assumed to be available. In fact, we believe that this point is significant in the reverse engineering of gene circuits because the time interval of time-series mRNA data is expected to be wide [69].

One thing to note in the use of Kalman filter-based online methods in modeling gene circuits is that they demand that the covariance matrix of mRNAs and proteins be given. While it is easy to generate the true covariance in synthetic data, obtaining such information may be more involved in wet-lab experiments, and this strict requirement may prevent these methods from being used in many real applications.

PEDI-based methods are also computationally efficient methods, tailored to parameter estimation of gene circuit models from time-series mRNA data with sparse time points [29].
Our results showed that most of the PEDI-based methods attained parameter solutions with accuracy levels on par with those of the population-based methods, but they were often much faster than the population-based methods. In particular, PEDI(SRES) was demonstrated to be a well-balanced method by achieving high accuracy and efficiency in general. Our experiments also demonstrated that, unlike the population-based methods, PEDI-based methods are more independent of the parameter boundaries and their solutions are largely unaffected by an increase in the boundary range. This indicates that PEDI-based methods would perform well even when prior knowledge about the range of each kinetic parameter is limited.

We also analyzed the performance of the hybrid approach that combines each parameter estimation method with a subsequent local search algorithm. Because most of the methods in this study (specifically, all but SSM) are not guaranteed to attain locally optimal solutions, the subsequent local search can help these methods increase the accuracy level of parameter solutions. In particular, we showed that this hybrid strategy improved the solutions from the online methods by substantially increasing their accuracy and stability. While the computational time of the local search was often much higher than that of the original online methods themselves, the overall runtime of the online-based hybrid methods was still much lower than that of the population-based methods. We also showed that all of the PEDI-based methods benefited from the hybrid strategy and increased the accuracy levels with low computational overhead. In particular, we demonstrated that PEDI- and online-based hybrid methods were capable of generating, with high computational efficiency, parameter solutions comparable with (and many times more accurate than) those from the population-based methods, and we confirmed these results using a real microarray data set.
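The hybrid strategy discussed above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the implementation used in this study: a one-gene model dm/dt = k_s - k_d*m stands in for a gene circuit, a coarse grid stands in for a fast but inexact estimator (it is neither PEDI nor a Kalman filter), and a simple compass search stands in for the local search algorithm. All function names, time points and parameter values are hypothetical.

```python
import math


def simulate(k_s, k_d, times):
    """mRNA level for dm/dt = k_s - k_d*m with m(0) = 0 (closed form)."""
    return [(k_s / k_d) * (1.0 - math.exp(-k_d * t)) for t in times]


def sse(params, times, data):
    """Sum of squared errors between simulated and observed mRNA levels."""
    k_s, k_d = params
    return sum((m - y) ** 2 for m, y in zip(simulate(k_s, k_d, times), data))


def coarse_estimate(times, data):
    """Stand-in for a fast, inexact estimator: best point on a coarse grid."""
    grid = [(k_s, k_d)
            for k_s in (0.5, 1.5, 2.5, 3.5)
            for k_d in (0.1, 0.4, 0.7, 1.0)]
    return min(grid, key=lambda p: sse(p, times, data))


def local_search(start, times, data, step=0.25, tol=1e-6, max_iter=100000):
    """Compass search: try +/- step on each parameter, halve step on failure."""
    best = list(start)
    best_err = sse(best, times, data)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = max(trial[i] + delta, 1e-9)  # keep rates positive
                err = sse(trial, times, data)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            step /= 2.0
    return tuple(best), best_err


TIMES = [1.0, 2.0, 4.0, 8.0]          # sparse observation time points
TRUE_PARAMS = (2.0, 0.5)              # hypothetical true (k_s, k_d)
DATA = simulate(*TRUE_PARAMS, TIMES)  # noise-free synthetic mRNA data

seed = coarse_estimate(TIMES, DATA)                     # fast first stage
seed_err = sse(seed, TIMES, DATA)
refined, refined_err = local_search(seed, TIMES, DATA)  # refinement stage
print("seed:", seed, seed_err)
print("refined:", refined, refined_err)
```

Because the local search only ever accepts moves that lower the error, the refined solution can never be worse than its seed. How much it improves depends on how close the seed is to a good basin, which is consistent with the dependence on the initial guess noted for the online methods.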
Conclusion

Parameter estimation of gene circuit models is an essential step in discovering useful information about gene regulatory mechanisms from transcriptomics data. Population-based metaheuristics have traditionally been thought to be the de facto standard for parameter estimation of biochemical kinetic models [15, 16], but their usefulness is substantially lowered when the parameter search space widens. Our study indicates that, in such cases, hybrid approaches based on the aforementioned computationally efficient methods coupled with a local search algorithm are useful alternatives to the population-based methods. Accurate and computationally efficient estimation of kinetic parameters in gene circuit models is one key to the systematic understanding of gene regulatory systems. Thus, our results may have substantial implications in an integrative systems biology approach to predicting how genetic parts interact to control gene expression and understanding how such gene regulation can affect cellular morphology and physiology.

Supplementary data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Key Points

• The accurate estimation of kinetic parameters in detailed gene circuit models from transcriptomics data is an essential step in integrative systems biology.
• We evaluated the performance of 22 distinct approaches based on three types of state-of-the-art parameter estimation methods using six gene circuit models with various parameter estimation settings.
• We found that the usefulness of the population-based methods, in general, deteriorated with a larger parameter search space.
• We showed that a hybrid strategy that augments computationally efficient methods with a subsequent local search can substantially increase the accuracy of parameter solutions while still maintaining high computational efficiency.
• While population-based methods have been popular in systematically estimating parameters of biological models, our results suggest that computationally efficient hybrid methods are promising alternatives for effective parameter estimation of gene circuit models.

Funding

The research reported in this publication was supported by competitive research funding from King Abdullah University of Science and Technology (KAUST), the Natural Science Foundation of Zhejiang Province of China (LQ14F010011) and the National Natural Science Foundation of China (Grant No. 61401131).

References

1. Hood L. A personal journey of discovery: developing technology and changing biology. Annu Rev Anal Chem 2008;1:1–43.
2. O'Shea P. Future medicine shaped by an interdisciplinary new biology. Lancet 2012;379:1544–50.
3. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nat Protoc 2006;1:1559–82.
4. Joo C, Balci H, Ishitsuka Y, et al. Advances in single-molecule fluorescence methods for molecular biology. Annu Rev Biochem 2008;77:51–76.
5. Raj A, van Oudenaarden A. Single-molecule approaches to stochastic gene expression. Annu Rev Biophys 2009;38:255–70.
6. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10:57–63.
7. Dhanasekaran S, Doherty TM, Kenneth J, et al. Comparison of different standards for real-time PCR-based absolute quantification. J Immunol Methods 2010;354:34–9.
8. Materna SC, Nam J, Davidson EH. High accuracy, high-resolution prevalence measurement for the majority of locally expressed regulatory genes in early sea urchin development. Gene Expr Patterns 2010;10:177–84.
9. Ideker T, Galitski T, Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2001;2:343–372.
10. Kitano H. Computational systems biology. Nature 2002;420:206–210.
11. Church GM. From systems biology to synthetic biology. Mol Syst Biol 2005;1:2005.0032.
12. Schwartz R. Biological Modeling and Simulation: A Survey of Practical Models, Algorithms, and Numerical Methods. Cambridge, MA: The MIT Press, 2008.
13. Beck JV, Woodbury KA. Inverse problems and parameter estimation: integration of measurements and analysis. Meas Sci Technol 1999;9:839.
14. Mendes P, Kell D. Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics 1998;14:869–83.
15. Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res 2003;13:2467–74.
16. Sun J, Garibaldi JM, Hodgman C. Parameter estimation using metaheuristics in systems biology: a comprehensive review. IEEE/ACM Trans Comput Biol Bioinform 2012;9:185–202.
17. Runarsson T, Yao X. Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 2000;4:284–94.
18. Storn R, Price K. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 1997;11:341–59.
19. Rodriguez-Fernandez M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics 2006;7:483.
20. Shi Y, Eberhart R. A modified particle swarm optimizer. In: IEEE International Conference on Evolutionary Computation. New York, NY: IEEE, 1998, 69–73.
21. Glover F. Heuristics for integer programming using surrogate constraints. Decis Sci 1977;8:156–66.
22. Fleurent C, Glover F, Michelon P, et al. A scatter search approach for unconstrained continuous optimization. In: Proceedings of 1996 IEEE International Conference on Evolutionary Computation (ICEC'96). New York, NY: IEEE, 1996, 643–8.
23. Glover F, Laguna M, Marti R. Scatter search and path relinking: advances and applications. In: F Glover, GA Kochenberger (eds). Handbook of Metaheuristics, Chapter 1, Vol. 57. Boston: Springer New York, 2003, 1–35.
24. Ashyraliyev M, Jaeger J, Blom JG. Parameter estimation and determinability analysis applied to Drosophila gap gene circuits. BMC Syst Biol 2008;2:83.
25. Liu PK, Yuh CH, Wang FS. Inference of genetic regulatory networks using S-system and hybrid differential evolution. In: IEEE Congress on Evolutionary Computation, Hong Kong, China. Piscataway, NJ: IEEE, 2008, 1736–43.
26. Koh G, Teong HFC, Clément MV, et al. A decompositional approach to parameter estimation in pathway modeling: a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 2006;22:e271–80.
27. Zhan C, Yeung LF. Parameter estimation in systems biology models using spline approximation. BMC Syst Biol 2011;5:14.
28. Jia G, Stephanopoulos GN, Gunawan R. Parameter estimation of kinetic models from metabolic profiles: two-phase dynamic decoupling method. Bioinformatics 2011;27:1964–70.
29. Kuwahara H, Fan M, Wang S, et al. A framework for scalable parameter estimation of gene circuit models using structural information. Bioinformatics 2013;29:i98–107.
30. Liu X, Niranjan M. State and parameter estimation of the heat shock response system using Kalman and particle filters. Bioinformatics 2012;28:1501–7.
31. Sun X, Jin L, Xiong M. Extended Kalman filter for estimation of parameters in nonlinear state-space models of biochemical networks. PLoS One 2008;3:e3758.
32. Lillacci G, Khammash M. Parameter estimation and model selection in computational biology. PLoS Comput Biol 2010;6:e1000696.
33. Julier SJ, Uhlmann JK. New extension of the Kalman filter to nonlinear systems. In: AeroSense'97. Bellingham, WA: International Society for Optics and Photonics, 1997, 182–93. SPIE Digital Library (http://spie.org/).
34. Sarkka S. On unscented Kalman filtering for state estimation of continuous-time nonlinear systems. IEEE Trans Automat Contr 2007;52:1631–41.
35. Sarkar P. Sequential Monte Carlo methods in practice. Technometrics 2003;45:106.
36. Nagasaki M, Yamaguchi R, Yoshida R, et al. Genomic data assimilation for estimating hybrid functional petri net from time-course gene expression data. Genome Inform 2006;17:46.
37. Tasaki S, Nagasaki M, Oyama M, et al. Modeling and estimation of dynamic EGFR pathway by data assimilation approach using time series proteomic data. Genome Inform 2006;17:226.
38. Shea MA, Ackers GK. The OR control system of bacteriophage lambda: a physical-chemical model for gene regulation. J Mol Biol 1985;181:211–30.
39. Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014;11:20130505.
40. Kulkarni MM. Digital multiplexed gene expression analysis using the NanoString nCounter system. Curr Protoc Mol Biol 2011;Chapter 25:Unit25B.10.
41. van Oijen AM. Single-molecule approaches to characterizing kinetics of biomolecular interactions. Curr Opin Biotechnol 2011;22:75–80.
42. Storn R. On the usage of differential evolution for function optimization. In: M Smith, M Lee, J Keller, J Yen (eds). North American Fuzzy Information Processing. New York, NY: IEEE, 1996, 519–23.
43. Laguna M, Marti R, Martí RC. Scatter Search: Methodology and Implementation in C, Vol. 24. Norwell, MA and New York, NY: Springer, 2003.
44. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, Vol. 4. Piscataway, NJ: IEEE, 1995, 1942–8.
45. Birge B. PSOt - a particle swarm optimization toolbox for use with Matlab. In: Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS'03), Indianapolis, IN, USA. Piscataway, NJ: IEEE, 2003, 182–6.
46. Simon D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. New Jersey: John Wiley & Sons, 2006.
47. Baker SM, Poskar CH, Schreiber F, et al. An improved constraint filtering technique for inferring hidden states and parameters of a biological model. Bioinformatics 2013;29:1052–9.
48. Courcelle J, Khodursky A, Peter B, et al. Comparative gene expression profiles following UV exposure in wild-type and SOS-deficient Escherichia coli. Genetics 2001;158:41–64.
49. Radman M. SOS repair hypothesis: phenomenology of an inducible DNA repair which is accompanied by mutagenesis. Basic Life Sci 1975;5A:355–67.
50. Michel B. After 30 years of study, the bacterial SOS response still surprises us. PLoS Biol 2005;3:e255.
51. Brent R, Ptashne M. Mechanism of action of the lexA gene product. Proc Natl Acad Sci USA 1981;78:4204–8.
52. Sutton MD, Smith BT, Godoy VG, et al. The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance. Annu Rev Genet 2000;34:479–97.
53. Fernández De Henestrosa AR, Ogi T, Aoyagi S, et al. Identification of additional genes belonging to the LexA regulon in Escherichia coli. Mol Microbiol 2000;35:1560–72.
54. Zhang APP, Pigli YZ, Rice PA. Structure of the LexA-DNA complex and implications for SOS box measurement. Nature 2010;466:883–6.
55. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by λ phage repressor. Proc Natl Acad Sci USA 1982;79:1129–33.
56. Wang X, Kuwahara H, Gao X. Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels. BMC Syst Biol 2014;8(Suppl 5):S5.
57. McAdams HH, Arkin A. Simulation of prokaryotic genetic circuits. Annu Rev Biophys Biomol Struct 1998;27:199–224.
58. Arkin A, Ross J, McAdams H. Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 1998;149:1633–48.
59. Kuwahara H, Myers CJ, Samoilov MS. Temperature control of fimbriation circuit switch in uropathogenic Escherichia coli: quantitative analysis via automated model abstraction. PLoS Comput Biol 2010;6:e1000723.
60. Bintu L, Buchler NE, Garcia HG, et al. Transcriptional regulation by the numbers: models. Curr Opin Genet Dev 2005;15:116–24.
61. Chickarmane V, Troein C, Nuber UA, et al. Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput Biol 2006;2:e123.
62. Zeigler RD, Cohen BA. Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data. Nucleic Acids Res 2014;42:2224–34.
63. Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in Escherichia coli. Nature 2000;403:339–42.
64. Gertz J, Cohen BA. Environment-specific combinatorial cis-regulation in synthetic promoters. Mol Syst Biol 2009;5:244.
65. Nguyen NP, Myers C, Kuwahara H, et al. Design and analysis of a robust genetic Muller C-element. J Theor Biol 2010;264:174–87.
66. Ghaemmaghami S, Huh WK, Bower K, et al. Global analysis of protein expression in yeast. Nature 2003;425:737–41.
67. Belle A, Tanay A, Bitincka L, et al. Quantification of protein half-lives in the budding yeast proteome. Proc Natl Acad Sci USA 2006;103:13004–9.
68. Wan E, Van Der Merwe R. The unscented Kalman filter for nonlinear estimation. In: Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. Piscataway, NJ: IEEE, 2000, 153–8.
69. Ernst J, Bar-Joseph Z. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 2006;7:191.