Probabilistic Approximations of Bio-Pathways Dynamics P.S. Thiagarajan School of Computing, National University of Singapore Joint Work with: Liu Bing, David Hsu Bio-pathways Intracellular: Gene regulatory networks Metabolic networks signalling pathways. A Common Modeling Approach View a bio-pathway as a network of biochemical reactions Model the network as a system of ODEs One for each molecular species Mass action, Michelis-Menten, Hill, etc. Study the ODE system to understand the dynamics of the bio-chemical network Major Hurdles Many rate constants not known must be estimated noisy data; limited precision; population-based High dimensional system closed-form solutions are impossible Must resort to numerical simulations Point values of initial states/data will not be available. a large number of numerical simulations needed for answering each question Our work: Decompose Analyze Compose (ISMB’06, WABI’07) Probabilistic models and methods to: approximate the ODE dynamics(CMSB’09, TCS’11). update models (RECOMB’10) The “Opinion Poll” Idea Start with an ODEs system. Discretize the time and value domains. Assume a (uniform) distribution of initial states Generate a “sufficiently” large number of trajectories by sampling the initial states and numerical simulations. Encode this collection of trajectories as a dynamic Bayesian network. ODEs DBN The “opinion poll” Idea Pay the one-time cost of constructing the Bayesian network. Amortize this cost: perform multiple analysis tasks using inferencing algorithms for DBNs. Time Discretization Observe the system only at discrete time points; in fact, only at finitely many time points x(t) x(t) = t3 + 4t + x(0) t0 t1 t2 ... ... ... ... tmax Value Discretization Observe only with bounded precision x(t) E D C B A t0 t1 t2 ... ... ... ... tmax Discrete Trajectories A trajectory now is sequence of discrete values C D D E D C B x(t) E D C B A t0 t1 t2 ... ... ... ... tmax Discretized Behavior Assume a prior distribution of the initial states. Behavior = (Uncountably) many sequences of discrete values. C D D x(t) E D D ... ... C B E D C B A t0 t1 t2 ... ... ... ... tmaxt Representation of Behavior In fact, a probabilistic transition system. Pr( (D, 2) (E, 3) ) is 0.8 D D C the “fraction” of the 0.2 trajectories residing in D E at t = 2 that land in E at D t = 3. E D C B D C B A t0 t1 t2 ... ... ... ... tmaxt Mathematical Justification The value space of the variables is assumed to be a compact subset C of In Z’ = F(Z), F is assumed to be differentiable everywhere in C. Mass-law, Michaelis-Menton,… for every rate constant k. Then the solution (flow) t : C C (for each t) exists, is unique, a bijection, continuous and hence measurable. But the transition probabilities can not be computed. Computational Approximation (s, i) – States; (s, i) (s’, i+1) -- Transitions Sample, say, 1000 times the initial states. Through numerical simulation, generate 1000 trajectories. C D D 0.8 0.2 E D C B D E D Pr((s, i) (s’ i+1)) is the fraction of the trajectories that are in s at ti which land in s’ at t i+1. C B A t0 t1 t2 ... ... ... ... tmaxt Infeasible Size! But the transition system will be huge. O(T . kn) k 2 and n ( 50-100) can be large. Compact Representation Exploit the network structure (additional independence assumptions) to construct a dynamic Bayesian network instead. The DBN is a factored form of a probabilistic transition system (Markov chain). The DBN Representation S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... The DBN Representation A B C S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... P(S2=C|S1=B,E1=C,ES1=B)= 0.2 P(S2=C|S1=B,E1=C,ES1=C)= 0.1 P(S2=A|S1=A,E1=A,ES1=C)= 0.05 . . . A Each node has a B CPT associated with C it. This specifies the . local (probabilistic) dynamics. B C C A B C B B B B B C S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... C B C C ... ... C A C C P(S2=C|S1=B,E1=C,ES1=B)= 0.2 P(S2=C|S1=B,E1=C,ES1=C)= 0.1 P(S2=A|S1=A,E1=A,ES1=C)= 0.05 . . . Fill up the entries in A B the CPTs by C sampling, simulations and counting S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... . Computational Approximation 500 Fill up the entries in A B the CPTs by C sampling, simulations and counting 1000 100 S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... The Technique P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0 500 Fill up the entries in A B the CPTs by C sampling, simulations and counting 100 S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... The Technique P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2 500 The size of the DBN is: O(T . n . kd) d will be usually much smaller than n. A B C 100 S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... Unknown rate constants S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... k2 Unknown rate constants During the numerical generation of a trajectory, the value of k2 does not change after sampling. S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... k2 k2 k2 k2 Unknown rate constants Sample uniformly across all the Intervals. S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... k2 k2 k2 k2 Bayesian inference This is the Factored Frontier algorithm (Murphy & Weiss) S0 S1 S2 S3 ... ... Approximate but fast: E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... k2 k2 k2 k2 Parameter Estimation For each choice of (jnterval) values for unknown parameters, present experimental data as evidence and assign a score using FF. Return parameter estimates as maximal likelihoods. Similar application of FF to do sensitivity analysis. S0 S1 S2 S3 ... ... E0 E1 E2 E3 ... ... ES0 ES1 ES2 ES3 ... ... P0 P1 P2 P3 ... ... k2 k2 k2 k2 DBN based Analysis Our experiments with signalling pathways models (taken from the biomodels data base) suggest: The one-time cost of constructing the DBN can be easily recovered by using it to do parameter estimation and sensitivity analysis. Segmentation Clock Network Governs the periodic formation process of somites during embryogenesis The underlying signaling network couples three oscillating pathways: FGF, Wnt and Notch. 22 ODEs; 75 unknown parameters 5 equal intervals; t = 5 min; 100 time points; 2.5 million trajectories; 3.1 hours on a 10 PC cluster. Results LM: Levenberg–Marquardt 40 unknown parameters; Synthetic data: 8 proteins, 10 time points (noise with 5% variance added) Global sensitivity analysis ODE based: 81 hours DBN based: 3 hours GA: Genetic Algorithm SRES: Evolutionary Strategy PSO: Particle Swarm Optimization BDM: our method EGF-NGF Pathway ODE model (Brown et al. 2004) 32 species 48 parameters Features: Good size Feedback loops DBN construction Settings 5 intervals 1 min time-step, 100 minutes 3 x 106 samples Runtime 4 hours on a cluster of 10 PCs Parameter Estimation 20 (out of 48) unknown parameters Synthetic data: 9 proteins, 10 time points LM: Levenberg–Marquardt GA: Genetic Algorithm SRES: Evolutionary Strategy PSO: Particle Swarm Optimization BDM: our method Sensitivity Analysis Running time ODE based: 22 hours DBN based: 34 minutes Complement System Complement system is a critical part of the immune system Ricklin et al. 2007 35 Amplified Classical Lectin Inhibition response pathway pathway Goals Quantitatively understand the regulatory mechanisms of complement system How is the complement activity enhanced under inflammation condition? How is the excessive response of the complement avoided? The model: Classical pathway + the lectin pathway Inhibitory mechanisms 37 enhancement C4BP Complement System ODE Model 42 Species 45 Reactions Mass law Michaelis-Menten kinetics 92 Parameters (71 unknown) DBN Construction Settings 6 intervals 100s time-step, 12600s 1.2 x 106 samples Runtime 12 hours on a cluster of 20 PCss Parameter estimation Training data 4 proteins, 7 time points, 4 conditions 39 * inflammation condition: pH 6.5 calcium 2 nM Model Calibration (parameter estimation) 40 Model validation 41 Validated the model using previous published data (Zhang et al 2009) Enhancement mechanism 42 The antimicrobial response is sensitive to the pH and calcium level Model prediction: The regulatory effect of C4BP C4BP maintains classical complement activation but delays the maximal response time. But attenuates the lectin pathway activation. Classical pathway 43 Lectin pathway The regulatory mechanism of C4BP The major inhibitory role of C4BP is to facilitate the decay of C3 convertase A B C D A D B 44 C Results Both predictions concerning C4BP were experimentally verified. “A Computational and Experimental Study of the Regulatory Mechanisms of the Complement System” Liu et.al: PLoS Comp.Bio7(2011) BioModels database (303.Liu). Current Collaborators Ding Jeak Ling Immune signalling during Multiple infections Marie-Veronique Clement DNA damage/response pathway G V Shivashankar Chromosome co-localizations and co-regulations Current work Parametrized version of FF Reduce errors by investing more computational time. Probabilistic verification methods. Monte Carlo, FF based, … Implementation on a GPU platform. Conclusion The DBN approximation method is useful and efficient. This is all very well in practice but will it work in theory? Our group: S. Akshay David Hsu Liu Bing Suchee Palaniappan Abhinav Dubey Blaise Genest P.S. Thiagarajan Wang Junjie Benjamin Gyori Gireedhar Venkatachalam
© Copyright 2026 Paperzz