Computational Modeling and Analysis of Biopathways: A

Probabilistic Approximations of
Bio-Pathways Dynamics
P.S. Thiagarajan
School of Computing, National University of Singapore
Joint Work with: Liu Bing, David Hsu
Bio-pathways

Intracellular:

Gene regulatory networks
 Metabolic networks
 signalling pathways.
A Common Modeling Approach
View a bio-pathway as a network of biochemical reactions
 Model the network as a system of ODEs


One for each molecular species
 Mass action, Michelis-Menten, Hill, etc.

Study the ODE system to understand the
dynamics of the bio-chemical network
Major Hurdles

Many rate constants not known

must be estimated
noisy data; limited precision; population-based
High dimensional system



closed-form solutions are impossible
 Must resort to numerical simulations
 Point values of initial states/data will not be available.
 a large number of numerical simulations needed for
answering each question
Our work:
Decompose  Analyze  Compose
(ISMB’06, WABI’07)
 Probabilistic models and methods to:


approximate the ODE dynamics(CMSB’09,
TCS’11).
 update models (RECOMB’10)
The “Opinion Poll” Idea




Start with an ODEs system.
Discretize the time and value domains.
Assume a (uniform) distribution of initial states
Generate a “sufficiently” large number of
trajectories by


sampling the initial states and numerical simulations.
Encode this collection of trajectories as a
dynamic Bayesian network.
 ODEs  DBN
The “opinion poll” Idea
Pay the one-time cost of constructing the
Bayesian network.
 Amortize this cost:


perform multiple analysis tasks using
inferencing algorithms for DBNs.
Time Discretization

Observe the system only at discrete time points; in fact,
only at finitely many time points
x(t)
x(t) = t3 + 4t + x(0)
t0
t1
t2
... ...
... ...
tmax
Value Discretization

Observe only with bounded precision
x(t)
E
D
C
B
A
t0
t1
t2
... ...
... ...
tmax
Discrete Trajectories

A trajectory now is sequence of discrete values
C
D
D
E
D
C
B
x(t)
E
D
C
B
A
t0
t1
t2
... ...
... ...
tmax
Discretized Behavior


Assume a prior distribution of the initial states.
Behavior = (Uncountably) many sequences of discrete
values.
C
D
D
x(t)
E
D
D
... ...
C
B
E
D
C
B
A
t0
t1
t2
... ...
... ...
tmaxt
Representation of Behavior


In fact, a probabilistic transition system.
Pr( (D, 2)
(E, 3) ) is
0.8
D
D
C
the “fraction” of the
0.2
trajectories residing in D
E
at t = 2 that land in E at
D
t = 3.
E
D
C
B
D
C
B
A
t0
t1
t2
... ...
... ...
tmaxt
Mathematical Justification


The value space of the variables is assumed to be a
compact subset C of
In Z’ = F(Z), F is assumed to be differentiable
everywhere in C.




Mass-law, Michaelis-Menton,…
for every rate constant k.
Then the solution (flow) t : C  C (for each t) exists, is
unique, a bijection, continuous and hence measurable.
But the transition probabilities can not be computed.
Computational Approximation
(s, i) – States;
(s, i)  (s’, i+1) -- Transitions
Sample, say, 1000 times the
initial states.
Through numerical
simulation, generate 1000
trajectories.
C
D
D
0.8
0.2
E
D
C
B
D
E
D
Pr((s, i)  (s’ i+1)) is the
fraction of the trajectories that
are in s at ti which land in s’
at t i+1.
C
B
A
t0
t1
t2
... ...
... ...
tmaxt
Infeasible Size!

But the transition system will be huge.

O(T . kn)

k  2 and n ( 50-100) can be large.
Compact Representation
Exploit the network structure (additional
independence assumptions) to construct a
dynamic Bayesian network instead.
 The DBN is a factored form of a probabilistic
transition system (Markov chain).

The DBN Representation
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
The DBN Representation
A
B
C
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
P(S2=C|S1=B,E1=C,ES1=B)= 0.2
P(S2=C|S1=B,E1=C,ES1=C)= 0.1
P(S2=A|S1=A,E1=A,ES1=C)= 0.05
.
.
.
A
Each node has a
B
CPT associated with C
it.
 This specifies the
. local (probabilistic)
dynamics.

B
C
C
A
B
C
B
B
B
B
B
C
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
C
B
C
C
... ...
C
A
C
C
P(S2=C|S1=B,E1=C,ES1=B)= 0.2
P(S2=C|S1=B,E1=C,ES1=C)= 0.1
P(S2=A|S1=A,E1=A,ES1=C)= 0.05
.
.
.
Fill up the entries in A
B
the CPTs by
C
sampling, simulations
and counting

S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
.
Computational Approximation
500

Fill up the entries in A
B
the CPTs by
C
sampling, simulations
and counting
1000
100
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
The Technique
P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0
500

Fill up the entries in A
B
the CPTs by
C
sampling, simulations
and counting
100
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
The Technique
P(S2=C|S1=B,E1=C,ES1=B)= 100/500= 0.2
500
The size of the
DBN is:
O(T . n . kd)
d will be usually
much smaller than
n.
A
B
C
100
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
Unknown rate constants
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
k2
Unknown rate constants
During the numerical
generation of a
trajectory, the value
of k2 does not change
after sampling.
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
k2
k2
k2
k2
Unknown rate constants
Sample uniformly
across all the
Intervals.
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
k2
k2
k2
k2
Bayesian inference
This is the Factored Frontier
algorithm (Murphy & Weiss)
S0
S1
S2
S3
... ...
Approximate but fast:
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
k2
k2
k2
k2
Parameter Estimation
For each choice of (jnterval) values
for unknown parameters, present
experimental data as evidence and
assign a score using FF.
Return parameter estimates as
maximal likelihoods.
Similar application of FF to do
sensitivity analysis.
S0
S1
S2
S3
... ...
E0
E1
E2
E3
... ...
ES0
ES1
ES2
ES3
... ...
P0
P1
P2
P3
... ...
k2
k2
k2
k2
DBN based Analysis

Our experiments with signalling pathways
models (taken from the biomodels data
base) suggest:
 The
one-time cost of constructing the DBN
can be easily recovered by using it to do
parameter estimation and sensitivity analysis.
Segmentation Clock Network




Governs the periodic formation
process of somites during
embryogenesis
The underlying signaling
network couples three oscillating
pathways: FGF, Wnt and Notch.
22 ODEs; 75 unknown
parameters
5 equal intervals; t = 5 min;
100 time points; 2.5 million
trajectories; 3.1 hours on a 10
PC cluster.
Results
LM: Levenberg–Marquardt
40 unknown
parameters;
Synthetic data: 8
proteins, 10 time
points (noise with
5% variance
added)

Global sensitivity analysis


ODE based: 81 hours
DBN based: 3 hours
GA: Genetic Algorithm
SRES: Evolutionary Strategy
PSO: Particle Swarm Optimization
BDM: our method
EGF-NGF Pathway

ODE model (Brown et al. 2004)



32 species
48 parameters
Features:



Good size
Feedback loops
DBN construction

Settings




5 intervals
1 min time-step, 100 minutes
3 x 106 samples
Runtime

4 hours on a cluster of 10 PCs
Parameter Estimation
20 (out of 48) unknown
parameters
Synthetic data: 9
proteins, 10 time points
LM: Levenberg–Marquardt
GA: Genetic Algorithm
SRES: Evolutionary Strategy
PSO: Particle Swarm Optimization
BDM: our method
Sensitivity Analysis

Running time
 ODE
based: 22 hours
 DBN based: 34 minutes
Complement System

Complement system is a critical part of the immune system
Ricklin et al. 2007
35
Amplified
Classical
Lectin
Inhibition
response
pathway
pathway
Goals

Quantitatively understand the regulatory
mechanisms of complement system



How is the complement activity enhanced under inflammation
condition?
How is the excessive response of the complement avoided?
The model:

Classical pathway + the lectin pathway


Inhibitory mechanisms

37
enhancement
C4BP
Complement System

ODE Model


42 Species
45 Reactions




Mass law
Michaelis-Menten kinetics
92 Parameters (71 unknown)
DBN Construction

Settings




6 intervals
100s time-step, 12600s
1.2 x 106 samples
Runtime

12 hours on a cluster of 20 PCss
Parameter estimation

Training data

4 proteins, 7 time points, 4 conditions
39 * inflammation condition: pH 6.5 calcium 2 nM
Model Calibration (parameter estimation)
40
Model validation

41
Validated the model using previous published data (Zhang et al
2009)
Enhancement mechanism

42
The antimicrobial response is sensitive to the pH and calcium
level
Model prediction: The regulatory effect of C4BP


C4BP maintains classical complement activation but delays the
maximal response time.
But attenuates the lectin pathway activation.
Classical pathway
43
Lectin pathway
The regulatory mechanism of
C4BP

The major inhibitory role of C4BP is to facilitate the decay of
C3 convertase
A
B
C
D
A
D
B
44
C
Results

Both predictions concerning C4BP were experimentally
verified.

“A Computational and Experimental Study of the Regulatory
Mechanisms of the Complement System”
 Liu et.al: PLoS Comp.Bio7(2011)

BioModels database (303.Liu).
Current Collaborators
Ding Jeak Ling
Immune signalling during
Multiple infections
Marie-Veronique Clement
DNA damage/response pathway
G V Shivashankar
Chromosome co-localizations
and co-regulations
Current work

Parametrized version of FF


Reduce errors by investing more
computational time.
Probabilistic verification methods.
 Monte

Carlo, FF based, …
Implementation on a GPU platform.
Conclusion
The DBN approximation method is useful
and efficient.
 This is all very well in practice but will it
work in theory?

Our group:
S. Akshay
David Hsu
Liu Bing
Suchee
Palaniappan
Abhinav Dubey
Blaise Genest
P.S.
Thiagarajan
Wang Junjie
Benjamin Gyori
Gireedhar
Venkatachalam