
Formal Biology of the Cell
Modeling, Computing and Reasoning with Constraints
François Fages,
Constraint Programming Group, INRIA Rocquencourt
http://contraintes.inria.fr/
Transpose concepts and tools from programming theory to systems biology:
• formal methods of program verification,
• constraint logic programming and constraint-based model checking.
In this course:
• learn bits of cell biology through computational models,
• develop new formalisms, languages and algorithms motivated by biological questions.
François Fages
MPRI Bio-info 2007
Systems Biology
• A multidisciplinary field aiming at overcoming the complexity barrier in order to reason
about biological processes at the system level.
• Conferences: ICSB, CMSB, …; journals: TCSB, …
• Virtual cell: emulate high-level biological processes in terms of their
biochemical basis at the molecular level (in silico experiments).
• Bioinformatics: since the end of the 90’s, from genomic sequences to post-genomic data (RNA
expression, protein synthesis, protein-protein interactions, …).
• Need for a strong effort on:
- the formal representation of biological processes,
- formal tools for modeling and reasoning about their global behavior.
Language Approach to Cell Systems Biology
Qualitative models: from diagrammatic notation to
• Boolean networks [Thomas 73]
• Petri nets [Reddy 93]
• Milner’s π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]
• Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]
• Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]
• Transition systems [Chabrier-Chiaverini-Danos-Fages-Schächter 04]
• Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03]
Quantitative models: from differential equation systems to
• Hybrid Petri nets [Hofestädt-Thelen 98, Matsuno et al. 00]
• Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]
• Hybrid concurrent constraint languages [Bockmayr-Courtois 01]
• Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]
The Biochemical Abstract Machine BIOCHAM
Software environment based on two formal languages:
1. Biocham Rule Language for modeling biochemical systems
   - syntax of molecules, compartments and reactions
   - semantics at 3 abstraction levels: Boolean, concentrations, populations
2. Biocham Temporal Logic for formalizing biological properties
   - CTL for the Boolean semantics
   - Constraint LTL for the concentration semantics, PCTL for the stochastic semantics
Machine learning of rules and parameters from temporal properties:
   - learning reaction rules from a CTL specification
   - learning kinetic parameter values from a Constraint-LTL specification
Internship topics: http://contraintes.inria.fr
Overview of the Lectures
1. Formal molecules and reaction rules in BIOCHAM.
2. Formal biological properties in temporal logic. Symbolic model-checking.
3. Continuous dynamics. Kinetics and transport models.
4. Computational models of the cell cycle control.
5. Abstract interpretation and typing of biochemical networks.
6. Machine learning reaction rules from temporal properties.
7. Constraint-based model checking. Learning kinetic parameter values.
8. Constraint Logic Programming approach to protein structure prediction.
References
A wonderful textbook:
Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Molecular Cell Biology, 5th edition. Freeman, Nov. 2003 (1100 pages + CD).
Segel. Modeling Dynamic Phenomena in Molecular and Cellular Biology. Cambridge Univ. Press, 1987.
Chabrier, Chiaverini, Danos, Fages, Schächter. Modeling and querying bio-molecular interaction networks. Theoretical Computer Science, 2004.
Calzone, Chabrier, Fages, Soliman. Machine learning biochemical reaction networks. Transactions on Computational Systems Biology, 2006.
Fages, Soliman. The Biochemical Abstract Machine BIOCHAM. http://contraintes.inria.fr/BIOCHAM
Map of Course 1
1. BIOCHAM syntax
• Proteins: complexation and phosphorylation
• DNA and genes: replication and transcription
• Reaction and transport rules
2. Boolean semantics: concurrent transition system, Kripke structure
• States and transitions
• Examples: RTK membrane receptors, MAPK signaling pathways
2. Syntax: a Simple Algebra of Cell Molecules
Small molecules: covalent bonds, 50-200 kcal/mol
• 70% water
• 1% ions
• 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, …
Macromolecules: weak bonds (hydrogen, ionic, hydrophobic, van der Waals), 1-5 kcal/mol
Stability and binding are determined by the number of weak bonds, i.e. by the 3D shape
• 20% proteins (50-10^4 amino acids)
• RNA (10^2-10^4 nucleotides, AGCU)
• DNA (10^2-10^6 nucleotides, AGCT)
Structure Levels of Proteins
1) Primary structure: word of n amino acid residues (20^n possibilities)
linked by C-N (peptide) bonds
Example: MPRI = Methionine-Proline-Arginine-Isoleucine
2) Secondary structure: word of m α-helices, β-strands, random coils, … (3^m to 10^m possibilities)
stabilized by hydrogen bonds H---O
3) Tertiary structure: 3D spatial folding
stabilized by hydrophobic interactions
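The counting claims above can be made concrete in a few lines. A toy Python sketch, assuming one-letter amino acid codes; the table `RESIDUE_NAMES` and the helper names are hypothetical, the table only covers the four letters of the MPRI example, and the secondary-structure bounds are the slide's rough estimates rather than exact counts.

```python
# One-letter codes of the residues in the example word (4 of the 20 amino acids)
RESIDUE_NAMES = {"M": "Methionine", "P": "Proline", "R": "Arginine", "I": "Isoleucine"}


def decode(word):
    """Spell out a primary structure given in one-letter code."""
    return "-".join(RESIDUE_NAMES[letter] for letter in word)


def primary_structures(n):
    """Number of possible words of n residues over the 20 amino acids."""
    return 20 ** n


def secondary_range(m):
    """Rough bounds (3^m, 10^m) for words of m secondary-structure elements."""
    return 3 ** m, 10 ** m


print(decode("MPRI"))
print(primary_structures(4))
```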
Formal proteins
BIOCHAM syntax       Description
Cdk1                 Cyclin dependent kinase 1 (free, inactive)
Cdk1-CycB            Complex Cdk1-Cyclin B (low activity)
Cdk1~{thr161}-CycB   Form phosphorylated at site threonine 161 (high activity)
Deoxyribonucleic Acid DNA
1) Primary structure: word over 4 nucleotides:
Adenine, Guanine, Cytosine, Thymine
2) Secondary structure: double helix of base pairs
A--T (two hydrogen bonds) and C---G (three hydrogen bonds)
DNA: Genome Size
Species                 Genome size   Chromosomes    Coding DNA
E. coli (bacteria)      4.6 Mb        1 (circular)   100 %
S. cerevisiae (yeast)   12 Mb         16             70 %
Mouse, Human            3 Gb          20, 23         15 %
Onion                   15 Gb         8              1 %
Lungfish                140 Gb        …              0.7 %
Human: 3,200,000,000 pairs of nucleotides; single nucleotide polymorphism: about 1 per 2 kb.
DNA Replication
Separation of the two helices and
production of one complementary strand for each copy
(from one or several starting points of replication)
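Replication is template-driven: each separated strand determines its partner by base pairing (A with T, C with G). A minimal Python sketch, assuming strands are strings written 5'→3', so that the partner strand is the reverse complement; `complementary_strand` and `replicate` are hypothetical helpers, and starting points of replication are ignored.

```python
# Watson-Crick base pairing
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Partner strand, also written 5'->3' (i.e. the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand))


def replicate(duplex):
    """Semi-conservative copy: each old strand is paired with a newly made complement."""
    top, bottom = duplex
    return [(top, complementary_strand(top)), (complementary_strand(bottom), bottom)]


duplex = ("ATGC", complementary_strand("ATGC"))
print(replicate(duplex))
```

Both copies come out identical to the original double helix, which is the point of complementary copying.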
Syntax of Genes
A gene is a unique part of DNA.
Activation: binding of a promotion factor, e.g. #E2 becomes #E2-E2f13-DP12
Repression: binding of another molecule
Transcription: DNA gene → pRNA → mRNA → Protein
Genes: parts of DNA
1. Activation (inhibition): transcription factors (inhibitors) bind to the
regulatory region of the gene
   #E2 + E2F13-DP12 => #E2-E2F13-DP12
2. Transcription: RNA polymerase copies the DNA from the start to the stop
position into a single-stranded pre-messenger RNA (pRNA)
   _ =[#E2-E2F13-DP12]=> pRNAcycA
3. (Alternative) splicing: non-coding regions of the pRNA are removed, giving
the mature messenger RNA (mRNA)
   pRNAcycA => mRNAcycA
4. Protein synthesis: the mRNA moves to the cytoplasm and binds to a ribosome,
which assembles the protein
   mRNAcycA => mRNAcycA::cyt
   mRNAcycA::cyt + ribosome::cyt => cycA::cyt
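Steps 2 and 3 can be mimicked on strings. A minimal Python sketch, assuming the input is the coding strand of the gene (so the pRNA equals it with T replaced by U) and that the start/stop positions and the non-coding regions are given as 0-based half-open intervals; `transcribe` and `splice` are hypothetical names, and the recognition of splice sites is not modeled.

```python
def transcribe(coding_strand, start, stop):
    """Copy DNA positions [start, stop) into a pre-messenger RNA (T becomes U)."""
    return coding_strand[start:stop].replace("T", "U")


def splice(pre_rna, non_coding):
    """Remove the non-coding regions, given as (begin, end) half-open intervals."""
    kept, pos = [], 0
    for begin, end in sorted(non_coding):
        kept.append(pre_rna[pos:begin])
        pos = end
    kept.append(pre_rna[pos:])
    return "".join(kept)


p_rna = transcribe("GGATGGGGCCTTAA", 2, 14)
print(p_rna, splice(p_rna, [(3, 6)]))
```

Passing several intervals, or different ones, to `splice` is one way to picture alternative splicing of the same pRNA.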
BIOCHAM Syntax of Objects
E ::= compound | E-E | E~{p1,…,pn}
Compound: molecule, #gene binding site, abstract @process, …
"-": binding operator for protein complexes, gene binding sites, …
It is associative and commutative.
"~{…}": modification operator for phosphorylated sites, …
The modified sites form a set (associative, commutative, idempotent).
O ::= E | E::location
Location: symbolic compartment (nucleus, cytoplasm, membrane, …)
S ::= _ | O+S
"+": solution operator (associative, commutative, with neutral element _)
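The algebraic laws above mean, for instance, that Cdk1-CycB and CycB-Cdk1 denote the same object. A minimal Python sketch of a canonical form that turns those laws into plain equality, assuming molecule names contain no '-', '+', '~' or '::' and that pattern variables such as $P do not occur; all function names are hypothetical, not BIOCHAM internals.

```python
from collections import Counter


def split_top(text, sep):
    """Split text on sep, ignoring separators nested inside {...}."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def parse_compound(text):
    """'Cdk1~{thr14,tyr15}' -> ('Cdk1', frozenset({'thr14', 'tyr15'})).

    Sites form a set, so their order and repetitions are irrelevant (ACI).
    """
    name, *mods = text.strip().split("~")
    sites = set()
    for mod in mods:
        sites |= {s.strip() for s in mod.strip().strip("{}").split(",") if s.strip()}
    return (name.strip(), frozenset(sites))


def parse_object(text):
    """Canonical form of E or E::location, with '-' associative and commutative."""
    body, _, location = text.strip().partition("::")
    components = [parse_compound(c) for c in split_top(body, "-")]
    # Sorting normalizes AC; a tuple (not a set) keeps multiplicity, e.g. A-A.
    components.sort(key=lambda c: (c[0], sorted(c[1])))
    return (tuple(components), location.strip() or None)


def parse_solution(text):
    """Canonical form of S: a multiset of objects, _ being the neutral element."""
    objects = [o for o in split_top(text, "+") if o.strip() not in ("", "_")]
    return Counter(parse_object(o) for o in objects)
```

Binding keeps multiplicity (A-A differs from A) while modification sites collapse duplicates, matching the AC versus ACI distinction in the grammar.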
Elementary Rule Schemas
Complexation: A + B => A-B
   cdk1 + cycB => cdk1-cycB
Decomplexation: A-B => A + B
Phosphorylation: A =[C]=> A~{p}
Dephosphorylation: A~{p} =[C]=> A
   Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB
   Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB
Synthesis: _ =[C]=> A
   _ =[#Ge2-E2f13-Dp12]=> cycA
Degradation: A =[C]=> _
   cycE =[@UbiPro]=> _
   (not for cycE-cdk2, which is stable)
Transport: A::L1 => A::L2
   Cdk1~{p}-CycB::cytoplasm => Cdk1~{p}-CycB::nucleus
From Syntax to Semantics
R ::= S => S | kinetic-expression for R
A =[C]=> B stands for A+C => B+C
A <=> B stands for A=>B and B=>A, etc.
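The two shorthands are purely syntactic. A minimal Python sketch of their expansion, assuming each rule is written on one line, catalysts contain no ']' and `_` denotes the empty solution; `plus` and `desugar` are hypothetical helpers, not BIOCHAM commands.

```python
import re


def plus(*parts):
    """Join solutions with '+', dropping the neutral element _."""
    kept = [p for p in parts if p != "_"]
    return " + ".join(kept) if kept else "_"


def desugar(rule):
    """Expand A =[C]=> B and A <=> B into plain S => S rules."""
    rule = rule.strip().rstrip(".")
    if "<=>" in rule:
        left, right = (s.strip() for s in rule.split("<=>"))
        return [f"{left} => {right}", f"{right} => {left}"]
    match = re.fullmatch(r"(.*?)=\[(.*?)\]=>(.*)", rule)
    if match:
        left, catalyst, right = (s.strip() for s in match.groups())
        # The catalyst C appears on both sides: A + C => B + C
        return [f"{plus(left, catalyst)} => {plus(right, catalyst)}"]
    return [rule]


print(desugar("L + R <=> L-R"))
print(desugar("cycE =[@UbiPro]=> _"))
```

Because the catalyst appears on both sides, it is never consumed, which is exactly what "A+C => B+C" expresses.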
Systems Biology Markup Language (SBML): an exchange format, with no formal semantics
BIOCHAM: three abstraction levels
1. Boolean semantics: presence/absence of molecules
   - concurrent transition system (asynchronous, non-deterministic)
2. Differential semantics: concentrations
   - ordinary differential equations or hybrid system (deterministic)
3. Stochastic semantics: numbers of molecules
   - continuous-time Markov chain
The Actin-Myosin two-stroke Engine with ATP fuel
Myosin + ATP => Myosin-ATP
Myosin-ATP => Myosin + ADP
http://www.sci.sdsu.edu/movies
http://www-rocq.inria.fr/sosso/icema2
Cell to Cell Signaling by Hormones and Receptors
Signals: insulin, adrenaline, steroids, EGF, …, Delta, …,
nutrients, light, pressure, …
Receptors: receptor tyrosine kinases, G-protein coupled receptors, Notch, …
L + R <=> L-R
RAS-GDP =[L-R]=> RAS-GTP
Five MAP Kinase Pathways in Budding Yeast (Saccharomyces cerevisiae)
MAPK Signaling Pathways
Input: RAF
• activated by the receptor:
  RAF-p14-3-3 + RAS-GTP => RAF + p14-3-3 + RAS-GDP
Output: MAPK~{T183,Y185}
• moves to the nucleus,
• phosphorylates a transcription factor,
• which stimulates gene transcription.
MAPK Signaling Pathway in BIOCHAM
RAF + RAFK <=> RAF-RAFK.
RAF-RAFK => RAFK + RAF~{p1}.
RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
RAF~{p1}-RAFPH => RAF + RAFPH.
MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
MEK~{p1}-MEKPH => MEK + MEKPH.
MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2}+MEK~{p1,p2}.
Pattern variables $P, for phosphorylation sites and for molecules, with constraints.
BIOCHAM rules are expanded into BIOCHAM-0 rules without patterns.
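The expansion into pattern-free rules amounts to enumerating the admissible values of $P. A minimal Python sketch, assuming $P ranges over subsets of a known finite set of sites (here {p1,p2}) and that only the form name~$P is instantiated (forms such as MEK~{p1}~$P are not handled); `expand` and its helpers are hypothetical, not BIOCHAM's own expansion procedure.

```python
from itertools import combinations


def site_subsets(sites):
    """All subsets of the given phosphorylation sites, smallest first."""
    ordered = sorted(sites)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def with_sites(name, sites):
    """Write a molecule with its modified sites, e.g. MEK~{p1}; no suffix if none."""
    return name if not sites else name + "~{" + ",".join(sorted(sites)) + "}"


def expand(template, name, sites, forbidden):
    """Instantiate name~$P in template for every $P avoiding the forbidden sites."""
    rules = []
    for p in site_subsets(sites):
        if p & forbidden:
            continue  # violates the 'where ... not in $P' constraint
        rules.append(template.replace(name + "~$P", with_sites(name, p)))
    return rules


for r in expand("MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1}", "MEK", {"p1", "p2"}, {"p2"}):
    print(r)
```

With the constraint "p2 not in $P", only $P = {} and $P = {p1} survive, giving the two pattern-free binding rules for MEK and MEK~{p1}.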
Reaction Model of the MAPK Cascade [Levchenko et al. PNAS 2000]
(MA(1), MA(0.4)) for RAF + RAFK <=> RAF-RAFK.
(MA(0.5),MA(0.5)) for RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
(MA(3.3),MA(0.42)) for MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
(MA(10),MA(0.8)) for MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
(MA(20),MA(0.7)) for MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
(MA(5),MA(0.4)) for MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MA(0.1) for RAF-RAFK => RAFK + RAF~{p1}.
MA(0.1) for RAF~{p1}-RAFPH => RAF + RAFPH.
MA(0.1) for MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MA(0.1) for MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MA(0.1) for MEK~{p1}-MEKPH => MEK + MEKPH.
MA(0.1) for MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MA(0.1) for MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MA(0.1) for MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.
MA(0.1) for MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MA(0.1) for MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
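Under the differential semantics, MA(k) stands for mass-action kinetics: a reaction fires at rate k times the product of its reactant concentrations, and the derivative of each species is the sum of the rates of the reactions producing it minus those consuming it. A minimal Python sketch for the RAF module only (the first two lines and the two MA(0.1) RAF rules above), integrated with explicit Euler; the initial concentrations, step size and horizon are hypothetical choices, not values from [Levchenko et al. PNAS 2000].

```python
def mass_action_rhs(conc, reactions):
    """Derivatives under mass-action kinetics: rate = k * product of reactant concentrations."""
    deriv = {x: 0.0 for x in conc}
    for k, reactants, products in reactions:
        rate = k
        for r in reactants:
            rate *= conc[r]
        for r in reactants:
            deriv[r] -= rate
        for p in products:
            deriv[p] += rate
    return deriv


# RAF module; rate constants copied from the MA(...) annotations above
REACTIONS = [
    (1.0, ["RAF", "RAFK"], ["RAF-RAFK"]),
    (0.4, ["RAF-RAFK"], ["RAF", "RAFK"]),
    (0.1, ["RAF-RAFK"], ["RAFK", "RAF~{p1}"]),
    (0.5, ["RAF~{p1}", "RAFPH"], ["RAF~{p1}-RAFPH"]),
    (0.5, ["RAF~{p1}-RAFPH"], ["RAF~{p1}", "RAFPH"]),
    (0.1, ["RAF~{p1}-RAFPH"], ["RAF", "RAFPH"]),
]

# Hypothetical initial concentrations (not from the paper)
INITIAL = {"RAF": 0.3, "RAFK": 0.2, "RAF-RAFK": 0.0,
           "RAF~{p1}": 0.0, "RAFPH": 0.3, "RAF~{p1}-RAFPH": 0.0}


def simulate(conc, reactions, t_end, dt=0.01):
    """Explicit Euler integration from time 0 to t_end."""
    conc = dict(conc)
    for _ in range(round(t_end / dt)):
        d = mass_action_rhs(conc, reactions)
        for x in conc:
            conc[x] += dt * d[x]
    return conc


final = simulate(INITIAL, REACTIONS, 10.0)
print({x: round(v, 4) for x, v in final.items()})
```

Explicit Euler is used only to keep the sketch self-contained; a stiff ODE solver would be the safer choice for the full cascade. Note that the total amounts of RAF, RAFK and RAFPH (free plus bound forms) are conserved, as the rules only move them between complexes and modified forms.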
Bipartite Proteins-Reactions Graph of MAPK
GraphViz
http://www.research.att.co/sw/tools/graphviz
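A minimal Python sketch of how such a bipartite graph could be written in Graphviz's DOT language, assuming molecules are drawn as ellipses and reactions as boxes named r0, r1, …; `bipartite_dot` is a hypothetical helper, not the export actually used to draw the figure.

```python
def bipartite_dot(reactions):
    """Emit a DOT digraph: molecule nodes (ellipses) and reaction nodes (boxes)."""
    lines = ["digraph reactions {"]
    molecules = sorted({m for reactants, products in reactions for m in reactants + products})
    for m in molecules:
        lines.append(f'  "{m}" [shape=ellipse];')
    for i, (reactants, products) in enumerate(reactions):
        rid = f"r{i}"
        lines.append(f'  {rid} [shape=box, label="{rid}"];')
        for m in reactants:
            lines.append(f'  "{m}" -> {rid};')  # reactant consumed by the reaction
        for m in products:
            lines.append(f'  {rid} -> "{m}";')  # product produced by the reaction
    lines.append("}")
    return "\n".join(lines)


print(bipartite_dot([(["RAF", "RAFK"], ["RAF-RAFK"]),
                     (["RAF-RAFK"], ["RAFK", "RAF~{p1}"])]))
```

Saved to a file, the output can be rendered with e.g. `dot -Tpng graph.dot -o graph.png`.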
Influence Graph
inferred from the syntactical reaction model of the MAPK “cascade”:
negative feedback loops… [Fages Soliman CMSB’06]
Differential Simulation
Boolean Simulation
Automatic Generation of CTL Properties
reachable(MAPK~{p1})
reachable(!(MAPK~{p1}))
oscil(MAPK~{p1})
…
reachable(MAPKPH-MAPK~{p1})
reachable(!(MAPKPH-MAPK~{p1}))
oscil(MAPKPH-MAPK~{p1})
AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPKPH,MAPKPH-MAPK~{p1}))
AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPK~{p1},MAPKPH-MAPK~{p1}))
…
reachable(MAPK~{p1,p2})
reachable(!(MAPK~{p1,p2}))
oscil(MAPK~{p1,p2})
…
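The properties above follow a regular pattern per molecule: reachability of its presence and of its absence, oscillation and, for a complex, one checkpoint property per component. A minimal Python sketch of such a generator, assuming '-' occurs in names only as the binding operator; `ctl_properties` is a hypothetical helper that only reproduces the textual pattern, not BIOCHAM's own generation command.

```python
def ctl_properties(molecules):
    """Generate CTL property strings following the pattern listed above."""
    props = []
    for m in molecules:
        props.append(f"reachable({m})")
        props.append(f"reachable(!({m}))")
        props.append(f"oscil({m})")
        parts = m.split("-")  # assumes '-' only appears as the binding operator
        if len(parts) > 1:
            # each component of a complex must be present before the complex appears
            for p in parts:
                props.append(f"AG(!({m})->checkpoint({p},{m}))")
    return props


for prop in ctl_properties(["MAPK~{p1}", "MAPKPH-MAPK~{p1}"]):
    print(prop)
```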
Boolean Semantics
Associate:
• Boolean state variables to molecules,
denoting the presence/absence of each molecule in the cell or compartment;
• a finite concurrent transition system [Shankar 93] to the rules,
asynchronous and over-approximating the set of all possible behaviors.
A reaction A+B=>C+D is translated into 4 transition rules, one for each possibly
complete consumption of the reactants:
A+B → A+B+C+D
A+B → ¬A+B+C+D
A+B → A+¬B+C+D
A+B → ¬A+¬B+C+D
Kripke Structure K=(S,R)
Given:
V is a set of state variables, with domain D,
T a set of transition rules between states.
Associate:
a Kripke structure (S,R) where
S = D^V is the set of possible states, with variables ranging over domain D,
R ⊆ S×S is the total relation induced by T, that is:
(A,B) is in R if there exists a transition rule from state A to B,
(A,A) is in R if there is no transition from state A.
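The construction of (S,R) can be sketched directly for Boolean variables. A minimal Python sketch, assuming D = {False, True} and transition rules given as hypothetical (guard, effect) pairs of dictionaries from variables to booleans; `kripke` is a hypothetical name, and all 2^|V| states are enumerated rather than only the reachable ones.

```python
from itertools import product


def kripke(variables, rules):
    """Build S = D^V (D = {False, True}) and the total relation R induced by the rules."""
    variables = sorted(variables)
    states = [tuple(zip(variables, values))
              for values in product((False, True), repeat=len(variables))]
    relation = set()
    for state in states:
        env = dict(state)
        moved = False
        for guard, effect in rules:
            if all(env[v] == b for v, b in guard.items()):
                target = {**env, **effect}
                relation.add((state, tuple((v, target[v]) for v in variables)))
                moved = True
        if not moved:
            relation.add((state, state))  # no transition from this state: (A, A) is in R
    return states, relation


S, R = kripke({"A", "B"}, [({"A": True}, {"B": True})])
print(len(S), len(R))
```

The self-loops on deadlock states are what makes R total, so that every path of the Kripke structure is infinite, as CTL semantics requires.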