Investigating Conformational Changes of Biological
Macromolecules Using Markov State Models
Xuhui Huang
Department of Chemistry
The Hong Kong University of Science and Technology
Static Structure of Cell at Atomic Resolution
Electron tomography structure
of an entire bacterial cell
Aloy, P. and Russell, R.B., Nat. Rev. Mol. Cell Biol., 2006:188-97
Bio-molecules Function in a Dynamic Fashion
Central Dogma of Molecular Biology:
DNA → (Transcription) → mRNA → (Translation) → Protein
Central Dogma:
Transcription
DNA
RNA Pol
mRNA
A "science fiction" movie
How these biomolecules operate dynamically at the molecular
level remains largely a mystery and is difficult to probe experimentally!
One Major Challenge: Timescale Gap
Atomistic MD Simulations
Experiments
Pande
Timescale Gap
DNA
RNA Pol
mRNA
-  ~18 ns/day with 1000 CPUs
-  A 1-millisecond MD simulation would take 1000 CPUs >150 years
> 420,000 atoms
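The >150-year figure follows directly from the quoted throughput. A quick check, using only the numbers on this slide (the variable names are illustrative):

```python
# Back-of-the-envelope estimate using the figures quoted on the slide.
NS_PER_DAY = 18.0          # throughput quoted for 1000 CPUs
TARGET_NS = 1e6            # 1 millisecond expressed in nanoseconds
DAYS_PER_YEAR = 365.25

days = TARGET_NS / NS_PER_DAY
years = days / DAYS_PER_YEAR
print(f"{days:,.0f} days, about {years:.0f} years")
```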
Pande
Fast Hardware …
http://www.ibm.com
Performance decreases with more cores (1.8 M atom system)
Anton (D. E. Shaw)
~50-microsecond individual MD simulations for GPCRs (~300K atoms)
Dror et al., PNAS, 108, 13118, 2011
http://www.deshawresearch.com/
Timescale Gap is Largely Due to Rugged
Free Energy Landscapes
Trapped in local minima of rugged landscapes
http://gold.cchem.berkeley.edu/
Outline
•  Hierarchical Nyström Method for Constructing
Markov State Models
•  The Automatic state Partitioning algorithm for
Multi-body systems (APM)
•  Some applications in Biology
Conformational Dynamics Often Resemble a
Markov Process
Molecular Dynamics
n-butane has three
metastable states
Markov Model Can Describe Conformational
Dynamics by Coarse-grained Time and Space
•  Coarse-graining of space into discrete states
will introduce memory:
•  Coarse-graining of time can make the
memory appear short if there is separation of
timescales
•  This resembles a memoryless discrete-state
discrete-time Markov model
$P(n\tau) = [T(\tau)]^n P(0)$
Figure Courtesy: John Chodera
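A minimal sketch of the propagation rule $P(n\tau) = [T(\tau)]^n P(0)$. The 3-state matrix is hypothetical, and the row-stochastic convention (populations as a row vector multiplied from the left) is an assumption; with a column-stochastic matrix the same rule is written exactly as on the slide.

```python
import numpy as np

# Hypothetical 3-state transition matrix at lag time tau.
# Row-stochastic convention (an assumption): T[i, j] = P(i -> j in tau).
T = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])

p0 = np.array([1.0, 0.0, 0.0])      # all population starts in state 0

def propagate(p0, T, n):
    """Return the state populations after n steps of length tau."""
    return p0 @ np.linalg.matrix_power(T, n)

p10 = propagate(p0, T, 10)          # populations at time 10 * tau
p_long = propagate(p0, T, 1000)     # close to the stationary distribution
```

Because each step only needs the current populations, dynamics at long times come from repeated application of a matrix estimated at the short lag time $\tau$.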
How to construct MSMs?
Define transition probabilities
between states from dynamic
trajectories
Decompose configuration space into
non-overlapping states
Transition counts: the number of trajectories
initiated from state i that terminate
in state j at time τ
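One common way to turn such counts into transition probabilities is to row-normalize a count matrix. The sketch below assumes discretized trajectories (integer state labels per frame) and sliding-window counting; the function names and the toy trajectory are illustrative, not taken from any specific package.

```python
import numpy as np

def count_matrix(dtrajs, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    C = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        np.add.at(C, (traj[:-lag], traj[lag:]), 1)
    return C

def transition_matrix(C):
    """Row-normalize counts: T[i, j] = C[i, j] / sum_j C[i, j]."""
    rows = C.sum(axis=1, keepdims=True)
    return np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)

dtrajs = [[0, 0, 1, 1, 0, 0, 2, 2, 2, 0]]   # toy state sequence
C = count_matrix(dtrajs, n_states=3, lag=1)
T = transition_matrix(C)
```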
We can then extract long-time dynamics from MSMs built from short
simulations:
$P(n\tau) = [T(\tau)]^n P(0)$
Pande, Hummer, Schütte, Levy, Weber, Noé, Chodera, Vanden-Eijnden ...
How to construct MSMs?
Split (K-centers): Conformations → Microstates
Lump (Spectral Method): Microstates → Macrostates
Lump kinetically related microstates into the same
macrostate.
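The splitting step can be sketched with a greedy k-centers clustering. This is a generic farthest-point version, not the MSMBuilder implementation; Euclidean distance on random 2D points stands in for a structural metric such as RMSD, and all names are illustrative.

```python
import numpy as np

def k_centers(X, k, seed=0):
    """Greedy k-centers: repeatedly pick the point farthest from all centers."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(X)))]           # arbitrary first center
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    labels = np.zeros(len(X), dtype=int)
    for c in range(1, k):
        new = int(np.argmax(dist))                  # farthest point so far
        centers.append(new)
        d_new = np.linalg.norm(X - X[new], axis=1)
        closer = d_new < dist                       # points that switch to the new center
        labels[closer] = c
        dist = np.minimum(dist, d_new)
    return np.array(centers), labels, dist.max()

X = np.random.default_rng(1).normal(size=(500, 2))  # stand-in for conformations
centers, labels, radius = k_centers(X, k=10)
```

The returned `radius` is the largest distance from any point to its assigned center, so every microstate has roughly uniform geometric size.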
MSMBuilder
Bowman, Huang, Pande, Methods, 2009, 49:197
Huang, X., Bowman, G., Bacallado, S., Pande, V.S., Proc. Nat. Acad. Sci. U.S.A., 106, 19765, (2009)
Spectral Clustering on 1D potential
Noé et al., Curr. Opin. Struct. Biol., 2008
Noise of Spectral Clustering on Biomolecules
First separate the most disconnected
blocks from the transition probability
matrix.
[Figure: RNA hairpin transition probability matrix with blocks T1 (>0.001), T2 (>0.001), and T3 (>0.001); other labeled values: 0.001 and 1e-10]
Difficult to determine the
number of macrostates!
Noé and Weber: Left out rarely visited or
trapped states
Bowman and Pande: Subsample the data
$\tau_k = -\dfrac{\tau}{\ln \mu_k(\tau)}$
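A short sketch of evaluating the implied timescale formula, where $\mu_k(\tau)$ are eigenvalues of the transition matrix estimated at lag time $\tau$. The two-state matrix is hypothetical, and taking the real part of the eigenvalues is a simplifying assumption that is safe for reversible models.

```python
import numpy as np

def implied_timescales(T, lag):
    """tau_k = -lag / ln(mu_k) for the non-stationary eigenvalues of T."""
    mu = np.sort(np.linalg.eigvals(T).real)[::-1]   # descending; mu[0] = 1
    mu = mu[1:]                                     # drop stationary eigenvalue
    mu = mu[mu > 0]                                 # log undefined otherwise
    return -lag / np.log(mu)

# Hypothetical two-state model with lag time 5 (units are arbitrary here).
T = np.array([[0.95, 0.05],
              [0.10, 0.90]])
ts = implied_timescales(T, lag=5.0)
print(ts)
```

A large gap between consecutive implied timescales is what suggests a natural number of macrostates.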
Nyström Method
Nyström method finds approximate eigen-decompositions of a symmetric
matrix (e.g. transition count matrix) by its sub-matrices
For symmetric sub-matrix A:
$A = U \Lambda U^{-1} = U \Lambda U^{T}$
$\Lambda$: eigenvalues; $U$: eigenvectors
Nyström approximation of K:
Yuan Yao
Yao, Cui, Bowman, Silva, Sun and Huang, J. Chem. Phys., 174106, 2013
If the entries in C and B are much
smaller than those in A, the method
provides a good approximation
of the leading eigenvectors of K.
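A sketch of the standard Nyström extension, assuming the block partition K = [[A, B], [Bᵀ, C]]: the eigenvectors U of A are extended to all rows as [U; Bᵀ U Λ⁻¹], which reconstructs A and B exactly and replaces C by Bᵀ A⁻¹ B. The partition, the function name, and the low-rank toy matrix are assumptions for illustration; the formula shown on the original slide was not recoverable.

```python
import numpy as np

def nystrom_eig(A, B):
    """Approximate eigenpairs of K = [[A, B], [B.T, C]] from A and B only."""
    lam, U = np.linalg.eigh(A)                 # A = U diag(lam) U^T
    keep = np.abs(lam) > 1e-12                 # skip numerically null directions
    lam, U = lam[keep], U[:, keep]
    U_ext = np.vstack([U, B.T @ U / lam])      # extended (not orthonormal) vectors
    return lam, U_ext

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 3))
K = M @ M.T                                    # symmetric, rank 3 (toy example)
A, B = K[:3, :3], K[:3, 3:]
lam, U_ext = nystrom_eig(A, B)
K_approx = U_ext @ np.diag(lam) @ U_ext.T      # [[A, B], [B.T, B.T A^-1 B]]
```

In this toy case K has the same rank as A, so the approximation is exact; in general the error is the Schur complement C − Bᵀ A⁻¹ B.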
Nyström Method
The transition probability matrix is non-symmetric:
$\tilde{P} u = \lambda u \Rightarrow D^{-1/2} K D^{-1/2} \nu = \lambda \nu$
where $\nu = D^{1/2} u$
From the eigenvectors of the symmetric matrix $K_n = D^{-1/2} K D^{-1/2}$, we can
obtain the eigenvectors of $\tilde{P}$
The Nyström extension of these eigenvectors approximates the leading eigenvectors
of the transition probability matrix.
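A numerical check of the symmetrization step, assuming $\tilde{P} = D^{-1} K$ with $D$ the diagonal matrix of row sums of a symmetric count matrix $K$. Under that assumption the symmetric matrix is $D^{-1/2} K D^{-1/2}$, and $u = D^{-1/2} \nu$ maps its eigenvectors back to those of $\tilde{P}$. The 3-state counts are hypothetical.

```python
import numpy as np

# Hypothetical symmetric count matrix K (e.g. symmetrized transition counts).
K = np.array([[50.0, 5.0, 1.0],
              [5.0, 40.0, 4.0],
              [1.0, 4.0, 30.0]])

d = K.sum(axis=1)                         # diagonal of D (row sums)
P = K / d[:, None]                        # P~ = D^{-1} K, row-stochastic
S = K / np.sqrt(np.outer(d, d))           # D^{-1/2} K D^{-1/2}, symmetric

lam, V = np.linalg.eigh(S)                # real eigenpairs of the symmetric matrix
U = V / np.sqrt(d)[:, None]               # u = D^{-1/2} nu: eigenvectors of P~
```

Working with the symmetric matrix guarantees real eigenvalues and lets symmetric-matrix tools such as the Nyström method be applied.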
Hierarchical Nyström Extension Graph (HNEG)
How to select the split between A and C?
We use a multilevel approximation scheme
Alanine Dipeptide
NEG-Builder
The logarithm of the
posterior probability
of the models
Yao, Cui, Bowman, Silva, Sun and Huang, J. Chem. Phys., 174106, 2013
Protein Trpzip2
HNEG outperforms PCCA
SHC
Outline
•  Hierarchical Nyström Method for Constructing
Markov State Models
•  The Automatic state Partitioning algorithm
for Multi-body systems (APM)
•  Some applications in Biology
Dynamics often occur at multiple timescales for
multi-body systems
Protein-Ligand
Recognition
Hydrophobic
Collapse
Protein
Aggregation
http://cos.gmu.edu/node/1267
Splitting-and-Lumping Algorithm Does not Work
[Figure panels: target potential; K=115; K=15,000]
Incorporating kinetic information in geometric
clustering
Key insight: Generate microstates with a maximum residence time rather
than uniform resolution in structural space.
The residence time is defined as the time it takes for half of the
population (P(i, t)=0.5) to relax out of a certain microstate.
$P(i,t) = \dfrac{\sum_{j=1}^{n} \sum_{k=0}^{T_j - ks} \delta(s_j(ks) - i)\,\delta(s_j(ks+t) - i)}{\sum_{j=1}^{n} \sum_{k=0}^{T_j - ks} \delta(s_j(ks) - i)}$
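A sketch of how the residence probability and residence time might be estimated from discretized trajectories. It assumes $s_j(\cdot)$ is the state label of trajectory $j$ at a given frame and $s$ is a sampling stride; because the upper summation limit is unclear in the source, the sketch simply uses every window that fits inside a trajectory. Function names and the toy trajectory are illustrative.

```python
import numpy as np

def residence_probability(dtrajs, state, t, stride=1):
    """Fraction of visits to `state` (sampled every `stride` frames)
    that are also found in `state` t frames later."""
    hits = total = 0
    for traj in dtrajs:
        traj = np.asarray(traj)
        starts = np.arange(0, len(traj) - t, stride)
        in_state = traj[starts] == state
        total += in_state.sum()
        hits += (in_state & (traj[starts + t] == state)).sum()
    return hits / total if total else float("nan")

def residence_time(dtrajs, state, max_t):
    """Smallest lag t (in frames) at which P(state, t) drops to 0.5 or below."""
    for t in range(1, max_t + 1):
        if residence_probability(dtrajs, state, t) <= 0.5:
            return t
    return None                               # not reached within max_t

dtrajs = [[0, 0, 0, 1, 1, 0, 0, 1, 1, 1]]     # toy state sequence
p1 = residence_probability(dtrajs, state=0, t=1)
t_half = residence_time(dtrajs, state=0, max_t=5)
```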
To be submitted
Flowchart
•  We start from a single
microstate and perform bipartitioning until all the
microstates have the same
maximum residence time.
•  PCCA is used to lump
microstates into macrostates.
•  The algorithm is iterated using
a Monte-Carlo scheme to
improve the metastability (Q).
APM Algorithm Outperforms Existing Algorithm
[Figure panels: target potential; K=115; K=15,000]
Hydrophobic collapse of two benzenes
Conformational States
Kolmogorov Test
Relax from one collapsed state
Relax from the separated state
APM Algorithm Outperforms Splitting-and-Lumping
Algorithm
Outline
•  Hierarchical Nyström Method for Constructing
Markov State Models
•  The Automatic state Partitioning algorithm for
Multi-body systems (APM)
•  Some applications in Biology
RNA Pol II Elongation Complex
DNA
RNA Pol II
mRNA
Active site
DNA
RNA
Mg
NTP
Bridge Helix
Trigger Loop
Kornberg, Landick, Cramer, Nudler, Vassylyev, Block Labs …
Our Previous Simulations Provide Insights on RNA
Polymerase Dynamics
A hopping model for the product release from MSMs
Pol II
Da, Wang, Huang, J. Am. Chem.
Soc., 134, 2399, (2012)
Bacterial RNA Pol
Da, Pardo, Wang, Huang, PLoS.
Comp. Bio., e1003020, (2013)
Dynamics of Translocation Remain a Mystery
Pre-translocation
Post-translocation
?
Kornberg Lab
Science, 2001, 292:1876
Kornberg Lab
Cell, 2004, 119:481
Large System Size and Long Timescales
Molecular Dynamics Simulations
•  System size ~420,000 atoms.
Protein (~3600 residues), water,
DNA, RNA, ions, etc.
•  Conformational changes may
occur on millisecond
timescales or longer.
Model Construction
Ø Generate the initial pathways using
the non-linear climber morphing
algorithm (D. Weiss and M Levitt,
JMB, 2009)
Shenzhen supercomputing
center
AMBER03 force field
PME for long-range
electrostatics
Ø Seeding two rounds of unbiased MD
simulations with total simulation time
> 2,000 ns.
Ø  Construct a ~1000-state MSM.
Further divide these states into four
groups.
Model Validation
1. Implied timescales, $\tau_k = -\tau / \ln \mu_k(\tau)$:
•  Slowest implied
timescale ~10 µs.
•  Implied timescales go
flat when lag time > 5 ns.
2. MSM can reproduce the MD data:
[Figure axis labels: Residence Probability; Lag time (ns)]
Free Energy Landscape for Translocation
Under Review, PNAS
Flexible Ligand Binding: LAO Protein
Arginine
LAO (Lysine-Arginine-Ornithine) protein:
one of the Periplasmic Binding Proteins
Silva, D., Bowman, G.R., Peinado, A., Huang, X., PLoS. Comp. Bio., 7(5), e1002054, (2011)
Flexible Ligand Binding: LAO Protein
Two step mechanism:
•  Unbound to encounter complex:
Conformational selection + Induced fit
•  Encounter complex to Bound:
Induced fit
E W, Vanden-Eijnden E, Ren, Noé
Silva, D., Bowman, G.R., Peinado, A., Huang, X., PLoS. Comp. Bio., 7(5), e1002054, (2011)
Human IAPP peptide amyloidogenesis as a cause of
type 2 diabetes
Amyloid deposits along the islet capillaries.
hIAPP is
intrinsically
disordered.
Type II diabetes patients
Bonner-Weir S, O'Brien TD, Diabetes 2008;57:2899-2904
Aggregation of the hIAPP peptide is
cytotoxic to pancreatic beta cells (which store
and release insulin to control the blood
sugar level).
hIAPP: Protein Folding without a Hub
Folded Proteins: With a Hub
hIAPP: Without a hub
Transitions between states are all slow!
Qiao, Bowman, Huang, J. Am. Chem. Soc., 135(43), 16092, (2013)
Aggregation Prone Metastable Conformations
Two features:
Beta-turn structure (flat geometry) + Extended hydrophobic surface
Qiao, Bowman, Huang, J. Am. Chem. Soc., 135(43), 16092, (2013)
Acknowledgement
HKUST
Daniel Silva
Lintai Da
Fatima Pardo
Yutong Zhao
Fu Kit Sheong
Qin Qiao
Stanford
Prof. Roger Kornberg
Peking University
Prof. Yuan Yao
UCSD
Prof. Dong Wang
Funding
Hong Kong RGC, NSFC