Investigating Conformational Changes of Biological Macromolecules Using Markov State Models
Xuhui Huang
Department of Chemistry
The Hong Kong University of Science and Technology

Static Structure of Cell at Atomic Resolution
Electron tomography structure of an entire bacterial cell
Aloy, P. and Russell, R.B., Nat Rev Mol Cell Biol. 2006:188-97

Bio-molecules Function in a Dynamic Fashion
Central Dogma of Molecular Biology: DNA → mRNA (Transcription) → Protein (Translation)

Central Dogma: Transcription
DNA → mRNA (by RNA Pol)
A “Scientific Fiction” movie
How these bio-molecules operate dynamically at the molecular level remains largely a mystery and is difficult to probe experimentally!

One Major Challenge: Timescale Gap
[Figure: timescales reached by atomistic MD simulations vs. experiments. Credit: Pande]

Timescale Gap
DNA → mRNA (by RNA Pol)
- ~18 ns/day with 1000 CPUs
- 1 millisecond MD simulation takes 1000 CPUs >150 years
> 420,000 atoms
Pande

Fast Hardware
… http://www.ibm.com
Performance decreases with more cores (1.8 M atom system)
Anton (D. E. Shaw): ~50-microsecond individual MD simulations for GPCRs (~300K atoms). Dror et al.,
PNAS, 108, 13118, 2011
http://www.deshawresearch.com/

Timescale Gap is Largely Due to Rugged Free Energy Landscapes
Trapped in local minima of rugged landscapes
http://gold.cchem.berkeley.edu/

Outline
• Hierarchical Nyström Method for Constructing Markov State Models
• The Automatic state Partitioning algorithm for Multi-body systems (APM)
• Some applications in Biology

Conformational Dynamics Often Resemble a Markov Process
Molecular Dynamics: n-butane has three metastable states

Markov Model Can Describe Conformational Dynamics by Coarse-graining Time and Space
• Coarse-graining of space into discrete states introduces memory.
• Coarse-graining of time can make the memory appear short if there is a separation of timescales.
• This resembles a memoryless discrete-state, discrete-time Markov model:
$P(n\tau) = [T(\tau)]^n P(0)$
Figure Courtesy: John Chodera

How to construct MSMs?
• Decompose configuration space into non-overlapping states.
• Define transition probabilities between states from dynamic trajectories: $T_{ij}(\tau) = N_{ij}(\tau) / \sum_k N_{ik}(\tau)$, where $N_{ij}(\tau)$ is the number of trajectories initiated in state i that terminate in state j at time τ.
We can then extract long-time dynamics from MSMs built from short simulations:
$P(n\tau) = [T(\tau)]^n P(0)$
Pande, Hummer, Schütte, Levy, Weber, Noé, Chodera, Vanden-Eijnden ...

How to construct MSMs?
Conformations → Microstates → Macrostates
• Split (K-centers): cluster conformations into microstates.
• Lump (Spectral Method): lump kinetically related microstates into the same macrostate.
MSMBuilder
Bowman, Huang, Pande, Methods, 2009, 49:197
Huang, X., Bowman, G., Bacallado, S., Pande, V.S., Proc. Nat. Acad. Sci. U.S.A., 106, 19765, (2009)

Spectral Clustering on 1D potential
Noé et al., Curr. Opin. in Struc. Biol. 2008

Noise of Spectral Clustering on Biomolecules
First separate the most disconnected blocks from the transition probability matrix.
[Figure: RNA hairpin transition probability matrix; blocks T1 and T2 (>0.001); labeled entries 0.001 and 1e-10]
Difficult to determine number of macrostates!
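The MSM construction outlined on the "How to construct MSMs?" slides (count transitions at lag time τ, normalize each row, then propagate with $P(n\tau) = [T(\tau)]^n P(0)$) can be sketched as follows. This is a minimal illustration only, not the MSMBuilder implementation: it assumes the trajectories have already been assigned to integer state indices, it uses sliding-window counting, and the function names and the toy trajectories are hypothetical.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag):
    """Count i -> j transitions separated by `lag` frames (sliding window)."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        traj = np.asarray(traj)
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize the count matrix: T_ij = N_ij / sum_k N_ik."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing count")
    return counts / row_sums

def propagate(T, p0, n):
    """Population after n lag times.

    T is row-stochastic here, so the population is a row vector and the
    slide's P(n tau) = [T(tau)]^n P(0) becomes p(n tau) = p(0) T^n.
    """
    return p0 @ np.linalg.matrix_power(T, n)

# Hypothetical two-state discrete trajectories (made-up toy data).
dtrajs = [
    [0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
]
counts = count_transitions(dtrajs, n_states=2, lag=1)
T = transition_matrix(counts)
p10 = propagate(T, np.array([1.0, 0.0]), n=10)
print(T)
print(p10)
```

Plain row normalization does not enforce detailed balance; that simplification is a choice made for this sketch.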
[Figure label: T3 (>0.001)]
Noé and Weber: Leave out rarely visited or trapped states
Bowman and Pande: Subsample the data
$\tau_k = -\tau / \ln \mu_k(\tau)$

Nyström Method
The Nyström method finds approximate eigen-decompositions of a symmetric matrix (e.g. transition count matrix) from its sub-matrices.
For symmetric sub-matrix A: $A = U \Lambda U^{-1} = U \Lambda U^{T}$ ($\Lambda$: eigenvalues; $U$: eigenvectors)
Nyström approximation of K: [equation not preserved]
If entries in C and B are much smaller than those in A, this provides a good approximation of the leading eigenvectors of K.
Yuan Yao
Yao, Cui, Bowman, Silva, Sun and Huang, J. Chem. Phys., 174106, 2013

Nyström Method
The transition probability matrix is non-symmetric. Taking $\tilde{P} = D^{-1} K$, with $D$ the diagonal matrix of row sums of $K$:
$\tilde{P} u = \lambda u \Rightarrow D^{-1/2} K D^{-1/2} \nu = \lambda \nu$, where $\nu = D^{1/2} u$
From the eigenvectors of the symmetric matrix $K_n = D^{-1/2} K D^{-1/2}$, we can obtain the eigenvectors of $\tilde{P}$.
[Equations not preserved.] The result approximates the leading eigenvectors of the transition probability matrix.

Hierarchical Nyström Extension Graph (HNEG)
How to select the split between A and C? We use a multilevel approximation scheme.

Alanine Dipeptide
NEG-Builder
The logarithm of the posterior probability of the models
Yao, Cui, Bowman, Silva, Sun and Huang, J. Chem.
Phys, 174106, 2013

Protein Trpzip2
HNEG outperforms PCCA and SHC

Outline
• Hierarchical Nyström Method for Constructing Markov State Models
• The Automatic state Partitioning algorithm for Multi-body systems (APM)
• Some applications in Biology

Dynamics often occur at multiple timescales for multi-body systems
• Protein-Ligand Recognition
• Hydrophobic Collapse
• Protein Aggregation
http://cos.gmu.edu/node/1267

Splitting-and-Lumping Algorithm Does Not Work
[Figure: target potential; panels K=115 and K=15,000]

Incorporating kinetic information in geometric clustering
Key insight: Generate microstates with a maximum residence time rather than uniform resolution in structural space.
The residence time is defined as the time it takes for half of the population to relax out of a certain microstate, i.e. the time at which P(i, t) = 0.5.
$P(i,t) = \sum_{j=1}^{n} \sum_{k=0}^{T_j - ks} \delta(s_j(ks) - i)\,\delta(s_j(ks+t) - i) \Big/ \sum_{j=1}^{n} \sum_{k=0}^{T_j - ks} \delta(s_j(ks) - i)$
To be submitted

Flowchart
• We start from a single microstate and perform bipartitioning until all the microstates have the same maximum residence time.
• PCCA is used to lump microstates into macrostates.
• The algorithm is iterated using a Monte-Carlo scheme to improve the metastability (Q).

APM Algorithm Outperforms Existing Algorithm
[Figure: target potential; panels K=115 and K=15,000]

Hydrophobic collapse of two benzenes
[Figure: conformational states; Kolmogorov test, relaxing from one collapsed state and from the separated state]

APM Algorithm Outperforms Splitting-and-Lumping Algorithm

Outline
• Hierarchical Nyström Method for Constructing Markov State Models
• The Automatic state Partitioning algorithm for Multi-body systems (APM)
• Some applications in Biology

RNA Pol II Elongation Complex
DNA → mRNA (by RNA Pol II)
Active site: DNA, RNA, Mg, NTP, Bridge Helix, Trigger Loop
Kornberg, Landick, Cramer, Nudler, Vassylyev, Block Labs …

Our Previous Simulations Provide Insights on RNA Polymerase Dynamics
A hopping model for the product release from MSMs
Pol II: Da, Wang, Huang, J. Am.
Chem. Soc., 134, 2399, (2012)
Bacterial RNA Pol: Da, Pardo, Wang, Huang, PLoS Comp. Bio., e1003020, (2013)

Dynamics of Translocation Remain a Mystery
Pre-translocation → ? → Post-translocation
Kornberg Lab, Science, 2001, 292:1876
Kornberg Lab, Cell, 2004, 119:481

Large System Size and Long Timescales
Molecular Dynamics Simulations
• System size ~420,000 atoms: protein (~3600 residues), water, DNA, RNA, ions, etc.
• Conformational changes may occur on millisecond timescales or longer.

Model Construction
• Generate the initial pathways using the non-linear climber morphing algorithm (D. Weiss and M. Levitt, JMB, 2009).
• Seed two rounds of unbiased MD simulations with total simulation time > 2,000 ns.
• Construct a ~1000-state MSM. Further divide these states into four groups.
Setup: Shenzhen supercomputing center; AMBER03 force field; PME for long-range electrostatics.

Model Validation
1. Implied timescales, $\tau_k = -\tau / \ln \mu_k(\tau)$:
• Slowest implied timescale ~10 µs.
• Implied timescales go flat when lag time > 5 ns.
2. MSM can reproduce the MD data:
[Figure: residence probability vs. lag time (ns)]

Free Energy Landscape for Translocation
Under Review, PNAS

Flexible Ligand Binding: LAO Protein
[Figure labels: LAO protein, arginine]
LAO (Lysine-Arginine-Ornithine) protein: one of the Periplasmic Binding Proteins
Silva, D., Bowman, G.R., Peinado, A., Huang, X., PLoS Comp. Bio., 7(5), e1002054, (2011)

Flexible Ligand Binding: LAO Protein
Two-step mechanism:
• Unbound to encounter complex: Conformational selection + Induced fit
• Encounter complex to Bound: Induced fit
E W, Vanden-Eijnden E, Ren, Noé
Silva, D., Bowman, G.R., Peinado, A., Huang, X., PLoS Comp. Bio., 7(5), e1002054, (2011)

Human IAPP peptide amyloidogenesis as cause of type 2 diabetes
Amyloid deposits along the islet capillaries. hIAPP is intrinsically disordered.
Type II diabetes patients
Bonner-Weir S, O'Brien T D, Diabetes 2008;57:2899-2904
Aggregation of the hIAPP peptide is cytotoxic to pancreatic beta cells, which store and release insulin to control blood sugar level.

hIAPP: Protein Folding without a Hub
Folded Proteins: With a hub
hIAPP: Without a hub
Transitions between states are all slow!
Qiao, Bowman, Huang, J. Am. Chem. Soc., 135(43), 16092, (2013)

Aggregation Prone Metastable Conformations
Two features: Beta-turn structure (flat geometry) + Extended hydrophobic surface
Qiao, Bowman, Huang, J. Am. Chem. Soc., 135(43), 16092, (2013)

Acknowledgement
HKUST: Daniel Silva, Lintai Da, Fatima Pardo, Yutong Zhao, Fu Kit Sheong, Qin Qiao
Stanford: Prof. Roger Kornberg
Peking University: Prof. Yuan Yao
UCSD: Prof. Dong Wang
Funding: Hong Kong RGC, NSFC