Bystroff Lab Research -- 2009

Yao-ming Huang*, Patrick Buck*, Suzanne Matthews‡, Vibin Ramakrishnan*, Philippa Reeder°, Saeed Salem‡z, Zujun Shentu‡z, Mohammed al Hasan‡z, Benjamin Cole, Jeffrey Bush, Addis Abebe^, Peter Watson^, Sam De Luca, Derek Pitman, Peter Ragone, Mohammed J. Zaki‡, Christopher Bystroff*‡

CENTER FOR BIOTECHNOLOGY AND INTERDISCIPLINARY STUDIES, DEPTS OF BIOLOGY* & COMPUTER SCIENCE‡, RENSSELAER POLYTECHNIC INSTITUTE, TROY, NEW YORK 12180
°Dept of Chemical Engineering, ^HHMI Minority Summer Internships in Biotechnology, zZaki lab collaborators (Undergraduates in italics)

HMMSUM: Position-specific substitution matrix for pairwise alignment
Yao-ming Huang, Suzanne Matthews

Recent advances in the ability to discriminate between homologous and non-homologous proteins in the "Twilight Zone" of sequence similarity must be accompanied by accurate alignments if they are to be of value to molecular modelers. Pairwise alignments require a measure of evolutionary distance, traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. HMMSUM (HMMSTR-based SUbstitution Matrices) is a new model for structure-dependent amino acid substitution probabilities consisting of a set of 281 matrices, one for each of the sequence-structure contexts defined in HMMSTR (a Hidden Markov Model for protein STRucture). HMMSUM does not require the structure of the protein to be known, using HMMSTR predictions instead. Alignments using the HMMSUM matrices compare favorably to alignments using BLOSUM or the structure-derived substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE.

Complementation and reconstitution of fluorescence from circularly permuted green fluorescent protein: Leave-one-out biosensor design
Yao-ming Huang, Philippa Reeder, Derek Pitman, Peter Ragone

Green fluorescent protein (GFP) has been used as a proof of concept for "leave-one-out" design of complementary protein fragments, with the goal of producing programmable protein biosensors. A protein can be designed that has a segment of the polypeptide chain omitted from the middle of the sequence by first cyclizing the protein, then truncating it. Three variants of GFP have been synthesized that are each missing one of the eleven β strands from its β-barrel structure. In two of the variants, adding the omitted peptide sequence in trans reconstitutes fluorescence. Recovery of fluorescence upon incubation with the missing strand is quantified, along with binding affinity, refolding rate, and fluorescence quantum yield. We show that GFP with β-strand 7 "left out" (so-called t7SP) binds the free β-strand 7 peptide with sub-micromolar affinity (Kd ≈ 0.5 μM) and folds at a rate comparable to the unpermuted GFP. The unliganded t7SP exists in a partly unfolded state. Upon addition of the peptide ligand, fluorescence is recovered at the rate of refolding. Refolding of t7SP is at least a three-state process, both with and without the ligand peptide, similar to previously published results for full-length, unpermuted GFP. The conserved kinetic properties strongly suggest that the rate-limiting steps in the folding pathway have not been altered by cyclic permutation and truncation.
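The binding affinity reported above can be related to the fraction of t7SP that carries peptide at a given peptide concentration. The sketch below is only an illustration under stated assumptions: it treats binding as a simple 1:1 equilibrium, which ignores the multi-state refolding described in the abstract, and it sets the dissociation constant to 0.5 μM to match the sub-micromolar affinity quoted. The function name and the example concentrations are hypothetical.

```python
import math

def fraction_bound(protein_total_um, peptide_total_um, kd_um):
    """Fraction of protein with peptide bound at equilibrium (1:1 binding).

    Solves the mass-action quadratic exactly, so the result stays valid
    even when the peptide is not in large excess over the protein.
    All concentrations are in micromolar.
    """
    if protein_total_um <= 0 or peptide_total_um < 0 or kd_um <= 0:
        raise ValueError("concentrations must be positive (peptide may be zero)")
    b = protein_total_um + peptide_total_um + kd_um
    # Smaller root of C^2 - b*C + P*L = 0 is the physical complex concentration.
    complex_um = (b - math.sqrt(b * b - 4.0 * protein_total_um * peptide_total_um)) / 2.0
    return complex_um / protein_total_um

# Assumed values for illustration: 0.1 uM protein, Kd = 0.5 uM.
for peptide_um in (0.1, 0.5, 2.0, 10.0):
    print(peptide_um, round(fraction_bound(0.1, peptide_um, 0.5), 3))
```

With protein at a concentration far below Kd, half-saturation falls at a peptide concentration close to Kd, which is the usual way such a titration is read.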
[Figure: Comparison of alignments of 1MUCA and 4ENL proteins]
[Figure: Kinetic model for peptide binding to leave-one-out GFP]
[Figure: GFP-analyte complementation and fluorescence. Labels: analyte with target region; GFP 1-10 with programmed binding site]

Protein unfolding is modeled as an ensemble of pathways, where each is a tree of intermediate states and branches in the tree represent additional degrees of conformational freedom. Using a known protein structure, the program (GeoFold) generates a directed acyclic graph of linked elemental subsystems, each modeling a partitioning of a substructure into two at topologically allowed positions in the chain. The graph begins at the native state and ends at small fragments representing the fully unfolded state. Each substructure is assigned a free energy based on its buried solvent-accessible surface area, sidechain entropy, and backbone configurational entropy. Each bifurcating edge, representing the transition state of a single partitioning, is assigned a free energy barrier height based on the principle that exposed surfaces are solvated before the configurational entropy is expressed. To simulate unfolding on the graph, rates are calculated for each elemental subsystem at each time step using transition state theory. The model exhibits two-state behavior with respect to temperature or denaturant and shows the expected linear relationship of overall folding rate with denaturant. Predicted unfolding rates are compared with experimental values for fifteen well-studied proteins. Strengths and deficiencies of the model are discussed.

Figure 1. (a) Elemental subsystem for the kinetic model. f is a subset of the protein, and is partitioned into u1 and u2, passing over energy barrier ‡. (b) Topological operators. A pivot motion is a rotation around a point. A hinge is a rotation around two points. A break is a translation. Rotations and translations must not cross chains.
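The rate calculation for one elemental subsystem, f partitioned into u1 and u2 over a barrier, can be sketched as follows. This is a minimal illustration and not GeoFold's implementation: it assumes the standard Eyring prefactor k_B T / h from transition state theory, treats both directions as first-order processes, and uses made-up free energies in kJ/mol. All function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)

def eyring_rate(barrier_kj_mol, temp_k=298.15):
    """Transition state theory rate (1/s) for a barrier of the given height."""
    return (K_B * temp_k / H) * math.exp(-barrier_kj_mol * 1000.0 / (R * temp_k))

def subsystem_rates(g_f, g_u, g_ts, temp_k=298.15):
    """Unfolding and refolding rates for one subsystem f <-> u1 + u2.

    g_f:  free energy of the intact substructure f
    g_u:  combined free energy of the two parts u1 and u2
    g_ts: free energy of the transition state between them
    """
    k_unfold = eyring_rate(g_ts - g_f, temp_k)
    k_fold = eyring_rate(g_ts - g_u, temp_k)
    return k_unfold, k_fold

def fraction_intact(t, k_unfold, k_fold):
    """Fraction still in state f at time t (seconds), starting fully intact."""
    k_total = k_unfold + k_fold
    f_eq = k_fold / k_total
    return f_eq + (1.0 - f_eq) * math.exp(-k_total * t)

# Illustrative energies only (assumed, not taken from the poster).
ku, kf = subsystem_rates(g_f=0.0, g_u=-5.0, g_ts=85.0)
for t in (0.0, 60.0, 600.0):
    print(t, round(fraction_intact(t, ku, kf), 3))
```

Because both rates share the same transition state, their ratio depends only on the free energy difference between f and its parts, so each subsystem relaxes toward the correct equilibrium regardless of the barrier height.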
GeoFOLD: A mechanistic model to study the effect of topology on protein unfolding pathways and kinetics.
Vibin Ramakrishnan, Saeed M. Salem, Suzanne Matthews, Patrick Buck, Mohammed J. Zaki

[Figure 5 plot: ln k exp versus ln k calc, r = 0.78. Labeled points: 1TUC, 1CSP, 1TUD, 1SHFA, 1UBQ, 1HRC, 1HD, 1BNR, 1TEN, 1CIS, 2ABD, 2CI2]

[Figure: kinetic scheme for peptide (L) binding to leave-one-out GFP. State labels: I1•L, I2, I2•L, I2,S, Ib, Ib,S, N*, N*•L, US. Rate constants: k2', k3. Steps marked fast and very fast]

Programmable GFP Complementation: GFP 1-10 is designed in silico to bind a 6-12bp segment of a target analyte, resulting in GFP-analyte complementation and fluorescence.

Re-wired green fluorescent protein folds and glows despite non-circular topological permutation of the sequence
Philippa Reeder, Yao-ming Huang, Jonathan Dordick

Proteins have canonical packing arrangements that can be achieved through many different topological connectivities. Different topologies that generate the same core packing arrangement may exhibit differences in folding kinetics and stability, due to the differences in the folding pathway. To test this hypothesis, we engineered a new topology for GFP by non-circular permutation. The new, "re-wired" GFP folds and glows with the same excitation/emission properties as the wild-type. Re-wiring proteins may be useful for understanding protein folding pathways, and may be used as a new tool for protein design.

[Figure: Circular GFP topology and re-wired GFP topology, strands numbered 1 to 11]

Context Shapes: Efficient Complementary Shape Matching for Protein-Protein Docking.
Zujun Shentu, Mohammed al Hasan, Addis Abebe, Mohammed Zaki

Physical molecular interactions are the mechanisms for virtually all biochemical processes, from enzyme catalysis to signal transduction. As more crystal structures and better homology models become available, new methods must be explored for predicting how these molecules interact.
The long-term goal of this study is to enable a protein to be docked to all proteins of known structure, predicting interactions and simultaneously predicting the interaction mode. In order to make docking fast enough to be done as a database search, we have developed a data structure that describes the shape of a molecular surface in detail yet can be efficiently rotated, matched, and clustered. Preliminary results show that ContextShapes are as good as or better than the state of the art in protein-protein docking. Four aims are proposed. First, the scoring functions and representation of ContextShapes will be refined, and surface water will be included in the shape. Second, our approach will be enhanced to incorporate various surface properties that aid in the docking predictions. Third, we aim to handle flexible docking, both implicitly in the initial steps and explicitly in the refinement steps. Finally, ContextShapes will be hierarchically indexed for efficient database searching. The final product will be a database search engine for protein-protein interactions.

Figure 5. Log unfolding rates, experimental versus simulated. Simulated unfolding rates correlate strongly with experimentally observed ones, suggesting correctness of the energy parameters. However, the slope is >> 1; slow unfolders unfold too slowly in simulations, due to an uncharacterized dependency. The probability that r ≥ 0.78 given n=12 and random data is p < 0.002, using the resampling method. Experimental data obtained from (Fersht, 2002; Nauli, et al., 2001; Plaxco, et al., 1998).

CALF: C-alpha only molecular simulations of proteins using sequence-specific, knowledge-based potentials for hydrogen bonding, local structure and pairwise contacts.
Patrick Buck, Peter Watson

Some short peptide sequences have strong structural preferences that are independent of non-local interactions and of their three-dimensional context in proteins, as shown by peptide folding studies using NMR and simulation. We devised a coarse-grained force field that requires only alpha carbons, implemented in a program called CALF (C-ALpha based Force field), with the aim of folding local protein sequences with high accuracy. CALF builds sequence-specific statistical potentials based on database frequencies for pseudo-backbone angles, pairwise contacts, and hydrogen bond donor/acceptor pairs, and simulates folding via Brownian dynamics. The local structure prediction program HMMSTR was used to identify 27 protein segments, each 12 residues long, that had strong local structure preferences. CALF simulations were run for 0.9 μs and each trajectory was clustered based on structure. For 19 of the 27 sequences, the largest structural cluster made up more than half of the total trajectory, indicating a strong structural bias. Of these 19, 15 had cluster centers within 2.3 Å RMSD of the native structure.

[Figure: Results from folding simulations of 12-residue peptides]

Alpha helical crossovers favor right-handed supersecondary structures in the absence of beta-strand pairing. The Phone Cord effect in protein folding.
Benjamin Cole, Vibin Ramakrishnan

The remarkable predominance of right-handedness in beta-alpha-beta helix crossovers has been explained in terms of equilibrium stability, but a kinetic control mechanism may also play a role. If the beta-sheet contacts are made before the crossover helix is fully formed and the helix formation follows the energetic pathway of least resistance, then the folding helix would impart a left-handed torque on the ends of the two strands, and this would tear apart a left-handed conformation but not a right-handed one.
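The left-handed versus right-handed distinction drawn above can be made concrete with a small geometric test. The sketch below is an assumption-laden illustration, not the analysis used for the poster: it treats a beta-alpha-beta unit as one turn of a superhelix that advances from the first strand to the second, and it reads the handedness from the sign of a triple product of three caller-supplied vectors. The function name, the vector inputs, and the idealized coordinates in the example are all hypothetical.

```python
def crossover_handedness(strand1_dir, strand1_to_strand2, sheet_to_helix):
    """Classify a beta-alpha-beta crossover as 'right' or 'left' handed.

    strand1_dir:        direction of the first strand (N to C terminus)
    strand1_to_strand2: vector from the first strand to the second strand
    sheet_to_helix:     vector from the sheet plane to the crossover helix

    Assumed convention: the chain runs up strand 1, back through the helix,
    then up strand 2, i.e. one turn around the strand1_to_strand2 axis.
    That turn is right-handed when (d x s) . h is positive.
    """
    sx, sy, sz = strand1_dir
    dx, dy, dz = strand1_to_strand2
    hx, hy, hz = sheet_to_helix
    # Cross product d x s, written out component by component.
    cx = dy * sz - dz * sy
    cy = dz * sx - dx * sz
    cz = dx * sy - dy * sx
    triple = cx * hx + cy * hy + cz * hz
    if triple > 0:
        return "right"
    if triple < 0:
        return "left"
    return "undefined"

# Idealized unit: strands along +z, second strand at +x, helix on the -y side.
print(crossover_handedness((0, 0, 1), (1, 0, 0), (0, -1, 0)))
```

Moving the helix to the opposite face of the sheet while keeping the strand geometry fixed flips the sign of the triple product, which is the mirror-image relationship between the two crossover types.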
Statistics on protein structures show that right-handed crossovers predominate even among all-alpha proteins, where the equilibrium stability of the beta sheet twist does not apply. Using simple demonstrative molecular simulations, we can reproduce the right-handed preference in beta-alpha-beta units using basic assumptions about torsion angle barriers, without using geometry-specific beta strand forces. The kinetic trapping mechanism is dubbed the "Phone Cord effect" because it is reminiscent of the way a phone cord forms supercoils to relieve the torsional stress of coil formation.

[Figure: Model representation, left-handed (LH) and right-handed (RH) crossovers]
[Figure: Summing probability fields from weighted occurrences]

Selected Publications

Bystroff, C. (2002) MASKER: Improved solvent-excluded molecular surface area estimations using Boolean masks. Protein Eng 15, 959-65.
Bystroff, C. & Garde, S. (2003) Helix propensities of short peptides: molecular dynamics versus bioinformatics. Proteins 50, 552-62.
Bystroff, C. & Krogh, A. (2007) Hidden Markov models for prediction of protein features. Methods Mol Biol 413, 173-98.
Bystroff, C. & Shao, Y. (2002) Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics 18 Suppl 1, S54-61.
Bystroff, C., Shao, Y. & Yuan, X. (2004) Five hierarchical levels of sequence-structure correlation in proteins. Appl Bioinformatics 3, 97-104.
Bystroff, C., Thorsson, V. & Baker, D. (2000) HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J Mol Biol 301, 173-90.
Huang, Y.M. & Bystroff, C. (2006) Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics 22, 413-22.
Ramakrishnan, V., Salem, S.M., Zaki, M.J. & Bystroff, C. (2008) Developing a detailed mechanistic model for protein unfolding. Bioinformatics (submitted).
Shao, Y. & Bystroff, C. (2003) Predicting interresidue contacts using templates and pathways. Proteins 53 Suppl 6, 497-502.
Shentu, Z., Al Hasan, M., Bystroff, C. & Zaki, M.J. (2008) Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins 70, 1056-73.
Shinde, A.V. et al. (2008) Identification of the Peptide Sequences within the EIIIA (EDA) Segment of Fibronectin That Mediate Integrin α9β1-dependent Cellular Activities. J Biol Chem 283, 2858-70.
Bystroff, C. & Webb-Robertson, B.J. (2008) I-sites 2007: Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure. Bioinformatics (submitted).
Xia, K. et al. (2007) Identifying the subproteome of kinetically stable proteins via diagonal 2D SDS/PAGE. Proc Natl Acad Sci U S A 104, 17329-34.
Yuan, X. & Bystroff, C. (2005) Non-sequential structure-based alignments reveal topology-independent core packing arrangements in proteins. Bioinformatics 21, 1010-9.
Zaki, M.J., Nadimpally, V., Bardhan, D. & Bystroff, C. (2004) Predicting protein folding pathways. Bioinformatics 20 Suppl 1, i386-93.

Funding from DBI-0448072