LabPoster2009 - Center for Bioinformatics

Bystroff Lab Research -- 2009
Yao-ming Huang *, Patrick Buck *, Suzanne Matthews‡, Vibin Ramakrishnan *, Philippa Reeder°, Saeed Salem‡z, Zujun Shentu‡z, Mohammed al Hasan‡z, Benjamin Cole,
Jeffrey Bush, Addis Abebe^, Peter Watson^, Sam De Luca, Derek Pitman, Peter Ragone, Mohammed J. Zaki‡, Christopher Bystroff *‡
CENTER FOR BIOTECHNOLOGY AND INTERDISCIPLINARY STUDIES, DEPTS OF BIOLOGY* & COMPUTER SCIENCE‡, RENSSELAER POLYTECHNIC INSTITUTE, TROY, NEW YORK 12180
°Dept of Chemical Engineering, ^HHMI Minority Summer Internships in Biotechnology, (Undergraduates in italics) zZaki lab collaborators
HMMSUM: Position-specific substitution matrix for pairwise alignment
Complementation and reconstitution of fluorescence from circularly permuted green fluorescent protein: Leave-one-out biosensor design
Yao-ming Huang, Suzanne Matthews
Yao-ming Huang, Philippa Reeder, Derek Pitman, Peter Ragone
Recent advances in the ability to discriminate between homologous and non-homologous proteins in the “Twilight Zone” of sequence similarity must be accompanied by accurate alignments if they are to be of
value to molecular modelers. Pairwise alignments require a measure of evolutionary distance, traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions
may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. HMMSUM (HMMSTR-based SUbstitution Matrices) is a new model for structure-dependent
amino acid substitution probabilities consisting of a set of 281 matrices, one for each of the sequence-structure contexts defined in HMMSTR (a Hidden Markov Model for protein STRucture). HMMSUM does not
require the structure of the protein to be known; it uses HMMSTR predictions instead. Alignments using the HMMSUM matrices compare favorably to alignments using BLOSUM or the structure-derived substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE.
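The idea of a context-dependent substitution score can be sketched as a standard global alignment in which the substitution matrix is looked up per position. This is a minimal illustration only: the two toy contexts ("H", "E"), the four-letter alphabet, the score values, the linear gap penalty, and the choice to take the context from the first sequence are all assumptions made for the example, not the HMMSUM matrices or the authors' alignment procedure.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def make_matrix(match: int, mismatch: int, alphabet: str = "AVLK") -> Matrix:
    """Build a toy substitution matrix (hypothetical values, not HMMSUM data)."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


# Hypothetical: one matrix per structural context. HMMSUM itself has 281.
CONTEXT_MATRICES: Dict[str, Matrix] = {
    "H": make_matrix(match=2, mismatch=-1),
    "E": make_matrix(match=4, mismatch=-3),
}


def align_score(seq_a: str, contexts_a: str, seq_b: str, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a per-position matrix.

    contexts_a[i] names the predicted context of seq_a[i] and selects the
    substitution matrix used for every pairing that involves that residue.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        matrix = CONTEXT_MATRICES[contexts_a[i - 1]]
        for j in range(1, m + 1):
            sub = matrix[(seq_a[i - 1], seq_b[j - 1])]
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the pair
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a
    return score[n][m]
```

With these toy values, the same identical pair of sequences scores differently depending on the predicted contexts, which is the behavior a single global matrix cannot express.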
Green fluorescent protein (GFP) has been used as a proof of concept for "leave-one-out" design of complementary protein fragments, with the goal of producing programmable protein biosensors. A protein can be designed
that has a segment of the polypeptide chain omitted from the middle of the sequence by first cyclizing the protein, then truncating it. Three variants of GFP have been synthesized that are each missing one of the eleven β
strands from its β-barrel structure. In two of the variants, adding the omitted peptide sequence in trans reconstitutes fluorescence. Recovery of fluorescence upon incubation with the missing strand is quantified, along with binding
affinity, refolding rate, and fluorescence quantum yield. We show that GFP with β-strand 7 "left out" (so-called t7SP) binds the free β-strand 7 peptide with sub-micromolar affinity (Kd ≈ 0.5 μM) and folds at a rate
comparable to the unpermuted GFP. The unliganded t7SP exists in a partly unfolded state. Upon addition of the peptide ligand, fluorescence is recovered at the rate of refolding. Refolding of t7SP is at least a three-state
process, both with and without the ligand peptide, similar to previously published results for full-length, unpermuted GFP. The conserved kinetic properties strongly suggest that the rate-limiting steps in the folding pathway
have not been altered by cyclic permutation and truncation.
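To give a feel for what a sub-micromolar dissociation constant means in practice, the sketch below computes the equilibrium fraction of protein bound to peptide for a simple 1:1 binding model with ligand depletion. It is an illustration under stated assumptions: the default Kd of 0.5 is taken in micromolar units (consistent with "sub-micromolar"), the function name and concentrations are hypothetical, and the model ignores the multi-state refolding kinetics described above.

```python
import math


def fraction_bound(protein_total_um: float, ligand_total_um: float,
                   kd_um: float = 0.5) -> float:
    """Equilibrium fraction of protein in the 1:1 complex.

    Solves [PL]^2 - (P + L + Kd)[PL] + P*L = 0 for the physical root,
    where P and L are total concentrations (all values in micromolar).
    """
    if protein_total_um <= 0:
        raise ValueError("protein concentration must be positive")
    s = protein_total_um + ligand_total_um + kd_um
    complex_um = (s - math.sqrt(s * s - 4.0 * protein_total_um * ligand_total_um)) / 2.0
    return complex_um / protein_total_um


# Hypothetical titration: 0.1 uM protein, increasing peptide.
for ligand in (0.1, 0.5, 2.0, 10.0):
    print(f"{ligand:5.1f} uM peptide -> {fraction_bound(0.1, ligand):.2f} bound")
```

When the protein concentration is far below Kd, the bound fraction reaches about one half at a peptide concentration equal to Kd, which is the usual way such a constant is read off a titration.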
Figure: Comparison of alignments of 1MUCA and 4ENL proteins.
Figure: Kinetic model for peptide binding to leave-one-out GFP.
Figure: GFP-analyte complementation and fluorescence (analyte with target region + GFP 1-10 with programmed binding site).

Protein unfolding is modeled as an ensemble of pathways, where each is a tree of intermediate states and branches in the tree represent additional degrees of conformational freedom. Using a known protein
structure, the program (GeoFold) generates a directed acyclic graph of linked elemental subsystems, each modeling a partitioning of a substructure into two at topologically allowed positions in the chain. The graph
begins at the native state and ends at small fragments representing the fully unfolded state. Each substructure is assigned a free energy based on its buried solvent accessible surface area, sidechain entropy, and
backbone configurational entropy. Each bifurcating edge, representing the transition state of a single partitioning, is assigned a free energy barrier height based on the principle that exposed surfaces are solvated
before the configurational entropy is expressed. To simulate unfolding on the graph, rates are calculated for each elemental subsystem at each time step using transition state theory. The model exhibits two-state
behavior with respect to temperature or denaturant and shows the expected linear relationship of overall folding rate with denaturant. Predicted unfolding rates are compared with experimental values for fifteen
well-studied proteins. Strengths and deficiencies of the model are discussed.
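The rate calculation from a barrier height can be sketched with the Eyring form of transition state theory, k = (kB T / h) exp(-ΔG‡ / RT). This is a generic textbook expression, not necessarily the prefactor or units GeoFold uses; the function name and the example barrier values are assumptions for illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R = 8.314462618         # gas constant, J/(mol*K)


def tst_rate(barrier_j_per_mol: float, temperature_k: float = 298.15) -> float:
    """Eyring rate constant (1/s) for a free energy barrier given in J/mol."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_per_mol / (R * temperature_k))


# Hypothetical barriers: each extra ~5.7 kJ/mol at 298 K slows the rate about tenfold.
for barrier_kj in (40.0, 60.0, 80.0):
    print(f"{barrier_kj:5.1f} kJ/mol -> {tst_rate(barrier_kj * 1000.0):.3e} 1/s")
```

In a graph-based simulation of the kind described above, a function like this would be evaluated for the forward and reverse direction of each partitioning step, using the barrier relative to the respective ground state.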
Figure 1. (a) Elemental subsystem for the kinetic model. f is a subset of the protein, and is partitioned into u1 and u2, passing over energy barrier ‡.
(b) Topological operators. A pivot motion is a rotation around a point. A hinge is a rotation around two points. A break is a translation.
Rotations and translations must not cross chains.
Figure 5 plot: ln k exp (axis ticks 0 to -10) versus ln kcalc (axis ticks -50 to 10), r = 0.78. Labeled proteins: 1TUC, 1CSP, 1TUD, 1SHFA, 1UBQ, 1HRC, 1HD, 1BNR, 1TEN, 1CIS, 2ABD, 2CI2.
Re-wired green fluorescent protein folds and glows despite non-circular topological
permutation of the sequence
Vibin Ramakrishnan, Saeed M. Salem, Suzanne Matthews, Patrick Buck, Mohammed J. Zaki,
Programmable GFP Complementation: GFP 1-10 is designed in silico to bind a 6-12bp
segment of a target analyte, resulting in GFP-analyte complementation and fluorescence
GeoFOLD: A mechanistic model to study the effect of topology on protein unfolding pathways and kinetics.
Figure: Circular GFP topology.
Philippa Reeder, Yao-ming Huang, Jonathan Dordick
Proteins have canonical packing arrangements that can be achieved through many different topological connectivities. Different topologies
that generate the same core packing arrangement may exhibit differences in folding kinetics and stability, due to the differences in the
folding pathway. To test this hypothesis, we engineered a new topology for GFP by non-circular permutation. The new, “re-wired” GFP
folds and glows with the same excitation/emission properties as the wild-type. Re-wiring proteins may be useful for understanding protein
folding pathways, and may be used as a new tool for protein design.
Context Shapes: Efficient Complementary Shape Matching for
Protein-Protein Docking.
Figure: Re-wired GFP topology.
Zujun Shentu, Mohammed al Hasan, Addis Abebe, Mohammed Zaki
Physical molecular interactions are the mechanisms for virtually all biochemical processes, from enzyme
catalysis to signal transduction. As more crystal structures and better homology models become available, new
methods must be explored for predicting how these molecules interact. The long-term goal of this study is to
enable a protein to be docked to all proteins of known structure, predicting interactions and simultaneously
predicting the interaction mode. In order to make docking fast enough to be done as a database search, we
have developed a data structure that describes the shape of a molecular surface in detail yet can be
efficiently rotated, matched, and clustered. Preliminary results show that ContextShapes are as good as or better
than the state of the art in protein-protein docking. Four aims are proposed: First, the scoring functions and
representation of ContextShapes will be refined, and surface water will be included in the shape. Second, our
approach will be enhanced to incorporate various surface properties that aid in the docking predictions. Third,
we aim to handle flexible docking, both implicitly in the initial steps and explicitly in the refinement steps. And
finally, ContextShapes will be hierarchically indexed for efficient database searching. The final product will be
a database search engine for protein-protein interactions.
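The requirement that a surface description be cheap to rotate and match can be illustrated with a bit-vector toy model. The sketch below is not the ContextShapes data structure: it assumes a hypothetical ring of surface cells stored as the bits of an integer, uses cyclic bit rotation as a stand-in for a precomputed 3D rotation, and scores complementarity as cells filled by exactly one partner minus cells filled by both. All function names are made up for the example.

```python
from typing import Tuple


def rotate(bits: int, k: int, n_bits: int) -> int:
    """Cyclically rotate an n_bits-wide bit vector left by k positions."""
    k %= n_bits
    mask = (1 << n_bits) - 1
    return ((bits << k) | (bits >> (n_bits - k))) & mask


def complementarity(receptor: int, ligand: int, n_bits: int) -> int:
    """Toy score: cells occupied by exactly one partner minus clashing cells."""
    mask = (1 << n_bits) - 1
    fit = bin((receptor ^ ligand) & mask).count("1")
    clash = bin(receptor & ligand & mask).count("1")
    return fit - clash


def best_rotation(receptor: int, ligand: int, n_bits: int) -> Tuple[int, int]:
    """Try every rotation of the ligand; return (best score, rotation index)."""
    return max((complementarity(receptor, rotate(ligand, k, n_bits), n_bits), k)
               for k in range(n_bits))


# Hypothetical 8-cell shapes: the ligand is the receptor's complement, mis-rotated.
receptor = 0b00001111
ligand = 0b10000111
print(best_rotation(receptor, ligand, 8))
```

Because both rotation and scoring reduce to a few integer operations, many orientations can be tested per candidate pair, which is the property that makes a database-scale search plausible.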
Figure 5. Log unfolding rates, experimental versus simulated. Simulated unfolding rates correlate strongly with
experimentally observed ones, suggesting correctness of the energy parameters. However, the slope is >> 1; slow unfolders
unfold too slowly in simulations, due to an uncharacterized dependency. The probability that r ≥ 0.78 given n=12 and random
data is p < 0.002, using the resampling method. Experimental data obtained from (Fersht, 2002; Nauli, et al., 2001; Plaxco, et al., 1998).
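One common way to estimate the probability of a correlation arising from random data is a permutation test, sketched below. Whether this matches the "resampling method" used for Figure 5 is an assumption; the function names are hypothetical and the example data are synthetic, not the measured or simulated rates from the poster.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_resamples: int = 10000, seed: int = 0) -> float:
    """One-sided p-value: how often shuffled y correlates at least as well with x."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed ordering itself, so p is never exactly 0.
    return (hits + 1) / (n_resamples + 1)


# Synthetic example with n = 12 points (illustrative values only).
calc = [float(v) for v in range(12)]
expt = [0.2 * v + (0.5 if v % 2 else -0.5) for v in calc]
print(pearson_r(calc, expt), permutation_p_value(calc, expt))
```

The smallest p-value such a test can report is 1 / (n_resamples + 1), so the number of resamples bounds how strong a significance claim the test can support.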
CALF: C-alpha only molecular simulations of proteins using sequence-specific, knowledge-based potentials for hydrogen bonding, local
structure and pairwise contacts.
Patrick Buck, Peter Watson
Some short peptide sequences have strong structural preferences that are independent of their three-dimensional context in proteins and of non-local interactions, as shown by peptide folding studies using NMR and simulation. We
devise a coarse-grained force field, requiring only alpha carbons, to fold reduced protein representations using a program called CALF (C-ALpha based Force field), with the hope of folding local protein sequences with high
accuracy. CALF builds sequence-specific statistical potentials based on database frequencies for pseudo-backbone angles, pairwise contacts, and hydrogen bond donor/acceptor pairs, and simulates folding via Brownian
dynamics. The local structure prediction program HMMSTR was used to identify 27 protein segments, each 12 residues long, that had strong local structure preferences. CALF simulations were run for 0.9 μs and each
trajectory was clustered based on structure. 19 of the 27 sequences had trajectories where the largest cluster made up more than half of the total trajectory, indicating a strong structural bias. Of the 19 sequences with strong
structural biases, 15 had cluster centers within 2.3 Å RMSD of the native structure.
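The "more than half of the trajectory" criterion can be written down directly once each frame has a cluster label. The sketch below assumes the clustering has already been done by some other tool; the function names and the example labels are hypothetical, and only the 0.5 threshold comes from the text above.

```python
from collections import Counter
from typing import Hashable, Sequence


def largest_cluster_fraction(labels: Sequence[Hashable]) -> float:
    """Fraction of trajectory frames that fall in the most populated cluster."""
    if not labels:
        raise ValueError("trajectory has no frames")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def has_strong_bias(labels: Sequence[Hashable], threshold: float = 0.5) -> bool:
    """True when the largest cluster holds more than `threshold` of the frames."""
    return largest_cluster_fraction(labels) > threshold


# Hypothetical per-frame cluster labels for two short trajectories.
print(has_strong_bias([0, 0, 0, 1, 2]))   # largest cluster holds 3 of 5 frames
print(has_strong_bias([0, 0, 1, 1]))      # exactly half is not "more than half"
```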
Results from folding simulations of 12-residue peptides
Alpha helical crossovers favor right-handed supersecondary structures in the absence of beta-strand pairing.
The Phone Cord effect in protein folding.
Benjamin Cole, Vibin Ramakrishnan
The remarkable predominance of right-handedness in beta-alpha-beta helix crossovers has been explained in terms of equilibrium stability, but a kinetic control mechanism may also play a role. If the beta-sheet
contacts are made before the crossover helix is fully formed and the helix formation follows the energetic pathway of least resistance, then the folding helix would impart a left-handed torque on the ends of the
two strands, and this would tear apart a left-handed conformation but not a right-handed one. Statistics on protein structures show that right-handed crossovers predominate even among all-alpha proteins, where the
equilibrium stability of the beta sheet twist does not apply. Using simple demonstrative molecular simulations, we can reproduce the right-handed preference in beta-alpha-beta units using basic assumptions
about torsion angle barriers, without using geometry-specific beta strand forces. The kinetic trapping mechanism is dubbed the "Phone Cord effect" because it is reminiscent of the way a phone cord forms
supercoils to relieve the torsional stress of coil formation.
Figure labels: LH, RH; Model representation; Summing probability fields from weighted occurrences.
Selected Publications
Bystroff, C. (2002) MASKER: Improved solvent-excluded molecular surface area estimations using Boolean masks. Protein Eng 15, 959-65.
Bystroff, C. & Garde, S. (2003) Helix propensities of short peptides: molecular dynamics versus bioinformatics. Proteins 50, 552-62.
Bystroff, C. & Krogh, A. (2007) Hidden Markov models for prediction of protein features. Methods Mol Biol 413, 173-98.
Bystroff, C. & Shao, Y. (2002) Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics 18 Suppl 1, S54-61.
Bystroff, C., Shao, Y. & Yuan, X. (2004) Five hierarchical levels of sequence-structure correlation in proteins. Appl Bioinformatics 3, 97-104.
Bystroff, C., Thorsson, V. & Baker, D. (2000) HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J Mol Biol 301, 173-90.
Huang, Y.M. & Bystroff, C. (2006) Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics 22, 413-22.
Ramakrishnan, V., Salem, S.M., Zaki, M.J. & Bystroff, C. (2008) Developing a detailed mechanistic model for protein unfolding. Bioinformatics (submitted).
Shao, Y. & Bystroff, C. (2003) Predicting interresidue contacts using templates and pathways. Proteins 53 Suppl 6, 497-502.
Shentu, Z., Al Hasan, M., Bystroff, C. & Zaki, M.J. (2008) Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins 70, 1056-73.
Shinde, A.V. et al. (2008) Identification of the Peptide Sequences within the EIIIA (EDA) Segment of Fibronectin That Mediate Integrin α9β1-dependent Cellular Activities. J Biol Chem 283, 2858-70.
Bystroff, C. & Webb-Robertson, B.J. (2008) I-sites 2007: Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure. Bioinformatics (submitted).
Xia, K. et al. (2007) Identifying the subproteome of kinetically stable proteins via diagonal 2D SDS/PAGE. Proc Natl Acad Sci U S A 104, 17329-34.
Yuan, X. & Bystroff, C. (2005) Non-sequential structure-based alignments reveal topology-independent core packing arrangements in proteins. Bioinformatics 21, 1010-9.
Zaki, M.J., Nadimpally, V., Bardhan, D. & Bystroff, C. (2004) Predicting protein folding pathways. Bioinformatics 20 Suppl 1, i386-93.
Funding from
DBI-0448072