Using Building Blocks to Guide Protein Structure Prediction

Going Beyond Fragments
I. Putz, M. Schneider, S. Doerr, F. Salomon, M. Mabrouk, F. Kamm, and O. Brock
Robotics and Biology Laboratory, Technische Universität Berlin, Germany
1. Building Blocks as Structural Alphabet
Figure: Information about protein structure in the PDB (labels: Motifs, Folds, Homologs; Range of Building Blocks)
2. Preprocessing Stage: Comprehensive Building Block Library (BBL)
Construction
Validation
Building block and set of
associated sequences:
All-vs-All
Plot legend: FM TP, TBM TP, TBM FP (log-scale axis, ticks at 1, 10, 100)
When we are able to retrieve building blocks for a
target sequence, they are mostly true positives.
However, both their accuracy and their number drop
for FM targets.
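The per-target bookkeeping behind this statement can be sketched as follows. This is a minimal illustration, assuming that "accuracy" means the fraction of retrieved building blocks that are true positives; the poster does not give the exact definition. The function name, target labels, and hit list are hypothetical and are not results from this work.

```python
from collections import defaultdict


def per_target_accuracy(hits):
    """Return {target_id: fraction of retrieved blocks that are true positives}.

    `hits` is an iterable of (target_id, is_true_positive) pairs.
    Assumption: accuracy = TP / (TP + FP), computed separately per target.
    """
    counts = defaultdict(lambda: [0, 0])  # target -> [true positives, total]
    for target_id, is_tp in hits:
        counts[target_id][0] += int(is_tp)
        counts[target_id][1] += 1
    return {target: tp / total for target, (tp, total) in counts.items()}


# Made-up example data, purely for illustration.
example_hits = [
    ("target_A", True),
    ("target_A", True),
    ("target_A", False),
    ("target_B", False),
]
accuracy = per_target_accuracy(example_hits)
```

Targets for which nothing was retrieved do not appear in the result, so the number of retrieved blocks has to be reported alongside the accuracy.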
Retrieved Building Blocks
Comparison of Prediction Performance *
Plot: both axes range from 0 to 100; x-axis: GDT RBO-MBS (without building blocks).
Series: TBM; FM; TBM, bblocks accuracy > 0.5; FM, bblocks accuracy > 0.5.
Plot regions: "Building blocks helped"; "No or wrong building blocks or modeling failed".
Labeled targets: T0644 (TBM), Acc. 0.92; T0716 (TBM), Acc. 1.0, GDT BB 83.82, GDT MBS 66.67.
structural
matches
Set of non-redundant proteins spanning fold space from ASTRAL 1.75 [1].
Plot legend: FM FP
Target axis: one tick per CASP target, with IDs ranging from T0644 to T0753.
Fragments
global
Accuracy of Retrieved Building Blocks *
Log number of building blocks
INFORMATION
The PDB is believed to be structurally complete. We
therefore mine the PDB for a "vocabulary" of naturally
occurring substructures and use this vocabulary for
protein structure prediction. Our vocabulary consists of
building blocks: spatially contiguous but not necessarily
sequence-contiguous structural units that are recurrent
in the PDB. We present a method to extract conserved
building blocks from the PDB. To account for the fact
that structure is more conserved than sequence, this
method initially ignores sequence and operates
exclusively in the structural domain. Spatial information
retrieved from matching building blocks is used to guide
search in protein structure prediction.
GDT RBO-MBS-BB (with bblocks)
local
4. RBO-MBS-BB Prediction Results
Matching building block:
structurally contiguous and
sequentially non-contiguous
'FSTLKSTVEAIWAG_SSMGIR_TIGGGI'
'MKIVHEIKERILDK_KAIGVY_EMMCVM'
'QLIEQEMKQAAYES_VTSTFH_SGVVVI'
'EYTKEVLKSIAEEL_IALEVM_HIHLFV'
...
'QELDYLTRHYLVKN_GYIKFI_KIEVYL'
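To make the notation concrete, the sketch below parses the five associated sequences shown above and derives a per-position residue frequency table. It assumes that the underscore separates segments that are not contiguous in sequence, which matches the caption but is not stated explicitly. The frequency table is a simplified stand-in for illustration only; it is not the HMM-fragment profile used for retrieval, and the function names are hypothetical. The elided entries ("...") are left out.

```python
from collections import Counter


def split_segments(entry):
    """Split one building-block sequence entry into its segments.

    Assumption: '_' marks a break between sequence-noncontiguous segments.
    """
    return entry.split("_")


def column_frequencies(entries):
    """Per-position residue frequencies over equally long entries."""
    joined = ["".join(split_segments(entry)) for entry in entries]
    length = len(joined[0])
    if any(len(seq) != length for seq in joined):
        raise ValueError("all entries must have the same total length")
    profile = []
    for column in zip(*joined):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


# The five sequences listed on the poster (the elided ones are omitted).
sequences = [
    "FSTLKSTVEAIWAG_SSMGIR_TIGGGI",
    "MKIVHEIKERILDK_KAIGVY_EMMCVM",
    "QLIEQEMKQAAYES_VTSTFH_SGVVVI",
    "EYTKEVLKSIAEEL_IALEVM_HIHLFV",
    "QELDYLTRHYLVKN_GYIKFI_KIEVYL",
]
segment_lengths = [len(seg) for seg in split_segments(sequences[0])]
profile = column_frequencies(sequences)
```

Each listed entry consists of three segments of 14, 6, and 6 residues, so the table has 26 columns.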
3n05A
Structural coverage by building blocks of the secondary structure
regions of 118 CASP9 targets and of target 3n05A.
Guiding Search with Spatial Information
Augment the sequence signal of building blocks with
homologous sequences (increases sensitivity).
Long-range interactions derived from candidate
building blocks serve as guiding restraints.
Conformations
BB Conformations
Smoothing energy landscape by adding
restraints as energy terms.
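One way to picture restraints acting as energy terms is the sketch below. The flat-bottom harmonic form, the weight, the tolerance, and all names are assumptions chosen for illustration; the poster does not specify the functional form used in RBO-MBS-BB.

```python
import math


def restraint_energy(coords, restraints, weight=1.0, tolerance=1.0):
    """Sum of flat-bottom harmonic penalties over distance restraints.

    coords:     {residue_index: (x, y, z)}
    restraints: iterable of (i, j, target_distance) tuples
    Deviations within `tolerance` of the target distance cost nothing;
    larger deviations are penalized quadratically.
    """
    energy = 0.0
    for i, j, target in restraints:
        distance = math.dist(coords[i], coords[j])
        excess = max(0.0, abs(distance - target) - tolerance)
        energy += weight * excess ** 2
    return energy


def total_energy(base_energy, coords, restraints, weight=1.0):
    """Add the restraint penalty to an existing energy value."""
    return base_energy + restraint_energy(coords, restraints, weight)


# Hypothetical coordinates and restraints for three residues.
coords = {0: (0.0, 0.0, 0.0), 10: (3.0, 4.0, 0.0), 20: (0.0, 0.0, 12.0)}
restraints = [(0, 10, 5.0), (0, 20, 8.0)]
penalty = restraint_energy(coords, restraints)
```

In this example the first restraint is satisfied exactly and contributes nothing, while the second is violated by 4 distance units, 3 beyond the tolerance, and contributes 9. A flat bottom is one simple way to keep small deviations from dominating the energy.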
Structural coverage of secondary structure regions of 118 CASP9
targets with retrieved building blocks and their precision.
Spatial information obtained from building blocks often
improved prediction accuracy for TBM targets compared to
prediction without these restraints. Our modeling method
tolerates wrong restraints to some extent. However, we
often do not find enough building blocks to constrain
the search space for larger targets.
* Evaluation was done on a per-target basis.
1a19A, 3.5 RMSD
+Restraints
Precision
T0693-D1 (100 aa), GDT 30.00
T0658-D1 (FM)
Acc. 0.32
GDT BB 6.29
GDT MBS 7.53
5. Analysis
3n2wA
Energy
Filter candidate building blocks based on statistical
feature analysis (increases specificity).
Coverage
FM
T0735-D3 (88 aa), GDT 32.10
T0740-D1 (155 aa), GDT 23.55
Energy
Retrieve matching building blocks with alignment of
HMM-fragment profiles of building blocks against
HMM target profile [2].
TBM
T0668-D1 (78 aa), GDT 44.23
T0662-D1 (76 aa), GDT 85.20
T0753-D1 (109 aa), GDT 55.56
Red: native
Rank 1 (all models)
Rank 7 (model 1)
Rank 12 (all models), 4 (model 1)
Blue: model
3. Prediction Stage: Retrieval of Building Blocks and Modeling
Retrieval and Filtering
Prediction Results RBO-MBS-BB
T0669 (TBM)
Acc. 1.0
GDT BB 50.26
GDT MBS 31.70
Sampling favors decoys containing distance
constraints of candidate building blocks.
[1] Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. (2004). The ASTRAL compendium in 2004. Nucleic Acids Research 32:D189-D192.
[2] Söding J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21:951-960.
with building blocks
Our results provide further evidence that conserved structural units, such as our building
blocks, exist and can be detected from sequence. We were able to leverage new spatial
information from our building blocks to successfully guide search for several TBM
targets. Although our building blocks provide a strong sequence signal, retrieving them
through sequence matching still has its limitations, especially for FM targets. Because
development is at an early stage, we are not yet able to use the information provided by
building blocks to its full extent for modeling.
1a19A, 1.0 RMSD
Prediction improvements,
gray: native, colored: model
Further work will focus on closing the gap between retrievable and existing building blocks.
We will also explore how best to use their information to guide
conformational search.