Going Beyond Fragments: Using Building Blocks to Guide Protein Structure Prediction

I. Putz, M. Schneider, S. Doerr, F. Salomon, M. Mabrouk, F. Kamm, and O. Brock
Robotics and Biology Laboratory, Technische Universität Berlin, Germany

Information

The PDB is believed to be structurally complete. We therefore mine the PDB for a "vocabulary" of naturally occurring substructures and use this vocabulary for protein structure prediction. Our vocabulary consists of building blocks: spatially contiguous but not necessarily sequence-contiguous structural units that are recurrent in the PDB. We present a method to extract conserved building blocks from the PDB. To account for the fact that structure is more preserved than sequence, this method initially ignores sequence and operates exclusively in the structural domain. Spatial information retrieved from matching building blocks is used to guide search in protein structure prediction.

1. Building Blocks as Structural Alphabet

[Figure: information about protein structure in the PDB on a scale from local to global, with the labels Fragments, Motifs, Folds, and Homologs and the range of building blocks marked.]

2. Preprocessing Stage: Comprehensive Building Block Library (BBL)

Construction

Set of non-redundant proteins spanning fold space from ASTRAL 1.75 [1].
All-vs-all structural matches.
Matching building block: structurally contiguous and sequentially non-contiguous.

Building block and set of associated sequences:

'FSTLKSTVEAIWAG_SSMGIR_TIGGGI'
'MKIVHEIKERILDK_KAIGVY_EMMCVM'
'QLIEQEMKQAAYES_VTSTFH_SGVVVI'
'EYTKEVLKSIAEEL_IALEVM_HIHLFV'
...
'QELDYLTRHYLVKN_GYIKFI_KIEVYL'

Validation

[Figure: structural coverage of secondary structure regions with building blocks of 118 CASP9 targets and target 3n05A.]

3. Prediction Stage: Retrieval of Building Blocks and Modeling

Retrieval and Filtering

Retrieve matching building blocks by aligning HMM-fragment profiles of building blocks against the HMM target profile [2].
Augment the sequence signal of building blocks with homologous sequences (Sensitivity).
Filter candidate building blocks based on statistical feature analysis (Specificity).

Guiding Search with Spatial Information

Long-range interactions derived from candidate building blocks serve as guiding restraints.
Adding the restraints as energy terms smooths the energy landscape.
Sampling favors decoys containing distance constraints of candidate building blocks.

[Figure: energy over conformations, shown without and with building-block restraints (+Restraints); example structures 1a19A at 3.5 RMSD and 1a19A at 1.0 RMSD.]

4. RBO-MBS-BB Prediction Results

Retrieved Building Blocks

[Figure: accuracy of retrieved building blocks *, per target. Axis: log number of building blocks (1, 10, 100). Legend: TBM TP, TBM FP, FM TP, FM FP.]

If we are able to retrieve building blocks for a target sequence, they are mostly true positives. However, their accuracy and number drop for FM targets.

[Figure: structural coverage of secondary structure regions of 118 CASP9 targets with retrieved building blocks and their precision. Axes: Coverage, Precision. Groups: TBM, FM.]

Comparison of Prediction Performance *

[Figure: scatter plot of GDT RBO-MBS-BB (with building blocks) against GDT RBO-MBS (without building blocks), both on a scale from 0 to 100. Legend: TBM; FM; TBM, bblocks accuracy > 0.5; FM, bblocks accuracy > 0.5. Regions: "Building blocks helped" and "No or wrong building blocks or modeling failed".]

Highlighted targets:

Target           Acc.   GDT BB   GDT MBS
T0644 (TBM)      0.92
T0716 (TBM)      1.0    83.82    66.67
T0669 (TBM)      1.0    50.26    31.70
T0658-D1 (FM)    0.32   6.29     7.53

Spatial information obtained from building blocks often improved prediction accuracy for TBM targets compared to prediction without these restraints. Our modeling method tolerates wrong restraints to some extent. However, we often do not find enough building blocks to constrain the search space for larger targets.

* Evaluation was done on a per-target basis.

Prediction Results RBO-MBS-BB

[Figure: example models in panels labeled FM and TBM. Red: native. Blue: model. Rank annotations in the figure: Rank 1 (all models); Rank 7 (model 1); Rank 12 (all models), 4 (model 1).]

T0693-D1 (100 aa), GDT 30.00
T0735-D3 (88 aa), GDT 32.10
T0740-D1 (155 aa), GDT 23.55
T0668-D1 (78 aa), GDT 44.23
T0662-D1 (76 aa), GDT 85.20
T0753-D1 (109 aa), GDT 55.56

5. Analysis

[Figure: prediction improvements. Gray: native. Colored: model. Structure label: 3n2wA.]

Our results show further evidence that conserved structural units, such as our building blocks, exist and can be detected from sequence. We were able to leverage new spatial information from our building blocks to successfully guide search for several TBM targets. Although our building blocks provide a strong sequence signal, retrieving them through sequence matching still has its limitations, especially for FM targets. Because development is at an early stage, we are not yet able to use the information provided by building blocks to its full extent for modeling.

Further work will focus on closing the gap between retrievable and existing building blocks. We will further explore how best to use their information to guide conformational search.

[1] Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE (2004). The ASTRAL compendium in 2004. Nucleic Acids Research 32:D189-D192.
[2] Söding J (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21:951-960.
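The guiding step described under "Guiding Search with Spatial Information" (long-range interactions from a building block become restraints, which are then added as energy terms) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the poster's implementation: the data layout (a building block as a list of sequence-separated segments, each a list of target residue index and C-alpha coordinate), the function names `long_range_restraints` and `restraint_energy`, and the flat-bottom harmonic penalty with its `tolerance` and `weight` parameters. The poster does not state the functional form of its restraint terms.

```python
import math
from itertools import combinations


def long_range_restraints(segments):
    """Derive distance restraints from one building block (hypothetical layout).

    `segments` is a list of segments; each segment is a list of
    (target_residue_index, (x, y, z)) pairs. Segments are assumed to be
    separated in sequence, so only residue pairs taken from two different
    segments are kept: these are the long-range interactions.
    Returns a list of (i, j, target_distance) tuples.
    """
    restraints = []
    for seg_a, seg_b in combinations(segments, 2):
        for i, xyz_i in seg_a:
            for j, xyz_j in seg_b:
                restraints.append((i, j, math.dist(xyz_i, xyz_j)))
    return restraints


def restraint_energy(model, restraints, tolerance=1.0, weight=1.0):
    """Flat-bottom harmonic penalty (an assumed form, not from the poster).

    `model` maps residue index -> (x, y, z). A restraint contributes nothing
    while the model distance is within `tolerance` of the target distance,
    and grows quadratically with the excess deviation beyond that.
    """
    energy = 0.0
    for i, j, target in restraints:
        violation = abs(math.dist(model[i], model[j]) - target) - tolerance
        if violation > 0.0:
            energy += weight * violation ** 2
    return energy


if __name__ == "__main__":
    # Toy building block: two segments that are far apart in sequence
    # (residues 3-4 and residue 40) but close in space.
    block = [
        [(3, (0.0, 0.0, 0.0)), (4, (3.8, 0.0, 0.0))],
        [(40, (0.0, 5.0, 0.0))],
    ]
    restraints = long_range_restraints(block)

    # A decoy in which residue 40 has drifted away from the first segment.
    decoy = {3: (0.0, 0.0, 0.0), 4: (3.8, 0.0, 0.0), 40: (0.0, 9.0, 0.0)}
    print(len(restraints), restraint_energy(decoy, restraints))
```

In a sampling loop, a term like this would be added to the existing energy function so that decoys satisfying the building-block distances score better. The flat bottom is one simple way to avoid penalizing small deviations; limiting the influence of wrong restraints would need an additional design choice such as capping the penalty, which this sketch does not attempt.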