Eur. J. Biochem. 271, 4762–4768 (2004) FEBS 2004 doi:10.1111/j.1432-1033.2004.04440.x

Universal positions in globular proteins
From observation to simulation

Nikolaos Papandreou¹, Igor N. Berezovsky²,³, Anne Lopes⁴, Elias Eliopoulos¹ and Jacques Chomilier⁴

¹ Laboratory of Genetics, Agricultural University of Athens, Greece; ² Department of Structural Biology, The Weizmann Institute of Science, Rehovot, Israel; ³ Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; ⁴ Equipe Biologie Structurale, LMCP, Universités Paris 6 and Paris 7, Paris, France

The description of globular protein structures as an ensemble of contiguous ‘closed loops’ or ‘tightened end fragments’ reveals fold elements crucial for the formation of stable structures and for navigating the very process of protein folding. These are the ends of the loops, which are spatially close to each other but are situated apart in the polypeptide chain by 25–30 residues. They also correlate with the locations of highly conserved hydrophobic residues (referred to as topohydrophobic), in a structural alignment of the members of a protein family. This study analysed these positions in 111 representatives of different protein folds, and then carried out dynamic Monte Carlo simulations of the first steps of the folding process, aimed at predicting the origins of the assembling folds. The simulations demonstrated that there is an obvious trend for certain sets of residues, named ‘mostly interacting residues’, to be buried at the early stages of the folding process. Location of these residues at the loop ends and correlation with topohydrophobic positions are demonstrated, thereby giving a route to simulations of the protein folding process.

Despite the continuously increasing number of experimentally determined protein structures, many new folds are still to be discovered. This was illustrated clearly in a recent study [1], where a plot of the number of protein families vs. the number of resolved complete genomes resulted in a quasi-linearly increasing function. Elucidating the evolutionary mechanisms leading to the emergence of a finite number of protein folds [2,3] from the vast number of protein sequences [4,5], as well as the mechanisms of the formation of mature protein globules [6], remains a topic both of great challenge and interest. The latter mechanisms are related to the physical basis of protein structure formation and stability [7], and thus can point to possible evolutionary routes [8].

This study is based on universal structural units of protein folds, named ‘closed loops’ [9] or ‘tightened-end fragments’ (TEFs) [10]. These major elements are universally present in all types of protein folds and have the following features in common: (a) they usually start and end in the hydrophobic core [11]; (b) they form loop-like structures of nearly standard size (25–30 amino acid residues); (c) they serve as universal units of protein domain structure [12]; (d) the ends of these elements (or so-called locks [13]), mainly correspond to clusters of hydrophobic amino acids in general (WIMVYLF), and highly conserved ones, the topohydrophobic (TH) positions [14,15], in particular. Determination of the TH positions is based on the analysis of multiple structural alignments of members of a protein family, limited to a pair sequence identity with a maximum of 30%. TH positions are of particular importance for the formation and stability of the protein core [16]. From a dynamic point of view, the early formation of a nucleus composed of TH positions would favor the formation of closed loops and considerably speed up the folding process [17]. The coupled concepts of TH and closed loops/TEFs therefore offer a simple and general scenario for the folding mechanism of globular proteins [11,15] and provide a set of critical positions in the protein core [10,11,13].
The loop structure of globular proteins is a general concept, independent from secondary structure, as well as from the particular folding mechanism of each protein [9,10,13]. This study addresses the question of predicting these critical positions from the sequence, a task of major importance to approach the structure of a protein of unknown folding. To successfully build such a structure, numerous pieces of information have to be collected by combining various methods. An initial calculation of critical positions could be a first step, providing a frame of structural restraints, as TEF limits and TH residues are located mainly inside the protein core. The notion of topohydrophobic positions suggests that the forces that bury these residues and lead to a stable core do not rely on the details of the amino acid side chain structure, but rather on an adequate succession of hydrophobic and polar amino acid residues along the polypeptide chain. Thus simplified protein models, such as lattice ones, are adequate tools for calculations aimed at locating critical residues.

Correspondence to N. Papandreou, Laboratory of Genetics, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece. Fax: +30 2105294322, Tel.: +30 2105294372, E-mail: [email protected]
Abbreviations: MIR, mostly interacting residues; PDB, protein data bank; SCOP, structural classification of proteins; TEF, tightened end fragment; TH, topohydrophobic.
(Received 29 June 2004, revised 22 September 2004, accepted 15 October 2004)
Keywords: folding nucleus; hydrophobic core; lattice simulation; protein folding.

This study was carried out on a dataset of 111 globular proteins with well-defined structures in the Protein Data Bank (PDB), that were representative of various folds, and for which the TEFs were available. For a subset of 73 proteins of the above database, the TH positions have also been determined.
The initial stages of folding were simulated using a simplified model, which consists of an alpha-carbon reduced representation of the polypeptide chain on a 24-first neighbour lattice. A standard Monte Carlo algorithm dynamically simulated the folding process and a statistical mean force potential was used to describe the interactions between noncontiguous residues. A commonly accepted lattice model was used [18], and the simulations focused on the first stages of the folding process, measuring the tendency of amino acids to be packed inside the hydrophobic core, depending on the peculiarities of the polypeptide chain sequence. Starting from random conformations, the Monte Carlo simulations revealed that a subset of hydrophobic residues had a strong tendency to be buried. These residues, named ‘mostly interacting residues’ (MIR), were found to statistically match TEF limits and TH positions. These results are in agreement with the hydrophobic collapse mechanism, which can be further generalized onto the nucleation–condensation mechanism, a hybrid of the hierarchical and hydrophobic collapse mechanisms [23,24].

Materials and methods

The protein database consisted of 111 globular protein chains, representing 78 different folds, according to the structural classification of proteins (SCOP classification) [22]. In detail, there are 26 α class proteins, 23 β class, 26 α + β class, 18 α/β class and 18 of the small proteins class, providing a balanced representation of the major known folds. The polypeptide chain lengths vary between 50 and 250 residues. Simulations have been carried out using a Cα representation of the polypeptide chains and the lattice geometry (Fig. 1) is as in [18].

Fig. 1. The lattice model. The solid line represents the backbone from Cα to Cα positions, while the dotted line is the underlying cubic lattice.

On an underlying cubic lattice (Fig. 1, dotted lines) with edges of unit length, contiguous alpha carbons are connected by vectors of the form (±2, ±1, 0) (Fig. 1, solid lines). The length of such a vector is $\sqrt{5}$ lattice units and is equivalent to 3.8 Å, the typical distance between contiguous alpha carbons in proteins. In this geometry, for residue i, there are 24 possible positions for residue i + 1 to occupy. This kind of polypeptide chain projection allows for a more realistic representation of the polypeptide chain [18]. Two spatial constraints are implemented. First, the distance between noncontiguous alpha carbons cannot be less than 3.8 Å ($\sqrt{5}$ lattice units), and second (contrary to the cubic lattice, where only angles of 90° and 180° are possible), the limit angles here are 66° and 143° (seven possible values), approximating the range of pseudo-angles in natural proteins [19].

The different nature of amino acids is taken into account in the force field used to attribute an energy value to each chain conformation. The distance-independent 20 × 20 residue pair energy matrix of Miyazawa and Jernigan was used [20]. In detail, if two noncontiguous residues i and j are found within a distance smaller than or equal to 5.88 Å, a term $E_{ij}$ is added to the total energy, depending on their nature. The maximum interaction range of 5.88 Å corresponds to $\sqrt{12}$ lattice units and seems a reasonable estimate for the mean noncovalent interaction range between amino acid residues.

For each protein, 100 different initial conformations were randomly generated and used as starting points for 100 simulation runs, to avoid dependency on the initial state. The only constraint placed on initial states is their noncompactness, in the sense that amino acid residues placed far away in the sequence were not allowed to be close in space, to avoid clustering due to a particular initial state conformation.
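The lattice geometry described above can be checked with a short enumeration. The following is a minimal sketch, not the authors' code: it assumes that "vectors of the form (±2, ±1, 0)" means all coordinate permutations and sign choices (which is what yields 24 neighbours), and it takes the limit angles of 66° and 143° directly from the text. The function names `lattice_vectors` and `allowed_bond_angles` are hypothetical.

```python
import itertools
import math


def lattice_vectors():
    """All bond vectors obtained by permuting and signing (2, 1, 0)."""
    vectors = set()
    for perm in itertools.permutations((2, 1, 0)):
        for signs in itertools.product((1, -1), repeat=3):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)


# One lattice unit in angstroms, from the bond length sqrt(5) units = 3.8 A.
UNIT_ANGSTROM = 3.8 / math.sqrt(5)


def allowed_bond_angles(vectors, low=66, high=143):
    """Distinct pseudo-angles (degrees) between consecutive bonds that fall
    within the limit angles quoted in the text (assumed inclusive)."""
    angles = set()
    for v in vectors:
        for w in vectors:
            dot = sum(a * b for a, b in zip(v, w))
            # Angle at the middle residue is between -v and w; |v| = |w| = sqrt(5).
            cosine = max(-1.0, min(1.0, -dot / 5))
            angle = math.degrees(math.acos(cosine))
            if low <= round(angle) <= high:
                angles.add(round(angle, 1))
    return sorted(angles)


vectors = lattice_vectors()
angles = allowed_bond_angles(vectors)
contact_range = math.sqrt(12) * UNIT_ANGSTROM  # interaction range in angstroms
```

Under these assumptions the enumeration gives 24 bond vectors and seven distinct angle values, from about 66.4° to 143.1°, and an interaction range of about 5.89 Å, consistent with the figures quoted in the text.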
Quantitatively, this constraint introduces a minimum spatial distance, $d_{min}$, according to the separation $\Delta = |i - j|$ between residues i and j: (1) $\Delta$ = 6–10, $d_{min}$ = 7 Å; (2) $\Delta$ = 11–15, $d_{min}$ = 11 Å; (3) $\Delta$ = 16–20, $d_{min}$ = 19 Å; (4) $\Delta$ more than 20, $d_{min}$ = 27 Å.

The single residue movements [18] are of two kinds; end flip movement for the N and C terminal residues and corner movements for the others. The choice of the move set is more or less arbitrary, as the elementary one-residue moves are sufficient to bring the protein to a folded state. In this case, the restriction to elementary moves only, apart from its simplicity, permits a sequential analysis of the chain tendency to form compact fragments around particular amino acid residues from the beginning of the simulation. After each move, the calculated conformational energy was subjected to a standard Metropolis criterion, at constant temperature.

Because the goal was to analyse the propensity of residues to be buried from the start of folding, we ensured that the maximum number of Monte Carlo steps was sufficient to allow formation of compact chain fragments. Due to the serial nature of the algorithm, this time limit is correlated to protein chain length L. It was empirically determined that for small proteins of about 50 residues, the value $t_{max}$ is around $10^6$ Monte Carlo steps. Thus, the following linear relation was adopted to generalize $t_{max}$ to proteins of any length L:

$$t_{max} = \mathrm{INT}\left(10^6 \, L / 50\right)$$

where INT is integer part, because $t_{max}$ is an integer by definition (Monte Carlo steps).

For each simulation, $10^4$ records of intermediate conformations were taken at regular time intervals. As the number of simulations per protein is 100 (one for each initial state), the end result is a set of $10^6$ records per protein.
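The simulation parameters above translate into a few small helper functions. This is a hedged sketch rather than the published implementation: the function names are hypothetical, the thermal energy `kT` is a free parameter because the text does not state the simulation temperature, and returning `None` for separations below 6 is an assumption, as the text gives no bound for them.

```python
import math
import random


def t_max(chain_length):
    """Maximum number of Monte Carlo steps, INT(10^6 * L / 50)."""
    return (10**6 * chain_length) // 50


def d_min(separation):
    """Minimum spatial distance (angstroms) allowed in a starting conformation
    for two residues separated by |i - j| positions in the sequence.
    Returns None where the text gives no bound (assumption)."""
    if separation > 20:
        return 27.0
    if separation >= 16:
        return 19.0
    if separation >= 11:
        return 11.0
    if separation >= 6:
        return 7.0
    return None


def metropolis_accept(delta_energy, kT, rng=random):
    """Standard Metropolis criterion at constant temperature: always accept
    downhill moves, accept uphill moves with probability exp(-dE / kT)."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)
```

For example, `t_max(50)` returns 1000000, matching the empirical value quoted for a 50-residue chain, and a 125-residue chain gets 2500000 steps.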
For every recorded conformation, and for each amino acid residue, the number of residues with which it is in noncovalent interaction was calculated. In spatial terms, these noncovalent neighbours are the amino acid residues lying within a distance of 5.88 Å or $\sqrt{12}$ lattice units. For a given protein and for residue i, at the r-th record, the number of noncovalent neighbours is nc(i,r). The time mean of this quantity is

$$NC(i) = \frac{1}{10^6} \sum_{r=1}^{10^6} nc(i,r)$$

NC(i) values are rounded to the nearest integer. This mean number of noncovalent neighbours is a quantitative measure of the tendency of a residue to be buried from solvent. The higher the NC(i), the stronger this tendency. If $\overline{NC}$ is the mean value of NC(i) over the sequence for a given protein, the residues for which NC(i) is significantly higher than $\overline{NC}$ are of particular interest and are called mostly interacting residues (MIRs). Their selection requires fixing a cut-off value above the mean value $\overline{NC}$. It was found that NC(i) varies between 1 and 8 and that $\overline{NC} = 4$ for all studied proteins. Figure 2 presents the distribution of the different values of NC(i) over the amino acid residues of all 111 proteins. The most probable value is four, which coincides with the mean sequence value, which is also four for all proteins as stated above. From this distribution, it appears that 13% of residues have a number of noncovalent neighbours equal to or higher than six, which was adopted as the lowest NC(i) value for considering residue i as a MIR.

In order to validate this model, once the positions of MIRs were determined they were compared to TEF limits and to topohydrophobic positions. The comparison with TEFs was performed on the complete database of 111 proteins. The comparison with TH positions was performed on a 73-protein subset of this database, where these positions were determined.
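The definition of NC(i) and the MIR cut-off can be illustrated in a few lines. This is a minimal sketch under stated assumptions: conformations are given as integer lattice coordinates, "noncovalent neighbours" are taken to exclude the residues at i ± 1, and rounding is half-up. The function names are hypothetical and the four-residue chain is a toy example, not data from the study.

```python
import math

CONTACT_RANGE_SQ = 12  # squared interaction range: (sqrt(12) lattice units)^2


def neighbour_counts(coords):
    """nc(i, r) for one recorded conformation (list of lattice coordinates)."""
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 2, n):  # skip covalently bonded neighbours i, i+1
            dist_sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if dist_sq <= CONTACT_RANGE_SQ:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_neighbours(records):
    """NC(i): mean of nc(i, r) over all records, rounded to the nearest integer."""
    n = len(records[0])
    totals = [0] * n
    for coords in records:
        for i, count in enumerate(neighbour_counts(coords)):
            totals[i] += count
    return [math.floor(total / len(records) + 0.5) for total in totals]


def mostly_interacting(nc_values, cutoff=6):
    """Indices of residues whose NC(i) reaches the MIR cut-off (six in the text)."""
    return [i for i, value in enumerate(nc_values) if value >= cutoff]


# Toy four-residue chain built from (±2, ±1, 0)-type bond vectors.
compact = [(0, 0, 0), (2, 1, 0), (2, 3, 1), (0, 3, 0)]
extended = [(0, 0, 0), (2, 1, 0), (4, 2, 0), (6, 3, 0)]
```

In the compact toy chain, residue 3 lies within range of residues 0 and 1, so `neighbour_counts(compact)` gives `[1, 1, 0, 2]`, while the extended chain has no contacts at all.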
For the remaining 38 proteins, the calculation of TH positions was not possible, because it requires at least four 3D structures of members of the same family, with a pair identity not exceeding 30% [14,15]. This criterion was not fulfilled for these 38 cases. The PDB codes [21] of the database are given in Table 1.

Results

The Monte Carlo algorithm for folding simulation has been applied to the entire protein dataset and the histograms NC(i), containing the distribution of noncovalent neighbours along the amino acid sequence, have been obtained for each protein. In Fig. 3 the positions of TEFs, TH and MIR for 10 proteins of the database representative of the various classes as determined by SCOP [22] are illustrated. Among the 1920 calculated MIRs, 92% were hydrophobic, following the definition of topohydrophobic residues (i.e. they belonged to the set ‘VIMWYLF’). Also, the total numbers of MIRs and TH positions, in the 73-protein subset where they are compared, are relatively close (1299 MIRs vs. 1011 TH). In the same subset, the total number of TEFs was 309; thus the number of TEF limits was 618, about half the number of MIRs.

To assess the overall quality of agreement between predicted critical positions (MIR) and structure-defined ones (TH and TEF limits), a statistical analysis is required. This has been carried out over the whole database, i.e. over all 111 proteins for the comparison between MIR and TEF limits and for the subset of 73 proteins for the comparison between MIR and TH. The results are presented in two histograms in Figs 4 and 5.

The histogram of Fig. 4 gives the comparison between MIR and TH positions and is constructed as follows. Each TH position is placed at the origin of the abscissa. Then, the neighbouring MIRs that are closer to this central TH than to any other TH are located. Their number is plotted as a function of their sequence distance with respect to the central TH.
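The histogram construction can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: each MIR is assigned to its nearest reference position (TH or TEF limit), the signed sequence offset is counted, and offsets beyond a window of ±20 are folded into the end bins. The tie-breaking rule (lowest reference position wins) and the function names are assumptions.

```python
from collections import Counter


def mir_offset_histogram(mir_positions, ref_positions, window=20):
    """Count MIRs by signed sequence offset from their nearest reference position."""
    histogram = Counter()
    for mir in mir_positions:
        # Nearest reference position; ties go to the lower position (assumption).
        nearest = min(ref_positions, key=lambda ref: (abs(mir - ref), ref))
        offset = max(-window, min(window, mir - nearest))
        histogram[offset] += 1
    return histogram


def fraction_within(histogram, radius=5):
    """Fraction of MIRs found within +/- radius positions of a reference position."""
    total = sum(histogram.values())
    near = sum(count for offset, count in histogram.items() if abs(offset) <= radius)
    return near / total if total else 0.0


# Toy positions for one hypothetical sequence.
example = mir_offset_histogram([10, 13, 50, 90], [10, 55])
```

With these toy positions the offsets are 0, +3, −5 and +35 (binned at +20), so `fraction_within(example)` is 0.75. Applied per protein and summed over the dataset, the same counting would give the kind of distribution described for Figs 4 and 5.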
This is reproduced for all THs along all the 73 proteins of the data set. Thus Fig. 4 shows a histogram of the separation between TH and the closest MIR. The plotted distances range from −20 to +20, and MIRs lying at distances greater than ±20 residues from the closest TH are added to the histogram at the ±20 positions. The second histogram (Fig. 5) follows the same rules and concerns the comparison of MIR to TEF limits. It is constructed using the whole database of 111 proteins.

From observation of Figs 4 and 5 it is evident that comparison of MIR with TH and TEF limits clearly presents a peak at the origin. This is an indication that the residues predicted to be MIRs actually do correspond to TH positions. They also statistically correlate with TEF limits, which are mostly hydrophobic [13], as it was already shown that most TH positions are located in or in the vicinity of TEF ends [10]. The agreement between MIR and TH is very clear and 63% of MIRs were found within ±5 positions from a TH residue. The TEF histogram presents two main secondary maxima at positions ±3 and 57% of MIRs were found within ±5 positions from a TEF limit. This good agreement between prediction and analysis [13] is of great interest in the prediction of elements of the protein core from the sequence.

Fig. 2. Distribution of the mean number of noncovalent neighbours over all 111 sequences of the dataset.

Table 1. A list of the PDB codes, names and SCOP classes of the proteins studied. The TEFs are known for all these proteins. Proteins with known TH positions are in bold. The uppercase letters at the end of the code correspond to the chain.

PDB code Name SCOP PDB code Name SCOP PDB code Name SCOP 1aep Apolipophorin-III a 2sns b 1gmpA RNase Sa a+b b b 1aba 1opr b 1ble a 2pelA Legume lectin b 3cla Cytochrome c oxidase Phospholipase A2 Lysin Retinoid-X receptor a Cytochrome c3 Interleukin-10 a a a a a a 1knb 2stv 1pmy 1qabA 2plv3 1cbs b b b b b b 5nll 3chy 1 cls 1dhr 5p21 1asu 1rro Oncomodulin a 1ivpA Adenovirus fibre STNV coat protein Pseudoazurin Transthyretin Picornavirus Cellular retinoicacid-binding protein 2 (HIV-2) protease b 1lbbA 2sas Calcium-binding protein Parvalbumin Myoglobin Glycera globin Lamprey globin Hemoglobin (horse) Hemoglobin (human) a 1ptf a+b 1tml a a a a a a 1ubi 1frd 153 L 1lsg 1acf 1ctf a+b a+b a+b a+b a+b a+b 1tpfB 1brsA 1akz 1rvvA 1 ns5A 1jkeB Triosephosphate isomerase Endonuclease Uracil-DNA glycosylase Lumazine synthase Hypothetical protein YbeA D-Tyr tRNAtyr deacylase a/b a/b a/b a/b a/b a/b a a 1aihA 1apyA Histidine-containing phosphocarrier Ubiquitin 2Fe-2S ferredoxin Lysozyme, Goose Lysozyme, Chicken Profilin Ribosomal protein L7/12 Integrase Glycosylasparaginase Glutaredoxin Orotate phosphoribosyltransferase Fructose permease, subunit Iib Chloramphenicol acetyltransferase Flavodoxin Signal transduction protein Cutinase Dihydropteridin reductase cH-p21 Ras Retroviral integrase, catalytic domain Glutamate receptor ligand binding core Cellulase E2 a/b a/b 1lcl Staphylococcal nuclease Xylanase II Bacillus 1–3, 1–4-b-glucanase Serine esterase 1utg 2mhr Uteroglobin Myohemerythin a a 1yna 2ayh 256bA Cytochrome b562 a 1aa0 Fibritin 1occD 1poc 1lis 1lbd 2cy3 2ilk a+b a+b 1iodG 1dtdB Coagulation factor X Carboxypeptidase inhibitor small small small small small small 4cpv 1bvd 1hbg 2lhb 2mhbA 1dkeA 1eca 1lki a/b a/b a/b a/b a/b a/b a/b a/b a/b a/b a 1ast Astacin a+b 1icfI 3c2c 1 bp2 1enh Erythrocruorin Leukemia inhibitory factor Mitochondrial cytochrome c Cytochrome c2 Phospholipase A2 DNA-binding protein a a a 1dtp 1nox 2pii a+b a+b a+b 2bbkL 1sgpI 1ajj 2erl Pheromone a 1durA a+b 1i8nA Anti-platelet protein small 1pht b 1fxd a+b 1ejgA Crambin small b b b 1c0bA 1shaA 1ag2 a+b a+b a+b 1ehs 1tgj 4rxn b 1abrA Abrin A-chain a+b 1caa 1cdcA 2 lm CD2, first domain Macromycin b b 1plfB 1mgsA a+b a+b 1fas 1pk4 1anu Cohesin-2 domain b 1hucB Platelet factor 4 Chemokine (growth factor) (Pro)cathepsin B Heat-stable enterotoxin B TGF-b3 Rubredoxin, Clostridium pasteurianum Rubredoxin, Archaeon Pyrococcus furiosus Fasciculin Plasminogen small small small 1reiA Phosphatidylinositol 3-kinase a-Spectrin, SH3 domain Signal transduction protein Seed storage protein 7 s vicillin Immunoglobulin Diphtheria toxin NADH oxidase Signal transduction protein Ferredoxin II, Peptostreptococcus Ferredoxin II, Desulfovibrio gigas Ribonuclease A c-src Tyrosine kinase Prion protein domain MHC class II p41 invariantchain fragment Methylamine dehydrogenase Ovomucoid III domain ldl Receptor a+b 1hpi small 1f3g Glucose-specific factor III b 2act Actinidin a+b 1hip 1sno Staphylococcal nuclease b 2 ci2 a+b 1knt small 1gpc DNA-binding protein b 1fkb Chymotrypsin inhibitor CI-2 FK-506 binding HIPIP, Ectothiorhodospira vacuolata HIPIP, Allochromatium vinosum Collagen type VI a+b 1edmB Factor IX small 3cytO 1pwt 1semA 1cauB small small small small

Fig. 3. Examples of comparison of MIR, TH and TEF for 10 sequences of various folds. In each example, the PDB code (with the chain) is given, followed by the name, the SCOP class and the fold of the protein in parentheses. The following lines represent the sequence and the TEFs. The residues belonging to a TEF are indicated ‘I’. In case of TEF overlap, two lines are used for this representation (for example in protein 1shaA). The next line shows TH positions, where the corresponding residues are indicated ‘T’. The final line shows MIR residues, indicated by ‘M’. For 3chy and 5p21, due to the sequence length, the results appear in two consecutive blocks.

Fig. 4. Histogram of the correspondence between TH positions and MIR from a set of 73 proteins.

Fig. 5. Histogram of the correspondence between TEF ends and MIR from a set of 111 proteins.

Discussion

The existence of critical positions in protein structures, punctuated by TH positions and/or TEF limits, is of great importance for protein folding and stability. Consecutive formation of the globule core [10,11,17] composed essentially of these residues [13] leads to tremendous optimization of the folding process, by reducing the conformational space to be explored. Thus, the prediction of these ‘hot’ residues becomes an important step in approaching the native three-dimensional structure. A first approach to this goal was undertaken in this study. The guiding hypothesis was that, in order to achieve fast folding, critical residues should have a tendency to contact each other and thus form the origins of the hydrophobic core. The results confirmed this hypothesis. Using a simple alpha-carbon lattice model, formation of the nucleation sites at initial steps of the folding process was demonstrated.

These results suggest that folding initiation can be based on the early formation of a set of nucleation sites around selected hydrophobic residues [10,11,13]. This is essentially the basis of the hydrophobic collapse mechanism [23], which supposes formation of hydrophobic tertiary interactions that initiate secondary structure. It can be extended onto a unified nucleation–condensation mechanism, which is a combination of hierarchical and hydrophobic collapse mechanisms [23,24]. In the latter case, hydrophobic tertiary interactions are consolidated at the same time as elements of secondary structure (with possible variations of the kinetics of the mechanism caused by the different intrinsic stabilities of the secondary structural elements). These models have been developed from experiments and simulations of folding and unfolding of several small proteins [23,24] and particularly from the analysis of the residual structure of denatured states, which are thought to correlate to the nucleation sites.
The comparison of MIR predictions with this type of data is being considered for future studies.

The secondary peaks in the histogram representing the correlation between MIR and TEF (Fig. 5) come from the proteins belonging mainly to the α class. For these folds, the TEF limits are often located inside α-helices and are mainly hydrophobic. Sometimes, the predicted MIRs are not exactly these limits but are the nearest hydrophobic residues, which in a helix are located three positions away because of the α-helix periodicity. This observation is in full agreement with the definition of the van der Waals locks, as extended (three to five residues long) segments of polypeptide chains interacting with each other, and thus forming ‘loop-n-lock’ structures in globular proteins [13].

The main conclusion of this study is that the burial of MIR positions can create anchors for the sequential formation of closed loops. These results corroborate experimental evidence on the initial stages of the folding process. NMR analysis of folding intermediates of bovine pancreatic trypsin inhibitor [25] revealed loop formation in early, non-native states, stabilized by nonlocal interactions. Also, an NMR study on the folding of lysozyme [26] showed the early formation of hydrophobic clusters, which are linked together by long-range interactions. These interactions were shown not to occur in the native structure, but they are apparently important for keeping the loop structure and thereby speeding up the folding procedure. The appearance of these essential features in this folding simulation permits an initial estimation of the anchor regions for loop formation. This approach therefore provides a set of structural constraints from first principles for an unknown structure.
This information could be incorporated at the early steps of a prediction method for building protein structures from the sequence by producing anchor residues known to belong to the structural core. In a second stage they can be introduced as a set of constraint distances in a more detailed modeling process.

Acknowledgements

This project has been funded by a Concerted Action from the European Union, QLG2-CT-2002–01298, and by the Greek-French bilateral PLATO program (grant no 04146WM). I. N. B. was also supported by the Post-Doctoral Fellowship of the Feinberg Graduate School, Weizmann Institute of Science.

References

1. Kunin, V., Cases, I., Enright, A.J., de Lorenzo, V. & Ouzounis, C.A. (2003) Myriads of protein families, and still counting. Genome Biol. 4, 401.
2. Koonin, E.V., Wolf, Y.I. & Karev, G.P. (2002) The structure of the protein universe and genome evolution. Nature 420, 218–223.
3. Xia, Y. & Levitt, M. (2004) Simulating protein evolution in sequence and structure space. Curr. Opin. Struct. Biol. 14, 202–207.
4. Rost, B. (2002) Did evolution leap to create the protein universe? Curr. Opin. Struct. Biol. 12, 409–416.
5. Liu, J. & Rost, B. (2003) Domains, motifs and clusters in the protein universe. Curr. Opin. Chem. Biol. 7, 5–11.
6. Daggett, V. & Fersht, A. (2003) The present view of the mechanism of protein folding. Nat. Rev. Mol. Cell. Biol. 4, 497–502.
7. Shakhnovich, E.I. (1997) Theoretical studies of protein-folding thermodynamics and kinetics. Curr. Opin. Struct. Biol. 7, 29–40.
8. Tiana, G., Shakhnovich, B.E., Dokholyan, N.V. & Shakhnovich, E.I. (2004) Imprint of evolution on protein structures. Proc. Natl Acad. Sci. USA 101, 2846–2851.
9. Berezovsky, I.N., Grosberg, A.Y. & Trifonov, E.N. (2000) Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett. 466, 283–286.
10. Lamarine, M., Mornon, J.P., Berezovsky, I.N. & Chomilier, J. (2001) Distribution of tightened end fragments of globular proteins statistically match that of topohydrophobic positions: towards an efficient punctuation of protein folding? Cell. Mol. Life Sci. 58, 492–498.
11. Berezovsky, I.N., Kirzhner, V., Kirzhner, A. & Trifonov, E.N. (2001) Protein folding: looping from hydrophobic nuclei. Proteins 45, 346–350.
12. Berezovsky, I.N. (2003) Discrete structure of van der Waals domains in globular proteins. Protein Engineering 16, 161–167.
13. Berezovsky, I.N. & Trifonov, E.N. (2001) Van der Waals locks: loop-n-lock structure of globular proteins. J. Mol. Biol. 307, 1419–1426.
14. Poupon, A. & Mornon, J.P. (1998) Populations of hydrophobic amino acids within protein globular domains; identification of conserved ‘topohydrophobic’ positions. Proteins 33, 329–342.
15. Poupon, A. & Mornon, J.P. (1999) ‘Topohydrophobic positions’ as key markers of globular protein folds. Theoret Chem. Accounts 101, 2–8.
16. Poupon, A. & Mornon, J.P. (1999) Predicting the protein folding nucleus from sequences. FEBS Lett. 452, 283–289.
17. Berezovsky, I.N. & Trifonov, E.N. (2002) Loop fold structure of proteins: resolution of Levinthal’s paradox. J. Biomol. Struct. Dynamics 20, 5–6.
18. Skolnick, J. & Kolinski, A. (1991) Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J. Mol. Biol. 221, 499–531.
19. Labesse, G., Colloc’h, N., Pothier, J. & Mornon, J.P. (1997) P-SEA: a new efficient assignment of secondary structure from C alpha trace of proteins. Comput Appl. Biosci. 13, 291–295.
20. Miyazawa, S. & Jernigan, R.L. (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J. Mol. Biol. 256, 623–644.
21. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. & Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
22. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
23. Fersht, A. & Daggett, V. (2002) Protein folding at atomic resolution. Cell 108, 573–582.
24. Fersht, A. (1997) Nucleation mechanisms in protein folding. Curr. Opin. Struct. Biol. 7, 3–9.
25. Ittah, V. & Haas, E. (1995) Nonlocal interactions stabilize long range loops in the initial folding intermediates of reduced bovine pancreatic trypsin inhibitor. Biochemistry 34, 4493–4506.
26. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J., Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M. & Schwalbe, H. (2002) Long-range interactions within a non-native protein. Science 295, 1719–1722.