BIOINFORMATICS APPLICATIONS NOTE Vol. 17 no. 2 2001, Pages 202–203

RCNPRED: prediction of the residue co-ordination numbers in proteins

Piero Fariselli and Rita Casadio ∗

CIRB Biocomputing Unit, Department of Biology, University of Bologna, via Irnerio 42, 40126 Bologna, Italy

Received on July 20, 2000; revised on September 19, 2000; accepted on October 13, 2000

ABSTRACT

Summary: The RCNPRED server implements a neural network-based method that predicts the co-ordination numbers of residues from the protein sequence. Using evolutionary information as input, RCNPRED predicts the residue states of the proteins in the database with 69% accuracy, 12 percentage points higher than a simple statistical method. The server also implements a neural network that predicts the relative solvent accessibility of each residue. A protein sequence can be submitted directly to RCNPRED, and the residue co-ordination numbers and solvent accessibility for each chain are returned by e-mail.

Availability: Freely available to non-commercial users at http://prion.biocomp.unibo.it/rcnpred.html

Contact: [email protected]; [email protected]

In the post-genomic era, efficient automatic methods for predicting protein features are becoming increasingly important for coping with the amount of data produced by sequencing projects. We tackled the problem of predicting residue co-ordination numbers by developing a neural network-based method. Correct prediction of the co-ordination numbers of the residues in a sequence is particularly relevant to finding the correct protein fold. Methods that predict contacts between residue pairs can benefit from imposing constraints on the maximum number of contacts that each residue can make. Tools for predicting contacts among protein residues have been developed with some success (Thomas et al., 1996; Olmea and Valencia, 1997; Fariselli and Casadio, 1999).
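As a minimal illustration of how a per-residue contact limit could be imposed on a contact predictor, the Python sketch below greedily accepts predicted residue pairs in order of decreasing score and skips any pair that would give either residue more contacts than it is allowed. The function name `cap_contacts`, the greedy strategy and both input formats (scored index pairs, and a dictionary of per-residue maxima) are hypothetical; this is not a procedure described in this note.

```python
from typing import Dict, List, Tuple


def cap_contacts(
    scored_pairs: List[Tuple[int, int, float]],
    max_contacts: Dict[int, int],
) -> List[Tuple[int, int]]:
    """Greedily keep the highest-scoring residue pairs while respecting
    a per-residue upper bound on the number of contacts.

    scored_pairs: (i, j, score) tuples from some contact predictor (hypothetical input).
    max_contacts: residue index -> maximum allowed contacts (hypothetical input);
                  every index used in scored_pairs must be a key.
    """
    used = {i: 0 for i in max_contacts}
    accepted: List[Tuple[int, int]] = []
    # Visit candidate contacts from most to least confident.
    for i, j, _score in sorted(scored_pairs, key=lambda p: p[2], reverse=True):
        # Accept only if neither residue has reached its contact limit.
        if used[i] < max_contacts[i] and used[j] < max_contacts[j]:
            accepted.append((i, j))
            used[i] += 1
            used[j] += 1
    return accepted


if __name__ == "__main__":
    pairs = [(0, 5, 0.9), (0, 7, 0.8), (5, 7, 0.7), (0, 9, 0.6)]
    limits = {0: 2, 5: 1, 7: 2, 9: 1}
    print(cap_contacts(pairs, limits))  # [(0, 5), (0, 7)]
```

In the example, pair (5, 7) is rejected because residue 5 has already used its single allowed contact, and pair (0, 9) is rejected because residue 0 has reached its limit of two.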
The correct assignment of the positions of residue contacts in proteins has proven extremely effective for determining the three-dimensional structure of a given protein, as recently demonstrated in the CASP3 competition (Ortiz et al., 1999).

The present method was trained to discriminate between two states of residue contact number (or co-ordination number). For each residue type, the contact number at a given position of the protein sequence can be greater or lower than the average value of the contact distribution of that residue type in the database. The residue co-ordination number is computed by centring a spherical cut-off on each residue and counting the number of residues that fall inside the defined volume (Flöckner et al., 1995). The contact distributions computed from the database for each of the 20 residue types have different average values, so the threshold that discriminates whether a contact number is greater or lower than the average is residue-specific. This procedure allows a direct comparison between the contact numbers of different residue types, irrespective of their size and steric hindrance.

We have previously shown that the best-performing method for predicting residue co-ordination numbers is a neural network trained with evolutionary information in the form of sequence profiles (Fariselli and Casadio, 2000). The neural network implemented in RCNPRED operates on an input window of 15 residues. This choice was made because the prediction accuracy is unaffected by changing the window size from 7 to 17 residues. The number of hidden neurons is set to 8 for similar reasons (the explored range was from 2 to 32 nodes). A baseline predictor (Richardson and Barlow, 1999) was used as the simplest possible predictor against which to score the accuracy of the neural network.

∗ To whom correspondence should be addressed.
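The definition of the co-ordination number and the residue-specific two-state threshold described above can be sketched in Python. This is a sketch under stated assumptions: each residue is represented by a single point (for example its Cα atom), the cut-off radius is left as a parameter because its value is not given here, and a contact number exactly equal to the type average is assigned to the "lower" state. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]  # one representative point per residue (assumption)


def contact_numbers(coords: Sequence[Coord], cutoff: float) -> List[int]:
    """For each residue, count the other residues within `cutoff` of it."""
    counts = []
    for i, ci in enumerate(coords):
        n = sum(1 for j, cj in enumerate(coords) if j != i and math.dist(ci, cj) <= cutoff)
        counts.append(n)
    return counts


def type_averages(chains: List[Tuple[str, List[int]]]) -> Dict[str, float]:
    """Average contact number of each residue type over (sequence, counts) chains."""
    totals: Dict[str, float] = defaultdict(float)
    seen: Dict[str, int] = defaultdict(int)
    for seq, counts in chains:
        for aa, c in zip(seq, counts):
            totals[aa] += c
            seen[aa] += 1
    return {aa: totals[aa] / seen[aa] for aa in totals}


def two_state_labels(seq: str, counts: List[int], averages: Dict[str, float]) -> str:
    """'+' if the contact number exceeds the residue-type average, otherwise '-'
    (ties go to '-', an assumption)."""
    return "".join("+" if c > averages[aa] else "-" for aa, c in zip(seq, counts))


if __name__ == "__main__":
    # Toy chain of four residues on a line; the cut-off value is arbitrary.
    counts = contact_numbers([(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)], cutoff=7.0)
    averages = type_averages([("AGAG", counts)])
    print(counts, averages, two_state_labels("AGAG", counts, averages))
    # [2, 2, 2, 0] {'A': 2.0, 'G': 1.0} -+--
```

Because each residue is compared with the average of its own type, the second residue (G, two contacts, type average 1.0) is labelled "+" while the first (A, two contacts, type average 2.0) is not.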
The neural network trained with evolutionary information as input performs 12 percentage points better than the baseline predictor (Fariselli and Casadio, 2000).

Although a strict connection between accessibility and contact number is commonly accepted, for each residue type the surface accessibility in the database is distributed differently from the number of residue contacts (Fariselli and Casadio, 2000). RCNPRED therefore also implements a neural network that predicts whether or not a given residue is exposed to the solvent. Evolutionary information is used as input, similarly to previously described methods (Rost and Sander, 1994; Cuff and Barton, 2000). The two-state prediction of solvent accessibility reaches an accuracy of 76% on a cross-validated set of 651 proteins with low sequence identity (<25%) (Fariselli and Casadio, 2000).

© Oxford University Press 2001

S         = Query sequence (residue 1-letter code)
CN        = Range of predicted residue co-ordination number
Prob      = Probability of the residue contact assignment
RACC      = Relative accessibility (Exposed (E) >= 16%, Buried (B) < 16%)
P(E)/P(B) = Network outputs relative to the exposed (E) or buried (B) classes, respectively.

____________________________________________________________________
S  CN                                    Prob  RACC  P(E)   P(B)
____________________________________________________________________
G  6<= CN <= 12 with probability of      0.98  E     0.995  0.005
K  5<= CN <= 10 with probability of      0.95  E     0.999  0.001
K  5<= CN <= 10 with probability of      0.96  E     0.999  0.001
K  5<= CN <= 10 with probability of      0.93  E     0.999  0.001
D  5<= CN <= 11 with probability of      0.90  E     0.994  0.006
R  5<= CN <= 11 with probability of      0.98  E     0.987  0.013
K  5<= CN <= 10 with probability of      0.91  E     0.950  0.050
G  0<= CN <= 6  with probability of      0.94  E     0.925  0.075
E  0<= CN <= 5  with probability of      0.84  E     0.847  0.153
D  0<= CN <= 5  with probability of      0.79  E     0.950  0.050
A  0<= CN <= 6  with probability of      0.51  B     0.252  0.748
R  5<= CN <= 11 with probability of      0.75  E     0.825  0.175
Y  6<= CN <= 11 with probability of      0.73  E     0.749  0.251
.................................

Fig. 1. The output of RCNPRED. S is the query sequence given by the user and pasted into the web interface. CN (the co-ordination number) gives the range of the predicted number of contacts for each residue. Prob is the reliability of the prediction (its range is [0, 1]). RACC is the predicted relative residue accessibility (buried (B) or exposed (E)); the classification depends on whether the relative accessibility is lower or higher than 16%. The associated probability values (P(B) and P(E)) are also computed.

The architecture of the RCNPRED server is extremely simple. It takes a single sequence from the web page and uses the PSI-BLAST program to search SWISSPROT for similar sequences. A script then uses the PSI-BLAST output directly to build a sequence profile suitable as network input. The two types of prediction (residue co-ordination number and relative solvent accessibility) are then computed, and both results are mailed back to the user in ASCII format.

A typical server output is shown in Figure 1, where the predicted features of each residue of the query sequence are reported in columns. After the query residue (S), the first column lists the predicted range of the residue co-ordination number (CN), giving the minimum and maximum predicted values. The second column gives the confidence level of the prediction (Prob), evaluated as the absolute value of the difference between the two output values of the network. This is a real number ranging from 0 (lowest reliability) to 1 (highest reliability). The final columns report the relative accessibility of each residue as predicted by our system.
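The prediction steps described above (a sequence profile built from the PSI-BLAST alignment, a 15-residue input window, 8 hidden neurons, two network outputs, and Prob computed as the absolute difference between them) can be sketched in plain Python. The profile format (per-position frequencies of the 20 amino acids, gaps ignored), the zero padding at the chain ends, the sigmoid units and the random, untrained weights are assumptions made only to keep the example runnable; the sketch does not parse real PSI-BLAST output, and all function names are hypothetical. The 16% cut-off is the one used for the RACC labels.

```python
import math
import random
from typing import List, Sequence, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"  # assumed profile column order
WINDOW = 15                  # input window size given in the text
HIDDEN = 8                   # number of hidden neurons given in the text


def frequency_profile(aligned: Sequence[str]) -> List[List[float]]:
    """Per-position amino-acid frequencies from equal-length aligned sequences.
    Assumption: gaps ('-') and non-standard letters are ignored."""
    profile = []
    for col in range(len(aligned[0])):
        counts = [0] * len(AA)
        for seq in aligned:
            idx = AA.find(seq[col])
            if idx >= 0:
                counts[idx] += 1
        total = sum(counts)
        profile.append([c / total if total else 0.0 for c in counts])
    return profile


def window_input(profile: List[List[float]], i: int, w: int = WINDOW) -> List[float]:
    """Concatenate the profile rows centred on residue i, zero-padding past the chain ends."""
    half = w // 2
    x: List[float] = []
    for k in range(i - half, i + half + 1):
        x.extend(profile[k] if 0 <= k < len(profile) else [0.0] * len(AA))
    return x


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_weights(seed: int = 0):
    """Hypothetical untrained weights: WINDOW*20 inputs -> HIDDEN units -> 2 outputs."""
    rng = random.Random(seed)
    n_in = WINDOW * len(AA)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(HIDDEN)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(2)]
    return w1, [0.0] * HIDDEN, w2, [0.0, 0.0]


def forward(x: List[float], w1, b1, w2, b2) -> Tuple[float, float]:
    """One hidden layer of sigmoid units followed by two sigmoid outputs."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    o = [sigmoid(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(w2, b2)]
    return o[0], o[1]


def reliability(o1: float, o2: float) -> float:
    """Prob column: absolute difference between the two network outputs."""
    return abs(o1 - o2)


def accessibility_label(relative_accessibility: float) -> str:
    """Two-state label: exposed (E) at >= 16% relative accessibility, else buried (B)."""
    return "E" if relative_accessibility >= 0.16 else "B"


if __name__ == "__main__":
    query = "GKKKDRKGEDARY"
    # Toy alignment (query first); the homologues are made up for illustration.
    prof = frequency_profile([query, "GRKKDRKGEDAKY", "AKKKE-KGQDARF"])
    weights = random_weights()
    for i, aa in enumerate(query):
        o1, o2 = forward(window_input(prof, i), *weights)
        print(aa, f"{reliability(o1, o2):.2f}")
```

With untrained weights the printed Prob values carry no meaning; the point is the data flow from alignment to profile, window, network outputs and reliability score.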
The RACC column labels each residue as buried (B) or exposed (E), and the P(B) and P(E) columns give the associated probability values. The decision threshold for this prediction is set at 16% relative solvent accessibility (Rost and Sander, 1994).

In summary, RCNPRED predicts whether a given residue, depending on its sequence context, has a number of contacts greater or lower than the average value for its residue type in the database. This type of classification is complementary to the prediction of residue solvent accessibility and can be used to improve protein structure prediction.

ACKNOWLEDGEMENTS

Contract grant sponsors were the Ministero dell'Università e della Ricerca Scientifica e Tecnologica (MURST) and the Italian Consiglio Nazionale delle Ricerche (CNR).

REFERENCES

Cuff,J.A. and Barton,G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40, 502–511.

Fariselli,P. and Casadio,R. (1999) Neural network based predictor of residue contacts in proteins. Protein Eng., 12, 15–21.

Fariselli,P. and Casadio,R. (2000) Prediction of the number of residue contacts in proteins. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 146–151.

Flöckner,H., Braxenthaler,M., Lackner,P., Jaitz,M., Ortner,M. and Sippl,M.J. (1995) Progress in fold recognition. Proteins, 3, 376–386.

Olmea,O. and Valencia,A. (1997) Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des., 2, S25–32.

Ortiz,A.R., Kolinski,A., Rotkiewicz,P., Ilkowski,B. and Skolnick,J. (1999) Ab initio folding of proteins using restraints derived from evolutionary information. Proteins, 3 (Suppl.), 177–185.

Richardson,C.J. and Barlow,D.J. (1999) The bottom line for prediction of residue solvent accessibility. Protein Eng., 12, 1051–1054.

Rost,B. and Sander,C. (1994) Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216–226.

Thomas,D.J., Casari,G. and Sander,C. (1996) The prediction of protein contacts from multiple sequence alignments. Protein Eng., 9, 941–948.