Binding pocket shape analysis for protein function prediction

Binding pocket shape analysis for protein function
prediction
Richard J. Morris* , Abdullah Kahraman , Thomas Funkhouser , Rafael
Najmanovich , Gareth Stockwell , Fabian Glaser , Roman Laskowski , and
Janet M. Thornton
EMBL-EBI, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK
Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
1 Introduction
Structural Genomics initiatives are generating an increasing number of protein structures with
very limited biochemical characterisation. The analysis and functional assignment of protein
structure is a key challenge and a major bottleneck towards the goal of well-annotated genomes.
As shape plays a crucial rôle in biomolecular recognition and function, the development of
shape analysis techniques is important for understanding protein structure-function relationships.
Protein surfaces typically contain clefts of different shapes and sizes. It has been shown that
for enzymes, the active site is commonly found in the largest cleft, Laskowski et al. (1996).
This makes the detection of the active site relatively straightforward, and can be performed by
a program such as SURFNET, Laskowski (1995). SURFNET does this by placing expandable
spheres between protein atoms such that the radii of these spheres does not penetrate the two
protein atoms or any nearby atoms. The union of all such spheres is used by SURFNET to
describe the 3D cleft shape. Residue conservation can be used to improve the binding pocket
prediction, Glaser et al. (2005).
In this contribution, we present an efficient method for comparing protein binding pockets
based on a spherical harmonics expansion. Binding pocket shapes are approximated as
func
tions on the unit
sphere
by
describing
each
surface
point
by
its
spherical
coordinates
and setting , Morris et al. (2005). The same methodology can be applied to enhance the description by including other properties such as the electrostatic potential, Ritchie
and Kemp (1999, 2000).
2 Shape and spherical harmonics
Shape is a key concept in the molecular sciences. Especially in molecular biology, shape is
a major factor in virtually all processes within the cell as shape plays an important rôle in
molecular recognition. Shape is, however, not an absolute property at the molecular level, as
probes with differing electronic properties will experience different interactions and therefore
see different shapes. This complication will be ignored in the following and molecules will be
treated as rigid bodies with well-defined boundaries defined by their van der Waals radii.
Mathematically there exist many ways of describing and representing shapes including triangulations, iso-surfaces, polygons, equations, distance distributions and statistical landmark
theory. Here we use explicit, global
functions in the form of a real spherical harmonics ex
, are single-valued, infinitely differentiable, complex
pansion. Spherical harmonics, 91
functions of two variables, and , indexed by two integers, and . Spherical
harmonics
form a complete set of orthonormal functions and thus any function of and can be expanded
as
(2.1)
The expansion coefficients, , are obtained from
(2.2)
In this manner, a spectral decomposition (Fourier analysis) of any function on the unit sphere
may be performed. The lower values describe the rough low-resolution shape, whereas the
higher values add high-resolution detail to the picture.
Due to the orthonormality of the spherical harmonics, the expansion coefficients are unique
and can therefore be used directly for describing shapes. This numerical integration can be
a time-consuming undertaking and the results are often highly sensitive with respect to the
integration layout and the weights. We circumvent such problems and speed up the procedure
by using spherical-t-designs, Hardin
and Sloane (1996). The full integration over the unit sphere
for all and values up to
can be computed in about one hundredth of a second on a
standard single processor linux box.
Figure 1: The method in pictures. Top left: The surface of the protein is analysed and binding
pockets are predicted using the program SURFNET. Top right: The SURFNET binding pocket
prediction is reduced based on residue conservation scores. Bottom left: The real location of
the ligand. Bottom right: The shape of the binding pocket is expanded in spherical harmonics
and these coefficients are used to compare against coefficients describing other binding pockets
and ligands.
Shape expansion coefficients have been computed for all ligands in the PDB and for a number of protein binding pockets predicted from deposited protein structures. Shapes can be compared using a standard "! metric
92
Figure 2: The results of matching 40 binding pocket shapes to 40 ligands. Hits on the diagonal
indicate that the binding pocket shape matched best to the right ligand in the right conformation.
Off-diagonal hits mean either the right ligand in a different conformation or the wrong ligand.
The dimension is determined by place in 225-dimensional space.
. For !
(2.3)
these comparisons therefore take
3 Results
We present the results for a test dataset consisting of 40 low-similarity proteins that bind to either
ATP, NAD, a heme or a steroid. The shape matching matrix is shown in Fig. 2. As may be seen,
shape matching using spherical harmonics provides in many cases valuable information about
the ligand. The results shown here are for shape only. The inclusion of additional properties is
likely to enhance the performance further.
4 Conclusions
A fast shape-matching method has been described for capturing the global shape of a protein’s
binding pocket or ligand. The method can also be applied directly to the protein
itself.
The
surface is treated as a (single-valued) function of the spherical coordinates, and , that can
then be expanded in a linear combination of real spherical harmonics. If two objects share a
common orientation, these coefficients can be directly compared using the standard Euclidean
distance ( ! ) metric in -dimensional space, where is determined by the order of the spherical harmonics chosen to represent the shape. We employ a heuristic that defines a standard
frame of reference based on the first, second, and third moments of a 3D object. Although
93
spherical harmonics enjoy mathematically convenient rotational properties, it would be of advantage not have to perform any rotational superposition (optimisation) at all. In the field of
Computer Science this has led to the development of 3D shape retrieval systems based on rotation invariant descriptors, Funkhouser at al. (2003), Kazhdan at al. (2003). This approach
of course loses orientation information in that the original shape can obviously not be reconstructed from the rotation invariant descriptors, but overall the advantages of this method make
it superior to registration dependent approaches.
The main hurdle in this method is how to obtain an accurate prediction of the binding pocket.
Another difficulty is the above mentioned alignment and also how to handle conformational
changes in both protein and ligands.
Although there are potential problems with this approach, we have shown that for predicting
protein function, shape is a powerful concept that can be described and compared efficiently
using spherical harmonics.
References
Funkhouser, T., Min, P., Kazhdan, M., Chen, J., Halderman, A., Dobkin, D. and Jacobs, D.
(2003). A search engine for 3D models, ACM Transitions on Graphics, 22,1, 1-28.
Glaser, F., Morris, R. J., Najmanovich, R. J., Laskowski, R., and Thornton, J. M. (2005).
SURFNET + ConSurf: A two-stage methodology for accurate binding pocket prediction.
Submitted.
Hardin, R.H and Sloane, J. A. (1996). McLaren’s Improved Snub Cube and Other New Spherical Designs in Three Dimensions, Discrete and Computational Geometry, 15, 429-441.
Laskowski, R. (1995). SURFNET: A program for visualizing molecular surfaces, cavities, and
intermolecular interactions, J. Mol. Graph., 13, 323-330.
Laskowski, R., Luscombe, N. M., Swindells, M. B. and Thornton, J. M. (1996). Protein clefts
in molecular recognition and function, Protein Sci., 5. 2438-2452.
Kazhdan, M., Funkhouser, T. and Rusinkiewicz, S. (2003). Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors, in Eurographics Symposium on Geometry
Processing editors Kobbelt, Schröder and Hoppe.
Morris, R. J., Najmanovich, R., Kahraman, A., Thornton, J. M. (2005). Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics. 15 21(10):2347-55.
Ritchie, D. W. and Kemp, G. J. L. (1999). Fast Computation, Rotation, and Comparison of
Low Resolution Spherical Harmonic Molecular Surfaces, J. Comp. Chem., 20,4, 383395.
Ritchie, D. W. and Kemp, G. J. L. (2000). Protein Docking Using Spherical Polar Fourier
Correlations, Proteins: Structure, Function, and Genetics, 39,4, 178-194.
94