On a Generic Shape Complementarity Score

Horea T. Ilieş∗
Morad Behandish
Motivation. The ability to quantify shape complementarity (i.e., a measure for the ‘goodness of
fit’) of geometric interfaces appears fundamental to applications as diverse as mechanical design and
manufacturing automation, robot motion planning and navigation, protein docking and rational
drug design, and more broadly wherever the behavior or function of a system depends on proper geometric alignment (i.e., interfaceability) of its constituents. However, the current
challenge lies in the lack of a generic mathematical formulation that applies to objects of arbitrary
shape, and obtaining a general measure of interfaceability without simplifying assumptions on the
shape domain remains an open problem.
In spite of the substantial amount of research on ad-hoc measures of shape complementarity
for protein complexes (i.e., finite arrangements of spherical atoms) reviewed in [5, 8], the problem
is scarcely studied for objects with arbitrarily complex surface features [1]. Here we propose a novel
formulation and computational framework for objects of arbitrary shape in Euclidean 3-space,
potentially extensible to higher dimensions, built on a generalization of the ideas that are in use in
the most recent protein docking systems [2, 4].
Formulation. Given two sets $S_1, S_2 \in \mathcal{S}$ in 3-space, where $\mathcal{S} \subset \mathrm{E}^3$ represents the
collection of all ‘well-defined’ solid objects (here specified as compact regular semi-analytic subsets
of the Euclidean metric space $\mathrm{E}^3 = (\mathbb{R}^3, d)$ with the usual $L^2$-norm as the metric
$d(x, y) = \|x - y\|_2$ for all $x, y \in \mathbb{R}^3$ [6]), the basic idea is to formulate the so-called shape
complementarity score function $f : SE(3) \to \mathbb{R}$ as a cross-correlation of the form

\[
f(t; S_1, S_2) = (\rho_1 * \rho_2)(t) = \int_{\mathbb{R}^3} \rho_1(x)\, \rho_2(t^{-1} x)\, dx, \tag{1}
\]

where $SE(3) \cong SO(3) \ltimes \mathbb{R}^3$ is the Special Euclidean group (i.e., the group of all rigid body
transformations), $*$ is the convolution operator, and $dx$ is the infinitesimal volume element in $\mathrm{E}^3$.
The functions $\rho_{1,2} = \rho(x; S_{1,2})$ are shape descriptors (called affinity functions) that are invariant
under rigid body motion, i.e., $\rho(x; tS) = \rho(t^{-1} x; S)$ for all $x \in \mathbb{R}^3$, $t \in SE(3)$, and
$S \in \mathcal{S}$. How can one define the affinity function $\rho : (\mathbb{R}^3 - \partial S) \to \mathbb{R}$ (or $\mathbb{C}$; see footnote 1)
for a given shape in such a way that the integral in (1) produces a higher score when there is a better
geometric fit between the surface features of the stationary solid $S_1$ and the displaced solid $tS_2$?
This raises a more fundamental question: what exactly do we mean by the ‘goodness of fit’?
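To make the structure of (1) concrete, the following is a minimal sketch that evaluates the score for pure translations only, on a periodic voxel grid, using the FFT. It is not the authors' implementation: the function names, the grid size, and the random test densities are illustrative assumptions, rotations in SO(3) would have to be sampled separately, and non-periodic shapes would need zero-padding. The real part is taken because the affinity functions may be complex-valued (footnote 1).

```python
import numpy as np

def translational_scores(rho1, rho2):
    """Re{ sum_x rho1[x] * rho2[x - tau] } for every cyclic grid shift tau.

    The sum is a convolution of rho1 with the point-reflected rho2, so it can
    be evaluated with FFTs. No complex conjugate is applied to rho2, matching
    the form of (1); conj(fftn(conj(rho2))) is the transform of rho2[-x].
    """
    f1 = np.fft.fftn(rho1)
    f2_reflected = np.conj(np.fft.fftn(np.conj(rho2)))
    return np.real(np.fft.ifftn(f1 * f2_reflected))

def direct_score(rho1, rho2, tau):
    """Brute-force reference for a single shift tau (a tuple of 3 integers)."""
    shifted = np.roll(rho2, shift=tau, axis=(0, 1, 2))  # shifted[x] = rho2[x - tau]
    return float(np.real(np.sum(rho1 * shifted)))

# Illustrative data (assumption): a random complex density and a partner that
# is its complex conjugate, displaced by -s, so the best shift should be s.
rng = np.random.default_rng(0)
shape = (8, 8, 8)
rho1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
s = (2, 5, 1)
rho2 = np.conj(np.roll(rho1, shift=tuple(-k for k in s), axis=(0, 1, 2)))

scores = translational_scores(rho1, rho2)
best_tau = np.unravel_index(np.argmax(scores), scores.shape)
```

In this toy setup the highest score is attained at the shift that realigns the two densities, which is the sense in which maximizing (1) over rigid motions searches for the best fit.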

∗ Computational Design Lab, Departments of Mechanical Engineering and Computer Science and Engineering,
University of Connecticut. Authors may be reached at [email protected] and [email protected].

¹ Our particular choice of the kernel used in the definition of the affinity function excludes the boundary from its
domain, which does not affect (1) when dealing with solid objects that have nowhere-dense boundaries [7]. The range
is extended to the complex plane for practical reasons to be explained shortly. In this case, the definition in (1) needs to
be modified to $\mathrm{Re}\{\rho_1 * \rho_2\}$ to ensure an ordering on the range of the score function.

Figure 1: Shape complementarity score profiles for (a-b) assembly of two mechanical parts, with
non-trivial fit correspondence between mating features; and (c-e) bound-bound docking of a part
of Ran GTPase in complex with NTF-2 [PDB Code: 1A2K].

We start from an intuitive qualitative definition: a generic shape complementarity score model
for objects of arbitrary shape can be obtained from a comparative overlapping of shape skeletons
between the mutually complementary features, i.e., by overlapping the external skeleton of one object
with the internal skeleton of its assembly partner. For a precise quantitative formulation, we use our
novel concept of the Skeletal Density Function (SDF), which can be conceptualized as a continuous
extension of the definition of the traditional shape skeletons:
\[
\rho(x; S) = \oint_{\partial S} \phi\big[ M(x, S)\, d(x, \partial S) + i\, d(x, y) \big]\, dy_{\perp}, \tag{2}
\]
where $M : \mathbb{R}^3 \times \mathcal{S} \to \{-1, 0, +1\}$ is the Point Membership Classification (PMC) function [9],
the three integer outcomes coding ‘in’, ‘on’, and ‘out’, respectively, yielding the signed distance
function as the real part inside brackets; $dy_{\perp}$ is the projection of the surface element $dy$ on the
plane normal to the vector $(y - x)$ for $x \in \mathbb{R}^3$ and $y \in \partial S$. The kernel $\phi : \mathbb{C} \to \mathbb{C}$ can be
defined in a variety of ways, a proper candidate being
$\phi(\zeta; \sigma) \propto (\sqrt{2\pi}\, \zeta^2)^{-1}\, g(|\tan \angle\zeta| - 1; \sigma)$, where
$g(x; \sigma) = (\sqrt{2\pi}\, \sigma)^{-1} \exp[-\frac{1}{2} (x/\sigma)^2]$ is the isotropic Gauss function. This particular form is composed of a
‘medial’ component (the Gaussian term) that characterizes the skeletal density, extending an
implicit definition of conventional skeletons, and a ‘proximal’ component (the inverse-square term)
that forces the skeletal branches toward stronger densities near the object boundaries. The latter also
sets the phase shift $\angle\phi = -2\angle\zeta$ of the integrand in (2), which induces (approximately)
opposite phases between the high-density internal and external skeletal regions.
This in turn yields meaningful contributions to the cross-correlation in (1): a positive
real ‘reward’ for external/internal skeletal overlap (i.e., proper fit), and a negative
real ‘penalty’ for external/external overlap (i.e., separation) or internal/internal overlap (i.e.,
collision). The relative strength of each contribution is adjusted by the proportionality coefficient
in the $\phi$-kernel, which is chosen to depend only on the sign of $\mathrm{Re}\{\zeta\}$.
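The sign argument above can be checked numerically. The sketch below implements the candidate kernel with both proportionality coefficients set to 1 and $\sigma = 0.1$ (both are assumptions; the text only says the coefficient depends on the sign of $\mathrm{Re}\{\zeta\}$), and then evaluates a planar analogue of (2) for a disc by midpoint quadrature. The names `phi`, `sdf_disc`, `c_in`, and `c_out` are hypothetical, and the projected element is assumed to be the unsigned quantity $dy\,|n \cdot (y - x)| / \|y - x\|$, with $n$ the boundary normal.

```python
import numpy as np

def gauss(x, sigma):
    """Gaussian g(x; sigma) as written in the text."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def phi(zeta, sigma=0.1, c_in=1.0, c_out=1.0):
    """Candidate kernel; c_in and c_out are hypothetical coefficients chosen
    by the sign of Re{zeta}. Not defined where Re{zeta} = 0 (the boundary)."""
    zeta = np.asarray(zeta, dtype=complex)
    coeff = np.where(zeta.real < 0.0, c_in, c_out)
    # |tan(angle(zeta))| - 1 vanishes where d(x, y) equals d(x, boundary).
    medial = gauss(np.abs(zeta.imag / zeta.real) - 1.0, sigma)
    # Inverse-square term: grows near the boundary, phase is -2 * angle(zeta).
    proximal = 1.0 / (np.sqrt(2.0 * np.pi) * zeta ** 2)
    return coeff * medial * proximal

def sdf_disc(x, radius=1.0, n=2000, sigma=0.1):
    """Planar analogue of (2) for a disc centred at the origin (assumption)."""
    x = np.asarray(x, dtype=float)
    theta = 2.0 * np.pi * (np.arange(n) + 0.5) / n
    normals = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    y = radius * normals                          # boundary sample points
    dy = 2.0 * np.pi * radius / n                 # arc-length element
    r = np.linalg.norm(x)
    membership = -1.0 if r < radius else 1.0      # PMC: -1 inside, +1 outside
    dist_to_boundary = abs(radius - r)
    diff = y - x
    dist_xy = np.linalg.norm(diff, axis=1)
    dy_perp = dy * np.abs(np.sum(normals * diff, axis=1)) / dist_xy
    zeta = membership * dist_to_boundary + 1j * dist_xy
    return np.sum(phi(zeta, sigma) * dy_perp)

# Kernel values at the 'medial' angle, inside (Re < 0) and outside (Re > 0).
a = 0.5
inner = phi(-a + 1j * a)
outer = phi(a + 1j * a)
fit = (inner * outer).real         # internal/external overlap
collision = (inner * inner).real   # internal/internal overlap
separation = (outer * outer).real  # external/external overlap
```

Under these assumptions `fit` is positive while `collision` and `separation` are negative, and `sdf_disc` has a positive imaginary part at interior points and a negative one at exterior points, with the largest interior magnitude at the disc centre, where the conventional skeleton of a disc lies.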
Validation. We will review several practical examples including those illustrated in Figure 1, and
demonstrate the effectiveness of the method in mechanical assembly automation (a-b) [3], as well as
ab initio protein docking (c-d). We will also investigate the theoretical and computational properties
of the new formulation in comparison with the state-of-the-art in protein docking algorithms (e) [2].
Conclusion. Our proposed approach to modeling shape complementarity is generic: it applies to arbitrarily complex shapes; produces results that are inherently robust against small perturbations; is effective in
steering both gradient-based and evolutionary optimization algorithms; possesses appealing computational properties that suggest efficient algorithms in 3D Euclidean space;
and subsumes the existing approaches to protein docking (of spherical atoms) as special cases.
References Cited
[1] Pankaj K. Agarwal, Herbert Edelsbrunner, John Harer, and Yusu Wang. Extreme elevation on
a 2-manifold. Discrete & Computational Geometry, 36(4):553–572, 2006.
[2] Chandrajit L. Bajaj, Rezaul Chowdhury, and Vinay Siddahanavalli. F2Dock: Fast Fourier
protein-protein docking. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8(1):45–58, 2011.
[3] Morad Behandish and Horea T. Ilies. Peg-in-hole revisited: A generic force model for haptic
assembly. In Proceedings of ASME Computers and Information in Engineering Conference
(CIE), 2014.
[4] Rezaul Chowdhury, Muhibur Rasheed, Donald Keidel, Maysam Moussalem, Arthur Olson,
Michel Sanner, and Chandrajit L. Bajaj. Protein-protein docking with F2Dock 2.0 and GB-rerank. PLoS ONE, 8(3):e51307, 2013.
[5] Miriam Eisenstein and Ephraim Katchalski-Katzir. On proteins, grids, correlations, and docking. Comptes rendus biologies, 327(5):409–420, 2004.
[6] Aristides Requicha. Mathematical models of rigid solid objects. Production Automation Project,
Technical Memo. No. 28, University of Rochester, 1977.
[7] Aristides Requicha and Robert B. Tilove. Mathematical foundations of constructive solid geometry: General topology of closed regular sets. Production Automation Project, Technical
Memo. No. 27, University of Rochester, 1978.
[8] David W. Ritchie. Recent progress and future directions in protein-protein docking. Current
Protein and Peptide Science, 9(1):1–15, 2008.
[9] Robert B. Tilove. Set membership classification: A unified approach to geometric intersection
problems. IEEE Transactions on Computers, 100(10):874–883, 1980.