Protein Structure Prediction on GPU:
a Declarative Approach in a Multi-agent Framework

Federico Campeotto(1),(2), Agostino Dovier(2), Enrico Pontelli(1)
[email protected], [email protected], [email protected]
(1) New Mexico State University, Las Cruces, NM, USA
(2) University of Udine, DIMI, Udine, Italy
Abstract—This paper provides a novel perspective on the Protein Structure Prediction (PSP) problem. The PSP problem focuses on determining putative 3D structures of a protein starting from its primary sequence. The proposed approach relies on a multi-agent paradigm, where concurrent agents explore the folding of different parts of a protein. The strength of the approach lies in the agents’ ability to apply different types of knowledge (expressed in the form of declarative constraints) to prune the local space of folding alternatives. The paper demonstrates the suitability of a GPU approach to implement such a multi-agent infrastructure, with significant improvements in speed and quality of solutions w.r.t. other methods (e.g., those based on fragment assembly).
I. INTRODUCTION
The prediction of the 3D structure of a protein from a
sequence of amino acids is one of the most popular problems
in Bioinformatics [1]. The large number of conformations in
which a protein can potentially fold, together with the lack
of an accurate energy function that can guide the folding
process, makes the Protein Structure Prediction (PSP) problem
challenging even for proteins of relatively short length.
In this paper, we tackle the PSP problem using a multi-agent
approach, to concurrently explore and assemble foldings of
local segments of the protein. These agents are in charge of
retrieving, filtering, and coordinating local information about
parts of a protein, aiming to reach a global consensus. Relationships among substructures are expressed as constraints,
where new knowledge about the protein (e.g., properties of the
amino acids) can be readily integrated and used to prune the
space of potential conformations. This model is effectively implemented on a General-Purpose Graphics Processing Unit (GPU) architecture, leading to significant gains in execution time compared to a sequential implementation.
GPUs offer parallelism at relatively low cost and they elegantly
accommodate the proposed PSP multi-agent infrastructure. In
this framework, we present a solver that performs local search,
checking the consistency of the constraints until it reaches
a local minimum. Agents use GPU cores to explore large
portions of the search space and to propagate constraints on
the ensemble of structures produced by each agent.
II. RELATED WORK
Ab-initio prediction methods for PSP simulate the folding
process by using knowledge about the chemical structure of
the protein and the laws of physics. They usually require a
large amount of computational resources, and some results
have been obtained from the adoption of distributed computing [2] and high-performance supercomputers [3]. GPU
technology has also been used for ab-initio prediction [4].
An alternative strategy is represented by the comparative
modeling approach, which assumes that a limited set of
structural motifs can represent the majority of the existing
protein structures. In I-TASSER [5] a comparative modeling
approach is improved with a hierarchical strategy. Given
a target sequence, I-TASSER first generates some protein
templates through “threading” techniques. Then, it assembles template fragments to generate a set of candidate structures, from which it extracts the structure with minimum energy. The I-TASSER server [6] was ranked first among protein structure prediction servers in the recent CASP editions [7].
Declarative techniques have also been extensively employed
to model the PSP problem. Using Constraint Programming (CP), it is easy to describe spatial properties of the unknown protein in terms of geometric constraints. In these models, it is common to superimpose the protein structure on a discretized representation of the three-dimensional space, often organized as a crystal lattice structure. Successful results have been obtained both for the simple two-amino-acid (HP) model, by solving a constraint optimization problem [8], and for the complete model [9] that uses all the amino acid types. A promising idea has been adopted in [10], where the 3D conformation of a protein is predicted via protein fragment assembly, modeled in terms of finite domain constraints and implemented using a constraint logic programming system.
The use of multi-agent systems in the context of the PSP problem has been relatively limited. An agent-based framework for ab-initio simulations combining the predictions of five different tools is presented in [11], while in [12] Concurrent Constraint Programming is used, with a blackboard enabling communication between agents. Another multi-agent framework based on blackboards is presented in [13], while in [15] agents implement a reinforcement learning approach to solve the PSP problem in the HP model.
III. BACKGROUND
A protein is a polymer composed of a sequence of simple
molecules, called amino acids. This sequence, referred to as
the primary sequence, folds into a unique stable structure that
represents the conformation with minimum free energy; such
structure determines the biological function of the protein. We
refer to this structure as the tertiary structure of the protein.
The Protein Structure Prediction (PSP) problem is defined as
the problem of finding the tertiary structure of a target protein
given its primary sequence. There are 20 types of amino acids.
Each one shares a common structure, called the backbone, consisting of two carbon atoms, one hydrogen, one oxygen, and one nitrogen atom. The distinction between amino acids originates from an additional group of 1–18 atoms, called the side-chain. In a simple model, this group
can be represented by a single large centroid (group R). The
centroid is linked to the central carbon atom Cα (see Fig. 1–
left), and its size and distance from Cα can be statistically
determined for each amino-acid type.
Fig. 1: Representation of an amino acid (Left), and the peptide
plane, the peptide bond that binds two amino acids, and the
Phi (φ) and Psi (ψ) dihedral angles (Right).
Two consecutive amino acids are linked together by a peptide bond, which connects the C–O group of the first amino acid to the N–H group of the next one. Due to the double-bond character of the peptide bond, these atoms lie on the same plane, known as the peptide plane (Fig. 1–right). Therefore, the backbone has only two degrees of freedom per amino acid: (1) the rotation around the N–Cα bond (φ angle), and (2) the rotation around the Cα–C bond (ψ angle). A series of amino acids joined by peptide bonds forms the backbone chain of the protein, whose 3D coordinates describe the whole tertiary structure. Sequences of φ and ψ angles, defined by the backbone atom sequences C–N–Cα–C and N–Cα–C–N, respectively, exactly determine the tertiary structure (as 3D coordinates of atoms), and vice-versa. Specific ranges of the torsional
angles determine also particular local structures, referred to
as secondary structures of the protein. Secondary structure
elements often assume regular and repetitive spatial shapes,
being constrained by the hydrogen bonds formed by local
inter-residue interactions. The most common examples are the
α-helices and the β-sheets. Other parts of the polypeptide,
instead, present less regular structures. They allow the folds
between the structured parts, and they are usually identified
by loops and turns.
We use an energy function E as the objective function to measure the “quality” of a given structure. The lower the energy value, the closer the structure is to the real tertiary structure. E is the weighted sum of three components: (a) the contact potential component, which uses the statistical table of contact energies described in [14]; (b) the torsional and correlation potential component, a statistical potential for the torsional angles and for the correlation between amino acids in the protein’s sequence, which uses a pre-calculated table of energies; and (c) the hydrogen bond potential component [16], based on the orientation-dependent hydrogen bonding potential between pairs of N–H groups and O atoms.
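The weighted-sum structure of E can be sketched as follows. This is a minimal sketch: the component bodies below are illustrative placeholders standing in for the statistical tables cited above, and the function names and weights are our assumptions, not the paper’s implementation.

```python
# Sketch of the energy function E as a weighted sum of three components.
# The component bodies are placeholders: the real system looks up
# statistical tables ([14], [16]) instead of these toy formulas.

def contact_potential(structure):
    # placeholder for the tabulated contact energies of [14]
    return sum(structure)

def torsional_correlation_potential(structure):
    # placeholder for the pre-calculated torsional/correlation table
    return 0.5 * sum(structure)

def hydrogen_bond_potential(structure):
    # placeholder for the orientation-dependent H-bond potential [16]
    return -0.25 * sum(structure)

def energy(structure, weights=(1.0, 1.0, 1.0)):
    """E = w_a * contact + w_b * torsional/correlation + w_c * h-bond."""
    w_a, w_b, w_c = weights
    return (w_a * contact_potential(structure)
            + w_b * torsional_correlation_potential(structure)
            + w_c * hydrogen_bond_potential(structure))

# Agents compare candidate structures by this value: lower is better.
print(energy([1.0, 2.0, 3.0]))  # 6.0 + 3.0 - 1.5 = 7.5
```

Because the weights are a parameter, the two agent families described later (structure and coordinator agents) can share one energy implementation while using differently tuned weights.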
IV. PROBLEM FORMALIZATION
A Constraint Satisfaction Problem (CSP) P is a triple P = ⟨X, D, C⟩, where X = {x1, ..., xn} is a set of finite-domain variables, D = {D1, ..., Dn} is the corresponding set of domains, and C = {C1, ..., Cm} is a finite set of constraints on X. A constraint can be described as a subset Ci ⊆ Di1 × ··· × Diki of the Cartesian product of the domains of the variables xi1, ..., xiki, for some subset {i1, ..., iki} ⊆ {1, ..., n}. Solving a CSP P consists of finding an n-tuple S = ⟨s1, ..., sn⟩ such that sj ∈ Dj for 1 ≤ j ≤ n and, for each 1 ≤ i ≤ m, ⟨si1, ..., siki⟩ ∈ Ci. A Constraint Optimization Problem (COP) is a pair C = (P, E), where P is a CSP and E : D1 × ··· × Dn → R is a cost function, and we seek a solution s of P such that E(s) is minimal.
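The definitions above can be checked mechanically. In the sketch below, constraints are given extensionally as sets of allowed tuples over a scope of variable indices; this encoding is for illustration only, not the solver’s internal representation.

```python
# A solution assigns each variable a value from its domain and satisfies
# every constraint; here a constraint is (scope, allowed), with scope a
# tuple of 0-based variable indices and allowed a set of value tuples.

def is_solution(assignment, domains, constraints):
    # every s_j must belong to D_j ...
    if any(s not in d for s, d in zip(assignment, domains)):
        return False
    # ... and every constraint C_i must contain the projected tuple
    return all(tuple(assignment[i] for i in scope) in allowed
               for scope, allowed in constraints)

domains = [{0, 1}, {0, 1}, {0, 1}]
# C1: x1 != x2;  C2: x2 = x3   (0-based indices)
constraints = [((0, 1), {(0, 1), (1, 0)}),
               ((1, 2), {(0, 0), (1, 1)})]

print(is_solution((0, 1, 1), domains, constraints))  # True
print(is_solution((1, 1, 1), domains, constraints))  # False (violates C1)
```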
Given a primary sequence of a protein of length n, we model the PSP problem as a COP C = (⟨X, D, C⟩, E) as follows. X = {x1, ..., xn} is a set of finite-domain variables, where the variable xi is associated with the i-th amino acid of the protein. D = {D1, ..., Dn} is the set of variable domains, where Di is a finite set of pairs of torsional angles ⟨φ, ψ⟩ that can be assigned to the i-th amino acid. C = {C1, ..., Cm} is a finite set of constraints over X describing geometric properties that the final structure must satisfy in order to be physically admissible. E is a cost function representing the energy of the corresponding structure. Finding a solution for C means finding an assignment S = ⟨s1, ..., sn⟩ of pairs of angles to the amino acids such that E(S) is minimum.
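On a toy instance, this COP can be solved by exhaustive enumeration; the energy below is a hypothetical stand-in for E that simply favors near-helical angle pairs, and the domains are illustrative.

```python
# Toy PSP-as-COP instance: each variable holds a (phi, psi) pair from a
# finite domain; we search for the assignment minimizing a cost function.
from itertools import product

IDEAL = (-60.0, -45.0)  # roughly alpha-helical torsion angles

def toy_energy(assignment):
    # hypothetical stand-in for E: squared distance from the ideal pair
    return sum((phi - IDEAL[0]) ** 2 + (psi - IDEAL[1]) ** 2
               for phi, psi in assignment)

domains = [[(-60.0, -45.0), (-120.0, 130.0)],   # D_1
           [(-60.0, -45.0), (-120.0, 130.0)],   # D_2
           [(-57.0, -47.0), (-120.0, 130.0)]]   # D_3

best = min(product(*domains), key=toy_energy)
print(best)  # the near-helical pair is chosen in every position
```

Real domains contain hundreds of ⟨φ, ψ⟩ pairs per amino acid, so exhaustive enumeration is infeasible; this is exactly why the paper replaces it with agent-driven local search on the GPU.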
V. THE MULTI-AGENT SYSTEM
A Multi-Agent System (MAS) is composed of several agents
in an environment, each with a certain degree of independence
and collaborating with other agents to solve a common problem [17]. These agents are partially autonomous and they have
a local (possibly partial) view of the global system. Usually, an
agent receives information from the external environment, and
applies some rules in order to determine the action to perform
on the environment. In this work the framework is organized
as a multi-level MAS to simulate the folding process (namely
to solve the COP described in Sect. IV). We describe the four
types of agents used here (see Fig. 2).
a) The Supervisor Agent: Its task is to assign different sub-sequences of the primary sequence to the immediately underlying types of agents—the coordinator and the structure agents—and to guide the entire folding process towards a stable global configuration. The supervisor agent sets a priority order among these agents. First, each “highly constrained” secondary structure (α-helix/β-sheet) is computed by the structure agents; afterwards, the coordinator agents (one in this work) are invoked to model the whole tertiary structure by moving loops or turns. The supervisor agent must also ensure that the structures determined by each structure agent can be merged into a global structure that can be effectively folded by the coordinator agent. We assume that the offsets on the primary structure of the locations of α-helices and β-sheets are given by an external source (e.g., a secondary structure assignment server).
b) The Coordinator and the Structure Agents: they are assigned to a segment of the protein, namely to a set of variables X′ = {xp, ..., xq} ⊆ X with domains Dp, ..., Dq, and to a list of associated worker agents (wa) wap, ..., waq.
The structure agent applies a beam search strategy on the space of solutions. Assume an initial structure is given (initially set as a straight configuration). For each variable xi ∈ X′, the structure agent invokes the corresponding worker agent wai, which instantiates in parallel the variable xi with all the possible values in the domain Di, while leaving the values of the variables xj with j ≠ i unchanged. These “multiple” tries produce a set of structures, obtained by rotating the torsional angles of the i-th amino acid of the target sequence. Rotations are performed on the structure given as input to wai. To select the best assignment choice, a parallel implementation of the energy function E calculates the energy values for each structure in the set. The variable corresponding to the assignment with minimum energy is assigned and the relative structure is set as the new current structure (particular cases and backtracking are handled).
The coordinator agent executes a space sampling. Assume again that an initial structure is given; then, for each variable xi ∈ X′, the coordinator agent invokes the corresponding worker agent wai, which instantiates in parallel the variable xi with all the possible values in the domain Di. For each assignment of xi, a number of random assignments of the remaining variables of X′ is concurrently explored. We select the energetically best among these structures. If it improves the “best solution”, a new “best solution” is found (a local minimum). Otherwise, we just store the assignment xi/dj that led to the best, leaving the other variables unassigned, and repeat the process with X′ \ {xi}. The computation terminates with a new “best solution” or when all variables in X′ are labeled (a local, non-improving, minimum).
The two families of agents use different sets of fragments as well as different weights for the energy function. In particular, the structure agent uses parameters tuned to form secondary structures such as α-helices and β-sheets, while the coordinator agent uses fragment sets and energy weights tuned for loop modeling.
c) Worker agents: they are the lower-level agents of the system. Each of them has direct control over one variable xi and explores its domain Di. A worker agent can be associated either with a coordinator or with a structure agent. Assignment and consistency checking of constraints are performed in parallel on a GPU; a set of admissible structures (i.e., structures that satisfy all the constraints) is returned to the higher-level agent.
Fig. 2: The overall framework
VI. IMPLEMENTATION
We implemented the MAS presented in the previous sections using NVIDIA’s Compute Unified Device Architecture (CUDA), a software and hardware environment that facilitates the adoption of GPUs in general purpose computing. The underlying framework of CUDA is Single-Instruction Multiple-Thread (SIMT): the same instructions can be executed by different threads that run on identical cores (on different data). A general CUDA application consists of sequential code, executed by the host (CPU), and parallel code, executed by the device (GPU). The code called by the host and executed by the device is called a kernel; it is written in standard C code and is defined as a grid of independent blocks that are assigned sequentially to the multiprocessors of the GPU (coarse-grain parallelism). In turn, a block is a set of concurrent threads (fine-grain parallelism), which leads to a high level of parallelism. In the rest of this section, we briefly explain how we use the parallelism offered by a GPU in order to assign variables, verify constraints, and calculate the energy function. Let us denote by Bj the j-th block of a kernel, and by Bj^z (resp., Bj^{1,...,m}) the z-th thread (resp., all the threads) of the block Bj.
d) Constraints: To relate finite domain variables (i.e.,
pairs hφ, ψi) to points in R3 (i.e., positions of the atoms
affected by the two angles), we define two constraints: (1)
the sang (single angle) constraint that calculates the set of
structures for the structure agent, and (2) the mang (multiple
angle) constraint that is used to sample the space of solutions
for the coordinator agent. To obtain physically admissible structures, we use an additional constraint, the alldistant constraint, which imposes a minimum distance between each pair of Van der Waals spheres of the atoms of a given structure.
sang constraint. Let xi be the variable labeled by the worker agent, Di the corresponding domain of size k, and S a list of 3D points. The worker agent invokes the kernel with k blocks and 1 thread per block. Each block Bj calculates a new structure S′j, given by the rotations of the torsional angles of the i-th amino acid of S, as described earlier.
mang constraint. For this constraint we set a maximum size M for the sample of structures given by the propagation. This limit is imposed in order to obtain a good compromise between the memory needed to store the sample of structures on the GPU and the time required to calculate a significant subset of the search space. Hence, the worker agent invokes the kernel with k blocks and m = ⌊(k + M)/k⌋ threads per block, since each thread is used to calculate a different structure. Let xi be the variable to label, xp, ..., xq the list of variables that belong to the same coordinator agent as xi, and D = Dp, ..., Di, ..., Dq the list of the corresponding domains. Given a structure S, the thread Bj^z calculates a new structure S′j^z by rotating the torsional angles of the i-th amino acid of S according to the value dj ∈ Di, together with the rotation of the torsional angles of the amino acids p, ..., q according to a random subset of Dp × ··· × Di−1 × Di+1 × ··· × Dq.
alldistant constraint. Given a set of k conformations of
a protein of length n, we verify whether each of them satisfies
the alldistant constraint in time O(n): we “unfold” the
sequential implementation that uses two nested loops for each
pair of atoms by invoking a kernel with a number of threads
per block equal to the length of the structure to check.
Every time the alldistant constraint is woken up by the labeling of a finite domain variable, a kernel with k blocks and n threads per block checks the consistency of the constraint on the set of k conformations obtained by the previous propagation of the sang and mang constraints. Observe that the number of threads per block always corresponds to the length of the structure, since each thread maps exactly one amino acid. The alldistant constraint is checked on the whole structure in order to avoid clashes also between atoms of amino acids that are not associated with the current set of variables to be labeled. To keep track of the non-admissible structures, we use a Boolean array V of size k, and we set the i-th cell of the array to False if the i-th structure of the set does not satisfy the constraint. All the cells of V are initially set to True. As soon as a thread Bi^j calculates a Euclidean distance between an atom of the j-th amino acid and another atom of the i-th structure that is less than the minimum threshold, Bi^j sets the i-th cell of V to False.
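The block/thread scheme above can be simulated sequentially. In this sketch the outer loop plays the role of the k blocks (one structure each) and the inner loop the role of the n threads (one amino acid each); each amino acid is reduced to a single representative atom for brevity.

```python
# Sequential simulation of the parallel alldistant check: V[i] is set to
# False as soon as any pair of atoms of structure i is closer than the
# minimum threshold. Requires Python 3.8+ for math.dist.
import math

def alldistant(structures, min_dist):
    k = len(structures)
    V = [True] * k                      # all structures admissible at start
    for i in range(k):                  # "block" i: the i-th structure
        atoms = structures[i]
        for j in range(len(atoms)):     # "thread" j: the j-th amino acid
            for a in range(len(atoms)):
                if a != j and math.dist(atoms[j], atoms[a]) < min_dist:
                    V[i] = False        # steric clash: mark as inadmissible
    return V

ok    = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
clash = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
print(alldistant([ok, clash], 3.0))  # [True, False]
```

On the GPU, each of the n threads performs only the O(n) inner loop for its own amino acid, which is how the per-structure check runs in O(n) time instead of the sequential O(n²).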
e) Energy Function: We implemented the energy function making use of a double level of parallelization: (1) given a
set of admissible structures, the energy value of each structure
is calculated in parallel by a number of blocks equal to the
size of the set, and (2) for a given structure each energy field
is calculated in parallel by a thread within the block.
To obtain linear time for the contact and the hydrogen bond potentials, we adopt the same strategy used for the alldistant constraint, since we need to calculate atom–atom distances for all pairs of amino acids of a given structure. Namely, if we consider a structure of length n, then we use n threads for both the contact potential and the hydrogen bond potential, and 2n threads for the correlation and torsional potentials. Hence, given the list P = S1, ..., Sk of structures of length n, it is possible to calculate the energy value of each structure by invoking a kernel with k blocks and 2n + 2 threads per block. Note that in the case of the structure agent we calculate
the energy value only on the substructure associated to such
agent, since the rest of the structure is not yet folded. To
avoid unnecessary computations, we check the Boolean array
of admissible structures V that has been set by the previous
propagation of the constraints. Hence, we calculate the energy
value for the i-th structure if and only if V[i] = True.
f) Implementation details about CUDA: Due to the features of the architectural model of CUDA, we must consider
three main aspects that can affect the performance of the
parallel computations: (1) the maximum number of threads per block, (2) the number of threads, called a warp, that can be physically executed in parallel on each processor of the GPU, and (3) the information stored in the device memory and the copies of data to and from the host memory. In this paper
we used a (typical) hardware where the maximum number of
threads per block is limited to 1024, and the size of a warp is
32. The first restriction could potentially limit the maximum
size of the target protein to 1024 when we use a thread per
amino acid (e.g., to implement the alldistant constraint).
Typical protein sizes, however, are less than 1024; if this is not the case, we can split the computation into multiple executions of the same kernel. Restriction (2) is important when we allow
threads of the same warp to diverge to different computational
branches. We split the parallel computation of different parts
of the kernel code among warps of 32 threads. For example,
the kernel computing the energy function is invoked with 2m + 64 threads per block, where m = ⌈n/32⌉ (instead of 2n + 2 threads per block): 2m threads are used to compute the contact and hydrogen bond potential, and 64 threads are used to compute the correlation and torsional potential. Aspect
(3) regards the optimization of the memory usage in order to
achieve maximum memory throughput. CUDA has different
types of memory spaces. Each thread block has access to a
small amount of a fast shared memory within the scope of
the block. In turn, all threads have access to the same global
memory. Global memory is much slower than shared memory
but it can store more data. Since applications should strive to
minimize data transfers between the host and the device (i.e.,
data transfers with low bandwidth), we reserve in the global
memory an array of size equal to the maximum number of structures expected for the sampling of the coordinator agent multiplied by the size of a structure. Moreover, we reserve an
array of Boolean values for the admissible structures and an
array of doubles for the energy values. Each kernel receives
the number of structures to be considered and the pointers to
the arrays in the global memory, in order to properly overwrite
them considering only the memory area affected by the kernel
function. The memory transfers to and from the CPU are
made at each labeling step by copying the array of structures
produced by the propagation of constraints and the array of the
energy values. Only the elements affected by the computation
of the kernel are transferred into the host memory. To optimize the computation on the device, we store in the shared memory of the GPU all the structures to be rotated, the structures on which the consistency check for the alldistant constraint is performed, and those on which the energy values are calculated, being careful not to exceed the maximum size available on the device used in this paper. The shared memory can
be a limitation when we manage a large number of structures
or very long proteins. Nevertheless, this is not a problem if
we consider that the size of our domains is about 300 and
the proteins are typically 200–300 amino acids long. These
numbers are compatible with the features of CUDA (e.g., max
number of threads per block), which makes this architecture
particularly efficient on our model.
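The warp-alignment consideration in point (2) amounts to rounding thread counts up to multiples of 32, so that threads assigned to different tasks never share a warp; a small sketch (the helper name is ours):

```python
# Round a thread count up to a whole number of 32-thread warps, so that
# divergent tasks can be separated at warp boundaries.
import math

def warp_round(threads, warp=32):
    return warp * math.ceil(threads / warp)

# e.g., the naive 2n + 2 = 202 threads for n = 100 occupy 7 warps:
print(warp_round(202))  # 224
```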
VII. EXPERIMENTAL RESULTS
We report some experimental results obtained from the
implementation of the multi-agent system. Experiments are
conducted on a Linux Intel Xeon E5645, 2.4 GHz machine.
The graphics card is an NVIDIA Tesla C2075. To evaluate the
speedup using the GPU, we implemented a sequential version
of the multi-agent system and we compared the computational
times of the two implementations on a set of target proteins
of different lengths. Since the coordinator uses a randomized
search strategy, we report the average values of 4 runs.
g) Quality evaluation of the predictions: We study the
quality of the predicted structures in terms of the RMSD¹ value given by the comparison with the real, known structure. Moreover, we compared our system with the state-of-the-art I-TASSER algorithm in the following way: we selected a subset
of significant proteins among those used for the benchmark III
presented in [5], and we compared the RMSD values between
the best structures found by I-TASSER and the structures
found by our system (Table I). The most important aspect is
the small time (5 minutes) needed for the multi-agent system
to find a solution, compared with the 5 CPU hours of [5]. This
is a promising result if we consider that the average RMSD for
small proteins given by the MAS (4.3) is in line with the one
found by I-TASSER (4.0), and that our model is potentially
able to find solutions that are relatively close to the native
protein in a relatively small amount of time.
Protein ID   Len.   I-TASSER RMSD   MAS RMSD   MAS Time
2CR7          60     4.6            3.2        0.720
1ITP          68    10.9            2.4        2.041
1OF9          77     3.6            3.3        2.984
1TEN          87     1.6            3.6        5.428
1FAD          92     3.6            4.3        3.185
1JNU         104     2.7            2.7        8.942
1GYV         117     3.3            5.3       11.068
1ORG         118     2.4            5.0        5.765
Average       90     4.0            4.3        5.016

TABLE I: Quality evaluation (RMSD in Å, Time in min)
h) GPU vs CPU: To compare the MAS solver w.r.t. the
sequential version, we used proteins with lengths that range
from 12 to 100. Table II reports the times needed by the two
implementations to compute each target protein, the speedup
(CPU time/GPU time), and the energy values found.
The average energy value found by the sequential version is in line with the one found by the parallel version on the GPU, and this aspect is also reflected by the two respective average RMSD values (not reported in the table). However, the parallel implementation provides a significant gain in terms of time, increasing with protein size.

¹The Root Mean Square Deviation measures the spatial similarity of corresponding atoms, using an optimal roto-translation to overlap two structures.

Protein ID   Len.   GPU Time   CPU Time   Ratio   GPU Energy   CPU Energy
1LE0          12     0.045       0.185     4.11     -474.5       -471.9
1ZDD          34     0.325       4.045    12.44    -2609.7      -2635.3
2GP8          40     0.640       6.548    10.23    -3020.3      -3040.1
2K9D          44     0.720       5.952     8.27    -6611.1      -6651.1
2IGD          61     1.851      32.404    17.51    -8088.2      -7746.6
1AIL          69     1.568      22.289    14.21   -11911.5     -11969.3
1JHG         100     5.391     138.032    25.60   -23533.3     -22684.7
Average       51     1.505      29.922    19.88    -8035.5      -7885.5

TABLE II: Comparison between the GPU and the CPU (Time in min)
The GPU and the CPU costs depend also on the energy
function used. A better energy function can use more than
three energy fields that can lead to different computational
costs. Since the energy function can be changed without
affecting the overall structure of the system, we evaluated
the differences between the costs of the two implementations
without any particular energy function, but using the RMSD
w.r.t. the native, known structure as objective function for both
the structure and the coordinator agent. Given a protein of
length n, the RMSD calculation costs O(n) since we calculate
only the n distances between the atoms of the original structure
and the corresponding atoms of the predicted one.
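The O(n) objective described above can be sketched directly; superposition of the two structures is assumed to have been done already, and each amino acid is reduced to a single representative atom.

```python
# RMSD over n corresponding atoms: root mean square of the n pairwise
# distances between native and predicted coordinates.
# Requires Python 3.8+ for math.dist.
import math

def rmsd(native, predicted):
    n = len(native)
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(native, predicted))
    return math.sqrt(sq / n)

native    = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0)]
print(round(rmsd(native, predicted), 3))  # 0.707
```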
Table III presents the comparison between the CPU and the
GPU implementations using the RMSD as objective function.
The benchmark set is the same as before. For each target
protein we report the GPU and the CPU costs, the speedup
calculated as the ratio between CPU and GPU, and the RMSD
calculated on the predicted structure.

Protein ID   Len.   GPU Time   CPU Time   Ratio   GPU RMSD (Å)   CPU RMSD (Å)
1LE0          12     0.020       0.091     4.55       0.5            0.5
1ZDD          34     0.206       1.476     7.16       1.0            1.0
2GP8          40     0.398       4.863    12.21       1.1            1.3
2K9D          44     0.380       4.697    12.36       1.1            1.1
2IGD          61     1.667      20.077    12.04       2.0            1.9
1AIL          69     1.244      14.220    11.43       1.6            1.7
1JHG         100     3.906      87.225    22.33       2.0            2.1
Average       51     1.208      22.425    18.56       1.3            1.3

TABLE III: GPU vs CPU using the RMSD as Energy (Time in min)

Computational times
are visibly reduced for both implementations, but the speedup almost doubles. This shows that the energy function considerably affects the GPU cost, even though it is implemented in parallel. A more detailed analysis concerns the computational cost of each single function. We calculated the average time required by each kernel function and by the corresponding sequential implementation on the whole benchmark set. The alldistant constraint is the one that affects the GPU cost the least, requiring about 10.6% of the whole computation time. On the other hand, the energy function accounts for about 72.7% of the total cost, while the sang/mang constraints amount to about 16.5%. The situation changes for the functions that require O(n²) time on the CPU, where the obtained speedup is remarkable. On the CPU, the consistency check of the alldistant constraint is about 53.9% of the total cost, followed by the energy function (42.3%) and the sang/mang constraints (3.66%).
A larger difference in behavior between the GPU and the CPU can be observed in the time required by the alldistant constraint and the energy function. Again, the constraint weighs more on the CPU, while the energy weighs more on the GPU (see Table III). On the CPU, the alldistant constraint is checked on all the structures produced by the angle constraint propagation, while the energy is computed only on the admissible ones. The GPU propagates the alldistant constraint in parallel on all the structures of the set, which reduces its relative cost. Instead, the relative cost of the energy calculation increases even though it is performed in parallel, since it involves a greater number of mathematical operations than the propagation of the constraint: adding complex constraints to the model can produce accurate structures while still preserving the speedup.
i) Improving the prediction while preserving the speedup:
In this section we show that the system is highly modular and
that we can easily add geometric constraints that make the
structure more realistic while preserving good computational
times and, in general, increasing the speedup. We consider the
case of modeling the side-chain of each amino acid. This can
be done by defining a new constraint that relates the position of each Cα atom with the group R defined on it. The centroid (CG) constraint enforces a relation among four variables p1, p2, p3, and p4. This relation establishes the value to assign to the variable p4, representing the coordinates of the side chain defined on the carbon atom Cαi, given the bend angle formed by the carbon atoms p1 = Cαi−1, p2 = Cαi, and p3 = Cαi+1, and the average Cαi–side-chain distance [18]. Moreover, this constraint checks the minimum distance between side chains and all the other atoms of the structure in order to avoid steric clashes, as in the case of the alldistant constraint. Note that, using a sequential algorithm, it is possible to check the consistency of this constraint on a given assignment of values to the variables in time O(n²). Again, we adopted the same strategy used for the alldistant constraint to obtain O(n) time in the parallel implementation.
In Table IV, we report the execution time of the GPU
and the CPU implementations, using both the alldistant
constraint and the CG constraint; we also report the speedup
measured as the ratio between the CPU and the GPU time,
and the energy of the best structure found. The benchmark set
is the same used in the previous section.
Protein ID   Len.     Time (Min.)              Energy
                   GPU      CPU    Ratio      GPU        CPU
1LE0          12   0.046    0.331    6.76     -475.5     -485.1
1ZDD          34   0.317    4.877   15.38    -2517.7    -2519.7
2GP8          40   0.612    8.770   14.33    -2923.4    -2930.0
2K9D          44   0.571    8.501   14.88    -6093.9    -6139.7
2IGD          61   2.077   51.481   24.79    -7912.7    -7971.5
1AIL          69   1.622   27.062   16.68   -11614.0   -11651.8
1JHG         100   4.342  157.106   36.18   -22258.7   -21317.0
Average       51   1.369   36.875   26.93    -7685.1    -7573.5

TABLE IV: GPU vs CPU exploiting the CG constraint
Let us observe that the speedup increases, since an O(n) (resp., O(n²)) time function is added to the parallel (resp., sequential) implementation. The use of the GPU thus allows us to easily refine the model by adding constraints that do not overly affect the execution time; the same argument does not hold for the sequential implementation. An example is the 1JHG protein, where the CPU time increases by 19 minutes when CG is used, while the GPU time actually decreases: the CG constraint prunes the search at the cost of its filtering algorithm, a cost that is amortized by the parallel implementation.
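As a quick sanity check (not part of the paper's code), the Ratio column of Table IV is simply the CPU time divided by the GPU time, both in minutes; for instance, for 1ZDD and for the best case 1JHG:

```python
# Ratio column of Table IV: speedup = CPU time / GPU time (minutes).
# Values copied from the table; 1JHG is the best case reported.
table = {
    "1ZDD": (0.317, 4.877),     # (GPU, CPU)
    "1JHG": (4.342, 157.106),
}
for pid, (gpu, cpu) in table.items():
    print(f"{pid}: speedup = {cpu / gpu:.2f}")
# -> 1ZDD: speedup = 15.38
# -> 1JHG: speedup = 36.18
```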
VIII. CONCLUSIONS AND FUTURE WORK
In this paper we presented a novel perspective on the Protein Structure Prediction problem. We used a declarative approach for ab-initio simulation, implemented as a multi-agent system, and exploited the GPU architecture to explore the search space and to propagate constraints. The results are remarkable: the use of the GPU allows us to obtain speedups of up to 36.
As future work, we plan to extend the multi-agent system by assigning different GPUs to different agents in a Multi-GPU environment. Moreover, we plan to improve the quality of the results by using constraints tailored to each type of secondary structure element. A careful handling of constraint propagation for the employed constraints is also under development.
REFERENCES
[1] C. B. Anfinsen, “Principles that govern the folding of protein chains,”
Science, vol. 181, pp. 223–230, 1973.
[2] A. Beberg et al., “Folding@home: Lessons from eight years of volunteer
distributed computing,” in Parallel and Distributed Processing, IEEE
International Symp., 2009, pp. 1–8.
[3] IBM Blue Gene Team, “Blue Gene: A vision for protein science using
a petaflop supercomputer,” IBM Systems Journal, vol. 40, 2001.
[4] L.C.T. Pierce et al., “Routine access to millisecond time scale events
with accelerated molecular dynamics,” Journal of Chemical Theory and
Computation, vol. 8, pp. 2997–3002, 2012.
[5] S. Wu et al., “Ab initio modeling of small proteins by iterative TASSER simulations,” BMC Biology, vol. 5, 2007.
[6] “I-TASSER Online: protein structure and function prediction,” http://zhanglab.ccmb.med.umich.edu/I-TASSER/.
[7] J. Moult et al., “Critical assessment of methods of protein structure prediction (CASP): Round III,” Proteins, vol. Suppl 3, pp. 2–6, 1999.
[8] R. Backofen and S. Will, “A constraint-based approach to fast structure
prediction in 3D protein models,” Constraints, vol. 11, no. 1, 2006.
[9] A. Dal Palù et al., “Constraint logic programming approach to protein
structure prediction,” BMC Bioinformatics, vol. 5, no. 186, 2004.
[10] A. Dal Palù et al., “CLP-based protein fragment assembly,” TPLP,
vol. 10, no. 4-6, pp. 709–724, 2010.
[11] L. Palopoli and G. Terracina, “Coopps: a system for the cooperative
prediction of protein structures,” J. Bioinformatics and Computational
Biology, vol. 2, no. 3, 2004.
[12] L. Bortolussi et al., “Agent-based protein structure prediction,” Multiagent & Grid Systems, vol. 3, no. 2, pp. 183–197, 2007.
[13] P.P. Gonzalez Perez et al., “Multi-agent systems applied in the modeling
and simulation of biological problems,” World Academy of Science,
Engineering and Technology, vol. 58, p. 128, 2009.
[14] F. Fogolari et al., “Scoring predictive models using a reduced representation of proteins: model and energy definition,” BMC Structural Biology,
vol. 7, no. 15, 2007.
[15] G. Czibula et al., “Solving the protein folding problem using a distributed Q-learning approach,” Journal of Computer Technology and Applications, vol. 2, pp. 404–413, 2011.
[16] A. Morozov and T. Kortemme, “Potential functions for hydrogen bonds
in protein structure prediction and design,” Advances in Protein Chemistry, vol. 4, no. 72, 2005.
[17] M. Wooldridge, An Introduction to MAS. Wiley & Sons, 2002.
[18] F. Fogolari et al., “Modeling of polypeptide chains as Cα chains, Cα chains with Cβ, and Cα chains with ellipsoidal lateral chains,” Biophysical Journal, vol. 70, pp. 1183–1197, 1996.