Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework Federico Campeotto(1),(2) Agostino Dovier(2) Enrico Pontelli(1) [email protected] [email protected] [email protected] (1) New Mexico State University, Las Cruces, NM USA (2) University of Udine, DIMI, Udine, Italy

Abstract—This paper provides a novel perspective on the Protein Structure Prediction (PSP) problem, which focuses on determining putative 3D structures of a protein starting from its primary sequence. The proposed approach relies on a multi-agent model, where concurrent agents explore the folding of different parts of a protein. The strength of the approach lies in the agents' ability to apply different types of knowledge (expressed in the form of declarative constraints) to prune the local space of folding alternatives. The paper demonstrates the suitability of a GPU architecture for implementing such a multi-agent infrastructure, with significant improvements in speed and quality of solutions w.r.t. other methods (e.g., those based on fragment assembly).

I. INTRODUCTION

The prediction of the 3D structure of a protein from a sequence of amino acids is one of the most popular problems in Bioinformatics [1]. The large number of conformations in which a protein can potentially fold, together with the lack of an accurate energy function to guide the folding process, makes the Protein Structure Prediction (PSP) problem challenging even for proteins of relatively short length. In this paper, we tackle the PSP problem using a multi-agent approach, concurrently exploring and assembling foldings of local segments of the protein. These agents are in charge of retrieving, filtering, and coordinating local information about parts of a protein, aiming to reach a global consensus.
Relationships among substructures are expressed as constraints, and new knowledge about the protein (e.g., properties of the amino acids) can be readily integrated and used to prune the space of potential conformations. This model is effectively implemented on a general-purpose Graphics Processing Unit (GPU) architecture, leading to significant gains in execution time compared to a sequential implementation. GPUs offer parallelism at relatively low cost, and they elegantly accommodate the proposed PSP multi-agent infrastructure. In this framework, we present a solver that performs local search, checking the consistency of the constraints until it reaches a local minimum. Agents use GPU cores to explore large portions of the search space and to propagate constraints on the ensemble of structures produced by each agent.

II. RELATED WORK

Ab-initio prediction methods for PSP simulate the folding process by using knowledge about the chemical structure of the protein and the laws of physics. They usually require a large amount of computational resources, and some results have been obtained through the adoption of distributed computing [2] and high-performance supercomputers [3]. GPU technology has also been used for ab-initio prediction [4]. An alternative strategy is represented by the comparative modeling approach, which assumes that a limited set of structural motifs can represent the majority of the existing protein structures. I-TASSER [5] improves a comparative modeling approach with a hierarchical strategy: given a target sequence, it first generates some protein templates through "threading" techniques; then, it assembles template fragments to generate a set of candidate structures, from which it extracts the structure with minimum energy. The I-TASSER server [6] was ranked first among protein structure prediction servers in the recent CASP editions [7]. Declarative techniques have also been extensively employed to model the PSP problem.
Using Constraint Programming (CP), it is easy to describe spatial properties of the unknown protein in terms of geometric constraints. In these models, it is common to superimpose the protein structure on a discretized representation of the three-dimensional space, often organized as a crystal lattice structure. Successful results have been obtained both for the simple two-amino-acid (HP) model, by solving a constraint optimization problem [8], and for the complete model, which uses all the amino acid types [9]. A promising idea has been adopted in [10], where the 3D conformation of a protein is predicted via protein fragment assembly, modeled in terms of finite-domain constraints and implemented using a constraint logic programming system. The use of multi-agent system models in the context of the PSP problem has been relatively limited. An agent-based framework for ab-initio simulations combining the predictions of five different tools is presented in [11], whereas in [12] Concurrent Constraint Programming is used, with a blackboard enabling communication between agents. Another multi-agent framework based on blackboards is presented in [13], while in [15] agents implement a reinforcement learning approach for solving the PSP problem in the HP model.

III. BACKGROUND

A protein is a polymer composed of a sequence of simple molecules called amino acids. This sequence, referred to as the primary sequence, folds into a unique stable structure that represents the conformation with minimum free energy; this structure determines the biological function of the protein. We refer to it as the tertiary structure of the protein. The Protein Structure Prediction (PSP) problem is defined as the problem of finding the tertiary structure of a target protein given its primary sequence. There are 20 types of amino acids.
Each amino acid can be represented by a common structure, called the backbone, constituted by two Carbon, one Hydrogen, one Oxygen, and one Nitrogen atoms. The distinction between amino acids originates from an additional group of 1–18 atoms, called the side chain. In a simple model, this group can be represented by a single large centroid (group R). The centroid is linked to the central carbon atom Cα (see Fig. 1, left), and its size and distance from Cα can be statistically determined for each amino acid type.

Fig. 1: Representation of an amino acid (left); the peptide plane, the peptide bond that binds two amino acids, and the Phi (φ) and Psi (ψ) dihedral angles (right).

Two consecutive amino acids are linked together by a peptide bond, which connects the C-O group of the first amino acid to the N-H group of the next one. Due to the double-bond character of the peptide bond, these atoms lie on the same plane, known as the peptide plane (Fig. 1, right). Therefore, the backbone has only two degrees of freedom per amino acid: (1) the rotation around the N-Cα bond (the φ angle), and (2) the rotation around the Cα-C bond (the ψ angle). A series of amino acids joined by peptide bonds forms the backbone chain of the protein, whose 3D coordinates describe the whole tertiary structure. The sequences of φ and ψ angles, defined by the backbone atoms C-N-Cα-C and N-Cα-C-N, respectively, determine exactly the tertiary structure (as 3D coordinates of atoms), and vice versa. Specific ranges of the torsional angles also determine particular local structures, referred to as secondary structures of the protein. Secondary structure elements often assume regular and repetitive spatial shapes, being constrained by the hydrogen bonds formed by local inter-residue interactions. The most common examples are α-helices and β-sheets. Other parts of the polypeptide, instead, present less regular structures.
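The basic geometric operation underlying the angles-to-coordinates mapping described above (changing a φ or ψ angle means rotating all downstream atoms about the corresponding bond axis) can be sketched with Rodrigues' rotation formula. This is an illustrative sketch, not the paper's implementation:

```python
import math

def rotate_about_axis(p, a, b, theta):
    """Rotate point p about the axis through points a and b by angle
    theta (radians), using Rodrigues' rotation formula."""
    ax = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in ax))
    k = [c / norm for c in ax]                    # unit rotation axis
    v = [p[i] - a[i] for i in range(3)]           # move axis to origin
    kxv = [k[1] * v[2] - k[2] * v[1],             # cross product k x v
           k[2] * v[0] - k[0] * v[2],
           k[0] * v[1] - k[1] * v[0]]
    kdv = sum(k[i] * v[i] for i in range(3))      # dot product k . v
    c, s = math.cos(theta), math.sin(theta)
    rot = [v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c) for i in range(3)]
    return tuple(rot[i] + a[i] for i in range(3))
```

For example, rotating the point (1, 1, 0) by 90° about the z-axis yields (-1, 1, 0); applying such a rotation to every atom downstream of an N-Cα or Cα-C bond is how a change of φ or ψ propagates along the backbone.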
These less regular parts allow the folds between the structured parts, and are usually identified as loops and turns.

We use an energy function E as the objective function to measure the "quality" of a given structure: the lower the energy value, the closer the structure is to the real tertiary structure. E is the weighted sum of three components: (a) the Contact potential component, which uses the statistical table of contact energies described in [14]; (b) the Torsional and Correlation potential component, a statistical potential for the torsional angles and for the correlation between amino acids in the protein's sequence, based on a pre-calculated table of energies; (c) the Hydrogen bond potential component [16], based on the orientation-dependent hydrogen-bonding potential between pairs of N-H and O atoms.

IV. PROBLEM FORMALIZATION

A Constraint Satisfaction Problem (CSP) P is a triple P = ⟨X, D, C⟩, where X = {x1, . . . , xn} is a set of finite-domain variables, D = {D1, . . . , Dn} is the corresponding set of domains, and C = {C1, . . . , Cm} is a finite set of constraints on X. A constraint can be described as a subset Ci ⊆ Di1 × · · · × Diki of the Cartesian product of the domains of the variables xi1, . . . , xiki, for some subset {i1, . . . , iki} ⊆ {1, . . . , n}. Solving a CSP P consists of finding an n-tuple S = ⟨s1, . . . , sn⟩ such that sj ∈ Dj for 1 ≤ j ≤ n and ⟨si1, . . . , siki⟩ ∈ Ci for each 1 ≤ i ≤ m. A Constraint Optimization Problem (COP) is a pair C = (P, E), where P is a CSP and E : D1 × · · · × Dn → R is a cost function; we seek a solution s of P such that E(s) is minimal. Given the primary sequence of a protein of length n, we model the PSP problem as a COP C = (⟨X, D, C⟩, E) as follows. X = {x1, . . . , xn} is a set of finite-domain variables, where the variable xi is associated to the i-th amino acid of the protein. D = {D1 , . . .
, Dn} is the set of variable domains, where Di is a finite set of pairs of torsional angles ⟨φ, ψ⟩ that can be assigned to the i-th amino acid. C = {C1, . . . , Cm} is a finite set of constraints over X that describe geometric properties that the final structure must satisfy to be physically admissible. E is a cost function representing the energy of the corresponding structure. Finding a solution for C means finding an assignment S = ⟨s1, . . . , sn⟩ of pairs of angles to the amino acids such that E(S) is minimal.

V. THE MULTI-AGENT SYSTEM

A Multi-Agent System (MAS) is composed of several agents in an environment, each with a certain degree of independence, collaborating with the other agents to solve a common problem [17]. These agents are partially autonomous and have a local (possibly partial) view of the global system. Typically, an agent receives information from the external environment and applies a set of rules to determine the action to perform on the environment. In this work, the framework is organized as a multi-level MAS that simulates the folding process (namely, solves the COP described in Sect. IV). We describe the four types of agents used here (see Fig. 2).

a) The Supervisor Agent: Its task is to assign different sub-sequences of the primary sequence to the immediately underlying types of agents—the coordinator and the structure agents—and to guide the entire folding process towards a stable global configuration. The supervisor agent sets a priority order among these agents: first, each "highly constrained" secondary structure (α-helix/β-sheet) is computed by the structure agents; afterwards, the coordinator agents (one in this work) are invoked to model the whole tertiary structure by moving loops and turns. The supervisor agent must also ensure that the structures determined by each structure agent can be merged into a global structure that can be effectively folded by the coordinator agent. We assume that the offsets of the α-helices and β-sheets on the primary sequence are given by an external source (e.g., a secondary structure assignment server).

b) The Coordinator and the Structure Agents: They are assigned to a segment of the protein, namely to a set of variables X′ = {xp, . . . , xq} ⊆ X with domains Dp, . . . , Dq, and to a list of associated worker agents (wa) wap, . . . , waq. The structure agent applies a beam search strategy on the space of solutions. Assume an initial structure is given (initially set as a straight configuration). For each variable xi ∈ X′, the structure agent invokes the corresponding worker agent wai, which instantiates in parallel the variable xi with all the possible values in the domain Di, while leaving the values of the xj with j ≠ i unchanged. These "multiple" tries produce a set of structures obtained by rotating the torsional angles of the i-th amino acid of the target sequence. Rotations are performed on the structure given as input to wai. To select the best assignment, a parallel implementation of the energy function E calculates the energy value of each structure in the set. The variable corresponding to the assignment with minimum energy is assigned, and the corresponding structure becomes the new current structure (special cases and backtracking are handled).

The coordinator agent performs a space sampling. Again, assume an initial structure is given; for each variable xi ∈ X′, the coordinator agent invokes the corresponding worker agent wai, which instantiates in parallel the variable xi with all the possible values in the domain Di. For each assignment of xi, a number of random assignments to the remaining variables of X′ is concurrently explored. We select the energetically best among these structures. If it improves on the "best solution", it becomes the new "best solution" (a local minimum). Otherwise, we store the assignment xi/dj that led to the best structure, leave the other variables unassigned, and repeat the process with X′ \ {xi}. The computation terminates with a new "best solution", or when all variables in X′ are labeled (a local, non-improving, minimum). The two families of agents use different sets of fragments as well as different weights for the energy function. In particular, the structure agent uses parameters tuned to form secondary structures such as α-helices and β-sheets, while the coordinator agent uses fragment sets and energy weights tuned for loop modeling.

c) Worker agents: They are the lowest-level agents of the system. Each of them has direct control over one variable xi and explores its domain Di. A worker agent can be associated either with a coordinator or with a structure agent. Assignment and consistency checking of constraints are performed in parallel on a GPU; a set of admissible structures (i.e., structures that satisfy all the constraints) is returned to the higher-level agent.

Fig. 2: The overall framework

VI. IMPLEMENTATION

We implemented the MAS presented in the previous sections using NVIDIA's Compute Unified Device Architecture (CUDA), a software and hardware environment that facilitates the adoption of GPUs in general-purpose computing. The underlying execution model of CUDA is Single-Instruction Multiple-Thread (SIMT): the same instructions are executed by different threads that run on identical cores (on different data). A CUDA application consists of sequential code, executed by the host (CPU), and parallel code, executed by the device (GPU). The code called by the host and executed by the device is called a kernel; it is written in standard C code and is organized as a grid of independent blocks that are assigned to the multiprocessors of the GPU (coarse-grain parallelism). In turn, a block is a set of concurrent threads (fine-grain parallelism), leading to a high overall level of parallelism.
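The greedy labeling strategy of the structure agent (Sect. V) can be sketched sequentially in Python. The inner loop over the domain stands in for the worker agent's parallel GPU threads; the domains and the energy are toy stand-ins, not the paper's fragment-derived domains or statistical potentials:

```python
def label_structure(domains, energy):
    """Greedy labeling in the spirit of the structure agent: for each
    variable in turn, try every (phi, psi) pair in its domain (here a
    loop stands in for the worker agent's parallel threads) and commit
    the assignment of minimum energy.  `domains[i]` is a list of
    (phi, psi) pairs; `energy` scores a partial assignment given as a
    dict mapping variable index -> pair."""
    assignment = {}
    for i, dom in enumerate(domains):
        best = min(dom, key=lambda pair: energy({**assignment, i: pair}))
        assignment[i] = best
    return assignment

# Toy energy: prefer angles near a typical "alpha-helix" pair (-60, -45).
def toy_energy(assign):
    return sum((phi + 60) ** 2 + (psi + 45) ** 2
               for phi, psi in assign.values())

doms = [[(-60, -45), (-120, 130), (60, 45)]] * 3
result = label_structure(doms, toy_energy)
```

With this toy energy, every variable is labeled with the helix-like pair; the real agent additionally handles special cases and backtracking, which are omitted here.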
In the rest of this section, we briefly explain how we use the parallelism offered by a GPU to assign variables, verify constraints, and calculate the energy function. Let us denote by Bj the j-th block of a kernel, by Bjz the z-th thread of block Bj, and by Bj1,...,m all of its threads.

d) Constraints: To relate finite-domain variables (i.e., pairs ⟨φ, ψ⟩) to points in R3 (i.e., positions of the atoms affected by the two angles), we define two constraints: (1) the sang (single angle) constraint, which calculates the set of structures for the structure agent, and (2) the mang (multiple angle) constraint, which is used to sample the space of solutions for the coordinator agent. To obtain physically admissible structures, we use an additional constraint, the alldistant constraint, which imposes a minimum distance between each pair of Van der Waals spheres of the atoms of a given structure.

sang constraint. Let xi be the variable labeled by the worker agent, Di the corresponding domain of size k, and S⃗ a list of 3D points. The worker agent invokes the kernel with k blocks and 1 thread per block. Each block Bj calculates a new structure S⃗′j, given by the rotations of the torsional angles of the i-th amino acid of S⃗, as described earlier.

mang constraint. For this constraint we set a maximum size M for the sample of structures given by the propagation. This limit is imposed in order to obtain a good compromise between the memory needed to store the sample of structures on the GPU and the time required to calculate a significant subset of the search space. Hence, the worker agent invokes the kernel with k blocks and m = ⌊(k + M)/k⌋ threads per block, since each thread is used to calculate a different structure. Let xi be the variable to label, xp, . . . , xq the list of variables that belong to the same coordinator agent as xi, and D⃗ = Dp, . . . , Di, . . . , Dq the list of the corresponding domains.
Given a structure S⃗, the thread Bjz calculates the new structure S⃗′jz by rotating the torsional angles of the i-th amino acid of S⃗ according to the value dj ∈ Dj, together with the rotation of the torsional angles of the amino acids p, . . . , q according to a randomly chosen tuple in Dp × · · · × Di−1 × Di+1 × · · · × Dq.

alldistant constraint. Given a set of k conformations of a protein of length n, we verify whether each of them satisfies the alldistant constraint in time O(n): we "unfold" the sequential implementation, which uses two nested loops over the pairs of atoms, by invoking a kernel with a number of threads per block equal to the length of the structure to check. Every time the alldistant constraint is woken up by the labeling of a finite-domain variable, a kernel with k blocks and n threads per block checks the consistency of the constraint on the set of k conformations obtained by the previous propagation of the sang and mang constraints. Observe that the number of threads per block always corresponds to the length of the structure, since each thread maps exactly one amino acid. The alldistant constraint is checked on the whole structure in order to avoid clashes also between atoms of amino acids that are not associated with the current set of variables to be labeled. To keep track of the non-admissible structures, we use a Boolean array V of size k, and we set the i-th cell of the array to False if the i-th structure of the set does not satisfy the constraint. All the cells of V are initially set to True. As soon as a thread Bij computes a Euclidean distance between an atom of the j-th amino acid and another atom of the i-th structure that is less than the minimum threshold, Bij sets the i-th cell of V to False.
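The per-structure clash check can be sketched sequentially; the two outer loops stand in for the k blocks and the n threads per block of the kernel, and the distance threshold is an illustrative constant rather than the paper's per-atom Van der Waals radii:

```python
import math

MIN_DIST = 2.0  # illustrative clash threshold in Angstrom; the paper
                # uses the Van der Waals spheres of the atoms

def alldistant(structures):
    """Flag admissible structures: V[i] stays True only if no two
    atoms of structure i are closer than the threshold.  The loop
    over i mimics the k blocks, the loop over j the n threads."""
    V = [True] * len(structures)
    for i, atoms in enumerate(structures):        # "block" index
        for j in range(len(atoms)):               # "thread" index
            for b in range(j + 1, len(atoms)):
                if math.dist(atoms[j], atoms[b]) < MIN_DIST:
                    V[i] = False
    return V

ok    = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
clash = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
flags = alldistant([ok, clash])
```

On the GPU, the n threads of block i run the j-loop concurrently, each comparing its own amino acid against the other atoms, which is what reduces the per-structure check to O(n) time.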
e) Energy Function: We implemented the energy function using two levels of parallelization: (1) given a set of admissible structures, the energy value of each structure is calculated in parallel by a number of blocks equal to the size of the set, and (2) for a given structure, each energy field is calculated in parallel by a thread within the block. To obtain linear time for the contact and the hydrogen-bond potentials, we adopt the same strategy used for the alldistant constraint, since we need to calculate atom-atom distances for all pairs of amino acids of a given structure. Namely, for a structure of length n, we use n threads each for the contact potential and the hydrogen bond potential, and 2 threads for the correlation and torsional potentials. Hence, given the list P⃗ = S⃗1, . . . , S⃗k of structures of length n, it is possible to calculate the energy value of each structure by invoking a kernel with k blocks and 2n + 2 threads per block. Note that in the case of the structure agent we calculate the energy value only on the substructure associated with that agent, since the rest of the structure is not yet folded. To avoid unnecessary computations, we check the Boolean array V of admissible structures set by the previous propagation of the constraints: we calculate the energy value of the i-th structure if and only if V[i] = True.

f) Implementation details about CUDA: Due to the features of the CUDA architectural model, we must consider three main aspects that can affect the performance of the parallel computations: (1) the maximum number of threads per block, (2) the maximum number of threads, called a warp, that can be physically executed in parallel on each processor of the GPU, and (3) the information stored in the device memory and the copies of data to and from the host memory.
In this paper we used (typical) hardware where the maximum number of threads per block is limited to 1024 and the size of a warp is 32. The first restriction could potentially limit the maximum size of the target protein to 1024 amino acids when we use one thread per amino acid (e.g., to implement the alldistant constraint). Typical protein sizes, however, are well below 1024; if this is not the case, we can split the computation into multiple executions of the same kernel. Restriction (2) is important when we allow threads of the same warp to diverge to different computational branches. We split the parallel computation of different parts of the kernel code among warps of 32 threads. For example, the kernel computing the energy function is invoked with 2m + 64 threads per block, where m = 32⌈n/32⌉ (instead of 2n + 2 threads per block): 2m threads are used to compute the contact and hydrogen bond potentials, and 64 threads are used to compute the correlation and torsional potentials. Aspect (3) concerns the optimization of memory usage in order to achieve maximum memory throughput. CUDA has different types of memory spaces. Each thread block has access to a small amount of fast shared memory, visible within the scope of the block; in turn, all threads have access to the same global memory. Global memory is much slower than shared memory, but it can store more data. Since applications should strive to minimize data transfers between the host and the device (i.e., transfers over low-bandwidth links), we reserve in global memory an array of size equal to the maximum number of structures expected for the sampling of the coordinator agent, multiplied by the size of a structure. Moreover, we reserve an array of Boolean values for the admissible structures and an array of doubles for the energy values.
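The warp-padding arithmetic above can be written out directly, assuming the reading m = 32⌈n/32⌉, i.e., n rounded up to a warp multiple so that no warp mixes different computational branches:

```python
WARP = 32  # warp size on the hardware used in the paper

def energy_threads_per_block(n):
    """Threads per block for the energy kernel of a protein of length
    n: 2m threads for the contact and hydrogen-bond potentials plus
    64 (two full warps) for the correlation and torsional potentials,
    with m = n rounded up to a multiple of the warp size."""
    m = ((n + WARP - 1) // WARP) * WARP   # ceiling to a warp multiple
    return 2 * m + 64
```

For instance, for n = 100 this gives m = 128 and 2m + 64 = 320 threads per block, comfortably within the 1024-thread limit mentioned above.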
Each kernel receives the number of structures to be considered and the pointers to the arrays in global memory, so that it properly overwrites only the memory area affected by the kernel function. The memory transfers to and from the CPU are made at each labeling step by copying the array of structures produced by the propagation of constraints and the array of energy values; only the elements affected by the computation of the kernel are transferred into the host memory. To optimize the computation on the device, we store in the shared memory of the GPU all the structures to be rotated, the structures on which the consistency check of the alldistant constraint is performed, and those on which the energy values are calculated, being careful not to exceed the maximum size available on the device used in this paper. The shared memory can be a limitation when we manage a large number of structures or very long proteins. Nevertheless, this is not a problem if we consider that the size of our domains is about 300 and the proteins are typically 200–300 amino acids long. These numbers are compatible with the features of CUDA (e.g., the maximum number of threads per block), which makes this architecture particularly efficient for our model.

VII. EXPERIMENTAL RESULTS

We report some experimental results obtained with the implementation of the multi-agent system. Experiments were conducted on a Linux machine with an Intel Xeon E5645 at 2.4 GHz; the graphics card is an NVIDIA Tesla C2075. To evaluate the speedup provided by the GPU, we implemented a sequential version of the multi-agent system and compared the computational times of the two implementations on a set of target proteins of different lengths. Since the coordinator uses a randomized search strategy, we report the average values over 4 runs.

g) Quality evaluation of the predictions: We study the quality of the predicted structures in terms of the RMSD¹ value given by the comparison with the real, known structure.
Moreover, we compared our system with the state-of-the-art I-TASSER algorithm as follows: we selected a subset of significant proteins among those used for benchmark III presented in [5], and we compared the RMSD values of the best structures found by I-TASSER with those of the structures found by our system (Table I). The most important aspect is the small time (about 5 minutes) needed by the multi-agent system to find a solution, compared with the 5 CPU hours of [5]. This is a promising result if we consider that the average RMSD for small proteins given by the MAS (4.3) is in line with the one found by I-TASSER (4.0), and that our model is potentially able to find solutions that are relatively close to the native protein in a relatively small amount of time.

Protein ID  Len.  I-TASSER RMSD  MAS RMSD  MAS Time
2CR7         60        4.6          3.2      0.720
1ITP         68       10.9          2.4      2.041
1OF9         77        3.6          3.3      2.984
1TEN         87        1.6          3.6      5.428
1FAD         92        3.6          4.3      3.185
1JNU        104        2.7          2.7      8.942
1GYV        117        3.3          5.3     11.068
1ORG        118        2.4          5.0      5.765
Average      90        4.0          4.3      5.016

TABLE I: Quality evaluation (RMSD in Å, Time in min)

¹ The Root Mean Square Deviation measures the spatial similarity of corresponding atoms, using an optimal roto-translation to overlap two structures.

h) GPU vs CPU: To compare the MAS solver with its sequential version, we used proteins with lengths ranging from 12 to 100. Table II reports the times needed by the two implementations for each target protein, the speedup (CPU time / GPU time), and the energy values found. The average energy value found by the sequential version is in line with the one found by the parallel version on the GPU, and this is also reflected by the two respective average RMSD values (not reported in the table).

Protein ID  Len.  GPU Time  CPU Time   Ratio   GPU Energy  CPU Energy
1LE0         12     0.045      0.185    4.11      -474.5      -471.9
1ZDD         34     0.325      4.045   12.44     -2609.7     -2635.3
2GP8         40     0.640      6.548   10.23     -3020.3     -3040.1
2K9D         44     0.720      5.952    8.27     -6611.1     -6651.1
2IGD         61     1.851     32.404   17.51     -8088.2     -7746.6
1AIL         69     1.568     22.289   14.21    -11911.5    -11969.3
1JHG        100     5.391    138.032   25.60    -23533.3    -22684.7
Average      51     1.505     29.922   19.88     -8035.5     -7885.5

TABLE II: Comparison between the GPU and the CPU (Time in min)

However, the parallel implementation provides a significant gain in terms of time, and the gain increases with the protein size. The GPU and the CPU costs also depend on the energy function used: a better energy function can use more than three energy fields, leading to different computational costs. Since the energy function can be changed without affecting the overall structure of the system, we also evaluated the differences between the costs of the two implementations without any particular energy function, using instead the RMSD w.r.t. the native, known structure as the objective function for both the structure and the coordinator agents. Given a protein of length n, the RMSD calculation costs O(n), since we calculate only the n distances between the atoms of the original structure and the corresponding atoms of the predicted one. Table III presents the comparison between the CPU and the GPU implementations using the RMSD as the objective function; the benchmark set is the same as before. For each target protein we report the GPU and the CPU costs, the speedup calculated as the ratio between the CPU and the GPU times, and the RMSD of the predicted structure. Computational times are visibly reduced for both implementations, but the speedup is almost doubled.

Protein ID  Len.  GPU Time  CPU Time   Ratio   GPU RMSD  CPU RMSD
1LE0         12     0.020      0.091    4.55      0.5       0.5
1ZDD         34     0.206      1.476    7.16      1.0       1.0
2GP8         40     0.398      4.863   12.21      1.1       1.3
2K9D         44     0.380      4.697   12.36      1.1       1.1
2IGD         61     1.667     20.077   12.04      2.0       1.9
1AIL         69     1.244     14.220   11.43      1.6       1.7
1JHG        100     3.906     87.225   22.33      2.0       2.1
Average      51     1.208     22.425   18.56      1.3       1.3

TABLE III: GPU vs CPU using the RMSD as energy (Time in min, RMSD in Å)
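The O(n) RMSD used as the objective function in Table III can be sketched directly; note that, unlike the footnoted definition, no optimal roto-translation is applied here, only the n corresponding-atom distances:

```python
import math

def rmsd(native, predicted):
    """Plain RMSD over corresponding atoms: the sum of squared
    atom-atom distances divided by n, under the square root.  O(n),
    with no optimal superposition of the two structures."""
    assert len(native) == len(predicted)
    sq = sum(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in zip(native, predicted))
    return math.sqrt(sq / len(native))

# Two 2-atom toy structures shifted by 1 Angstrom along z.
d = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

In the toy example, every atom is displaced by exactly 1 Å, so the RMSD is 1.0.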
This shows that the energy function considerably affects the GPU cost, even though it is implemented in parallel. A more detailed analysis concerns the computational cost of each single function. We calculated the average time required by each kernel function and by the corresponding sequential implementation over the whole benchmark set. The alldistant constraint is the one that affects the GPU cost the least, requiring about 10.6% of the whole computation time. On the other hand, the energy function accounts for about 72.7% of the total cost, while the sang/mang constraints amount to about 16.5%. The situation changes for the functions that require O(n²) time on the CPU, where the obtained speedup is remarkable: on the CPU, the consistency check of the alldistant constraint is about 53.9% of the total cost, followed by the energy function (42.3%) and the sang/mang constraints (3.66%). A larger difference in behavior between the GPU and the CPU can be observed in the time required by the alldistant constraint and the energy function: the constraint weighs more on the CPU, while the energy function weighs more on the GPU (see Table III). The alldistant constraint is checked on the CPU for all the structures produced by the angle constraint propagation, while the energy is computed only on the admissible ones. The GPU propagates the alldistant constraint in parallel on all the structures of the set, which reduces its relative cost; the energy calculation, instead, remains the dominant cost even though it is performed in parallel, since it involves a greater number of mathematical operations than the propagation of the constraint. As a consequence, adding complex constraints to the model can provide more accurate structures while still preserving the speedup.
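The gating just described, with the energy evaluated only on structures that passed the alldistant check, can be sketched as follows; the energy fields are toy stand-ins for the paper's potentials:

```python
def batch_energy(structures, V, fields):
    """One "block" per structure: energy is computed only for
    structures whose admissibility flag V[i] is True (i.e., those
    that passed the alldistant check); the others are skipped.
    `fields` is a list of energy-field functions (toy stand-ins for
    the contact, hydrogen-bond, torsional and correlation potentials,
    each of which runs on its own threads in the real kernel)."""
    energies = [None] * len(structures)
    for i, s in enumerate(structures):
        if not V[i]:
            continue                  # non-admissible: no energy work
        energies[i] = sum(f(s) for f in fields)
    return energies

contact = lambda s: -1.0 * len(s)     # toy "contact" field
torsion = lambda s: 0.5 * len(s)      # toy "torsional" field
vals = batch_energy([[1, 2], [1, 2, 3]], [True, False], [contact, torsion])
```

This is why, in the profile above, the CPU pays the full O(n²) clash check on every sampled structure, while the (costlier per structure) energy evaluation touches only the admissible subset.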
i) Improving the prediction while preserving the speedup: In this section we show that the system is highly modular and that we can easily add geometric constraints that make the structure more realistic while preserving good computational times and, in general, increasing the speedup. We consider the case of modeling the side chain of each amino acid. This can be done by defining a new constraint that relates the position of each Cα atom with the group R defined on it. The centroid (CG) constraint enforces a relation among four real variables p1, p2, p3, and p4. This relation establishes the value to assign to the variable p4, representing the coordinates of the side chain defined on the carbon atom Cαi, given the bend angle formed by the carbon atoms p1 = Cαi−1, p2 = Cαi, and p3 = Cαi+1, and the average Cαi–side chain distance [18]. Moreover, this constraint checks the minimum distance between side chains and all the other atoms of the structure in order to avoid steric clashes, as in the case of the alldistant constraint. Note that, using a sequential algorithm, it is possible to check the consistency of this constraint on a given assignment of values to the variables in time O(n²). Again, we adopted the same strategy used for the alldistant constraint to obtain O(n) time in the parallel implementation. In Table IV, we report the execution times of the GPU and the CPU implementations, using both the alldistant constraint and the CG constraint; we also report the speedup measured as the ratio between the CPU and the GPU times, and the energy of the best structure found. The benchmark set is the same used in the previous section.

Protein ID  Len.  GPU Time  CPU Time   Ratio   GPU Energy  CPU Energy
1LE0         12     0.046      0.331    6.76      -475.5      -485.1
1ZDD         34     0.317      4.877   15.38     -2517.7     -2519.7
2GP8         40     0.612      8.770   14.33     -2923.4     -2930.0
2K9D         44     0.571      8.501   14.88     -6093.9     -6139.7
2IGD         61     2.077     51.481   24.79     -7912.7     -7971.5
1AIL         69     1.622     27.062   16.68    -11614.0    -11651.8
1JHG        100     4.342    157.106   36.18    -22258.7    -21317.0
Average      51     1.369     36.875   26.93     -7685.1     -7573.5

TABLE IV: GPU vs CPU exploiting the CG constraint (Time in min)

Let us observe that the speedup increases, since an O(n) (resp., O(n²)) time function is added to the parallel (resp., sequential) implementation. The use of the GPU allows us to easily improve the model by adding constraints that do not overly affect the execution time. The same argument does not hold for the sequential implementation. An example is the 1JHG protein, where the time for the CPU increases by about 19 minutes (using CG), while the time on the GPU actually decreases: the CG constraint prunes the search at the cost of its filtering algorithm, a cost that is amortized by the parallel implementation.

VIII. CONCLUSIONS AND FUTURE WORK

In this paper we presented a novel perspective on the Protein Structure Prediction problem. We used a declarative approach for ab-initio simulation, implementing a multi-agent system, and we used the GPU architecture to explore the search space and to propagate constraints. The results are remarkable: the use of the GPU allows us to obtain speedups of up to 36. As future work, we plan to extend the multi-agent system by assigning different GPUs to different agents in a multi-GPU environment. Moreover, we plan to improve the quality of the results by using appropriate constraints for each type of secondary structure element. A careful handling of constraint propagation for the employed constraints is under development.

REFERENCES

[1] C. B. Anfinsen, "Principles that govern the folding of protein chains," Science, vol. 181, pp. 223–230, 1973.
[2] A. Beberg et al., "Folding@home: Lessons from eight years of volunteer distributed computing," in Parallel and Distributed Processing, IEEE International Symp., 2009, pp. 1–8.
[3] IBM Blue Gene Team, "Blue Gene: A vision for protein science using a petaflop supercomputer," IBM Systems Journal, vol. 40, 2001.
[4] L. C. T. Pierce et al., "Routine access to millisecond time scale events with accelerated molecular dynamics," Journal of Chemical Theory and Computation, vol. 8, pp. 2997–3002, 2012.
[5] S. Wu et al., "Ab initio modeling of small proteins by iterative TASSER simulations," BMC Biology, vol. 5, 2007.
[6] "I-TASSER online, protein structure and function prediction," http://zhanglab.ccmb.med.umich.edu/I-TASSER/.
[7] J. Moult et al., "Critical assessment of methods of protein structure prediction (CASP): Round III," Proteins, vol. Suppl 3, pp. 2–6, 1999.
[8] R. Backofen and S. Will, "A constraint-based approach to fast structure prediction in 3D protein models," Constraints, vol. 11, no. 1, 2006.
[9] A. Dal Palù et al., "Constraint logic programming approach to protein structure prediction," BMC Bioinformatics, vol. 5, no. 186, 2004.
[10] A. Dal Palù et al., "CLP-based protein fragment assembly," TPLP, vol. 10, no. 4-6, pp. 709–724, 2010.
[11] L. Palopoli and G. Terracina, "Coopps: a system for the cooperative prediction of protein structures," J. Bioinformatics and Computational Biology, vol. 2, no. 3, 2004.
[12] L. Bortolussi et al., "Agent-based protein structure prediction," Multiagent & Grid Systems, vol. 3, no. 2, pp. 183–197, 2007.
[13] P. P. Gonzalez Perez et al., "Multi-agent systems applied in the modeling and simulation of biological problems," World Academy of Science, Engineering and Technology, vol. 58, p. 128, 2009.
[14] F. Fogolari et al., "Scoring predictive models using a reduced representation of proteins: model and energy definition," BMC Structural Biology, vol. 7, no. 15, 2007.
[15] G. Czibula et al., "Solving the protein folding problem using a distributed Q-learning approach," Journal of Computer Technology and Applications, vol. 2, pp. 404–413, 2011.
[16] A. Morozov and T. Kortemme, "Potential functions for hydrogen bonds in protein structure prediction and design," Advances in Protein Chemistry, vol. 72, 2005.
[17] M. Wooldridge, An Introduction to MultiAgent Systems. Wiley & Sons, 2002.
[18] F. Fogolari et al., "Modeling of polypeptide chains as Cα chains, Cα chains with Cβ, and Cα chains with ellipsoidal lateral chains," Biophysical Journal, vol. 70, pp. 1183–1197, 1996.