TEL-AVIV UNIVERSITY RAYMOND AND BEVERLY SACKLER FACULTY OF EXACT SCIENCES BLAVATNIK SCHOOL OF COMPUTER SCIENCE ParaDock – a flexible DNA rigid protein geometric complementarity based docking algorithm This thesis is submitted in partial fulfillment of the requirements for the M.Sc. degree in the Blavatnik School of Computer Science, Tel-Aviv University by Itamar Banitt The research work in this thesis has been carried out at Tel-Aviv University under the supervision of Prof. Haim Wolfson September 2010 1 Abstract Accurate prediction of protein-DNA complexes could provide an important stepping stone towards a thorough comprehension of vital intracellular processes. Few attempts were made to tackle this issue, through binding patch prediction, protein function classification, and distance constraints based docking. In this work we introduce ParaDock: a novel ab initio protein – DNA docking algorithm. ParaDock combines short DNA sections, rigidly docked to the protein based on geometric complementarity, to create planarly bent DNA molecules, ranked using a geometrical complementarity score. Our algorithm was tested on the bound and unbound targets of a protein-DNA benchmark including 47 complexes. With neither addressing protein flexibility, nor applying any refinement procedure, CAPRI acceptable solutions were ranked in the top 10 of 83% of the bound complexes, and 70% of the unbound. Without requiring prior knowledge of DNA length and sequence, and within less than 2h per target on a standard 2.0GHz CPU, ParaDock offers a fast ab initio docking solution. 2 Table of contents Motivation 4 Background Biological background 5 Molecular Docking 6 Related work Protein – DNA binding characterization 9 Existing prediction solutions 9 ParaDock algorithm Algorithm Motivation 11 Concept of the algorithm 12 Algorithm details PatchDock 13 Partial solution filtering 13 Curve creation 14 Molecular solution 14 Scoring 15 Experimental results Data set 16 Solution evaluation 16 Results 11 Discussion 20 Future work 22 Appendix Implementation 23 Complexity 24 Main parameters 25 Benchmark contents 26 References 28 Hebrew abstract 33 3 Motivation The central dogma of molecular biology (1) states that genetic information stored in the DNA is transcripted and then translated to create proteins in all living cells. This widely agreed conjecture places the DNA molecule as an extremely important element in many intracellular processes. All DNA functions within the living cell, including among others transcription, replication, damage repair, packing and strand splitting, are dependent on interaction with proteins. Comprehensive understanding of these interactions, of which uncovering of the structure of protein – DNA complexes is a vital component, will most likely provide important insights to the inner cell systems, such as gene expression mechanisms, and DNA related diseases. Although many studies tackled various aspects of protein – DNA binding (some of which will be detailed hereafter), no well established algorithm exists for ab initio protein - DNA docking. Such an algorithm can be an important stepping stone in the establishment of a complete structural complex. 4 Background Biological background The DeoxyriboNucleic Acid (DNA) was first described in 1869 by Friedrich Miescher (2). It is a polymeric macromolecule built as a chain of nucleotides, each composed of a nucleic acid, a sugar group, and a phosphate group(3). Four types of nucleotides appear in natural DNA - adenine, guanine, thymine and cytosine. While other polymers present varied structural behavior, DNA is almost exclusively found as a double-stranded molecule, containing two polymeric chains bound together. The nucleotides of both chains create Watson-Crick pairs: adenine bases of one chain are paired with thymine bases of the other, and the same goes for guanine and cytosine. Hydrogen bonds between adjacent pairs create the well known helical structure of the DNA molecule, named B-form. DNA molecules vary in length, depending on the organism, and can be very long: e.g. human chromosome #1, is ~220 million base pairs long. DNA protein interaction, responsible for many crucial intracellular processes, takes the form of intermolecular binding, in which a DNA molecule and one or more protein molecules create a polymolecular complex. While DNA is found in nature almost solely in B-form, this native DNA structure changes noticeably when bound to a protein. A protein – DNA benchmark published by Bonvin et al. (4), contains an assembly of 47 complexes covering most major groups of DNA binding proteins, as these groups were defined by Luscombe et al.(5). In this benchmark, interface RMSD was calculated for each complex, resulting in 12 complexes displaying over 5.0 Å IRMSD, and another 22 of more than 2.0 Å, thus demonstrating the magnitude of structural deformation in protein DNA binding, projecting on its importance and affect on docking attempts. The DNA molecular deformations found in the majority of the complexes can be classified (according to 6) into just a few types: bending, twisting, groove widening or compression, and base flipping. Visualization of such deformations can be seen in figure 1, and representatives of all types can be found in the previously mentioned benchmark (4). Despite the numerous types, bending is the most common, and, more importantly, noticeably has the greatest effect on the DNA molecule global shape, whereas other types of structure distortion are of a more local character. 5 a b c d Figure 1 - various conformational changes undergone by DNA molecules when bound to proteins: a – base flipping (1EMH); b – helix unwinding (1EYU); c – bending (1O3T); d – A-form helix (1QRV) Molecular docking The newly developed field of bioinformatics employs mathematical, statistical and computational tools and methods, to address various aspects of molecular biology. Probably the most common practice of bioinformatics involves the analysis of biological sequences, referring both to DNA and protein sequences. Studies of gene expression and regulation, modeling of evolutionary processes, and computational systems biology are some of the other recent additions to this newborn field of science. Among these, structural bioinformatics deals with the development and analysis of prediction tools based on structural data, notably, focusing on the prediction of folding and docking of macromolecules. While folding attempts to predict the structure of a molecule given its sequence (be it protein or RNA), docking deals with computationally uncovering the structure of a complex composed of two or more molecules bound together. Numerous docking algorithms have been published in the last two decades following the immense increase in available biological data, much owing to the human genome project. Docking techniques differ from one another both in concept and purpose, but the vast majority are composed of two main elements: a search algorithm and a scoring function. Difficulty of docking is in a large extent the result of the vastness of the conformation space. Three translational and three rotational degrees of freedom, with additional degrees as molecular flexibility is introduced, combined with the large size of the molecules involved, generate too large a space to be searched exhaustively. Thus, the search algorithm is aimed at locating the largest proportion of eligible candidates in minimum time, and several such approaches can be identified when surveying the various solution methods (7). 6 Conformational sampling methods use known properties of bound complexes (such as shape complementarity) to actively and specifically generate the conformations to be considered. These techniques embed an assumption of some sort regarding the final solution (inspiring the method by which the complexes are generated), resulting in great speed, accompanied with possible weaknesses produced by the assumptions in question. PatchDock (8) for example, generates conformations based on surface shape complementarity. Incremental construction methods, used mainly to dock small, very flexible ligands, reduce the search space by anchoring a part of the ligand to a predicted binding site, and building the rest of it piece by piece, fixing the last added fragment in place. FlexX (9) may be considered as a representative of this sort of algorithms. Stochastic methods scan the conformational space on the fly. Normally, a population of random initial conformations is generated, after which multiple small steps are executed, slowly transforming each initial solution. Transformation in each step can be carried out in different ways, many of them involving an energy function. Monte Carlo based algorithms, as well as simulated annealing, steepest descent, and genetic algorithms have been published among others. ROSSETA (10) and HADDOCK (11) are members of this family. Scoring functions are the second major element of docking algorithms, and are commonly dual purposed, used both for directing the search algorithms and assessing candidate poses for final ranking (7). As in search algorithms, scoring functions can be roughly sorted into three main categories. Force field based functions employ a direct calculation of forces, based on physicochemical properties, including kinetic, thermodynamic, electrostatic and others, which are active at the molecular and intermolecular scale. As the molecular system in question presents high complexity, force fields are difficult to calculate, and elements not accounted for might significantly influence results (e.g. water molecules). Empirical scoring functions use an energy function approximated by the most (allegedly) relevant factors, often including hydrogen bonds, hydrophobic effects, electrostatics, etc. Such simplifications offer improved speed, but might reduce accuracy. Knowledge based functions abandon the physical and chemical known forces, and adopt a more pragmatic potential, based on the statistical analysis of databases containing known complex structures. Contacts can be categorized into types (e.g. certain pairs of amino acids) to create typical distributions used to assess the 7 candidate conformations. HADDOCK (11), PatchDock (8), and DrugScore (12) can be considered representatives of force field, empirical and knowledge based functions respectively. Despite this categorized summary, the variety and diversity of docking algorithms is immense. Many do not fit the categories mentioned here or combine some of them, and the wide selection cannot be thoroughly described in this scope. 8 Related work Protein-DNA binding characterization A vital key to the comprehension and prediction of DNA – protein interactions, lies in the broad and general characterization of the complex's intermolecular interface and the conformational changes undergone by the participating molecules during binding. Among the first characteristics to be spotted in the early DNA - protein complexes found, was the abundance of positively charged amino acids concentrated in the binding site upon the protein surface (13, 14). This positive electrostatic charge is assumed to complement the negative charge found mainly on the B-form DNA backbone. It was further suggested in some studies, that this electrostatic compatibility steers protein – DNA recognition, and is the source of the primary force pulling the molecules together (15, 16). A comprehensive study of the protein surface patch taking part in DNA binding was performed by Thornton et al. (17). All intermolecular contacts between protein and DNA atoms were surveyed through 129 protein DNA complexes. For each type of amino acid, the number of found contacts to the DNA molecule was divided by the expected number of contacts (calculated using that amino acid relative ASA, and the native binding site size), thus receiving a parameter representing each acid type's propensity to appear in the binding site. While the expected propensity value would be 1.0, many types of acids (including not only the electrostatically charged acids) presented significantly different values. Existing prediction solutions Several different approaches and methods were employed trying to predict DNA – protein binding, addressing various aspects of the problem. Function recognition algorithms attempt to predict whether a certain protein is DNA binding or not. Such algorithm was introduced by Mandel-Gutfreund et al. (18), based on the aforementioned electrostatic properties of DNA binding sites. Given a protein structure, the surface electrostatic potential was assessed, and the largest electrostatic patch located. The boolean prediction (binding or non-binding) was based on the 9 size (ASA) of the positive patch, as well as its average potential. Three DNA binding structural motifs (helixturn-helix, helix-hairpin-helix, and helix-loop-helix) were coupled by Thornton et al (19) with the electrostatic properties at the suggested binding site for this same purpose, and shown to possess significant discriminating power. Further work done by Thornton et al. (20) tackles a different perspective of protein-DNA binding, trying to predict the binding site of the protein. Their work embodies the comparison between the abilities of 5 different score functions to distinguish the native binding site on the protein surface from equally sized decoys on that same protein. The scores studied were each based on one of the following properties of the candidate binding site: ASA (accessible surface area), amino acid propensities, electrostatic charge, hydrophobicity, and residue conservation. The electrostatic and amino acid propensity based scores, have shown good performance, while all three remaining scores exhibited poor prediction ability. Other attempts to locate the native binding site employed machine learning methods, such as SVM and perceptron, based on sequential motifs and conserved residues data (21-23). All mentioned attempts showed ability to predict the binding site with non trivial probability, none managing to achieve more than 85% accuracy. Recent publications (24, 25) by Bonvin et al. embody the first large scale head on confrontation with protein – DNA docking. The HADDOCK algorithm, applied to the whole (previously discussed) benchmark, modeling both protein and DNA flexibility, displayed highly accurate results. The algorithm utilizes distance constraints extracted from experimental data (gathered from various possible sources, such as NMR, conservation data, etc.), to reconstruct and refine the protein-DNA complex. In this recent paper (25), the experimental data was replaced with distance constraints extracted from the target complexes themselves. The results, measured by the CAPRI (Critical Assessment of PRedicted Interactions) standards (26) of fnat (Frequency of native contacts) and IRMS (Interface RMS) , included a high proportion of extremely accurate solutions, depicting HADDOCK's ability to accurately predict DNA-protein complexes, provided that the bulk of the interface distance constraints are experimentally supplied a-priori by experiment or other prior knowledge. 11 ParaDock algorithm Algorithm Motivation Three observations have motivated the algorithmic design of ParaDock. The first is the well established structural consistency and repetition of B-form DNA. Second, is the ability to characterize probable binding sites, using geometric shape complementarity, electrostatics and amino acid propensities, as was earlier discussed. The third observation is the approximal planarity displayed by bound DNA helices. Two studies indirectly point to this conclusion. In the previously mentioned survey done by Thornton et al. (6) bending is the most common of all deformation modes, and its influence on global molecular shape seems the most significant. A second perspective as displayed by Dickerson (27, 28) views these deformations through the angular position of the base-pairs, relative to the helix central axis. Each such deformation is classified as major or minor, and further divided into three modes – kinks, planar curves, and writhes. While smooth planar curves are the least common in this study, it can be observed that kinks and minor writhes create approximately planar DNA curves, as the deformation is local. Therefore, apart from major writhed molecules, amounting to 10 out of 86 surveyed complexes, the surveyed DNA molecules are approximately planar. Within the 10 writhed complexes, some complete a full writhe, eventually returning to close to planar form. To further strengthen the claim of DNA planarity (at the relevant scale), we have approximated the DNA target molecules of the protein-DNA benchmark (4) with planes. Only an average of 0.08% of the atoms were located farther than 12 Å (DNA helix radius (29)) from the approximated plane, and only in a single complex, 1EMH, the ratio of these distant atoms was higher than 1% - easily explicable by a flipped base (shown in figure 1a). 11 Concept of the ParaDock algorithm The structural repetition and DNA planarity suggest the combination of partial rigid docking solutions (filtered using electrostatics, geometry and amino acids propensities) into a complete structure, and the approximation of the central DNA axis using planar curves. The algorithm structure is as follows: I - Local docking and filtering – local rigid docking solutions of the protein and short DNA section are found using PatchDock (8), and then filtered. II - Co – linear pairs of solutions, and co – planar triplets are located, and planar conic curves as well as linear curves are generated to fit them. III - Molecular solutions are constructed along the curves. IV - Solutions are scored and ranked by geometrical complementarity. Figure 2 Illustration of the ParaDock algorithm. Illustrations are genuine figures from the 1O3T test run. 12 Algorithm details PatchDock Paradock's input is a PDB file, containing the sequence and structure of the protein in question. First, we seek to find rigid partial solutions, indicating local complementarity of the protein's shape to B-form DNA. We use PatchDock (8), a geometric hashing (30) based rigid docking algorithm, which detects shape complementarity of molecular surfaces. Input to PatchDock is the protein molecule, and a short (6 base pairs) section of B-form DNA, of arbitrary sequence. PatchDock program parameters were adjusted to yield 15 - 30 thousand solutions, each of which is a complex containing the protein and the short DNA section docked together. PatchDock scores the solutions, using a distance transform grid, thus scoring every atom of the ligand (in our case, the short DNA section molecule), by its distance from the protein surface. Distant atoms (more than 1 Å from the surface) are not scored, close atoms (-1, +1 Å) receive a positive score, and penetrating atoms receive a negative score, respective to the penetration depth. Partial solution filtering Two other scoring functions are added to the geometric score calculated by PatchDock in order to filter out the relevant partial solutions. A primitive electrostatic score is the number of interacting atoms of the protein, belonging to a positively charged amino acid, minus the number of these belonging to negatively charged amino acids. This score gives a rough approximation of the electrostatic charge of the interface patch on the protein surface, very simply and swiftly. Finally, a third score was devised to account for the amino acid propensities observed by Thornton (17) et al. Protein atoms participating in the interface were counted by their residue type, resulting in a 20th dimensional vector. The inner product of this vector and the vector of average propensities found in (17), weighing both patch composition and size, was used as the third score. Out of all the solutions generated by PatchDock, a mere 2000, having the highest scores, were left to proceed to the next step, focusing attention on specific areas on the protein surface, possessing properties of electrostatics, geometry and amino acid composition, compatible with DNA binding patches. 13 Curve creation Next, we exploit the consistency of B-DNA shape, and represent the partial solutions by their central axes only, discarding the molecular data of the DNA. Hence, we are left with a protein molecule, surrounded by a cloud of segments in 3 dimensions. It should be noted that the phase of the partial solutions, i.e. their rotational angle around the central axis, is discarded along with the whole molecular data. It will be implicitly accounted for during scoring . Utilizing the aforementioned relative planarity of DNA molecules we now combine pairs and triplets of segments to generate long planar solutions, employing the known local compatibility to the protein. The segments extracted from the PatchDock filtered results are scanned to find co-linear pairs, and co-planar triplets (with necessary certain tolerances), used in turn to create linear and conic curves respectively. These approximate well the central axes of observed DNA molecules. The produced curves represent the location of the desired DNA molecule central axis, around which we will forthwith build the solution molecule. The creation of conic curves (the handling of which was implemented using CGAL (30)) produces undesired effects such as complete circles and ellipses, alongside with double branched hyperbolas, all of which are not particularly compatible with known DNA molecules. These are therefore trimmed, to consist of a single branch, with an overall turn angle not exceeding 90 degrees. The curve is trimmed to leave a section that is coherent with the maximal number of segments out of the 2000 filtered solutions. A coherent segment is defined as one that is close to the curve, and almost parallel to its tangent. Curves containing points with curvature higher than a certain threshold are discarded (as very sharp kinks are not common in B-form DNA, and might result in DNA molecules "wrapped" around the protein, thus receiving a very high score). Molecular solution Around the trimmed curve, a DNA molecule needs to be constructed. First, a template molecule of arbitrary sequence is multiplied to create a straight DNA molecule of the desired length. In order to bend this molecule, its central axis is segmented, and each segment is paired with a section on the target curve. Rigid transformations are created transforming each such segment onto its corresponding section on the curve. Finally, each atom from the straight molecule is transformed onto its location using the appropriate 14 transformation, creating the final bent molecule. Several molecules are created for each curve, using different translations along the central axis, and rotational angles around it. A geometrical score is calculated to choose the final candidate molecule. These final molecules are bent only DNA molecules, not accounting for any other type of flexibility. Scoring Several scoring functions were examined when attempting to rank the 1,000 – 2,000 solutions each complex produced: I - Electrostatics and amino acid propensities were earlier described. They have shown mediocre results as basis to score functions, possibly because they were already used in filtering the partial solutions. II - The amount of coherent partial solutions, which was used while trimming the curves, might have been a good measure for geometrical compatibility, and as small tolerances are allowed might have compensated for the lack of local flexibility in the final solution. Partial solutions individual scores (geometric, electrostatic, propensities) were also considered. Probably due to the uneven spread of partial solutions, this score function seemed to let their spatial density to weigh heavily on final ranking. III - Curvature of the central axis curve might represent the energetic cost of the deformation involved, but has also shown poor performance. IV - PatchDock geometrical score, plainly yielding the best results, was eventually used. The geometrical score function uses a distance transform grid to score every atom of the DNA molecule by its distance from the protein surface. Distant atoms (more than 1 Å from the surface) are not scored, close atoms (-1, +1 Å) receive a positive score, and penetrating atoms receive a negative score, respective to the penetration depth. Many variations of these basic score functions including weighted combinations were tested, none proving better than the geometrical score. Program output is therefore a geometrically ranked list of solution complexes, each containing the unchanged protein docked with an arbitrary sequenced DNA molecule, of varying length. 15 Experimental results Data set The Bonvin et al. benchmark (4) consists of experimentally elucidated complexes of 47 different DNA binding proteins, bound to DNA, alongside with the native unbound proteins. The benchmark contains representatives of seven out of the eight groups defined in the structural classification of protein DNA complexes by Luscombe et al. (5), categorized by their IRMS (Interface RMS) as easy, intermediate or hard. IRMS is the root mean square deviation between interface atoms of both molecules in the bound and unbound version, representing the magnitude of conformational changes undergone by both molecules when bound to each other. ParaDock was tested on both bound and unbound cases. The DNA molecule was removed from the target complex, leaving the protein structure as the input for ParaDock in the bound case, its results to be compared back with the target complex. As the protein's flexibility is not modeled, ParaDock's output, when run on the unbound protein, must be aligned with the target complex in order to evaluate correctness. This was established using our C-alpha based structural alignment server (30, 32). Proteins which bind to DNA in a non-monomeric form constitute a significant portion of the benchmark. Because Paradock is currently unable to deal with multiple molecules, only one copy of the protein was left in each complex, both as input and for solution evaluation. Solution evaluation ParaDock does not require knowledge of the DNA sequence and length, thus requiring some alterations in the result evaluation measures commonly used in docking attempts. Difference of length between the target DNA and the DNA output molecule, combined with the arbitrarity of the sequence used to create the solution, does not allow for a straightforward comparison between the two molecules, owing to three facts: (i) The correspondence between the target base pairs and the solution base pairs is undefined. (ii) Not all atoms that exist in the target DNA appear in the solution, as the DNA sequence differs. (iii) Some base pairs of the target might be missing, and vice versa (as lengths differ). Drawn from the CAPRI evaluation methods (26), fnat is 16 the first measure used with a slight adjustment. Fnat (Fraction of the NATive contacts) is calculated as the proportion of native contacts between a nucleotide and an amino acid that appear in the evaluated solution. As the correspondence of the base pairs is not given, all possible correspondences are examined (maintaining order and sequence of base pairs), and the highest scoring is set as the work point correspondence list. To complete fnat, fnonnat is also calculated, being the proportion of the contacts found in the evaluated solution that do not exist in the target complex. IRMS, also drawn from CAPRI, is calculated between the solution and the target locations of all backbone atoms in the nucleotides and amino acids, which take part in the interface. Another important measure is the number of intermolecular clashes in the solution. A large amount of clashes will inflate the interface in both molecules, creating more contacts between nucleotides and amino acids, thus resulting in high fnat numbers. Clashes were defined to fit the CAPRI definition, as non-hydrogen atoms from different molecules, less than 3 Å apart. It may be noted that the CAPRI evaluation system utilizes another measure – ligand RMSD, calculated for the whole ligand, and used as an alternative to IRMS. DNA "tails" far from the protein are very difficult to predict, and create an increase in ligand RMSD, more suited for use with smaller (in diameter) ligands. Since it is fully interchangeable with IRMS in terms of CAPRI standards, we have not used it here. Interface distance threshold was taken as 5 Å. To better perceive these measures, a few solution complexes are illustrated in figure 3. Results ParaDock was run on the 47 benchmark complexes, in both bound and unbound versions. For each complex approximately 15,000 – 30,000 partial solutions were found by PatchDock, usually dispersed over the entire protein surface (as illustrated in figure 2). The 2,000 filtered partial solutions (in most complexes) are distinctly concentrated in certain areas (figure 2), implying either geometric, electrostatic or propensity compatibility in that location on the surface. Within these, 300 – 500 pairs of collinear solutions were normally located (each of which yields a linear curve), alongside 2 – 3 million coplanar triplets, resulting in 1,000 – 2,000 solutions of both types (as most segment triplets are unsuitable to create a conic curve). 17 ParaDock was run on a single 2.0 GHz processor, with an average overall run time of ~2 hours (~45 minutes of which consumed by PatchDock). CAPRI definition of an acceptable solution which was used here, is either fnat >= 0.3, or fnat >=0.1 and IRMS < 4.0 Å. When tested on the bound version, ParaDock places an acceptable or better solution within the 10 top ranking solutions, in 83% of the cases. From the remaining, another 13% presented a solution with fnat > 0.2 (failing only to reach IRMS < 4.0 Å). ParaDock exhibited slightly worse results when run on the unbound version of the benchmark. Scoring parameters were altered to allow more collisions (compensating for the lack of protein flexibility), and a clustering algorithm was applied to filter out similar solutions. An acceptable solution was found in the top 10 of 70% of the complexes, with an additional 17% with fnat > 0.2. Illustrations of result complexes are depicted in figure 3. Both bound and unbound results are detailed in table 1. b a Figure 3 – Illustration of best result complexes within the top 10 ranking of four targets. Proteins are shown in blue (differences between bound and unbound versions can be observed), native solution in red, ParaDock solution in green. Trimmed curve (DNA central axis) appears in black. c d 18 a b c d complex 1B3T unbound 1B3T bound 1DFM unbound 1DFM bound Fnat 0.37 0.61 0.23 0.67 Fnonnat 0.61 0.42 0.78 0.64 IRMS (Å) 5.44 3.37 6.7 5.5 Best in top 10 - bound protein complex 2C5R 1PT3 1MNN 1FOK 1KSY 3CRO 1EMH 1H9T 1TRO 1BY4 1HJC 1DIZ 1RPE 1VRR 1F4K 1K79 1KC6 1EA4 1Z63 1R4O 1AZP 1W0T 1CMA 1JJ4 1VAS 4KTQ 1Z9C 1DDN 2IRF 1JT0 1G9Z 1A73 2FIO 1QNE 1ZS4 1QRV 1O3T 1B3T 3BAM 1RVA 1ZME 1DFM 1BDT 2FL3 7MHT 1EYU 2OAA Average: native IRMS 0.49 1.35 1.48 1.53 1.58 1.58 1.62 1.68 1.7 1.77 1.8 1.82 1.87 2.08 2.26 2.37 2.38 2.43 2.51 2.61 2.7 2.78 2.81 2.83 3.04 3.23 3.24 3.26 3.35 3.49 3.67 4.26 4.41 4.57 4.71 5.19 5.2 5.32 5.55 5.68 5.76 6.31 6.45 6.71 6.71 6.82 8.95 evaluation medium medium acceptable medium acceptable acceptable medium medium medium medium acceptable medium medium medium medium acceptable medium medium acceptable acceptable acceptable medium medium medium medium acceptable medium acceptable medium medium medium medium medium medium acceptable acceptable medium acceptable medium Best in top 10 - unbound protein Fnat 0.62 0.73 0.49 0.64 0.36 0.07 0.43 0.52 0.73 0.29 0.90 0.50 0.38 0.85 0.21 0.77 0.75 0.94 0.30 0.55 0.52 0.40 0.29 0.41 0.33 0.21 0.22 0.90 0.81 0.00 0.61 0.29 0.51 0.32 0.53 0.43 0.74 0.58 0.71 0.59 0.55 0.67 0.36 0.44 0.57 0.40 0.64 Fnonnat 0.90 0.72 0.65 0.58 0.89 0.97 0.84 0.72 0.72 0.89 0.50 0.76 0.79 0.60 0.89 0.48 0.73 0.63 0.95 0.80 0.82 0.68 0.88 0.71 0.80 0.88 0.93 0.61 0.64 0.00 0.48 0.74 0.75 0.78 0.70 0.84 0.57 0.47 0.50 0.55 0.72 0.64 0.74 0.63 0.71 0.73 0.46 IRMS 8.8 4.6 3.8 3.1 9.8 5.1 3.3 6.0 3.8 9.6 2.7 5.1 8.9 4.9 11.4 3.1 3.7 3.0 9.0 5.8 9.2 5.8 8.2 4.8 6.4 9.8 7.6 3.1 2.7 25.1 5.0 10.0 5.1 10.3 3.7 8.7 3.1 4.1 2.4 4.3 6.4 5.5 12.4 3.2 5.7 6.1 2.9 0.51 0.70 6.3 evaluation medium medium acceptable acceptable acceptable medium acceptable medium acceptable acceptable medium acceptable medium medium medium acceptable medium acceptable acceptable acceptable acceptable acceptable acceptable acceptable acceptable medium acceptable acceptable acceptable medium acceptable acceptable acceptable Fnat 1.00 0.50 0.41 0.32 0.32 0.58 0.27 0.29 0.16 0.42 0.50 0.33 0.47 0.70 0.21 0.31 0.56 0.65 0.53 0.30 0.39 0.79 0.38 0.38 0.39 0.21 0.41 0.40 0.45 0.09 0.44 0.29 0.15 0.32 0.06 0.88 0.08 0.37 0.32 0.13 0.21 0.23 0.42 0.56 0.48 0.37 0.38 Fnonnat 0.92 0.78 0.88 0.72 0.91 0.63 0.89 0.84 0.95 0.86 0.81 0.83 0.73 0.57 0.91 0.84 0.66 0.62 0.88 0.89 0.85 0.70 0.85 0.62 0.75 0.89 0.73 0.84 0.81 0.96 0.55 0.72 0.94 0.79 0.96 0.87 0.96 0.61 0.82 0.91 0.88 0.78 0.66 0.61 0.59 0.67 0.59 IRMS 4.3 5.2 5.6 5.3 9.5 3.6 5.2 13.1 10.5 6.9 8.2 7.9 7.2 2.6 12.2 7.8 4.2 3.3 5.8 10.5 11.7 3.1 6.9 2.7 7.8 7.1 5.0 4.3 4.8 12.4 5.1 10.4 13.1 9.0 14.5 4.6 17.0 5.4 11.0 16.5 15.4 6.7 5.7 5.4 6.9 8.4 8.4 0.39 0.79 7.8 Table 1 - ParaDock results as tested on the benchmark. Shown are the best solutions within the top 10 ranked. Native IRMS (Å) – IRMS between bound and unbound molecules, Taken from (4). Evaluation – solution classification according to CAPRI standards. Fnat - The proportion of native contacts between a nucleotide and an amino acid that appear in the evaluated solution. Fnonnat - The proportion of the contacts found in the evaluated solution that do not exist in the target complex. IRMS (Å) – calculated between the solution and the target locations of all backbone atoms in the nucleotides and amino acids, which take part in the interface. Interface distance is defined to be 5 Å, as in CAPRI. 19 The acceptable solutions (both bound and unbound) present an average of 135.5 clashes per solution. More than 60% of these are very shallow clashes (less than 3 Å deep into the protein), which will most likely be handled easily by a future refinement procedure , leaving an average of 54 non-shallow clashes per solution, well in the neighborhood of normally accepted CAPRI solutions. Discussion ParaDock provides a protein – DNA docking solution based entirely on the protein's structure, requiring neither distance constraints, nor any information about the DNA molecule (sequence, length, etc.). We have employed generic observations regarding the electrostatic and geometric properties, as well as amino acidpropensities of protein – DNA complexes, and made simplifying assumptions on the flexibility of DNA molecules, thus deriving a simple and fast algorithm, which combines partial binding solutions to predict complexes containing long bent DNA molecules. CAPRI acceptable solutions are within the 10 top ranked solutions in 76% of the bound and 70% of the unbound cases, prior to any form of refinement. The algorithm presents very good ability to predict the location and general position of the DNA molecule, though it exhibits lower accuracy when it comes to RMSD measures. This is explicable due to the lack of refinement and the simplification of flexibility in the DNA molecule, combined with the unaddressed flexibility of the protein. A succeeding stage of such local refinement is obviously called for, but remains outside of this work's scope. As expected, ParaDock presents better performance on bound versions of the proteins. Target complexes in table 2 are sorted by their bound-unbound IRMS, representing the magnitude of the conformational changes undergone during binding and as can be foreseen, it is apparent that in the unbound version results deteriorate as native IRMS rises. In a vast majority of the solutions, fnat + fnonnat > 1, implying that the molecular interface found on the solution complex is larger than the target interface. This is due to two possible reasons – first, as mentioned before, the definition of interface used here inflates the interface size as the volume of intermolecular overlap (steric clashes) grows. Second, because the length of the target DNA molecule is not used by the algorithm, output DNA molecule may be longer than the target molecule, likely enlarging intermolecular interface 21 (figure 3c exhibits a solution quite longer than the target molecule). The nature of the geometric score, seeking large complementary interface strengthens the latter explanation, bearing in mind that the solutions in question are amongst the highest scoring solutions. As the average sum of fnat and Fnonnat is merely 1.2, we do not regard this as a significant drawback. Arbitrary DNA sequence was used both as input for PatchDock, and as a template for the full molecular solution. Though such a choice might have interrupted severely with molecular complementarity, it seems not to have deteriorated results significantly (as they are rather good). Despite the fact that energy was very indirectly handled here, this might imply that the main contributors to the affinity of the protein and DNA are not heavily sequence related, thus supporting the "DNA sliding" theory (33). The latter suggests that some DNA binding proteins slide along the DNA searching for their cognate sequence, continuously in contact with the DNA, thereby, exhibiting significant affinity to any DNA sequence. While ParaDock handles the Bonvin et al. benchmark quite well, it is important to mind two main inherent inabilities, in addition to the lack of refinement. The filtering stage denies the creation of irregular solutions, i.e. solutions in which the protein's interface does not sustain such properties of electrostatics and amino acid propensities earlier observed. Such target complexes, will not be predictable by ParaDock. On top of that, as the length of target DNA molecules will increase, the underlying approximation of the central axis with a planar conic curve, will eventually fail. Such targets will demand a different treatment, perhaps of combining several ParaDock solution curves to create a more complex DNA molecule. 21 Future work When discussing future work, solution refinement, modeling local DNA changes as well as protein flexibility, is probably the most obvious choice, enabling increased accuracy in complex prediction combined with a more precise score function (due to local conformation optimizations). Multiple molecule docking can also be handled. Assuming symmetry (as done in Symmdock , (34)), will enable running ParaDock on a single copy of the protein, and concatenating multiple copies of each solution to create final candidate complexes. A more advanced improvement would be an attempt to approximate DNA helices with more complex curves, discarding the planarity assumption, thus creating a larger more diverse solution space. Such curves can either be concatenated conic curves (as used here), other parametric curves, or non-parametric curves custom built using the 3D segments representation. 22 Appendix Implementation The search for co-planar segment triplets is a time consuming task, accounting for a significant portion of ParaDock's overall runtime. The naive algorithm would require n^3 attempts, amounting to 8,000,000,000 possible triplets in our 2,000 segments case. A simple speed-up was achieved by locating all co-planar segment pairs, and storing a list of co-planar mates for each segment. Then, through intersecting every pair of lists, all co-planar triplets are located. As the lists are sorted, their intersection takes the length of i's and j's lists, and the overall complexity is , and being . Worst case complexity is still O(n^3), but as list lengths average 40-50 segments at most, this approach proves 20 times faster. All handling of conic curves was done using CGAL(31). As CGAL is required to be geometrically consistent, all parameters are stored as accurate rational numbers, and all calculations use their full length. This yields some actions which are somewhat time consuming. The transformation of ~2,500,000 co-planar triplets into conic curves presented a problem. As most triplets are unsuitable to create a conic curve (e.g. zigzagging segments), a quick filtering stage was inserted. Conic curves are defined by 5 points. We have selected P1 and P5 to be the farthest pair of segment endpoints, P2 to be the second endpoint of P1's segment, P4 to be the second endpoint of P5's segment, and P3 as the center of the third segment. A segment triplet passes the filtering if for every i,j,k, s.t. i<j<k, IS_LEFT_TURN(Pi,Pj,Pk) = constant. Thus making sure that the chain of points is curved in consistent direction, and does not intersect itself. In most complexes, less than 5% of the triplets passed this test, and many others which passed failed later because the middle segment was not parallel to the created curve. The final result is less than 2,000 solution curves per target. In the unbound version of the benchmark, a clustering method was exercised in order to thin out similar solutions crowding the 10 top ranking solutions. A directional graph was created, solutions being its vertices, and an edge was added from solution i to solution j if more than a certain percentage of j's curve was in close proximity to i's curve, meaning i "covered" j. A greedy algorithm selected the solution with the highest 23 outgoing degree, and deleted all its neighbors, hopefully remaining with fewer, longer and more complete solutions. Complexity PatchDock's complexity is detailed in (35). Scoring of partial solutions (electrostatic score, amino acid propensity score) is done using a distance transform grid, and time complexity is the same as the original PatchDock geometric complementarity score – , DTG is the constant time required to create the distance transform grid, protein_size is the number of protein atoms (relevant when filling the grid), n is the number of partial solutions, and sect_size is the constant number Filtering the solutions merely requires sorting, i.e located in of atoms in the short DNA section. . Co-planar triplets and co-linear pairs are as detailed above, m being the number of partial solutions after filtering (2,000 were used here). Trimming requires per solution, in order to locate all coherent segments using a grid. Assuming a constant maxima of close segments per query point, and minding that solution length is limited by the protein diameter, leaves overall, k being the number of final solutions. Molecular solution creation and scoring takes , where is the length of solution i, rot_num is the number of sampled rotations around the central axis, and trans_num is the number of sampled translations along it. As rot_num and trans_num are constants, overall complexity is . Clustering demands proximity test for each solution pair. Each test is linear to the solution length, thus the overall is . Greedy selection merely requires solution sorting, as does the final ranking. Overall complexity is thus: . n - # of partial solutions, m – number of filtered solutions, k – number of final solutions. 24 PatchDock runtime is on the average ~45 minutes per complex on a single 2.0 GHz processor, program parameters adjusted to create 15 – 30 thousand solutions. Overall actual running time averages ~120 minutes. Main parameters Many parameters were used in the process of implementing ParaDock. Listed here are parameters found to be of significant influence on either results or runtime. parameter parameter default value parameter main influence # of partial solutions that pass filtering 2000 Geometric score in respect do distance from surface. 1 Å < dist : 0 points. -1 Å < dist < 1 Å : 1 points. -2.2 Å < dist < -1 Å : 0 points. -3.6 Å < dist < -2.2 Å : -4 points. dist < -3.6 Å : -8 points. Major effect on runtime, due to m^3 complexity of locating triplets. Influences ability to rank solutions correctly. Lower penalties were used in unbound case for shallow penetrations. Parallelity threshold – minimal absolute value allowed as inner product of two parallel unit vectors. Co-planarity threshold. Two planes were defined, each based on a segment, and the other segment's center point. The threshold is the minimal absolute value allowed as inner product of unit normal vectors of both planes. DNA short section length. The section used as input ligand in PatchDock. 0.96 0.999 6 base pairs. 25 Determines number of linear solutions. Light effect on amount of results. Determines number of co-planar triplets, has significant influence on their numbers, and thus on runtime. Parameter's influence was not thoroughly studied. The shorter the section, solutions will probably tend to posses higher curvature. Benchmark contents (4) PDB id Easy 2c5r 1pt3 1mnn 1fok 1ksy 3cro 1emh 1h9t 1tro 1by4 1hjc 1diz 1rpe Intermediate 1vrr 1f4k 1k79 1kc6 1ea4 1z63 1r4o 1azp 1w0t 1cma 1jj4 1vas 4ktq 1z9c 1ddn 2irf 1jt0 1g9z 1a73 2fio 1qne 1zs4 Protein description Number of IRMS molecules in (Å) target complex Phage PHI29 replication organizer protein P16.7 Col-E7 nuclease domain Sporulation specific transcription factor NDT80 Restriction endonuclease FOKI Papillomavirus replication initiation domain E-1 Phage 434 CRO Human uracil-DNA glucosylase FADR, fatty acid responsive transcription factor TRP repressor Retinoid X receptor DNA binding domain RUNX1 runt domain E. coli 3-methyladenine DNA glycosylase II Phage 434 repressor 4 2 2 2 3 3 2 3 3 3 2 2 3 0.49 1.35 1.48 1.53 1.58 1.58 1.62 1.68 1.70 1.77 1.80 1.82 1.87 Restriction endonuclease BSTYI Replication terminator protein ETS-1 DNA binding and autoinhibitory domain Restriction endonuclease HINCII Transcription repressor COPG Sulfolobus solfataricus WI2/SNF2 ATPase core domain Glucocorticoid receptor Hyperthermophile chromosomal protein SAC7D HTRF1 DNA-binding domain Methionine repressor Papillomavirus type 18 E2 pyrimidine dimer specific excision repair DNA polymerase Organic hydroperoxide resistence transcription regulator Diphtheria TOX repressor Interferon Regulatory Factor 2 Multidrug binding transcription factor QACR I-CreI endonuclease Intron-encoded homing endonuclease I-PPOI Phage PHI29 transcription regulator P4 Adenovirus major late promotor TBP Phage lambda CII 3 3 2 3 3 2 2.08 2.26 2.37 2.38 2.43 2.51 3 2 3 2 2 2 2 3 2.61 2.70 2.78 2.81 2.83 3.04 3.23 3.24 2 2 2 2 2 2 2 2 3.26 3.35 3.49 3.67 4.26 4.41 4.57 4.71 26 Difficult 1qrv 1o3t 1b3t 3bam 1rva 1zme 1dfm 1bdt 7mht 2fl3 1eyu 2oaa High mobility group protein D CAP-CAMP Epstein-Barr virus nuclear antigen-1 Restriction endonuclease BAMHI Eco RV endonuclease Proline utilization transcription activator PUT3 Restriction endonuclease BGLII Phage P22 Arc gene regulating protein HHAI methyltransferase Restriction endonuclease HINP1I PVUII endonuclease Restriction endonuclease MVAI 27 3 2 2 3 2 2 3 3 2 2 2 2 5.19 5.20 5.32 5.55 5.68 5.76 6.31 6.45 6.71 6.71 6.82 8.95 References 1. Crick, F. (1970): Central Dogma of Molecular Biology. Nature 227, 561-563. PMID 4913914 2. Dahm, R.(2005) Friedrich Miescher and the discovery of DNA Developmental Biology 278 274–288 3. Voet, D. and Voet, J. G. (1995). Biochemistry – Second Edition. John Willey and Sons, Inc.pp 850-861. 4. Van Dijk, M. and Bonvin, A. M. J. J. (2008). A protein–DNA docking benchmark. Nucleic Acids Research, Vol. 36, No. 14 e88 5. Luscombe, N.M., Austin, S.E., Berman, H.M. and Thornton, J.M. (2000) An overview of the structures of protein-DNA complexes. Genome Biol., 1, e1. 6. Jones, S., Van Heyningen, P., Berman, H. M., Thornton, J. M. (1999). Protein-DNA interactions: a structural analysis, Journal of Molecular Biology, Volume 287, Issue 5, 877-896. 7. Moitessier, N., Englebienne, P., Lee, D., Lawandi, J. and Corbeil, CR.(2008)Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. British Journal of Pharmacology 153, S7–S26 8. Duhovny, D., Nussinov, R. and Wolfson, H.J. (2002) Efficient unbound docking of rigid molecules. In Guigo,R. and Gusfield,D. (eds), Proceedings of the Fourth International Workshop on Algorithms in Bioinformatics. Springer-Verlag GmbH Rome, Italy, September 17–21, 2002, Vol. 2452, pp. 185– 200. 9. Rarey, M., Kramer, B., Lengauer, T. and Klebe, G. (1996) A Fast Flexible Docking Method using an Incremental Construction Algorithm. J. Mol. Biol. 261, 470–489 10. Meiler, J. and Baker, D. (2006) ROSETTALIGAND: Protein–Small Molecule Docking with Full Side-Chain Flexibility. PROTEINS: Structure, Function, and Bioinformatics 65:538–548. 11. Dominguez, C., Boelens, R. and Bonvin A.M.J.J. (2003). HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J Am Chem Soc 125: 1731–1737. 28 12. Velec, H.F.G., Gohlke, H., Klebe, G. (2005). DrugScoreCSD - knowledgebased scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J Med Chem 48: 6296–6303. 13. Nadassy, K., Wodak, S. J. and Janin J. (1999). Structural Features of Protein−Nucleic Acid Recognition Sites. Biochemistry, 38 (7), 1999–2017. 14. Ohlendorf, D. H. and Matthew, J. B. (1985). Electrostatics and flexibility in protein–DNA interactions. Advan. Biophys. 20, 137–151. 15. Wade, R. C., Gabdoulline, R.R., Lu'demann, S. K. and Lounnas V. (1998). Electrostatic steering and ionic tethering in enzyme–ligandbinding: Insights from simulations. Proc. Natl. Acad. Sci. USA Vol. 95, pp. 5942–5949, May 1998. 16. Mandel-Gutfreund, Y., Margalit, H., Jernigan, R. L. and Zhurkin, V. B. (1998). A role for CH…O interactions in protein–DNA recognition. J. Mol. Biol. 277, .1140–1121 17. Luscombe, M. N., Laskowski, R. A., and Thornton, J. M. (2001). Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Research, 21, 2860 – 2874. 18. Stawiski, E. W., Gregoret, L. M. and Mandel-Gutfreund, Y. (2003). Annotating Nucleic Acid-Binding Function Based on Protein Structure. J. Mol. Biol. 326, 1065–1079. 19. Shanahan, H. P., Garcia, M. A., Jones, S. and Thornton, J. M. (2004) Identifying DNA-binding proteins using structural motifs and the electrostatic potential. Nucleic Acids Research, 2004, Vol. 32, No. 16 4732–4741doi:10.1093/nar/gkh803 20. Jones, S., Shanahan, H. P., Berman, H. M. and Thornton, J. M. (2003). Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Research, Vol. 31, No. 24 7189-7198 21. Ahmad, S., Gromiha, M. M. and Sarai, A. (2004). Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. BIOINFORMATICS Vol. 20 no. 4, 477–486 22. Ahmad, S., and Sarai, A. (2005). PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics 6:33 29 23. Wu, J., Wu, H., Liu, H., Zhou, H. and Sun, X.. (2007) Support Vector Machine for Prediction of DNA-Binding Domains in Protein-DNA Complexes. Life System Modeling and Simulation. Lecture Notes in Computer Science, Volume 4689/2007, 180-187. 24. Van Dijk, M., Van Dijk, A. D. J., Hsu, V., Boelens R., and Bonvin, A. M. J. J. (2006). Informationdriven protein–DNA docking using HADDOCK: it is a matter of flexibility. Nucleic Acids Research, Vol. 34, No. 11 3317–3325. doi:10.1093/nar/gkl412 25. Van Dijk M. and Bonvin, A. M. J. J. (2010). Pushing the limits of what is achievable in protein–DNA docking: benchmarking HADDOCK’s performance. Nucleic Acids Research, 2010, 1–14. doi:10.1093/nar/gkq222 26. Me´ndez, R., Leplae, R., De Maria, L., and Wodak, S. J. (2003). Assessment of Blind Predictions of Protein–Protein Interactions: Current Status of Docking Methods. PROTEINS: Structure, Function, and Genetics 52:51–67 27. Dickerson, R. E. (1998). DNA bending: the prevalence of kinkiness and the virtues of normality Nucleic Acids Research, Vol. 26, No. 8, 1906–1926 28. Dickerson, R. E. and Chiu, T. K. (1997), Helix bending as a factor in protein/DNA recognition. Biopolymers, 44: 361–403. doi: 10.1002/(SICI)1097-0282(1997)44:4<361::AID-BIP4>3.0.CO;2-X. 29. Mandelkern, M., Elias, J., Eden, D. and Crothers, D. (1981). The dimensions of DNA in solution. J Mol Biol 152 (1): 153–61. doi:10.1016/0022-2836(81)90099-1 30. Wolfson, H.J. and Rigoutsos, I. Geometric hashing: An overview. IEEE Computational Science and Eng., 11:263–278, 1997. 31. CGAL, Computational Geometry Algorithms Library, http://www.cgal.org 32. Nussinov, R. and Wolfson, H.J. Efficient Detection of Three - Dimensional Motifs in Biological Macromolecules by Computer Vision Techniques, Proceedings of the National Academy of Sciences, U.S.A., 88, 10495-10499, (1991). 33. Berg, O. G., Winter, R. B. and Von Hippel, P. H. (1981). Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory Biochemistry 20 (24), 6929-6948 31 34. Schneidman-Duhovny, D., Inbar, Y., Nussinov, R. and Wolfson, H.J. Geometry based flexible and symmetric protein docking. Proteins, 60: 224-231, 2005. 35. Schneidman-Duhovny, D., Nussinov, R and Wolfson, H.J. (2003) PatchDock: a method for binding site detection and unbound docking of rigid molecules. M.Sc. thesis, Tel – Aviv University, School of Computer Science. 31 32 תקציר חלבונים ומולקולות דנ"א נקשרים אלו לאלו במסגרת תהליכים פנים תאיים חיוניים רבים .ביצוע תחזיות מדויקות ואמינות למבנה המרחבי של החלבון והדנ"א כאשר הם קשורים זה לזה ,עשוי להוות נקודת ציון משמעותית בדרך להבנה מעמיקה של תפקודי התא. גישות שנוסו בתחום זה כוללות ניסיון לחזות את אתר הקישור על גבי מעטפת החלבון ,סיווג תפקודו של החלבון כקושר/לא קושר דנ"א ,ואלגוריתם עגינה המבוסס על אילוצי מרחק מתוך מידע ניסויי .בעבודה זו אנו מציגים את :ParaDockאלגוריתם עגינה המשלב פתרונות עגינה קשיחה של מקטעי דנ"א קצרים ,המבוססים על תאימות גיאומטרית ,לכדי מולקולות דנ"א מכופפות מישורית אשר מדורגות גם הן על בסיס תאימות גיאומטרית .האלגוריתם נבדק על בסיס נתונים הכולל 41מבני חלבון – דנ"א ,בגרסאותיהם הקשורות והלא קשורות כאחד .ללא התמודדות עם גמישות החלבון ,וללא שלב עידון המאפשר שינויים מבניים מקומיים ,ב 33%מהמבנים בצורתם הקשורה ,וב 10%מהמבנים בצורתם הלא קשורה ,לפחות פתרון מקובל אחד (על פי סטנדרט )CAPRIמדורג בעשרת הפתרונות העליונים .ללא דרישת רצף או אורך מולקולת הדנ"א ,ובזמן ריצה של כשעתיים על מעבד 2.0 GHzסטנדרטי, ParaDockמספק פתרון עגינה מהיר ופשוט למבני חלבון – דנ"א. 33 אוניברסיטת תל-אביב הפקולטה למדעים מדויקים ע"ש ריימונד ובברלי סאקלר בית הספר למדעי המחשב – ParaDockאלגוריתם עגינת חלבון קשיח ודנ"א גמיש חיבור זה מוגש כחלק מהדרישות לקבלת תואר מוסמך בבית הספר למדעי המחשב ע"ש בלבטניק באוניברסיטת תל – אביב על ידי איתמר בנית המחקר וכתיבת העבודה נעשו בהנחייתו של פרופ' חיים וולפסון תשרי תשע"א 34
© Copyright 2026 Paperzz