Protein Crystallography Part III: Refinement and Model Building

Tim Grüne
Dept. of Structural Chemistry, University of Göttingen
Head of Dept.: Prof. G. Sheldrick
November/December 2005
http://shelx.uni-ac.gwdg.de
[email protected]
Molecular Biology Course 2005

Overview

• From Map to Model
• The PDB File
• Model Refinement
• Restraints and Constraints
• Model Building

From Map to Model

• The actual experimental data is an electron density map.
• An initial electron density map (and also a final one) looks quite messy and is difficult to interpret.
• The final coordinate model contains more useful information.
• The coordinate model is the target of model building and refinement.

Data Visualisation

[Figure panels: ball-and-stick; Cα trace (smooth); CPK (space filling)]

Storing Structural Data: the PDB File

The protein models that are stored e.g. in the Protein Data Bank (PDB, http://www.pdb.org) do not represent the mere experimental data. From the experiment we get diffraction intensities and, after some work, the electron density ρ within the unit cell. The model is the best match (from the author's point of view) that explains the experimental data.

A typical PDB file contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group, and the unit cell dimensions.
The main part of the file consists of ATOM entries, one per line. An ATOM entry contains the atom name, the type of residue it belongs to, the coordinates, the occupancy, the B-factor, and the atom (element) type.

The PDB File (example: PDB entry 1CLI)

    HEADER    LIGASE                                  28-APR-99   1CLI
    TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
    TITLE    2 SYNTHETASE (PURM), FROM THE E. COLI PURINE BIOSYNTH
    TITLE    3 PATHWAY, AT 2.5 A RESOLUTION
    AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
    ...
    CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
    ...
    ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
    ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
    ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
    ...

[Further figure panels of the Data Visualisation slide: Cα trace (coloured by B-factor); ball-and-stick (coloured by B-factor); ribbons]

Occupancy and B-factor of an Atom

Occupancy: A typical crystal consists of a large number (> 10^13) of unit cells, and the resulting model is therefore only an average over all these cells. Some atoms, especially those of large side chains (arginine, phenylalanine, ...), can be partially disordered; others can have several distinct but fixed orientations. An occupancy lower than 1 indicates that an atom occupies this position in only a fraction of all unit cells.

B-factor: Even though data are most often collected at 100 K, atoms are not immobile but vibrate (thermal motion). The temperature factor, or B-factor, describes the vibration as a sphere within which the atom oscillates. At high resolution, the B-factor is replaced by a symmetric 3x3 matrix that describes anisotropic thermal motion in three dimensions.

[Figure: The Use of Occupancy: Multiple Conformations]
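To make the layout of an ATOM entry concrete, the following is a minimal sketch of reading one record in Python. It assumes the standard fixed-column layout of the PDB format; the names `Atom`, `parse_atom_record` and `rms_displacement` are hypothetical and chosen only for this illustration. The last function uses the standard relation B = 8π²⟨u²⟩ for an isotropic B-factor, which is not stated on the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record.

    The PDB format counts columns from 1, Python slices from 0,
    so columns 31-38 (x coordinate) become line[30:38].
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


def rms_displacement(b_factor: float) -> float:
    """Root-mean-square displacement (in Å) from B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# First ATOM record of the 1CLI example shown above.
SAMPLE = "ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N"

atom = parse_atom_record(SAMPLE)
print(atom)
print(f"r.m.s. displacement: {rms_displacement(atom.b_factor):.2f} Å")
```

Slicing by column rather than splitting on whitespace is deliberate: neighbouring fields in a PDB file may touch without a separating space, for example with large coordinates.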
Model Refinement and Model Building

Creating a model from X-ray data is an iterative process consisting of model building and refinement.

Refinement means global improvement of the model with respect to the experimental data. The coordinates of all atoms, together with their temperature factors (and sometimes, at very high resolution, even the occupancies), are adjusted in order to minimise the difference between the measured intensities and the ones calculated from the model.

Model building means local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved in order to ensure that

1. the model makes sense biochemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)
2. the model fits the calculated electron density (e.g. check for multiple conformations)

Reliability of Data: The Data to Parameter Ratio

No measurement is exact; each is only an approximation to the true value. It is therefore important to have enough data to support the deduced model.

In protein crystallography we want to determine at least the coordinates of every atom of the structure. If more data are available, we add the isotropic B-value, and at best we can even determine an anisotropic B-value.

Our data are the unique reflections, the number of which is determined by the resolution, the space group, and the unit cell dimensions.
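How resolution, space group, and unit cell together fix the number of unique reflections can be estimated with a short calculation. The sketch below is only an approximation under stated assumptions: it counts the reciprocal lattice points inside a sphere of radius 1/d and divides by the order of the Laue group (8 for the orthorhombic group mmm, which applies to P 21 21 21 when Friedel pairs are merged). It ignores systematic absences and edge effects. The function name is hypothetical, and the cell is the one from the CRYST1 line of 1CLI.

```python
import math


def estimate_unique_reflections(cell_volume: float, d_min: float, laue_order: int) -> float:
    """Approximate number of unique reflections up to resolution d_min (Å).

    Each reciprocal lattice point occupies a reciprocal volume of 1/V,
    so a sphere of radius 1/d_min holds about (4/3) * pi * V / d_min^3 points.
    Symmetry-equivalent points are merged by dividing by the Laue group order.
    """
    points_in_sphere = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    return points_in_sphere / laue_order


# Orthorhombic cell of 1CLI: the volume is simply a * b * c.
a, b, c = 71.170, 211.680, 94.450
volume = a * b * c

n_25 = estimate_unique_reflections(volume, d_min=2.5, laue_order=8)
n_125 = estimate_unique_reflections(volume, d_min=1.25, laue_order=8)
print(f"unit cell volume      : {volume:,.0f} Å^3")
print(f"unique refl. at 2.5 Å : {n_25:,.0f}")
print(f"unique refl. at 1.25 Å: {n_125:,.0f}")
```

Because the count scales with 1/d³, halving the resolution limit gives about eight times as many reflections. This steep dependence is why the data to parameter ratio improves so quickly at high resolution.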
    Res. [Å]   Parameters                              Data/parameters
    3.0        x, y, z                                 0.9:1
    2.3        x, y, z; B                              1.5:1
    1.8        x, y, z; B                              3.1:1
    1.5        x, y, z; B                              5.4:1
    1.5        x, y, z; U11 U12 U13 U23 U22 U33        2.4:1
    1.1        x, y, z; U11 U12 U13 U23 U22 U33        6.1:1
    0.8        x, y, z; U11 U12 U13 U23 U22 U33        16:1
    (Source: G. Sheldrick)

At resolutions of about 1.8 Å and worse, these ratios would be much too low to allow a proper model to be built. The effective number of data is increased by incorporating additional information, for example (bio)chemical knowledge.

Data Fitting: Least Squares

Parameters are often fitted to a predicted or presumed curve by a least-squares fit, which minimises the sum of the squared distances of all data points from the predicted curve.

In the example, a line (the parameters are slope and y-intercept) is to be fitted to the data points. The least-squares fit yields the line with the smallest total squared distance to the data points (red dashed line). With only a few data points, the confidence that this is the correct line is not very strong.

More data do not necessarily give a different line, but they reduce the error of the prediction, so that we can trust the result much more. That is why the data to parameter ratio is an important figure for judging the quality of a model. Refinement and building strategies differ depending on that ratio.

Data Fitting: Maximum Likelihood

A more modern approach than least squares is the maximum likelihood method. It applies statistical assumptions and makes it possible to include more data and information, e.g. experimental phases. For macromolecules, maximum likelihood is more stable and leads to better results overall, often with reduced model bias.

Maximum likelihood incorporates the errors of the data and prevents a model from being built with higher accuracy than the data would permit.
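The straight-line example from the least-squares slide can be sketched in a few lines of Python. The data here are made up (the line y = 2x + 1 with alternating errors of ±0.5), and the function names are hypothetical. The sketch shows only the point made above: more data points do not change the line much, but they shrink the uncertainty of the fitted slope.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept.

    Returns (slope, intercept, slope_standard_error).
    At least three points are needed to estimate the error.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared vertical residuals: the quantity the fit minimises.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Two parameters were fitted, so n - 2 degrees of freedom remain.
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, slope_se


def synthetic_points(n):
    """Made-up data: y = 2x + 1 with alternating errors of +0.5 and -0.5."""
    xs = list(range(n))
    ys = [2.0 * x + 1.0 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
    return xs, ys


slope_few, intercept_few, se_few = fit_line(*synthetic_points(4))
slope_many, intercept_many, se_many = fit_line(*synthetic_points(20))
print(f" 4 points: slope = {slope_few:.3f} +/- {se_few:.3f}")
print(f"20 points: slope = {slope_many:.3f} +/- {se_many:.3f}")
```

Crystallographic refinement has many thousands of parameters rather than two, but the same reasoning about the ratio of data to parameters applies.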
Quality Figures: the R-value

Refinement programs aim to minimise the R-value, which describes the agreement between the measured amplitudes (F_obs(hkl)) and those calculated from the model (F_calc(hkl)).

\[ R = \frac{\sum_{hkl} \bigl|\, |F_{obs}| - |F_{calc}| \,\bigr|}{\sum_{hkl} |F_{obs}|} \]

The |F_obs| are represented by the reflection data (observations); the |F_calc| are calculated from the (x, y, z) coordinates and B-values of the atoms of the model.

For small molecules, R-values between 2% and 5% are normal; for macromolecules, the range is approximately 20%–30%.

As a rule of thumb one can expect an R-value of about 1/10 of the resolution: a 2.5 Å structure should have an R-value of about 25%.

Local Minima and Traps

Refinement can only find the nearest minimum of its target function.

[Figure: target function with several minima, labelled "best model", "good model", and "bad model"; starting points are marked by red crosses]

Depending on the starting point (red crosses), this might result in a good or a bad model.

Restraints and Constraints

The reflection data alone would not be sufficient to create a trustworthy model: there are too many parameters. It is therefore necessary to incorporate additional information. This is done by using restraints and constraints.
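Returning to the R-value defined above, a minimal sketch of the calculation in Python follows. The amplitudes are invented for illustration, the function name `r_value` is hypothetical, and the sketch assumes that the calculated amplitudes have already been put on the same scale as the observed ones.

```python
def r_value(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Invented amplitudes for four reflections (arbitrary units).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 70.0, 35.0]

r = r_value(f_obs, f_calc)
print(f"R = {100 * r:.1f}%")
```

The absolute value in the numerator matters: without it, positive and negative deviations would cancel and a poor model could appear to fit perfectly.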
Refinement and Overfitting

Since the amplitudes lack some information (their phase) and are not ideal (for protein structures, the errors are fairly large), the difference between observed and calculated amplitudes can be reduced almost arbitrarily by adding more and more atoms that were not really present in the crystal structure, or by allowing positions that do not make much sense chemically (stereochemical clashes). This is called overfitting of the data. It is therefore important to impose restraints and constraints.

One measure to reduce overfitting is the R_free value. About 5%–10% of the reflections are excluded from the minimisation of the R-value. They remain unconsidered and act like an "independent judge": after refinement, the R_free value is calculated like the R-value, but with the excluded reflections only. The two values must not differ too much.

Restraints and Constraints

Constraints are fixed conditions and cannot be changed (e.g. occupancy of atoms).

Restraints allow variation within certain limits around ideal values. These ideal values are derived from high-resolution structures, which showed that certain geometric properties of macromolecules do not vary a lot. Examples are

• bond lengths (e.g. C−C = 1.54 Å)
• planarity of aromatic rings (Phe, Tyr, ...)
• anti-bumping (unbonded atoms cannot get too close)

Most models of macromolecules can only be built because of this extra information. It improves the data to parameter ratio.

Model Building: Getting Started

The first steps in building the model consist of finding larger groups of residues with special features.

Directionality of α-Helices

From the main chain (Cα chain) alone one cannot determine the direction of the chain, nor which part of the sequence it covers.
One gets help from the so-called Christmas tree: the side chains of an α-helix point towards the N-terminal end of the protein chain.

Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence to the map. Disulphide bridges or metals bound to an active centre can also be helpful.

The special features to look for first are, in proteins, the (Cα) main chain and, in nucleic acids, the positions of the bases. α-helices are particularly easy to locate, even at medium to low resolution (2.5–4 Å).

β-Strands

The other secondary structure element of proteins, the β-strand, is also striking but more difficult to build. The direction of the peptide chain in particular can be difficult to find.

Automated Model Building

At resolutions better than about 2.5 Å, building is greatly facilitated by programs like Arp/Warp (A. Perrakis, V. Lamzin) or Resolve (T. Terwilliger), which automatically build large parts of the structure. These programs can even overcome local minima.

Refinement programs (either least-squares or maximum likelihood) cannot cross such a barrier: in the example shown on the slide, they would get stuck in the local minimum and could not move the phenylalanine into the right position.

Manual Model Building
Computer programs do not know about biology, and certainly not about a specific molecule or structure. Human interaction is therefore required to pay attention to:

• presence and identification of ligands and/or metal ions (from crystallisation or protein preparation)
• special interactions in complexes
• exceptions from the standard values used in refinement
• correct placement of solvent (water) molecules

What about Hydrogen Atoms?

X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron. They cannot be detected by X-ray diffraction (except with very high resolution data, about 1 Å). This is different for neutron diffraction, which makes that technique very valuable for studies of enzymes and their active centres.

During refinement, hydrogens are treated as riding atoms, that is, in a fixed position relative to the groups they belong to (like the carbons of a phenylalanine ring). Compared with ignoring hydrogens completely, this treatment improves the quality of the model and also helps to keep the correct distances to neighbouring groups.

Because of the fixed position, riding atoms do not increase the number of parameters. Even this sort of information increases the data to parameter ratio and hence improves the quality of the model. This becomes especially important at medium or low resolution (2.5 Å and worse).

Empty Space? The Solvent Region

[Figure panels: arrangement of molecules in the unit cell; electron density map]

The Solvent Model

Protein crystals are not very tightly packed.
The space between the molecules is filled with solvent, 50–70% of the total volume on average. Because it is disordered, it contributes mostly to the low-resolution reflections (d > 6 Å).

The "holes" in both pictures of the solvent region (the arrangement of molecules in the unit cell and the electron density map) are not vacuum. They are filled with solvent, i.e. mostly water molecules. These are disordered but still contribute to the diffraction pattern at low resolution.

Possible ways to treat the solvent are:

1. Ignore the solvent. This results in a high R-value and is not liked by crystallographers and publishers.
2. Ignore data with d > 6 Å. This gives a better R-value but worse maps, which are difficult to interpret.
3. Consider the solvent region as a flat lake of electron density, i.e. with a low but constant average number of electrons.
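To illustrate which reflections option 2 would discard, the sketch below computes the resolution d of a reflection from its Miller indices and selects those with d > 6 Å. It assumes an orthogonal unit cell (all angles 90°), for which 1/d² = (h/a)² + (k/b)² + (l/c)². The cell is taken from the CRYST1 line of 1CLI. The list of indices is invented for illustration and the function name is hypothetical.

```python
import math


def d_spacing_orthogonal(hkl, cell):
    """Resolution d (Å) of reflection (h, k, l) for a cell with 90° angles.

    The reflection (0, 0, 0) has no finite d and must not be passed in.
    """
    h, k, l = hkl
    a, b, c = cell
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)


CELL = (71.170, 211.680, 94.450)  # a, b, c of 1CLI in Å
CUTOFF = 6.0                      # Å, the limit quoted above

# Invented Miller indices, for illustration only.
reflections = [(2, 0, 0), (0, 2, 0), (10, 5, 3), (20, 30, 10), (0, 0, 16)]

low_resolution = [hkl for hkl in reflections
                  if d_spacing_orthogonal(hkl, CELL) > CUTOFF]

for hkl in reflections:
    d = d_spacing_orthogonal(hkl, CELL)
    label = "solvent-sensitive" if d > CUTOFF else "kept under option 2"
    print(f"{hkl}: d = {d:6.2f} Å  ({label})")
```

Note the convention: large d means low resolution, so the reflections with d > 6 Å are the ones closest to the centre of the diffraction pattern.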