Protein Crystallography Overview From Map to Model

Protein Crystallography
Part III: Refinement and Model Building
Tim Grüne
Dept. of Structural Chemistry (Head of Dept.: Prof. G. Sheldrick)
University of Göttingen
November/December 2005
http://shelx.uni-ac.gwdg.de
[email protected]
Molecular Biology Course 2005
Overview
• From Map to Model
• The PDB File
• Model Refinement
• Restraints and Constraints
• Model Building
From Map to Model
• The experiment ultimately yields an electron density map.
• An initial electron density map (and often also a final one) looks quite messy and is difficult to interpret.
• The final coordinate model carries more useful information.
• The coordinate model is the target of model building and refinement.
Data Visualisation
[Figure: molecular representations: ball-and-stick, Cα trace (smooth), CPK (space filling), Cα trace (coloured by B-factor), ball-and-stick (coloured by B-factor), ribbons]
Storing Structural Data: the PDB File
The protein models that are stored, e.g., in the Protein Data Bank (PDB, http://www.pdb.org) do not represent the mere experimental data. From the experiment we get diffraction intensities and, after some work, the electron density ρ within the unit cell. The model is the best match (from the author's point of view) that explains the experimental data.
A typical PDB file contains a header with supplemental information (authors, compound, publication, etc.), the crystallographic space group, and the unit cell dimensions.
The main part of the file consists of ATOM entries, one per line. An ATOM entry contains the atom type, the atom name, the residue type it belongs to, the coordinates, the occupancy, and the B-factor.
HEADER    LIGASE                                  28-APR-99   1CLI
TITLE     X-RAY CRYSTAL STRUCTURE OF AMINOIMIDAZOLE RIBONUCLEOTIDE
TITLE    2 SYNTHETASE (PURM), FROM THE E. COLI PURINE BIOSYNTH
TITLE    3 PATHWAY, AT 2.5 A RESOLUTION
AUTHOR    C.LI,T.J.KAPPOCK,J.STUBBE,T.M.WEAVER,S.E.EALICK
. . .
CRYST1   71.170  211.680   94.450  90.00  90.00  90.00 P 21 21 21   16
. . .
ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N
ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C
ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C
. . .
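Because ATOM entries are fixed-width records, they can be read by slicing columns. The following is a minimal sketch, assuming the classic PDB column layout; the names `ATOM_FIELDS`, `NUMERIC` and `parse_atom_line` are illustrative and not part of any library.

```python
# Column slices (0-based, end-exclusive) of the fixed-width ATOM record.
# Assumption: classic PDB column layout; only the fields named on the slide.
ATOM_FIELDS = {
    "serial": (6, 11),
    "name": (12, 16),
    "res_name": (17, 20),
    "chain": (21, 22),
    "res_seq": (22, 26),
    "x": (30, 38),
    "y": (38, 46),
    "z": (46, 54),
    "occupancy": (54, 60),
    "b_factor": (60, 66),
    "element": (76, 78),
}
NUMERIC = {"serial": int, "res_seq": int, "x": float, "y": float,
           "z": float, "occupancy": float, "b_factor": float}


def parse_atom_line(line):
    """Split one ATOM line into a dictionary of typed fields."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    record = {}
    for key, (start, end) in ATOM_FIELDS.items():
        text = line[start:end].strip()
        record[key] = NUMERIC.get(key, str)(text)
    return record


# The three ATOM lines of entry 1CLI shown above.
lines = [
    "ATOM      1  N   THR A   5      15.163  80.897  61.279  1.00 20.99           N",
    "ATOM      2  CA  THR A   5      15.093  82.326  61.723  1.00 22.09           C",
    "ATOM      3  C   THR A   5      16.450  83.017  61.598  1.00 21.68           C",
]
atoms = [parse_atom_line(line) for line in lines]
for atom in atoms:
    print(atom["name"], atom["x"], atom["y"], atom["z"], atom["b_factor"])
```

Slicing by column rather than splitting on whitespace matters because neighbouring fields in real PDB files can touch each other without a separating blank.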
The Use of Occupancy: Multiple Conformation
Occupancy and B–factor of an Atom
Occupancy: A typical crystal consists of a large number (> 10^13) of unit cells, and the resulting model is therefore only an average over all these cells. Some atoms, especially those of large side chains (Arginine, Phenylalanine, . . . ), can be partially disordered; others can have several but fixed orientations. An occupancy lower than 1 indicates that an atom occupies this position in only a fraction of all unit cells.
B-factor: Even though data are most often collected at 100 K, atoms are not immobile but vibrate (thermal motion). The temperature factor, or B-factor, describes the vibration as a sphere within which the atom oscillates. At high resolution, the B-factor is replaced by a symmetric 3x3 matrix that describes anisotropic thermal motion in three dimensions.
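To give a feeling for the size of the vibration, an isotropic B-factor can be converted into a root-mean-square displacement. The sketch below assumes the standard relation B = 8π²⟨u²⟩, which is not stated on the slide; the function name is illustrative.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement in Å for an isotropic B-factor in Å^2.

    Assumes the standard relation B = 8 * pi^2 * <u^2>.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B-factor of the first atom (N of Thr 5) in the 1CLI excerpt.
rms = rms_displacement(20.99)
print(f"B = 20.99 A^2 corresponds to an rms displacement of {rms:.2f} A")

# A symmetric 3x3 matrix has only 6 independent elements, so an
# anisotropic atom needs 3 + 6 = 9 parameters instead of 3 + 1 = 4.
PARAMS_ISOTROPIC = 3 + 1
PARAMS_ANISOTROPIC = 3 + 6
```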
Model Refinement and Model Building
Creating a model from X-ray data is an iterative process consisting of model building and refinement.
Refinement means global improvement of the model with respect to the experimental data. The coordinates of all atoms, together with their temperature factors (and sometimes, at very high resolution, even the occupancies), are adjusted in order to minimise the difference between the measured intensities and the ones calculated from the model.
Model Building means local improvement of the model with respect to the experimental data. Atoms are added, removed, or moved in order to ensure that
1. the model makes sense biochemically (proximity of atoms, H-bonding, position of solvent molecules, etc.)
2. the model fits the calculated electron density (e.g. check for multiple conformations)
Reliability of Data: The Data to Parameter Ratio
No measurement is exact; each is only an approximation to the true value. It is therefore important to have enough data to support the deduced model.
In protein crystallography we want to determine at least the coordinates of every atom of the structure. If more data are available, we add the isotropic B-value, and at best we can even determine an anisotropic B-value. Our data are the unique reflections, the number of which is determined by the resolution, the space group, and the unit cell dimensions.
Res. [Å]   parameters                         data/parameters
3.0        x,y,z                              0.9:1
2.3        x,y,z; B                           1.5:1
1.8        x,y,z; B                           3.1:1
1.5        x,y,z; B                           5.4:1
1.5        x,y,z; U11 U12 U13 U23 U22 U33     2.4:1
1.1        x,y,z; U11 U12 U13 U23 U22 U33     6.1:1
0.8        x,y,z; U11 U12 U13 U23 U22 U33     16:1
(G. Sheldrick)
At resolutions of about 1.8Å and worse, these ratios would be much too low to allow a proper model to be built. The effective number of data is increased by incorporating additional information, e.g. (bio-)chemical knowledge.
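The rows of the table are related by simple arithmetic. The sketch below assumes the standard result that the number of unique reflections grows roughly with 1/d³ (d being the resolution), and counts 3, 4 or 9 parameters per atom; the names `PARAMS` and `scaled_ratio` are illustrative.

```python
# Parameters per atom for the three models in the table.
PARAMS = {"xyz": 3, "xyz+B": 4, "xyz+U": 9}


def scaled_ratio(ratio, d_old, d_new, model_old, model_new):
    """Estimate a data/parameter ratio from another row of the table.

    Assumption: the number of unique reflections scales with 1/d^3.
    """
    reflections_factor = (d_old / d_new) ** 3
    return ratio * reflections_factor * PARAMS[model_old] / PARAMS[model_new]


# 3.0 Å with x,y,z only  ->  2.3 Å with x,y,z and B
print(round(scaled_ratio(0.9, 3.0, 2.3, "xyz", "xyz+B"), 1))
# 1.5 Å isotropic  ->  1.5 Å anisotropic
print(round(scaled_ratio(5.4, 1.5, 1.5, "xyz+B", "xyz+U"), 1))
# 1.5 Å anisotropic  ->  1.1 Å anisotropic
print(round(scaled_ratio(2.4, 1.5, 1.1, "xyz+U", "xyz+U"), 1))
```

Under these assumptions the three calls reproduce the tabulated values 1.5, 2.4 and 6.1, which shows why going anisotropic at 1.5Å more than halves the ratio.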
Data Fitting: Least Squares
Parameters are often fitted to a predicted or presumed curve by a least-squares fit, which minimises the sum of the squared distances of all data points to the predicted curve.
[Figure: data points with a fitted line (red dashed)]
The line (parameters are slope and y-intercept) is to be fitted to the data points. The least-squares fit yields the line with the smallest sum of squared distances to the data points (red dashed line). With only a few data points, the confidence that this is the correct line is not very strong.
More data do not necessarily give a different line, but they reduce the error of the prediction, so that we can trust the result much more.
That is why the data to parameter ratio is an important figure to indicate the quality of a model. Refinement and building strategies differ depending on that ratio.
Data Fitting: Maximum Likelihood
A more modern approach than least squares is the maximum likelihood method. It applies statistical assumptions and allows more data and information to be included, e.g. experimental phases. For macromolecules, maximum likelihood is more stable and leads to overall better results, often with reduced model bias.
Maximum likelihood incorporates the errors of the data and prevents a model from being built with higher accuracy than the data permit.
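The effect of the number of data points on a least-squares line fit can be sketched in a few lines. This is a toy example on synthetic points, not the refinement of a structure: the true line y = 2x + 1 and the alternating "noise" of ±0.5 are assumptions chosen to keep the result reproducible.

```python
import math


def fit_line(xs, ys):
    """Least-squares line fit; returns slope, intercept, slope error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: the quantity the fit minimises.
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    slope_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_error


def synthetic_points(n):
    """Points on y = 2x + 1 with an alternating offset of +0.5 / -0.5."""
    xs = [float(i) for i in range(n)]
    ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5)
          for i, x in enumerate(xs)]
    return xs, ys


few = fit_line(*synthetic_points(4))
many = fit_line(*synthetic_points(40))
print("4 points:  slope %.3f +/- %.3f" % (few[0], few[2]))
print("40 points: slope %.3f +/- %.3f" % (many[0], many[2]))
```

With forty points the estimated error of the slope is far smaller than with four, which is the numerical counterpart of trusting the result more when the data to parameter ratio is high.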
Local Minima and Traps
Refinement can only find the next minimum of its target function.
[Figure: target function with several minima labelled "bad model", "good model" and "best model"; red crosses mark starting points]
Depending on the starting point (red crosses), this might result in a good or a bad model.
Quality Figures: the R-value
Refinement programs aim to minimise the R-value, which describes the agreement between the measured amplitudes F_obs(hkl) and those calculated from the model, F_calc(hkl):
R = \frac{\sum_{hkl} \bigl|\, |F_{obs}| - |F_{calc}| \,\bigr|}{\sum_{hkl} |F_{obs}|}
|F_obs| are represented by the reflection data (observations); |F_calc| are calculated from the (x,y,z) and B-values of the atoms of the model.
For small molecules, R-values between 2% and 5% are normal; for macromolecules, the range is approximately 20%–30%.
As a rule of thumb one can expect an R-value (as a fraction) of about 1/10 of the resolution in Å: a 2.5Å structure should have an R-value of about 25%.
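The R-value is straightforward to compute once observed and calculated amplitudes are available. A minimal sketch with four invented amplitudes (the numbers are illustrative, not measured data):

```python
def r_value(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over all reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Invented amplitudes for four reflections.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 60.0, 20.0, 30.0]

r = r_value(f_obs, f_calc)
print(f"R = {100.0 * r:.1f}%")
```

The absolute differences add up to 30 against a total observed amplitude of 200, giving R = 15%.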
Refinement and Overfitting
The reflection data alone would not be sufficient to create a trustworthy model. There are too many parameters. Therefore it is necessary to incorporate additional information. This is done by using restraints and constraints.
Since the amplitudes lack some information (their phase) and are not ideal (for protein structures, the errors are fairly large), the difference between observed and calculated amplitudes can be reduced almost arbitrarily by adding more and more atoms that are not really present in the crystal structure, or by allowing positions that make little chemical sense (stereochemical clashes). This is called overfitting of data. It is therefore important to impose restraints and constraints.
One measure against overfitting is the R_free value. About 5%–10% of the reflections are excluded from the minimisation of the R-value. They remain unconsidered and act like an "independent judge": after refinement, the R_free value is calculated like the R-value, but with the excluded reflections only. The two values must not differ too much.
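The bookkeeping behind R_free can be sketched as follows. This is a toy example: the amplitudes are invented, every 20th reflection is flagged as "free" for reproducibility (an assumption; a random selection is the usual choice), and the calculated amplitudes mimic an overfitted model that agrees much better with the working set than with the free set.

```python
def r_value(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


n_reflections = 40
# Flag 5% of the reflections (every 20th) as the free set.
free_set = {i for i in range(n_reflections) if i % 20 == 0}
work_set = [i for i in range(n_reflections) if i not in free_set]

# Invented observed amplitudes.
f_obs = [10.0 + i for i in range(n_reflections)]
# Hypothetical overfitted model: 2% error on the working reflections,
# 20% error on the reflections it has never "seen".
f_calc = [o * (1.20 if i in free_set else 1.02) for i, o in enumerate(f_obs)]

r_work = r_value([f_obs[i] for i in work_set], [f_calc[i] for i in work_set])
r_free = r_value([f_obs[i] for i in sorted(free_set)],
                 [f_calc[i] for i in sorted(free_set)])
print(f"R = {100 * r_work:.0f}%, R_free = {100 * r_free:.0f}%")
```

A gap as large as the one in this toy case would indicate that the model explains the reflections used in refinement far better than independent ones.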
Restraints and Constraints
Constraints are fixed conditions and cannot be changed (e.g. occupancy of atoms).
Restraints allow variation within certain limits around ideal values.
These ideal values are derived from high resolution structures, which showed that certain geometric properties of macromolecules do not vary a lot. Examples are
• bond lengths (e.g. C−C = 1.54Å)
• planarity of aromatic rings (Phe, Tyr, . . . )
• anti-bumping (unbonded atoms cannot get too close)
Most models of macromolecules can only be built because of this extra information. It improves the data to parameter ratio.
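One common way to express a restraint is as an extra penalty term that grows when the model deviates from the ideal value. The sketch below does this for the C−C bond length of 1.54Å mentioned above; the tolerance `sigma` of 0.02Å and the function name are assumptions for illustration, not values from the slides.

```python
import math


def bond_restraint(atom_a, atom_b, ideal=1.54, sigma=0.02):
    """Penalty for a bond length deviating from its ideal value.

    The term ((d - ideal) / sigma)^2 acts like one additional observation
    in the target function. sigma = 0.02 Å is an assumed tolerance.
    """
    distance = math.dist(atom_a, atom_b)
    return ((distance - ideal) / sigma) ** 2


# Two carbon atoms at the ideal distance: no penalty.
ideal_penalty = bond_restraint((0.0, 0.0, 0.0), (1.54, 0.0, 0.0))
# The same bond stretched to 1.60 Å: three sigma away, penalty of about 9.
stretched_penalty = bond_restraint((0.0, 0.0, 0.0), (1.60, 0.0, 0.0))
print(ideal_penalty, stretched_penalty)
```

A constraint, in contrast, would not appear as a penalty at all: the constrained quantity is simply removed from the list of refined parameters.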
Model Building: Getting Started
The first steps in building the model consist of finding larger groups of residues with special features. In proteins this is the (Cα) main chain, in nucleic acids the position of the bases. α-helices are particularly easy to locate, even at medium to low resolution (2.5–4Å).
Directionality of α-Helices
From the main chain (Cα chain) alone one cannot determine the direction, nor which part of the sequence it covers. One gets help from the so-called Christmas tree: the side chains of an α-helix point towards the N-terminal end of the protein chain.
Selenomethionine-substituted proteins have become very popular for MAD experiments. The heavy selenium atoms are easy to find in the electron density map and help to dock the sequence to the map. Disulphide bridges or metals bound to an active centre can also be helpful.
β-strands
The other secondary structure element of proteins, β-strands, are also striking but more difficult to build. Especially the direction of the peptide chain can be difficult to find.
Automated Model Building
At resolutions better than, say, 2.5Å, building is greatly facilitated by programs like Arp/Warp (A. Perrakis, V. Lamzin) or Resolve (T. Terwilliger), which automatically build large parts of the structure. These programs can even overcome local minima.
Refinement programs (either least-squares or maximum likelihood) cannot cross the barrier between a local minimum and the correct solution: for example, they would get stuck in the local minimum and could not move a misplaced phenylalanine side chain into the right position.
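Why a refinement program gets stuck can be illustrated with a one-dimensional toy target function that has two minima. The function, the step size and the starting points below are assumptions chosen for illustration; real refinement targets have thousands of parameters.

```python
def target(x):
    """Toy target function with a deeper minimum near x = -1.04
    and a shallower one near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy target function."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def refine(x, step=0.01, n_steps=2000):
    """Plain gradient descent: always walks downhill from the start."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


from_left = refine(-1.5)   # ends in the deeper minimum
from_right = refine(1.5)   # ends in the shallower, local minimum
print(f"start -1.5 -> x = {from_left:.3f}, target = {target(from_left):.3f}")
print(f"start +1.5 -> x = {from_right:.3f}, target = {target(from_right):.3f}")
```

Both runs converge, but only the first one reaches the better minimum; the second cannot climb over the barrier near x = 0, so the model has to be moved across it by manual or automated rebuilding.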
Manual Model Building
Computer programs do not know about biology, and certainly not about a specific molecule or structure. Human interaction is therefore required to pay attention to:
• presence and identification of ligands and/or metal ions (from crystallisation or protein preparation)
• special interactions in complexes
• exceptions from standard values used in refinement
• correct placement of solvent (water) molecules
What about Hydrogen Atoms?
X-rays interact with the electron shell of atoms. The strength of the interaction is proportional to the total number of electrons. Hydrogen atoms have only one electron. They cannot be detected by X-ray diffraction (except with very high resolution data, about 1Å). This is different for neutron diffraction, which makes this technique very valuable for studies of enzymes and their active centres.
During refinement, hydrogens are treated as riding atoms, that is, in a fixed position relative to the groups they belong to (e.g. the hydrogens on the carbons of a phenylalanine ring).
Compared with ignoring hydrogens completely, this method improves the quality of the model and also helps to keep the correct distances to neighbouring groups. Because of the fixed position, riding atoms do not increase the number of parameters.
Even this sort of information increases the data to parameter ratio and hence improves the quality of the model. This becomes especially important at medium or low resolution (2.5Å and worse).
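The idea of a riding atom can be sketched for a hydrogen on an aromatic ring carbon: its position is computed entirely from the carbon and its two ring neighbours, so it adds no parameters. The C−H distance of 0.93Å and the ring geometry below are assumptions for illustration, and the function names are hypothetical.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(c * c for c in vector))
    return tuple(c / length for c in vector)


def riding_hydrogen(carbon, neighbour_1, neighbour_2, bond_length=0.93):
    """Place H on a planar carbon, pointing away from both neighbours.

    bond_length = 0.93 Å is an assumed C-H distance.
    """
    away_1 = unit(tuple(c - n for c, n in zip(carbon, neighbour_1)))
    away_2 = unit(tuple(c - n for c, n in zip(carbon, neighbour_2)))
    direction = unit(tuple(a + b for a, b in zip(away_1, away_2)))
    return tuple(c + bond_length * d for c, d in zip(carbon, direction))


# Ring carbon at the origin with neighbours 1.39 Å away at +/-120 degrees.
ring_bond = 1.39
n1 = (ring_bond * math.cos(math.radians(120.0)),
      ring_bond * math.sin(math.radians(120.0)), 0.0)
n2 = (ring_bond * math.cos(math.radians(240.0)),
      ring_bond * math.sin(math.radians(240.0)), 0.0)

hydrogen = riding_hydrogen((0.0, 0.0, 0.0), n1, n2)
print(hydrogen)
```

Whenever refinement moves the carbon or its neighbours, the hydrogen is simply recomputed, which is why it "rides" on its parent atom.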
Empty Space? The Solvent Region
[Figure: two panels, "Arrangement of molecules in the unit cell" and "Electron density map"]
The "holes" in both pictures are not vacuum. They are filled with solvent, i.e. mostly water molecules. These are disordered but still contribute to the diffraction pattern at low resolution.
The Solvent Model
Protein crystals are not very tightly packed. The space between the molecules is filled with solvent, on average 50–70% of the total volume. Because the solvent is disordered, it contributes mostly to low-resolution reflections (d > 6Å).
Possible ways to treat the solvent are:
1. ignore the solvent: this results in a high R-value, which is liked neither by crystallographers nor by publishers.
2. ignore data with d > 6Å: this gives a better R-value but worse maps, which are difficult to interpret.
3. consider the solvent region as a flat lake of electron density, i.e. with a low but constant average number of electrons.