Atomic structures refinement in cryoEM maps

Atomic structures refinement in
cryo-EM maps
Martyn Winn
STFC, Harwell, UK
Based on slides from Garib Murshudov, MRC-LMB, Cambridge, UK
Contents
About REFMAC
Structural restraints
Fit into EM maps
Effects of over-sharpening
A. Brown, F. Long, R. A. Nicholls, J. Toots, P. Emsley and G. Murshudov (2015)
Tools for macromolecular model building and refinement into electron cryomicroscopy reconstructions. Acta Cryst. D71, 136-153
About REFMAC
Refmac is a program for refinement of atomic models into experimental data
It was designed originally for Macromolecular Crystallography - now also cryoEM
It is based on some elements of Bayesian
statistics: it tries to fit chemically and
structurally consistent atomic models in
X-ray crystallographic data
Note: Refmac does not build or dock atomic models (cf. Coot, Buccaneer,
Chimera, FlexEM, etc.) - rather it optimises the current atomic model.
Purpose of refinement
Crystallographic refinement has two purposes:
1) Fit chemically and structurally sensible atomic model into observed
data. In reciprocal space, optimise agreement of |Fobs| to |Fcalc|.
2) To calculate best possible (electron density) map so that atom model
can criticised. In MX, phases are calculated not observed.
In cryoEM:
1) Still true. Maintaining correct chemistry especially important at lower
resolutions.
2) Map is considered fixed (at least at the moment).
+
Data
=
Atomic model
Fit and refine
Optimise atomic coordinates to EM density + other parameters (B
factors, occupancies, scale factors). Can give significant
improvement.
Impose and maintain good geometry. Use prior knowledge about
macromolecular chemistry. Very important at low resolution.
Atomic structure refinement
Data: EM reconstruction, X-ray intensities, approx. X-ray phases
(MAD/SAD/SIRAS/MIR), known geometries
Model: atomic coordinates, occupancies, ADPs (B factors), rigid bodies,
TLS, bulk solvent, scale factors, twin laws
Target function: f = w.fexpt + fgeom
fexpt = - log [P(Fo;Fc)]
fgeom =  w (bobs - bideal)2 + ...
Reciprocal space refinement:
F.T.
Fo
|Fo|
Weights
Target function: f = w.fexpt + fgeom
fexpt = - log [P(Fo;Fc)]
fgeom =  w (bobs - bideal)2 + ...
Weight between fexpt + fgeom
• High resolution  refinement guided by EM density
• Low resolution  rely more on known geometry
Weight is automatically determined, but can be manually adjusted
Weights between different restraint types
Likelihood function for fit into
cryo-EM map
Probability distribution of “observed” structure factors given coordinates of a
molecule:
P(Fo ; Fc ) = Ne
|Fo -Fc |2
S+2 s 2
Fo “observed” structure factors calculated from EM map
Fc calculated structure factors – from coordinates

variance of signal
2 variance of noise
N normalisation
Fc calculated from current model using electron scattering form factors (describe
interaction with electrostatic potential)
What do we know about macromolecules?
(a.k.a. prior knowledge / structural restraints)
1) Macromolecules consist of atoms that are bonded to each other in a
specific way (dictionary files for residues and ligands)
2) If there are two molecules with sufficiently high sequence identity then
it is likely that they will be similar to each other in 3D (external
restraints)
3) It is highly likely that if there are two copies of a the same molecule
they will be similar to each other (at least locally) (local NCS restraints)
4) Oscillation of atoms close to each other in 3D cannot be dramatically
different (B factor restraints)
5) Proteins tend to form secondary structures (secondary structure
restraints)
6) DNA/RNA tend to form base-pairs, stacked bases tend to be parallel
(base pair restraints)
External (reference) structure restraints
When refining against low resolution data, restraints to a (higher
resolution) external structure and/or secondary structure restraints
can be useful.
Restraints generated by the program ProSmart (Rob Nicholls):
1) Aligns structure in the presence of conformational changes.
Sequence is not used.
2) Generates restraints for aligned atoms
3) Identifies secondary structures (at the moment helix and strand,
but the approach is general and can be extended to any motif).
4) Generates restraints for secondary structures
Robust estimator functions are used for restraints,
i.e. if differences between target and model is very
large then their contributions are down-weighted
Geman–McClure function
Restraints: external restraints
All restraints can be visualised and applied in Coot:
Yellow=target, purple=reference
Restraints: secondary structure
•
•
•
In addition to reference restraints or when no high-resolution structures exist
helical fragment restraints
and secondary structure h-bond restraints (which include: helix restraints; sheet
restraints; and loop restraints).
Visualisation of helix
restraints in Coot
v0.8.3 Extras → Read
ProSMART Restraints ...
Can also edit and save
list of restraints
Restraints to current distances
(jelly-body)
To stabilise refinement at low resolution, the following term can be added to
the target function:
å w(| d | - | d
current
|) 2
pairs
Summation is over all pairs in the same chain and within given distance (default
4.2A). dcurrent is recalculated at every cycle. This function does not contribute to
gradients. It only contributes to the second derivative matrix (direction of
optimisation).
It is equivalent to adding springs between atom pairs. During refinement interatomic distances are not changed very much. If all pairs would be used and
weights would be very large then it would be equivalent to rigid body
refinement.
Example: 10Å
After positioning the molecule with Molrep (grey model) and 150 cycle of jelly
body refinement (black model)
Basepair and parallel plane restraints
A-U b.p.
• Base-pair, parallel planes and
sugar pucker restraints for
nucleic acids
• Parallel planes also for protein
- nucleic acid pairs
Green – before refinement, Blue – refined without LIBG, Yellow – refined with LIBG
Refinement protocol
• REFMAC can refine models
against EM maps
• Input can be multiple or
composite maps
• External restraints can be
applied for different regions to
account for variation in local
resolution
• Electron scattering factors are
used
• Symmetry restraints can be
applied
Restraints: reference restraints
•
•
Use information from structures at high resolution to restrain refinement at lower
resolution
Reduces the chance of overfitting
Target
+
Reference
(restraints from)
=
Refined
Restraints can be tuned to local resolution
Tighter restraints
may be required
Masking improves local resolution
Masks
Composite map refinement
1
2
Interface from full map
Monitoring fit to density: FSCaverage
•
•
•
•
•
Measure of fit to density (analogous to crystallographic Rfactor)
Used to follow the progress of refinement
FSCaverage avoids a dependence on weight
FSC is calculated over resolution shells
If shells are sufficiently narrow the weights are roughly the same within each shell
Refinement of deposited structures
•
•
•
Re-refined structures deposited in
the EMDB at a resolution of 4 Å or
higher (n = 14)
Significant improvements in
MolProbity clashscore
And fit-to-density (as measured by
FSCaverage)
Overfitting
What leads to overfitting?
1.
2.
3.
4.
5.
Insufficient data (low resolution, partial occupancy)
Ignoring data (cutting by resolution)
Sub-optimal parameterisation
Bad weights
Excess of imagination
Validation of overfitting
Validation of overfitting
Not overfitted
FSCwork = model refined against half map 1, compared to half map 1
FSCfree = model refined against half map 1, compared to half map 2
Overfitted
Effect of oversharpening
A report in Science:
Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne J, Subramaniam S
“2.2 A resolution cryo-EM structure of beta-galactosidase in complex with a
cell-permeant inhibitor” SCIENCE (2015)
Claim 2.2A maps. Deposited map does not look like to be at 2.2A.
It is the case of over-sharpening.
Map quality is better that deposited map.
1500
<|F|> vs resolution
1000
Map from EMDB
0
500
<|Fo]|>
Blurred with B value 40 and 60
0.0
0.2
0.4
Resolution 1/d
0.6
0.8
Map from PDB
Blurred: B = 60
Model refined against blurred map
Some recent developments
Difference maps
Refmac can do vector difference maps. It is very simple and seems to wok:
Fdiff = Fo – Fc
Where Fo is from the map and Fc is the from the atomic model scaled to fit to Fo.
Both Fo and Fc are complex numbers (two dimensional vectors)
To invoke this the following keyword needs to be used:
mapcalculate vector
Reference map sharpening
Cross validation requires maps to be on the same sharpening level. Refmac can
sharpen given map to a reference map. Usually reference map is taken as full
reconstruction map.
Fourier transformation (structure factor) from reference map is calculated and
then average values of modules of structure factors in resolution shells are
calculated. For given map average values of modules of Fourier coefficients in
resolution bins scaled to reference maps coefficients
Summary
Problem
Solution
Refinement programs used in X-ray
crystallography were not suitable for EM
maps
Modified REFMAC to handle EM maps
taking into consideration electron
scattering factors and multiple maps
generated by masking
RNA geometry is handled poorly by
refinement programs
LIBG generates external restraints to
improve base pairing, base planarity and
pucker geometry
No Rfactor equivalent to monitor fit to
density
FSCaverage as a global measure of fit to
density
No Rfree for validating overfitting
An approach to overfitting using half maps
generated during map refinement