Atomic structures refinement in cryo-EM maps Martyn Winn STFC, Harwell, UK Based on slides from Garib Murshudov, MRC-LMB, Cambridge, UK Contents About REFMAC Structural restraints Fit into EM maps Effects of over-sharpening A. Brown, F. Long, R. A. Nicholls, J. Toots, P. Emsley and G. Murshudov (2015) Tools for macromolecular model building and refinement into electron cryomicroscopy reconstructions. Acta Cryst. D71, 136-153 About REFMAC Refmac is a program for refinement of atomic models into experimental data It was designed originally for Macromolecular Crystallography - now also cryoEM It is based on some elements of Bayesian statistics: it tries to fit chemically and structurally consistent atomic models in X-ray crystallographic data Note: Refmac does not build or dock atomic models (cf. Coot, Buccaneer, Chimera, FlexEM, etc.) - rather it optimises the current atomic model. Purpose of refinement Crystallographic refinement has two purposes: 1) Fit chemically and structurally sensible atomic model into observed data. In reciprocal space, optimise agreement of |Fobs| to |Fcalc|. 2) To calculate best possible (electron density) map so that atom model can criticised. In MX, phases are calculated not observed. In cryoEM: 1) Still true. Maintaining correct chemistry especially important at lower resolutions. 2) Map is considered fixed (at least at the moment). + Data = Atomic model Fit and refine Optimise atomic coordinates to EM density + other parameters (B factors, occupancies, scale factors). Can give significant improvement. Impose and maintain good geometry. Use prior knowledge about macromolecular chemistry. Very important at low resolution. Atomic structure refinement Data: EM reconstruction, X-ray intensities, approx. X-ray phases (MAD/SAD/SIRAS/MIR), known geometries Model: atomic coordinates, occupancies, ADPs (B factors), rigid bodies, TLS, bulk solvent, scale factors, twin laws Target function: f = w.fexpt + fgeom fexpt = - log [P(Fo;Fc)] fgeom = w (bobs - bideal)2 + ... Reciprocal space refinement: F.T. Fo |Fo| Weights Target function: f = w.fexpt + fgeom fexpt = - log [P(Fo;Fc)] fgeom = w (bobs - bideal)2 + ... Weight between fexpt + fgeom • High resolution refinement guided by EM density • Low resolution rely more on known geometry Weight is automatically determined, but can be manually adjusted Weights between different restraint types Likelihood function for fit into cryo-EM map Probability distribution of “observed” structure factors given coordinates of a molecule: P(Fo ; Fc ) = Ne |Fo -Fc |2 S+2 s 2 Fo “observed” structure factors calculated from EM map Fc calculated structure factors – from coordinates variance of signal 2 variance of noise N normalisation Fc calculated from current model using electron scattering form factors (describe interaction with electrostatic potential) What do we know about macromolecules? (a.k.a. prior knowledge / structural restraints) 1) Macromolecules consist of atoms that are bonded to each other in a specific way (dictionary files for residues and ligands) 2) If there are two molecules with sufficiently high sequence identity then it is likely that they will be similar to each other in 3D (external restraints) 3) It is highly likely that if there are two copies of a the same molecule they will be similar to each other (at least locally) (local NCS restraints) 4) Oscillation of atoms close to each other in 3D cannot be dramatically different (B factor restraints) 5) Proteins tend to form secondary structures (secondary structure restraints) 6) DNA/RNA tend to form base-pairs, stacked bases tend to be parallel (base pair restraints) External (reference) structure restraints When refining against low resolution data, restraints to a (higher resolution) external structure and/or secondary structure restraints can be useful. Restraints generated by the program ProSmart (Rob Nicholls): 1) Aligns structure in the presence of conformational changes. Sequence is not used. 2) Generates restraints for aligned atoms 3) Identifies secondary structures (at the moment helix and strand, but the approach is general and can be extended to any motif). 4) Generates restraints for secondary structures Robust estimator functions are used for restraints, i.e. if differences between target and model is very large then their contributions are down-weighted Geman–McClure function Restraints: external restraints All restraints can be visualised and applied in Coot: Yellow=target, purple=reference Restraints: secondary structure • • • In addition to reference restraints or when no high-resolution structures exist helical fragment restraints and secondary structure h-bond restraints (which include: helix restraints; sheet restraints; and loop restraints). Visualisation of helix restraints in Coot v0.8.3 Extras → Read ProSMART Restraints ... Can also edit and save list of restraints Restraints to current distances (jelly-body) To stabilise refinement at low resolution, the following term can be added to the target function: å w(| d | - | d current |) 2 pairs Summation is over all pairs in the same chain and within given distance (default 4.2A). dcurrent is recalculated at every cycle. This function does not contribute to gradients. It only contributes to the second derivative matrix (direction of optimisation). It is equivalent to adding springs between atom pairs. During refinement interatomic distances are not changed very much. If all pairs would be used and weights would be very large then it would be equivalent to rigid body refinement. Example: 10Å After positioning the molecule with Molrep (grey model) and 150 cycle of jelly body refinement (black model) Basepair and parallel plane restraints A-U b.p. • Base-pair, parallel planes and sugar pucker restraints for nucleic acids • Parallel planes also for protein - nucleic acid pairs Green – before refinement, Blue – refined without LIBG, Yellow – refined with LIBG Refinement protocol • REFMAC can refine models against EM maps • Input can be multiple or composite maps • External restraints can be applied for different regions to account for variation in local resolution • Electron scattering factors are used • Symmetry restraints can be applied Restraints: reference restraints • • Use information from structures at high resolution to restrain refinement at lower resolution Reduces the chance of overfitting Target + Reference (restraints from) = Refined Restraints can be tuned to local resolution Tighter restraints may be required Masking improves local resolution Masks Composite map refinement 1 2 Interface from full map Monitoring fit to density: FSCaverage • • • • • Measure of fit to density (analogous to crystallographic Rfactor) Used to follow the progress of refinement FSCaverage avoids a dependence on weight FSC is calculated over resolution shells If shells are sufficiently narrow the weights are roughly the same within each shell Refinement of deposited structures • • • Re-refined structures deposited in the EMDB at a resolution of 4 Å or higher (n = 14) Significant improvements in MolProbity clashscore And fit-to-density (as measured by FSCaverage) Overfitting What leads to overfitting? 1. 2. 3. 4. 5. Insufficient data (low resolution, partial occupancy) Ignoring data (cutting by resolution) Sub-optimal parameterisation Bad weights Excess of imagination Validation of overfitting Validation of overfitting Not overfitted FSCwork = model refined against half map 1, compared to half map 1 FSCfree = model refined against half map 1, compared to half map 2 Overfitted Effect of oversharpening A report in Science: Bartesaghi A, Merk A, Banerjee S, Matthies D, Wu X, Milne J, Subramaniam S “2.2 A resolution cryo-EM structure of beta-galactosidase in complex with a cell-permeant inhibitor” SCIENCE (2015) Claim 2.2A maps. Deposited map does not look like to be at 2.2A. It is the case of over-sharpening. Map quality is better that deposited map. 1500 <|F|> vs resolution 1000 Map from EMDB 0 500 <|Fo]|> Blurred with B value 40 and 60 0.0 0.2 0.4 Resolution 1/d 0.6 0.8 Map from PDB Blurred: B = 60 Model refined against blurred map Some recent developments Difference maps Refmac can do vector difference maps. It is very simple and seems to wok: Fdiff = Fo – Fc Where Fo is from the map and Fc is the from the atomic model scaled to fit to Fo. Both Fo and Fc are complex numbers (two dimensional vectors) To invoke this the following keyword needs to be used: mapcalculate vector Reference map sharpening Cross validation requires maps to be on the same sharpening level. Refmac can sharpen given map to a reference map. Usually reference map is taken as full reconstruction map. Fourier transformation (structure factor) from reference map is calculated and then average values of modules of structure factors in resolution shells are calculated. For given map average values of modules of Fourier coefficients in resolution bins scaled to reference maps coefficients Summary Problem Solution Refinement programs used in X-ray crystallography were not suitable for EM maps Modified REFMAC to handle EM maps taking into consideration electron scattering factors and multiple maps generated by masking RNA geometry is handled poorly by refinement programs LIBG generates external restraints to improve base pairing, base planarity and pucker geometry No Rfactor equivalent to monitor fit to density FSCaverage as a global measure of fit to density No Rfree for validating overfitting An approach to overfitting using half maps generated during map refinement
© Copyright 2026 Paperzz