X-ray Scattering (SAXS) combined with
crystallography & computation: Defining
accurate macromolecular structures,
conformations, & assemblies in solution
John A. Tainer
Advanced Light Source at Lawrence
Berkeley National Laboratory
www.bl1231.als.lbl.gov
www.bioisis.net
Biology is written in the
language of molecular shapes
that encode connections &
chemistry
Q R Biophysics 40:191-285
Nature Methods 6: 606-612.
ALS SIBYLS beamline, 12.3.1
SAXS data changes our understanding of
molecular machines - new Mre11–Rad50 structure
bound to ATPγS exists in solution
[Figure: SAXS data of P. furiosus Mre11–Rad50–ATP, log intensity vs. scattering angle (Å⁻¹),
with the fit of the M. jannaschii Mre11–Rad50–ATPγS crystal structure (Chi2 = 1.199);
Rad50 dimer and Mre11 dimer are labeled]
SAX/NS Data Collection
Collection:
Several images at varying particle concentrations
SAX/NS images are reduced to 3-column text files
The set of curves is used to verify concentration independence
The set is reduced to a single curve for modeling
Output is an atomistic, volumetric, or ensemble model
[Figure panels: atomistic, volumetric/bead, ensemble]
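As a minimal sketch of reading one reduced curve, the function below parses a 3-column text file. The slides only say "3 column text files"; the column order (q, intensity, error) and the "#" header line are assumptions made for illustration, and `parse_three_column` is a hypothetical name.

```python
def parse_three_column(text):
    """Parse whitespace-separated rows into q, intensity and sigma lists.

    Assumption: columns are q, I(q), sigma(q). Rows that do not start with
    three numbers (headers, comments, blank lines) are skipped.
    """
    q, intensity, sigma = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            row = [float(value) for value in fields[:3]]
        except ValueError:
            continue  # non-numeric row such as a header
        q.append(row[0])
        intensity.append(row[1])
        sigma.append(row[2])
    return q, intensity, sigma


# Synthetic example content, not real data.
example = """# q I sigma
0.010 105.2 1.1
0.012 104.8 1.0
"""
q, intensity, sigma = parse_three_column(example)
```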
Verifying Concentration Independence
Modeling SAXS is prone to over-fitting
Chi-square routinely < 1
Low-q region: lowest noise, but can be biased by:
•interparticle interference, S(q)
•beamstop issues
High-q region: highest noise
Important to have access to the
concentration series used to generate the
reduced/merged SAXS curve
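One way to screen a concentration series, sketched under assumptions: divide each curve by its concentration and compare the low-q points against the lowest-concentration curve. The function name, the shared q grid, and the choice of "first n points" as the low-q region are all illustrative assumptions, not the procedure used by the authors.

```python
def concentration_deviation(curves, n_low_q=10):
    """Mean relative deviation of I(q)/c from the lowest-concentration curve.

    curves: dict mapping concentration -> list of intensities, all measured
    on the same q grid (assumption). Only the first n_low_q points are used,
    since interparticle interference shows up at low q.
    """
    reference_conc = min(curves)
    reference = [v / reference_conc for v in curves[reference_conc][:n_low_q]]
    deviations = {}
    for conc, intensities in curves.items():
        normalized = [v / conc for v in intensities[:n_low_q]]
        deviations[conc] = sum(
            abs(a - b) / b for a, b in zip(normalized, reference)
        ) / len(reference)
    return deviations


# Synthetic series: the 4.0 curve is suppressed at low q, as interference would do.
series = {1.0: [10.0, 9.0, 8.0], 2.0: [20.0, 18.0, 16.0], 4.0: [36.0, 34.0, 31.0]}
deviation = concentration_deviation(series)
```

A deviation near zero at every concentration is consistent with concentration independence; a deviation that grows with concentration suggests the merged curve should lean on the more dilute samples.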
How to “structure” the SAXS/SANS data for storage
Data types
1-Dimensional
Rg from Guinier & real space
dmax
Porod Volume
Molecular weight
Porod Exponent
2-Dimensional
SAX/NS data:
Merged SAX/NS curve used for modeling
Unscaled curves used to generate the merged curve
P(r) : used to estimate dmax
3-Dimensional
Models:
Volumetric/Bead
Atomistic
Hybrid
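The three data types above could be grouped in one record per experiment. The sketch below is only an illustration of that grouping; every class and field name is hypothetical and none of it reflects the actual BioIsis.net or PDB schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float, float]]  # (q, I, sigma) rows, an assumed layout


@dataclass
class ScalarParameters:
    """1-dimensional data: single numbers derived from a curve."""
    rg_guinier: Optional[float] = None
    rg_real_space: Optional[float] = None
    dmax: Optional[float] = None
    porod_volume: Optional[float] = None
    molecular_weight: Optional[float] = None
    porod_exponent: Optional[float] = None


@dataclass
class Model:
    """3-dimensional data: one deposited model file."""
    kind: str       # "volumetric/bead", "atomistic" or "hybrid"
    filename: str


@dataclass
class SasEntry:
    parameters: ScalarParameters
    merged_curve: Curve                                           # 2-D: used for modeling
    unscaled_curves: List[Curve] = field(default_factory=list)    # concentration series
    pair_distribution: List[Tuple[float, float]] = field(default_factory=list)  # P(r)
    models: List[Model] = field(default_factory=list)


entry = SasEntry(
    parameters=ScalarParameters(rg_guinier=20.1, dmax=43.0),  # illustrative numbers
    merged_curve=[(0.01, 105.2, 1.1)],
    models=[Model(kind="atomistic", filename="model.pdb")],
)
```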
We created BioIsis.net, a web-accessible
database that stores SAXS data and
models.
SAXS can be performed at different
conditions:
- pH?
- salt concentration?
- temperature?
Conformations are condition specific.
(same as NMR studies)
Database must allow for many models
or experiments per ORF(s).
…an online database for macromolecular SAXS
Relational database using MySQL
Written using the Ruby on Rails MVC framework
• raw scattering data
• transformed data
• experimental conditions
• SAXS derived models
• linked to related experiments (e.g. different conditions)
P(r) Plot
Guinier Plot
Kratky Plot
The “Details” page is specific to a SAXS
experiment.
The page shows the raw data used and
the refined SAXS parameters, Rg, I(0),
dmax, etc.
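For reference, Rg and I(0) are commonly obtained from the Guinier approximation, ln I(q) = ln I(0) - (Rg²/3)·q², by a straight-line fit of ln I against q². The sketch below is an unweighted least-squares version; it assumes the caller has already restricted the data to the Guinier region (conventionally q·Rg below about 1.3), and `guinier_fit` is a hypothetical helper, not the routine used by the database.

```python
import math


def guinier_fit(q, intensity):
    """Return (Rg, I0) from an unweighted linear fit of ln I vs q^2."""
    x = [value * value for value in q]
    y = [math.log(value) for value in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic Guinier-region data generated with Rg = 20 and I(0) = 100.
q_low = [0.005 * k for k in range(1, 13)]
i_low = [100.0 * math.exp(-(v * 20.0) ** 2 / 3.0) for v in q_low]
rg, i_zero = guinier_fit(q_low, i_low)
```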
SAXS experiments are condition specific
(pH, salt, temperature…). Therefore,
several entries may exist demonstrating
the solution structure variability of the
macromolecule.
unfolded
folded
Same macromolecule
Different conditions
All the data are downloadable.
What and Why?
What information should be stored in PDB?
1. Requisite structural models (coordinates) relevant to a publication
sequence-based structures
•atomistic model(s)
•ensemble set
•Associated experimental information:
•SAX/NS data (small byte footprint) – store all data
•Buffer conditions (each submission ⟹ explicit description of buffer composition)
•Location/instrument (synchrotron/home source)
Why?
1. Allow for independent analysis and validation
2. Promote technical development of SAX/NS
3. Information contained in the file should allow:
•reproduction of the deposited structural model
•detection of error (e.g. model bias)
Adding SAX/NS to the PDB
PDB: NMR and EM
1. PDB already promotes bifurcated deposits for:
•EM
•NMR
The PDB already accepts ensemble deposits for NMR
from http://deposit.rcsb.org/depoinfo/print_nmr.html
SAX/NS deposits can be an inherited set of properties from NMR and EM
Accepts EM Model
Stores EM volumetric models ≣ experimental data
In EM, the final
volumetric blob
constitutes the data
and is used to build
an atomistic model
http://salilab.org/modeller/tutorial/cryoem/fit.html
In SAX/NS, our models are evaluated against the scattering profile
Not quite raw data,
at the mercy of the
algorithms used to
pick/classify and
generate the
volume element.
• Sequence Level Structures:
• excludes volumetric models like EM
• Includes all atom or C-alpha models
SAX/NS
Invariants
(structural parameters derived directly from SAXS)
Q, Porod Invariant
Directly related to mean square electron density of scattering particle.
Requires convergence in Kratky plot (q²⋅I(q) vs q).
Not always calculable from SAX/NS curve.
Vp, Porod Volume
lc, correlation length
Rg, radius-of-gyration
Requires a folded particle, otherwise Q won’t converge properly.
Q acts as a normalization constant and corrects for:
1.concentration
2.contrast, (Δρ)²
Does not require Q
Concentration independent
Contrast independent (as long as structure does not change)
Essentially normalized to I(0)
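The Q-dependent parameters can be sketched numerically with the standard textbook relations Q = ∫q²·I(q) dq, Vp = 2π²·I(0)/Q and lc = π·∫q·I(q) dq / Q. The code below integrates only over the supplied q range with the trapezoid rule; it does no extrapolation to q = 0 or to infinity, which real analysis software would need, and all function names are hypothetical.

```python
import math


def trapezoid(x, y):
    """Trapezoid-rule integral of y over x."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] + y[k]) / 2.0 for k in range(len(x) - 1))


def porod_invariant(q, intensity):
    """Q = integral of q^2 * I(q) dq over the supplied range only."""
    return trapezoid(q, [a * a * b for a, b in zip(q, intensity)])


def porod_volume(q, intensity, i_zero):
    """Vp = 2 * pi^2 * I(0) / Q."""
    return 2.0 * math.pi ** 2 * i_zero / porod_invariant(q, intensity)


def correlation_length(q, intensity):
    """lc = pi * integral of q * I(q) dq / Q."""
    area = trapezoid(q, [a * b for a, b in zip(q, intensity)])
    return math.pi * area / porod_invariant(q, intensity)


# Synthetic compact-particle curve; tripling it mimics a 3x concentration change.
q_grid = [0.001 * k for k in range(1, 301)]
curve = [100.0 * math.exp(-(v * 20.0) ** 2 / 3.0) for v in q_grid]
tripled = [3.0 * v for v in curve]
```

Because Q scales with the curve, both Vp and lc come out the same for `curve` and `tripled`, which is the normalization role of Q described above.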
Defining a new Invariant
Kratky plot (q²⋅I(q) vs q) and total intensity plot (q⋅I(q) vs q)
[Figure: the same flexible-unfolded and compact-folded data plotted both ways; x-axis q, Å⁻¹]
Q cannot be calculated from a flexible sample
Leaves Rg as the only structural parameter
However, in the total intensity plot (q⋅I(q) vs q) the data converge for both:
•compact - folded
•flexible - unfolded
suggesting a new normalization method
The Volume-of-Correlation
Vc = I(0) / ∫₀^∞ q⋅I(q) dq = Vp / (2π⋅lc)
Vc is independent of:
1. contrast
2. concentration
lc is the expected correlation length
Derivation steps:
1. substitute for I(q)
2. collect like terms
3. integrate by parts
4. substitute P(r) = 4π r² γ(r), where γ(r) is the correlation function
5. collect like terms
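A numerical sketch of the ratio, assuming the usual definition of the volume-of-correlation as I(0) divided by the integrated total intensity, ∫q·I(q) dq. The integral is taken over the supplied q range only (trapezoid rule, no extrapolation), and `volume_of_correlation` is a hypothetical helper name.

```python
import math


def trapezoid(x, y):
    """Trapezoid-rule integral of y over x."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] + y[k]) / 2.0 for k in range(len(x) - 1))


def volume_of_correlation(q, intensity, i_zero):
    """Vc = I(0) / integral of q * I(q) dq, over the supplied range only."""
    return i_zero / trapezoid(q, [a * b for a, b in zip(q, intensity)])


# Synthetic curve; scaling it by 5 mimics a change in concentration or contrast.
q_grid = [0.001 * k for k in range(1, 301)]
curve = [100.0 * math.exp(-(v * 20.0) ** 2 / 3.0) for v in q_grid]
scaled = [5.0 * v for v in curve]

vc = volume_of_correlation(q_grid, curve, 100.0)
vc_scaled = volume_of_correlation(q_grid, scaled, 500.0)
```

Both I(0) and the integral scale by the same factor, so `vc` and `vc_scaled` agree, and no Porod invariant Q is needed to form the ratio.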
Vc: A Novel SAS Ratio
Vc
MD simulation of SAM-1
• 8 different protein and RNA samples
• 4 to 7 different concentrations
Vc sensitive to conformational state like Rg
67% of the variance is contained within 2% of the mean
CONCENTRATION INDEPENDENCE!
Direct Mass Determination
9446 PDB entries range from 8 to 400 kDa (protein only)
Simulated SAXS
•QR scales with mass: linear on a log-log plot (power-law relationship).
•Using actual data, 9% mass error with previously frozen samples.
•Linear relationship covers a large mass range, 20 to 1,000 kDa.
•Effective for RNA samples (5% error).
[Figure: Experimental SAXS, protein and RNA panels; samples taken from BioIsis.net and purified via SEC-MALS]
Use to infer mixtures:
...expect 26 kDa and get 40 kDa
• monomer ↔ dimer?
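A power law is a straight line on log-log axes, so a standard curve can be sketched as a least-squares fit of ln(mass) against ln(QR). The slides do not define QR; the code assumes the definition QR = Vc²/Rg used in the Vc literature. The standards below are synthetic numbers chosen for illustration, not measured values, and all function names are hypothetical.

```python
import math


def q_r(vc, rg):
    """Assumed definition: QR = Vc^2 / Rg."""
    return vc * vc / rg


def fit_log_log(x_values, y_values):
    """Least-squares line through (ln x, ln y); returns (slope, intercept)."""
    log_x = [math.log(v) for v in x_values]
    log_y = [math.log(v) for v in y_values]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y)) / sum(
        (a - mean_x) ** 2 for a in log_x
    )
    return slope, mean_y - slope * mean_x


def predict_mass(qr_value, slope, intercept):
    """Mass predicted by the fitted power law."""
    return math.exp(intercept + slope * math.log(qr_value))


# Synthetic standards obeying mass = 8 * QR exactly (illustrative only).
standard_qr = [1.0, 2.0, 4.0, 8.0]
standard_mass = [8.0 * v for v in standard_qr]
slope, intercept = fit_log_log(standard_qr, standard_mass)
estimate = predict_mass(3.0, slope, intercept)
```

Comparing such an estimate with the sequence mass is what flags cases like the 26 kDa versus 40 kDa example above, where a monomer-dimer mixture is one possible explanation.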
Validation
Mass from the SAXS curve should match the mass of model.
•use Vc as a validation tool
•authors can use a standard curve (dependent on standards having the same density; flexible?)
Model validation should follow NMR and X-ray standards
•bond lengths, geometry, etc
Quality of the fit of the model to the data
•standard is to use Chi2
•we propose a Shannon-channel limited Chi2 known as LMC
Quality of the data
•Most easily evaluated from concentration series
•For data with a Guinier region, agreement between Rg(Guinier) and Rg(real space) should be within 5% for great data
•Noise levels, divide curve into Shannon-bins and report average noise in each bin
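The two data-quality checks above can be sketched as follows. The slides do not say how noise should be measured; the code assumes relative noise, sigma/|I|, averaged per bin, and assumes the number of Shannon bins is the integer part of qmax·dmax/π with bins of equal point count. Function names are hypothetical.

```python
import math


def rg_agreement(rg_guinier, rg_real_space):
    """Relative difference between the two Rg estimates (0.05 means 5%)."""
    return abs(rg_guinier - rg_real_space) / ((rg_guinier + rg_real_space) / 2.0)


def shannon_bin_noise(q, intensity, sigma, dmax):
    """Average relative noise, sigma/|I|, in each Shannon bin.

    Assumptions: q is sorted ascending, the bin count is int(qmax*dmax/pi),
    and bins hold equal numbers of points.
    """
    size = len(q)
    n_bins = min(max(1, int(max(q) * dmax / math.pi)), size)
    averages = []
    for b in range(n_bins):
        indices = range(b * size // n_bins, (b + 1) * size // n_bins)
        averages.append(
            sum(sigma[k] / abs(intensity[k]) for k in indices) / len(indices)
        )
    return averages


# Synthetic example: 8 points, qmax = 0.32, dmax = 43 gives 4 bins of 2 points.
q_pts = [0.04 * k for k in range(1, 9)]
i_pts = [10.0] * 8
s_pts = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
noise = shannon_bin_noise(q_pts, i_pts, s_pts, 43.0)
```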
Current SAXS modeling Software
CRYSOL
•multipole expansion based method
SASTBX
•Zernike polynomials
All use
FoXS
•direct Debye calculation
In Crystallography and NMR:
•use cross-validation Rfree (X-ray)
•complete cross-validation (NMR)
In SAXS:
No statistical method comparable to Rfree or CV is available
Propose using the least median chi-square (LMC) to assess model-data fit
Shannon-Nyquist Sampling Theorem
SAXS data are highly redundant, but how redundant?
P. Moore circa 1979:
minimum set of points needed to represent a SAS curve given dmax and qmax
qmax (Å⁻¹)   dmax (Å)   n
0.32         43         4
0.32         71         7
0.32         240        25
We collect ~500 to 800 data points: 125-fold redundancy (at n = 4)
Chi2 should be calculated against independent random variables
So, for Xylanase, we can select 4 points.
•But which 4?
•What if we select 4 bad points?
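The number of independent points is commonly estimated as n ≈ qmax·dmax/π. The short sketch below applies that estimate to the qmax and dmax values listed on this slide and to the 125-fold redundancy figure; `shannon_channels` is a hypothetical helper, and the slide's own n values appear to be rounded from this estimate.

```python
import math


def shannon_channels(qmax, dmax):
    """Estimated number of independent points: qmax * dmax / pi."""
    return qmax * dmax / math.pi


# qmax and dmax values as listed on the slide.
estimates = {dmax: shannon_channels(0.32, dmax) for dmax in (43, 71, 240)}

# Redundancy when ~500 points are collected but only 4 are independent.
redundancy = 500 / 4
```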
Least Median Chi-Square (LMC)
Want a selection method that provides resistance to outliers
Provides true assessment of model-data agreement
Consider a position estimate?
Intensity measurement
pH measurement
Phase angle
6 measurements = {3,4,4,4,4,5}
mean = 4
median = 4
7 measurements = {3,4,4,4,4,5,21}
mean = 6.4
median = 4
Notice: Mean changes, median does not!
Mean is “sensitive” to outliers
Median tends to be “insensitive” to outliers
Tolerance to outliers is quantified by the “breakdown point”
Median has a 49.99% tolerance to outliers = ROBUST
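The two measurement sets above can be checked directly with Python's standard `statistics` module:

```python
import statistics

six = [3, 4, 4, 4, 4, 5]
seven = six + [21]  # same set plus one outlier

mean_six = statistics.mean(six)
median_six = statistics.median(six)
mean_seven = statistics.mean(seven)      # 45 / 7, pulled upward by the outlier
median_seven = statistics.median(seven)  # middle value is still 4
```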
Least Median Chi-Square (LMC)
Determine n
Divide data into n equal bins
Randomly Sample
1 point from each bin
1000x
Calculate Chi2
Take Median value
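The steps above can be sketched as follows. Several details are assumptions rather than the authors' implementation: n is taken as the integer part of qmax·dmax/π, bins are equal in point count, the model curve is assumed to be already scaled and on the same q grid as the data, and Chi2 is the mean squared normalized residual over the sampled points. All names are hypothetical.

```python
import math
import random
import statistics


def sampled_chi2(i_exp, i_model, sigma, indices):
    """Mean squared normalized residual over the selected points."""
    return sum(
        ((i_exp[k] - i_model[k]) / sigma[k]) ** 2 for k in indices
    ) / len(indices)


def least_median_chi2(q, i_exp, i_model, sigma, dmax, rounds=1000, seed=0):
    """Median Chi2 over repeated one-point-per-bin random samples."""
    rng = random.Random(seed)
    size = len(q)
    # Step 1: determine n (assumed estimate, capped at the number of points).
    n_bins = min(max(1, int(max(q) * dmax / math.pi)), size)
    # Step 2: divide the data into n bins of equal point count.
    bins = [range(b * size // n_bins, (b + 1) * size // n_bins) for b in range(n_bins)]
    trials = []
    for _ in range(rounds):
        # Step 3: randomly sample one point from each bin.
        picks = [rng.choice(b) for b in bins]
        # Step 4: calculate Chi2 on the sampled points.
        trials.append(sampled_chi2(i_exp, i_model, sigma, picks))
    # Step 5: take the median value.
    return statistics.median(trials)


# Synthetic check: a perfect model plus one bad data point.
q_grid = [0.001 * k for k in range(1, 321)]
model = [1.0] * 320
data = list(model)
data[10] = 100.0  # single outlier
errors = [0.1] * 320
lmc = least_median_chi2(q_grid, data, model, errors, dmax=43.0)
full_chi2 = sampled_chi2(data, model, errors, range(320))
```

In this synthetic case the single outlier dominates the conventional Chi2 over all points, while the median over the random samples is unaffected because most samples never include the bad point.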
Least Median Chi-Square (LMC)
Using Xylanase data (dmax: 43)
Generated 1600 conformations (CONCOORD) using PDB: 1REF
Calculated Chi2 using CRYSOL and LMC
Best identified Model
Normal Chi2: 1.0
LMC: 1.39
Prevents:
•over-fitting
•mis-identification
LMC is sensitive to “small” differences
Requirements
What are the requirements regarding the supporting experimental data that need
to be deposited?
Rg (Guinier)
Rg (real)
Mass of the particle estimated from data and model
Vc (volume-of-correlation)
Composition (DNA/RNA/Protein)
dmax estimate from SAX/NS data (establishes Shannon limit)
Solution sample details
Sequence information
Deposit concentration series to evaluate quality of the data
Calculate Chi2 and LMC to evaluate quality of the fit of the model to the data