Extracting Equilibrium Constants from Kinetically Limited Reacting Systems

From: John J. Correia and Walter F. Stafford, Extracting Equilibrium Constants
from Kinetically Limited Reacting Systems. In Michael L. Johnson, Jo M. Holt, and
Gary K. Ackers, editors: Methods in Enzymology, Vol. 455,
Burlington: Academic Press, 2009, pp. 419-446.
ISBN: 978-0-12-374596-5
© Copyright 2009 Elsevier Inc.
Academic Press.
Author's personal copy
CHAPTER FIFTEEN
Extracting Equilibrium Constants
from Kinetically Limited
Reacting Systems
John J. Correia* and Walter F. Stafford†
Contents
1. Introduction
2. Methods
3. Simulation and Analysis of Dimerization
4. Kinetically Mediated Dimerization
5. A Stepwise Approach
6. Final Thoughts
Acknowledgments
References
Abstract
It has been known for some time that slow kinetics will distort the shape of a
reversible reaction boundary. Here we present a tutorial on direct boundary
fitting of sedimentation velocity data for a monomer-dimer system that exhibits
kinetic effects. Previous analysis of a monomer-dimer system suggested that
rapid reaction behavior will persist until the relaxation time of the system
exceeds 100 s (reviewed in Kegeles and Cann, 1978). Utilizing a kinetic
integrator feature in Sedanal (Stafford and Sherwood, 2004), we can now fit for the koff
values and measure the uncertainty at the 95% confidence interval. For the
monomer-dimer system the range of well determined koff values is limited to
0.005 to 10⁻⁵ s⁻¹ corresponding to relaxation times (at a loading concentration
of the Kd) of 70 to 33,000 s. For shorter relaxation times the system is fast
and only the equilibrium constant K but not koff can be uniquely determined. For
longer relaxation times the system is irreversibly slow, and assuming the
system was at initial equilibrium before the start of the run, only the equilibrium
constant K but not koff can be uniquely determined.
* Department of Biochemistry, University of Mississippi Medical Center, Jackson, Mississippi, USA
† Boston Biomedical Research Institute, Watertown, Massachusetts, USA
Methods in Enzymology, Volume 455
ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04215-8
1. Introduction
Sedimentation velocity has been a standard hydrodynamic technique
since the inception of the method in the 1920s by Svedberg (Svedberg and
Nichols, 1927; reviewed in van Holde, 2004). As with all separation
techniques, the resolution and the shape of the profile can be influenced
by the presence of molecular interactions, including interactions with small
molecules. This was intensively investigated by many early practitioners of
analytical ultracentrifugation, including Gilbert, Cann, Cox, and Kegeles
(Cann, 1970; Cann and Kegeles, 1974; Gilbert, 1955, 1959, 1960; Gilbert
and Jenkins, 1959; Kegeles et al., 1967; Oberhauser et al., 1965) and
extensively reviewed by Kegeles, Cox, and Cann in 1978 (Cann, 1978a,b;
Cox, 1978; Kegeles, 1978; Kegeles and Cann, 1978). Much of the early
work was done assuming the absence of diffusion (Belford and Belford,
1962; Gilbert and Gilbert, 1978; Gilbert and Jenkins, 1959; van Holde,
1962) and with the appropriate approximate solutions to the transport
equations. Analysis often involved graphical comparisons of simulation
with experimental data. Simulation methods for solutions of the Lamm
equation, the partial differential equation that describes sedimentation and
diffusion in the ultracentrifuge, used numerical methods developed
by Weiss and Yphantis (Correia et al., 1976; Dishon et al., 1966, 1967) and
were advanced greatly by the finite element method
developed by Claverie (Claverie, 1976; Claverie et al., 1975). Direct
comparison of data with simulation by computer fitting
was first developed in 1981 by Todd and Haschemeyer, but
complete implementation of this approach had to wait for the development
of faster computing power. Development of better fitting and simulation
algorithms continues (Dam et al., 2005; Demeler and Saber, 1998; Philo,
2006; Schuck, 1998; Schuck and Demeler, 1999; Stafford, 1998, 2000;
Stafford and Sherwood, 2004), and these approaches have now been fully
implemented into user-friendly software platforms (including but not limited to Sedfit, Sedphat, Sedanal, and Ultrascan). This rather brief and
selective introduction brings us to a stage where users are now able to
directly fit sedimentation velocity data with relative ease, but with the
caveat that complexities of many kinds require some thoughtful selection
of approaches and fitting models. Here we begin by presenting a brief
tutorial on the analysis and direct boundary fitting of sedimentation velocity
data focusing on a weak monomer-dimer system. Then we discuss the
effects of slow kinetics on the shape of a reversible reaction boundary, and
we present direct boundary-fitting methods on sedimentation velocity data
for kinetically mediated monomer-dimer systems.
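For reference, the Lamm equation mentioned above is commonly written as follows for a single ideal species in a sector-shaped cell. The symbols follow common usage rather than notation defined in this chapter and should be read as assumptions: c is concentration, r is radial position, t is time, ω is the angular velocity, and s and D are the sedimentation and diffusion coefficients.

```latex
\frac{\partial c}{\partial t}
  = \frac{1}{r}\,\frac{\partial}{\partial r}
    \left[\, r D \frac{\partial c}{\partial r} - s\,\omega^{2} r^{2} c \,\right]
```

For a reacting system one such equation is written per species, and the equations are coupled through the reaction terms, which is where the rate constants enter.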
2. Methods
All data simulation and fitting were done with Sedanal. The maximum
integration steps allowed should be reset to at least 10,000,000 to allow slow
kinetic systems to come to equilibrium before the start of the simulated run.
Data were simulated in absorbance mode at 50 K rpm, unless noted
otherwise, with 0.005 abs units of Gaussian random noise added to each
data set. For the dimer cases a 100 kDa monomer, s1 = 5.8 S, s2 = 9 S, with
an extinction coefficient of 1.2 ml/mg/cm was used. Data were simulated
with a time interval between scans of 100 s. Data were preprocessed to
select for meniscus, base, and fit regions and stored as abr files. An even
number of scans (typically 20) were chosen for direct boundary fitting so
that the gradient in the last few scans was near the base of the fit region.
Fitting models were constructed with ModelEditor 1.74. Fitting was done
using the Levenberg-Marquardt algorithm, although initial approaches to the
minima were often done with the Simplex algorithm of Nelder and Mead
(1965). Confidence intervals for s1, s2, K, and koff were estimated using
F-statistics at the 95% confidence level.¹
3. Simulation and Analysis of Dimerization
A weak but rapidly reversible dimerization reaction is shown in
Fig. 15.1, where panel A shows the weight average sedimentation coefficients, Sw, plotted as a function of the logarithm of the plateau concentration
(see Correia, 2000, for a detailed discussion of the use of Sw) and panel B
shows the shape of the g(s) distributions derived from simulated data. Note
that the Sw values for this 100-fold concentration series remain sufficiently far from the
s1 and s2 values for the monomer and dimer species alone that the end points will
require some extrapolation or fitting method to be estimated. The shapes of
the g(s) distribution show the same general result with the boundary
maximum position not predicting either the s1 or s2 values. This is a typical
situation for associating systems, unless they are strongly cooperative
(Cann, 1978), and the determination of one or both end points is a major
experimental challenge for the investigator.
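The concentration dependence of Sw described above can be sketched numerically. The following is a minimal illustration, assuming an ideal, rapidly equilibrating monomer-dimer system with no radial dilution and no hydrodynamic nonideality; the function name `monomer_dimer_sw` is hypothetical and is not part of Sedanal or DCDT+2.

```python
import math

def monomer_dimer_sw(c_total, k2, s1, s2):
    """Weight-average s for an ideal monomer-dimer equilibrium.

    c_total : total concentration in monomer units (M)
    k2      : association constant (1/M), so Kd = 1/k2
    s1, s2  : sedimentation coefficients of monomer and dimer (S)
    """
    # Mass conservation: c_total = m + 2*k2*m**2, solved for free monomer m.
    m = (-1.0 + math.sqrt(1.0 + 8.0 * k2 * c_total)) / (4.0 * k2)
    f_monomer = m / c_total      # weight fraction present as monomer
    f_dimer = 1.0 - f_monomer    # weight fraction present as dimer
    return f_monomer * s1 + f_dimer * s2

K2 = 2.5e5  # 1/M, i.e. Kd = 4 uM as in the simulations described here
for c_um in (0.4, 4.0, 40.0):
    sw = monomer_dimer_sw(c_um * 1e-6, K2, 5.8, 9.0)
    print(f"{c_um:5.1f} uM  sw = {sw:.2f} S")
```

Under these assumptions the system is half monomer and half dimer by weight at a loading concentration equal to Kd, so Sw sits midway between s1 and s2 there, and even a 100-fold range around Kd does not reach either end point.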
¹ An F statistic is constructed by stepping or searching along a parameter axis, repeating the fit for all floated
parameters until the fit hits a target value of (rms/rms₀)², where rms₀ is the rms of the best fit of
the data. The target F-stat is determined by the degrees of freedom and is typically around 1.013 for this kind
of data at the 95% or two standard deviation level. Notice that many of the confidence intervals are asymmetric
(Tables 15.1 and 15.2 and Fig. 15.8).
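The search described in the footnote can be illustrated with a short sketch. It assumes the rms values along the parameter axis have already been obtained by refitting the other floated parameters at each step (that refitting is what Sedanal performs and is not reproduced here). The function name `confidence_limits` and the rms profile are hypothetical, and the default target of 1.013 is simply the typical value quoted above.

```python
def confidence_limits(values, rms, f_target=1.013):
    """Locate where (rms/rms0)**2 crosses f_target on each side of the best fit.

    values : parameter values stepped along one axis, in increasing order
    rms    : rms of the refit at each value
    Returns (low, high); a side that never reaches the target is None (unbounded).
    """
    rms0 = min(rms)
    best = rms.index(rms0)
    f = [(r / rms0) ** 2 for r in rms]

    def crossing(indices):
        prev = best
        for i in indices:
            if f[i] >= f_target:
                # Linear interpolation between the last point below the
                # target and the first point at or above it.
                frac = (f_target - f[prev]) / (f[i] - f[prev])
                return values[prev] + frac * (values[i] - values[prev])
            prev = i
        return None

    low = crossing(range(best - 1, -1, -1))
    high = crossing(range(best + 1, len(values)))
    return low, high

# Hypothetical rms profile along a koff axis (illustrative numbers only).
koff_grid = [0.6e-4, 0.8e-4, 1.0e-4, 1.2e-4, 1.4e-4]
rms_grid = [0.00510, 0.00502, 0.00500, 0.00501, 0.00503]
print(confidence_limits(koff_grid, rms_grid))
```

Because each side is searched independently, the sketch naturally returns asymmetric limits, and a side where the rms never rises to the target is reported as unbounded, which mirrors the kinds of intervals reported in the tables.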
[Figure 15.1: panel A plots Sw against log C (M), with Kd marked; panel B plots g(s) against S, with an inset of g(s)/C against S.]
Figure 15.1 Simulation and analysis of concentration dependence of weak dimerization with a 100K protein, Kd = 4 μM, ext = 1.2 ml/mg/cm, s1 = 5.8 and s2 = 9.0. Panel
A: weight average sedimentation coefficient Sw on simulations performed at 0.4 to
40 μM. Panel B: g(s) analysis performed with DCDT+2 (Philo, 2006) on simulations
done at 2, 4, 8, 12, 20, and 28 μM (corresponding to 0.24 to 3.36 OD and thus implying
the use of both a 1.2-cm and a 3-mm cell in an XLA). These data correspond to the
boxed region in panel A. The horizontal lines in panel A and the vertical dotted lines in
panel B correspond to the sedimentation coefficients of the monomer and the dimer.
The inset presents normalized traces, g(s)/co.
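The optical densities quoted in the caption follow from the molar concentration, the monomer mass, and the mass extinction coefficient. A minimal sketch, assuming a 100 kDa monomer and a 1 cm path length equivalent (the path length is an assumption chosen because it reproduces the quoted 0.24 to 3.36 OD range); the function name `absorbance` is hypothetical.

```python
def absorbance(conc_molar, molar_mass=100_000.0, ext_ml_mg_cm=1.2, path_cm=1.0):
    """Absorbance from molar concentration via a mass extinction coefficient."""
    mg_per_ml = conc_molar * molar_mass  # g/L is numerically equal to mg/ml
    return mg_per_ml * ext_ml_mg_cm * path_cm

for c_um in (2, 4, 8, 12, 20, 28):
    print(f"{c_um:2d} uM -> {absorbance(c_um * 1e-6):.2f} OD (1 cm equivalent)")
```

A shorter path length scales the signal down proportionally, which is why the highest concentrations call for a 3-mm cell to stay within the usable range of the absorbance optics.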
Quantitative analysis of these data is best done by global direct boundary
fitting of all the data to an appropriate model. Our preferred approach is fitting
with Sedanal, which fits multiple data sets to concentration time-difference
curves to eliminate time-independent systematic errors inherent in the optical
systems (Stafford and Sherwood, 2004). An example of this is shown in
Fig. 15.2, where the best direct boundary fit is presented for all the data in
Fig. 15.1B to a rapid monomer-dimer equilibrium model (koff = 0.1 s⁻¹).
The best-fitted K2 value is listed in Table 15.1 along with a 95% confidence
interval. As pointed out earlier, the extrapolated values for the monomer and
dimer sedimentation coefficients are not known a priori for experimental data,
and thus Table 15.1 also presents a fit floating s2 and K2. The values are clearly
correlated (a correlation coefficient R of 0.99884 can be estimated from the
data in the F-stat log file generated under output files), with a slightly smaller s2
causing a larger K2, and the total confidence interval for K2 is larger than for the
fit of K2 alone (0.4 vs. 0.12). The estimate of s2, 8.934 S, in this case is close to
the correct value of 9.0 S. One might thus infer this reflects the fact that
[Figure 15.2: panels A–F plot ΔC against radius (cm).]
Figure 15.2 A plot of the best Sedanal global fit of the monomer-dimer data (2–28 μM
data shown in panels A–F) presented in Fig. 15.1B. The model is a rapidly reversible
monomer-dimer equilibrium holding s1 and s2 to the correct values. Data are plotted as Δc, which is
the difference between pairs of scans, to remove time-independent systematic noise.
Superimposed are the data (black squares) and the best fit (black lines) for each cell.
The residuals are also plotted (black lines) and appear as a noisy trace near zero. The
best fitted value for K2 (2.527 × 10⁵ M⁻¹) appears in Table 15.1 with a 95% confidence
interval <2.47, 2.59> calculated with an F-stat procedure available in Sedanal. A
second fit allowing s2 to float is also presented in Table 15.1. Notice the correlation
between a lower s2 and a larger K2 value, as well as the larger confidence interval that
reflects the coupling between the s2 and K2 values.
Table 15.1 Monomer-dimer
Model | s1 (S) | s2 (S) | K2 (× 10⁵ M⁻¹) | rms
koff = 0.1 | 5.8 | 9.0 | 2.527 <2.47,2.59> | 0.00570
 | 5.8 | 8.934 <8.91,8.96> | 2.962 <2.77,3.17> | 0.00543
 | 4.515 <3.46,5.15> | 8.971 <8.94,9.00> | 5.684 <4.21,8.48> | 0.00520
 | 4.658 <3.72,5.24> | 8.967 <8.93,9.00> | 5.356 <4.02,7.79> | 0.00520
 | 5.978 <5.88,6.07> | 8.946 <8.93,9.00> | 2.50 | 0.00550
2 species plot s values | 5.85 | 8.77 | 4.558 <4.38,4.74> | 0.00690
 | 5.85 | 8.932 <8.91,8.96> | 2.871 <2.68,3.08> | 0.00541
Values without a confidence interval were held fixed. C's: 1-28 μM is listed for the first fit and 0.4-28 μM for three of the later fits.
multiple data sets from a range of concentrations around Kd (cmax = 7Kd in this
case) can be fitted to extract an accurate s2 value when the correct s1 value is
used in the fitting. To test this inference, a third fit floating s1, s2, and K2 is also
shown in Table 15.1. Both parameter uncertainty and confidence intervals are
clearly larger for K2, reflecting a strong correlation between the limits of s in
the fit and K2 (R for K2 vs. s1 = 0.9985; R for K2 vs. s2 = 0.9990; R for s2 vs.
s1 = 0.9999). In this case, while s2 is again well determined, K2 is too large by
a factor of two (which is only 0.48 kcal smaller on a ΔG scale) with a large
and skewed confidence interval, and s1 is smaller by 22% with a large and
asymmetric confidence interval that does not include the correct value, 5.8 S.
Adding a lower concentration point of 0.4 μM to the fit (corresponding to an
Abs280 = 0.0576) might help constrain s1, except that the lower signal-to-noise in
this case appears to limit its impact (Table 15.1). Thus, the best-fitted parameter values for s2 and K2 are surprisingly insensitive to the absolute value of s1.
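The free-energy figure quoted above can be checked with a one-line calculation. This sketch assumes a temperature of 20 °C (293.15 K), which is not stated in the text, and uses the fitted and simulated K2 values from Table 15.1; the function name `delta_delta_g` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def delta_delta_g(k_fit, k_ref, temp_k=293.15):
    """Free-energy difference (kcal/mol) between two association constants."""
    return -R_KCAL * temp_k * math.log(k_fit / k_ref)

# Fitted K2 with s1, s2, and K2 floated vs. the simulated value of 2.5e5 1/M.
print(f"{delta_delta_g(5.684e5, 2.5e5):.2f} kcal/mol")
```

A roughly twofold error in K2 therefore corresponds to about half a kcal/mol, which is why an apparently large error in the association constant is modest on a free-energy scale.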
These results are presented to introduce the method of direct boundary
fitting, and they allow us to make three additional points:
1. The difficulty of fitting velocity data for a self-associating system where the
model and the endpoints may be uncertain is the best reason for also using
equilibrium analysis where the molecular weights of monomer and dimer
can be calculated and are usually known to good precision, thus making
the fitting for K2 more robust (Correia et al., 1995, 2001; Zhao and
Beckett, 2008). Some systems may still be more amenable to velocity
studies than equilibrium studies (Correia, 2000), and it is worth emphasizing that only velocity studies will provide information about shape and
kinetics (Gelinas et al., 2004; Stafford and Sherwood, 2004). That being
said, one can still attempt to fit velocity data to extract molar mass. In fact,
for experimental data an additional test of the model is to fit for molecular
weight. For a monomer-dimer system one would start by fitting M1 and
constraining M2 to 2M1. For these data the best fit to M1 and K2,
constraining s1 and s2, returns values of M1 = 51384 <50518, 52281>
and K2 = 3.076 × 10⁵ M⁻¹ <2.98, 3.18>. Thus, this monomer-dimer
system is exceedingly well determined.
2. Data must be collected over as wide a concentration range as possible,
10% to 90% degree of association, to obtain the reasonable estimates for
the end points of the isotherm, although this is clearly not necessarily
sufficient to determine both endpoints. This range is obviously dependent upon Kd, the optics used for data collection and thus the range of
experimental concentrations. We stress this point to discourage the idea
that a single concentration or a few concentrations are sufficient to
extract de novo thermodynamic and hydrodynamic information about a
system. For example, fitting the 4 μM data alone yields a K2 = 2.456 × 10⁵ M⁻¹ with error limits three times larger <2.29, 2.64> than the s1, s2
constrained fit in Table 15.1. The central issue is where the monomer-dimer
model and the sedimentation coefficient limits would come from
in the case of a single experimental data set. It is simply not an appropriate application of the scientific method. To extend the range of concentrations, different path length cells, different wavelengths,
and interference or fluorescence optics are extremely helpful.
These approaches raise issues about determining extinction coefficients
that will be addressed in a later section.
3. Extrapolation techniques may also be utilized or required to accurately
estimate hydrodynamic end points. These techniques have been discussed previously in a review on weight average techniques (Correia,
2000). Direct fitting of Sw vs. c is a common approach although the end
points are usually constrained and not necessarily well determined
(Correia, 2000). The use of a reciprocal plot, 1/sw vs. 1/c may be useful
to extract s2 or sn with the caveat that the functional form of the
extrapolation is unclear in part because at extremely high macromolecule concentrations nonideality may become an issue and thus require
additional fitting parameters (Stafford and Sherwood, 2004). Additional
graphical techniques commonly used in sedimentation equilibrium analysis involve plotting of the moments of the distributions (Stafford, 1980;
Yphantis and Roark, 1972). Figure 15.3 presents a two-species plot
(Roark and Yphantis, 1969; Sophianopoulos and van Holde, 1964) for
the data in Fig. 15.1B. A linear plot is typically consistent with the
presence of only two species (hence the name), thus improving one’s
confidence in a real experimental situation about the fitting model
proposed. The lower moments, sw vs. 1/sn, appear to do a better job
of predicting the correct s1 value, while the higher moments, sz+1 vs.
1/sz, underestimate but approach the correct s2 value for the dimer.
(To our knowledge only DCDT (Stafford, 1992), DCDT+2 (Philo,
2000), and Sedanal (Stafford and Sherwood, 2004) calculate these higher
moments, although it would be trivial to program in any spreadsheet.
Only Sedanal is programmed to generate two species plots for equilibrium data analysis.) This approach is useful for estimating the parameter
guesses for an unknown fitting problem in combination with simple
rules about the expected relationship between s values for a polymer.
On the basis of a constant axial ratio model between monomers and
oligomers, we expect sn values to be s1·n^(2/3). (In this case 9/5.8 =
1.552, which is slightly less than 2^(2/3) = 1.587.) The ratio of s2/s1 can
thus be constrained by entering a relationship into the equation editor of
Sedanal found under the fit window; for example, s2 = 1.59*s1 would
force the fit to obey the n^(2/3) rule. An exception to this <n^(2/3) rule
occurs when the monomer is unfolded or intrinsically disordered and
thus sediments with a larger frictional value than the folded form that
dimerizes. In these cases s2 may very well be >s1·2^(2/3).
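The n^(2/3) rule from the third point can be evaluated directly to generate starting guesses. A minimal sketch, assuming the constant axial ratio scaling stated above; the function name `expected_s` is hypothetical and is not a Sedanal function.

```python
def expected_s(s1, n):
    """Sedimentation coefficient of an n-mer under the n^(2/3) scaling rule."""
    return s1 * n ** (2.0 / 3.0)

s1, s2 = 5.8, 9.0
print(f"s2/s1 used here  = {s2 / s1:.3f}")
print(f"2^(2/3)          = {2 ** (2 / 3):.3f}")
print(f"predicted dimer  = {expected_s(s1, 2):.2f} S")
```

The predicted value can serve as an upper guess for s2, or the ratio can be imposed as a constraint in the fit, keeping in mind the exception for unfolded or disordered monomers noted above.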
[Figure 15.3: two-species plot of Sw vs. 1/Sn, Sz vs. 1/Sw, and Sz+1 vs. 1/Sz.]
Figure 15.3 An example of a two-species plot to estimate s1 and s2 for the dimerization reaction presented in Figs. 15.1 and 15.2. The data at 2, 4, 8, 12, 20, and 28 μM
were analyzed with DCDT+2 to generate n-, w-, z-, and z+1-average sedimentation
coefficients of the g(s) distribution. The data are then plotted as shown, with the crosses
corresponding to the correct monomer and dimer sedimentation coefficients. Linear fits
have correlation coefficients of 0.9996, 0.9992, and 0.9992. The lower moments, Sw vs.
1/Sn, appear to do a better job of predicting s1, crossing the line at 5.85 S, while the
higher moments, Sz+1 vs. 1/Sz, approach the s2 value for the dimer, crossing the line at
8.77 S. The near approach emphasizes the importance of spanning a wide concentration
range during the analysis of binding data. The heavy black line is a plot of s vs. 1/s.
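The linearity of a two-species plot follows from a short calculation. As a sketch, assume the averages are simple moments over two discrete species with weight fractions f1 and f2 (f1 + f2 = 1). The exact definitions applied to g(s) data by DCDT+2 or Sedanal are not given here, so these definitions are an assumption.

```latex
\begin{align}
s_w &= f_1 s_1 + f_2 s_2, \qquad
\frac{1}{s_n} = \frac{f_1}{s_1} + \frac{f_2}{s_2},\\
(s_1 + s_2) - \frac{s_1 s_2}{s_n}
  &= s_1 + s_2 - f_1 s_2 - f_2 s_1
   = f_1 s_1 + f_2 s_2 = s_w .
\end{align}
```

Under these assumptions Sw plotted against 1/Sn is a straight line with intercept s1 + s2 and slope −s1·s2, and it meets the curve s vs. 1/s exactly at s = s1 and s = s2, which is why the crossings estimate the end points. The same algebra gives Sz = (s1 + s2) − s1·s2/Sw for the next pair of moments. For a reaction boundary the species are not resolved, so real data only approximate this behavior.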
While this graphical approach has been criticized for potentially being
too sensitive to noise and baseline artifacts in the g(s) data, especially the
higher moments (Correia, 2000; Philo, 2001), it has also been pointed out
that one must simulate the actual experiment to know how conditions will
affect moment analysis (Philo, 2000). In the simulations discussed by Philo
(2000), the samples were mixtures of eight noninteracting components with
s values ranging from 6.2 to 30 S. In this dimerization case (Fig. 15.1) the
s values for the various samples only vary from 6.5 to 8.5 S, and thus for any
one sample the scans come from a narrower region and the moment
estimates are much more precise (much less than 1% using the new algorithms in DCDT+2; see Fig. 15.1A). If we use the two-species plot values (s1 = 5.85 S, s2 = 8.77 S) in the direct
boundary analysis the fit will clearly be worse because 8.77 S is outside the
95% confidence interval for s2. If we constrain the s1 value to 5.85 S and
float s2 and K2, then the fit is reasonable with accurate and precise estimates
for both parameters (Table 15.1). An additional solution to this problem of
estimating hydrodynamic parameter values for monomers and oligomers
involves the use of hydrodynamic bead modeling in combination with
structural data (Byron, 2008).
We reemphasize the importance of spanning a wide concentration
range, if possible, during the collection and analysis of binding data, in
this context to minimize the distance required to extrapolate to the end
points. If one imagines data sets where Kd is significantly larger or smaller
than 1 μM, well-determined estimates of s1 and s2, respectively, are likely
to be obtained with a major challenge being estimating the unknown
s value. In some instances an experimental approach involving mutants
that knock out association can be utilized to extract s1 values (Zhao and
Beckett, 2008). This is generally a useful strategy employed in studies of the
thermodynamics of homo- and heteroassociation (Chen et al., 2008;
Correia et al., 2001). In addition, many proteins have irreversible aggregates
or inactivated monomers in the mixture, and these may prove useful in
establishing one or more end points of the reaction (Correia et al., 1995;
Snyder et al., 2004). Unfortunately, unless these components can be
removed by gel filtration or dissociated by the addition of reducing agents
(TCEP being preferred because of low absorbance and the absence of pH
dependence), they also add parameters to the fitting model. In the case of a
larger Kd, verifying the hypothesis of a monomer-dimer equilibrium
becomes challenging because there may be no experimental estimate possible for a maximum molecular weight or sedimentation coefficient. One
additional inverse approach is to constrain K2 with a value determined by
sedimentation equilibrium to help narrow down the range of s values. (For
example, constraining K2 to 2.5 × 10⁵ M⁻¹ actually improves the accuracy
and precision of the lower s value limit (s1 = 5.978 S; s2 = 8.946 S; see
Table 15.1), although the rms of the fit is clearly worse.) These considerations are presented and discussed to introduce the modeling and direct
boundary-fitting methods that are also useful for analysis of kinetically
mediated velocity data.
4. Kinetically Mediated Dimerization
We will now apply these general approaches to the analysis of kinetically mediated velocity data. Simulations of the role of kinetics in a
monomer-dimer sedimentation velocity experiment are presented in
Fig. 15.4. Dimerization data were simulated at the Kd, 4 μM, and koff was
allowed to vary from 0.1 to 0.000001 s⁻¹. At koff = 0.001 s⁻¹ the shape of
the boundary becomes more skewed relative to the faster cases and as the off
rate decreases the shapes become profoundly bimodal, resolving into a
monomer and a dimer zone. There is no significant difference between
the shapes generated between 10⁻⁵ and 10⁻⁶ s⁻¹, and thus we can qualitatively conclude that hydrodynamic data for a kinetically mediated dimer
reaction are sensitive to koff values from 10⁻³ to 10⁻⁵ s⁻¹. These features
[Figure 15.4: panel A plots c(s) against S for koff = 0.1, 0.01, 0.001, 0.0001, 0.00001, and 0.000001 s⁻¹; panel B plots g(s) against S, with curves labeled 1 through 6.]
Figure 15.4 Simulations of the role of kinetics in a monomer-dimer sedimentation
velocity experiment. Dimerization data were simulated at the Kd, 4 μM, and koff was
allowed to vary from 0.1 to 0.000001 s⁻¹. Panel A presents c(s) distributions derived
from all 90 scans and fitting f/fo at a resolution of 0.1 S. Panel B presents g(s) distributions, labeled as 1 through 6, corresponding to fast to slow koff. The curves verify that
both approaches can provide similar visual information about the system, especially
when one fits for f/fo. (Note the apparent noise in these data vs. the data in Fig. 15.1B is
due to different scaling.) At koff = 0.001 s⁻¹ the shape of the boundary becomes skewed
and as the off rate decreases the shapes become profoundly bimodal, resolving into a
monomer and a dimer zone. There is no significant difference between the shapes
generated between 10⁻⁵ and 10⁻⁶ s⁻¹.
are evident in both g(s) distribution and c(s) distributions (if generated with
f/fo fitting and regularization). When comparing distributions at different
loading concentrations it is recommended to plot normalized g(s) distributions (g(s*)/co) to observe coincidence of the curves, which implies lack of
association, or shifting in the distribution, consistent with the presence of
concentration dependence (Gelinas et al., 2004; Stafford, 2009; Figure 15.1B).
Why are velocity data influenced by kinetics? There are two reasons.
First, the centrifuge cell is sector shaped to avoid convection, and thus
during the run there is radial dilution of the sample, which causes some
dissociation of the reactive complexes during the run. Second, sedimentation causes separation and resolution of different hydrodynamic particles
and as they separate the species must reequilibrate to maintain equilibrium.
If the time of reequilibration is fast, then the boundary maintains equilibrium throughout and migrates as a reaction boundary (Cann, 1970). If the
time of reequilibration is slower than the rate of sedimentation, then the
species become fractionated and the boundary shape is kinetically mediated.
Strictly speaking, the ability to resolve boundaries depends on the experimental conditions (Cann and Kegeles, 1974). For example, Fig. 15.5 shows
simulations at the Kd for koff = 0.001 s⁻¹ where the runs were done at 40,
50, and 60K rpm. This has been previously expressed in terms of half-time
of dissociation vs. the time of the experiment or of sedimentation (Kegeles
and Cann, 1978). To be consistent with this previous approach, one can use
a kinetic equation derived in Bernasconi (1976) for a monomer-dimer
equilibrium:
$$\left(\frac{1}{\tau}\right)^{2} = 8\,k_{1}k_{-1}[c_o] + (k_{-1})^{2}, \tag{15.1}$$
where τ is the relaxation time, k₁ and k₋₁ are the forward and reverse
rate constants, kon and koff in our nomenclature, and [co] is the initial total
concentration of monomer (Fig. 15.6). Alternatively, Sedanal has a kinetic
calculator that can simulate relaxation kinetics for various models and
[Figure 15.5: g(s) plotted against S for runs at 40 K, 50 K, and 60 K rpm.]
Figure 15.5 The observed shape of a boundary is influenced by the speed of the
experiment (Cann and Kegeles, 1974). A g(s) analysis is shown for data simulated at
the Kd, 4 μM, with koff = 0.001 s⁻¹ and at 40, 50, and 60 K rpm. The bimodality of the
boundary is more evident as the speed of sedimentation exceeds the ability of the system
to reequilibrate. Note that data were chosen so that the peak broadening limits were in a
comparable range (243–254 kD) so as to maintain similar height to width characteristics
of the distributions.
[Figure 15.6: log(τ, s) plotted against [Co] (M) for koff = 0.01, 0.001, 0.0001, and 0.00001 s⁻¹, calculated from (1/τ)² = 8k₁k₋₁[Co] + (k₋₁)².]
Figure 15.6 Relaxation times for dimer dissociation calculated with an approximate
equation derived in Bernasconi (1976). The data are consistent with relaxation times at
4 μM Co varying from 70 s at a koff of 0.005 s⁻¹ to 33,000 s for a koff of 10⁻⁵ s⁻¹. More
precise estimates can be derived with a kinetic integrator function in Sedanal or with
KINSIM at various monomer-dimer ratios.
species concentrations, and thus allows for estimation of relaxation times.
This is analogous to what one could simulate with a program like KINSIM.
Explicit derivations of relaxation constants in terms of equilibrium and rate
constants are outlined in Bernasconi (1976), and examples for sedimentation
velocity can be found in earlier literature (Cann and Kegeles, 1974; Kegeles
and Cann, 1978).
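Equation 15.1 can be evaluated directly to see where a given koff falls relative to the time scale of a run. The sketch below assumes kon = K2·koff (so that K2 = kon/koff) and uses the K2 and loading concentration of the simulations described here; the function name `relaxation_time` is hypothetical and is not the Sedanal kinetic calculator.

```python
import math

def relaxation_time(k_off, k2, c0):
    """Relaxation time (s) from (1/tau)^2 = 8*k1*k_-1*c0 + k_-1^2.

    k_off : reverse rate constant k_-1 (1/s)
    k2    : association constant (1/M); the forward rate is k1 = k2 * k_off
    c0    : initial total monomer concentration (M)
    """
    k_on = k2 * k_off
    return 1.0 / math.sqrt(8.0 * k_on * k_off * c0 + k_off ** 2)

K2, C0 = 2.5e5, 4e-6  # Kd = 4 uM, loading concentration equal to Kd
for k_off in (5e-3, 1e-3, 1e-4, 1e-5):
    tau = relaxation_time(k_off, K2, C0)
    print(f"koff = {k_off:.0e} 1/s  tau = {tau:,.0f} s")
```

At a loading concentration equal to Kd the expression reduces to τ = 1/(3·koff), which gives roughly 70 s for koff = 0.005 s⁻¹ and roughly 33,000 s for koff = 10⁻⁵ s⁻¹, in line with the values quoted for Fig. 15.6.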
To explore the ability to extract kinetic parameters from these systems,
sedimentation velocity data for koff equal to 10⁻³, 10⁻⁴, and 10⁻⁵ s⁻¹ were
simulated for nine loading concentrations from 0.4 to 40 μM and fit by the
direct whole boundary-fitting approach. Results for fits with constrained
s1 and s2 end points are presented in Table 15.2, and an example of the best
fit of the koff equal to 10⁻⁴ s⁻¹ data (only 2–28 μM data presented) is shown in
Fig. 15.7. Relative to the data in Fig. 15.2, notice the bimodal appearance of
the curves in panels 7A and 7B and the exaggerated skewed peaks in panels
7C through 7F. The K2 values are all accurate to within 3% to 6%, while the
koff values are better determined for the slower cases. Note for the 0.001 s⁻¹
case in particular the confidence interval is highly asymmetric towards faster
off rates (Table 15.2 and Fig. 15.8). This suggests kinetic effects will be
evident in the 0.01 to 0.001 s⁻¹ regime. Floating s2 in the fits for all three
kinetically mediated cases reveals that s2 is well determined for this span of
Table 15.2 Kinetically mediated monomer-dimer
Model | s1 (S) | s2 (S) | K2 (× 10⁵ M⁻¹) | koff (s⁻¹) | rms
koff = 0.001 | 5.8 | 9.0 | 2.66 <2.59,2.73> | 3.965 × 10⁻³ <2.41,10.99> | 0.00548
 | 5.8 | 8.971 <8.95,9.00> | 2.825 <2.67,2.99> | 2.19 × 10⁻³ <1.37,4.83> | 0.00543
 | 5.236 <4.74,5.61> | 8.985 <8.96,9.01> | 3.831 <3.12,4.83> | 2.08 × 10⁻³ <1.42,3.66> | 0.00534
koff = 0.0001 | 5.8 | 9.0 | 2.584 <2.53,2.64> | 1.130 × 10⁻⁴ <1.08,1.19> | 0.00527
 | 5.8 | 8.963 <8.93,8.98> | 2.769 <2.65,2.90> | 0.899 × 10⁻⁴ <0.77,1.04> | 0.00517
 | 5.80 <5.71,5.88> | 8.963 <8.94,8.99> | 2.769 <2.62,2.92> | 0.900 × 10⁻⁴ <0.76,1.06> | 0.00517
koff = 0.00001 | 5.8 | 9.0 | 2.595 <2.53,2.66> | 0.950 × 10⁻⁵ <0.82,1.09> | 0.00539
 | 5.8 | 8.951 <8.94,8.97> | 2.727 <2.65,2.88> | 0.726 × 10⁻⁸ <unbound,0.34 × 10⁻⁵> | 0.00513
 | 5.8 | 8.995 <8.99,9.00> | 2.620 <2.55,2.69> | 0.950 × 10⁻⁵ | 0.00536
 | 5.804 <5.76,5.85> | 8.952 <8.94,8.97> | 2.723 <2.65,2.88> | 0.552 × 10⁻⁹ <unbound,0.40 × 10⁻⁵> | 0.00513
[Figure 15.7: panels A–F plot ΔC against radius (cm).]
Figure 15.7 A plot of the best Sedanal global fit of kinetically mediated monomer-dimer data (2–28 μM data shown in panels A–F) with koff = 0.0001 s⁻¹ (Fig. 15.4). The
model is a reversible monomer-dimer equilibrium holding s1 and s2 to the correct
values and fitting for K2 and koff. Data are plotted as Δc, which is the difference between
pairs of scans, to remove time-independent systematic noise. Superimposed are the data
(black squares) and the best fit (lines) for each cell. The residuals are also plotted and
appear as a noisy trace near 0. The best fitted values for K2 (2.584 × 10⁵ M⁻¹) and koff
(1.130 × 10⁻⁴ s⁻¹) appear in Table 15.2 with 95% confidence intervals <2.53 × 10⁵,
2.64 × 10⁵> and <1.08 × 10⁻⁴, 1.19 × 10⁻⁴> calculated with an F-stat procedure
available in Sedanal. A second fit allowing s2 to float is also presented in Table 15.2.
Notice the correlation between a lower s2 and a larger K2 value as well as the larger
confidence interval that reflects the coupling between the s2 and K2 values.
monomer-dimer data. This is anticipated since s2 was well determined in
the rapidly reversible case (see Table 15.1). As for the rapidly reversible data,
s2 and K2 are coupled, with a lower s2 causing a larger K2 value with larger
confidence intervals for K2 (but not necessarily koff) relative to the K2 fit
alone for all three ranges of koff. Interestingly, the best fit koff values for the
10⁻³ and 10⁻⁴ s⁻¹ cases are closer to the correct values, with confidence
intervals that include the correct values, when you allow s2 to float. In particular, note the upper error limit of 4.8 × 10⁻³ s⁻¹ for the 0.001 s⁻¹ data,
which suggests for this monomer-dimer system that 0.005 s⁻¹ is also the upper
limit for quantitatively observing kinetic effects. For the 10⁻⁵ s⁻¹ data this is
not the case, where floating s2 allows or causes koff to converge on a much
smaller value (0.73 × 10⁻⁸ s⁻¹) with an unbound lower confidence limit.
[Figure 15.8, panels A–C: F-statistic (Fstat) plotted against koff (s⁻¹); legend: K, koff fit; K, koff, s2 fit; K, koff, s1, s2 fit.]
Figure 15.8 F-statistic values are plotted for all the koff fits reported in Table 15.2.
Panels A–C report results for koff values equal to (A) 10⁻³, (B) 10⁻⁴, and (C) 10⁻⁵ s⁻¹. In
each panel, different line symbols, as shown in the upper panel, represent fits where K2
and koff were floated (filled squares), where K2, koff, and s2 were floated (open squares),
and where K2, koff, s1, and s2 were floated (filled downward triangles). The horizontal
dotted lines represent the target F-stat values consistent with a 95% confidence interval.
There are relatively small changes in the s2 and K2 values, suggesting 10⁻⁵ s⁻¹
is the lower limit or slightly beyond the lower limit for quantitatively estimating kinetic parameters for this system when you do not know the end points of
the reaction. To investigate this further, fits were done constraining s1 to 5.8 S
and koff to the best fit value of 0.905 × 10⁻⁵ s⁻¹. The rms of this fit (0.00536)
is close to that of the s2, K2, koff fit (0.00513), and thus outside the confidence
interval, but more important the s2 and K2 values are not significantly changed.
Thus, in this range of koff values (<0.00001 s⁻¹) varying koff has no significant
impact on parameter estimates. (It is helpful to do the fitting and F-stat analysis
on log(koff) when koff is in this range and thus poorly determined. The upper
limit of the confidence interval in this case (0.34 × 10⁻⁵ s⁻¹) is three orders of
magnitude greater than the “best” fit value (0.72 × 10⁻⁸ s⁻¹), and thus
searching error space in log units improves the range of the region to be
potentially searched.)
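To illustrate the point about searching in log units, the sketch below compares a linear and a logarithmic grid of trial koff values between the best-fit value and the upper confidence limit quoted above. The function names, the grid of 16 points, and the even spacing are assumptions made for illustration; this is not Sedanal's actual search routine.

```python
import math

def trial_values(low, high, n, log_spacing):
    """Return n trial parameter values between low and high (both > 0).

    log_spacing=True spaces the points evenly in log10(value);
    log_spacing=False spaces them evenly in the value itself.
    """
    if n < 2 or low <= 0 or high <= low:
        raise ValueError("need n >= 2 and 0 < low < high")
    if log_spacing:
        lo, hi = math.log10(low), math.log10(high)
        return [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]
    return [low + i * (high - low) / (n - 1) for i in range(n)]

def count_below(values, cutoff):
    """Number of trial points that fall below cutoff."""
    return sum(1 for v in values if v < cutoff)

# Values quoted in the text for the 1e-5 s^-1 data set.
best_koff = 0.72e-8    # "best" fit value, s^-1
upper_koff = 0.34e-5   # upper confidence limit, s^-1

linear = trial_values(best_koff, upper_koff, 16, log_spacing=False)
logarithmic = trial_values(best_koff, upper_koff, 16, log_spacing=True)

# How many of the 16 trial points probe the region below 1e-7 s^-1?
print("linear grid points below 1e-7:", count_below(linear, 1e-7))
print("log grid points below 1e-7:", count_below(logarithmic, 1e-7))
```

A linear grid places nearly all of its points in the top decade, whereas the log grid distributes them across the whole interval, which is why a poorly determined rate constant is more sensibly searched on log(koff).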
In attempts to also fit for s1 in these three kinetic regimes, we found that
s1 values become better determined as the kinetics slow down, consistent
with emergence of a boundary that sediments with the monomer s value.
This is evident by inspection in Fig. 15.4. Surprisingly, floating s1 has no
significant effect on K2 or koff, either best-fit values or confidence intervals
(Table 15.2 and Fig. 15.8). Note in particular the same upper confidence
limit is obtained for the 10⁻⁵ s⁻¹ koff F-stat <unbound, 0.40 × 10⁻⁵> when fitting for
s1, s2, and K2. Thus, for this system we conclude that constraining s1 is a
reasonable least squares approach to analyzing the data, and because s2 values
are well determined in all these cases (see Tables 15.1 and 15.2), it further
suggests for analysis of real data the traditional approach of fixing the end
points of the fit (by extrapolation or constraining s2/s1 ratios) is a reasonable
starting point for quantitative analysis. The magnitude of K2 in particular is
coupled to the s1 and s2 values but relatively insensitive to reasonably
constrained end points (i.e., a factor of 2 in Table 15.2). As we have tried
to stress throughout this presentation, these results pertain to both the model
and the data sets or the concentrations spanned in the analysis. Success must
be measured by a combination of reasonable assumptions and systematic
testing of the reliability and impact of each parameter fixed or floated in the
fitting. The total uncertainty should reflect this span of parameters or at least
should include caveats about how parameters were both estimated and
constrained.
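As a hedged illustration of constraining end points through an s2/s1 ratio, the sketch below applies the n^(2/3) scaling that holds for oligomers of unchanged shape and partial specific volume. The function names and the 5% tolerance are hypothetical choices for illustration; 5.8 S is the s1 value quoted in the text, and 9.0 S is taken from the s2 entries of Table 15.2.

```python
def oligomer_s(s_monomer, n):
    """Sedimentation coefficient expected for an n-mer, assuming s scales
    as n**(2/3) (same shape and partial specific volume as the monomer)."""
    if s_monomer <= 0 or n < 1:
        raise ValueError("s_monomer must be > 0 and n >= 1")
    return s_monomer * n ** (2.0 / 3.0)

def exceeds_limit(s_observed, s_monomer, n, tolerance=0.05):
    """True if s_observed is more than `tolerance` (fractional) above the
    n**(2/3) prediction, a hint that a larger species may be present."""
    return s_observed > oligomer_s(s_monomer, n) * (1.0 + tolerance)

s1 = 5.8                        # monomer s value (S) quoted in the text
s2_estimate = oligomer_s(s1, 2) # n^(2/3) estimate for the dimer
print(f"n^(2/3) dimer estimate: {s2_estimate:.2f} S")
print("9.0 S exceeds the estimate:", exceeds_limit(9.0, s1, 2))
```

The scaling gives only a starting guess or an upper bound for a compact dimer; whether to fix s2 at such a value or let it float remains the judgment call discussed in the text.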
For these monomer-dimer systems, which include the range of data sets
analyzed, the range of well-determined koff values occurs between 0.005
and 10⁻⁵ s⁻¹, corresponding to relaxation times (at a loading concentration
of the Kd) of 70 to 33,000 s. Examples from the literature that have
applied this approach and measured koff values in this range include Zhao
and Beckett (2.7 × 10⁻⁴ s⁻¹; 2008) and Gelinas et al. (1.41 × 10⁻³ s⁻¹;
2004). This problem has also been looked at by Dam et al. (2005), who
concluded a slightly narrower range of accessible koff values, 10⁻³ to 10⁻⁴
s⁻¹ for an A + B ⇌ C system. Our criteria for deciding what koff values are
measurable are a bit different. Here we focus on direct boundary fitting and
F-stat ranges, while they at times compare rms values and primarily visually
compare c(s) and Lsg(s) traces for different orders of magnitude of koff.
Note in Table 15.1 the rms alone is not necessarily the best criterion of the
best model, although the point we are trying to make pertains to experimental data when you are also trying to constrain parameter guesses.
Nonetheless, in general we both agree that the particular model, the range
of the data, the presence or absence of heterogeneity, and most important
parameter correlations, will influence one's ability to interpret kinetically
mediated sedimentation velocity data. Ultimately the ability to see and
quantify kinetic effects in transport data depends on the relaxation time of
the system relative to the time of the experiment (Cann and Kegeles, 1974;
Kegeles and Cann, 1978; Fig. 15.6). For relaxation times less than 70 s the
system is fast and only the equilibrium constant K2 but not koff can be
uniquely determined. For relaxation times greater than 33,000 s the
system is irreversibly slow and, assuming the system was at initial equilibrium before the start of the run, only the equilibrium constant K but not koff
can be uniquely determined (see further examples in Stafford, 2000; Dam
et al., 2005).² For this monomer-dimer system koff values occurring
between 0.005 and 10⁻⁵ s⁻¹ are quantitatively measurable (Table 15.2).
Our analysis has also determined that dimer sedimentation coefficients are
much better determined than monomer values. This is probably influenced
by the signal-to-noise in the data at higher concentrations, and we anticipate that the use of interference or fluorescence data with more uniform
signal-to-noise might improve this situation. Monomer s values only
become well determined for these cases in the limit of slow kinetics coinciding with the emergence of a monomer zone in the reacting boundary.
For the same concentration range of the data, as Kd gets larger s1 should
become better determined. Finally, it is worth stating that for real data sets,
simulating the best-fitted values over the same concentration ranges as the
experimental data and then reanalyzing the simulated data serves as a useful
test of the reliability of the experimental conclusions and the constraints
imposed upon the analysis.
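The relaxation times quoted above can be reproduced with a short calculation. The sketch below assumes the standard small-perturbation result for 2A ⇌ A2, 1/τ = koff + 4·kon·[A]eq with kon = K2·koff, and assumes the loading concentration is expressed in monomer units; neither assumption is spelled out in this section, and the function names and the K2 value of 2.5 × 10⁵ M⁻¹ are illustrative.

```python
import math

def monomer_conc(k2, c_total):
    """Equilibrium free monomer concentration (M) for 2A <-> A2.

    k2 is the association constant (M^-1); c_total is the loading
    concentration in monomer units (M): c_total = [A] + 2*k2*[A]**2.
    """
    if k2 <= 0 or c_total < 0:
        raise ValueError("k2 must be > 0 and c_total >= 0")
    return (-1.0 + math.sqrt(1.0 + 8.0 * k2 * c_total)) / (4.0 * k2)

def relaxation_time(k2, koff, c_total):
    """Relaxation time (s) from 1/tau = koff + 4*kon*[A], kon = k2*koff."""
    a_eq = monomer_conc(k2, c_total)
    return 1.0 / (koff * (1.0 + 4.0 * k2 * a_eq))

k2 = 2.5e5          # M^-1, illustrative value near the fitted K2
kd = 1.0 / k2       # loading concentration equal to Kd (4 uM)

for koff in (5e-3, 1e-3, 1e-4, 1e-5):
    tau = relaxation_time(k2, koff, kd)
    print(f"koff = {koff:.0e} s^-1  ->  tau = {tau:,.0f} s")
```

Under these assumptions, loading at the Kd reduces the expression to τ = 1/(3·koff), that is, about 67 s and 33,000 s at the two ends of the range, in line with the 70 to 33,000 s quoted above.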
² We can relate this situation back to the analysis of a single 4 μM data set. At a koff of 10⁻⁵ s⁻¹ we have
approached a situation that resembles two noninteracting components. Yet one may still attempt to fit the
data for a K assuming they are reversibly interacting. In this case we get K = 2.47 × 10⁵ M⁻¹ <2.24 × 10⁵, 2.74 × 10⁵> and koff = 0.912 × 10⁻⁵ s⁻¹ <unbound, 2.38 × 10⁻⁵>, and yet we would have no
experimental evidence of reversible association unless we repeated the run at a different concentration. You
can also fit this single data set to a two-component model A + B to extract molecular weights (M1 = 48,990
<40,403, 59,350>; M2 = 107,029 <84,357, 138,531>) where you have assumed, not proved, two noninteracting components. Fitting multiple data sets would give better confidence intervals but also reveal a
concentration-dependent shift in the ratio of dimer to monomer and thus prove a reversible association
model.
5. A Stepwise Approach
Here we have presented a brief tutorial on the direct boundary fitting
of sedimentation velocity data, including the effects of slow kinetics for a
monomer-dimer system. We have stressed the role of parameter estimation
and cross-correlation effects between parameters. We have related the
fitting process back to experimental situations in which one may not
know the correct model or correct end points for the reaction. An overview
of this approach includes the following steps.
1. Collect velocity data over a wide concentration range. One may need
to return to experiments if the fitting demonstrates a still wider concentration range is required. We typically stress a range of association from
10% to 90% for binding data, but some cases may clearly require more
extreme extents of reaction to nail down parameters in the best fit sense.
2. Initially analyze all data by generating g(s) and/or c(s) distributions.
Follow recommended protocols for these analyses. For g(s), use scans
late in the run to maximize species resolution and minimize the total
number of scans to enhance the peak broadening limits without sacrificing the advantages of averaging over many pairs of scans (Philo, 2006;
Stafford, 1992). Try to use a consistent range of scans (i.e., the same range
of ω²t), giving a consistent value for the peak broadening limit, referred
to as the maximum molar mass in Sedanal. This will maintain a constant
height-to-width characteristic in the family of g(s) distributions and thus
provide better superposition for monodisperse distributions or a smoother
appearance to any concentration dependent transitions (Figure 15.1B).
For c(s), the best results are obtained if you allow the smallest species to
pellet and include data from the full range of scans (Schuck, 1998). One
advantage of c(s) is the ability to see small amounts of larger aggregates in
the system. Another case where c(s) excels appears to be with antibody
samples where dissociated chains and higher-order cross-linked aggregates
(dimer, trimer, tetramer, and above) are observed in significant amounts
along with the main antibody complex (Arakawa et al., 2006). This can be
very useful for model building.
3. Plot the family of distributions to generate a hypothesis about the behavior of the system. For small changes or subtle changes in the distribution
shape, it helps to normalize the distributions by dividing by the concentration (the signal in appropriate units or the area under the distributions;
see Figure 15.1B). For noninteracting systems the normalized data (g(s)/c0)
should superimpose (see Figure 1 in Gelinas et al., 2004). One does
not generally expect c(s) distributions to perfectly overlay because of the
nature of the fitting function, but the same general features are typically
evident. For self-associating systems the distributions should shift by mass
action to larger species as the loading concentration is increased (see
Figure 3 in Gelinas et al., 2004) unless there is strong nonideal behavior,
in which case the nonideality must be taken into account in the model
and the fitting described subsequently (Stafford, 2009).
4. Plotting Sw vs concentration helps to establish the range of sedimentation coefficients observed and thus the changes that are occurring in the
underlying species distribution. Be warned that g(s) distributions and
Sw values can be very misleading about the actual species present in the
reacting boundary (Stafford, 2009). At this stage a hypothesis about an
appropriate model is required to proceed to direct boundary fitting.
Most software packages come with preloaded models of various types
(noninteracting and interacting, definite and indefinite association).
Sedanal offers ModelEditor, which allows the user to build any model
desired, up to 25 reactions, 26 species, and 10 components, in addition
to the isodesmic type models. Typically, users begin with their
preconceived ideas about the system.
5. For noninteracting systems, global fitting to molecular weight should
quickly reveal monodisperse behavior (i.e., a monomer or a dimer).
Multiple peaks or overlapping peaks require two-component or
three-component noninteracting models. For small amounts of large aggregate, choose scans late in the run to exclude those aggregates from the
fitting. (F-stat analysis on these minor components typically reveals very
poorly determined s values for small amounts of aggregation.) In the
therapeutic proteins industry analysis of aggregation is an important and
central aspect of the study as aggregates can be the cause of therapeutic
instability and undesired side effects (Arakawa, 2006). For interacting
systems, we mostly try to exclude or account for aggregates in the model
so that we can extract stoichiometric and thermodynamic information
about the system.
6. For a simple monomer-dimer system, the boundary should skew to the
faster sedimenting species as concentration is raised. It helps to recognize the expected shape of the distribution for different models (see
Figs. 15.1B and 15.4) and simulations are useful to establish the
expected behavior. As described previously, one expects s2 to be
s1 × 2^(2/3). If Sw exceeds this value, then consider higher-order reaction
schemes. In a two species plot this generates upward curvature (Roark
and Yphantis, 1969).
7. We have said nothing about calculating extinction coefficients or the
buoyancy terms, (1 − v̄ρ) for two-component systems or density increment, (∂ρ/∂c2)μ, for multiple-component systems. In the simulations
shown above these values were based on typical data for tubulin, 1.2 ml/mg/cm,
and a reasonable (1 − v̄ρ) value for proteins, 0.257. (Sedanal
defines the buoyancy term as density increment in the ModelEditor.)
These values can be different for each data set, especially for different
wavelengths or path lengths or for heterogeneous multicomponent systems (see step 8 subsequently). Most users seem to estimate these values
from amino acid compositions using a program like Sednterp. When
doing runs on highly charged macromolecules or in the presence of
osmolytes preferential interaction effects or osmotic stress effects can
influence these values dramatically (Eisenberg, 2002). For these cases,
direct measurement of density, partial specific volume, or density increment with an Anton Paar Density Meter (DMA 5000) is preferable.
In fact our experience is that even when comparing different batches of
buffers, it can make a difference in the best fitted s values obtained when
doing global direct boundary fitting. This can show up as slight systematic
differences between the different sets of data. Each experimental situation
and system should dictate what is or is not necessary. A typical inverse use
of Sedanal for monodisperse systems is to constrain the molar mass to the
correct value, when known, and fit for the density increment just to be
sure it is in a reasonable range of values. Extinction coefficients, density
and buoyancy values must be known or assumed to start fitting data, so
compile information early.
8. Estimation of extinction coefficients from amino acid composition
works very well at 280 nm (Laue et al., 1992; Pace et al., 1995). But
what do you do when you collect data at 290 nm or 230 nm, or better
yet at the minima typically found near 250 nm? Many users collect a
wavelength scan and then determine the absorbance ratio between 280
and the wavelength of the unknown extinction. However, this is not
typically of sufficient accuracy for direct boundary fitting. A direct
approach for a noninteracting system is to acquire and fit data simultaneously from both wavelengths constraining the 280-nm extinction
value and floating the unknown extinction value while linking the
fitted concentration in each cell. Do this over multiple samples and
take an average value for use in subsequent analysis. For proteins labeled
chemically or with a GFP construct, this procedure may have to be
repeated for every new preparation due to differences in labeling
stoichiometry or GFP-folding efficiency.
9. Start with reasonable parameter estimates and constrain as many parameters as possible to begin direct boundary fitting. This was done earlier
by fitting for K2 while constraining s1 and s2, even if they are only
guesses. Current versions of Sedanal allow up to 32 data sets for velocity
analysis. It is often sufficient to start with 3 to 6 data sets to speed up the
fitting and get an initial verification that the model is appropriate.
Starting with 200 points in the grid between the meniscus and the base
of the cell also speeds up the fitting. This should be increased later to 400
or 800 points to improve the resolution of parameter estimation. All the
fits for Tables 15.1 and 15.2 were done with 800 points in each grid.
Note however, for some models that generate high gradients near the
meniscus and base regions of the cell the number of points is a critical
feature. For example, the isodesmic model often requires 2400 points
with more point density near the meniscus and still more point density in
the base region (Sontag et al., 2004). (Grid spacing is under Claverie
control in the “Advanced...” window.)
10. Good fits generate random residuals with rms noise levels in the range of
0.003 to 0.008 a.u. for absorbance data, depending upon the condition
of the optics, the wavelength, and the path length of the cell. At this stage
adding more data sets or increasing the number of points in the grid will
improve the stability of the numerical analysis. If there are not enough
points or an inappropriate point density for a fit, a “Check Grid”
error message will appear on the fit window screen.
11. Confidence intervals can be generated in three ways (found under error
estimation control in the “Advanced...” window): (1) Monte Carlo,
which is performed on simulated data, (2) bootstrap with replacement,
and (3) F-statistic analysis. For example, the fit of the single 4-μM data
set described earlier was repeated with bootstrap and generated a mean
of 2.455 × 10⁵ M⁻¹ with a standard deviation of 0.01 × 10⁵. Recall the
F-stat result was 2.456 × 10⁵ M⁻¹ with a confidence interval of <2.29 × 10⁵,
2.64 × 10⁵>, or 0.35 × 10⁵. An F-stat analysis forces the fitting into regions
of parameter error space that are more sensitive to cross-correlation
effects typically not explored by bootstrap or Monte Carlo methods.³
An F-statistic for a particular control file is activated under the error
estimation control button but must also be activated for each parameter
you want to do statistics on by a shift-left-click with the cursor in that
parameter box. F-stat analysis can be very slow, as it can take up to
32 consecutive fits (default maxima for error searching is 16 fits below
and 16 fits above the best fit value) to explore error space above and
below the best-fitted value. For this reason we usually do one parameter
F-stat at a time and set up concurrent fits for different parameters on the
same computer or different computers. For example, some of the F-stats
for the Table 15.2 fits were performed on s1, s2, K2, and koff, thus requiring
four separate searches of error space. Done concurrently, these four separate
fits can take up to 24 h, even on a fast computer; done consecutively,
this might take 2–3 days. To run different fits on different computers
involves copying 9 data sets, a control file (which defines the fit in the
control window in Sedanal), and a Modelinfo.txt file. These can all be
conveniently transferred between computers by clicking on Package on
the Control screen; this creates a zip file containing all of the required
files for an analysis. This zip file can be loaded on several different
machines to carry out the F-stat analysis. (Poor man’s parallel processing.) We are currently working on parallelizing SEDANAL as much as
possible to run on multiprocessor PCs.
³ Note that Monte Carlo and bootstrap (with or without replacement) methods have historically been used to
estimate error bars by simulating and fitting many different noise-perturbed data sets or by fitting select or
limited regions of an experimental data set. In general, F-stat approaches, first applied to sedimentation
equilibrium data analysis in the program Nonlin (Johnson et al., 1981), give much larger and more
asymmetric estimates of error bars than these other methods. One can think of a Monte Carlo or bootstrap
method as exploring the size of the minima in the error space and not the width at the F-stat limit. Parameter
correlation effects account for the major differences between the approaches.
12. Upon obtaining good fits of the data, one can release constraints on
limited numbers of parameters and then investigate their uncertainties.
In the monomer-dimer examples above, after fitting K2 or K2 and koff
we next floated s2 and then s1 and s2. Because s2 was well determined, it
forced us to trust those estimates rather than impose some other constraint on the system. Alternatively, as s1 was much less well determined,
one might consider fixing s2 and constraining s1 by an n^(2/3) rule. The
rms of the fit is the ultimate determinant of the best fit, although one
prefers to avoid parameter values that contradict hydrodynamic principles (e.g., having species sediment faster than physically possible for
their size, shape, and partial specific volume). In Sednterp this shows up
as s > smax error messages. (In Sednterp, the inquiring user can type in
the molecular weight of a protein or proposed complex and then
predict the s value and, under Results, look at the axial ratio required
to generate this hydrodynamic behavior.)
13. What can go wrong? Nonrandom residuals imply the wrong model at
worst or incorrect parameters for the correct model. A lot of computer
work usually has to be done to reject a model. One must first explore
parameter values; therefore, changing the guess for both fitted and fixed
parameters is a good starting place. Be systematic, using the flow chart
described previously for sedimentation coefficients as an example. In
the well-behaved cases shown earlier, all fits were relatively reasonable,
even for incorrect s1 values. Ideally, convergence of the fit to the correct
values improves the goodness of fit and the confidence intervals verify
the assignments. As previously, the uncertainty in these cases requires a
realistic evaluation of the confidence intervals. Furthermore, sedimentation velocity may not be the best way to measure kinetics for many
systems, although we suggest when self-association or macromolecular
interactions are involved that sedimentation studies are certainly an
extremely useful technique for complementing other kinetic methods
(see Eccleston et al., 2008).
14. What if the residuals are nonrandom and nothing seems to improve
them? The best case scenario is that the model is basically correct (e.g.,
monomer-dimer) but there is heterogeneity in the sample. Maybe there
is incompetent monomer (Snyder et al., 2004) or aggregated or disulfide
cross-linked dimer present. This can be seen as a trailing or leading
deviation in the residuals. This is where ModelEditor is very useful.
Make a monomer-dimer model with a second component that is
noninteracting (2A ⇌ A2; and A′) and refit the data. If there are deviations in the residuals on the lower s side of the sedimentation boundary,
then assume it is a monomer; if the deviations are on the higher s side,
assume it is a dimer; and fit for s initially assuming the concentration of
A′ is a global parameter present at the same fractional amounts in all
samples. This constraint can be released later. Then float the molecular
weight to verify the identity of the contaminant. A more complex case
might involve heterogeneity in K2, where different isoforms have
different association constants, or sample aging produces a distribution
of affinities. All of these situations show up as a systematic variation in
K2 with the affinity decreasing with increasing loading concentration
(Yphantis et al., 1978; Xu, 2004).
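Steps 1 and 4 above can be made concrete with a small mass-action calculation for 2A ⇌ A2. The sketch below computes the weight fraction of dimer, the loading concentrations that give 10% and 90% association, and an ideal weight-average sedimentation coefficient. It assumes concentrations in monomer molar units, no nonideality, and no hydrodynamic concentration dependence; the function names and the values K2 = 2.5 × 10⁵ M⁻¹, s1 = 5.8 S, and s2 = 9.0 S are illustrative choices.

```python
import math

def dimer_weight_fraction(k2, c_total):
    """Weight fraction of dimer for 2A <-> A2 at loading concentration
    c_total (M, monomer units) and association constant k2 (M^-1)."""
    if c_total <= 0:
        return 0.0
    monomer = (-1.0 + math.sqrt(1.0 + 8.0 * k2 * c_total)) / (4.0 * k2)
    return 1.0 - monomer / c_total

def conc_for_fraction(k2, fraction):
    """Loading concentration (M) giving the requested dimer weight
    fraction; inverts the mass-action expression in closed form."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction / (2.0 * k2 * (1.0 - fraction) ** 2)

def weight_average_s(k2, c_total, s1, s2):
    """Ideal weight-average s: weight-fraction-weighted mean of s1, s2."""
    f2 = dimer_weight_fraction(k2, c_total)
    return (1.0 - f2) * s1 + f2 * s2

k2, s1, s2 = 2.5e5, 5.8, 9.0          # illustrative values
c10 = conc_for_fraction(k2, 0.10)
c90 = conc_for_fraction(k2, 0.90)
print(f"10% dimer at {c10 * 1e6:.2f} uM, 90% dimer at {c90 * 1e6:.0f} uM")

for c_um in (2, 4, 8, 16, 28):
    c = c_um * 1e-6
    print(f"{c_um:>3} uM: dimer fraction {dimer_weight_fraction(k2, c):.2f}, "
          f"Sw {weight_average_s(k2, c, s1, s2):.2f} S")
```

With these assumptions the 10% to 90% window runs from roughly 0.25 to 180 μM, about a 700-fold span, which shows why a wide concentration range is needed before the end points of the reaction are well sampled.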
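Of the three error-estimation routes listed in step 11, bootstrap with replacement is the simplest to sketch generically. The example below resamples a set of noisy measurements and re-evaluates a single estimate (here just a mean, standing in for a full boundary fit). It is not Sedanal's implementation, and the synthetic data, function names, and number of resamples are hypothetical.

```python
import random
import statistics

def bootstrap_sd(data, estimator, n_resamples=500, seed=0):
    """Standard deviation of `estimator` over bootstrap resamples.

    Each resample draws len(data) points from data with replacement,
    then the estimator (the stand-in for a fit) is re-evaluated.
    """
    if len(data) < 2:
        raise ValueError("need at least two data points")
    rng = random.Random(seed)          # fixed seed for reproducibility
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Hypothetical noisy measurements of one quantity (arbitrary units).
noise = random.Random(1)
data = [1.0 + noise.gauss(0.0, 0.005) for _ in range(200)]

best = statistics.mean(data)
sd = bootstrap_sd(data, statistics.mean)
print(f"estimate = {best:.4f}, bootstrap SD = {sd:.5f}")
```

Because resampling only explores the neighborhood of the minimum, an estimate of this kind does not probe cross-correlation between parameters, which is the distinction drawn in the text between bootstrap or Monte Carlo results and F-statistic searches.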
6. Final Thoughts
This simple tutorial is hopefully a useful tool for outlining how to
analyze velocity data collected in the analytical ultracentrifuge by direct
boundary fitting. As the system becomes more complex, simple approaches
still apply, although each additional component must be studied with the
same care and precision. The more components, the more species, the more
parameters, the more cross-correlation, and the more difficult it is to arrive
at a single simple solution. (This is where 32 data sets may become important.) Constraining parameters becomes more critical. Systems biology
approaches can apparently simulate anything but the uniqueness of the
parameter set remains a challenging open question (Alon, 2007).
Use the literature, and learn from the past. Our ongoing research
projects include Ca²⁺-mediated nonideal dimerization, heteroassociations
(1:1 and 2:1), antibody-antigen interactions, phosphorylation-dependent
transcription factor associations, ligand-mediated oligomerization, and
indefinite associations. All of these systems are being approached with direct
boundary-fitting methods. The strategies described here apply. There is
extensive mention in the data analysis literature of the best model or the
simplest model that describes the data. In the AUC field this has come to
be known as the most parsimonious model applying the principles of
Occam’s razor (Brookes and Demeler, 2008; Brown et al., 2007), which is
based on the idea that the simplest model (the model with the fewest
assumptions) that fits the data (in the rms sense) is the best model. If more
parameters do not improve the fit, then exclude them (shave them away as
with a razor), ignore them, or constrain them in some reasonable way. This
was the case in the equilibrium and kinetically mediated data sets where s1
had a surprisingly small impact on the other parameters or on the goodness
of fit. For these data, K2 and koff were fairly well determined. K2 was
consistently within a factor of two of the correct answer, while koff was
typically within the error bars of the uncertainty, with caveats about
parameter correlations and not being too close to the edges of the 0.005 to
1.0 × 10⁻⁵ s⁻¹ experimentally measurable regime. May this success be your
good fortune in many of the systems you study as well.
ACKNOWLEDGMENTS
We thank our collaborators and their and our experimental systems that over the years have
challenged and taught us how to problem solve with AUC data. We thank Nichola Garbett,
David Dignam, and Jim Cole for reading earlier versions of this chapter. The authors regret
the need to be selective in referencing and apologize for all the omissions necessitated by the
ever-expanding work in this field.
REFERENCES
Alon, U. (2007). ‘‘An introduction to systems biology: Design principles of biological
circuits.’’ Chapman & Hall/CRC Mathematical and Computational Biology Series,
London.
Arakawa, T., Philo, J. S., Ejima, D., Tsumoto, K., and Arisaka, F. (2007). ‘‘Aggregation
analysis of therapeutic proteins, part 2.’’ Bioprocess International 5, 36–47.
Belford, G. G., and Belford, R. L. (1962). ‘‘Sedimentation in chemically reacting systems. II.
Numerical calculations for dimerization.’’ J. Chem Phys. 37, 1926–1932.
Bernasconi, C. F. (1976). ‘‘Relaxation kinetics’’ pp. 14–15. Academic Press, New York.
Brookes, E., and Demeler, B. (2007). ‘‘Parsimonious regularization using genetic algorithms
applied to the analysis of analytical ultracentrifugation experiments.’’ GECCO Proc.
ACM. 978-1-59593-697.
Brown, P., Balbo, A., and Schuck, P. (2007). Using prior knowledge in the determination of
macromolecular size-distributions by analytical ultracentrifugation. Biomacromol. 8,
2011–2024.
Byron, O. (2008). ‘‘Hydrodynamic modeling: The solution conformation of macromolecules and their complexes.’’ Methods Cell Biol. 84, 327–373.
Cann, J. R. (1970). ‘‘Interacting macromolecules: The theory and practice of their electrophoresis, ultracentrifugation, and chromatography.’’ Academic Press, New York.
Cann, J. R., and Kegeles, G. (1974). Theory of sedimentation for kinetically controlled
dimerization reaction. Biochemistry 13, 1868–1874.
Cann, J. R. (1978a). Measurement of protein interactions mediated by small molecules using
sedimentation velocity. Methods Enzymol. 48, 242–248.
Cann, J. R. (1978b). Ligand binding by associating systems. Methods Enzymol. 48, 299–307.
Chen, W., Lam, S. S., Srinath, H., Jiang, Z., Correia, J. J., Schiffer, C. A., Fitzgerald, K. A.,
Lin, K., and Royer, W. E., Jr. (2008). Interferon regulatory factor activation revealed by
the crystal structure of dimeric IRF-5. Nature Struct. & Mol. Biol. 15, 1213–1220.
Claverie, J. M. (1976). Sedimentation of generalized systems of interacting particles III.
Concentration dependent sedimentation and extension to other transport methods.
Biopolymers 15, 843–857.
Claverie, J. M., Dreux, H., and Cohen, R. (1975). Sedimentation of generalized systems of
interacting particles. I. Solution of systems of complete Lamm equations. Biopolymers 14,
1685–1700.
Correia, J. J. (2000). ‘‘The analysis of weight average sedimentation data.’’ Methods in
Enzymol. 321, 81–100.
Correia, J. J., Chacko, B. M., Lam, S. S., and Lin, K. (2001). Sedimentation studies reveal
a direct role of phosphorylation in Smad3:Smad4 homo- and hetero-trimerization.
Biochemistry 40, 1473–1482.
Correia, J. J., Gilbert, S. P., Moyer, M. L., and Johnson, K. A. (1995). Sedimentation studies
on the kinesin head domain constructs K401, K366 and K341. Biochemistry 34,
4898–4907.
Correia, J. J., Johnson, M. L., Weiss, G. H., and Yphantis, D. A. (1976). Numerical study of
the Johnson–Ogston effect in two component systems. Biophysical Chem. 5, 255–264.
Cox, D. J. (1978). Calculation of simulated sedimentation velocity profiles for self-associating solutes. Methods Enzymol. 48, 212–242.
Dam, J., Velikovsky, C. A., Mariuzza, R. A., Urbanke, C., and Schuck, P. (2005).
Sedimentation velocity analysis of heterogeneous protein-protein interactions: Lamm
equation modeling and sedimentation coefficient distributions c(s). Biophys. J. 89,
619–634.
Demeler, B., and Saber, H. (1998). Determination of molecular parameters by fitting
sedimentation data to finite-element solutions of the Lamm equation. Biophys. J. 74,
444–454.
Dishon, M., Weiss, G. H., and Yphantis, D. A. (1966). Numerical solutions of the Lamm
equation. I. Numerical procedure. Biopolymers 4, 449–456.
Dishon, M., Weiss, G. H., and Yphantis, D. A. (1967). Numerical solutions of the Lamm
equation. III. Velocity centrifugation. Biopolymers 5, 697–713.
Eccleston, J. F., Martin, S. R., and Schilstra, M. J. (2008). Rapid kinetic techniques. Methods
Cell Biol. 84, 445–477.
Eisenberg, H. (2002). Modern analytical ultracentrifugation in protein science: Look forward, not back. Protein Sci. 11, 2647–2649.
Gelinas, A. D., Toth, J., Bethoney, K. A., Stafford, W. F., and Harrison, C. J. (2004).
Mutational analysis of the energetics of the GrpE-DnaK binding interface: Equilibrium
association constants by sedimentation velocity analytical ultracentrifugation. J. Mol. Biol.
339, 447–458.
Gilbert, G. A. (1955). General discussion. Discuss. Faraday Soc. 20, 65–77.
Gilbert, G. A. (1959). Sedimentation and electrophoresis of interacting substances. 1.
Idealized boundary shape for a single substance aggregating reversibly. Proc. Roy. Soc.
(London) A250, 377–388.
Gilbert, G. A. (1960). Concentration-dependent sedimentation of aggregating proteins in
the ultracentrifuge. Nature 186, 882–883.
Gilbert, G. A., and Jenkins, R. C. Ll. (1959). Sedimentation and electrophoresis of interacting substances. II. Asymptotic boundary shape for two substances interacting reversibly. Proc. Roy. Soc. (London) A253, 420–437.
Gilbert, L. M., and Gilbert, G. A. (1978). Molecular transport of reversibly reacting systems:
Asymptotic boundary profiles in sedimentation, electrophoresis, and chromatography.
Methods Enzymol. 48, 195–213.
Johnson, M. L., Correia, J. J., Halvorson, H., and Yphantis, D. A. (1981). Analysis of data
from the analytical ultracentrifuge by nonlinear least-squares techniques. Biophys. J. 36,
575–588.
Kegeles, G. (1978). Pressure-jump light-scattering observations of macromolecular interaction kinetics. Methods Enzymol. 48, 308–320.
Kegeles, G., and Cann, J. (1978). Kinetically controlled mass transport of associating-dissociating macromolecules. Methods Enzymol. 48, 248–270.
Kegeles, G., Rhodes, L., and Bethune, J. L. (1967). Sedimentation behavior of chemically
reacting systems. PNAS 58, 45–51.
Laue, T. M., Shah, B. D., Ridgeway, T. M., and Pelletier, S. L. (1992). Computer-aided
interpretation of analytical sedimentation data for proteins. In ‘‘Analytical ultracentrifugation in biochemistry and polymer science’’ (S. E. Harding, et al., eds.), pp. 90–125.
Royal Society of Chemistry, Cambridge, UK.
Nelder, J. A., and Mead, R. (1965). A simplex method for function minimization. Comput. J.
7, 308–313.
Oberhauser, D. F., Bethune, J. L., and Kegeles, G. (1965). Countercurrent distributions of
chemically reacting systems: IV. Kinetically controlled dimerization in a boundary.
Biochemistry 4, 1878–1884.
Pace, C. N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995). How to measure and
predict the molar absorption coefficient of a protein. Protein Sci. 4, 2411–2423.
Philo, J. S. (2000). A method for directly fitting the time derivative of sedimentation velocity
data and an alternative algorithm for calculating sedimentation coefficient distribution
functions. Anal. Biochem. 279, 151–163.
Philo, J. S. (2006). Improved methods for fitting sedimentation coefficient distributions
derived by time-derivative techniques. Anal. Biochem. 354, 238–246.
Roark, D. E., and Yphantis, D. A. (1969). Studies of self-associating systems by equilibrium
ultracentrifugation. Ann. N.Y. Acad. Sci. 164, 245–278.
Schuck, P. (1998). Sedimentation analysis of noninteracting and self-associating solutes using
numerical solutions to the Lamm equation. Biophys. J. 75, 1503–1512.
Schuck, P., and Demeler, B. (1999). Direct sedimentation analysis of interference optical
data in analytical ultracentrifugation. Biophys. J. 76, 2288–2296.
Snyder, D., Lary, J., Chen, Y., Gollnick, P., and Cole, J. L. (2004). Interaction of the trp
RNA-binding attenuation protein (TRAP) with anti-TRAP. J. Mol. Biol. 338, 669–682.
Sontag, C. A., Stafford, W. F., and Correia, J. J. (2004). A comparison of weight average and
direct boundary fitting of sedimentation velocity data for indefinite polymerizing systems.
Biophys. Chem. 108, 215–230.
Sophianopoulos, A. J., and van Holde, K. E. (1964). Physical studies of muramidase
(lysozyme) II. pH-dependent dimerization. J. Biol. Chem. 239, 2516–2524.
Stafford, W. F. (1980). Graphical analysis of nonideal monomer, N-mer, isodesmic, and
type II indefinite self-associating systems by equilibrium ultracentrifugation. Biophys. J.
29, 149–166.
Stafford, W. F. (1992). Boundary analysis in sedimentation transport experiments: A procedure for obtaining sedimentation coefficient distributions using the time derivative of the
concentration profile. Anal. Biochem. 203, 295–301.
Stafford, W. F. (1998). Time difference sedimentation velocity analysis of rapidly reversible
interacting systems: Determination of equilibrium constants by non-linear curve fitting
procedures. Biophys. J. 74, A301.
Stafford, W. F. (2000). Analysis of reversibly interacting macromolecular systems by time
derivative sedimentation velocity. Methods Enzymol. 323, 302–325.
Stafford, W. F. (2009). Protein-protein and ligand-protein interactions studied by analytical
ultracentrifugation. In ‘‘Protein structure, stability, and interactions’’ (J. W. Schriver,
ed.), vol. 490, pp. 83–113. Humana Press, New York.
Stafford, W. F., and Sherwood, P. J. (2004). Analysis of heterologous interacting systems by
sedimentation velocity: Curve fitting algorithms for estimation of sedimentation coefficients, equilibrium and kinetic constants. Biophys. Chem. 108, 231–243.
Svedberg, T., and Nichols, J. B. (1927). The application of the oil turbine type of ultracentrifuge to the study of the stability region of CO-hemoglobin. J. Am. Chem. Soc. 49,
2920–2934.
Tai, M., and Kegeles, G. (1984). A micelle model for the sedimentation behavior of bovine
beta-casein. Biophys. Chem. 20, 81–87.
Todd, G. P., and Haschemeyer, R. H. (1981). General solution to the inverse problem of the
differential equation of the ultracentrifuge. PNAS 78, 6739–6743.
van Holde, K. E. (1962). Sedimentation in chemically reacting systems. I. The isomerization
reaction. J. Chem. Phys. 37, 1922–1926.
van Holde, K. E. (2004). Sedimentation equilibrium and the foundations of protein chemistry. Biophys. Chem. 108, 5–8.
Xu, Y. (2004). Characterization of macromolecular heterogeneity by equilibrium sedimentation techniques. Biophys. Chem. 108, 141–163.
Yphantis, D. A., and Roark, D. E. (1972). Equilibrium centrifugation of nonideal systems.
Molecular weight moments for removing the effects of nonideality. Biochemistry 11,
2925–2934.
Yphantis, D. A., Correia, J. J., Johnson, M. L., and Wu, G. M. (1978). Detection
of heterogeneity in self-associating systems. In ‘‘Physical aspects of protein interactions’’ (N. Catsimpoolas, ed.), pp. 275–303. Elsevier, New York.
Zhao, H., and Beckett, D. (2008). Kinetic partitioning between alternative protein-protein
interactions controls a transcriptional switch. J. Mol. Biol. 380, 223–236.