Efficient Use of Nonequilibrium Measurement to Estimate Free Energy Differences for Molecular Systems F. MARTY YTREBERG and DANIEL M. ZUCKERMAN Center for Computational Biology and Bioinformatics, University of Pittsburgh, 200 Lothrop St., Pittsburgh, Pennsylvania 15261 Received 26 February 2004; Accepted 4 June 2004 DOI 10.1002/jcc.20103 Published online in Wiley InterScience (www.interscience.wiley.com). Abstract: A promising method for calculating free energy differences ⌬F is to generate nonequilibrium data via “fast-growth” simulations or by experiments—and then use Jarzynski’s equality. However, a difficulty with using Jarzynski’s equality is that ⌬F estimates converge very slowly and unreliably due to the nonlinear nature of the calculation—thus requiring large, costly data sets. The purpose of the work presented here is to determine the best estimate for ⌬F given a (finite) set of work values previously generated by simulation or experiment. Exploiting statistical properties of Jarzynski’s equality, we present two fully automated analyses of nonequilibrium data from a toy model, and various simulated molecular systems. Both schemes remove at least several k B T of bias from ⌬F estimates, compared to direct application of Jarzynski’s equality, for modest sized data sets (100 work values), in all tested systems. Results from one of the new methods suggest that good estimates of ⌬F can be obtained using 5– 40-fold less data than was previously possible. Extending previous work, the new results exploit the systematic behavior of bias due to finite sample size. A key innovation is better use of the more statistically reliable information available from the raw data. © 2004 Wiley Periodicals, Inc. J Comput Chem 25: 1749 –1759, 2004 Key words: free energy; Jarzynski relation; extrapolation; non-equilibrium; block averages Introduction The calculation of free energy differences ⌬F plays an essential role in many fields of physics, chemistry, and biology.1–20 Examples include determination of the solubility of small molecules, and binding affinities of ligands to proteins. Rapid and reliable estimates of ⌬F would be particularly valuable to structure-based drug design, where current approaches to virtual screening of candidate compounds rely primarily on ad hoc methods.21,22 Free energy estimates are also critical for protein engineering.23,24 The focus of this report is nonequilibrium “fast-growth” free energy calculations.17–20,25 These methods hold promise—yet to be fully realized—for very rapid estimation of ⌬F. The central idea behind the nonequilibrium methods is to calculate the irreversible work during a very rapid (thus nonequilibrium) switch between the two systems or states of interest. Multiple switches are performed, and the resulting set of work values can be used to estimate ⌬F via Jarzynski’s equality (detailed later).17,18 Recent path sampling approaches to nonequilibrium ⌬F calculations are, we note, extremely encouraging,26,27 although currently unproven for large molecular systems. Somewhat surprisingly, nonequilibrium ⌬F calculations are critical for analyzing single-molecule pulling experiments.1,8 In essence, these experiments generate nonequilibrium work values, as pointed out by Hummer and Szabo, so the only way to estimate the free energy profile is to use Jarzynski’s equality.8,28 The methods that we develop in this report should be equally useful for analyzing such experiments. It has been accepted for some time that there are three sources of error29 for nonequilibrium ⌬F calculation via molecular mechanics simulation: (1) inaccuracy of the force field,6,30 (2) inadequate sampling of the configurational space,31–35 and (3) bias due to finite sample size.18,25,36 –39 Indeed, errors in free energy calculations have been of long-standing interest.40 – 44 Correspondence to: F. M. Ytreberg; e-mail: [email protected]; D. M. Zuckerman; e-mail: [email protected] Contract/grant sponsor: Department of Environmental and Occupational Health, and the Center for Computational Biology and Bioinformatics at the University of Pittsburgh. Contract/grant sponsor: National Institutes of Health; Contract/Grant number: T32 ES007318. © 2004 Wiley Periodicals, Inc. 1750 Ytreberg and Zuckerman • Vol. 25, No. 14 Journal of Computational Chemistry • Table 1. Features of the Test Systems Used in the Current Study, as Well as the Sources of the Independent Estimates ⌬F ind. Test system/source of ⌬F ind Switching between two one-dimensional harmonic potentials in 100 -steps with 1 relaxation step at each value of . ⌬F ind is the exact ⌬F obtained numerically. Chemical potential calculation for a Lennard– Jones fluid in 1 -step. This corresponds to instantaneous switching or single-stage free energy perturbation. ⌬F ind is a slow-growth estimate.20 Growing a chloride ion by 1.0 Å in water in 10 -steps with 1 relaxation step at each value of . ⌬F ind is a slow-growth estimate. Methanol-to-ethane mutation in water using 200 -steps with 1 relaxation step at each value of . ⌬F ind is from multistage free energy perturbation.25 Palmitic-to-stearic acid mutation in water using 55 -steps with 10 relaxation steps at each value of .67 ⌬F ind is from another fast-growth data set using 55 -steps with 40 relaxation steps. W 具W典 N tot Skew ⌬F all ⌬F ind 30,000 29.5 k B T 14.4 k B T 0.52 0.7 k B T 0.7 k B T 100,000 305.1 k B T 83.5 k B T ⫺0.59 0.7 k B T 1.2 k B T 40,000 40.1 kcal/mol 8.6 kcal/mol 0.37 18.4 kcal/mol 16.9 kcal/mol 9,600 37.0 kcal/mol 12.3 kcal/mol 0.38 7.4 kcal/mol 5.3 kcal/mol 20,000 28.6 kcal/mol 7.5 kcal/mol 1.41 15.2 kcal/mol 13.6 kcal/mol The first column gives simulation details of the systems under consideration. The next three columns show statistical features of each data set: total number of work values N tot, the mean work 具W典, the standard deviation W , and the skew. The fifth column is the standard Jarzynski estimate for a data set, ⌬F all, obtained by using Equation (5) with all N tot work values. The last column is an independent measure of the free energy ⌬F ind using a non-fast-growth method, in all cases but one. The present study addresses only source (3), and attempts to determine the most efficient use of fast-growth work values. In other words, given a (finite) set of work values generated by simulation or experiment, what is the best estimate for ⌬F? We do not here attempt to prescribe the best method for generating nonequilibrium work values. Methods to lessen the effect of bias due to finite sample size have been proposed for the case when switching between two states is performed in both directions,45– 47 and for the idealized case in which the nonequilibrium work values follow a nearly Gaussian distribution.7,48 Hummer also considered errors in nonequilibrium ⌬F calculations.19 To our knowledge, however, other workers have not addressed unidirectional switching in highly non-Gaussian systems. The techniques outlined in the following sections offer rapid estimates for ⌬F for the systems we studied—namely, a onedimensional “toy” system switched between two harmonic potentials, the chemical potential for a Lennard–Jones fluid, “growing” a chloride ion in water, and the relative solubilities of methanol and ethane, as well as of palmitic and stearic acids. Work values for these systems follow highly non-Gaussian distributions; (see Table 1). We compare our extrapolated results to ⌬F obtained by using Jarzynski’s equality, finding a 5– 40-fold decrease in the the number of work values needed to estimate ⌬F for the test systems considered here. We proceed by first introducing a modified block averaging technique, building on the original proposal by Wood et al.49 Block averaging provides well-behaved, but biased, ⌬F estimates. We then discuss two distinct schemes for extrapolating to the “infinite data limit.” Both methods combine rigorous and ad hoc approaches, but are guaranteed to converge to the correct ⌬F as the size/quality of the data set increases. Our work systematizes and extends previous work by Zuckerman and Woolf,25 who originally proposed the use of extrapolation via block averages. Fast-Growth Nonequilibrium, fast-growth techniques have been described in detail elsewhere,17–19,25 so we will simply outline the method. Consider two systems defined by potential energy functions U 0 and U 1 . A typical case is when the two potentials describe distinct chemical species, as for solubility or binding affinity calculations, via an appropriate thermodynamic cycle.50 To calculate the free energy difference ⌬F between these two systems, one must “switch” the system from U 0 to U 1 , which is most simply done via a switching parameter in the expression U 共x兲 ⫽ U0 共x兲 ⫹ 关U1 共x兲 ⫺ U0 共x兲兴, (1) where x is a set of configurational coordinates, and U (x) describes the “hybrid” potential energy function for all values of from 0 to 1. We note that nonlinear scaling with is also possible,5,6,14,15,51 resulting in hybrid potentials differing from eq. (1). Calculating Free Energy Differences Figure 1. Distribution of work values for the palmitic to stearic acid mutation test system described in the Test Systems section. Also included in this plot are the estimators given by eqs. (2) and (3) shown by the green dotted and blue dot-dash lines, respectively. The dashed black line shows the ⌬F estimate obtained by using Jarzynski’s equality in eq. (5) for all available data. Note that 1.0 kcal/mol ⬇ 1.7 k B T. Our approach here also applies, in principle, to other such choices. Essentially, the idea behind fast-growth methods is to perform rapid switches from ⫽ 0 3 1, where each switch is generated starting from coordinates drawn from the equilibrium ensemble for ⫽ 0. During each switching simulation, the irreversible work is accumulated, generating a single work value W. 17,25 Multiple Figure 2. The running Jarzynski estimate, given by eq. (5), as a function of the number of work values used in the estimate, N is shown as a solid blue line. The dashed red line shows the (subsampled) block averaged free energy estimate given by eq. (9), plotted as a function of the number of work values in each block, n. Data used for these estimates were obtained from the chemical potential calculation test system. The Jarzynski estimate displays erratic convergence behavior, while the block averaged free energy estimate displays a smooth monotonically decreasing estimate. Such behavior occurs in all systems for which W ⬎ k B T. 1751 Figure 3. The subsampled block-averaged free energy ⌬F n , given by eq. (9), obtained by generating 500 subsets of 100 work values each, drawn at random from the palmitic to stearic acid data set. The red squares with errorbars show the averages of the ⌬F n estimates with standard deviations obtained by using the 500 subsets. The blue curve shows the ⌬F n data from a single representative subset of 100 work values, and ⌬F all from Table 1 is included as the dashed black line. In this plot, extrapolating to the infinite data limit corresponds to continuing the single subset ⌬F n data (blue curve) to ⫽ 0 to obtain the intercept. This plot also demonstrates that the large (small n) data are more reliable as shown by the errorbars. Figure 4. Examples of the cumulative integral, CI( ) are shown for two subsets of 100 work values drawn at random from the palmitic to stearic acid mutation test system. The first subset is represented by open symbols and the second by closed symbols. The dashed black line shows the estimate using all available data ⌬F all, the blue squares are the subsampled ⌬F n and the red triangles are the CI( ). The strength of using CI( ) for extrapolation is its explicit use of all the ⌬F n values. For each subset, the value of was chosen by a fully automated process to minimize the slope of the small- tail of CI( ), as described in the text. For the open symbols it was found that ⫽ 0.75, and for the closed symbols ⫽ 0.39. In this example, the subset represented by the open symbols slightly overestimates ⌬F all, while the subset represented by the closed symbols slightly underestimates ⌬F all. 1752 Ytreberg and Zuckerman • Vol. 25, No. 14 switches are performed to generate a distribution of these work values (W). Simple Estimators It has been appreciated for some time that the average work obtained over many such switches provides a rigorous upper bound for the free energy difference,49,52 ⌬F ⱕ 具W典 0, (2) where the 具. . .典0 represents an average over many switches starting from the equilibrium ensemble for ⫽ 0 and ending at ⫽ 1. Equality occurs only in the limit of infinitely slow switches, because the work values generated in this limit are reversible. Tighter bounds are also known.2,18,29 Further, if the distribution of work values (W) is Gaussian (this will always be the case for sufficiently slow switching,17,19,53 but also may occur in certain nonequilibrium situations7), then the first term of the high temperature expansion of Zwanzig54 gives the exact result ⌬F ⫽ 具W典 0 ⫺ 1 2  W , 2 (3) where W is the standard deviation of (W), and  ⫽ 1/k B T where T is the temperature of the system and k B is the Boltzmann constant. However, for the fast-growth work values under consideration here, the distribution of work values can be very broad and non-Gaussian (see Table 1). Thus, eqs. (2) and (3) will not provide reasonable estimates of ⌬F; (see Fig. 1). It is possible to use higher order moments to estimate ⌬F (see, for example, refs. 7, 48, 54, and 55), but these estimators are only considered useful in the near-equilibrium regime.17,19 Jarzynski Equality Due to recent work by Jarzynski,17,18,20,56 it is possible to estimate ⌬F using fast-growth work values W via e ⫺⌬F ⫽ 具e ⫺W典 0. (4) This remarkable relationship is valid for arbitrary switching speed, implying that switches may be performed as rapidly as desired. The Jarzynski equality thus provides an estimate for ⌬F for a set of N work values given by ⌬F ⬟ ⌬F Jarz ⫽ ⫺ 冋冘 册 1 1 ln  N N e⫺  W i , (5) i⫽1 where the “⬟” denotes a computational estimate. The ⌬F estimates given by eq. (5), however, are very sensitive to the distribution of work values (W). 7,18,25 If the width of the work distribution is large, i.e., W ⬎⬎ k B T (this implies a very rapid switch and/or a complex system), then often thousands, or even tens of thousands of work values are needed to reliably • Journal of Computational Chemistry estimate ⌬F. An example of this can be seen in Figure 1, where a histogram of work values is shown for the palmitic to stearic acid system (described later). The value of ⌬F all given by the Jarzynski 1 2 equality, as well as estimators 具W典 0 and 具W典 0 ⫺ 2  W , are shown on this plot. This graphically demonstrates why eqs. (2) and (3) are often poor estimates of the free energy for fast-growth work values. In principle, nonequilibrium switching can be performed at arbitrary speeds. Hendrix and Jarzynski explored varying switch speeds for a Lennard–Jones fluid.20 Their work suggested that the determining factor in the accurate calculation of ⌬F was physical CPU time spent during the calculation. Rapid switches appeared to have no advantage over doing fewer slower switches, based upon using eq. (5) for all ⌬F estimates. If the switch is performed “instantaneously,” then the work performed is simply the difference in potential functions, and eq. (4) becomes e ⫺⌬F ⫽ 具e ⫺关U1共x兲⫺U 0共x兲兴 典0 , (6) often called single-stage free energy perturbation.54,57 In this limit, the system is not allowed to relax at any intermediate values of . Instead U 1 (x) is evaluated at values of x drawn from the equilibrium ensemble for ⫽ 0. The advantage of this method is that data can be generated very quickly. However, in practice, unless there is sufficient overlap between the states described by U 0 and U 1 , the estimate of ⌬F will be biased, often by many k B T. 58,59 The problem of attaining overlap of states can be improved by drawing from the equilibrium ensemble for an unphysical “soft-core” state (such as for ⫽ 0.5).33,51 Other Methods Use of the Jarzynski equality for free energy calculations has only been described recently, and a comparison to other approaches is informative. Equilibrium approaches, for instance, are better known procedures for free energy calculations.6,9,12,15,35,60 – 62 One quasi-equilibrium approach to estimating ⌬F using fewer work values is “slow-growth”—i.e., performing the switching process more slowly, leading to a narrower (W). 38,52,53,63,64 However, slower switching speed means that more computational time will be spent to generate each work value— offsetting some of the advantage gained by doing fewer switches. If the switch is performed so slowly that the system remains near equilibrium during the switch, then the width of the distribution will be very small ( W ⬍ k B T), and thus only a few work values are required for accurate estimation of ⌬F. 38,53 Slow-growth approaches have also made use of the bounds provided by eq. (2) in several cases.52,64 Thermodynamic integration (TI) is probably the most common fully equilibrium approach. In TI, ⌬F is calculated by allowing the system to attain equilibrium at each value of . Then ⌬F is found using 冕 冓 1 ⌬F ⫽ d 0 冔 ⭸U 共x兲 . ⭸ (7) Calculating Free Energy Differences Thermodynamic integration can provide very accurate ⌬F calculations, but is also computationally expensive.6,15,35,60 Bidirectional switching in nonequilibrium calculations is also possible. When using the Jarzynski equality eq. (5), the equilibrium ensemble is generated for ⫽ 0. Then (W) is generated by doing switches from ⫽ 0 3 1 (forward switches) with configurations drawn from the ⫽ 0 ensemble. It is also possible to generate another equilibrium ensemble for ⫽ 1 and then perform reverse switches from ⫽ 1 3 0. This bidirectional nonequilibrium approach was originally used by Reinhardt and coworkers for bounds-based ⌬F estimates.52,65 It has been shown that, if one combines the use of the forward and reverse work values, accurate free energy estimates can be obtained using fewer work values than with forward switches only.45– 47,66 It has been recently demonstrated that most efficient use of forward and reverse work values is for Bennett’s method.3,45,46 There is, however, a distinct motivation for using Jarzynski estimates with only forward switches, when one considers the eventual goal of predicting relative binding affinities for application in drug design. In this situation, one need only generate a single high-quality equilibrium ensemble for a particular ligandreceptor or reference complex, echoing the strategy of van Gunsteren and coworkers.33 Then relative binding affinities can be determined for other ligands without generating another equilibrium ensemble—potentially a significant decrease in computational expense. Test Systems To show the generality of the methods proposed in this study we consider five test systems of varying complexity: a one-dimensional “toy” system switched between two harmonic potentials, a chemical potential calculation for a Lennard–Jones fluid, “growing” a chloride ion in water, methanol-to-ethane mutation in water, and stearic-to-palmitic acid mutation in water. The latter two systems are examples of components of relative solubility calculations and consist of alchemical mutations of fully solvated molecules (see refs. 25, 67 for simulation details). The second system is a chemical potential calculation done by the particle insertion method (see ref. 20 for details). Table 1 explains important features of the data sets for each of the test systems used here. Statistical features of the data sets include the total number of work values N tot, the mean work 具W典, the standard deviation W , and skew.68 These data sets are all considered difficult to use for ⌬F calculations owing to their broadness ( W Ⰷ k B T; see Fig. 1) and the fact that 具W典 ⫺ ⌬F ⬎ 10 k B T; see eqs. (2) and (3). The values for the skew demonstrate that these data sets are non-Gaussian. Table 1 also includes estimates of ⌬F. The first estimate ⌬F all is obtained by using eq. (5) with all available data, and thus represents the best estimate for a given data set using the “standard” Jarzynski approach. The second estimate ⌬F ind is an independent measure of the free energy difference, obtained by a non-fast-growth approach, in all cases but one, and is specified in Table 1 for each case. As will be seen, some of the fast-growth data sets may not be fully converged, even considering all the work values; apparently 1753 these “full” data sets still suffer from finite-sampling bias. Therefore, it is valuable to consider independent estimates ⌬F ind. Below, we compare our results to both ⌬F all and ⌬F ind from Table 1. Of the five data sets analyzed in this report, three were obtained in other studies noted in Table 1, and simulation details are not given here. For the present report, we performed simulations of a toy model and a chloride system. The one-dimensional toy data were obtained by switching the system from H 0 ( x) ⫽ 0.5x 2 to H 1 ( x) ⫽ 2.0( x ⫺ 4.0) 2. The free-energy difference for this system can be solved exactly, giving ⌬F ⫽ 0.7 k B T. Fast-growth work values were obtained by Brownian dynamics simulation with a time step of 0.001. Mass, friction coefficient and k B T values were all set to 1.0, and starting configurations for each fast-growth trajectory were generated every 3000 time steps. The growing chloride system was studied using TINKER version 4.1,69 with the simulation conditions chosen to closely match those of Lybrand et al. in ref. 60. Stochastic dynamics simulations were carried out in the canonical ensemble (constant N, V, T) in a cubic box of edge length 18.6216 Å. The temperature was held at 300 K by a Berendsen thermostat with a time constant of 0.1 ps.70 The chloride ion was modeled with Lennard–Jones parameters ⫽ 4.4463 Å and ⫽ 0.1070 kcal/mol, and was solvated by 214 SPC water molecules. Ewald summation approximated charge interactions and RATTLE was used to hold the water molecules rigid.71 For this test system, the Lennard–Jones “size” was increased by 1.0 Å, from ⫽ 4.4463 Å at ⫽ 0 to ⫽ 5.4463 Å at ⫽ 1. To obtain fast-growth work values, a time step of 1.0 fs was used. The system was equilibrated for at least 10 ps, after which starting configurations for each fast-growth trajectory were generated every 100 time steps. The slow-growth estimate ⌬F ind for the chloride system was obtained by generating fifty work values using 500 -steps with 1000 relaxation steps at each value of . The resulting distribution had W ⬇ 0.2 k B T. ⌬F ind was then generated from this distribution using eq. (5). Block Averaging Block averages are the key to the present method. The motivation for using block averages can be seen in Figure 2 (see also refs. 2 and 49). The solid blue line is a running estimate for ⌬F obtained by using eq. (5) and the dashed red line is obtained by block averaging. Both curves were obtained using the chemical potential calculation test system. The running Jarzynski estimate exhibits very poor convergence behavior, making it very difficult to establish when a reliable estimate of ⌬F has been obtained. The block averaged free energy, however, displays a monotonically decreasing ⌬F estimate, which smoothly approaches ⌬F all. Each block averaged free energy (⌬F n ) estimate was obtained from a set of N work values (W 1 , W 2 , . . . , W N ) using the steps below. The present procedure represents a slight modification of earlier schemes,12,25,49 described by the following steps. 1. Choose n different work values at random from the set, generating a subset with relabeled indices (W 1 , W 2 , . . . , W n ). Let j 1754 Ytreberg and Zuckerman • Vol. 25, No. 14 denote the index for the particular draw performed, so that j ⫽ 1 for the first subset (“block”) of n values, j ⫽ 2 for the second, and so on. 2. Use Jarzynski’s equality, (5), to obtain a free energy estimate Ᏺ n, j for this block Ᏺ n, j ⫽ ⫺ 冉 1 1 ln  n 冘 冊 e⫺  W i . i僆block j • Journal of Computational Chemistry Jarzynski estimate using all N data points—i.e., given a data set with N values, ⌬F N is approximated by ⌬F all for subsampling. This indeed makes sense since ⌬F all is the only estimate available for ⌬F N given N work values. Bootstrapping, on the other hand, always overestimates ⌬F N by comparison. Therefore, all results shown below are obtained using the subsampled estimates for ⌬F n only. (8) Extrapolation Methods 3. Repeat steps 1 and 2 until m blocks have been obtained, each containing n values. The value of m will be discussed below. Now the average ⌬F n and standard deviation n can be calculated using ⌬F n ⬟ 1 m 冘 m Ᏺ n, j ⫽ j⫽1 n2 ⬟ 1 m 1 m 冘冋 m j⫽1 冘 ⫺ 冉 1 1 ln  n 冘 i僆block j e⫺  W i 冊册 , (9) m 共Ᏺ n, j ⫺ ⌬F n兲 2, (10) j⫽1 where the “⬟” denotes a computational estimate. This process is carried out for every possible value of n (i.e. n ⫽ 1, 2, 3, . . . , N). In previous work,25,49 the value for the number of blocks was chosen to be m ⫽ N/n. The weakness of this choice is that a reshuffling of the data set gives a new (generally different) set of ⌬F n approximations. Yet the estimates of ⌬F n should be insensitive to reshuffling, as the work values have no intrinsic order. Therefore, we choose m large enough that the resulting ⌬F n estimates do not depend upon the value of m. This is typically accomplished with m ⬃ 100 ⫻ N/n. Note that our scheme uses work values drawn at random without replacement. We call this a “subsampled” ⌬F n following ref. 72. It is worth mentioning that the “bootstrap” analysis strategy provides yet another route to block averaging. For this scheme, work values are drawn from (W 1 , W 2 , . . . , W N ) at random with replacement—i.e., it is possible to draw a particular work value more than once in a given block.73 The difference between the bootstrapped and subsampled methods can be illustrated by considering a data set of N work values where N ⫺ 1 values are large, and one value is very small. Due to the highly nonlinear nature of the Jarzynski equality, the single small work value will dominate eq. (9). Suppose estimates for ⌬F n are generated for n ⫽ N using both of these methods. The subsampled method will only have one ⌬F N estimate since reshuffling the work values has no effect when n ⫽ N. However, the bootstrapped method calculates a ⌬F N estimate that is larger than the subsampled ⌬F N due to the fact that it will draw the small work value only a fraction of the time. A generalization of this argument shows that the bootstrapped ⌬F n will exceed the subsampled ⌬F n for every value of n. Although we have successfully used both bootstrapped and subsampled ⌬F n estimates (data not shown), there is a distinct advantage to using the subsampled estimates for the approaches outlined below. The subsampled estimate for n ⫽ N is just the Now that a smooth function has been obtained in the block averaged free energy ⌬F n , shown in Figure 2, extrapolation to the infinite data limit becomes feasible, as originally suggested by Zuckerman and Woolf.25 The basic idea is to plot ⌬F n as a function of some variable and then extrapolate to the infinite data limit (n 3 ⬁). It is useful to plot ⌬F n as a function of ⫽ 1/n , with ⬍ 1.0, as shown in Figure 3.25 The plot was obtained by generating 500 subsets of 100 work values each, drawn at random from the palmitic to stearic acid mutation data set. Subsampled ⌬F n estimates were then computed for each subset of 100 work values following the steps outlined later. The red squares with errorbars show the averages of the ⌬F n estimates with standard deviations obtained by using the 500 subsets. In other words, Figure 3 shows the range of ⌬F n estimates expected from a set of 100 work values. The blue curve shows the ⌬F n data from a single representative subset of 100 work values, and ⌬F all from Table 1 is included as the dashed black line. A value of ⫽ 0.5 was chosen as discussed below. Note that the smallest uncertainty occurs for ⫽ n ⫽ 1 due to the fact that ⌬F n ( ⫽ 1) is simply the average work and is not expected to vary substantially between each subset. Plotting ⌬F n estimates as a function of ⫽ 1/n as in Figure 3 (rather than vs. n) allows us to develop two simple extrapolation schemes as explained in the following sections. The infinite data limit (n 3 ⬁) now corresponds to ⫽ 0, and in addition, this simple form gives a bounded interval 僆 (0, 1], rather than an infinite one (such as ⌬F n as a function of n, shown in Fig. 2). Below, two extrapolation schemes are outlined. The second, based on cumulative integrals, appears to perform significantly better than the first method, which relies on a linear extrapolation. Linear Extrapolation Because the block-averaged free energy ⌬F n in eq. (9) decreases monotonically,2,25,29 one can hope to obtain a reasonable estimate of ⌬F by simply continuing the curve in Figure 3 with a straight line. Such a linear extrapolation (as opposed to higher order) guarantees that our extrapolated results will not exceed ⌬F n for the maximum available n, which estimates a rigorous upper bound for the true ⌬F. 29 We test this extrapolation method using the test systems described earlier. This fully automated process contains the following steps: (1) draw a subset containing N work values (W 1 , W 2 , . . . , W N ) at random from the full data set. (2) Generate the subsampled ⌬F n as a function of ⫽ 1/n for n ⬍ N for ⫽ 1/2. (There is a range of valid choices 0.3 ⬍ ⬍ 0.7 that produce similar results and are discussed below.) (3) Extrapolate to ⫽ 0 Calculating Free Energy Differences using a straight line from the small- tail of the ⌬F n data. The intercept at ⫽ 0 is our extrapolated free energy ⌬F lin. (4) Using these same N work values, calculate the Jarzynski free energy estimate ⌬F Jarz using eq. (5). This process is repeated 500 times to obtain the average and standard deviation of ⌬F lin and ⌬F Jarz. The results from this simple extrapolation scheme are guaranteed to converge to the correct ⌬F as N 3 ⬁—that is, as the size/quality of a data set is increased. This is simply due to the fact that the leftmost ⌬F n value (Fig. 3) will approach ⫽ 0, leaving less “room” for error. Of course, for N ⬍ ⬁, it cannot be shown rigorously whether an extrapolated result will approach ⌬F from above or below as N grows. Our choice of ⫽ 0.5 for the linear extrapolation method is arbitrary, yet there are several practical considerations that have been taken into account. In general, smaller values of generate estimates with greater uncertainty. This occurs because, as is decreased, the ⌬F n data in Figure 3 are confined to a very small region in the rightmost half of the figure. Thus, the extrapolation must span a larger “distance,” and very small differences in the slope of the tail will cause large differences in the extrapolated estimate. This reasoning also provides the heuristic lower limit on , since values of ⬍ 0.3 tend to “overshoot” ⌬F. On the other hand, for larger , the slope of the tail typically becomes very large, also leading to uncertain extrapolations. (We note that ⫽ 0.5 with subsampled block-averages, appears to be similar to ⬇ 0.3 for the “naive” averaging scheme pursued previously25.) In the linear method, there is some leeway in the amount of data one considers to be the small- tail of the ⌬F n data. For our results we have chosen the tail to be 1/5 of the data. We found that including more than a 1/5 of the data (say, 1/3 or 1/4) introduced bias in the estimate, and that including much less data (say, 1/10 or 1/20) resulted in the same estimate as that for 1/5 but with larger uncertainty. A simple extension of the linear method shown here, is to fit ⌬F n to a nonlinear function, such as quadratic in , as in previous work by Zuckerman and Woolf.25 These nonlinear extrapolation methods offer little, if any, improvement in the average accuracy of ⌬F extrapolations. Furthermore, due to the inherent instability of high order fits, the standard deviations for the nonlinear extrapolated results are much larger than those obtained for linear extrapolation (data not shown). Cumulative Integral Extrapolation As previously mentioned (see Fig. 3), the most precise ⌬F n values are obtained for larger ⬇ 1 (i.e., smaller n), yet the previous linear extrapolation scheme relies exclusively on small values of . Thus, in an effort to use the more precise large- data to extrapolate ⌬F, we now formulate an integration scheme, which explicitly includes all values of . Consider ⌬F n in Figure 3 to be a smooth function ⌬F n ( ), from ⫽ 0 to 1. We are free to consider the area under this function, rewritten using integration by parts, 冕 1 0 d ⌬F n共 兲 ⫽ 冕 1 0 d 共1 ⫺ 兲 d⌬F n共 兲 ⫹ ⌬F n共 ⫽ 0兲. d (11) 1755 But ⌬F n ( ⫽ 0) ⫽ ⌬F is the desired free energy, and so one can consider a cumulative integral estimate, defined by 冕 冉 1 CI共 兲 ⬅ d ⬘ ⌬F n共 ⬘兲 ⫺ 共1 ⫺ ⬘兲 冊 d⌬F n共 ⬘兲 , d⬘ (12) where CI 3 ⌬F as 3 0. We evaluate this integral for decreasing —i.e., starting from CI( ⫽ 1) ⫽ 0 and “accumulating” CI() from right to left in Figure 3. The derivative is evaluated numerically. A sample plot of the cumulative integral is shown in Figure 4. This plot was generated using two subsets (represented by open and closed symbols) of 100 work values drawn at random from the palmitic to stearic acid mutation test system. The dashed black line shows the estimate using all available data ⌬F all, the blue squares are the subsampled ⌬F n and the red triangles are CI( ). For each of the two subsets, the value of was chosen to minimize the slope of the tail of CI( ), as discussed below. For the open symbols it was found that ⫽ 0.75 and for the closed symbols ⫽ 0.39. The subset represented by the open symbols slightly overestimates ⌬F all, while the subset represented by the closed symbols slightly underestimates ⌬F all— both as a result of statistical fluctuation. To motivate our approach to choosing , consider the case where N is large enough to obtain ⌬F exactly. In this situation, if is chosen carefully, CI( ) will have nearly zero slope for small , because accumulating more values will not change the estimate. Thus, while there is no guarantee, one can hope to extrapolate ⌬F by simply finding a value of where the slope dCI( )/d is minimized for small . The extrapolated free energy ⌬F ci will be the value of CI( ) for the smallest value of available, min ⫽ 1/N . Our fully automated test of this new extrapolation method is very similar to that described in the previous section, with only minor differences: (1) the value of is not fixed, rather is automatically chosen to minimize the slope of the small- tail of CI( )—see Figure 4, and (2) once the value of is determined, the free energy is estimated by ⌬F ci ⫽ CI( min). Once repeated CI estimates ⌬F ci are obtained, comparison is made to the Jarzynski estimate ⌬F Jarz using the same procedure as in the last section. The CI approach, like the linear method, is guaranteed to yield the true ⌬F for large N. However, also as with the linear approach, the CI extrapolations are neither upper nor lower bounds. Yet, as will be seen in the next section, the CI estimates appear to decrease monotonically, suggesting a heuristic upper bound. In implementing the CI method, as for the linear method we chose the small- tail of CI( ) to be 1/5 of the data. Including more than a 1/5 of the data (1/4, 1/3) introduced bias in the estimate, and including much less data (1/10, 1/20) resulted in the same estimate as that for 1/5 but with larger uncertainty. Results The results of this initial study are very positive, as shown by the rapid convergence of our extrapolated ⌬F estimates (Fig. 5). Compared to ⌬F Jarz, estimates of ⌬F can be made with 5– 40-fold fewer work values, i.e., less computational expense. Furthermore, 1756 Ytreberg and Zuckerman • Vol. 25, No. 14 the cumulative integral (CI) extrapolations compare favorably to the independent estimates ⌬F ind, hinting that they “see” beyond the poorly-sampled fast-growth data. Figure 5 demonstrates how the linear and cumulative integral (CI) methods described above compare to the Jarzynski estimate of eq. (5), for each of the five test systems. For all of the plots shown, the dashed black horizontal line corresponds to the Jarzynski estimate using all available work values, ⌬F all from Table 1. The solid black horizontal line is an independent estimate ⌬F ind described in Table 1. The red squares are averages of ⌬F Jarz using eq. (5). The blue triangles are averages of ⌬F lin from Sec. VIA for ⫽ 0.5, and the vertical blue bars indicate the range of ⌬F lin obtained for 0.4 ⬍ ⬍ 0.6 (i.e., not statistical uncertainty), with lower estimates given by ⫽ 0.4. The green circles are averages of ⌬F ci from the previous section. The inset for each plot shows standard deviations for each of the estimates ⌬F . Averages and standard deviations were obtained by performing 500 independent trials for each estimate (⌬F Jarz, ⌬F lin , ⌬F ci) for every value of N. Thus ⌬F indicates the expected statistical uncertainty—that is the range of values one would expect if the calculation was performed de novo. A glance at Figure 5 reveals that the linearly extrapolated ⌬F lin estimates converge to the estimate ⌬F all more quickly than the Jarzynski estimate ⌬F Jarz. Remarkably, the linear estimates have similar uncertainty to the Jarzynski estimates for all the test systems. Thus, with the possible exception of the Lennard–Jones fluid test system, the linear extrapolation significantly outperforms the Jarzynski estimates. The weakness of the linear method is the arbitrary choice of ⫽ 0.5, and its reliance on the less certain (large-n) ⌬F n values as explained earlier (see Fig. 3). Thus, the linear method can never be expected to “see beyond the data” analyzed. The CI approach overcomes some of the disadvantages of the linearly extrapolated estimates. The value of is chosen by an automated process, and the CI method explicitly utilizes the more certain (small-n) ⌬F n values. It should be noted that, while the uncertainty is larger for the CI method than the linear and Jarzynski approaches in the last three test systems, these three systems are believed to have biased data. (Here we use “biased” to indicate 兩⌬F ind ⫺ ⌬F all兩 ⬎ k B T, with ⌬F ind believed to be a reliable result for the true ⌬F. The bias in these ⌬F all values can be independently verified using a self-consistency criterion (data not shown). For the first two systems, where the fast-growth data sets are known to be converged, the CI method has comparable uncertainty to the linear and Jarzynski approaches. Remarkably, the CI method appears to have the ability to partly overcome bias due to incomplete sampling of the nonequilibrium data. This can be seen by comparison to independent estimates ⌬F ind, obtained by methods other than fast-growth in four of the five cases; (see Table 1). In all five of these systems, the CI method converges to ⌬F ind while the linear and Jarzynski methods are converging to ⌬F all. This implies that the CI method may have the ability to “see beyond the data”—at least beyond the poorly sampled estimates of ⌬F n for large n. The reason that the CI method may be able to partially overcome the bias in the data is due to the fact that the CI method explicitly uses all the ⌬F n data (as discussed earlier). By design, the reliable small-n (large-) data is explicitly included. This is • Journal of Computational Chemistry important because, as the size/quality of a data set is increased, the small-n ⌬F n data will remain essentially unchanged, while the large-n (small-) tail will change significantly. Thus, the CI method is expected to be less sensitive to the bias in the data due to a small number of work values. To obtain a quantitative comparison between CI extrapolated estimates ⌬F ci and Jarzynski estimates ⌬F Jarz, we ask the following question: how many work values are necessary to obtain a ⌬F estimate that falls within approximately 2.0 kcal/mol of the independent estimate ⌬F ind? (Note that a comparison using ⌬F all likely is not valid, because ⌬F all appears to be biased for the growing chloride and methanol to ethane test systems.) Table 2 summarizes the results of this comparison. The CI estimates offer a significant improvement over the Jarzynski estimates in all of the test systems, with a 5– 40-fold decrease in the number of work values needed to estimate ⌬F ind within 2.0 kcal/mol. A comparison for the palmitic to stearic acid test system is not attempted since the independent estimate ⌬F ind is obtained from a fast-growth data set, and thus may suffer from bias. However, because the data used for ⌬F ind was obtained for a slower switching speed than ⌬F all, it is encouraging that the CI estimates overshoot ⌬F all, as it is expected that the actual value of ⌬F lies closer to ⌬F ind than ⌬F all. Conclusions We have described two methods that improve standard nonequilibrium estimates of free energy differences, ⌬F: linear extrapolation, and cumulative integral (CI) extrapolation. These approaches apply to data obtained either in experiments or computation. Both of the methods rely on block averaged free energies ⌬F n , which are extrapolated to the infinite data limit, and offer more rapid estimates of ⌬F than using the Jarzynski equality alone. Five test systems were used in this study: a one-dimensional “toy” system switched between two harmonic potentials, a chemical potential calculation for a Lennard–Jones fluid, growing a chloride ion in water, a methanol-to-ethane mutation in water, and a palmitic-to-stearic acid mutation in water. Previous work by Zuckerman and Woolf25 used a quadratic extrapolation method to estimate ⌬F. The present study offers several improvements: (1) fully automated methods; (2) two new schemes for calculating block averaged free energies, ⌬F n ; (3) a key innovation in CI extrapolation’s use of the more reliable (small-n) ⌬F n data; (4) a quantitative comparison between the CI and Jarzynski ⌬F estimates, showing a 5– 40-fold decrease in the number of work values needed for the CI estimates; (5) tests of our extrapolation methods on five systems of varying complexity. Bootstrapped and subsampled block averaged free energies are described, which are key to reliable extrapolations. These approaches to estimating ⌬F n offer very smooth convergence properties, and hence, statistically reliable extrapolation. The linearly extrapolated estimates have uncertainties similar to the “standard” Jarzynski estimates, and produce results that converge more rapidly. The success of the “simple-minded” linear extrapolation illustrates the power of the underlying idea: systematic behavior in bias can be exploited. Calculating Free Energy Differences Figure 5. A comparison between ⌬F estimates for linear extrapolation, cumulative integral (CI) extrapolation, and the Jarzynski equality for all of the test systems. The red squares are averages of Jarzynski estimates given by eq. (5), the blue triangles are averages of linearly extrapolated estimates from the Extrapolation Methods section for ⫽ 0.5, and the blue bars indicate the range of ⌬Flin for 0.4 ⬍ ⬍ 0.6 (i.e., not statistical uncertainty). The green circles are averages of CI extrapolated estimates from the Extrapolation Methods section. For each plot, the dashed black horizontal line indicates the Jarzynski estimate using all available data ⌬Fall, and the solid black horizontal line is a reliable independent estimate ⌬Find— both from Table 1. Note that ⌬Fall ⬇ ⌬Find for the first two plots. The inset in each plot shows the standard deviation for each of the estimates. Averages and standard deviations were obtained by performing 500 independent trials for each estimate for each value of N. Note that the vertical scales for each plot are different. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.] 1757 1758 Ytreberg and Zuckerman • Vol. 25, No. 14 Table 2. A Quantitative Comparison between the Cumulative Integral (CI) Estimates ⌬F ci and Jarzynski Estimates ⌬F Jarz Shown in Figure 5. Test system N ci N Jarz CI ratio One-dimensional to system Chemical potential for a Lennard–Jones fluid Growing chloride by 1.0 Å in water Methanol-to-ethane mutation in water 40 700 150 400 200 3500 6000 9600 5 5 40 24 The first column shows the test system used in the comparison. The second and third columns are the number of work values needed to obtain estimates that falls within 2.0 kcal/mol of ⌬F ind for the CI (N ci) and Jarzynski (N Jarz) estimates. The last column is the ratios of these values—i.e., the approximate improvement of the linear and CI estimates over the Jarzynski estimates. Note that the palmitic to stearic acid test system is not included since ⌬F ind is obtained from another (slower) fast-growth data set, and thus may not represent the actual ⌬F. A quantitative comparison between the CI extrapolated ⌬F estimates and those using Jarzynski’s equality show a marked decrease in the number of work values needed to estimate ⌬F when using the CI estimates. The CI extrapolation method obtains accurate ⌬F estimates using 5– 40 times less data than the traditional Jarzynski approach. It was also found that the CI method appears to have the ability to “see beyond” the statistically uncertain large-n block averages, ⌬Fn. The large n data are dominated by small work values which, typically, are poorly sampled. By relying on the full range of n, the CI results compare very well with reliable, independent estimates of ⌬F. Other similar extrapolation methods could be developed that may offer improvement over those presented here. Such methods are currently under investigation by the authors. Future work by the authors will use extrapolation methods, such as those described here, to generate ⌬F estimates for large molecular systems such as relative protein–ligand binding affinities. Software which automatically performs the analyses described here is available from the authors (www.ccbb.pitt.edu/zuckerman). Acknowledgments We gratefully acknowledge Jay Ponder for granting F.M.Y. permission to write code for TINKER to perform fast-growth free energy calculations, and to Alan Grossfield for his assistance in writing and implementing this code. We would like to thank Arun Setty for fruitful discussion and valuable suggestions. Special thanks are due to Hirsh Nanda and Thomas Woolf for permission to use the raw data for the palmitic to stearic mutation, and to David Hendrix and Chris Jarzynski for permission to use the raw data for the chemical potential calculation. References 1. Liphardt, J.; Dumont, S.; Smith, S. B.; Tinoco, I.; Bustamante, C. Science 2002, 296, 1832. • Journal of Computational Chemistry 2. Zuckerman, D. M.; Woolf, T. B. Phys Rev Lett 2002, 89, 180602. 3. Shirts, M. R.; Bair, E.; Hooker, G.; Pande, V. S. Phys Rev Lett 2003, 91, 140601. 4. Pearlman, D. A.; Kollman, P. A. J Chem Phys 1989, 90, 2460. 5. Kong, X.; Brooks, C. L. J Chem Phys 1996, 105, 2414. 6. Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. J Chem Phys 2003, 119, 5740. 7. Gore, J.; Ritort, J.; Bustamante, C. Proc Natl Acad Sci USA 2003, 100, 12564. 8. Hummer, G.; Szabo, A. Proc Natl Acad Sci USA 2001, 98, 3658. 9. Bash, P. A.; Singh, U. C.; Langridge, R.; Kollman, P. A. Science 1987, 236, 564. 10. Woods, C. J.; Essex, J. W.; King, M. A. J Phys Chem B 2003, 107, 13703. 11. McCammon, J. A. Curr Opin Struct Biol 1991, 2, 96. 12. Shobana, S.; Roux, B.; Andersen, O. S. J Phys Chem B 2000, 104, 5179. 13. Isralewitz, B.; Gao, M.; Schulten, K. Curr Opin Struct Biol 2001, 11, 224. 14. Bitetti–Putzer, R.; Yang, W.; Karplus, M. Chem Phys Lett 2003, 377, 633. 15. Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. J Phys Chem B 2003, 107, 9535. 16. Park, S.; Khalili–Araghi, F.; Tajkhorshid, E.; Schulten, K. J Chem Phys 2003, 119, 3559. 17. Jarzynski, C. Phys Rev Lett 1997, 78, 2690. 18. Jarzynski, C. Phys Rev E 1997, 56, 5018. 19. Hummer, G. J Chem Phys 2001, 114, 7330. 20. Hendrix, D. A.; Jarzynski, C. J Chem Phys 2001, 114, 5974. 21. Sotriffer, C.; Klebe, G.; Stahl, M.; Bohm, H.-J. Burger’s Medicinal Chemistry and Drug Discovery; Wiley: New York, 2003, vol 1, 6th ed, chapt 7. 22. Bajorath, J. Nat Rev Drug Discov 2002, 1, 882. 23. Lazar, G. A.; Marshall, S. A.; Plecs, J. J.; Mayo, S. L.; Desjarlais, J. R. Curr Opin Struct Biol 2003, 13, 513. 24. DeGrado, W. F.; Nilsson, B. O. Curr Opin Struct Biol 1997, 7, 455. 25. Zuckerman, D. M.; Woolf, T. B. Chem Phys Lett 2002, 351, 445. 26. Sun, S. X. J Chem Phys 2003, 118, 5769. 27. Ytreberg, F. M.; Zuckerman, D. M. J Chem Phys 2004, 120, 10876. 28. Schurr, J. M.; Fujimoto, B. S. J Chem Phys B 2003, 107, 14007. 29. Zuckerman, D. M.; Woolf, T. B. J Stat Phys 2004, 114, 1303. 30. Grossfield, A.; Ren, P.; Ponder, J. W. J Am Chem Soc 2003, 125, 1567. 31. Lu, N.; Kofke, D. A. J Chem Phys 2001, 114, 7303. 32. Lu, N.; Kofke, D. A. J Chem Phys 2001, 115, 6866. 33. Oostenbrink, C.; van Gunsteren, W. F. J Comput Chem 2003, 24, 1730. 34. Jarzynski, C. Phys Rev E 2002, 65, 046122. 35. Mordasini, T. Z.; McCammon, J. A. J Phys Chem B 2000, 104, 360. 36. Wood, R. H. J Phys Chem 1991, 95, 4838. 37. Miller, M. A.; Reinhardt, W. P. J Chem Phys 2000, 113, 7035. 38. Hu, H.; Yun, R. H.; Hermans, J Mol Sim 2002, 28, 67. 39. Schön, J. C. J Chem Phys 1996, 105, 10072. 40. Hodel, A.; Simonson, T.; Fox, R. O.; Brünger, A. T. J Phys Chem 1993, 97, 3409. 41. Pearlman, D. A., Kollman, P. A. J Chem Phys 1989, 91, 7831. 42. Di Nola, A.; Brunger, A. T. J Comput Chem 1998, 19, 1229. 43. Pearlman, D. A. J Comput Chem 1994, 15, 105. 44. Edholm, O.; Ghosh, I. Mol Sim 1993, 10, 241. 45. Lu, N.; Kofke, D. A.; Woolf, T. B. J Comput Chem 2004, 25. 46. Lu, N.; Singh, J. K.; Kofke, D. A. J Chem Phys 2003, 118, 2977. 47. Bennett, C. H. J Comput Phys 1976, 22, 245. Calculating Free Energy Differences 48. Amadei, A.; Apol, M. E. F.; DiNola, A.; Berendsen, H. J. C. J Chem Phys 1996, 104, 1560. 49. Wood, R. H.; Mühlbauer, W. C. F.; Thompson, P. T. J Phys Chem 1991, 95, 6670. 50. Tembe, B. L.; McCammon, J. A. Comput Chem 1984, 8, 281. 51. Pitera, J. W.; van Gunsteren, W. F. J Phys Chem B 2001, 105, 11264. 52. Reinhardt, W. P.; Hunter, J. E. J Chem Phys 1992, 97, 1599. 53. Hermans, J. J Phys Chem 1991, 95, 9029. 54. Zwanzig, R. W. J Chem Phys 1954, 22, 1420. 55. Hummer, G.; Szabo, A. J Chem Phys 1996, 105, 2004. 56. Crooks, G. E. Phys Rev E 2000, 61, 2361. 57. Kirkwood, J. G. J Chem Phys 1935, 3, 300. 58. Liu, H.; Mark, A. E.; van Gunsteren, W. F. J Phys Chem 1996, 100, 9485. 59. Swanson, J. M. J.; Henchman, R. H.; McCammon, J. A. Biophys J 2004, 86, 67. 60. Lybrand, T. P.; Ghosh, I.; McCammon, J. A. J Am Chem Soc 1985, 107, 7793. 61. Mark, A. E.; van Gunsteren, W. F.; Berendsen, H. J. C. J Chem Phys 1991, 94, 3808. 1759 62. Hummer, G.; Pratt, L. R.; Garcia, A. E. J Phys Chem 1996, 100, 1206. 63. Watanabe, M.; Reinhardt, W. P. Phys Rev Lett 1990, 65, 3301. 64. Jarque, C.; Tidor, B. J Phys Chem B 1997, 101, 9402. 65. Hunter, J. E.; Reinhardt, W. P.; Davis, T. F. J Chem Phys 1993, 99, 6856. 66. Frenkel, D.; Smit, B. Understanding Molecular Simulation; Academic Press: San Diego, 1996. 67. Nanda, H.; Woolf, T. B., In preparation. 68. Beyer, W. H., Ed. CRC Standard Mathematical Tables and Formulae; CRC Press: Boca Raton, 1991, 29th ed. 69. Ponder, J. W.; Richard, F. M. J Comput Chem 1987, 8, 1016. 70. Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. J Chem Phys 1984, 81, 3684. 71. Andersen, H. C. J Comput Phys 1983, 52, 24. 72. Politis, D. N.; Romano, J. P.; Wolf, M. Subsampling; Springer-Verlag: New York, 1999. 73. Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap; Chapman and Hall: New York, 1993.
© Copyright 2026 Paperzz