Using Bayes' Theorem for Free Energy Calculations

UNIVERSITY OF CINCINNATI
Date: 25-Aug-2009
I, David M. Rogers, hereby submit this original work as part of the requirements for the degree of:
Doctor of Philosophy
in
Chemistry
It is entitled:
Using Bayes' Theorem for Free Energy Calculations
Student Signature: David M. Rogers
This work and its defense approved by:
Committee Chair: Thomas Beck, PhD
Bruce Ault, PhD
H Brian Halsall, PhD
George Stan, PhD
8/26/2009
Using Bayes’ Theorem for Free Energy
Calculations
A dissertation submitted to the
Division of Research and Advanced Studies
Of the University of Cincinnati
in partial fulfillment of the
requirements for the degree of
DOCTORATE OF PHILOSOPHY (Ph.D.)
In the Department of Chemistry
Of the College of Arts and Sciences
by
David M. Rogers
B.S. Chemistry, University of Cincinnati, 2004
Minor Mathematics, University of Cincinnati, 2004
August, 2009
Committee Chair: Thomas L. Beck, Ph.D.
Abstract
Statistical mechanics is fundamentally based on calculating the probabilities of molecular-scale events. Although Bayes' theorem has generally been recognized as providing key guiding principles for the setup and analysis of statistical experiments [83], classical frequentist models still predominate in the world of computational experimentation. As a starting point for widespread application of Bayesian methods in statistical mechanics, we investigate the central quantity of free energies from this perspective. This dissertation thus reviews the basics of Bayes' view of probability theory and the maximum entropy formulation of statistical mechanics before providing examples of its application to several advanced research areas. We first apply Bayes' theorem to a multinomial counting problem in order to determine inner shell and hard sphere solvation free energy components of Quasi-Chemical Theory [140]. We proceed to consider the general problem of free energy calculations from samples of interaction energy distributions. From there, we turn to spline-based estimation of the potential of mean force [142], and empirical modeling of observed dynamics using integrator matching. The results of this research are expected to advance the state of the art in coarse-graining methods, as they allow a systematic connection from high-resolution (atomic) to low-resolution (coarse) structure and dynamics. In total, our work on these problems constitutes a critical starting point for further application of Bayes' theorem in all areas of statistical mechanics. It is hoped that the understanding so gained will allow for improvements in comparisons between theory and experiment.
Using Bayes’ Theorem for Free Energy Calculations
Author: David M. Rogers
Dr. Thomas L. Beck Research Group, University of Cincinnati
© 2009 David M. Rogers. All Rights Reserved.
Acknowledgments
From the beginning of my work on this dissertation, I have been grateful to my colleagues who
continue to share in the painstaking process of discovering and transferring new knowledge. Senior
graduate students like Jason Clohecy and Nobunaka Matsuno set the norm and let me know that I
could get through it alive. It’s a good feeling to be a part of the graduate school community, whose
membership is too large to mention here.
My family has been the greatest source of encouragement, from my nearest of kin in Cincinnati
all the way to my Uncles, Aunts, Grandparents, & Co. in Toledo, Cleveland, Michigan and New
Jersey. I would especially like to thank my wife, Melanie Lynn, and two sons, James Michael and
Charles Ralph for their constant love and attention, my parents (and anyone who has ever had to
clean up after me, sorry) for investing in my formative years, and my brother, Daniel Allen, for
various shenanigans not to appear in this work.
It took a lot of self-discipline and prayer to get through the tight spots, and it often seemed I was
getting nowhere. But it’s strangely comforting to know that most of the biggest questions do not
have an answer. Academically, a great test of where we’re at is to try and answer God’s challenge,
“Who is this who darkens counsel by words without knowledge? Now prepare yourself like a man;
I will question you, and you shall answer Me. Where were you when I laid the foundations of
the earth? ...” (Job 38:2-4,NKJV). Although I have managed to include a decent amount of new
insight in this dissertation, on such questions I must still agree with Newton when he says that
"Gravity explains the motions of the planets, but it cannot explain who set the planets in motion. God governs all things and knows all that is or can be done."
I am without a doubt indebted to my advisor and Ph.D. committee (including Matthew D. Wortman), whose names have now become inseparable from the rest of this document, the remainder
of my academic career, and not least the list of people I can trust to be responsible and call on for
help and support in the future. I could not have made it this far without their useful additions and
well-tempered advice on how to make sense out of all my crazy ideas.
Twenty-five references cited in this dissertation were published before I was born, attesting to
the fact that we would collectively be nowhere without the teachers that have gone before us. We
put far too low a priority in this country on our children’s education system, and I express my
gratitude for the dedication of my early and grade-school teachers and their families. I’d also like to
acknowledge all the chemistry faculty here at the University of Cincinnati for the positive influence
they had on my undergraduate career, especially Professors Apryll M. Stalcup and Thomas L. Beck
for encouraging me to move on to graduate school.
Finally, I am grateful for the generous financial support from our University of Cincinnati Department of Chemistry, for which all its faculty and staff, past and present, deserve credit, as well as our industrial affiliates program and the DOE Computational Science Graduate Fellowship (DE-FG02-97ER25308). The focus of the United States and her citizens on advanced science is second
to none, and has enabled us to lead the world in scientific and technological innovation.
Contents

Front Matter
    Table of Contents

1 Introduction
    1.1 Overview of Current Problems in Biophysical Simulations
    1.2 Approximations in Computational Physics
    1.3 Subjective Probability in Statistical Mechanics
    1.4 Problem Statement
    1.5 Material Covered

2 Information Theory and Statistical Mechanics
    2.1 Maximum Entropy Example: Quasi-Harmonic Analysis
    2.2 Maximum Entropy Example: Volume Partitioning

3 Information Theory Perspective on Stochastic Dynamics
    3.1 Thermodynamics of Coarse-Grained Systems
        3.1.1 Connection to Bead-Based Coarse-Graining
    3.2 Integration Schemes
    3.3 Continuous Time Integration
    3.4 Discrete Time Integration
        3.4.1 Problem Formulation
        3.4.2 Problem Solution
    3.5 Sources of Modeling Error in Choosing Integrator Parameters

4 Free Energy Inference for a Multinomial Counting Problem
    4.1 Quasi-Chemical Theory Division
    4.2 Formulation of the Inference Problem
    4.3 Solution of the Inference Problem
    4.4 Re-Weighting Formulas
    4.5 Computing Conditional Averages via Re-Weighting
    4.6 Application to Polarizable Ion Solvation

5 Inference on the PMF using Spline-Based Approximations
    5.1 Problem Formulation
    5.2 Solution for One-Dimensional Systems
    5.3 Resolution Dependence of P-Spline Estimation
    5.4 Multi-dimensional generalization
    5.5 Deciding Simulation Parameters
    5.6 Molecular Simulation Examples

6 Closing Remarks

Bibliography

A Probability Theory
List of Figures

3.1 Time evolution of a stochastic oscillator.
3.2 Phase diagram for a 2-step Markov implementation of the GLE.
4.1 Methane to SPC water energy distribution.
4.2 Division of the minimum solute–solvent distance ($r_{min}$) into successive shells.
4.3 $\mu^{ex}_{OS,HS}$ profile at $S_{cen}$ of 1ots.
4.4 Effect of conditioning on the interaction energy distribution.
4.5 Contributions to the OS-LR conditional free energy.
4.6 Effect of size and polarizability on the solvation shell organization.
5.1 Overfitting using generalized linear least squares.
5.2 Posterior average penalty parameter showing variation with respect to total problem scale.
5.3 Spline smoothing equivalent convolution kernels.
5.4 Validation of the force matching for common potential energy functions.
5.5 Effect of sample size on average error of the fitted functions.
5.6 Effect of sample size on distribution of observed distances.
5.7 Comparison of internal PDFs between all-atom and united-atom force-matched octane.
List of Tables

4.1 Partial molar hydration free energies for whole salts.
4.2 Comparison of local solvation environment indicators.
5.1 List of spline parameters in the united atom octane model.
Chapter 1
Introduction
1.1 Overview of Current Problems in Biophysical Simulations
A central challenge of theoretical chemistry is connecting atomistic behavior with the observable
properties of matter. The detailed quantum-mechanical understanding of small molecules reached
decades ago is increasingly finding application on larger length and time scales. With each increase
in problem size, the complexity and detail of our descriptive models increase to match, necessitating
the ubiquitous appearance of computational modeling in the field. This expansion has carried over
into the computational methods used to model these systems as well, as evidenced by the steady
progression from atomistic quantum-mechanical to force field-based energy computations.
Accordingly, some of the most important current research frontiers are improving computational methods for ligand/receptor binding, protein folding, and the large-scale organization of nanomaterials, and creating accurate microscopic solvation models. These have already shown their utility [87] and indeed have even greater potential for positively impacting overall human health and productivity.
One of the most direct applications of computational chemistry is in drug docking studies. A
simple calculation of the energy released when a potential ligand (drug) molecule binds (or docks) to
a specific receptor protein identified by biologists can be used to identify lead drug candidates. Such
computational modeling has become very important to the pharmaceutical industry [87]. A review
of lead optimization companies [36] reveals widespread use of computational modeling technology.
It has proven successful in (amongst others) the design of HIV protease inhibitors [179]. The most
widely identified area for improvement in this field is the development of faster and more accurate
modeling techniques.
However, in order to accurately determine the thermodynamic driving forces (i.e. free energies)
for ligand association governing the activity of drug molecules, such energy calculations must be
carried out for a large number of sampled configurations. This presents a sampling problem because
an accurate energy function requires inclusion of very many chemical details – making it more
computationally expensive. A manifestation of these problems can be seen in association reaction
modeling involving large degrees of freedom (e.g. highly flexible ligands, receptors or nearby water
molecules). A tradeoff must therefore be found between cost and accuracy. Efficient sampling for
these systems presents significant challenges for simulation design [96].
Apart from docking studies, another mainstay of computational chemistry comprises studies explaining and predicting the great wealth of laboratory phenomena – such as analyte partitioning
in chromatographic separations, chemical reactions in solution [86, 97, 98, 49, 69], single-molecule
pulling experiments [74, 41, 113], enzyme function [5], acid-base equilibria [156, 34], phase equilibria [151], etc. Core chemical concepts such as molecular electronic structure, hydrogen bonding,
hydrophobic effects, solution structure, acid-base equilibria, and derived descriptive models provide
the necessary framework for interpreting difficult results and designing increasingly more productive (and complicated) experiments and laboratory instruments. A continuing challenge is to create
descriptive models which can help elucidate the application-specific interplay between these effects.
As is the case for drug design, improvements in energy function accuracy and sampling efficiency are continually adding to our fundamental understanding of these ideas. Theoretical investigations are distinguished from experiment in this respect because they are limited to systems which are sufficiently small. In terms of simulating atomistic dynamics, ab-initio approaches (based on solving the Schrödinger equation) are limited to simulations extending to tens of picoseconds with a
few thousand atoms, force field based molecular dynamics can extend this to nanoseconds with a few
hundred thousand atoms (at the common level of approximation), and coarse grained simulations
may provide another few orders of magnitude increase for both [122]. The efficiency of these last
two methods depends, in a large part, on innovations in calculation methodology. Excellent sources
of information on current progress in computational modeling are the Reviews in Computational
Chemistry [109] and Annual Reports in Computational Chemistry [158] book series.
To begin investigation into these problems, this dissertation will consider a novel scheme for
calculating solvation free energies. A free energy is (proportional to) the logarithm of the likelihood
ratio between two states.
$$A \rightleftharpoons AB, \qquad \beta \Delta F_{A \to AB} = -\ln \frac{[AB]}{[A]} \tag{1.1}$$
When we make the particular choice of A and AB as (respectively) systems without and with interactions between a solution phase and a solute, $\Delta F_{A \to AB}$ is a solvation free energy. Because we
can define the solvent environment as anything we like, differences in solvation free energies for
different environments can give likelihood ratios between any two end-points in a reaction.
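To make Eq. 1.1 concrete, a free energy estimate can be read off directly from occupation counts in a long simulation. The following is a minimal sketch; the trajectory labels and the 3:1 occupation ratio are invented for illustration, not taken from any system studied here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trajectory labels: True when the system is found in state AB,
# False when it is found in state A.  A real calculation would classify
# sampled configurations; here the true AB:A odds are fixed at 3:1.
in_AB = rng.random(100_000) < 0.75

n_AB = in_AB.sum()
n_A = in_AB.size - n_AB

# Eq. 1.1: beta * dF(A -> AB) = -ln([AB]/[A]), estimated from counts.
beta_dF = -np.log(n_AB / n_A)
print(f"estimated beta*dF = {beta_dF:.3f}, exact = {-np.log(3.0):.3f}")
```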
The primary solvation problem studied in this dissertation is ion solvation in water. Because
the basic mathematical form of almost all commonly used force-field models was derived from gas
phase data based on the assumption of pairwise-additive interactions, one may expect some differences between the model and system energies in the condensed phase. As averages over large
populations of states, solvation free energies are experimentally accessible and are thus useful for
judging force-field accuracy. These differences have proven particularly troublesome for ion solvation in water because excluded volume, hydrogen bonding, long-ranged water reorganization
(dielectric response), and possibly even local charge transfer effects are difficult to quantify and
untangle. The difficulty comes from the fact that high-level ab-initio energies are not, in principle,
decomposable into separate additive terms.
As an alternative, the present work will employ quasi-chemical theory (QCT) in order to separate the contributions arising from alternate competing interactions. In contrast to assuming a direct
decomposition of the energy function, QCT decomposes the solvation process into a series of physical steps. The results can then be used to understand the origin of the differences between models in terms of free energy differences between chemically meaningful systems. These systems are (assuming the solvent is water): (0) separate solute and solvent, (1) an n-water cluster in solution with definite shape, (2) the solute bound to the cluster in gas phase, (3) the solute–water cluster in solution, and (4) the solute itself. In the case of n = 0, the (0→1) step is the creation of excluded volume, or the solvation of a so-called "hard-sphere" shell with free energy $\mu^{ex}_{OS,HS}$; the (1+2→3) step's free energy is termed outer-shell long-ranged, $\mu^{ex}_{OS,LR}$; and the (3→4) step is the removal of the solvent shell restriction, termed $\mu^{ex}_{IS}$ because the inner shell is allowed to form.
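Since the steps (0→1), (1+2→3), and (3→4) compose the full solvation process in the n = 0 case just described, their free energies must add up to the total excess chemical potential. Schematically (a bookkeeping sketch; the careful statement is deferred to Ch. 4):

$$\mu^{ex} \;=\; \mu^{ex}_{OS,HS} \;+\; \mu^{ex}_{OS,LR} \;+\; \mu^{ex}_{IS}$$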
At the present time, conventional atomistic molecular dynamics simulations have reached scaling
limits in system size, parallel processing speed, and integration timestep size. In typical biological
conditions (water at 150 mM) the Bjerrum length is 0.7 nm, and the Debye screening length is 0.8
nm [145]. Practically, this means that screened electrostatic forces persist up to several nanometers.
Within this range, interactions between all pairs of atoms must be considered, and the most efficient
scheme has been to compute an electrostatic field using the particle mesh Ewald formalism, which
scales as $O(N \log N)$ with the number of atoms, $N$, due to its use of a fast Fourier transform (FFT).
For very large systems, parallelization is the only way to carry out dynamics in a reasonable amount
of time. However the communication requirements of the FFT (hypercube connection topologies)
limit the parallel scaling even on current generation massively parallel supercomputers. The practical result is that reasonable scaling cannot be achieved above several hundred thousand atoms [7]
– below requirements for very large biological or industrial systems. Finally, dynamics algorithms
employ discrete timesteps in order to integrate the equations of motion. The largest time step possible is limited by the fastest motions in the system and sophisticated attempts at increasing this time
step have been frustrated by energy drift and resonance problems [146].
In order to sustain the progress of molecular modeling to even larger simulations, coarse-graining offers the possibility of reducing the model complexity while retaining its essential physics.
Several methods have grown up around this observation [122]. The most frequently proposed first
step away from atomistic simulations is to use a grouping of bonded atoms into coarse-grained sites. These sites can be treated much like their atomic predecessors in terms of force-field-based energy and dynamics calculations [59, 123] or given internal equations of state, geometrical
structure, and multipole moments.
Because the essential physics and chemistry of each system under consideration can vary widely,
creating a general prescription for coarse-graining has proven to be difficult. As expected from the
structure-function relationship of proteins, the nano-scale structures of these biologically interesting
systems have shown clear associations to their particular functions [66]. Creating a link between (in
principle known) electronic structures and purely large-scale models thus requires a method which
is able to extract such unique features. This has prompted the present investigations using existing
modeling techniques at very high levels of theory (and computational complexity) to parametrize
system-specific coarse models. The result has been a parametrization method which makes possible
the treatment of atomic systems in general.
1.2 Approximations in Computational Physics
In attempting to collect our ideas about computational modeling, it is helpful to start with an overarching framework. The idea is to represent the “chemistry” of any system (e.g. stuff in brown bottles,
steam flowing through reaction tubes, geological strata, vesicular membranes, etc.) as a function of
its composition and all the other parameters we know about it. This naturally leads us to the statistical mechanical (i.e. averaging over lots of possibilities) framework, in which we represent
this knowledge in terms of a probability distribution function (PDF) for everything going on in the
system, given the information we know.
Such a formalism uses the known constraints (temperature, pressure, composition, reactions
that can occur, spatial constraints, conformational information about molecules, boundary flux conditions, etc.) as input and generates predictions in the form of marginal PDFs. This process simply
follows the mathematics of probability, averaging over everything save what the experiment reads as
output. If we have the 'correct' model – i.e. the equation for $P(x|\ldots)$ – then the experiment and theory should not deviate. In practice, we can only approximate (or at worst guess) the form of $P(x|\ldots)$, and hence such marginal probabilities as the density of states

$$P(E|\ldots) = \int P(x|\ldots)\, \delta\big(E(x) - E\big)\, dx,$$

due to constraints on what can be calculated.
Some consideration of the subject material of standard physics [65] shows that for most properties, a few simple parameters should be sufficient to describe the essential physics under measurement in most experiments. Capillary action and the shapes of liquid interfaces can be described
based solely on surface tension [45]. The mutual attraction of two nanoparticles far away in solution
can be described by electrostatics and Hamaker or Lifschitz theory [112, 120, 131], random thermal
motion by a Langevin or Fokker-Planck diffusion equation [45, 166], and so on. Viewed in this
way, the well-known laws of physics exist as a collection of various levels of approximation to a
universal dynamical equation, the details of which are still being worked out [42]. This dissertation
will therefore use approximations as necessary in order to find the most useful descriptions of the
respective problems encountered.
1.3 Subjective Probability in Statistical Mechanics
There have, to date, been many examples of using Bayes’ theorem for framing statistical mechanical
questions. Perhaps the most widely known is in the gradual shift in the conceptualization of an
"ensemble". Early ideas, associated with the names of Maxwell, Boltzmann, and others, were based
on physically realizable systems with many weakly interacting particles, i.e. gases. The theory was
simply that an examination of all the particles at a single instant revealed the statistical properties
of the ensemble. Gibbs [55] adapted the concept to systems which may contain strong internal
interactions, e.g. solids or condensed phases, by imagining the ensemble as an infinite number
of physical replicas of the system. It was immediately clear that for developing such a formal
treatment of the probability distribution over phase and its consequences ‘hypotheses concerning
the constitution of matter’ would not be required except in working out special cases.
The gradual nature of the shift toward a subjective interpretation was, in large part, the result of a dispute between Gibbs and his contemporaries [44], who viewed the physical reason for
the weak coupling between ensembles which brought about equilibrium as paramount. Even as
Schrödinger [148] presents a maximum entropy derivation of the canonical ensemble similar to that
of Chapter 2, he still found it necessary to find a middle-ground by considering such distractions
as the physical realizability of infinite heat baths. The work of Jaynes [80, 84, 83] and others [58]
went a great deal toward clarifying the situation, making a distinction between the “delusion that
an ensemble describes an ‘objectively real’ physical situation” [84] and the subjective question of
determining the “agreement between the premises and the conclusions.” [55] However, the philosophical debate over objective vs. subjective interpretations continues to date [166].
Having decided, then, that we should represent an ensemble as an assignment of probabilities
to a particular system based on our state of knowledge, what further advances have come through
the subjective route? First, the utility of the maximum entropy formalism for setting up diverse
problems cannot be overstated [107]. Next, many uses of subjective probability for calculating system properties, in particular free energies, have also emerged. Two types of applications can
be distinguished. The first set, usually described as ‘information theory’ methods [80, 72, 57],
operate by directly applying the maximum entropy formalism to analytical or very precise numerical
models and are capable of directly generating entropy and free energy estimates. The second set,
better described as ‘Bayesian inference’ methods, work at the level of interpreting the results of
simulations. These use the simulation data as observations in the traditional Bayes manner rather
than as fixed constraints.
At this point, a distinction should be made between applying the maximum entropy formalism
and Bayes’ theorem. In the language of E. T. Jaynes, the maximum entropy formalism allows us
to assign probabilities to a definite hypothesis space during the exploratory phase of a problem.
Once the problem has enough structure to determine $P(\{x\}|\Delta F, I)$, then Bayes' theorem can be applied to find $P(\Delta F|\{x\}, I)$ via inference. Applications of maximum entropy methods are already
well known in statistical mechanics [107], and are thus foundational to the free energy inference
problems considered in this dissertation.
Focusing on the problem of Bayesian inference for free energies, examples can be found in the
reformulation of the weighted-histogram analysis method [53], inference on kinetic data [70, 160,
130], and several estimation methods for end-point free energy differences using work measurements [133, 113]. All of these methods share the theme of treating the simulation observations as
experimental data and are thus notable for their error estimation.
It is important to note that this re-use of simulation observations constitutes a new and different
approach to using subjective probability ideas in statistical mechanics. The traditional use was, as
noted above, in assignment of probabilities to an ensemble and used entropy maximization conditional on known constraints, possibly utilizing Bayes’ theorem to calculate marginal probabilities of
interesting quantities. Straightforward applications of this method gave unambiguous answers for
the probability distributions (and hence the free energies) of these systems, albeit in a form that was
almost never directly amenable to computation. This new use of simulation data makes up a second
layer of inference on the system properties, thus producing a PDF for the free energies. It is ideally
suited to computational investigation, while at the same time recognizing the limitations of current
(finite) sampling methods. This level of indirection is the reason for stating we are generally interested in free energy inference, since it is from these we can infer the distribution of any particular
system property.
Note that I said infer, not directly calculate. Performing inference on collected simulation data
discards information from the complete description of the problem. It creates a fundamentally new
problem statement – namely “What is the probability of event X given the data I have sampled
from the proper ensemble?” The answer must now contain additional uncertainty due to this last
inference, but is, for many problems, the only known way of doing the original integration.
1.4 Problem Statement
This dissertation aims to improve current understanding and computational methods for robust calculations of free energies. Free energies are intimately connected with marginal probabilities for
system behavior, and ubiquitously appear as integrating factors in calculations of system properties. Unfortunately, analytical solutions to the integral equations defining these quantities cannot be
obtained for most systems, and we are forced to resort to numerical estimation. Bayes’ theorem appears as the natural choice for such estimation because it allows us to quantify both the free energy
and its error. In effect, this gives a probability distribution for a probability. Conceptual difficulties such as the ill-posedness of $P(\{x\}|\Delta F, I)$ have caused most current error estimations to resort to the traditional frequentist approach (e.g. block-averaging). However, frequentist error estimates may be off by several orders of magnitude (2 kcal/mol in the case of 3-methylindole solvation considered in Ref. [153]). No such formal difficulty exists when using Bayes' theorem [83].
The first goal of the present work is to apply the Bayesian inference method to the important
problem of solvation free energy determination. In the language developed above, this involves
inference on the likelihood that a solute molecule will occupy a position in solution rather than in
isolation. This likelihood is composed of an average over the instantaneous “coupling” likelihoods
for all solvent configurations, and its logarithm is the solvation free energy. In order to gain physical
insight into the energy functions currently used to model this process, (and also to evaluate the free
energy more accurately) we will in fact consider a good deal more than this, composing the solvation
process as a series of steps and calculating other interesting system properties along the way.
A more ambitious goal is to seek a connection between alternate levels of system resolution
by using inference to create a stochastic Hamiltonian. To put this concretely, we let the event X
represent a particular transition in time between two ‘coarse states’ of the system. If we purposefully
discard information about such fast-equilibrating coordinates as bond lengths and angles, specific
locations of hydrogens, etc., and use only the slower moving variables to define a coarse state, the
transition probabilities between coarse states will provide a complete description of the stochastic
dynamics, but will no longer be delta functions as in exact dynamics.
Sampling errors and computability problems are a recurring theme in statistical mechanics investigations. The present work will therefore also undertake to provide tractable computational
procedures. Three aspects of this problem are speed of the computation itself, approximations employed, and consistency of the estimation with respect to the amount of available input (simulation)
data. The first two are usually considered together in the initial development of algorithms, and are
usually confined to sample generation – which always has the option of generating less data. If any
approximations are made during this phase, they should rigorously be included in the prior information for the ensemble under consideration. Because of the confinement to the sampling phase and
the many existing sampling methods described, this work is primarily concerned with the last type
of error.
1.5 Material Covered
The present chapter has examined the motivation and background for the use of Bayes' theorem in statistical mechanical thinking and outlined a series of research questions: (1) interrogating the chemistry of molecular solvation and the determinants of solution concentration through the application of QCT, (2) finding rigorous connections between system descriptions at alternate resolution levels by applying the machinery of statistical mechanics, resulting in coarse systems with stochastic Hamiltonians [81, 119], and (3) providing tractable computational procedures for calculating important quantities which are robust with respect to the amount of available simulation data. The remainder of this dissertation will address each question in turn.
First, the equations and terminology used throughout the dissertation will be related in Chapter 2.
A short introduction to Bayesian probability notation is included in an appendix for the unfamiliar
reader. The maximum entropy formulation of statistical mechanics [80, 81] is developed using
simple arguments based on conditional probability and repeated experiments. Some examples of
maximum entropy inference in the statistical mechanical literature are reviewed.
Chapter 3 considers generalized coarse-grained ensembles and their associated stochastic dynamics based on the traditional NVT ensemble of atomistic simulations. Application of information
theory ideas to this coarse-graining process yields some new and non-trivial insight relevant to
current investigations into time-dependent behavior [128, 24] and quantitative measurement of the
consistency between coarse and fine results [121] when using the force matching method. In particular, the force-matching method is shown to be a special case of the decision problem of minimizing
information loss during the process of discarding fine-scale information.
Next (Ch. 4), the particular problem of determining solvation properties for hard sphere solutes in arbitrary solutions from molecular dynamics simulations on Weeks-Chandler-Andersen (WCA) particles and its solution using Bayesian inference is described. This work is non-trivial, since it provides an efficient method for computing hard sphere solvation free energies required by QCT [19]. Further algorithmic developments based on the same re-weighting also permit the facile determination of other important quantities such as $\mu^{ex}_{OS,LR}$ and its components. An example of this analysis is carried out completely for several model anions using the AMOEBA [137] polarizable force field model.
The problem of inferring a coarse-grained Hamiltonian from observed dynamics is fully developed and solved in Chapter 5. Using the Langevin equation as an integrator model and carefully
considering the choice of prior probability for the coarse Hamiltonian, an application of the ideas of
Ch. 3 leads to a multidimensional generalization of the one-dimensional Bayesian penalized spline
method described elsewhere [104]. Key aspects of the algorithm formulation and software design
are reviewed [139]. This method is applied to several test cases which show that the present formulation has several advantageous properties compared to other methods in the literature. Application
to molecular systems demonstrates the accuracy and robustness of the procedure.
The final chapter (6) reconsiders the goals of this original research – listing accomplishments
and noting further areas of study. In brief, significant algorithmic developments have been made in
both QCT and coarse-graining. The use of Bayes’ theorem for QCT computations has provided a
formulation which is algorithmically simple and easy to implement using existing simulation packages. Our results using QCT have shown that it can be very useful in restricted and inhomogeneous
environments, but needs more work to be applied to large solutes. The Bayesian formulation for
inferring coarse-scale stochastic Hamiltonians leads naturally to the force matching procedure, and has
shown great promise as a method for generating coarse-grained models for arbitrary systems. The
implementation described is currently lacking only in its algorithmic efficiency due to its proof-of-concept status. There are a great many possible applications for this fully developed method which
could not be carried out in this dissertation due to space and time limitations. Finally, formulation
and testing of computationally tractable calculation procedures have been considered throughout.
These have formed a set of statistical mechanical analysis methods which give useful insight into
many of the current problems in computational physics and which are robust with respect to the
amount of available simulation data.
Chapter 2
Information Theory and Statistical Mechanics
As we will see in this chapter, the maximum entropy (MAXENT) formulation of statistical mechanics is an information theory approach and as such is intimately related to Bayesian statistics. This
connection comes about as a consequence of attempting to find a probability distribution which
exhibits some desired behavior conforming to known prior information, but which is in all other
respects completely random. The end result is a theory which is able to make inferences on any
other quantity in the model conditional on this prior information.
From this perspective, not only is it possible to derive the (NVE, NVT, etc.) ensembles commonly used in statistical mechanics, but it is also possible to derive models incorporating molecular
level information in the same way [28]. Relatively little attention has been given to this “reduced
information” aspect of statistical mechanics since the seminal work of Jaynes. This chapter will
give an overview and address some of the issues that surface. The end result will be a few examples
illustrating why incorporating molecular level information has been difficult. These examples will
give new insights and ideas on how to proceed.
To define MAXENT mathematically, consider a game of chance wherein $M$ possible outcomes $\{x_i\}_1^M$ exist, each with a corresponding payoff of $E_i$. In order to analyze the game, we make an initial (zero-th order) estimate of the probabilities of each outcome, call them $\{p_i^0\}_1^M$. Now we can start to calculate things like the expected payoff, $\langle E \rangle_0 = \sum_{i=1}^M E_i p_i^0$. Of course, in order to make any sort of real strategy, what we really want to know is the distribution of payoffs over $N$ turns. To get this sort of information, let's figure out the likelihood of playing the game $N$ times and observing outcome frequencies of $\{f_i\}_1^M$ (i.e. hitting each $x_i$ just about $N f_i$ times). You will notice that this probability is a multinomial distribution [67].
$$\ln P\big(\{N f_i\}_1^M \,\big|\, N p^0\big) = \ln N! - \sum_{i=1}^M \ln (N f_i)! + \sum_{i=1}^M N f_i \ln p_i^0 \tag{2.1}$$
Using Stirling's formula¹

$$N! = \Gamma(N + 1) \approx \sqrt{2\pi N}\, (N/e)^N, \quad N \geq 100, \tag{2.2}$$
we can see that before long the likelihood of observing a given frequency distribution decreases as

$$\ln P\big(\{f_i\}_1^M \,\big|\, N p^0\big) \approx N \sum_{i=1}^M f_i \ln(p_i^0 / f_i) + \text{const.} \tag{2.3}$$
As we would expect, the maximum of the PDF is peaked around the zero-th order probabilities, and it measures something we will call $H$, the "information entropy" of $f \equiv \{f_i\}_1^M$.²

$$\lim_{N\to\infty} P(f \,|\, N p^0)^{1/N} \equiv e^{H(f|p^0)} \tag{2.4}$$

$$H(f|p^0) = \sum_{i=1}^M f_i \ln(p_i^0 / f_i) \tag{2.5}$$
¹ By $N = 100$, the error has decreased to about 1%. As an aside, another commonly used approximation to Stirling's formula is $(N/e)^N$ [148]. Comparing to Eq. 2.2 in the asymptotic limit shows that this formula has an increasing error on the order of $\sqrt{N}$.

² This is shifted by a constant from the standard definition of entropy (and even information entropy) since it can be shown to always be negative. This point will be discussed in detail, but for now it is interesting to note that Jaynes [83] remarked that expressing $H$ as just such an entropy difference would be more satisfactory.
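The limit in Eq. 2.4 is easy to check numerically. The sketch below (a hypothetical three-outcome game; scipy is used only for $\ln N!$) evaluates Eq. 2.1 exactly and shows $(1/N)\ln P$ approaching $H(f|p^0)$ from Eq. 2.5:

```python
import numpy as np
from scipy.special import gammaln  # gammaln(n + 1) = ln n!

p0 = np.array([0.5, 0.3, 0.2])  # zero-th order outcome probabilities
f = np.array([0.4, 0.4, 0.2])   # observed frequency distribution

def ln_P(N):
    """Eq. 2.1 with counts n_i = N f_i (chosen here so they are integers)."""
    n = N * f
    return gammaln(N + 1) - gammaln(n + 1).sum() + (n * np.log(p0)).sum()

H = (f * np.log(p0 / f)).sum()  # Eq. 2.5
for N in (100, 1_000, 10_000, 100_000):
    print(f"N = {N:>6d}   (1/N) ln P = {ln_P(N) / N:+.5f}   H = {H:+.5f}")
```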
Just like games of chance always have some subtleties to them that are not apparent on a first
analysis, much scientific progress has been made by tabulating frequencies (going back to the works
of Pascal, Fermat, Huygens, De Moivre, Halley, and others [162]). Accordingly, discrepancies
indicate that something has been left unexplained, since it doesn’t fit the usual expectations. This
is, in fact, the basis for using H as an information measure. For the present example, suppose we
noticed that the game's outcome was largely the result of other players, who conspire in such a way that the expected payoff is not $\sum_i E_i p_i^0$, but some other expectation, $\langle E \rangle = U$.
This completely changes our understanding of the game and motivates the following question.
“What is the most likely outcome distribution making up this average?” From the above discussion,
this (unknown) distribution, call it p, should maximize the likelihood of Eq. 2.3 (maximizing the
information entropy for large $N$) under the constraint that $U = \sum_i E_i p_i$. It turns out that this is exactly
the prescription given for inferring the distribution of molecular structures in statistical mechanics.
It has been shown elsewhere how to find probabilities $p$ which maximize $H$ given the above constraint, $U$, using the method of Lagrange multipliers [83]. In fact, the solution can be given in general for any number of constraints of the form $G_j = \sum_i g_i^j p_i$ (with corresponding Lagrange multipliers $\{\lambda_j\}_1^K$).

$$-\ln p_i / p_i^0 = \sum_{j=1}^K \lambda_j g_i^j + \ln Z \tag{2.6}$$
Where the normalization constant,

$$Z \equiv \sum_{i=1}^M p_i^0\, e^{-\sum_{j=1}^K \lambda_j g_i^j}, \tag{2.7}$$
has already been solved and eliminated from the set of Lagrange multipliers.
The value of $H$³ at the maximum is

$$H = \sum_{j=1}^K \lambda_j G_j + \ln Z. \tag{2.8}$$

³ Equations 2.7 and 2.8 don't explicitly state their functional dependence. However, $Z$ is fundamentally a function of $\lambda$, while $H$ is to be maximized given the constraints $\{G_j\}_1^K$. Since these also determine $\lambda$, $H$ is completely specified by the constraints. Of course, both are predicated on $p^0$, and hence any fixed properties of the state space as well.
At this point, some purely mathematical observations can be made. First, the entropy maximum (Eq. 2.6) can be rigorously proved by noting that $\ln x \leq 1 - x$ with equality only at $x = 1$. This gives, for (2.4),

$$H_f = \sum_i f_i \ln(p_i^0 / f_i) \leq \sum_i f_i (p_i^0 / f_i - 1) = 0.$$

This applies for any (normalized) choice of $p^0$. So we can make the substitution $p_i^0 \to p_i^0 (p_i / p_i^0)$ and move the ratio in parentheses to the right side to get an inequality relating our chosen distribution, $p$, and any distribution, $f$.

$$\sum_i f_i \ln(p_i^0 / f_i) \leq \sum_i f_i \ln(p_i^0 / p_i) = \sum_i f_i \Big( \sum_{j=1}^K \lambda_j g_i^j + \ln Z \Big)$$

$$H_f \leq \sum_{j=1}^K \lambda_j G_j + \ln Z \tag{2.9}$$

So if we let $f$ vary over all distributions satisfying our constraints, the right side remains constant, and the entropy is maximized if and only if we choose $f$ as the canonical distribution, Eq. 2.6.
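For a single constraint, the multiplier can be found by one-dimensional root finding. As a sketch, take a loaded six-sided die (the payoffs $E_i = 1,\ldots,6$ and the target average $U = 4.5$ are invented for illustration) and solve $\langle E \rangle_\lambda = U$:

```python
import numpy as np
from scipy.optimize import brentq

E = np.arange(1.0, 7.0)   # payoffs of a six-sided die
p0 = np.full(6, 1.0 / 6)  # zero-th order (uniform) probabilities
U = 4.5                   # constrained average payoff

def Z(lam):               # Eq. 2.7 with a single constraint g_i = E_i
    return (p0 * np.exp(-lam * E)).sum()

def avg_E(lam):           # <E> under the tilted distribution
    return (p0 * E * np.exp(-lam * E)).sum() / Z(lam)

lam = brentq(lambda l: avg_E(l) - U, -5.0, 5.0)
p = p0 * np.exp(-lam * E) / Z(lam)  # Eq. 2.6, solved for p_i
H = lam * U + np.log(Z(lam))        # Eq. 2.8 at the maximum (always <= 0)
print(f"lambda = {lam:+.4f}   H = {H:+.4f}   p = {np.round(p, 4)}")
```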
Next, the Lagrange multipliers have still been left unspecified. However, we get some hints by considering the form of $Z(\lambda)$, Eq. 2.7. This function is mathematically a Laplace transformation of the $p^0$ probability distribution function, which is identical to the moment generating function (MGF) of statistics – with the argument signs reversed. In fact, from Eq. 2.6, it also serves as the MGF of (the $g_i^j$ distributions of) $p$ using the appropriate combination of scaling and shifting.

$$M(t|p) \equiv \sum_{i=1}^M p_i\, e^{\sum_{j=1}^K t_j g_i^j} = Z(\lambda - t)/Z(\lambda)$$
The connection to the moment generating function makes it easy to compute averages $G_j \equiv G_j^{(1)}$ and even higher order cumulants, $G_j^{(n)}$, defined by

$$G_j^{(n)} = \frac{\partial^n \ln M(t|p)}{\partial t_j^{\,n}} \bigg|_{t=0}. \tag{2.10}$$
In fact, mixed partial derivatives even give covariances, etc. as expected.
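A quick numerical check of Eq. 2.10 (a sketch on a made-up ten-state distribution): finite differences of $\ln M(t|p)$ at $t = 0$ reproduce the mean and variance of $g$ under $p$.

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.random(10)   # one constraint function over 10 states
p = rng.random(10)
p /= p.sum()         # a normalized distribution

def lnM(t):          # ln of the moment generating function of g under p
    return np.log((p * np.exp(t * g)).sum())

h = 1e-5             # finite-difference step
mean = (lnM(h) - lnM(-h)) / (2 * h)              # first cumulant
var = (lnM(h) - 2 * lnM(0.0) + lnM(-h)) / h**2   # second cumulant
print(mean, (p * g).sum())                       # should agree
print(var, (p * g**2).sum() - (p * g).sum()**2)  # should agree
```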
In statistical mechanics, we posit a system with $M$ states, each with a known energy, and fix the average to get $U = -\frac{\partial \ln Z}{\partial \lambda_U}$. To finish the connection to thermodynamics, we just need a new terminology for the (possibly scaled) variables and an appropriate physical choice for $p^0$. For the latter, intuition suggests the ideal gas reference state, in which each Cartesian coordinate (including velocity) for every particle present is assigned an equal probability. As for the terminology, the constraints are already known: internal system energy, volume, etc. Their Lagrange multipliers also have special identities, and must have units canceling those of the constraints.

The multiplier for the energy is the inverse temperature, $\lambda_U \equiv \beta = \frac{1}{k_B T}$. This choice for the energy scale gives physically meaningful values, since we usually see a linear dependence of the internal energy on $T$. Although the above discussion uses unitless quantities, most expositions prefer to express everything in units of energy so that $U = \beta E$. Dividing Eq. 2.8 through by $\beta$ gives

$$T k_B H = E + k_B T \ln Z. \tag{2.11}$$
And it is now tempting to state that the thermodynamic entropy, $S$, is $k_B H$. This is not the case, since historical thermodynamic assumptions emphasized the zero-temperature limit of the entropy as near zero. This has led to a problematic definition,

$$S \equiv -k_B \sum_{i=1}^M p_i \ln p_i, \tag{2.12}$$
which can cause conceptual trouble because it depends in a peculiar way on the number of system states. Although quantum mechanics implies that all finite systems have a countable (usually infinite) set of discrete states, continuous probabilities have units of $(dx)^{-1}$, which are just $(\text{degeneracy})^{-1}$ for discrete sums. This leads us to the strange situation in which expressions like $-\int dx\, p(x) \ln p(x)$ have units of $\ln dx$.
Where do the units go in Eq. 2.11 using this definition? It should be apparent from the above that the free energy is slightly altered as well. The usual definition of

$$\beta F = -\ln \int dx\, e^{-\beta E(x)} \tag{2.13}$$
has units of $-\ln dx$. These definitions only match up with Eqs. 2.4 and 2.7 when the prior probabilities are uniform, i.e. $p_i^0 = 1/\Omega$, where $\Omega$ is the phase space volume. In this case, 2.12 and 2.13 ignore the additive constant to give

$$H = S/k_B - \ln \Omega, \qquad -\ln Z = \beta F + \ln \Omega. \tag{2.14}$$
Some of the difficulties that have arisen from this conceptual shift are notions about the "absolute" entropy of a system, which cannot be uniquely defined for a system having an infinite number of available states. The standard idea is to use $\hbar/2$ as a very small constant to define the number of states available to a continuous system.⁴ However, this choice is not unique, as any physical length scale can be used to define an "ideal" maximum number of states. Thus, a basic understanding of this counting problem is important when dealing with changes in the number of degrees of freedom for continuous systems – e.g. removal of "fast coordinates" in a coarse-graining procedure, or translational motion in binding free energy calculations using MM-PBSA (see below). More difficulties have come from the fact that any numerical simulation or integration necessarily includes the ideal gas integration factor, $1/\Omega$, or, as is the case for non-Cartesian coordinates, an appropriate Jacobian [23, 37]. These conceptual problems are generally solved by strictly considering systems in which separate degrees of freedom are either coupled or uncoupled, but still present in the integration – i.e. forcing conservation of degrees of freedom.

⁴ Referring to the Heisenberg uncertainty principle, $\hbar/2 \leq \Delta x \Delta p$, gives a "volume element" for phase space. However, when a delta-function distribution is allowed, $\int \delta(x) \ln \frac{\hbar}{2\delta(x)}\, dx$ still goes to negative infinity.
Practically, whenever we deal with continuous systems, equations 2.13 and 2.12 are used,
while 2.7 and 2.8 make more sense for discrete systems and should be used for developing general
concepts. To see this, consider a discrete system with two states in which the first has probability
1. The traditional definition assigns zero entropy to this case, while Eq. 2.14 asserts H = − ln 2.
Suppose we now find another measurable property for the system which is on or off with equal
probability. Considering this as a subsystem of the whole, the traditional entropy for this subsystem is S/kB = ln 2, while Eq. 2.8 gives an entropy of zero to the completely random case. Both
definitions are additive for these uncoupled subsystems, so the complete system with four states
(probabilities 1/2, 1/2, 0, 0) has a traditional entropy of ln 2 and an unchanged information entropy
of H = − ln 2. If we repeat the last step N times, the traditional entropy becomes S/kB = N ln 2,
while Eq. 2.8 remains constant at H = − ln 2. All the common tools of statistical mechanics apply
to both definitions. However, H is invariant to the number of distinguished states under the above
process and is more clearly a measure of the distribution itself.
We need to use the continuous case when considering the system volume, $V$, in the derivation of the constant pressure ensemble. Briefly, to constrain the average volume to a given $\langle V \rangle$ at $n$ particles, we can substitute $g_i = V_i$ and $p_i^0 = c V_i^{n-1}$ into Eq. 2.6 to get⁵

$$p_i = c V_i^{n-1} e^{-\lambda V_i} / Z \;\to\; \frac{\lambda^n}{\Gamma(n)} V^{n-1} e^{-\lambda V} \tag{2.15}$$

$$Z = c \sum_{i=1}^{\infty} V_i^{n-1} e^{-\lambda V_i} \;\to\; c\, \lambda^{-n}\, \Gamma(n).$$
Subtracting off $\ln 1/c$ as in Eq. 2.14 comes naturally by dividing out $c$ in (2.15). This gives

$$\beta F = n \ln \lambda - \ln \Gamma(n) \tag{2.16}$$

and

$$\langle V \rangle = \frac{\partial \beta F}{\partial \lambda} = n/\lambda$$

using (2.10). Placing this constraint requires an energy $\beta^{-1} \lambda \langle V \rangle$, so the (scaled) Lagrange multiplier of the volume, $\beta^{-1} \lambda = P = n \beta^{-1} / \langle V \rangle$, is just what we mean by the pressure.
The main derivation very closely followed the usual one given for the NVT ensemble [81, 28, 40]. However, this latter point has given a rather unconventional derivation of the NPT ensemble for an ideal gas – even without any condition on $\langle E \rangle$! The next few examples will consider more such uses of the MAXENT formalism before we proceed to our main object, a statistical mechanical ensemble for coarse grained systems in Ch. 3.

⁵ The choice of $P^0(V|n) \propto V^{n-1}$ can be justified by the following explanation. Let $V$ denote the proposition that the available system volume is $V$. Before any particles are added, $P^0(V|0)$ uniquely satisfies scale and power independence, i.e. $P(x)dx = P(ax)d(ax) = P(x^n)dx^n$ all have the same functional form. Next, we can use induction to get $P^0(V|n\alpha) = P(V|n)\frac{P(\alpha|Vn)}{P(\alpha|n)} = \text{const.} \cdot V\, P(V|n)$, taking $\alpha$ to mean that another particle is added to the system (see Ch. 3). Keeping with the ideal gas interpretation of $p^0$, this should be proportional to the volume.
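Eq. 2.15 says the ideal-gas volume distribution at fixed $\langle V \rangle$ is a gamma distribution. A small sanity check (the values of $n$ and $\lambda$ below are arbitrary) confirms $\langle V \rangle = n/\lambda$, and hence $P = n\beta^{-1}/\langle V \rangle$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 50, 2.0  # particle number and volume multiplier (arbitrary values)

# Eq. 2.15: p(V) = lam^n V^(n-1) e^(-lam V) / Gamma(n), i.e. Gamma(n, scale=1/lam)
V = rng.gamma(shape=n, scale=1.0 / lam, size=200_000)
print(f"sample <V> = {V.mean():.3f}   n/lambda = {n / lam:.3f}")
```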
2.1 Maximum Entropy Example: Quasi-Harmonic Analysis
In order to illustrate the potential impact that the MAXENT viewpoint can make, the entropy estimate of Karplus and Kushick is briefly reviewed [91]. Their prior information is the joint variance-covariance matrix for $3N - 6$ coordinates, which can be calculated from molecular dynamics simulations. Because we are maximizing the entropy given this information, the estimate obtained is
formally an upper bound on the true entropy. It is used extensively in the molecular mechanics Poisson-Boltzmann + Surface Area (MM-PBSA) and related free energy approximation schemes,
where it has proven fairly useful [159, 99].
We can also note that normal mode analysis, a slight variation of the above, makes the further
assumption that we are only interested in the dynamics in a local basin. This can be well-described
by a harmonic potential, and obtains similar restraints from a calculation of the Hessian matrix of
second derivatives.
First, to avoid infinities, we must shift the entropy to the standard definition of Eq. 2.12. Using the set of covariances between coordinates $i$ and $j$,

$$[C]_{ij} = \langle \delta x_i\, \delta x_j \rangle, \tag{2.17}$$

as constraints and maximizing $S/k_B$ gives

$$P(x) \propto \exp\Big( -\sum_{i,j=1}^{3N} \lambda_{ij}\, \delta x_i\, \delta x_j \Big). \tag{2.18}$$
This is just the form of a Gaussian distribution, so the connection between the Lagrange multipliers (together known as the penalty matrix in this context) and the covariance matrix is well-known [67]. Note that even though we have not constrained the mean, this makes no difference in the entropy. It is a function of the determinant of the covariance matrix, $|C|$.

$$S/k_B = \frac{1}{2} \Big[ (3N - 6)\big(1 + \ln(2\pi)\big) + \ln |C| \Big] \tag{2.19}$$
If we state our answer in terms of $S/k_B$, why worry about the earlier definition, $H$ (Eq. 2.4)? Consider the effect of the following operation.

(i) Run a simulation of an arbitrary molecule and compute the covariances (2.17).

(ii) Calculate (2.19) for the molecule's internal $3N - 6$ degrees of freedom.

(iii) Add a linearly dependent degree of freedom to every simulation frame, say $x_0 = \sum_{i=1}^{3N-6} x_i$, and recompute (i) and (ii).

Of course, since the new degree of freedom is linearly dependent on the rest, it adds no new information and should not change the system's entropy. However, the result of (iii) will be negative infinity. Or, more precisely, it will be the result of (ii) plus an additional $\ln dx$. This difficulty does not appear when using Eq. 2.4 – which gives the same $H$ for both systems.
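The pathology in step (iii) is easy to reproduce. A sketch (a mock Gaussian "trajectory" with six hypothetical internal coordinates): Eq. 2.19 is finite at first, but appending the linearly dependent column drives $|C|$ to zero and the estimate toward negative infinity (up to floating-point roundoff):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((10_000, 6))  # mock trajectory: 6 internal coordinates

def qh_entropy(x):
    """Eq. 2.19 in units of kB (the column count plays the role of 3N - 6)."""
    C = np.cov(x, rowvar=False)
    d = x.shape[1]
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (d * (1.0 + np.log(2 * np.pi)) + logdet)

print(qh_entropy(x))       # finite

# Step (iii): append x0 = sum_i x_i, a linearly dependent coordinate.
x_aug = np.hstack([x, x.sum(axis=1, keepdims=True)])
print(qh_entropy(x_aug))   # |C| -> 0, so the estimate diverges to -infinity
```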
For a more practical example, consider applying this model to a receptor-ligand binding problem [172]. A straightforward application of Eq. 2.19 excluding ligand translation and rotation degrees of freedom for the unbound case and including them for the bound case would lead to a numerically correct, but conceptually erroneous answer. Instead, appearance of infinities in Eq. 2.4 warns
us that these external degrees of freedom have been restrained when near the receptor and forces us
to consider a more appropriate volume-based definition of binding, i.e. identically confined bound
and unbound states for the ligand.
2.2 Maximum Entropy Example: Volume Partitioning
What is the physical content of the radial distribution function or similar metrics of local fluid
structure? Although the last section made information theory inference appear easy, this is not
always the case. In particular, the partition function can be very difficult to calculate for even
the simplest of problem specifications – rendering the technique impossible to apply. This section
gives a novel but difficult example of MAXENT to answer the above question and points out some
possibilities for choosing prior information with parsimony.
The MAXENT process described in this chapter finds the entropy decrease from an unperturbed
situation, giving an energy scale for measuring the information content of the constraints imposed
on the system. We will try to calculate an approximate entropy change for creating a fluid with
local structural ordering. If this structural ordering is carried out in several separate systems, the
free energies and entropies will be additive over systems – showing the analogy between the infinite
dilution chemical potential and the free energy change for this process. To make this explicit, we
will exhibit the corresponding solute-solvent interaction energy function by analogy with Eq. 2.6.
However, one caveat will also be found. If the initial problem contains correlations not accounted for as constraints, then the entropy will not be physical. This will eventually lead to a
consideration of "subsystem entropy" in Ch. 3, which will remedy these difficulties.
Since we could use any type of average as a constraint to quantify the fluid structure (and could
think of several related to the radial distribution function, g(r)), this becomes an exercise in figuring
out which integrals are easiest to calculate. A first try would be to constrain the radial distribution
function itself.
$$g(r) = \frac{1}{4\pi r^2 \rho} \sum_{i=1}^N \int \delta\big(r_i(x) - r\big)\, P(x)\, dx$$
We would then specify a continuous set of Lagrange multipliers to cover the range of $r$ under consideration, $\lambda(r)$. The partition function (2.7) would be

$$\int e^{-\int \lambda(r)\, g(r)\, dr}\, P^0(x)\, dx,$$

which implicitly uses an ideal gas reference state, since equal probability was given to each Cartesian coordinate in phase space. This is an important point, since it means all the relations for ideal gases can be used to simplify this integral, and that the formulas rigorously apply to free energy and entropy differences from an ideal gas state.
A more computationally friendly idea is to separate g(r) into (say M) shells, and count the average number of solvent centers in each shell. The partition function becomes a sum over all possible solvent center counts in each shell, \{n_j\}_1^M. We can now use our knowledge of ideal gas statistics to
simplify this sum. First, processes occurring in each region of space should be independent, so to
get count probabilities we only need to specify a volume for each shell, v j , without reference to its
shape, and then multiply the partition function from each shell to get a total Z. Next, the occupancy
distribution for each shell should be Poisson [67] with mean \mu_j^0 = v_j \rho.
To rigorously prove this statement, consider a system with N particles in a total volume V . The
distribution of \{n_j\}_1^M arrived at through randomly placing these particles with equal probability everywhere in the box is multinomial (Eq. 2.1). Defining the total shell volume and counts as V_M = \sum_{j=1}^M v_j, N_M = \sum_{j=1}^M n_j, the distribution is
\[
P(\{n_j\}_1^M) = \frac{N!}{(N - N_M)!\, \prod_{j=1}^M n_j!} \left(\frac{V - V_M}{V}\right)^{N - N_M} \prod_{j=1}^M \left(\frac{v_j}{V}\right)^{n_j}.
\]
It is a simple exercise in using Stirling’s formula (Eq. 2.2) to prove that this approaches a product of
Poisson PDFs as mentioned above.
\[
P(\{n_j\}) = \prod_{j=1}^{M} (\rho v_j)^{n_j}\, e^{-\rho v_j} / n_j!
\]
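This limit is simple to verify numerically. A minimal sketch (numpy/scipy; the density and shell volume are arbitrary test values, not taken from any simulation) compares the exact single-shell marginal of the multinomial (a binomial) against the Poisson law as N grows at fixed density:

import numpy as np
from scipy import stats

rho, v = 0.033, 30.0                # hypothetical number density and shell volume
mu0 = rho * v                       # Poisson mean rho * v_j

for N in (100, 1000, 100000):       # particle number at fixed density
    V = N / rho                     # box volume grows with N
    n = np.arange(0, 15)
    binom = stats.binom.pmf(n, N, v / V)    # exact multinomial marginal for one shell
    poiss = stats.poisson.pmf(n, mu0)
    print(N, np.abs(binom - poiss).max())   # discrepancy shrinks as N grows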
As noted above, the partition function can be more complicated to compute. We would like
this integration result to come out in closed form in a very clean way so that we can use the constrained averages \{\mu_j \equiv \langle n_j \rangle\} to quantify the entropy change. Since the probabilities of each n_j are independent, we can write \ln Z = \sum_{j=1}^M \ln Z_j with
\[
Z_j(\{\lambda_j\}_1^M) = \sum_{n=0}^{\infty} e^{-\lambda_j n} (\mu_j^0)^n e^{-\mu_j^0}/n! = e^{\mu_j - \mu_j^0} \sum_{n=0}^{\infty} (\mu_j)^n e^{-\mu_j}/n! = e^{\mu_j - \mu_j^0}, \qquad \mu_j \equiv \mu_j^0 e^{-\lambda_j}.
\]
Next, we note how the "exponentially tilted" probability P(n_j)\, e^{-\lambda_j n_j}/Z_j is still Poisson with mean \mu_j. Although this situation does not usually occur,6 it further simplifies our analysis, since this already gives us the average without using \langle n_j \rangle = -\partial \ln Z / \partial \lambda_j.

6. There are some references in the statistical literature examining the set of probability functions (termed natural exponential families) for which this exponential tilting property holds [18, 17, 106, 62].
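Both the closed form for Z_j and the tilting property can be checked in a few lines (a sketch with arbitrary test values of μ_j^0 and λ_j):

import numpy as np
from scipy import stats

mu0, lam = 4.0, 0.7                       # arbitrary test values
n = np.arange(0, 80)
p0 = stats.poisson.pmf(n, mu0)            # reference Poisson distribution
tilted = p0 * np.exp(-lam * n)

Zj = tilted.sum()                         # partition function Z_j
mu = mu0 * np.exp(-lam)                   # predicted tilted mean mu_j
print(np.isclose(Zj, np.exp(mu - mu0)))                     # Z_j = exp(mu_j - mu_j^0)
print(np.allclose(tilted / Zj, stats.poisson.pmf(n, mu)))   # tilted law is still Poisson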
To interpret this as a change in the (ideal gas) solvent probability distribution on solvation, the
interaction energy function is
\[
\beta \Delta E = \sum_{j=1}^{M} \lambda_j n_j, \tag{2.20}
\]
which is exactly the form of an off-lattice Gō potential [182]. The physical picture of the information
contained in the RDF is just what is expected, that the solvent is “pulled” or “pushed” into higher or
lower density regions with potential \lambda(r)\, n(r) = -n(r) \ln(\mu/\mu^0) = -n(r) \ln g(r). To solidify this analogy, consider the integration
\[
-\sum_j \ln Z_j \to \int_0^\infty \mu^0(r) - \mu(r)\, dr = \int_0^\infty 4\pi\rho r^2 \big(1 - g(r)\big)\, dr = 2\rho B_2. \tag{2.21}
\]
This clearly exhibits the connection of these results to the virial series expansion (\partial \beta P/\partial N = 1/\langle V \rangle + 2\rho B_2/\langle V \rangle + \cdots). To derive the partial molar pressure arrived at here, compute the partial molar free energy from Eqs. 2.21 and 2.16 and take the derivative with respect to volume.
\[
\beta \frac{\partial P}{\partial N_\alpha} = -\frac{\partial^2 \beta F}{\partial \langle V \rangle\, \partial N_\alpha} = -\frac{\partial}{\partial \langle V \rangle}\left[-\ln \langle V \rangle + 2 N B_2/\langle V \rangle\right] = 1/\langle V \rangle + 2\rho B_2/\langle V \rangle
\]
The final step is to calculate the information entropy.
\[
H = \sum_{j=1}^M \lambda_j \langle n_j \rangle + \ln Z = \sum_{j=1}^M H_j \tag{2.22}
\]
\[
H_j = \lambda_j \mu_j + \ln Z_j = \mu_j \ln(\mu_j^0/\mu_j) + \mu_j - \mu_j^0 \tag{2.23}
\]
By Eq. 2.9, this forms an upper bound on the entropic penalty due to solvent reorganization in the
volumes {v j }. However, Eq. 2.9 does not consider other possible types of solvent reorganization.
In particular, if the solvent initially contained some ordering within each v j , but left µ0j = ρv j ,
this would be indistinguishable from an ideal gas in our model. Now if this initial ordering was
subsequently removed by the addition of a solute, then Eq. 2.22 may show no entropy loss, while
the more complete entropy measure (including the initial ordering) would actually increase!
The above argument shows that MAXENT arguments are only physically realistic if we account
for all constraints actually present on the system. This makes Eq. 2.21 correct only for the case
of particle solvation in an ideal gas. Such considerations would be an important reference point in
problems such as estimating the ideal distribution of pair contacts in a Gō model. Now, if “extra”
ordering is present in the system, Eq. 2.22 is no longer an upper bound, but may still be applicable
if the extra ordering is small. By analogy to the virial series expansion, we would expect Eq. 2.21 to work well when the solvent density, ρ, is low. Another example is the information theory model of Hummer et al. [72], which is reproduced using the above formalism with one site (M = 1) and adding another constraint on the volume's occupancy fluctuations, \langle \delta n^2 \rangle. Again, this model gives good approximations to the reorganization free energy when the solvent density is low.
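To make the one-site (M = 1) construction concrete, the following sketch solves the two-constraint MAXENT problem numerically against a Poisson default model and reads off the cavity weight. It is an illustration only (the mean and fluctuation inputs are hypothetical, not the published parametrization of Ref. [72]):

import numpy as np
from scipy import stats, optimize

mu0 = 3.0                       # rho * v for the observation volume (hypothetical)
target = (3.0, 1.0)             # hypothetical constraints: <n> and <dn^2>
n = np.arange(0, 40)
p0 = stats.poisson.pmf(n, mu0)  # ideal gas (Poisson) default model

def moment_error(lams):
    # Constrained distribution: p0(n) * exp(-l1*n - l2*n^2), normalized.
    l1, l2 = lams
    w = p0 * np.exp(-l1 * n - l2 * n**2)
    p = w / w.sum()
    mean = (p * n).sum()
    var = (p * n**2).sum() - mean**2
    return mean - target[0], var - target[1]

l1, l2 = optimize.fsolve(moment_error, x0=[0.0, 0.0])
w = p0 * np.exp(-l1 * n - l2 * n**2)
p = w / w.sum()
print("beta*mu (cavity) ~ -ln P(0) =", -np.log(p[0]))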
Chapter 3
Information Theory Perspective on Stochastic Dynamics
Stochastic dynamics results whenever there is uncertainty in the initial conditions of an experiment. The traditional example is a ceaselessly jostling grain of pollen under Robert Brown’s microscope [50]. A non-traditional one is the de Broglie-Bohm interpretation of quantum mechanics [25, 56]. In either case, the motion of interesting coordinates (a subset of the whole) can only
be described statistically. This chapter describes the basic thermodynamic theory of such statistical
descriptions.
3.1 Thermodynamics of Coarse-Grained Systems
In standard formulations of statistical mechanics such as those encountered in Ch. 2, pure system states x_i are assigned energies E_i. Coarse-grained systems result when a collection of states are lumped together into coarse states, or mesostates. A mesostate y_k can be formally defined by specifying the fine-scale states it contains
\[
y_k = \{x_i \mid i \in Y_k\}. \tag{3.1}
\]
Here Y_k is a list of pure state numbers, with the usual mutually exclusive and exhaustive restrictions
on the set of all mesostates. In this case, specifying that the system is in the particular mesostate yk
gives only low-resolution, coarse information about the system state.
Thermodynamic connections between fine and coarse level descriptions can be made by specifying g_i^j for each pure state. From (2.6), defining B_i \equiv \sum_{j=1}^K \lambda_j g_i^j, each state has probability p_i = p_i^0 e^{-B_i}/Z. The marginal probability for any of a set of states is (denoting coarse probabilities with uppercase P's)
\[
P_k \equiv P(y_k) = P\Big(\sum_{i \in Y_k} x_i\Big) = Z^{-1} \sum_{i \in Y_k} p_i^0 e^{-B_i} \equiv P_k^0 Z_k / Z \tag{3.2}
\]
so that pure states can be thought of as a subsystem of their respective mesostate with PDF
\[
P(x_i) = P(x_i|y_k)\, P(y_k) \tag{3.3}
\]
\[
P(x_i|y_k) = \frac{p_i^0 e^{-B_i}}{P_k^0 Z_k}\, I(i \in Y_k). \tag{3.4}
\]
Here I(cond.) is an indicator function which is one when the condition is satisfied and zero otherwise.
The interpretation of the substates xi belonging to a particular mesostate as a subsystem can be
made complete by exhibiting the decomposition
\[
\ln Z = \sum_k P_k \big(\ln Z_k - \ln(Z_k/Z)\big) = \sum_k P_k \ln Z_k + P_k \ln(P_k^0/P_k). \tag{3.5}
\]
The last term on the right is just H(P) – Eq. 2.4 applied to the coarse system.
Eq. 3.3 can be used to expand the averages
\[
G_j \equiv \langle g_j \rangle = \sum_k P_k\, \langle g_j | y_k \rangle \equiv \sum_k P_k\, G_{j|k}, \tag{3.6}
\]
into contributions from each subsystem. The nesting structure is particularly apparent in that the conditional distributions (3.4) are maximum entropy distributions given G_{j|k}.
A special case of this construction is Jaynes’ [80] description of a thermometer. Imagine that
the xi describe the states of a reaction vessel in thermal contact with a vein of mercury, while only
the progress of the reaction is described by yk . The average energy of the mercury can be inferred
from its expansion, giving \langle E|y_k \rangle and, by the Lagrange multiplier in (3.4), the temperature of the reaction vessel conditional on the progress of the reaction.
Finally, the entropy can be decomposed by combining (3.5) and (3.6) into (2.8).
\[
H = \sum_{j=1}^K \lambda_j G_j + \ln Z = \sum_k P_k \left(\sum_{j=1}^K \lambda_j G_{j|k} + \ln Z_k\right) + P_k \ln(P_k^0/P_k) \equiv \langle H_k \rangle_{CG} + H_{CG} \tag{3.7}
\]
The above expression says that in a coarse-grained system, or any system where we have incomplete knowledge, the total entropy can be expressed as the information entropy of the events we know about, H_{CG}, plus the average entropy of all the embedded subsystems, H_k. When the subsystems are completely random, H_k is zero and thermodynamic considerations can justifiably focus only on \{y_k\}. However, as discussed by Jaynes [82], there is always the possibility that some new experimental process could manipulate a now unknown property of the system, causing a change in \langle H_k \rangle_{CG} which had always been negligible (previously making entropy changes associated with subsystems indistinguishable). For an example of this, we need only remember the alterations in the thermodynamic theory due to the entropy of (usually) isoenergetic spin states.
We now have an explicit formula for the “missing” entropy in the volume partitioning problem
earlier. Although it seems to introduce more uncertainty, this result actually extends the range of
problems which can be treated using thermodynamics substantially. The ability to deal with partial
information lets us treat cases where part of the problem is left unspecified (as in the thermometer example above) and even nonequilibrium processes (as will be shown in the next few sections). When information is removed, dynamic systems become irreversible, acting as entropy generators which continually lose information in proportion to the amount of uncertainty introduced at each application of the stoßzahlansatz.1 Early versions of this idea appeared in the works of Boltzmann [105], while the present discussion is based on the more recent works of Jaynes [81].

1. This German phrase, originating in the work of Boltzmann, has been variously translated as the molecular chaos, random phase, or repeated randomness assumption.
3.1.1 Connection to Bead-Based Coarse-Graining
The commonly employed mesoscale model simulation consists of coarse “blobs” or “beads” defined
from a full atomistic system model in a simple way.
\[
\begin{bmatrix} R(\vec{x}) \\ r(\vec{x}) \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix} \cdot \vec{x} \tag{3.8}
\]
The bead positions, R, are just linear combinations of the atom positions. A full one-to-one transformation should also specify "bath" variables, r, independent of R so that P(x|R) = P(r|R). Of course, more complicated transformations could be defined; however, the above definition is sufficient for a very large class of models.
Functions of coordinates can be defined using Eq. 3.8 as well. Notably, the velocities of the coarse coordinates are straightforward linear combinations of the atomic velocities.
\[
\begin{bmatrix} \dot{R} \\ \dot{r} \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix} \cdot \dot{\vec{x}} \tag{3.9}
\]
This means the momenta are (using symmetric, positive definite mass-metric tensors M for the new and m for the old coordinates)
\[
\begin{bmatrix} p_R \\ p_r \end{bmatrix} = M \cdot \begin{bmatrix} A \\ B \end{bmatrix} \cdot m^{-1} \cdot \vec{p}, \tag{3.10}
\]
and similarly for the forces, F_R, F_r = \dot{p}_R, \dot{p}_r.
For the kinetic energies to coincide (\frac{1}{2}\vec{p}^T \cdot m^{-1} \cdot \vec{p} = \frac{1}{2}[p_R, p_r] \cdot M^{-1} \cdot [p_R, p_r]^T), the coarse masses must be given by
\[
M = \begin{bmatrix} A \\ B \end{bmatrix}^{-1,T} \cdot\, m\, \cdot \begin{bmatrix} A \\ B \end{bmatrix}^{-1}. \tag{3.11}
\]
In standard usage, m is diagonal and the rows of A are a weighted average of unique coordinates. In this case, B can be chosen so that all the rows of [A; B] are orthogonal and the matrix inverses can be replaced by transposes (times a diagonal matrix). In this case, the mass of a coarse site comes out to be a weighted sum of its respective atom masses, as expected.
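A small worked example helps here. The sketch below (numpy only; the three-atom fragment and its masses are hypothetical) builds a single uniform-average bead, completes the transformation with orthogonal bath rows, and confirms that Eq. 3.11 returns the summed atomic mass for the bead:

import numpy as np

m = np.diag([15.999, 1.008, 1.008])   # hypothetical three-atom fragment (amu)
A = np.array([[1/3, 1/3, 1/3]])       # one bead: uniform average of the 3 coordinates

# Bath rows chosen orthogonal to A (and to each other).
B = np.vstack([np.array([1., -1., 0.]) / np.sqrt(2),
               np.array([1., 1., -2.]) / np.sqrt(6)])
T = np.vstack([A, B])                 # full one-to-one transform [A; B]

Ti = np.linalg.inv(T)
M = Ti.T @ m @ Ti                     # Eq. 3.11
print(M[0, 0], m.trace())             # bead mass equals the sum of atom masses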
For any coarse-graining scheme to be successful, the probability distribution of the coarse coordinates (through whatever process it may have been obtained) should approximate the marginal2 fine scale distribution.
\[
P(R)\, dR = \int \cdots \int_{\{\vec{x}\,|\,R(\vec{x})=R\}} P(\vec{x})\, d\vec{x} \tag{3.12}
\]
\[
= dR \int \cdots \int P(R, r)\, dr \tag{3.13}
\]
This is the continuous analog of Eq. 3.2, and the volume elements have been written explicitly to
avoid messing with the Jacobian at this stage.
Equation 3.7 is significant for studies employing fully coarse-grained models, since it gives a rigorous way to calculate any thermodynamic property directly. All that is necessary is a satisfactory description of the subsystem equation of state, \langle H_k \rangle_{CG}(R, N, V, T), which should have a relatively simple form compared to complete systems.
Although it could be argued that dissipative particle dynamics is moving in this direction, developments in the field of bead-based coarse-graining have not yet fully recognized this connection.
2. A marginal distribution is any distribution resulting from integration over some of the original variables – possibly named for the location commonly used to put down the result after adding all the numbers in a row.
This is evidenced by the reports of representability problems for CG "potentials," -\ln P(R) (free energies in the above discussion). Briels and Akkermans [26] noted that no "soft" pairwise potential can reproduce the correct pressure at the CG level. This was later verified for a single-site version of the TIP4P water model by a thorough investigation of virial pressure, internal energy, and transferability across temperatures and densities [85]. This has not stopped other authors [77] from directly equating the atomic and CG virials. From the discussion in this section, we would expect such differences between the average energy or virial (Eq. 3.6) and the average CG free energy or its derivatives, and we have a neat way to go about parametrizing such conditional averages (viz. the equation of state, above).
3.2 Integration Schemes
In classical mechanics, particle trajectories are completely determined for all future times by the initial conditions and the Hamiltonian H(x, p) – writing the coordinates and momenta of the N particle system as x and p, respectively. Many expositions on dynamical theory also define the Liouville operator,
\[
iL \equiv \sum_{j=1}^{N} \frac{\partial H}{\partial p_j}\frac{\partial}{\partial x_j} - \frac{\partial H}{\partial x_j}\frac{\partial}{\partial p_j}, \tag{3.14}
\]
as a way to express the time derivative of any function of coordinates and momenta only.
\[
\frac{df}{dt} = \dot{x}^T \frac{\partial f}{\partial x} + \dot{p}^T \frac{\partial f}{\partial p} = iL\, f(t) \tag{3.15}
\]
\[
f(t) = e^{t\, iL} f(0) \tag{3.16}
\]
This last form is a formal solution only, giving a neat notation for the state of the system at any future time. In order to do any computations, some power series expansion of the exponential will have to be done, leading back to Eq. 3.15.
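The standard Verlet-type integrators arise from exactly such a truncation: a symmetric splitting of e^{∆ iL} into kick and drift factors. A minimal sketch (hypothetical harmonic oscillator, not any particular MD code) showing the resulting near-conservation of energy:

import numpy as np

dt, k, m = 0.01, 1.0, 1.0            # hypothetical oscillator parameters
x, p = 1.0, 0.0
E0 = p**2 / (2*m) + k * x**2 / 2

# Velocity Verlet: exp(dt*iL) ~ exp(dt/2 iL_p) exp(dt iL_x) exp(dt/2 iL_p).
for _ in range(10000):
    p -= 0.5 * dt * k * x            # half kick
    x += dt * p / m                  # full drift
    p -= 0.5 * dt * k * x            # half kick

print(abs(p**2 / (2*m) + k * x**2 / 2 - E0))   # energy error remains small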
These are important relations for understanding the traditional development of the fluctuation-dissipation theorem [100] (FDT) and projection operator formalism [119]. Eq. 3.15 can be applied
to a probability distribution on phase space, which we'll term W(x, p, t), after adding another partial derivative with respect to time. The total time derivative of the distribution should be zero by the Liouville theorem, implying conservation of probability or "phase space volume" in any dynamical process. The result is
\[
\frac{dW}{dt} = \left(\frac{\partial}{\partial t} + iL\right) W(x, p, t) = 0. \tag{3.17}
\]

Figure 3.1: Time evolution of an ensemble of stochastic oscillators (dotted circles) is compared with deterministic motion (ellipses). Stochastic interactions with an external bath continually add and remove kinetic energy from the system, resulting in a spreading out of the phase space distribution.
Figure 3.1 presents a phase space diagram of a simple harmonic oscillator (e.g. a sheet of particles in a stationary sound wave) to illustrate these concepts. Classical propagation using (3.16) leads to a trajectory confined to a single ellipse. The shapes of the ellipses are completely specified as isosurfaces of the system energy, and no lines are allowed to cross. Now imagine that the harmonic oscillator (e.g. a gas molecule in the sound wave above) is moving through an external medium, continually gaining energy through random collisions from behind and losing it through collisions with other particles from the front. These random collisions will cause a group of particles
initially in a small region of phase space to spread out (areas enclosed by dotted lines).
The complete system dynamics should give us a way to rigorously describe the course of this random collision process. Decomposing the Liouville operator into separate parts acting either on coarse or bath coordinates is possible from the linear nature of (3.8) and (3.14). Since the coarse momenta completely specify the bead trajectories, the bath variables only enter their equation of motion through exerting forces, F_R(t) = \langle F_R|R \rangle + \Delta F_R(r_0, t). Some rather complicated mathematical
arguments based on 3.15 lead to a generalized Langevin equation (GLE) [6]
\[
F_R(t) = S(t) + K(t) - \beta \int_0^t \langle K(\tau) K^T \rangle\, M^{-1}\, p_R(t-\tau)\, d\tau. \tag{3.18}
\]
The forces acting on R at each (infinitesimal) point in time are the average force \langle F_R|R \rangle \equiv S(R), a random force, K(t), and a friction dragging in the opposite direction of the momentum, p_R. The derivation of the above equation from the complete system dynamics (3.14) assumed only that the force autocorrelation function, \langle K(\tau) K^T \rangle, does not depend on the coarse momenta p_R.
Although propagating the system according to the conserved forces, \langle F_R|R \rangle \equiv S(R) (as is done in many CG simulations [123]), will give a correct coarse PDF, allowing randomness through the \Delta F_R term is the only way to model the true dynamics of the system. Akkermans, Padding and Briels [6, 128] should be credited with early development of dynamic CG simulations from atomistic models, which were reported later by Izvekov [76]. Other important developments in dynamic CG models are the general purpose CG-MD program COGNAC [9], and momentum-conserving stochastic integrators for dissipative particle dynamics [46].
3.3 Continuous Time Integration
Stochastic integration methods can be defined in two different ways, either continuous or discrete
in time. Traditional developments have been continuous, focusing on infinitely differentiable functions. Unfortunately, these are not usually computable unless further (discretization) approximations
are made. Such approximations occupy a rather strange and uncomfortable part of the theory, associated with a change in the system Hamiltonian after parametrization. So, after considering the
traditional derivation of the FDT, the next section will set out to produce the analogous theorem for
the discrete case.
The merit of continuous time developments is the availability of an analytical solution for the
equilibrium probability distribution – the canonical distribution found in Ch. 2. This is proved by
adding a Fokker-Planck drift-diffusion process to the total time derivative of the distribution (3.17).
\[
\frac{dW}{dt} = \left[\frac{\partial}{\partial t} + iL - \nabla_p \left(\frac{C(p_R)}{2} \nabla_p - A(p_R)\right)\right] W(R, p_R, t) = 0 \tag{3.19}
\]
This connects with (3.18) in the Langevin limit, where the drift A(p_R) \equiv -G M^{-1} p_R = -\beta \int_0^t \langle K(\tau) K^T \rangle M^{-1} p_R(t-\tau)\, d\tau is an instantaneous "damping" with coefficient matrix G, and C is the momentum-space variance-covariance matrix for a pure diffusion process (without the presence of iL or A). For a more general treatment of continuous time integration including time-correlations, the reader is referred to the work of Cáceres and Budini [27].
Now it is a classical result that the canonical ensemble PDF,
\[
W(R, p_R, t) \propto e^{-\beta H(R, p_R)}, \tag{3.20}
\]
is already stationary with respect to iL (i.e. deterministic evolution satisfies \partial W/\partial t = -iL\, W = 0). This leaves us to prove that the momentum space drift/diffusion terms cancel. Doing the substitution for the kinetic energy of H implied in §3.1.1 gives the FDT
\[
G = \frac{\beta}{2}\, C M^{-1}. \tag{3.21}
\]
The assumptions involved in deriving (3.19) and (3.21) are difficult to justify in the context of discrete integration schemes. First, commonly used integration schemes are based on the Langevin limit for computational simplicity. In this limit, the friction force does not have a "memory" in time and must therefore be confined to an instantaneous friction. However, the bath variables cannot physically relax/decorrelate infinitely fast, as required by this assumption, while bead dynamics remain stable at short time-scales (i.e. an autocorrelation function \langle p_R(t_0)\, p_R(t_0 - \tau) \rangle with zero slope at \tau = 0). This trouble with short-timescale behavior was noted even in the early works of Kubo [100] as a justification for the full GLE. Unfortunately, elimination of the Langevin assumption is also difficult because, after finding appropriate assumptions for the continuous time GLE, it must be discretized and remain computable.
Second, the discretization process requires a further set of assumptions related to the chosen
time step, ∆. These assumptions describe the path that the system takes in-between discrete time
steps, and constitute an additional layer of approximation to (3.19). Although advanced numerical
schemes making use of the mathematics of Gaussian random walk processes [157, 173] exist, the
source of random noise is no longer directly accessible – rendering further adaptations of the theory
difficult. An example of one such desirable extension would be conservation of energy between the
coarse and fine subsystems, which are usually not infinite as assumed in the GLE. Current studies
in this direction completely fix these quantities, while an approach consistent with §3.1.1 might
allow exchanges between baths (depending on the total amount of energy, momentum, etc. in each).
Furthermore, linearly integrating the PMF for very long time steps can lead to large errors which
may be better treated by allowing the force function itself to depend on the length of the time step
taken (e.g. smoothed potentials).
3.4 Discrete Time Integration
Many of the problems with post-hoc discrete integration techniques discussed above can be eliminated by analyzing the discrete-time equivalent of (3.19) from the start. This discrete form is found by considering the transition probability density P(R, p_R(t+\Delta)\,|\,R, p_R(t)). In terms of Fig. 3.1, a deterministic process would have a delta-function shaped transition probability centered around e^{\Delta\, iL}[R, p_R]^T, while the stochastic evolution described would have a Gaussian-like transition probability serving to grow the enclosed region at each application of the integrator.
\[
P(R, p_R(t+\Delta)|I) = \iint P(R, p_R(t+\Delta)\,|\,R, p_R(t), I)\; P(R, p_R(t)|I)\; dR\, dp_R \tag{3.22}
\]
This integral equation approach has both advantages and disadvantages. It can be computable
from the start, since the mathematical form of the integrator is arbitrary and defines a transition
probability density. However, the transition probability must be proven to give a correct canonical
ensemble. This problem and other analytical calculations can be more difficult for discrete transition
probabilities.
As we shall see, it is possible to derive a discrete analogue of the FDT giving a canonical
distribution and a consistent, stationary, velocity autocorrelation function. Unlike a simple Euler
integration method for a continuous stochastic differential equation, the same equilibrium state is
reached regardless of the timestep chosen. In this analogue, the thermostat has the simple physical
interpretation of representing stochastic exchanges with the bath subsystem over a whole time step,
allowing considerations such as total system energy, momentum, etc. to enter at the level of the
integration scheme. This means that no restrictions on the thermodynamics of the system have to be
compromised at the outset. Such a complete avoidance of infinitesimal time considerations seems
essential if we are to pursue rigorous calculations on mesoscale systems.
Finally, a discrete jump formulation makes issues related to the information leak introduced by
coarse-graining [81] much more clear. Loss of information occurs whenever we average over the
bath variables (i.e. at each integration time step and not just at the beginning of a simulation). This
happens in real numerical schemes at every time step, where the average is over any set of variables not included in the prior information used for propagation. This introduces irreversible gains
in information entropy of the propagating distribution for each step, correspondingly irreversible
dynamics, and experimental uncertainty for the exact value of time-dependent averages.
3.4.1 Problem Formulation
To define a working notation, the standard quantities of position, momentum, and force are non-dimensionalized via dividing by a constant of the appropriate units and represented schematically for each time step (indicated with superscripts) as \tau^i = \{s^i, u^i, f^i\}.
\[
R^i = \frac{\Delta}{\sqrt{\beta}}\, M^{-\frac{1}{2}} s^i \qquad
p_R^{\,i-1/2} = \frac{1}{\sqrt{\beta}}\, M^{\frac{1}{2}} u^i \qquad
F^i = \frac{1}{\sqrt{\beta}\,\Delta}\, M^{\frac{1}{2}} f^i \tag{3.23}
\]
Non-dimensional quantities are shown on the right, ∆ is the integration time step, and M is the
diagonal matrix of coarse-grained masses introduced in §3.1.1.
Using these quantities, a discretized version of the GLE is3
\[
u^i = \sigma z^i + u^{i-1} + \alpha_1 f^{i-1} - (\gamma_1 + 1) u^{i-1} + \sum_{k=2}^{K} \alpha_k f^{i-k} - \gamma_k u^{i-k} \equiv \sigma z^i + \alpha^T \vec{f} - g^T_{[1:]}\, \vec{u}_{[1:]} \tag{3.24}
\]
\[
s^i = s^{i-1} + u^i. \tag{3.25}
\]
The vector α has been introduced to increase the accuracy of the integration scheme, and the vector notation g \equiv [1, \gamma_1, \ldots, \gamma_K]^T, α, and \vec{u} \equiv [u^i, u^{i-1}, \ldots, u^{i-K}]^T to simplify the presentation. The slicing operator [1:] indicates that only the elements from \gamma_1 onward are used in (3.24). The Python convention for slicing will be used, where vector numbering starts from zero and the slice [i, j) corresponds to i, i+1, \ldots, j-1.
The damping coefficients in the above equation should be related to the velocity autocorrelation function, x \equiv [\sigma_1, \ldots, \sigma_K]^T. The members should be equal to the averages \sigma_j \equiv \langle u^i\, u^{|i-j|} \rangle. In our nondimensionalized units, the velocity u should be normally distributed with variance 1, leaving \sigma_0 = 1. These numbers are covariances and should be distinguished from σ, the scalar standard deviation of the Gaussian random walk process in Eq. 3.24.
3. Equations 3.24 and 3.28 use a notation for z following a standard normal distribution, so that P(w = z\sigma) = N_1(w, \sigma^2) \equiv (2\pi\sigma^2)^{-\frac{1}{2}} e^{-\frac{w^2}{2\sigma^2}}.
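The update itself is only a few lines. A minimal sketch of Eqs. 3.24-3.25 for a single coordinate (parameter values are arbitrary placeholders, and the deterministic forces are switched off so only the thermostat acts):

import numpy as np

K = 2
g = np.array([1.0, -0.5, 0.1])            # g = [1, gamma_1, gamma_2] (hypothetical)
sigma = 0.6                               # random-force scale (hypothetical)
rng = np.random.default_rng(1)

u_hist = np.zeros(K)                      # [u^{i-1}, ..., u^{i-K}]
s = 0.0
for step in range(5):
    u_new = sigma * rng.normal() - g[1:] @ u_hist     # Eq. 3.24 with f = 0
    s += u_new                                        # Eq. 3.25
    u_hist = np.concatenate([[u_new], u_hist[:-1]])   # shift the velocity history
    print(step, u_new, s)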
Since we have assumed the memory function decays to zero in K time steps, Eq. 3.24 defines a K+1 step Markov process. A complete description of the integrator state is given by any set of K consecutive configurations \{\tau\}^{i-1}_{i-K} \equiv \vec{\tau}. The analogue of the phase-space density (W) for this process is the K step probability distribution, P(\vec{\tau}). The traditional single-step PDF is just a marginal distribution of this PDF. Propagation of the Markov process in time (going between circled areas in Fig. 3.1) generalizes Eq. 3.22 by including any other integrator state variables. It is expressed mathematically by
\[
P_i(\tau^i, \tau^{i-1}, \ldots, \tau^{i-K+1}) = \int P(\tau^i, \vec{\tau})\, d\tau^{i-K}, \tag{3.26}
\]
with
\[
P(\tau^i, \vec{\tau}) = P(\tau^i|\vec{\tau})\, P_{i-1}(\vec{\tau}). \tag{3.27}
\]
Finally, the integrator update equations (3.24) imply the following transition probability density.
\[
P(s^i, u^i, f^i|\vec{\tau}) = \delta\big(f^i - f(s^i)\big)\, \delta\big(s^i - (s^{i-1} + u^i)\big)\, N_1\big(u^i - \alpha^T \vec{f} + g^T_{[1:]}\vec{u}_{[1:]},\; \sigma^2\big) \tag{3.28}
\]
\[
= P(f^i|s^i)\, P(s^i|u^i, s^{i-1})\, P(u^i|\vec{f}, \vec{u}_{[1:]}). \tag{3.29}
\]
We can now see that the most complicated actor in this equation is the normally distributed ui which
must hold the balance between the stochastic and frictional forces.
At equilibrium, the Markov process is said to be "stationary", since the PDF is invariant to (3.26) – i.e.
\[
P_i(\vec{\tau}) = P_{i-1}(\vec{\tau}) \equiv P_{eq}(\vec{\tau}). \tag{3.30}
\]
The goal of this derivation is to find conditions on {α, g, x, σ} such that the equilibrium PDF of
~τ is stationary under the numerical integration process. To date, we have considered the problem
of finding g[1:] = [γ1 , . . . , γK ]T from any specified autocorrelation function x = [σ1 , . . . , σK ]T , and
assumed a standard numerical integration scheme (viz. Verlet, above) would suffice for the deterministic force terms. For the FDT analogue considered, there should be K constraints on g (one for
each σi ), plus one relation between σ and g (to ensure σ0 = 1). These will completely determine g
and the scale of the random noise from x. We find that it is more convenient to choose σ and then
determine the last member of the autocorrelation function from it.
3.4.2 Problem Solution
Solving Eq. (3.26) for a set of conditions analogous to a discrete FDT is, in general, difficult. However, by analogy to the stationary distribution's invariance to iL, we can start by assuming that the usual position and velocity updates due to the Hamiltonian leave the canonical distribution stationary. In this section we therefore consider a one particle case where no deterministic force is applied during the dynamics. This assumes that the bath (memory function and random force effects) responds to each particle independently.
In a first attempt, the K = 0, 1, and 2 cases were manually worked out. The following general solution strategy emerged. Assuming the velocities at each step have a multivariate normal distribution, the (K × K) variance-covariance matrix contains all the information about the equilibrium distribution. Doing the multiplication (3.28) to get 3.27 results in a new (K+1) × (K+1) covariance matrix for the joint distribution. Because integration over any coordinate in a multivariate normal distribution just eliminates its row and column from the variance-covariance matrix (leaving the rest unchanged), stationarity implies that the upper-left K × K sub-block4 must equal the lower-right (original) sub-block. This solution strategy can be generalized to any K with a little matrix algebra. There may also exist an argument by induction, but this strategy has not been pursued here.

4. To see this without writing down the matrix: from the layout of \vec{u}, the diagonal elements of the variance-covariance matrix are the variances \langle u^2_{i-j} \rangle, j = 0, 1, \ldots, K going down and to the right. Integrating over u^{i-K} gives the upper-left sub-block and integrating over u^i gives the lower-right sub-block (original covariance matrix).
For a system at equilibrium, it must be true that the marginal distribution of the velocity, P(u^i), and the corresponding distribution of the force have mean zero (in the system frame of reference). From Eq. (3.28), we are adding Gaussian random noise at each step, and from the central limit theorem we can infer that all marginal distributions of the velocity must therefore be Gaussian with
zero mean. Postulating a multivariate normal distribution for P_{eq}(\vec{u}_{[1:]}) with mean 0 and penalty matrix P_K gives
\[
P(\vec{u}_{[1:]}) \sim N_K(\vec{0},\, P_K^{-1}) \tag{3.31}
\]
\[
P(u^i, \vec{u}_{[1:]}) = (2\pi)^{-(K+1)/2}\, \sigma^{-1}\, |P_K|^{1/2} \exp\left\{-\frac{1}{2}\left[\vec{u}^T_{[1:]} P_K \vec{u}_{[1:]} + \left(\frac{g^T \vec{u}}{\sigma}\right)^2\right]\right\} \sim N_{K+1}(\vec{0},\, P_{K+1}^{-1}). \tag{3.32}
\]
The penalty matrix is the inverse of the variance-covariance matrix.
\[
P_{K+1}^{-1} =
\begin{bmatrix}
1 & \sigma_1 & \cdots & \sigma_K \\
\sigma_1 & 1 & \cdots & \sigma_{K-1} \\
\vdots & & \ddots & \vdots \\
\sigma_K & \sigma_{K-1} & \cdots & 1
\end{bmatrix}
\equiv
\begin{bmatrix}
1 & x^T \\
x & P_K^{-1}
\end{bmatrix} \tag{3.33}
\]
The terms in the exponent must satisfy
\[
\vec{u}^T \left(
\begin{bmatrix}
0 & \vec{0}^T \\
\vec{0} & P_K
\end{bmatrix}
+ \frac{g\, g^T}{\sigma^2}
\right) \vec{u} = \vec{u}^T P_{K+1} \vec{u}. \tag{3.34}
\]
To find the relation between x and g, we can use the relationship between P_{K+1} and its inverse (P_{K+1} \cdot P_{K+1}^{-1} = I), and let P_K^{-1} = A\Lambda A^T be the eigenvalue decomposition of the inverse of the positive-definite penalty matrix. Substituting (3.33) and (3.34) gives one intermediate step
\[
\begin{bmatrix} g & \begin{matrix}\vec{0}^T \\ A\end{matrix} \end{bmatrix}
\begin{bmatrix} \sigma^{-2} & \vec{0}^T \\ \vec{0} & \Lambda^{-1} \end{bmatrix}
\begin{bmatrix} g^T \\ \begin{matrix}\vec{0} & A^T\end{matrix} \end{bmatrix}
\cdot
\begin{bmatrix} 1 & x^T \\ x & A\Lambda A^T \end{bmatrix} = I. \tag{3.35}
\]
The first three matrices are the decomposition of P_{K+1} and prove that the normalization constants of Eq. 3.32 work out correctly, since |P_{K+1}| = |P_K|\, (g_{[0]}\, |A|)^2\, \sigma^{-2} = |P_K|\, \sigma^{-2}.
Solving for the diagonal matrix gives
\[
\begin{bmatrix} \sigma^2 & \vec{0}^T \\ \vec{0} & \Lambda \end{bmatrix}
=
\begin{bmatrix}
2\langle g, x \rangle + 1 + g^T A\Lambda A^T g & g^T A\Lambda + x^T A \\
A^T x + \Lambda A^T g & \Lambda
\end{bmatrix}
\]
\[
\Rightarrow \quad \langle g, x \rangle = \sigma^2 - 1 \tag{3.36}
\]
\[
\text{and} \quad P_K^{-1} \cdot g = -x. \tag{3.37}
\]
Eliminating g between the two constraints gives
\[
x^T P_K\, x = 1 - \sigma^2. \tag{3.38}
\]
So that if we choose x, then compute P_K from (3.33), \sigma^2 is forced to take on a certain value. This is not a good ending, so we can try letting the last member x_{[K-1]} = \sigma_K be determined by a choice of σ. To solve for \sigma_K, we divide out the last column and row of the penalty matrix according to
\[
P_K \equiv \begin{bmatrix} U & w \\ w^T & h \end{bmatrix}, \tag{3.39}
\]
where the scalar h must be positive, since P_K is positive definite, and U = P_{K-1} + w w^T/h by analogy to (3.34) with the variable ordering switched.5
equation for σK with roots
σK =
− 1h
q
2
T
2
w, x[:K) ±
w, x[:K) + h(1 − σ − x[:K) Ux[:K) ) = − h1 w, x[:K) + γK .
(3.40)
The last equality comes from writing out the last column of constraint 3.37. The choice of sign is
arbitrarily given to γK , and we now have a single relation between σK , γK , and σ for any given x[:K) .
5. \left[\frac{w^T}{\sqrt{h}},\ \sqrt{h}\right]^T \cdot \left[\frac{w^T}{\sqrt{h}},\ \sqrt{h}\right] = \begin{bmatrix} w w^T/h & w \\ w^T & h \end{bmatrix}
Figure 3.2: Phase diagram for a 2-step Markov implementation of the GLE. \sigma_1 is along the x axis and \sigma_2 along the y (both running from -1 to 1). Shaded regions correspond to stable integration schemes. No solution for Eq. 3.41 exists outside of these regions.
The final problem left is to determine what set of σ values are allowed solutions for a given x. Examining the solution for
\[
\gamma_K^2 = \langle w, x_{[:K)} \rangle^2 + h\,(1 - \sigma^2 - x_{[:K)}^T U\, x_{[:K)}) = h\,(1 - \sigma^2 - x_{[:K)}^T P_{K-1}\, x_{[:K)}) \geq 0 \tag{3.41}
\]
implies 0 < \sigma^2 \leq 1 - x_{[:K)}^T P_{K-1}\, x_{[:K)}. For K = 1, any σ ≤ 1 is allowed; however, choosing σ and \sigma_1 for a 2-step Markov process requires \sigma^2 \leq 1 - \sigma_1^2. Note that \sigma_1 is a covariance and can be either positive or negative. It is therefore possible that even some choices of x_{[:K)} will give no solution. A simple requirement on x of any size is that x^T P_K x \leq 1. The set of allowable \sigma_1, \sigma_2 for a Markov process of size K ≥ 2 is shown in Fig. 3.2.
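For K = 1 these conditions reduce to γ_1 = -σ_1 (Eq. 3.37) and σ² = 1 - σ_1² (Eq. 3.38), so the thermostat is an AR(1) process. A quick numerical check (σ_1 is an arbitrary admissible choice) that the stationary covariances come out as designed:

import numpy as np

s1 = 0.4                             # chosen velocity autocorrelation sigma_1
gamma1 = -s1                         # Eq. 3.37 for K = 1
sigma = np.sqrt(1 - s1**2)           # Eq. 3.38
rng = np.random.default_rng(2)

u = np.empty(200000)
u[0] = rng.normal()
for i in range(1, len(u)):
    u[i] = sigma * rng.normal() - gamma1 * u[i - 1]   # Eq. 3.24 with f = 0

print(np.var(u))                 # -> 1    (sigma_0 = 1, canonical)
print(np.mean(u[1:] * u[:-1]))   # -> 0.4  (the imposed sigma_1)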
3.5 Sources of Modeling Error in Choosing Integrator Parameters
For deterministic systems, errors in modeling the time-dependent behavior are due to approximations in the update equation \tau^{i+1}|\tau^i. Well-known sources of error include errors in the chosen Hamiltonian, force field parametrization, or Taylor expansion error and loss of numerical precision in Störmer-Verlet algorithms [64]. For stochastic systems, the degree to which the bath variables
influence the dynamics of the system can also be regarded as a source of error. This section will
analyze both sources in terms of an information loss associated with performing a dynamics update
step. As will be shown, the information loss arrived at reflects the error growth with respect to a
single timestep. It gives a quantitative measure for comparing different integration schemes, and
thus any choices made in the design of the update step.
The information loss metric derived in this section is slightly different from a similarly termed metric used recently by Katsoulakis and Trashorras [92] in connection with their CGMC formalism for arriving at stochastic lattice dynamics [93]. Although both measure a relative statistical entropy between an exact and an approximating process, theirs measures the information loss over a whole trajectory, while the present discussion is confined to a single time step (of arbitrary ∆). In addition, the CGMC formalism appears only in connection with lattice systems and has not yet been applied to continuous systems.
After deriving our quantitative measure of force field approximation error for a quite general
class of stochastic integrator methods (i.e. any specifying the form of a transition probability,
P (τi+1 |τi θ)), it is natural to choose integrator parameters to minimize this error. We show that this
minimization problem is identical to minimizing an expected loss function as is commonly done
in decision theory. Consistent with the phase-space volume invariance of deterministic dynamics
considered in §3.3, the present measure is invariant with respect to any one-to-one transformation
of the coarse coordinates and is directly related to the integrator error. Finally, we note that although
the arguments here apply only to stochastic integrators, it may later be possible to recover analogous results for the deterministic case by letting the transition PDF approach zero width and taking
appropriate derivatives with respect to ∆.
Imagine fitting model parameters to the transitions of some deterministic process which has been
observed at time points τ1 → τ2 → · · · → τM . Call the set of observed transitions V . According to
(3.14), there is some function, e^{\Delta\, iL}\tau^i, which will exactly produce \tau^{i+1} for any i. Unfortunately, we are forced to choose a surrogate model by specifying parameters, θ, for our numerical integration algorithm. The practical consequence is that our model will necessarily assign a probability of less than one to e^{\Delta\, iL}\tau^i, resulting in some loss of probability along the "true" trajectory. More generally, if the underlying process is stochastic, we will assign a lower probability to the real transition distribution P(\tau^{i+1}|\tau^i), and there will be less overlap in phase space. The above difference in distributions (or overlap in phase space) can be quantified using the same argument as in Ch. 2 which led to H (2.4)
\[
H\big(P(\tau^{i+1}|V\tau^i)\,\big|\,P(\tau^{i+1}|\hat\theta\tau^i)\big) = \sum_{\tau^{i+1}} P(\tau^{i+1}|V\tau^i) \ln \frac{P(\tau^{i+1}|\hat\theta\tau^i)}{P(\tau^{i+1}|V\tau^i)} \equiv -L(\hat\theta, V, \tau^i), \tag{3.42}
\]
giving a quantitative answer to the question “How likely is the PDF induced by V after decision θ̂
has been made?”
The above definition is related to the “information loss” decision function of Jaynes [83]. The
use in that context was unclear as to the decision to be made. However it suggested interesting
properties for this type of loss function. First, the present discussion could be generalized by letting
τi+1 be anything we need to predict based on V (but for which we use the integrator model, θ,
instead). It also suggested the present connection to sufficient statistics. Since H( f |p0 ) ≤ 0 (i.e.
L ≥ 0), there is always some divergence between the PDFs unless θ has specified the transition
probability for a given τi perfectly. If L = 0, θ acts as a sufficient statistic for decisions about τi+1 in
the sense that P (τi+1 |V θ̂τi ) = P (τi+1 |θ̂τi ). Choosing θ̂ to minimize the expected information loss
over the whole set of observed τ will have hLi = 0 if and only if the complete transition PDF is
correctly represented. Otherwise, (3.42) will choose θ̂ to be as close to sufficient for decisions about
system dynamics as possible.
Although Eq. 3.42 was stated for any P(\tau^{i+1}|V\tau^i), in order to do a numerical computation a class of probability distribution functions for this transition must be considered as part of the prior information.6 As shown above, specifying a form for the integrator [including the functional form of the Hamiltonian (e.g. pairwise distance energies plus angle energies plus torsion energies)] completely specifies this transition probability form up to the adjustable parameters, θ.

6. Prior probabilities are always present in probability statements, even if we don't explicitly write them – a fact made clear by the irrelevance of most prior information to the problem at hand.
\[
L(\hat\theta, V, \tau^i) = -\sum_{\theta, \tau^{i+1}} P(\theta|V)\, P(\tau^{i+1}|\theta V \tau^i) \ln \frac{P(\tau^{i+1}|\hat\theta\tau^i)\, P(\tau^{i+1}|\theta\tau^i)}{P(\tau^{i+1}|\theta\tau^i)\, P(\tau^{i+1}|V\tau^i)}
= \sum_{\theta} P(\theta|V)\, L(\hat\theta, \theta, \tau^i) - \left\langle \ln \frac{P(\tau^{i+1}|\theta\tau^i)}{P(\tau^{i+1}|V\tau^i)} \right\rangle \tag{3.43}
\]
\[
L(\hat\theta, \theta, \tau^i) \equiv -\sum_{\tau^{i+1}} P(\tau^{i+1}|\theta\tau^i) \ln \frac{P(\tau^{i+1}|\hat\theta\tau^i)}{P(\tau^{i+1}|\theta\tau^i)} \tag{3.44}
\]
This form makes it clear that averaging over the information loss (3.44) should proceed over the
posterior distribution of the parameters given the observations (θ|V ). The quantity on the right hand
side of (3.43) does not depend on θ̂, so it is not important for choosing parameters, but conveys an
extra information loss due to uncertainty in P (θ|V ).
Definition 3.42 is also invariant with respect to any one-to-one transformation of coordinates,
τi+1 . This is trivially true for the sum above. It also works in the continuous limit, since probabilities appear only as a ratio in the logarithm. As a consequence of this invariance, the integration
scheme 3.24 has a neat expression for L in terms of force.
\[
L[\hat\theta, \theta, \tau^i] = \frac{1}{2}\left[\ln\frac{\hat\sigma^2}{\sigma^2} + \frac{\sigma^2}{\hat\sigma^2} - 1 + \frac{1}{\hat\sigma^2}\left(\alpha^T(\hat{f} - f) - (\hat{g} - g)^T \vec{u}\right)^2\right] \tag{3.45}
\]
The above equation assumes that α is fixed and that the observed data includes sets of K+1 velocities \vec{u} and coordinates \vec{s}_{[1:]}. For a derivation, see Ch. 5.
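Eq. 3.45 is a two-Gaussian Kullback-Leibler divergence and transcribes directly into code. A sketch (all arguments are hypothetical stand-ins for observed histories and candidate parameters):

import numpy as np

def info_loss(sig_hat, sig, alpha, f_hat, f, g_hat, g, u):
    # Eq. 3.45: divergence between two Gaussian transition PDFs for u^i
    # differing in noise scale (sigma), forces (f), and damping vector (g).
    mean_shift = alpha @ (f_hat - f) - (g_hat - g) @ u
    return 0.5 * (np.log(sig_hat**2 / sig**2)
                  + sig**2 / sig_hat**2 - 1
                  + mean_shift**2 / sig_hat**2)

# Identical parameters give zero loss; any mismatch is penalized.
a = np.array([0.5, 0.25]); fv = np.array([0.1, -0.2])
gv = np.array([-0.4, 0.1]); uv = np.array([0.3, 0.0])
print(info_loss(0.6, 0.6, a, fv, fv, gv, gv, uv))   # 0.0
print(info_loss(0.7, 0.6, a, fv, fv, gv, gv, uv))   # > 0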
The properties of the information loss metric make a good case for its use in parametrization of
molecular force-fields P(R′ |R). To see how this would work in practice, consider the GLE model
(3.45). The transition probability gives a specific form for P (V |θI) which can be used to generate a
posterior distribution via Bayes’ theorem.
\[
P(\theta|VI) = \frac{P(V|\theta I)\, P(\theta|I)}{P(V|I)} \tag{3.46}
\]
This model is necessarily an approximation to the "true" system dynamics, and the loss criterion (3.45) focuses the particular choice of transition probability (\hat\theta = \arg\min \langle L[\hat\theta, \theta, \tau^i] \rangle) around the observed dynamics.
The above framework allows us to discuss a currently unsolved problem in coarse-grain simulation design, namely the choice of mesostates. It is usually assumed that the experimenter will have
enough chemical intuition to decide which interactions are important for the system behavior (and
should be represented on the CG level). However, in complicated many-body environments, these
choices do not usually come as easily, requiring an actual dynamics simulation using each possible CG model under consideration. This guess-and-check procedure could be circumvented by comparing the values obtained at the minimum of Eq. 3.45. These reflect the ability of particular choices
of mesostates and terms in the potential energy function to represent the transition probability, and
can aid in justifying a specific choice. Comparisons between entirely different transition PDFs can
also be made.
Some extensions of the above remain to be proved. Following Jaynes [81], there should be a subjective H theorem giving limits on the information entropy of \tau^{i+1} introduced at each stochastic integration step. However, extending his ideas at this level of generality is not straightforward, since if the process always collapses toward a certain state (e.g. every step makes \tau^{i+1} = 1), then H[P(\tau^{i+1})] will be less than H[P(\tau^i)]. To get Jaynes' result, it is necessary to assume a deterministic
underlying dynamical process and consider the spreading out of paths in probability space when
only a single guess about the bath variables, r, is made (as in Ref. [92]). This direction holds the
possibility of connecting (via some inequalities) the information loss, above, with the entropy of the
transition probability and the uncertainty of the bath variables.
One final note is that the transition probability (P (τi+1 |τi )) itself can also be a source of error if
it is less ordered than the underlying process. The divergence of similar trajectories as time proceeds
is the root cause of the butterfly effect in chaos theory. Although our stochastic integrator should
produce trajectories exhibiting variation over the whole range of path space allowed, intuition tells
us that large random variations should not happen to conserved variables. The solution is to add
conservation laws to the update steps, referencing more information about the subsystems. Useful
steps in this direction have been taken by the DPD community [68, 124, 48, 15, 132].
Chapter 4
Free Energy Inference for a Multinomial Counting Problem
This chapter describes the formulation and implementation of a free energy inference scheme [140].
The problem is to use data from simulation frames to calculate the solvation free energy of a
molecule with coordinates ~x1 in a bath of molecules specified by coordinates ~xN . The Potential
Distribution Theorem [177, 19, 134] (PDT) expression for this quantity is
\[
e^{-\beta\mu^{ex}(\vec{x}_1)} = \left\langle e^{-\beta\Delta E} \,\middle|\, \vec{x}_1 \right\rangle_0, \qquad \Delta E \equiv E_{N+1}(\vec{x}^N, \vec{x}_1) - E_N(\vec{x}^N) - E_1(\vec{x}_1) \tag{4.1}
\]
To motivate the form of Eq. 4.1, consider the following re-weighting problem. Assume the
distribution function for a variable, ~x, given initial information, A, is a known function P (~x|A). How
can this function be used to find the distribution resulting after adding new information, B? This is
the type of problem for which Bayes’ theorem is well suited.
\[
P(\vec{x}|AB) = \frac{P(\vec{x}B|A)}{P(B|A)} = \frac{P(B|\vec{x}A)\, P(\vec{x}|A)}{\int P(B|\vec{x}A)\, P(\vec{x}|A)\, d\vec{x}} \tag{4.2}
\]
For the case of molecular solvation, let A represent the proposition that the system is in an ideal
gas state, i.e. every molecule is isolated and has only internal interactions. Calling the potential
energy of this system EA and applying the canonical distribution gives for P (~x|A)
\[
P(\vec{x}|A) = \frac{e^{-\beta E_A}}{\int e^{-\beta E_A}\, d\vec{x}}. \tag{4.3}
\]
Now let B represent the proposition that a term EB has been added to the potential energy function.
Similar to Eq. 4.3, P (~x|AB) will thus be the canonical distribution with potential energy EA + EB . To
make this ‘addition’ proposition obey the rules of formal logic, both A and B should be understood
as distinct properties of the potential energy function, occupying different positions (say 1 and 2) in
a list of additive terms.
In order to find the re-weighting factors, W(\vec{x}) = P(B|\vec{x}A)/\int P(B|\vec{x}A)\, P(\vec{x}|A)\, d\vec{x}, these two canonical distributions can be inserted into Eq. 4.2 to give
\[
P(B|\vec{x}A) = c(A)\, e^{-\beta E_B}, \tag{4.4}
\]
with some proportionality constant, c(A). The existence of this constant implies we can only get
ratios of probabilities between alternate systems, B. The appropriate re-weighting measure is the
ratio of Eq. 4.4 to the unperturbed case.
\[
\frac{P(B|\vec{x}A)}{P(E_B = 0|\vec{x}A)} = e^{-\beta E_B} \tag{4.5}
\]
Eq. 4.5 gives the instantaneous likelihood for changing the problem specification (by introducing
EB ) at a given phase space point, ~x.
Because B has been worked to the left-hand side, the relative probabilities of two different
perturbations (B, EC = 0) and (BC) to the energy can now be determined.
\[
\frac{P(BC|A)}{P(B, E_C=0|A)} = \frac{P(C|AB)\, P(B|A)}{P(E_C=0|AB)\, P(B|A)} = \frac{P(C|AB)}{P(E_C=0|AB)} = \int \frac{P(C|AB\vec{x})}{P(E_C=0|AB\vec{x})}\, P(\vec{x}|AB)\, d\vec{x} \tag{4.6}
\]
This novel formula directly reveals the influence of the instantaneous potential energy jump likelihoods (Eq. 4.5) on the average likelihood for a system to switch potential energy functions (Eq. 4.6).
The latter is simply an average over the former!
To compare with Eq. 4.1, e^{-\beta\mu^{ex}(\vec{x}_1)} becomes P(BC|A)/P(B, E_C=0|A) when we assign the following values to the propositions, B, C: B couples the solvent molecules together in solution [E_A + E_B = E_1(\vec{x}_1) + E_N(\vec{x}^N)], and C couples the solvent to the solute [E_C = \Delta E(\vec{x}^N, \vec{x}_1)]. The generality of the
above argument allows us to insert any other desired constraint in addition to A (e.g. fix the solute
position, ~x1 ) as part of the original problem specification, as long as it is consistently treated as a
given constraint. The physical content of Eq. 4.6 is just as stated in § 1.4, that the total solvation
likelihood is composed of an average over the instantaneous “coupling” likelihoods for all solvent
configurations.
Moreover, by arbitrarily re-defining A, B,C, the likelihood expressed in Eq. 4.6 can also be understood as the likelihood for changing between any two potential energy functions. The connection
to chemistry should be obvious. These potential energy functions describe endpoints of the reaction
B → BC.
This derivation is unique in that it uses logical inference instead of mathematics to derive the traditional free energy perturbation formula,
\[
\frac{P(E_D = E_B + E_C|A)}{P(B|A)} = \frac{\int P(D|\vec{x}A)\, P(\vec{x}|A)\, d\vec{x}}{\int P(B|\vec{x}A)\, P(\vec{x}|A)\, d\vec{x}} = \frac{\int e^{-\beta(E_B + E_C)}\, P(\vec{x}|A)\, d\vec{x}}{\int e^{-\beta E_B}\, P(\vec{x}|A)\, d\vec{x}} \tag{4.7}
\]
\[
= \int e^{-\beta E_C}\, P(\vec{x}|AB)\, d\vec{x}, \tag{4.8}
\]
where the last step can be verified by writing out the canonical distributions for all of the PDFs in Eqs. 4.7 and 4.8.
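Numerically, Eq. 4.8 is the familiar one-step (exponential averaging) free energy perturbation estimator: sample configurations from the B system and average e^{-βE_C}. A minimal sketch on a toy problem with a known answer (the Gaussian ensemble and linear perturbation are hypothetical choices made so the exact result, βΔF = -1/2, is available):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100000)     # samples from the AB ensemble (beta*E_B = x^2/2)
beta_EC = x                     # hypothetical perturbation: beta*E_C = x

# Eq. 4.8: beta*dF = -ln < exp(-beta*E_C) >_B
print(-np.log(np.mean(np.exp(-beta_EC))), "exact:", -0.5)

The same estimator degrades sharply when the B and BC energy distributions stop overlapping, which is precisely the failure analyzed below.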
As can be imagined from their direct connection to chemical reactions, accurate evaluations of
free energies are crucial for understanding a wide range of chemical and physical phenomena. These
include chemical reactions in solution[86, 97, 98, 49, 69], solvation free energies[86, 97, 98, 154, 38,
153, 180, 103, 117], potentials of mean force between complex molecular species in water[88, 14],
acid-base equilibria[156, 34], ligand/drug binding interactions[86, 98, 87, 171, 118, 39], and phase
equilibria[151].
4.1 Quasi-Chemical Theory Division
With the free energy perturbation formula (Eq. 4.6) in hand, it is now possible to try to estimate \beta\mu^{ex}(\vec{x}_1) from sampling data. A direct approach is possible using either Monte-Carlo (MC) integration of Eq. 4.1 or via estimation of the relative probabilities of any quantity for substitution into Eq. 4.2.
\[
\frac{P(\vec{x}|AB)}{P(\vec{x}|ABC)} = \frac{P(\vec{x}, E_C=0|AB)/P(E_C=0|AB)}{P(\vec{x}C|AB)/P(C|AB)} = \frac{P(C|AB)}{P(E_C=0|AB)}\, e^{\beta E_C(\vec{x})} = e^{\beta(E_C(\vec{x}) - \Delta F_{B \to BC})} \tag{4.9}
\]
This second method is the basis for several advanced treatments such as the histogram overlap[165],
Bennett acceptance ratio (BAR)[21], and weighted histogram (WHAM)[101, 53] methods. It is easy
to see that the major difficulty in using the second formula comes from the determination of the ratio
on the left hand side of (4.9). To calculate this ratio, two simulations (at EA + EB and EA + EB + EC )
can be carried out and the probabilities numerically estimated. To infer the free energy change, the
ratio must be well-sampled. Moving P (~x|ABC) to the right and integrating over all configurations
having the same numerical value of ε \equiv \beta E_C pools as many samples as possible for each energy.
\[
P(\varepsilon|AB) = P(\varepsilon|ABC)\, e^{\varepsilon - \beta\mu^{ex}(\vec{x}_1)} \tag{4.10}
\]

Figure 4.1: Methane to SPC water energy distribution, re-printed from [140]. Probability [(kJ/mole)^{-1}, log scale] vs. interaction energy (kJ/mole) for the non-interacting and interacting systems. Solid and dotted lines show the forward and backward distributions of instantaneous switching energies (-\ln of Eq. 4.5, where A is the uncoupled system [water-water interactions only], and B is the coupling between methane and SPC water). Note that the overlapping region is sparsely sampled (occupying less than ≈0.2% of the uncoupled distribution). Two observed single-count outliers at the right of the coupled distribution can be seen in the plot.
So even using all the samples available to infer the free energy using this method, the ratio may
be poorly known if the energy distributions do not overlap. It is now easier to see potential problems
using the first method in this context. Eq. 4.1 is an integral over the re-weighted distribution of
EC . However, such a re-weighting can only be successful if these energy distributions overlap.
Without overlap in the energy distributions, both forward and backward (B → BC, particle insertion
or BC → B, deletion, respectively) formulations fail.
An illustration of this failure was provided by Fig. 1 of Rogers and Beck [140]. Fig. 4.1 plots the distributions of E_C|AB (non-interacting) and E_C|ABC (interacting) for the case of methane solvation in water. Those configurations whose energies are most important for characterizing the solvated
state occupy only a small fraction of the total configurational space of the uncoupled system. For
the displayed case, it is clear that, even though methane is a reasonably small-sized molecule, large
simulation times are required to adequately sample the low-energy tail of the uncoupled distribution
(as it occupies only 0.2% of the sample space). Trying an exponential averaging formula for the
fully coupled case gives a 16 kJ/mol error in the free energy, since the high-energy exponential tail
is not adequately sampled during simulations. Larger molecules such as CF4 become even more
problematic [13], and no detectable overlap would occur for a small peptide, for example.
Because of the known limitations of these methods, physically motivated free energy pathways
have been sought out. These are based on the idea that useful approximations to separate parts can
be arrived at from chemical intuition, and/or that numerical estimation will be accurate for small
perturbations. In drug screening computations on large sets of potential ligands, a crude estimation is
usually made by directly approximating the free energy for a single step. A more common example
of the pathway idea is the separation of the process into the initial solvation of a Lennard-Jones (LJ)
solute C (re-defining the above propositions), then “turning on” its electrostatic interactions with
solution, D. This makes the compound pathway B → BC → BCD. Good approximations to the
first and second steps, one-step (OS) perturbation and linear interaction energy (LIE) [127, 152],
respectively, have been combined in recent reports and provide a better trade-off of accuracy vs.
efficiency than was possible with the single-step approximations mentioned above.
Another idea is to consider the reactions
\[
B \xrightarrow{OS,HS} BC_nD_n \xrightarrow{OS,LR} BC_nD \xrightarrow{IS} BD, \tag{4.11}
\]
where we have defined the propositions as follows. The uncoupled solvent-solvent and solute-solute
intermolecular interactions are represented by B and all the solute-solvent cross-interactions by D.
Separating out the gas phase interaction energy with n distinguished solvent molecules from E_D = \Delta E gives E_D(\vec{x}^N, \vec{x}_1) = E_{D_n}(\vec{x}^n, \vec{x}_1) + E_{D_{N-n}}(\vec{x}^N, \vec{x}_1). Thus the central step in the above corresponds to the coupling of a solute plus n water complex to the remaining N-n solvent molecule bath.
In order to consider chemically realistic solvation structures, let Cn correspond to a new constraint
on the structure of the n distinguished water molecules – arbitrarily chosen to form the solute’s
inner-shell (IS) solvation layer. Quasi-chemical theory (QCT) defines B → BCn Dn as the gas-phase
association reaction between the solute and a specific structure of n solvent molecules, as well as
the creation of structure Cn in bulk. The solvation of the solute plus n solvent molecule complex is
the next step, followed by the removal of constraint Cn .
The above idea has been tested in detail by many recent studies [129, 150, 13, 14, 140] for the choice of n = 0. In this case, E_{D_n} = 0, and the only consideration is the likelihood of formation for the solvent structure C_0. A simple definition of the IS region as a sphere of radius λ around the center of every solute atom was used to define C_0. If any solvent center encroaches on this sphere, n ≥ 1 and thus C_0 is not formed. These considerations go equally for the B \xrightarrow{OS,HS} BC_0 step as for a hypothetical BD \xrightarrow{-IS} BC_0D step, where we can appeal to Bayes' theorem to directly calculate
\[
e^{-\beta\mu^{ex}_{OS,HS}} = \frac{P(BC_0|A)}{P(B|A)} = P(C_0|AB). \tag{4.12}
\]
This is the probability for spontaneous formation of the (cavity) structure C0 in the AB (uncoupled)
system. Note that this calculation cannot be worked in the reverse because C0 is a restriction on
phase space volume and not simply an energy function which can be added and then subtracted
later.
\[
e^{-\beta\mu^{ex}_{IS}} = \frac{P(BD|A)}{P(BC_0D|A)} = \frac{1}{P(C_0|ABD)} \tag{4.13}
\]
\[
\neq \frac{P(BC_0D\bar{C}_0|A)}{P(BC_0D|A)} \tag{4.14}
\]
QCT is an important conceptual division of the solvation process, since it is based on chemically relevant intermediates. Using the hard sphere definition of C_0 above, scaled-particle theory [11, 12, 140, 78] has been found to give excellent approximations to the OS,HS step. In addition, a
12, 140, 78] has been found to give excellent approximations to the OS, HS step. In addition, a
linear solvation energy approximation (cumulant expansion to second order) works very well for
the second step at IS radii fairly well sampled in a standard coupled (ABD) simulation [73, 20, 140].
With the QCT division in place, the remainder of this chapter will focus on the specific problem
of inferring the three components of the free energy.
4.2 Formulation of the Inference Problem
The problem of determining the IS and OS,HS free energy components is simply to determine the
probability of a specific molecular event, C0 , the formation of a cavity of size λ at a specified point
in solution. This is the first inference problem worked out in this dissertation, so the formulation
and solution will be given in detail. Stated in the general terms used here, this problem setup should
be applicable to inference on the probability of any set of observable molecular events. This section
also explains how to extend the range of sampled events and estimate usually rare probabilities with
greater precision.
Before writing down any equations, the problem should be simplified as much as possible. To
consider arbitrary cavity sizes, we’ll need the probabilities for a greatly expanded set of events –
instantaneous cavity formations of any size. To do this, note that C0 (λ) is satisfied if and only if
the distance from our observation point to the closest solvent center is larger than λ. Most studies
using water neglect the positions of hydrogen atoms, and define the water center as the location
of the oxygen atom. Therefore, for each configuration (~x) the distance to the closest water oxygen
[rmin (~x)] is a “sufficient statistic” for the inference P (C0 |{~x}) = P (C0 |{rmin }). We can easily store
the value of this statistic for every simulation frame.
There is one further piece of information needed before carrying out the analysis. The distribution that generated the observed sample counts is important at this stage. Even though inference
can be done based only on the sample data, the statement we want to make is something like “the
estimated probability is 1/2 in the coupled system”. For $\mu^{ex}_{OS,HS}$, this distribution is B in (4.11). For $-\mu^{ex}_{IS}$, it is D. This means that the name of the generating distribution was retained, while only the mathematical structure [e.g. the $P(\vec{x}|D)$ in $P(C_0|D) = \int I(C_0;\vec{x})\,P(\vec{x}|D)\,d\vec{x}$] was discarded. This detail lets us try to include samples from more than one underlying distribution to improve the precision of the analysis.
Using samples from more than one underlying distribution is particularly simple if the mathematical structure of the distribution remains the same, while only variations in the available phase
space distinguish them. That is to say we are sampling from the same distributions but placing
restrictions on which states the system is allowed to occupy.
To see how this might be done, consider the conditional expansion
$$P(C_0(\lambda_L)) = P(r_{min} > \lambda_L) = P(r_{min} > \lambda_L \,|\, r_{min} > \lambda_1)\,P(r_{min} > \lambda_1) = \ldots \qquad (4.15)$$
This expansion can be carried out ad infinitum by recursively sub-dividing the interval between the
radius of interest, λL , and 0. Intuition suggests that a simulation including the condition rmin > λ1
by, e.g. including a hard sphere of that size during MC sampling, would have good sampling for the
statistic rmin in a region not too much larger than λ1 .
This means that carrying out several such growth steps would allow observations out to large
λ needed for free energy inference. Now that any number of simulations with hard sphere radii
below that of interest can be used in the calculation of the result, what is the proper way to combine
simulations including hard spheres at each λ?
Since we want to numerically compute the free energy profile $[-\ln P(r_{min} > \lambda_i) \equiv -\ln p_i]$ with respect to λ at a set of points, $\lambda_i = hi$, it makes sense to divide the possible values for $r_{min}$ into
corresponding shells with total probability
$$s_k = P(r_{min} \in (\lambda_k, \lambda_{k+1}]) \qquad (4.16)$$
such that
$$p_j = 1 - \sum_{k=0}^{j-1} s_k. \qquad (4.17)$$
This division further reduces the sufficient statistics from the set of all observed rmin values to the
shell that each rmin falls within. Assuming independent sampling for simulation j, the total sample
count in each bin, x j (k), (labeling bins with k = j, . . . , L) are now a sufficient set of data from a
simulation with condition C0 (λ j ).
Figure 4.2: Division of the minimum solute–solvent distance ($r_{min}$) into successive shells [140]. Derived quantities are shown above the axis (from bottom to top): probabilities of $r_{min}$ falling in each shell, $s_k$, and cavity probabilities $p_i$. The logical organization of simulation data is shown below the axis (from top to bottom): bin counts $x_j(k)$ from simulations including hard spheres of size $\lambda_j$, total samples from each simulation, $N_j$, and bin totals $\zeta_k$.
This division is pictured in Fig. 4.2 (reprinted from Ref. [140]). As indicated in Fig. 4.2, all of the indices refer to distances; however, different variables have different ranges. Simulations
are carried out at λ0 = 0 and any desired λ j (as long as it falls on a shell boundary jh). This
corresponds to a special case of simulations at all jh, where no counts were taken at some λ j -s.
From these simulations, counts are collected in all bins, sk , k = 0, 1, . . . , Λ, where Λ is the largest L
to be considered. When computing a single pL , λL+1 is tacitly taken to be ∞ for that computation
and so all counts with rmin > λL from simulation j should contribute to x j (L). This means that in
a serial processing case, x j (L) are always computed by subtracting all other x j (k) from total counts
from each simulation, N j . Last, p0 and δ0 can be taken to be 1, as they express likelihood ratios of
the unconstrained to the unconstrained system.
The following points are important to consider when choosing a suitable definition for C0 (the
excluded volume in our case). First, a suitable set of λ j must be chosen to increase sampling where
existing data cannot determine the free energy with great precision. Next, if the system already
contains constraints, i.e. if λ0 was not simulated, then everything can be shifted to the left and the
output will indicate the free energy difference from the lowest λ system. Finally, the distance to the
closest solvent center, $r_{min}$, can be re-defined as any continuous variable on which constraints are sequentially added. An example of this is shifting $r_{min}$ by defining $r'_{min} = \min(\{r_{IJ}(\vec{x}) - \sigma_{IJ}\})$ for
any set of solute centers, I, and solvent centers, J. In order to get the true free energy profile from
the unconstrained system in this case, the set of λ should be chosen such that the smallest does not
place any constraint on the system.
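To make the statistic concrete, a minimal sketch of computing the shifted statistic $r'_{min}$ for one configuration follows; the array names and the pre-tabulated contact-distance matrix sigma are illustrative assumptions, not part of the original analysis.

import numpy as np

def shifted_rmin(solute_xyz, solvent_xyz, sigma):
    # r'_min = min_{I,J}(r_IJ - sigma_IJ) over solute centers I and solvent centers J.
    # solute_xyz: (nI, 3), solvent_xyz: (nJ, 3), sigma: (nI, nJ) contact distances.
    d = np.linalg.norm(solute_xyz[:, None, :] - solvent_xyz[None, :, :], axis=-1)
    return float((d - sigma).min())

Storing this single number per frame is all the inference below requires.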
As shown in Fig. 4.2, it is also helpful to define simulation count totals,
$$N_j = \sum_{k=j}^{L} x_j(k), \qquad (4.18)$$
and bin totals
$$\zeta_k = \sum_{j=0}^{k} x_j(k), \quad k = 0, 1, \ldots, L-1$$
$$\zeta_L = \sum_{j=0}^{L-1} x_j(L) = \sum_{j=0}^{L-1} N_j - \sum_{k=0}^{L-1} \zeta_k. \qquad (4.19)$$
These will become important in simplifying some of the algebra showing up in the next section.
4.3 Solution of the Inference Problem
From the above shell probabilities (Eq. 4.16), a probability distribution for the free energy of the constraint $C_0(\lambda_L)$ given only the data counts and their possible ranges can be constructed. Doing that results in a logarithm of a sum of $s_k$-s. This is not appealing, since attempting to analytically find its probability distribution from the distribution of $\{s_k\}_0^L$ will be difficult. A slightly nicer expression should result from considering the conditional expansion of Eq. 4.15.
$$P(C_0(\lambda_L)) = \prod_{i=1}^{L} \delta_i \qquad (4.20)$$
$$\delta_i \equiv P(r_{min} > \lambda_i \,|\, r_{min} > \lambda_{i-1}) \equiv p_i/p_{i-1} \qquad (4.21)$$
To carry out the inference, we just need a prior probability for the $s_k$-s and the likelihood of observing the counts $x_j(k)$ (shown schematically in Fig. 4.2) given $s \equiv \{s_k\}_0^L$. The prior of Haldane and Zellner [183] (i.e. an improper Dirichlet distribution with zero initial observations),
$$P(s|I) = \prod_{k=0}^{L} s_k^{-1}, \qquad (4.22)$$
is an appropriate non-informative choice by the arguments of Jaynes [83]. The likelihood function is described by a multinomial distribution conditional on the first $j-1$ counts being zero:
$$P(\{x_j(k), k = j, \ldots, L\} \,|\, N_j, s, \{x_j(k) = 0, k = 0, \ldots, j-1\}) = \frac{N_j! \prod_{k=j}^{L} (s_k/p_j)^{x_j(k)}}{\prod_{k=j}^{L} x_j(k)!} = A(x)\, p_j^{-N_j} s_L^{x_j(L)} \prod_{k=j}^{L-1} s_k^{x_j(k)}, \qquad (4.23)$$
where A is some function of x in which we are not interested, since our final goal is the posterior
probability of s (and thus p).
Noting that the full likelihood is the product of likelihoods for all simulations and discarding
terms not involving the $s_k$ gives:
$$P(s|\{x_j(k)\}, N, I) \propto P(s|I) \prod_{j=0}^{L-1} p_j^{-N_j} s_L^{x_j(L)} \prod_{k=j}^{L-1} s_k^{x_j(k)} = \left(\prod_{j=0}^{L-1} p_j^{-N_j}\right) s_L^{\sum_{j=0}^{L-1} x_j(L) - 1} \prod_{k=0}^{L-1} s_k^{\sum_{j=0}^{k} x_j(k) - 1} = \left(\prod_{j=0}^{L-1} p_j^{-N_j}\right) s_L^{\zeta_L - 1} \prod_{k=0}^{L-1} s_k^{\zeta_k - 1}. \qquad (4.24)$$
To state the posterior probability distribution neatly, we’ll need to carry out a transformation of variables. This can be done in two steps – s to p, and then p to δ.
$$\begin{pmatrix} 1 - p_1 \\ 1 - p_2 \\ \vdots \\ 1 - p_L \end{pmatrix} = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ \vdots & \vdots & \ddots & \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{L-1} \end{pmatrix} \qquad (4.25)$$
This shows that the Jacobian is unity for the first transformation (combined with the trivial transformation of $1 - p_i$ to $p_i$). The PDF simplifies somewhat to
$$P(p|NxI) \propto p_L^{\zeta_L - 1} \prod_{i=0}^{L-1} p_i^{-N_i} (p_i - p_{i+1})^{\zeta_i - 1}. \qquad (4.26)$$
As for the change of variables to $\delta_j$, expressing each probability as its appropriate chain of conditional probabilities gives
$$p_i = \prod_{j=1}^{i} \delta_j, \qquad \frac{\partial p_i}{\partial \delta_j} = \begin{cases} \frac{1}{\delta_j}\prod_{k=1}^{i} \delta_k & \text{if } i \geq j, \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, L. \qquad (4.27)$$
This makes the Jacobian easy to calculate: since the partial-derivative matrix is triangular, the determinant is the product of all diagonal terms.
$$|J| = \prod_{i=1}^{L} \frac{\partial p_i}{\partial \delta_i} = \prod_{i=1}^{L} \prod_{j=1}^{i-1} \delta_j = \prod_{j=1}^{L-1} \delta_j^{L-j} \qquad (4.28)$$
This transformation makes the posterior PDF a product of independent beta-distributed increments!
$$P(\delta|NxI) \propto p_L^{\zeta_L - 1} \prod_{i=0}^{L-1} \delta_i^{L-i}\, p_i^{\zeta_i - N_i - 1} (1 - \delta_{i+1})^{\zeta_i - 1} \qquad (4.29)$$
Collecting the exponents belonging to each $\delta_i$ gives Beta distributions with parameters
$$\delta_i \sim \text{Beta}(\alpha_i, \beta_i), \quad i = 1, 2, \ldots, L \qquad (4.30)$$
$$\alpha_i = \zeta_L + L - i + \sum_{j=i}^{L-1}(\zeta_j - N_j - 1) = N - \sum_{j=0}^{L-1}\zeta_j + \sum_{j=i}^{L-1}(\zeta_j - N_j) = \sum_{j=0}^{i-1}(N_j - \zeta_j) \qquad (4.31)$$
$$\beta_i = \zeta_{i-1}. \qquad (4.32)$$
This is what we might have suspected from the start, since the number of counts in shell k should really be combined from all simulations which can observe such an event. This explanation also shows why $\alpha_i$ is always non-negative, since the only counts appearing in the sum (4.31) have the total contributing simulation counts $N_j$ counter-balancing them. It warns that the probability for any increment will be more uncertain if either $\alpha_i$ or $\beta_i$ is small (corresponding to no counts above $\lambda_i$ or no counts inside $(\lambda_{i-1}, \lambda_i]$, respectively).
The free energy released by removing the constraint C0 is constructed by adding together the
incremental free energies,
$$\ln(p_L) = \sum_{i=1}^{L} \ln(\delta_i). \qquad (4.33)$$
We derive the Bayesian minimum mean-squared error estimator [83] by integrating over the posterior PDF.
$$\langle \ln\delta \rangle = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int \delta^{\alpha-1}(1-\delta)^{\beta-1}\ln\delta\,d\delta = \psi_0(\alpha) - \psi_0(\alpha+\beta) \qquad (4.34)$$
Here $\psi_n(z)$ is the polygamma function [3]. A similar integral can be derived for the variance (below). Since the means and variances of independent random variables are additive under summation, the Bayesian estimates for the free energy and its variance are then
$$\langle \ln(p_L)\rangle_E = \sum_{i=1}^{L} \psi_0(\alpha_i) - \psi_0(\alpha_i + \beta_i) \qquad (4.35)$$
$$\left\langle (\ln(p_L) - E(\ln(p_L)))^2 \right\rangle_E = \sum_{i=1}^{L} \psi_1(\alpha_i) - \psi_1(\alpha_i + \beta_i). \qquad (4.36)$$
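Equations 4.30–4.36 reduce the whole inference to a few array operations on the bin counts. The following is a minimal sketch (not the author's actual script) of that computation; the count layout x[j, k] and the function name are illustrative assumptions.

import numpy as np
from scipy.special import polygamma

def free_energy_profile(x):
    # x[j, k]: counts from simulation j in shell k; rows padded with zeros for
    # shells where no simulation was run, and x[j, k] = 0 for k < j.
    L = x.shape[1] - 1
    N = x.sum(axis=1)                      # N_j, Eq. 4.18
    zeta = x.sum(axis=0)                   # zeta_k, Eq. 4.19
    lnp = np.zeros(L + 1)
    var = np.zeros(L + 1)
    for i in range(1, L + 1):
        a = (N[:i] - zeta[:i]).sum()       # alpha_i, Eq. 4.31
        b = zeta[i - 1]                    # beta_i, Eq. 4.32
        lnp[i] = lnp[i - 1] + polygamma(0, a) - polygamma(0, a + b)   # Eq. 4.35
        var[i] = var[i - 1] + polygamma(1, a) - polygamma(1, a + b)   # Eq. 4.36
    return -lnp, var                       # profile -ln p_i and its variance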
4.4 Re-Weighting Formulas
For the λ0 = 0 case, no restriction is made on the system sampling, so the simulation and counting
are straightforward. Common MD programs (and even many MC programs), however, have no
facilities for directly including hard spheres. Since sampling via MD is usually more efficient [8],
we have devised a method for generating the conditional samples referenced above using a smooth
potential followed by re-weighting.
For generating samples, we can employ a Weeks–Chandler–Andersen [175] (WCA) LJ potential truncated at the minimum so as to be purely repulsive. This interaction potential, $M(r_{ij})$, has the
standard LJ parameters of well-depth, $\varepsilon_{ij}$, and minimum-energy radius, $R_{0,ij}$.
$$M(r_{ij}) = \begin{cases} \varepsilon_{ij}\left[\left(\frac{R_{0,ij}}{r_{ij}}\right)^{12} - 2\left(\frac{R_{0,ij}}{r_{ij}}\right)^{6} + 1\right], & r_{ij} \leq R_{0,ij} \\ 0, & r_{ij} > R_{0,ij} \end{cases} \qquad (4.37)$$
These should be chosen to model the hard sphere potential, applying it between the same atom types and mimicking the same distances as the chosen $r'_{min}$ definition.
The following strategy has been adopted to accomplish this. First, $R_0$ is chosen as 0.1 Å more than the distance of closest approach, $\lambda_j$. Next, $\beta\varepsilon$ is found so that the outward force of the WCA potential matches the average inward force exerted by the fluid on the hard sphere boundary, $F = -\frac{\partial}{\partial r}\beta\ln P(r_{min} > r)\big|_{r=\lambda}$, plus an extra term, taken to be 1.
$$-\beta\frac{\partial M(r)}{\partial r}\bigg|_{r=\lambda} = F + 1 \;\Rightarrow\; \beta\varepsilon = \frac{\lambda(F+1)}{12\left[\left(\frac{R_0}{\lambda}\right)^{12} - \left(\frac{R_0}{\lambda}\right)^{6}\right]} \qquad (4.38)$$
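As a concrete illustration, a short sketch of Eqs. 4.37–4.38 follows; the function names are arbitrary and the force F is assumed to have been estimated separately (e.g. by numerical differencing of the profile, as below).

import numpy as np

def wca(r, eps, r0):
    # Purely repulsive, shifted LJ potential M(r) of Eq. 4.37.
    x6 = (r0 / np.asarray(r, float)) ** 6
    return np.where(r <= r0, eps * (x6 * x6 - 2.0 * x6 + 1.0), 0.0)

def beta_eps(lam, r0, force):
    # beta*epsilon chosen so the outward WCA force at r = lambda is F + 1 (Eq. 4.38).
    x6 = (r0 / lam) ** 6
    return lam * (force + 1.0) / (12.0 * (x6 * x6 - x6))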
Figure 4.3 shows a plot of the (negative) cavity formation energy at $S_{cen}$ of the bacterial chloride transporter structure 1ots (+ signs) before incorporating the sampling data from $\lambda = -1.2$ Å. The first thing to note is that the profile begins at $\lambda_0 = -2.28$ Å. This is because $r_{min}$ has been shifted by the LJ contact distance between each atom in the system and a Cl$^-$ before taking the minimum, i.e. $r'_{min} \equiv \min\{r_{ij} - R_{0,ij}\}$, and 2.28 was the largest $R_{0,ij}$ – corresponding to no constraint.
At $\lambda = -1.2$ Å the error in the $\beta\mu^{ex}_{OS,HS}(\lambda)$ profile had grown to $1.66 \times 10^{-2}$, making it a good candidate for the next constraint location. Estimating $F \approx 1.4$ by numerical differencing gives the parameters for all model potential interactions with the hard-sphere-like particle at $S_{cen}$. The solid line shows its particular model potential interaction with solvent water oxygen or hydroxyl atoms (whose $R_{0,ij}$ are both 1.8 Å) – shifted to the left by 1.8 to correspond with $r'$. As prescribed by Eq. 4.38, the slope at this point has a slightly greater magnitude than the slope of the $\ln P(r'_{min} > r')$ curve. This makes the simulation sample regions past $\lambda = -1.2$, as shown by the number of samples with $r'_{min} > r'$ (dashed lines, marked N in Fig. 4.3).
Figure 4.3: Cavity removal energy at $S_{cen}$ derived from an unconstrained simulation (+ signs). In order to get precise energies past $\lambda = -1.2$, a model potential M(r) is added. The resulting cumulative distribution $N(r'_{min} > r')$ and the effective number of samples (Eq. 4.40) are shown.
Now that the WCA model potential has been simulated, sampling the needed region of configuration space, there is still one more obstacle before the inference of §4.2 can be carried out. The distribution of samples must be re-weighted to remove the influence of the WCA potential outside the hard sphere radius, $\lambda_j$. This re-weighting can be viewed as an acceptance/rejection re-sampling procedure [67]. Writing A as the original canonical distribution on system coordinates $\vec{x}$ and $M_j$ as the sum of all pairwise model potential terms mimicking $\lambda_j$, the standard acceptance/rejection algorithm accepts a configuration $\vec{x}_l$ drawn from $P(\vec{x}|M_jA)$ as generated from the distribution $P(\vec{x}|C_0(\lambda)A)$ with probability $w_l^j = \frac{P(\vec{x}_l|C_0(\lambda)A)}{c_j\,P(\vec{x}_l|M_jA)}$. The constant $c_j$ is chosen such that $w_l^j \leq 1$ for all relevant configurations. To use this idea, we’ll assign outcome probabilities to the sampled configurations $\{\vec{x}_l\}^{(M_jA)}$ according to
$$w_l^j = \frac{e^{\beta M_j(\vec{x}_l)}\,I(r_{min}(\vec{x}_l) > \lambda)}{\max_{l\in j} e^{\beta M_j(\vec{x}_l)}\,I(r_{min}(\vec{x}_l) > \lambda)}. \qquad (4.39)$$
Here I(cond.) is an indicator function which is one when the condition is satisfied and zero otherwise.
Eq. 4.39 guarantees that no sample will unduly contribute to averages, since $w_l^j \leq 1$. The expected number of re-sampled data points is
$$N_{eff}^j = \sum_{l=1}^{N_j} w_l^j, \qquad (4.40)$$
and the expected number of shell observations, $x_j(k)$, is similarly
$$x_j(k) = \sum_{l=1}^{N_j} w_l^j\, I(r_{min} \in (\lambda_k, \lambda_{k+1}]). \qquad (4.41)$$
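A compact sketch of Eqs. 4.39–4.41 in terms of per-frame arrays is given below; the input names (bm for βM per frame, rmin, and the shell edges) are illustrative, and the maximum is taken in log space only for numerical safety.

import numpy as np

def reweight(bm, rmin, lam, edges):
    # log of e^{beta M_j} I(rmin > lambda); -inf where the indicator is zero
    logw = np.where(rmin > lam, bm, -np.inf)
    w = np.exp(logw - logw.max())          # divide by c_j so that max(w) = 1 (Eq. 4.39)
    n_eff = w.sum()                        # Eq. 4.40
    x = np.histogram(rmin, bins=edges, weights=w)[0]   # Eq. 4.41, one total per shell
    return w, n_eff, x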
The effective number of samples is an important output of the re-weighting procedure, since
it controls the magnitude of the error estimates. Fig. 4.3 shows the value of Neff using the model
potential described earlier for different possible values of λ. As λ is increased from the smallest $r'_{min}$ observed, both the number of samples above λ and the constant, c, decrease. The combined effect serves to make a shoulder on the plot around the chosen $r' = -1.2$ Å. For similar plots on bulk systems, $N_{eff}$ is usually larger [closer to $N(r_{min} > r)$] and displays a maximum at the chosen λ – indicating that the choice of model potential (4.37 and 4.38) reasonably mimics the hard sphere.
Although we are free to choose $\lambda_j$ based on the $N_{eff}$ plot, there is a danger in choosing λ too small. M(r) biases the simulation away from configurations where solvent molecules approach the solute. Inside λ this is good. However, it will also cause states where many solvent molecules stack up just outside of λ to be poorly sampled. In this case c may be underestimated because a critical configuration was not sampled. This can be expected to be the principal cause of re-weighting errors.
4.5 Computing Conditional Averages via Re-Weighting
The last part of the solvation free energy (4.11) is the outer-shell long-ranged free energy for the process $BC_nD_n \xrightarrow{\text{OS,LR}} BC_nD$. This step constitutes a traditional free energy problem as considered at the beginning of §4.1, and requires an accurate approximation method in order to make QCT calculations efficient. To see this, consider the number of samples in the overlap region of (4.10) which conform to condition $C_0$. Although the conditioning process may increase the fraction of samples in this region, it will always discard some samples, resulting in fewer observations available for inference. So any stratification of simulation samples will discard data, and is only worthwhile if the result is amenable to a useful approximation.
Several studies [149, 150, 140] have shown that the following cumulant expansion is a very good approximation to the free energy for this step (indicating that the distribution of $E_{D_{N-n}}$ is nearly Gaussian).
$$\beta\mu^{ex}_{OS,LR} \approx \frac{\langle \beta E_{D_{N-n}} | ABC_nD_n\rangle + \langle \beta E_{D_{N-n}} | ABC_nD\rangle}{2} \qquad (4.42)$$
$$E_{D_{N-n}}(\vec{x}^N,\vec{x}_1) \equiv E_D(\vec{x}^N,\vec{x}_1) - E_{D_n}(\vec{x}^n,\vec{x}_1)$$
The left and right averages in the numerator form upper ($ABC_nD_n$ system) and lower ($ABC_nD$) bounds on the free energy by the Gibbs-Bogoliubov inequality [75, 168, 21, 35, 71, 110].
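In code, Eq. 4.42 is just the mean of two sample averages; the sketch below assumes arrays of β-scaled interaction energies sampled in the uncoupled and coupled ensembles.

import numpy as np

def os_lr_estimate(be_uncoupled, be_coupled):
    upper = np.mean(be_uncoupled)    # <beta E | ABCnDn>: upper bound
    lower = np.mean(be_coupled)      # <beta E | ABCnD>: lower bound
    return 0.5 * (upper + lower)     # Eq. 4.42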
Figure 4.4 illustrates the convergence to Gaussian behavior for water solvation in water with
increasing λ. The initial unconditioned distribution from the uncoupled system has a large positive
energy tail due to high energy water-water close contacts, and the coupled system also has a specific
structure at low energies due to close contacts. As the conditioning distance is increased, these
close contacts are eliminated and replaced by many interactions of similar magnitude with first shell
waters – hinting at the origin of the observed Gaussian form.
Figure 4.4: Effect of conditioning on the interaction energy distribution [probability (kJ/mole)$^{-1}$ vs. energy (kJ/mole)] between a bath of TIP3P water molecules and a single solute water. The upper left panel contains no conditioning. Solute–solvent interaction energy distributions become increasingly Gaussian as λ is increased from zero to 3.5 Å – evidenced by plotting the distributions corresponding to the coupled (×) and uncoupled (+) cases. Note that the histograms also become more noisy due to a decrease in available samples with λ.
The expressions for estimating re-weighted averages and variances are
$$\hat{\langle F\rangle} = \frac{\sum_{i=1}^{N} F_i w_i}{\sum_{i=1}^{N} w_i} \qquad (4.43)$$
$$\hat{\sigma}_F^2 \equiv \hat{\langle \delta F^2\rangle} = \frac{\sum_{i=1}^{N}\left(F_i - \hat{\langle F\rangle}\right)^2 w_i}{\sum_{i=1}^{N} w_i}. \qquad (4.44)$$
Also, the standard deviation of the mean should be $\sqrt{\hat{\sigma}_F^2/(N_{eff} - 1)}$. The number of samples N should be the total number of samples from all simulations with $\lambda_j \leq \lambda$.
Calculating conditional averages and effective sample numbers as a function of λ for any tabulated quantity can be easily implemented in a single Python script. The script takes column-formatted $M(\vec{x})$, $r_{min}$, and F trajectory data¹, and outputs a datafile containing λ, $N_{eff}$, $\hat{\langle F\rangle}$, and $\hat{\sigma}_F^2$ as a function of λ. It does this by iterating over $\lambda_j$ values and assembling a list of M and F values conforming to $r_{min} > \lambda_j$ from all simulations with $\min(r_{min}) < \lambda$. A subroutine calculates $c_j$ values for each simulation independently, and then returns the results of Eqs. 4.39, 4.43, and 4.44. Using this program, it is possible to calculate conditional average profiles (i.e. as a function of λ) for any simulation observable.
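The essentials of such a script fit in a few lines; the following is a hedged sketch of the workflow just described (file layout assumed as in the footnote, with βM, $r_{min}$, and F in the first three columns), not a copy of the author's program.

import numpy as np

def profile(files, lambdas):
    data = [np.loadtxt(f) for f in files]        # columns: beta*M, rmin, F
    for lam in lambdas:
        w_all, f_all = [], []
        for d in data:
            bm, rmin, f = d[:, 0], d[:, 1], d[:, 2]
            if rmin.min() >= lam:                # keep simulations with min(rmin) < lambda
                continue
            logw = np.where(rmin > lam, bm, -np.inf)
            w_all.append(np.exp(logw - logw.max()))   # per-simulation c_j (Eq. 4.39)
            f_all.append(f)
        w, f = np.concatenate(w_all), np.concatenate(f_all)
        mean = (f * w).sum() / w.sum()                     # Eq. 4.43
        var = (((f - mean) ** 2) * w).sum() / w.sum()      # Eq. 4.44
        print(lam, w.sum(), mean, var)           # lambda, N_eff, <F>, sigma_F^2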
The three parts of the free energy (along with error estimates) can now be computed by adding together the free energy profiles computed using §4.3 and Eq. 4.42 as $\mu^{ex} = \mu^{ex}_{OS,HS}(\lambda) + \mu^{ex}_{OS,LR}(\lambda) + \mu^{ex}_{IS}(\lambda)$. The result should formally be independent of λ. However, deviations from a straight line can be observed at small λ because of error in the Gaussian approximation 4.42, and at high λ because of poor sampling. From our experience with nonpolarizable and polarizable ion simulations, the Gaussian approximation can be expected to hold (with maybe 1 or 2 kcal/mol error) when $\lambda > \langle r_{min}|ABD\rangle$. For the purpose of estimating $\mu^{ex}$ when a straight line is not apparent, we have used the largest (plus one standard deviation) and smallest (minus one standard deviation) values in a range of λ around the center of the plot (half of the range where all three components can be reliably estimated).
¹ Each column holds the values of a single quantity, F, for every sampled configuration. Each row holds the values of all quantities at a particular frame number, i. One file holds values from one simulation.
4.6 Application to Polarizable Ion Solvation
We have analyzed the roles of size and polarization in determining the thermodynamics and structure of bulk anion hydration for the Hofmeister series Cl$^-$, Br$^-$, and I$^-$ using the present QCT method. One nanosecond of molecular dynamics using the AMOEBA polarizable force field as implemented in Amber 10 [29] was used to generate 2000 frames each for pure water, WCA model potential + water, ion + water, and ion + WCA + water systems. Histograms of $r_{min}$ (using the re-weighting of §4.4 as necessary) were calculated between the solute particle and water oxygens for all but the pure water system, where test particle positions (systematically generated on a grid) took the place of the solute. These were fed as input to a script implementing the free energy inference of §4.3 to generate $\mu^{ex}_{IS}$ and $\mu^{ex}_{OS,HS}$ profiles. For every frame other than pure water simulation data, ∆E of Eq. 4.1 was calculated using Amber and corrected for changes in net charge by addition of a uniform background correction term [19] to both endpoints
$$E_c = -332.0716\,\frac{Q^2\pi}{2V\eta^2}\ \text{kcal/mol}, \qquad (4.45)$$
where η is the Ewald coefficient (Å$^{-1}$), V is the system volume (Å$^3$), and Q is the net charge (e). Using the method of §4.5 to average these ∆E values gives $\mu^{ex}_{OS,LR}$ under the Gaussian assumption. Adding these three parts gives the total solvation free energy. Table 4.1 compares the calculated free energies to experimental measurements on whole salts. As observed by the originators of the AMOEBA force field [137, 60], we find excellent agreement with experiment.
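The correction itself is a one-liner; the sketch below assumes the usual electrostatic conversion factor 332.0716 stated in Eq. 4.45.

import math

def background_correction(q, volume, eta):
    # E_c (kcal/mol) for net charge q (e), volume (A^3), Ewald coefficient eta (1/A); Eq. 4.45.
    return -332.0716 * q * q * math.pi / (2.0 * volume * eta * eta)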
Decomposing the $\mu^{ex}_{OS,LR}$ component gives meaningful insight into the role of the ionic size and polarizability parameters in the thermodynamics of ion solvation. Understanding the interplay of these two key parameters in molecular simulations using point induced dipoles has shown some of the strengths and weaknesses of force fields including polarizability. We chose to focus on the
µex      Current        Friedman [51]   Tissandier [163]   Schmid [147]
Na-Cl    -174.2 (1.6)   -174.1          -174.0             -174.0
Na-Br    -167.7 (2.1)   -170.8          -167.6             -167.6
Na-I     -159.7 (2.0)   -159.7          -158.7             -159.2
Table 4.1: Partial molar hydration free energies for whole salts at infinite dilution, in kcal/mol. Reproduced from Ref. [141].
OS,LR part of the free energy because it contributes upwards of 90% to the total $\mu^{ex}$ at the λ chosen for investigation here (3.12, 3.29, and 3.5 Å for the Cl$^-$, Br$^-$, and I$^-$ ions, respectively). The contribution of $\mu^{ex}_{IS}$ is very small (roughly -0.5 kcal/mol) for all three anions, while at the above radii $\mu^{ex}_{OS,HS}$ is 5.3, 6.3, and 7.7 kcal/mol.
Figure 4.5 and Table 4.2 show that hydration free energies display a significantly stronger dependence on ion size than on ion polarizability, consistent with free energy calculations using other
polarizable models [174]. In fact, the differences in total solvation free energies between different
polarizability parameters are not significant with respect to our inference uncertainties; they would
all agree with experiment if used in Tbl. 4.1.
We have further divided the total $\mu^{ex}_{OS,LR}$ into first-order electrostatic, induction, and van der Waals (vdW) components. The first-order electrostatic contribution is the electrostatic interaction
energy when solute and solvent are brought together (to a pre-sampled configuration) instantaneously, without allowing the induced dipole moments to relax. It makes up the largest part of
the long-ranged contribution, and it exhibits size-dependent specificity and a slight decrease in favorability with increasing polarizability. As expected, the subsequent solute and solvent dipole
relaxation energy (termed the induction contribution) displays the largest dependence on polarizability, and the observed downward trend with increasing polarizability overcomes the reverse trend
in the first-order electrostatic part. The vdW piece exhibits an ordering that is inverted relative to
the electrostatic parts, with the Cl− ion displaying the largest positive value. This vdW contribution
thus serves to counteract somewhat the size-driven separation in free energies due to the OS,HS
term, but displays little ion specificity compared to the electrostatic parts of the free energy.
[Figure 4.5 appears here: panels of free energy components (Total/LJ, Induction, qΦ; kcal/mol) vs. polarizability α (Å³) for Cl$^-$-, Br$^-$-, and I$^-$-sized solutes.]
Figure 4.5: Contributions to the OS,LR conditional free energy average at $\lambda \approx \langle R_{min}\rangle_1$. As explained in the text, the components are averages over the coupled and uncoupled cases. The corresponding contributions from a nonpolarizable Cl$^-$–SPC/E system are shown as single points. In the upper left-hand panel, the upper curves for each ion are the total hydration free energies obtained by adding the inner-shell and outer-shell packing contributions.
[Figure 4.6 appears here: four panels of solvation anisotropy (water center-of-mass distance, Å, vs. n) for H₂O/Cl$^-$/Na$^+$, for Br$^-$/I$^-$ at α = 0–7.25, for the Cl$^-$/Br$^-$/I$^-$ series at fixed α = 5.65, and for uncharged LJ solutes.]
Figure 4.6: Effect of size and polarizability on the solvation shell organization. Plotting the projection of the average center of mass of the closest n solvent waters onto a vector in the direction of the first three shows how quickly subsequent layers of solvent relax to an isotropic distribution. Labels are by polarizability parameter, with the exception of the lower-right panel, which shows uncharged, nonpolarizable solutes with the same vdW parameters as the corresponding ions.
Next a structural analysis was performed to examine the solvation anisotropy around the anions.
Figure 4.6 presents plots of the distance to the average center of mass of the nearest n waters (ranked
by center of mass) after projecting along a vector roughly in the direction of the closest three waters
(see Ref. [141] for details). This anisotropy is significant because it shows one of the mechanisms
by which ordering may increase in response to increased polarization. It also indicates a possible
role of polarization on surface affinity, since a rightward shift of the first zero in the anisotropy plot
(N0 in Tbl. 4.2) may indicate a higher solute preference for surface-like solvation shells [176]. As
opposed to the hydration free energies, we find that the solvation anisotropy depends more on ion
polarizability than on ion size. Increased polarizability leads to increased anisotropy along the sequence Cl− , Br− , I− , for their normal AMOEBA parameter values (α = 4, 5.65, 7.25, respectively).
In the lower-left panel, we present the anisotropy plot for the sequence Cl− , Br− , I− but with a
fixed ion polarizability of α = 5.65 Å3 . Surprisingly, there appears to be little size dependence to
the anisotropy for charged ions. The final lower-right panel displays the anisotropy for uncharged
solutes with vdW parameters corresponding to each ion. For uncharged solutes, there is a clear size
dependence, and the water solvation shell is quite asymmetric.
Referring to Tbl. 4.2, the average distance to the closest water shows relatively little dependence
on ion polarizability, but does show a substantial compression from turning on the ion charge as
a consequence of electrostriction. The average ion dipole moment magnitude exhibits strong dependence on ion polarizability as expected. As points of reference, the average dipoles for the Cl−
and Br− ions with their AMOEBA parameters are 1.56 and 2.19 D, respectively, compared with
the AIMD values of roughly 0.8 and 1.0 D [61]. It thus appears that the AMOEBA force field is
appreciably over-polarizing the ions.
This over-polarization is significant because the majority of classical simulations that have examined surface affinity have targeted the ion polarizability as the key determinant of the observed
enrichment of anions at the water/vapor interface [90, 126, 30, 176]. Other simulations have emphasized the important role of ion size [43, 164], while Hagberg, Brdarski, and Karlstrom [63] have
shown that both factors play important roles. Intuitively, many-body polarization should be a major
Size   Charge   α      N0   ⟨Rmin⟩1    ⟨DI⟩        ⟨DW⟩       µex
Na     +1       0.12    5   2.279(2)   0.0238(3)   --         -87.7(11)
Na      0       0      10   2.837(8)   0.0         --          1.5(2)
Cl      0       0      14   3.344(6)   0.0         --          2.8(3)
Br      0       0      16   3.492(5)   0.0         --          3.1(3)
I       0       0      18   3.681(7)   0.0         --          3.5(5)
Cl     -1       0       6   3.116(2)   0.0         2.57(06)   -85.0(10)
Cl     -1       4       8   3.115(2)   1.56(2)     2.72(05)   -84.5(5)
Cl     -1       5.65    9   3.119(2)   2.92(4)     2.61(06)   -85.9(6)
Br     -1       0       8   3.304(3)   0.0         2.67(05)   -77.0(10)
Br     -1       4       9   3.289(3)   1.16(2)     2.68(02)   -77.5(10)
Br     -1       5.65    9   3.281(3)   2.19(3)     2.69(09)   -78.0(10)
Br     -1       7.25   10   3.274(2)   3.61(5)     2.66(04)   -79.1(4)
I      -1       0       9   3.515(3)   0.0         2.70(07)   -68.0(8)
I      -1       4       9   3.496(3)   0.80(1)     2.63(04)   -69.0(10)
I      -1       5.65    9   3.485(3)   1.48(3)     2.61(07)   -69.2(9)
I      -1       7.25   10   3.470(3)   2.37(4)     2.65(07)   -70.0(9)
Table 4.2: Comparison of local solvation environment indicators (reproduced from Ref. [141]): vdW parameters, charge (e), polarizability parameter (Å³), N0 (the number of waters required for Fig. 4.6 to reach zero), average distance to the closest water oxygen (Å), average ion dipole moment (Debye), average first-shell water dipole moment (Debye, where reported), and single-ion total hydration free energy (kcal/mol). Numbers in parentheses indicate numerical uncertainty in the last digit.
contributor [126, 176], especially when the higher electric field of the interface is noted. Experimental evidence [33, 32], however, has suggested a greater correlation of surface affinity with size
and hydration free energy than with polarizability. In addition, size-dependence of surface enrichment has been observed for nonpolarizable anions [43], which should have similar anisotropy plots
as argued above. Because the ion polarizability is closely correlated with solvation anisotropy, we
plan to further examine the issue of over-polarization of the anions in several of the classical models
in relation to enrichment of anions at the water/vapor interface [181].
Chapter 5
Inference on the PMF using Spline-Based Approximations
Ch. 3 showed that the potential of mean force (PMF) describes the canonical distribution of coarse-grained degrees of freedom. Monte Carlo or molecular dynamics simulations using this free energy in place of the traditional potential energy will therefore sample the correct distribution in this reduced dimensionality setting. This chapter aims to form and parametrize useful approximations to the PMF which will allow us to do dynamics at the coarse scale.
Representability of the coarse-grain PMF is the first issue to consider. If the set of all coarse configurations were directly partitioned using a grid-based decomposition, total memory requirements
for storing an energy at each grid point would increase exponentially with the number of degrees of
freedom. This forces a different approximation scheme for nontrivial coarse systems. A better idea
is to adopt the approximation used for conventional all-atom force fields, writing the function as a
sum of additive one-dimensional functions.
$$E(x) = \sum_{\text{bonds},\,ij} E_{bond}(r_{ij}; ij) + \sum_{\text{angles},\,ijk} E_{ang}(\cos\theta_{ijk}; ijk) + \sum_{\text{torsions},\,ijkl} E_{tor}(\phi_{ijkl}; ijkl) + \sum_{\text{pairs},\,ij} E_{pair}(r_{ij}; ij) \qquad (5.1)$$
Here, the sets of atoms indicated by the subscripts, i, j, etc. can depend on the system setup as well
77
as the particular configuration (e.g. group-based cutoffs). The largest number of terms in Eq. 5.1 comes from the last sum over pairs of atoms. The number of pairs scales as the square of the number of atoms, so assuming a force field model brings the memory storage requirements down from exponential to quadratic in the system size.
In probability terms, Eq. 5.1 expresses the assumption that certain subsets of the system coordinates are (conditionally) independent. The conditioning is due to the minor complication that pairwise forces link otherwise separable terms. However, if pairwise forces were turned off (or their values fixed), the above expression would state that all of the bonds, angles, and torsions are moving independently from one another. This likens additivity in force fields to the statistical definition of independence, which states that two variables are independent if their joint probability can be factored into a product of marginal probabilities.
The conceptual distinction between internal and non-bonded, pairwise terms suggests determining the form of the non-bonded interactions first and then fitting appropriate functions to the bonded
interactions. Although pre-calculating non-bonded terms could save significant computation time
for large scale biopolymer systems, it is not pursued here due to the simplicity of the systems considered.
The commonly used functional forms for Ebond ,Eang , etc., intentionally left unspecified in Eq. 5.1,
do not need to be assumed. Rather the form for the energy function referred to in Ch. 3 is decided
by the choice of independent coordinates. As long as each additive energy term operates on only
one variable, it can be represented by a one-dimensional spline using p parameters, θ.
$$y(r) = \sum_{n=0}^{p-1} B_n(r)\theta_n \equiv B(r)\cdot\theta \qquad (5.2)$$
The vector notation makes B(r) a row-vector of spline coefficients and θ a vector of parameters.
The vector notation makes B(r) a row-vector of spline coefficients and θ a vector of parameters.
We calculate the spline coefficients by evaluating standard B-spline basis functions [47]. It is
important that B-splines allow a choice of the energy function’s continuity, via the spline order, as
well as its resolution, via the number of free parameters (knots). Numerical integration relies on
smooth derivatives; complicated energy functions require higher resolutions. A useful property of
B-splines is that for all choices of the knot spacing and spline order, the best B-spline approximation
to any given function has the same order of approximation error as a polynomial fitting [136].
Since all the terms in the energy function are additive, we can absorb the additions of Eq. 5.1 into the vector notation by concatenating the coefficients and parameters from all the functions to redefine larger vectors B and θ.
$$E(x) = B(x)\cdot\theta \qquad (5.3)$$
For the case of $N_f$ energy components, $E^k(r) = B^k(r_k)\cdot\theta^k$ with spline coefficients $B^k(r)^T \in \mathbb{R}^{p_k}$, this means that $B(x) = [B^1(r_1), B^2(r_2), \ldots, B^{N_f}(r_{N_f})]$ – taking the total number of parameters to be $p = \sum_{k=1}^{N_f} p_k$.
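Evaluating B(r) numerically is straightforward with standard B-spline basis functions; the following is a brief sketch using scipy, with hypothetical knot and order arguments (not a prescription from the text).

import numpy as np
from scipy.interpolate import BSpline

def bspline_coeffs(r, knots, order=4):
    # B(r): the p = len(knots) - order basis functions of Eq. 5.2, evaluated at r.
    p = len(knots) - order
    B = np.zeros(p)
    for n in range(p):
        b = BSpline.basis_element(knots[n:n + order + 1], extrapolate=False)
        B[n] = b(r)
    return np.nan_to_num(B)   # basis elements are zero (nan) outside their support

# y(r) = bspline_coeffs(r, knots) @ theta then evaluates the energy term.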
The forces (negative energy derivatives) are of the form $F(x_l) = D_l\cdot\theta$, where $D_l$ is a $3N \times p$ matrix depending on configuration $R_l$. It can be constructed by concatenating $D^k$-s made by individual tensor (outer) products for each term in Eq. 5.1
$$D^k = \frac{\partial r_k}{\partial\vec{x}} \otimes B^k(r_k). \qquad (5.4)$$
Using this representation for the PMF, the rest of the chapter will carry out the probability
analysis outlined at the end of Ch. 3. Results will be presented for several test cases of increasing
complexity.
5.1 Problem Formulation
To form the inference problem, having considered the hypothesis space in the last section, we must next consider the data to be observed. Once appropriate likelihood and prior PDFs are found, Bayes' theorem gives the posterior PDF for the parameters (viz. Eq. 3.46).
Using the traditional Langevin equation, the effective force (ui+1 − ui ) at each time-step should
be a function of the coordinates Ri plus random noise. If we take the set of configurations to
[Figure 5.1 appears here: spline fits y(x) to 40 noisy samples of f(x), comparing GLS fits with p = 40 and p = 20 to the Bayesian fit with p = 100.]
Figure 5.1: Overfitting using generalized linear least squares – although decreasing the number of parameters from $40 + r - 1$ (allowing an exact fit of the 40 observations) to smaller values decreases the roughness, it also decreases the spline’s approximation ability. In contrast, the present method gives a good fit even using $100 + r - 1$ parameters.
be given at the outset of the problem, then the data are composed of these forces. From (3.24), $u_{i+1} - u_i = f(R_i) + \sigma\xi - \gamma u_i$. If the value of $u_i$ is not explicitly specified (e.g. the ordering of the forces in time has been lost), then $u_{i+1} - u_i \sim N_1\!\left(f(R_i),\ \sigma^2 + (\gamma_1 + 1)^2\right)$ gives the likelihood function for a single observation. Assuming conditional independence given the set of $R_i$ and a complete parametrization for $f(R_i)$, σ, the likelihood for an entire set of observed forces is a product of these normal distributions. For notational convenience, we’ll let z represent the inverse variance, $z^{-1} \equiv \sigma^2 + (\gamma_1 + 1)^2$.
Next, the prior for parameter space must be given. A standard idea is to assign a uniform prior
to all parameters. The procedure of finding the maximum of the posterior PDF using this prior is
equivalent to the generalized linear least squares [135] (maximum likelihood) fitting. Unfortunately,
this prior leads to over-fitting of the observed data points (example in Fig. 5.1), since there is no
penalty for functions with very large oscillations.
A more desirable prior probability for the set of spline parameters, θ, would penalize highly
oscillating functions. Ideally, such a penalty would allow the number of parameters to be arbitrarily
high without changing the shape of the fitted function – converging to a limit at infinite resolution.
Such a prior can be found by physical analogy to the traditional spline device [178]. It consisted
of a section of rubber held in place with push-pins at several points along a desired smooth curve.
By analogy to this device, the prior probability distribution for the function will be taken as the
canonical distribution of the stretched string at thermal equilibrium with a heat bath.
$$P(\theta|\lambda I) \propto e^{-\beta E[y(r;\theta)]} \qquad (5.5)$$
$$= |Q|^{1/2}\left(\frac{\lambda}{2\pi}\right)^{(p-n)/2} e^{-\frac{\lambda}{2}\theta^T\cdot Q\cdot\theta} \qquad (5.6)$$
The specification of an energy function, βE(y), thus leads to a complete description of the probability density for our spline’s position. This connection between energies and probabilities equates the
minimum energy solution to the maximum posterior likelihood estimate.
Equation 5.6 has used the standard expression for the potential energy of a stretched string, $E[y(r;\theta)] = \frac{1}{2}\int T(r)\left[\nabla^{(n)} y(r)\right]^2\rho(r)\,dr$, where T is the tension and ρ is the density. Because B-spline functions are linear in their parameters, the derivative operator works on the B-spline coefficients, B, to make
$$\nabla^{(n)} y(r) = A(r)\cdot\theta, \qquad (5.7)$$
so that we can define
$$Q \equiv \frac{1}{V}\int \frac{T(r)}{T_0}A(r)^T\cdot A(r)\,\rho(r)\,dr \qquad (5.8)$$
$$V \equiv \int \rho(r)\,dr \qquad (5.9)$$
$$\lambda \equiv \beta V T_0. \qquad (5.10)$$
This prior for the spline parameters leads directly to the well-known problem of penalized
splines (P-splines) in the mathematics literature. In this field, the difficult problem has traditionally been determining the penalty parameter, λ, which depends on the scale of the input data {y, r}.
Many proposals for choosing the penalty scale parameter have been considered [138, 104, 89], including introduction of non-uniform (in r) penalty and/or variance parameters [52, 22, 144, 16].
Adaptation to cases with non-Gaussian sampling error has been addressed by consideration of inference schemes with alternative likelihood functions [4].
More recently, Lang and Brezger [104] interpreted the complete fitting process as a Bayesian
inference problem. As in the present analysis, this leads to consideration of the penalty function as
a prior distribution on function space, with the attendant penalty parameter as a (nuisance) hyperparameter.
From Jaynes [83], the standard priors for scale variables λ and z should be 1/λ and 1/z, respectively. My original coding of this method using this prior worked most of the time. However, it
has one serious numerical issue which appears in the limit as λ approaches infinity. This situation
occurs when the sample size is small and the derivative order is large, allowing the belief that an
n − 1 degree polynomial is a possible solution to the inference problem (i.e. a singularity at λ = ∞).
Because MCMC sampling alternates between drawing values for θ and λ, it can get stuck once the
algorithm encounters a λ large enough to force a polynomial solution.
We have fixed this problem by adding a small residual energy, $E_0$, to the stretched string [142]. This will enforce a minimum amount of deviation in the posterior distribution of the spline function and deny the possibility of an exact polynomial fit. This creates results consistent with the notion that the true solution to the fitting problem should be smooth, but not exact. Adding a similar constant, $V_0$, multiplying z denies the possibility of exactly fitting the input measurements [170].
The joint prior for θ, z, λ is now
$$P(\theta z\lambda|I) = P(\theta|\lambda I)\,P(\lambda|I)\,P(z|I) \propto \lambda^{(p-n)/2-1} z^{-1} e^{-\frac{\lambda}{2}\left(\theta^T\cdot Q\cdot\theta + E_0\right) - \frac{zV_0}{2}} \qquad (5.11)$$
The priors for λ and z are now technically Gamma distributions with parameters (for λ) $a = 0$, $b = E_0$. From the above discussion, the prediction for λ is effectively specifying an energy scale for measuring displacements in our string. Eq. 5.11 is equivalent to starting with a scale-invariant prior and adding ν observations of small displacements, $\left[\nabla^{(n)} y(r_i)\right]^2 = w_i^2$, to our state of knowledge – giving a Gamma distribution for $\lambda\,|\,E_0 = \sum_1^\nu w_i^2$ with parameters $a_0 = \nu/2$, $b_0 = E_0/2$. Although we chose $\nu = 0$, Lang and Brezger [104] tried this approach with $\nu = 2$ and found that λ depends on the scale of the function measurements due to bias toward $a_0/b_0$: the prediction for λ implied by the added displacement observations. Jullion and Lambert [89] set $\nu = 2$ and provided a hyperprior for $E_0/\nu$. The corresponding interpretation is that ν prior observations of the string displacement were made but subject to large uncertainty.
The next section will present the posterior distribution of the spline and scale parameters, comparing the predictions between the present and other Bayesian methods in the literature.
5.2 Solution for One-Dimensional Systems
Applying Bayes’ theorem to calculate the posterior distribution given S data samples, D, we find
$$P(\theta z\lambda|DI) \propto \lambda^{(p-n)/2-1} z^{S/2-1} \exp\left\{-\frac{z}{2}\left(\|Y - D\cdot\theta\|^2 + V_0\right) - \frac{\lambda}{2}\left(\theta^T\cdot Q\cdot\theta + E_0\right)\right\}. \qquad (5.12)$$
The conditional posterior distributions are therefore
$$\theta|\cdots \sim N(\bar\theta, \Sigma), \qquad \Sigma^{-1}\cdot\bar\theta = zD^T\cdot Y, \qquad \Sigma^{-1} = \lambda Q + zD^T\cdot D, \qquad (5.13)$$
$$z|\cdots \sim \Gamma\left(S/2,\ (\|D\cdot\theta - Y\|^2 + V_0)/2\right) \qquad (5.14)$$
$$\lambda|\cdots \sim \Gamma\left((p-n)/2,\ (\theta^T\cdot Q\cdot\theta + E_0)/2\right), \qquad (5.15)$$
[Figure 5.2 appears here: $\sqrt{\lambda}$ (left) and RMSE of $\ln\lambda$ (right) vs. problem scale, $10^{-10}$–$10^{10}$, for methods X, Y (hyperprior), and Z ($10^{-1}$ and $10^{-3}$ bias).]
Figure 5.2: Posterior average penalty parameter showing variation with respect to total problem scale. For each method (X: this report, Y: Lambert and Jullion, Z: Lang and Brezger) the y coordinates of 100 sets of 50 equally spaced random samples are scaled, and the average (left panel) and deviation (right panel) of $\ln\lambda$ over fitting trials were estimated.
where we have formed a matrix $D \in M_{S\times p}$ by stacking row-vectors of spline coefficients B(r) from all observations. Similarly, Y is a column-vector of all the observed function values.
Using MCMC techniques [143, 54] to sample this posterior distribution Eq. (5.13–5.15) is the
direct analogue of observing multiple positions at thermal equilibrium of a spline device (as described earlier) attached to the data points by means of springs with force constant z. Alternately,
a maximum likelihood estimate (MLE) can be carried out using relatively fewer iterations of the
above steps by simply maximizing each conditional posterior and iterating until convergence. Both
processes require a Cholesky decomposition for θ at each iteration, which is an O(p3 ) operation,
and results presented in §5.6 show problems using the MLE for small sample sizes.
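For concreteness, a compact sketch of a Gibbs sampler over Eqs. 5.13–5.15 follows; D, Y, Q, the penalty order n, and the hyperparameters are assumed given, and numpy's Gamma sampler takes a scale (inverse rate) argument, hence the 2/b factors.

import numpy as np

def gibbs(D, Y, Q, n, E0=1e-10, V0=1e-10, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    S, p = D.shape
    theta, z, lam = np.zeros(p), 1.0, 1.0
    thetas = []
    for _ in range(steps):
        prec = lam * Q + z * (D.T @ D)                 # Sigma^{-1}, Eq. 5.13
        L = np.linalg.cholesky(prec)
        mean = np.linalg.solve(prec, z * (D.T @ Y))
        theta = mean + np.linalg.solve(L.T, rng.standard_normal(p))
        resid = D @ theta - Y
        z = rng.gamma(S / 2.0, 2.0 / (resid @ resid + V0))              # Eq. 5.14
        lam = rng.gamma((p - n) / 2.0, 2.0 / (theta @ Q @ theta + E0))  # Eq. 5.15
        thetas.append(theta)
    return np.array(thetas), z, lam

Each sweep costs one Cholesky decomposition, the O(p³) step mentioned above.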
This Bayesian inference procedure has been carried out for the system shown in Fig. 5.1 using four different prior distributions: the present prior distribution with $E_0 = V_0 = 10^{-10}$ (X in Fig. 5.2), the prior of Jullion and Lambert containing a hyperprior on $E_0$ with $\nu = 2$, $\nu E_0 \sim \Gamma(10^{-4}, 10^{-4})$ (Y in Fig. 5.2), and the original prior of Lang and Brezger (Z in Fig. 5.2) with $\sqrt{a_0/b_0} = 10^{-1}$ or $10^{-3}$, biasing $\sqrt{\lambda}$ accordingly. Figure 5.2 shows the average (left panel) and root mean square deviation (right panel) of the Bayesian estimate for $\ln\lambda$ over 100 trial sets of random samples $\{x, y\}_1^{50}$. The random samples were generated with equally spaced x values and y values drawn from a Gaussian distribution with mean $\sin(\pi x/3)/0.72$ and unit variance before scaling by factors ranging from
$10^{-9}$ to $10^{+9}$. The same 100 sets of random samples were used for all methods and problem scales
in Fig. 5.2. Twenty 4th order (cubic) B-spline parameters were used to fit the function on a periodic
domain [−3, 3), and the order of the penalty function was set to n = 2, making the default solution
a straight line in the absence of data.
For these methods to give correctly scaled solutions to the fitting problem, the product $\lambda\theta^T Q\theta$ must remain invariant to the scale of θ on average, implying that $\sqrt{\lambda}$ vs. the problem scale should be a straight line in Fig. 5.2. As expected, Z shows a bias toward the prior $\sqrt{a_0/b_0}$ values, giving good results for scales of $10^{-1}$ or $10^{-3}$. However, they show overfitting / oversmoothing when the scale is below/above that value, as λ is too small or too large, respectively. Method Y gives good results for very small problem scales, but fails for large problem scales, causing oversmoothing and widely varying $\ln\lambda$ estimates. This is because λ must decrease at large scales, but the prior for λ is too small close to zero.
For X, the method breaks down at scales less than $10^{-5}$. This happens because the scale of the input data is below the order of $\sqrt{E_0}$, violating our assumption in setting $E_0$. To understand this in the context of our prior from Sec. 5.1, even as the data scale diminishes (reducing $\theta^T Q\theta$), λ gets stuck at the scale set by $E_0$ and can’t go higher. This can be seen directly in Fig. 5.2. The same process happens for z, which gets stuck at $V_0$ as the scale diminishes. However, as long as a relevant minimum length scale for the y data is known, the present method has the most predictable performance. At large scales, $\theta^T Q\theta/E_0$ is large, $E_0$ and $V_0$ have become irrelevant, and scale-independence is achieved. At small scales, the method avoids numerical divergence as required.
5.3 Resolution Dependence of P-Spline Estimation
The above problem requires us to choose the number of parameters for the spline and does not
indicate the dependence of the final answer on the chosen resolution. This section will show a
correspondence between the average of Eq. 5.13 and the solution of a resolution-independent differential equation by deriving the differential problem form and an error estimate for our B-spline
solution.
A general formula for (the Fourier transform of) the convolution kernel solving this differential equation was first given by Silverman [155]. Subsequently, several authors improved the results on asymptotic convergence of smoothing spline and convolution-based methods, including
Messer [115] and Messer and Goldstein [116], where convenient approximate convolution kernels
were derived for fixed sample spacing. Nychka [125] relaxed the equal spacing restriction and
showed that the convolution kernel decays exponentially as long as the samples are spaced closely
enough. Finally, Abramovich and Grinshtein [1] provided a systematic derivation of an asymptotically equivalent convolution kernel for arbitrary derivative order, sample spacing, and variable
tension. We will briefly sketch the results of the latter in this section, and then provide the connection to the finite element solutions recommended in this work, establishing their convergence
properties with respect to increasing resolution.
We begin with the task of finding the function $u \in H^n(\Omega)$ (where $H^n(\Omega)$ is the standard Sobolev space of functions with square integrable derivatives on Ω up to nth order) which minimizes the functional (identifying $\alpha \equiv \lambda/z$)
$$\Phi[u] = \frac{1}{S}\sum_{l=1}^{S}(y_l - u(r_l))^2 + \frac{\alpha}{VS}\int_\Omega \rho(r)\,u^{(n)}(r)^2\,dr. \qquad (5.16)$$
In the following discussion we are considering both the infinite, $\Omega = \mathbb{R}$, and periodic domains, $\Omega = [0, V)$. Next, note that we can re-formulate the sum appearing on the right hand side as an integral using the definitions
$$y(r) = \begin{cases} \sum_{\{l:\,r=r_l\}} y_l \Big/ \sum_{\{l:\,r=r_l\}} 1, & r \in \{r_l\}_1^S \\ 0, & \text{otherwise} \end{cases} \qquad \phi(r) = \frac{1}{S}\sum_{l=1}^{S}\delta(r - r_l), \qquad (5.17)$$
so that Φ can be conveniently expressed as a norm using the scalar product $\langle a, b\rangle = \int_\Omega \bar{a}(r)\,b(r)\,dr$, where $\bar{a}$ denotes complex conjugation.
$$\Phi[u] = \text{const.}(\{y_l\}_1^S) + \langle \phi_h(y-u), \phi_h(y-u)\rangle + \frac{\alpha}{VS}\left\langle \rho_h u^{(n)}, \rho_h u^{(n)}\right\rangle \qquad (5.18)$$
To keep the notation simple, the definitions $\phi_h \equiv \phi^{1/2}$ and $\rho_h \equiv \rho^{1/2}$ are made.
The solution can be found by setting the functional derivative to zero, since the functional is positive and has a unique minimum as long as $\rho_h$ is bounded above zero everywhere in Ω and φ(r) contains at least n mass points [95]. We proceed as in Kimeldorf and Wahba [94] by using the scalar product invariance of the Fourier–Plancherel transform [161], $\tilde{g}(\omega) \equiv \mathcal{F}g = (2\pi)^{-1/2}\int_\Omega e^{i\omega r}g(r)\,dr$, as well as the convolution theorem $\mathcal{F}[ab] = (2\pi)^{-1/2}\tilde{a}*\tilde{b}$, to get
$$\Phi[\tilde{u}] = -2(2\pi)^{-1/2}\left\langle \mathcal{F}[\phi_h y], \tilde\phi_h * \tilde{u}\right\rangle + (2\pi)^{-1}\left\langle \tilde\phi_h * \tilde{u}, \tilde\phi_h * \tilde{u}\right\rangle + (2\pi)^{-1}\frac{\alpha}{VS}\left\langle \tilde\rho_h * ((-i\omega)^n\tilde{u}), \tilde\rho_h * ((-i\omega)^n\tilde{u})\right\rangle + \text{const.} \qquad (5.19)$$
Functional differentiation with respect to $\tilde{u}(\omega)$ yields the minimum as the solution of
$$\mathcal{F}[\phi y](\omega) = \mathcal{F}[\phi u] + \frac{\alpha}{VS}\omega^n\mathcal{F}\left[\rho\,\mathcal{F}^{-1}[\omega^n\tilde{u}]\right] \qquad (5.20)$$
$$\phi y = \phi u + \frac{\alpha}{VS}(-1)^n\frac{\partial^n}{\partial t^n}\left(\rho u^{(n)}\right). \qquad (5.21)$$
The last equation gives us the differential operator form of the minimization problem (5.16)
[95, 169, 2]. The interpretation for continuous sample density φ(r) is immediate, however it also
remains valid for finite S in the following sense. On any interval between sample points, φ(r) as
defined above is formally zero, and hence the solution u(r) is that of the homogeneous equation –
i.e. a 2n − 1 order polynomial when ρ is constant or (by definition) an L-spline for general ρ [94].
When integrating equation (5.21) over small regions, we can treat all terms in the equation on equal
footing at the sample points. The net effect of the sample points is thus to effect changes in the
function’s nth (and higher) order derivatives.
The above argument (and references) show that the unconstrained solution of Eq. (5.16) is a
unique element of Hn (Ω). Next we want to know if the spline estimate can approximate this “infinite
resolution” solution to arbitrary accuracy as the number of spline parameters increases. To do this,
we draw an analogy between the B-spline solution θ̄|α of Eq. (5.13) and the Ritz-Galerkin finite
element (weak) solution of Eq. (5.21). This lets us apply the typical error bounds for B-spline
approximation [136].
Multiplying (5.21) through by a test function v(t) and integrating gives the bilinear form
$$a(u, v) = \int_\Omega \phi u v + \frac{\alpha}{VS}\rho u^{(n)} v^{(n)}\,dt \qquad (5.22)$$
To use standard procedures to get the approximation error in the induced norm $\|u\|_a = \sqrt{a(u, u)}$, we first define the norms
$$|v|_{k,m} \equiv \sqrt[m]{\int_\Omega \left|\frac{d^k v}{dr^k}\right|^m dr}, \qquad (5.23)$$
with $|v|_{k,\infty}$ as the maximum of $v^{(k)}$ over Ω.
Now, let $u_h = D(\{r\}_1^S)\cdot\bar\theta$ be the solution of Eq. (5.13) and note that it is the minimizer of $\|u - u_h\|_a$ over all $u_h$ in the space of Rth order cardinal B-splines with knot spacing h (denoted $S_h^R$). We thus have, for any $v_h \in S_h^R$,
$$\|u - u_h\|_a^2 \leq \|u - v_h\|_a^2$$
$$\|u - v_h\|_a^2 = \frac{1}{S}\sum_{l=1}^{S}(u(r_l) - v_h(r_l))^2 + \frac{\alpha}{VS}\int_\Omega \rho\,(u^{(n)} - v_h^{(n)})^2\,dr$$
$$\|u - u_h\|_a^2 \leq |u - v_h|_{0,\infty}^2 + \frac{\alpha\rho_{max}}{VS}|u - v_h|_{n,2}^2, \qquad (5.24)$$
where we have defined $\rho_{max} = |\rho_h(u - v_h)|_{0,\infty}$. Choosing $v_h$ as a projection of u onto $S_h^R$ following Reif [136], we find
$$\|u - u_h\|_a^2 \leq (C_1 h^R)^2 + \frac{\alpha\rho_{max}}{S}(C_2 h^{R-n})^2, \qquad (5.25)$$
where $C_1$ and $C_2$ are constants proportional to $|u|_{n,\infty}$.
The approximation error thus falls into two regimes depending on the “effective” samples per interval, $h^n\sqrt{S/(\alpha\rho_{max})}$. For large effective sample sizes, the first term in Eq. (5.25) dominates and gives $\sqrt{\frac{1}{S}\sum_{l=1}^{S}(u(r_l) - u_h(r_l))^2} \sim O(h^R)$. For small sample sizes, appropriate in the infinite resolution limit, the second term dominates and gives $|u - u_h|_{n,2} \sim O(h^{R-n})$ – both of which are of the same order as an optimal polynomial approximation to u.
From Eq. (5.20) it is also easy to find a convolution kernel estimate for u. Writing Eq. 5.13 as a convolution kernel, we have
$$u_h(r) = S^{-1}\sum_{l=1}^{S} G\!\left(\frac{r - r_0}{h}, \frac{r_l - r_0}{h}\right) Y_l, \qquad (5.26)$$
$$G(x, y) \equiv B_c^R(x)^T\cdot\left(\tfrac{1}{S}D^T D + \tfrac{\alpha}{S}Q\right)^{-1}\cdot B_c^R(y), \qquad (5.27)$$
so that our function estimate is a linear combination of centered B-spline basis functions, $B_c^R$. Setting ρ = 1 in the large sample size limit, when φ is approximately constant over a large range of knots, G is symmetric and its Fourier transform [from Eq. (5.20)] approaches [155]
$$\tilde{G}(\omega) = \left(1 + \frac{\alpha\omega^{2n}}{VS\phi}\right)^{-1}. \qquad (5.28)$$
Convolution kernels for several derivative orders, n, are plotted in Figure 5.3. Abramovich and
Grinshtein [1] have provided a method to systematically derive such asymptotic (continuous φ)
approximations for general ρ and φ.
These asymptotic approximations give a sense of how the algorithm changes in response to differing penalty orders, n. From the discussion following (5.21), as n increases, higher order derivatives of the function change at the sample points, leading to a larger-width convolution kernel. Note that the asymptotic approximation plotted in Fig. 5.3 breaks down when S < p. However, resolution independence is still maintained, since in intervals without design points the solution will approximate the L-spline solution to (5.21). This provides a second interpretation of n as specifying the order of the “default” polynomial solution.
Figure 5.3: Spline smoothing equivalent convolution kernels, G(x − y), plotted for $\alpha/VS$ = {1, 1/3, 1/9} (solid black, dashed, thick grey, respectively), and $\phi = 1$. From (a) to (d), the order of the penalty derivative increases from n = 1 to n = 4. Higher sample numbers cause the kernel to steepen, placing more emphasis on x = 0, while higher $\alpha$ values do the opposite.
5.4 Multi-dimensional generalization
When multiple additive functions are present in the fitting equation (5.1), each additive function should have its own string energy prior. This turns $\lambda$ into a list of dimension $N_f$ and $Q$ into a set of matrices penalizing their respective functions. The posterior probability for each element of $\lambda$ replacing Eq. (5.15) is thus

$$ \lambda_k | \cdots \sim \Gamma\!\left( (p_k - n)/2,\ (\theta_k^T Q_k \theta_k + E_0)/2 \right). \qquad (5.29) $$

Concatenating the parameters as described in the introduction makes $Q$ a null matrix except for sub-blocks $Q_k$ along the diagonal.
Allowing multiple functions to share the same set of spline parameters collapses the total set of functions to be identified: combining the functions $B_k$ which share parameters $\theta_K$ makes a total of $N_F$ unique functions. Note the use of capitalized indices for combined sets.

$$ D_{l,K} = \sum_{k \in K} \frac{\partial r_{kl}}{\partial \vec x} \otimes B_k(r_{kl}) \qquad (5.30) $$
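As a concrete illustration, the following is a minimal sketch (assuming numpy; the helper names bspline_basis and interactions are hypothetical inputs, not the ForceSolve API) of assembling the block of Eq. (5.30) for one frame.

# A minimal sketch (assuming numpy) of Eq. (5.30): sum the outer product of
# the chain-rule factor dr/dx with the B-spline basis row over all
# interactions k that share the parameter set theta_K.
import numpy as np

def design_block(interactions, bspline_basis, n_coords, p_K):
    """interactions: iterable of (r_kl, dr_dx) with dr_dx of shape (n_coords,);
    bspline_basis(r) returns the p_K basis values at internal coordinate r."""
    D = np.zeros((n_coords, p_K))
    for r_kl, dr_dx in interactions:
        D += np.outer(dr_dx, bspline_basis(r_kl))  # (dr_kl/dx) (x) B_k(r_kl)
    return D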
To do inference on the PMF using force data, an independent normal distribution for the force on each coarse coordinate was assumed at the beginning of the chapter. This turns $z$ into a list of $z_i$, and replaces the mean and variance of Eq. (5.13) with

$$ \Sigma^{-1} = \mathrm{diag}(\lambda) \cdot Q + \sum_{i=1}^{3N} z_i D_{*,i}^T \cdot D_{*,i}, \qquad \Sigma^{-1} \bar\theta = \sum_{i=1}^{3N} z_i D_{*,i}^T Y_{*,i}. \qquad (5.31) $$

Here we have formed a list of matrices $D \in M_{S \times 3N \times p}$ by stacking $D_l \in M_{3N \times p}$ for all observations. Correlations between elements of each measurement could be incorporated by a trivial modification as long as they remain a known function of $r$; however, this is not pursued here.
If $3N_I$ dimensions share a common $z_I$, the posterior probability for each element of $z$ replacing Eq. (5.14) is

$$ z_I | \cdots \sim \Gamma\!\left( N_I S/2,\ \Big( V_0 + \sum_{i \in I} \| D_{*,i} \theta - Y_{*,i} \|^2 \Big) / 2 \right). \qquad (5.32) $$
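The conditional distributions (5.29), (5.31), and (5.32) suggest a simple Gibbs sampler. The following is a minimal sketch (assuming numpy, a single penalty block, and a single z group; the function and variable names are illustrative rather than the ForceSolve implementation).

# A minimal sketch (assuming numpy) of one Gibbs sweep over (theta, lambda, z).
# D is the stacked S*3N x p design matrix, Y the stacked force observations,
# Q the penalty matrix, n the penalty order; E0, V0 are prior constants.
import numpy as np
rng = np.random.default_rng(0)

def gibbs_sweep(D, Y, Q, lam, z, n, E0=1e-6, V0=1e-6):
    p = D.shape[1]
    # theta | lambda, z : Gaussian with precision and mean from Eq. (5.31)
    prec = lam * Q + z * (D.T @ D)
    cov = np.linalg.inv(prec)
    theta = rng.multivariate_normal(cov @ (z * (D.T @ Y)), cov)
    # lambda | theta : Gamma posterior of Eq. (5.29); numpy's gamma takes a
    # scale parameter, the reciprocal of the rate used in the text.
    lam = rng.gamma((p - n) / 2.0, 2.0 / (theta @ Q @ theta + E0))
    # z | theta : Gamma posterior of Eq. (5.32), here with all coordinates in
    # one group, so the shape counts every observed force component.
    resid = D @ theta - Y
    z = rng.gamma(len(Y) / 2.0, 2.0 / (V0 + resid @ resid))
    return theta, lam, z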
5.5 Deciding Simulation Parameters
Minimizing the expected loss function of Ch. 3 gives a method for choosing a “best” set of parameters from samples of the posterior distribution (5.12). After deriving two Lemmas which prove
Eq. 3.45, this section will show that the θ minimizing information loss is equivalent to force matching for the one-step Langevin equation of this chapter, and give a novel formula for choosing the
random force parameters, z.
We can preserve some generality by finding an expression for the relative entropy of two multivariate normal distributions in terms of a simpler integral.

$$ \begin{aligned} L\!\left( N_p(\mu, C) \,\|\, N_p(\hat\mu, \hat C) \right) &= L\!\left( f\!\left( C^{-1/2}(x - \mu) \right) \,\|\, f\!\left( \hat C^{-1/2}(x - \hat\mu) \right) \right) \\ &= L\!\left( f\!\left( C^{-1/2} \hat C^{1/2} \big( x' - \hat C^{-1/2}(\mu - \hat\mu) \big) \right) \,\|\, f(x') \right) \\ &= L\!\left( N_p\!\left( \hat C^{-1/2}(\mu - \hat\mu),\ \hat C^{-1/2} C \hat C^{-1/2} \right) \,\|\, N_p(\vec 0, I) \right) \end{aligned} \qquad (5.33) $$

The central step made use of a transformation of variables, $x = \hat C^{1/2} x' + \hat\mu$.
Now to calculate the last term, substitute the formula for the relative entropy from Eq. 3.44.

$$ \begin{aligned} L\!\left( N_p(\mu, C) \,\|\, N_p(\vec 0, I) \right) &= \left\langle \ln \frac{ |C|^{-1/2} \exp\!\left( -\tfrac12 (x - \mu)^T C^{-1} (x - \mu) \right) }{ \exp\!\left( -\tfrac12 x^T x \right) } \right\rangle \\ &= \tfrac12 \left\langle x^T x \right\rangle_{\mu,C} - \tfrac12 \ln |C| - \tfrac12 \left\langle (x - \mu)^T C^{-1} (x - \mu) \right\rangle_{\mu,C} \\ &= \tfrac12 \left( \mathrm{Tr}(C) - p - \ln |C| + \mu^T \mu \right) \end{aligned} \qquad (5.34) $$

Here, the last step has made use of $\langle x^T x \rangle = \langle (x - \mu)^T (x - \mu) \rangle + \mu^T \mu = \mathrm{Tr}(C) + \mu^T \mu$.
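Equation (5.34) is easy to evaluate numerically; a minimal sketch (assuming numpy; the function name is illustrative):

# A minimal sketch (assuming numpy) of the closed form Eq. (5.34): the
# relative entropy of N_p(mu, C) from the standard normal N_p(0, I).
import numpy as np

def kl_to_standard_normal(mu, C):
    p = len(mu)
    sign, logdet = np.linalg.slogdet(C)   # ln|C|, assuming C positive-definite
    return 0.5 * (np.trace(C) - p - logdet + mu @ mu)

print(kl_to_standard_normal(np.zeros(3), np.eye(3)))  # -> 0.0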
For a one-dimensional normal distribution on the force, Eqs. 5.33 and 5.34 reduce to Eq. 3.45. Substituting the one-step method used in this chapter and minimizing the total expected loss (3.42) is equivalent to minimizing

$$ \langle L \rangle = \frac12 \left[ \ln \frac{\hat\sigma^2}{\sigma^2} + \frac{\sigma^2}{\hat\sigma^2} - 1 + \frac{1}{\hat\sigma^2} \left( f(R_i; \hat\theta) - f(R_i; \theta) - (\hat\gamma - \gamma) u_i \right)^2 \right]. \qquad (5.35) $$
Assuming the force and velocity are uncorrelated, we can average over the distribution of $u_i$ to get

$$ \langle L \rangle = \frac12 \left[ \ln \frac{\hat\sigma^2}{\sigma^2} + \frac{\sigma^2 + (\hat\gamma - \gamma)^2}{\hat\sigma^2} - 1 + \frac{1}{\hat\sigma^2} \left( f(R_i; \hat\theta) - f(R_i; \theta) \right)^2 \right]. \qquad (5.36) $$
This says that the best choice of $\hat\theta$ minimizes its difference from the expected $\theta|V$, or $\hat\theta = \langle \theta \rangle$, with the average taken over the posterior distribution (5.12) – exactly the prescription of the force matching method.
Minimizing the above equation with respect to $\hat\sigma^2$ gives a somewhat non-intuitive result

$$ \hat\sigma^2 = \sigma^2 + \left\langle (\hat f - f)^2 + (\hat\gamma - \gamma)^2 - \left( \frac{\hat\gamma - \gamma}{\sqrt{1 - \hat\sigma^2}} \right)^2 \right\rangle. \qquad (5.37) $$
Assuming the friction coefficient is relatively stable and neglecting the last two terms says that the
best estimate for the magnitude of the stochastic force should be its posterior average plus the expected error in the force function due to uncertainty in the function parametrization. This extra
randomness effectively converts uncertainty about the underlying dynamical process into information entropy of the coarse dynamics. The last two terms could not have been arrived at without a
discrete FDT, and may be important in further investigations of time-dependent phenomena.
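In practice, both prescriptions can be read off directly from posterior samples. A minimal sketch (assuming numpy; the array names and the design row d are illustrative, not the ForceSolve API):

# A minimal sketch (assuming numpy) of the parameter choices implied by Eqs.
# (5.36) and (5.37): theta_hat is the posterior mean (force matching), and
# sigma2_hat augments the posterior-average noise variance by the spread of
# the fitted force, neglecting the friction terms of Eq. (5.37).
import numpy as np

def choose_parameters(theta_samples, sig2_samples, d):
    """theta_samples: (m, p) posterior draws; sig2_samples: (m,) draws of
    sigma^2; d: (p,) design row evaluating the force at a configuration R."""
    theta_hat = theta_samples.mean(axis=0)        # <theta> over the posterior
    f_draws = theta_samples @ d                   # f(R; theta) for each draw
    sigma2_hat = sig2_samples.mean() + np.var(f_draws)   # adds <(f_hat - f)^2>
    return theta_hat, sigma2_hat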
5.6 Molecular Simulation Examples
A Python implementation of the methods used in this chapter has been made available as a part of the ForceSolve project on sourceforge.net (http://forcesolve.sourceforge.net) [139]. This software was created as a proof-of-concept for a general method to fit molecular dynamics data to arbitrary stochastic integration models.
After validation of the implementation’s ability to fit common one-dimensional bond, angle,
and torsion potential energy functions from atomic force data, validation of LJ potential function
fitting was carried out. Finally, a complete all-atom system composed of 512 octane molecules in a
periodic box was coarsened to a single united-atom octane model. All of the above tests were able
to reproduce the underlying potential energy functions with smooth functions to within acceptable
errors, showing the utility of the force matching implementation developed.
To validate the force matching method implementation, 2000 configurations each were drawn
from the canonical ensemble of a diatomic molecule and a three-atom system with a single bond
angle. The same number of configurations were also taken equally spaced over the torsion angle a
four-atom system. Random forces were added to each atom and the position and total force data
were input to the force matching program. Note that for the bond angle and torsion systems, the
force on the angle is not normally distributed due to the transformation of coordinates, and that generalized least squares fitting would give no solution for regions containing no samples. Nevertheless,
Fig. 5.4 shows that the magnitude of the force is correctly estimated in the region of sampled data
points, and approaches a constant outside of the fitted region because of the n = 2 derivative order
imposed on the energy function. The magnitude of the random force was also correctly estimated
to within the precision calculated from the posterior distribution. Also note the absence of strange
effects from tabulating the angle energy as a function of cos θ and plotting in terms of θ. The basis functions were 4th order B-splines with 100 + 4 − 1 (bond/angle) or 200 (torsion) parameters.
Further calculations using up to 405 (bond/angle) or 800 (torsion) parameters and 6th order splines
showed very similar spline fits, in line with the theoretical convergence properties derived above.
Figure 5.4: Validation of the force matching method for common potential energy functions. Original force functions (solid lines) are shown along with the matched functions (dashed lines), superimposed on a contour plot of the sample distribution used for the fitting.

Next we consider a system of 256 Lennard-Jones particles with unit mass interacting via the pair potential

$$ E(r) = \sum_{1 \le i < j \le 256} 4 c_{ij} \left( |r_i - r_j|^{-12} - |r_i - r_j|^{-6} \right). \qquad (5.38) $$
Here the notation $|r_i - r_j|$ means the Euclidean distance between $r_i \in \mathbb{R}^3$ and the closest periodic image of $r_j$ (all of the atoms have been placed into a cubic cell of length 7.49). By setting $c_{ij} =$
1/2 whenever 1 ≤ i ≤ 128 and 129 ≤ j ≤ 256, or ci j = 1 otherwise, we have set up a two-component
(binary) mixture. Due to their relatively low cross-attraction, particles of type A (1 ≤ i ≤ 128) and
those of type B (129 ≤ i ≤ 256) have been found to be immiscible under the types of conditions
used here [111], separating into two liquid layers within the simulation cell. The liquid state may
be metastable with respect to a solid crystal phase, but such crystallization was not observed in any
of the simulations reported here.
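For reference, the following is a minimal sketch (assuming numpy; lj_energy is an illustrative name) of evaluating Eq. (5.38) with minimum-image distances.

# A minimal sketch (assuming numpy) of the binary Lennard-Jones energy of
# Eq. (5.38), using minimum-image distances in a cubic cell of side L = 7.49
# and c_ij = 1/2 for A-B pairs (types split at index 128), 1 otherwise.
import numpy as np

def lj_energy(r, L=7.49, nA=128):
    """r: (256, 3) array of particle positions."""
    N = len(r)
    E = 0.0
    for i in range(N - 1):
        d = r[i + 1:] - r[i]
        d -= L * np.round(d / L)                  # closest periodic image
        s2 = (d * d).sum(axis=1)                  # squared pair distances
        j = np.arange(i + 1, N)
        c = np.where((i < nA) & (j >= nA), 0.5, 1.0)
        E += np.sum(4.0 * c * (s2 ** -6 - s2 ** -3))
    return E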
Langevin dynamics with a time-step of $\Delta = 1.461 \times 10^{-3}$ was used to simulate this system until steady-state behavior was observed. Five hundred configurations, $\{r\}_1^{500}$, were obtained by sampling every 500 steps in a canonical ensemble simulation, which uses a slight (stochastic) modification of the equations of motion [157] to guarantee that the velocities are independently normally distributed about zero with variance $\beta^{-1}$, here chosen to be 0.7917. Although the units stated for this problem make use of a reduced units convention, the description is equivalent to one with physical units of $\varepsilon = 1.26$ kJ/mol, $\sigma = 3.7$ Å, $T = 120$ K, $m = 16$ g/mol, $\Delta = 1.935$ fs, and $\gamma = 5.1677$ ps$^{-1}$.
The set of forces to be matched was generated from the set of configurations using the following procedure. First, the three pairwise potential functions $4 c_{ij} (t^{-12} - t^{-6})$ were replaced with their spline representations on $t \in (4/7, 17/7)$ and used to find the mean force corresponding to each configuration. Next, independent normally distributed random noise with magnitude $\sigma_F = 60.91$ was added to each dimension of each generated force sample. This procedure generates the same sample distribution as would be expected from a Langevin dynamics simulation with a moderate damping coefficient of $\gamma\Delta = 10^{-2}$.
Since we have three functions to match (A:A, A:B, and B:B), we refer to §5.4 (5.30) with $N_F = 3$ to compute $D_{l,K} \in M_{3N \times p_K}$ for each frame and calculate the posterior mean and covariance using Eq. (5.31). In this equation, the sets $K$ are assumed to include all terms of Eq. (5.38) which share a common set of parameters $\theta_K$ (i.e. interactions between atoms of type A:A, A:B or B:B). Similarly, we will assume each particle type has its own $z_i$, so we use Eq. (5.32) with $N_I = 2$ and
each set $I$ is simply the set of all coordinates belonging to atoms of a common type.

Figure 5.5: Effect of sample size, M, on the average error, $\log_{10}(\mathrm{RMSE}/\sigma)$, of the fitted a-a, a-b, and b-b functions.
The pairwise functions were fit to 6th order B-splines with 170 knots and $h = 0.1/7$ to give a range of (0, 17/7), forcing the function and all its derivatives to zero at 17/7. MCMC sampling was carried out as described for the one-dimensional test cases except for the use of the density function $\rho(r) = r^2$, which is more appropriate for the spherically symmetric function domains considered
here. Figure 5.5 shows the fitting results and a comparison between the posterior average and
maximum likelihood estimators for a variety of sample sizes. Corresponding distributions of the
observed pairwise distances for the M = 50 case are shown in Fig. 5.6 as a function of sample
size (left scale). For this case the present approach is contrasted with the generalized least squares
solution on the right scale.
Since the actual system does not allow observations of all pairwise distances (particularly for small r), the average mean-squared error between the functions and their spline fits was calculated by averaging over the observed distances from all 500 samples.

Figure 5.6: Effect of sample size on the distribution of observed distances. Histograms of pairwise distances, M per bin (left scale), are shown for the a-a and a-b pairs, together with the a-b force estimates, Force/σ (right scale), comparing the Bayesian and GLS solutions.
An unusually high error is observed for the function describing interactions between particles
of type A and B. Inspection of the spline estimates reveals that this error is due to oversmoothing
(the MLE estimate was essentially zero) caused by the relatively small number of samples for this
function, and so does not occur when λ is set to zero – even for the M = 50 case (GLS, Fig. 5.6).
This failure of the MLE makes it a worse choice than GLS, and could not have been predicted from
calculation of the posterior function error. The expected error conditional on the MLE estimate for
λ, z is (in units of Fig. 5.5) −5.0 for all functions at M = 50. At M = 100, the A-A and B-B error
estimates jump to around −1.9 and remain constant as M increases, while the A-B error estimate
remains at −5.0 until M = 350, where it jumps to −2.2 and slowly increases to −2.1 at M = 500.
These numbers taken at the most likely λ, z significantly underestimate the fitting error at small
sample sizes due to their neglect of variations in these scale parameters.
On the other hand, the posterior average estimate (lower three lines in Fig. 5.5) behaves as expected, approximating the input function with accuracy increasing with sample size. The expected
function error using this method is slightly over-estimated due to the uncertainty in λ, z – smoothly
decreasing from −1.5, −1.6 for A-A,A-B at M = 50 to −1.9, −1.8 at M = 500. Examining the
M = 250 case, the likelihood ratio of the posterior average estimate $\bar\theta$ to the MLE is $10^{-585}$, but the average estimate performs better, and must be used, because of the width of the posterior distribution. Considering $\theta$ as a 510-dimensional vector, if the probability distribution for $\alpha = \lambda/z|\bar\theta$ is relatively flat over a range $R$ away from $\bar\theta$, then there are $R^{510}$ “states,” $\theta, \alpha, z$, for which $\theta = \bar\theta|\alpha$
and which therefore have a very low, but similar posterior probability. Integrating over a region in
state-space is essential in justifying the astronomical difference in pointwise probabilities. Finally,
as the sample size increases, the posterior probability narrows, causing both estimators to converge.
The dramatic failure of the MLE shown in Fig. 5.5 demonstrates the importance of averaging over
the posterior parameter distribution for small sample sizes.
One final example is a melt of 512 octane molecules. One picosecond of molecular dynamics was carried out in the NVE ensemble using the OPLS all-atom force field [31] as implemented in the Gromacs [108] program.

Name                      Params   Samples/Frame
angle CH2-CH2-CH3           103         2
angle CH2-CH2-CH2           103         4
tor CH2-CH2-CH2-CH2         200         3
tor CH2-CH2-CH2-CH3         200         2
pair CH2-CH3                160         8
pair CH2-CH2                160         6
pair CH3-CH3                160         1
bond CH2-CH3                103         2
bond CH2-CH2                103         5

Table 5.1: List of spline parameters in the united-atom octane model, automatically generated by ForceSolve from the molecular topology.

Coarse configurations were generated
from each frame by retaining only the center of mass of CH2 or terminal CH3 units from a single
molecule. Because 512 molecules were present, this procedure could be repeated for each molecule, making the effective sample size very large. A subset of 999 one-molecule configurations (using the total forces actually present during the simulation) were used to do the fitting of 1292 4th order spline parameters (see Table 5.1) and 2 z parameters, one for each CG atom type. Six total linear constraints (on the bond, angle, and torsion energies) were also present (in Q) in order to fix their
integral to zero. The spline for the pairwise potentials was forced to adopt a value of zero at the
large-distance end by the choice of spline range and did not require additional constraints.
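The coarsening step described above is a simple mass-weighted average; a minimal sketch (assuming numpy; the groups argument, listing the atom indices of each CH2/CH3 site, is an assumed input from the topology):

# A minimal sketch (assuming numpy) of mapping an all-atom octane frame to
# united-atom sites: each coarse site is the center of mass of one CH2 or
# terminal CH3 group.
import numpy as np

def coarsen(positions, masses, groups):
    """positions: (n_atoms, 3); masses: (n_atoms,); groups: list of index lists."""
    sites = []
    for g in groups:
        m = masses[g]
        sites.append((positions[g] * m[:, None]).sum(axis=0) / m.sum())
    return np.array(sites)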
Figure 5.7 shows a comparison of the coarse and fine-scale probability distribution functions for
the bond, angle, torsion, and end-to-end CH3 distance. The excellent agreement between the PDFs
using only 999 sample frames is an encouraging first result, indicating that the present method
would have been very helpful for parametrizing united atom models [114]. Future work on this
system could utilize the derivations of Ch. 3 to find suitable expressions for the pressure and energy
contributions due to the removed degrees of freedom as well as examine the dynamical properties
of the system more closely.
Figure 5.7: Comparison of internal PDFs between all-atom and united-atom force-matched octane.
Chapter 6
Closing Remarks
This dissertation has considered current challenges in computational statistical mechanics from the
viewpoint of subjective (Bayesian) probability. We have found a consistent way to formulate free
energy and coarse graining problems, and have given applications which extend the range of scientific hypotheses that can be made in these areas.
Free energies were equated to logarithms of likelihood ratios for alternative possible problem
specifications. First, the traditional canonical phase-space distributions were derived in Ch. 2. These
give PDFs for states of the system conditional on any type of prior information (i.e. an energy function, average molecular properties, etc.). Next, Ch. 4 showed that the probability for a change in
prior information is an average over re-weighting functions, a result which should again be valid
for any type of prior information. Together, these results give a consistent view of thermodynamics
in general. Thermodynamic cycles gain the conceptual interpretation of a Bayesian “learning” process, where Eq. 4.2 gives the PDF for properties of any state point, while the normalization constants give the likelihoods of switching between states. A state function is any functional of a canonical distribution. The implications of approximations made in the free energy of transitioning between states can also be understood, for these state functions, in terms of Eq. 4.2.
Further possible applications of the above developments are in the extension to the space of
trajectories rather than static configurations. Developments along these lines are already known for
free energies (i.e. Jarzynski’s equality [79]). Applying both maximum entropy and the associated
reweighting ideas of Ch. 4 to whole trajectories could yield a complete theory of nonequilibrium
thermodynamics.
Chapter 4 went on to exhibit computationally important thermodynamic cycles for the special
case of solvation free energy problems. Division of the problem into the formation of a specific local
solvation (cavity) structure followed by solute coupling removed the high-energy close contact contributions from ∆E, leading to a simple Gaussian interaction energy distribution and consequently a
linear interaction energy solvation problem for the OS, LR component. The probabilities of cavity
formation were calculated using a straightforward (but lengthy) application of Bayes’ theorem to
the observed simulation data.
Because of the linear dependence of ∆E on the coupling between the solute and solvent, $\mu^{ex}_{OS,LR}$ could be divided into additive contributions from each term of ∆E. This decomposition proved
useful in a recent study on the physics of ion solvation using polarizable force-fields [141]. It
was able to show that the electrostatic, induction, and Lennard-Jones components were all more
dependent on ion size than polarizability along the Cl− , Br− , I− series. Comparing a plot of the
electrostatic and induction free energy components with respect to cavity size to the traditional
Born model [10] also holds the possibility of rationalizing specific ion effects [102].
Important extensions of the ideas presented here include consideration of solvation structures
other than spherical cavities. Multiple spherical cavities could be used for larger molecules; however, sampling would be more difficult in this case. Considering structures with n > 0 solvent molecules
may alleviate these problems. Such a method would be directly applicable to cations, which form
relatively rigid local solvation shells involving around four waters [167]. Perhaps analytical forms
such as scaled particle theory [11] could be developed for the evaluation of formation free energies
for these structures in bulk.
Some other interesting paths to consider are improving the accuracy of the three FE components.
QM/MM energy functions could be incorporated into the determination of $\mu^{ex}_{OS,LR}$ using appropriate re-weighting of the configurations generated during MD sampling of the MM force field. MM
force-fields may be expected to better approximate QM energies in regions of configuration space
without close solvent-solute contacts. Also, OS,HS and IS components would be better estimated if
the “uncorrelated sample” assumption could be removed. Jaynes [83] presents the beginnings of an
idea to accomplish this, whereby the simulation time-series is treated as a Markov process in rmin .
Chapter 3 presented a high-level view of the statistical thermodynamics of reduced dimensionality systems. The first major result was Eq. 3.7, describing system entropy in terms of its contained
subsystems. It shows a new way to conceptualize the coarse-graining process, explaining the origin
of thermodynamic inconsistencies. Assuming an entropy functional for the removed degrees of freedom conditional on the remaining, coarse, coordinates is in line with dissipative particle dynamics
ideas. It could also be applied to add missing details to rigid or completely implicit solvent models,
and united atom force fields. It remains to be seen what choice for internal equations of state will
make physical sense for each specific problem.
Dynamical equations for continuous and discrete-time deterministic and stochastic models were
also given. The time-dependent phase-space probability densities were derived which serve as the
starting point for probabilistic analysis of trajectories. Assuming a suitable numerical scheme exists
for collecting time-dependent ensembles, sections 3.3 and 3.4 focused on deriving FDTs relating the random and dissipative terms in order to preserve the correct equilibrium PDF. The latter material is the second major result of
the chapter, solving the problem for discrete-time integrators under the assumption that the position
update step preserved the equilibrium position-space PDF. An extension of the analysis considering
a simultaneous position and velocity update, using Taylor expansion of the potential energy function in the exponent of $e^{-\beta E_{CG}(R)}$, could eventually lead to more accurate position update equations as
well.
Using the concept of transition probability densities, an information loss metric was derived and
shown to result in a useful choice for all model parameters including the random noise scale. This
third result reduces to force matching when the problem is restricted to determining only the energy
function for generalized Langevin-type numerical integrators. However the information loss metric
is much more general. It can be applied to arbitrary integration schemes and requires only observed
trajectory data, fulfilling the requirement of computability. It finds its minimum when the chosen
coarse variables are well-predicted by the model, and (to some extent) penalizes models whose
parameters cannot be predicted with certainty. Future work may be able to connect the information
loss with the subjective H theorem, sandwiching the spread in the phase-space distribution at other
times between zero and an associated information entropy.
The discrete version of the FDT became useful in Ch. 5 for designing an integrator matching
algorithm. This chapter gave a worked example of starting from a model for an integrator and
applying Bayesian inference to get the parameters from observed fine-scale dynamics. For more
complicated ACFs, the theorems of Ch. 3 will directly apply during the fitting process, eliminating
any need to appeal to, for example, the Fourier transform of functions over an infinite time domain.
More complicated integration schemes could be formulated that allow conservation of energy, momentum, etc. between the coarse system and a finite heat bath and/or incorporate some features of
the bath subsystem entropy functions. A more immediately useful generalization of the Langevin
equation would be to allow a position-dependent random force magnitude and damping coefficients.
These could correlate the dynamics of different parts of the system even in the absence of an energy
function, but are not straightforward to parametrize, since the posterior distributions may not turn
out to be solvable linear systems as in Ch. 5.
Overall, the present work has expanded the range of hypotheses for which we can consider the
problem of “agreement between the premises and the conclusions.” This has been accomplished
by introducing and/or extending new concepts and types of premises which can be considered –
existence of subsystems within a coarse-grain system of interest, discrete generalized Langevin
processes, and physical division of free energy problems using structural conditions – in addition to
the usual thermodynamic assumptions of energy functions, average energies, volumes, etc. These
have been complemented by completely worked example solutions based on traditional bead-based
coarse-graining, information loss in the agreement between transition PDFs and observed trajectory
data, and a Bayesian method for inference on absolute solvation free energies for small solutes in
arbitrary molecular environments. Finally, particular applications of the free energy and CG theories
presented herein will allow more complete connection between atomic and continuum systems.
Bibliography
[1] F. Abramovich and V. Grinshtein. Derivation of equivalent kernel for general spline smoothing: A systematic approach. Bernoulli, 5(2):359–379, 1999.
[2] F. Abramovich and D. M. Steinberg. Improved inference in nonparametric regression using
lk-smoothing splines. J. Stat. Plan. Inf., 49(3):327–341, Feb. 1996.
[3] M. Abramowitz and I. E. Stegun. Handbook of Mathematical Functions. National Bureau of
Standards, 1964.
[4] M. Aerts, G. Claeskens, and M. P. Wand. Some theory for penalized spline generalized
additive models. Journal of Statistical Planning and Inference, 103(1-2):455–470, Apr. 2002.
[5] P. K. Agarwal, S. R. Billeter, P. T. R. Rajagopalan, S. J. Benkovic, and S. Hammes-Schiffer.
Network of coupled promoting motions in enzyme catalysis. Proceedings of the National
Academy of Sciences of the United States of America, 99(5):2794–2799, 2002.
[6] R. L. C. Akkermans and W. J. Briels. Coarse-grained dynamics of one chain in a polymer
melt. J. Chem. Phys., 113(15):6409–6422, 2000.
[7] S. R. Alam, J. S. Vetter, P. K. Agarwal, and A. Geist. Performance characterization of molecular dynamics techniques for biomolecular simulations. In PPOPP, pages 59–68, 2006.
[8] M. P. Allen and D. J. Tildesley. Computer Simulation of Liquids. Clarendon Press, Oxford,
1987.
[9] T. Aoyagi, F. Sawa, T. Shoji, H. Fukunaga, J.-i. Takimoto, and M. Doi. A general-purpose
coarse-grained molecular dynamics program. Comput. Phys. Comm., 145(2):267–279, 2002.
[10] H. S. Ashbaugh. Convergence of molecular and macroscopic continuum descriptions of ion
hydration. J. Phys. Chem. B, 104(31):7235–7238, 2000.
[11] H. S. Ashbaugh and L. R. Pratt. Colloquium: Scaled particle theory and the length scales of
hydrophobicity. Rev. Mod. Phys., 78(1):159–178, 2006.
[12] H. S. Ashbaugh and L. R. Pratt. Contrasting nonaqueous against aqueous solvation on the
basis of scaled-particle theory. J. Phys. Chem. B, 111(31):9330–9336, 2007.
[13] D. Asthagiri, H. S. Ashbaugh, A. Piryatinski, M. E. Paulaitis, and L. R. Pratt. Non-van der
Waals treatment of the hydrophobic solubilities of CF4 . J. Am. Chem. Soc., 129(33):10133
–10140, 2007.
[14] D. Asthagiri, S. Merchant, and L. R. Pratt. Role of attractive methane-water interactions in the
potential of mean force between methane molecules in water. J. Chem. Phys., 128:244512,
2008.
[15] J. B. Avalos and A. D. Mackie. Dynamic and transport properties of dissipative particle
dynamics with energy conservation. J. Chem. Phys., 111(11):5267–5276, 1999.
[16] V. Baladandayuthapani, B. K. Mallick, and R. J. Carroll. Spatially adaptive Bayesian penalized regression splines (P-splines). Journal of Computational & Graphical Statistics,
14(2):378–394, June 2005.
[17] S. K. Bar-Lev and P. Enis. Reproducibility and natural exponential families with power
variance functions. Ann. Stat., 14(4):1507–1522, Dec. 1986.
[18] O. E. Barndorff-Nielsen. Information and Exponential Families in Statistical Theory. John
Wiley & Sons Ltd, Apr. 1978.
[19] T. L. Beck, M. E. Paulaitis, and L. R. Pratt. The Potential Distribution Theorem and Models
of Molecular Solutions. Cambridge, New York, 2006.
[20] D. Ben-Amotz and R. Underwood. Unraveling water’s entropic mysteries: A unified view of
nonpolar, polar, and ionic hydration. Acc. Chem. Res., 41(8):957–967, 2008.
[21] C. H. Bennett. Efficient estimation of free energy differences from Monte Carlo data. J.
Comput. Phys., 22:245–268, 1976.
[22] C. Biller. Adaptive Bayesian regression splines in semiparametric generalized linear models.
J. Comp. Graph. Statist., 12:122–140, 2000.
[23] S. Boresch and M. Karplus. The Jacobian factor in free energy simulations. J. Chem. Phys.,
105(12):5145–5154, 1996.
[24] D. Boyer and J. Viñals. Grain boundary pinning and glassy dynamics in stripe phases. Phys.
Rev. E, 65(4):046119, Apr. 2002.
[25] J. Bricmont, D. Dürr, M. Galavotti, G. Ghirardi, F. Petruccione, and N. Zanghi, editors.
Chance in Physics, volume 574 of Foundations and Perspectives Series: Lecture Notes in
Physics. Springer, Berlin, 2001.
[26] W. J. Briels and R. L. C. Akkermans. Representation of coarse-grained potentials for polymer
simulations. J. Mol. Sim., 28:145–152, 2002.
[27] M. O. Cáceres and A. A. Budini. The generalized Ornstein-Uhlenbeck process. J. Phys. A:
Math. Gen., 30(24):8427–8444, 1997.
[28] H. B. Callen. Thermodynamics and an Introduction to Thermostatistics. Wiley, 2nd edition,
Sept. 1985.
[29] D. A. Case, T. A. Darden, T. E. Cheatham III, C. L. Simmerling, J. Wang, R. E. Duke,
R. Luo, M. Crowley, R. C. Walker, W. Zhang, K. M. Merz, B. Wang, S. Hayik, A. Roitberg,
G. Seabra, I. Kolossvry, K. F. Wong, F. Paesani, J. Vanicek, X. Wu, S. Brozell, T. Steinbrecher, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui, D. H. Mathews, M. G.
Seetin, C. Sagui, V. Babin, and P. A. Kollman. AMBER 10. University of California, San
Francisco, 2008.
[30] T.-M. Chang and L. X. Dang. Recent advances in molecular simulations of ion solvation at
liquid interfaces. Chem. Rev., 106(4):1305–1322, Apr. 2006.
[31] B. Chen, M. G. Martin, and J. I. Siepmann. Thermodynamic properties of the Williams, OPLS-AA, and MMFF94 all-atom force fields for normal alkanes. J. Phys. Chem. B, 102(14):2578–
2586, 1998.
[32] J. Cheng, M. R. Hoffmann, and A. J. Colussi. Anion fractionation and reactivity at air/water:methanol interfaces. Implications for the origin of Hofmeister effects. J. Phys. Chem. B, 112(24):7157–7161, 2008.
[33] J. Cheng, C. D. Vecitis, M. R. Hoffmann, and A. J. Colussi. Experimental anion affinities for
the air/water interface. J. Phys. Chem. B, 110(51):25598–25602, 2006.
[34] C. Chipot, A. E. Mark, V. S. Pande, and T. Simonson. Applications of Free Energy Calculations to Chemistry and Biology. In C. Chipot and A. Pohorille, editors, Free Energy
Calculations: Theory and Applications in Chemistry and Biology, pages 463–501. SpringerVerlag, Berlin, 2007.
[35] C. Chipot and A. Pohorille. Calculating Free Energy Differences Using Perturbation Theory.
In C. Chipot and A. Pohorille, editors, Free Energy Calculations: Theory and Applications
in Chemistry and Biology, pages 33–75. Springer-Verlag, Berlin, 2007.
[36] D. E. Clark and C. G. Newton. Outsourcing lead optimisation - the quiet revolution. Drug
Disc. Today, 9(11):492–500, 2004.
[37] W. K. den Otter and W. J. Briels. The calculation of free-energy differences by constrained
molecular-dynamics simulations. J. Chem. Phys., 109(11):4139–4146, 1998.
[38] Y. Deng and B. Roux. Hydration of amino acid side chains: Nonpolar and electrostatic contributions calculated from staged molecular dynamics free energy simulations with explicit
water molecules. J. Phys. Chem. B, 108:16567–16576, 2004.
[39] Y. Deng and B. Roux. Computation of binding free energy with molecular dynamics and
grand canonical monte carlo simulations. J. Chem. Phys., 128:115103, 2008.
[40] K. A. Dill and S. Bromberg. Molecular Driving Forces: Statistical Thermodynamics in
Chemistry & Biology. Garland Science, 1st edition, Sept. 2002.
[41] R. I. Dima and H. Joshi. Probing the origin of tubulin rigidity with molecular simulations.
Proc. Nat. Acad. Sci., 105(41):15743–15748, 2008.
[42] J. F. Donoghue, E. Golowich, and B. R. Holstein. Dynamics of the Standard Model. Cambridge University Press, New York, 1992.
[43] B. L. Eggimann and J. I. Siepmann. Size effects on the solvation of anions at the aqueous
liquid-vapor interface. J. Phys. Chem. C, 112:210–218, 2008.
[44] P. Ehrenfest and T. Ehrenfest. The conceptual foundations of the statistical approach in
mechanics. Cornell University Press, Ithaca NY, 1959. English translation of Encykl. Math.
Wiss. 1912. by M. J. Moravcsik.
[45] A. Einstein, J. Stachel, A. Beck, and P. Havas. The Collected Papers of Albert Einstein: The
Swiss years, writings, 1900-1909. Princeton University Press, 1989.
[46] A. Eriksson, M. N. Jacobi, J. Nyström, and K. Tunstrøm. Effective thermostat induced by
coarse graining of simple point charge water. J. Chem. Phys., 129(2):024106, 2008.
[47] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. A smooth
particle mesh Ewald method. J. Chem. Phys., 103:8577–8592, 1995.
[48] E. G. Flekkøy and P. V. Coveney. From molecular dynamics to dissipative particle dynamics.
Phys. Rev. Lett., 83(9):1775–1778, Aug 1999.
[49] J. Florian and A. Warshel. Phosphate ester hydrolysis in aqueous solution: Associative versus
dissociative mechanisms. J. Phys. Chem. B, 102:719–734, 1998.
[50] B. J. Ford. The royal society and the microscope. Notes and Records of the Royal Society of
London, 55(1):29–49, 2001.
[51] H. L. Friedman and C. V. Krishnan. Thermodynamics of ion hydration. In F. Franks, editor,
Water: A Comprehensive Treatise. Plenum Press, New York, 1973.
[52] J. Friedman. Multivariate adaptive regression splines (with discussion). The Annals of Statistics, 19:1–141, 1991.
[53] E. Gallicchio, M. Andrec, A. K. Felts, and R. M. Levy. Temperature weighted histogram
analysis method, replica exchange, and transition paths. J. Phys. Chem. B, 109(14):6722–
6731, Apr. 2005.
[54] D. Gamerman and H. F. Lopes. Markov Chain Monte Carlo. CRC Press, 2006.
[55] J. W. Gibbs. Elementary principles in statistical mechanics. C. Scribner’s sons, 1902.
[56] S. Goldstein. Bohmian mechanics. In E. N. Zalta, editor, The Stanford Encyclopedia of
Philosophy, 2009.
[57] M. A. Gomez, L. R. Pratt, and S. Garde. Molecular realism in default models for information
theories of hydrophobic effects. J. Phys. Chem. B, 103:3520–3523, 1999.
[58] W. T. Grandy. Foundations of Statistical Mechanics. Kluwer, Boston, 1987.
[59] R. D. Groot and P. B. Warren. Dissipative particle dynamics: Bridging the gap between
atomistic and mesoscopic simulation. J. Chem. Phys., 107(11):4423–4435, 1997.
[60] A. Grossfield, P. Ren, and J. W. Ponder. Ion solvation thermodynamics from simulation with
a polarizable force field. J. Am. Chem. Soc., 125(50):15671–15682, Dec. 2003.
[61] E. Guárdia, I. Skarmoutsos, and M. Masia. On ion and molecular polarization of halides in
water. J. Chem. Theory Comput., 5(6):1449–1453, 2009.
[62] E. Gutiérrez-Peña. Moments for the canonical parameter of an exponential family under a
conjugate distribution. Biometrika, 84(3):727–732, Sept. 1997.
[63] D. Hagberg, S. Brdarski, and G. Karlstrom. On the solvation of ions in small water droplets.
J. Phys. Chem. B, 109(9):4111–4117, Mar. 2005.
[64] E. Hairer, C. Lubich, and G. Wanner. Geometric numerical integration illustrated by the
Störmer-Verlet method. Acta Numerica, 12:399–450, 2003.
[65] D. Halliday, R. Resnick, and J. Walker. Fundamentals of Physics. Wiley, New York, 6th ed.
edition, 2001.
[66] H. Hegyi and M. Gerstein. The relationship between protein structure and function: a comprehensive survey with application to the yeast genome. J. Mol. Biol., 288(1):147–164, 1999.
[67] R. V. Hogg, A. Craig, and J. W. McKean. Introduction to Mathematical Statistics. Prentice
Hall, 6th edition, June 2004.
[68] P. J. Hoogerbrugge and J. M. V. A. Koelman. Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics. Europhys. Lett., 19(3):155–160, 1992.
[69] H. Hu and W. Yang. Free energies of chemical reactions in solution and in enzymes with ab
initio quantum mechanics/molecular mechanics methods. Ann. Rev. Phys. Chem., 59:573–
601, 2008.
[70] G. Hummer. Position-dependent diffusion coefficients and free energies from Bayesian analysis of equilibrium and replica molecular dynamics simulations. New J. Phys., 7:34, 2005.
[71] G. Hummer. Nonequilibrium methods for equilibrium free energy calculations. In C. Chipot
and A. Pohorille, editors, Free Energy Calculations: Theory and Applications in Chemistry
and Biology, pages 171–198. Springer-Verlag, Berlin, 2007.
[72] G. Hummer, S. Garde, A. E. Garcia, A. Pohorille, and L. R. Pratt. An information theory
model of hydrophobic interactions. Proc. Natl. Acad. Sci. USA, 93:8951–8955, 1996.
[73] G. Hummer, L. R. Pratt, and A. E. Garcia. Free energy of ionic hydration. J. Phys. Chem.,
100:1206–1215, 1996.
[74] C. Hyeon, R. I. Dima, and D. Thirumalai. Pathways and kinetic barriers in mechanical
unfolding and refolding of rna and proteins. Structure, 14(11):1633–1645, 2006.
[75] A. Isihara. The Gibbs-Bogoliubov inequality. J. Phys. A., 1:539–548, 1968.
[76] S. Izvekov and G. A. Voth. Modeling real dynamics in the coarse-grained representation of
condensed phase systems. J. Chem. Phys., 125:151101, 2006.
[77] S. Izvekov and G. A. Voth. Solvent-free lipid bilayer model using multiscale coarse-graining.
J. Phys. Chem. B, (13):4443–4455, 2009.
[78] A. Jain and H. S. Ashbaugh. Digging a hole: Scaled-particle theory and cavity solvation in
organic solvents. J. Chem. Phys., 129(17):174505, 2008.
[79] C. Jarzynski. Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78:2690–
2693, 1997.
[80] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106(4):620–630, May
1957.
[81] E. T. Jaynes. Information theory and statistical mechanics. II. Phys. Rev., 108(2):171–190,
Oct 1957.
[82] E. T. Jaynes. Maximum entropy and Bayesian methods. In C. Smith, G. Erickson, and
P. O. Neudorfer, editors, Maximum Entropy and Bayesian Methods, pages 1–23. Kluwer,
Dordrecht, 1992.
[83] E. T. Jaynes. Probability Theory: the Logic of Science. Cambridge, Cambridge, 2003.
[84] E. T. Jaynes and R. D. Rosenkrantz. Papers on Probability, Statistics and Statistical Physics.
Kluwer, Boston, 1989.
[85] M. E. Johnson, T. Head-Gordon, and A. A. Louis. Representability problems for coarsegrained water potentials. J. Chem. Phys., 126(14):144509, 2007.
[86] W. L. Jorgensen. Free energy calculations: A breakthrough for modeling organic chemistry
in solution. Acc. Chem. Res., 22:184–189, 1989.
[87] W. L. Jorgensen.
The many roles of computation in drug discovery.
Science,
303(5665):1813–1818, Mar. 2004.
[88] W. L. Jorgensen and D. L. Severance. Aromatic-aromatic interactions: Free energy profiles for the benzene dimer in water, chloroform, and liquid benzene. J. Am. Chem. Soc.,
112:4768–4774, 1990.
[89] A. Jullion and P. Lambert. Robust specification of the roughness penalty prior distribution
in spatially adaptive Bayesian P-splines models. Comput. Stat. & Data Anal., 51:2542–2558,
2007.
[90] P. Jungwirth and D. J. Tobias. Specific ion effects at the air/water interface. Chem. Rev.
(Washington, DC, U.S.), 106(4):1259–1281, Apr. 2006.
[91] M. Karplus and J. N. Kushick. Method for estimating the configurational entropy of macromolecules. Macromolecules, 14(2):325–332, 1981.
[92] M. A. Katsoulakis and J. Trashorras. Information loss in coarse-graining of stochastic particle
dynamics. J. Stat. Phys., 122(1):115–135, Jan. 2006.
[93] M. A. Katsoulakis and D. G. Vlachos. Coarse-grained stochastic processes and kinetic monte
carlo simulators for the diffusion of interacting particles. J. Chem. Phys., 119(18):9412–9427,
2003.
[94] G. S. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation on stochastic
processes and smoothing by splines. Ann. Math. Stat., 41(2):495–502, Apr. 1970.
[95] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math.
Anal. Appl., 33(1):82–95, 1971.
[96] D. B. Kitchen, H. Decornez, J. R. Furr, and J. Bajorath. Docking and scoring in virtual
screening for drug discovery: methods and applications. Nat. Rev. Drug Discov., 3(11):935–
949, Nov. 2004.
[97] P. Kollman. Free energy calculations: Applications to chemical and biochemical phenomena.
Chem. Rev., 93:2395–2417, 1993.
[98] P. Kollman. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Acc. Chem. Res., 29:461–469,
1996.
[99] P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong, M. Lee, T. Lee, Y. Duan,
W. Wang, O. Donini, P. Cieplak, J. Srinivasan, D. A. Case, and T. E. Cheatham. Calculating
structures and free energies of complex molecules: Combining molecular mechanics and
continuum models. Acc. Chem. Res., 33(12):889–897, 2000.
[100] R. Kubo. The fluctuation-dissipation theorem. Reports on Progress in Physics, 29:255–284,
1966.
[101] S. Kumar, J. M. Rosenberg, D. Bouzida, R. H. Swendsen, and P. A. Kollman. The weighted
histogram analysis method for free-energy calculations on biomolecules. I. The method. J.
Comput. Chem., 13(8):1011–1021, 1992.
[102] W. Kunz, P. Lo Nostro, and B. W. Ninham. The present state of affairs with Hofmeister
effects. Curr. Opin. Colloid Interface Sci., 9:1–18, 2004.
[103] G. Lamoureux and B. Roux. Absolute hydration free energy scale for alkali and halide ions
established from simulations with a polarizable force field. J. Phys. Chem. B, 110:3308–3322,
2006.
[104] S. Lang and A. Brezger. Bayesian P-splines. J. Comput. Graph. Stat., 13:183–212, 2004.
[105] J. L. Lebowitz. Boltzmann’s entropy and time’s arrow. Phys. Today, 46(9):32–38, 1993.
[106] G. Letac and M. Mora. Natural real exponential families with cubic variance functions. Ann.
Stat., 18(1):1–37, Mar. 1990.
[107] R. D. Levine and M. Tribus, editors. The Maximum Entropy Formalism. M.I.T Press, Cambridge, 1979.
[108] E. Lindahl, B. Hess, and D. van der Spoel. GROMACS 3.0: A package for molecular simulation and trajectory analysis. J. Molec. Model., 7:306–317, 2001.
[109] K. B. Lipkowitz and D. B. Boyd, editors. Reviews in Computational Chemistry, volume 1.
Wiley-VCH, New Jersey, 1990.
[110] N. Lu and T. B. Woolf. Understanding and Improving Free Energy Calculations in Molecular
Simulations: Error Analysis and Reduction Methods. In C. Chipot and A. Pohorille, editors,
Free Energy Calculations: Theory and Applications in Chemistry and Biology, pages 199–
247. Springer-Verlag, Berlin, 2007.
[111] K. Maeda, W. Matsuoka, T. Fuse, K. Fukui, and S. Hirota. Solid-liquid phase transition of
binary Lennard-Jones mixtures on molecular dynamics simulations. J. Molec. Liq., 102(1-3):1–9, 2003.
[112] J. Mahanty and B. W. Ninham. Dispersion Forces. Academic Press, London, 1976.
[113] P. Maragakis, F. Ritort, C. Bustamante, M. Karplus, and G. E. Crooks. Bayesian estimates of
free energies from nonequilibrium work data in the presence of instrument noise. The Journal
of Chemical Physics, 129(2):024102, July 2008.
[114] M. G. Martin and J. I. Siepmann. Transferable potentials for phase equilibria. 1. united-atom
description of n-alkanes. J. Phys. Chem. B, 102(14):2569–2577, 1998.
[115] K. Messer. A comparison of a spline estimate to its equivalent kernel estimate. The Annals
of Statistics, 19(2):817–829, 1991.
[116] K. Messer and L. Goldstein. A new class of kernels for nonparametric curve estimation. The
Annals of Statistics, 21(1):179–195, 1993.
[117] D. L. Mobley, E. Dumont, J. D. Chodera, and K. A. Dill. Comparison of charge models for
fixed-charge force fields: Small-molecule hydration free energies in explicit solvent. J. Phys.
Chem. B, 111:2242–2254, 2007.
[118] D. L. Mobley, A. P. Graves, J. D. Chodera, A. C. McReynolds, B. K. Shoichet, and K. A.
Dill. Predicting absolute ligand binding free energies to a simple model site. J. Mol. Biol.,
371:1118–1134, 2007.
[119] H. Mori. Transport, collective motion, and brownian motion. Progress of Theoretical Physics,
33:423–455, 1965.
[120] I. D. Morrison and S. Ross. Colloidal Dispersions: Suspensions, Emulsions, and Foams.
Wiley, New York, 2002.
[121] F. Müller-Plathe. Coarse-graining in polymer simulation: From the atomistic to the mesoscopic scale and back. ChemPhysChem, 3(9):754–769, 2002.
[122] S. O. Nielsen, C. F. Lopez, G. Srinivas, and M. L. Klein. Coarse grain models and the
computer simulation of soft materials. Journal of Physics: Condensed Matter, 16:R481–
R512, 2004.
[123] W. G. Noid, J.-W. Chu, G. S. Ayton, V. Krishna, S. Izvekov, G. A. Voth, A. Das, and H. C.
Andersen. The multiscale coarse-graining method. I. a rigorous bridge between atomistic and
coarse-grained models. J. Chem. Phys., 128(24):244114, 2008.
[124] P. Español and P. Warren. Statistical mechanics of dissipative particle dynamics. Europhys.
Lett., 30(4):191–196, 1995.
[125] D. Nychka. Splines as local smoothers. The Annals of Statistics, 23(4):1175–1197, 1995.
[126] A. Öhrn and G. Karlström. Many-body polarization, a cause of asymmetric solvation of ions
and quadrupoles. J. Chem. Theory Comput., 3(6):1993–2001, 2007.
[127] C. Oostenbrink. Efficient free energy calculations on small molecule host-guest systems–
a combined linear interaction energy/one-step perturbation approach. J. Comput. Chem.,
30(2):212–221, 2009.
[128] J. T. Padding and W. J. Briels. Time and length scales of polymer melts studied by coarsegrained molecular dynamics simulations. J. Chem. Phys., 117(2):925–943, July 2002.
[129] A. Paliwal, D. Asthagiri, L. R. Pratt, H. S. Ashbaugh, and M. E. Paulaitis. An analysis of
molecular packing and chemical association in liquid water using quasichemical theory. J.
Chem. Phys., 124:224502, 2006.
[130] S. Park and V. S. Pande. Validation of Markov state models using Shannon's entropy. J. Chem.
Phys., 124(5):054118, 2006.
[131] V. A. Parsegian. Van der Waals Forces: A Handbook for Biologists, Chemists, Engineers,
and Physicists. Cambridge, 2006.
[132] L. Pastewka, D. Kauzlarić, A. Greiner, and J. G. Korvink. Thermostat with a local heatbath coupling for exact energy conservation in dissipative particle dynamics. Phys. Rev. E,
73:037701, 2006.
[133] A. Pohorille and E. Darve. A Bayesian approach to calculating free energies in chemical and
biological systems. AIP Conference Proceedings, 872(1):23–30, 2006.
[134] L. R. Pratt and D. Asthagiri. Potential Distribution Methods and Free Energy Models of
Molecular Solutions. In C. Chipot and A. Pohorille, editors, Free Energy Calculations: Theory and Applications in Chemistry and Biology, pages 323–351. Springer-Verlag, Berlin,
2007.
[135] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in
C: The art of scientific computing, chapter General Linear Least Squares, pages 671–681.
Cambridge University Press, 1999.
[136] U. Reif. Uniform B-spline approximation in Sobolev spaces. Numerical Algorithms, 15(1):1–
14, July 1997.
[137] P. Ren and J. W. Ponder. Polarizable atomic multipole water model for molecular mechanics
simulation. J. Phys. Chem. B, 107(24):5933–5947, June 2003.
[138] J. Rice.
Bandwidth choice for nonparametric regression.
The Annals of Statistics,
12(4):1215–1230, 1984.
[139] D. M. Rogers and T. L. Beck. ForceSolve. Sourceforge, Chicago, 2008.
[140] D. M. Rogers and T. L. Beck. Modeling molecular and ionic absolute solvation free energies
with quasichemical theory bounds. J. Chem. Phys., 129(13):134505, 2008.
[141] D. M. Rogers and T. L. Beck. Quasi-chemical analysis of polarizable anion hydration. to be
submitted to J. Phys. Chem. B, 2009.
[142] D. M. Rogers and T. L. Beck. Resolution and scale independent nonparametric function
matching using a string energy penalized spline prior. to be submitted, 2009.
[143] R. Y. Rubinstein. Simulation and the Monte Carlo Method. Wiley-Interscience, 1981.
[144] D. Ruppert and R. J. Carroll. Spatially adaptive penalties for spline fitting. Australian and
New Zealand Journal of Statistics, 42:205–223, 2000.
[145] W. B. Russel, D. A. Saville, and W. R. Schowalter. Colloidal Dispersions. Cambridge
University Press, 1989.
[146] T. Schlick, R. D. Skeel, A. T. Brunger, L. V. Kale, J. A. Board, J. Hermans, and K. Schulten.
Algorithmic challenges in computational molecular biophysics. Journal of Computational
Physics, 151:9–48, 1999.
[147] R. Schmid, A. M. Miah, and V. N. Sapunov. A new table of the thermodynamic quantities
of ionic hydration: values and some applications (enthalpy-entropy compensation and Born
radii). Phys. Chem. Chem. Phys., 2:97–102, 2000.
[148] E. Schrödinger. Statistical thermodynamics. Cambridge, 1967.
[149] J. K. Shah, D. Asthagiri, L. R. Pratt, and M. E. Paulaitis. Gaussian models for the statistical
thermodynamics of liquid water. Arxiv Preprint, (physics/0608209), 2006.
[150] J. K. Shah, D. Asthagiri, L. R. Pratt, and M. E. Paulaitis. Balancing local order and longranged interactions in the molecular theory of liquid water. J. Chem. Phys., 127:144508,
2007.
[151] M. S. Shell and A. Z. Panagiotopoulos. Methods for Examining Phase Equilibira. In
C. Chipot and A. Pohorille, editors, Free Energy Calculations: Theory and Applications
in Chemistry and Biology, pages 353–387. Springer-Verlag, Berlin, 2007.
[152] M. R. Shirts, D. L. Mobley, and S. P. Brown. Free energy calculations in structure based drug
design. In K. M. Merz, D. Ridge, and C. H. Reynolds, editors, Structure Based Drug Design.
Cambridge University Press. in press.
[153] M. R. Shirts and V. S. Pande. Comparison of efficiency and bias of free energies computed
by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration. J.
Chem. Phys., 122:144107, 2005.
[154] M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. Extremely precise free energy
calculations of amino acid side chain analogs: Comparison of common molecular mechanics
force fields for proteins. J. Chem. Phys., 119:5740–5761, 2003.
[155] B. W. Silverman. Spline smoothing: the equivalent variable kernel method. The Annals of
Statistics, 12:898–916, 1984.
[156] T. Simonson.
Free Energy Calculations: Approximate Methods for Biological Macro-
molecules. In C. Chipot and A. Pohorille, editors, Free Energy Calculations: Theory and
Applications in Chemistry and Biology, pages 423–461. Springer-Verlag, Berlin, 2007.
[157] R. D. Skeel and J. A. Izaguirre. An impulse integrator for Langevin dynamics. Mol. Phys., 100(24):3885–3891, 2002.
[158] D. Spellmeyer, editor. Annual Reports in Computational Chemistry. Elsevier, New York,
Mar. 2005.
[159] J. Srinivasan, T. E. Cheatham, P. Cieplak, P. A. Kollman, and D. A. Case. Continuum solvent
studies of the stability of DNA, RNA, and phosphoramidate–DNA helices. J. Am. Chem.
Soc., 120(37):9401–9409, 1998.
[160] S. Sriraman, I. G. Kevrekidis, and G. Hummer. Coarse master equation from Bayesian analysis of replica molecular dynamics simulations. J. Phys. Chem. B, 109(14):6479–6484, 2005.
[161] E. M. Stein and G. Weiss. Introduction to Fourier analysis on Euclidean spaces. Princeton
University Press, Princeton, N.J., 1971. Princeton Mathematical Series, No. 32.
[162] J. Tabak. Probability and Statistics: The Science of Uncertainty. Facts on File, New York,
2004.
[163] M. D. Tissandier, K. A. Cowen, W. Y. Feng, E. Gundlach, M. H. Cohen, A. D. Earhart,
J. V. Coe, and T. R. Tuttle. The proton’s absolute aqueous enthalpy and gibbs free energy of
solvation from cluster-ion solvation data. J. Phys. Chem. A, 102(40):7787–7794, 1998.
[164] S. Vaitheeswaran and D. Thirumalai. Hydrophobic and ionic interactions in nanosized water
droplets. J. Am. Chem. Soc., 128(41):13490–13496, Oct. 2006.
[165] J. P. Valleau and D. N. Card. J. Chem. Phys., 57:5457, 1972.
[166] N. G. van Kampen. Stochastic processes in physics and chemistry. Elsevier, 2007.
[167] S. Varma and S. B. Rempe. Coordination numbers of alkali metal ions in aqueous solutions.
Biophys. Chem., 124:192–199, Dec. 2006.
[168] L. Verlet and J.-J. Weis. Perturbation theory for the thermodynamic properties of simple
liquids. Molec. Phys., 24:1013–1024, 1972.
[169] G. Wahba, editor. Spline models for observational data, 1990.
[170] G. Wahba and Y. Wang. Behavior near zero of the distribution of gcv smoothing parameter
estimates. Stat. Prob. Lett., 25(2):105–111, Nov. 1995.
[171] J. Wang, Y. Deng, and B. Roux. Absolute binding free energy calculations using molecular
dynamics simulations with restraining potentials. Biophys. J., 91:2798–2814, 2006.
[172] J. Wang, P. Morin, W. Wang, and P. A. Kollman. Use of MM-PBSA in reproducing the
binding free energies to HIV-1 RT of TIBO derivatives and predicting the binding mode to
HIV-1 RT of Efavirenz by docking and MM-PBSA. J. Am. Chem. Soc., 123(22):5221–5230,
2001.
[173] W. Wang and R. D. Skeel. Analysis of a few numerical integration methods for the Langevin
equation. Mol. Phys., 101(14):2149–2156, July 2003.
[174] G. L. Warren and S. Patel. Hydration free energies of monovalent ions in transferable intermolecular potential four point fluctuating charge water: An assessment of simulation methodology and force field performance and transferability. J. Chem. Phys., 127(6):064509, 2007.
[175] J. D. Weeks, D. Chandler, and J. C. Andersen. Role of repulsive forces in determining the
equilibrium structure of simple liquids. J. Phys. Chem., 54:5237–5247, 1971.
[176] C. D. Wick and S. S. Xantheas. Computational investigation of the first solvation shell
structure of interfacial and bulk aqueous chloride and iodide ions.
J. Phys. Chem. B,
113(13):4141–4146, 2009.
[177] B. Widom. Some topics in the theory of fluids. J. Chem. Phys., 39:2808–2812, 1963.
[178] R. Wilkes. United states patent: 2790245, Apr. 1957.
[179] A. Wlodawer and J. Vondrasek. Inhibitors of HIV-1 protease: A major success of structureassisted drug design. Ann. Rev. Biophys. Biomol. Struct., 27:249–284, June 1998.
[180] F. M. Ytreberg and D. M. Zuckerman. Simple estimation of absolute free energies for
biomolecules. J. Chem. Phys., 124:104105, 2006.
[181] Z. Zhao, D. M. Rogers, and T. L. Beck. Polarization and charge transfer in the hydration of
chloride ions. 2009. preprint.
[182] Y. Zhou, C. K. Hall, and M. Karplus. First-order disorder-to-order transition in an isolated
homopolymer model. Phys. Rev. Lett., 77(13):2822–2825, Sep 1996.
[183] M. Zhu and A. Y. Lu. The counter-intuitive non-informative prior for the Bernoulli family. J.
Stat. Edu., 12(2), 2004.
Appendix A
Probability Theory
In subjective probability, the number P (B|A) is a numerical representation of the truth value of
proposition B given that proposition A is true. When all probabilities are constrained to be either
zero or one, this reduces to formal Aristotelian logic [83], where the above, if equal to one, would
be read as “A implies B”. The complete set of rules necessary in such a system of logic are the
product and sum rules (respectively):

$$ P(B \text{ and } C|A) \equiv P(BC|A) = P(C|AB)\, P(B|A) \qquad (A.1) $$
$$ P(B \text{ or } C|A) \equiv P(B + C|A) = P(B|A) + P(C|A) - P(BC|A). \qquad (A.2) $$
The order in which the propositions appear (as long as they remain on the same side of the |) is
not important, so P(BC|A) = P(CB|A). Since the above formulas should remain valid if we
interchange the meanings of B and C, Eq. A.1 could equally be written as P(BC|A) = P(B|AC)P(C|A).
Fixing their original meanings, the interpretations of these two forms are different, corresponding
to either learning B|A or C|A first. This reciprocity forms the basis of Bayes’ Theorem, a simple
expression of the rules of subjective probability:
\[ P(C \mid AB) = \frac{P(B \mid AC)\,P(C \mid A)}{P(B \mid A)}. \tag{A.3} \]
This theorem remains valid for any particular assignment to the propositions A, B, and C. For
most problems in statistical inference, C is the hypothesis, B is the observed data, and A is the
prior information indicating the values of P(B|AC) and P(C|A). P(B|A) can be obtained from the
integration of P(BC|A) over all possible hypotheses, C. Or, as is more common, P(B|A) can be
ignored in the likelihood-ratio expression P(C₁|AB)/P(C₂|AB).
The behavior of this expression in the extreme cases where a contradiction exists is interesting
to note. If C contradicts either the prior A or the constraint B (i.e., they are mutually exclusive),
then P(C|AB) = 0, as it should. If instead B contradicts A, then Eq. A.3 diverges numerically,
indicating that no prediction about C can be made from inconsistent prior information.
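As a concrete illustration (our addition, with invented prior and likelihood numbers), the short
Python sketch below applies Eq. A.3 to three discrete hypotheses; the hypothesis whose likelihood
contradicts the data receives zero posterior weight, exactly as described above.

    # Numerical illustration of Eq. A.3 for three discrete hypotheses C_k.
    # All numbers are hypothetical; C_3 is mutually exclusive with the data B.

    prior = [0.5, 0.3, 0.2]         # P(C_k | A)
    likelihood = [0.10, 0.40, 0.0]  # P(B | A C_k); zero marks a contradiction

    # P(B|A) is the sum of the joint P(B C_k | A) over all hypotheses C_k.
    evidence = sum(L * p for L, p in zip(likelihood, prior))

    # Bayes' theorem, Eq. A.3, applied hypothesis by hypothesis.
    posterior = [L * p / evidence for L, p in zip(likelihood, prior)]

    print(posterior)  # [0.294..., 0.705..., 0.0]; the contradicted C_3 gets 0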
As a simple introductory example of using probability concepts, we present the Bayesian inference
of the mean and variance of a normal (Gaussian) distribution. For parameter inference problems,
we assign (in Eq. A.3) the proposition B to mean “the S observed data samples had values
y_1, . . . , y_S”, and C ≡ H to be the hypothesis, namely that the mean µ and variance σ² of the
normal distribution generating B have those specific values. Everything other than B and H in the
problem specification is implicitly included in the prior information, A. Thus, in this case, A
includes the propositions that S scalar data samples were taken, that the data are normally
distributed, that any µ is equally likely, and that σ² is a positive scale parameter. In addition,
any other proposition that does not lead to a contradiction with the problem setup may be included
in A, as long as it is completely irrelevant for conclusions about C.
Jaynes [83] presents a wonderful discussion of determining prior distributions based on symmetry
operations. In order to make the prior PDF for location variables such as µ invariant to
translations, we require

\[ P(\mu \mid A)\,d\mu = P(\mu + c \mid A)\,d(\mu + c). \tag{A.4} \]
Of course, this functional equation is trivially solved by setting P(µ|A) = const. This prior is
technically “improper,” i.e., its normalization constant is infinite. However, this situation does
not prevent the use of the rules of probability theory (A.1 and A.2), and likelihood ratios still
have a definite interpretation; it just requires more attention to keep track of the constants.
Similar to the above, the symmetry constraint on the prior distribution for a scale parameter
suggests it should be invariant under multiplication by a constant. To solve the invariance
condition

\[ P(v \mid A)\,dv = P(cv \mid A)\,d(cv), \tag{A.5} \]

write f(v) ≡ P(v|A), differentiate both sides with respect to c, and divide by dv to give

\[ 0 = f(cv) + cv\,f'(cv), \tag{A.6} \]

which is solved by f′(u)/f(u) = −1/u ⇒ ln f(u) = const − ln u. Translating this back into a
probability distribution for σ², we have

\[ P(\sigma^2 \mid A) = \mathrm{const} \times \sigma^{-2}. \tag{A.7} \]
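A quick numerical check of this result (our addition; it assumes SciPy is available) is to confirm
that f(v) = 1/v assigns the same probability mass to the interval (1, 2) as to any rescaled
interval (c, 2c), which is exactly the invariance demanded by Eq. A.5:

    from scipy.integrate import quad

    # Unnormalized scale-invariant prior f(v) = 1/v from Eq. A.7.
    f = lambda v: 1.0 / v

    mass, _ = quad(f, 1.0, 2.0)  # prior mass on (1, 2): ln 2
    for c in (0.5, 3.0, 10.0):
        scaled_mass, _ = quad(f, c, 2.0 * c)  # mass on the rescaled interval
        assert abs(scaled_mass - mass) < 1e-9

    print(mass)  # 0.6931... = ln 2, independent of the rescaling constant c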
For convenience, we’ll state the results in terms of σ⁻² ≡ z. Equation A.7 can be transformed into
a prior for z by multiplying by the Jacobian¹

\[ P(z \mid A) = P(\sigma^2(z) \mid A)\,\left|\frac{d\sigma^2}{dz}\right| = \mathrm{const} \times z^{-1}, \tag{A.8} \]

which we should have expected, since z is also a scale parameter.

¹ To get the numerator/denominator right, imagine multiplying by dz and note that the PDF times the
volume element appears on both sides. There is a reason the derivative comes out negative, too: the
range (0, ∞) for σ² has been reversed to (∞, 0) for z, so that our volume element is backwards.
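The change of variables can also be checked by sampling. In the hypothetical sketch below, the
improper 1/v prior is truncated to [1, 100] so that it can be sampled (as a log-uniform draw), and
the transformed variable z = 1/σ² is confirmed to carry the z⁻¹ density predicted by Eq. A.8:

    import numpy as np

    rng = np.random.default_rng(0)

    # A 1/v density on [1, 100] is sampled by drawing log(sigma^2) uniformly
    # (the truncation makes the otherwise improper prior samplable).
    sigma2 = np.exp(rng.uniform(np.log(1.0), np.log(100.0), size=10**6))
    z = 1.0 / sigma2  # change of variables, Eq. A.8

    # If P(z|A) is proportional to 1/z, then log(z) is uniform: a histogram
    # of log(z) over its range [-log 100, 0] should be flat.
    counts, _ = np.histogram(np.log(z), bins=10)
    print(counts / counts.mean())  # all entries near 1.0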
Now that we have a prior, we can write down Eq. A.3, substituting a normal distribution for B|AC
and z⁻¹ for σ² (since they represent equivalent propositions, C):

\[ P(C \mid AB) = \frac{z^{\frac{S}{2}-1}\, e^{-\frac{z}{2}\sum_{i=1}^{S}(y_i - \mu)^2}}{\displaystyle\iint z^{\frac{S}{2}-1}\, e^{-\frac{z}{2}\sum_{i=1}^{S}(y_i - \mu)^2}\, d\mu\, dz}. \tag{A.9} \]
Writing \(\sum_{i=1}^{S}(y_i - \mu)^2 = \sum_{i=1}^{S}(y_i - \langle y\rangle)^2 + S(\langle y\rangle - \mu)^2 = S\hat{v} + S(\langle y\rangle - \mu)^2\),
where ⟨y⟩ is the sample mean and we have defined the sample variance
\(\hat{v} \equiv \frac{1}{S}\sum_{i=1}^{S}(y_i - \langle y\rangle)^2\), and noting that the
denominator is (as it must be) independent of µ and z, simplifies the above to

\[ P(\mu z \mid AB) \propto z^{\frac{S}{2}-1}\, e^{-\frac{Sz}{2}\left(\hat{v} + (\langle y\rangle - \mu)^2\right)}. \tag{A.10} \]

For a given z, the conditional distribution of µ is a normal distribution with mean ⟨y⟩ and
variance σ²/S = 1/(Sz), as expected.
To infer a value for z, we treat µ as a nuisance parameter, integrate it out of Eq. A.10, and
normalize to get

\[ P(z \mid AB) = \frac{\left(S\hat{v}/2\right)^{\frac{S-1}{2}}}{\Gamma\!\left(\frac{S-1}{2}\right)}\; z^{\frac{S-3}{2}}\, e^{-\frac{S\hat{v}}{2} z}. \tag{A.11} \]

This is a Gamma distribution with shape a = (S − 1)/2 and rate b = Sv̂/2.
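In practice, the Gamma form makes the posterior for z easy to manipulate. The sketch below
(illustrative only; the synthetic data and the use of scipy.stats are our assumptions) forms a and
b from a simulated data set and queries the resulting posterior:

    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(1)

    S = 20
    y = rng.normal(loc=1.0, scale=2.0, size=S)  # synthetic data, true z = 0.25
    vhat = np.mean((y - y.mean()) ** 2)         # sample variance, \hat{v}

    a = (S - 1) / 2.0   # Gamma shape parameter from Eq. A.11
    b = S * vhat / 2.0  # Gamma rate parameter from Eq. A.11
    post = gamma(a, scale=1.0 / b)  # scipy parameterizes by scale = 1/rate

    print(post.mean())          # posterior mean of z, equal to a/b
    print(post.interval(0.95))  # central 95% credible interval for z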
If the variance is unknown, doing the same integration over z gives a strange posterior
distribution for µ,

\[ P(\mu \mid AB) = \frac{\Gamma(S/2)}{\sqrt{\pi\hat{v}}\,\Gamma\!\left(\frac{S-1}{2}\right)} \left(1 + \frac{(\langle y\rangle - \mu)^2}{\hat{v}}\right)^{-S/2}, \tag{A.12} \]

which is the Student’s t-distribution [67] when the number of degrees of freedom is S − 1 and
t² = (S − 1)(⟨y⟩ − µ)²/v̂. The posterior mean and variance of µ|AB are therefore ⟨y⟩ and
v̂/(S − 3), the latter finite only for S > 3.
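These moments are easy to verify by Monte Carlo (our illustration, not part of the original
analysis): sample z from the Gamma posterior of Eq. A.11, then µ given z from the conditional
normal of Eq. A.10, and compare the empirical moments of µ with ⟨y⟩ and v̂/(S − 3):

    import numpy as np

    rng = np.random.default_rng(2)

    S = 12
    y = rng.normal(size=S)
    ybar = y.mean()
    vhat = np.mean((y - ybar) ** 2)

    # Sample the joint posterior by composition: z from the Gamma posterior
    # of Eq. A.11 (numpy's gamma takes shape and scale = 1/rate), then
    # mu given z from the conditional normal N(ybar, 1/(S z)) of Eq. A.10.
    z = rng.gamma(shape=(S - 1) / 2.0, scale=2.0 / (S * vhat), size=10**6)
    mu = rng.normal(loc=ybar, scale=1.0 / np.sqrt(S * z))

    print(mu.mean(), ybar)           # both approximate the sample mean
    print(mu.var(), vhat / (S - 3))  # both approximate vhat / (S - 3)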