Sequence Selection during Copolymerization

9546
J. Phys. Chem. B 2007, 111, 9546-9562
Sequence Selection during Copolymerization
Jonathan A. D. Wattis*,† and Peter V. Coveney*,‡
Theoretical Mechanics, School of Mathematical Sciences, UniVersity of Nottingham, UniVersity Park,
Nottingham NG7 2RD, United Kingdom, and Centre for Computational Science, Department of Chemistry,
UniVersity College London, Gower Street, London WC1E 6BT, United Kingdom
ReceiVed: March 5, 2007; In Final Form: May 3, 2007
We analyze a simple model for the copolymerization of two chemically distinct monomers that displays a
wide variety of product sequence compositions depending on the rate coefficients for the stepwise addition
of each monomer to a growing polymer chain. We show that in certain regions of parameter space the
copolymer sequence composition depends sensitively on these parameter values, and we compute the
information content of the resultant sequences. Finally, we show how the model may be used to interpret
experimental data on copolymerization reactions.
1. Introduction
When two monomers, each of which can individually selfpolymerize, are mixed and react to form a copolymer, what is
the resulting copolymer sequence? This is the question we set
ourselves to answer in the present paper. Such copolymerization
reactions arise inter alia in industrial and biological processes,
and the resulting sequences are often critical in determining the
properties of the product polymers. While there is significant
literature on certain aspects of pure and applied copolymerization,2,13,27 remarkably little insight is available on sequence
selection in such systems.
Flory14 discusses initiation and termination of the polymerization process. For the main problem concerned with the
kinetics at intermediate times, he considers the evolution of a
system in which as one monomer is depleted the other
dominates. He also discusses the semiempirical Alfrey-Price
Q,e scheme for approximating the rates of monomer addition
and briefly summarizes models of copolymerization kinetics.
Carraher8 reprises the basic kinetics of copolymerization before
discussing block copolymers and their physical properties,
applications, and commercial uses. Experimentally, the determination of copolymer composition can be achieved via a
variety of techniques. UV/vis spectroscopy can be used in
situations where the monomers have different characteristic
adsorption frequencies. In certain systems transmission electron
microscopy (TEM) can be used to assess the periodicity of block
copolymers. Campbell et al.7 discuss these techniques together
with a detailed account of the use of NMR techniques, which
are suited to the investigation of copolymer composition,
monomer sequences, configurational sequences, and reaction
mechanisms. As well as determining the large-scale ratios of
monomers composing copolymers, NMR can be used to
determine the detailed sequence structure of copolymers. NMR
also permits the determination of stereochemical configurations,
allowing meso and racemic (or isotactic, heterotactic, and
syndiotactic) sequences to be differentiated. This is possible
because while the 1H spectra of homopolymers may be fairly
* Authors to whom correspondence should be addressed. E-mail:
[email protected]; [email protected].
† University of Nottingham.
‡ University College London.
simple that of the copolymer is more complex due to interactions
between the different monomers in the chain’s sequence. One
further advantage of NMR is that polymerization rates can be
found from spin-spin relaxation times, and hence kinetic
properties can be investigated. Nevertheless, the interpretation
of NMR data can be complicated due to the effects of uncertain
orientation, crystallinity, and/or cross-linking of polymers.
Recently, experimental results on systems in which left- and
right-handed N-carboxyanhydride (NCA) amino acids copolymerize have been reported by Blocher et al.4 and Hitz et al.16
The copolymerization products show a distinct bias away from
the binomial distribution, which one would naively expect if
the two monomers had equal attachment probabilities. Although
these authors still observe a dominance of polymers containing
approximately equal numbers of left- and right-handed monomers, there are considerably more homopolymers (comprised
of both pure left- and pure right-handed forms) than are
predicted by a straightforward binomial distribution. We shall
show that this can be accounted for by a model where the
probabilities of attachment depend both on the attaching
monomer and on the unit that currently terminates a growing
polymer. Our central assumption is that polymers grow by the
stepwise addition of a single monomer at a time with rates
dependent on the identity of the currently terminating monomer
unit within the growing polymer chain and on the nature of the
monomer being added. This assumption enables us to exploit
basic properties of the Becker-Döring equations.3
Becker-Doring theory describes the general stepwise growth
of chains and clusters, being valid for the formation of crystal
nuclei as well as polymers.3,20 The latter case is simpler to study
as it typically has growth rates that are independent of cluster
size (equivalent to polymer length). In systems with continuous
input of monomers, instead of the equilibrium state, a steady
state may be approached in the large-time limit. Since this would
imply a fixed monomer concentration, a common approach is
to study models in which the monomer concentration is held at
some fixed value; this is often referred to as the “pool chemical
approximation”. While the steady-state solution is straightforward to write down for Becker-Döring polymerization systems
involving only a single species, the steady state is vastly more
complex in the multicomponent model formulated below. To
10.1021/jp071767h CCC: $37.00 © 2007 American Chemical Society
Published on Web 07/21/2007
Sequence Selection during Copolymerization
model the formation of copolymers of the form ...AABAABBBA..., we propose a model that has two sets of concentration
variables: one for the polymers ending in an “A” and a second
set for the concentrations of those terminated by a “B”. In
addition each set of variables has two subscripts, denoting
the total number of A and B units in the chain. Since the
equtions for these two concentration distributions are coupled
together, the steady state for the system as a whole is extremely
complex.
The Becker-Döring equations previously have been generalized in various other ways to deal with multicomponent
systems. For example, Wu28 determined the nucleation rate
of a multicomponent system by proposing a multicomponent
Becker-Döring model and analyzing the saddle point of
an equilibrium distribution. Carr and Dunwell9 used a multicomponent Becker-Döring model to analyze cell surface
capping. In previous papers, we have used generalized multicomponent Becker-Döring systems of equations to describe
the formation of vesicles in systems that exhibit an induction
time and in systems that display a curious size-templating
phenomenon5,10,11 and cluster growth in which shape changes
occur simultaneously with aggregation.22 In all of these cases a
scalar system of ordinary differential equations for the distribution of clusters as a function of size and composition is
produced; in this paper we obtain a more complicated vector
system in which we have to solve for two such distributions
simultaneously.
One of the purposes of analyzing the model proposed
here is to show that copolymerization can give rise to
long chains and, more importantly, that chains with a possibly
complex internal sequence structure can be formed, namely, a
structure of the type which could be used to store information
as is seen in RNA and DNA. However, as we have described
in other papers,23-25 the passing on of such information requires some form of feedback mechanism. This could be by
way of an autocatalytic process, as in our model for
the emergence of an “RNA world”,23 or in a more indirect way
in which products (long chains) influence the production of
more basic species (e.g., monomers), as observed in Sandars’
model of chiral polymerization.19,24 The model proposed here
has no such feedback mechanisms, and so the sequence of
monomer units comprising any polymer depends only on the
attachment rate constants and on the concentrations of monomers.
In the present paper, we analyze a system in which the
rates are small perturbations away from a model system in
which both monomers have the same rate of attachment as well
as systems in which the rate constants are strongly biased in
favor of producing either homopolymers or heteropolymers with
alternating sequences of monomers. Some of our mathematical methods of solution are based on a simple generalization of the asymptotic methods used in our earlier papers on
the Becker-Döring system17,26 to determine the evolution
of clusters of large sizes at large times. We also use exact
methods to determine the time evolution of quantities of
macroscopic importance, such as the total number of polymer
chains present and the average length of the chains. Although
such exact methods lead to complicated expressions, largetime asymptotic approximations provide useful descriptions of
behavior. In section 2 the copolymerization model is formulated, and the kinetic behavior of some macroscopic quantities
is derived for general systems. The detailed mathematical analysis leading to these results is summarized in
Appendix A.
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9547
There then follow three sections in which specific systems
are studied in more detail. The full mathematical analysis is
reserved for Appendix B, and the results for three example
systems are illustrated in sections 3, 4, and 5. In section 3 we
examine a system in which the rate of attachment of both A
and B monomers is the same regardless of whether the chain
currently ends in an A or a B unit. We give details of the kinetics
of the macroscopic quantities such as average chain length,
number of chains, and mass of chains as well as describing the
steady-state solution. Similar details are given for the heteropolymer-dominated and homopolymer-dominated systems in
sections 4 and 5. The latter is of particular interest, since
homopolymers only predominate at shorter lengths, while longer
polymers are dominated by heteropolymers, albeit with chains
that include long homopolymer subsequences. The analysis of
the expected lengths of homopolymer subsequences and information content for the chains is given in section 6. Section 7
illustrates an application of our theory to experimental results
on the polymerization of amino acids, which occur in left- and
right-handed monomeric forms, and so upon polymerization
form a mixture of homochiral and heterochiral chains. Using
liquid chromatography/mass spectrometry, Luisi’s group has
managed to obtain the relative abundances of each type of
polymer,4,16 finding abnormally high concentrations of homopolymers. While they argue that this is due to a high-order
kinetic Markov process, we show in this paper that the results
can be explained by a model that uses only first-order Markov
kinetics. The conclusions from this study are presented in section
8.
2. Copolymerization Model
We model the formation of copolymers via the following four
reactions
...XXXA + A f ...XXXAA with rate coefficient ) kAA
...XXXA + B f ...XXXAB with rate coefficient ) kAB
...XXXB + A f ...XXXBA with rate coefficient ) kBA
...XXXB + B f ...XXXBB with rate coefficient ) kBB (2.1)
where X represents either an A or a B monomeric unit. This is
the starting point for Flory’s analysis of copolymerization14 (eq
1 of chapter 5). We assume that polymerization occurs by only
one monomeric unit being added at a time (stepwise growth),
the key physically realistic assumption that enables us to use
Becker-Döring equations to model the process. We also assume
that the rate of addition of monomers depends only on the
concentration of the monomer being added and the final
monomer on the chain; that is, there are no longer-range
interactions involving monomer units further down the chain.
To model the full system in all its complexity we would need
γ
to introduce concentration variables cr,s
, where γ is a string of
A’s and B’s encoding the exact order in which monomers have
been added to make the polymer. Such a problem is NP
(nonpolynomial) hard.15 Therefore, instead of the full problem,
we shall analyze a simplification. We shall classify a polymer
chain by just three quantities: the number of A monomers of
which it is composed, which we denote by a subscript r; the
number of B monomers it contains, denoted by the subscript s;
the final monomer in the chain, which we shall denote with a
superscript. Thus we shall denote the copolymer chains by
A ∞,∞
B ∞,∞
{Cr,s
}r)1,s)0 and {Cr,s
}r)0,s)1 and their concentrations by lowerA
∞,∞
B
∞,∞
and {cr,s
(t)}r)0,s)1
, respectively.
case variables {cr,s(t)}r)1,s)0
9548 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
B
A
cq,0
) 0 and c0,q
) 0 for q g 2
(2.12)
Then eq 2.6 is valid for all (r,s) except (1,0) and (0,0), and eq
2.7 is valid for all (r,s) except (0,1) and (0,0).
There are two possibilities for the monomer concentration;
first, assuming the pool chemical approximation, we will
consider constant monomer concentrations cA1,0 and cB0,1 independent of time; alternatively, we could consider a constant
density condition for each component, namely, that
Figure 1. Illustration of (r,s)-space showing domains of definition of
A
B
the concentrations cr,s
and cr,s
.
Note that in this scheme A ) CA1,0 and B ) CB0,1; pure
A
B
and C0,s
; all copolymer
homopolymers are denoted by Cr,0
A
B
chains Cr,s or Cr,s with r, s g 1 are heteropolymers. Note also
B
A
that chains of the form cr,0
and c0,s
are forbidden.
Thus we have the following reactions to consider
A
A
Cr,s
+ A f Cr+1,s
with rate coefficient kAA
(2.2)
A
B
+ B f Cr,s+1
with rate coefficient kAB
Cr,s
(2.3)
B
A
+ A f Cr+1,s
with rate coefficient kBA
Cr,s
(2.4)
B
B
+ B f Cr,s+1
with rate coefficient kBB
Cr,s
(2.5)
Instead of a simple one-dimensional distribution, we now have
to picture distributions in three dimensions, the axes being r, s,
and concentration. Figure 1 illustrates the areas in (r,s) parameter
A
B
space where cr,s
and cr,s
are defined.
The monomer reactivity ratios are defined by rA ) kAA/kAB
and rB ) kBB/kBA. Flory14 calculates how the evolution of
copolymer composition alters with rA, rB, and fA ) cA1,0/(cA1,0 +
cB0,1), noting that all systems that had been subject to experimental determination of rates showed rArB e 1. The case rArB
) 1 is classified as “ideal” and leads to the sequence of
monomers in the resultant polymers being random. For a system
in which the monomers are not replenished, it is necessary to
trace the ratio of monomers in the unreacted part of the system,
that is, fA, if one is to predict the composition of copolymers.
In the pure irreversible polymerization case (that is, neglecting
any possible breakdown of the growing chains) the kinetic
equations are
A
A
A A
B
c̆r,s
) kAAcr-1,s
cA1,0 - kAAcr,s
c1,0 + kBAcr-1,s
cA1,0 A B
c0,1 r g 2, s g 1 (2.6)
kABcr,s
B
B
B B
A
) kBBcr,s-1
cB0,1 - kBBcr,s
c0,1 + kABcr,s-1
cB0,1 c̆r,s
B A
kBAcr,s
c1,0 r g 1, s g 2 (2.7)
A
A A
B A
A B
) -kAAc1,s
c1,0 + kBAc0,s
c1,0 - kABc1,s
c0,1 s g 1
c̆1,s
(2.8)
B
B B
A B
B A
) -kBBcr,1
c0,1 + kABcr,0
c0,1 - kBAcr,1
c1,0 r g 1
c̆r,1
(2.9)
A
A
A A
A B
) kAAcr-1,0
cA1,0 - kAAcr,0
c1,0 - kABcr,0
c0,1 r g 2 (2.10)
c̆r,0
B
B
B A
B B
) kBBc0,s-1
cB0,1 - kBAc0,s
c1,0 - kBBc0,s
c0,1 s g 2
c̆0,s
(2.11)
together with equations for the monomer concentrations.
The last four equations can be ignored if one imposes the
conditions
FA )
FB )
∞
∞
∞
∞
A
B
r(cr,s
(t) + cr,s
(t))
∑
∑
r)1 s)0
A
B
s(cr,s
(t) + cr,s
(t))
∑
∑
r)0 s)1
(2.13)
are independent of time. Since the only process occurring is
irreversible polymerization, the mass will increase without
bound in the former case, whereas in the latter (eq 2.13), an
equilibrium configuration will be reached as all of the monomer
is converted into polymeric form and the monomer concentrations drop to zero. In this case the equilibrium solution is
degenerate in that there are infinitely many equilibria, and the
one approached will depend upon the initial conditions of the
system. For the remainder of this paper we make use of the
pool chemical approximation; that is, we study the constant
monomer formulation, where we assume that mass is continuously inserted into the system as polymers form so as to keep
the concentrations cA1,0 and cB0,1 constant. The total number of
polymers and total mass of the system thus increase without
bound.
2.1. Temporal Behavior. Here we describe the evolution of
macroscopically observable quantities, such as the total number
of polymers, the mass of polymers, and average chain length.
Since the model is based on two distributions, one for polymers
A
B
) and the other for those terminated by B (cr,s
),
ending in A (cr,s
we define corresponding quantities for the total number of
polymers
∞
∞
∞
∞
NA(t) )
∑
∑ cr,sA (t)
r)1 s)0
NB(t) )
∑
∑ cr,sB (t)
r)0 s)1
N(t) ) NA(t) + NB(t)
(2.14)
the total (mass-weighted) concentration of polymeric material
being given by eq 2.13 together with F(t) ) FA(t) + FB(t).
This definition of F(t) assumes that the masses of the A
and B monomers are equal. In Appendix A a more detailed
analysis is given that is applicable to cases where the monomers
have differing masses. Appendix B describes the process by
which the definitions of macroscopic quantities in terms of
microscopic variables (eqs 2.13 and 2.14) and the kinetic
equations for the evolution of the microscopic variables
(eqs 2.6-2.11) are combined to form a solvable system of
equations for the macroscopic variables. Appendix B also
describes the methods of solution of the kinetic equations. The
resulting kinetic equations for the macroscopic quantities NA
and NB are
Sequence Selection during Copolymerization
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9549
ṄA ) - γNA + δNB + (R + γ)cA1,0
ṄB ) γNA - δNB + (β + δ)cB0,1
(2.15)
where, to simplify the analysis of these equations, we define
the following parameters
R ) kAAcA1,0
β ) kBBcB0,1
γ ) kABcB0,1
δ ) kBAcA1,0
(2.16)
β
β+δ
(2.17)
R̂ )
R
R+γ
β̂ )
Since we are analyzing the system using the pool chemical
approximation, the concentrations cA1,0 and cB0,1 are constants.
The system (eq 2.15) is solved by noting that the total number
of chains N(t) ) NA(t) + NB(t) grows linearly with time so that
N(t) ) N(0) + [(R + γ)cA1,0 + (β + δ)cB0,1]t
(2.18)
Full details of the solution are given in Appendix A. To
summarize, in general, at large times we find linear growth in
the number of chains according to the asymptotic formulas
NA(t) ≈
NB(t) ≈
δ[(R + γ)cA1,0 + (β + δ)cB0,1]t
(γ + δ)
γ[(R + γ)cA1,0 + (β + δ)cB0,1]t
(γ + δ)
(2.19)
Note that the quantity NB/NA ) γ/δ is independent of time and
so the numbers of the two types of chains grow while
maintaining the same constant of proportionality.
The masses in the system (eq 2.13) grow quadratically with
time, the large-time asymptotic rate being given by
FA ≈
FB ≈
δ(R + γ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2
2(γ + δ)
γ(β + δ)[(R +
γ)cA1,0
+ (β +
δ)cB0,1]t2
2(γ + δ)
(2.20)
Note that FB/FA ) (1 - R̂)/(1 - β̂) is independent of t. The
average chain length can be determined from the quantities N,
FA, and FB. Since the total mass of polymer chains is given by
FA + FB, the average length, L, is given by
L)
FA + FB [δ(R + γ) + γ(β + δ)]t
)
N A + NB
2(γ + δ)
states whose shape is described by the complementary error
function (erfc). Thus at large times, the total number of clusters
t
t
1 ) t, and the total mass of clusters F ≈ ∑r)1
r≈
N ≈ ∑r)1
1/ t2. Hence the mass grows quadratically in time while the
2
number of polymers grows linearly, and this is entirely
consistent with the individual concentration variables approaching a steady state. In the next few sections we illustrate the
form of the steady-state solution for some archetypal situations.
The general form of the steady-state solution is given in
Appendix B (along with solutions for the dynamics away from
steady state).
3. Random Polymerization
In this section we study the situation in which the rates at
which monomers attach to polymers depend neither on the
terminating unit of the polymer nor on the nature of the
monomer being attached. Hence we use the term “random
polymerization”. In later sections we will analyze the two cases
where the terminating polymeric residue displays a strong
preference to attach either to a different monomer or to a
monomer of the same species, leading to polymers with
sequences that are predominantly ...ABABAB... or ...AAAAA...,
respectively. In terms of the rates (eq 2.16), the random
polymerization scheme corresponds to R ) β ) γ ) δ; hence
R̂ ) β̂ ) 1/2 (eq 2.17).
From eqs 2.19 and 2.20, the large-time asymptotes of the
macroscopic quantities are given by NA ) NB ) t and FA ) FB
) (cA1,0 + cB0,1)t2. The average polymer length, L ) F/N, is thus
given by L ) t.
The microscopic quantities, namely, the individual concentration variables, are obtained from eqs B23, B24, and B31
[
r-2
1
1
A
cr,s
≈ cB0,1dr-2,s-1 + cA1,0
2p+2-rdp,s-1 +
4
4
p)0
1 B s-2 q+2-s
2
dr-2,q (3.1)
c0,1
2
q)0
]
∑
[
1
1 1 A r-2 p+2-r
B
≈ cA0,1dr-1,s-2 +
2
dp,s-2 +
cr,s
c1,0
4
42
p)0
∑
s-2
cB0,1
2q+2-sdr-1,q
∑
q)0
]
(3.2)
where dp,q is defined by eq B22. In this case, since R̂ + β̂ ) 1,
dp,q takes on a particularly simple form, namely, dp,q ) 2-p-q(p
+ q)!/p!q!. This enables eqs 3.1 and 3.2 to be simplified to
(2.21)
Thus the average length of the polymer chains grows linearly
in time as does the total number of chains.
Since the number of polymers is growing and the mass of
polymers is growing faster, it is tempting to assume that the
A (t) and c B (t) are
individual concentrations of each species cr,s
r,s
also increasing without bound. However, the individual concentrations approach a steady state. This can be seen from a
simple one-component Becker-Döring system, c̆r ) c1(cr-1 cr) with c1 ) 1 as the asymptotic solution and cr ) 1/2 erfc((r
- t)/x2t).26 This solution has three regions: For r/t < 1, the
concentrations are at the steady state given by cr ≈ 1; for r/t >
1, the concentrations are basically zero; in the narrow region r
- t ) O(xt), there is a smooth transition between these two
∑
A
cr,s
)
B
cr,s
)
2r(rcA1,0 + scB0,1)
(r + s)2
2s(rcA1,0 + scB0,1)
(r + s)2
(
(
exp -
(r - s)2
2(r + s)
exp -
(r - s)2
2(r + s)
)
)
(3.3)
A
B
Illustrative examples of distributions cr,s
and cr,s
are given in
Figure 2. Note that the system is dominated by heteropolymers
but that the distribution is widely spread around the line r ) s.
4. Heteropolymer-Dominated System
In this case, the system’s rate parameters are such that R ≈
β , γ ≈ δ; that is, a B monomer is much more likely to attach
to a polymer chain ending in A than a polymer that terminates
9550 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
A
B
Figure 2. Top panels: Illustrations of cr,s
(left) and cr,s
(right) distributions, plotted against aggregation numbers r,s in the case of random
A
B
1
polymerization, that is, where R ) β ) γ ) δ ) /2; thus also R̂ ) β̂ ) 1/2. Bottom panels: Similar plots of log(cr,s
) (left) and log(cr,s
) (right).
with B. Thus we expect to find long ABABAB sequences and
very few instances of AA or BB.
4.1. Temporal Behavior. The large-time asymptotes for the
total number of polymer chains ending in an A or a B reduce
to
NA(t) ∼
δ(γcA1,0 + δcB0,1)t
(γ + δ)
γ(γcA1,0 + δcB0,1)t
NB(t) ∼
(γ + δ)
(4.1)
when the approximations of this section (namely R,β , γ,δ)
are applied to the equations of subsection 2.1. The corresponding
results for the mass of material in polymeric form are
FA ≈ FB ≈
γδ(γcA1,0 + δcB0,1)t2
2(γ + δ)
(4.2)
in the limit R,β f 0. Together, these results imply that the
average chain length grows linearly in time and is given by
L≈
γδt
γ+δ
(4.3)
If the monomers are present in equal concentrations, that is cA1,0
) cB0,1 ) c1, then these results simplify further to NA ≈ δc1t,
NB ≈ γc1t, and FA ≈ FB ≈ γδc1t2/2.
4.2. Microscopic Steady-State Solution. From subsection
B4 and eqs B23 and B24 of Appendix B we have
the distributions have the same qualitative shape as for the
randomly polymerizing system of section 3, the distribution is
much more sharply localized along the line r ) s.
5. Homopolymer-Dominated System
In homopolymer-dominated systems, the rate parameters are
such that R ≈ β . γ ≈ δ; that is, a B monomer is much more
likely to attach to a polymer terminating in another B than one
that terminates with an A. Thus we expect to see long sequences
of A’s and B’s but only rarely to observe an AB or BA step.
Shorter polymers will thus be dominated by pure homopolymer
sequences. However, as polymer length increases, so does the
likelihood of finding one or a few such changes of this type.
Thus, for long polymers, we expect that the distribution of
polymers will make a transition away from being dominated
by pure homopolymers to a situation dominated by mixed
polymers.
5.1. Temporal Behavior. While eqs A1, A2, A5, A6, 2.19,
and 2.20 are valid for γ,δ > 0, in the case R,β . γ,δ > 0,
there is a simpler asymptotic solution that is valid at intermediate
times. In this limit, the eq 2.15 reduces to ṄA ) RcA1,0 and ṄB
) βcB0,1, with solutions
NA(t) ) NA(0) + RcA1,0t NB(t) ) NB(0) + βcB0,1t (5.1)
The mass is then determined by F̆A ) RNA + RcA1,0, F̆B ) βNB
+ βcB1,0, implying
1
FA ) FA(0) + R(cA1,0 + NA(0))t + R2cA1,0t2
2
A
≈
cr,r
kBAcA1,0
δ B
c0,1 )
γ
kAB
1
FB ) FB(0) + R(cB0,1 + NB(0))t + β2cB0,1t2
2
B
cs,s
≈
kABcB0,1
γ A
c1,0 )
δ
kBA
Assuming γ,δ > 0 these expressions are valid for times less
than O(1/(γ + δ)), over which the average chain length is given
by
A
B
cr+1,r
≈ cA1,0 cs,s+1
≈ cB0,1
(4.4)
Example distributions are given in Figure 3. Note that, although
FA + FB (R2cA1,0 + β2cB0,1)t
≈
L)
N A + NB
2(RcA + βcB )
1,0
0,1
(5.2)
(5.3)
Sequence Selection during Copolymerization
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9551
A
B
Figure 3. Top panels: Illustrations of cr,s
(left) and cr,s
(right) distributions plotted against r,s in the heteropolymer-dominated case where R ) β
A
B
) 0.1 and γ ) δ ) 1. Bottom panels: Plots of log(cr,s
) (left) and log(cr,s
) (right). Note the vertical scales here are different from those in
Figure 2.
Over longer time scales alternative formulas described in
Appendix A, namely, eqs A1 and A2, should be used. When t
. 1/(γ + δ), eqs A1 and A2 simplify to
δ(RcA1,0 + βcB0,1)t
NA ≈
γ+δ
FB ≈
(5.4)
A
)
cr,s
δR(RcA1,0 + βcB0,1)t2
2(γ + δ)
γβ(RcA1,0 + βcB0,1)t2
(5.5)
2(γ + δ)
hence L is given by
L)
A
cr,s
≈
FA + FB (Rδ + βγ)t
≈
N A + NB
2(γ + δ)
(5.6)
Thus we observe that in both temporal regimes NA, NB, and L
all grow linearly, and FA and FB grow quadratically in time,
though with different rates when (γ + δ)t is small or large.
The transition region occurs when t ≈ 1/(γ + δ), and the kinetics
in this region are governed by the more complex expressions
or eqs A1 and A2 of Appendix A.
5.2. Steady-State Solution. We consider the limit in which
A-A and B-B growth rates are significantly faster than the
A-B and B-A rates of formation. In this case we assume R ≈
β ≈ O(1) , γ ≈ δ ≈ so that R̂,β̂ ) 1 + O(). For polymers
of length O(1), we expect the system to be dominated by pure
polymers with concentrations given by
(
(
δcB0,1
γcA0,1
B
≈
+ O(2) cr,s
+ O(2)
R+γ
β+δ
(5.8)
and
γ(RcA1,0 + βcB0,1)t
NB ≈
γ+δ
FA ≈
For other concentrations, we have
)
)
A
) R̂r-1cA1,0 ≈ cA1,0 1 cr,0
γ(r - 1)
R+γ
B
c0,s
) β̂s-1cB0,1 ≈ cB0,1 1 -
δ(s - 1)
β+δ
B
)
cr,s
δcB0,1
[1 + (r - 3)R + (s - 1)β] +
R+γ
Rβ(r - 2)cA1,0 (5.9)
γcA1,0
[1 + (r - 1)R + (s - 3)β] +
β+δ
Rβ(s - 2)cB0,1 (5.10)
Example distributions are given in Figure 4.
At shorter polymer lengths, the distribution is dominated
A
by homopolymers; that is, the cr,s
distribution is dominated by
A
B
B
A
B
and cr,s
in
cr,0 and cr,s by c0,s. This is evident in the plots of cr,s
the top part of Figure 4. As longer polymers are formed, it
becomes increasingly likely that a heteropolymer step will be
incorporated into the sequence of monomer units. Hence at
A
B
and cr,s
larger r,s, the distributions become dominated by cr,s
with both r and s nonzero. However, since the polymers so
formed have extremely long homopolymeric subsequences, the
distributions have a large variance. This effect is more evident
A
B
in the plots of log(cr,s
) and log(cr,s
) shown in the lower part of
Figure 4.
6. Average Run Length and Information Content
(5.7)
To determine the average length of a sequence of identical
units in a given polymer chain as well as other quantities that
relate to the information stored in the chain, we define the
probabilities pAA, pAB, pBA, and pBB, pXY being the probability
9552 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
A
B
Figure 4. Top panels: Illustrations of the cr,s
(left) and cr,s
(right) distributions plotted against r and s in the homopolymer-dominated case where
A
B
A
R ) β ) 1 and γ ) δ ) 1/4. Bottom panels: Plots of log(cr,s
) and log(cr,s
) against r,s. In the top left, the dominance of homopolymers cr,0
is evident
B
in the distribution of shorter chains, similarly for c0,s in the top right. The lower graphs show the dominance of heteropolymers in longer chains.
of occurrence of an X monomer followed by a Y. Thus we
have
pAA )
R
) R̂
R+γ
pBA )
δ
) 1 - β̂
β+δ
pAB )
γ
) 1 - R̂
R+γ
(6.1)
β
) β̂
β+δ
(6.2)
pBB )
Note that pAA + pAB ) 1 ) pBB + pBA.
We use the variable A to refer to the length of an A sequence,
and B for the length of a B sequence. We find that the
probabilities of finding a sequence of precisely n repeating A
units or n B units are
n-1
P[A ) n] ) pBApn-1
AA pAB P[B ) n] ) pABpBB pBA
E[A] )
∑n
)
pBApn-1
AA pAB
1
1 - pAA
)
1
1 - R̂
E[B] )
∑n
pABpn-1
BB pBA
)
1
1 - pBB
)
1
1 - β̂
∑n n2pBApn-1
AA pAB
∑n
- E[A]2 )
pBApn-1
AA pAB
pAA
γ
V[B] )
∑n n2pABpn-1
BB pBA
∑n
pABpn-1
BB pBA
- E[B]2 )
γ
)
1+
β
δ
(6.5)
These quantities give an indication of the typical lengths of chain
to be expected in the system. This calculation ignores end
effects, which are negligible in extremely long chains; formally
R
γ
(6.6)
)
(1 - pBB)2
δ
(6.4)
( )
1+
pBB
β
R
)
(1 - pAA)2
R
)
1+
∑n npBApn-1
BB pBA
V[A] )
(6.3)
The expected lengths of such homopolymeric sequences are then
defined by
∑n npBApn-1
AA pAB
in the homopolymer-dominated case, the total chain length
should exceed O(R/γ,β/δ) before such an assumption is valid.
Note that our model allows the lengths of A tracts and those
of B tracts to differ, since they are determined by completely
independent parameters. The variances of the length distributions
are also calculable
( )
1+
β
δ
(6.7)
These quantities show how likely it is to observe chains far
from the expected lengths (eqs 6.4 and 6.5). A large variance
indicates that a wide distribution of chain lengths will be
observed, while a small variance indicates a close to monodisperse distribution. The standard deviations are defined by σA2
) V[A] and σB2 ) V[B].
6.1. Run Length. The run length is the average number of
sequences of either type per 100 monomer units. To determine
this we follow Allcock and Lampe2 and define S(t) as the
number of sequences in the system. This increases whenever a
homopolymeric subsequence is completed, that is, whenever
an A-B or a B-A attachment is made; thus its rate of change
is given by
Ṡ ) γNA + δNB
(6.8)
Sequence Selection during Copolymerization
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9553
We also define the flux of monomers into the system (J), which
is the total rate of monomer addition to chains
J ) (R + γ)NA + (β + δ)NB
n
pi log2 pi
∑
i)1
(6.10)
and pi is the probability of event Ei. (Note that the logarithm is
taken to the base two, since in most applications the events are
encoded as zeros or ones.)
In our case, to make a simple application of this theory we
assume that the information is stored in the sequence of
monomers, that is, whether an A is followed by another A or a
B rather than in the copolymer blocks themselves (that is, in
whether any given monomer in the chain is an A or a B). So
our system has four basic units of information, namely, the AA,
AB, BA, and BB bonds.
We use conditional probabilities to determine the probability
associated with each event. The probability of finding an AB
bond is given by the product of the probability of finding an A
monomer and the probability of it being followed by a B. Thus
P[AA] ) P[AA|A]P[A] P[AB] ) P[AB|A]P[A]
(6.11)
P[BA] ) P[BA|B]P[B] P[BB] ) P[BB|B]P[B]
(6.12)
The conditional probabilities P[XY|X] refer to the probability
of a monomer X being followed by Y; these quantities are
determined by the relative rates of the processes by which A
monomers are added to sequences that end in an A or B, etc.
Since
P[AA|A] + P[AB|A] ) 1 ) P[BB|B] + P[BA|B]
(6.13)
and the rates for AA, AB, BA, and BB formation are R, γ, δ,
and β respectively, we have
P[AA|A] ) R̂ )
R
P[AB|A] )
R+γ
1 - R̂ )
P[BA|B] ) 1 - β̂ )
γ
(6.14)
R+γ
β
δ
P[BB|B] ) β̂ )
(6.15)
β+δ
β+δ
Finally, we need the probability of finding an A or B in a chain.
This is given by the relative numbers of each type of monomer
in the system; hence
P[A] )
FB
γ(β + δ)
)
)
FA + FB δ(R + γ) + γ(β + δ)
1 - R̂
(6.17)
2 - R̂ - β̂
(6.9)
The quantity J/Ṡ is then the average sequence length of either
type of chain, and so 100Ṡ/J is the run length.
6.2. Shannon Entropy. If we are interested in the ability of
copolymers to serve as a primordial genetic code, then it is
useful to have a measure of the information contained within a
polymeric sequence. Here, we shall define “information” as
“negative entropy”, specifically the Shannon entropy, H.12,18
This is a measure of the uncertainty or entropy associated with
the sample space of a sequence. Given a sequence X composed
n
of a sequence of n events {Ei}i)1
, the Shannon entropy is
defined by H, where
H(X) ) -
P[B] )
FA
δ(R + γ)
)
)
FA + FB δ(R + γ) + δ(β + δ)
1 - β̂
(6.16)
2 - R̂ - β̂
Putting all of these probabilities together using eq 6.10 we find
H)
-1
(Rδ log2(Rδ) + βγ log2(βγ) +
σ
2γδ log2(γδ) - σ log2σ) (6.18)
where σ ) Rδ + βγ + 2γδ ) (R + γ)(β + δ)(2 - R̂ - β̂).
6.3. Randomly Polymerizing System. In this case we have,
for some constant k, R ) k(1 + R), β ) k(1 + β), γ ) k(1 +
γ), δ ) k(1 + δ), with X , 1 for all X, so that R̂ ) 1/2(1 +
1/ - 1/ ) and β̂ ) 1/ (1 + 1/ - 1/ ). The quantities A
2 R
2 γ
2
2 β
2 δ
and B describe the number of consecutive A and B units in a
chain. We find
E[A] ≈ 2 + (R - γ) E[B] ≈ 2 + (β - δ)
(6.19)
V[A] ≈ 2 + 3(R - γ) V[B] ≈ 2 + 3(β - δ)
(6.20)
so the average run length is approximately two, with the
perturbations R and γ affecting the average length of A tracts
and β and δ affecting the length of B tracts. These quantities
also affect the variance of the sequence length distributions; in
this case the standard deviations (which are approximately
x2) are large in comparison with the mean, and so we expect
a much broader distribution of sequence lengths. The run length
is calculated from 100Ṡ/J, which is
1
50 1 + (γ + δ - R - β)
4
(
)
(6.21)
It is thus affected by all perturbations, positive R and β
reducing the run number and positive γ and δ increasing it as
one would expect.
In this case, applying eq 6.18, we find σ ) 4 and H ) 2.
Any deviation of the rate parameters R, β, γ, and δ from unity
causes a reduction in information content, according to the
relationship
(
)
2(R - β - γ + δ)2 +
(R + β - γ - δ)2
+ O(3) (6.22)
H)232 loge 2
Note that the correction to H ) 2 is O(2) and not O(); that is,
small variations in the rate constants only make very small
alterations to the information content of the resulting sequences.
Even at O(2) only certain perturbations to the rate constants
alter the Shannon entropy; for example, if R ) γ and β ) δ,
then corrections to H are even smaller. We will comment further
on the information content in subsection 6.6, when similar
calculations have been presented for the other systems studied
and the results can be compared.
6.4. Heteropolymer-Dominated System. In this case, we
assume that the system has a strong preference for alternating
sequences; that is, B attaches to A and A to B with a much
higher probability than A to A or B to B; mathematically, this
corresponds to the rate parameters satisfying γ,δ . R,β so that
R̂,β̂ , 1. Equations 6.4 and 6.5 show that the expected run
lengths for uninterrupted sequences of A (and B), E[A] (and
E[B]), are both very slightly above unity, since in this case
R,β , γ,δ. The variances are V[A] ≈ R/γ and V[B] ≈ β/δ
9554 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
with corresponding standard deviations σA ≈ xR/γ and σB ≈
xβ/δ, so there is no significant variation in chain lengthssall
are tightly clustered around unity, as one would expect in a
system in which polymers of the form ...ABABABA... dominate.
We thus expect the run length to be approximately 100. We
use eq 4.1 to obtain
R
2γδ
β
Ṡ
≈1≈
J (R + γ)δ + γ(β + δ)
2γ 2δ
(6.23)
and thus we have the first correction term to the run number
(given by 100Ṡ/J). Calculating the Shannon entropy in this case,
we find H ) 1 from eq 6.18, since σ ≈ 2γδ. We will comment
further on this result in section 6.6.
6.5. Homopolymer-Dominated System. In this case the
monomers have a strong self-affinity, so the formation of bonds
between similar monomeric units dominates that of differing
units; hence polymeric strands are primarily formed through
the creation of A-A and B-B bonds with very few A-B or
B-A bonds being formed. Mathematically, this corresponds to
the parameters satisfying R,β . γ,δ so that both R̂ and β̂ are
close to, but just below, unity; we write
R̂ ) 1 - R̃ β̂ ) 1 - β̃ with R̃,β̃ , 1
(6.24)
The quantities A and B describe the run lengths of A and B
monomers, respectively, as defined by eq 6.3. Ignoring end
effects arising in short chains, we find
E[A] ) 1 +
β
R
. 1 E[B] ) 1 + . 1
γ
δ
(6.25)
Here we are assuming that the length of chains is typically much
greater than R/γ and β/δ (otherwise end effects will be
significant). The variances V[A] and V[B] are also large, giving
standard deviations σA ≈ R/γ and σB ≈ β/δ. There are thus
significant variations about the mean and substantial degrees
of polydispersity.
Calculating Ṡ and J we find
Ṡ ≈ (RγcA1,0 + βδcB0,1)t
J ≈ [(R + γ)RcA1,0 +(β + δ)βcB0,1]t (6.26)
at larger times; thus the run length can be approximated by
A
B
100Ṡ Rγc1,0 + βδc0,1
≈ 2 A
,1
J
R c1,0 + β2cB0,1
(6.27)
A small run length is to be expected in a system that is
dominated by the homopolymer reaction mechanism. Such a
system has a low probability of forming an A-B or a B-A
step, so homopolymeric subsequences are long, and in a
sequence of 100 units, there will be very few such homopolymeric subsequences.
In homopolymer-dominated systems, using eq 6.24, the
Shannon entropy H simplifies to
H)
-1
(R̃ log2 R̃ + β̃ log2 β̃ R̃ + β̃
(R̃ + β̃) log2(R̃ + β̃)) (6.28)
Thus we observe that H depends quite delicately on the sizes
of the smaller quantities R̃ ≈ γ/R and β̃ ) δ/β. The Shannon
entropy has a maximum value of unity when R̃ ) β̃, and a
minimum value of zero along the lines R̃ ) 0 and β̃ ) 0.
6.6. Entropy and Information. While the interpretation of
run length is intuitively clear, the meaning of informational
entropy requires further discussion. For the randomly polymerizing system we found H ) 2, for the system dominated by
heteropolymers H ) 1, and for the homopolymeric system, the
more complex result eq 6.28, which satisfies H e 1.
Thus the randomly polymerizing system has the potential of
containing twice as much information as either of the other two
systems. In general the homopolymeric system carries the least
information; however, if carefully balanced, the homopolymeric
system can carry as much information as the heteropolymeric
system, but if the two components have differing polymeric
growth rates, the system’s capacity for information storage is
vastly reduced.
7. Comparison with Experimental Data
The theory derived above applies to general copolymerizing
systems composed of two types of monomeric units. It is
straightforward to see that the theory could be generalized to
systems involving more types of monomeric units, and while
the complexity of the system increases, no new ideas are
required in the construction and derivation of the governing
kinetic equations. We have introduced all of the techniques
required for the progression from single-component, classical,
Becker-Döring theory to multicomponent copolymerization.
There are a number of interesting experimental systems in
which two monomers interact. In generic systems, where the A
and B monomers are quite different, there is no a priori reason
for any constraints on the rates of addition R, β, γ, and δ.
However, one of the more intriguing examples in which
surprising results occur is chiral polymerization where the
monomeric units are identical except for their chiral structure.
This difference influences the growth of polymeric chains. A
variety of copolymeric chains grow with chains typically
comprising a mixture of right-handed (D) and left-handed (L)
monomers. Defining the left-handed monomer as A ) L and
the right-handed as B (or R or D), we then expect the A-A
interactions to be identical to the B-B interactions and the A-B
monomer addition rate to be the same as the B-A rate. Hence
we expect R ) β and γ ) δ, though γ * R is possible and in
general to be expected. This restriction on the expected rates
actually simplifies the model and makes it easier to fit to
experimental data since there are fewer parameters that one can
vary to obtain the best fit. We assume that the polymerization
process is a first-order Markov process; that is, the rates are
independent of the sequence structure further along the existing
polymer chain.
Luisi’s group has reported on a system involving a racemic
mixture of amino acids, namely, penta-deuterated NCA-L Trp
and normal NCA-D Trp (the L and D tags refer to left- and righthanded forms of the monomer, respectively); this combination
enables the various copolymers to be distinguished according
to the amount of L monomer units in each copolymer. Typically,
copolymer lengths, N, are between 3 and 10, though polymers
of lengths up to 29 are also observed. Some of the experimental
systems include an added lipid, 1-palmitoyl-2-oleoyl-sn-glycero3-phosphorylcholine (POPC), as well as the amino acid Trp.
The effect of this is to favor the formation of n-mer with specific
n values. Using liquid chromatography/mass spectrometry (
selected ion monitoring method) they obtained plots of the
relative frequency of polymers of the form DiLN-i against i for
i ) 0, 1, ..., N. Blocher et al.4 report the results for 7-mers and
10-mers. Taking into account the order of monomers in a
sequence of length N ) 10, there are 210 ) 1024 different
Sequence Selection during Copolymerization
Figure 5. Stereoisomers of length n are categorized by the number of
left- (L) and right-handed (D) monomeric units of which they consist
and are denoted by DpLq with p + q ) n. The graphs show the relative
abundances of DpLq stereoisomer groups of the oligo-Trp n-mers (n )
7 and 10, respectively) obtained after two racemic NCA-Trp feedings
(a) in the absence (n ) 7) and (b) in the presence of POPC liposomes
(n ) 10). The relative abundances of the DpLq stereoisomer groups (light
gray columns) are mean values of three measurements. Standard
deviations are given as error bars. The white columns correspond to a
theoretical distribution, based on the assumption of purely statistical
oligomerization. Reprinted from ref 4. (Reproduced from HelVetica
Chimica Acta with permission.)
Figure 6. Fit of our model to the experimental data of Figure 5. The
numbers on the horizontal axis correspond to the number of L monomers
in the polymer; thus 0 f D10, 1 f D9L1, 2 f D8L2, ..., 10 f L10. The
relative abundance is plotted on the vertical axis as a percentage of the
total number of decamers. (See text for details of parameter values.)
sequences; however, neither the experimental technique nor our
model is detailed enough to distinguish all possible chain
sequences. Instead we use a categorization into the 11 different
sequence classes DiLN-i (for 0 e i e 10). The experimental
results of Blocher et al.4 are displayed in Figure 5, and our fit
of the distribution of decamers is illustrated in Figure 6.
In our fit of the experimental data, for each type of polymer
sequence, five vertical bars are shown: The leftmost and
rightmost correspond to the experimental results. The second
bar indicates the results from a simple binomial fit, which, in
our model, corresponds to the random case. The fourth bar is a
fit to the general case with R, β, γ, and δ chosen to provide the
best possible fit to the experimental data. The third bar shows
the “symmetric fit” where β and δ are chosen such that β ) R
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9555
and δ ) γ so as to satisfy the symmetry conditions of the chiral
monomers, and then R and γ are chosen to give the best possible
fit. This case assumes that the rate of addition of an L monomer
to a chain ending in an L is the same as a D monomer being
added to a chain currently ending in an D and that the rate of
addition of a D monomer to a chain terminated with an L is
identical to the rate of addition of L to a D-terminated chain.
Since all of our parameters (R, β, γ, and δ) correspond to
rates of monomer addition and only the ratio of rates is relevant
to the steady-state concentrations, without loss of generality,
we can choose R + β ) 2. In the symmetric fit this implies
that R ) β ) 1, and fitting γ ) δ simply involves varying one
parameter. A fully general fit involves varying three parameters,
γ, δ, and the difference R - β.
The symmetric fit to the decamer results shown in Figure 5
yields γ ) δ ) 0.351 and is displayed by the central bar in
each group in Figure 6. To help assess the relative accuracy of
our predictions, we define the root-mean-square error rms )
expt
th
expt
- ci,N-i
)2 where ci,N-i
is the experimentally measured
∑i(ci,N-i
th
)
value of the concentration of the N-mer DiLN-i and ci,N-i
A
B
ci,N-i + ci,N-i is the predicted value of ci,N-i from our theory, in
which cA1,0 and cB0,1 are the concentrations of the D and L
monomers, respectively. For the symmetric fit, we have rms )
10.41.
A fully general fit of the experimental data yields R ) 1.1266,
β ) 0.8734, γ ) 0.421, and δ ) 0.289. Note that the average
value of γ,δ under this more general fitting is 0.355, which
differs by only 1% from the constrained, symmetric fit in which
R ) β and γ ) δ. The difference in R,β is more significant.
This general fit gives an error of rms ) 6.98, which is only a
moderate improvement over the symmetric fit of rms ) 10.41.
We should expect some improvement since in the symmetric
fit only one parameter is being altered, whereas three are fitted
for the fully general model. An alternative measure of the fit is
th
expt 2 th
the χ2 statistic, defined by χ2 ) ∑i(ci,N-i
- ci,N-i
) /ci,N-i; the
2
lower the value of χ , the better the fit to the data. The χ2 statistic
assumes that deviations from the expected results are normally
distributed. For the symmetric fit we find χsym2 ) 1.498, and
for the general fit we have χgen2 ) 1.615. The 95% confidence
interval allows values of χ2 up to χ2 ) 18.3 for a symmetric fit
and up to χ2 ) 15.5 for the general fit;1 if we found χ2 values
above these limits, then the model would be rejected as not a
good fit to the data. The 1% confidence interval allows χ2 values
up to 1.65 for the general fit and up to 2.56 for the symmetric
fit. We should be suspicious of values of χ2 much smaller than
this since it is rare that agreement with experimental measurements is so precise. The symmetric fit is a 9 degrees of freedom
(dof’s) problem since we have 11 items of data and have fitted
only one parameter and scaled the data so that its total is 100%;
the general fit is a 7 degrees of freedom problem (subtract 3
fitted parameters and the scaling to 100% from 11 items of data).
Using the fitted values of R, β, γ, and δ, we have calculated
the average sequence length and information content (Shannon
entropy) of the system using eqs 6.8, 6.9, and 6.18. These data
are given in Table 1 wherein we see that the change from a
symmetric fit to a general fit makes no difference to the expected
sequence length. The general fit, however, has a slightly lower
information content than the symmetric fit.
A more detailed analysis of liposome assisted polycondensation of NCA amino acid systems is given by Hitz et al.16 from
which the experimental results of Figure 7 are reproduced. They
find that homochiral copolymers are over-represented when
compared to a binomial distribution and that the overrepresentation is more extreme at longer polymer lengths.
9556 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
TABLE 1: Summary of the Data Arising from a Numerical
Fit of the Experimental Results to the Models Proposed and
Analyzed Hereina
system, fit
parameters
decamer
(Figure 6)
symmetric fit
R)β)1
decamer
(Figure 6)
general fit
R ) 1.1266,
β ) 0.8734
γ ) 0.421,
δ ) 0.289
χ2 data
other data
χ2 ) 1.498 rms ) 10.41
γ ) δ ) 0.351 χ95%,9dof2 ) 18.3
χ1%,9dof2 ) 2.56
SL ) 3.85
H ) 1.8265
χ2 ) 1.615 rms ) 6.98
χ95%,7dof2 ) 15.5
χ1%,7dof2 ) 1.65
SL ) 3.85
H ) 1.8245
trimer (Figure 8) R ) β ) 1
χ2 ) 0.818 rms ) 18.
symmetric fit
γ ) δ ) 0.507 χ95%,3dof2 ) 7.81
SL ) 2.97
χ5%,3dof2 ) 0.35
H ) 1.9214
trimer (Figure 8) R ) 0.901,
β ) 1.099
general fit
γ ) 0.367,
δ ) 0.619
hexamer
(Figure 9)
symmetric fit
R)β)1
hexamer
(Figure 9)
general fit
R ) 0.898,
β ) 1.102
γ ) 0.439,
γ ) 0.439
χ2
) 0.15
χ95%,1dof2 ) 3.84
χ5%,1dof2 ) 0.004
χ2
rms ) 3.859
Figure 8. Fit of our model to the experimental data of Figure 7. The
relative abundance is plotted on the vertical axis as a percentage of the
total number of trimers. See text for details of parameter values.
SL ) 3.12
H ) 1.8928
) 1.398 rms ) 18.94
γ ) δ ) 0.535 χ95%,6dof2 ) 12.59
χ5%,6dof2 ) 1.635
SL ) 2.87
H ) 1.9328
χ2 ) 1.109 rms ) 12.39
χ95%,4dof2 ) 9.49
χ5%,4dof2 ) 0.711
SL ) 2.89
H ) 1.9272
In addition to quoting the best fit values for the parameters R, β,
γ, and δ, we summarize the χ2 results, the root-mean-square error (rms),
the average sequence length (SL), and the information content (Shannon
entropy, H).
a
Figure 7. Stereoisomers of length n are categorized by the number of
left- (L) and right-handed (D) monomeric units of which they consist
and are denoted by DpLq with p + q ) n. Relative abundances for the
DpLq stereoisomer groups of the oligo-Leu n-mers are shown for one
racemic NCA-Leu feeding in the absence of POPC liposomes (eqs A1A4). The gray columns represent the relative abundances of the DpLq
stereoisomer groups and show the mean values of three measurements.
Standard deviations are given as error bars. The black columns
correspond to a theoretical distribution, based on the assumption of
purely statistical (Bernoullian) oligomerization. Reprinted from ref 16.
Copyright 2001 American Chemical Society.
Although their main study is on NCA-Trp, they also report
similar results from experiments on NCA-Ile and NCA-Leu
systems. By repeating experiments at a range of temperatures
between 5 and 50 °C, they also show that the effect is
independent of the state of the added (POPC) lipid. Fits to
distributions of trimers and hexamers are illustrated in Figures
Figure 9. Fit of our model to the experimental data of Figure 7. The
relative abundance is plotted on the vertical axis as a percentage of the
total number of hexamers. See text for details of parameter values.
8 and 9. As with Figure 6, the first and fifth bars correspond to
the experimental results, the second to the “random”, or binomial
expectation, the third to a “symmetric” fit of our model (allowing
γ ) δ to be varied and β ) R ) 1 held fixed), and the fourth
bar to a general fit (allowing γ, δ, and R - β to be varied).
For the trimer results quoted in Figure 7 (panel A1), we
provide a best fit using our model in Figure 8. The symmetric
fit gives γ ) δ ) 0.507 (in addition to R ) β ) 1), giving a
residual error of rms ) 18. The fully general fit gives γ ) 0.367
and δ ) 0.619 together with R ) 0.901 and β ) 1.099, (rates
scaled so that R + β ) 2). The average value of γ and δ is
0.493, which differs from the symmetric fit of 0.507 by only
3%. This fully general fit gives a residue of rms ) 3.859, which
is a considerable improvement on the symmetric fit. This
improvement is also demonstrated by the χ2 values of χsym2 )
0.818 and χgen2 ) 0.150. The 95% confidence intervals for these
quantities are χsym2 ) 7.81 (3 dof’s) and χgen2 ) 3.84 (1 dof),
and the 5% confidence intervals are 0.35 (3 dof’s) for the
symmetric case and 0.004 (1 dof) for the asymmetric case. As
with our results on the decamers, both fits indicate that the
homopolymeric addition rates (A + A and B + B, namely, R
and β) are about twice that of the heteropolymer rates (γ and δ
for A + B and B + A, respectively).
As in the previously analyzed experimental data, the general
fit leads to a lower Shannon Entropy (H) than the symmetric
fit. In this case the results of the average sequence length (SL)
are higher for the general fit than the symmetric fit; however,
since this data is from trimers, we expect end effects to dominate
Sequence Selection during Copolymerization
the experimental data and the theoretical predictions to be
inaccurate since they are based on long polymers.
A symmetric fit of the hexamer data in Figure 7 (panel A4)
leads to γ ) δ ) 0.535 together with R ) β ) 1 and rms )
18.94. A fully general fit yields R ) 0.898, β ) 1.102, γ )
0.439, and δ ) 0.635. This gives an average heteropolymer
addition rate of 1/2(γ + δ) ) 0.537, which is extremely close
to the symmetric fit of 0.535. However, in this fit, all parameters
differ from the symmetric fit by about 0.1, which is a relative
change of 10% for the homopolymer addition rates and 19%
for the heteropolymer addition rates. Although these changes
are quite high, the more general fit only gives a marginal
improvement in root-mean-square error, down from rms ) 18.94
to rms ) 12.39; this behavior is reflected in the χ2 statistics,
since χsym2 ) 1.398 and χgen2 ) 1.109. Such a reduction is not
so great as the reduction in the 95% confidence limits from
χsym2 ) 12.59 (6 dof’s) to χgen2 ) 9.49 (4 dof’s); the 5%
confidence intervals are 1.635 (6 dof’s) and 0.711 (4 dof’s).
Hence, overall, there is no great motivation here to relax our
assumption that the system should be symmetric in R ) β and
γ ) δ. The average sequence length is almost identical for both
fits, and the information content varies very little, again being
slightly lower for the general fit.
The terminology, as used by Hitz et al.,16 is that a zerothorder Markov process is one in which the rate of attachment of
monomers to the end of a growing polymer does not depend
on the terminating unit of the polymer. A first-order Markov
process is one in which the rate of addition depends upon the
terminating unit of the polymer, and a second-order Markov
process in one in which the rate of addition depends upon the
penultimate as well as the terminating units of the polymer. It
is not clear whether these authors allow the rates to depend, in
addition, on the monomeric unit to be added, as we do in our
model. Hitz et al.16 claim that their results show that the
copolymerization process is at least a second-order Markov
process. This argument is entirely qualitative with no supporting
calculations quoted. However, we believe our model, which
describes the copolymerization process strictly as a first-order
Markov process, provides a perfectly acceptable fit of the data;
there is no need to invoke higher-order Markov processes. This
may be due to our model allowing the growth rate of the chains
to depend upon the monomer being added as well as the
currently terminating unit of the chain.
8. Conclusions
The theoretical analysis of copolymerization presented herein
gives a more detailed description than previous analyses of
copolymerization. We have constructed a model for copolymerization that retains some important kinetic aspects of chain
sequence and composition (section 2). The mathematical model
is based on the Becker-Döring system of equations for cluster
growth. In particular, by considering the terminating unit and
overall composition of each polymer sequence, we are able to
investigate the effects that each of the four rate constants and
the two monomer concentrations have on: (i) the overall
dynamics of the total number of polymer molecules, which
grows linearly in time, (ii) the total mass of material in
polymeric form, which grows quadratically in time, and (iii)
the average polymer length, which grows linearly in time.
Three special cases are considered in more detail. For the
system in which monomers are identical and a chain’s growth
rate does not depend on its terminating unit, there is no
preference for monomer attachment built into the model and
we observe a distribution with a broad spread of compositions.
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9557
In the case where A and B monomers are more likely to attach
to polymers ending with the opposite unit, a strongly peaked
narrow distribution of polymers is produced. In the opposite
extreme, where monomers are most likely to attach to polymers
with the same terminating unit we find that homopolymers
dominate at shorter polymer lengths, but as the length of
polymer increases, so does the likelihood of a heteropolymeric
step, and so for extremely long polymers, the distribution is
dominated by heteropolymers, though the distribution is spread
extremely diffusely.
Variations in the rates of monomer attachment to polymers
terminating with each species of monomer can be cumulative
and collectively have a larger effect than any one perturbation
would suggest. The most obvious examples are the strongly
homopolymer- and strongly heteropolymer-dominated systems.
We also calculated the Shannon entropy or information
content (section 6) for each of our example scenarios. We find
that the amount of uncertainty, or information content, in the
homopolymer-dominated case varies with the relative strength
of the rate coefficients. If β > R and γ > δ, then the rate at
which B’s are added to polymer chains outstrips the A addition
rate, and the system becomes swamped with B’s. In such a
system there would be little uncertainty. At best the uncertainty
is H ) 1, which matches the uncertainty in the heteropolymerdominated system. When all additions are equally likely (the
“random” case where R ) β ) γ ) δ), the information content
is maximized at H ) 2. Thus with “random addition” of
monomers, the information storage capacity of the sequences
is maximized; this, however, corresponds to a low run length
(with a mean of 2 (eq 6.19) but a relatively large standard
deviation of σ ) x2 (eq 6.20)); hence little effect of homochirality would be observed in such systems. These results
should be compared with the run lengths of the homopolymerdominated system where the run length is large and the standard
deviation is also large and of a similar magnitude (see eq 6.25
and the following text) and the heteropolymer-dominated system
in which the average run length is very slightly above unity
and the standard deviation is very small (see subsection 6.4).
The experimental scenarios that we have interpreted using our
model show average sequence lengths in the range of 3-4,
which is greater than those for the purely random system, but
not really large enough to be classified as homopolymerdominated. Similarly, the information content of the experimentally observed polymers is, uniformly across all systems
considered, about H ) 1.9, which is very high and close the
“randomly polymerizing system” analyzed in sections 3 and 6.3
where H ) 2, the highest content of all special cases that we
have considered here.
Finally, we have been able to successfully fit our models to
the experimental results from Luisi’s group4,16 on chiral copolymerization, whether this be in a system containing chiral
monomers or achiral monomers which form a chiral structure
on polymerizing. We show that the over-representation of
homopolymers found experimentally can be explained by our
model, which relies only on first-order Markov processes.
Appendix A: Solution of the Full Temporal
Copolymerization Rate Equations
In subsection 2.1, our aim is to solve the system of rate
equations for the macroscopic quantities, namely, the total
numbers of polymers of each type (those ending in an A or a
B unit) and the total mass involved in each of polymer. The
theory below makes some use of the generating functions that
are used to solve the full microscopic description of the system
9558 J. Phys. Chem. B, Vol. 111, No. 32, 2007
Wattis and Coveney
and is detailed in Appendix B. Using eq 2.18, eq 2.15 can be
solved yielding
NA(t) )
δ[(R + γ)cA1,0 + (β + δ)cB0,1]t
+ NA(0) e
(γ + δ)
[δ(γ + δ)N(0) + γ(R + γ)cA1,0 -
-(γ+δ)t
(1 - e-(γ+δ)t)
δ(β + δ)cB0,1]
NB(t) )
(γ + δ)2
(A1)
+ NB(0) e-(γ+δ)t +
(γ + δ)
[γ(γ + δ)N(0) - γ(R + γ)cA1,0 +
(1 - e-(γ+δ)t)
(γ + δ)2
)( )
(
(R + γ)cA1,0 + RNA + δNB
0
+
γ[(R + γ)cA1,0 + (β + δ)cB0,1]t
δ(β + δ)cB0,1]
() (
F̆AA
FAA
-γ δ
+
B )
γ -δ FBA
F̆A
() (
)( )
F̆AB
FAB
-γ δ
+
B )
γ -δ FBB
F̆B
(
Φ)
F̆A ) -GA,x(0, 0, t) - GB,x(0, 0, t) ) RNA + δNB +
(R + γ)cA1,0 (A3)
F̆B ) -GA,y(0, 0, t) - GB,y(0, 0, t) ) γNA + βNB +
FB )
+
2(γ + δ)
((R + γ)cA1,0 + ΓA)t + FA(0) - [ΓA - (RNA(0) +
(1 - e-(γ+δ)t)
(A5)
(γ + δ)
(A11)
(
)
(A12)
(
)
(A13)
[(β + δ)c0,1B + γNA + βNB]
1
γ + δ δ[(β + δ)c0,1B + βNB + γNA]e(γ+δ)t
A1.1. Random System. At the start of section 3, we quoted
results describing the evolution of the mass of polymeric
material at large times. This arises by solving eqs A9 and A10
using the theory above in the special case where R, β, γ, and δ
are all equal. Setting R ) k(1 + R), β ) k(1 + β), γ ) k(1 +
γ), δ ) k(1 + δ), with X , 1 for all X, we find
1
FAA ≈ ((2 + 2R + 2δ)cA1,0 +
4
(2 + R + β - γ + 3δ)cB0,1)t2 (A14)
γ(β + δ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2
+
2(γ + δ)
((β + δ)cB0,1 + ΓB)t + FB(0) - [ΓB - (γNA(0) +
βNB(0))]
)
and the solution of eq A10 yields
we find
δNB(0))]
(
δ e-(γ+δ)t
γ -e-(γ+δ)t
[(R + γ)cA1,0 + RNA + δNB]
1
v3 A )
γ + δ γ[(R + γ)cA1,0 + RNA + δNB]e(γ+δ)t
v3 B )
δ(R + γ)[(R + γ)cA1,0 + (β + δ)cB0,1]t2
(A10)
Solving eq A9 we find
(β + δ)cB0,1 (A4)
FA )
)
(A9)
Such systems (R4 ) MR + F) are solved by finding a
fundamental matrix (Φ) that solves Φ̇ ) MΦ and then
substituting R ) Φv. We thus find v3 ) Φ-1F, which, in our
case, leads to
(A2)
These quantities enable us to solve for the masses in the
system (eqs B3 and B4). From
0
(β + δ)cB0,1 + βNB + γNA
)
(1 - e-(γ+δ)t)
(A6)
(γ + δ)
1
FBA ≈ ((2 + 2R + 2γ)cA1,0 +
4
(2 + R + β + γ + δ)cB0,1)t2 (A15)
where
ΓA )
ΓB )
δ(R + γ)(γ + δ)N(0) +
γ(R - δ)(R + γ)cA1,0 +
δ(δ - R)(β + δ)cB0,1
(γ + δ)2
γ(β + δ)(γ + δ)N(0) +
γ(γ - β)(R + γ)cA1,0 +
δ(β - γ)(β + δ)cB0,1
(γ + δ)2
1
FAB ≈ ((2 + R + β + γ + δ)cA1,0 +
4
(2 + 2β + 2δ)cB0,1)t2 (A16)
(A7)
1
FBB ≈ ((2 + R + β + 3γ - δ)cA1,0 +
4
(2 + 2β + 2γ)cB0,1)t2 (A17)
(A8)
A1. Partial Densities. Using eqs B3-B6 and B10-B14, we
can find equations governing the evolution of the densities FAA,
FBA, FAB , and FBB. These are
A1.2. Heteropolymer-Dominated system. Subsection 4.1 summarizes the temporal evolution of macroscopic quantities in the
case where the preferred mode of growth is for chains with
alternating monomer units; mathematically this corresponds to
R,β , γ,δ. At large times the partial densities grow according
to
Sequence Selection during Copolymerization
FAA ≈ FAB ≈
FBA
≈
FBB
≈
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9559
γδ2(γcA1,0 + δcB0,1)t2
which are generating functions describing the evolution of the
pure homopolymers composed of only A or only B monomers;
2(γ + δ)2
γ2δ(γcA1,0 + δcB0,1)t2
2(γ + δ)2
(A18)
A1.3. Homopolymer-Dominated System. In subsection 5.1,
the system dominated by like-like polymerization was analysed;
this corresponds to R,β . γ,δ. At large times, the solution of
eqs A9 and A10 using the method described above leads to the
results that the partial densities grow according to
1
FAA(t) ≈ FAA(0) + R(cA1,0 + NA(0))t + R2cA1,0t2 (A19)
2
FBB(t)
≈
FBB(0)
+
β(cB1,0
∑
∑ cr,sA (t) e-rx-sy
r)1 s)0
GB(x,y,t) )
∑
∑ cr,sB (t) e-rx-sy
r)0 s)1
∞
(B1)
NA ) GA(0,0,t) NB ) GB(0,0,t) N ) NA + NB
(B2)
FAA ) -
∂GA
∂GA
FA ) F ) FAA + FAB
|
|
∂x 0,0 B
∂y 0,0 A
(B3)
FBA ) -
∂GB
∂GB
|0,0 FBB ) |
F ) FAB + FBB
∂x
∂y 0,0 B
(B4)
These are all useful quantities for characterizing the behavior
of the system. For the ensuing analysis, we decompose both of
the generating functions GA and GB into three parts, one
corresponding to each of the three equations listed in eqs 2.62.11
GA(x,y,t) ) HA(x,t) + e-xDA(y,t) + CA(x,y,t)
(B5)
GB(x,y,t) ) HB(y,t) + e-yDB(x,t) + CB(x,y,t)
(B6)
Here we have defined
∞
A
cr,0
(t) e-rx
∑
r)1
∞
(B8)
∞
∞
∞
∞
A
cr,s
(t) e-rx-sy
∑
∑
r)2 s)1
B
cr,s
(t) e-rx-sy
∑
∑
r)1 s)2
(B9)
Translating the kinetic eqs 2.6-2.11 into these generating
functions we obtain
ḢA(x,t) ) cA1,0 e-x(R + γ) - (R + γ - R e-x)HA(x,t) (B10)
ḢB(y,t) ) cB0,1 e-y(β + δ) - (β + δ - β e-y)HB(y,t)
(B11)
e-x ḊA(y,t) ) δ e-x HB(y, t) - (R + γ)e-x DA(y,t)
(B12)
e-y ḊB(x,t) ) γ e-y HA(x,t) - (β + δ) e-y DB(x,t)
(B13)
∞
GA(x,y,t) )
the formulas that GA and GB satisfy depend on the conditions
imposed on the monomer concentrations. Even if GA(x,y,t) and
GB(x,y,t) cannot be explicitly found, equations for various
moments of the distribution can be constructed from them. For
example
HA(x,t) )
B
cr,1
(t) e-rx-y
∑
r)1
CB(x,y,t) )
We make use of generating functions to analyse the kinetics of the copolymerization process. From any such analysis we can find equations for the moments of the distributions.
The first stage of our analysis is to define the generating
functions
∞
e-y DB(x,t) )
CA(x,y,t) )
together with, at this order, FBA ) 0 ) FAB .
∞
A
c1,s
(t) e-x-sy
∑
s)1
describe the polymers that are almost pure, being composed of
sequences of the forms AnB or BnA; finally, generic polymers
are described by
1
+ NB(0))t + β2cB1,0t2 (A20)
2
Appendix B: General Solution of the Kinetic Eqs
2.6-2.11
∞
e-x DA(y,t) )
∞
HB(y, t) )
B
c0,s
(t) e-sy
∑
s)1
(B7)
( ) (
ĊA
R e-x - R - γ δ e-x
)
ĊB
γ e-y
β e-y - β - δ
(
)( )
CA
CB +
R e-2xDA + δ e-x-yDB
β e-2yDB + γ e-x-yDA
)
(B14)
Before analyzing the kinetics of the system in subsection B2,
including the kinetics of the quantities N, FA, and FB (subsection
2.1), we shall analyze the form of the steady-state solution to
which we expect the system to be attracted.
B1. Steady-State Solution. We solve eqs 2.6-2.11 for their
steady states by analyzing the latter equations first; eqs 2.10
and 2.11 imply
A
B
) R̂r-1cA1,0 c0,s
) β̂s-1cB0,1
cr,0
(B15)
which in turn imply
∞
Hss
A (x)
)
∑
r)1
Ass
cr,0
-rx
e
∞
Hss
B (y)
)
∑
s)1
Bss
c0,s
e
)
cA1,0 e-x
1 - R̂ e-x
-sy
)
cB0,1 e-y
1 - β̂ e-y
(B16)
where we use the superscript ss to denote steady state. Turning
B
A
now to eqs 2.8 and 2.9 we aim to solve for cr,1
and c1,s
,
obtaining
9560 J. Phys. Chem. B, Vol. 111, No. 32, 2007
(R +δ γ)β̂
γ
)(
R̂
β + δ)
A
c1,s
)
s-1 B
c0,1
B
cr,1
r-1 A
c1,0
Wattis and Coveney
(B17)
In terms of the generating functions, we have
∞
∑
s)1
Ass -x-sy
c1,s
e
) e-x Dss
A (y) )
∞
∑
r)1
Bss
cr,1
-rx-y
)e
e
-y
Dss
B (x)
)
δ e-x
Hss
A (y)
R+γ
γ e-y
β+δ
(B18)
Hss
A (x)
(B19)
We solve the rest of the system using generating functions
directly. Equation B14 implies that the steady-state solution is
given by
( )
Css
1
A
)
∆
Css
B
(
δ
e-x-y Dss
B
R+γ
γ
-2y ss
-x-2y ss
-x-y ss
e
β̂ e DB + (1 - R̂ - β̂)e
DB +
DA
β+
-2x-y ss
R̂e-2x Dss
DA +
A + (1 - R̂ - β̂)e
)
(B20)
Figure 10. Illustration of the propagation of the steady-state solution
of eq B30 into two-dimensional aggregation space (r,s). Assuming the
system is initiated with no polymers present, below the line the system
is at steady state (ψ ) 1), and above the line the concentrations are
zero (ψ ) 0).
treat the arguments of the functions ψA and ψB as continuous
variables, which allows us to approximate discrete differences
by Taylor series of the functions ψA and ψB; thus subscripts on
ψ denote derivatives. The functions ψA and ψB tend to unity in
the large-time limit, and following the results of ref 26, we
expect this to be via a travelling diffusive wave that propagates
from small (r,s) values to larger ones.
Returning now to the general system in eqs 2.6 and 2.7 and
noting that the steady state satisfies
A
A
B
(R + γ)cjr,s
) Rcjr-1,s
+ δcjr-1,s
where
∆(x,y) ) 1 - β̂ e-y - R̂ e-x + (R̂ + β̂ - 1)e-x-y
(B21)
∞
)
∆
∞
dp,q e-px-qy
∑
∑
p)0 q)0
p+q
R̂m-qβ̂m-p(1 - R̂ - β̂)p+q-mm!
m)max{p,q}
(m - p)!(m - q)!(p + q - m)!
∑
dp,q )
ψAt )
[
δ
R+γ
A
2cr,s
(B22)
ψBt )
R̂cB0,1dr-2,s-1 + (1 - R̂)(1 - β̂)×
r-2
cA1,0
B
)
cr,s
∑ R̂
r-p-2
p)0
γ
dp,s-1 +
δ
R+γ
s-2
cB0,1
∑ β̂
s-q-2
]
β̂cA1,0dr-1,s-2 + (1 - R̂)(1 - β̂)×
∑
∑
]
B2. Kinetics of the Approach to Steady State. At large
times, the system will approach its steady state via equations
that can be constructed using a continuum limit. To find this
we substitute
A
A A
B
B B
(t) ) cr,s
ψ (r,s,t) cr,s
(t) ) cr,s
ψ (r,s,t)
cr,s
+
B
δcr-1,s
A
2cr,s
ψBrr (B27)
B
2cr,s
ψBss +
A
γcr,s-1
B
2cr,s
ψAss (B28)
Assuming that, to the leading order, ψA ) ψB ) ψ, we obtain
the equations
β+δ
r-2
s-2
γ
R̂r-p-2dp,s - 2 + cB0,1
β̂s-q-2dr-1,q (B24)
cA1,0
β+δ
p)0
q)0
[
ψArr
A
B
A
γcr,s-1
βcr,s-1
γcr,s-1
A
B
B
(ψ
ψ
)
ψ
ψAs +
s
B
B
B
cr,s
cr,s
cr,s
B
βcr,s-1
dr-2,q (B23)
q)0
B
A
B
δcr-1,s
Rcr-1,s
δcr-1,s
B
A
A
(ψ
ψ
)
ψ
ψBr +
r
A
A
A
cr,s
cr,s
cr,s
A
Rcr-1,s
which implies that dp,0 ) R̂p, d0,q ) β̂q, and d0,0 ) 1. Ultimately,
we find
A
)
cr,s
(B26)
we next assume that at the leading order the kinetics can be
A
A A
B
B B
(t) ) cr,s
ψ (r,s,t) and cr,s
(t) ) cr,s
ψ (r,s,t). We
described by cr,s
then find
Expanding the inverse of this we find
1
B
B
A
(β + δ)cjr,s
) βcjr,s-1
+ γcjr,s-1
(B25)
A
B
where cr,s
and cr,s
are the steady states for concentrations of
chains ending in A and B, respectively (eqs B23 and B24). We
ψt ) -(R + γ)ψr ψt ) -(β + δ)ψs
(B29)
which together yield the solution
ψ ) Θ((R + γ)(β + δ)t - (β + δ)r - (R + γ)s)
(B30)
where Θ(‚‚‚) is the Heaviside function. So along the r-axis
(s ) 0) the front moves with a velocity (R + γ) so that its
position is given by r ≈ (R + γ)t, and along r ) 0 (the s-axis)
its position is given by s ≈ (β + δ)t, as illustrated in Figure 10.
The higher-order (second-derivative) terms, which determine
the shape of the wavefront ψ(r,s,t), are not as straightforwardly
solved for as in single-component systems,26 since at second
Sequence Selection during Copolymerization
J. Phys. Chem. B, Vol. 111, No. 32, 2007 9561
order there are differences between ψA and ψB that need to be
accounted for.
Assuming that the system starts with initial data in which
there are no polymers present, the evolution proceeds via the
motion of a diffusive wave as illustrated in Figure 10. Thus, to
A
with r,s . 1, for example, one
form a polymer of the form cr,s
has to wait a time t ) r/(R + γ) + s/(β + δ).
B3. Asymptotes for a Random System. The “random”
system corresponds to the case where we have R ) β ) γ ) δ
) 1. This implies that Det ≈ 1 - 1/2e-x - 1/2e-y, and somewhat
more simply than eq B22, we find
dp,q )
(p + q)!
p+q
2
p!q!
≈ exp
(
)
-(p - q)2
2(p + q)
(B31)
where the asymptotic approximation is valid for large p and q
and is obtained by using Stirling’s approximation for the
factorial function.1
B3.1. Temporal BehaVior of Macroscopic Quantities. Largetime asymptotics are given by NA ) NB ) t and FA ) FB )
(cA1,0 + cB0,1)t2. For the O() terms we have
1
1
NA(t) ≈ 1 + R + δ cA1,0 +
2
2
1
1
1 + β - γ + δ cB0,1 t (B32)
2
2
[(
)
(
[(
)
(
) ]
1
1
F (t) ≈ [(1 + + + )c +
2
2
(1 + 21 + 21 + )c ]t
1
1
F (t) ≈ [(1 + + + )c +
2
2
(1 + + 21 + 21 )c ]t
R
γ
δ
A
1,0
R
B
R
β
A
cr,r
(t) ≈
δ B
c H(Vt - r - r0)
γ 0,1
B
cs,s
(t) ≈
γ A
c H(Vt - s - s0)
δ 1,0
γ
β
δ
B 2
0,1
(B34)
δ
B 2
0,1
(B35)
A
1,0
β
γ
B
(t) ≈ cB0,1H(Vt - s - s1)
cs,s+1
(B36)
In the main body of the text (section 3), we simply consider
the leading order terms, that is, with R ) β ) γ ) δ ) 0.
B3.2. Large-Time Kinetics. Following subsection B2 we
A
assume that at large times the solution is given by cr,s
(t) )
A A
B
B B
cr,sψ (r,s,t) and cr,s(t) ) cr,sψ (r,s,t) with the time-independent
A
B
and cr,s
defined by eqs 3.1 and 3.2. Substituting R )
states cr,s
β ) γ ) δ into eq B30 we obtain ψ ) Θ(2Rt - r - s); thus,
at large times, the part of the system satisfying r + s < 2Rt is
in the steady state.
In this case ψA ) ψB ) ψ, so eqs B27 and B28 are identical
and equal to
ψt ) -2ψr + ψrr ) -2ψs + ψss
(B39)
This leads to a system of four simultaneous equations (for r0,
r1, s0, and s1), with the consistency condition giving
V)
γδ
γ+δ
(B40)
This agrees with the general theory outlined in subsection B2
and eq B30 in particular. The result (eq B40) can be checked
A
A
B
by calculating NA(t) ) ∑r(cr,r
+ cr+1,r
), NB(t) ) ∑s(cs,s
+
B
cs,s+1), FA(t), and FB(t) in two ways. From eqs B38-B40 we
have NA ≈ Vt(cA1,0 + δcB0,1/γ), NB ≈ Vt(cB0,1 + γcA1,0/δ), FA ≈
V2t2(cA1,0 + δcB0,1/γ)/2, and FB ≈ FA. Comparing these results for
NA, NB, FA, and FB with the eqs 4.1 and 4.2, we find perfect
agreement.
B5. Asymptotes for a Homopolymer-Dominated System.
Here both R̂ and β̂ are 1 + O() with , 1. To the leading
order, the form of dp,q is given by Det ≈ (1 - e-x)(1 - e-y),
which implies dp,q ) 1.
Including higher-order corrections we find
(
)
β e-y
R e-x
1
1
1
)
1
+
Det 1 - e-x 1 - e-y
1 - e-y 1 - e-x
The average polymer length, L ) F/N, is thus given by
1
L ) t + (R + β + γ + δ)t
4
(B38)
A
(t) ≈ cA1,0H(Vt - r - r1)
cr+1,r
) ]
1
1
NB(t) ≈ 1 + R + γ - δ cA1,0 +
2
2
1
1
1 + β + γ cB0,1 t (B33)
2
2
A
B4. Asymptotes for a Heteropolymer-Dominated System.
In this case monomers are most likely to bind to chains that
terminate with the different species. Mathematically this corresponds to the rates satisfying γ,δ , R,β so that R̂,β̂ , 1.
The determinant and inverse (eq B22), which are vital to finding
the steady-state solution, simplify considerably. We then have
Det ≈ 1 - e-x-y to the leading order, and so dp,q ) δp,q where
this δp,q is the Kronecker delta; this simplification leads to the
steady-state concentrations (eq 4.4).
Since there are only a few O(1) concentrations, we seek a
large-time solution of the form
(B37)
which gives ψ ) Θ(2t - r - s). Thus the waiting time from
A,B
the trivial initial data for the polymer cr,s
to reach steady state
is 1/2(r + s).
(B41)
where β ) 1 - β̂ and R ) 1 - R̂, which implies dp,q ) 1 +
pR + qβ.
B5.1. Large-Time Kinetics. In this case, we can compare the
results with known results,26 since to the leading order the
problem decouples into two single-component equations. The
single-component models predict a front propagation speed that
depends on the product of the constant monomer concentration
and the aggregation rate (in the absence of fragmentation). Thus
A
to progress at a
in the present case we expect the front in cr,0
B
rate equal to R and that in c0,s to progress at rate β.
In the case γ,δ , R,β, eq B30 simplifies to ψ ) H(Rβt βr - Rs), so along the r-axis (s ) 0) the wave front moves
according to r ≈ Rt as t f ∞, and along the s-axis (r ) 0), s
≈ βt. This agrees with our intuition that the system decouples
into two single-component systems for r,s ) O(1).
From our previous analysis of single-component systems we
expect the large-time asymptotic solution for pure homopolymers to be of the form
9562 J. Phys. Chem. B, Vol. 111, No. 32, 2007
( )
( )
Wattis and Coveney
r - Rt
1
cr,0 ) cA1,0 erfc
2
x2Rt
1
s - βt
c0,s ) cB0,1 erfc
2
x2βt
(B42)
Note that these solutions agree with the results of the earlier
section, since after time t they predict values of NA, NB, FA,
and FB that match the results quoted in eqs 5.1 and 5.2. In the
limit R,β . γ,δ in eqs B27 and B28, we have
1
ψAt ) -RψAr + RψArr
2
1
ψBt ) -βψBs + βψBss
2
(B43)
for s ) 0 and r ) 0 respectively, confirming the results of eq
A
B
B42. However, in r,s g 1, we have cr,s
) δcB0,1/(R + γ) and cr,s
A
) γc0,1/(β + δ).
For larger r,s values with r,s g 1 we have the solution
A
cr,s
δcB0,1
Θ(Rβt - βr - Rs)
≈
R+γ
B
≈
cr,s
δcA1,0
Θ(Rβt - βr - Rs)
β+δ
(B44)
Acknowledgment. We are grateful to Pier Luisi Luigi for
originally motivating us to address this problem and for
interesting discussions over many years. We also acknowledge
Grant No. GR/R94916/01 from the Engineering and Physical
Sciences Research Council (EPSRC) of the United Kingdom.
P.V.C. is grateful to the EPSRC for support under EPSRC
Reality Grid Grant No. GR/R67699.
References and Notes
(1) Handbook of Mathematical Functions; Abramowitz, M., Stegun,
I. A., Eds., Dover: New York, 1972.
(2) Allcock, H. R.; Lampe, F. W. Contemporary Polymer Chemistry,
2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1990.
(3) Becker, R.; Döring, W. Kinetische behandlung der keimbildung in
übersättigten dämpfern. Ann. Phys. 1935, 24, 719-752.
(4) Blocher, M.; Hitz, T.; Luisi, P. L. Stereoselectivity in the oligomerisation of racemic tryptophan N-carboxyanhydride (NCA-Trp) as
determined by isotope labeling and mass spectrometry. HelV. Chim. Acta
2001, 84, 842-848.
(5) Bolton, C. D.; Wattis, J. A. D. Model of size-templating in vesicle
formation I: A microscopic model and analysis. J. Phys. Chem. B 2003,
107, 7126-7134.
(6) Brilliantov, N. V.; Krapivsky, P. L. Non-scaling and source-induced
scaling behaviour in aggregation models of movable clusters. J. Phys. A:
Math. Gen. 1991, 24, 4787-4803.
(7) Campbell, D.; Pethrick, R. A.; White, J. R. Polymer Characterization, 2nd ed.; Stanley Thornes: Cheltenham, U. K., 2000.
(8) Carraher, C. E., Jr. Introduction to Polymer Chemistry; Taylor &
Francis: Boca Raton, FL, 2007.
(9) Carr, J.; Dunwell, R. M. Kinetics of cell surface capping. Appl.
Math. Lett. 1999, 12, 45-49.
(10) Coveney, P. V.; Wattis, J. A. D. A Becker-Döring model of selfreproducing vesicles. J. Chem Soc., Faraday Trans. 1998, 102, 233-246.
(11) Coveney, P. V.; Wattis, J. A. D. Coarse-graining and renormalisation group methods for the elucidation of the kinetics of complex
nucleation and growth processes. Mol. Phys. 2006, 104, 177-185.
(12) Denbigh, K. G.; Denbigh, J. S. Entropy in Relation to Incomplete
Knowledge; Cambridge University Press: Cambridge, U. K., 1985.
(13) Elias, H.-G. Macromolecules; Wiley-VCH: Weinheim, Germany,
2005.
(14) Flory, P. J. Principles of Polymer Chemistry; Cornell University
Press, Ithaca, NY, 1953.
(15) Garey, M. R.; Johnson, D. S. Computers and Intractability: A Guide
to the Theory of NP-Completeness; W.H. Freeman: San Francisco, 1979.
(16) Hitz, T.; Blocher, M.; Walde, P.; Luisi, P. L. Stereoselectivity
aspects in the condensation of racemic NCA-amino acids in the presence
and absence of liposomes. Macromolecules 2001, 34, 2443-2449.
(17) King, J. R.; Wattis, J. A. D. Asymptotic solution of the BeckerDöring equations with size-dependent rate coefficients. J. Phys. A: Math.
Gen. 2002, 35, 1357-1380.
(18) Reza, F. M. An Introduction to Information Theory; Dover: New
York, 1994.
(19) Sandars, P. G. H. A toy model for the generation of homochirality
during polymerisation. Origins Life EVol. Biosphere 2003, 33, 575-587.
(20) Wattis, J. A. D. An introduction to the fundamental mathematical
models of coagulation-fragmentation processes. Physica D 2006, 222,
1-20.
(21) Wattis, J. A. D. A Becker-Döring model of competitive nucleation.
J. Phys. A: Math. Gen. 1999, 32, 8755-8784.
(22) Wattis, J. A. D. Exact solutions for multi-component coagulation:
cluster-growth with evolving size and shape profiles. J. Phys. A: Math.
Gen. 2006, 39, 7283-7298.
(23) Wattis, J. A. D.; Coveney, P. V. The origin of the RNA world: A
kinetic model. J. Phys. Chem. B 1999, 103, 4231-4250.
(24) Wattis, J. A. D.; Coveney, P. V. Analysis of symmetry-breaking
in chiral polymerisation kinetics. Origins Life EVol. Biosphere 2005, 35,
243-273.
(25) Wattis, J. A. D.; Coveney, P. V. Chiral polymerisation and the RNA
world. Int. J. Astrobiol. 2005, 4, 63-73.
(26) Wattis, J. A. D.; King, J. R. Asymptotic solution of the BeckerDöring equations. J. Phys. A: Math. Gen. 1998, 31, 7169-7189.
(27) Wittcoff, H. A.; Reuben, B. G.; Plotkin, J. S. Industrial Organic
Chemicals; Wiley: New York, 2004.
(28) Wu, D. T. General approach to barrier crossing in multi-component
nucleation. J. Chem. Phys. 1993, 99, 1990-2000.