J. Mol. Biol. (2007) 366, 1475–1496
doi:10.1016/j.jmb.2006.12.001
Calculation of pKas in RNA: On the Structural Origins
and Functional Roles of Protonated Nucleotides
Christopher L. Tang 1 , Emil Alexov 1 , Anna Marie Pyle 2
and Barry Honig 1 ⁎
1
Howard Hughes Medical
Institute, Center for
Computational Biology and
Bioinformatics, Department of
Biochemistry and Molecular
Biophysics, Columbia
University, 1130 St. Nicholas
Avenue, Room 815, New York,
NY 10032, USA
2
Department of Molecular
Biophysics and Biochemistry,
Howard Hughes Medical
Institute, Yale University,
New Haven, CT 06520, USA
pKa calculations based on the Poisson–Boltzmann equation have been
widely used to study proteins and, more recently, DNA. However, much
less attention has been paid to the calculation of pKa shifts in RNA. There
is accumulating evidence that protonated nucleotides can stabilize RNA
structure and participate in enzyme catalysis within ribozymes. Here, we
calculate the pKa shifts of nucleotides in RNA structures using numerical
solutions to the Poisson–Boltzmann equation. We find that significant
shifts are predicted for several nucleotides in two catalytic RNAs, the
hairpin ribozyme and the hepatitis delta virus ribozyme, and that the
shifts are likely to be related to their functions. We explore how different
structural environments shift the pKas of nucleotides from their solution
values. RNA structures appear to use two basic strategies to shift pKas:
(a) the formation of compact structural motifs with structurally conserved electrostatic interactions; and (b) the arrangement of the
phosphodiester backbone to focus negative electrostatic potential in
specific regions.
© 2006 Published by Elsevier Ltd.
*Corresponding author
Keywords: ribozyme; pseudoknot; pKa calculation; Poisson–Boltzmann
equation; RNA structure
Introduction
There is increasing evidence that ionized nucleotides play important roles in RNA structure and
function. Adenosine and cytidine can protonate on
their N1 and N3 atoms, respectively, but both are
poor bases and have pKas in solution that render
them neutral at pH 7 (their pKas in solution are 3.8
for adenosine and 4.3 for cytidine; Figure 1).1,2
Nevertheless, there have been numerous examples
where, based on the examination of crystal and
Present address: E. Alexov, Department of Physics and
Astronomy, Clemson University, Clemson, SC 29634,
USA.
Abbreviations used: BPH, branch point helix; LDZ,
lead-dependent ribozyme; BWYV, beet western yellows
virus; PEMV, pea enation mosaic virus; HDVR, hepatitis
delta virus ribozyme; LPB/NLPB, linear/non-linear
Poisson–Boltzmann equation; ESP, electrostatic potential;
MCCE, multi-conformation continuum electrostatics; MC,
Monte Carlo; RMSD, root-mean-square deviation.
solution structures,3–11 protonated nucleotides appear to be present in RNA, suggesting that their
pKas have been shifted upwards from their solution
values. Nucleotides with elevated pKas have been
suggested to play a direct role in ribozyme
catalysis.12,13 For example, many lines of biochemical and structural evidence suggest that the
hepatitis delta virus ribozyme (HDVR) and the
hairpin ribozyme, in particular, utilize protonated
nucleotides or nucleotides with elevated pKas to
achieve optimal activity.14–24 Protonated nucleotides
have been implicated in a wide variety of structures,
ranging from frameshifting pseudoknots to the
ribosome itself.25–27 Therefore, a central question is
whether nucleotides with shifted pKas play as
significant a role in RNA structure and function as
they often do in proteins, where pKa-shifted residues
affect protein stability,28 control conformational
changes,29 modulate binding to substrates,30 and
participate in catalytic mechanisms.31 Indeed, the
availability of protonated nucleotides would add to
the diversity of chemical groups that could be used
for function in RNA.32,33
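As a rough illustration of why adenosine and cytidine are neutral at pH 7 in solution, the Henderson–Hasselbalch relation gives the fraction of a base that is protonated at a given pH. The following is a minimal sketch and not part of the original calculations; the function name `protonated_fraction` is hypothetical, and the pKa values are the solution values quoted above.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable base in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Solution pKa values quoted in the text.
SOLUTION_PKA = {"adenosine (N1)": 3.8, "cytidine (N3)": 4.3}

if __name__ == "__main__":
    for name, pka in SOLUTION_PKA.items():
        print(f"{name}: {protonated_fraction(pka, 7.0):.3%} protonated at pH 7")
```

Under these assumptions, well under 1% of either base is protonated at pH 7, whereas a nucleotide whose pKa has been shifted to 7 would be half protonated.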
Here, we seek to understand the structural
determinants of pKa shifts, relative to solution
values, of nucleotides in RNA using computational
Figure 1. Adenosine and cytidine in their unprotonated (A, C) and protonated (A+, C+) states. Their solution
pKas are shown in parentheses.
methods. Our work builds on the extensive literature
that exists for calculating pKas in proteins,34–42 and,
more recently, in DNA.43 Most of these methods rely
on numerical solutions to the Poisson–Boltzmann
(PB) equation to obtain electrostatic contributions to
the pKa shift. The linear PB equation (LPB) has been
used in most applications in proteins but, given the
high charge density on RNA molecules, the
nonlinear PB equation (NLPB) is more appropriate.
The NLPB has been applied extensively to highly
charged systems such as acidic membranes and
nucleic acids. Despite the large charge densities of
highly charged molecules and the high mobile ion
densities that accumulate in their vicinity, the
predictions of the NLPB have been in remarkable
agreement with experiment in many cases. Examples include the salt-dependence of binding of
proteins and ligands to DNA,44,45 the salt and
membrane charge-dependence of the binding of
proteins and peptides to membrane surfaces,46
electrostatic potentials around DNA as measured
by EPR experiments,47 and the absolute magnitude
and salt-dependence of the pKa shift experienced
by a ligand that intercalates into DNA.48 The NLPB
has also successfully explained the binding isotherms of mixed ion species binding to DNA and
RNA,49 the stoichiometry and free energy of
magnesium binding to DNA and RNA,50,51 and
the magnesium-dependence of RNA folding.52,53
In many of these cases the salt concentration
approaches 1 M, a region where the approximations
in traditional PB methods such as Debye–Hückel
theory are believed not to be valid. However, even
the linearization condition inherent in the LPB
(eϕ/kT << 1, where ϕ is the electrostatic potential)
actually improves at high salt, since the potentials
induced by a macromolecule become weaker as the
concentration of salt increases. As discussed by
Sharp & Honig,54 a more serious problem with
Debye–Hückel theory is that it chooses one mobile
ion as a fixed charge and all other mobile ions
become part of the ion atmosphere. This introduces
an artificial asymmetry into the problem that does
not exist when the macromolecule is assumed to be
fixed and the surrounding salt is treated as mobile.
Indeed, one would not normally think of a DNA
molecule as part of the ion atmosphere of a mobile
sodium ion. The lack of symmetry in macromolecular and colloidal solutions makes it possible to
define the electrostatic free energy uniquely within
the context of the NLPB, and it is the availability of
this formalism that enabled many of the applications
that followed.54
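The linearization condition eϕ/kT << 1 can be made concrete with a short numerical check. In the standard textbook form for a symmetric 1:1 electrolyte (an assumption here, since the equations are not written out above), the mobile-ion term of the NLPB is proportional to sinh(u) with u = eϕ/kT, and the LPB replaces it by u. The sketch below, with the hypothetical helper `linearization_error`, reports how much of the ion term is lost on linearization.

```python
import math


def linearization_error(u: float) -> float:
    """Relative error of replacing sinh(u) by u, where u = e*phi/kT (u > 0)."""
    return (math.sinh(u) - u) / math.sinh(u)


if __name__ == "__main__":
    for u in (0.1, 0.5, 1.0, 2.0, 4.0):
        err = linearization_error(u)
        print(f"e*phi/kT = {u:3.1f}: linearized ion term is low by {err:.1%}")
```

The error is negligible for reduced potentials well below 1 and grows rapidly above it, which is why weaker potentials at high salt make the linear approximation less damaging.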
All PB methods ignore ion–ion correlation and
effects of ion size. However, these work in opposite
directions in the sense that ion–ion correlation
effects increase the local ion concentration, while
ion size effects tend to reduce the local ion
concentration by ensuring that two ions do not
approach each other too closely. This cancellation
may account, in part, for the fact that the NLPB
provides such an accurate description of the
dependence of electrostatic free energies on salt
concentration, as summarized in the previous
paragraph. Indeed, the NLPB underestimates ion
distributions around cylinders as obtained from
Monte Carlo simulations by only 10–15%, and the
effect on electrostatic free energies appears to be in
this range or smaller.55–58 This is the reason that the
NLPB has been applied so effectively to macromolecular systems, despite its well known approximations. Simply stated, the consequences of these
approximations, especially the effect on electrostatic
free energies involving monovalent ions, do not
appear to be severe. Their effect on solutions
containing divalent ions is almost certainly more
serious, although there have been few experimental
tests that have made it possible to determine the
magnitude of the problem.
Based on the past successes of applying the
NLPB to charged macromolecules, it seems reasonable to apply it to the calculation of pKas in RNA.
However, pKa calculations on systems with many
titratable groups require the use of methods that
take multiple ionization equilibria into account. As
will be discussed below, when a significant number
of nucleotides are involved, the existence of a large
number of ionization states can introduce computational complexity that, due to the lack of
additivity among individual terms, essentially
precludes the full use of the NLPB in pKa
calculations. In order to deal with this problem,
we introduce a method that uses solutions to the
LPB for which additivity holds, but then adds a
correction term that accounts approximately for
missing nonlinear contributions. The approximation is related to one used previously in the
treatment of the titration behavior of polylysine.59
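The growth in the number of ionization states, and the way additive pairwise terms keep each state cheap to evaluate, can be sketched with a small exact enumeration. The example below is a toy model under stated assumptions and is not the MCCE implementation: each site has a hypothetical intrinsic pKa (`pk_int`), protonated sites interact through a hypothetical pairwise energy matrix `w` in units of kT, and all 2^N microstates are summed explicitly. For large N this sum is what a Monte Carlo procedure would sample instead.

```python
import itertools
import math

LN10 = math.log(10.0)


def mean_protonation(pk_int, w, ph):
    """Boltzmann-averaged protonation of each site over all 2**N microstates.

    pk_int: intrinsic pKa of each site (hypothetical inputs)
    w:      symmetric matrix of pairwise energies (kT) between protonated sites
    ph:     solution pH
    """
    n = len(pk_int)
    partition = 0.0
    occupancy = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        # Additive energy in kT: one-body protonation terms plus pairwise couplings.
        energy = sum(x * LN10 * (ph - pk) for x, pk in zip(state, pk_int))
        energy += sum(w[i][j] for i in range(n) for j in range(i + 1, n)
                      if state[i] and state[j])
        weight = math.exp(-energy)
        partition += weight
        for i, x in enumerate(state):
            occupancy[i] += x * weight
    return [o / partition for o in occupancy]


def apparent_pka(pk_int, w, site, lo=-5.0, hi=20.0):
    """pH at which `site` is half protonated, by bisection.

    Assumes the titration curve of that site crosses 0.5 only once.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mean_protonation(pk_int, w, mid)[site] > 0.5:
            lo = mid  # still mostly protonated, so the midpoint lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `apparent_pka([4.3, 4.3], [[0.0, 2.3], [2.3, 0.0]], 0)` describes two identical sites that repel each other when both are protonated; the repulsion lowers the pH at which each is half protonated below the intrinsic value of 4.3.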
Our calculations are based on a Monte Carlo
treatment of multiple ionization states. Specifically,
we use a modified version of the MCCE method that
has been shown to be effective both for pKa
calculations in proteins60–62 and for the placement
of hydrogen in crystal structures.63 We extend the
MCCE method so that it is applicable for pKa
calculations in RNA. To this end, we report a new set
of atomic parameters for calculating electrostatic
potentials in RNA molecules containing protonated
nucleotides. Our approach is validated by testing its
ability to reproduce quantitatively pKas taken from
the literature. We address the role of shifted pKas in
RNA through an analysis of the branch-point
helix,64 the lead-dependent ribozyme,65 pseudoknots from the beet western yellows virus66 and
the pea enation mosaic plant virus,67 HDVR,68,69
and the hairpin ribozyme.70,71 For cases where
experimental data are available, the calculated pKa
shifts are in quite good agreement with experimental results. However, based on experience with
proteins, it is unlikely that all of the calculated
pKas will be in quantitative agreement with experimental values, primarily because conformational
changes that accompany changes in ionization state
are not taken into account. The consequences of
assuming a rigid RNA structure will be discussed
further below but the expectation is that calculated
shifts will be too large. However, the key result of
our analysis is not the magnitude of pKa shifts but
the identification of nucleotides that undergo
significant shifts and the determination of the
structural factors that lead to these shifts.
Our analysis reveals that nucleotides with
elevated pKas are often located at positions in
the structure where they contribute to hydrogen
bonds in their protonated states, and in regions of
the RNA that have been characterized to be
catalytically or functionally important. We find
also that several distinct features of RNA are
important for the occurrence of pKa shifts to
higher values. These include the abundance of
negatively charged phosphate groups near
pKa-shifted nucleotides and conserved interactions
with polar groups from adjacent nucleotides. In
addition, as is the case for proteins, the removal of
nucleotide groups from the solvent generally
favors pKa shifts of bases to lower values. A
comparison between C+GCA motifs in divergent
structures gives us a novel view of how these
motifs may be stabilized. Our analysis provides a
detailed picture of how structure influences pKa
shifts in RNA molecules.
Results
Calculation of nucleotide pKas in RNA
structures
Assessment of the accuracy of calculated pKas
by comparison with spectroscopically determined
values
In order to validate our approach, we compared
calculated pKas to those determined by NMR
spectroscopy for two RNAs. The first of these, the
branch-point helix (BPH), has a 21 nucleotide stem–loop
structure containing an internal asymmetric
loop (PDB ID 17ra).64 In the structure, consecutive
adenosine residues in the asymmetric loop, A6 and
A7, stack within the helix opposite a single uridine,
U16. The measured pKa of A7 is shifted to 6.1, while
the other adenosine residues in the structure have
pKa ≤ 5.5 (Table 1). Calculations were carried out
using ionic strengths that mimic experimental
conditions (e.g. 10 mM monovalent salt for
BPH).64 As can be seen from Table 1, there is a
striking agreement between the measured and
calculated values. The two nucleotides with the
highest and second highest measured pKa shifts (A7
and A13) were identified in the correct order and
were calculated to have pKas within 0.7 pKa unit of
experiment. In addition, the calculated pKas of
nucleotides involved in Watson–Crick base-pairs are
depressed from their solution values, as normally
would be expected.
Table 1. Comparison of calculated and spectroscopically
determined pKas
Nucleotide   Secondary    Spectroscopically   Calculated pKa
             structure    determined pKa      using NL correction
Branch-point helix (BPH) in 10 mM NaCl 64
C3           wc           -                   2.5 ± 0.7
A6           -            <5.0                2.5 ± 0.9
A7           A+U(a)       6.1                 6.8 ± 0.8
A10          -            <5.0                1.7 ± 0.6
A13          -            5.5                 5.3 ± 0.4
C14          wc           -                   3.5 ± 1.0
C15          wc           -                   1.4 ± 0.8
A17          wc           <5.0                2.7 ± 1.3
C20          wc           -                   1.7 ± 0.8
C21          wc           -                   2.1 ± 0.5
Lead-dependent ribozyme (LDZ) in 100 mM NaCl 65
C2           wc           -                   2.1 ± 1.5
A4           wc           ≤3.1                <3.0
C5           wc           -                   3.0 ± 2.0
C6           A+C(b)       -                   2.8 ± 2.4
A8           -            4.3 ± 0.3           4.9 ± 0.8
C10          wc           -                   1.4 ± 1.5
C11          wc           -                   3.7 ± 1.5
A12          wc           ≤3.1                <3.0
C14          wc           -                   4.6 ± 1.0
A16          -            3.8 ± 0.4           3.4 ± 1.1
A17          -            3.8 ± 0.4           2.4 ± 1.3
A18          -            3.5 ± 0.6           3.6 ± 0.9
A25          A+C          6.5 ± 0.1           7.3 ± 1.8
C28          wc           -                   3.1 ± 0.7
C30          wc           -                   5.0 ± 2.0
All pKas were calculated from the LPB using the nonlinear
correction as described in the text. The mean ± standard deviation of the calculated pKa values is given for the 12 low-energy
NMR structures for BPH (PDB ID 17ra) and the 25 low-energy
NMR structures for LDZ (PDB ID 1ldz). Secondary structure
interactions are annotated with one of the following types: wc,
Watson–Crick; A+U, protonated AU; or A+C, protonated AC.
Experimentally measured pKas, where available, are listed in
the third column.
(a) In an A+U pair, A:N1+ forms a hydrogen bond with U:O2 as
the acceptor.
(b) In an A+C pair, A:N1+ forms a hydrogen bond with C:O2 as
the acceptor.
Table 2. Salt-dependence of the pKa of A25 in lead-dependent ribozyme
[NaCl]   Spectroscopically   Calculated pKa         Calculated pKa
(mM)     determined pKa      using NL correction    using LPB alone
100      6.5 ± 0.1           7.3 ± 1.8              7.9 ± 1.8
500      5.9 ± 0.1           6.6 ± 1.8              6.8 ± 1.8
The pKa of A25 was calculated using 25 low-energy NMR
structures from PDB ID 1ldz under different salt conditions and
compared to experiment.73 Calculations were performed using
the nonlinear (NL) correction term and using the LPB alone.
The second structure, lead-dependent ribozyme
(LDZ), is a 30 nucleotide stem–loop that also
contains an internal asymmetric loop (PDB IDs
1ldz and 2ldz).72 The asymmetric loop contains a
protonated A+C pair, A25-C6, in which A25 displays a measured pKa of 6.5 ± 0.1 (Table 1). This loop
contains a non-canonical AG pair flanked by two
extrahelical guanosine nucleotides; these and all
other nucleotides are measured to have more typical
pKas of less than 4.3 (Table 1).65,73 We calculated
pKas for each nucleotide and averaged them across
the published set of 25 NMR conformers. The
nucleotides with the highest (A25) and second
highest (A8) measured pKas were identified in the
correct order and the calculated values were
accurate to within 0.8 pKa unit.
Our ability to account for salt effects was tested
by comparing the calculated pKa of A25 to
experimental measurements under two salt conditions. The experimentally determined pKa of A25
shifts from 6.5 to 5.9 upon changing the concentration of monovalent ion from 100 mM to 500 mM
(Table 2).65,73 The calculated shift, from 7.3 to 6.6, is
in excellent agreement with experiment, as are the
absolute values, which are within 0.8 pKa unit of
the experimental measurement. As can be seen in
Table 2, agreement with experiment is reduced
slightly if the MCCE procedure is used in
conjunction with the LPB. In all other cases we
have examined, the pKas reported by the LPB
method are one to two units larger than those
obtained from the NLPB, probably because the
effect of the ion atmosphere in screening interactions with phosphate groups is underestimated by
the LPB. Since even the NLPB calculations tend to
overestimate pKa shifts, the use of the LPB reduces
overall agreement between the calculated results
and experiment.
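The comparison drawn from Table 2 can be reproduced with simple arithmetic. The sketch below uses only the mean values listed in Table 2 and ignores the reported standard deviations; the helper names `salt_shift` and `max_abs_error` are hypothetical.

```python
# Mean pKa of A25 in the lead-dependent ribozyme at 100 mM and 500 mM NaCl (Table 2).
TABLE_2 = {
    "experiment": {100: 6.5, 500: 5.9},
    "NL correction": {100: 7.3, 500: 6.6},
    "LPB alone": {100: 7.9, 500: 6.8},
}


def salt_shift(pkas: dict) -> float:
    """Change in pKa on going from 100 mM to 500 mM NaCl."""
    return pkas[500] - pkas[100]


def max_abs_error(method: str) -> float:
    """Largest absolute deviation from experiment over the two salt conditions."""
    ref = TABLE_2["experiment"]
    return max(abs(TABLE_2[method][conc] - ref[conc]) for conc in ref)


if __name__ == "__main__":
    for name, pkas in TABLE_2.items():
        print(f"{name}: shift = {salt_shift(pkas):+.1f} pKa units")
    for name in ("NL correction", "LPB alone"):
        print(f"{name}: max |error| = {max_abs_error(name):.1f} pKa units")
```

With these numbers the measured shift is -0.6 pKa unit, the NL-corrected shift is -0.7 and the LPB-only shift is -1.1, and the largest deviation from experiment is 0.8 with the NL correction compared with 1.4 for the LPB alone.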
Identification of pKa-shifted nucleotides in
pseudoknots
The pKa calculations were carried out on pseudoknot structures from the beet western yellows
virus, BWYV-ψ, and the pea enation mosaic virus,
PEMV-ψ. BWYV-ψ and PEMV-ψ share a common
secondary structure topology composed of two
stems (S1 and S2) and two loops (L1 and L2), where
L1 interacts with the major groove of S2, and L2
interacts with the minor groove of S1. In both
pseudoknots, tertiary contacts between the L1 and
S2 form a C+GCA structural motif containing a
protonated cytidine (Figure 2). As can be seen in
the Figure, C+GCA appears to be a recurring
structural motif that has also been observed in
HDVR.67 The unfolding of both pseudoknots has
been shown to be highly pH-dependent, which has
been attributed to the cytidine in the C+GCA motif
on the basis of the proposed hydrogen bond
between the protonated cytidine N3 nitrogen
atom and the guanosine O6 oxygen atom in the
structure.74,75
The protonated cytidine in the C+GCA motif is
identified in the calculations as having the most
elevated pKa in both structures (Table 3). As shown
in Table 3, the pKa is calculated to be 13.7 for C8 in
BWYV-ψ and 10.6 for C10 in PEMV-ψ. The pH-dependence of unfolding in BWYV-ψ and PEMV-ψ
has been measured to exhibit apparent pKas of
6.8–7.3 and 7.1, respectively.74,75 However, the
apparent pKas obtained from folding/unfolding
transitions do not correspond directly to the pKas
of individual groups. In the simplest case, where
only a single group controls titration behavior, the
apparent pKa corresponds to the midpoint between
the pKa of the titratable group in the two states
(folded and unfolded).40,76 If only a single group
determines the shape of the titration curves for
Figure 2. HDVR, BWYV-ψ and PEMV-ψ share a common C+GCA structural motif. Dotted lines indicate the hydrogen
bond network. A hydrogen bond between protonated N3 atom of cytidine and the O6 atom of guanosine is indicated by
the red arrow (forming a C+G [rh] interaction, also discussed in the legend to Table 3).
Table 3. Comparison of calculated and apparent pKas of
unfolding
Nucleotide   Secondary      Apparent pKa    Calculated pKa using
             structure      of unfolding    NL correction
BWYV-ψ in 100 mM NaCl, 10 mM MgCl2 74,75
C3           wc             -               <3.0
C5           wc             -               <3.0
C8           C+G[rh](a)     6.8–7.3         13.7 ± 0.1
A9           -              -               2.6 ± 0.1
C10          wc             -               <3.0
C11          wc             -               <3.0
C14          wc             -               <3.0
C15          wc             -               <3.0
C17          wc             -               <3.0
A20          O2′            -               7.3 ± 0.1
A21          -              -               4.6 ± 0.6
C22          -              -               4.5 ± 0.1
A23          O2′            -               n.r.
A24          AG             -               <3.0
A25          O2′            -               6.1 ± 0.2
C26          wc             -               <3.0
PEMV-ψ in 100 mM NaCl 74,75
C5           wc             -               <3.0
C6           wc             -               <3.0
C10          C+G[rh]        7.1             10.6 ± 1.1
A12          -              -               4.4 ± 2.0
C13          wc             -               3.5 ± 3.4
C15          wc             -               5.7 ± 2.7
C16          wc             -               <3.0
A19          AU[s](b), wc   -               7.4 ± 0.7
A21          -              -               3.3 ± 1.4
A22          -              -               3.8 ± 1.8
A23          -              -               7.8 ± 2.3
C24          -              -               4.9 ± 2.5
A25          -              -               4.6 ± 1.9
A26          -              -               <3.0
A27          O2′            -               2.1 ± 1.5
C30          wc             -               <3.0
A31          -              -               4.2 ± 1.0
pKas were calculated for each adenosine and cytidine nucleotide
in BWYV-ψ and PEMV-ψ. The mean ± standard deviation was
calculated for a set of four BWYV-ψ crystal structures (PDB ID
437d and 1l2x) and 15 low-energy NMR structures of PEMV-ψ
(PDB ID 1kpy). The apparent pKas of unfolding values (column 3)
are taken from the literature.74,75 Secondary structure interactions
are annotated with types defined in Table 1 or from the following:
O2′, hydrogen-bonded to a 2′ hydroxyl group; C+G[rh], protonated cytidine interaction with guanosine along its Hoogsteen
edge; AG, AG mispair; or AU[s], sheared AU. For BWYV-ψ, the
calculated pKa of A23 is marked n.r. (not reported) because there
is a large discrepancy between the calculated values in the two
crystal structures. See the text for details.
(a) See Figure 2 for examples of C+G[rh] pairs.
(b) In a sheared AU pair, the U appears shifted into the minor
groove and U:O4 is within hydrogen bonding distance of A:N1,
where, if a hydrogen bond is formed, the latter should be
protonated.
BWYV-ψ and PEMV-ψ, then assuming a pKa of 4.3
for cytidine in the unfolded state would predict a
pKa of ∼10 for the cytidine in the folded state, which
is in good agreement with the calculated value for
PEMV-ψ. However, when multiple titration sites
influence the folding reaction, the titration behavior
becomes more complex and the experimental data
become more difficult to interpret. Moreover, if
residual secondary structure is present in the
unfolded state, then assuming a reference pKa of
4.3 would not be correct.
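The single-group estimate used above follows directly from the midpoint relation, apparent pKa = (folded pKa + unfolded pKa)/2. The following minimal sketch makes the arithmetic explicit; the function name `folded_state_pka` is hypothetical, and the default unfolded-state value of 4.3 is the solution pKa of cytidine assumed in the text.

```python
def folded_state_pka(apparent_pka: float, unfolded_pka: float = 4.3) -> float:
    """Folded-state pKa implied by apparent = (folded + unfolded) / 2.

    Valid only when a single titrating group controls the transition.
    """
    return 2.0 * apparent_pka - unfolded_pka


if __name__ == "__main__":
    # Apparent pKas of unfolding quoted in the text.
    for label, apparent in (("PEMV", 7.1), ("BWYV, low", 6.8), ("BWYV, high", 7.3)):
        print(f"{label}: folded-state pKa ~ {folded_state_pka(apparent):.1f}")
```

For the apparent pKa of 7.1 this gives about 9.9, and the 6.8–7.3 range gives about 9.3–10.3, consistent with the value of ∼10 quoted above. The estimate changes if the unfolded-state reference differs from 4.3, as the text cautions.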
Moody et al. have provided a cogent discussion of
thermodynamic linkage relationships involved in
pH-dependent RNA folding and have discussed
conditions where large unfolded state pKas might be
expected.76 Their analysis highlights the difficulties
of assigning experimental pKas to cytidine in the
pseudoknots considered here. We can say with some
certainty that the relevant values are greater than the
apparent pKas so that they are likely to be above 7.5,
and perhaps significantly higher. Thus, the calculations are successful in identifying cytidine nucleotides that have undergone significant pKa shifts,
although we are unable to determine at this point
whether the actual values are calculated accurately.
On the other hand, the highest pKa we are aware of
that has been measured experimentally for cytidine
in a nucleic acid structure5 is 9.5. As such, a
calculated pKa such as 13.7 for C8 in BWYV-ψ is
unprecedented, and thus is almost certainly too
high.
Indeed, there is reason to believe that some pKa
values have been overestimated, since we have
treated the RNA structure as rigid; that is, we have
not allowed the RNA to undergo conformational
relaxation in response to a change in protonation.
Since the crystal and NMR structures studied here
were determined in pH ranges where the cytidine
nucleotides of interest are protonated, one would
expect some conformational relaxation to occur in
the folded state that would stabilize the unprotonated form of the cytidine. This would, in turn,
reduce the pKa to below the value obtained by
assuming a rigid structure. For this reason, the
values reported here for C8 in BWYV-ψ and C10 in
PEMV-ψ are likely to be too large. On the other
hand, the calculations clearly identify these two
cytidine nucleotides as undergoing significant pKa
shifts to higher values. Consistent with previous
studies, our calculations suggest that these groups
determine the pH-dependent unfolding of the two
pseudoknots at high pH.
The error resulting from the use of only two
crystal conformations was greater than 4 pKa units
for A23 in the BWYV-ψ (Table 3), and we concluded
we could not determine its pKa with any precision
(data not shown). This is likely due to the fact that
small changes in local structure around the titrating
group between the two conformations can have
large effects on electrostatic free energies. The
resulting energy differences may lead to large errors,
especially if the number of conformations considered is very small. On the other hand, averages over
larger numbers of conformations usually lead to less
noisy results, as was the case for most of the
remaining calculations.
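The sensitivity of the reported spread to the number of conformers can be illustrated as follows. This is a sketch only: the per-conformer pKa values are invented for illustration and are not taken from the calculations, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarize(pkas):
    """Mean and sample standard deviation of per-conformer pKa values."""
    return statistics.mean(pkas), statistics.stdev(pkas)


# Hypothetical per-conformer values, invented purely for illustration.
two_conformers = [4.0, 8.5]
many_conformers = [6.2, 6.9, 7.4, 6.6, 7.1, 6.8, 7.3, 6.5]

if __name__ == "__main__":
    for label, values in (("2 conformers", two_conformers),
                          ("8 conformers", many_conformers)):
        mean, sd = summarize(values)
        print(f"{label}: {mean:.1f} ± {sd:.1f}")
```

With only two structures, a single discrepant value dominates both the mean and the spread, which is the situation described for A23.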
pKa-shifted nucleotides in the HDV ribozyme
We computed the pKas of all titratable nucleotides
in HDVR and the hairpin ribozyme, so as to
determine the locations of nucleotides likely to be
protonated at physiological pH. HDVR catalyzes a
site-specific phosphodiester self-cleavage reaction
that has been shown to be strongly pH-dependent.14,77 The structure of HDVR has
been solved in the precursor and product conformations.68,69 We performed pKa calculations using
the product (1cx0 and 1drz) and precursor (1vc5)
crystal structures, each obtained at pH ≥ 6.68,69
Because of the central interest of C75 for understanding HDVR enzymatic function,14,16,19,68,69,77–79 we
performed calculations only for structures with
cytidine at position 75; the nine remaining structures
were omitted from this study. With the exception of
C75, the calculated pKas of the nucleotides were
quite similar for the product and precursor structures (data not shown). This was expected since,
except for differences between the product and
precursor structures near C75, the overall similarity
of the selected structures is very strong (<1.6/1.1 Å
all-atom/all-phosphate-atom root-mean-square deviation (RMSD)). Following experimental conditions,14,77,80 we performed our calculations at
1.0 M monovalent salt (i.e. NaCl or LiCl). Ion-specific effects between two different species of
monovalent salt, however, cannot be taken into
account within the context of the PB equation.
Figure 3(a) displays the pKas calculated for all
titratable nucleotides in HDVR. Two nucleotides,
C41 and C75, were calculated to have pKas greater
than 5.8 (Figures 3(a) and 4(a)–(c)). C41 is part of a
CAA three-nucleotide loop in HDVR and is
involved in the structurally conserved C+GCA
motif, found also in the BWYV and PEMV pseudoknots described above in Figure 2. Its calculated
pKa is 10.6, which is in the same range as the values
calculated for the protonated cytidine nucleotides of
the C+GCA motifs in the two pseudoknots. The
identification of C41 as a nucleotide with a shifted
pKa is consistent with the results reported by Been
and co-workers, who have attributed the apparent
rate constant for catalysis of about 7 to C41.17,80,81
C75 is calculated to have the second highest pKa in
the product structure of the HDV ribozyme, with a
calculated value of ∼9.6. Although we are not able to
report pKas for the precursor because atoms near the
5′ terminus of the RNA are missing (1vc5),69 C75 is
expected to have a higher pKa in the precursor than
in the product, because it contains an extra negative
charge due to the phosphate group located near C75.
C75 is located at the active-site of the ribozyme and
appears to form a hydrogen bond with the 5′
terminus OH in the product structure as in Figure
4(b) (PDB IDs 1cx0 and 1drz). In the precursor
structure (PDB ID 1vc5), the N3 atom of C75 is
within 2.7 Å of the O2P atom of the scissile
phosphate group, suggesting strongly that the
protonated state is stabilized by nearby phosphate
groups. C75 has been shown to play a direct role in
catalysis,14,16 and the mutation of C75 to U or G
effectively eliminates ribozyme activity.82–85 Mutation of C75 to adenosine lowers the apparent pKa of
the reaction by an amount that corresponds to the
difference in the solution pKa values of cytidine
and adenosine, suggesting strongly that the apparent pKa of the reaction reflects that of the nucleotide
at this position.14,18 It has been proposed
that the catalytic activity of HDVR depends on the
protonation of C75.14,19,69 The identification of
C75 as a nucleotide with an elevated pKa supports
this hypothesis, although the calculated value is
greater than the best estimate in the literature,
pKa ∼ 6–8.14,80,81
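A calculated pKa shift can be translated into the free energy by which the folded RNA favors the protonated state, using the standard relation ΔG = -RT ln(10) ΔpKa. The sketch below is illustrative rather than part of the original analysis: the temperature of 298.15 K and the function name `stabilization_energy` are assumptions, and the input pKas are the rigid-structure values, which the text notes are likely to be overestimates.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def stabilization_energy(pka: float, solution_pka: float,
                         temp_k: float = 298.15) -> float:
    """Free energy (kcal/mol) favoring the protonated state in the structure.

    Uses delta_G = -R*T*ln(10) * (pKa - pKa_solution); negative values mean the
    structure stabilizes the protonated form relative to solution.
    """
    return -R_KCAL * temp_k * math.log(10.0) * (pka - solution_pka)


if __name__ == "__main__":
    # Calculated pKas for the HDVR product structure; cytidine solution pKa = 4.3.
    for name, pka in (("C41", 10.6), ("C75", 9.6)):
        print(f"{name}: {stabilization_energy(pka, 4.3):.1f} kcal/mol")
```

Each pKa unit corresponds to about 1.36 kcal/mol at this temperature, so a shift of roughly 6 units would imply that the structure stabilizes the protonated cytidine by more than 8 kcal/mol.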
pKa-shifted nucleotides in the hairpin ribozyme
Like HDVR, the hairpin ribozyme catalyzes a
site-specific phosphodiester cleavage reaction. In
the crystallized structure of this ribozyme, the
substrate appears as a separate strand, but the
base-pairing of this strand with the ribozyme
strand is integral to the formation of the ribozyme
structure. Together, the substrate and ribozyme
strands fold into a single four-helix junction.70,71
The active site is located within an extensive
interface between the two major helices of the
four-helix junction. pKas were calculated for the
precursor and product structures (PDB ID 1m5k
and 1m5v, each crystallized at pH 5), for which
four structures were available. Calculations were
not done on the 1m5o transition-state structure,
since the presence of the vanadate ion made the
partial charges of the transition state difficult to
predict. Our calculations identified three nucleotides, A10, A22 and A38, whose pKas are predicted
to be greater than 5.8 in the hairpin ribozyme when
calculations were done under experimental salt
conditions (1.0 M monovalent salt and 10 mM
divalent salt).86 The calculated values are 6.6, 7.2
and 5.9, respectively (Figure 3(b)).
As can be seen in Figure 5, A38 is located at the
interface of the two major helices near the active
site. In its protonated state, A38 appears to form a
hydrogen bond with the oxygen atom at the site of
the catalytic reaction. Biochemical characterization
of the hairpin ribozyme has shown that the
replacement of the adenosine with an abasic
residue reduces the rate of catalysis by five to six
orders of magnitude.20 However, activity can be
largely restored by supplying free adenine in
solution.20 Furthermore, substituting adenine with
nucleobases having a higher pKa, such as isoguanine (pKa = 9.0), raises the apparent pKa of the
reaction, suggesting that the nucleotide at position
38 is responsible for at least some of the pH-dependence observed in the reaction.20 Substitutions by other nucleotide analogs displayed equivalent pKa shifts. On the basis of this evidence, it has
been suggested that A38 in the protonated state
stabilizes the transition state of the hairpin ribozyme. The elevated pKa calculated for A38 is
consistent with this idea.
A10 is also located in the interface between the
two major helices near the active site. The elevated
pKa of A10 is consistent with the sensitivity of the
ribozyme activity to the solution pKa of nucleotide
analogs substituted at A10.87,88 In particular, the
decrease in activity when A10 is substituted with 8-aza-adenosine (n8A), whose solution pKa is 2.2, can
Figure 3. pKas and electrostatic free energies in (a) hepatitis delta virus ribozyme and (b) the hairpin ribozyme.
Nucleotides with significantly shifted pKas are labeled. These include C41 (calculated pKa = 10.6) and C75 (9.6) in HDVR,
and A10 (6.6), A22 (7.2) and A38 (5.9) in the hairpin ribozyme. Values less than 3.0 are not reported. Locations of
nucleotides involved in Watson–Crick base-pairs are indicated as w. The red line indicates the solution pKa of
cytidine = 4.3.
be rescued by lowering the pH of the reaction,
suggesting that the ionization of A10 influences
catalytic activity directly.23 New crystal structures
have indicated that ordered water molecules are
near the active site of the hairpin ribozyme, and one
of these is in direct contact with the N1 atom of
A10.89 Disruption of the water network by perturbing the protonation state of A10 could explain the
pH-dependent nucleotide analog interference pattern of n8A, further supporting the existence of an
elevated pKa for A10. Our treatment of buried
waters as a dielectric continuum is, of course,
Figure 4. Structure and organization of the HDV ribozyme. (a) Surface view and secondary structure schematic of
HDVR: P1 (red), P1.1 (yellow), P2 (tan), P3 (green), and P4 (purple). The approximate location of the scissile bond at the
junction of several secondary structure elements is indicated by an arrow. (b) The C75:N3 and G1:O5′ atoms within the
active site are within hydrogen bonding distance (1cx0). (c) C41 and C75 are shown relative to the secondary structure
elements in HDVR. Colors of nucleotides correspond to those depicted in (a). The J4/2 loop is shown in pale blue and the
C+GCA motif is in magenta.
problematic but to account for the interactions of
individual water molecules properly would require
simulations that are beyond the scope of this work.
Rather, our goals here are to identify nucleobases
with shifted pKas and to understand how RNA
structure is designed to effect these shifts.
Lastly, A22 also exhibits an elevated pKa in the
hairpin ribozyme. However, there is no evidence at
this point that A22 plays a specific role in catalysis.
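As a rough illustration of what these shifts imply, the protonated fraction of a site can be estimated from the single-site Henderson–Hasselbalch relation. The sketch below is not part of the calculations reported here: it treats each nucleotide as an independent site and ignores the coupling between sites that the multi-site treatment in Methods accounts for. The function name is hypothetical, and pH 7.0 is an assumption made only for the example. The pKa values are those quoted in the text and in the Figure 3 caption.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Protonated fraction of one independent site (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values quoted in the text and in the Figure 3 caption.
PKA_VALUES = {
    "cytidine, solution": 4.3,
    "8-aza-adenosine (n8A), solution": 2.2,
    "A10, hairpin ribozyme (calculated)": 6.6,
    "A22, hairpin ribozyme (calculated)": 7.2,
    "A38, hairpin ribozyme (calculated)": 5.9,
    "C75, HDVR (calculated)": 9.6,
    "C41, HDVR (calculated)": 10.6,
}

ASSUMED_PH = 7.0  # assumption for illustration, not a condition from the paper

if __name__ == "__main__":
    for label, pka in PKA_VALUES.items():
        frac = fraction_protonated(pka, ASSUMED_PH)
        print(f"{label:36s} pKa {pka:4.1f}  protonated fraction {frac:.3g}")
```

Under this simplification, a site whose pKa equals the pH is half protonated, and for pKas well below the pH each further pKa unit lowers the protonated fraction by about a factor of ten.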
Energetic contributions to pKa shifts in RNA
As has been discussed extensively for amino
acids,34–42,60,62 a number of factors can result in the
pKa shift of a nucleotide in RNA away from the value
observed for the isolated nucleotide in solution. In
the context of RNA, these include favorable interactions between negatively charged phosphate groups
and the protonated form of the base, desolvation
effects and intramolecular interactions with other
bases. Structural features that stabilize the protonated state of the nucleotide shift pKas upward, whereas
features that destabilize that state shift pKas downward. Favorable interactions of a protonated base
with negatively charged phosphate groups (base–
phosphate interactions) will always favor a shift to
higher pKas. Desolvation effects resulting from the
transfer of an ionizable nucleotide from the solvent
into a buried location within an RNA molecule will
favor lower pKas compared to those observed in
solution, due to the loss of stabilizing interactions of
the ionized species with the solvent. Lastly, intramolecular interactions between nucleobases (base–
base interactions) through hydrogen bonds and
other polar interactions can also shift pKas. The
size and direction of this effect depend on the
Figure 5. Structure and organization of the hairpin ribozyme. (a) Secondary structure cartoon of the hairpin ribozyme
and (b) the locations of A10, A22 and A38 within the ribozyme. The approximate location of the scissile bond within the
interface (gray region) of the two major helices is indicated by the arrow. (c) The conformation of A38, A-1 and G+1 in the
hairpin ribozyme active site (1m5o) showing A38:N1 within hydrogen bonding distance of O5′ in the scissile phosphate
group.
detailed structural environment of each ionizable
group. Due to the lack of additivity of individual
contributions within the NLPB, we cannot report
specific contributions for each of these terms to pKa
shifts. However, in the following sections we
report individual contributions to electrostatic
free energies that can be related to structural
features of the RNA. This allows us to consider
how RNA structure is used to produce shifted
pKas.
Role of solvation and hydrogen bonding in
stabilizing pKa shifts: the branch-point helix
Although A6 and A7 are situated in very similar
structural environments in the branch-point helix,
the pKa of A7 is observed to be elevated, but the pKa
of A6 is not (Table 1). In an attempt to understand
the source of this difference, we have calculated a
number of contributions to the electrostatic potential
at both sites. As can be seen in Table 4, negatively
charged phosphate groups contribute a strong
negative electrostatic potential that stabilizes the
protonated form of each base by ∼3.6 kcal/mol.
However, desolvation opposes the phosphate contribution for A6 and A7. The effect is much smaller
for A7, which is more exposed to solvent than A6.
Indeed, much of the difference between the pKas of
A6 and A7 can be attributed to solvent exposure.
The ionized form of A7 is stabilized also by
favorable interactions with other bases, primarily
U16, which can form a hydrogen bond with the N1
atom of A7 via O2. Thus, the pKa of A7 is shifted to a
higher value, due to the effects of the phosphate
groups and interactions with other bases. In
contrast, A6 has weaker interactions with other
bases and the effect of the phosphate backbone is
opposed by desolvation effects.
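To make the sign conventions of Table 4 concrete, the sketch below sums the three terms listed for A6 and A7 and converts the total into pKa units using the factor 2.3kBT from equation (2), with kBT = 0.6 kcal/mol as quoted in Methods. It assumes that the terms are additive, which holds only within the LPB; because additivity is lost within the NLPB, the numbers are a qualitative guide and should not be read as the calculated pKa shifts. The function and variable names are hypothetical.

```python
KBT = 0.6          # kcal/mol at 25 °C, as quoted in Methods
LN10_FACTOR = 2.3  # factor multiplying kBT in equation (2)

# Table 4 entries for the branch-point helix (kcal/mol).
TABLE4_BRANCH_POINT = {
    "A6": {"desolvation": +2.2, "base_base": +0.4, "base_phosphate": -3.6},
    "A7": {"desolvation": +0.5, "base_base": -1.3, "base_phosphate": -3.5},
}


def additive_pka_shift(terms: dict) -> float:
    """pKa shift implied by summing the terms (assumes LPB-style additivity).

    A negative total stabilizes the protonated form and raises the pKa.
    """
    total = sum(terms.values())
    return -total / (LN10_FACTOR * KBT)


if __name__ == "__main__":
    for name, terms in TABLE4_BRANCH_POINT.items():
        total = sum(terms.values())
        shift = additive_pka_shift(terms)
        print(f"{name}: net {total:+.1f} kcal/mol, additive shift {shift:+.2f} pKa units")
```

The net term is about −1.0 kcal/mol for A6 and −4.3 kcal/mol for A7, so the additive estimate is several times larger for A7, in line with the qualitative conclusion drawn in the text.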
Role of phosphate and base interactions in
stabilizing pKa shifts in HDVR
To understand the role of phosphate and base
groups in shifting pKas, we calculated individual
Table 4. Electrostatic contributions due to changes in protonation state in the branch-point helix and C+GCA motif
Structure/        Desolvation     Source of        Base–base interaction    Base–phosphate interaction
nucleotide        free energy     contribution     free energy              free energy
Branch-point helix
  A6              +2.2            All NTs          +0.4                     −3.6
  A7              +0.5            All NTs          −1.3                     −3.5
C+GCA motifs
HDVR
  C41             −0.8            G73              −4.6                     −0.5
                                  G74              −1.4                     −0.3
BWYV-ψ
  C8              −0.1            G12              −5.4                     −0.7
                                  C14              −3.7                     −0.5
PEMV-ψ
  C10             −2.3            U9               −1.8                     −0.4
                                  G28              −2.6                     −0.4
Electrostatic contributions from specific sources were calculated as discussed in Methods and are reported in units of kcal/mol. Rows
labeled All NTs signify the base–base and base–phosphate contributions due to all nucleotides. See Results for details.
electrostatic free energy terms for C41 and C75 in
HDVR, and A10, A22 and A38 in the hairpin
ribozyme, and compared these values to those
calculated for all other nucleotides (Figure 3). The
electrostatic terms vary for each nucleotide, but
there are regions in which base-phosphate and/or
base–base interactions stabilize the protonated form
of the base due to a strongly negative electrostatic
potential. In particular, nucleotides with highly
positive pKa shifts are found in regions where the
negative electrostatic potential due to phosphate
groups is particularly large.
Different contributions stabilize the ionized forms
of C41 and C75 in HDVR (Figure 3(a)). For C75,
base–phosphate interactions dominate base–base
interactions, whereas for C41 base–base interactions
also contribute favorably. C75 experiences a high
negative electrostatic potential induced by the
specific arrangement of phosphate groups in
HDVR. A unique feature of HDVR is the presence
of a nested pseudoknot in the core of the structure. A
reverse turn of the phosphodiester backbone at
nucleotides C21 and C22 in this part of the structure
results in a cup-like geometry of the phosphate
groups such that they surround one surface of the
catalytic C75 nucleotide (Figure 6). On the opposite
surface, phosphate groups adjacent to the scissile
bond also contribute to the negative potential.
Finally, an S-turn in the backbone of the so-called
J4/2 loop brings the A77 phosphate group within
7 Å of the catalytic core, which would not be
achieved if the backbone did not contain a turn in
this region. This unique convergence of geometries
appears to be the source of the high electrostatic
potential surrounding the active site nucleotide.
In contrast, C41 is located in a region of the RNA
where the phosphate potential is not unusually
negative (Figure 3(a)). Instead, the high pKa shift of
C41 can be explained by the unusually strong
energetic contributions from base–base interactions,
predominantly with nucleotides within the C+GCA
motif. As discussed in the following section, the
favorable interactions that stabilize the protonated
cytidine include those formed with the guanosine
nucleotide, with which it shares two hydrogen
bonds, and additional neighboring nucleobase interactions that are conserved across very different RNA
structures.
Conservation of stabilizing interactions in the
C+GCA motif
In order to understand the energetic interactions
required to stabilize the protonated cytidine in the
C+GCA motif, we compared the magnitude of
individual electrostatic free energy terms at the N3
atom of the protonated cytidine within the three
structures where it is found: HDVR, BWYV-ψ and
PEMV-ψ. As can be observed in Figure 2, the four
nucleotides composing the C+GCA motif can be
readily superimposed. In each case, the protonated
cytidine (C41 in HDVR, C8 in BWYV-ψ, and C10 in
PEMV-ψ) is involved in an unusual hydrogen bond
with the major-groove (Hoogsteen) edge of the
guanosine in the motif (G73 in HDVR, G12 in
BWYV-ψ, and G28 in PEMV-ψ). As can be seen in the
structure, this hydrogen bond interaction occurs via
the keto oxygen of guanosine and stabilizes protonated cytidine. Consistent with its role in the
structure, this interaction emerges as a major
stabilizing feature, as shown in Table 4.
A second feature present in all three structures
appears to stabilize the protonated cytidine. Specifically, a neighboring nucleotide stacked above (and/
or below) the plane of the C+G base-pair contributes
to the stability of the protonated cytidine. As can be
seen in Table 4, G74 in HDVR, C14 in BWYV-ψ, and
U9 in PEMV-ψ play this role. In each case, the
individual free energy terms due to neighboring
nucleobase interactions are >1.4 kcal/mol. To
understand how three apparently different nucleotides could serve the same role in stabilizing this
structure, we compared the structures of the three
nucleotides near the C+GCA motif (Figure 7). In
Figure 7, it is clear that the G74, C14 and U9
nucleotides all contribute a keto oxygen atom to a
Figure 6. Phosphate and other oxygen atoms can stabilize protonated nucleotides. (a) Structure of the phosphodiester
backbone near C75 (purple and yellow surface, purple and blue trace). A cluster of phosphate groups (yellow) from the P1.1
pseudoknot helix, the substrate strand, and the J4/2 loop stabilizes the pKa shift of C75. The O5′ atom in the scissile
phosphate group is labeled. (b) The conformation of G29 is consistent with the formation of an O2′ hydrogen bond with
A78.
position within 4 Å of the proton of the ionized
cytidine, stacked above or below the plane of the
C+G base-pair. This oxygen atom is not involved in a
hydrogen bond with the proton but rather it is
arranged so as to optimize local electrostatic interactions in the neighborhood of the cytidine proton.
Figure 7. Hydrogen bonds and neighboring nucleobase interactions stabilize the protonated cytidine in the C+GCA
motif. Distances are given for neighboring nucleobase interactions between oxygen and cytidine N3. See the text for details.
Most interesting is the case of C14 in BWYV-ψ.
This pseudoknot is characterized by a highly
unusual backbone conformation in the vicinity of
nucleotides 13 and 14 (A.M.P. and L. Wadley,
unpublished results).90 For example, U13 is C2′-endo and the backbone is described by η–θ values
that fall outside any of the typical regions for RNA
(η, 62.3; θ, 42.2, for conformation B). The G12-C14
base-step is characterized by a much greater than
usual helical twist of nearly 90° (Figure 7). This
overtwisted conformation is accommodated by the
RNA backbone through the unpaired and outwardly-flipped U13 base. From these observations and
the calculated energy profiles, the structurally conserved keto oxygen atom described above
appears to be an important feature of the C+GCA
motif, which, to our knowledge, has not been
characterized.
Stabilization of pKa shifts in the hairpin ribozyme
To understand the structural features that stabilize the pKa shifts of A10, A22 and A38 in the
hairpin ribozyme, free energy contributions that
affect protonation were calculated for these
nucleotides. Elevated pKas generally coincide with the
regions of the RNA where the negative electrostatic potential due to phosphate groups is higher
than in the surrounding structure. Specifically,
nucleotides 9–10, 20–27, and 38–44 (Figure 3(b))
experience a significant negative electrostatic
potential. Within each of these nucleotide ranges,
at least one nucleotide is calculated to have a pKa
shift > 2 pKa units from its solution value. These
nucleotides coincide with the center of the dense
interface between the two major helical elements of
the hairpin ribozyme, which consists of four phosphodiester backbone segments arranged in close
proximity (Figure 8). It is likely that the close
packing of this interface evolved in such a way as
to bring together the four backbone segments,
resulting in the creation of a region at the center
of the helical interface, and the catalytic core,
with a particularly high negative electrostatic
potential.
In some cases, nucleotides may interact strongly
with phosphate groups but are not calculated to
have a large pKa shift (e.g. A77 and A78 in HDVR;
Figure 3(a)). In these cases, the protonation of one
nucleotide with a higher pKa, such as C75, reduces
the pKa shifts of nearby nucleotides that interact
with it.
Discussion
Here, we report a treatment of the factors that
produce pKa shifts in RNA structures. On the basis
of comparisons to experimental results where the
pKas have been determined directly, our approach
appears to be effective in predicting the pKas of
nucleotides. Most significantly, it is successful in
identifying bases with shifted pKas and, in each
case, offers a structural interpretation for the shifts.
In systems where pKas have not been measured
directly, the identification of pKa shifts in this study
is supported by multiple lines of biochemical
evidence, such as the pH-dependence of catalysis
or unfolding. In many cases, the calculated pKas
can be interpreted meaningfully and agree quantitatively with experiment, but in a few other
cases, such as the protonated C in C+GCA and
the catalytic C in HDVR, the calculated values
appear to be more elevated than the best estimates
now available from experiment.
Figure 8. Phosphodiester backbones of the two major helices stabilize the pKa shift of A38 at the interface of the
helices. The scissile phosphate group (arrow) is colored orange (phosphorus) and red (oxygen).
Conformational relaxation
As we have discussed, the discrepancies are
likely due to the assumption that the RNA
structure does not change with change of the
ionization state. Using an internal dielectric constant of 4 accounts for some minor conformational
relaxation throughout the RNA associated with
nucleotide ionization,91 but clearly this does not
account for major changes that could occur if, say, a
nucleotide was stacked into a helix in one
conformation, but flipped out of the helix in
another. This, for example, has been shown to
occur in the U6 RNA intermolecular stem–loop
(ISL).92 In such cases, conformational changes
must be treated explicitly. Although assuming a
rigid molecule is clearly an oversimplification, it
is of considerable interest to explore pKa predictions based on the experimental structure alone
without introducing uncertainties arising from a
treatment of conformational relaxation in RNA.
Indeed, the good agreement between the calculated and experimental results obtained here for
the branch-point helix and the lead-dependent
ribozyme suggests that base protonation does
not induce large conformational changes in these
two structures, or at least that such changes are
not large enough to have significant effects on
pKa.
Divalent ions
The appropriate treatment of divalent ions is of
considerable importance, given their significant
electrostatic contributions to folding, stability and
ligand binding within RNA.93,94 Here, divalent ions
have been treated using the same formalism
governing the interaction of monovalent ions with
RNA. Thus, site-bound ions are not treated explicitly; rather, all divalent ions are assumed to be
bound diffusely and treated directly by application
of the NLPB.50 The effect of site-bound ions is
potentially of particular importance to the calculation of pKas in the active conformation of the HDVR
structure, where it has been reported that electron
density, interpreted as the presence of a site-bound
hydrated metal ion (e.g. Mg2+(H2O)6), is observed in
the crystal structure 4.3 Å away from the titration
site of the catalytic nucleotide (PDB 1sj3).69 However, we have used only the structures of HDVR in
which Mg2+ was not observed in the active site;
namely, the product conformations (PDB 1cx0
and 1drz) and one that had been made inactive by
the removal of Mg2+ from the solution conditions of
the crystal structure (PDB 1vc5). A second factor
justifying our treatment of divalent ions is that
HDVR is catalytically active even in the presence of
only monovalent salt.14,80 Thus, at the very least, the
calculated pKa shift of the active site C75 is relevant
to understanding the nature of catalysis in the
absence of magnesium.
Nonlinear effects
A major complication involved in calculating
pKas in a highly charged molecule arises from the
nonlinear response to salt concentration. For
proteins, where nonlinear effects are not generally
thought to be important, the electrostatic interaction between two sites is independent of the
ionization state of the other sites. Thus, all
interactions are additive and need to be calculated
only once. In contrast, when the nonlinear PB
equation is used, the interaction of each pair of
sites depends on the charge states of other sites,
since these, in turn, affect the screening by salt of
the pairwise interaction in question. Accounting for
this effect exactly is computationally expensive and
to do so would require a separate PB calculation
for each of the 2^N ionization states of the
macromolecule. An approach to this problem was
considered by Vorobjev et al. for polylysine helical
peptides.59 They used a screening factor for each
pairwise interaction, which increased with the
electrostatic potential that characterized each
interaction (and hence, the net charge of the
molecule). Our method differs by attempting to
calculate a nonlinear correction as a single term
correlated with the total net charge on the RNA.
Overall, this is a simpler approach for treating
nonlinearity that nonetheless appears to achieve
good accuracy compared to experiment.
On the structural origins of pKa shifts in RNA
The role of phosphate groups in inducing pKa shifts
How do structural elements in ribozymes elevate
the pKas of nucleotides near their active sites? We
propose that the elevated pKas of several nucleotides
are a consequence of the architecture, or “fold”, of
the RNA, in which the local abundance of phosphate
groups helps to elevate pKas. In HDVR, for instance,
the two major helical axes of the ribozyme form a Y-shaped intersection that converges near the active
site (Figure 4(c)). As a consequence, the active site
cytidine, C75, is brought together with the phosphate groups adjacent to the scissile bond, as well as
those involved with forming the central, P1.1
pseudoknot helix (nucleotides 21–22; Figure 6).
Such a compact arrangement of phosphate groups,
along with the curvature of the molecular surface,
can focus strong electrostatic potentials in the active
site region. Using calculations of the surface
potential with the nonlinear PB equation, Bevilacqua
and colleagues have shown that this is indeed
calculated to happen for this ribozyme.77,95 In
addition, we now show that the highly negative
potential leads to an elevated pKa calculated for the
C75 nucleotide. Notably, we observed that the
electrostatic environment surrounding the proximal
nucleotides, A77 and A78 (Figure 3(a)), also favors
elevated pKas, although it remains to be shown
whether this has any functional significance for the
ribozyme.
A similar convergence of phosphate groups can
be observed near the active site of the hairpin
ribozyme (Figure 5). The high electrostatic potential
surrounding the active site may enable A38 to
protonate and form a functionally important
hydrogen bond in the transition state of the
transesterification reaction. In this case, the local
abundance of phosphate groups near the active site
is the consequence of the crossing of the two major
helical axes through the center of the molecule
(Figure 8). Much like HDVR, the formation of the
two-helix interface buries phosphate groups from
both the substrate and ribozyme strands. In total,
19 nucleotides are buried by greater than 70% of
their surface areas, and these occur mainly in the
interface. It seems likely that there will be further
instances in other RNAs where unusually high
densities of phosphate groups or buried phosphate
groups are used to shift the pKas of functionally
important nucleotides.
The role of base–base interactions in inducing
pKa shifts
Because of the importance of base–base interactions for defining structures in RNA, a number of
attempts have been made to catalogue the hydrogen
bonding patterns that are possible between nucleobases.2,25,96–99 Through manual and automatic
means, these efforts have identified eight distinct
patterns of base-pairing that involve at least one
protonated nucleotide: C+C (cis and trans), A+C (cis),
A+G (Hoogsteen and reverse Hoogsteen), C+G (cis,
Hoogsteen and reverse Hoogsteen), where cis and
trans refer to the relative orientations along the
glycosidic bonds. An underlying assumption for
constructing most base-pair compendiums has been
to limit them to coplanar base-pairs involved in at
least two hydrogen bonds. Although these simplifications have been useful for enumerating the most
likely configurations for protonated nucleotides,
clearly these heuristics may miss the identification
of nucleotides that are protonated if the hydrogen
bond acceptor is a phosphate oxygen atom or 2′OH.
Indeed, the pKa of adenosine or cytidine is particularly ambiguous in the absence of energy calculations when the hydrogen bonding partner is a 2′OH
because it is possible for the 2′ oxygen atom to act as
either a hydrogen bond donor or acceptor. An
example of this can be observed in the catalytic core
of HDVR for A78 (Figure 6(b)).
Among the set of base-pairs containing protonated
nucleotides, one of the more commonly observed
seems to be the A+C base-pair.4,92,100,101 In our
current study, the A+C pair has appeared in the
lead-dependent ribozyme where we have obtained a
calculated pKa for the pair close to 6.5. Does this base-pair have any function other than to stabilize the RNA
under acidic conditions? Others have noted that the
A+C interaction is isosteric to GU wobble pairs,96,100
where the C1′ atoms and glycosidic bonds of the A+C pair and of the GU
pair are equally distant and in the
same relative orientation. However, unlike the GU
wobble pair, the A+C pair forms only when the pH is
sufficiently low for adenosine to protonate. Thus,
unlike the GU wobble, A+C pairs can act as a pH-sensitive conformational switch, such as the one that
appears to occur near the cleavage site in the Varkud
satellite ribozyme.101 In this system, deprotonation is
coupled to a conformational change in the cleavage
site stem–loop. A significant conformational change
upon a shift in pH is observed also for the U6 ISL.92 It
is possible that pH-dependent base-pairs like A+C
may be conserved in an RNA where sensitivity to its
pH environment may be important to its function.
Conserved hydrogen bonds appear to be the main
source of stability of the C+GCA motif. However,
more subtle structural features such as the structural
and electrostatic effects of neighboring nucleobases
may also be involved. It is known that duplexes
having the same composition of base-pairs but in
different permutations have different energies of
duplex formation.102,103 Indeed, there is experimental
evidence that nearest neighbors influence the pKas of
adjacent nucleotides directly.104 Differences arise
from the different stabilities introduced by the
juxtaposition of different interactions between base-pairs due to the different permutations. In the case of
the C+GCA motif, the structure appears to preferentially adopt a conformation where adjacent keto
oxygen atoms are positioned to stabilize the protonated cytidine. Future work may involve performing
nucleotide sequence alignment to discover whether
this preference is more widely conserved.
On the basis of our calculations, several nucleotides
in the catalytic cores of ribozymes have been identified as having elevated pKas. Notably, the predicted
nucleotides coincide almost precisely with nucleotides that have been shown to have catalytic roles.
Moreover, many of these nucleotides that we have
predicted to have anomalous pKas correspond to
groups that have been suggested to be protonated on
the basis of the pH-dependence of the catalytic rates of
reaction. Each of the calculated results can generally
be understood in structural terms. In the case of
HDVR, C75 is thought to act as a general acid or base
near the 2′ OH nucleophile of the precursor HDVR.
The proximity of the C75 base to the 5′ oxygen
terminus (Figure 4(b)) has suggested the possibility
that, under certain conditions, the protonated form of
C75 could be stable and act as a possible general acid
or base, even though there is no structure of the native
sequence clearly showing this interaction. In the
hairpin ribozyme, the protonated form of A38
appears to form a hydrogen bond with a central
phosphate oxygen atom within the trapped transition-state mimic (Figure 5(c)). Since the hairpin
ribozyme displays no specificity for metal ions and
none is observed to bind near the active site,
nucleotides alone may be solely responsible for
catalysis.105,106 Indeed, the same may be true for the
HDV ribozyme.80 Such findings deepen our appreciation of the nature of RNA catalysis and the versatility
of ribonucleotides. Moreover, the theoretical and
computational methodology developed in this work
offers the possibility of understanding the structural
origins of pKa shifts of nucleotides that play functional
roles and, in addition, of using structural information
to identify these nucleotides when direct experimental measurements are not available.
Methods
pKa calculations based on the Poisson–Boltzmann equation have been widely used to study proteins,34–42,60,62
and, more recently, DNA.43 Here, we review the underlying theory in order to discuss its application in the context
of RNA. We used a modified version of the program
MCCE,60–62 which uses a distance-dependent pairwise
energy softening function107 to help prevent large electrostatic energies from dominating the pKa calculations. This
can occur if ionizable groups approach each other too
closely due to small errors in the crystal structure or when
dielectric screening is not accounted for completely. A
unique feature of MCCE is its ability to account for
conformational changes between protonated and unprotonated states of the titratable group by sampling over
multiple conformations. However, we have used a simplified version where titratable nucleotides are held rigid and
can exist only in one of two states: protonated and
unprotonated, and where the charge of the nucleotide is
0e or −1e, respectively.
Theory of multi-site titration in nucleic acids
The theory of multi-site titration for polymers in solution
was developed previously,37,41,59,60 and is described here as
applied to nucleic acids. Given a nucleic acid with N
nucleotides that might be protonated, we can compute the
titration curve of the ith nucleotide by finding its average
degree of protonation, xi, as a function of pH, (i.e. xi = +1 if
protonated, otherwise xi = 0). For the purposes of this study,
we consider adenosine and cytidine nucleotides as capable
of protonation, specifically on their N1 or N3 imino nitrogen
atom, respectively, although the same representation can be
used for additional types of nucleotides.
We represent the protonation microstate, m, of the nucleic
acid by the vector x with N elements, which describes the
titration states of each nucleotide in the molecule for that
microstate. A free energy, ΔGm, is associated with each
microstate. There are M = 2^N such microstates. The average
charge on the ith nucleotide can be found by taking the
Boltzmann-weighted average:
\[
\langle x_i \rangle = \frac{\displaystyle\sum_{m}^{M} x_i(m)\,\exp\!\left(-\frac{\Delta G_m}{k_B T}\right)}{\displaystyle\sum_{m}^{M} \exp\!\left(-\frac{\Delta G_m}{k_B T}\right)} \tag{1}
\]
over the set of possible microstates. In a nucleic acid with N
titratable nucleotides, the complete Boltzmann-weighted
average requires the computation of 2^N terms. In practice,
this is avoided by using a Monte Carlo (MC) procedure to
estimate the frequency of low-energy microstates, which
will dominate the partition function. These are used to
calculate the titration curves of each nucleotide using the
microstate free energy described by equation (2). The pKa of
the ith nucleotide is obtained by finding the pH at which ‹xi›
is equal to 0.5 using the multi-conformational continuum
electrostatics (MCCE) procedure.60 MCCE was designed to
account for local conformational changes around an
ionizable group but this feature of the program has not
been developed for nucleic acids. For this reason, we have
kept the RNA structure rigid and the term multiconformational is not appropriate for the current application. Thus, we
have kept the title of the program we used but have turned
off one of its features. However, the MCCE program offers a
well-tested MC approach to using continuum electrostatics
in the calculation of pKas and it has thus provided a
particularly useful vehicle in the current study.
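The MC estimate of ‹xi› can be sketched as a Metropolis walk over protonation microstates. This is a minimal illustration of the general technique and not the MCCE implementation: the single-site flip moves, the step counts, the random seed and all function names are assumptions, and the microstate free energy (in kcal/mol) is supplied by the caller as a function of the 0/1 protonation vector.

```python
import math
import random
from typing import Callable, List, Sequence


def metropolis_protonation(
    energy: Callable[[Sequence[int]], float],
    n_sites: int,
    kbt: float = 0.6,      # kcal/mol at 25 °C, as quoted in the text
    n_steps: int = 20000,  # production steps (assumed value)
    n_equil: int = 2000,   # discarded equilibration steps (assumed value)
    seed: int = 0,
) -> List[float]:
    """Estimate the average protonation <x_i> of each site by Metropolis sampling."""
    rng = random.Random(seed)
    x = [0] * n_sites              # start from the fully unprotonated microstate
    e_current = energy(x)
    counts = [0] * n_sites
    for step in range(n_equil + n_steps):
        i = rng.randrange(n_sites)
        x[i] ^= 1                  # trial move: flip the protonation state of site i
        e_trial = energy(x)
        delta = e_trial - e_current
        if delta <= 0.0 or rng.random() < math.exp(-delta / kbt):
            e_current = e_trial    # accept the move
        else:
            x[i] ^= 1              # reject: restore the previous state
        if step >= n_equil:
            for k in range(n_sites):
                counts[k] += x[k]
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    # Hypothetical single site whose protonated state is 1 kcal/mol more stable.
    print(metropolis_protonation(lambda x: -1.0 * x[0], n_sites=1))
```

Repeating such a run over a range of pH values gives a titration curve for each site, and the pH at which ‹xi› crosses 0.5 is then read off as the pKa.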
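In MCCE, ΔG_m^LPB is obtained from solutions of the LPB
and is written as:
\[
\Delta G_m^{\mathrm{LPB}} = \sum_{i}^{N} x_i \cdot \left[ 2.3\,k_B T \left( \mathrm{pH} - \mathrm{p}K_a^{\mathrm{ref}}(i) \right) + \Delta G_{\mathrm{self}}(i) + \Delta G_{\mathrm{fixed}}(i) \right] + \frac{1}{2} \sum_{i}^{N} \sum_{j \neq i}^{N} x_i \cdot x_j \left[ \Delta G_{\mathrm{pair}}(i,j) + \Delta G_{\mathrm{vdW}}(i,j) \right] \tag{2}
\]
where the reference pKa, pK_a^ref(i), is the pKa of the ith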
titratable nucleotide in the hypothetical unfolded state of
the nucleic acid (Figure 9). As a simplification, this value is
taken to be the same as the solution pKa of an isolated
nucleotide monophosphate and is quoted from experimental measurement as 3.8 for 5′-AMP at 25 °C in 0.1 M
KNO3 and 4.3 for 5′-CMP at 25 °C in 0.1 M KCl.1 The
precision of these measurements is expected to be ±0.2–0.4
pKa unit, given possible differences in temperature and
salt concentration between the reference state and the
experimental conditions of the RNA structures used
here.
The additional free energy terms in equation (2) are
responsible for pKa shifts relative to the solution value.
The self free energy, ΔGself(i), is the desolvation cost of
protonating nucleotide i in the folded state compared to
the unfolded state. ΔGfixed(i) gives the change in the free
energy of solvent-screened coulombic, or pairwise, interactions between the charges in a protonated or unprotonated nucleotide and fixed charges in the RNA (i.e. due to
Figure 9. Thermodynamic cycle considered for a pKa
calculation of a single nucleotide for simplicity. See
Methods for details.
guanosine and uridine nucleotides). In the MCCE method,
ΔGfixed(i) includes any change in free energy of van der
Waals (vdW) interactions upon protonation. The final
energy terms, ΔGpair(i,j) and ΔGvdW(i,j), give the free
energies of pairwise interaction and the van der Waals
interaction, respectively, between the ith and jth titratable
nucleotides. kB is Boltzmann's constant and T is the
temperature of the system. At T = 25 °C, kBT is taken to be
0.6 kcal/mol.
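For a handful of sites, equations (1) and (2) can be evaluated directly by enumerating all 2^N microstates, which makes the relation between the free energy terms and the resulting pKa easy to check. The sketch below is an illustration under stated assumptions rather than the procedure used here: the function names are hypothetical, the pairwise electrostatic and van der Waals terms are lumped into one symmetric matrix, the pKa is located by bisection on the assumption of a monotonic titration curve, and every numerical input other than the reference pKas of 3.8 and 4.3 is invented for the example.

```python
import itertools
import math

KBT = 0.6  # kcal/mol at 25 °C, as quoted in the text


def microstate_energy(x, ph, pk_ref, dg_self, dg_fixed, dg_pair):
    """Equation (2): free energy (kcal/mol) of the microstate x (0 or 1 per site).

    dg_pair[i][j] is assumed to hold the sum of the pairwise electrostatic and
    van der Waals terms between sites i and j.
    """
    energy = 0.0
    for i, xi in enumerate(x):
        if not xi:
            continue
        energy += 2.3 * KBT * (ph - pk_ref[i]) + dg_self[i] + dg_fixed[i]
        for j, xj in enumerate(x):
            if j != i and xj:
                energy += 0.5 * dg_pair[i][j]
    return energy


def mean_protonation(ph, pk_ref, dg_self, dg_fixed, dg_pair):
    """Equation (1) by exhaustive enumeration of all 2**N microstates."""
    n = len(pk_ref)
    states = list(itertools.product((0, 1), repeat=n))
    energies = [microstate_energy(s, ph, pk_ref, dg_self, dg_fixed, dg_pair)
                for s in states]
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / KBT) for e in energies]
    z = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, states)) / z for i in range(n)]


def pka_midpoint(site, pk_ref, dg_self, dg_fixed, dg_pair,
                 lo=-10.0, hi=25.0, tol=1e-6):
    """pH at which <x_site> = 0.5, by bisection (assumes a monotonic curve)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_protonation(mid, pk_ref, dg_self, dg_fixed, dg_pair)[site] > 0.5:
            lo = mid   # still mostly protonated: the pKa lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-site system: an adenosine (reference pKa 3.8) and a cytidine (4.3).
PK_REF = [3.8, 4.3]
DG_SELF = [0.5, 0.2]                  # hypothetical desolvation penalties
DG_FIXED = [-3.0, -1.0]               # hypothetical interactions with fixed charges
DG_PAIR = [[0.0, 0.8], [0.8, 0.0]]    # hypothetical repulsion between protonated sites

if __name__ == "__main__":
    for site in range(2):
        print(site, round(pka_midpoint(site, PK_REF, DG_SELF, DG_FIXED, DG_PAIR), 2))
```

For a single isolated site the midpoint reduces to pK_a^ref − (ΔGself + ΔGfixed)/(2.3kBT), so a net stabilization of 1.38 kcal/mol raises the pKa by one unit when kBT is taken to be 0.6 kcal/mol.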
Standard equations for computing the desolvation free
energy of a nucleotide and free energy of interaction
between the nucleotide and other partial charges (e.g. due
to other fixed or titratable nucleotides) from electrostatic
potential are given by:108
\[
G_{\mathrm{self}} = \frac{1}{2} \sum_{n}^{\mathrm{atoms}} q_n \varphi_n^{\text{rxn-field}} \tag{3a}
\]
and
\[
G_{\mathrm{fixed|pair}} = \sum_{n}^{\mathrm{atoms}} q_n \varphi_n^{\mathrm{chg}(m)} \tag{3b}
\]
where the summations run over the atoms of the
nucleotide and q_n is the partial charge of the nth atom in
the nucleotide. φ_n^rxn-field is the reaction field potential at the
position of atom n induced by solvation effects, and φ_n^chg(m)
is the site potential at the coordinates of atom n induced by
partial charges in the set of atoms m with all other partial
charges set to zero. Hence, the electrostatic free energy
terms in equation (2) can be expressed as:
\[
\Delta G_{\mathrm{self}}(i) = \left( \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\text{rxn-field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\text{rxn-field}} \right)_{\mathrm{RNA}} - \left( \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\text{rxn-field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\text{rxn-field}} \right)_{\mathrm{solution}} \tag{4a}
\]
and
\[
\Delta G_{\mathrm{fixed|pair}}(i,j) = \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\{\mathrm{fixed|pair}\}(j)} - \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\{\mathrm{fixed|pair}\}(j)} \tag{4b}
\]
where q_n^prot and q_n^unpr refer to the partial charges for the
protonated and unprotonated forms of the nucleotide i,
and φ_n^{fixed|pair}(j) refers to potentials computed under the
appropriate set of atoms in nucleotide j.
Since free energies within the LPB are additive, all of the
terms in equation (2) need be calculated only once for a
particular macromolecule. Thus, the PB equation does not
need to be solved during every step of the MC procedure.
However, as discussed below, additivity is lost if the
NLPB is used and every term in the equation depends on
the microstate involved. This would require that the NLPB
be solved for every step in an MC procedure, which is not
computationally feasible when many nucleotides are
involved. In the next section, we describe our use of the
LPB and NLPB equations. The section that follows
introduces a method that allows us to calculate approximate nonlinear microstate energies that can be used in
the context of the MC procedure.
Linear and nonlinear Poisson–Boltzmann
equations
Electrostatic site potentials and reaction field potentials
are obtained from finite difference solutions to the
Poisson–Boltzmann equation:109
\[
\nabla \cdot \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) + \frac{4\pi e}{k_B T} \rho^{f}(\mathbf{r}) + F(\phi) = 0 \tag{5}
\]
where ϕ(r) denotes the electrostatic potential, ρf(r) denotes
the distribution of partial atomic charges and ε(r) is the
value of the dielectric constant for any point in space.54
F(ϕ) has the general form:
\[
F(\phi) \equiv \frac{4\pi e^{2}}{k_B T} \sum_{i} c_i^{b} z_i \exp\left(-z_i \phi(\mathbf{r})\right) \tag{6}
\]
where the sum is taken over all mobile ion species, and c_i^b
and z_i are the bulk concentration and electrical charge of
each species. Where only monovalent salt appears in the
solvent, F(ϕ) is rewritten as −ε0κ² sinh(ϕ(r)), where κ² is
8πe²I/(ε0kBT) and I is the ionic strength. When potentials are
small, sinh(ϕ(r)) can be approximated simply as ϕ(r) and
F(ϕ) is simplified to:
\[
F(\phi) \equiv -\varepsilon_0 \kappa^{2} \phi(\mathbf{r}) \tag{7}
\]
This form for F(ϕ) yields the linear PB equation and has
the important property that energetic contributions derived from it are linearly additive. Thus, the linear PB
equation can be used to break up larger calculations into
individual contributions to the electrostatic free energy,
which can then be summed to yield total values, as
described by equation (2). However, the drawback is that
the linear approximation is valid only for molecules where
the net charge is small and ions of different valence are all
incorporated into a single ionic strength parameter I. RNA
however bears a –1e charge for every unprotonated
nucleotide in its structure, and electrostatic potentials can
become very high for even moderately sized molecules.
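The limits of the linear approximation can be illustrated numerically. The following is a minimal sketch, not part of the method described here: it compares sinh(ϕ) with ϕ for potentials expressed in units of kBT/e, and evaluates the Debye length 1/κ for a monovalent salt using the SI form of κ², which is equivalent to the Gaussian-unit expression in the text. The function names and the sample potentials are hypothetical.

```python
import math


def linearization_error(phi):
    """Relative error of replacing sinh(phi) by phi (phi > 0, in units of kBT/e)."""
    return abs(math.sinh(phi) - phi) / math.sinh(phi)


def debye_length_angstrom(ionic_strength_molar, eps_r=80.0, temp_kelvin=298.15):
    """Debye length 1/kappa in Angstrom for a monovalent salt (SI formula)."""
    eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
    k_b = 1.380649e-23       # Boltzmann constant, J/K
    e = 1.602176634e-19      # elementary charge, C
    n_a = 6.02214076e23      # Avogadro constant, 1/mol
    ionic_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = 2.0 * n_a * e ** 2 * ionic_si / (eps_r * eps0 * k_b * temp_kelvin)
    return 1e10 / math.sqrt(kappa_sq)  # metres -> Angstrom


for phi in (0.25, 1.0, 4.0):
    print(f"phi = {phi:4.2f} kBT/e: linearization error = {linearization_error(phi):.1%}")
print(f"Debye length at 0.1 M: {debye_length_angstrom(0.1):.1f} A")
```

The error is negligible for potentials well below kBT/e but grows rapidly beyond a few kBT/e, the regime expected near a densely charged phosphodiester backbone.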
Water is assigned a value of ε = 80 and a lower dielectric
constant is generally used to represent the solute; the
solvent-accessible molecular surface represents the
boundary between these two dielectric regions. As
discussed in previous work, a value of ε = 1 represents a
solute with no electronic polarizability (an implicit
assumption in most all-atom simulations).91 The value of
2 has been shown to account well for electronic polarizability in a static structure, whereas larger values such as 4
account in small part for conformational changes in the
molecule that accompany changes in ionization state.91
Since our model for RNA keeps the nucleobase and
backbone rigid, a value of 4 is consistent with work done
in proteins, and was adopted for this work.41,60 The
dielectric constant inside the molecular surface of the RNA
is assigned this value. The ionic strength is assigned a
value of zero at every point in the finite difference lattice
that is inside this surface and within an ion-excluded
region that extends 2 Å from the surface.
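The assignment of dielectric constant and ionic strength to lattice points can be sketched as follows. This is a simplified illustration under stated assumptions: the solute is treated as a union of atomic spheres rather than the molecular surface used in this work, and the function name and the single-atom test case are hypothetical.

```python
import math


def classify_point(point, atoms, eps_solute=4.0, eps_solvent=80.0, ion_exclusion=2.0):
    """Return (dielectric constant, ion_accessible) for one lattice point.

    atoms is a list of ((x, y, z), radius) tuples in Angstrom.
    Assumption: the solute interior is the union of atomic spheres,
    which only approximates the molecular surface.
    """
    inside = False
    excluded = False
    for centre, radius in atoms:
        d = math.dist(point, centre)
        if d < radius:
            inside = True    # point lies within the solute
        if d < radius + ion_exclusion:
            excluded = True  # point lies in the solute or the ion-excluded layer
    eps = eps_solute if inside else eps_solvent
    return eps, not excluded


atoms = [((0.0, 0.0, 0.0), 1.9)]  # one phosphorus-sized sphere at the origin
for x in (0.0, 3.0, 5.0):
    eps, ions = classify_point((x, 0.0, 0.0), atoms)
    print(f"x = {x:.1f} A: eps = {eps:.0f}, ionic strength applied = {ions}")
```

Points inside the sphere receive the solute dielectric, points within 2 Å of it are treated as solvent but carry zero ionic strength, and points further out receive the bulk ionic strength.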
The nonlinear correction to microstate energy
When the NLPB is used, as is appropriate for highly
charged molecules, the additive property of the linear
equation is no longer valid. This is because the concentration of salt around the RNA depends on the charge state
of each nucleotide; for example, there are clearly more
positively charged counterions around the RNA when
nucleotides are all negatively charged than when they are
neutral. Thus, pairwise interactions between any two
nucleotides depend on the ionization state of all other
nucleotides. This leads to a major combinatorial problem
that cannot be addressed without some type of approximation. In earlier studies, this was addressed by
introducing a correction factor for each pair to account
for the nonlinearity.59 However, we choose the simpler
assumption of introducing a correction factor for each
charged state of the RNA, which is less expensive to
calculate.
Our approach is to use the LPB equation to obtain
pairwise energies that do not depend on the charge state of
other nucleotides and to correct these linear energies
based on the net charge of the RNA. The difference in
electrostatic free energy obtained from the NLPB and LPB
is defined here as ΔGcorr, where the superscript corr
denotes a correction term. Thus:
$$\Delta G^{\mathrm{corr}} = \Delta G^{\mathrm{NLPB}} - \Delta G^{\mathrm{LPB}} \tag{8}$$
where ΔGNLPB and ΔGLPB are the electrostatic free
energies computed using the nonlinear and linear PB
equations, respectively.54 We assume that ΔGcorr can be
approximated with a function that has a quadratic
dependence on net charge. Specifically:
$$\Delta G_m^{\mathrm{corr}} = a\,Z_m^{2} + b\,Z_m + c \tag{9}$$
where a, b and c are coefficients that are appropriate for a
particular conformation of a given macromolecule. $Z_m$
represents the number of nucleotides protonated in
microstate m, and can be written as:
$$Z_m = \sum_{i}^{N} x_i(m) \tag{10}$$
where the sum is over all nucleotides in a particular
microstate. We determine values for a, b and c for each
molecule by running LPB and NLPB calculations on three
different microstates that produce a particular net charge;
in this way, we are able to plot $\Delta G^{\mathrm{corr}}$ as a function of
Z. Fitting these points to the polynomial of equation (9)
yields values of a, b and c. (A plot of the nonlinear
correction energy for the RNAs studied here appears in
Supplementary Data Figure 1.)
We now define the approximate free energy of a
microstate, $\Delta G_m^{\mathrm{NLPB(apprx)}}$, as:
$$\Delta G_m^{\mathrm{NLPB(apprx)}} = \Delta G_m^{\mathrm{LPB}} + \Delta G_m^{\mathrm{corr}} \tag{11}$$
$\Delta G_m^{\mathrm{NLPB(apprx)}}$ is used in our MC procedure instead of
$\Delta G_m^{\mathrm{LPB}}$. Note that the free energy defined by $\Delta G_m^{\mathrm{NLPB(apprx)}}$
is additive, but equation (11) accounts for nonlinear effects
in an approximate way. The use of $\Delta G_m^{\mathrm{NLPB(apprx)}}$ yields
more accurate agreement between computed and experimental
pKas than $\Delta G_m^{\mathrm{LPB}}$ alone (Supplementary Data
Figure 2). The difference is often on the order of +1–2 pKa
units. All pKa calculations reported here were performed
using the nonlinear correction, except where noted
otherwise. A separate set of values, a, b and c, is computed
for each NMR or crystal structure. The resulting nonlinear
corrections are similar within each set of structures and
salt conditions (see Supplementary Data Figure 1). Note
that $\Delta G_m^{\mathrm{NLPB(apprx)}}$ is used only in the context of the MC
procedure.
Individual contributions to the electrostatic free
energy
The electrostatic free energy contribution due to
desolvation is defined by equation (4a). We note here
that values for ΔGself(i) obtained from the LPB and NLPB
are nearly identical (data not shown).
In order to obtain a measure of the electrostatic effects
due to phosphate groups and to other bases, we have
calculated electrostatic potentials at the N1 atom of each
adenosine and the N3 atom of each cytidine. Although we
recognize that the potential obtained from the NLPB is not
additive, the terms we report are related directly to RNA
structural features and this provides insight as to the
source of the pKa shifts. Contributions due to phosphate
groups, which include the atoms P, O1P, O2P, O5′ and
O3′, are obtained by assuming these groups to be charged,
while all other atoms in the RNA are kept neutral.
Multiplying these potentials by +1e, to reflect a change in
ionization state at the site of protonation, yields the
contribution to the electrostatic free energies that is
reported in Figure 3 and Table 4. In order to calculate
the electrostatic potentials due to the bases, we keep the
phosphate groups charged and calculate the differential
potential when the atoms in the bases are assumed to be
charged relative to when they are assumed to be neutral.
This can be done for all the bases in RNA or for an
individual nucleotide.
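The quadratic correction of equations (9)–(11) can be sketched as follows. This is a minimal illustration, assuming that the three reference microstates differ in Z so that three (Z, ΔGcorr) points determine a, b and c exactly. The function names and all numerical values are hypothetical and are not results from this work.

```python
def fit_quadratic(points):
    """Return (a, b, c) of the parabola through three (Z, dG_corr) points."""
    (z0, g0), (z1, g1), (z2, g2) = points
    d0 = (z0 - z1) * (z0 - z2)
    d1 = (z1 - z0) * (z1 - z2)
    d2 = (z2 - z0) * (z2 - z1)
    # Expand the Lagrange interpolating polynomial into powers of Z.
    a = g0 / d0 + g1 / d1 + g2 / d2
    b = -(g0 * (z1 + z2) / d0 + g1 * (z0 + z2) / d1 + g2 * (z0 + z1) / d2)
    c = g0 * z1 * z2 / d0 + g1 * z0 * z2 / d1 + g2 * z0 * z1 / d2
    return a, b, c


def corrected_energy(dg_lpb, z_m, coeffs):
    """Equation (11): LPB microstate energy plus the quadratic correction."""
    a, b, c = coeffs
    return dg_lpb + a * z_m ** 2 + b * z_m + c


# Hypothetical corrections (NLPB minus LPB) for microstates with Z = 0, 2 and 5.
reference = [(0, 1.0), (2, -1.0), (5, 3.5)]
coeffs = fit_quadratic(reference)
approx = corrected_energy(-10.0, 3, coeffs)
print("a, b, c =", coeffs)
print("approximate NLPB energy for Z = 3:", approx)
```

Because the correction depends only on Z, it can be added to the LPB energy of any microstate during Monte Carlo sampling without solving the NLPB again.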
The determination of partial atomic charges and radii
The solution to the PB equations relies on a detailed
atomic description of partial charges within the RNA
along with its molecular surface. Since standard molecular
mechanics force-fields do not provide partial charges for
ionized forms of AMP and CMP, new partial atomic
charges were calculated for these nucleotides. Our
philosophy was to devise a simple way to generate partial
charges that, when combined with appropriate radii,
would be consistent with the experimental literature
concerning the solvation energies of nucleobase derivatives. To do this, we used a philosophy similar to that used
in the development of the AMBER atom-centered charges
and a PARSE-like strategy for the selection of appropriate
atomic radii.110,111 Atomic radii are used to describe the
solvent-accessible molecular surface (and hence, the
dielectric boundary) between solute and the solvent for
calculations used here. The hydrogen radius was assigned
a value of 1.10 Å. These and other atomic radii were
chosen, in part, for their ability to reproduce trends of
solvation in the four nucleobases. Consistent with PARSE
radii for amino acids, Pauling's atomic radii were assigned
to all heavy atoms. Thus, the atomic radius of phosphorus
was assigned using its literature value of 1.90 Å.112 (See
Supplementary Data Table 1).
Partial charges were generated by fitting atom-centered
charges to electrostatic potentials (ESP) derived ab initio
using the B3LYP/6-31g* level of theory and using the
program Gaussian 98 (gaussian.com). Nine calculations
were performed: one for each of six ribonucleosides (A,
A+ , C, C+ , G and U), and one for each of three
conformations of dimethyl-phosphate (gauche-gauche,
gauche-trans and trans-trans). The partial charges on ribose
atoms C5′, H5′1, H5′2, C4′, H4′, O4′, C3′, H3′, C2′, O2′,
HO2 were made equivalent in all six ribonucleosides by
averaging the corresponding partial charges for each
atom. Excess charges were redistributed over the atoms
C1′, H1′ and N1/9 (nitrogen involved in the glycosidic
bond) to ensure the net charge per nucleotide was integral.
A single set of partial charges was obtained for the
phosphate atoms P, O1P, O2P, O3′, O5′ by averaging the
corresponding partial charges in the three conformers. The
protons in each pair, H5′1/H5′2 (ribose), H21/H22
(guanosine), H41/H42 (cytidine), H61/H62 (adenosine),
were made equivalent by redistributing the partial charge
evenly between the two protons. The overall redistribution of charge resulting from this procedure was very
small. Partial charges for all remaining nucleobase atoms
were not modified. United atoms were created for all RNA
hydroxyl groups, O2′/HO2, O3T/H3T (3′ terminus),
O5T/H5T (5′ terminus), by summing the partial charge
on the oxygen and hydrogen atoms and placing the sum at
the coordinates of the oxygen atom. This procedure
produced partial charges that were not significantly
different from those of AMBER 94 or CHARMM 27. (See
Supplementary Data Table 2; atom names and nucleotide
structures are given in Supplementary Data Figure 3).
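The charge post-processing steps described above can be sketched as follows. This is an illustration only: the charges in the example are made-up placeholders rather than fitted ESP values, the function names are hypothetical, and the even split of excess charge over C1′, H1′ and N9 is an assumption, since the text does not state how the excess was apportioned.

```python
def equalize_pair(charges, atom_a, atom_b):
    """Give two equivalent protons the mean of their fitted charges."""
    out = dict(charges)
    out[atom_a] = out[atom_b] = 0.5 * (charges[atom_a] + charges[atom_b])
    return out


def make_united_atom(charges, oxygen, hydrogen):
    """Place the summed hydroxyl charge on the oxygen and drop the hydrogen."""
    out = dict(charges)
    out[oxygen] = out[oxygen] + out.pop(hydrogen)
    return out


def redistribute_excess(charges, target_net, sink_atoms):
    """Shift charge onto the sink atoms so the total equals target_net.

    Assumption: the excess is divided evenly among the sink atoms.
    """
    out = dict(charges)
    share = (target_net - sum(charges.values())) / len(sink_atoms)
    for name in sink_atoms:
        out[name] += share
    return out


# Placeholder charges for a fragment of a neutral nucleoside.
fragment = {"O2'": -0.62, "HO2": 0.41, "H5'1": 0.08, "H5'2": 0.06,
            "C1'": 0.10, "H1'": 0.12, "N9": -0.05}
step1 = equalize_pair(fragment, "H5'1", "H5'2")
step2 = make_united_atom(step1, "O2'", "HO2")
final = redistribute_excess(step2, 0.0, ["C1'", "H1'", "N9"])
print(final)
```

Each step conserves the total charge except the last, which adjusts it to the target integral value.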
To validate the partial charges and atomic radii set,
solvation free energies from gas to water were calculated
by summing electrostatic and non-polar contributions to
solvation111 (equations (1), (3), (4) therein) and the results
were compared to the solvation free energy determined
for 9-methyladenine; this quantity was derived originally
in the work by Ferguson et al.113 using the experimentally
measured heat of vaporization of 9-methyladenosine.
Based on the comparison of calculated solvation free
energies for 9-methyladenine for various hydrogen radii,
the radius of 1.10 Å was chosen. The relative solubilities
of the nucleobases have been determined on the basis of
their ability to partition between water and chloroform
as well as between water and cyclohexane, where in
order of hydrophilicity, G > C > U > A.114,115 The calculated solvation free energy for 9-methyladenine is consistent with the experimental value and the remaining
calculated solvation free energies are consistent with the
hydrophilicity scale established by Wolfenden and coworkers. (Supplementary Data Tables 3 and 4).
Finally, we scale atomic radii in order to use them in PB
calculations where the solute dielectric of RNA is set to a
value greater than 1. In particular, we scale the atomic
radius by 87% when working with ε = 4, the value used in
the pKa calculations. Atomic radius scaling was used in
calculations of solvation free energy by Sitkoff et al.111 The
rationale is to maintain the same solvation free energy of
individual nucleotides as calculated for ε = 1 when
an alternate value for the solute dielectric is used. The scaling
factor maintains the balance between solvation and
pairwise energies involved in the calculation of pKa shifts.
Tests of these parameters (atomic radii, internal dielectric)
for pKa calculations revealed that the values chosen were
quite reasonable, as shown in Results. We emphasize that
the choice of the scaling factor was obtained independently of any pKa calculation, and was not in any way
chosen so as to fit experimental pKas.
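The radius scaling can be written as a one-line transformation. The sketch below uses only the two radii and the single scaling factor quoted in the text; the function and table names are hypothetical, and no factor is assumed for dielectric values other than 1 and 4.

```python
# Scaling factors by solute dielectric; only eps = 4 (87%) is given in the text.
SCALE_BY_DIELECTRIC = {1.0: 1.00, 4.0: 0.87}


def scaled_radii(radii, solute_dielectric):
    """Return atomic radii (Angstrom) scaled for the chosen solute dielectric."""
    if solute_dielectric not in SCALE_BY_DIELECTRIC:
        raise ValueError("no scaling factor known for this dielectric constant")
    factor = SCALE_BY_DIELECTRIC[solute_dielectric]
    return {atom: radius * factor for atom, radius in radii.items()}


base = {"H": 1.10, "P": 1.90}  # radii quoted in the text
scaled = scaled_radii(base, 4.0)
print(scaled)
```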
Preparation of structures before calculation
Coordinates of RNA structures were obtained from the
Protein Data Bank (PDB). The following structures were
used in this work: 17ra (BPH),64 1ldz and 2ldz (LDZ),72
437d and 1l2x (BWYV-ψ),66,116 1kpy and 1kpz (PEMV-ψ),67
1cx0, 1drz and 1vc5 (HDVR),68,69 1m5k and 1m5v (hairpin
ribozyme).70,71 Crystallographic water and all metal ions
were removed from the structures and are not included in
the calculations. NMR structures having multiple conformations were separated and treated individually. The
topology and parameter files were modified for the X-PLOR program to handle the protonation of ionized
nucleotides. Hydrogen atoms for all nucleotides, ionized
or neutral, were added using the X-PLOR program holding
heavy-atom positions fixed.117 The modified X-PLOR
topology and parameter files are available upon request.
Calculations were performed on the proton-added structures without further minimization. In the structures for
BWYV-ψ, the 5′ triphosphate terminus was removed and
replaced with a standard O5′ terminus. The product
structure of the hairpin ribozyme contains 2′-3′-cyclic
phosphate between A12 and G13 of the cleaved substrate
strand. To obtain pKas of the ribozyme in the product
conformation, partial charges were first determined for the
2′-3′-cyclic phosphate using the ESP protocol described
above.
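The first preparation steps (removing waters and metal ions, and separating NMR models) can be sketched as a small PDB filter. This is not the procedure used in this work, which relied on X-PLOR for hydrogen placement. The function name is hypothetical, and the sketch assumes that all nucleotides of interest are stored as ATOM records while waters and ions are stored as HETATM records.

```python
def split_models_keep_polymer(pdb_text):
    """Return a list of models, each a list of ATOM record lines.

    HETATM records (waters, metal ions) are dropped. Assumption: every
    nucleotide to be kept is stored as ATOM records.
    """
    models, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(current)
            current = []
        elif record == "ATOM":
            current.append(line)
    if current:  # single-model file without MODEL/ENDMDL records
        models.append(current)
    return models


example = "\n".join([
    "MODEL        1",
    "ATOM      1  P     A A   1       0.000   0.000   0.000  1.00  0.00           P",
    "HETATM    2  O   HOH A 101       5.000   0.000   0.000  1.00  0.00           O",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  P     A A   1       0.100   0.000   0.000  1.00  0.00           P",
    "HETATM    2 MG    MG A 102       6.000   0.000   0.000  1.00  0.00          MG",
    "ENDMDL",
])
models = split_models_keep_polymer(example)
print(len(models), "models;", [len(m) for m in models], "atoms kept per model")
```

Each returned model can then be passed separately to hydrogen placement and to the pKa calculation.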
Source code and additional parameters
All the source code used in this work, including our
modified version of MCCE, will be made available via the
website†.
In general, all parameters not otherwise discussed here
are given in Supplementary Data Table 5.
Acknowledgements
We thank Donald Petrey for assistance with
GRASP2, Li Xi for assistance with Gaussian98, and
Kevin Keating for calculations of η-θ angles. We are
grateful to Lucy Forrest, Mickey Kosloff and Remo
Rohs for many helpful comments in the writing of
the manuscript.
References
1. Izatt, R. M., Christensen, J. J. & Rytting, J. H. (1971).
Sites and thermodynamic quantities associated with
proton and metal ion interaction with ribonucleic
acid, deoxyribonucleic acid, and their constituent
bases, nucleosides, and nucleotides. Chem. Rev. 71,
439–481.
2. Saenger, W. (1984). Principles of Nucleic Acid Structure.
Springer-Verlag, New York.
3. Gao, X. L. & Patel, D. J. (1987). NMR studies of A.C
mismatches in DNA dodecanucleotides at acidic pH.
Wobble A(anti).C(anti) pair formation. J. Biol. Chem.
262, 16973–16984.
4. Cai, Z. & Tinoco, I., Jr (1996). Solution structure of
loop A from the hairpin ribozyme from tobacco
ringspot virus satellite. Biochemistry, 35, 6026–6036.
5. Asensio, J. L., Lane, A. N., Dhesi, J., Bergqvist, S. &
Brown, T. (1998). The contribution of cytosine
protonation to the stability of parallel DNA triple
helices. J. Mol. Biol. 275, 811–822.
6. Jang, S. B., Hung, L. W., Chi, Y. I., Holbrook, E. L.,
Carter, R. J. & Holbrook, S. R. (1998). Structure of an
† http://wiki.c2b2.columbia.edu/honiglab _ public/
index.php/RNA
1493
Calculating pKas in RNA
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
RNA internal loop consisting of tandem C-A+ basepairs. Biochemistry, 37, 11726–11731.
Durant, P. C. & Davis, D. R. (1999). Stabilization of
the anticodon stem-loop of tRNALys,3 by an A+-C
base-pair and by pseudouridine. J. Mol. Biol. 285,
115–131.
Morse, S. E. & Draper, D. E. (1995). Purine-purine
mismatches in RNA helices: evidence for protonated
G.A pairs and next-nearest neighbor effects. Nucl.
Acids Res. 23, 302–306.
Ravindranathan, S., Butcher, S. E. & Feigon, J. (2000).
Adenine protonation in domain B of the hairpin
ribozyme. Biochemistry, 39, 16026–16032.
Bink, H. H., Hellendoorn, K., van der Meulen, J. &
Pleij, C. W. (2002). Protonation of non-Watson-Crick
base-pairs and encapsidation of turnip yellow
mosaic virus RNA. Proc. Natl Acad. Sci. USA, 99,
13465–13470.
Blanchard, S. C. & Puglisi, J. D. (2001). Solution
structure of the A loop of 23S ribosomal RNA. Proc.
Natl Acad. Sci. USA, 98, 3720–3725.
Bevilacqua, P. C. (2003). Mechanistic considerations
for general acid-base catalysis by RNA: revisiting the
mechanism of the hairpin ribozyme. Biochemistry, 42,
2259–2265.
Bevilacqua, P. C., Brown, T. S., Nakano, S. & Yajima,
R. (2004). Catalytic roles for proton transfer and
protonation in ribozymes. Biopolymers, 73, 90–109.
Nakano, S., Chadalavada, D. M. & Bevilacqua, P. C.
(2000). General acid-base catalysis in the mechanism
of a hepatitis delta virus ribozyme. Science, 287,
1493–1497.
Oyelere, A. K., Kardon, J. R. & Strobel, S. A. (2002).
pKa perturbation in genomic Hepatitis Delta Virus
ribozyme catalysis evidenced by nucleotide analogue
interference mapping. Biochemistry, 41, 3667–3675.
Perrotta, A. T., Shih, I. & Been, M. D. (1999).
Imidazole rescue of a cytosine mutation in a selfcleaving ribozyme. Science, 286, 123–126.
Wadkins, T. S., Shih, I., Perrotta, A. T. & Been, M. D.
(2001). A pH-sensitive RNA tertiary interaction
affects self-cleavage activity of the HDV ribozymes
in the absence of added divalent metal ion. J. Mol.
Biol. 305, 1045–1055.
Shih, I. H. & Been, M. D. (2001). Involvement of a
cytosine side chain in proton transfer in the ratedetermining step of ribozyme self-cleavage. Proc.
Natl Acad. Sci. USA, 98, 1489–1494.
Das, S. R. & Piccirilli, J. A. (2005) General acid
catalysis by the hepatitis delta virus ribozyme 1,
45–52.
Kuzmin, Y. I., Da Costa, C. P., Cottrell, J. W. & Fedor,
M. J. (2005). Role of an active site adenine in hairpin
ribozyme catalysis. J. Mol. Biol. 349, 989–1010.
Kuzmin, Y. I., Da Costa, C. P. & Fedor, M. J. (2004).
Role of an active site guanine in hairpin ribozyme
catalysis probed by exogenous nucleobase rescue.
J. Mol. Biol. 340, 233–251.
Lebruska, L. L., Kuzmine, Y. I. & Fedor, M. J. (2002).
Rescue of an abasic hairpin ribozyme by cationic
nucleobases: evidence for a novel mechanism of
RNA catalysis. Chem. Biol. 9, 465–473.
Ryder, S. P., Oyelere, A. K., Padilla, J. L., Klostermeier, D., Millar, D. P. & Strobel, S. A. (2001).
Investigation of adenosine base ionization in the
hairpin ribozyme by nucleotide analog interference
mapping. RNA, 7, 1454–1463.
Wilson, T. J., Ouellet, J., Zhao, Z. Y., Harusawa, S.,
Araki, L., Kurihara, T. & Lilley, D. M. (2006).
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
Nucleobase catalysis in the hairpin ribozyme. RNA,
12, 980–987.
Lee, J. C. & Gutell, R. R. (2004). Diversity of base-pair
conformations and their occurrence in rRNA structure and RNA structural motifs. J. Mol. Biol. 344,
1225–1249.
Xiong, L., Polacek, N., Sander, P., Bottger, E. C. &
Mankin, A. (2001). pKa of adenine 2451 in the
ribosomal peptidyl transferase center remains elusive. RNA, 7, 1365–1369.
Muth, G. W., Chen, L., Kosek, A. B. & Strobel, S. A.
(2001). pH-dependent conformational flexibility
within the ribosomal peptidyl transferase center.
RNA, 7, 1403–1415.
Yang, A. S. & Honig, B. (1994). Structural origins of
pH and ionic strength effects on protein stability. Acid
denaturation of sperm whale apomyoglobin. J. Mol.
Biol. 237, 602–614.
Bullough, P. A., Hughson, F. M., Skehel, J. J. & Wiley,
D. C. (1994). Structure of influenza haemagglutinin at
the pH of membrane fusion. Nature, 371, 37–43.
Frick, D. N., Rypma, R. S., Lam, A. M. & Frenz, C. M.
(2004). Electrostatic analysis of the hepatitis C virus
NS3 helicase reveals both active and allosteric site
locations. Nucl. Acids Res. 32, 5519–5528.
Ondrechen, M. J., Clifton, J. G. & Ringe, D. (2001).
THEMATICS: a simple computational predictor of
enzyme function from structure. Proc. Natl Acad. Sci.
USA, 98, 12473–12478.
Doudna, J. A. & Cech, T. R. (2002). The chemical
repertoire of natural ribozymes. Nature, 418, 222–228.
Fedor, M. J. & Williamson, J. R. (2005). The catalytic
diversity of RNAs. Nature Rev. Mol. Cell. Biol. 6,
399–412.
Demchuk, E. & Wade, R. C. (1996). Improving the
continuum dielectric approach to calculating pKas of
ionizable groups in proteins. J. Phys. Chem. 100,
17373–17387.
Nielsen, J. E. & Vriend, G. (2001). Optimizing the
hydrogen-bond network in Poisson-Boltzmann
equation-based pKa calculations. Proteins: Struct.
Funct. Genet. 43, 403–412.
Mehler, E. L. & Guarnieri, F. (1999). A self-consistent,
microenvironment modulated screened coulomb
potential approximation to calculate pH-dependent
electrostatic effects in proteins. Biophys. J. 77, 3–22.
Bashford, D. & Karplus, M. (1990). pKas of ionizable
groups in proteins: atomic detail from a continuum
electrostatic model. Biochemistry, 29, 10219–10225.
Antosiewicz, J., McCammon, J. A. & Gilson, M. K.
(1994). Prediction of pH-dependent properties of
proteins. J. Mol. Biol. 238, 415–436.
Antosiewicz, J., McCammon, J. A. & Gilson, M. K.
(1996). The determinants of pKas in proteins.
Biochemistry, 35, 7819–7833.
Yang, A. S. & Honig, B. (1993). On the pH
dependence of protein stability. J. Mol. Biol. 231,
459–474.
Yang, A. S., Gunner, M. R., Sampogna, R., Sharp, K.
& Honig, B. (1993). On the calculation of pKas in
proteins. Proteins: Struct. Funct. Genet. 15, 252–265.
Li, H., Robertson, A. D. & Jensen, J. H. (2005). Very
fast empirical prediction and rationalization of
protein pKa values. Proteins: Struct. Funct. Genet. 61,
704–721.
Petrov, A. S., Lamm, G. & Pack, G. R. (2004). The
triplex-hairpin transition in cytosine-rich DNA.
Biophys. J. 87, 3954–3973.
Misra, V. K., Sharp, K. A., Friedman, R. A. & Honig,
1494
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
B. (1994). Salt effects on ligand-DNA binding. Minor
groove binding antibiotics. J. Mol. Biol. 238, 245–263.
Misra, V. K., Hecht, J. L., Sharp, K. A., Friedman, R. A.
& Honig, B. (1994). Salt effects on protein-DNA
interactions. The lambda cI repressor and EcoRI
endonuclease. J. Mol. Biol. 238, 264–280.
Ben-Tal, N., Honig, B., Peitzsch, R. M., Denisov, G. &
McLaughlin, S. (1996). Binding of small basic peptides to membranes containing acidic lipids: theoretical models and experimental results. Biophys. J. 71,
561–575.
Hecht, J. L., Honig, B., Shin, Y. K. & Hubbell, W. L.
(1995). Electrostatic potentials near-the-surface of
DNA - Comparing Theory and Experiment. J. Phys.
Chem. 99, 7782–7786.
Misra, V. K. & Honig, B. (1995). On the magnitude of
the electrostatic contribution to ligand-DNA interactions. Proc. Natl Acad. Sci. USA, 92, 4691–4695.
Misra, V. K. & Draper, D. E. (1999). The interpretation
of Mg2+ binding isotherms for nucleic acids using
Poisson–Boltzmann theory. J. Mol. Biol. 294, 1135–1147.
Misra, V. K. & Draper, D. E. (2000). Mg2+ binding to
tRNA revisited: the nonlinear Poisson–Boltzmann
model. J. Mol. Biol. 299, 813–825.
Misra, V. K. & Draper, D. E. (2001). A thermodynamic
framework for Mg2+ binding to RNA. Proc. Natl Acad.
Sci. USA, 98, 12456–12461.
Misra, V. K. & Draper, D. E. (2002). The linkage between magnesium binding and RNA folding. J. Mol.
Biol. 317, 507–521.
Misra, V. K., Shiman, R. & Draper, D. E. (2003). A
thermodynamic framework for the magnesium-dependent folding of RNA. Biopolymers, 69, 118–136.
Sharp, K. A. & Honig, B. (1990). Calculating total
electrostatic energies with the nonlinear PoissonBoltzmann equation. J. Phys. Chem. 94, 7684–7692.
Murthy, C. S., Bacquet, R. J. & Rossky, P. J. (1985).
Ionic distributions near poly-electrolytes–a comparison of theoretical approaches. J. Phys. Chem. 89,
701–710.
Bacquet, R. & Rossky, P. J. (1984). Ionic atmosphere of
rodlike poly-electrolytes–a hypernetted chain study.
J. Phys. Chem. 88, 2660–2669.
Svensson, B., Jonsson, B. & Woodward, C. E. (1990).
Monte-Carlo simulations of an electric double-layer.
J. Phys. Chem. 94, 2105–2113.
Guldbrand, L., Jonsson, B., Wennerstrom, H. & Linse,
P. (1984). Electrical double-layer forces—a MonteCarlo study. J. Chem. Phys. 80, 2221–2228.
Vorobjev, Y. N., Scheraga, H. A., Hitz, B. & Honig, B.
(1994). Theoretical modeling of electrostatic effects of
titratable side-chain groups on protein conformation
in a polar ionic solution. 1. Potential of mean force
between charged lysine residues and titration of poly
(L-lysine) in 95-percent methanol solution. J. Phys.
Chem. 98, 10940–10948.
Alexov, E. G. & Gunner, M. R. (1997). Incorporating
protein conformational flexibility into the calculation
of pH-dependent protein properties. Biophys. J. 72,
2075–2093.
Alexov, E. G. & Gunner, M. R. (1999). Calculated
protein and proton motions coupled to electron
transfer: electron transfer from QA- to QB in bacterial
photosynthetic reaction centers. Biochemistry, 38,
8253–8270.
Gunner, M. R. & Alexov, E. (2000). A pragmatic
approach to structure based calculation of coupled
proton and electron transfer in proteins. Biochim.
Biophys. Acta, 1458, 63–87.
Calculating pKas in RNA
63. Forrest, L. R. & Honig, B. (2005). An assessment of
the accuracy of methods for predicting hydrogen
positions in protein structures. Proteins: Struct. Funct.
Genet. 61, 296–309.
64. Smith, J. S. & Nikonowicz, E. P. (1998). NMR
structure and dynamics of an RNA motif common
to the spliceosome branch-point helix and the RNA-binding site for phage GA coat protein. Biochemistry,
37, 13486–13498.
65. Legault, P. & Pardi, A. (1997). Unusual dynamics and
pKa shift at the active site of a lead-dependent
ribozyme. J. Am. Chem. Soc. 119, 6621–6628.
66. Su, L., Chen, L., Egli, M., Berger, J. M. & Rich, A.
(1999). Minor groove RNA triplex in the crystal
structure of a ribosomal frameshifting viral pseudoknot. Nature Struct. Biol. 6, 285–292.
67. Nixon, P. L., Rangan, A., Kim, Y. G., Rich, A.,
Hoffman, D. W., Hennig, M. & Giedroc, D. P. (2002).
Solution structure of a luteoviral P1-P2 frameshifting
mRNA pseudoknot. J. Mol. Biol. 322, 621–633.
68. Ferre-D'Amare, A. R., Zhou, K. & Doudna, J. A.
(1998). Crystal structure of a hepatitis delta virus
ribozyme. Nature, 395, 567–574.
69. Ke, A., Zhou, K., Ding, F., Cate, J. H. & Doudna, J. A.
(2004). A conformational switch controls hepatitis
delta virus ribozyme catalysis. Nature, 429, 201–205.
70. Rupert, P. B. & Ferre-D'Amare, A. R. (2001). Crystal
structure of a hairpin ribozyme-inhibitor complex
with implications for catalysis. Nature, 410, 780–786.
71. Rupert, P. B., Massey, A. P., Sigurdsson, S. T. & Ferre-D'Amare, A. R. (2002). Transition state stabilization
by a catalytic RNA. Science, 298, 1421–1424.
72. Hoogstraten, C. G., Legault, P. & Pardi, A. (1998).
NMR solution structure of the lead-dependent
ribozyme: evidence for dynamics in RNA catalysis.
J. Mol. Biol. 284, 337–350.
73. Legault, P., Hoogstraten, C. G., Metlitzky, E. & Pardi,
A. (1998). Order, dynamics and metal-binding in the
lead-dependent ribozyme. J. Mol. Biol. 284, 325–335.
74. Nixon, P. L., Cornish, P. V., Suram, S. V. & Giedroc,
D. P. (2002). Thermodynamic analysis of conserved
loop-stem interactions in P1-P2 frameshifting RNA
pseudoknots from plant Luteoviridae. Biochemistry,
41, 10665–10674.
75. Nixon, P. L. & Giedroc, D. P. (2000). Energetics of
a strongly pH-dependent RNA tertiary structure
in a frameshifting pseudoknot. J. Mol. Biol. 296,
659–671.
76. Moody, E. M., Lecomte, J. T. & Bevilacqua, P. C.
(2005). Linkage between proton binding and folding
in RNA: a thermodynamic framework and its
experimental application for investigating pKa shifting. RNA, 11, 157–172.
77. Nakano, S., Proctor, D. J. & Bevilacqua, P. C. (2001).
Mechanistic characterization of the HDV genomic
ribozyme: assessing the catalytic and structural contributions of divalent metal ions within a multichannel
reaction mechanism. Biochemistry, 40, 12022–12038.
78. Bevilacqua, P. C., Brown, T. S., Chadalavada, D.,
Lecomte, J., Moody, E. & Nakano, S. I. (2005).
Linkage between proton binding and folding in
RNA: implications for RNA catalysis. Biochem. Soc.
Trans. 33, 466–470.
79. Shih, I. H. & Been, M. D. (2002). Catalytic strategies
of the hepatitis delta virus ribozymes. Annu. Rev.
Biochem. 71, 887–917.
80. Perrotta, A. T. & Been, M. D. (2006). HDV ribozyme
activity in monovalent cations. Biochemistry, 45,
11357–11365.
81. Perrotta, A. T., Wadkins, T. S. & Been, M. D. (2006).
Chemical rescue, multiple ionizable groups, and
general acid-base catalysis in the HDV genomic
ribozyme. RNA, 12, 1282–1291.
82. Kumar, P. K., Suh, Y. A., Miyashiro, H., Nishikawa, F.,
Kawakami, J., Taira, K. & Nishikawa, S. (1992).
Random mutations to evaluate the role of bases at
two important single-stranded regions of genomic
HDV ribozyme. Nucl. Acids Res. 20, 3919–3924.
83. Belinsky, M. G., Britton, E. & Dinter-Gottlieb, G.
(1993). Modification interference analysis of a self-cleaving RNA from hepatitis delta virus. FASEB J. 7,
130–136.
84. Suh, Y. A., Kumar, P. K., Kawakami, J., Nishikawa, F.,
Taira, K. & Nishikawa, S. (1993). Systematic substitution of individual bases in two important single-stranded regions of the HDV ribozyme for evaluation
of the role of specific bases. FEBS Letters, 326,
158–162.
85. Tanner, N. K., Schaff, S., Thill, G., Petit-Koskas, E.,
Crain-Denoyelle, A. M. & Westhof, E. (1994). A three-dimensional model of hepatitis delta virus ribozyme
based on biochemical and mutational analyses. Curr.
Biol. 4, 488–498.
86. Nesbitt, S. M., Erlacher, H. A. & Fedor, M. J. (1999).
The internal equilibrium of the hairpin ribozyme:
temperature, ion and pH effects. J. Mol. Biol. 286,
1009–1024.
87. Grasby, J. A., Mersmann, K., Singh, M. & Gait, M. J.
(1995). Purine functional groups in essential residues
of the hairpin ribozyme required for catalytic
cleavage of RNA. Biochemistry, 34, 4068–4076.
88. Ryder, S. P. & Strobel, S. A. (1999). Nucleotide analog
interference mapping of the hairpin ribozyme:
implications for secondary and tertiary structure
formation. J. Mol. Biol. 291, 295–311.
89. Salter, J., Krucinska, J., Alam, S., Grum-Tokars, V. &
Wedekind, J. E. (2006). Water in the active site of an
all-RNA hairpin ribozyme and effects of Gua8 base
variants on the geometry of phosphoryl transfer.
Biochemistry, 45, 686–700.
90. Wadley, L. M. & Pyle, A. M. (2004). The identification
of novel RNA structural motifs using COMPADRES:
an automated approach to structural discovery. Nucl.
Acids Res. 32, 6650–6659.
91. Gilson, M. K. & Honig, B. H. (1986). The dielectric constant of a folded protein. Biopolymers, 25,
2097–2119.
92. Reiter, N. J., Blad, H., Abildgaard, F. & Butcher, S. E.
(2004). Dynamics in the U6 RNA intramolecular
stem-loop: a base flipping conformational change.
Biochemistry, 43, 13739–13747.
93. Misra, V. K. & Draper, D. E. (1998). On the role of
magnesium ions in RNA stability. Biopolymers, 48,
113–135.
94. Draper, D. E., Grilley, D. & Soto, A. M. (2005). Ions
and RNA folding. Annu. Rev. Biophys. Biomol. Struct.
34, 221–243.
95. Chin, K., Sharp, K. A., Honig, B. & Pyle, A. M. (1999).
Calculating the electrostatic properties of RNA
provides new insights into molecular interactions
and function. Nature Struct. Biol. 6, 1055–1061.
96. Leontis, N. B., Stombaugh, J. & Westhof, E. (2002). The
non-Watson-Crick base-pairs and their associated
isostericity matrices. Nucl. Acids Res. 30, 3497–3531.
97. Leontis, N. B. & Westhof, E. (2001). Geometric
nomenclature and classification of RNA base-pairs.
RNA, 7, 499–512.
98. Lemieux, S. & Major, F. (2002). RNA canonical and
non-canonical base-pairing types: a recognition
method and complete repertoire. Nucl. Acids Res.
30, 4250–4263.
99. Walberer, B. J., Cheng, A. C. & Frankel, A. D. (2003).
Structural diversity and isomorphism of hydrogen-bonded
base interactions in nucleic acids. J. Mol. Biol.
327, 767–780.
100. Hunter, W. N., Brown, T., Anand, N. N. & Kennard,
O. (1986). Structure of an adenine-cytosine base-pair
in DNA and its implications for mismatch repair.
Nature, 320, 552–555.
101. Flinders, J. & Dieckmann, T. (2001). A pH controlled
conformational switch in the cleavage site of
the VS ribozyme substrate RNA. J. Mol. Biol. 308,
665–679.
102. Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N.,
Caruthers, M. H., Neilson, T. & Turner, D. H. (1986).
Improved free-energy parameters for predictions of
RNA duplex stability. Proc. Natl Acad. Sci. USA, 83,
9373–9377.
103. Yildirim, I. & Turner, D. H. (2005). RNA challenges
for computational chemists. Biochemistry, 44,
13225–13234.
104. Moody, E. M., Brown, T. S. & Bevilacqua, P. C. (2004).
Simple method for determining nucleobase pKa
values by indirect labeling and demonstration of a
pKa of neutrality in dsDNA. J. Am. Chem. Soc. 126,
10200–10201.
105. Murray, J. B., Seyhan, A. A., Walter, N. G., Burke, J. M.
& Scott, W. G. (1998). The hammerhead, hairpin and
VS ribozymes are catalytically proficient in
monovalent cations alone. Chem. Biol. 5, 587–595.
106. Fedor, M. J. (2000). Structure and function of the
hairpin ribozyme. J. Mol. Biol. 297, 269–291.
107. Alexov, E. (2003). Role of the protein side-chain
fluctuations on the strength of pair-wise electrostatic
interactions: comparing experimental with
computed pKas. Proteins: Struct. Funct. Genet. 50, 94–103.
108. Gilson, M. K. & Honig, B. (1988). Calculation of the
total electrostatic energy of a macromolecular
system: solvation energies, binding energies, and
conformational analysis. Proteins: Struct. Funct. Genet.
4, 7–18.
109. Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E.,
Chiabrera, A. & Honig, B. (2002). Rapid grid-based
construction of the molecular surface and the use
of induced surface charge to calculate reaction
field energies: applications to the molecular
systems and geometric objects. J. Comput. Chem. 23,
128–137.
110. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R.,
Merz, K. M., Ferguson, D. M. et al. (1995). A second
generation force field for the simulation of proteins,
nucleic acids, and organic molecules. J. Am. Chem.
Soc. 117, 5179–5197.
111. Sitkoff, D., Sharp, K. A. & Honig, B. (1994).
Correlating solvation free energies and surface
tensions of hydrocarbon solutes. Biophys. Chem. 51,
397–403; discussion 404–409.
112. Pauling, L. (1960). The Nature of the Chemical Bond, 3rd
edit. Cornell University Press.
113. Ferguson, D. M., Radmer, R. J. & Kollman, P. A.
(1991). Determination of the relative binding free
energies of peptide inhibitors to the HIV-1 protease.
J. Med. Chem. 34, 2654–2659.
114. Cullis, P. M. & Wolfenden, R. (1981). Affinities of
nucleic acid bases for solvent water. Biochemistry, 20,
3024–3028.
115. Shih, P., Pedersen, L. G., Gibbs, P. R. & Wolfenden, R.
(1998). Hydrophobicities of the nucleic acid bases:
distribution coefficients from water to cyclohexane.
J. Mol. Biol. 280, 421–430.
116. Egli, M., Minasov, G., Su, L. & Rich, A. (2002). Metal
ions and flexibility in a viral RNA pseudoknot at
atomic resolution. Proc. Natl Acad. Sci. USA, 99,
4302–4307.
117. Brünger, A. T. (1992). X-PLOR Version 3.1. A System
for X-ray Crystallography and NMR. Yale University
Press, New Haven, CT.
Edited by D. E. Draper
(Received 26 July 2006; received in revised form 29 November 2006; accepted 1 December 2006)
Available online 6 December 2006