J. Mol. Biol. (2007) 366, 1475–1496
doi:10.1016/j.jmb.2006.12.001

Calculation of pKas in RNA: On the Structural Origins and Functional Roles of Protonated Nucleotides

Christopher L. Tang 1, Emil Alexov 1, Anna Marie Pyle 2 and Barry Honig 1 ⁎

1 Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, USA
2 Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University, New Haven, CT 06520, USA

pKa calculations based on the Poisson–Boltzmann equation have been widely used to study proteins and, more recently, DNA. However, much less attention has been paid to the calculation of pKa shifts in RNA. There is accumulating evidence that protonated nucleotides can stabilize RNA structure and participate in enzyme catalysis within ribozymes. Here, we calculate the pKa shifts of nucleotides in RNA structures using numerical solutions to the Poisson–Boltzmann equation. We find that significant shifts are predicted for several nucleotides in two catalytic RNAs, the hairpin ribozyme and the hepatitis delta virus ribozyme, and that the shifts are likely to be related to their functions. We explore how different structural environments shift the pKas of nucleotides from their solution values. RNA structures appear to use two basic strategies to shift pKas: (a) the formation of compact structural motifs with structurally-conserved, electrostatic interactions; and (b) the arrangement of the phosphodiester backbone to focus negative electrostatic potential in specific regions.

© 2006 Published by Elsevier Ltd.

*Corresponding author

Keywords: ribozyme; pseudoknot; pKa calculation; Poisson–Boltzmann equation; RNA structure

Introduction

There is increasing evidence that ionized nucleotides play important roles in RNA structure and function.
Adenosine and cytidine can protonate on their N1 and N3 atoms, respectively, but both are poor bases and have pKas in solution that render them neutral at pH 7 (their pKas in solution are 3.8 for adenosine and 4.3 for cytidine; Figure 1).1,2 Nevertheless, there have been numerous examples where, based on the examination of crystal and solution structures,3–11 protonated nucleotides appear to be present in RNA, suggesting that their pKas have been shifted upwards from their solution values.

Present address: E. Alexov, Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA.
Abbreviations used: BPH, branch point helix; LDZ, lead-dependent ribozyme; BWYV, beet western yellows virus; PEMV, pea enation mosaic virus; HDVR, hepatitis delta virus ribozyme; LPB/NLPB, linear/non-linear Poisson–Boltzmann equation; ESP, electrostatic potential; MCCE, multi-conformation continuum electrostatics; MC, Monte Carlo; RMSD, root-mean-square deviation.
E-mail address of the corresponding author: [email protected]
0022-2836/$ - see front matter © 2006 Published by Elsevier Ltd.
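The statement that both bases are essentially neutral at pH 7 follows from the Henderson–Hasselbalch relation. The short sketch below is an illustration added here, not part of the published calculations; the function name `fraction_protonated` is hypothetical, and the only inputs are the two solution pKas quoted above.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a base present in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Solution pKas quoted in the text (adenosine N1, cytidine N3).
SOLUTION_PKA = {"adenosine": 3.8, "cytidine": 4.3}

if __name__ == "__main__":
    for name, pka in SOLUTION_PKA.items():
        # Both fractions come out well below 1% at neutral pH.
        print(f"{name}: {fraction_protonated(pka, 7.0):.2e} protonated at pH 7")
```

With the same function, a nucleotide whose pKa has been shifted to 7 would be half protonated at neutral pH, which is why shifts of a few pKa units matter functionally.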
Nucleotides with elevated pKas have been implicated as playing a direct role in ribozyme catalysis.12,13 For example, many lines of biochemical and structural evidence suggest that the hepatitis delta virus ribozyme (HDVR) and the hairpin ribozyme, in particular, utilize protonated nucleotides or nucleotides with elevated pKas to achieve optimal activity.14–24 Protonated nucleotides have been implicated in a wide variety of structures, ranging from frameshifting pseudoknots to the ribosome itself.25–27 Therefore, a central question is whether nucleotides with shifted pKas play as significant a role in RNA structure and function as they often do in proteins, where pKa-shifted residues affect protein stability,28 control conformational changes,29 modulate binding to substrates,30 and participate in catalytic mechanisms.31 Indeed, the availability of protonated nucleotides would add to the diversity of chemical groups that could be used for function in RNA.32,33

Figure 1. Adenosine and cytidine in their unprotonated (A, C) and protonated (A+, C+) states. Their solution pKas are shown in parentheses.

Here, we seek to understand the structural determinants of pKa shifts, relative to solution values, of nucleotides in RNA using computational methods. Our work builds on the extensive literature that exists for calculating pKas in proteins,34–42 and, more recently, in DNA.43 Most of these methods rely on numerical solutions to the Poisson–Boltzmann (PB) equation to obtain electrostatic contributions to the pKa shift. The linear PB equation (LPB) has been used in most applications in proteins but, given the high charge density on RNA molecules, the nonlinear PB equation (NLPB) is more appropriate. The NLPB has been applied extensively to highly charged systems such as acidic membranes and nucleic acids.
Despite the large charge densities of highly charged molecules and the high mobile ion densities that accumulate in their vicinity, the predictions of the NLPB have been in remarkable agreement with experiment in many cases. Examples include the salt-dependence of binding of proteins and ligands to DNA,44,45 the salt and membrane charge-dependence of the binding of proteins and peptides to membrane surfaces,46 electrostatic potentials around DNA as measured by EPR experiments,47 and the absolute magnitude and salt-dependence of the pKa shift experienced by a ligand that intercalates into DNA.48 The NLPB has also successfully explained the binding isotherms of mixed ion species binding to DNA and RNA,49 the stoichiometry and free energy of magnesium binding to DNA and RNA,50,51 and the magnesium-dependence of RNA folding.52,53 In many of these cases the salt concentration approaches 1 M, a region where the approximations in traditional PB methods such as Debye–Hückel theory are believed not to be valid. However, even the linearization condition inherent in the LPB (eϕ/kT << 1, where ϕ is the electrostatic potential) actually improves at high salt, since the potentials induced by a macromolecule become weaker as the concentration of salt increases. As discussed by Sharp & Honig,54 a more serious problem with Debye–Hückel theory is that it chooses one mobile ion as a fixed charge and all other mobile ions become part of the ion atmosphere. This introduces an artificial asymmetry into the problem that does not exist when the macromolecule is assumed to be fixed and the surrounding salt is treated as mobile. Indeed, one would not normally think of a DNA molecule as part of the ion atmosphere of a mobile sodium ion.
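The remark that the linearization condition eϕ/kT << 1 improves at high salt can be made concrete with a toy estimate. The sketch below is not the PB solver used in this work: it assumes a single unit point charge in water at 298.15 K with a relative permittivity of 78.5, and uses the textbook screened-Coulomb form only to show how the reduced potential at a fixed distance falls as ionic strength rises. The function names are hypothetical.

```python
import math

E = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol


def debye_length_nm(ionic_strength_molar, temp_k=298.15, eps_r=78.5):
    """Debye screening length in nm; eps_r = 78.5 is an assumed value for water."""
    conc = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    lam = math.sqrt(EPS0 * eps_r * KB * temp_k / (2.0 * NA * E**2 * conc))
    return lam * 1e9


def reduced_potential(r_nm, ionic_strength_molar, temp_k=298.15, eps_r=78.5):
    """|e*phi/kT| at distance r_nm from a unit point charge (screened Coulomb form)."""
    bjerrum_nm = E**2 / (4.0 * math.pi * EPS0 * eps_r * KB * temp_k) * 1e9
    lam = debye_length_nm(ionic_strength_molar, temp_k, eps_r)
    return bjerrum_nm * math.exp(-r_nm / lam) / r_nm


if __name__ == "__main__":
    for salt in (0.01, 0.1, 1.0):
        print(salt, debye_length_nm(salt), reduced_potential(1.0, salt))
```

A real polyanion such as RNA produces far larger potentials than a single charge, which is why the nonlinear equation is preferred; the sketch only illustrates the direction of the salt effect.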
The lack of symmetry in macromolecular and colloidal solutions makes it possible to define the electrostatic free energy uniquely within the context of the NLPB, and it is the availability of this formalism that enabled many of the applications that followed.54

All PB methods ignore ion–ion correlation and effects of ion size. However, these work in opposite directions in the sense that ion–ion correlation effects increase the local ion concentration, while ion size effects tend to reduce the local ion concentration by ensuring that two ions do not approach each other too closely. This cancellation may account, in part, for the fact that the NLPB provides such an accurate description of the dependence of electrostatic free energies on salt concentration, as summarized in the previous paragraph. Indeed, the NLPB underestimates ion distributions around cylinders as obtained from Monte Carlo simulations by only 10–15%, and the effect on electrostatic free energies appears to be in this range or smaller.55–58 This is the reason that the NLPB has been applied so effectively to macromolecular systems, despite its well known approximations. Simply stated, the consequences of these approximations, especially the effect on electrostatic free energies involving monovalent ions, do not appear to be severe. Their effect on solutions containing divalent ions is almost certainly more serious, although there have been few experimental tests that have made it possible to determine the magnitude of the problem.

Based on the past successes of applying the NLPB to charged macromolecules, it seems reasonable to apply it to the calculation of pKas in RNA. However, pKa calculations on systems with many titratable groups require the use of methods that take multiple ionization equilibria into account.
As will be discussed below, when a significant number of nucleotides are involved, the existence of a large number of ionization states can introduce computational complexity that, due to the lack of additivity among individual terms, essentially precludes the full use of the NLPB in pKa calculations. In order to deal with this problem, we introduce a method that uses solutions to the LPB for which additivity holds, but then adds a correction term that accounts approximately for missing nonlinear contributions. The approximation is related to one used previously in the treatment of the titration behavior of polylysine.59

Our calculations are based on a Monte Carlo treatment of multiple ionization states. Specifically, we use a modified version of the MCCE method that has been shown to be effective both for pKa calculations in proteins60–62 and for the placement of hydrogen in crystal structures.63 We extend the MCCE method so that it is applicable for pKa calculations in RNA. To this end, we report a new set of atomic parameters for calculating electrostatic potentials in RNA molecules containing protonated nucleotides. Our approach is validated by testing its ability to reproduce quantitatively pKas taken from the literature.

We address the role of shifted pKas in RNA through an analysis of the branch-point helix,64 the lead-dependent ribozyme,65 pseudoknots from the beet western yellows virus66 and the pea enation mosaic plant virus,67 HDVR,68,69 and the hairpin ribozyme.70,71 For cases where experimental data are available, the calculated pKa shifts are in quite good agreement with experimental results. However, based on experience with proteins, it is unlikely that all of the calculated pKas will be in quantitative agreement with experimental values, primarily because conformational changes that accompany changes in ionization state are not taken into account.
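To illustrate what a Monte Carlo treatment of multiple ionization states involves, the following toy sketch samples protonation microstates with the Metropolis criterion and checks the result against exact enumeration. It is not the MCCE program and includes no conformers, PB energies, or nonlinear correction; the intrinsic pKas, the pairwise interaction matrix `pair_kt` (in units of kT), and all function names are assumptions made for the example. Additivity of pairwise terms, which holds for the LPB, is what makes this energy function possible.

```python
import itertools
import math
import random

LN10 = math.log(10.0)


def state_energy(state, intrinsic_pka, pair_kt, ph):
    """Energy (in kT) of one protonation microstate; state[i] is 1 if site i is protonated."""
    e = sum(x * LN10 * (ph - pk) for x, pk in zip(state, intrinsic_pka))
    for i, j in itertools.combinations(range(len(state)), 2):
        e += state[i] * state[j] * pair_kt[i][j]
    return e


def mc_occupancy(intrinsic_pka, pair_kt, ph, sweeps=20000, seed=0):
    """Metropolis estimate of the mean protonation of each site at a given pH."""
    rng = random.Random(seed)
    n = len(intrinsic_pka)
    state = [0] * n
    energy = state_energy(state, intrinsic_pka, pair_kt, ph)
    totals = [0] * n
    for _ in range(sweeps):
        for _ in range(n):
            i = rng.randrange(n)
            state[i] ^= 1  # trial move: flip the protonation of one site
            new_energy = state_energy(state, intrinsic_pka, pair_kt, ph)
            delta = new_energy - energy
            if delta <= 0 or rng.random() < math.exp(-delta):
                energy = new_energy  # accept
            else:
                state[i] ^= 1  # reject: undo the flip
        for k in range(n):
            totals[k] += state[k]
    return [t / sweeps for t in totals]


def exact_occupancy(intrinsic_pka, pair_kt, ph):
    """Exact Boltzmann average over all 2**n microstates (feasible only for small n)."""
    n = len(intrinsic_pka)
    z = 0.0
    totals = [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        w = math.exp(-state_energy(state, intrinsic_pka, pair_kt, ph))
        z += w
        for k in range(n):
            totals[k] += w * state[k]
    return [t / z for t in totals]
```

Exact enumeration scales as 2**n and becomes impractical for an RNA with dozens of titratable nucleotides, which is the motivation for sampling. Scanning `ph` and locating the value at which a site's occupancy crosses 0.5 gives that site's apparent pKa in the toy model.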
The consequences of assuming a rigid RNA structure will be discussed further below but the expectation is that calculated shifts will be too large. However, the key result of our analysis is not the magnitude of pKa shifts but the identification of nucleotides that undergo significant shifts and the determination of the structural factors that lead to these shifts. Our analysis reveals that nucleotides with elevated pKas are often located at positions in the structure where they contribute to hydrogen bonds in their protonated states, and in regions of the RNA that have been characterized to be catalytically or functionally important. We find also that several distinct features of RNA are important for the occurrence of pKa shifts to higher values. These include the abundance of negatively charged phosphate groups near pKa-shifted nucleotides and conserved interactions with polar groups from adjacent nucleotides. In addition, as is the case for proteins, the removal of nucleotide groups from the solvent generally favors pKa shifts of bases to lower values. A comparison between C+GCA motifs in divergent structures gives us a novel view of how these motifs may be stabilized. Our analysis provides a detailed picture of how structure influences pKa shifts in RNA molecules.

Results

Calculation of nucleotide pKas in RNA structures

Assessment of the accuracy of calculated pKas by comparison with spectroscopically determined values

In order to validate our approach, we compared calculated pKas to those determined by NMR spectroscopy for two RNAs. The first of these, the branch-point helix (BPH), has a 21 nucleotide stem–loop structure containing an internal asymmetric loop (PDB ID 17ra).64 In the structure, consecutive adenosine residues in the asymmetric loop, A6 and A7, stack within the helix opposite a single uridine, U16. The measured pKa of A7 is shifted to 6.1, while the other adenosine residues in the structure have pKa ≤ 5.5 (Table 1). Calculations were carried out using ionic strengths that mimic experimental conditions (e.g. 10 mM monovalent salt for BPH).64 As can be seen from Table 1, there is a striking agreement between the measured and calculated values. The two nucleotides with the highest and second highest measured pKa shifts (A7 and A13) were identified in the correct order and were calculated to have pKas within 0.7 pKa unit of the experiment. In addition, the calculated pKas of nucleotides involved in Watson–Crick base-pairs are depressed from their solution values, as normally would be expected.

Table 1. Comparison of calculated and spectroscopically determined pKas

Nucleotide   Secondary structure   Spectroscopically determined pKa   Calculated pKa using NL correction

Branch-point helix (BPH) in 10 mM NaCl (ref. 64)
C3           wc                    -                                  2.5 ± 0.7
A6           -                     <5.0                               2.5 ± 0.9
A7           A+U (a)               6.1                                6.8 ± 0.8
A10          -                     <5.0                               1.7 ± 0.6
A13          -                     5.5                                5.3 ± 0.4
C14          wc                    -                                  3.5 ± 1.0
C15          wc                    -                                  1.4 ± 0.8
A17          wc                    <5.0                               2.7 ± 1.3
C20          wc                    -                                  1.7 ± 0.8
C21          wc                    -                                  2.1 ± 0.5

Lead-dependent ribozyme (LDZ) in 100 mM NaCl (ref. 65)
C2           wc                    -                                  2.1 ± 1.5
A4           wc                    ≤3.1                               <3.0
C5           wc                    -                                  3.0 ± 2.0
C6           A+C (b)               -                                  2.8 ± 2.4
A8           -                     4.3 ± 0.3                          4.9 ± 0.8
C10          wc                    -                                  1.4 ± 1.5
C11          wc                    -                                  3.7 ± 1.5
A12          wc                    ≤3.1                               <3.0
C14          wc                    -                                  4.6 ± 1.0
A16          -                     3.8 ± 0.4                          3.4 ± 1.1
A17          -                     3.8 ± 0.4                          2.4 ± 1.3
A18          -                     3.5 ± 0.6                          3.6 ± 0.9
A25          A+C                   6.5 ± 0.1                          7.3 ± 1.8
C28          wc                    -                                  3.1 ± 0.7
C30          wc                    -                                  5.0 ± 2.0

All pKas were calculated from the LPB using the nonlinear correction as described in the text. The mean ± standard deviation of the calculated pKa values is given for the 12 low-energy NMR structures for BPH (PDB ID 17ra) and the 25 low-energy NMR structures for LDZ (PDB ID 1ldz). Secondary structure interactions are annotated with one of the following types: wc, Watson–Crick; A+U, protonated AU; or A+C, protonated AC. Nucleotides with experimentally measured pKas are highlighted in bold-face.
(a) In an A+U pair, A:N1+ forms a hydrogen bond with U:O2 as the acceptor.
(b) In an A+C pair, A:N1+ forms a hydrogen bond with C:O2 as the acceptor.

Table 2. Salt-dependence of the pKa of A25 in lead-dependent ribozyme

[NaCl] (mM)   Spectroscopically determined pKa   Calculated pKa using NL correction   Calculated pKa using LPB alone
100           6.5 ± 0.1                          7.3 ± 1.8                            7.9 ± 1.8
500           5.9 ± 0.1                          6.6 ± 1.8                            6.8 ± 1.8

The pKa of A25 was calculated using 25 low-energy NMR structures from PDB ID 1ldz under different salt conditions and compared to experiment.73 Calculations were performed using the nonlinear (NL) correction term and using the LPB alone.

The second structure, lead-dependent ribozyme (LDZ), is a 30 nucleotide stem–loop that also contains an internal asymmetric loop (PDB IDs 1ldz and 2ldz).72 The asymmetric loop contains a protonated A+C pair, A25-C6, in which A25 displays a measured pKa of 6.5 ± 0.1 (Table 1). This loop contains a non-canonical AG pair flanked by two extrahelical guanosine nucleotides; these and all other nucleotides are measured to have more typical pKas of less than 4.3 (Table 1).65,73 We calculated pKas for each nucleotide and averaged them across the published set of 25 NMR conformers. The nucleotides with the highest (A25) and second highest (A8) measured pKas were identified in the correct order and the calculated values were accurate to within 0.8 pKa unit.
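The agreement quoted above can be checked directly from the Table 1 entries that have a definite measured value. The sketch below is an added illustration: the dictionary holds measured and calculated mean pKas as read from Table 1, and the helper names `deviations` and `rank_by` are hypothetical.

```python
# (measured pKa, calculated mean pKa) for nucleotides with a definite measured value,
# as read from Table 1; upper-bound entries such as "<5.0" are left out.
TABLE1 = {
    "BPH A7": (6.1, 6.8),
    "BPH A13": (5.5, 5.3),
    "LDZ A8": (4.3, 4.9),
    "LDZ A25": (6.5, 7.3),
}


def deviations(pairs):
    """Calculated minus measured pKa, rounded to the one decimal the table reports."""
    return {name: round(calc - meas, 1) for name, (meas, calc) in pairs.items()}


def rank_by(pairs, index):
    """Names sorted from highest to lowest pKa using column index (0 measured, 1 calculated)."""
    return sorted(pairs, key=lambda name: pairs[name][index], reverse=True)


if __name__ == "__main__":
    print(deviations(TABLE1))
    print(rank_by(TABLE1, 0) == rank_by(TABLE1, 1))
```

The comparison ignores the standard deviations across NMR conformers, which for LDZ are larger than the deviations from experiment.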
Our ability to account for salt effects was tested by comparing the calculated pKa of A25 to experimental measurements under two salt conditions. The experimentally determined pKa of A25 shifts from 6.5 to 5.9 upon changing the concentration of monovalent ion from 100 mM to 500 mM (Table 2).65,73 The calculated shift, from 7.3 to 6.6, is in excellent agreement with experiment, as are the absolute values, which are within 0.8 pKa unit of the experimental measurement. As can be seen in Table 2, agreement with experiment is reduced slightly if the MCCE procedure is used in conjunction with the LPB. In all other cases we have examined, the pKas reported by the LPB method are one to two units larger than those obtained from the NLPB, probably because the effect of the ion atmosphere in screening interactions with phosphate groups is underestimated by the LPB. Since even the NLPB calculations tend to overestimate pKa shifts, the use of the LPB reduces overall agreement between the calculated results and experiment.

Identification of pKa-shifted nucleotides in pseudoknots

The pKa calculations were carried out on pseudoknot structures from the beet western yellows virus, BWYV-ψ, and the pea enation mosaic virus, PEMV-ψ. BWYV-ψ and PEMV-ψ share a common secondary structure topology composed of two stems (S1 and S2) and two loops (L1 and L2), where L1 interacts with the major groove of S2, and L2 interacts with the minor groove of S1. In both pseudoknots, tertiary contacts between L1 and S2 form a C+GCA structural motif containing a protonated cytidine (Figure 2).
As can be seen in the Figure, C+GCA appears to be a recurring structural motif that has been observed in HDVR.67 The unfolding of both pseudoknots has been shown to be highly pH-dependent, which has been attributed to the cytidine in the C+GCA motif on the basis of the proposed hydrogen bond between the protonated cytidine N3 nitrogen atom and the guanosine O6 oxygen atom in the structure.74,75 The protonated cytidine in the C+GCA motif is identified in the calculations as having the most elevated pKa in both structures (Table 3). As shown in Table 3, the pKa is calculated to be 13.7 for C8 in BWYV-ψ and 10.6 for C10 in PEMV-ψ.

Figure 2. HDVR, BWYV-ψ and PEMV-ψ share a common C+GCA structural motif. Dotted lines indicate the hydrogen bond network. A hydrogen bond between protonated N3 atom of cytidine and the O6 atom of guanosine is indicated by the red arrow (forming a C+G [rh] interaction, also discussed in the legend to Table 3).

Table 3. Comparison of calculated and apparent pKas of unfolding

Nucleotide   Secondary structure   Calculated pKa using NL correction

BWYV-ψ in 100 mM NaCl, 10 mM MgCl2 (refs 74,75); apparent pKa of unfolding 6.8–7.3
C3           wc                    <3.0
C5           wc                    <3.0
C8           C+G[rh] (a)           13.7 ± 0.1
A9           -                     2.6 ± 0.1
C10          wc                    <3.0
C11          wc                    <3.0
C14          wc                    <3.0
C15          wc                    <3.0
C17          wc                    <3.0
A20          O2′                   7.3 ± 0.1
A21          -                     4.6 ± 0.6
C22          -                     4.5 ± 0.1
A23          O2′                   n.r.
A24          AG                    <3.0
A25          O2′                   6.1 ± 0.2
C26          wc                    <3.0

PEMV-ψ in 100 mM NaCl (refs 74,75); apparent pKa of unfolding 7.1
C5           wc                    <3.0
C6           wc                    <3.0
C10          C+G[rh]               10.6 ± 1.1
A12          -                     4.4 ± 2.0
C13          wc                    3.5 ± 3.4
C15          wc                    5.7 ± 2.7
C16          wc                    <3.0
A19          AU[s] (b), wc         7.4 ± 0.7
A21          -                     3.3 ± 1.4
A22          -                     3.8 ± 1.8
A23          -                     7.8 ± 2.3
C24          -                     4.9 ± 2.5
A25          -                     4.6 ± 1.9
A26          -                     <3.0
A27          O2′                   2.1 ± 1.5
C30          wc                    <3.0
A31          -                     4.2 ± 1.0

pKas were calculated for each adenosine and cytidine nucleotide in BWYV-ψ and PEMV-ψ. The mean ± standard deviation was calculated for a set of four BWYV-ψ crystal structures (PDB ID 437d and 1l2x) and 15 low-energy NMR structures of PEMV-ψ (PDB ID 1kpy). The apparent pKas of unfolding are taken from the literature.74,75 Secondary structure interactions are annotated with types defined in Table 1 or from the following: O2′, hydrogen-bonded to a 2′ hydroxyl group; C+G[rh], protonated cytidine interaction with guanosine along its Hoogsteen edge; AG, AG mispair; or AU[s], sheared AU. For BWYV-ψ, the calculated pKa of A23 is marked n.r. (not reported) because there is a large discrepancy between the calculated values in the two crystal structures. See the text for details.
(a) See Figure 2 for examples of C+G[rh] pairs.
(b) In a sheared AU pair, the U appears shifted into the minor groove and U:O4 is within hydrogen bonding distance of A:N1, where, if a hydrogen bond is formed, the latter should be protonated.

The pH-dependence of unfolding in BWYV-ψ and PEMV-ψ has been measured to exhibit apparent pKas of 6.8–7.3 and 7.1, respectively.74,75 However, the apparent pKas obtained from folding/unfolding transitions do not correspond directly to the pKas of individual groups. In the simplest case, where only a single group controls titration behavior, the apparent pKa corresponds to the midpoint between the pKa of the titratable group in the two states (folded and unfolded).40,76 If only a single group determines the shape of the titration curves for BWYV-ψ and PEMV-ψ, then assuming a pKa of 4.3 for cytidine in the unfolded state would predict a pKa of ∼10 for the cytidine in the folded state, which is in good agreement with the calculated value for PEMV-ψ. However, when multiple titration sites influence the folding reaction, the titration behavior becomes more complex and the experimental data become more difficult to interpret. Moreover, if residual secondary structure is present in the unfolded state, then assuming a reference pKa of 4.3 would not be correct. Moody et al.
have provided a cogent discussion of thermodynamic linkage relationships involved in pH-dependent RNA folding and have discussed conditions where large unfolded state pKas might be expected.76 Their analysis highlights the difficulties of assigning experimental pKas to cytidine in the pseudoknots considered here. We can say with some certainty that the relevant values are greater than the apparent pKas so that they are likely to be above 7.5, and perhaps significantly higher. Thus, the calculations are successful in identifying cytidine nucleotides that have undergone significant pKa shifts, although we are unable to determine at this point whether the actual values are calculated accurately. On the other hand, the highest pKa we are aware of that has been measured experimentally for cytidine in a nucleic acid structure is 9.5.5 As such, a calculated pKa such as 13.7 for C8 in BWYV-ψ is unprecedented, and thus is almost certainly too high. Indeed, there is reason to believe that some pKa values have been overestimated, since we have treated the RNA structure as rigid; that is, we have not allowed the RNA to undergo conformational relaxation in response to a change in protonation. Since the crystal and NMR structures studied here were determined in pH ranges where the cytidine nucleotides of interest are protonated, one would expect some conformational relaxation to occur in the folded state that would stabilize the unprotonated form of the cytidine. This would, in turn, reduce the pKa to below the value obtained by assuming a rigid structure. For this reason, the values reported here for C8 in BWYV-ψ and C10 in PEMV-ψ are likely to be too large. On the other hand, the calculations clearly identify these two cytidine nucleotides as undergoing significant pKa shifts to higher values. Consistent with previous studies, our calculations suggest that these groups determine the pH-dependent unfolding of the two pseudoknots at high pH.
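The single-group midpoint argument used above for the pseudoknots reduces to one line of arithmetic. The sketch below is illustrative only: `folded_pka` is a hypothetical helper, and it is valid only under the stated assumption that a single titrating group controls the transition.

```python
def folded_pka(apparent_pka, unfolded_pka=4.3):
    """Invert pKa_app = (pKa_folded + pKa_unfolded) / 2 for a single titrating group."""
    return 2.0 * apparent_pka - unfolded_pka


if __name__ == "__main__":
    # Apparent pKas of unfolding quoted in the text: 6.8-7.3 (BWYV) and 7.1 (PEMV).
    for label, app in (("BWYV low", 6.8), ("BWYV high", 7.3), ("PEMV", 7.1)):
        print(label, round(folded_pka(app), 1))
    # A higher unfolded-state reference (residual structure) lowers the inferred value.
    print("PEMV, unfolded pKa 5.0:", round(folded_pka(7.1, unfolded_pka=5.0), 1))
```

The last call shows why the choice of unfolded-state reference matters: each unit of increase in the assumed unfolded-state pKa lowers the inferred folded-state pKa by one unit.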
The error resulting from the use of only two crystal conformations was greater than 4 pKa units for A23 in the BWYV-ψ (Table 3), and we concluded we could not determine its pKa with any precision (data not shown). This is likely due to the fact that small changes in local structure around the titrating group between the two conformations can have large effects on electrostatic free energies. The resulting energy differences may lead to large errors, especially if the number of conformations considered is very few. On the other hand, averages over larger numbers of conformations usually lead to less noisy results, as was the case for most of the remaining calculations.

pKa-shifted nucleotides in the HDV ribozyme

We computed the pKas of all titratable nucleotides in HDVR and the hairpin ribozyme, so as to determine the locations of nucleotides likely to be protonated at physiological pH. HDVR catalyzes a site-specific phosphodiester self-cleavage reaction that has been shown to be strongly pH-dependent.14,77 The structure of the HDVR ribozyme has been solved in the precursor and product conformations.68,69 We performed pKa calculations using the product (1cx0 and 1drz) and precursor (1vc5) crystal structures, each obtained at pH ≥ 6.68,69 Because of the central interest of C75 for understanding HDVR enzymatic function,14,16,19,68,69,77–79 we performed calculations only for structures with cytidine at position 75; the nine remaining structures were omitted from this study. With the exception of C75, the calculated pKas of the nucleotides were quite similar for the product and precursor structures (data not shown). This was expected since, except for differences between the product and precursor structures near C75, the overall similarity of the selected structures is very strong (<1.6/1.1 Å all-atom/all-phosphate-atom root-mean-square deviation (RMSD)). Following experimental conditions,14,77,80 we performed our calculations at 1.0 M monovalent salt (i.e.
NaCl or LiCl). Ion-specific effects between two different species of monovalent salt, however, cannot be taken into account within the context of the PB equation.

Figure 3(a) displays the pKas calculated for all titratable nucleotides in HDVR. Two nucleotides, C41 and C75, were calculated to have pKas greater than 5.8 (Figures 3(a) and 4(a)–(c)). C41 is part of a CAA three-nucleotide loop in HDVR and is involved in the structurally-conserved C+GCA motif, found also in the BWYV and PEMV pseudoknots described above in Figure 2. Its calculated pKa is 10.6, which is in the same range as the values calculated for the protonated cytidine nucleotides of the C+GCA motifs in the two pseudoknots. The identification of C41 as a nucleotide with a shifted pKa is consistent with the results reported by Been and co-workers, who have attributed the apparent rate constant for catalysis of about 7 to C41.17,80,81

C75 is calculated to have the second highest pKa in the product structure of the HDV ribozyme, with a calculated value of ∼9.6. Although we are not able to report pKas for the precursor because atoms near the 5′ terminus of the RNA are missing (1vc5),69 C75 is expected to have a higher pKa in the precursor than in the product, because it contains an extra negative charge due to the phosphate group located near C75. C75 is located at the active site of the ribozyme and appears to form a hydrogen bond with the 5′ terminus OH in the product structure as in Figure 4(b) (PDB IDs 1cx0 and 1drz). In the precursor structure (PDB ID 1vc5), the N3 atom of C75 is within 2.7 Å of the O2P atom of the scissile phosphate group, suggesting strongly that the protonated state is stabilized by nearby phosphate groups.
C75 has been shown to play a direct role in catalysis,14,16 and the mutation of C75 to U or G effectively eliminates ribozyme activity.82–85 Mutation of C75 to adenosine lowers the apparent pKa of the reaction by an amount that corresponds to the difference in the solution pKa values of cytidine and adenosine, suggesting strongly that the apparent pKa of the reaction reflects that of the nucleotide at this position.14,18 It has been proposed that the catalytic activity of HDVR depends on the protonation of C75.14,19,69 The identification of C75 as a nucleotide with an elevated pKa supports this hypothesis, although the calculated value is greater than the best estimate in the literature, pKa ∼ 6–8.14,80,81

pKa-shifted nucleotides in the hairpin ribozyme

Like HDVR, the hairpin ribozyme catalyzes a site-specific phosphodiester cleavage reaction. In the crystallized structure of this ribozyme, the substrate appears as a separate strand, but the base-pairing of this strand with the ribozyme strand is integral to the formation of the ribozyme structure. Together, the substrate and ribozyme strands fold into a single four-helix junction.70,71 The active site is located within an extensive interface between the two major helices of the four-helix junction. pKas were calculated for the precursor and product structures (PDB ID 1m5k and 1m5v, each crystallized at pH 5), for which four structures were available. Calculations were not done on the 1m5o transition-state structure, since the presence of the vanadate ion made the partial charges of the transition state difficult to predict.

Our calculations identified three nucleotides, A10, A22 and A38, whose pKas are predicted to be greater than 5.8 in the hairpin ribozyme when calculations were done under experimental salt conditions (1.0 M monovalent salt and 10 mM divalent salt).86 The calculated values are 6.6, 7.2 and 5.9, respectively (Figure 3(b)).
As can be seen in Figure 5, A38 is located at the interface of the two major helices near the active site. In its protonated state, A38 appears to form a hydrogen bond with the oxygen atom at the site of the catalytic reaction. Biochemical characterization of the hairpin ribozyme has shown that the replacement of the adenosine with an abasic residue reduces the rate of catalysis by five to six orders of magnitude.20 However, activity can be largely restored by supplying free adenine in solution.20 Furthermore, substituting adenine with nucleobases having a higher pKa, such as isoguanine (pKa = 9.0), raises the apparent pKa of the reaction, suggesting that the nucleotide at position 38 is responsible for at least some of the pH-dependence observed in the reaction.20 Substitutions by other nucleotide analogs displayed equivalent pKa shifts. On the basis of this evidence, it has been suggested that A38 in the protonated state stabilizes the transition state of the hairpin ribozyme. The elevated pKa calculated for A38 is consistent with this idea.

A10 is also located in the interface between the two major helices near the active site. The elevated pKa of A10 is consistent with the sensitivity of the ribozyme activity to the solution pKa of nucleotide analogs substituted at A10.87,88 In particular, the decrease in activity when A10 is substituted with 8-aza-adenosine (n8A), whose solution pKa is 2.2, can be rescued by lowering the pH of the reaction, suggesting that the ionization of A10 influences catalytic activity directly.23 New crystal structures have indicated that ordered water molecules are near the active site of the hairpin ribozyme, and one of these is in direct contact with the N1 atom of A10.89 Disruption of the water network by perturbing the protonation state of A10 could explain the pH-dependent nucleotide analog interference pattern of n8A, further supporting the existence of an elevated pKa for A10. Our treatment of buried waters as a dielectric continuum is, of course, problematic but to account for the interactions of individual water molecules properly would require simulations that are beyond the scope of this work. Rather, our goals here are to identify nucleobases with shifted pKas and to understand how RNA structure is designed to effect these shifts.

Lastly, A22 also exhibits an elevated pKa in the hairpin ribozyme. However, there is no evidence at this point that A22 plays a specific role in catalysis.

Figure 3. pKas and electrostatic free energies in (a) hepatitis delta virus ribozyme and (b) the hairpin ribozyme. Nucleotides with significantly shifted pKas are labeled. These include C41 (calculated pKa = 10.6) and C75 (9.6) in HDVR, and A10 (6.6), A22 (7.2) and A38 (5.9) in the hairpin ribozyme. Values less than 3.0 are not reported. Locations of nucleotides involved in Watson–Crick base-pairs are indicated as w. The red line indicates the solution pKa of cytidine = 4.3.

Figure 4. Structure and organization of the HDV ribozyme. (a) Surface view and secondary structure schematic of HDVR: P1 (red), P1.1 (yellow), P2 (tan), P3 (green), and P4 (purple). The approximate location of the scissile bond at the junction of several secondary structure elements is indicated by an arrow. (b) The C75:N3 and G1:O5′ atoms within the active site are within hydrogen bonding distance (1cx0). (c) C41 and C75 are shown relative to the secondary structure elements in HDVR. Colors of nucleotides correspond to those depicted in (a). The J4/2 loop is shown in pale blue and the C+GCA motif is in magenta.

Energetic contributions to pKa shifts in RNA

As has been discussed extensively for amino acids,34–42,60,62 a number of factors can result in the pKa shift of a nucleotide in RNA away from the value observed for the isolated nucleotide in solution.
In the context of RNA, these include favorable interactions between negatively charged phosphate groups and the protonated form of the base, desolvation effects and intramolecular interactions with other bases. Structural features that stabilize the protonated state of the nucleotide shift pKas upward, whereas features that destabilize that state shift pKas downward. Favorable interactions of a protonated base with negatively charged phosphate groups (base–phosphate interactions) will always favor a shift to higher pKas. Desolvation effects resulting from the transfer of an ionizable nucleotide from the solvent into a buried location within an RNA molecule will favor lower pKas compared to those observed in solution, due to the loss of stabilizing interactions of the ionized species with the solvent. Lastly, intramolecular interactions between nucleobases (base–base interactions) through hydrogen bonds and other polar interactions can also shift pKas. The size and direction of this effect depend on the detailed structural environment of each ionizable group. Due to the lack of additivity of individual contributions within the NLPB, we cannot report specific contributions for each of these terms to pKa shifts. However, in the following sections we report individual contributions to electrostatic free energies that can be related to structural features of the RNA. This allows us to consider how RNA structure is used to produce shifted pKas.

Figure 5. Structure and organization of the hairpin ribozyme. (a) Secondary structure cartoon of the hairpin ribozyme and (b) the locations of A10, A22 and A38 within the ribozyme. The approximate location of the scissile bond within the interface (gray region) of the two major helices is indicated by the arrow. (c) The conformation of A38, A-1 and G+1 in the hairpin ribozyme active site (1m5o) showing A38:N1 within hydrogen bonding distance of O5′ in the scissile phosphate group.
Role of solvation and hydrogen bonding in stabilizing pKa shifts: the branch-point helix

Although A6 and A7 are situated in very similar structural environments in the branch-point helix, the pKa of A7 is observed to be elevated, but the pKa of A6 is not (Table 1). In an attempt to understand the source of this difference, we have calculated a number of contributions to the electrostatic potential at both sites. As can be seen in Table 4, negatively charged phosphate groups contribute a strong negative electrostatic potential that stabilizes the protonated form of each base by ∼3.6 kcal/mol. However, desolvation opposes the phosphate contribution for A6 and A7. The effect is much smaller for A7, which is more exposed to solvent than A6. Indeed, much of the difference between the pKas of A6 and A7 can be attributed to solvent exposure. The ionized form of A7 is stabilized also by favorable interactions with other bases, primarily U16, which can form a hydrogen bond with the N1 atom of A7 via O2. Thus, the pKa of A7 is shifted to a higher value, due to the effects of the phosphate groups and interactions with other bases. In contrast, A6 has weaker interactions with other bases and the effect of the phosphate backbone is opposed by desolvation effects.

Table 4. Electrostatic contributions due to changes in protonation state in the branch point helix and C+GCA motif

Structure            Nucleotide   Desolvation    Source of      Base–base interaction   Base–phosphate interaction
                                  free energy    contribution   free energy             free energy
Branch-point helix   A6           +2.2           All NTs        +0.4                    −3.6
                     A7           +0.5           All NTs        −1.3                    −3.5
C+GCA motifs
HDVR                 C41          −0.8           G73            −4.6                    −0.5
                                                 G74            −1.4                    −0.3
BWYV-ψ               C8           −0.1           G12            −5.4                    −0.7
                                                 C14            −3.7                    −0.5
PEMV-ψ               C10          −2.3           U9             −1.8                    −0.4
                                                 G28            −2.6                    −0.4

Electrostatic contributions from specific sources were calculated as discussed in Methods and are reported in units of kcal/mol. Rows labeled All NTs signify the base–base and base–phosphate contributions due to all nucleotides. See Results for details.

Role of phosphate and base interactions in stabilizing pKa shifts in HDVR

To understand the role of phosphate and base groups in shifting pKas, we calculated individual electrostatic free energy terms for C41 and C75 in HDVR, and A10, A22 and A38 in the hairpin ribozyme, and compared these values to those calculated for all other nucleotides (Figure 3). The electrostatic terms vary for each nucleotide, but there are regions in which base–phosphate and/or base–base interactions stabilize the protonated form of the base due to a strongly negative electrostatic potential. In particular, nucleotides with highly positive pKa shifts are found in regions where the negative electrostatic potential due to phosphate groups is particularly large.

Different contributions stabilize the ionized forms of C41 and C75 in HDVR (Figure 3(a)). For C75, base–phosphate interactions dominate base–base interactions, whereas for C41 base–base interactions also contribute favorably. C75 experiences a high negative electrostatic potential induced by the specific arrangement of phosphate groups in HDVR. A unique feature of HDVR is the presence of a nested pseudoknot in the core of the structure. A reverse turn of the phosphodiester backbone at nucleotides C21 and C22 in this part of the structure results in a cup-like geometry of the phosphate groups such that they surround one surface of the catalytic C75 nucleotide (Figure 6).
On the opposite surface, phosphate groups adjacent to the scissile bond also contribute to the negative potential. Finally, an S-turn in the backbone of the so-called J4/2 loop brings the A77 phosphate group within 7 Å of the catalytic core, which would not be achieved if the backbone did not contain a turn in this region. This unique convergence of geometries appears to be the source of the high electrostatic potential surrounding the active site nucleotide.

In contrast, C41 is located in a region of the RNA where the phosphate potential is not unusually negative (Figure 3(a)). Instead, the high pKa shift of C41 can be explained by the unusually strong energetic contributions from base–base interactions, predominantly with nucleotides within the C+GCA motif. As discussed in the following section, the favorable interactions that stabilize the protonated cytidine include those formed with the guanosine nucleotide, with which it shares two hydrogen bonds, and additional neighboring nucleobase interactions that are conserved across very different RNA structures.

Conservation of stabilizing interactions in the C+GCA motif

In order to understand the energetic interactions required to stabilize the protonated cytidine in the C+GCA motif, we compared the magnitude of individual electrostatic free energy terms at the N3 atom of the protonated cytidine within the three structures where it is found: HDVR, BWYV-ψ and PEMV-ψ. As can be observed in Figure 2, the four nucleotides composing the C+GCA motif can be readily superimposed. In each case, the protonated cytidine (C41 in HDVR, C8 in BWYV-ψ, and C10 in PEMV-ψ) is involved in an unusual hydrogen bond with the major-groove (Hoogsteen) edge of the guanosine in the motif (G73 in HDVR, G12 in BWYV-ψ, and G28 in PEMV-ψ). As can be seen in the structure, this hydrogen bond interaction occurs via the keto oxygen of guanosine and stabilizes protonated cytidine.
Consistent with its role in the structure, this interaction emerges as a major stabilizing feature, as shown in Table 4.

A second feature present in all three structures appears to stabilize the protonated cytidine. Specifically, a neighboring nucleotide stacked above (and/or below) the plane of the C+G base-pair contributes to the stability of the protonated cytidine. As can be seen in Table 4, G74 in HDVR, C14 in BWYV-ψ, and U9 in PEMV-ψ play this role. In each case, the individual free energy terms due to neighboring nucleobase interactions are >1.4 kcal/mol. To understand how three apparently different nucleotides could serve the same role in stabilizing this structure, we compared the structures of the three nucleotides near the C+GCA motif (Figure 7). In Figure 7, it is clear that the G74, C14 and U9 nucleotides all contribute a keto oxygen atom to a position within 4 Å of the proton of the ionized cytidine, stacked above or below the plane of the C+G base-pair. This oxygen atom is not involved in a hydrogen bond with the proton but rather it is arranged so as to optimize local electrostatic interactions in the neighborhood of the cytidine proton.

Figure 6. Phosphate and other oxygen atoms can stabilize protonated nucleotides. (a) Structure of the phosphodiester backbone near C75 (purple and yellow surface, purple and blue trace). A cluster of phosphate ions (yellow) from the P1.1 pseudoknot helix, the substrate strand, and the J4/2 loop stabilize the pKa shift of C75. The O5′ atom in the scissile phosphate group is labeled. (b) The conformation of G29 is consistent with the formation of an O2′ hydrogen bond with A78.

Figure 7. Hydrogen bonds and neighboring nucleobase interactions stabilize the protonated cytidine in the C+GCA motif. Distances are given for neighboring nucleobase interactions between oxygen and cytidine N3. See the text for details.

Most interesting is the case of C14 in BWYV-ψ.
This pseudoknot is characterized by a highly unusual backbone conformation in the vicinity of nucleotides 13 and 14 (A.M.P. and L. Wadley, unpublished results).90 For example, U13 is C2′-endo and the backbone is described by η-θ values that fall outside any of the typical regions for RNA (η, 62.3; θ, 42.2, for conformation B). The G12-C14 base-step is characterized by a much greater than usual helical twist of nearly 90° (Figure 7). This overtwisted conformation is accommodated by the RNA backbone through the unpaired and outwardly-flipped U13 base. From these observations and the calculated energy profiles, the structurally conserved keto oxygen atom described above appears to be an important feature of the C+GCA motif, which, to our knowledge, has not been characterized.

Stabilization of pKa shifts in the hairpin ribozyme

To understand the structural features that stabilize the pKa shift of A10, A22 and A38 in the hairpin ribozyme, free energy contributions that affect protonation were calculated for these nucleotides. Elevated pKas generally coincide with the regions of the RNA where the negative electrostatic potential due to phosphate groups is higher than in the surrounding structure. Specifically, nucleotides 9–10, 20–27, and 38–44 (Figure 3(b)) experience a significant negative electrostatic potential. Within each of these nucleotide ranges, at least one nucleotide is calculated to have a pKa shift > 2 pKa units from its solution value. These nucleotides coincide with the center of the dense interface between the two major helical elements of the hairpin ribozyme that consists of four phosphodiester backbone segments arranged in close proximity (Figure 8). It is likely that the close packing of this interface evolved in such a way as to bring together the four backbone segments, resulting in the creation of a region at the center of the helical interface, and the catalytic core, with a particularly high negative electrostatic potential. In some cases, nucleotides may interact strongly with phosphate groups but are not calculated to have a large pKa shift (e.g. A77 and A78 in HDVR; Figure 3(a)). In these cases, the protonation of one nucleotide with a higher pKa, such as C75, reduces the pKa shifts of nearby nucleotides that interact with it.

Figure 8. Phosphodiester backbones of the two major helices stabilize the pKa shift of A38 at the interface of the helices. The scissile phosphate group (arrow) is colored orange (phosphorus) and red (oxygen).

Discussion

Conformational relaxation

Here, we report a treatment of the factors that produce pKa shifts in RNA structures. On the basis of comparisons to experimental results where the pKas have been determined directly, our approach appears to be effective in predicting the pKas of nucleotides. Most significantly, it is successful in identifying bases with shifted pKas and, in each case, offers a structural interpretation for the shifts. In systems where pKas have not been measured directly, the identification of pKa shifts in this study is supported by multiple lines of biochemical evidence, such as the pH-dependence of catalysis or unfolding. In many cases, the calculated pKas can be interpreted meaningfully and agree quantitatively with experiment, but in a few other cases, such as the protonated C in C+GCA and the catalytic C in HDVR, the calculated values appear to be more elevated than the best estimates now available from experiment. As we have discussed, the discrepancies are likely due to the assumption that the RNA structure does not change with change of the ionization state.
Using an internal dielectric constant of 4 accounts for some minor conformational relaxation throughout the RNA associated with nucleotide ionization,91 but clearly this does not account for major changes that could occur if, say, a nucleotide was stacked into a helix in one conformation, but flipped out of the helix in another. This, for example, has been shown to occur in the U6 RNA intermolecular stem–loop (ISL).92 In such cases, conformational changes must be treated explicitly. Although assuming a rigid molecule is clearly an oversimplification, it is of considerable interest to explore pKa predictions based on the experimental structure alone without introducing uncertainties arising from a treatment of conformational relaxation in RNA. Indeed, the good agreement between the calculated and experimental results obtained here for the branch-point helix and the lead-dependent ribozyme suggests that base protonation does not induce large conformational changes in these two structures, or at least that such changes are not large enough to have significant effects on pKa.

Divalent ions

The appropriate treatment of divalent ions is of considerable importance, given their significant electrostatic contributions to folding, stability and ligand binding within RNA.93,94 Here, divalent ions have been treated using the same formalism governing the interaction of monovalent ions with RNA. Thus, site-bound ions are not treated explicitly; rather, all divalent ions are assumed to be bound diffusely and treated directly by application of the NLPB.50 The effects of site-bound ions are potentially of particular importance to the calculation of pKas in the active conformation of the HDVR structure, where it has been reported that electron density, interpreted as the presence of a site-bound hydrated metal ion (e.g. Mg2+(H2O)6), is observed in the crystal structure 4.3 Å away from the titration site of the catalytic nucleotide (PDB 1sj3).69 However, we have used only the structures of HDVR in which Mg2+ was not observed in the active site; namely, in the product conformations (PDB 1cx0 and 1drz) and one that had been made inactive by the removal of Mg2+ from the solution conditions of the crystal structure (PDB 1vc5). A second factor justifying our treatment of divalent ions is that HDVR is catalytically active even in the presence of only monovalent salt.14,80 Thus, at the very least, the calculated pKa shift of the active site C75 is relevant to understanding the nature of catalysis in the absence of magnesium.

Nonlinear effects

A major complication involved in calculating pKas in a highly charged molecule arises from the nonlinear response to salt concentration. For proteins where nonlinear effects are not generally thought to be important, the electrostatic interaction between two sites is independent of the ionization state of the other sites. Thus, all interactions are additive and need to be calculated only once. In contrast, when the nonlinear PB equation is used, the interaction of each pair of sites depends on the charge states of other sites, since these, in turn, affect the screening by salt of the pairwise interaction in question. Accounting for this effect exactly is computationally expensive and to do so would require a separate PB calculation for each of the 2^N ionization states of the macromolecule. An approach to this problem was considered by Vorobjev et al. for polylysine helical peptides.59 They used a screening factor for each pairwise interaction, which increased with the electrostatic potential that characterized each interaction (and hence, the net charge of the molecule). Our method differs by attempting to calculate a nonlinear correction as a single term correlated with the total net charge on the RNA. Overall, this is a simpler approach for treating nonlinearity that nonetheless appears to achieve good accuracy compared to experiment.

On the structural origins of pKa shifts in RNA

The role of phosphate groups in inducing pKa shifts

How do structural elements in ribozymes elevate the pKas of nucleotides near their active sites? We propose that the elevated pKas of several nucleotides are a consequence of the architecture, or “fold”, of the RNA, in which the local abundance of phosphate groups helps to elevate pKas. In HDVR, for instance, the two major helical axes of the ribozyme form a Y-shaped intersection that converges near the active site (Figure 4(c)). As a consequence, the active site cytidine, C75, is brought together with the phosphate groups adjacent to the scissile bond, as well as those involved with forming the central, P1.1 pseudoknot helix (nucleotides 21–22; Figure 6). Such a compact arrangement of phosphate groups, along with the curvature of the molecular surface, can focus strong electrostatic potentials in the active site region. Using calculations of the surface potential with the nonlinear PB equation, Bevilacqua and colleagues have shown that this is indeed calculated to happen for this ribozyme.77,95 In addition, we now show that the highly negative potential leads to an elevated pKa calculated for the C75 nucleotide. Notably, we also observed that the electrostatic environment surrounding the proximal nucleotides, A77 and A78 (Figure 3(a)), favors elevated pKas, although it remains to be shown whether this has any functional significance for the ribozyme.

A similar convergence of phosphate groups can be observed near the active site of the hairpin ribozyme (Figure 5). The high electrostatic potential surrounding the active site may enable A38 to protonate and form a functionally important hydrogen bond in the transition state of the transesterification reaction. In this case, the local abundance of phosphate groups near the active site is the consequence of the crossing of the two major helical axes through the center of the molecule (Figure 8).
Much like HDVR, the formation of the two-helix interface buries phosphate groups from both the substrate and ribozyme strands. In total, 19 nucleotides are buried by greater than 70% of their surface areas, and these occur mainly in the interface. It seems likely that there will be further instances in other RNAs where unusually high densities of phosphate groups or buried phosphate groups are used to shift the pKas of functionally important nucleotides.

The role of base–base interactions in inducing pKa shifts

Because of the importance of base–base interactions for defining structures in RNA, a number of attempts have been made to catalogue the hydrogen bonding patterns that are possible between nucleobases.2,25,96–99 Through manual and automatic means, these efforts have identified eight distinct patterns of base-pairing that involve at least one protonated nucleotide: C+C (cis and trans), A+C (cis), A+G (Hoogsteen and reverse Hoogsteen), C+G (cis, Hoogsteen and reverse Hoogsteen), where cis and trans refer to the relative orientations along the glycosidic bonds. An underlying assumption for constructing most base-pair compendiums has been to limit them to coplanar base-pairs involved in at least two hydrogen bonds. Although these simplifications have been useful for enumerating the most likely configurations for protonated nucleotides, clearly these heuristics may miss the identification of nucleotides that are protonated if the hydrogen bond acceptor is a phosphate oxygen atom or 2′OH. Indeed, the pKa of adenosine or cytidine is particularly ambiguous in the absence of energy calculations when the hydrogen bonding partner is a 2′OH because it is possible for the 2′ oxygen atom to act as either a hydrogen bond donor or acceptor. An example of this can be observed in the catalytic core of HDVR for A78 (Figure 6(b)).
Among the set of base-pairs containing protonated nucleotides, one of the more commonly observed seems to be the A+C base-pair.4,92,100,101 In our current study, the A+C pair has appeared in the lead-dependent ribozyme where we have obtained a calculated pKa for the pair close to 6.5. Does this base-pair have any function other than to stabilize the RNA under acidic conditions? Others have noted that the A+C interaction is isosteric to GU wobble pairs,96,100 where the C1′ atoms of the A+C pair and those of GU and glycosidic bonds are equally distant and in the same relative orientation. However, unlike the GU wobble pair, the A+C pair forms only when the pH is sufficiently low for adenosine to protonate. Thus, unlike the GU wobble, A+C pairs can act as a pH-sensitive conformational switch, such as the one that appears to occur near the cleavage site in the Varkud satellite ribozyme.101 In this system, deprotonation is coupled to a conformational change in the cleavage site stem–loop. A significant conformational change upon a shift in pH is observed also for the U6 ISL.92 It is possible that pH-dependent base-pairs like A+C may be conserved in an RNA where sensitivity to its pH environment may be important to its function.

Conserved hydrogen bonds appear to be the main source of stability of the C+GCA motif. However, more subtle structural features such as the structural and electrostatic effects of neighboring nucleobases may also be involved. It is known that duplexes having the same composition of base-pairs but in different permutations have different energies of duplex formation.102,103 Indeed, there is experimental evidence that nearest neighbors influence the pKas of adjacent nucleotides directly.104 Differences arise from the different stabilities introduced by the juxtaposition of different interactions between base-pairs due to the different permutations.
In the case of the C+GCA motif, the structure appears to preferentially adopt a conformation where adjacent keto oxygen atoms are positioned to stabilize the protonated cytidine. Future work may involve performing nucleotide sequence alignment to discover whether this preference is more widely conserved. On the basis of our calculations, several nucleotides have been identified in the catalytic cores of ribozymes to have elevated pKas. Notably, the predicted nucleotides coincide almost precisely with nucleotides that have been shown to have catalytic roles. Moreover, many of these nucleotides that we have predicted to have anomalous pKas correspond to groups that have been suggested to be protonated on the basis of the pH-dependence of the catalytic rates of reaction. Each of the calculated results can generally be understood in structural terms. In the case of HDVR, C75 is thought to act as a general acid or base near the 2′ OH nucleophile of the precursor HDVR. The proximity of the C75 base to the 5′ oxygen terminus (Figure 4(d)) has suggested the possibility that, under certain conditions, the protonated form of C75 could be stable and act as a possible general acid or base, even though there is no structure of the native sequence clearly showing this interaction. In the hairpin ribozyme, the protonated form of A38 appears to form a hydrogen bond with a central phosphate oxygen atom within the trapped transition-state mimic (Figure 5(c)). Since the hairpin ribozyme displays no specificity for metal ions and none is observed to bind near the active site, nucleotides alone may be solely responsible for catalysis.105,106 Indeed, the same may be true for the HDV ribozyme.80 Such findings deepen our appreciation of the nature of RNA catalysis and the versatility of ribonucleotides. 
Moreover, the theoretical and computational methodology developed in this work offers the possibility of understanding the structural origins of pKa shifts of nucleotides that play functional roles and, in addition, of using structural information to identify these nucleotides when direct experimental measurements are not available.

Methods

pKa calculations based on the Poisson–Boltzmann equation have been widely used to study proteins,34–42,60,62 and, more recently, DNA.43 Here, we review the underlying theory in order to discuss its application in the context of RNA. We used a modified version of the program MCCE,60–62 which uses a distance-dependent pairwise energy softening function107 to help prevent large electrostatic energies from dominating the pKa calculations. This can occur if ionizable groups approach each other too closely due to small errors in the crystal structure or when dielectric screening is not accounted for completely. A unique feature of MCCE is its ability to account for conformational changes between protonated and unprotonated states of the titratable group by sampling over multiple conformations. However, we have used a simplified version where titratable nucleotides are held rigid and can exist only in one of two states: protonated and unprotonated, and where the charge of the nucleotide is 0e or −1e, respectively.

Theory of multi-site titration in nucleic acids

The theory of multi-site titration for polymers in solution was developed previously,37,41,59,60 and is described here as applied to nucleic acids. Given a nucleic acid with N nucleotides that might be protonated, we can compute the titration curve of the ith nucleotide by finding its average degree of protonation, xi, as a function of pH (i.e. xi = +1 if protonated, otherwise xi = 0).
For the purposes of this study, we consider adenosine and cytidine nucleotides as capable of protonation, specifically on their N1 or N3 imino nitrogen atom, respectively, although the same representation can be used for additional types of nucleotides. We represent the protonation microstate, m, of the nucleic acid by the vector x with N elements, which describes the titration states of each nucleotide in the molecule for that microstate. A free energy, ΔGm, is associated with each microstate. There are M = 2^N such microstates. The average charge on the ith nucleotide can be found by taking the Boltzmann-weighted average:

$$\langle x_i \rangle = \frac{\sum_{m}^{M} x_i(m)\,\exp\left(-\Delta G_m / k_B T\right)}{\sum_{m}^{M} \exp\left(-\Delta G_m / k_B T\right)} \tag{1}$$

over the set of possible microstates. In a nucleic acid with N titratable nucleotides, the complete Boltzmann-weighted average requires the computation of 2^N terms. In practice, this is avoided by using a Monte Carlo (MC) procedure to estimate the frequency of low-energy microstates, which will dominate the partition function. These are used to calculate the titration curves of each nucleotide using the microstate free energy described by equation (2). The pKa of the ith nucleotide is obtained by finding the pH at which ⟨xi⟩ is equal to 0.5 using the multi-conformational continuum electrostatics (MCCE) procedure.60 MCCE was designed to account for local conformational changes around an ionizable group but this feature of the program has not been developed for nucleic acids. For this reason, we have kept the RNA structure rigid and the term multiconformational is not appropriate for the current application. Thus, we have kept the title of the program we used but have turned off one of its features. However, the MCCE program offers a well-tested MC approach to using continuum electrostatics in the calculation of pKas and it has thus provided a particularly useful vehicle in the current study.
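As a minimal illustrative sketch (not the MCCE implementation), the Boltzmann-weighted average of equation (1) can be evaluated exactly for a handful of sites by enumerating all 2^N microstates, and the pKa of a site can then be located as the pH at which its average protonation equals 0.5. The toy energy model (an intrinsic pKa per site plus a constant pairwise coupling) and the names `microstate_energy`, `mean_protonation` and `find_pka` are assumptions introduced here for illustration only; for realistic N the enumeration would be replaced by Monte Carlo sampling, as described in the text.

```python
import itertools
import math

KT = 0.6               # kcal/mol at 25 °C, the value quoted in the text
LN10 = math.log(10.0)  # ~2.3, converts pKa units to units of kT


def microstate_energy(x, ph, pka_intrinsic, pair):
    """Toy free energy (kcal/mol) of microstate x, a tuple of 0/1 flags.

    Assumed model: each protonated site contributes 2.3 kT (pH - pKa),
    and each pair of protonated sites contributes pair[i][j] once.
    """
    energy = 0.0
    n = len(x)
    for i in range(n):
        if x[i]:
            energy += LN10 * KT * (ph - pka_intrinsic[i])
            for j in range(i + 1, n):
                if x[j]:
                    energy += pair[i][j]
    return energy


def mean_protonation(ph, pka_intrinsic, pair):
    """Exact <x_i> for every site by summing over all 2^N microstates."""
    n = len(pka_intrinsic)
    partition = 0.0
    weighted = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        w = math.exp(-microstate_energy(x, ph, pka_intrinsic, pair) / KT)
        partition += w
        for i in range(n):
            weighted[i] += x[i] * w
    return [v / partition for v in weighted]


def find_pka(i, pka_intrinsic, pair, lo=-5.0, hi=20.0, tol=1e-6):
    """Bisection for the pH where <x_i> = 0.5.

    Assumes site i is protonated at pH = lo, deprotonated at pH = hi,
    and that the titration curve crosses 0.5 once inside the bracket.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_protonation(mid, pka_intrinsic, pair)[i] > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Two hypothetical cytidine-like sites with a 1 kcal/mol repulsion.
    coupling = [[0.0, 1.0], [1.0, 0.0]]
    print(find_pka(0, [4.3, 4.3], coupling))
```

In this toy model a repulsive coupling between two protonated sites lowers both pKas relative to their intrinsic values, which is the kind of site-site dependence that makes the full multi-site treatment necessary.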
In MCCE, ΔG_m^LPB is obtained from solutions of the LPB and is written as:

$$\Delta G_m^{\mathrm{LPB}} = \sum_{i}^{N} x_i \left\{ 2.3\,k_B T \left[ \mathrm{pH} - \mathrm{p}K_a^{\mathrm{ref}}(i) \right] + \Delta G_{\mathrm{self}}(i) + \Delta G_{\mathrm{fixed}}(i) \right\} + \frac{1}{2} \sum_{i}^{N} \sum_{j \neq i}^{N} x_i x_j \left[ \Delta G_{\mathrm{pair}}(i,j) + \Delta G_{\mathrm{vdW}}(i,j) \right] \tag{2}$$

where the reference pKa, pKa^ref(i), is the pKa of the ith titratable nucleotide in the hypothetical unfolded state of the nucleic acid (Figure 9). As a simplification, this value is taken to be the same as the solution pKa of an isolated nucleotide monophosphate and is quoted from experimental measurement as 3.8 for 5′-AMP at 25 °C in 0.1 M KNO3 and 4.3 for 5′-CMP at 25 °C in 0.1 M KCl.1 The precision of these measurements is expected to be ±0.2–0.4 pKa unit, given possible differences in temperature and salt concentration between the reference state and the experimental conditions of the RNA structures used here. The additional free energy terms in equation (2) are responsible for pKa shifts relative to the solution value. The self free energy, ΔGself(i), is the desolvation cost of protonating nucleotide i in the folded state compared to the unfolded state. ΔGfixed(i) gives the change in the free energy of solvent-screened coulombic, or pairwise, interactions between the charges in a protonated or unprotonated nucleotide and fixed charges in the RNA (i.e. due to guanosine and uridine nucleotides). In the MCCE method, ΔGfixed(i) includes any change in free energy of van der Waals (vdW) interactions upon protonation. The final energy terms, ΔGpair(i,j) and ΔGvdW(i,j), give the free energies of pairwise interaction and the van der Waals interaction, respectively, between the ith and jth titratable nucleotides. kB is Boltzmann's constant and T is the temperature of the system. At T = 25 °C, kBT is taken to be 0.6 kcal/mol.

Figure 9. Thermodynamic cycle considered for a pKa calculation of a single nucleotide for simplicity. See Methods for details.
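The structure of equation (2) can be made concrete with a short sketch. The function below evaluates the linear-PB microstate free energy from precomputed one-body and two-body terms, and a helper shows the single-site limit in which the pKa is simply the reference value shifted by the self and fixed terms. The function names and all numerical inputs are hypothetical and chosen for illustration; they are not values from this study, and the sketch is not the MCCE code.

```python
import math

KT = 0.6               # kcal/mol at 25 °C, as quoted in the text
LN10 = math.log(10.0)  # ~2.3


def delta_g_lpb(x, ph, pka_ref, dg_self, dg_fixed, dg_pair, dg_vdw):
    """Microstate free energy in the form of equation (2), in kcal/mol.

    x        : sequence of 0/1 protonation flags, one per titratable site
    pka_ref  : reference (solution) pKa of each site
    dg_self  : desolvation term for each site
    dg_fixed : interaction of each site with fixed charges
    dg_pair, dg_vdw : symmetric N x N matrices of site-site terms
    """
    n = len(x)
    one_body = sum(
        x[i] * (LN10 * KT * (ph - pka_ref[i]) + dg_self[i] + dg_fixed[i])
        for i in range(n)
    )
    # The factor 1/2 corrects for visiting each (i, j) pair twice.
    two_body = 0.5 * sum(
        x[i] * x[j] * (dg_pair[i][j] + dg_vdw[i][j])
        for i in range(n)
        for j in range(n)
        if j != i
    )
    return one_body + two_body


def isolated_site_pka(pka_ref, dg_self, dg_fixed):
    """pKa of a site with no titratable neighbors.

    Setting the one-body term of equation (2) to zero gives
    pKa = pKa_ref - (dG_self + dG_fixed) / (2.3 kT).
    """
    return pka_ref - (dg_self + dg_fixed) / (LN10 * KT)


if __name__ == "__main__":
    # Hypothetical adenosine-like site: +0.5 kcal/mol desolvation penalty,
    # -2.0 kcal/mol favorable interaction with fixed charges.
    print(isolated_site_pka(3.8, 0.5, -2.0))
```

With kBT = 0.6 kcal/mol, one pKa unit corresponds to about 1.4 kcal/mol, so a net stabilization of the protonated form by 1.5 kcal/mol raises the pKa of an isolated site by slightly more than one unit.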
Standard equations for computing the desolvation free energy of a nucleotide and free energy of interaction between the nucleotide and other partial charges (e.g. due to other fixed or titratable nucleotides) from electrostatic potential are given by:108

$$
G_{\mathrm{self}} = \frac{1}{2} \sum_{n}^{\mathrm{atoms}} q_n \varphi_n^{\mathrm{rxn\text{-}field}} \tag{3a}
$$

and

$$
G_{\mathrm{fixed|pair}} = \sum_{n}^{\mathrm{atoms}} q_n \varphi_n^{\mathrm{chg}(m)} \tag{3b}
$$

where the summations run over the atoms of the nucleotide and $q_n$ is the partial charge of the nth atom in the nucleotide. $\varphi_n^{\mathrm{rxn\text{-}field}}$ is the reaction field potential at the position of atom n induced by solvation effects, and $\varphi_n^{\mathrm{chg}(m)}$ is the site potential at the coordinates of atom n induced by partial charges in the set of atoms m with all other partial charges set to zero. Hence, the electrostatic free energy terms in equation (2) can be expressed as:

$$
\Delta G_{\mathrm{self}}(i) = \left( \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\mathrm{rxn\text{-}field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\mathrm{rxn\text{-}field}} \right)_{\mathrm{RNA}} - \left( \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\mathrm{rxn\text{-}field}} - \frac{1}{2} \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\mathrm{rxn\text{-}field}} \right)_{\mathrm{solution}} \tag{4a}
$$

and

$$
\Delta G_{\mathrm{fixed|pair}}(i,j) = \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{prot}} \varphi_n^{\{\mathrm{fixed|pair}\}(j)} - \sum_{n}^{\mathrm{nucl}(i)} q_n^{\mathrm{unpr}} \varphi_n^{\{\mathrm{fixed|pair}\}(j)} \tag{4b}
$$

where $q_n^{\mathrm{prot}}$ and $q_n^{\mathrm{unpr}}$ refer to the partial charges for the protonated and unprotonated forms of the nucleotide i, and $\varphi_n^{\{\mathrm{fixed|pair}\}(j)}$ refer to potentials computed under the appropriate set of atoms in nucleotide j.

Since free energies within the LPB are additive, all of the terms in equation (2) need be calculated only once for a particular macromolecule. Thus, the PB equation does not need to be solved during every step of the MC procedure. However, as discussed below, additivity is lost if the NLPB is used and every term in the equation depends on the microstate involved. This would require that the NLPB be solved for every step in an MC procedure, which is not computationally feasible when many nucleotides are involved. In the next section, we describe our use of the LPB and NLPB equations. The section that follows introduces a method that allows us to calculate approximate nonlinear microstate energies that can be used in the context of the MC procedure.

Linear and nonlinear Poisson–Boltzmann equations

Electrostatic site potentials and reaction field potentials are obtained from finite difference solutions to the Poisson–Boltzmann equation:109

$$
\nabla \cdot \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) + \frac{4\pi e}{k_B T} \rho^{f}(\mathbf{r}) + F(\phi) = 0 \tag{5}
$$

where ϕ(r) denotes the electrostatic potential, ρf(r) denotes the distribution of partial atomic charges and ε(r) is the value of the dielectric constant for any point in space.54 F(ϕ) has the general form:

$$
F(\phi) \equiv \frac{4\pi}{k_B T} \sum_{i} c_i^{b} z_i \exp\left(-z_i \phi(\mathbf{r})\right) \tag{6}
$$

where the sum is taken over all mobile ion species, and $c_i^{b}$ and $z_i$ are the bulk concentration and electrical charge of each species. Where only monovalent salt appears in the solvent, F(ϕ) is rewritten as $-\varepsilon_0 \kappa^2 \sinh(\phi(\mathbf{r}))$, where $\kappa^2$ is $8\pi e^2 I/\varepsilon_0 k_B T$ and I is the ionic strength. When potentials are small, sinh(ϕ(r)) can be approximated simply as ϕ(r) and F(ϕ) is simplified to:

$$
F(\phi) \equiv -\varepsilon_0 \kappa^2 \phi(\mathbf{r}) \tag{7}
$$

This form for F(ϕ) yields the linear PB equation and has the important property that energetic contributions derived from it are linearly additive. Thus, the linear PB equation can be used to break up larger calculations into individual contributions to the electrostatic free energy, which can then be summed to yield total values, as described by equation (2). However, the drawback is that the linear approximation is valid only for molecules where the net charge is small and ions of different valence are all incorporated into a single ionic strength parameter I. RNA, however, bears a –1e charge for every unprotonated nucleotide in its structure, and electrostatic potentials can become very high for even moderately sized molecules.
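The size of the error introduced by replacing sinh(ϕ) with ϕ can be checked numerically. The short sketch below is only an illustration of that approximation, not part of the method described here. It assumes that ϕ is expressed in dimensionless units of kBT/e, as the exponent in equation (6) implies, and the function name is hypothetical.

```python
import math


def linearization_error(phi):
    """Relative error of replacing sinh(phi) by phi (phi in units of kBT/e)."""
    return (math.sinh(phi) - phi) / math.sinh(phi)


# Potentials from weak to strong, in units of kBT/e
for phi in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(
        f"phi = {phi:4.1f}  sinh(phi) = {math.sinh(phi):8.3f}  "
        f"relative error = {linearization_error(phi):6.1%}"
    )
```

The relative error stays below 1% at 0.1 kBT/e but exceeds 90% at 5 kBT/e, which illustrates why the linear form becomes unreliable near a highly charged polyanion such as RNA.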
Water is assigned a value of ε = 80 and a lower dielectric constant is generally used to represent the solute; the solvent-accessible molecular surface represents the boundary between these two dielectric regions. As discussed in previous work, a value of ε = 1 represents a solute with no electronic polarizability (an implicit assumption in most all-atom simulations).91 The value of 2 has been shown to account well for electronic polarizability in a static structure, whereas larger values such as 4 account in small part for conformational changes in the molecule that accompany changes in ionization state.91 Since our model for RNA keeps the nucleobase and backbone rigid, a value of 4 is consistent with work done in proteins, and was adopted for this work.41,60 The dielectric constant inside the molecular surface of the RNA is assigned this value. The ionic strength is assigned a value of zero at every point in the finite difference lattice that is inside this surface and within an ion-excluded region that extends 2 Å from the surface.

The nonlinear correction to microstate energy

When the NLPB is used, as is appropriate for highly charged molecules, the additive property of the linear equation is no longer valid. This is because the concentration of salt around the RNA depends on the charge state of each nucleotide; for example, there are clearly more positively charged counterions around the RNA when nucleotides are all negatively charged than when they are neutral. Thus, pairwise interactions between any two nucleotides depend on the ionization state of all other nucleotides. This leads to a major combinatorial problem that cannot be addressed without some type of approximation.
In earlier studies, this was addressed by introducing a correction factor for each pair to account for the nonlinearity.59 However, we choose the simpler assumption of introducing a correction factor for each charged state of the RNA, which is less expensive to calculate. Our approach is to use the LPB equation to obtain pairwise energies that do not depend on the charge state of other nucleotides and to correct these linear energies based on the net charge of the RNA. The difference in electrostatic free energy obtained from the NLPB and LPB is defined here as $\Delta G^{\mathrm{corr}}$, where the superscript corr denotes a correction term. Thus:

$$
\Delta G^{\mathrm{corr}} = \Delta G^{\mathrm{NLPB}} - \Delta G^{\mathrm{LPB}} \tag{8}
$$

where $\Delta G^{\mathrm{NLPB}}$ and $\Delta G^{\mathrm{LPB}}$ are the electrostatic free energies computed using the nonlinear and linear PB equations, respectively.54 We assume that $\Delta G^{\mathrm{corr}}$ can be approximated with a function that has a quadratic dependence on net charge. Specifically:

$$
\Delta G^{\mathrm{corr}}_m = a Z_m^{2} + b Z_m + c \tag{9}
$$

where a, b and c are coefficients that are appropriate for a particular conformation of a given macromolecule. $Z_m$ represents the number of nucleotides protonated in microstate m, and can be written as:

$$
Z_m = \sum_{i}^{N} x_i(m) \tag{10}
$$

where the sum is over all nucleotides in a particular microstate. We determine values for a, b and c for each molecule by running LPB and NLPB calculations on three different microstates that produce a particular net charge and, in this way, we are able to plot $\Delta G^{\mathrm{corr}}$ as a function of Z. Fitting these points to the polynomial of equation (9) yields values of a, b and c. (A plot of the nonlinear correction energy for the RNAs studied here appears in Supplementary Data Figure 1.) We now define the approximate free energy of a microstate, $\Delta G^{\mathrm{NLPB(apprx)}}_m$, as:

$$
\Delta G^{\mathrm{NLPB(apprx)}}_m = \Delta G^{\mathrm{LPB}}_m + \Delta G^{\mathrm{corr}}_m \tag{11}
$$

$\Delta G^{\mathrm{NLPB(apprx)}}_m$ is used in our MC procedure instead of $\Delta G^{\mathrm{LPB}}_m$. Note that the free energy defined by $\Delta G^{\mathrm{NLPB(apprx)}}_m$ is additive, but equation (11) accounts for nonlinear effects in an approximate way. The use of $\Delta G^{\mathrm{NLPB(apprx)}}_m$ yields more accurate agreement between computed and experimental pKas than $\Delta G^{\mathrm{LPB}}_m$ alone (Supplementary Data Figure 2). The difference is often on the order of +1–2 pKa units. All pKa calculations reported here were performed using the nonlinear correction, except where noted otherwise. A separate set of values, a, b and c, is computed for each NMR or crystal structure. The resulting nonlinear corrections are similar within each set of structures and salt conditions (see Supplementary Data Figure 1). Note that $\Delta G^{\mathrm{NLPB(apprx)}}_m$ is used only in the context of the MC procedure.

Individual contributions to the electrostatic free energy

The electrostatic free energy contribution due to desolvation is defined by equation (4a). We note here that values for ΔGself(i) obtained from the LPB and NLPB are nearly identical (data not shown). In order to obtain a measure of the electrostatic effects due to phosphate groups and to other bases, we have calculated electrostatic potentials at the N1 atom of each adenosine and the N3 atom of each cytidine. Although we recognize that the potential obtained from the NLPB is not additive, the terms we report are related directly to RNA structural features and this provides insight as to the source of the pKa shifts. Contributions due to phosphate groups, which include the atoms P, O1P, O2P, O5′ and O3′, are obtained by assuming these groups to be charged, while all other atoms in the RNA are kept neutral. Multiplying these potentials by +1e, to reflect a change in ionization state at the site of protonation, yields the contribution to the electrostatic free energies that is reported in Figure 3 and Table 4. In order to calculate the electrostatic potentials due to the bases, we keep the phosphate groups charged and calculate the differential potential when the atoms in the bases are assumed to be charged relative to when they are assumed to be neutral. This can be done for all the bases in RNA or for an individual nucleotide.

The determination of partial atomic charges and radii

The solution to the PB equations relies on a detailed atomic description of partial charges within the RNA along with its molecular surface. Since standard molecular mechanics force-fields do not provide partial charges for ionized forms of AMP and CMP, new partial atomic charges were calculated for these nucleotides. Our philosophy was to devise a simple way to generate partial charges that, when combined with appropriate radii, would be consistent with the experimental literature concerning the solvation energies of nucleobase derivatives. To do this, we used a philosophy similar to that used in the development of the AMBER atom-centered charges and a PARSE-like strategy for the selection of appropriate atomic radii.110,111 Atomic radii are used to describe the solvent-accessible molecular surface (and hence, the dielectric boundary) between solute and the solvent for calculations used here. The hydrogen radius was assigned a value of 1.10 Å. These and other atomic radii were chosen, in part, for their ability to reproduce trends of solvation in the four nucleobases. Consistent with PARSE radii for amino acids, Pauling's atomic radii were assigned to all heavy atoms. Thus, the atomic radius of phosphorus was assigned using its literature value of 1.90 Å.112 (See Supplementary Data Table 1). Partial charges were generated by fitting atom-centered charges to electrostatic potentials (ESP) derived ab initio using the B3LYP/6-31G* level of theory and using the program Gaussian 98 (gaussian.com).
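Returning briefly to the nonlinear correction of equations (8) to (11): because three microstates are used, the quadratic of equation (9) passes exactly through the three (Z, ΔGcorr) points and its coefficients can be written in closed form. The sketch below illustrates that arithmetic under stated assumptions. It is not the code used in this work, the function names are hypothetical, and the energies are invented numbers in kcal/mol.

```python
def fit_quadratic_correction(points):
    """Return (a, b, c) of dG_corr = a*Z**2 + b*Z + c through three points.

    points holds three (Z, dG_NLPB - dG_LPB) pairs with distinct Z values.
    """
    (z0, y0), (z1, y1), (z2, y2) = points
    d01 = (y1 - y0) / (z1 - z0)      # first divided differences
    d12 = (y2 - y1) / (z2 - z1)
    a = (d12 - d01) / (z2 - z0)      # second divided difference
    b = d01 - a * (z0 + z1)
    c = y0 - a * z0**2 - b * z0
    return a, b, c


def approx_nlpb_energy(dg_lpb_m, z_m, coeffs):
    """Equation (11): linear microstate energy plus the quadratic correction."""
    a, b, c = coeffs
    return dg_lpb_m + a * z_m**2 + b * z_m + c


# Hypothetical LPB and NLPB energies for microstates with Z = 0, 2 and 4
z_values = [0, 2, 4]
dg_lpb = [-100.0, -96.0, -90.0]
dg_nlpb = [-80.0, -77.2, -73.2]
points = [(z, nl - l) for z, l, nl in zip(z_values, dg_lpb, dg_nlpb)]

coeffs = fit_quadratic_correction(points)
print("a, b, c =", coeffs)
print("approximate NLPB energy at Z = 3:", approx_nlpb_energy(-93.0, 3, coeffs))
```

Once a, b and c are fixed for a structure, the correction costs a single polynomial evaluation per MC step, which is the practical reason for making it depend only on Z_m rather than on the full microstate.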
Nine calculations were performed: one for each of six ribonucleosides (A, A+, C, C+, G and U), and one for each of three conformations of dimethyl-phosphate (gauche-gauche, gauche-trans and trans-trans). The partial charges on ribose atoms C5′, H5′1, H5′2, C4′, H4′, O4′, C3′, H3′, C2′, O2′ and HO2 were made equivalent in all six ribonucleosides by averaging the corresponding partial charges for each atom. Excess charges were redistributed over the atoms C1′, H1′ and N1/9 (nitrogen involved in the glycosidic bond) to ensure the net charge per nucleotide was integral. A single set of partial charges was obtained for the phosphate atoms P, O1P, O2P, O3′ and O5′ by averaging the corresponding partial charges in the three conformers. The protons in each pair, H5′1/H5′2 (ribose), H21/H22 (guanosine), H41/H42 (cytidine) and H61/H62 (adenosine), were made equivalent by redistributing the partial charge evenly between the two protons. The overall redistribution of charge resulting from this procedure was very small. Partial charges for all remaining nucleobase atoms were not modified. United atoms were created for all RNA hydroxyl groups, O2′/HO2, O3T/H3T (3′ terminus) and O5T/H5T (5′ terminus), by summing the partial charge on the oxygen and hydrogen atoms and placing the sum at the coordinates of the oxygen atom. This procedure produced partial charges that were not significantly different from those of AMBER 94 or CHARMM 27. (See Supplementary Data Table 2; atom names and nucleotide structures are given in Supplementary Data Figure 3).
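The post-processing steps described above (averaging, equalizing proton pairs, forming united atoms and restoring an integral net charge) are simple enough to sketch in a few lines. The code below is an illustration only. The helper names are hypothetical, the charges are toy values rather than fitted ESP charges, and the excess charge is assumed to be split evenly over the three sink atoms, a detail the text does not specify.

```python
def average_charges(charge_sets):
    """Average per-atom charges over several calculations (dicts keyed by atom name)."""
    atoms = charge_sets[0].keys()
    return {a: sum(cs[a] for cs in charge_sets) / len(charge_sets) for a in atoms}


def equalize_pair(charges, atom_a, atom_b):
    """Give two equivalent protons the mean of their charges."""
    out = dict(charges)
    out[atom_a] = out[atom_b] = (charges[atom_a] + charges[atom_b]) / 2
    return out


def unite_hydroxyl(charges, oxygen, hydrogen):
    """Fold the hydroxyl hydrogen charge into the oxygen (united atom)."""
    out = dict(charges)
    out[oxygen] += out.pop(hydrogen)
    return out


def restore_integral_charge(charges, target, sink_atoms):
    """Spread the excess charge evenly over sink_atoms (assumed even split)."""
    excess = target - sum(charges.values())
    out = dict(charges)
    for atom in sink_atoms:
        out[atom] += excess / len(sink_atoms)
    return out


# Toy fragment with invented charges that sum to 0.97 instead of +1
toy = {"C1'": 0.20, "H1'": 0.10, "N9": -0.05, "O2'": -0.60,
       "HO2": 0.42, "H61": 0.43, "H62": 0.47}

step = equalize_pair(toy, "H61", "H62")
step = unite_hydroxyl(step, "O2'", "HO2")
final = restore_integral_charge(step, 1.0, ["C1'", "H1'", "N9"])
print(final, "net charge:", round(sum(final.values()), 6))
```

Each helper returns a new dictionary rather than modifying its input, so the intermediate charge sets remain available for comparison with the unmodified ESP charges.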
To validate the partial charges and atomic radii set, solvation free energies from gas to water were calculated by summing electrostatic and non-polar contributions to solvation111 (equations (1), (3), (4) therein) and the results were compared to the solvation free energy determined for 9-methyladenine; this quantity was derived originally in the work by Ferguson et al.113 using the experimentally measured heat of vaporization of 9-methyladenosine. Based on the comparison of calculated solvation free energies for 9-methyladenine for various hydrogen radii, the radius of 1.10 Å was chosen. The relative solubilities of the nucleobases have been determined on the basis of their ability to partition between water and chloroform as well as between water and cyclohexane, where in order of hydrophilicity, G > C > U > A.114,115 The calculated solvation free energy for 9-methyladenine is consistent with the experimental value and the remaining calculated solvation free energies are consistent with the hydrophilicity scale established by Wolfenden and coworkers (Supplementary Data Tables 3 and 4). Finally, we scale atomic radii in order to use them in PB calculations where the solute dielectric of RNA is set to a value greater than 1. In particular, we scale the atomic radius by 87% when working with ε = 4, the value used in the pKa calculations. Atomic radius scaling was used in calculations of solvation free energy by Sitkoff et al.111 The rationale is to maintain the same solvation free energy of individual nucleotides as calculated for ε = 1 when an alternate value for the dielectric is used. The scaling factor maintains the balance between solvation and pairwise energies involved in the calculation of pKa shifts. Tests of these parameters (atomic radii, internal dielectric) for pKa calculations revealed that the values chosen were quite reasonable, as shown in Results.
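As a small worked example of the radius scaling, the sketch below applies the 87% factor to the two radii quoted in the text. It assumes that "scale by 87%" means multiplying each radius by 0.87, and the function and variable names are hypothetical.

```python
RADIUS_SCALE_EPS4 = 0.87  # assumed multiplicative factor for a solute dielectric of 4


def scale_radii(radii, factor=RADIUS_SCALE_EPS4):
    """Return a new dict of atomic radii (in Å) multiplied by factor."""
    return {atom: radius * factor for atom, radius in radii.items()}


# Radii quoted in the text for epsilon = 1 (hydrogen and phosphorus)
radii_eps1 = {"H": 1.10, "P": 1.90}
radii_eps4 = scale_radii(radii_eps1)

for atom, radius in radii_eps4.items():
    print(f"{atom}: {radius:.3f} Å")
```

Under this reading, the hydrogen radius becomes about 0.957 Å and the phosphorus radius about 1.653 Å when the solute dielectric is set to 4.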
We emphasize that the choice of the scaling factor was obtained independently of any pKa calculation, and was not in any way chosen so as to fit experimental pKas.

Preparation of structures before calculation

Coordinates of RNA structures were obtained from the Protein Data Bank (PDB). The following structures were used in this work: 17ra (BPH),64 1ldz and 2ldz (LDZ),72 437d and 1l2x (BWYV-ψ),66,116 1kpy and 1kpz (PEMV-ψ),67 1cx0, 1drz and 1vc5 (HDVR),68,69 1m5k and 1m5v (hairpin ribozyme).70,71 Crystallographic water and all metal ions were removed from the structures and are not included in the calculations. NMR structures having multiple conformations were separated and treated individually. The topology and parameter files were modified for the X-PLOR program to handle the protonation of ionized nucleotides. Hydrogen atoms for all nucleotides, ionized or neutral, were added using the X-PLOR program holding heavy-atom positions fixed.117 The modified X-PLOR topology and parameter files are available upon request. Calculations were performed on the proton-added structures without further minimization. In the structures for BWYV-ψ, the 5′ triphosphate terminus was removed and replaced with a standard O5′ terminus. The product structure of the hairpin ribozyme contains 2′-3′-cyclic phosphate between A12 and G13 of the cleaved substrate strand. To obtain pKas of the ribozyme in the product conformation, partial charges were first determined for the 2′-3′-cyclic phosphate using the ESP protocol described above.

Source code and additional parameters

All the source code used in this work, including our modified version of MCCE, will be made available via the website†. In general, all parameters not otherwise discussed here are given in Supplementary Data Table 5.

Acknowledgements

We thank Donald Petrey for assistance with GRASP2, Li Xi for assistance with Gaussian98, and Kevin Keating for calculations of η-θ angles.
We are grateful to Lucy Forrest, Mickey Kosloff and Remo Rohs for many helpful comments in the writing of the manuscript. References 1. Izatt, R. M., Christensen, J. J. & Rytting, J. H. (1971). Sites and thermodynamic quantities associated with proton and metal ion interaction with ribonucleic acid, deoxyribonucleic acid, and their constituent bases, nucleosides, and nucleotides. Chem. Rev. 71, 439–481. 2. Saenger, W. (1984). Principles of Nucleic Acid Structure. Springer-Verlag, New York. 3. Gao, X. L. & Patel, D. J. (1987). NMR studies of A.C mismatches in DNA dodecanucleotides at acidic pH. Wobble A(anti).C(anti) pair formation. J. Biol. Chem. 262, 16973–16984. 4. Cai, Z. & Tinoco, I., Jr (1996). Solution structure of loop A from the hairpin ribozyme from tobacco ringspot virus satellite. Biochemistry, 35, 6026–6036. 5. Asensio, J. L., Lane, A. N., Dhesi, J., Bergqvist, S. & Brown, T. (1998). The contribution of cytosine protonation to the stability of parallel DNA triple helices. J. Mol. Biol. 275, 811–822. 6. Jang, S. B., Hung, L. W., Chi, Y. I., Holbrook, E. L., Carter, R. J. & Holbrook, S. R. (1998). Structure of an † http://wiki.c2b2.columbia.edu/honiglab _ public/ index.php/RNA 1493 Calculating pKas in RNA 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. RNA internal loop consisting of tandem C-A+ basepairs. Biochemistry, 37, 11726–11731. Durant, P. C. & Davis, D. R. (1999). Stabilization of the anticodon stem-loop of tRNALys,3 by an A+-C base-pair and by pseudouridine. J. Mol. Biol. 285, 115–131. Morse, S. E. & Draper, D. E. (1995). Purine-purine mismatches in RNA helices: evidence for protonated G.A pairs and next-nearest neighbor effects. Nucl. Acids Res. 23, 302–306. Ravindranathan, S., Butcher, S. E. & Feigon, J. (2000). Adenine protonation in domain B of the hairpin ribozyme. Biochemistry, 39, 16026–16032. Bink, H. H., Hellendoorn, K., van der Meulen, J. & Pleij, C. W. (2002). 
Protonation of non-Watson-Crick base-pairs and encapsidation of turnip yellow mosaic virus RNA. Proc. Natl Acad. Sci. USA, 99, 13465–13470. Blanchard, S. C. & Puglisi, J. D. (2001). Solution structure of the A loop of 23S ribosomal RNA. Proc. Natl Acad. Sci. USA, 98, 3720–3725. Bevilacqua, P. C. (2003). Mechanistic considerations for general acid-base catalysis by RNA: revisiting the mechanism of the hairpin ribozyme. Biochemistry, 42, 2259–2265. Bevilacqua, P. C., Brown, T. S., Nakano, S. & Yajima, R. (2004). Catalytic roles for proton transfer and protonation in ribozymes. Biopolymers, 73, 90–109. Nakano, S., Chadalavada, D. M. & Bevilacqua, P. C. (2000). General acid-base catalysis in the mechanism of a hepatitis delta virus ribozyme. Science, 287, 1493–1497. Oyelere, A. K., Kardon, J. R. & Strobel, S. A. (2002). pKa perturbation in genomic Hepatitis Delta Virus ribozyme catalysis evidenced by nucleotide analogue interference mapping. Biochemistry, 41, 3667–3675. Perrotta, A. T., Shih, I. & Been, M. D. (1999). Imidazole rescue of a cytosine mutation in a selfcleaving ribozyme. Science, 286, 123–126. Wadkins, T. S., Shih, I., Perrotta, A. T. & Been, M. D. (2001). A pH-sensitive RNA tertiary interaction affects self-cleavage activity of the HDV ribozymes in the absence of added divalent metal ion. J. Mol. Biol. 305, 1045–1055. Shih, I. H. & Been, M. D. (2001). Involvement of a cytosine side chain in proton transfer in the ratedetermining step of ribozyme self-cleavage. Proc. Natl Acad. Sci. USA, 98, 1489–1494. Das, S. R. & Piccirilli, J. A. (2005) General acid catalysis by the hepatitis delta virus ribozyme 1, 45–52. Kuzmin, Y. I., Da Costa, C. P., Cottrell, J. W. & Fedor, M. J. (2005). Role of an active site adenine in hairpin ribozyme catalysis. J. Mol. Biol. 349, 989–1010. Kuzmin, Y. I., Da Costa, C. P. & Fedor, M. J. (2004). Role of an active site guanine in hairpin ribozyme catalysis probed by exogenous nucleobase rescue. J. Mol. Biol. 340, 233–251. 
Lebruska, L. L., Kuzmine, Y. I. & Fedor, M. J. (2002). Rescue of an abasic hairpin ribozyme by cationic nucleobases: evidence for a novel mechanism of RNA catalysis. Chem. Biol. 9, 465–473. Ryder, S. P., Oyelere, A. K., Padilla, J. L., Klostermeier, D., Millar, D. P. & Strobel, S. A. (2001). Investigation of adenosine base ionization in the hairpin ribozyme by nucleotide analog interference mapping. RNA, 7, 1454–1463. Wilson, T. J., Ouellet, J., Zhao, Z. Y., Harusawa, S., Araki, L., Kurihara, T. & Lilley, D. M. (2006). 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. Nucleobase catalysis in the hairpin ribozyme. RNA, 12, 980–987. Lee, J. C. & Gutell, R. R. (2004). Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J. Mol. Biol. 344, 1225–1249. Xiong, L., Polacek, N., Sander, P., Bottger, E. C. & Mankin, A. (2001). pKa of adenine 2451 in the ribosomal peptidyl transferase center remains elusive. RNA, 7, 1365–1369. Muth, G. W., Chen, L., Kosek, A. B. & Strobel, S. A. (2001). pH-dependent conformational flexibility within the ribosomal peptidyl transferase center. RNA, 7, 1403–1415. Yang, A. S. & Honig, B. (1994). Structural origins of pH and ionic strength effects on protein stability. Acid denaturation of sperm whale apomyoglobin. J. Mol. Biol. 237, 602–614. Bullough, P. A., Hughson, F. M., Skehel, J. J. & Wiley, D. C. (1994). Structure of influenza haemagglutinin at the pH of membrane fusion. Nature, 371, 37–43. Frick, D. N., Rypma, R. S., Lam, A. M. & Frenz, C. M. (2004). Electrostatic analysis of the hepatitis C virus NS3 helicase reveals both active and allosteric site locations. Nucl. Acids Res. 32, 5519–5528. Ondrechen, M. J., Clifton, J. G. & Ringe, D. (2001). THEMATICS: a simple computational predictor of enzyme function from structure. Proc. Natl Acad. Sci. USA, 98, 12473–12478. Doudna, J. A. & Cech, T. R. (2002). The chemical repertoire of natural ribozymes. 
Nature, 418, 222–228. Fedor, M. J. & Williamson, J. R. (2005). The catalytic diversity of RNAs. Nature Rev. Mol. Cell. Biol. 6, 399–412. Demchuk, E. & Wade, R. C. (1996). Improving the continuum dielectric approach to calculating pKas of ionizable groups in proteins. J. Phys. Chem. 100, 17373–17387. Nielsen, J. E. & Vriend, G. (2001). Optimizing the hydrogen-bond network in Poisson-Boltzmann equation-based pKa calculations. Proteins: Struct. Funct. Genet. 43, 403–412. Mehler, E. L. & Guarnieri, F. (1999). A self-consistent, microenvironment modulated screened coulomb potential approximation to calculate pH-dependent electrostatic effects in proteins. Biophys. J. 77, 3–22. Bashford, D. & Karplus, M. (1990). pKas of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry, 29, 10219–10225. Antosiewicz, J., McCammon, J. A. & Gilson, M. K. (1994). Prediction of pH-dependent properties of proteins. J. Mol. Biol. 238, 415–436. Antosiewicz, J., McCammon, J. A. & Gilson, M. K. (1996). The determinants of pKas in proteins. Biochemistry, 35, 7819–7833. Yang, A. S. & Honig, B. (1993). On the pH dependence of protein stability. J. Mol. Biol. 231, 459–474. Yang, A. S., Gunner, M. R., Sampogna, R., Sharp, K. & Honig, B. (1993). On the calculation of pKas in proteins. Proteins: Struct. Funct. Genet. 15, 252–265. Li, H., Robertson, A. D. & Jensen, J. H. (2005). Very fast empirical prediction and rationalization of protein pKa values. Proteins: Struct. Funct. Genet. 61, 704–721. Petrov, A. S., Lamm, G. & Pack, G. R. (2004). The triplex-hairpin transition in cytosine-rich DNA. Biophys. J. 87, 3954–3973. Misra, V. K., Sharp, K. A., Friedman, R. A. & Honig, 1494 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. B. (1994). Salt effects on ligand-DNA binding. Minor groove binding antibiotics. J. Mol. Biol. 238, 245–263. Misra, V. K., Hecht, J. L., Sharp, K. A., Friedman, R. A. & Honig, B. (1994). 
Salt effects on protein-DNA interactions. The lambda cI repressor and EcoRI endonuclease. J. Mol. Biol. 238, 264–280. Ben-Tal, N., Honig, B., Peitzsch, R. M., Denisov, G. & McLaughlin, S. (1996). Binding of small basic peptides to membranes containing acidic lipids: theoretical models and experimental results. Biophys. J. 71, 561–575. Hecht, J. L., Honig, B., Shin, Y. K. & Hubbell, W. L. (1995). Electrostatic potentials near-the-surface of DNA - Comparing Theory and Experiment. J. Phys. Chem. 99, 7782–7786. Misra, V. K. & Honig, B. (1995). On the magnitude of the electrostatic contribution to ligand-DNA interactions. Proc. Natl Acad. Sci. USA, 92, 4691–4695. Misra, V. K. & Draper, D. E. (1999). The interpretation of Mg2+ binding isotherms for nucleic acids using Poisson–Boltzmann theory. J. Mol. Biol. 294, 1135–1147. Misra, V. K. & Draper, D. E. (2000). Mg2+ binding to tRNA revisited: the nonlinear Poisson–Boltzmann model. J. Mol. Biol. 299, 813–825. Misra, V. K. & Draper, D. E. (2001). A thermodynamic framework for Mg2+ binding to RNA. Proc. Natl Acad. Sci. USA, 98, 12456–12461. Misra, V. K. & Draper, D. E. (2002). The linkage between magnesium binding and RNA folding. J. Mol. Biol. 317, 507–521. Misra, V. K., Shiman, R. & Draper, D. E. (2003). A thermodynamic framework for the magnesium-dependent folding of RNA. Biopolymers, 69, 118–136. Sharp, K. A. & Honig, B. (1990). Calculating total electrostatic energies with the nonlinear PoissonBoltzmann equation. J. Phys. Chem. 94, 7684–7692. Murthy, C. S., Bacquet, R. J. & Rossky, P. J. (1985). Ionic distributions near poly-electrolytes–a comparison of theoretical approaches. J. Phys. Chem. 89, 701–710. Bacquet, R. & Rossky, P. J. (1984). Ionic atmosphere of rodlike poly-electrolytes–a hypernetted chain study. J. Phys. Chem. 88, 2660–2669. Svensson, B., Jonsson, B. & Woodward, C. E. (1990). Monte-Carlo simulations of an electric double-layer. J. Phys. Chem. 94, 2105–2113. Guldbrand, L., Jonsson, B., Wennerstrom, H. 
& Linse, P. (1984). Electrical double-layer forces—a MonteCarlo study. J. Chem. Phys. 80, 2221–2228. Vorobjev, Y. N., Scheraga, H. A., Hitz, B. & Honig, B. (1994). Theoretical modeling of electrostatic effects of titratable side-chain groups on protein conformation in a polar ionic solution. 1. Potential of mean force between charged lysine residues and titration of poly (L-lysine) in 95-percent methanol solution. J. Phys. Chem. 98, 10940–10948. Alexov, E. G. & Gunner, M. R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys. J. 72, 2075–2093. Alexov, E. G. & Gunner, M. R. (1999). Calculated protein and proton motions coupled to electron transfer: electron transfer from QA- to QB in bacterial photosynthetic reaction centers. Biochemistry, 38, 8253–8270. Gunner, M. R. & Alexov, E. (2000). A pragmatic approach to structure based calculation of coupled proton and electron transfer in proteins. Biochim. Biophys. Acta, 1458, 63–87. Calculating pKas in RNA 63. Forrest, L. R. & Honig, B. (2005). An assessment of the accuracy of methods for predicting hydrogen positions in protein structures. Proteins: Struct. Funct. Genet. 61, 296–309. 64. Smith, J. S. & Nikonowicz, E. P. (1998). NMR structure and dynamics of an RNA motif common to the spliceosome branch-point helix and the RNAbinding site for phage GA coat protein. Biochemistry, 37, 13486–13498. 65. Legault, P. & Pardi, A. (1997). Unusual dynamics and pKa shift at the active site of a lead-dependent ribozyme. J. Am. Chem. Soc. 119, 6621–6628. 66. Su, L., Chen, L., Egli, M., Berger, J. M. & Rich, A. (1999). Minor groove RNA triplex in the crystal structure of a ribosomal frameshifting viral pseudoknot. Nature Struct. Biol. 6, 285–292. 67. Nixon, P. L., Rangan, A., Kim, Y. G., Rich, A., Hoffman, D. W., Hennig, M. & Giedroc, D. P. (2002). Solution structure of a luteoviral P1-P2 frameshifting mRNA pseudoknot. J. Mol. Biol. 322, 621–633. 68. 
Ferre-D'Amare, A. R., Zhou, K. & Doudna, J. A. (1998). Crystal structure of a hepatitis delta virus ribozyme. Nature, 395, 567–574. 69. Ke, A., Zhou, K., Ding, F., Cate, J. H. & Doudna, J. A. (2004). A conformational switch controls hepatitis delta virus ribozyme catalysis. Nature, 429, 201–205. 70. Rupert, P. B. & Ferre-D'Amare, A. R. (2001). Crystal structure of a hairpin ribozyme-inhibitor complex with implications for catalysis. Nature, 410, 780–786. 71. Rupert, P. B., Massey, A. P., Sigurdsson, S. T. & FerreD'Amare, A. R. (2002). Transition state stabilization by a catalytic RNA. Science, 298, 1421–1424. 72. Hoogstraten, C. G., Legault, P. & Pardi, A. (1998). NMR solution structure of the lead-dependent ribozyme: evidence for dynamics in RNA catalysis. J. Mol. Biol. 284, 337–350. 73. Legault, P., Hoogstraten, C. G., Metlitzky, E. & Pardi, A. (1998). Order, dynamics and metal-binding in the lead-dependent ribozyme. J. Mol. Biol. 284, 325–335. 74. Nixon, P. L., Cornish, P. V., Suram, S. V. & Giedroc, D. P. (2002). Thermodynamic analysis of conserved loop-stem interactions in P1-P2 frameshifting RNA pseudoknots from plant Luteoviridae. Biochemistry, 41, 10665–10674. 75. Nixon, P. L. & Giedroc, D. P. (2000). Energetics of a strongly pH-dependent RNA tertiary structure in a frameshifting pseudoknot. J. Mol. Biol. 296, 659–671. 76. Moody, E. M., Lecomte, J. T. & Bevilacqua, P. C. (2005). Linkage between proton binding and folding in RNA: a thermodynamic framework and its experimental application for investigating pKa shifting. RNA, 11, 157–172. 77. Nakano, S., Proctor, D. J. & Bevilacqua, P. C. (2001). Mechanistic characterization of the HDV genomic ribozyme: assessing the catalytic and structural contributions of divalent metal ions within a multichannel reaction mechanism. Biochemistry, 40, 12022–12038. 78. Bevilacqua, P. C., Brown, T. S., Chadalavada, D., Lecomte, J., Moody, E. & Nakano, S. I. (2005). 
Linkage between proton binding and folding in RNA: implications for RNA catalysis. Biochem. Soc. Trans. 33, 466–470. 79. Shih, I. H. & Been, M. D. (2002). Catalytic strategies of the hepatitis delta virus ribozymes. Annu. Rev. Biochem. 71, 887–917. 80. Perrotta, A. T. & Been, M. D. (2006). HDV ribozyme activity in monovalent cations. Biochemistry, 45, 11357–11365. 1495 Calculating pKas in RNA 81. Perrotta, A. T., Wadkins, T. S. & Been, M. D. (2006). Chemical rescue, multiple ionizable groups, and general acid-base catalysis in the HDV genomic ribozyme. RNA, 12, 1282–1291. 82. Kumar, P. K., Suh, Y. A., Miyashiro, H., Nishikawa, F., Kawakami, J., Taira, K. & Nishikawa, S. (1992). Random mutations to evaluate the role of bases at two important single-stranded regions of genomic HDV ribozyme. Nucl. Acids Res. 20, 3919–3924. 83. Belinsky, M. G., Britton, E. & Dinter-Gottlieb, G. (1993). Modification interference analysis of a selfcleaving RNA from hepatitis delta virus. FASEB J. 7, 130–136. 84. Suh, Y. A., Kumar, P. K., Kawakami, J., Nishikawa, F., Taira, K. & Nishikawa, S. (1993). Systematic substitution of individual bases in two important singlestranded regions of the HDV ribozyme for evaluation of the role of specific bases. FEBS Letters, 326, 158–162. 85. Tanner, N. K., Schaff, S., Thill, G., Petit-Koskas, E., Crain-Denoyelle, A. M. & Westhof, E. (1994). A threedimensional model of hepatitis delta virus ribozyme based on biochemical and mutational analyses. Curr. Biol. 4, 488–498. 86. Nesbitt, S. M., Erlacher, H. A. & Fedor, M. J. (1999). The internal equilibrium of the hairpin ribozyme: temperature, ion and pH effects. J. Mol. Biol. 286, 1009–1024. 87. Grasby, J. A., Mersmann, K., Singh, M. & Gait, M. J. (1995). Purine functional groups in essential residues of the hairpin ribozyme required for catalytic cleavage of RNA. Biochemistry, 34, 4068–4076. 88. Ryder, S. P. & Strobel, S. A. (1999). 
Nucleotide analog interference mapping of the hairpin ribozyme: implications for secondary and tertiary structure formation. J. Mol. Biol. 291, 295–311. 89. Salter, J., Krucinska, J., Alam, S., Grum-Tokars, V. & Wedekind, J. E. (2006). Water in the active site of an all-RNA hairpin ribozyme and effects of Gua8 base variants on the geometry of phosphoryl transfer. Biochemistry, 45, 686–700. 90. Wadley, L. M. & Pyle, A. M. (2004). The identification of novel RNA structural motifs using COMPADRES: an automated approach to structural discovery. Nucl. Acids Res. 32, 6650–6659. 91. Gilson, M. K. & Honig, B. H. (1986). The dielectric constant of a folded protein. Biopolymers, 25, 2097–2119. 92. Reiter, N. J., Blad, H., Abildgaard, F. & Butcher, S. E. (2004). Dynamics in the U6 RNA intramolecular stem-loop: a base flipping conformational change. Biochemistry, 43, 13739–13747. 93. Misra, V. K. & Draper, D. E. (1998). On the role of magnesium ions in RNA stability. Biopolymers, 48, 113–135. 94. Draper, D. E., Grilley, D. & Soto, A. M. (2005). Ions and RNA folding. Annu Rev Biophys. Biomol. Struct. 34, 221–243. 95. Chin, K., Sharp, K. A., Honig, B. & Pyle, A. M. (1999). Calculating the electrostatic properties of RNA provides new insights into molecular interactions and function. Nature Struct. Biol. 6, 1055–1061. 96. Leontis, N. B., Stombaugh, J. & Westhof, E. (2002). The non-Watson-Crick base-pairs and their associated isostericity matrices. Nucl. Acids Res. 30, 3497–3531. 97. Leontis, N. B. & Westhof, E. (2001). Geometric nomenclature and classification of RNA base-pairs. RNA, 7, 499–512. 98. Lemieux, S. & Major, F. (2002). RNA canonical and 99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. non-canonical base-pairing types: a recognition method and complete repertoire. Nucl. Acids Res. 30, 4250–4263. Walberer, B. J., Cheng, A. C. & Frankel, A. D. (2003). 
Structural diversity and isomorphism of hydrogen-bonded base interactions in nucleic acids. J. Mol. Biol. 327, 767–780.
100. Hunter, W. N., Brown, T., Anand, N. N. & Kennard, O. (1986). Structure of an adenine-cytosine base-pair in DNA and its implications for mismatch repair. Nature, 320, 552–555.
101. Flinders, J. & Dieckmann, T. (2001). A pH controlled conformational switch in the cleavage site of the VS ribozyme substrate RNA. J. Mol. Biol. 308, 665–679.
102. Freier, S. M., Kierzek, R., Jaeger, J. A., Sugimoto, N., Caruthers, M. H., Neilson, T. & Turner, D. H. (1986). Improved free-energy parameters for predictions of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373–9377.
103. Yildirim, I. & Turner, D. H. (2005). RNA challenges for computational chemists. Biochemistry, 44, 13225–13234.
104. Moody, E. M., Brown, T. S. & Bevilacqua, P. C. (2004). Simple method for determining nucleobase pKa values by indirect labeling and demonstration of a pKa of neutrality in dsDNA. J. Am. Chem. Soc. 126, 10200–10201.
105. Murray, J. B., Seyhan, A. A., Walter, N. G., Burke, J. M. & Scott, W. G. (1998). The hammerhead, hairpin and VS ribozymes are catalytically proficient in monovalent cations alone. Chem. Biol. 5, 587–595.
106. Fedor, M. J. (2000). Structure and function of the hairpin ribozyme. J. Mol. Biol. 297, 269–291.
107. Alexov, E. (2003). Role of the protein side-chain fluctuations on the strength of pair-wise electrostatic interactions: comparing experimental with computed pKas. Proteins: Struct. Funct. Genet. 50, 94–103.
108. Gilson, M. K. & Honig, B. (1988). Calculation of the total electrostatic energy of a macromolecular system: solvation energies, binding energies, and conformational analysis. Proteins: Struct. Funct. Genet. 4, 7–18.
109. Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E., Chiabrera, A. & Honig, B. (2002).
Rapid grid-based construction of the molecular surface and the use of induced surface charge to calculate reaction field energies: applications to the molecular systems and geometric objects. J. Comput. Chem. 23, 128–137.
110. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M. et al. (1995). A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179–5197.
111. Sitkoff, D., Sharp, K. A. & Honig, B. (1994). Correlating solvation free energies and surface tensions of hydrocarbon solutes. Biophys. Chem. 51, 397–403; discussion 404-399.
112. Pauling, L. (1960). The Nature of the Chemical Bond, 3rd edit. Cornell University Press.
113. Ferguson, D. M., Radmer, R. J. & Kollman, P. A. (1991). Determination of the relative binding free energies of peptide inhibitors to the HIV-1 protease. J. Med. Chem. 34, 2654–2659.
114. Cullis, P. M. & Wolfenden, R. (1981). Affinities of nucleic acid bases for solvent water. Biochemistry, 20, 3024–3028.
115. Shih, P., Pedersen, L. G., Gibbs, P. R. & Wolfenden, R. (1998). Hydrophobicities of the nucleic acid bases: distribution coefficients from water to cyclohexane. J. Mol. Biol. 280, 421–430.
116. Egli, M., Minasov, G., Su, L. & Rich, A. (2002). Metal ions and flexibility in a viral RNA pseudoknot at atomic resolution. Proc. Natl Acad. Sci. USA, 99, 4302–4307.
117. Brünger, A. T. (1992). X-PLOR Version 3.1. A System for X-ray Crystallography and NMR. Yale University Press, New Haven, CT.

Edited by D. E. Draper

(Received 26 July 2006; received in revised form 29 November 2006; accepted 1 December 2006)
Available online 6 December 2006