Gen. Physiol. Biophys. (1993), 12, 401—419
Geometries and Energies of Watson-Crick Base Pairs
in Oligonucleotide Crystal Structures
J. JURSA and J. KYPR
Institute of Biophysics, Czech Academy of Sciences,
Královopolská 135, 61265 Brno, Czech Republic
Abstract. We analyzed geometries and energies of 469 GC and 224 AT pairs occurring in 11 A-, 54 B- and 12 Z-DNA crystal structures. The most frequent hydrogen bond length conforms to the canonical value but distributions of the hydrogen bond lengths are unexpectedly wide. The average GC pair is rather compressed and opened into the double helix minor groove while an average AT pair has canonical dimensions. The extreme base pair geometrical parameters are the following: buckle −26° and 39°, propeller −33° and 26°, opening −23° and 19°, shear −1.4 Å and 2.1 Å, stretch −0.7 Å and 0.4 Å, and stagger −2.3 Å and 1.8 Å. The analyzed set contains complementary bases apparently bound by unusual bifurcated hydrogen bonds. The unusual base pairing geometries are of two kinds. In the first case, an extreme value of one parameter is compensated for by other geometrical parameters so that the energy of the resulting geometry is acceptable. However, there are also examples when the compensatory effect is missing and then the base pairing instability is dramatic.
The most extensive base pair deformations occur in dodecamers, the d(GGATGGGAG) nonamer, the r(UUAUAUAUAUAUAA) 14-mer, and the d(ICCGG) tetramer whose common feature is a low structure resolution. However, very unstable base pairs are also present in the decamer d(CCAACGTTGG) and hexamer d(CGTACG) whose structures were solved at a relatively high resolution. As the deformations probably originate from crystal packing forces and/or data and refinement errors, we recommend omitting these structures from studies of DNA sequence-structure relationships. The remaining hexamers, octamers and decamers, whose list and Watson-Crick base pair characteristics are given in the article, are suitable for this purpose because they exhibit no prohibitive energy deviations from the canonical hydrogen bonding properties.
Key words: Watson-Crick base pairs – Base pair deformations – DNA – Crystal structures
Introduction
DNA is a polymorphic molecule whose conformational properties are controlled by the nucleotide sequences it contains. The only available method capable of visualizing DNA conformation at near-atomic resolution is the X-ray diffraction analysis of short oligonucleotide single crystals. This method has already provided several dozen independent oligonucleotide structures whose coordinates are deposited in the Brookhaven or Cambridge databases (Bernstein et al. 1977, Abola et al. 1987). Extrapolating from the history of protein crystallography, empirical analysis of the oligonucleotide crystal structures may provide a wealth of information regarding rules governing DNA architecture. However, this approach can only be successful if the structures are not biased by the refinement procedures and if they do not contain large inaccuracies or errors due to low resolution of the experimental data. The oligonucleotide crystal structures are also influenced by the crystal packing forces, which should be identified and separated from the effects that govern DNA conformation in biologically relevant environments. Unfortunately, the crystal packing effects and the data and refinement errors are not rare and, which is even worse, they influence the oligonucleotide crystal structures in variable and as yet unpredictable ways (Dickerson et al. 1991; Sponer and Kypr 1993a). Here we analyze a large set of DNA crystal structures regarding the geometries and energetics of their Watson-Crick base pairs. This choice is motivated by the relative simplicity of the base pair geometry and energy analysis and by our previous experience with the empirical potential studies of free Watson-Crick base pairs (Jursa and Kypr 1991), suggesting that unrealistic perturbations of hydrogen bonding between the complementary bases are an indicator of DNA structures which are significantly influenced by factors not occurring in solution.
Materials and Methods
We use an IBM personal computer containing a mathematical coprocessor and software called "DNA Modeller" (Jursa 1994). The software reads the atomic coordinates of bases, analyzes their mutual positions and calculates the hydrogen bond lengths and energies using the AMBER potential (Weiner et al. 1986) for non-bonded interactions. The parameters describing mutual positions of bases are defined according to the Cambridge Convention (Dickerson 1989). Generation or analysis of deformed base pairs is always started from the canonical base pair geometries (Saenger 1984). The right-handed orthonormal coordinate system is connected to the base pair in the following manner. The y-axis goes through the C6 and C8 atoms of pyrimidine and purine, respectively, with its positive direction pointing to the left when the base pair is viewed from the minor groove. The x-axis is perpendicular to it, lies in the base pair plane and divides the C6-C8 distance into halves. The positive direction of this axis points to the major groove. The z-axis is perpendicular to the xy-plane and its positive direction is defined according to the right-hand rule. Further coordinate systems are connected with each base in the pair. The coordinate systems of the bases coincide with the coordinate system of the corresponding base pair as long as all the six base pair parameters are zero.
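The construction of the base pair coordinate system can be illustrated by a minimal sketch. It is not taken from DNA Modeller: the function name, the choice of the positive y direction (from purine C8 towards pyrimidine C6) and the `major_hint` argument (any rough vector pointing to the major groove side) are assumptions made only for illustration.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return [c / norm for c in a]

def base_pair_frame(c6_pyrimidine, c8_purine, major_hint):
    """Return (origin, x, y, z) of a right-handed base pair frame.

    Assumptions (hypothetical, for illustration only):
    - the y-axis runs from the purine C8 to the pyrimidine C6;
    - major_hint is any vector pointing roughly to the major groove.
    """
    # The x-axis halves the C6-C8 distance, so the origin is the midpoint.
    origin = [(c6_pyrimidine[i] + c8_purine[i]) / 2.0 for i in range(3)]
    y = _unit(_sub(c6_pyrimidine, c8_purine))
    # Remove the y-component of the hint to get x perpendicular to y.
    projection = _dot(major_hint, y)
    x = _unit([major_hint[i] - projection * y[i] for i in range(3)])
    # z completes the right-handed system (x cross y = z).
    z = _cross(x, y)
    return origin, x, y, z
```

The sign of the y-axis depends on which base is on the left when the pair is viewed from the minor groove, so a real implementation would have to fix it from the strand assignment rather than from the purine/pyrimidine identity assumed here.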
The analysis of base pairs occurring in DNA crystal structures starts with a replacement of the bases by their canonical equivalents including the hydrogen atoms. Because
all six pair parameters are defined as symmetrical deviations of bases from their reference
positions in the pair, the coordinate system of the base pair can easily be generated from
the coordinate systems of the bases. First the bases are shifted such that their coordinate
system origins coincide with the coordinate origin of the pair. Now we can calculate base
pair opening (its half) by bringing the y-axes of the bases into the yz-plane of the pair
coordinate system due to a rotation of each base around the z-axis of the pair. Similarly the rotations around the x- and y-axes of the pair define the buckle and propeller,
respectively. The order of rotations is essential because they are not commutative. All
conformational angles are defined by the right-hand rule. The signs of buckle, propeller
and opening are given by the sign of the corresponding rotation of the left-hand base in
the pair when DNA is viewed from its minor groove side. Similarly the positive signs of
shear, stretch and stagger are given by the shift of the left-hand base from its regular
position in the positive directions of the x-, y- and z-axis, respectively. The order of
the geometrical operations is reversed (i.e. propeller, buckle, opening and then the three
translations) when a base pair is built up. Hydrogen bond lengths (distances between the
participating non-hydrogen atoms) were directly calculated from the atomic coordinates.
Then the crystal base geometries were replaced by their regular geometries, including
the hydrogen atoms (Saenger 1984), to calculate energies and the parameters describing
mutual positions of bases in the crystal structures. The DNA crystal structure data have
been taken from the Brookhaven database (Bernstein et al. 1977, Abola et al. 1987). We
use the database designations of the particular DNA fragment structures.
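The statement that the order of rotations is essential can be checked with a short sketch. It only demonstrates that a rotation around the x-axis (buckle) followed by a rotation around the y-axis (propeller) differs from the reverse order; the helper names are hypothetical and the symmetric half-rotations of the two bases used in the actual procedure are not reproduced.

```python
import math

def rot_x(deg):
    """Rotation matrix around the x-axis (right-hand rule)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(deg):
    """Rotation matrix around the y-axis (right-hand rule)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def compare_orders(buckle_deg, propeller_deg, vector):
    """Apply the two rotations to a vector in both possible orders."""
    # x-rotation first, then y-rotation
    x_then_y = mat_vec(mat_mul(rot_y(propeller_deg), rot_x(buckle_deg)), vector)
    # y-rotation first, then x-rotation
    y_then_x = mat_vec(mat_mul(rot_x(buckle_deg), rot_y(propeller_deg)), vector)
    return x_then_y, y_then_x
```

For two rotations of 20° applied to the base normal (0, 0, 1), the two orders give end points that differ by about 0.02 in the x and y components, which is why the sequence of operations has to be fixed and reversed consistently when a base pair is analyzed and rebuilt.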
It is not realistic to calculate interatomic energies for distances shorter than 1 Å using the empirical potentials. Therefore we add a penalty of 1000 kcal/mol to the interaction energy for each interatomic distance shorter than 1 Å and take "energies" higher than 1000 kcal/mol, in this article, only as an indication of the number of atom pairs in severely short contacts.
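The penalty scheme can be sketched as follows. This is an illustration under stated assumptions, not the DNA Modeller code: the function name and input format are hypothetical, and it is assumed that the penalty replaces the empirical term of a clashing atom pair rather than being added on top of it.

```python
CLASH_CUTOFF = 1.0      # angstroms, from the text
CLASH_PENALTY = 1000.0  # kcal/mol, from the text

def penalized_energy(contacts):
    """Sum pair energies, replacing severe clashes by a fixed penalty.

    contacts: iterable of (distance_in_angstroms, energy_in_kcal_per_mol)
    for all atom pairs of the two bases (hypothetical input format).
    Returns (total_energy, number_of_clashes).
    """
    total = 0.0
    clashes = 0
    for distance, energy in contacts:
        if distance < CLASH_CUTOFF:
            # The empirical potential is not meaningful here; use the penalty.
            total += CLASH_PENALTY
            clashes += 1
        else:
            total += energy
    return total, clashes
```

Because the remaining attractive terms can pull the total slightly below a multiple of 1000 kcal/mol, the sketch returns the clash count explicitly instead of inferring it from the total.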
Results
This work analyzes the published atomic coordinates of the bases which form 469 GC and 224 AT Watson-Crick pairs in 11 A-, 54 B- and 12 Z-DNA crystal structures. The structures solved at low resolution do not provide sufficiently precise structural information but nevertheless the most frequent hydrogen bond lengths are the same (2.9 Å) as in free pairs giving highly resolved structures (Saenger 1984). However, the deviations from the most frequent value are very large in some cases. A distribution of the hydrogen bond lengths shorter than 3 Å is given in Fig. 1 to demonstrate the presence of unrealistically short (< 2.6 Å) hydrogen bonds in some pairs. The base-pairing energies are given in Fig. 2 as dependent on propeller. The energies of most base pairs lie close to the optima but there are some, mostly GC, pairs whose energies are tens of kcal/mol above the optimum. Such instabilities can hardly be compensated for by other forces operating in the double helix. In the following we will identify the shortest hydrogen bonds and discuss them in the context of the corresponding base pair geometries.

Figure 1. Histograms of the hydrogen bond lengths shorter than 3.0 Å (from top to bottom) in the major groove, center, minor groove and the average for the GC (open) and AT (closed) pairs occurring in the oligonucleotide crystal structures deposited in the Brookhaven Protein Data Bank. [Plot not reproduced; horizontal axis: hydrogen bond length / Å, vertical axis: %.]
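The screening for unrealistically short hydrogen bonds can be sketched with a few values taken from Table 1. The 2.6 Å limit is the one quoted in the text; the data layout and the function names are hypothetical.

```python
SHORT_LIMIT = 2.6  # angstroms; bonds shorter than this are called unrealistic

# Hydrogen bond lengths (angstroms) of three base pairs listed in Table 1.
PAIRS = {
    ("PDB1BDND", "C:G", 3): {"major": 2.0, "central": 2.4, "minor": 2.7},
    ("PDB1BDND", "G:C", 10): {"major": 2.1, "central": 2.2, "minor": 2.2},
    ("PDB1D28", "G:C", 2): {"major": 2.5, "central": 3.1, "minor": 3.3},
}

def short_bonds(bonds, limit=SHORT_LIMIT):
    """Return the sorted names of hydrogen bonds shorter than the limit."""
    return sorted(name for name, length in bonds.items() if length < limit)

def mean_length(bonds):
    """Average hydrogen bond length of one base pair."""
    return sum(bonds.values()) / len(bonds)

def report(pairs=PAIRS):
    """Map each base pair to (list of short bonds, mean bond length)."""
    return {key: (short_bonds(bonds), mean_length(bonds))
            for key, bonds in pairs.items()}
```

The tabulated lengths are already rounded to 0.1 Å, so averages recomputed from them may differ from the tabulated averages in the last digit.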
Table 1 summarizes the shortest hydrogen bonds in the major groove, central and minor groove positions of the Watson-Crick base pairs in the DNA crystal structures. The shortest major groove hydrogen bond occurs in the CG pair number 3 in the PDB1BDND dodecamer (CGCAAAAATGCG, complementary strand CGCATTTTTGCG). The base pair is almost planar but heavily compressed in the pair long axis direction (Sy = −0.6 Å). In addition, the major groove hydrogen bond is further compressed by the base pair opening of −10°. A similar situation is observed with the GC base pair number 10 in the same structure, which contains the second shortest major groove hydrogen bond between the complementary nucleic acid bases in the database. In the AT pair number 6 of this structure, the compression in the direction of the long axis is partially compensated for by the propeller (−17°) and stagger (0.6 Å) and so its instability is much lower than in the previous two examples but still very high (see Table 1). In the same crystal, there is a dodecamer in two orientations (pointing "down" and "up", which is indicated by the last character D or U added to the PDB name). It is interesting that the "Up" conformer contains no severe clashes between the complementary bases while the "Down" conformer does (Fig. 3).

Figure 2. Summary of energies of the (top) GC and (bottom) AT pairs in the DNA crystal structures (the base pairs having energies higher than 50 kcal/mol are not shown). The base amino groups lie in the base planes. The curves indicate energies of the optimized free base pairs (Jursa and Kypr 1991). [Plots not reproduced; horizontal axis: propeller / deg.]
Table 1. The shortest hydrogen bonds (top) on the major groove side, (middle) in the center and (bottom) on the minor groove side of the base pairs in the indicated DNA crystal structures. Hydrogen bond lengths are in angstroms; a dash marks the absent minor groove bond of AT pairs.

Database   Sequence  Sequence            Base pair and   Major    Central  Minor    Average
code       length                        its position    groove            groove

Shortest major groove hydrogen bonds
PDB1BDND   12        CGCAAAAATGCG        C:G 3           2.0      2.4      2.7      2.4
PDB1BDND   12        CGCAAAAATGCG        G:C 10          2.1      2.2      2.2      2.2
PDB1D28    12        CGTGAATTCACG        T:A 3           2.4      2.8      -        2.6
PDB1D29    12        CGTGAATTCACG        G:C 12          2.4      2.6      2.9      2.7
PDB1D30    12        CGCGAATTCGCG*       G:C 4           2.5      2.8      2.8      2.7
PDB1D29    12        CGTGAATTCACG        T:A 3           2.5      2.4      -        2.4
PDB5DNB    10        CCAACGTTGG          G:C 6           2.5      2.6      2.5      2.5
PDB5DNB    10        CCAACGTTGG          C:G 5           2.5      2.5      2.5      2.5
PDB1D28    12        CGTGAATTCACG        G:C 2           2.5      3.1      3.3      2.9
PDB1D29    12        CGTGAATTCACG        G:C 2           2.5      2.8      3.3      2.8
PDB1DN6    9         GGATGGGAG           A:T 8           2.5      3.1      -        2.8
PDB1BDND   12        CGCAAAAATGCG        A:T 6           2.5      2.2      -        2.4
PDB1D29    12        CGTGAATTCACG        C:G 1           2.5      2.9      3.1      2.8
PDB7BNA    12        CGCGAATTCGCG        G:C 4           2.5      2.7      2.7      2.6
PDB1DN6    9         GGATGGGAG           G:C 7           2.5      2.8      3.1      2.8
PDB3BNA    12        CGCGAATT(Br)CGCG    C:G 3           2.5      2.9      3.0      2.8

Shortest central hydrogen bonds
PDB1BDND   12        CGCAAAAATGCG        A:T 6           2.5      2.2      -        2.4
PDB1BDND   12        CGCAAAAATGCG        G:C 10          2.1      2.2      2.2      2.2
PDB1BDND   12        CGCAAAAATGCG        C:G 3           2.0      2.4      2.7      2.4
PDB1D29    12        CGTGAATTCACG        T:A 3           2.5      2.4      -        2.4
PDB1D28    12        CGTGAATTCACG        T:A 8           2.6      2.4      -        2.5
PDB1D29    12        CGTGAATTCACG        T:A 8           2.9      2.5      -        2.7
PDB1D22    6         CGTACG              A:T 4           2.7      2.5      -        2.6
PDB1D29    12        CGTGAATTCACG        A:T 5           3.1      2.5      -        2.8
PDB1D29    12        CGTGAATTCACG        G:C 4           2.7      2.5      2.5      2.6
PDB5DNB    10        CCAACGTTGG          C:G 5           2.5      2.5      2.5      2.5

Shortest minor groove hydrogen bonds
PDB1BDND   12        CGCAAAAATGCG        G:C 10          2.1      2.2      2.2      2.2
PDB1DN9    12        CGCATATATGCG        C:G 3           2.7      2.6      2.4      2.5
PDB5DNB    10        CCAACGTTGG          C:G 5           2.5      2.5      2.5      2.5
PDB5DNB    10        CCAACGTTGG          G:C 6           2.5      2.6      2.5      2.5
PDB1D29    12        CGTGAATTCACG        G:C 4           2.7      2.5      2.5      2.6
PDB1D22    6         CGTACG              C:G 1           2.7      2.7      2.5      2.6

Br, bromo; *, DAPI.

[Table 1, continued on the opposite page in the original: the right-hand part, giving the geometrical parameters and energies of the listed base pairs, could not be recovered.]

Figure 3. The PDB1BDN crystal contains the oligonucleotide d(CGCAAAAATGCG) in two orientations ("Up" - open bars and "Down" - closed bars). The base pair energies are compared for these two orientations. Note that the energy scale is logarithmic.

The next shortest major groove hydrogen bonds are in the dodecamer d(CGTGAATTCACG) (PDB1D28). The GC pair number 2 and TA pair number 3 can serve as examples of base pairs having no parameter extremely high, but which nevertheless have short hydrogen bonds and thus high energy. The largest parameter is opening (−14° and −9°) in both cases, which is only partially compensated for by the −10° propeller and −0.4 Å shear with the pair no. 2. When we fixed the opening in the TA pair no. 3 and optimized its remaining 5 parameters, its energy improved by more than 40 kcal/mol and the resulting geometry had the following parameters: buckle = −15°, propeller = −29°, Sx = −0.3 Å, Sy = −0.1 Å and Sz = 0.5 Å. The same energy minimum could be obtained by changing the signs of buckle, propeller and stagger. It follows from a comparison of these parameters with Table 1 that the base pair is too flat in the crystal structure, i.e. its propeller, buckle and stagger are too low to reduce the clash created by opening. What force keeps the flat base pair geometry? Or what force creates the large opening?

The dodecamer d(CGTGAATTCACG) (PDB1D29) contains noticeably nonplanar bases. Further structures showing this property include the tetramer d(CGCG) (PDB1D32), decamer d(ACCGGCCGGT) (PDB1D13) and hexamer d(CGTACG) (PDB1D14). The dodecamer contains short hydrogen bonds especially in the GC pair no. 12 and TA pair no. 3 and their energies are very high (Table 1). In both base pairs, the instability is caused by a compression in the direction of the base pair long axis and by negative opening. These deformations are not compensated by other pair parameters, which results in a clash in the double helix major groove. The energy of base pairs is, however, artificially increased in some cases by the substitution of the canonical planar bases for the deformed bases in the crystal structure (see Materials and Methods).
The GC pairs no. 5 and 6 of the decamer d(CCAACGTTGG) (PDB5DNB) are identical because of a symmetry constraint during the structure refinement. The base pairs are compressed in the long axis direction (Sy = −0.4 Å) and the other pair parameters do not compensate for this compression (i.e. buckle is too high, opening and stagger even have opposite signs than they would have to remove the clash, and Sx is too small). It seems that the base pair deformation, if it is not a refinement procedure artifact, is not caused by a simple compression in the long axis direction.

A very short major groove hydrogen bond also occurs in the AT pair number 8 in the nonamer formed by the sequence GGATGGGAG. This base pair shows opening −23° and almost −1 Å shear, i.e. adenine is shifted to the minor groove and thymine to the major groove. In addition, there is a 0.4 Å compression in the base pair long axis direction. This causes a dramatic steric clash not only between the regular hydrogen bond atoms but especially between the adenine amino and thymine imino hydrogen atoms (Fig. 4, top). It is certainly not the inherent base pair or the DNA fragment intramolecular tendency to keep the base pair so much deformed. The clash is partially removed by negative stagger and positive propeller of 18° while negative propellers strongly dominate in the DNA fragment crystal structures (Fig. 2). We tried to remove the clash by the adenine amino group rotation (van Zandt and Schroll 1990). The energy dependence (Fig. 5) shows that a 45° amino group rotation decreases but does not eliminate the clash. The AT pair having the optimized amino group rotation is shown at the bottom of Fig. 4.

Figure 4. The nonamer AT base pair no. 8, containing a short major groove hydrogen bond (2.5 Å), without (top) and with (bottom) a 45° adenine amino group rotation which improves the base pair energy.
Figure 5. Energy dependence of the nonamer AT base pair no. 8 (triangles) on the adenine amino group rotation. Contributions of the hydrogen bond energy (circles) and of the energy of the amino group interaction with the rest of the adenine molecule (squares) are also shown. [Plot not reproduced; horizontal axis: angle / deg.]
The next shortest hydrogen bond in the major groove was found in the GC pair no. 4 of the dodecamer PDB7BNA d(CGCGAATTCGCG). This base pair is simply compressed in the direction of the y-axis and no other parameter has an exceptionally high value.

The nonamer also contains the further shortest major groove hydrogen bond, which occurs in the neighboring GC pair no. 7. This base pair is also positively propeller-twisted and negatively opened but it is sheared in the opposite direction than the strongly deformed neighboring AT pair no. 8. However, the compression in the direction of the long base pair axis is smaller. In addition, this base pair exhibits a large stagger (−0.8 Å) which substantially diminishes the clash.

Figure 6. The nonamer GC base pair no. 9 showing the bifurcated hydrogen bonds (the G and C amino group orientations are optimized). The putative hydrogen bonds are indicated by the dotted lines. Their lengths are given in angstroms.
The nonamer still contains another type of clash in the GC pair no. 9 where all three hydrogen bonds are about 3.1 Å in length, i.e. they are stretched. However, the base pair is deformed by the shear of 2.1 Å (the highest shear in the database) and compressed by 0.4 Å in the direction of the long base pair axis so that cytosine is shifted to the minor groove (and guanine to the major groove) and simultaneously towards the guanine to give rise to a clash between a cytosine amino group hydrogen and the guanine central hydrogen atom. This clash is partially removed by a −0.9 Å stagger, positive 14° propeller and 14° buckle. The −9° opening then helps to form apparent bifurcated hydrogen bonds between the cytosine amino group and the guanine N1, and between the cytosine N3 and a guanine amino group hydrogen. The amino group rotations may decrease the base pair repulsion and enable an unusual hydrogen bonding of both cytosine amino group hydrogens to the guanine O6 (Fig. 6).
Further shortest hydrogen bonds on the base pair major groove sides all occur in the Dickerson's dodecamer and its variants (Table 1). The compressions concern GC pairs. However, none of the deformed base pairs is positively propeller-twisted, the openings are in absolute value less than 10° and the mutual shifts along any of the three translational base pair axes do not exceed 0.5 Å. All of the deformed base pairs are destabilized by the van der Waals contributions to the interaction energies.
The shortest central and minor groove hydrogen bonds occur in the dodecamers, PDB1D22 hexamer and PDB5DNB decamer (Table 1). Population of the GC and AT pairs is equal regarding the short central hydrogen bonds. The base pairs are compressed (negative Sy) by more than 0.2 Å in the long axis direction and most of them are sheared up to 0.7 Å (Table 1, right), which leads to close contacts among the atoms participating in hydrogen bonds. The deformed base pairs are mostly destabilized by steric clashes which are in some cases overcome by the electrostatic attraction.
The base pairs possessing the shortest central hydrogen bonds are shown
in the central part of Table 1. They include the base pairs no. 6, 10 and 3
of PDBIBDND already discussed in the previous paragraphs. The next short
central hydrogen bond is in the TA pair number 8 of the dodecamer PDB1D28
d(CGTGAATTCACG). It is compressed in the direction of the long base pair axis
and only propeller (—20°) partially compensates for the compression. Other parameters are near zero and so hydrogen bonds are short and the energy is high.
The AT pair no. 4 in the hexamer PDB1D22 d(CGTACG) is also compressed in
the direction of the long base pair axis. This is partially compensated by shear (0.7 Å) and stagger (0.3 Å), but a further increase of these parameters would better
remove the clash and decrease the energy.
There is an interesting GC pair no. 2 in two variants of the dodecamer
PDB1DNE complexed with netropsin. Its energy is high but the central hydrogen bond compression is not the main contributor to the base pair destabilization.
We found that another contribution stemmed from a very short (2.4 Å) hydrogen
bond on the base pair major groove side which surprisingly was not included among
the shortest hydrogen bonds listed in the first third of Table 1. Resolution of this
discrepancy was found to consist in that we calculate the hydrogen bond lengths directly from the crystal coordinates while the energy is calculated using the standard
base geometries (Saenger 1984) superimposed on the crystal coordinates (which is
inevitable because the X-ray data do not include positions of hydrogen atoms that
should be included into the energy calculations). The hydrogen bond length of 2.4 Å therefore concerns the canonical bases while the actual distance in the crystal is not 2.4 Å but 2.7 Å. The difference arises owing to modified base geometries when, for example, the guanine N1-C6-O6 and cytosine N3-C4-N4 angles are changed by
7° and 3°, respectively, with respect to their regular AMBER values (Weiner et
al. 1986). These deviations can originate from either the covalent structure deformations by the crystal forces or from inaccuracies of the data and/or refinement
(Dickerson et al. 1991).
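A rough estimate shows that bond angle changes of this size can indeed shift the hydrogen-bonded atoms by a noticeable fraction of an angstrom. The sketch below treats the exocyclic atom as rotating rigidly around its ring carbon, so that its displacement is the chord 2r sin(Δθ/2). The bond lengths of about 1.23 Å (C6-O6) and 1.34 Å (C4-N4) are typical values assumed here for illustration; they are not taken from the article.

```python
import math

def atom_shift(bond_length, angle_change_deg):
    """Chord length travelled by an atom at distance bond_length (angstroms)
    from the pivot when the bond is rotated by angle_change_deg degrees."""
    return 2.0 * bond_length * math.sin(math.radians(angle_change_deg) / 2.0)

# Assumed typical bond lengths (angstroms); angle changes from the text.
o6_shift = atom_shift(1.23, 7.0)  # guanine O6, N1-C6-O6 angle changed by 7 degrees
n4_shift = atom_shift(1.34, 3.0)  # cytosine N4, N3-C4-N4 angle changed by 3 degrees
```

Under these assumptions the O6 atom moves by about 0.15 Å and the N4 atom by about 0.07 Å. Depending on the directions of the two displacements, this is of the same order of magnitude as the 0.3 Å difference between the crystal and canonical-base distances quoted above.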
AT pairs have no hydrogen bond in the minor groove so that the GC pairs are only present in the third set of Table 1. The shortest hydrogen bond on the minor groove side found in the GC pair no. 10 of the dodecamer PDB1BDND was already discussed above, in the paragraph devoted to the shortest major groove hydrogen bonds. The second shortest hydrogen bond occurs in the third GC pair in the double helix formed by the sequence d(CGCATATATGCG) (PDB1DN9). In addition, the central hydrogen bond of this pair is listed in the group of short central hydrogen bonds. Destabilization caused by the extremely close contacts of the hydrogen-bonded atoms is very high and is decreased but not eliminated by optimizing the amino group rotation. The other base pairs in this group (Table 1, right) show a large stretch and/or buckle in some cases.

Table 2. Base pairs in the Brookhaven database showing extreme values of the indicated parameters.

BUCKLE
PDB2D47    C:G  1     39°
           G:C  8     30°
           G:C  10    23°
           C:G  2     21°
PDB1NDNU   G:C  4     28°
           A:T  8     −26°
           C:G  11    21°
PDB1NDND   G:C  10    −24°
           C:G  9     −23°
           G:C  12    −21°
PDB2BNA    C:G  9     −21°
PDB3BNA    C:G  9     −20°
PDB5BNA    C:G  9     −21°
PDB1D44    G:C  4     22°
PDB1DN6    G:C  2     19°
PDB1D46    G:C  10    24°
           T:A  8     −22°

PROPELLER
PDB1NDNU   A:T  6     −33°
PDB1D46    C:G  11    −33°
PDB1RNA    U:A  1     −29°
PDB3BNA    T:A  7     −28°
PDB1DNH    A:T  6     −28°
PDB1NDND   A:T  5     −28°
PDB2D47    C:G  1     26°
PDB1DN9    G:C  12    23°
PDB4BNA    G:C  12    21°

OPENING
PDB1DN6    A:T  8     −23°
PDB2D47    C:G  1     −17°
PDB3BNA    T:A  7     19°
PDB1DN9    T:A  7     18°
PDB8BNA    T:A  7     14°

SHEAR (SX)
PDB1DN6    G:C  9     2.1 Å
           A:T  3     −1.2 Å
           G:C  1     1.1 Å
           G:C  2     −1.1 Å
PDB6BNA    G:C  2     −1.4 Å
           G:C  12    −1.3 Å
           C:G  11    −1.2 Å
PDB2D47    C:G  1     −1.4 Å
PDB1D46    T:A  8     −1.2 Å
PDB8BNA    G:C  4     −1.1 Å
           T:A  7     1.0 Å

STRETCH (SY)
PDB1BDND   A:T  6     −0.7 Å
           G:C  10    −0.7 Å
           C:G  3     −0.6 Å
PDB2D47    C:G  1     −0.6 Å
PDB1D28    T:A  8     −0.5 Å
           A:T  10    −0.5 Å
PDB1DN6    C:G  2     −0.5 Å
           G:C  5     −0.4 Å
           G:C  9     −0.4 Å
PDB1D29    T:A  3     −0.5 Å
           G:C  4     −0.4 Å
           T:A  8     −0.4 Å
PDB5DNB    C:G  1     0.4 Å
           G:C  10    0.4 Å
           C:G  5     −0.4 Å
           G:C  6     −0.4 Å
PDB1D22    T:A  3     −0.4 Å
           A:T  4     −0.4 Å
PDB1DNE    C:G  11    −0.4 Å
PDB1DNH    G:C  12    −0.4 Å
PDB1DN9    A:T  4     −0.4 Å
           G:C  10    −0.4 Å
PDB1D46    G:C  12    −0.4 Å
           T:A  8     0.3 Å
PDB8BNA    C:G  1     0.3 Å
PDB1D45    C:G  9     0.3 Å

STAGGER (SZ)
PDB2D47    C:G  1     −2.3 Å
           G:C  8     1.8 Å
           G:C  9     1.1 Å
           G:C  10    1.0 Å
           C:G  5     1.0 Å
PDB4BNA    G:C  2     1.2 Å
PDB1DNE    C:G  11    1.2 Å
PDB5BNA    C:G  11    1.1 Å
PDB1DN6    G:C  5     1.0 Å
PDB1ANA    G:C  4     −1.0 Å
Table 2 gives the extreme base pair geometrical parameters found in the database, which are the following: buckle −26° and 39°, propeller −33° and 26°, opening −23° and 19°, shear −1.4 Å and 2.1 Å, stretch −0.7 Å and 0.4 Å, and stagger −2.3 Å and 1.8 Å. Some base pairs are extreme in more parameters. The extreme example is the CG pair no. 1 in the dodecamer d(CCCCCGCGGGGG) (PDB2D47) with all parameters listed in Table 2. This base pair is an interesting example of correlated parameters of very high magnitudes to give near optimum hydrogen bond lengths between 2.8 and 2.9 Å. An opposite example is the GC pair no. 4 of the dodecamer d(CGCGAATTCGCG) (PDB1D30), having no parameter extreme but containing short hydrogen bonds (see Table 1). There are also examples of base pairs with two extreme parameters. This concerns, for example, the GC pair no. 9 of the nonamer PDB1DN6 (shear 2.1 Å, stretch −0.4 Å).
Table 3 provides the calculated energies of some base pairs with short (2.4-2.6 Å) hydrogen bond lengths, their improved values upon optimization of the base amino group rotations and the optimum amino group rotation angles. So it seems that some clashes can be partially removed by the amino group rotations.
Table 3. Examples of unstable base pairs in the Brookhaven database, energies of which can be substantially decreased by the amino group rotations. The energies are given for base pairs with their amino groups lying in the planes of the bases and also if the amino group orientations are energy optimized. The angle is zero if the amino group hydrogen atoms lie in the plane of the base. Positive rotation is defined by the right-hand rule in accordance with the Cambridge Convention (Dickerson 1989).

Database   Base pair and   Energy        Optimized energy   The amino group rotations
code       its position    (kcal/mol)    (kcal/mol)         (base)

PDB1DN6    A:T 8           47            15                 45° (A)
           G:C 9           44            9                  19° (G), 31° (C)
PDB1DN9    C:G 3           45            4                  −34° (G), −7° (C)
PDB1DNE    G:C 2           24            5                  3° (G), −31° (C)
PDB2DND    A:T 1           11            4                  −27° (A)
PDB1DNH    G:C 12          [illegible]   6                  2° (G), −38° (C)
Table 4. Average energies to optimize a base pair in the indicated DNA crystal structures.
PDBIBDND CGCAAAAATGCG
613.7** PDB1DN4 5BrCG5BrCG5BrCG
PDB1D29
CGTGAATTCACG
85.5 PDBINDNU CGCGAAAACGCG
PDB1DN6 GGATGGGAG
20.7 PDB1D38
CGATCG
5MeCGTA5MeCG
PDB1D22
18.8 PDB1BNA CGCGAATTCGCG
CGTGAATTCACG
PDB1D28
16.0 PDB28DN GTACGTAC
PDB5DNB CCAACGTTGG
13.2 PDB1D18
CATGCATG
CGCG
PDB1D32
10.8 PDB9BNA CGCGAATTCGCG
10.5 PDB1NDND CGCGAAAACGCG
PDB1DN9 CGCATATATGCG
5MeCGTA5MeCG
10.2 PDB1D20
TCTATCACCG
PDB1D21
CGCGAATTCGCG
10.2 PDB1RNA UUTUTUTUTUTUTT
PDB1D46
PDB1DNH CGCGAATTCGCG
8.1 PDB1DNM CGCAAGCTGGCG
PDB1DNE2 CGCGATATCGCG
8.0 PDB2ANA GGGGCCCC
PDB1DNE1 CGCGATATCGCG
8.0 PDB1D19
GTACGTAC
CGATCG
7.6 PDB1D39
PDB1D15
CGCGCG
ACCGGCCGGT
PDB1D13
7.0 PDB1D36
CGTACG
C G C 0 6 M e G C G
6.4 PDB1D24
PDB1D45
CGCGAATTCGCG
CGATCGATCG
PDB1D43
CGCGAATTCGCG
5.8 PDB1D23
5.4 PDB1D12
PDB1D30
CGCGAATTCGCG
CGATCG
PDB1D27
CGC° 6 M e GAATTTGCG
5.3 PDB2DBE CGCGAATTCGCG
PDB8BNA CGCGAATTCGCG
4.9 PDB2D25
CCAGGC 5 M e CTGG
5Br
P
P
CG r j B l CG 5 B l CG
PDB5BNA CGC GAATTC GCG
4.6 PDB1DN5
5B,
PDB3BNA CGCGAATT CGCG
4.6 PDB1D26
GCCCGGGC
5Me
4 2 PDB1D41
CGUA 5 M < CG
PDB7BNA CGCGAATTCGCG
PDB1ANA 'CCGG
4.1 PDB1D10
CGATCG
PDB1D14
CGTACG
3.8 PDB1ZNA CGCG
PDB1D44
CGCGAATTCGCG
3.8 PDB1D25
CCAGGC rjM ' CTGG
PDB2DND CGCAAATTTGCG
3.8 PDB1D37
CGATCG
CGCGCG
3.7 PDB1DCG CGCGCG
PDB1D33
CCCCCGCGGGGG
3 7 PDB1DNS GTGTACAC
PDB2D47
PDB1BDNU CGCAAAAATGCG
3.5 PDB1DNF CGCG 5 F UG
PDB4BNA CGCGAATT 5 B r CGCG
3.5 PDB2DCG CGCGCG
3.0 PDB1D35
PDB5ANA GTACGTAC
CGT 2 N H 2 ACG
5
M
e
5
M
e
C
G
U
A
C
G
PDB1D40
3.0 PDB3DNB CCAAGATTGG
5MepprriA 5Mepri
C G T 2 N H 2 A C G
2.9 PDB2D34
PDB1D17
2.8 PDB9DNA GCCCGGGC
PDB4DNB CGCGAATTCGCG
CGCGCGTTTTCGCGCG 2.7 PDB3ANA GGGATCCC
PDB1D16
CGATTAATCG
2.7 PDB1BD1 CCAGGCCTGG
PDB1D49
2.3 PDB1D48
PDB2BNA CGCGAATTCGCG
CGCGCG
2.2
PDB6BNA CGCGAATT5BrCGCG
2.1
1.9
1.9
1.9
1.8
1.7
1.7
1.7
1.7
1.6
1.6
1.4
1.4
1.3
1.3
1.3
1.2
1.1
1.1
1.0
1.0
1.0
0.9
0.8
0.8
0.8
0.6
0.5
0.4
0.4
0.4
0.4
0.4
0.4
0.4
0.3
0.3
0.2
** Two interatomic clashes shorter than 1 Å.
Table 4 summarizes the average energies needed to optimize a base
pair geometry in the indicated DNA crystal structures. It is obvious at first sight
that the structures differ dramatically in this respect. We compared
energies of the C-A step (helical twist fixed at 35°) in two situations. First, the base
pairs were fixed in their planar geometries and only their stacking was optimized
(i.e. the remaining 5 parameters of the step). Secondly, the base pair geometries
were optimized together with the stacking (i.e. 5 parameters of the step and all
12 parameters of the pairs were optimized simultaneously). The difference was
only 0.6 kcal/mol, which is an estimate of the amount of energy that DNA can
gain through deformations of its base pairing geometries. In other words, more
than half of the analyzed crystal structures (Table 4), namely those whose average optimization energy exceeds this estimate, may contain base pairing
deformations which do not originate from the base stacking improvement.
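The comparison with Table 4 can be made explicit with a short tally. The following Python sketch takes the 77 average optimization energies as they are listed in Table 4 and counts how many exceed the 0.6 kcal/mol estimate. The names `TABLE4_ENERGIES`, `STACKING_GAIN` and `count_above` are illustrative assumptions and not part of the original analysis.

```python
# Average optimization energies (kcal/mol) transcribed from Table 4,
# in descending order (77 structures in total).
TABLE4_ENERGIES = [
    613.7, 85.5, 20.7, 18.8, 16.0, 13.2, 10.8, 10.5, 10.2, 10.2,
    8.1, 8.0, 8.0, 7.6, 7.0, 6.4, 5.8, 5.4, 5.3, 4.9,
    4.6, 4.6, 4.2, 4.1, 3.8, 3.8, 3.8, 3.7, 3.7, 3.5,
    3.5, 3.0, 3.0, 2.9, 2.8, 2.7, 2.7, 2.3, 2.2, 2.1,
    1.9, 1.9, 1.9, 1.8, 1.7, 1.7, 1.7, 1.7, 1.6, 1.6,
    1.4, 1.4, 1.3, 1.3, 1.3, 1.2, 1.1, 1.1, 1.0, 1.0,
    1.0, 0.9, 0.8, 0.8, 0.8, 0.6, 0.5, 0.4, 0.4, 0.4,
    0.4, 0.4, 0.4, 0.4, 0.3, 0.3, 0.2,
]

# Energy gained by relaxing the base pair geometries of the C-A step
# together with the stacking (value quoted in the text, kcal/mol).
STACKING_GAIN = 0.6


def count_above(energies, threshold):
    """Return how many energies are strictly greater than the threshold."""
    return sum(1 for energy in energies if energy > threshold)


n_total = len(TABLE4_ENERGIES)
n_above = count_above(TABLE4_ENERGIES, STACKING_GAIN)
print(f"{n_above} of {n_total} structures exceed {STACKING_GAIN} kcal/mol")
```

With the values as transcribed here, 65 of the 77 structures lie above the estimate, which is consistent with the statement that more than half of them do.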
It is interesting that energies of base pairs vary strongly even within a
crystal structure. This holds especially for deformed structures. The
d(CGCAAAAATGCG) (PDB1BDND) dodecamer can serve as an example because
its GC pair no. 10 has an extremely high energy while the neighboring CG pair
no. 11 is only 2.1 kcal/mol less stable than the planar base pair. The extreme
instability of base pair no. 10 is caused by its compression in the direction of
the base pair long axis (see Table 1 and the corresponding discussion for details).
The compression brings the hydrogen atom of the hydrogen bond donor closer than
1 Å to the hydrogen bond acceptor in all three hydrogen bonds between G and
C, which is completely unrealistic. Though this is the most extreme example,
heavily deformed base pairs generally occur in the deformed oligonucleotide structures, where they coexist with base pairs of acceptable energy. On the other
hand, there are many structures in Table 4 where less than 1 kcal/mol is needed
to optimize the average base pair geometry. This energy even stays within the range of 0.1 to 0.7 kcal/mol in the structure of d(CCAGGCCTGG) (PDB1BD1).
Discussion
This work analyzed geometries and energetics of the complementary base hydrogen
bonding in 11 A-, 54 B- and 12 Z-DNA crystal structures. We warn against taking
the present values, especially the calculated energies, in absolute terms because there
are a number of uncertainties, including (i) the well-known unreliability of
empirical potentials, especially concerning the absolute values of energy; (ii) uncertainties about the positions of the base hydrogens, especially
the exocyclic amino group hydrogens of nucleic acid bases (Komárov et al. 1992;
Sponer and Kypr 1993b); (iii) our inability to discriminate between the effect
of the crystal packing forces and the refinement biases. Nevertheless, the results
correctly describe the qualitative and global (statistical) properties of the set of almost a thousand Watson-Crick base pairs present in the available oligonucleotide
crystal structures.
The analysis showed that hydrogen bonding of the complementary bases was
very unstable in some DNA crystal structures. However, the very unstable base
pairs occurred only in the DNA dodecamers, the nonamer and the tetramer, while decamers,
octamers and hexamers (with the exception of PDB5DNB and PDB1D22) did not
contain base pairings that would destabilize the whole double helix. We have
also analyzed base stacking in the oligonucleotide crystal structures (Sponer and
Kypr 1993a) and this approach has independently led to the same conclusion, i.e.
the dodecamers, the nonamer and the tetramer are deformed or imprecisely determined
structures while decamers, octamers and hexamers contain no prohibitive base-base interactions that would strongly destabilize the DNA structures. That is
why we propose to omit the dodecamers, the nonamer and the tetramer from studies of
correlations between the nucleotide sequences and DNA architecture, because otherwise deformations by the crystal packing forces and other undesirable effects would
strongly influence the results.
Many of the crystallized DNA fragments contain symmetric nucleotide sequences while their crystal structures are asymmetric. It is then a matter of discussion which part of the structure asymmetry is caused by the crystal packing
forces, and which should be attributed to data and refinement errors. A first
step towards distinguishing between these two contributions has recently been made (Dickerson et al. 1991). The authors claim that the smallest mean structure asymmetry
in a series of isomorphous structures places an upper bound on the crystal packing effects and that larger asymmetries in the structures of the isomorphous series
must arise from data or analysis errors. However, this notion ignores the possibility that data and analysis errors can not only increase but also diminish the
asymmetries caused by the crystal packing effects, which would then influence the
DNA crystal structures much more strongly than is indicated by the smallest structure
asymmetry in a series of isomorphous crystals. To support this view we present an
example of two variants of the dodecamer d(CGCGAATTBrCGCG) (PDB3BNA
and PDB4BNA) where the smallest mean asymmetry between the symmetry-related base pairs is 4° for propeller and 0.4 Å for stagger. However, there are
symmetric base pairs on the opposite ends of the PDB4BNA dodecamer whose
propellers differ by as much as 36° (GC pairs number 1 and 12) and staggers by
1.3 Å (GC pairs number 2 and 11). It is difficult to believe that data or analysis
errors account for the 32° by which this propeller difference exceeds the 4° bound, especially when the resolution of
the less asymmetric PDB3BNA structure is worse (3.0 Å) than the resolution (2.4
Å) of PDB4BNA.
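The arithmetic behind this objection can be sketched in a few lines of Python. The numbers are those quoted in the preceding paragraph. The function name `excess_over_bound` is hypothetical, and reading the 32° figure as the observed 36° difference less the 4° bound is an assumption made for illustration.

```python
def excess_over_bound(observed_difference, packing_bound):
    """Part of an observed asymmetry that would have to come from data or
    analysis errors if crystal packing could not contribute more than
    `packing_bound` (the claim that the text questions)."""
    return max(0.0, observed_difference - packing_bound)


# Propeller: GC pairs 1 and 12 of PDB4BNA differ by 36°, bound is 4°.
propeller_excess = excess_over_bound(36.0, 4.0)   # degrees

# Stagger: GC pairs 2 and 11 of PDB4BNA differ by 1.3 Å, bound is 0.4 Å.
stagger_excess = excess_over_bound(1.3, 0.4)      # Å

print(f"propeller excess: {propeller_excess:.0f} deg")
print(f"stagger excess:   {stagger_excess:.1f} A")
```

Under the questioned assumption, errors alone would have to account for a propeller asymmetry eight times larger than the packing bound.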
Base pairs showing short hydrogen bonds can be divided into two groups.
The first group contains base pairs having hydrogen bonds shorter than 2.4
Å. This group is characterized by a compression in the direction of the long base
pair axis and by small values of propeller, buckle and the other three base pair
parameters. In other words, the compression is not compensated by changes in
the other pair parameters. The hydrogen bonds are so short that even the amino
group rotations cannot help to stabilize such arrangements.
The second group contains hydrogen bonds within 2.4-2.7 Å where the compression in the direction of the long base pair axis is accompanied by compensating
changes of one or more of the remaining base pair parameters. The compensating
changes give rise to unusual base pairing schemes apparently containing bifurcated
hydrogen bonds. In this second group, the amino group rotations can substantially
improve the base pair energy.
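The two-group division can be written as a simple threshold rule. The sketch below is in Python and is not the procedure used in this work: it assumes that a base pair is classified by its shortest hydrogen bond length and that the 2.7 Å limit is inclusive. The function name and the returned labels are hypothetical.

```python
def classify_short_hbond(shortest_hbond_length):
    """Assign a base pair to one of the two short-hydrogen-bond groups.

    `shortest_hbond_length` is the shortest hydrogen bond length of the
    pair in Å (assumed criterion; the thresholds are those in the text).
    """
    if shortest_hbond_length < 2.4:
        # Compression not compensated by the other pair parameters;
        # amino group rotations cannot stabilize the arrangement.
        return "group 1: uncompensated compression"
    if shortest_hbond_length <= 2.7:
        # Compression accompanied by compensating parameter changes;
        # amino group rotations can substantially improve the energy.
        return "group 2: compensated compression"
    return "not a short hydrogen bond"


for length in (2.2, 2.5, 2.9):
    print(f"{length:.1f} A -> {classify_short_hbond(length)}")
```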
The database also contains base pairs having extreme conformational parameters but acceptable energies. On the other hand, acceptable conformational parameters are so "anti-correlated" in some base pairs that they give steric clashes
and high energies.
The present work demonstrates that DNA crystal structures obtained at low
resolution contain hydrogen bonds between the complementary
bases that are so strongly deformed that the deformations hardly reflect a nucleotide sequence effect on local
features of DNA architecture. Rather, the strongest deformations arise from the
crystal packing forces; another possibility is that the low resolution of the
diffraction data, which in most cases accompanies the largest deformations, leads
to extensive inaccuracies or even errors in the atomic coordinates from which all
results presented in this paper have been calculated. That is why this paper provides no conclusive information about the particular hydrogen bonds between the
complementary bases in DNA. Rather, the hydrogen bonds have been used as indicators of the DNA crystal structures that are hardly relevant to the situation in
solution. On the other hand, the average hydrogen bond lengths correspond to the
canonical values, so it seems that the errors in the atomic coordinates are not
systematic and cancel out in large sets of structures.
References
Abola E. E., Bernstein F. C., Bryant S. H., Koetzle T. F., Weng J. (1987): Protein Data
Bank. In: Crystallographic Databases - Information Content, Software Systems,
Scientific Applications (Eds. F. H. Allen, G. Bergerhoff, R. Sievers), Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester,
pp. 107-132.
Bernstein F. C., Koetzle T. F., Williams G. J. B., Meyer E. F. Jr., Brice M. D., Rodgers
J. R., Kennard O., Shimanouchi T., Tasumi M. (1977): The protein data bank:
A computer-based archival file for macromolecular structures. J. Mol. Biol. 112,
535-542.
Dickerson R. E. (1989): Definitions and nomenclature of nucleic acid structure parameters.
J. Biomol. Struct. Dyn. 6, 627-634.
Dickerson R. E., Grzeskowiak K., Grzeskowiak M., Kopka M. L., Larsen T., Lipanov A.,
Privé G. G., Quintana J., Schutze P., Yanagi K., Yuan H., Yoon H.-C. (1991):
Polymorphism, packing, resolution, and reliability in single-crystal DNA oligomer
analyses. Nucleos. Nucleot. 10, 1-22.
Jursa J. (1994): DNA modeller: An interactive program for modelling stacks of DNA base
pairs on a microcomputer. Comp. Appl. Biosci., in press.
Jursa J., Kypr J. (1991): Propeller-twisted adenine.thymine and guanine.cytosine base
pairs tend to buckle and stagger in opposite directions. Gen. Physiol. Biophys. 10,
373-381.
Komárov V. M., Polozov R. V., Konoplev G. G. (1992): Non-planar structure of nitrous
bases and non-planarity of Watson-Crick pairs. J. Theor. Biol. 155, 281-294.
Saenger W. (1984): Principles of Nucleic Acid Structure. Springer-Verlag, New York.
Sponer J., Kypr J. (1993a): Theoretical analysis of the base stacking in DNA: Choice
of the force field and a comparison with the oligonucleotide crystal structures. J.
Biomol. Struct. Dyn. 11, in press.
Sponer J., Kypr J. (1993b): Close mutual contacts of the amino groups in DNA. Int. J.
Biol. Macromol., in press.
Van Zandt L. L., Schroll W. K. (1990): Modeling hydrogen bonds in three dimensions. J.
Biomol. Struct. Dyn. 8, 431-438.
Weiner S. J., Kollman P. A., Nguyen D. T., Case D. A. (1986): An all atom force field for
simulations of proteins and nucleic acids. J. Comput. Chem. 7, 230-252.
Final version accepted September 28, 1993