Trinucleotide repeat DNA structures: dynamic

321
Trinucleotide repeat DNA structures: dynamic mutations
from dynamic DNA
Christopher E Pearson*t and Richard R Sinden*
Models for the disease-associated expansion of
(CTG)n°(CAG) n, (CGG)n°(CCG) n, and (GAA)n°(TTC) n
trinucleotide repeats involve alternative DNA structures formed
during DNA replication, repair and recombination. These
repeat sequences are inherently flexible and can form a variety
of hairpins, intramolecular triplexes, quadruplexes, and slippedstrand structures that may be important intermediates and
result in their genetic instability.
hyper-reactivity of even a single (CTG).(CAG) trinucleotide in DNA to chloroacetaldehydc modification.
This hyper-reactivity, which is dependent upon Zn 2+ or
Co 2+ ions and DNA supercoiling, suggests a DNA structure in which cytosines possess unpaired character.
Subsequently, unusual electrophoretic mobility, hyperflexibility, and unusnal properties of nucleosome assembly have been identified.
Addresses
*Center for Genome Research, Institute of Biosciences and
Technology, Texas A&M University, Texas Medical Center, 2121 West
Holcombe, Houston, Texas 77030-3303, USA
+Present address: Department of Genetics, Hospital for Sick Children
and Department of Molecular and Medical Genetics, University of
Toronto, 555 University Avenue, Toronto, Ontario, Canada, M5G 1X8
Correspondence: Richard R Sinden; e-mail: [email protected]
(CTG)n'(CAG) n and (CGG)n'(CCG) n repeats exhibit
unusual fast mobility on polyacrylamide gels
Current Opinion in Structural Biology 1998, 8:321-330
http://biomednet.com/elecref/Og59440XO0800321
© Current Biology Ltd ISSN 0959-440X
Abbreviations
DM
myotonic dystrophy
FRAXA fragile X mental retardation
h
helix repeat
Jm
molar cyclization factor
Papp
apparent persistant length
SCA
spino-cerebellar ataxia
Introduction
The expansion of triplet repeats in genomic DNA is associated with human disease (reviewed in [1"]). Nine different disease loci have trustable (CAG)n°(CTG) n repeats,
five chromosomal fragile sites contain unstable
(C(;(;)n*(CCG) n repeats, and another disease is associated
with an unstable (GAA)n.(TTC) n repeat ('Fable 1). The
expansion mutation is strongly dependent upon the repeat
tract length, such that the expanded alleles are more likely to undergo further expansion [21. In addition, the purity
(or homogeneity) of the repeat tract can affect its genetic
stability ('Fable 1). Models for expansion invoke the participation of alternative DNA structures in aberrant DNA
replication, repair and recombination [1"]. Here we review
the unusual helical properties of the repeats and the variety of alternative DNA structures formed by them
('Fable 2) - - characteristics that may be biologically relevant in their genetic instability
Unusual helical characteristics associated with
(CTG)n°(CAG) n and (CGG)n°(CCG) n repeats
Many unusual properties have been identified for DNA
containing (CTG),~°(CAG) n and (CGG),~-(CCG) n
repeats. Kohwi et a/. [3] initially presented evidence for
an unusual ( C T G ) ' ( C A G ) helix, demonstrating the
Chastain eta/. [4] reported that duplex (CTG)n'(CAG) ncontaining DNA from the myotonic dystrophy (DM)
locus and (CGG)n.(CCG)n-Containing DNA from the
fragile X mental retardation (FRAXA) locus migrated
faster than expected in polyacrylamide gels (up to 20%
faster for (CTG)2ss.(CAG)2ss). This unusual mobility is
dependent upon repeat length, gel temperature, and the
percentage of acrylamide. T h e physical basis for this
unusually rapid electrophoretic mobilit> which has rarely
been observed previously (see, for discussion, [4,5°]), is
not yet understood. Based on a reptation model for electrophoresis [6], Chastain eta/. [4] suggested that this
increase in mobility may have resulted from a 20%
increase in the apparent persistent length of the DNA
from Papp = 43-51 nm, which is possibly indicative of a
straighter and perhaps less flexible helix than typical BDNA. More recent data suggest, however, that the rapid
mobility may be a resuh of the presence of a long, flexible helical region (see below) [5"].
(CTG)n°(CAG) n and (CGG)n,(CCG) n repeats are
inherently flexible
Two recent studies have shown that (CTG)n'(CAG) n and
(CGG),~*(CCG),~ repeats
are
inherently
flexible.
(CTG)n*(CAG) n and (CGG)n'(CCG),~ repeats exhibit
unusually high molar cyclization factors {/,,), suggesting that
they are more flexible or curved molecules than normal
mixed sequence B-DNA [5",7"']. By analyzing the cyclization kinetics for a series of molecules with increasing numbers of repeats, Bacolla eta/. [7"] reported bending moduli
of 1.13 x 10-19 and 1.27 x 10-19 ergocm for (CTG)n,(CAG) n
and ((2GG)~'(CCG)n, respectively [7°°]. These values are
40% less than that for random B-DNA, and suggest persistence length values about 60% that for mixed sequence BDNA. \51ues for the torsional moduli and helix repeat (~)
were essentially unchanged from that for mixed sequence BDNA. When measured using a topological band shift
method, /} varied dramatically as a flmction of the repeat
length - - from 9.6 to 10.3 for (CTG)n,(CAG) n and from 10.0
to 11.1 for (CGG)n'(CCG) n [7"1. This deviation in ~} with
respect to repeat length could be explained if the different
lengths of repeats had a variable twist, which is inconsistent
322
Nucleic acids
Table 1
Trinucleotide repeats associated with human disease.
Length
Disorder/disease locus
Repeat
Normal
Premutation
Full expansion
Interruption
Fragile XA/FRAXA
(CGG)n°(CCG) n
6-52
59-230
230-2000
(pure)
Fragile XE/FRAXE
Fragile XF/FRAXF
FRA16A
FRA11 B/Jacobsen syndrome
Kennedy's disease/SBMA
Myotonic dystrophy/DM
(CCG)n.(CGG) n
(CGG)n'(CCG)n
(CCG)n'(CGG) n
(CGG)n°(CCG) n
(CAG)n'(CTG) n
(CTG)n°(CAG)n
4-39
7-40
16-49
11
14-32
5-37
*(31-61 )
*
*
80
*
50-80
Huntington's disease/HD
Spino-cerebellar ataxia 1/SCA1
(CAG)n°(CTG) n
(CAG)n°(CTG)n
10-34
6-39
36-39
None
200-900
306-1008
1000-1900
100-1000
40-55
80-1000,
2000-3000
(congenital)
40-121
40-81
(pure)
AGG
(threshold > 34 pure
repeats)
None
(GCCGTC)3_ 4
complex
None
None
Spino-cerebellar ataxia 2/SCA2
(CAG)n'(CTG) n
14-31
None
Spino-cerebellar ataxia 3/8CA3,
Machado Joseph disease/MJD
Spino-cerebellar ataxia 6/SCA6
Spino-cerebellar ataxia 7/SCAT
Haw river syndrome/HRS
(also DRPLA)
Friedreich's ataxia/FRDA
(CAG)n°(CTG)n
13-44
*
34-59
(pure)
60-84
(CAG)n'(CTG) n
(CAG)n°(CTG)n
(CAG)n'(CTG) n
4-18
?-17
7-25
*
*
*
21-28
38-130
49-?5
None
None
None
(GAA)n°(TTC)n
6-29
*(> 34-40)
200-900
(GAGGAA) (threshold
>_34-40 pure repeats)
None
None
CAT
(threshold _>40
pure repeats)
CAA
None
*A possible mutagenic intermediate length. Not all diseases are associated with premutation and protomutation clinical or repeat length status.
with the./,, calculations, or if the repeat had a quite different
writhe from that of mixed sequence DNA. Bacolla eta/. [7"]
suggested that (CT(})n'(CAG),~ and ( C G G ) n ' ( C C G ) n
repeats are both more flexible and more highly writhed than
random B-DNA.
Chastain and Sinden showed that short ( C T G ) n ' ( C A G ) n
tracts are inherently flexible by examining the effect of
triplet repeats upon the electrophoretic mobility of curved
and noncurved D N A polymers [5"]. Triplet repeats interspersed between regions of straight D N A showed a slightly
reduced electrophoretic mobilit-~; excluding the possibility
that (CT(~)n*(CAG) n repeat tracts possess static curvature.
When (CTG)4.(CAG) 4 was interspersed between A-tract
curves, the reduced electrophoretic mobility due to D N A
curvature was dramatically abrogated, consistent with interspersion of a torsionally flexible and/or bendable region that
would destroy the helical phasing required for A-tract curvature. T h e effect of (CTG)4.(CAG) 4 was almost identical
to that afforded by two consecutive T * T mismatches
[(TT)-(T'F)]. This study shows that even the very short
(CTG)4e(CAG) 4 tract imparts a very large degree of flexibility to a I)NA helix [5"].
Differential assembly of (CTG)n'(CAG)" and
(CGG)no(CCG)" repeats into nucleosomes
( C T G ) n ' ( C A G ) n and ( C G G ) n ' ( C C G ) n tracts exhibit
opposite behavior with respect to nucleosome assembly.
(CTG)n*(CAG) n repeats assemble into nucleosomes with
much higher efficiency than any othcr D N A sequences
Table 2
Summary of the properties and structures of (CGG)n,(CCG) n,
(CTG)n'(CAG) n and (GAA)n'(TTC)n repeats.
Repeat
(CGG)n'(CCG)n
Alternative structure
Properties of
duplex DNA
S-DNA
Fast electrophoretic
gel mobility
Hyperflexible,
resists assembly
into nucleosomes
SI-DNA
(CGG) n
(CCG) n
(CTG)n'(CAG) n
Intrastrand hairpins,
interstrand duplexes,
quadruplex DNA
Intrastrand hairpins,
interstrand duplexes
S-DNA
SI-DNA
(CTG) n
(CAG) n
(GAA)n°(TTC) n
(GAA) n
(TTC) n
Intrastrand hairpins,
interstrand duplexes
Intrastrand hairpins
Intramolecular triplex DNA
?
?
Fast electrophoretic
gel mobility
Hyperflexible,
preferential assembly
into nucleosomes
Trinucleotide r e p e a t D N A s t r u c t u r e s Pearson and Sinden
yet identified [8-10], whereas (CGG)n*(CCG) n repeats
assemble into nucleosomes with the least efficiency of
any known sequence [11,12]. This difference is surprising given the general observation that flexible DNAs and
DNAs that contain inherent curved regions typically
assemble
preferentially into nucleosomes [13].
Presumably
the
inherent
flexibility
of
the
(CTG)n.(CAG) n repeats alh)ws the sequence to easily
adopt the conformation required in order to stably bend
around nucleosomes. (CGG)n*(CCG) n repeats are inherently flexible, although they may not easily adopt the
stable radius of curvature required for the assembly of
DNA into nucleosonms. Length-dcpendent differences
have been reported for the effects of methylation at CpG
dinucleotides within (CGG)n*(CCG) n repeats; methylated short repeats (n = 13) and long repeats (n = 74-76)
assemble into nucleosomes more and less easily than
nonmethylated repeats, respectively [12,14J. Both triplet
repeats are flexible [5",7"*], although they almost certain-
323
ly have very different biological properties in terms of the
chromatin organization upon expansion in the disease
state, and also depend upon the state of methylation.
Inter- and intramolecular
association
of triplet
repeats: mismatched
duplexes and hairpins
Mismatched duplexes and hairpins
Single-stranded oligonucieotides of (CCG) n or (CGG)~
repeats can form intermolecular duplexes and intramolecular hairpin duplexes (Figure 1). On native polyacrylamide
gels, the migration of intramolecular triplet hairpins is
characteristically faster than that of their complementary
base-paired duplexes [15-23], whereas the migration of
the intermolecular duplexes is slower than that of their corresponding \Vatson-Crick duplexes [20,23,24].
Two alternative schemes of inter or intrastrand (hairpin)
mispairing exist for (CGG) n repeats: one in which the G
of the GpC step participates in a (G).(G) mismatch
Figure 1
(a)
Watson-Crick
(b)
Interstrand
(c)
Hairpins
5' __ cGGcGGcGGcGGcGGc e
(i)
3' - - - - GGCGGCGGCGGCG G c
(ii)
3'
G
5' __ c G G c G G c G G c G G c G G - - 3 '
3' - - - - GGCGGCGGCGGCGGC - - 5'
5' __ cGGcGGcGGGGGcGG C
- - GGCGGCGGCGGCGGC G
G
5' - - CGG CGG CGG eGG CGG 3 '
3' - - GCC GCC GCC GCC GCC - - 5 '
e-motif
(iii)
5' - - G
CCCCC
5' - - c C G c C G c C G c C G c G G c
C G
3' - - G C C G C C G C C G C C G C C G c
C
CG CG CG CG C 3'
(ii)
5' - (iv)
5' _ cT GoT GeT GeT GoT G 3'
-
(iii)
(ii)
5' - - C T G
CTG CTG CTG CTG 3 '
3,--GAC
GAC GAC GAC G A C - -
G
C
~' _ cT GeT GeT GcT GeT G C
-
3' - - GT CGT CGT CGT CGT C --5'
--CcGCcGCCGCcGCc
3' - - G C C G C C G C C G C C G C C G
(v)
3' - - GT CGT CGT CGT CGT C G T
(vi)
5' _ cA c-cA GcA GcA GcA G C
3' - - GA CGA CGA CGA CGA e G A
5,
Current Opinion in Structural Biology
Structures assumed by the individual single strands of triplet repeats. (a) Canonical Watson-Crick (CGG)n°(CCG) n and (CTG)n°(CAG) n duplexes
between complementary strands containing triplet repeats (i and ii, respectively). (b) Intermolecular duplexes formed between individual (CGG)n,
(CCG) n, and (CTG) n strands (i, ii and iii, respectively). Intermolecular duplexes have not been reported for (GAG) n strands. Single strands of (CCG) n
can form an interstrand duplex, called an e-motif, in which the Cs of the G p C dinucleotide step are extrahelical. (CTG) n and (CGG) n duplexes contain
T.T and GoG mispairs, respectively. (¢) A variety of intramolecular duplexes (hairpins) have been reported for (eGG)n, (COG)n, (CTG) n (v) and (GAG) n
(vi) repeats. (CGG) n and (CCG) n repeats can adopt two different folding patterns in which the G°G or CoC mispair is flanked by either (CG)°(CG) or
(GC)o(GC) (i or ii and iii or iv). The preferred structures are indicated in bold. The larger font denotes the mispaired or extrahelical bases.
324 Nucleicacids
(Figure lci); the other in which the G of the CpG step
participates in a (G)°(G) mismatch (Figure lcii). T h e
former is the preferred conformation [20,21]. Mitas and
colleagues [16] reported structures with the latter mispairing scheme, however, and this exception may be due
to the length of the repeat tract or the fact that the
authors used sequences with nonrepeating flanking
sequences that hydrogen bond, 'forcing' the observed
structural alignment. For repeat lengths of n = 5-11, the
equilibrium between the (CGG) n hairpin and the
[(CGG)~] e duplex is affected by n and the salt concentration. Low values of n (n < 10) and high salt concentration (200 mM NaCI) favor the duplex conformation [20].
With (CGG)ll only the hairpin conformation exists [24].
T h e loops of the CGG hairpins are sensitive to diethyl
pyrocarbonate modification [22].
(CCG) n hairpins can contain mismatched Cs that involve
the C of the CpG and GpC steps (Figure lciii and iv).
The former appears to be the preferred conformation
[20], but this may be influenced by the length [21].
(CCG)n, with n = 2-5, forms a staggered interstrand
duplex stabilized by Watson-Crick (CG)°(CG) basepairs, in which the (Is of the GpC dinticleotide are extrahelical and located diagonally across the minor groove
(called an e-motif) (Figure lbii) [25]. T h e extrahelical Cs
appear to be in equilibrium with the protonated stacked
form. T h e hmger (CCG)ls tract appears to form an
intramolecular hairpin [26 °] with characteristics very
similar to an e-motif in which the (] residues are
unpaired and cxtrahelical. (CCG) n hairpins are reportedly more stable than (CGG),~ hairpins [20].
Single stranded ((]TG)10 or ((iT(J)30 oligonucleotides
can form inter and intrainolecular duplexes (Figure lb,c)
[23]. (CAG)10 and (CAG).~ 0, however, only form
intrastrand hairpin structures. The hairpin loop typically
contains three or four bases [27]. T h e tip of the hairpin
loops of both (CT(;),~ and ((lAG) n hairpins are sensitive
to Pl nuclease [17,18]. Only at high temperatures
(60-70°(]) were the (T)°(T) mismatches in the ((7I'G),~
hairpin sensitive to potassium permanganate modification [18], suggesting a B-like conformation for the hairpin stem.
Hairpin base, base mispairs
In ((X;(;) n hairpins with n = 11-15, the (G)°((;) mismatches appear to bc hydrogen bonded [22,241, whereas
with n = 3, the mismatches appear to bc unpaired and flexible in xarious d\namic conformational isomers [21].
Additionally, the misntatches alternatc G(s/'n)oG(and)
throughout the (]GG hairpin 116,20,24]. Although no C-C
interactions should exist in the e-nlotiE h3'drogen bonds
have been detected in (CCG) n hairpins with n = 5-7. This
is consistent with the presence of a ((])°(C) mispair
[21,24]. Longer (CC(;),~ ranlccules, with n > 15, appear to
form an c-motif-like intcrmolecular hairpin [26"], in which
the c\'tnsine residues are unpaired and extrahclical.
In (CTG) n hairpins, the (T)°(T) mismatches are stacked
into the helix and appear to be stabilized by two hydrogen
bonds [21,27,28], similar to a wobble (G)*(T) mispair [29].
In (CAG) n hairpins, the (A)*(A) mismatches exhibit conformational instability, showing little, if an B base-base
interaction [21,28]. These results provide an explanation
for the red(iced biophysical stability of CAG hairpins compared with CTG- hairpins [17,23,28,30].
In either the (CGG) n or (C(]G) n intermolecnlar duplexes
or intramolecular hairpins, the cytosine residues of the
CpG step are either (C)-(C) mismatches, or are destabilized by either the adjacent mispaired (G)*(G) residues or
the adjacent extrahelical cytosine residues (e-motif)
(Figure 1). Interestingly, the cytosine residues of the CpG
step are methylated in expanded FRAXA repeats.
Presumably, the destabilization facilitates the flipping of
the cytosine away from the helix and into the active site (if
the CpG methylase.
Thermodynamics of mispaired duplexes
Analysis of the thermodynamic parameters (Tin,i- v and
AG~7(:) of single-stranded repeats revealed that the formation of intrastrand structures (hairpins) from
Watson-Crick duplexes is favored for the disease-assnciated (CTG)n°(CAG) n and (CGG)n°(C(]G),~ rcpeats relatixe to the nondiseasc-associated (GTC),~°(GA(]),~
repeats [21].
Petruska eta/. [23] measured the length independence of
the T m ~alues for (CTG),~ or (CAG-)n hairpins: the differences were 0°C and I°C, rcspectively, for repeats of
n = 10 and 30 [23]. Similarl> only a minor difference of
7°(]
existed
between
the
T m values
for
((]TG)10.(CAG)10 and ((]TG).~0°(CAG).~ 0 duplexes.
These findings contrast with the predicted direct relationship between repeat length and T,n [17,30]. The
actual length independence of T m suggests that rather
than forming a single large hairpin, the longer tract mav
forrn lnultiple short hairpin structures [23].
Triplex D N A structures in (GAA)n'(TTC) n
Honmpurineohonlopyrimidinc tracts that contain mirror
repeat symmetry can forn~ four different intramolecular
triplex DNA structural isomers (see, for review, [311).
This
sequence
requirement
is met
I1\ the
((;AA),°(TTC),~ tract, which forms intramolecular triplex
I)NA (Figure 2a) [32]. Within this sequence, the 3' half of
the T T C strand folds into the major grooxe of the other
half (if the Pu*Pv tract, thus forming Hoogsteen hydrogen bonds and creating C+°G°C and 7'°A°T base triads
(the italicized base represcnts the third strand). The formatinn of this I)NA structure is Llvorcd by a low pit (as
a restllt of the required protonation of the third strand
cytosine) and DNA supercoiling. These structures can
exist in cells [31 ]. An intennolecular triplex may also form
if an appropriate single-stranded triplet-containing strand
is available (Figure 2b).
Trinucleotide repeat DNA structures Pearson and Sinden
cluded the formation of any other quadruplex, and it is
not known whether this would be a preferred folding
scheme within hmger repeats. Many triplet repeat tracts
clearly form hairpins, and thesc triplet repeats may fold
into quadrt, plex or other higher order structures. A rigorous demonstration of any such structures, however, is not
yet available,
Figure 2
(a)
d',
%
02"2, "g G A A
O.x
GAAG'~A
GAA
GAAQ44
a
Slipped-stranded DNA structures
%
.a
0
TTC
TTC
TTC
TTC
TT
AAG
AAG
AAG
AAG
oo
C
AA G
T
5'
TTC
TTC
TTC
TTC
TT
aloe
ooo
i l l
ooo
olel
ooo
lee
There are two types of slipped-stranded DNA structures - 'hnmoduplex' Slipped structures (S-DNA) [36], which are
formed between two complementary strands with the same
number of repeats paired in an out of register fashion, and
'heteroduplex' Slipped Intermediates (SI-DNA) [37"], in
which the strands have different numbers of repeats
(Figure 4). SI-DNA may arise through replication slippage,
resulting in repeat expansions or deletions (Figure 4a),
whereas strand slippage in the absence of replication will
result in S-DNA (Figure 4b).
G
A
A
3'
ooo
eo
C I
(b)
J
o~,~
325
"~.
/"//..
j
~ O7"2.~G4 ~ GA A G A A G A AGA A G A A G A AGA AGA AG,kikco
%~
o~~
°/"~ o
3'------AAG
AAG
S-DNA
&&
~,©
TTC TTC
oo~
AA
AAG
. . . . . .AT .TAC.G
TTC
TTC
AA
~
T T £7'
AAG AAG
AAG------5'
TTC
TTC
oo
Current
3'
Opinion in Structural Biology
The triplex DNA structures formed by (GAA)o(TTC) repeats.
(GAA)n'(TTC) n repeats can form (a) intramolecular triplex DNA
structures in which the 3' half of the TTC tract folds into the major
groove of the 5" half of the repeat tract [31 ]. (b) Intermolecular triplex
structures may also form provided a single pyrimidine-rich strand is
available. A filled bullet (*) indicates Watson-Crick hydrogen bonds
and a hollow bullet (o) indicates Hoogsteen hydrogen bonds.
Formation of quadruplex DNA structures
within triplet repeats
Several reports have shown that, under some conditions,
quadruplex D N A structures (Figure 3) can form within
single-stranded e G G tracts. Fry" and Loeb [33] showed
that short oligonucleotides composed of e G G residues
(but not CCG) formed complexes in the presence of K +,
Na +, or IA + ions that migrated as four times its molecular
weight on polyacrylamide gels. Usdin and Woodford [34]
have interpreted specific pauses in in vitro replication as
an indication of quadruplex formation. T h e pause
required the presence of K + ions and the e G G template
strand, conditions required for quadruplex formation.
Chemical modification studies, including protection of
the N7 of guanine [33,34], have supported the contention that guanines are involved in Hoogsteen hydrogen bonding, a characteristic of quadruplex D N A
(Figure 3e). Kettani eta/. [35] detected the quadruplex
with two G ' C ' G ° C tetrads flanked by two G - G * G , G
tetrads (G quartets) shown in Figure 3d. T h e Kettani
structure, which was expected from studies of the folding of an intramolecular hairpin (Figure 3f), is different
from that proposed b \ Usdin and \Voodford [34] (Figure
3e). T h e design of the Kettani sequence probably pre-
Although hmg proposed (see, for reviews, [1",31]), the first
biochemical evidence for the existence of slipped-stranded
DNA strnctures (S-DNA) was obtained using DNA containing (CTG)n*(CAG) n and (CGG)n'(CCG) n repeats [36].
S-DNA formed following the denaturation and renaturation
(reduplexing) of DNA. S-DNAs migrate in polyacrylamide
gels as a broad distribution of distinct products ranging up to
twice their actual size. This reduced gel mobility is consistent with that expected for DNA containing bends introduced from the three- and four-way junctions generated by
the looped-out strands [381. The broad distribution was as
expected from the multiple possibilities for strand slippage
involving different lengths of loop outs at multiple locations
throughout the repeat tract (see Figure 4b). In fact, the complexity of the S-DNA populations, as well as the percentage
of S-DNA formed, increased with increasing length of the
repeat tract [36,37",39"',40"]. T h e effect of length on the
propensity of S-DNA formation correlates with the effect of
length on genetic stability. Electron microscopic analyses
confirm that S-DNA secondary structure occurs within the
repeat tract, that the S-DNA is subtended by linear mixedsequence duplex DNA, and that the S-DNAs are a heterogeneous popt, lation of products [39"']. S-DNA does not
require superhelical tension for formation or stability; the
structures arc remarkably" thermostable, and display minireal interconversion between isomers and the linear duplex
form under physiological conditions [36,37",39"']. If the
looped-out regions of the slipped-stranded DNA were
unpaired, then branch migrations could occur, resulting in
the reformation of linear duplex DNA. T h e remarkable stability of (CTG)~*(CAG) n and (CGG)n'(CCG) n S-DNAs
may result from the formation of intrastrand hairpins, as
branch migration would require breaking base pairs at the
slipped-out junction and within the slipped-out hairpin.
(CTG)s0,(CAG)s0 and (CTG)ess*(CAG)es s S-DNAs have
recently been visualized by electron microscopy [39"']. T h e
326
Nucleic acids
Figure 3
(a)
(b)
(c)
3'
(d)
5'
F,'
5'
3'
t3
t
5'
3'
5'
3'
3'
(e)
(f)
>
,-r-(
~r
J
5'
3'
5' ~
5'
3'
5' ~
Current Opinion in Structural Biology
The quadruplex structures formed by triplet repeats. Guanine-rich DNA sequences can form a variety of quadruplex DNA structures including those
structures shown in (a-c) (see, for a review, [31]). Short tracts of thymine (not shown for clarity) typically intersperse guanine tracts. (d) The quadruplex
structure detected by Kettani et al. [35]. (e) The quadruplex structure suggested by Usdin and Woodford [34]. This could form within a single (CGG) n
tract. The duplex/hairpin precursor shown, involving G*G Hoogsteen hydrogen bonds, has not been observed and quadruplex formation probably occurs
without formation of this intermediate. The precursor shown with the Kettani-type quadruplex (f) is the preferred conformation of a (CGG) n intrastrand
hairpin. The quadruplex shown in (e) may be more probable than the Kettani-type quadruplex due to a higher proportion of G-quartets.
major heterogeneous (CTG)s0,(CAG)s0 S-DNA population
showed single and multiple bends and kinks as expected for
short slipped-out regions (15 repeats). A slower migrating
minor (CTG)s0*(CAG)s0 S-DNA population and the
(CTG)zss,(CAG)zs s S-DNA population both contained visible branches and loops, three-way, and four-way (cruciformlike) junctions with variable arm lengths (involving 15-50
repeats). For all molecules, the structures were observed at
random locations throughout the repeat tract, as expected of
a heterogeneous population of S-DNAs (Figures 4b and 5).
S-DNAs have a reduced contour length relative to linear
duplex DNAs, indicating that the strands are in fact slipped
with respect to each other [39"']. All data are consistent with
S-DNAs being slipped-stranded structures.
The effect of sequence interruptions upon S-DNA formation
Sequence interruptions within several trinucleotide repeat
tracts confer an increased genetic stability upon the repeat
Trinucleotide repeat DNA structures Pearson and Sinden
Figure 4
~.~(ii)
(a)
Delet~/,'---~
Expans=on~
s
,
Ist round of
replication
J
Expansion
product
2nd round of
replication
(b)
(i)
Deletion
, , , ' - ' - - - product
S-DNA
,Ln-_
(ii)
~all --_
~k__k_~
(iii)
r U-
"-CCurrent Opinion in Structural Biology
S-DNA and SI-DNA. (a) Replication slippage model showing SI-DNA.
During synthesis of the triplet repeat, slippage of the nascent strand
forwards (i) or backwards (ii) on the template strand results in the
formation of heteroduplex slipped intermediate DNAs (SI-DNAs).
Following continued replication, the DNA contains an excess of
repeats in one strand. This will subsequently lead to expansion (i) or
deletion (ii) products. Thin lines represent repeat tracts, and broad
lines are non-repetitive flanks. (b) Various structural isomers of slippedstranded DNAs (S-DNAs) can form following denaturation and
renaturation of a triplet repeat tract in an out of register fashion (i-iv).
Looped-out regions can be of variable size or number, and they can be
positioned throughout the repeat tract (see, for discussion, [36]).
tracts (~[hble 1) [1°]. In the SCA1, SCA2 and FRAXA loci,
the repeat tracts (CAG)n'(CTG)n, (CAG)n'(CTG) n and
(CGG)n*(CCG) n, are normally interrnptcd
with
(CNF)*(ATG), (CAA).(TTG), and (AGG).(CCT), respectively (Table 1). Loss of these interruptions results in a
longer length of the pure tract, and correlates with genetic
instability, expansion and disease. Sequence interruptions
dramatically reduce the propensity of S-DNA formation
and the complexity (or heterogeneity) of the S-DNAs
formed [40"]. In fact, sequence interruptions can abolish or
reduce the anmunt of certain S-DNA isomers present. T h e
reduced propensity of S-DNA formation may be as a result
of the reduced stability of the DNA that contains the mispaired interruptions either in interstrand slippages or in
intrastrand hairpins (Figure 5a). In this fashion, the interruptions would effectively anchor the repeat tracts in an inframe Watson-Crick base-pairing conformation. This
would limit the number of S-DNA isomers formed within
the interrupted repeat tracts (compare Figure 5b and c).
T h e inhibitory structural phenomenon specific to interrup-
327
tions of either (CAG)n'(CTG) n or (CGG)n*(CCG) n repeats
suggests a possible molecular explanation for the biology
associated with the protective mechanism of interruptions;
interruptions may inhibit the formation of slipped mutagenic intermediates during replication and, in this fashion,
inhibit genetic length changes. Thus, the effects both of
the length and purity of the repeat tract on the propensity
of S-DNA formation correlates with their effects on genetic stability and disease.
Slipped intermediate DNA structures (SI-DNA)
SI-DNAs have been formed by reannealing DNAs containing two different lengths of repeats [37°]. SI-DNAs
migrate more slowly in polyacrylamide gels than their
respective linear duplexes or S-DNAs that are formed from
either length of repeat. Two sister SI-DNAs composed of
(CTG)m.(CAG) n and (CTG)n'(CAG) m (where n ~ m)
could be electrophoretically separated, suggesting that
they are structurally distinct [37°]. Similarly, two sister
(CGG)*(CCG) SI-DNAs could also be electrophoretically
separated (CE Pearson, unpublished data). T h e structural
differences between SI-DNAs containing either loopedout C T G or CAG repeats may reflect the different singlestranded character of the looped-out strands [36,37"]. In
S-DNA, the CAG strand is preferentially susceptible to
mung bean nuclease compared to the C T G strand [36],
indicating the greater single-stranded character of the
CAG strand (consistent with the lower thermal stability of
the CAG hairpin relative to the C T G hairpin). Structural
differences between sister SI-DNAs have also been
detected by the binding of the human mismatch repair
protein hMSH2 [37°].
Conclusions
DNA is not a passive participant in its procreation.
Through subtle variations in helix shape, DNA controls its
intimate association with regulatory proteins. Through
variations in helix trajectory, stiffness, and flexibility, the
DNA can specify both its pattern of conjugal assembly
with nucleosomes, or its organization into specific threedimensional structures that regulate gene expression or
facilitate genetic recombination. Within the past seven
years, a novel type of repeat expansion mutation leading to
human disease, called dynamic mutation [2]( has been
identified in ( C T G ) n ' ( C A G ) n, (CGG)n-(CCG)n, and
(GAA)n.(TTC) n repeats. T h e working hypothesis in this
field is that an unusual property or structure of the helix, or
the formation of DNA secondary structures such as a hairpin or slipped-stranded structures, is responsible for the
unusual biology associated with these repeats. In addition
to genetic instability in humans, these repeats exhibit orientation-dependent deletion in yeast and bacteria [41-44].
Their stability is influenced by mutations in proteins that
are involved in DNA replication and repair [45-47]; they
can pause replication forks in vivo [48°], and can direct
intramolecular strand switching during replication in vitro
[49]. T h e progress summarized here reveals previously
unrecognized dynamic properties of D N A including
328
Nucleic acids
Figure 5
(a) Slippage inhibition
Linear duplex
Interstrand
slip outs
Interstrand
slippage
~ and
and/or
or
(c) Pure tract
(b) Interrupted tract
Linear duplex
Linear duplex
Limited slipped isomers
with short slip outs
Multipleslippedisomers
..... & A A A &
.........
~ &__&_&__&.......
.....
YY
~
&
& _ _&..... &
Y£
4- qL '
-q-
I
.....
m
Current Opinion in Structural Biology
Models for the protective mechanism of repeat sequence interruptions. The open circles represent (CAG)o(CTG) and (CGG)°(CCG) repeats, and
the filled circles represent the (CAT)o(ATG) and (AGG)o(CCT) interruptions. (a) The reduced percentage of S-DNA formed by an interrupted repeat
tract relative to a pure (uninterrupted) tract may result from the reduced stability of the inter and intrastrand slipped DNAs containing these
interruptions. These sequence interruptions will destabilize both interstrand and intrastrand duplexes. (b) Sequence interruptions may limit the
position, and thereby the length, of looped-out regions. Loop-outs in sequence interrupted repeat tracts may occur preferentially between the
sequence interruptions or form with the sequence interruptions at the tip of a hairpin loop (not shown). (c) Pure repeat tracts can have loop-out
strands of variable lengths throughout the repeat tract.
Trinucleotide repeat DNA structures Pearson and Sinden
unusual helical properties, such as very flexible DNA, and
alternative DNA secondary structures, including a variety
of intrastrand hairpins containing mispairs, slipped-stranded DNA, and intramolecular triplex and quadruplex structures. Clearly, (CTG)n'(CAG)n, ( C G G ) n ' ( C C G ) n, and
(GAA)n'(TTC) n repeats represent dynamic DNA associated with dynamic mutations. Much has been revealed,
although any direct relationship between DNA structure
and loci-specific expansion leading to human disease
remains, at present, an enigma.
Note added in proof
While this manuscript was in preparation, there were two
publications indicating the formation of triplex DNA structures in (GAA)-(TCC) repeats [50,51] and one demonstrating the in vivo formation of hairpins from long inverted
repeats having ( C G G ) . ( C C G ) repeats at their centre [52].
References and recommended reading
Papers of particular interest, published within the annual period of review,
have been highlighted as:
• of special interest
°° of outstanding interest
1.
•
PearsonCE, Sinclen RR: Slipped strand DNA, dynamic mutations,
and human disease. In Genetic Instabilities and Hereditary
Neurological Diseases. Edited by Wells RD, Warren ST. New York:
Academic Press; 1998:585-621.
This article contains a thorough review of triplet-repeat diseases, the DNA
sequences involved, and expansion models. It is part of a comprehensive
book describing the current state of this field.
2.
Richards RI, Sutherland GR: Heritable unstable DNA sequences
and human genetic disease. Ceil 1992, 70:709-712.
3.
Kohwi Y, Wang H, Kohwi-Shigematsu T: A single trinucleotide,
5'AGC3'/5'GCT3', of the triplet-repeat disease genes confers
metal ion-induced non-B DNA structure. Nucleic Acids Res 1993,
21:5651-5655.
4.
Chastain PD, Eichler EE, Kang S, Nelson DL, Levene SD, Sinden RR:
Anomalous rapid electrophoretic mobility of DNA containing
triplet repeats associated with human disease genes,
Biochemistry 1995, 34:16125-16131.
5. Chastain PD, Sinden RR: CTG repeats associated with human
•
genetic disease are inherently flexible. J Mol Biol 1998, 275:405-411.
This paper demonstrates that (CTG)o(CAG) repeats are unusually flexible.
Interspersion of (CTG)4°(CAG)4 between A-tract curves abrogates overall
DNA curvature to the same extent as interspersion of two consecutive T°T
mismatches. This is a remarkable degree of flexibility for a very short
repeat tract.
6.
Levene SD, Zimm BH: Understanding the anomalous
electrophoresis of bent DNA molecules: a reptation model.
Science 1989, 245:396-399.
7.
••
Bacolla A, Gellibolian R, Shimizu M, Amirhaeri S, Kang S, Ohshima K,
LarsonJE, Harvey S, Stollar BD, Wells RD: Flexible DNA: genetically
unstable CTG.CAG and CGG.CCG from human hereditary
neuromuscular disease genes. J Biol Chem 1997, 272:16783-16792.
This very careful analysis of cyclization kinetics measurements for series of
(CTG)no(CAG)n and (CGG)n.(CCG) n repeats provides the bending and
torsional moduli, persistence length, and helix repeat values. This paper
presents the first demonstration of the flexibility of these triplet repeats.
8.
Wang Y-H, Amirhaeri S, Kang S, Wells RD, Griffith JD: Preferential
nucleosome assembly at DNA triplet repeats from the myotonic
dystrophy gene. Science 1994, 265:669-671.
9.
Wang Y-H, Griffith JD: Expanded CTG triplet blocks from the
myotonic dystrophy gene create the strongest known natural
nucleosome positioning elements. Genomics 1995, 25:570-573.
10. Godde JS, Wolffe AP: Nucleosome assembly on CTG triplet
repeats. J Biol Chem 1996, 271:15222-15229.
329
11. Wang Y-H, Gellibolian R, Shimizu M, Wells RD, Griffith J: Long CCG
triplet repeat blocks exclude nucleosomes - a possible
mechanism for the nature of fragile sites in chromosomes. E Mo/
Biol 1996, 263:511-516.
12. Godde JS, Kass SU, Hirst Me, Wolffe AP: Nucleosome assembly on
methylated eGG triplet repeats in the fragile X mental retardation
gene 1 promoter. J Biol Chem 1996, 271:24325-24328.
13. Shrader TE, Crothers DM: Effects of DNA sequence and histonehistone interactions on nucleosome placement. J Mol Biol 1990,
216:69-84.
14. Wang Y-H, Griffith J: Methylation of expanded CCG triplet repeat
DNA from fragile X syndrome patients enhances nucleosome
exclusion. J Biol Chem 1996, 271:22937-22940.
15. Mitchell JE, Newbury SF, McClellan JA: Compact structures of
d(CNG) n oligonucleotides in solution and their possible relevance
to fragile X and related human genetic diseases. Nucleic Acids
Res 1995, 23:1876-1881.
16. Mitas M, Yu A, Dill J, Haworth IS: The trinucleotide repeat sequence
d(CGG)+5 forms a heat-stable hairpin containing GsYn'Ganti base
pairs. Biochemistry 1995, 34:12803-12811.
17. Mitas M, Yu A, Dill J, Kamp TJ, Chambers EJ, Haworth IS: Hairpin
properties of single-stranded DNA containing a GC-rich triplet
repeat: (CTG)I 5. Nucleic Acids Res 1995, 23:1050-1059.
18. Yu A, Dill J, Mitas M: The purine-rich trinucleotide repeat
sequences d(CAG) 15 and d(GAC) 15form hairpins. Nucleic Acids
Res 1995, 23:4055-4057.
1g. Yu A, Dill J, Wirth SS, Huang G, Lee VH, Haworth IS, Mitas M: The
trinucleotide repeat sequence d(GTC) is adopts a hairpin
conformation. Nucleic Acids Res 1995, 23:2706-2714.
20. Chen X, Mariappan SVS, Catasti P, Ratliff R, Moyzis RK, Laayoun A,
Smith SS, Bradbury EM, Gupta G: Hairpins are formed by the single
strands of the fragile X triplet repeats: Structure and biological
implications. Proc Nat/Acad Sci USA 1995, 92:5199-5203.
21. Zheng M, Huang X, Smith GK, Yang X, Gao X: Genetically unstable
CXG repeats are structurally dynamic and have a high propensity
for folding. An NMR and UV spectroscopic study. J Mol Bio/1996,
264:323-336.
22. NadelY, Weisrnan-Shomer P, Fry M: The fragile X syndrome single
strand d(CGG) n nucleotide repeats readily fold back to form
unimolecular hairpin structures. J Biol Chem 1995, 270:28970-28977.
23. PetruskaJ, Arnheim N, Goodman MF: Stability of intrastrand hairpin
structures formed by the CAG/CTG class of DNA triplet repeats
associated with neurological diseases. Nucleic Acids Res 1996,
24:1992-1998.
24. Mariappan SV, Catasti P, Chen X, Ratliff R, Moyzis RK, Bradbury EM,
Gupta G: Solution structures of the individual single strands of
the fragile X DNA triplets (GCC)n'(GGC) n. Nucleic Acids Res 1996,
24:784-792.
25. Gao X, Huang X, Smith GK, Zheng M, Liu H: A new antiparallel
duplex motif of DNA CCG repeats that is stabilized by extrahelical
bases symetrically localized in the minor groove. J Am Chem Soc
1995, 117:8883-8884.
26. Yu A, Barren MD, Romero RM, Christy M, Gold B, Dai JL, Gray DM,
•
Haworth IS, Mitas M: At physiological pH, d(CCG)15 forms a
hairpin containing protonated cytosines and a distorted helix.
Biochemistry 1997, 36:3687-3699.
This paper presents a careful analysis of the characteristics of a d(CCG)15
intramolecular hairpin, which forms the 'e-motif'. The methods used in this
paper are representative of those used in many similar analyses of this and
other triplet repeats published during 1996 and 1996.
27. Mariappan SV, Garcia AE, Gupta G: Structure and dynamics of the
DNA hairpins formed by tandemly repeated CTG triplets
associated with myotonic dystrophy. Nucleic Acids Res 1996,
24:775-783.
28. Smith GK, Jie J, Fox GE, Gao X: DNA CTG triplet repeats involved
in dynamic mutations of neurologically related gene sequences
form stable duplexes. Nucleic Acids Res 1995, 23:4303-4311.
29. Rabinovich D, Haran T, Eisenstein M, Shakked Z: Structures of the
mismatched duplex d(GGGTGCCC) and one of its Watson-Crick
analogues d(GGGCGCCC). J Mol Biol 1988, 200:151-161.
330
Nucleic acids
30. Gacy AM, Goellner G, Juranic N, Macura S, McMurray CT:
Trinucleotide repeats that expand in human disease form hairpin
structures in vitro. Cell 1995, 81:533-540.
this effect that correlates with the effect of repeat interruptions upon the
stability of repeats in humans.
31. Sinden RR: DNA Structure and Function. San Diego: Academic
Press; 1994.
41. KangS, Jaworski A, Ohshima K, Wells RD: Expansion and deletion of
CTG triplet repeats from human disease genes are determined by
the direction of replication in E. coil. Nat Genet 1995, 10:213-218.
32. Ohshima K, Kang S, Larson JE, Wells RD: Cloning, characterization,
and properties of seven triplet repeat DNA sequences. J Biol
Chem 1996, 271:16773-16783.
42. Maurer DJ, O'Callaghan BL, Livingston DM: Orientation dependence
of trinucleotide CAG repeat instability in Saccharomyces
cerevisiae. Mo/ Cell Bio/1996, 16:6617-6622.
33. Fry M, Loeb LA: The fragile X syndrome d(CGG)n nucleotide
repeats form a stable tetrahelical structure. Proc Nat/Acad Sci
USA 1994, 91:4950-4954.
43. Freudenreich CH, Stavenhagen JB, Zakian VA: Stability of a
CTG/CAG trinucleotide repeat in yeast is dependent on its
orientation in the genome. Mo/Cell B/o/1997, 17:2090-2098.
34. Usdin K, Woodford KJ: CGG repeats associated with DNA
instability and chromosome fragility form structures that block
DNA synthesis in vitro. Nucleic Acids Res 1995, 23:4202-4209.
44. Miret JJ, Pessoabrandao L, Lahue RS: Instability of CAG and CTG
trinucleotide repeats in Saccharomyces cerevisiae. Mo/ Ceil Bio/
1997, 17:3382-3387,
35. KettaniA, Kumar RA, Patel DJ: Solution structure of a DNA
quadruplex containing the fragile X syndrome triplet repeat. J Mol
Bio/1995, 254'638-656.
45. Rosche WA, Jaworski A, Kang S, Kramer SF, Larson JE, Giedroc DP,
Wells RD, Sinden RR: Single-stranded DNA binding protein
enhances the stability of CTG triplet repeats in Escherichia coil
J Bacteriol 1996, 178:5042-5044.
36. PearsonCE, Sinden RR: Alternative DNA structures within the
trinucleotide repeats of the my•tonic dystrophy and fragile X
locus. Biochemistry 1996, 35:5041-5053.
37.
•
PearsonCE, Ewel A, Acharya S, Fishel RA, Sinden RR: Human
MSH2 binds to trinucleotide repeat DNA structures associated with
neurodegenerative diseases. Hum Mo/Genet 1997, 6:1117-1123.
This paper presents the purification, characterization, and mechanism of
binding to the human mismatch repair protein (hMSH2) to SI-DNA and SDNA. SI-DNA (slipped intermediate DNA) contains heteroduplex triplet
repeat regions in which the lengths of the repeat tracts in the
complementary strands are different. Sister heteroduplexes migrate to
different positions on a polyacrylamide gel. hMSH2 preferentially binds to
SI-DNAs containing looped out CAG residues compared with CTG strands.
38. Lilley DMJ, Clegg RM: The structure of branched DNA species.
Q Rev Biophys 1993, 26:131-175.
39. PearsonCE, Wang Y-H, Griffith JD, Sinden RR: Structural analysis
,,* of slipped strand DNA (S-DNA) formed in (CTG)n°CAG)n repeats
from the my•tonic dystrophy locus. Nucleic Acids Res 1998,
26:816-823.
A paper that continues the description of the properties and characteristics
of the S-DNA structures formed in (CTG)n'(CAG)n repeats. Following
denaturation and renaturation, a complex, heterogeneous distribution of SDNA isomers was formed. These structures were characterized
biochemically and visualized using electron microscopy. The majority of SDNAs from (CTG)5oo(CAG)5o repeats contain short slipped-out regions that
are visible only as kinks. Other structures, containing the loops and threeand fourway junctions expected of slipped-stranded structures, are visible,
especially in (CTG)255o(CAG)255 S-DNAs.
40. PearsonCE, Eichler EE, Lorenzetti D, Kramer SF, Zoghbi HY, Nelson
*• DL, Sinden RR: Interruptions in the triplet repeats of SCA1 and
FRAXA reduce the propensity and complexity of slipped strand
DNA (S-DNA)formation. Biochemistry 1998, 37:2701-2708.
This paper demonstrates the fact that sequence interruptions within triplet
repeat tracts reduce both the percentage and the complexity of the S-DNA
formed following denaturation and renaturation. A model is proposed for
46. JaworskiA, Rosche WA, Gellibolian R, Kang S, Shimizu M, Bowater
RP, Sinden RR, Wells RD: Mismatch repair in E. coil enhances
instability of (CTG)n triplet repeats from human hereditary
diseases. Proc Natl Acad Sci USA 1995, 92:11019-11023.
47. Schweitzer JK, Livingston DM: Destabilization of CAG trinucleotide
repeat tracts by mismatch repair mutations in yeast. Hum Mol
Genet 1997, 6:349-355.
48. Samadashwily GM, Raca G, Mirkin SM: Trinucleotide repeats affect
•
DNA replication in vivo. Nat Genet 1997, 17:298-304.
This paper reveals stalling of the replication fork within the repeat tract in
Escherichia coil Pausing is dependent upon repeat length and orientation
with respect to the origin of plasmid replication. This paper is representative
of many showing an in vivo biological effect of triplet repeats, and the
authors suggest that the formation of an unusual DNA structure is
responsible. To date, however, these papers have not presented biochemical
data consistent with the formation of any alternative triplet repeat DNA
structure in vivo. This is likely to be an important area for future research.
49. Ohshima K, Wells RD: Hairpin formation during DNA synthesis
primer realignment in vitro in triplet repeat sequences from human
hereditary disease genes. J Biol Chem 199"7, 272:16798-16806.
50. Bidichandani SI, Ashizawa T, Patel PI: The GAA triplet-repeat
expansion in Friedreich ataxia interferes with transcription and
may be associated with an unusual DNA structure. Am J Hum
Genet 1998, 62:111-121.
51. Gacy AM, Goellner GM, Spit• C, Chen X, Gupta G, Bradbury EM,
Dyer RB, Mikesell MJ, Yao JZ et aL: GAA instability in Friedreichsataxia shares a common, DNA-directed and intra-allelic
mechanism with other trinucleotide diseases. Mo/Ceil 1998,
1:583-593.
52. Darlow JM, Leach DRF: Evidence for two preferred hairpin folding
patterns in d(CGG)*d(CCG) repeat tracts in vivo. J Me/Biol 1998,
275:1 ?-23.