Why Repetitive DNA is Essential to Genome Function (2005)

1
Biol. Rev. (2005), 80, pp. 1–24. f Cambridge Philosophical Society
DOI : 10.1017/S1464793104006657 Printed in the United Kingdom
Why repetitive DNA is essential to
genome function
James A. Shapiro1,* and Richard von Sternberg2,3
1
Department of Biochemistry and Molecular Biology, University of Chicago, 920 E. 58th Street, Chicago, IL 60637, USA
(E-mail : [email protected])
2
National Center for Biotechnology Information – GenBank Building 45, Room 6AN.18D-30, National Institutes of Health, Bethesda,
Maryland 20894 (E-mail : [email protected])
3
Department of Systematic Biology, NHB-163, National Museum of Natural History, Smithsonian Institution, Washington, D.C., 20013-7012
(E-mail : [email protected])
(Received 9 March 2004 ; revised 24 September 2004; accepted 29 September 2004 )
ABSTRACT
There are clear theoretical reasons and many well-documented examples which show that repetitive DNA is
essential for genome function. Generic repeated signals in the DNA are necessary to format expression of unique
coding sequence files and to organise additional functions essential for genome replication and accurate transmission to progeny cells. Repetitive DNA sequence elements are also fundamental to the cooperative molecular
interactions forming nucleoprotein complexes. Here, we review the surprising abundance of repetitive DNA in
many genomes, describe its structural diversity, and discuss dozens of cases where the functional importance of
repetitive elements has been studied in molecular detail. In particular, the fact that repeat elements serve either as
initiators or boundaries for heterochromatin domains and provide a significant fraction of scaffolding/matrix
attachment regions (S/MARs) suggests that the repetitive component of the genome plays a major architectonic
role in higher order physical structuring. Employing an information science model, the ‘ functionalist ’ perspective
on repetitive DNA leads to new ways of thinking about the systemic organisation of cellular genomes and
provides several novel possibilities involving repeat elements in evolutionarily significant genome reorganisation.
These ideas may facilitate the interpretation of comparisons between sequenced genomes, where the repetitive
DNA component is often greater than the coding sequence component.
Key words : transposable element, non-coding DNA, satellite DNA, junk DNA, transcriptional regulation,
chromatin domains, evolution, biocomputing, systems biology, data storage.
CONTENTS
I. Introduction .................................................................................................................................................
(1) The conceptual problem posed by repetitive DNA .........................................................................
(2) Overlooked aspects of genome organisation .....................................................................................
II. The genome as the cell’s long-term data storage organelle : the informatics metaphor ....................
III. Multiple functions of the genome .............................................................................................................
IV. Repetitive signals in the hierarchical control of coding sequence expression .....................................
V. Cooperative interactions : a fundamental reason for repetition in genome formatting .....................
VI. Structural varieties of repetitive DNA ......................................................................................................
VII. Documentation of diverse genomic functions associated with different classes of repetitive
DNA elements .............................................................................................................................................
VIII. Synteny and genomic synonyms ...............................................................................................................
IX. Repeats and RNAi in heterochromatin formatting ................................................................................
X. Taxonomically restricted genome system architecture ..........................................................................
* Address for Correspondence : Tel : 773-702-1625. Fax : 773-947-9345.
000
000
000
000
000
000
000
000
000
000
000
000
James A. Shapiro and Richard von Sternberg
2
XI. Discussion .....................................................................................................................................................
(1) Genome system architecture and evolution ......................................................................................
(2) A more integrative view of the genome .............................................................................................
(3) Repetitive DNA and the computational metaphor for the genome ..............................................
XII. Conclusions ..................................................................................................................................................
XIII. Acknowledgments ........................................................................................................................................
XIV. References ....................................................................................................................................................
I. INTRODUCTION
(1 ) The conceptual problem posed by
repetitive DNA
Fifty years of DNA-based molecular genetics and genome
sequencing have revolutionised our ideas about the physical
basis of cell and organismal heredity. We now understand
many processes of genome expression and transmission in
considerable molecular detail, and whole genome sequences
allow us to think about the principles that underlie the
organisation of cellular DNA molecules. There have been
many surprises and new insights. In the human genome, for
example, the protein-coding component represents about
1.2% of the total DNA, while 43 % of the sequenced
euchromatic portion of the genome consists of repeated and
mobile DNA elements (International Human Genome
Consortium, 2001 ; Table 1). In addition to dispersed
elements, most of the unsequenced heterochromatic portion
of the human genome (about 18% of the total) consists of
repetitive DNA, both mobile elements and tandemly
repeated ‘ satellite ’ DNA. Thus, over half the human genome
is repetitive DNA. Table 1 shows that the human genome is
far from exceptional in containing a major fraction of
repeats. Even in bacteria, repetitive sequences may account
for upwards of 5–10 % of the total genome (Hofnung &
Shapiro, 1999 ; Parkhill et al., 2000).
Despite its abundance, the repetitive component of the
genome is often called ‘ junk, ’ ‘selfish, ’ or ‘ parasitic ’ DNA
(Doolittle & Sapienza, 1980 ; Orgel & Crick, 1980). Because
the view of repetitive and mobile genetic elements as genomic parasites continues to be influential (e.g. Lynch &
Conery, 2003), we feel it is timely to present an alternative
‘functionalist ’ point of view. The discovery of repetitive
DNA presents a conceptual problem for traditional genebased notions of hereditary information. This issue was
noted in the pioneering work of Britten and colleagues
(Britten & Kohne, 1968; Britten & Davidson, 1969, 1971;
Davidson & Britten, 1979). We argue here that a more
fruitful interpretation of sequence data may result from
thinking about genomes as information storage systems with
parallels to electronic information storage systems. From
this informatics perspective, repetitive DNA is an essential
component of genomes ; it is required for formatting coding
information so that it can be accurately expressed and for
formatting DNA molecules for transmission to new generations of cells. In addition, the cooperative nature of proteinDNA interactions provides another fundamental reason
why repeated sequence elements are essential to format
genomic DNA. Instead of parasites, we argue that repetitive
000
000
000
000
000
000
000
DNA elements are necessary organisers of genomic information.
( 2) Overlooked aspects of genome organisation
Wide divergences between closely related taxa in repeat
abundance and genome size (C-value) are often cited as
evidence of the parasitic nature of repetitive DNA (Pagel &
Johnstone, 1992; Vinogradov, 2003). This argument
ignores three important aspects of genome organisation as a
complex information system :
(a) The first aspect is that robust complex systems rely
upon redundant components, many of which can be
removed without detectably affecting overall system performance. Such robustness characterises repetitive DNA
elements, particularly those arrayed in long tandem repeats
that form compact nuclear structures, like centromeres. A
minimum number may be required for function, but
excesses are compatible with normal operation. Tandem
arrays are often the regions that vary most between related
taxa. Sometimes, different repeat elements fulfill equivalent
tasks, so that absence of one particular element does not
disable genome function.
(b) The second overlooked aspect is the significance of
genome size and of distance between distinct regions of the
genome. Rapidly reproducing organisms, like Caenorhabditis,
Drosophila, Fugu and Arabidopsis tend to have stripped-down
genomes with relatively less abundant repetitive DNA, while
organisms with longer life cycles, such as humans and
maize, have larger genomes with correspondingly more
repetitive elements (Table 1 ; Cavalier-Smith, 1985).
(c) The third neglected aspect of genome organisation is
the importance of stoichiometric balance between DNAbinding proteins and their cognate recognition elements
(Schotta et al., 2003). Changes in the ratio of proteins to
repetitive elements influences chromatin structure and have
observable phenotypic effects. Thus, repetitive DNA abundance is flexible but not adaptively neutral.
II. THE GENOME AS THE CELL’S LONG-TERM
DATA STORAGE ORGANELLE : THE
INFORMATICS METAPHOR
The key idea is to think of DNA as a sophisticated data
storage medium. In order for cells to access, preserve,
duplicate and transmit digitally encoded sequence information, DNA has to interact with other molecular components. Structurally and through its interactions with other
Why repetitive DNA is essential to genome function
3
Table 1. DNA content in higher eukaryotes
% repetitive
DNA
% coding
sequences
Reference
14
13
<10
Stein et al. (2003)
Stein et al. (2003)
Bennett et al. (2003) ; Celniker et al. (2002)
157 MB
365 MB
2.4 GB
2.5 GB
2.9 GB
16.5
22.4
33.7 (female)
y57 (male)2
35
15
31
40
o50
125–157 MB
13–14
21
466 MB
420 MB
2.5 GB
42
45
77
11.8
11.9
1
Species
Genome size1
Animals
Caenorhabditis elegans
Caenorhabditis briggsae
Drosophila melanogaster
100 MB
104 MB
175 MB
Ciona intestinalis
Fugu rubripes
Canis domesticus
Mus musculus
Homo sapiens
Plants
Arabidopsis thaliana
Oryza sativa (indica)
Oryza sativa ( Japonica)
Zea mays
1
2
9.5
9.5
1.45
1.4
1.2
Dehal et al. (2002)
Aparicio et al. (2002)
Kirkness et al. (2003)
Mouse Genome Sequencing Consortium (2002)
International Human Genome Consortium (2001)
Arabidopsis Genome Initiative (2000)
Bennett et al. (2003)
Yu et al. (2002)
Goff et al. (2002)
Meyers et al. (2001)
MB=megabases (106 base pairs), GB=gigabases (109 base pairs).
The D. melanogaster Y chromosome is largely heterochromatic repetitive DNA.
cellular molecules, DNA stores information on three different time scales:
(1) Long-term (‘ genetic ’) storage involves DNA sequence information stable for many organismal generations.
(2) Intermediate-term (‘ epigenetic ’) storage occurs
through complexing of DNA with protein and RNA into
chromatin structures that may propagate over several cell
generations (see e.g. Jenuwein & Allis, 2001, and other
articles in the same issue ; Van Speybroeck, Van de Vijver &
De Waele, 2002). Chemical modifications of DNA that do
not change sequence data, such as methylation, contribute
to epigenetic storage (Bird, 2002).
(3) Short-term (‘computational’) information storage
involves dynamic interactions of DNA with proteins and
RNA molecules that can adapt rapidly within the cell cycle
as the cellular environment changes.
The capacity of DNA to complex with RNA and protein
to store information at these three different time scales
enables the genome to fulfill multiple roles in cell and
organismal heredity. Genomes serve as the organism’s
evolutionary record and most basic repository of specific
phenotypic information. Genomes also participate in
executing programs of cellular differentiation and multicellular morphogenesis during organismal life cycles. These
programs are not hard-wired in the DNA sequence, and
they sometimes permit the formation of very different organisms utilising a single genome (e.g. invertebrates having
distinct larval and adult stages). Finally, genomes participate
in computational responses that allow each cell to complete
its cell cycle and deal with changing circumstances, correct
internal errors and repair damage (Nurse, Masui &
Hartwell, 1998).
The explicit parallel with electronic data systems indicates
that the genomic storage medium has to be marked, or
formatted, with generic signals so that operational hardware
can locate and process the stored information. This is
basically the idea of Britten and Davidson (1969). Like all
information storage systems, the genome contains various
classes of data files, and these files have to be formatted for
access and copying. While many of the data files are unique
(e.g. protein coding sequences), the formatting information
must be far simpler in content. The requirement for reduced
information content in formatting signals is the most basic
reason that repetitive DNA sequences are essential to
genome function.
III. MULTIPLE FUNCTIONS OF THE GENOME
It is tempting to reduce genome function to encoding
cellular proteins and RNAs. However, molecular genetics
has shown that achieving this task requires cells to possess a
number of additional capabilities also encoded in the genome : (1) Regulating timing and extent of coding sequence
expression. (2) Organizing coordinated expression of protein
and RNA molecules that function together. (3) Packaging
DNA appropriately within the cell. (4) Replicating the
genome in synchrony with the cell division cycle. (5)
Transmitting replicated DNA accurately to progeny cells at
cell division. (6) Detecting and repairing errors and damage
to the genome. (7) Restructuring the genome when necessary (as part of the normal life cycle or in response to a critical
selective challenge). These additional capabilities involve
specific kinds of interactions between DNA and other
cellular molecules. The construction of highly precise transcription complexes in RNA and protein synthesis is one
example of such interactions (Ptashne, 1986). Formation of
a kinetochore structure at the centromere for attachment of
microtubulues to ensure chromosome distribution at mitosis
is another example (Volpe et al., 2003).
IV. REPETITIVE SIGNALS IN THE
HIERARCHICAL CONTROL OF CODING
SEQUENCE EXPRESSION
To see how generic repetitive sequences format DNA for
function, we can examine coding sequence expression.
Historically, this genome operation has received the greatest
attention. There are multiple roles for repetitive sequences.
Specific transcription is not possible without generic start
sites (promoters), stop sites (terminators in prokaryotes,
polyA addition signals in eukaryotes) and signals for RNA
processing (such as splice sites) (Alberts et al., 2002).
Regulation of transcription initiation involves arrays of
binding sites for transcription factors that interact with the
basic transcriptional apparatus to ensure proper timing and
location of expression (Ptashne, 1986). When these binding
sites are shared by several genetic loci, cells achieve
coordinated expression of corresponding RNA and protein
products (Arnone & Davidson, 1997).
Microbial genomes have an operon-like organisation at
every scalar level (Audit & Ouzonis, 2003). This suggests
that prokaryotic chromosomes are partitioned into a nested
series of ‘ folders, ’ which can be differentially accessed by the
cell. Likewise, eukaryotic genomes have a hierarchical
structure that reflects at least three regulatory layers (van
Driel, Fransz & Verschure, 2003). The existence of genomic
folders has been described as epigenetic ‘ indexing’ of the
genome ( Jenuwein, 2002). By regulation of chromatin
remodeling and localisation of chromatin domains, cells
achieve coordinate control over multi-locus segments of the
genome, as observed in Drosophila melanogaster (Boutanaev
et al., 2002), Caenorhabditis elegans (Roy et al., 2002), and
humans (Caron et al., 2001). In addition, systematic RNA
inactivation (RNAi) knockout studies in C. elegans have
identified extensive clusters of functionally related (hence,
coordinately controlled) genetic loci (Kamath et al., 2003).
The molecular details of coordinate regulation by chromatin organisation are rapidly becoming evident (e.g. Greil
et al., 2003).
In eukaryotes, transcription is strongly influenced by how
DNA is packaged. Packaging involves winding the DNA
into nucleosomes, compacting the nucleosomes into higher
order ‘chromatin ’ structures complexed with protein and
---lacI---|
O1
>------lacZ-------O3 CRP P-35 P-10 O1
O2
AATTGTGAGCGGATAACAATT
AATTGTTATCCGCTCACAATT
Ever since the elaboration in the early 1960s of the
operon model ( Jacob & Monod, 1961) and the replicon
hypothesis ( Jacob, Brenner & Cuzin, 1963), we have
understood specific interactions of cellular proteins with the
genome to depend upon the existence of recognition signals
in the DNA distinct from coding sequences. Signals like the
operator or the origin of chromosome replication are completely different from any classical definition of a ‘ gene ’ as a
basic unit. These recognition signals are themselves repetitive sequences in the genome, often carried by larger repeat
elements. The idea that repetitive DNA is ‘junk’ without
functional significance in the genome is simply not consistent with an extensive and growing literature, only a minor
part of which is cited here.
James A. Shapiro and Richard von Sternberg
CRP TGTGAGTTAGCTCACT
AGTGAGCTAACTCACA
4
Fig. 1. Linear structure of the lac operon regulatory region.
The region schematised extends from the end of the lacI coding
sequence to the beginning of the lacZ coding sequence. O1, O2
and O3 are the three documented operators ; note that O2 is
part of the lacZ coding sequence. The x10 and x35 regions of
the canonical sigma 70 promoter are indicated together with
the binding site for the cAMP receptor protein (CRP) needed
for full promoter function.
RNA ( Jenuwein & Allis, 2001), and attaching the chromatin
to the nuclear scaffold or matrix (Laemmli et al., 1992).
Generic repetitive signals affect positioning of nucleosomes,
formation of different classes of chromatin, extent of
chromatin domains encompassing multiple genetic loci, and
scaffold attachment (see Table 3).
Without entering into details here, it is widely recognised
that the processes of replication, transmission, repair and
DNA restructuring involve analogous recognition of specific
repeated signals in the genome and the formation of higherorder nucleoprotein structures (Alberts et al., 2002).
V. COOPERATIVE INTERACTIONS: A
FUNDAMENTAL REASON FOR REPETITION IN
GENOME FORMATTING
One of the simplest genome–proteome interaction systems,
E. coli lac operon repression, provides a clear example of a
fundamental principle : cooperative protein-DNA interactions involve repeated DNA sequence elements. Like many
bacterial operons, lac contains multiple protein interactions
sites, including related repeats of a sequence recognised by
the LacI repressor’s DNA-binding domain (Müller, Oehler
& Müller-Hill, 1996; Fig. 1). As illustrated for the principal
operator, O1, and for the distributed cyclic AMP receptor
protein (CRP) binding site, these repeats are frequently
organised as imperfect head-to-head palindromes (Ptashne,
1986; Fig. 2). The palindromic structure means that each
recognition element is itself composed of repeats.
Operon repression requires that four repressor monomers
organise into two dimers to bind four DNA recognition
sequences in two palindromic operators (Lewis et al., 1996 ;
Fig. 2). One dimer binds each operator, and the two dimers
contact each other through their C terminal domains. This
binding forms a DNA loop which prevents RNA polymerase
from accessing the promoter to initiate transcription. A single
repressor monomer-half operator interaction is too weak to
form a stable structure, but the cooperative interaction of
two monomer-DNA binding events plus the protein–protein
binding within the dimer creates a stable protein–DNA
complex (Ptashne, 1986). Formation of the loop further
stabilises the structure and more effectively occludes the
promoter (Fig. 2). Thus, repeated copies of the recognition
P35
Why repetitive DNA is essential to genome function
P
P-10
O3
CR
O1
5
(Arnone & Davidson, 1997 ; Yuh, Bolouri & Davidson,
1998).
Cooperativity between individually weak but stereospecific molecular interactions to build meta-stable and
stable multimolecular complexes or structures underlies the
operation of cellular control circuits. Cooperativity and
multiple interactions provide precision, combinatorial
richness and robustness similar to the operation of multilayered fuzzy logic systems (http://zadeh.cs.berkeley.edu/).
Because the underlying interactions are weak, multimolecular structures can assemble and disassemble rapidly
to provide the dynamic responses needed to deal with a
changing cellular environment. On the other hand, complexes with large numbers of cooperatively interacting
components can be very stable, as occurs in longer-term
epigenetic regulation of the genome. In the latter case, we
find extensive arrays of repeated DNA elements in stable
heterochromatic structures (Grewal & Elgin, 2002).
VI. STRUCTURAL VARIETIES OF
REPETITIVE DNA
Fig. 2. Cartoon of lac promoter looping by repressor binding
to O1 and O3 operators. The drawing is not to scale. Note that
the effective promoter is divided into three separate components : the x10 and x35 binding sites for Sigma 70 RNA
polymerase as well as the binding site for the required cAMP
receptor protein (CRP) transcription factor.
sequence are critical to the formation of a repression complex.
When it is necessary to lift repression, an inducer
molecule disrupts the cooperativity that supports a stable
protein-DNA complex, repressor molecules separate from
operators, and the promoter becomes accessible for transcription. Transcription initiation itself requires cooperative
interactions ; RNA polymerase binds to the x10 and x35
sites in the promoter and also to the dimeric cAMP-CRP
complex, itself bound to two recognition sequences that
make up the palindromic CRP site (Fig. 1).
The iteration of protein recognition sequences (transcription factor binding sites) is a general feature of both
prokaryotic and eukaryotic transcriptional regulatory
regions (Ptashne, 1986). In animals, regulatory regions
contain multiple copies of several different binding sites
(Arnone & Davidson, 1997). The relative arrangement of
recognition sites determines whether protein binding will
affect transcription positively or negatively. Different
arrangements of several repeated recognition sites provides
the combinatorial potential to construct a virtually infinite
range of regulatory regions able to engage in sophisticated
molecular computations with the cognate binding proteins,
as documented in sea urchins and Drosophila melanogaster
Repetitive DNA sequence elements involved in transcriptional regulation are chiefly oligonucleotide motifs recognised by specific DNA binding proteins. It is not essential
that every copy be identical. Many tolerate differences in
one or more base pairs and still provide binding specificity
with an altered affinity for the cognate protein. Variations in
binding sites serve regulatory functions, as in the classic
example of the phage lambda operator (Ptashne, 1986).
Since interaction of a protein factor with a recognition
sequence motif is generally weak, the real functional entities
in transcriptional formatting are composite elements that
comprise two or more motifs, like promoters (Gralla &
Collado-Vides, 1996), palindromic operators (Fig. 1), and
enhancers (Arnone & Davidson, 1997).
In addition to oligonucleotide motifs and short composite
elements (generally <100 base pairs), repetitive DNA sequences come in a wide variety of structural arrangements.
Table 2 lists the most commonly recognised structures.
Except for homopolymeric tracts and individual oligonucleotide motifs, repetitive elements are generally built up
from simpler components. This modularity exemplifies the
combinatorial and hierarchical nature of genomic coding
resulting from cooperative interactions. Tandem array
satellites are composed of sequence elements of defined
length that typically contain motifs for binding of specific
proteins, such as those required for chromatin configurations (Henikoff, Ahmad & Malik, 2001; Grewal & Elgin,
2002 ; Dawe, 2003). Tandem repetition creates extended
structures where a very large number of cooperative interactions build up robust epigenetic stability persisting
through multiple cell cycles.
Modularity also applies to transposons and retrotransposons that act to create novel long-term inherited
DNA structures. Transposable elements contain signals that
define the boundaries of each element and help create
nucleoprotein structures that allow them to interact with
James A. Shapiro and Richard von Sternberg
6
Table 2. Different structural classes of repetitive DNA
Structural class
Mnemonic
Oligonucleotide motif
4–50 bp ; protein binding or recognition sites
Homopolymeric tract
VNTR
Structural or functional characteristics1
Repeats of a single nucleotide (N)n
Variable nucleotide tandem repeats
Repeats of dinucleotides and longer sequences <100 bp
that may vary in number in the tandem array ; (NN…N)n
Composite elements
Composed of two or more oligonucleotide motifs, sometimes
with non-specific spacer sequences ; examples include
palindromic operators, promoters, enhancers and silencers,
replication origins, site-specific recombination sequences
(Gralla & Collado-Vides, 1996 ; Craig et al., 2002)
Tandem array satellites
Repeats of larger elements, typically 100–200 bp in length ;
satellite arrays typically contain thousands of copies
(Henikoff et al., 2001)
TIR DNA transposons
Terminal inverted repeat
DNA-based mobile genetic elements flanked by inverted
terminal repeat sequences of f50 bp ; may encode proteins
needed for transposition ; vary in length from several hundred
to several thousand base pairs (Craig et al., 2002)
FB DNA transposons
Foldback
DNA transposons with extensive (many kb) inverted repeats
at each end (Lim & Simmons, 1994)
Rolling circle DNA
transposons
DNA transposons that insert from a circular intermediate
by rolling circle replication ; can generate tandem arrays
(Kapitonov & Jurka, 2001 ; Craig et al., 2002)
LTR retrotransposons
Long terminal repeat
Retroviruses and non-viral mobile elements flanked by direct
terminal repeats of several hundred base pairs ; insert at new
locations following reverse transcription from an RNA copy
into duplex DNA (Craig et al., 2002)
LINE retrotransposons
Long interspersed nucleotide element
Mobile elements several kb in length with no terminal
repeats ; encode proteins involved in retrotransposition
from a PolII-transcribed RNA copy by target-primed
reverse transcription (Craig et al., 2002)
SINE retrotransposons
Short interspersed nucleotide element
Mobile elements, a few hundred base pairs in length with no
terminal repeats ; do not encode proteins (mobilised by LINE
products from a PolIII-transcribed RNA copy)
(Craig et al., 2002)
1
bp=base pairs, kb=kilobase pairs.
and rearrange target DNA sequences [e.g. terminal inverted
repeats (TIRs) of DNA transposons and long terminal
repeats (LTRs) in retrotransposons (Craig et al., 2002)]. In
addition, transposons and retrotransposons contain sequence components that control transcription and may
participate in DNA replication and chromatin organisation.
We know from an extensive literature on insertional mutagenesis in nature and the laboratory that introduction of a
transposable element into a particular location confers new
functional properties on that region of the genome (Shapiro,
1983; Craig et al., 2002 ; Deininger et al., 2003).
VII. DOCUMENTATION OF DIVERSE GENOMIC
FUNCTIONS ASSOCIATED WITH DIFFERENT
CLASSES OF REPETITIVE DNA ELEMENTS
Table 3 provides a compilation of repetitive DNA functions
in genome operation. These range from basic transcription,
through regulation at transcriptional and post-transcriptional
levels, to chromatin and nuclear organisation, genome
transmission at cell division, damage repair and DNA
restructuring. In some cases, functional categories overlap.
For example, DNA restructuring at homopolymeric tracts
and variable tandem nucleotide repeats (VNTRs) in pathogenic bacteria and chromatin organisation in eukaryotes
both serve as mechanisms for regulating coding sequence
expression. Moreover, recombination mechanisms fulfill
roles in repair [double-strand (DS) break correction] as well
as cellular protein engineering [antigenic variation, V(D)J
joining (Bassing, Swat & Alt, 2002), and immunoglobulin
class switching]. Particularly noteworthy are the cases where
repetitive DNA influences the physical organisation and
movement of the genome through the cell cycle. These cases
include the delineation of centromeres by tandem repeat
arrays, the formation of telomeres by non-LTR retrotransposons, delineation of chromatin domains by boundary/insulator elements, and the attachment of DNA to the
nuclear matrix by sequences found in larger composite
elements, such as mammalian long interspersed nucleotide
elements (LINEs).
Function
Structural class1
Example
Comment
Reference
TE sequences in almost a quarter of human promoter
sequences
Jordan et al. (2003)
TRANSCRIPTION
Promoters
Enhancers &
Silencers
Transposable elements
(TE)
LINE
Human LINE-1
1.6 % of 2004 examined human promoters include
LINEs ; the 5k-untranslated region of L1 has both an
internal (sense) promoter and an antisense promoter
(ASP) ; L1 ASP chimeric transcripts are highly
represented in expressed-sequence tag (EST) databases
Speek (2001) ; Zaiss & Kloetzel (1999) ;
Nigumann et al. (2002) ; Jordan et al.
(2003)
SINE
Human Alus ; mouse B2 elements
Genomic synonyms ; RNA polymerase II promoter
elements – 5.3 % of 2004 examined human promoters
have SINEs as components
Ferrigno et al. (2001) ; Jordan et al. (2003)
Unclassified middle
repetitive
RENT elements (repetitive element
from Nicotiana tabacum)
>5 kb with conserved 5k ends but variable 3k termini ;
moderately repetitive (y100 copies) ; found only in
certain Nicotiana species
Foster et al. (2003)
Unclassified
CpG islands
Characteristic clusters of dinucleotides associated with
mammalian promoters
Ioshikhes & Zhang (2000) ; Ponger &
Mouchiroud (2002)
LINE
Human Line-1
Positive transcriptional regulatory element ; binding
sites for testis-determining transcription factors
Yang et al. (1998) ; Becker et al. (1993) ;
Tchénio et al. (2000)
SINE
Subgroup II–III of human AluSx
subfamily
Nuclear hormone receptor binding sites for thyroid
hormone receptor, retinoic acid receptor and estrogen
receptor
Norris et al. (1995) ; Vansant & Reynolds
(1995) ; Babich et al. (1999)
Jo, Jb, Sq, Sp, Sx, and Sg
subfamilies of human Alus ; subset of
rodent B1 elements
Genomic synonyms ; Pax6 binding sites
Zhou et al. (2000, 2002)
Drosophila (GA)n and (GAGA)n
elements
Bithoraxoid polycomb group response elements are
(GA)n repeats ; y250 Drosophila loci have GAGA
elements ; the Drosophila GAGA factor (GAF) binds to
the 5k and intronic regions of loci with (GA)n sequences
Hodgson et al. (2001) ; Mahmoudi et al.
(2002) ; van Steensel et al. (2003)
Mammalian (GAGA)n elements
Found in RNA PolII and PolIII promoter elements in
Alu elements ; enriched binding sites for various
mammalian chromatin/transcriptional factors ;
negative regulatory effects mediated by microsatellites
and VNTRs can range from general to cell-typespecific
Humphrey et al. (1996) ; Kropotov et al.
(1997, 1999) ; Tomilin et al. (1992) ;
Tsuchiya et al. (1998) ; Cox et al. (1998) ;
Albanese et al. (2001) ; Fabregat et al.
(2001) ; Rothenburg et al. (2001) ; Youn
et al. (2002)
Mammalian triplet repeats
Repeat unit number differences result in quantitative
variation in gene silencing/down-regulation in many
instances (VNTRs as expression ‘tuning forks, ’ ; Kashi
et al. 1997)
Saveliev et al. (2003) ; Tovar et al. (2003)
VNTR
Santi et al. (2003) ; Sangwan & O’Brian
(2002)
7
Arabidopsis and other plant (GA)n
elements
Why repetitive DNA is essential to genome function
Table 3. Selected examples of repetitive DNA functions
Function
Transcription
attenuation
Terminators
Regulatory RNAs
8
Table 3 (cont.)
Example
Comment
Reference
Unclassified
S. cerevisiae subtelomeric X elements
Core X [470 nt] can enhance the action of a distant
silencer without acting as a silencer on its own
Lebrun et al. (2001)
Unclassified
Drosophila subtelomeric satellite-like
repeat TAS
A complex subterminal satellite with a 457-bp repeat
unit (also called telomere-associated sequence, TAS)
silences adjacent sequences
Kurenova et al. (1998) ; Boivin et al.
(2003)
Oligonucleotide motif
Schizosaccharomyces pombe Cre (cAMPresponse-element)=ATGACGT
and related sequences
Almost 1000 Cre hotspots are dispersed throughout
the S. pombe genome. Meiotic recombination hotspot
activity
Fox et al. (2000)
Unclassified
Unnamed Arabidopsis thaliana
repetitive elements
Act as enhancers
Ott & Hansen (1996)
Unclassified
Strongylocentrotus RSR elements
Act as enhancers
Gan et al. (1990)
Mosaic repetitive elements
E. coli BIMEs (bacterial interspersed
mosaic elements)
BIMEs are part of a Rho-dependent transcriptional
regulational system involving up to 250 operons
Espeli et al. (2001)
LINE
L1
Retards transcript elongation
Han et al. (2004)
Oligonucleotide motif
AATAAA
Canonical polyadenylation signal
Wahle & Ruegsegger (1999) ; Barabino
& Keller (1999)
TIR DNA transposons
IS elements
Rho-dependent and Rho-independent terminators
http://www-is.biotoul.fr/is.html
LTR retrotransposon
Mouse VL30 elements
Non-protein coding transcripts of VL30 elements
selectively bind to PSF (pre-mRNA splicing factor)
repressor, allowing transcription of genes controlled by
insulin-like growth factor response elements (IGFRE) ;
the VL30 transcripts are causally involved in
steroidogenesis and oncogenesis
Song et al. (2004)
Rodent ID elements
Target mRNAs to neuronal dendrites ; genomic
synonym
Chen et al. (2003)
Rodent BC200 and primate
homologue
Neuronal targeting ; genomic synonym
Skryabin et al. (1998)
POST-TRANSCRIPTIONAL RNA PROCESSING
mRNA targeting
SINE
Primate Alu
Neuronal targeting ; genomic synonym
Watson & Sutcliffe (1987)
Unclassified
Xenopus laevis Xlslirts familiy
(3–13 dispersed copies of a 79–81 bp monomer unit) ;
transcripts localise RNAs to the vegetal cortex
Kloc & Etkin (1994) ; Zearfoss et al.
(2003)
SINE
Human Alus ; mouse B1, B2
elements
Genomic synonyms
Rubin et al. (2002)
Multiple nearby repeats in origins, specific to each
replicon
del Solar et al. (1998) ; Marczynski &
Shapiro (1993)
TRANSLATION
Selective enhancement of mRNA
translation
DNA REPLICATION, LOCALISATION AND MOVEMENT
Repication origins
(Ori)
Oligonucleotide motifs
Bacterial chromosomes and
plasmids
James A. Shapiro and Richard von Sternberg
Structural class1
Centromeres
ORS (origin recognition sequence)
Schild-Poulter et al. (2003) ; Novac et al.
(2001) ; Price et al. (2003)
S. cerevisiae LTR, subtelomeric X
and Yk repeats
Contain 20 % of S. cerevisiae sequences that
immunoprecipitate with origin recognition proteins
Wyrick et al. (2001)
Large tandem arrays form pericentric
heterochromatin, facilitate centromere function
Choo (2001) ; Henikoff et al. (2001)
4–5 kb dg-dh repeats in S. pombe
RNAi, dg-dh DS RNA needed for centromere
organisation ; capable of inducing silencing at an
ectopic site
Jenuwein (2002) ; Dawe (2003) ; Volpe
et al. (2002) ; Hall et al. (2002) ; Reinhart
& Bartel (2002)
171 bp alpha repeats in primates
Alpha-satellite arrays are highly competent human
artifical chromosome (HAC)-forming substrates
Grimes et al. (2002) ; Schueler et al.
(2001) ; Vafa & Sullivan (1997)
Tandem array satellites
180 bp Arabidopsis repeat
156 bp CentC in maize
LTR
Nagaki et al. (2003b) ; Copenhaver et al.
(1999)
CENH3 replaces histone H3 in centromeres, and
CentC interacts specifically with CENH3
155 bp CentO rice repeat
Cheng et al. (2002)
Cereal centromere repeats
Aragon-Alcaide et al. (1996)
CRR
Centromere-specific retrotransposon in rice
Cheng et al. (2002)
Maize CRM ; CentA, Huck and
Prem2
CENH3 interacts specifically with CRM (centromere
retrotransposon in maize)
Ananiev et al. (1998b) ; Zhong et al.
(2002) ; Nagaki et al. (2003a)
Arabidopsis Athila element
Meiotic pairing and
recombination
Telomeres
Ananiev et al. (1998a) ; Zhong et al.
(2002) ; Nagaki et al. (2003a)
Why repetitive DNA is essential to genome function
LTR and unclassified
A3/4=CCTCAAATGGTC
TCCATTTTCCTTT
GGCAAATTCC
Pelissier et al. (1996)
250, 301 bp repeats in wheat and
rye
Ty3/gypsy-related
Cheng & Murata (2003)
Ty3/gypsy family Sorghum elements
Ty3/gypsy-related sequences present exclusively in the
centromeres of all Sorghum chromosomes ; Ty1/copiarelated DNA sequences are not specific to the
centromeric regions
Miller et al. (1998)
VNTR
(CAGG)10-18, (CAGA)4-6
in mice
Meiotic recombinational hotspots in mouse MHC
(major histocompatability) region
Isobe et al. (2002) ; Shiroishi et al. (1995)
Unclassified
Drosophila heterochromatin
Heterochromatin regions initiate pairing in male
meiosis ; depends upon rDNA spacer repeats
McKee et al. (2000)
Oligonucleotide motif
Schizosaccharomyces pombe Cre (cAMPresponse-element)=ATGACGT
and related sequences
Almost 1000 Cre hotspots are dispersed throughout
the S. pombe genome. Meiotic recombination hotspot
activity
Fox et al. (2000)
non-LTR
retrotransposons
Drosophila heterochromatin telomere
repeat A (HeT-A), telomereassociated retrotransposon (TART)
HeT-A units work in pairs : the 5k element has a
promoter in the 3k untranscribed region that allows
transcription of the adjacent template-unit
Pardue & DeBaryshe (2003)
Plasmodium telomere associated
repetitive elements (TAREs)
Plasmodium falciparum telomere-associated sequences of
the 14 linear chromosomes display a similar higher
order organisation and form clusters of four to seven
telomeres localised at the nuclear periphery
Figueiredo et al. (2000, 2002)
9
10
Table 3 (cont.)
Function
Structural class1
Example
Comment
Giardia telomere retrotransposons
S/MARs (scaffold/
matrix associated
regions)
Reference
Arkhipova & Morrison (2001)
176, 340, or 350 bp DNA
repeats
Chironomus spp.
Maintain telomere length by recombination/gene
conversion mechanisms ; repeat lengths species-specific
LTR retrotransposon
Yeast Ty1
Ty1 activated when normal telomere function impaired
Scholes et al. (2003)
TIR DNA Transposons
Rice and sorghum genome
miniature inverted repeat
transposable elements (MITEs)
Most of the MARs discovered in the two genomic
regions co-localise with MITEs
Avramova et al. (1998)
Transposable Euplotes crassus (Tec)
elements
S/MARs that undergo en masse, developmental,
chromosomal elimination
Sharp et al. (2003)
Human LTR retrotransposons
7.0 % of examined human S/MARs derived from
LTR retrotransposons
Jordan et al. (2003)
Drosophila gypsy
Elements determining intranuclear gene localisation/
nuclear pore association
Gerasimova et al. (2000) ; Labrador &
Corces (2002)
Human LINE-1
39.4 % of human S/MARs are LINE sequences ; 98
LINE1 consensus sequences were found to contain 14
distinct S/MAR recognition signatures ; the
distribution of Alu and LINE repetitive DNA are
biased to positions at or adjacent to apoptotic cleavage
sites ; LINE1 elements retard transcript elongation
Chimera & Musich (1985) ; Rollini et al.
(1999) ; Khodarev et al. (2000) ; Jordan
et al. (2003) ; Han et al. (2004)
LTR
LINEs
Cohn & Edstrom (1992)
CHROMATIN ORGANISATION, NUCLEAR ARCHITECTURE & EPIGENETIC MODIFICATION
VNTR
(TATAAACGCC)n
Flexible DNA segments, highest documented affinity
for nucleosomes, cluster around centromeres ; the
TATA tetrad (5k-TATAAACGCC-3k), is found in
multiple phased repeats in genomic sequences that are
among the strongest known binders of core histones
Widlund et al. (1999)
Heterochromatin
DNA Transposons ; LTR
and non-LTR retroposons
Drosophila transposable elements
Nine transposable elements (copia, gypsy, mdg-1, blood,
Doc, I, F, G, and Bari-1) are preferentially clustered into
one or more discrete heterochromatic regions in
chromosomes of the Oregon-R laboratory stock ; P and
hobo elements, recent invaders of the D. melanogaster
genome exhibit heterochromatic clusters in certain
natural populations
Cryderman et al. (1998) ; Pimpinelli et al.
(1995)
LTR retroposon
Hamster intracisternal A particle
(IAP) elements
In Syrian hamster, over half of the genomic IAP
elements are accumulated in heterochromatin,
including along the entire Y chromosome
Dimitri & Junakovic (1999)
Maize Grande, Prem2, RE-10, RE-15,
and Zeon
Abundant in heterochromatic knob regions ; blocks of
tandem 180-bp repeats interrupted by insertions of full
size copies of retrotransposable elements ; about 30 %
of cloned knob DNA fragments
Ananiev et al. (1998c)
Arabidopsis Athila
Athila elements in the Arabidopsis genome are
concentrated in or near heterochromatic regions. Most
of the heterochromatic elements retrotransposed
directly into 180 bp satellite clusters
Pelissier et al. (1996)
James A. Shapiro and Richard von Sternberg
Nucleosome
positioning elements
Insulator/Boundary
elements
LTR elements represent 61 % of euchromatic
transposable elements and approximately 78 % of
heterochromatic elements. LINE elements represent
24 % of the euchromatic and 17 % of the
heterochromatic transposable element sequence. DNA
elements represent 15 % in euchromatin and 5 % in
heterochromatin
Hoskins et al. (2002)
LINE
Human LINE-1
X inactivation, monoallelic expression, imprinting
Lyon (2000) ; Parish et al. (2002) ; Allen
et al. (2003)
Oligonucleotide motif
GATC
DNA adenine methylase (DAM) site in Gram-negative
bacteria ; involved in methylation control of
transcription, replication, repair, chromosome
packaging and transposition
Barras & Marinus (1989) ; Low et al.
(2001)
Tandem repeat satellite
180 bp, 350 bp TR1 repeats in
maize knobs
SINE
Mouse B1
B1 elements methylated de novo to a high level after
transfection into embryonal carcinoma cells ; B1
elements acted synergistically
Yates et al. (1999)
Unclassified –
palindromic repeat
Petunia repetitive sequence (RPS)
element
The palindromic RPS element acts as a de novo
hypermethylation site in the non-repetitive genomic
background of Arabidopsis
Müller et al. (2002)
Unclassified
CpG islands
Methylation sites clustered near eukaryotic promoters ;
methylation indicates silenced state
Ioshikes & Zhang (2000) ; Robertson
(2002)
Unclassified
Saccharomyces subtelomeric
anti-silencing repeats (STARs)
The telomere-proximal portion of either X or Yk
dampened silencing when located between the
telomere and the reporter gene and also at one of the
silenced mating-type cassettes
Fourel et al. (1999) ; Pryde & Louis
(1999)
LTR
Drosophila gypsy element
The gypsy insulator blocks propagation of silencing and
alters the nuclear localisation of adjacent DNA
Gerasimova et al. (2000) ; Byrd & Corces
(2003)
Unclassified
Drosophila boundary element 28
(BE28)
Approximately 150 copies in the D. melanogaster
genome ; this 269 bp BE is part of a 1.2 kb repeated
sequence. BE28 element maps to several pericentric
chromosomal regions, blocks promoter-enhancer
interactions in a directional manner, and binds the
boundary-element associated DNA-binding factor
BEAF
Cuvier et al. (2002)
Ananiev et al. (1998a)
Why repetitive DNA is essential to genome function
Methylation
Several Drosophila LTR families
ERROR CORRECTION AND REPAIR
Oligonucleotide motif
Chi sites in E. coli, B. subtilis, H.
influenzae and Lactococcus lactis
Distinct recombination initiation signals in each
bacterial species
El-Karoui et al. (1999) ; Chedin et al.
(2000) ; Quiberoni et al. (2001)
Methyl-directed mismatch repair
(MMR)
Oligonucleotide motif
GATC
Dam methylation site, binds MutH mutational repair
protein when hemimethylated
Modrich (1989)
11
Homolgous
recombination
(double-strand break
repair)
12
Table 3 (cont.)
Function
Structural class1
Example
Comment
Reference
Oligonucleotide motifs
M. bovis vis (35 bp)
Inversions generate multiple distinct variant surface
proteins by site-specific recombination
Lysnyansky, Ron & Yogev (2001)
Unclassified
N. gonorrhoeae and N. meningitidis
NIME (Neisseria interspersed mosaic
element) repeats flanking silent pilus
(pil) cassettes
Recombination involving flanking repeats replaces
segments of pilus coding sequence at expressed pil locus
Parkhill et al. (2000) ; Saunders et al.
(2000)
Borrelia downstream homology
sequence (DHS) 200 bp and
17–18 bp repeats
Surface protein variation by expression site switching
Wang et al. (1995) ; Barbour et al. (2000)
Homopolymeric tracts
Neisseria meningitidis opc locus,
N. gonorrhoeae pilC
Phase variation of opacity (opc) and pilus (pilC) proteins
Sarkari et al. (1994) ; Jonsson et al. 1991 ;
van der Ende et al. (1995) ; Henderson
et al. (1999) ; Bayliss, Field & Moxon
(2001)
VNTR
N. meningitidis hmbR locus
Phase variation of outer membrane haemoglobinbinding protein (Hmb)
Richardson & Stojiljkovic (1999)
H. influenzae adhesins
Tandem heptamers (ATCTTTC)
Dawid et al. (1999)
DNA RESTRUCTURING
Antigenic variation
Phase variation
Tandem pentamers (CTCTT)
Stern & Meyer (1987)
VNTR
Various Helicobacter pylori repeats
Repetitive, nonrandomly positioned VNTRs act as
recombination centers promoting host-specific
genomic modification
Aras et al. (2003a, b)
Uptake and
integration of
laterally transferred
DNA
Oligonucleotide motif
V. cholerae repeats (VCRs)
Construction of pathogenicity operons by site-specific
recombination
Mazel et al. (1998)
DNA uptake : N.
gonorrhoeae = GGCGTCTGAA ;
H. influenzae and Actinobacillus
actinomycetemcomitans=
AAGTGCGGTCA
Sequences identifying conspecific DNA
Chen & Dubnau (2003)
Chromatin
diminution
DNA transposons
Excised elements in Paramecium and
some Oxytrichia species ; Euplotes
crassus Tec elements
DNA transposon-like elements that undergo en masse,
developmental, chromosomal elimination
DuBois & Prescott (1997) ; Prescott
(2000) ; Sharp et al. (2003)
VDJ recombination
Oligonucleotide motif
composites
RAG (recombination-associated
gene) transposase recognition
sequences (RSSs)
Protein engineering, rapid protein evolution.
Recombination (DNA DS break) signals in
mammalian immune systems ; composed of 7 bp – 12/
23 bp spacer – 9 bp elements
Bassing, Swat & Alt (2002) ; Gellert
(2002)
Immunoglobulin class
switching
VNTR
S (switch) regions upstream of
immunoglobulin heavy chain
constant region exons
Protein engineering. Tandem arrays that undergo DS
breaks when transcribed from lymphokine-controlled
promoters
Kitao et al. (2000) ; Kinoshita & Honjo
(2001)
1
See Table 2 for the definitions of abbreviations/mnemonics for the various repeat structural classes.
James A. Shapiro and Richard von Sternberg
N. gonorrhoeae Opa proteins
Global genome
plasticity
Why repetitive DNA is essential to genome function
13
One feature of Table 3 is that certain repetitive elements,
such as the mammalian LINE-1 and Drosophila gypsy
retrovirus/retrotransposon (Pelisson et al., 2002), appear
under multiple functional headings. For example, the most
intensively studied feature of gypsy is the role it plays in the
organisation of chromatin domains. It contains an ‘ insulator ’ or boundary element capable of separating inactive
and active chromatin domains by tethering the DNA to the
nuclear matrix, thereby creating a discontinuity in the
chromatin structure (Gerasimova, Byrd & Corces, 2000;
Byrd & Corces, 2003). Since gypsy and other mobile
elements retain their structures as they migrate through the
genome, there is predictability to the signals they will carry
with them. Thus, cells have the ability to introduce a preorganised constellation of functional signals into any location
in the genome. Although gypsy is often rare in Drosophila
genomes and may be dispensible, the D. melanogaster genome
also has approximately 150 boundary elements at the base
of each chromosome containing the 269 bp BE28 sequence
inside an unclassified 1.2 kb repeat element. Apparently,
therefore, repetitive elements play a significant role in the
physical organisation of Drosophila chromatin.
The LINE-1 element in the human genome has externally
oriented promoter and polyadenylation activity, scaffold/
matrix attachment region (S/MAR) signals, and has been
implicated in X chromosome inactivation by facilitating
heterochromatin formation. Very recently, a LINE1 role in
modulating transcription has been discovered (Han, Szak
& Boeke, 2004). Since about 850 000 LINE elements make
up a remarkable 21 % of human euchromatic DNA, it is
difficult to deny that the LINE elements constitute major
architectonic and expression-related features of our hereditary material. The notion that LINE elements are major
organisers of genome functional architecture is supported by
close comparative analysis of syntenic regions in the mouse
and human genomes. In two aligned segments, 18/25 and
9/11 murine L1 elements have putative human orthologues
in the same orientation (Zhu, Swergold & Seldin, 2003). The
high degree of conservation in the positions and orientations
of highly variable elements implies positive functional
selection.
Even the very stripped-down, repeat-poor genome of the
yeast S. cerevisiae has boundary elements embedded in the
subtelomeric Yk repeats. Intriguingly, these boundary
elements form part of a larger complex that also includes
silencing elements in the adjacent X repeats and an autonomously replicating sequence (ARS) that serves as a site for
the initiation of DNA replication. About 20% of putative
yeast ARS sequences correspond to LTR regions of complete or defective retrotransposons (Wyrick et al., 2001).
Thus, it appears that repetitive elements play an important
role in organizing the chromatin and replication structures
of the budding yeast genome.
the existence of extensive ‘synteny ’ between related genomes
as well as the presence of ‘genomic synonyms ’ at comparable
locations in related genomes. Two ‘ genomic synonyms ’ are
evolutionarily independent (i.e. unrelated) elements belonging to the same structural class that play the same functional
role in each species. ‘Synteny’ refers to the occurrence of large
genome segments, often megabases in length, that share the
same order of genetic loci and are assumed to represent evolutionary conservation of a functional genomic arrangement
(Eichler & Sankoff, 2003). The locations of related LINE and
short interspersed nucleotide element (SINE) sequences are
also conserved in syntenic regions of the mouse and human
genomes (Silva et al., 2003), indicating positive selection.
Widespread examples of genomic synonyms are the
100–200 bp elements composing long tandem repeat arrays
in pericentromeric heterochromatin that delineate centromeres. More narrowly defined genomic synonyms are Alu
and B1 SINE elements of primate and rodent genomes.
These elements derive from partial dimers of the 7S RNA
sequence, but they arose independently or diverged from
a common structure before either was amplified in the primate and rodent genomes. Alu is not found in rodents, and
B1 is not found in primates (Schmid, 1996 ; Vassetzky, Ten
& Kramerov, 2003). Both Alu and B1 elements have been
documented to carry similar transcriptional and translational regulatory signals. Alu elements are also genomic
synonyms to mouse B2 elements as promoters, and Alu, B1
and B2 serve as synonyms in enhancing mRNA translation.
Primate Alu further serves as a synonym for rodent ‘ identifier ’ (ID) and BC200 elements targeting specific transcripts
in nerve cells (see Table 3 for individual references).
Because the many thousands of copies of primate Alus
and rodent B1s, B2s, ID and BC200s are structurally distinct,
their distributions in the human and mouse genomes originated in each species from literally thousands of distinct
retrotransposition events at some point after the divergence
of the rodent and primate ancestors. Despite independent
amplifications, the coarse-grained distributions of SINE
elements are similar within syntenic regions (Fig. 12 in
Mouse Genome Sequencing Consortium, 2002), supporting
the proposition that locations of these synonyms have
related functions in both genomes. Within the human
genome, functional roles have been invoked to account
for the nonrandom localisation of Alu in genetic loci involved
in metabolism, signaling, and transport (Grover et al.,
2003) and for observed nonrandom higher-order patterns
(Versteeg et al., 2003).
A significant test of the genomic synonym hypothesis will
be to see what patterns, if any, emerge from the results of a
fine-grained comparison between the locations of Alu in the
human genome and its putative synonyms, B1, B2, ID and
BC200 in the mouse genome.
VIII. SYNTENY AND GENOMIC SYNONYMS
IX. REPEATS AND RNAi IN
HETEROCHROMATIN FORMATTING
Comparative whole-genome sequencing is beginning to
provide additional supporting evidence for the structural/
organisational roles of repetitive elements by documenting
The role of repetitive elements as genomic synonyms in
heterochromatin formatting poses an apparent paradox.
14
How do conserved proteins involved in establishing heterochromatin recognise distinct DNA sequences ? Part of the
answer appears to be that initial repeat recognition is
accomplished by means of cognate RNAs rather than by
proteins (Jenuwein, 2002). It has long been known that fungi
can detect genomic repetitions independently of sequence
content and specifically methylate repeated sequences
(Selker, 1990). Similar methylated repeats were discovered
in transgenic plants and connected mechanistically to the
presence of complementary RNA (Hamilton & Baulcombe,
1999; Mette et al., 2000). DNA methylation is a step in the
formation of transcriptionally inactive heterochromatin
from transcriptionally active euchromatin (Bird, 2002).
These observations were some of the first indications of
RNA-directed inactivation, or RNAi (Matzke, Matzke &
Kooter, 2001).
The role of RNAi in repeat-directed heterochromatin
formation has been confirmed by genetic studies in the fission yeast Schizosaccharomyces pombe. Mutants in the dcr, RdRp,
and ago loci defective in the RNAi RNA-processing and
silencing machinery display aberrations in heterochromatin
formation and accumulate double-stranded transcripts from
the tandem array repeats flanking the centromeres (Volpe
et al., 2002). In wild-type S. pombe cells, a majority of short
RNA fragments characteristic of RNAi correspond to pericentromeric repeat sequences (Reinhart & Bartel, 2002).
Thus, it appeared there might be a mechanistic link between
RNAi and chromatin formatting by repetitive sequences.
Taking advantage of the ability to manipulate the yeast
genome, the RNAi – repeat-formatting hypothesis was
tested by inserting a reporter locus into pericentromeric
heterochromatin arrays. In wild-type cells, the inserted
reporter is not expressed, but expression is derepressed in
the dcr, RdRp, and ago mutants (Volpe et al., 2002). Thus,
RNAi is required to establish inactive heterochromatin
around centromeres. In a complementary experiment, a
centromere-related CenH element was inserted in a normally
euchromatic region. Ectopically, CenH silenced a linked
reporter and induced histone modifications characteristic of
heterochromatin (Hall et al., 2002). Introduction of dcr,
RdRp, and ago mutations eliminated the silencing. Thus,
RNA recognition of repeats can lead to heterochromatin
formatting in a manner that has yet to be determined. RNAi
is necessary for centromere function in S. pombe (Volpe et al.,
2003). A broad range of observations in plant and animal
systems indicate that the RNAi–repetitive DNA-heterochromatin connection is widespread among eukaryotes
(summarised in Dawe, 2003 ; Martienssen, 2003, and in
Verdel et al., 2004), as recently confirmed in Drosophila
melanogaster (Pal-Bhadra et al., 2004).
X. TAXONOMICALLY RESTRICTED GENOME
SYSTEM ARCHITECTURE
The view of the genome advocated here as a hierarchically
organised data storage system formatted by repetitive DNA
sequence elements implies that each organism has a genome
system architecture, in the same way that each computer
James A. Shapiro and Richard von Sternberg
data storage system has a characteristic architecture. In the
computer example, architecture depends upon the operating system and hardware that are used, not upon the content
of each data file. Macintoshf, Windowsf and Unixf
machines can all display the same images and text files, even
though the data retrieval paths are operationally quite distinct. Similarly, many protein and RNA sequences (data
files) are conserved through evolution, but different taxa
organise and format their genomes in quite different ways
for replication, transmission and expression. An overall
system architecture is required since these processes must be
coordinated so that they operate without mutual interference. DNA segments must be in the right place at the
right time for function. Chromatin formatting for large-scale
organisation of transcription and replication domains is well
documented ( Jenuwein & Allis, 2001, and other articles in
the same issue ; van Driel et al., 2003), and we are learning
about higher levels of spatio-temporal organisation of transcription and replication into ‘factory ’ zones (Rui, 1999 ;
Sawitzke & Austin, 2001 ; Bentley, 2002).
One clear example of taxonomically-specific genome
system architectures involves the different signals used for
basic transcription in bacteria and archaea, organisms
sharing many coding sequences that are formatted completely differently for transcription. These include, notably,
the heat-shock hsp70 locus encoding a group of three
orthologous chaperone proteins (Macario et al., 1999).
Another example of differential transcription formatting
concerns catabolite repression. The generic signal marking
catabolite-repressed sequences in Escherichia coli (the CRP
palindromic binding site for the CRP-cAMP complex in
Fig. 1) is completely different from its genomic synonym in
Bacillus subtilis (CRE element recognised by the catabolite
control protein CcpA ; Miwa et al., 2000). While both E. coli
and B. subtilis use orthologous transport systems to monitor
external glucose, independently evolved molecular signal
transduction paths connect them to the catabolite repression
signals in the genome.
An aspect of genome system architecture that deserves
special discussion is distribution of dispersed and clustered
repetitive DNA elements along the chromosomes. Table 3
shows how these elements are rich in signals affecting transcription, RNA processing, chromatin organisation, and
attachment of the DNA to the nuclear matrix. We know
that the genomic locations of these repeats have significant
effects on function. There is a vast literature on multiple
phenotypic changes caused by transposable element insertions, even outside well-defined genetic loci (Shapiro, 1983 ;
Craig et al., 2002), and by the phenomenon known as
‘ position effect variegation ’ (PEV) (Spofford, 1976 ;
Henikoff, 1996; Wakimoto, 1998). PEV is observed when
heterochromatic blocks of clustered repeats inhibit expression of adjacent genetic loci. PEV effects occur over
megabase distances. PEV thus constitutes a clear demonstration that the repeat content of each genome is an
important aspect of system architecture.
It is important to note here the little known fact that
phenotypic effects of heterochromatin are not necessarily
limited to adjacent genetic loci. The strength of heterochromatic silencing on the three large D. melanogaster
Why repetitive DNA is essential to genome function
chromosomes is sensitive to the total nuclear content of
heterochromatin carried on the Y chromosome (see Table
1). Increase in Y chromosome dosage (XYY males) suppresses PEV silencing, while reduction in the number of Y
chromosome repeats (XO males) enhances PEV silencing
(Spofford, 1976; Dimitri & Pisano, 1989). Changing dosage
of repetitive DNA encoding ribosomal RNA in the nucleolar
organiser (NO) region had similar effects (Spofford &
DeSalle, 1991). We now understand these inter-chromosomal effects on developmental coding sequence expression
to result from titration of heterochromatin-specific proteins
in the nucleus (Henikoff, 1996 ; Schotta et al., 2003). Consequently, the sizes of repeat sequence arrays are not
neutral. Indeed, expansion and contraction of major heterochromatic blocks may serve as higher-level ‘ tuning
forks ’ for developmental processes in the same way that
VNTR expansion and contraction regulate single locus
expression (Kashi, King & Doller, 1997 ; Trifonov, 1999).
Comparable trans effects on aspects of genome functioning
have been observed with B chromosomes in maize (Carlson,
1978).
The architectural role of dispersed repeats agrees with the
conservation detected in the positions and orientations of
shared repetitive elements (Silva et al., 2003; Zhu et al.,
2003). The observations on conserved repeats suggest that
high numbers of ‘framework elements ’ may be retained in
disparate mammalian genomes, with more derived subfamilies of LINEs, SINEs, and LTR elements being restricted to particular families and genera. Moreover, genome
analysis is beginning to provide evidence of functional roles
related to imprinting for evolutionarily ‘recent’ LINE
element insertions. Nonorthologous LINE-1 elements are
similarly positioned asymmetrically in the X-inactivation
centres of human, mouse, and cow (Chureau et al., 2002),
and L1 elements are significantly associated with monoallelically expressed loci in both human and mouse genomes
(Allen et al., 2003).
From a perspective postulating that changes in repetitive
elements may be important events in establishing specific
new genome architectures, it is significant to note that repetitive DNA can be far more taxonomically discriminating
than coding sequences. For example, each order of mammals has its own characteristic set of SINE elements
(Weiner, Deininger & Efstratiadis, 1986; Sternberg &
Shapiro, in press). Since these highly iterated SINEs are independently derived from cellular sequences, such as different tRNA or 7S RNA sequences, it is clear that
taxonomic diversification among mammals involved many
thousands of independent SINE amplification and insertion
events. Similarly, plant species can be discriminated by their
pericentromeric repeats (Table 3). The related nematode
species C. elegans and C. briggsae have genomes composed,
respectively, of 16.5 % and 22.4% repeat DNA, but any one
of the ten major repeat elements from either species is not
found in the other (Stein et al., 2003). Sibling species of
Drosophila often share both morphology and protein polymorphisms, but they can still be can be identified because
they contain different simple sequence satellite DNAs as well
as different abundances of particular transposable elements
(Dowsett, 1983; Bachmann & Sperlich, 1993 ; Miller et al.,
15
2000). Operationally, it is much easier to identify the species
of origin of a DNA, cell culture, or tissue sample by examining repetitive DNA than coding or unique sequences,
and this principle of using repeats for identification is
applied within species for forensic DNA analysis ( Jeffreys
et al., 1993).
Another frequently ignored feature of genome system
architecture associated with repeat elements is overall
genome size (Cavalier-Smith, 1985). In plants, genome size
correlates with an increase in repetitive DNA abundance
(Table 1). Plant molecular geneticists have suggested that
the total length of each genome is an important functional characteristic, which influences replication time, a
characteristic that correlates with the length of the life
cycle (Bennett, 1998 ; Bennetzen, 2000; Petrov, 2001;
Vinogradov, 2003). It makes sense that amplification of
mobile genetic elements is an efficient method of altering
total DNA content in the genome. Similarly, distance
between regulatory and coding sequences may be an
important control parameter (Zuckerandl, 2002).
XI. DISCUSSION
(1 ) Genome system architecture and evolution
The concept of genome system architectures formatted by
repetitive DNA extends the range of conceivable changes
that confer adaptive benefits or reproductive isolation. This
idea obliges us to consider the effects of altering repetitive
components of the genome as well as unique coding and
regulatory sequences. Classical evolutionary theory assumes
that phenotypic variation involves alteration of individual
gene products due to changes in coding sequences. We now
know this view is too restricted in two ways. First of all,
protein structure can change without altering the coding
sequences themselves through rearrangement of exons or
via alteration of splicing patterns. Formation of new exon
combinations by segmental duplications represents one
class of such changes (Eichler, 2001), and insertion of repetitive elements into introns to alter splicing patterns is
another (Nekrutenko & Li, 2001). Segmental duplications,
insertion of repetitive elements, exon shuffling (Moran,
DeBerardinis & Kazazian, 1999) as well as major chromosome rearrangements (Gray, 2000 ; Bailey, Liu & Ecihler,
2003) are all processes mediated by dispersed repetitive
elements.
The second way that the classical view is unnecessarily
restricted is in constraining adaptive variation to changes in
an organism’s repertoire of protein and RNA. Changes in
regulatory formatting of conserved coding sequences can
alter developmental patterns and lead to new traits using the
same repertoire of proteins and RNAs (Britten & Davidson,
1971). The Drosophila literature is replete with examples of
major morphological changes caused by alterations in
repetitive elements.
Examination of genomic sequences indicates that rearrangement of repetitive elements has played a significant
role in adaptive evolution and multicellular development. In
Table 3 we noted the use of a VNTR element to format
16
ontogenetic silencing of the Polycomb response element
(PRE) in the Ultrabithorax region (Hodgson, Argiropoulis &
Brock, 2001). A search of the human genome reveals many
examples of regulatory regions evolved from repetitive
elements (Britten, 1996 ; Brosius, 1999 ; Jordan et al., 2003).
These conclusions from genome scanning agree with
detailed molecular genetic analyses that demonstrate the
participation of repetitive elements in regulation of coding
sequence expression (e.g. Mozer & Benzer, 1994 ; Song, Sui
& Garen, 2004). Moreover, as we have seen, changes in
repetitive DNA affect chromatin formatting and nuclear
organisation and thus have consequences for developmental
expression of large chromosomal regions.
The genome system architecture perspective predicts a
major role for evolutionary diversification by alterations in
repetitive DNA that alter genome transmission without
affecting phenotype. This is just what we find in groups of
closely related and sibling species, which may display no
detectable morphological, physiological or adaptive differences. In the Cyclops group of copepods, large heterochromatic blocks have different chromosomal locations in each
species and undergo distinct patterns of excision during
early somatic development (Beermann, 1977; Wyngaard &
Gregory, 2001).
Changes in centromeric repeats constitute another situation where repetitive DNA variations may alter chromosome transmission but not organismal phenotypes. In some
cases, such as the Indian Muntjac deer, we know that germ
line incompatibility with the parental species resulted from
genome restructuring. The Muntiacus muntjak deer ancestor
underwent one or more Robertsonian fusions of centromeric heterochromatin that reduced chromosome numbers and cause abnormal pairing and non-disjunction at
meiosis in hybrids with sibling SE Asian deer species
(Fontana & Rubini, 1990; He & Brinkley, 1996).
Since major changes in control of genome maintenance
and transmission can occur by altering repetitive sequences
that format these processes, such alterations can lead to reproductive isolation and set the stage for subsequent phylogenetically-restricted changes in phenotype. In other words,
we suggest that major evolutionary events can initiate within
the repetitive sector of the genome. They do not have to
follow changes in the coding sector. The importance of
repetitive elements as major actors in evolutionary diversification has been expounded most forcefully by Dover’s
concept of ‘ molecular drive ’ (Dover, 1982). Importantly,
however, we disagree with his viewing changes in repetitive
DNA as functionally neutral. Indeed, we argue that functionality is precisely what makes repetitive elements powerful agents of taxonomic separation.
(2 ) A more integrative view of the genome
It is commonly accepted that the major information in
genomes consists of coding sequences determining protein
and RNA molecules. The importance of regulatory signals
has also been widely recognised, and searches for phylogenetically conserved non-coding sequences constitute an
active subfield of genomics. Nonetheless, the conceptual
significance of these ‘non-coding ’ components of the genome
James A. Shapiro and Richard von Sternberg
has largely gone unnoticed, with the result that acceptance
of the ‘selfish DNA ’ hypothesis is rarely challenged.
As we enter the era of ‘systems biology ’ (Kitano, 2002), it
is useful to recall that a system is more than a collection of
components. Those components need to integrate functionally so they can accomplish systemic tasks requiring
cooperative action. One way to state our argument is to say
that repetitive DNA elements provide the physical basis
within the genome for functional integration. As Britten and
Davidson (1969, 1971) realized, dispersed regulatory sites
connect unlinked coding sequences into coordinately controlled subsystems. Similarly, replication and genome
transmission processes are organised by generic signals
that determine origins, telomeres, centromeres and other
nucleoprotein complexes involved in genome maintenance.
Signals for formatting and delineating various chromatin
domains provide a higher level of organisation for both
transcription and replication, and distributed sites for
attachment to cellular or nuclear structures provide a
dynamic overall physical organisation of the genome that
we are just beginning to comprehend.
A second consequence of an integrative view of repetitive
genome components will be to focus attention on their importance in evolution. There has been growing recognition
of the role of mobile elements as agents of DNA restructuring (summarised in Shapiro, 1999 a, b, 2002a, b), but less
consideration about questions of how repetitive elements
come to be distributed as they are. Analysis of individual
genomes indicates that there have been episodes of expansion by particular subfamilies of SINE elements
(International Human Genome Consortium, 2001 ; Mouse
Genome Sequencing Consortium, 2002). These episodes
probably indicate periods of rapid evolutionary change and
may help clarify the punctuated nature of the evolutionary
record. It may also prove worthwhile to search for cases
where changes in repetitive elements have been the major
engine of speciation, as appears to be the case in copepods
and Muntjac deer.
( 3) Repetitive DNA and the computational
metaphor for the genome
We report only a small portion of the growing literature on
the functional roles of repetitive DNA elements. The trend is
clearly towards discovering greater specificity, pattern and
significance in the surprisingly abundant repeat fraction of
genomes. As we increasingly apply computational metaphors to cellular function, we expect that a deeper understanding of repetitive elements, the integrative fraction of
cellular DNA, will reveal novel aspects of the logical architecture inherent to genome organisation.
The electronic computation metaphor can only be
applied so far to cellular information processing. The former
is a digitized process based on binary coding and Turing
machine principles (http://www.abelard.org/turpap/turpap.htm), while the latter is a poorly-understood analog
process based on molecular stereospecificity and templating
as well as on sequence encoding. In the case of repetitive
DNA, we encounter limitations to the metaphor because
Why repetitive DNA is essential to genome function
this genomic fraction has aspects of both software and
hardware. Repetitive DNA acts like software insofar as it is
encoded in the DNA sequence and is utilised by the cell
many times to carry out defined routines, such as heterochromatin condensation. Like software, repetitive DNA can
control operations involving different unique data files. On
the other hand, repetitive DNA also forms part of essential
cellular machinery, such as the mitotic apparatus. When
DNA serves as a physical substrate facilitating protein
aggregation during nucleoprotein complex formation in
transcription, recombination and chromosome packaging, it
operates as hardware.
The fact that simple distinctions between software and
hardware do not readily apply to the informatics of repetitive DNA holds an important and encouraging lesson. In the
era of biocomputing and systems biology, our study of
cellular information processing promises to revolutionise not
only the life sciences but also the information sciences. We
can anticipate learning powerful new computational paradigms as we come to understand how cells use myriad
molecular components to regulate millions of biochemical
events that occur every minute of every cell cycle. Indeed,
we may come one day to regard erstwhile ‘ junk DNA ’ as an
integral part of cellular control regimes that can truly be
called ‘ expert. ’
XII. CONCLUSIONS
(1) DNA is a data storage medium, and genomes function
as computational information organelles.
(2) Proper access to coding sequence data files and
reliable genome replication and transmission to progeny
cells require that there be generic repetitive signals to format
the DNA for interaction with cellular hardware. Repeat
signal formatting is also necessary for genome packaging,
repair and restructuring.
(3) Generic repetitive signals are distinct from the genes
envisioned by conventional theory and require us to employ
informatic metaphors in conceptualizing fundamental principles of genome organization. Repetitive DNA elements
provide the physical basis for integrating different regions of
the genome and for coordinating interdependent aspects of
genome function (e.g. packaging, expression and transmission).
(4) Cooperative interactions between repeated DNA
elements and iterated protein domains are essential to the
formation of nucleoprotein complexes that carry out basic
genome operations. The cooperative nature of molecular
interactions in cellular information processing provides a
second fundamental reason why repetitive DNA elements
are essential to genome function.
(5) There is extensive documentation in the molecular
genetic literature (some of it tabulated here) that all structural varieties of repetitive DNA play significant roles in one
or more categories of genomic tasks. More complex repetitive DNA elements, such as retrotransposons, carry integrated sets of signals influencing multiple functions.
Evolutionarily independent elements can serve as ‘genome
17
synonyms ’ to provide similar functional formatting in
different species.
(6) The distribution of repetitive signals confers a
characteristic genome system architecture independent of
coding sequence content. Different genome system architectures can have distinct transmission and expression
properties even with the same coding sequences.
Meaningful evolutionary change can take place in the
repetitive component of the genome without altering coding
sequences.
(7) Recognition by repeat-derived small interfering
siRNA molecules provides a mechanistic basis for analogous
chromatin formatting by repetitive arrays with different
sequence contents.
XIII. ACKNOWLEDGMENTS
We thank Adam Wilkins, Michael Ashburner and an anonymous
reviewer for helpful comments on the manuscript. J. A.S. wishes to
thank members of NORDITA and the Niels Bohr Institute,
Copenhagen, for inviting him to participate in summer schools that
helped begin his interdisciplinary thinking about biological issues.
R.V. S. wishes to thank Dr Michael J. Dewey, University of South
Carolina, for bringing repetitive DNA elements to his attention so
many years ago and for encouraging him to study these enigmatic
sequences.
XIV. REFERENCES
ALBANESE, V., BIGUET, N. F., KIEFER, H., BAYARD, E., MALLET, J.
& MELONI, R. (2001). Quantitative effects on gene silencing
by allelic variation at a tetranucleotide microsatellite. Human
Molecular Genetics 10, 1785–1792.
ALBERTS, B., JOHNSON, A., LEWIS, J., RAFF, M., ROBERTS, K. &
WALTER, P. (2002). The Molecular Biology of the Cell. 4th Edition.
Garland Scientific, Taylor & Francis, New York.
ALLEN, E., HORVATH, S., TONG, F., SPITERI, E., RIGGS, A. D. &
MARAHRENS, Y. (2003). High concentrations of long interspersed
nuclear element sequence distinguish monoallelically expressed
genes. Proceedings of the National Academy of Sciences, USA 100,
9940–9945.
ANANIEV, E. V., PHILLIPS, R. L. & RINES, H. W. (1998 a). A knobassociated tandem repeat in maize capable of forming fold-back
DNA segments : are chromosome knobs megatransposons ?
Proceedings of the National Academy of Sciences, USA 95, 10785–10790.
ANANIEV, E. V., PHILLIPS, R. L. & RINES, H. W. (1998 b). Chromosome-specific molecular organisation of maize (Zea mays L.)
centromeric regions. Proceedings of the National Academy of Sciences,
USA 95, 13073–13078.
ANANIEV, E. V., PHILLIPS, R. L. & RINES, H. W. (1998 c). Complex
structure of knob DNA on maize chromosome 9 : retrotransposon invasion into heterochromatin. Genetics 149,
2025–2037.
APARICIO, S., CHAPMAN, J., STUPKA, E., PUTNAM, N., CHIA, J. M.,
DEHAL, P., CHRISTOFFELS, A., RASH, S., HOON, S., SMIT, A., et al.
(2002). Whole-genome shotgun assembly and analysis of the
genome of Fugu rubripes. Science 297, 1301–1310.
ARABIDOPSIS GENOME INITIATIVE (2000). Analysis of the genome
sequence of the flowering plant Arabidopsis thaliana. Nature 408,
796–815.
18
ARAGON-ALCAIDE, L., MILLER, T., SCHWARZACHER, T., READER, S.
& MOORE, G. (1996). A cereal centromeric sequence. Chromosoma
105, 261–268.
ARAS, R. A., FISCHER, W., PEREZ-PEREZ, G. I., CROSATTI, M.,
ANDO, T., HAAS, R. & BLASER, M. J. (2003 a). Plasticity of
repetitive DNA sequences within a bacterial (Type IV) secretion
system component. Journal of Experimental Medicine 198,
1349–1360.
ARAS, R. A., KANG, J., TSCHUMI, A. I., HARASAKI, Y. & BLASER,
M. J. (2003 b). Extensive repetitive DNA facilitates prokaryotic
genome plasticity. Proceedings of the National Academy of Sciences, USA
100, 13579–13584.
ARKHIPOVA, I. R. & MORRISON, H. G. (2001). Three retrotransposon families in the genome of Giardia lamblia : two telomeric, one dead. Proceedings of the National Academy of Sciences, USA
98, 14497–14502.
ARNONE, M. I. & DAVIDSON, E. H. (1997). The hardwiring of
development : organisation and function of genomic regulatory
systems. Development 124, 1851–1864.
AUDIT, B. & OUZONIS, C. A. (2003). From genes to genomes :
universal scale-invariant properties of microbial chromosome
organisation. Journal of Molecular Biology 332, 617–633.
AVRAMOVA, Z., TIKHONOV, A., CHEN, M. & BENNETZEN, J. L.
(1998). Matrix-attachment regions and structural colinearity in
the genomes of two grass species. Nucleic Acids Research 26,
761–767.
BABICH, V., AKSENOV, N., ALEXEENKO, V., OEI, S. L., BUCHLOW, G.
& TOMILIN, N. (1999). Association of some potential hormone
response elements in human genes with Alu family repeats. Gene
239, 341–349.
BACHMANN, L. & SPERLICH, D. (1993). Gradual evolution of a
specific satellite DNA family in Drosophila ambigua, D. tristis, and
D. obscura. Molecular Biology and Evolution 10, 647–659.
BAILEY, J. A., LIU, G. & EICHLER, E. E. (2003). An Alu transposition
model for the origin and expansion of human segmental duplications. American Journal of Human Genetics 73, 823–834.
BARABINO, S. M. L. & KELLER, W. (1999). Last but Not Least :
Regulated Poly(A) Tail Formation. Cell 99, 9–11.
BARBOUR, A. G., CARTER, C. J. & SOHASKEY, C. D. (2000). Surface
protein variation by expression site switching in the relapsing
fever agent Borrelia hermsii. Infection and Immunity 68, 7114–7121.
BARRAS, F. & MARINUS, M. G. (1989). The great GATC : DNA
methylation in E. coli. Trends in Genetics 5, 139–143.
BASSING, C. H., SWAT, W. & ALT, F. W. (2002). The mechanism
and regulation of chromosomal V(D)J recombination. Cell 109,
S45–S55.
BAYLISS, C. D., FIELD, D. & MOXON, E. R. (2001). The simple
sequence contingency loci of Haemophilus influenzae and Neisseria
meningitidis. Journal of Clinical Investigation 107, 657–662.
BECKER, K. G., SWERGOLD, G. D., OZATA, K. & THAYER, R. E.
(1993). Binding of the ubiquitous nuclear transcription factor
YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Human Molecular Genetics 2, 1697–1702.
BEERMANN, S. (1977). The diminution of heterochromatic chromosomal segments in Cyclops (Crustacea, Copepoda). Chromosoma
60, 297–344.
BENNETT, M. D. (1998). Plant genome values : how much do we
know ? Proceedings of the National Academy of Sciences, USA 95,
2011–2016.
BENNETT, M. D., LEITCH, I. J., PRICE, H. J. & JOHNSTON, J. S.
(2003). Comparisons with Caenorhabditis (approximately
100 Mb) and Drosophila (approximately 175 Mb) using flow
James A. Shapiro and Richard von Sternberg
cytometry show genome size in Arabidopsis to be approximately
157 Mb and thus approximately 25 % larger than the
Arabidopsis genome initiative estimate of approximately
125 Mb. Annals of Botany 91, 547–557.
BENNETZEN, J. L. (2000). Transposable elements contributions to
plant gene and genome evolution. Plant Molecular Biology 42,
251–269.
BENTLEY, D. (2002). The mRNA assembly line : transcription and
processing machines in the same factory. Current Opinion in Cell
Biology 14, 336–342.
BIRD, A. (2002). DNA methylation patterns and epigenetic memory. Genes and Development 16, 6–21.
BOIVIN, A., GALLY, C., NETTER, S., ANXOLABEHERE, D. &
RONSSERAY, S. (2003). Telomeric associated sequences of
Drosophila recruit polycomb-group proteins in vivo and can induce
pairing-sensitive repression. Genetics 164, 195–208.
BOUTANAEV, A. M., KALMYKOVA, A. I., SHEVELYOV, Y. Y. &
NURMINSKY, D. I. (2002). Large clusters of co-expressed genes in
the Drosophila genome. Nature 420, 666–669.
BRITTEN, R. J. (1996). DNA sequence insertion and evolutionary
variation in gene regulation. Proceedings of the National Academy of
Sciences, USA 93, 9374–9377.
BRITTEN, R. J. & DAVIDSON, E. H. (1969). Gene regulation for
higher cells : a theory. Science 165, 349–357.
BRITTEN, R. J. & DAVIDSON, E. H. (1971). Repetitive and nonrepetitive DNA sequences and a speculation on the origins of
evolutionary novelty. Quarterly Review of Biology 46, 111–138.
BRITTEN, R. J. & KOHNE, D. E. (1968). Repeated sequences in
DNA. Hundreds of thousands of copies of DNA sequences have
been incorporated into the genomes of higher organisms. Science
161, 529–540.
BROSIUS, J. (1999). RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory
elements. Gene 238, 115–134.
BYRD, K. & CORCES, V. G. (2003). Visualization of chromatin
domains created by the gypsy insulator of Drosophila. Journal of
Cell Biology 162, 565–574.
CARLSON, W. R. (1978). The B chromosome of corn. Annual Reviews
of Genetics 12, 5–23.
CARON, H., VAN SCHAIK, B., VAN DER MEE, M., BAAS, F., RIGGINS,
G., VAN SLUIS, P., HERMUS, M. C., VAN ASPEREN, R., BOON, K.,
VOUTE, P. A., HEISTERKAMP, S., VAN KAMPEN, A. & VERSTEEG, R.
(2001). The human transcriptome map : clustering of highly
expressed genes in chromosomal domains. Science 291,
1289–1292.
CAVALIER-SMITH, T. (1985). The evolution of genome size. John Wiley &
Sons Ltd., Chichester.
CELNIKER, S. E., WHEELER, D. A., KRONMILLER, B., CARLSON,
J. W., HALPERN, A., PATEL, S., ADAMS, M., CHAMPE, M., DUGAN,
S. P., FRISE, E., et al. (2002). Finishing a whole-genome shotgun :
release 3 of the Drosophila melanogaster euchromatic genome
sequence. Genome Biol. 2002 ;3(12) :RESEARCH0079. Epub
2002 Dec 23.
CHEDIN, F., EHRLICH, S. D. & KOWALCZYKOWSKI, S. C. (2000). The
Bacillus subtilis AddAB Helicase/Nuclease is regulated by its
cognate Chi sequence in vitro. Journal of Molecular Biology 298,
7–20.
CHEN, I. & DUBNAU, D. (2003). DNA transport during transformation. Frontiers in Bioscience 8, S544–S556.
CHEN, D., KUNLIN, J., KAWAGUCHI, K., NAKAYAMA, M., ZHOU, X.,
XIONG, Z., ZHOU, A., MAO, X. O., GREENBERG, D. A., GRAHAM,
S. H. & SIMON, R. P. (2003). Ero1-L, an ischemia-inducible gene
Why repetitive DNA is essential to genome function
from rat brain with homology to global ischemia-induced gene
11 (Giig11), is localized to neuronal dendrites by a dispersed
identifier (ID) element-dependent mechanism. Journal of
Neurochemistry 85, 670–679.
CHENG, Z., DONG, F., LANGDON, T., OUYANG, S., BUELL, C. R., GU,
M., BLATTNER, F. R. & JIANG, J. (2002). Functional rice centromeres are marked by a satellite repeat and a centromerespecific retrotransposon. Plant Cell 14, 1691–1704.
CHENG, Z.-J. & MURATA, M. (2003). A centromeric tandem repeat
family originating from a part of Ty3/gypsy-retroelement in
wheat and its relatives. Genetics 164, 665–672.
CHIMERA, J. A. & MUSICH, P. R. (1985). The association of the interspersed repetitive KpnI sequences with the nuclear matrix.
Journal of Biological Chemistry 260, 9373–9379.
CHOO, K. H. (2001). Domain organisation at the centromere and
neocentromere. Developmental Cell 1, 165–177.
CHUREAU, C., PRISSETTE, M., BOURDET, A., BARBE, V., CATTOLICO,
L., JONES, L., EGGEN, A., AVNER, P. & DURET, L. (2002).
Comparative sequence analysis of the X-inactivation center
region in mouse, human, and bovine. Genome Research 12,
894–908.
COHN, M. & EDSTROM, J.-L. (1992). Telomere-associated repeats in
Chironomus form discrete subfamilies generated by gene conversion. Journal of Molecular Evolution 35, 114–122.
COPENHAVER, G. P., NICKEL, K., KUROMORI, T., BENITO, M. I.,
KAUL, S., LIN, X., BEVAN, M., MURPHY, G., HARRIS, B., PARNELL,
L. D., MCCOMBIE, W. R., MARTIENSSEN, R. A., MARRA, M.,
PREUSS, D., et al. (1999). Genetic definition and sequence analysis
of Arabidopsis centromeres. Science 286, 2468–2474.
COX, G. S., GUTKIN, D. W., HAAS, M. J. & COSGROVE, D. E.
(1998). Isolation of an Alu repetitive DNAbinding protein and
effect of CpG methylation on binding to its recognition
sequence. Biochimica et Biophysica Acta 1396, 67–87.
CRAIG, N. L., CRAIGIE, R., GELLERT, M. & LAMBOWITZ, A. M.
(2002). Mobile DNA II. ASM Press, Washington DC.
CRYDERMAN, D. E., CUAYCONG, M. H., ELGIN, S. C. R. &
WALLRATH, L. L. (1998). Characterization of sequences associated with position-effect variegation at pericentric sites in
Drosophila heterochromatin. Chromosoma 107, 277–285.
CUVIER, O., HART, C. M., KÄS, E. & LAEMMLI, U. K. (2002).
Identification of a multicopy chromatin boundary element at
the borders of silenced chromosomal domains. Chromosoma 110,
519–531.
DAVIDSON, E. H. & BRITTEN, R. J. (1979). Regulation of gene
expression : possible role of repetitive sequences. Science 204,
1052–1059.
DAWE, R. K. (2003). RNA interference, transposons, and the centromere. Plant Cell 15, 297–301.
DAWID, S., BARENKAMP, S. J. & STGEME, W. J. (1999). Variation in
expression of the Haemophilus influenzae HMW adhesins : a prokaryotic system reminiscent of eukaryotes. Proceedings of the
National Academy of Sciences, USA 96, 1077–1082.
DEHAL, P., SATOU, Y., CAMPBELL, R. K., CHAPMAN, J., DEGNAN, B.,
DE TOMASO, A., DAVIDSON, B., DI GREGORIO, A., GELPKE, M.,
GOODSTEIN, D. M., et al. (2002). The draft genome sequence of
Ciona intestinalis : insights into chordate and vertebrate orgins.
Science 298, 2157–2167.
DEININGER, P. L., MORAN, J. V., BATZER, M. A. & KAZAZIAN, H. H.
(2003). Mobile elements and mammalian genome evolution.
Current Opinion in Genetics and Development 13, 651–658.
DEL SOLAR, G., GIRALDO, R., RUIZ-ECHEVARRÍA, M. J., ESPINOSA,
M. & DÍAZ-OREJAS, R. (1998). Replication and control of circular
19
bacterial plasmids. Microbiology and Molecular Biology Reviews 62,
434–464.
DIMITRI, P. & JUNAKOVIC, N. (1999). Revising the selfish DNA
hypothesis : new evidence on accumulation of transposable
elements in heterochromatin. Trends in Genetics 15, 123–124.
DIMITRI, P. & PISANO, C. (1989). Position effect variegation in
Drosophila melanogaster : relationship between suppression effect
and the amount of Y chromosome. Genetics 122, 793–800.
DOOLITTLE, W. F. & SAPIENZA, C. (1980). Selfish genes, the phenotype paradigm and genome evolution. Nature 284, 601–603.
DOVER, G. A. (1982). Molecular drive : a cohesive mode of species
evolution. Nature 299, 111–117.
DOWSETT, A. P. (1983). Closely related species of Drosophila can
contain different libraries of middle repetitive DNA sequences.
Chromosoma 88, 104–108.
DUBOIS, M. L. & PRESCOTT, D. M. (1997). Volatility of internal
eliminated segments in germ line genes of hypotrichous ciliates.
Molecular and Cellular Biology 17, 326–337.
EICHLER, E. E. (2001). Recent duplication, domain accretion and
the dynamic mutation of the human genome. Trends in Genetics
17, 661–669.
EICHLER, E. E. & SANKOFF, D. (2003). Structural dynamics of
eukaryotic chromosome evolution. Science 301, 793–797.
EL KAROUI, M., BIAUDET, V., SCHBATH, S. & GRUSS, A. (1999).
Characteristics of Chi distribution on different bacterial
genomes. Research in Microbiology 150, 579–587.
ESPELI, O., MOULIN, L. & BOCCARD, F. (2001). Transcription
attenuation associated with bacterial repetitive extragenic BIME
elements. Journal of Molecular Biology 314, 375–386.
FABREGAT, I., KOCH, K. S., AOKI, T., ATKINSON, A. E., DANG, H.,
AMOSOVA, O., FRESCO, J. R., SCHILDKRAUT, C. H. & LEFFERT,
H. L. (2001). Functional pleiotropy of an intramolecular triplexforming fragment from the 3k-UTR of the rat Pigr gene.
Physiological Genomics 5, 53–65.
FERRIGNO, O., VIROLLE, T., DJABARI, Z., ORTONNE, J. P., WHITE,
R. J. & ABERDAM, D. (2001). Transposable B2 SINE elements
can provide mobile RNA polymerase II promoters. Nature
Genetics 28, 77–81.
FIGUEIREDO, L. M., PIRRIT, L. A. & SCHERF, A. (2000). Genomic
organisation and chromatin structure of Plasmodium falciparum
chromosome ends. Molecular and Biochemical Parasitology 106,
169–174.
FIGUEIREDO, L. M., FREITAS-JUNIOR, L. H., BOTTIUS, E., OLIVOMARTIN, J.-C. & SCHERT, A. (2002). A central role for Plasmodium
falciparum subtelomeric regions in spatial positioning and telomere length regulation. EMBO Journal 21, 815–824.
FONTANA, F. & RUBINI, M. (1990). Chromosomal evolution in
Cervidae. Biosystems 24, 157–174.
FOSTER, E., HATTORI, J., ZHANG, P., LABBE, H., MARTIN-HELLER,
T., LI-POOK-THAN, J., OUELLET, T., MALLET, K. & MIKI, B.
(2003). The new RENT family of repetitive elements in Nicotiana
species harbors gene regulatory elements related to the tCUP
cryptic promoter. Genome 46, 146–155.
FOUREL, G., REVARDEL, E., KOERING, C. E. & GILSON, E. (1999).
Cohabitation of insulators and silencing elements in yeast subtelomeric regions. EMBO Journal 18, 2522–2537.
FOX, M. E., YAMADA, T., OHTA, K. & SMITH, G. R. (2000). A
family of cAMP-response-element-related DNA sequences with
meiotic recombination hotspot activity in Schizosaccharomyces
pombe. Genetics 156, 59–68.
GAN, L., ZHANG, W. & KLEIN, W. H. (1990). Repetitive
DNA sequences linked to the sea urchin spec genes contain
20
transcriptional enhancer-like elements. Developmental Biology 139,
180–196.
GELLERT, M. (2002). V(D)J recombination : RAG proteins,
repair factors, and regulation. Annual Review of Biochemistry 71,
101–132.
GERASIMOVA, T. I., BYRD, K. & CORCES, V. G. (2000). A chromatin
insulator determines the nuclear localizations of DNA. Molecular
Cell 6, 1025–1035.
GOFF, S. A., RICKE, D., LAN, T. H., PRESTING, G., WANG, R.,
DUNN, M., GLAZEBROOK, J., SESSIONS, A., OELLER, P., VARMA,
H., et al. (2002). A draft sequence of the rice genome (Oryza sativa
L. ssp. japonica). Science 296, 92–100.
GRALLA, J. D. & COLLADO-VIDES, J. (1996). Organisation and
function of transcription regulatory elements. In Escherichia coli
and Salmonella Cellular and Molecular Biology, 2nd ed., (eds. F. C.
Neidhardt et al.), pp. 1232–1245. ASM Press, Washington, D.C.
GRAY, Y. H. (2000). It takes two transposons to tango : transposable-element-mediated chromosomal rearrangements. Trends in
Genetics 16, 461–468.
GREIL, F., VAN DER KRAAN, I., DELROW, J., SMOTHERS, J. F., DE
WIT, E., BUSSEMAKER, H. J., VAN DRIEL, R., HENIKOFF, S. & VAN
STEENSEL, B. (2003). Distinct HP1 and Su(var)3-9 complexes
bind to sets of developmentally coexpressed genes depending on
chromosomal location. Genes and Development 17, 2825–2838.
GREWAL, S. I. & ELGIN, S. C. (2002). Heterochromatin : new possibilities for the inheritance of structure. Current Opinion in Genetics
and Development 12, 178–187.
GRIMES, B., RHOADES, A. & WILLARD, H. (2002). a-Satellite DNA
and vector composition influence rates of human artificial
chromosome formation. Molecular Therapy 5, 798–805.
GROVER, D., MAJUMDER, P. P., RAO, C. B., BRAHMACHARI, S. K. &
MUKERJI, M. (2003). Non-random distribution of alu elements in
genes of various functional categories : insight from analysis of
human chromosomes 21 and 22. Molecular Biology and Evolution
20, 1420–1424.
HALL, I. M., SHANKARANARAYANA, G. D., NOMA, K., AYOUB, N.,
COHEN, A. & GREWAL, S. I. (2002). Establishment and maintenance of a heterochromatin domain. Science 297, 2232–2237.
HAMILTON, A. J. & BAULCOMBE, D. C. (1999). A species of small
antisense RNA in posttranscriptional gene silencing in plants.
Science 286, 950–952.
HAN, J. S., SZAK, S. T. & BOEKE, J. D. (2004). Transcriptional
disruption by the L1 retrotransposon and implications for
mammalian transcriptomes. Nature 429, 268–274
HE, D. & BRINKLEY, B. R. (1996). Structure and dynamic organisation of centromeres/prekinetochores in the nucleus of mammalian cells. Journal of Cell Science 109, 2693–2704.
HENDERSON, I. R., OWEN, P. & NATARO, J. P. (1999). Molecular
switches – the ON and OFF of bacterial phase variation.
Molecular Microbiology 33, 919–932.
HENIKOFF, S. (1996). Dosage-dependent modification of positioneffect variegation in Drosophila. Bioessays 18, 401–409.
HENIKOFF, S., AHMAD, K. & MALIK, H. S. (2001). The centromere
paradox : stable inheritance with rapidly evolving DNA. Science
293, 1098–1102.
HODGSON, J. W., ARGIROPOULOS, B. & BROCK, H. W. (2001).
Site-specific recognition of a 70 base-pair element containing
d(GA)n repeats mediates bithoraxoid polycomb group response
element-dependent silencing. Molecular and Cellular Biology 21,
4528–4543.
HOFNUNG, M. & SHAPIRO, J. (1999). Research in Microbiology 150
(special November–December double issue on bacterial repeats).
James A. Shapiro and Richard von Sternberg
HOSKINS, R. A., SMITH, C. D., CARLSON, J. W., CARVALHO, A. B.,
HALPERN, A., KAMINKER, J. S., KENNEDY, C., MUNGALL, C. J.,
SULLIVAN, B. A., SUTTON, G. G., YASUHARA, J. C., WAKIMOTO,
B. T., MYERS, E. W., CELNIKER, S. E., RUBIN, G. M. & KARPEN,
G. H. (2002). Heterochromatic sequences in a Drosophila wholegenome shotgun assembly. Genome Biology 3, 0085.1–0085.16.
HUMPHREY, G. W., ENGLANDER, E. W. & HOWARD, B. H. (1996).
Specific binding sites for a pol III transcriptional repressor and
pol II transcription factor YY1 within the internucleosomal
spacer region in primate Alu repetitive elements. Gene Expression
6, 151–168.
INTERNATIONAL HUMAN GENOME CONSORTIUM (2001). Initial
sequencing and analysis of the human genome. Nature 409,
860–921.
IOSHIKHES, I. P. & ZHANG, M. Q. (2000). Large-scale human promoter mapping using CpG islands. Nature Genetics 26, 61–63.
ISOBE, T., YASHINO, M., MIZUNO, K.-I., LINDAHL, K. F., KOIDE, T.,
GAUDIERI, S., GOJOBORI, T. & SHIROISHI, T. (2002). Molecular
characterization of the Pb recombination hotspot in the mouse
major histocompatibility complex class II region. Genomics 80,
229–235.
JACOB, F., BRENNER, S. & CUZIN, F. (1963). On the regulation of
DNA replication in Bacteria. Cold Spring Harbor Symposium
Quantitative Biology 28, 329–438.
JACOB, F. & MONOD, J. (1961). Genetic regulatory mechanisms in
the synthesis of proteins. Journal of Molecular Biology 3, 318–356.
JEFFREYS, A. J., MONCKTON, D. G., TAMAKI, K., NEIL, D. L.,
ARMOUR, J. A., MACLEOD, A., COLLICK, A., ALLEN, M. &
JOBLING, M. (1993). Minisatellite variant repeat mapping :
application to DNA typing and mutation analysis. Experientia
Supplementa 67, 125–139.
JENUWEIN, T. (2002). An RNA-guided pathway for the epigenome.
Science 297, 2215–2218.
JENUWEIN, T. & ALLIS, C. D. (2001). Translating the histone code.
Science 293, 1074–1080.
JONSSON, A. B., NYBERG, G. & NORMARK, S. (1991). Phase variation
of gonococcal pili by frameshift mutation in pilC, a novel gene
for pilus assembly. EMBO Journal 10, 477–488.
JORDAN, I. K., ROGOZIN, I. B., GLAZKO, G. V. & KOONIN, E. V.
(2003). Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends in Genetics 19, 68–72.
KAMATH, R. S., FRASER, A. G., DONG, Y., POULIN, G., DURBIN, R.,
GOTTA, M., KANAPIN, A., LE BOT, N., MORENO, S., SOHRMANN,
M., WELCHMAN, D. P., ZIPPERLEN, P. & AHRINGER, J. (2003).
Systematic functional analysis of the Caenorhabditis elegans genome
using RNAi. Nature 421, 231–237.
KAPITONOV, V. V. & JURKA, J. (2001). Rolling-circle transposons in
eukaryotes. Proceedings of the National Academy of Sciences, USA 98,
8714–8719.
KASHI, Y., KING, D. & DOLLER, M. S. (1997). Simple sequence
repeats as a source of quantitative variation. Trends in Genetics 13,
74–78.
KHODAREV, N. N., BENNETT, T., SHEARING, N., SOKOLOVA, I.,
KOUDELIK, J., WALTER, S., VILLALOBOS, M. & VAUGHN, A. T. M.
(2000). LINE L1 retrotransposable element is targeted during
the initial stages of apoptotic DNA fragmentation. Journal of
Cellular Biochemistry 79, 486–495.
KINOSHITA, K. & HONJO, T. (2001). Linking class-switch recombination with somatic hypermutation. Nature Reviews Molecular Cell
Biology 2, 493–503.
KIRKNESS, E. F., BAFNA, V., HALPERN, A. L., LEVY, S., REMINGTON,
K., RUSCH, D. B., DELCHER, A. L., POP, M., WANG, W.,
Why repetitive DNA is essential to genome function
FRASER, C. M. & VENTER, J. C. (2003). The dog genome : survey
sequencing and comparative analysis. Science 301, 1898–1903.
KITAO, H., ARAKAWA, H., KUMA, K.-I., YAMAGISHI, H., NAKAMURA,
N., FURUSAWA, S., MATSUDA, H., YASUDA, M., EKINO, S. &
SHIMIZU, A. (2000). Class switch recombination of the chicken
IgH chain genes : implications for the primordial switch region
repeats. International Immunology 12, 959–968.
KITANO, H. (2002). Systems biology : a brief overview. Science 295,
1662–1664.
KLOC, M. & ETKIN, L. D. (1994). Delocalization of Vg1 mRNA
from the vegetal cortex in Xenopus oocytes after destruction of
Xlsirt RNA. Science 265, 1101–1103.
KROPOTOV, A., SEDOVA, V., IVANOV, V., SAZEEVA, N., TOMILIN, A.,
KRUTILINA, R., OEI, S. L., GRIESENBECK, J., BUCHLOW, G. &
TOMILIN, N. (1999). A novel human DNA-binding protein with
sequence similarity to a subfamily of redox proteins which is able
to repress RNA-polymerase-III-driven transcription of the Alufamily retroposons in vitro. European Journal of Biochemistry 260,
336–346.
KROPOTOV, A. V., YAU, P., BRADBURY, P. & TOMILIN, N. (1997).
Nonhistone chromatin proteins HMG1 and HMG2 stabilise one
of the sequence-specific complexes, formed on the promoter of
human retroposons of the ALU-family, by other nuclear proteins. Molekuliarnaia Genetika, Mikrobiologia, I Virusologia 4, 32–36.
KURENOVA, E., CHAMPION, L., BIESSMANN, H. & MASON, J. M.
(1998). Directional gene silencing induced by a complex subtelomeric satellite in Drosophila. Chromosoma 107, 311–320.
LABRADOR, M. & CORCES, V. G. (2002). Setting the boundaries of
chromatin domains and nuclear organization. Cell 111,
151–154.
LAEMMLI, U. K., KÄS, E., POLJAK, L. & ADACHI, Y. (1992). Scaffoldassociated regions : cis-acting determinants of chromatin structural loops and functional domains. Current Opinion in Genetics and
Development 2, 275–285.
LEBRUN, E., REVARDEL, E., BOSCHERON, C., LI, R., GILSON, E. &
FOUREL, G. (2001). Protosilencers in Saccharomyces cerevisiae subtelomeric regions. Genetics 158, 167–176.
LEWIS, M., CHANG, G., HORTON, N. C., KERCHER, M. A., PACE,
H. C., SCHUMACHER, M. A., BRENNAN, R. G. & LU, P. (1996).
Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271, 1247–1254.
LIM, J. K. & SIMMONS, M. J. (1994). Gross chromosome rearrangements mediated by transposable elements in Drosophila
melanogaster. Bioessays 16, 269–275.
LOW, D. A., WEYAND, N. J. & MAHAN, M. J. (2001). Roles of DNA
adenine methylation in regulating bacterial gene expression and
virulence. Infection and Immunity 69, 7197–7204.
LYNCH, M. & CONERY, J. (2003). The origins of genome complexity. Science 302, 1401–1404.
LYON, M. F. (2000). LINE-1 elements and X chromosome inactivation : a function for ‘junk ’ DNA ? Proceedings of the National
Academy of Sciences, USA 97, 6248–6249.
LYSNYANSKY, Y. R., RON, Y. & YOGEV, D. (2001). Juxtaposition of
an active promoter to vsp genes via site-specific DNA inversions
generates antigenic variation in Mycoplasma bovis. Journal of
Bacteriology 183, 5698–5708.
MACARIO, A. J. L., LANGE, M., AHRING, B. K. & DE MACARIO, E. C.
(1999). Stress genes and proteins in the Archaea. Microbiology and
Molecular Biology Reviews 63, 923–967.
MAHMOUDI, T., KATSANI, K. R. & VERRIJZER, C. P. (2002). GAGA
can mediate enhancer function in trans by linking two separate
DNA molecules. EMBO Journal 21, 1775–1781.
21
MARCZYNSKI, G. T. & SHAPIRO, L. (1993). Bacterial chromosome
origins of replication. Current Opinion in Genetics and Development 3,
775–782.
MARTIENSSEN, R. A. (2003). Maintenance of heterochromatin by
RNA interference of tandem repeats. Nature Genetics 35, 213–214.
MATZKE, M., MATZKE, A. J. & KOOTER, J. M. (2001). RNA : guiding gene silencing. Science 293, 1080–1083.
MAZEL, D., DYCHINCO, B., WEBB, V. A. & DAVIES, J. (1998). A
distinctive class of integron in the Vibrio cholerae genome. Science
280, 605–608.
MCKEE, B. D., HONG, C. S. & DAS, S. (2000). On the roles of
heterochromatin and euchromatin in meiosis in Drosophila :
mapping chromosomal pairing sites and testing candidate
mutations for effects on X-Y nondisjunction and meiotic drive in
male meiosis. Genetica 109, 77–93.
METTE, M. F., AUFSATZ, W., VAN DER WINDEN, J., MATZKE, M. A.
& MATZKE, A. J. (2000). Transcriptional silencing and promoter
methylation triggered by double-stranded RNA. EMBO Journal
19, 5194–5201.
MEYERS, B. C., TINGEY, S. V. & MORGANTE, M. (2001).
Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Research 11,
1660–1676.
MILLER, J. T., DONG, F., JACKSON, S. A., SONG, J. & JIANG, J. (1998).
Retrotransposon-related DNA sequences in the centromeres of
grass chromosomes. Genetics 150, 1615–1623.
MILLER, W. J., NAGEL, A., BACHMANN, J. & BACHMANN, L. (2000).
Evolutionary dynamics of the SGM transposon family in the
Drosophila obscura species group. Molecular Biology and Evolution 17,
1597–1609.
MIWA, Y., NAKATA, A., OGIWARA, A., YAMAMOTO, M. & FUJITA, Y.
(2000). Evaluation and characterization of catabolite-responsive
elements (cre) of Bacillus subtilis. Nucleic Acids Research 28,
1206–1210.
MODRICH, P. (1989). Methyl-directed DNA mismatch correction.
Journal of Biological Chemistry 264, 6597–6600.
MORAN, J. V., DEBERARDINIS, R. J. & KAZAZIAN, H. H. JR. (1999).
Exon shuffling by L1 retrotransposition. Science 283, 1530–1534.
MOUSE GENOME SEQUENCING CONSORTIUM (2002). Initial sequencing and comparative analysis of the mouse genome. Nature
420, 520–562.
MOZER, B. A. & BENZER, S. (1994). Ingrowth by photoreceptor
axons induces transcription of a retrotransposon in the
developing Drosophila brain. Development 120, 1049–1058.
MÜ LLER, A., MARINS, M., KAMISUGI, Y. & MEYER, P. (2002).
Analysis of hypermethylation in the RPS element suggests a
signal function for short inverted repeats in de novo methylation.
Plant Molecular Biology 48, 383–399.
MÜ LLER, J., OEHLER, S. & MÜ LLER-HILL, B. (1996). Repression of
lac promoter as a function of distance, phase and quality of an
auxiliary lac operator. Journal of Molecular Biology 257, 21–29.
NAGAKI, K., SONG, J., STUPAR, R. M., PAROKONNY, A. S., YUAN, Q.,
OUYANG, S., LIU, J., HSIAO, J., JONES, K. M., DAWE, R. K.,
BUELL, C. R. & JIANG, J. (2003 a). Molecular and cytological
analyses of large tracks of centromeric DNA reveal the structure
and evolutionary dynamics of maize centromeres. Genetics 163,
759–770.
NAGAKI, K., TALBERT, P. B., ZHONG, C. X., DAWE, R. K.,
HENIKOFF, S. & JIANG, J. (2003 b). Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional
DNA element of Arabidopsis thaliana centromeres. Genetics 163,
1221–1225.
22
NEKRUTENKO, A. & LI, W.-H. (2001). Transposable elements are
found in a large number of human protein coding regions. Trends
in Genetics 17, 619–625.
NIGUMANN, P., REDIK, K., MATLIK, K. & SPEEK, M. (2002). Many
human genes are transcribed from the antisense promoter of L1
retrotransposon. Genomics 79, 628–634.
NORRIS, J., FAN, D., ALEMAN, C., MARKS, J. R., FUTREAL, P. A.,
WISEMAN, R. W., INGLEHART, J. D., DEININGER, P. L. &
MCDONNELL, D. P. (1995). Identification of a new subclass of
Alu DNA repeats which can function as estrogen receptordependent transcriptional enhancers. Journal of Biological Chemistry
270, 22777–22782.
NOVAC, O., MATHEOS, D., ARAUJO, F. D., PRICE, G. B. & ZANNISHADJOPOULOS, M. (2001). In vivo association of Ku with mammalian origins of DNA replication. Molecular Biology of the Cell 12,
3386–3401.
NURSE, P., MASUI, Y. & HARTWELL, L. (1998). Understanding the
cell cycle. Nature Medicine 4, 1103–1106.
ORGEL, L. E. & CRICK, F. H. (1980). Selfish DNA : the ultimate
parasite. Nature 284, 604–607.
OTT, R. W. & HANSEN, L. K. (1996). Repeated sequences from the
Arabidopsis thaliana genome function as enhancers in transgenic
tobacco. Molecular and General Genetics 252, 563–571.
PAGEL, M. & JOHNSTONE, R. A. (1992). Variation across species in
the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox. Proceedings of the Royal Society of
London B : Biological Science 249(1325), 119–124.
PAL-BHADRA, M., LEIBOVITCH, B. A., GANDHI, S. G., RAO, M.,
BHADRA, U., BIRCHLER, J. A. & ELGIN, S. C. R. (2004).
Heterochromatic silencing and HP1 localization in Drosophila are
dependent on the RNAi machinery. Science 303, 669–672.
PARDUE, M. L. & DEBARYSHE, P. G. (2003). Retrotransposons
provide an evolutionarily robust non-telomerase mechanism to
maintain telomeres. Annual Review of Genetics 37, 485–511.
PARISH, D. A., VISE, P., WICHMAN, H. A., BULL, J. J. & BAKER, R. J.
(2002). Distribution of LINEs and other repetitive elements
in the karyotype of the bat Carollia : implications for Xchromosome inactivation. Cytogenetics and Genome Research 96,
191–197.
PARKHILL, J., ACHTMAN, M., JAMES, K. D., BENTLEY, S. D.,
CHURCHER, C., KLEE, S. R., MORELLI, G., BASHAM, D., BROWN,
D., CHILLINGWORTH, T., et al. (2000). Complete DNA sequence
of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404,
502–507.
PELISSIER, T., TUTOIS, S., TOURMENTE, S., DERAGON, J. M. &
PICARD, G. (1996). DNA regions flanking the major Arabidopsis
thaliana satellite are principally enriched in Athila retroelement
sequences. Genetica 97, 141–151.
PELISSON, A., MEJLUMIAN, L., ROBERT, V., TERZIAN, C. &
BUCHETON, A. (2002). Drosophila germline invasion by the
endogenous retrovirus gypsy : involvement of the viral env gene.
Insect Biochemistry and Molecular Biology 32, 1249–1256.
PETROV, D. A. (2001). Evolution of genome size : new approaches
to an old problem. Trends in Genetics 17, 23–28.
PIMPINELLI, S., BERLOCO, M., FANTI, L., DIMITRI, P., BONACCORSI,
S., MARCHETTI, E., CAIZZI, R., CAGGESE, C. & GATTI, M. (1995).
Transposable elements are stable structural components of
Drosophila melanogaster heterochromatin. Proceedings of the National
Academy of Sciences, USA 92, 3804–3808.
PONGER, L. & MOUCHIROUD, D. (2002). CpGProD : identifying
CpG islands associated with transcription start sites in large
genomic mammalian sequences. Bioinformatics 18, 631–633.
James A. Shapiro and Richard von Sternberg
PRESCOTT, D. M. (2000). Genome gymnastics : unique modes of
DNA evolution and processing in ciliates. Nature Reviews Genetics
1, 191–198.
PRICE, G. B., ALLARAKHIA, M., COSSONS, N., NIELSEN, T., DIAZPEREZ, M., FRIEDLANDER, P., TAO, L. & ZANNIS-HADJOPOULOS,
M. (2003). Identification of a cis-element that determines
autonomous DNA replication in eukaryotic cells. Journal of
Biological Chemistry 278, 19649–19659.
PRYDE, F. E. & LOUIS, E. J. (1999). Limitations on natural TPE in
yeast. EMBO Journal 18, 2538–2550.
PTASHNE, M. (1986). A Genetic Switch : Phage lambda and Higher
Organisms. Cell Press and Blackwell Scientific Publications,
Cambridge, MA, 2nd edition.
QUIBERONI, A., REZAIKI, L., EL KAROUI, M., BISWAS, I., TAILLIEZ, P.
& GRUSS, A. (2001). Distinctive features of homologous recombination in an ‘old ’ microorganism, Lactococcus lactis. Research in
Microbiology 152, 131–139.
REINHART, B. J. & BARTEL, D. P. (2002). Small RNAs correspond to
centromere heterochromatic repeats. Science 297, 1831.
RICHARDSON, A. R. & STOJILJKOVIC, I. (1999). HmbR a
hemoglobin-binding outer membrane protein of Neisseria
meningitidis undergoes phase variation. Journal of Bacteriology 18,
2067–2074.
ROBERTSON, K. D. (2002). DNA methylation and chromatin :
unraveling the tangled web. Oncogene 21, 5361–5379.
ROLLINI, P., NAMCIU, S. J., MARSDEN, M. D. & FOURNIER, R. E. K.
(1999). Identification and characterization of nuclear matrixattachment regions in the human serpin gene cluster at 14q32.1.
Nucleic Acids Research 27, 3779–3791.
ROTHENBURG, S., KOCH-NOLTE, F., RICH, A. & HAAG, F. (2001). A
polymorphic dinucleotide repeat in the rat nucleolin gene forms
Z-DNA and inhibits promoter activity. Proceedings of the National
Academy of Sciences, USA 98, 8985–8990.
ROY, P. J., STUART, J. M., LUND, J. & KIM, S. K. (2002).
Chromosomal clustering of muscle expressed genes in
Caenorhabditis elegans. Nature 418, 975–979.
RUBIN, C. M., KIMURA, R. H. & SCHMID, C. W. (2002). Selective
stimulation of translational expression by Alu RNA. Nucleic Acids
Research 30, 3253–3261.
RUI, W. J. (1999). Regulation of eukaryotic DNA replication and
nuclear structure. Cell Research 9, 163–170.
SANGWAN, I. & O’BRIAN, M. R. (2002). Identification of a soybean
protein that interacts with GAGA element dinucleotide repeat
DNA. Plant Physiology 129, 1788–1794.
SANTI, L., WANG, Y., STILE, M. R., BERENDZEN, K., WANKE, D.,
ROIG, C., POZZI, C., MULLER, K., MULLER, J., ROHDE, W.
& SALAMINI, F. (2003). The GA octodinucleotide repeat
binding factor BBR participates in the transcriptional
regulation of the homeobox gene Bkn3. Plant Journal 34,
813–826.
SARKARI, J., PANDIT, N., MOXON, E. R. & ACHTMAN, M. (1994).
Variable expression of the Opc outer membrane protein in
Neisseria meningitidis is caused by size variation of a
promoter containing poly-cytidine. Molecular Microbiology 13,
207–217.
SAUNDERS, N. J., JEFFRIES, A. C., PEDEN, J. F., HOOD, D. W.,
TETTELIN, H., RAPPUOLI, R. & MOXON, E. R. (2000). Repeatassociated phase variable genes in the complete genome
sequence of Neisseria meningitidis strain MC58. Molecular
Microbiology 37, 207–215.
SAVELIEV, A., EVERETT, C., SHARPE, T., WEBSTER, Z. &
FESTENSTEIN, R. (2003). DNA triplet repeats mediate
Why repetitive DNA is essential to genome function
heterochromatin-protein-1-sensitive variegated gene silencing.
Nature 422, 909–913.
SAWITZKE, J. & AUSTIN, S. (2001). An analysis of the factory model
for chromosome replication and segregation in bacteria.
Molecular Microbiology 40, 786–794.
SCHILD-POULTER, C., MATHEOS, D., NOVAC, O., CUI, B., GIFFIN,
W., RUIZ, M. T., PRICE, G. B., ZANNIS-HADJOPOULOS, M. &
HACHE, R. J. (2003). Differential DNA binding of Ku antigen
determines its involvement in DNA replication. DNA and Cell
Biology 22, 65–78.
SCHMID, C. (1996). Alu : structure, origin, evolution, significance,
and function of one-tenth of human DNA. Progress in Nucleic Acids
Research and Molecular Biology 53, 283–319.
SCHOLES, D. T., KENNY, A. E., GAMACHE, E. R., MOU, Z. &
CURCIO, M. J. (2003). Activation of a LTR-retrotransposon by
telomere erosion. Proceedings of the National Academy of Sciences, USA
100, 15736–15741.
SCHOTTA, G., EBERT, A., DORN, R. & REUTER, G. (2003). Positioneffect variegation and the genetic dissection of chromatin
regulation in Drosophila. Seminars in Cell and Developmental Biology
14, 67–75.
SCHUELER, M. G., HIGGINS, A. W., RUDD, M. K., GUSTASHAW, K.
& WILLARD, H. F. (2001). Genomic and genetic definition of a
functional human centromere. Science 294, 109–115.
SELKER, E. U. (1990). Premeiotic instability of repeated sequences
in Neurospora crassa. Annual Review of Genetics 24, 579–613.
SHAPIRO, J. A. (1983). Mobile Genetic Elements. Academic Press, New
York.
SHAPIRO, J. A. (1999 a). Genome system architecture and natural
genetic engineering in evolution. Annals of the New York Academy of
Science 870, 23–35.
SHAPIRO, J. A. (1999 b). Transposable elements as the key to a 21st
Century view of evolution. Genetica 107, 171–179.
SHAPIRO, J. A. (2002a). Genome organisation and reorganisation
in evolution : formatting for computation and function. In From
Epigenesis to Epigenetics : The Genome in Context (eds. L. Van
Speybroeck, G. Van de Vijver and D. De Waele). Annals of the
New York Academy of Science 981, 111–134.
SHAPIRO, J. A. (2002 b). Repetitive DNA, genome system architecture and genome reorganisation. Research in Microbiology 150,
447–453.
SHARP, S. I., PICKRELL, J. K. & JAHN, C. L. (2003). Identification of
a novel ‘ chromosome scaffold ’ protein that associates with Tec
elements undergoing en masse elimination in Euplotes crassus.
Molecular Biology of the Cell 14, 571–584.
SHIROISHI, T., KOIDE, T., YOSHINO, M., SAGAI, T. & MORIWAKI, K.
(1995). Hotspots of homologous recombination in mouse
meiosis. Advances in Biophysics 31, 119–132.
SILVA, J. C., SHABALINA, S. A., HARRIS, D. G., SPOUGE, J. L. &
KONDRASHOVI, A. S. (2003). Conserved fragments of transposable elements in intergenic regions : evidence for widespread
recruitment of MIR-and L2-derived sequences within the mouse
and human genomes. Genetical Research 82, 1–18.
SKRYABIN, B. V., KREMERSKOTHEN, J., VASSILACOPOULOU, D.,
DISOTELL, T. R., KAPATINOV, V. V., JORKA, J. & BROSIUS, J.
(1998). The BC200 RNA gene and its neural expression are
conserved in Anthropoidea (primates). Journal of Molecular
Evolution 47, 677–685.
SONG, X., SUI, A. & GAREN, A. (2004). Binding of mouse VL30
retrotransposon RNA to PSF protein induces genes repressed by
PSF : Effects on steroidogenesis and oncogenesis. Proceedings of the
National Academy of Sciences, USA 101, 621–626.
23
SPEEK, M. (2001). Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes.
Molecular and Cellular Biology 21, 1973–1985.
SPOFFORD, J. B. (1976). Position-effect variegation in Drosophila. In
The Genetics and Biology of Drosophila (eds. M. Ashburner and
E. Novitski), pp. 955–1018. Academic Press, New York.
SPOFFORD, J. B. & DESALLE, R. (1991). Nucleolus organisersuppressed position-effect variegation in Drosophila melanogaster.
Genetical Research 57, 245–255.
STEIN, L. D., BAO, Z., BLASIAR, D., BLUMENTHAL, T., BRENT, M. R.,
CHEN, N., CHINWALLA, A., CLARKE, L., CLEE, C., COGHLAN, A.,
et al. (2003). The genome sequence of Caenorhabditis briggsae :
a platform for comparative genomics. PLoS Biology 1, 166–192.
STERN, A. & MEYER, T. F. (1987). Common mechanism controlling
phase and antigenic variation in pathogenic neisseriae. Molecular
Microbiology 1, 5–12.
TCHÉNIO, T., CASELLA, J.-F. & HEIDMANN, T. (2000). Members of
the SRY family regulate the human LINE retrotransposons.
Nucleic Acids Research 28, 411–415.
TOMILIN, N. V., BOZHKOV, V. M., BRADBURY, E. M. & SCHMID,
C. W. (1992). Differential binding of human nuclear proteins to
Alu subfamilies. Nucleic Acids Research 20, 2941–2945.
TOVAR, D., FAYE, J. C. & FAVRE, G. (2003). Cloning of the human
RHOB gene promoter : characterization of a VNTR sequence
that affects transcriptional activity. Genomics 81, 525–530.
TRIFONOV, E. N. (1999). Elucidating sequence codes : three codes
for evolution. Annals of the New York Academy of Science 870,
330–338.
TSUCHIYA, T., SAËGUSA, Y., TAIRA, T., MIMORI, T., IGUCHI-ARIGA,
S. M. M. & ARIGA, H. (1998). Ku antigen binds to Alu family
DNA. Journal of Biochemistry 123, 120–127.
VAFA, O. & SULLIVAN, K. F. (1997). Chromatin containing CENPA and alpha-satellite DNA is a major component of the inner
kinetochore plate. Current Biology 7, 897–900.
VAN DER ENDE, A., HOPMAN, C. T., ZAAT, S., ESSINK, B. B.,
BERKHOUT, B. & DANKERT, J. (1995). Variable expression of class
1 outer membrane protein in Neisseria meningitidis is caused by
variation in the spacing between the x10 and x35 regions of
the promoter. Journal of Bacteriology 177, 2475–2480.
VAN DRIEL, R., FRANSZ, P. F. & VERSCHURE, P. J. (2003). The
eukaryotic genome : a system regulated at different hierarchical
levels. Journal of Cell Science 116, 4067–4075.
VANSANT, G. & REYNOLDS, W. F. (1995). The consensus sequence
of a major Alu subfamily contains a functional retinoic acid
response element. Proceedings of the National Academy of Sciences, USA
92, 8229–8233.
VAN SPEYBROECK, L., VAN DE VIJVER, G. & DE WAELE, D. (2002).
From Epigenesis to Epigenetics : The Genome in Context. Annals of the
New York Academy of Science, volume 981, New York.
VAN STEENSEL, B., DELROW, J. & BUSSEMAKER, H. J. (2003).
Genomewide analysis of Drosophila GAGA factor target genes
reveals context-dependent DNA binding. Proceedings of the National
Academy of Sciences, USA 100, 2580–2585.
VASSETZKY, N. S., TEN, O. A. & KRAMEROV, D. A. (2003). B1 and
related SINEs in mammalian genomes. Gene 319, 149–160.
VERDEL, A., JIA, S., GERBER, S., SUGIYAMA, T., GYGI, S., GREWAL,
S. I. S. & MOAZED, D. (2004). RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303, 672–676.
VERSTEEG, R., VAN SCHAIK, B. D., VAN BATENBURG, M. F., ROOS,
M., MONAJEMI, R., CARON, H., BUSSEMAKER, H. J. & VAN
KAMPEN, A. H. (2003). The human transcriptome map reveals
extremes in gene density, intron length, GC content, and repeat
24
pattern for domains of highly and weakly expressed genes.
Genome Research 13, 1998–2004.
VINOGRADOV, A. E. (2003). Selfish DNA is maladaptive : evidence
from the plant Red List. Trends in Genetics 19, 609–614.
VOLPE, T. A., KIDNER, C., HALL, I. M., TENG, G., GREWAL, S. I.
& MARTIENSSEN, R. A. (2002). Regulation of heterochromatic
silencing and histone H3 lysine-9 methylation by RNAi. Science
297, 1833–1837.
VOLPE, T. A., SCHRAMKE, V., HAMILTON, G. L., WHITE, S. A.,
TENG, G., MARTIENSSEN, R. A. & ALLSHIRE, R. C. (2003). RNA
interference is required for normal centromere function in fission
yeast. Chromosome Research 11, 137–146.
WAHLE, E. & RUEGSEGGER, U. (1999). 3k-End processing of
pre-mRNA in eukaryotes. FEMS Microbiology Reviews 23,
277–295.
WAKIMOTO, B. T. (1998). Beyond the nucleosome : epigenetic
aspects of position-effect variegation in Drosophila. Cell 93,
321–324.
WANG, G., VAN DAM, A. P. & DANKERT, J. (1995). Analysis of a
VMP-like sequence (vls) locus in Borrelia garinii and Vls homologues among four Borrelia burgdorferi sensu lato species. FEMS
Microbiological Letters 199, 39–45.
WATSON, J. B. & SUTCLIFFE, J. G. (1987). Primate brain-specific
cytoplasmic transcript of the Alu repeat family. Molecular and
Cellular Biology 7, 3324–3327.
WEINER, A. M., DEININGER, P. L. & EFSTRATIADIS, A. (1986).
Nonviral retroposons : Genes, pseudogenes and transposable
elements generated by the reverse flow of genetic information.
Annual Review of Biochemistry 55, 631–661.
WIDLUND, H. R., KUDUVALLI, P. N., BENGTSSON, M., CAO, H.,
TULLIUS, T. D. & KUBISTA, M. (1999). Nucleosome structural
features and intrinsic properties of the TATAAACGCC repeat
sequence. Journal of Biological Chemistry 274, 31847–31852.
WYNGAARD, G. A. & GREGORY, T. R. (2001). Temporal control
of DNA replication and the adaptive value of chromatin
diminution in copepods. Journal of Experimental Zoology 291,
310–316.
WYRICK, J. J., APARICIO, J. G., CHEN, T., BARNETT, J. D., JENNINGS,
E. G., YOUNG, R. A., BELL, S. P. & APARICIO, O. M. (2001).
Genome-wide distribution of ORC and MCM proteins in
S. cerevisiae : high-resolution mapping of replication origins. Science
14, 2357–2360.
James A. Shapiro and Richard von Sternberg
YANG, Z., BOFELLI, D., BOONMARK, N., SCHWARTZ, K. & LAWN, R.
(1998). Apolipoprotein(a) gene enhancer resides within a LINE
element. Journal of Biological Chemistry 273, 891–897.
YATES, P. A., BURMAN, R. W., MUMMANENI, P., KRUSSEL, S. &
TURKER, M. S. (1999). Tandem B1 elements located in a mouse
methylation center provide a target for de novo DNA methylation.
Journal of Biological Chemistry 274, 36357–36361.
YOUN, B. S., LIM, C. L., SHIN, M. K., HILL, J. M. & KWON, B. S.
(2002). An intronic silencer of the mouse perforin gene. Molecular
Cells 13, 61–68.
YU, J., HU, S., WANG, J., WONG, G. K., LI, S., LIU, B., DENG, Y.,
DAI, L., ZHOU, Y., ZHANG, X., et al. (2002). A draft sequence of
the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92.
YUH, C. H., BOLOURI, H. & DAVIDSON, E. H. (1998). Genomic
cis-regulatory logic : experimental and computational analysis
of a sea urchin gene. Science 279, 1896–1902.
ZAISS, D. M. W. & KLOETZEL, P.-M. (1999). A second gene
encoding the mouse proteosome activator b subunit is part of a
LINE1 element and is driven by a LINE1 promoter. Journal of
Molecular Biology 287, 829–835.
ZEARFOSS, N. R., CHAN, A. P., KLOC, M., ALLEN, L. H. & ETKIN,
L. D. (2003). Identification of new Xlsirt family members in the
Xenopus laevis oocyte. Mechanisms of Development 120, 503–509.
ZHONG, C. X., MARSHALL, J. B., TOPP, C., MROCZEK, R., KATO, A.,
NAGAKI, K., BIRCHLER, J. A., JIANG, J. & DAWE, R. K. (2002).
Centromeric retroelements and satellites interact with maize
kinetochore protein CENH3. Plant Cell 14, 2825–2836.
ZHOU, Y.-H., ZHENG, J. B., GU, X., LI, W.-H. & SAUNDERS, G. F.
(2000). A novel Pax-6 binding site in rodent B1 repetitive
elements : coevolution between developmental regulation and
repeated elements ? Gene 245, 319–328.
ZHOU, Y.-H., ZHENG, J. B., GU, X., SAUNDERS, G. F. & YUNG,
W.-K. A. (2002). Novel Pax6 binding sites in the human genome
and the role of repetitive elements in the evolution of gene
regulation. Genome Research 12, 1716–1722.
ZHU, L., SWERGOLD, G. D. & SELDIN, M. F. (2003). Examination of
sequence homology between human chromosome 20 and the
mouse genome : intense conservation of many genomic
elements. Human Genetics 113, 60–70.
ZUCKERKANDL, E. (2002). Why so many noncoding nucleotides ?
The eukaryote genome as an epigenetic machine. Genetica 115,
105–129.