THE EVOLUTIONARY DYNAMICS OF EUKARYOTIC GENE ORDER

REVIEWS
Laurence D. Hurst*, Csaba Pál*‡ and Martin J. Lercher*
In eukaryotes, unlike in bacteria, gene order has typically been assumed to be random. However,
the first statistically rigorous analyses of complete genomes, together with the availability of
abundant gene-expression data, have forced a paradigm shift: in every complete eukaryotic
genome that has been analysed so far, gene order is not random. It seems that genes that have
similar and/or coordinated expression are often clustered. Here, we review this evidence and ask
how such clusters evolve and how this relates to mechanisms that control gene expression.
TRANSGENE
Foreign DNA that is inserted
experimentally into totipotent
embryonic cells or into
unicellular organisms.
POSITION EFFECT
In general terms, any effect of a
gene’s genomic location on its
expression. A phenomenon that
is often observed in transgenic
organisms in which
transcription of an inserted
transgene is affected by the
proximity to heterochromatin.
*Department of Biology
and Biochemistry,
University of Bath,
Bath BA2 7AY, UK.
‡MTA, Theoretical Biology
Research Group,
Eötvös Loránd University,
Pázmány Péter Sétány 1/C,
Budapest H-1117, Hungary.
Correspondence to L.D.H.
e-mail: [email protected]
doi:10.1038/nrg1319
NATURE REVIEWS | GENETICS
The organization of genes within a genome (gene
order) can be considered at two levels: first, between
chromosomes (for example, comparison of the distribution of genes among autosomes and sex chromosomes), and second, within chromosomes. Here, we
focus on the second of these two levels: in particular we
discuss the evidence for, and the causes of, non-random
gene order within chromosomes.
The idea that genes in eukaryotic genomes could be
distributed non-randomly, and, moreover, that genes of
comparable and/or coordinated expression might cluster, is important in terms of understanding how
genomes function and how they have evolved. This idea,
however, also has important practical implications. For
example, it might explain why an intact gene in a novel
genomic location can have a pathological phenotype1–4.
Moreover, understanding how and why genes cluster
might also be important if we are to understand development5 and ageing. For example, in contrast to genes
that are upregulated in quiescent cells (that is, those that
are in reversible proliferative arrest), upregulated genes
in cells that undergo replicative senescence (irreversible
proliferative arrest) are clustered6.
The extent to which the location of a gene in a
genome affects expression is also important when we
consider genetic modification. TRANSGENE activity can
depend on the chromosomal integration site (see, for
example, REF. 7), and some argue that successful manipulation of genomes must wait until we understand the
causes of POSITION EFFECTS1. Importantly, recent analyses
indicate that genomic regions that contain the most
actively expressed genes are also those of the highest
gene density8, which makes it more probable that functional integration would interfere with other genes.
Understanding position effects should, and does,
inform the design of gene-therapy vectors, both with
respect to improving their efficacy7,9 and their safety10.
With the plethora of statistically rigorous whole-genome analyses of gene organization that have recently
been published, for the first time we are in a position to
make a general assessment of the dynamics of gene
order in eukaryotic genomes. Here, we review what is
known about the non-random gene order in eukaryotes
and the underlying molecular mechanisms that favour
coordinated gene expression. In particular, we ask what
is the evidence for non-random order (see TABLE 1),
what are the probable mechanisms of coordinated regulation, is it probable that selection for coordinated
expression explains the initial evolution of linkage in all
cases, and are the clusters maintained by selection (and
should we expect them to be)?
Evidence for non-random gene order
Anecdotal evidence for clusters. In some ways, the null
hypothesis of eukaryotic gene order — random distribution along chromosomes — could be viewed as a straw
man. Even before whole-genome sequences were available, numerous apparent exceptions were known. Many
of these, however, involve tandem duplicates, the Hox
and globin clusters being examples. What, if anything, is
the worth of anecdotal observations of clusters that are
not explained by tandem duplication?
VOLUME 5 | APRIL 2004 | 299
Table 1 | Whole-genome studies on gene order in relation to expression or protein function
Species | Experimental method | Controlled for duplicates | Observation | Reference
Protists
P. falciparum | Chromatography/mass spectrometry | No | Clusters of co-expressed proteins (3–6 neighbours) | 124
Fungi
S. cerevisiae | Microarray | No | Clusters of cell-cycle-dependent genes (direct neighbours)* | 18
S. cerevisiae | Microarray, MIPS | Yes | Pairs (triplets) of co-expressed neighbouring genes, independent of orientation; pairs of neighbouring genes with similar function (MIPS) | 19
S. cerevisiae | Microarray | No | Pairs of co-expressed neighbouring genes, independent of orientation | 52
S. cerevisiae | Knockout | Yes | Essential genes that are clustered in regions of low recombination, independent of co-expression | 111
Plants
Z. mays | Polymorphisms, cDNA, QTL | No | 10–30-centimorgan (cM)-long clusters that contain developmental genes* | 28
A. thaliana | Microarray, cDNA | Yes | Regions of correlated expression patterns, in part owing to functionally related genes (KEGG) | 25
A. thaliana | Microarray | Yes | Regions of increased expression rate; regions of correlated expression patterns, with few functionally related genes | 27‡
A. thaliana | Microarray | Yes | Clusters of co-expressed genes | 26
Animals: invertebrates
D. melanogaster | Microarray | Yes | Clusters of adjacent co-expressed genes encompassing 20% of genes; few clusters are enriched for functional classes (GO); no association with cytogenetic bands or matrix-attachment regions | 24
D. melanogaster | EST | Yes | Clusters of adjacent tissue-specific genes; no clusters on the X chromosome | 23
D. melanogaster | Microarray | No | Small clusters of circadian genes; evidence for regular spacing between some clusters | 49
C. elegans | Microarray | No | Operons (2–8 genes) that contain 15% of genes | 20
C. elegans | mRNA tagging/microarray | Yes | Clusters of adjacent genes that are expressed in muscle (including housekeeping genes) | 22
C. elegans | RNAi, microarray | No | Large regions that are enriched in genes of similar RNAi phenotype/co-expressed genes | 102
C. elegans | Microarray | Yes | Clusters of co-expressed neighbouring genes; long-range owing to duplicate genes, short-range owing to operons and neighbouring genes on opposing strands | 21
Animals: mammals
H. sapiens | EST | No | Clusters of muscle-expressed genes (including housekeeping genes) | 31
H. sapiens | SAGE | No | Large regions of highly expressed genes | 36
H. sapiens | EST | No | Clusters of genes that are expressed in adipose tissue* | 34
H. sapiens | SAGE, EST | Yes | Clusters of housekeeping genes; regions of highly expressed genes/genes that are expressed in one tissue as secondary effects | 29
H. sapiens | SAGE, EST | No | Housekeeping genes that are mainly located in high GC and R-bands§ | 72
H. sapiens | SAGE | No | Regions of highly expressed genes at high GC; clusters of genes with low levels of expression | 8
H. sapiens | EST | No | Clusters of tissue-specific genes | 39
H. sapiens | EST | No | Regions of increased expression in tumours | 3
H. sapiens | Microarray | No | Clusters of genes that are upregulated in senescent, or downregulated in quiescent cells | 6
M. musculus | EST | No | Clusters of extra-embryonically expressed genes* | 30
M. musculus | RNA in situ hybrid., PCR | No | Clusters of genes that are upregulated in the brain and downregulated in the heart, lung, testis and muscle | 40
Multiple phyla
S. cerevisiae, A. thaliana, C. elegans, D. melanogaster, H. sapiens | KEGG pathways | No | Clusters of genes that are involved in the same pathway; the number of clustered pathways is variable: yeast (98%) > human > worm > A. thaliana > fly (30%) | 17
S. cerevisiae, C. elegans, D. melanogaster, H. sapiens, M. musculus, R. norvegicus | Microarray | Yes | Comparable clustering of co-expressed genes in all genomes; in yeast, there are many functional overlaps (GO) between clustered genes, whereas in humans there are only a few functional overlaps | 41
*Studies in which statistics were not robust (statistics were robust in all other studies). ‡Additional details from T. Zhu (personal communication). §Chromosome banding
patterns that are produced by Giemsa staining (G-bands). The reciprocal pattern (reverse, or R-bands) can be produced with various other staining procedures. A. thaliana,
Arabidopsis thaliana; C. elegans, Caenorhabditis elegans; D. melanogaster, Drosophila melanogaster; GO, Gene Ontology; H. sapiens, Homo sapiens; KEGG, Kyoto
Encyclopaedia of Genes and Genomes; MIPS, Munich Information Centre for Protein Sequences; M. musculus, Mus musculus; P. falciparum, Plasmodium falciparum; QTL,
quantitative trait loci; R. norvegicus, Rattus norvegicus; RNAi, RNA interference; S. cerevisiae, Saccharomyces cerevisiae; SAGE, serial analysis of gene expression; Z. mays,
Zea mays.
www.nature.com/reviews/genetics
Box 1 | Genome-wide analysis of gene clusters: statistical considerations
To find non-trivial cases of non-random gene order, it is necessary to start by formulating a test function. This measures
the degree of order in the genome. For example, in assaying the clustering of essential genes in a genome, the frequency
with which an essential gene has another essential gene as its immediate neighbour might be considered. After
determining the value of the function for the real genome, we then ask how often this figure or higher would be observed
if gene order were random. To do this, we must define a null and test for deviation from it.
Testing for deviation from the null is often best done by randomizing the location of genes in a genome, recalculating
the test function for the random genome and repeating this process many times. This generates the null distribution of
values of the test function. The real value can then be compared with this distribution. If there are n random simulants
and r have a test score that is equal to or greater than that observed in the real data, the probability (p) of observing the
degree of order that is seen in the real genome is given in equation 1.
$$p = \frac{r + 1}{n + 1} \qquad (1)$$
The rules that are used to define a ‘random’ genome define the null hypothesis that is being examined. A common null
hypothesis is that there is a lack of spatial pattern in the distribution of genes with shared properties. The simplest
procedure, then, is to allow, in randomizations, any gene to assume any ‘location’ in the genome while preventing two or
more genes from assuming the same location. However, this often fails to exclude trivial or competing biological
explanations. For example, the presence of tandem duplicates can lead to a deviation from random as they can show
similar properties (that is, expression profiles) that result from common evolutionary history or experimental design (for
example, cross-hybridization in microarray studies). If the physical location of genes is of interest, rather than their order
alone, the null should reflect observed gene-density variation. One problem with cluster analysis using quantitative trait
loci (QTLs) is that the null often supposes an equal probability of finding a gene in all genomic locations.
Differences in generating random gene-order variants mean that the results of different randomization studies are often
difficult to compare. Alternative analytical methods therefore have attractions. A few studies116,117 elaborate exact
analytical solutions or approximate formulae for non-random gene distribution or borrow previously elaborated
methods from time-series analysis118. In many cases, however, randomization seems the only tractable method.
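The randomization procedure described in this box can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any study cited here: a single chromosome is represented as an ordered list of flags (True for an essential gene), the test function counts neighbouring pairs in which both genes are essential, and the null allows any gene to take any position. The names adjacent_pairs and randomization_p_value and the toy chromosome are hypothetical, and controls for tandem duplicates or gene-density variation are deliberately left out.

```python
import random

def adjacent_pairs(flags):
    """Test function: count neighbouring positions where both genes carry the property."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a and b)

def randomization_p_value(flags, n_simulants=10_000, seed=0):
    """Shuffle gene positions n times and return (observed, p), with p = (r + 1) / (n + 1)."""
    rng = random.Random(seed)
    observed = adjacent_pairs(flags)
    shuffled = list(flags)
    r = 0  # simulants scoring at least as high as the real genome
    for _ in range(n_simulants):
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            r += 1
    return observed, (r + 1) / (n_simulants + 1)

# Hypothetical toy chromosome: six essential genes in one block, then 24 others
toy_chromosome = [True] * 6 + [False] * 24
observed, p = randomization_p_value(toy_chromosome, n_simulants=2000)
print(observed, p)
```

Because all six essential genes sit in one block, very few shuffled genomes are expected to match the observed score, so p lies close to its floor of 1/(n + 1). Adding 1 to both r and n means that p can never be reported as exactly zero.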
IMPRINTED GENES
Genes that are expressed from
only one of the two parental
copies, the choice being
dependent on the sex of the
parent from which the gene
was derived.
CO-EXPRESSION
A property of genes that show
similar spatial or temporal
expression patterns.
Perhaps the strongest anecdotal evidence of non-random gene order in eukaryotes is the observed clustering of mammalian IMPRINTED GENES11. However, this
clustering could be related to localized cis activity of
imprint control regions and so could be interpreted as a
strange exception to the rule of random gene order11. On
a more limited scale, there are numerous reports of gene
clusters of related function (see, for example, REFS 12–15).
For instance, human glutamine phosphoribosyl pyrophosphate amidotransferase (GPAT), which is necessary
for the initial step in de novo purine synthesis, and phosphoribosylamidoimidazole-succinocarboxamide synthase (AIRC), which encodes an enzyme for later steps
in the pathway, are closely linked16.
Do such incidences disturb our null hypothesis? The
problem is that in random genomes, we expect curious
coincidental linkages. Consequently, these anecdotes fail
to show that there is more clustering than would be
expected by chance. Even assuming that we can eliminate
tandem duplication as a cause, the problems are numerous (see also BOX 1). First, we have no a priori expectation
as to which genes should be clustered and which should
not. So, to understand the statistical significance of the
finding of a given cluster, we need to ask, not, for example, how often GPAT and AIRC reside next to each other
in random genomes, but how often two or more genes
that act in the same pathway are found next to each other.
As we had no expectation that purine synthesis would be
unusual, we must consider all ‘comparable’ pathways
(N.B. defining ‘comparable’ is problematic).
We then need to determine the null expected number
of incidences of linkage of genes in the same pathway
(see BOX 1). To do this, we will need much more extensive data than is provided in the original observation of
linkage of two genes. Therefore, although larger clusters
— such as the seven linked genes that are involved in
quinic acid utilization in fungi12 — are strongly indicative of non-random gene order, rigorous analysis is only
possible with complete genome data. We can now, for
example, ask whether there are more metabolic pathways in which two or more genes are clustered than
would be expected by chance17. However, determining
whether any given cluster requires special explanation
remains problematic. Nonetheless, the same statistical
tools can be applied, albeit with less power, to address
this issue.
Evidence of clustering from whole-genome studies. The
study of genes that are involved in the mitotic cell cycle
in yeast by Cho et al. was the first to show clustering of
CO-EXPRESSED genes on a genomic scale18. They found that
25% of genes with cell-cycle-dependent expression patterns were directly adjacent to genes induced in the
same phase of the cell cycle (see also REF. 19). Clusters of
co-expressed yeast genes rarely seem to exceed ten genes
or a few kilobases (C.P., unpublished observations).
Although clusters of a similar size are found in the
worm Caenorhabditis elegans, many of these clusters
could be attributed to the co-transcription of these
genes in operons: a process that is unusual among
eukaryotes. Approximately 15% of C. elegans genes are
contained in operons — that is, stretches of two to
eight genes that are transcribed into polycistronic pre-mRNAs20. Although operons, together with tandemly
QUANTITATIVE TRAIT LOCI
(QTLs). Genes that segregate
for a quantitative trait. QTL
mapping allows the
determination of the genomic
location of QTL using genetic
markers.
SERIAL ANALYSIS OF GENE
EXPRESSION
(SAGE). An experimental
method for determining
transcript abundances in a tissue
on the basis of sequencing
thousands of short gene-specific
tags.
TRANSCRIPTION-COUPLED
REPAIR
A specialized repair pathway that
counteracts the toxic effects of
DNA damage in
transcriptionally active genes.
EXPRESSION BREADTH
The number of tissues in which
a gene is expressed.
EXPRESSION RATE
mRNA or protein abundances of
a gene in a given tissue or under
given cellular conditions.
KYOTO ENCYCLOPAEDIA OF
GENES AND GENOMES
(KEGG). An online database
that integrates current
knowledge on molecular
interaction networks (for
example, metabolic pathways
and protein complexes).
MUNICH INFORMATION
CENTRE FOR PROTEIN
SEQUENCES
(MIPS). An online database
that provides protein sequence-related information on the basis
of whole-genome analysis of
Saccharomyces cerevisiae,
Arabidopsis thaliana and
Neurospora crassa.
GENE ONTOLOGY [DATABASE]
(GO). A collaborative effort to
address the need for consistent
descriptions and functional
classification of gene products
in different databases.
duplicated genes, account for most of the observed co-expression clusters in the worm21, significant local
co-expression is still evident after excluding these two
causes21,22.
Clusters of co-expressed genes in multicellular
eukaryotes can be substantially larger than those that are
described in yeast and the worm. In Drosophila
melanogaster, 45% of genes that are expressed only in
testes were found in uninterrupted stretches of at least
four genes23. However, a looser definition of a cluster
that allows for intervening genes with different expression patterns led to the identification of much larger
groups of co-expressed genes. When averaging co-expression over ten-gene windows, Spellman and Rubin
found that 20% of genes occur in co-expression clusters
that span 10–30 genes or, on average, 125 kb of DNA24.
Within the Arabidopsis thaliana genome, co-expression
clusters (excluding tandem duplicates) span up to 20
genes25 (see also REFS 26,27), and QUANTITATIVE TRAIT LOCUS
(QTL) studies indicate that they might be considerably
larger28.
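As a rough sketch of how windowed co-expression scores of this kind can be computed, the Python below averages the pairwise Pearson correlations of expression profiles within a sliding window of adjacent genes. The window size, the function names and the toy profiles are illustrative assumptions, and the cited studies might differ in detail (for example, in how tandem duplicates are excluded or how significance is assessed).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def windowed_coexpression(profiles, window=10):
    """Mean pairwise correlation for every window of adjacent genes.

    profiles must be ordered by chromosomal position; each entry is one
    gene's expression across the same set of conditions.
    """
    scores = []
    for start in range(len(profiles) - window + 1):
        genes = profiles[start:start + window]
        pairs = list(combinations(genes, 2))
        scores.append(sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return scores

# Hypothetical profiles for five neighbouring genes over four conditions
toy_profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 5, 7],
    [4, 3, 2, 1],
    [1, 4, 2, 3],
]
scores = windowed_coexpression(toy_profiles, window=3)
print(scores)
```

In this toy case, the first window scores highest because its three genes rise together across conditions. Whether a given window score is unusual would still need to be judged against randomized gene orders, as outlined in BOX 1.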
The physical scale of co-expression seems to be
even larger in mammals, with clusters that extend up
to 1,000 kb (REF. 29). Several reports30–35 indicate that,
when a large body of cDNAs (or ESTs) are extracted for a
given tissue, the genes that specify the proteins tend to
cluster in the genome. Other reports note that highly
expressed genes, defined by SERIAL ANALYSIS OF GENE EXPRESSION
(SAGE) tags, tend to cluster in large domains (regions
of increased gene expression; RIDGEs)8,36. Similarly,
TRANSCRIPTION-COUPLED REPAIR is prominent in specific
chromosomal domains37, although this pattern might
reflect variation in gene density. Lercher et al.29 argue
that all these patterns might be explained by a tendency for (on average) highly expressed housekeeping genes to cluster. They note that although genes
tend to cluster according to their EXPRESSION BREADTH
even if their EXPRESSION RATE is controlled for, they do
not tend to cluster according to their expression rate
if breadth is controlled for. This is not to say that tissue-specific, highly expressed genes in the clusters
cannot be identified8, simply that the dominant
trend is for clustering of genes that are expressed in
many tissues.
Lercher et al.29 also concluded that they could not find
much evidence for clustering by tissue of tissue-specific
genes — for example, genes that are expressed only in
muscle do not tend to cluster with other genes that are
expressed only in muscle. A significant clustering was
detected for 4 of 14 tissues, but only 1 remains significant
after control for multiple testing. Clustering of tissue-specific genes is, however, well described for testes-specific
genes in flies23. Whether the human genome contains
blocks of multiple genes that are expressed exclusively in
the same tissue remains to be resolved.
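The effect of controlling for multiple testing can be illustrated with a Bonferroni correction, in which each of m tests is judged against α/m rather than α. The text does not state which correction was applied, so this is only one possible choice, and the p values below are invented so that the example mirrors the pattern of 4 nominally significant tissues out of 14, of which 1 survives correction.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p value is at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical p values for 14 tissues (invented for illustration only)
tissue_p_values = [0.001, 0.012, 0.030, 0.044,
                   0.08, 0.15, 0.22, 0.31, 0.40,
                   0.52, 0.63, 0.71, 0.85, 0.93]

nominal = sum(p <= 0.05 for p in tissue_p_values)          # uncorrected count
corrected = sum(bonferroni_significant(tissue_p_values))   # count after correction
print(nominal, corrected)
```

With 14 tests, the per-test threshold falls from 0.05 to roughly 0.0036, which is why a result that looks significant for a single tissue can disappear once all tissues are considered together.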
However, this might not be the most important
question. Given evidence for tissue-specific chromosomal silencing38, it might be more relevant to ask whether
there are chromosomal domains that are associated
with up- or downregulation in a given tissue. Evidence
for such clustering has been found39,40, although these
studies did not control for the effect of tandem duplicates. These clusters of co-suppressed genes extend over
several megabases40.
In summary, there is extensive evidence for the clustering of co-expressed genes across all major eukaryotic
kingdoms. However, there seems to be a correlation
between the physical size of clusters and organismal
complexity, with cluster size ranging from a few kilobases in yeast to several megabases in mammals (see
also REF. 41). This might be partly explained by differences in genome compactness; however, it might also
reflect different underlying mechanisms.
Are functionally related genes clustered? Bacterial operons often consist of genes that are functionally related,
such as being part of the same metabolic pathway. Do
we more generally find that functionally related genes
cluster in eukaryotes? The answer to this question to
some extent depends on how we define functional relatedness. Unlike co-expression, what it means to be ‘functionally related’ is relatively ambiguous. It might mean
involvement in the same pathway, proteins that interact
with each other, or genes, the alleles of which affect the
same trait, and so on. There can be overlap between all
of these meanings.
Lee and Sonnhammer17 examined the physical location, in numerous genomes, of genes that have proteins
that are involved in metabolic pathways, defined from the
KYOTO ENCYCLOPAEDIA OF GENES AND GENOMES (KEGG) database. In all species examined (human, worm, fly,
A. thaliana and yeast), there was a significant tendency for
genes from the same metabolic pathway to cluster.
However, the fraction of pathways with significant chromosomal clustering of genes is highly variable, ranging
from 30% for D. melanogaster to a remarkable 98% for
yeast, whereas 11% are expected under the null hypothesis17. In at least one well-characterized example, a cluster
of genes evolved independently in two different species42.
Similarly, in yeast, the genes that are involved in stable protein–protein complexes tend to be more tightly
linked than expected43. Cooper13 has suggested that, in
the human genome, proteins tend to be linked to their
receptor. However, from a whole-genome analysis, we
find that the number of such incidences is not different
from random (L.D.H., C.P. and M.J.L., unpublished
observations). A less robust approach to identifying
functional clusters of genes is to examine the clustering
of QTLs mapped for any given trait. Several such QTL
studies indicate co-localization of QTLs for related
traits28,44–46. However, these effects might result from
variations in gene density or from multiple effects at
one gene and its control sequences.
The relationship between the above functional
clusters and co-expression clusters is often uncertain.
In a few cases, the link between co-expression and co-functionality has been examined. In yeast, many genes
in co-expression clusters seem to be functionally related
— they either belong to the same MUNICH INFORMATION
CENTRE FOR PROTEIN SEQUENCES (MIPS) category19 or the
same GENE ONTOLOGY (GO) classification41. Similarly, in
A. thaliana, both genes with protein products that
a Primary (~10 kb) cis-acting elements
b Secondary (~100 kb) histone modifications
c Tertiary (~1,000 kb) active chromatin hub
d Tertiary (~1,000 kb) chromosome territories
Figure 1 | Schematic representation of the different levels of transcriptional co-regulation. a | At the primary level,
cis-acting elements directly affect the transcription of neighbouring genes. The figure depicts a bidirectional promoter causing
co-regulation of transcription of genes on the two DNA strands. This level will only affect genes within a few kilobases of each
other. b | At the secondary level, HISTONE modifications spread from a LOCUS CONTROL REGION (LCR) (depicted in orange) down the
CHROMATIN fibre until they are stopped at a BOUNDARY ELEMENT (depicted in pink). Modification of the histones (depicted in red)
suppresses transcription of the intervening genes (grey boxes), whereas unmodified histones (green) beyond the boundary element
retain an open chromatin structure, thereby allowing transcription of a neighbouring gene (blue). This type of co-regulation will affect
regions of up to a few hundred kilobases. c | At the tertiary level, cis-acting elements (orange ovals) come together to form the node
of chromatin loops (the ‘active chromatin hub’). Genes close to the hub (blue) are accessible to transcription, whereas genes further
away (grey) are inaccessible. d | An alternative view of the tertiary level posits that chromatin is arranged in compact chromosome
territories, with transcription largely being restricted to territory surfaces (blue genes), but suppressed within the interior (grey genes).
In both pictures of tertiary-level regulation, effects are expected to range up to several megabases.
HISTONES
Positively charged DNA-binding
proteins that mediate the folding
of DNA.
LOCUS CONTROL REGION
(LCR). Cis-acting sequence that
organizes a gene cluster into an
active chromatin block and
enhances transcription.
CHROMATIN
A highly condensed structure of
DNA that is associated with
histone proteins and other
DNA-binding proteins.
BOUNDARY ELEMENTS [OR
INSULATORS]
Cis-acting DNA sequences that
act as barriers to the effects of
distal enhancers and silencers.
BIDIRECTIONAL PROMOTERS
Promoter sequences between
divergently transcribed
neighbouring gene pairs that
initiate transcription in both
directions.
POLYCISTRONIC TRANSCRIPT
mRNA that encodes several
polypeptides; a common
phenomenon in bacteria.
interact and genes that act in the same pathway (defined
by KEGG) explain some but not all of the observed co-expression25. Clustering of co-expressed linked genes that
belong to the same GO category is relatively rare in
humans41. Although some functionally related genes are
found in co-expression clusters in D. melanogaster,
these seem to be mostly the result of tandemly duplicated genes24, 47.
Evidence for regular spacing of genes. Non-random
gene order need not be manifested as clustering. A rarely
considered possibility is that genes might be regularly
spaced. Intriguingly, Képès48 reports that genes that are
regulated by the same sequence-specific transcription
factor tend to be regularly spaced along the yeast chromosome. Regular spacing along chromosomes of co-expressed gene pairs, defined by chip array data, has
also been described in Saccharomyces cerevisiae19 and
D. melanogaster49, but these reports might be artefacts of
the chip design50.
Mechanisms
There seems to be a broad split of incidences of co-expression into those that act on a relatively small local
scale and those that act over much broader genomic
spans. We argue that such a pattern is consistent with
what is known of the mechanisms for co-expression
(FIG. 1).
The simple null hypothesis is that a gene’s expression depends only on the promoters in its immediate
vicinity. Many of the local scale phenomena discussed
above are consistent with this model. Trivially, tandem
duplicates tend to have comparable expression because
they have comparable promoters21,51. In yeast18,19,52 and
in humans13,16,53,54 (see also REF. 25), some co-expression
of adjacent genes can be attributed to a BIDIRECTIONAL
PROMOTER that resides between the two. Similarly,
although there are POLYCISTRONIC TRANSCRIPTS in some
eukaryotes20,55–57, such incidences do not disturb the
simple ‘promoter-drives-expression’ model any more
than does the finding of genes nested within the introns
of other genes. At the extreme, co-expression of multiple linked genes is achieved by fusion of all of the genes
to make one protein product58,59.
However, even on the small scale, this simple null
model is unable to explain everything. Notably, cis
effects, such as the downstream effects of upstream activating sequences (UAS), explain some examples of tight
co-expression of gene pairs19. Moreover, the broader
scale co-expression patterns indicate that the promoter-drives-expression model is too simplistic, as does parallel
work that indicates that higher-order features are crucial
to understanding chromosomal domains of expression.
Beyond the one-dimensional array of genes on chromosomes, to understand gene expression, it now seems
important to consider two higher levels of chromosomal
DNA METHYLATION
Covalent modification of the
DNA that inhibits transcription
initiation.
HISTONE ACETYLATION/
DEACETYLATION
These processes regulate changes
in chromatin structure by
covalent modification of histone
proteins, and therefore influence
the ability of transcription
factors to bind to promoters.
CHROMATIN
IMMUNOPRECIPITATION
An experimental method that is
used for analysing the
acetylation state of histones in a
specific genomic region.
ISOCHORIC [STRUCTURE]
Large-scale variation in the G+C
content of vertebrate genomes.
organization: the state of chromatin and its positioning
within the nucleus (particularly its proximity to
intranuclear transcription-associated machinery).
These two features interact and often it is difficult to dissect out the two causes. The tightly packed state of DNA
(heterochromatin) that leaves genes largely inaccessible
to transcription factors (and is therefore transcriptionally incompetent) tends to reside towards the periphery
of the nucleus60. These issues have been well-reviewed
elsewhere60–62 so here we only deal with the main features that are pertinent to understanding the expression
clusters.
Chromatin-level regulation. Studies on the success of
transgene inserts show that inserts into heterochromatin tend to be inactive7,9. Chromatin is not, however,
static and transitions between states are linked to
changes in gene expression63 and are causally related
to covalent modifications of the core histones64.
The best current model indicates that specific histone-modifying proteins initiate the opening or closing of
chromatin (for example, at a locus control region
(LCR)), and that this modification spreads along a chromosome until it meets a boundary element65,66. In this
way, all the genes in a region of a chromosome might be
prevented from being expressed. Alternatively, a chromosomal region can be made accessible for transcription, but whether these genes are actually expressed
depends on other factors such as DNA METHYLATION status,
nuclear position, available transcription factors and cis-UAS effects19. Consequently, we expect to see domains
of downregulation (see, for example, REF. 38) more than
we see domains of coordinated upregulation. Indeed,
Akashi et al.5 suggest a model on the basis of this sort of
premise. They propose that stem cells have a largely
open chromatin formation and each step towards specialization is accompanied by the downregulation of
genes in specific chromosomal regions.
In some cases, these modifications are stably inherited through cell division and are therefore important in
differentiation and development65. Different mechanisms of silencing have different consequences for the
stability of the silencing. For example, silencing by histone lysine methylation is reversed only by the slow
process of replacement of histones or through DNA
replication. In other cases, the modification can be
rapidly modulated by the alteration of activities of
HISTONE ACETYLASES and HISTONE DEACETYLASES (HDAC) (see,
for example, REF. 67).
The relationship between chromatin modification
and co-expression has been most elegantly demonstrated in yeast. Yeast contains a family of five related
HDACs. Using CHROMATIN IMMUNOPRECIPITATION and intergenic microarrays to generate genome-wide HDAC
enzyme-activity maps, Robyr et al.67 reported a striking division of labour that enables yeast to modify the
chromatin to activate a block of genes that are associated with a given function. Hda1, for example, deacetylates subtelomeric domains that contain normally
repressed genes that are used instead for gluconeogenesis, growth on carbon sources other than glucose and
adverse growth conditions. By contrast, Hos1/Hos3
and Hos2 preferentially affect ribosomal DNA and
ribosomal protein genes, respectively.
In humans, there is evidence that comparable
mechanisms can explain the inactivation of blocks of
tissue-specific genes. The zinc-finger gene-specific
repressor element RE-1 silencing transcription factor
(REST) can mediate restriction of gene activity in
non-neuronal tissues by imposing active repression
through histone deacetylase recruitment38. Through
the recruitment of an associated co-repressor,
CoREST, it can also enable long-term gene silencing
that spreads down the chromosome38, affecting transcriptional units that do not themselves contain REST
response elements.
[Figure 2 plot: expression breadth (average number of tissues per gene) and GC (average proportion of intronic GC per gene) against position (Mb).]
Figure 2 | Expression breadth and surrounding GC content along human chromosome 11. The figure shows the number of analysed tissues in which a gene is
expressed and the local GC content; both are averaged over a sliding window of 15 genes drawn across human chromosome 11. Genes with high breadth of expression
(that is, expressed in many tissues) tend to reside in regions in which the local GC content is especially high. This indicates that the ISOCHORIC STRUCTURE of the human
genome (the regional variation in GC) might reflect underlying selection for transcriptional competency. Modified with permission from REF. 72 © (2003) Oxford Univ. Press.
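The sliding-window averaging behind a plot of this kind can be sketched in a few lines. This is a minimal illustration and not the procedure used in REF. 72: the function names and the use of a Pearson coefficient to summarize how the two smoothed tracks co-vary are assumptions made here; only the 15-gene window is taken from the legend.

```python
import math
from typing import List, Sequence


def sliding_window_mean(values: Sequence[float], window: int = 15) -> List[float]:
    """Mean of every run of `window` consecutive genes, in chromosomal order."""
    if window <= 0:
        raise ValueError("window must be positive")
    # One mean per window position; an input shorter than the window gives [].
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long tracks (illustrative summary)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("tracks must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Given per-gene expression breadth and per-gene intronic GC in chromosomal order, one would smooth both tracks with `sliding_window_mean` and then compare them with `pearson`. Overlapping windows are not independent, so the coefficient is descriptive only.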
SC-35 DOMAINS
A set of 10–30 prominent
domains of the eukaryotic
nucleus that are concentrated in
mRNA metabolic factors. They
are probably important in
organizing euchromatin
domains.
NUCLEOSOMAL FIBRE
Fibre of chromatin that is made
up of nucleosomes.
Three-dimensional structure and intra-nuclear position.
We know that targeting a gene to the periphery of the
yeast nucleus induces silencing68, which indicates that
the location within the nucleus might be an important
component in promoting or repressing transcription.
Indeed, interphase chromosomes in many species occupy
unique, relatively compact positions in the nucleus60.
Moreover, gene-dense chromosomes tend to be more
central in the nucleus60,69, which also indicates that there
might be a relationship between three-dimensional
position and expression.
Can the need to be at a particular intra-nuclear location also drive the evolution of similarly expressed genes
to cluster in particular chromosomal regions? This has
long been considered a possibility for rRNA genes.
Linkage of these genes makes sense because they are
associated with the nucleolus: the factory that enables
their rapid expression.
Are there other intra-nuclear structures that might
be of importance? SC-35 DOMAINS are one such group of
structures that could promote gene clustering70.
Typically, eukaryotic nuclei contain 10–30 prominent
domains that are concentrated in mRNA metabolic factors. Gene-rich reverse-chromosomal bands71 show
extensive contact with these domains70, which tallies
with the tendency for domains of broadly/highly
expressed genes to be located in GC-rich R-bands72 (see
also FIG. 2). Shopland et al.70 argue that these findings
indicate a functional rationale for gene clustering in
chromosomal bands, which relates to nuclear clustering
of genes with SC-35 domains. They propose a model of
SC-35 domains as functional centres for a multitude of
clustered genes, forming local euchromatic ‘neighbourhoods’. This model also indicates a mechanism for
restricting expression even in euchromatin — that is,
the chromatin might be open but if the DNA is not
associated with SC-35 domains, transcription will be
limited.
However, whether nuclear location and chromosomal clustering are generally so tightly coordinated remains
unclear. Analysis of tRNA genes73 suggests a different
story: the genes are associated with the nucleolus but
are not co-localized on the chromosomes. So, intra-nuclear location could determine the potential for gene
expression, but might not necessarily lead to evolution
of clustering of the genes on a chromosome.
Analysis of co-regulated genes in yeast supports the
idea that selection that acts on gene location might not
result in the genes being clustered, but might nonetheless drive non-random gene order48. Képès proposed
that the three-dimensional arrangement of genes within
the nucleus might underpin the regular spacing of genes
that are under the control of a given transcription factor.
Specifically, if the DNA NUCLEOSOMAL FIBRE folds into
topologically closed loops of regular size, the promoters
of these regularly spaced genes would cluster in a small
region of the nuclear space. This model fits with the
‘active chromatin hub’ model of gene regulation62. In
this model, at least two cis-acting regulatory structures,
at either end of a broadly defined region, come together
in three-dimensional space to form a DNA loop. Gene
expression is then allowed only in close proximity to the
point at which the elements meet. Multiple loops would
then act to enable co-expression of regularly spaced
genes and inhibition of intervening genes.
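One simple way to ask whether a set of genes is regularly spaced is to wrap their chromosomal positions onto a circle whose circumference is a candidate loop length and to measure how tightly the resulting phases agree. The sketch below uses a standard circular statistic (the mean resultant length); it is an illustration chosen here, not the analysis of REF. 48, and the candidate loop length `period` is a hypothetical input that would have to be supplied or scanned.

```python
import cmath
import math
from typing import Sequence


def phase_coherence(positions: Sequence[float], period: float) -> float:
    """Mean resultant length of positions wrapped onto a circle of length `period`.

    Returns a value between 0 and 1. Values near 1 mean that the genes sit at
    nearly the same phase of the assumed loop; values near 0 mean that no
    phase is preferred.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not positions:
        raise ValueError("at least one position is required")
    # Each gene becomes a unit vector whose angle is its phase within the loop.
    vectors = [
        cmath.exp(2j * math.pi * (p % period) / period) for p in positions
    ]
    return abs(sum(vectors) / len(vectors))
```

Scanning a range of `period` values and looking for a peak would give a crude estimate of loop size. Any such peak would need to be compared with the coherence obtained for randomly placed genes before being interpreted.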
Between-species comparison of co-expression modes. Do
the mechanisms of co-expression vary between species?
Certainly, clustering of co-expressed genes in all eukaryotic genomes does not necessarily imply a common
underlying mechanism. For example, operons are common in the worm20, but are rarely found in other eukaryotes55–57. Similarly, although it is probable that in all
species, tandem duplicates contribute to co-expression,
in the worm these happen to be unusually common21.
Bidirectional promoters might explain many incidences
of co-expression of gene pairs in yeast18,52 but by no
means all of them19. Their role in other species is now
starting to be examined on a genomic scale54.
Less clear is the importance of chromatin-level regulation. Histone-modified genomic domains in yeast are
now well described67. However, a whole-genome analysis in the worm revealed little evidence that broad-scale
effects mediate co-expression21. It is still unclear how
common chromatin-mediated inactivation of broad
spans of genes is in the human genome5,38.
So, differences between species in the mechanisms
that promote non-random gene order seem to be largely
quantitative rather than qualitative. Nonetheless, there
might be mechanisms that truly are limited to certain
taxa. In flies, for example, there is a coupling of the timing of replication and initiation of transcription74, but
no such effect is seen in yeast75. Similarly, methylation,
although rare in D. melanogaster and common in plants
and vertebrates, is absent in yeast76.
Formation and maintenance of clusters
Why might clusters have formed? To address this issue,
we first ask whether non-random organization is itself
evidence for selection on gene order. We then address
whether it is adequate to suppose that, because the
current organization allows co-regulation, selection for
co-regulation drives the aggregation process.
Non-random gene order need not necessarily
imply the activity of selection. First, if gene expression is a noisy process, then opening chromatin to
allow expression from one gene might incidentally
allow leaky expression of linked genes24. Given that in
D. melanogaster, the large regions of co-expression are
not also regions in which the genes are functionally
related24, this model cannot be trivially dismissed.
Second, the random model was a poor null because
it failed to make allowance for biases in the rates and
dimensions of various forms of gene rearrangement
(duplication, transposition/retroposition, translocation,
inversion, and so on), and in the parameters that differ
between species77–79 and between chromosomes80.
Removal of tandem duplicates is desirable as it attempts
to correct for these known biases. Retroposition might
also cause such a bias in gene order: insertion of retroposing viruses seems to be more common in open chromatin81. Such a bias alone could explain, in principle,
why gene density is not random and why highly
expressed genes tend to reside in regions of highest gene
density8; that is, highly expressed genes have the highest probability of being in open chromatin and therefore
of having new genes inserted in close proximity. Similarly,
clustering of organelle-associated genes in the nuclear
genomes of D. melanogaster82 and A. thaliana83 might
reflect nothing more than a block transfer of genes from
organelle to nucleus83.
Box 2 | Supergene clusters with low recombination rates
Supergene clusters are genomic regions in which selection favours tight linkage to
maintain linkage disequilibrium between alternative alleles at two or more loci. The
mating-type locus of the single-celled green alga Chlamydomonas reinhardtii is an
example. The chloroplasts in the zygote of this species are derived from
both parental cells but a ‘destruction’ allele in one of the gametes eliminates the
chloroplast genomes of the mating partner before SPORULATION. Haploid gametes with
this allele should be under selection to avoid mating with each other. Assuming that
uniparental inheritance is beneficial, cells without this allele will be under selection to
mate with a partner that does have the allele. Selection can then favour the linkage of the
organelle-inheritance allele with a mating-type allele, as linkage disequilibrium between
them reduces the rate of the more deleterious matings: destroyer with destroyer, non-destroyer with non-destroyer. Therefore, it is predicted119,120 that mating-type (+ and –
type) and organelle-inheritance alleles should come to be linked and to be in strict
linkage disequilibrium (all gametes of one mating type should be the destroyer type,
whereas all gametes of the opposite mating type should be the non-destroyers). This is
what is seen96,121. The genome region has features that minimize the recombination rate
within it96, including inversions, rearrangements and insertions.
The other well-described supergene clusters are segregation distorters, such as Sd in
flies and t-complex in mice98. In the simplest model, at around the time of male meiosis,
a toxin is given to all sperm, but the anti-toxin is restricted to those sperm that contain
the anti-toxin allele. Selection strongly favours linkage of the alleles for toxin and anti-toxin, as a chromosome that bears the toxin allele but not the anti-toxin allele is
immediately eliminated from the population122,123. As predicted, the genes are usually in
regions of low recombination (for example, centromeres) and often have inversions98.
SD has at least two loci, Sd and Rsp. Sd+ is the toxic allele and alleles at Rsp determine
sensitivity to the toxin. The two loci span the centromere on chromosome 2 and are
often associated with an inversion. As predicted, a modifying allele (E(Sd)) that increases
the extent of segregation distortion is linked to and is in linkage disequilibrium with SD,
residing between Sd+ and Rsp.
SPORULATION
A defence mechanism of
microbes in response to
unfavourable environmental
conditions that results in spores
that are highly resistant to
physical and chemical abuse.
LINKAGE DISEQUILIBRIUM
Non-random assortment of
alleles at different, usually linked,
loci. Low population size and
selection can increase linkage
disequilibrium, whereas
recombination reduces it.
MEIOTIC DRIVE
A departure from Mendelian
segregation of chromosomes.
MAJOR HISTOCOMPATIBILITY
COMPLEX
(MHC). MHC molecules bind
peptide fragments that are
derived from pathogens and
display them on the cell surface
for recognition by the appropriate
T cells. The organizations of the
MHC gene clusters are similar in
many species.
GENE CONVERSION
Non-reciprocal transfer between
a pair of non-allelic or allelic
DNA sequences during meiosis
and mitosis.
So, the discovery of more structure to genomes
than was previously foreseen need not implicate a role
for selection. However, the presence of functional clusters does indicate that selection is important. So, under
the assumption that selection might favour certain
genes to be co-expressed, can we suppose that this will
explain the evolution of clusters? Insertion of a gene
into the region might well directly affect its expression
profile. For example, if a gene moves into a chromosomal domain that is regulated by Hda1, we might
assume that regulation of the cluster will affect its
activity. This type of model could be tested by analysis
of expression characteristics of de novo retrovirus
insertions to find out whether those that are inserted
into transcriptionally more competent chromatin are
more likely to be expressed. Preliminary data support
the idea that high GC content of insertion sites might
be necessary for activity84. This tallies with reports that
gene-expression parameters vary with the GC content
of the flanking sequence8,72,85.
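The neutral insertion-bias argument made above can be illustrated with a toy simulation in which new sequences land in open chromatin more often than in closed chromatin and no selection acts on the outcome. Everything in this sketch is hypothetical: the region layout, the relative insertion rate `open_bias` and the function name are invented for illustration and are not estimates from REF. 81.

```python
import random
from typing import List, Sequence


def simulate_insertions(
    open_chromatin: Sequence[bool],
    n_insertions: int,
    open_bias: float,
    seed: int = 0,
) -> List[int]:
    """Count neutral insertions per region when open regions are favoured.

    `open_bias` is the assumed insertion rate of an open region relative to a
    closed one. No selection is modelled: the pattern comes from bias alone.
    """
    if open_bias <= 0:
        raise ValueError("open_bias must be positive")
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    weights = [open_bias if is_open else 1.0 for is_open in open_chromatin]
    counts = [0] * len(open_chromatin)
    regions = range(len(open_chromatin))
    for region in rng.choices(regions, weights=weights, k=n_insertions):
        counts[region] += 1
    return counts
```

With five open and five closed regions and `open_bias=5.0`, most insertions accumulate in the open half, so gene density becomes non-random without any selection on gene order. A null model for clustering would need to allow for this kind of bias.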
Some types of co-regulation might require additional
steps, such as the evolution of bidirectional promoters,
or the establishment of operons. This dislocation between
the eventual co-expression of genes and the reason for the
assemblage of operons has been noted previously86,87.
Consider the evolution of an operon with two functionally related but initially unlinked genes (A and B). Why
might A and B be favoured to come into closer proximity? Lawrence86,87 argues that before the evolution of the
polycistronic transcript, it cannot be supposed that
ever-closer linkage means ever-tighter co-expression.
Consequently, although selection might favour the
absorption of the two genes into a single operon, once
they are tightly linked, selection for co-expression cannot
explain the original evolution of linkage. There are at
least two alternative explanations to account for the initial co-location of the genes: either that there was some
other force promoting linkage or that chance happens to
put the two genes into proximity.
Selection for linkage independent of selection on co-expression. What other forces might promote linkage?
Lawrence86 suggests that linkage of functionally related
genes in prokaryotes might enable simultaneous horizontal transfer. However, there is evidence against this
model in prokaryotes88, and its relevance to eukaryotes
is limited. More importantly, although interest has
understandably concentrated on the relationship
between expression and linkage, a body of work on
population-genetic theory of the evolution of the
recombination rate89–92 (and therefore of linkage) has
been relatively overlooked. In 1930, for example, Fisher93
noted that if, in a haploid, alleles A and B together confer high fitness, as do a and b, although Ab and aB are of
low fitness, selection will favour the establishment and
maintenance of LINKAGE DISEQUILIBRIUM between them to
form AB and ab clusters that are rarely broken by
recombination owing to the close genetic position.
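The interplay that Fisher described can be made concrete with the textbook two-locus haploid recursion, in which selection acts on the four haplotypes and recombination at rate r then erodes the association between loci. The sketch below is a minimal illustration under assumptions that are not in the text: a deterministic infinite population, random mating, selection before recombination and arbitrary fitness values. It shows how linkage disequilibrium builds up and persists; it does not model the evolution of the recombination rate itself.

```python
from typing import Dict

Haplotypes = Dict[str, float]  # keys: "AB", "Ab", "aB", "ab"


def linkage_disequilibrium(freq: Haplotypes) -> float:
    """D = x_AB * x_ab - x_Ab * x_aB."""
    return freq["AB"] * freq["ab"] - freq["Ab"] * freq["aB"]


def next_generation(freq: Haplotypes, fitness: Haplotypes, r: float) -> Haplotypes:
    """One generation of haploid selection followed by recombination at rate r."""
    mean_w = sum(freq[h] * fitness[h] for h in freq)
    selected = {h: freq[h] * fitness[h] / mean_w for h in freq}
    d = linkage_disequilibrium(selected)
    # Recombination moves a fraction r of the association back towards D = 0.
    return {
        "AB": selected["AB"] - r * d,
        "Ab": selected["Ab"] + r * d,
        "aB": selected["aB"] + r * d,
        "ab": selected["ab"] - r * d,
    }


def iterate(freq: Haplotypes, fitness: Haplotypes, r: float, generations: int) -> Haplotypes:
    """Apply `next_generation` repeatedly."""
    for _ in range(generations):
        freq = next_generation(freq, fitness, r)
    return freq
```

Starting from equal haplotype frequencies with AB and ab twice as fit as Ab and aB, the association is stronger after many generations when r is small than when the loci are unlinked (r = 0.5), which is the sense in which tight linkage preserves the favoured AB and ab combinations.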
Although numerous examples have been discussed15,90,94,95, perhaps the strongest evidence has
come from the examination of the mating-type loci of
Chlamydomonas reinhardtii96 and MEIOTIC-DRIVE ‘genes’97,98
(BOX 2). A problem with these examples, as with the finding of the clustering of imprinted genes11, is that they
might simply be strange phenomena that are associated
with strange genes. Similarly, it is indicated that in the
MAJOR HISTOCOMPATIBILITY COMPLEX (MHC), new beneficial
alleles can be created by GENE CONVERSION99. Although this
might provide selection for linkage (see also REF. 100),
this could only apply to genes in the same family.
The issue, then, is whether population-genetic forces
that promote linkage can have broad-scale effects on
gene order. Recent evidence indicates that this is possible. In both yeast101 and worm101,102, essential genes
(those for which the knockout is not viable) cluster in
the genome. In both cases, the clusters are associated
with low recombination rates (FIG. 3), indicating that a
population genetics model for linkage might be needed.
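The two quantities compared in this kind of analysis, the number of essential genes per block and a standardized recombination rate, can be computed as sketched below. This is an illustration under stated assumptions rather than the method of REF. 101: the function names are invented here, the population standard deviation is used for standardization, and incomplete trailing blocks are simply dropped.

```python
import statistics
from typing import List, Sequence, Tuple


def z_scores(values: Sequence[float]) -> List[float]:
    """Express each value as standard deviations from the mean of all values."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption of this sketch
    if sd == 0:
        raise ValueError("values are constant; z-scores are undefined")
    return [(v - mean) / sd for v in values]


def block_summary(
    essential: Sequence[bool],
    recombination: Sequence[float],
    window: int = 10,
) -> List[Tuple[int, float]]:
    """(number of essential genes, mean recombination rate) per non-overlapping block."""
    if len(essential) != len(recombination):
        raise ValueError("inputs must describe the same genes")
    blocks = []
    # Step by `window` so that blocks do not overlap; a short final block is dropped.
    for start in range(0, len(essential) - window + 1, window):
        stop = start + window
        blocks.append(
            (sum(essential[start:stop]), statistics.mean(recombination[start:stop]))
        )
    return blocks
```

Passing per-gene recombination rates through `z_scores` before calling `block_summary` gives block means on a standard-deviation scale. Non-overlapping blocks are used so that the resulting points can be treated as separate observations.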
What might be going on? There might be a simple
NEUTRALIST explanation: recombination might enable the
production of tandem duplicates. As duplicates tend to
be non-essential103, regions of high recombination might
be regions of clusters of non-essential genes. However,
we find that only relatively few tandem duplicates reside
in regions of high recombination, and this slight bias
cannot explain the observed association between gene
dispensability and recombination rate (C.P., unpublished observations). Additionally, Pál and Hurst101 show,
in yeast, that this clustering is not associated with coexpression. Nei91 showed that if deleterious alleles that
are maintained at mutation-selection equilibrium interact with positive epistasis (organisms with two mutations are not as badly affected as expected given the
effects of each mutation alone), then selection favours
linkage of the genes and reduced recombination rates.
He argues that essential genes are more likely to harbour
positive epistatic mutations104. Similarly, inversions105 in
flies have emerged as suppressors of recombination to
maintain a positive epistatic relationship among loci
within the gene rearrangements. Alternatively, Gessler
and Xu106 note that the strength of selection on an
enhancer of recombination is weaker if the strength of
selection on the deleterious mutations in two linked
genes is larger, as expected for essential genes.
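The sign of epistasis referred to above can be stated as a one-line calculation. The sketch assumes a multiplicative expectation for the double mutant, which is a common convention but an assumption here (an additive null would give a different baseline), and it takes fitnesses measured relative to a wild type of 1. The function name is illustrative.

```python
def epistasis(w_first: float, w_second: float, w_double: float) -> float:
    """Double-mutant fitness minus the multiplicative expectation.

    Positive values mean that the double mutant is less badly affected than
    expected from the two single mutants; negative values mean the reverse.
    """
    return w_double - w_first * w_second
```

For example, single mutants with relative fitnesses 0.9 and 0.8 predict 0.72 for the double mutant; an observed value of 0.8 would count as positive epistasis and an observed value of 0.6 as negative epistasis.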
An alternative possibility is that selection might promote relatively important genes to be in mutational cold
spots, which also correspond to regions of low recombination107,108. Pál and Hurst found no evidence that the
essential genes had especially low mutation rates in yeast.
This interpretation, however, has been put forward to
explain clustering of genes of similar synonymous substitution rates (a proxy for the mutation rate) in the
human genome109. It is notable that genes in mutation
cold spots are biased109 towards essential cellular processes
(gene regulation, RNA processing, and so on).
NEUTRALIST [MODEL]
Evolutionary model that
assumes that the trait being
investigated has no selective
advantage. Changes in allele
frequency are said to be the
result of chance (drift) alone.
Are clusters maintained by selection? So, how and why
clusters of genes form remains unclear, but can we
say anything about whether selection acts to maintain them once they are formed? Although high rates
of gene-order evolution have been taken as evidence of
an absence of constraint110, the most detailed analysis
so far supports a role for selection111. In yeast, there are
at least two strong independent predictors of the probability that given gene pairs are still linked in Candida
albicans: the intergene spacer size and the degree of co-expression111. The role of intergene spacer is consistent
with a simple null neutralist model in which only
rearrangements with breakpoints between genes are
tolerated. However, co-expressed genes remain linked
more than expected, which indicates that selection
might favour their retention as a pair (see also REF. 112).
Linked pairs of essential genes in yeast are also retained
as linked more often than expected101. It is unclear then
why clusters of metabolically related genes are not
especially well-conserved between species17. One possible explanation is that selection for the importance of
any given metabolic pathway varies over time in a
given lineage. Another possibility is that linkage is not
under selection.
If co-expression is on the broader, chromatin-level
scale, do we expect similar selection on gene order? If we
take Hda1 control of spans of genes in yeast that are
associated with stress response67 as a model example,
then the answer must be no. If the cluster was formed
under selection, then we expect the cluster to be
retained together within the relevant chromosomal
domain, but the precise order and orientation of genes
need not be under selection. This suggestion has yet to
receive much systematic scrutiny, although it has
recently been claimed that the MHC has conserved gene
composition but non-conserved gene order113.
[Figure 3 plots. a: number of essential genes and recombination rate (deviation from mean) against position on chromosome 9 (5′ to 3′), with the centromere marked. b: recombination rate against N.]
Figure 3 | The number of essential genes and the recombination rate along yeast chromosome 9. a | Sliding window
analysis. The figure shows the number of essential genes and the recombination rate in a sliding window (of 10 genes) drawn across
yeast chromosome 9. Note that in general, if the recombination rate is low, the number of essential genes is high. The recombination
rate is assayed as the number of standard deviations from the mean recombination rate (that is, +1 = one standard deviation above
the mean). The inset box illustrates the same data for non-overlapping windows (N = number of essential genes, Recombination
rate = recombination rate assayed in standard deviations). Modified with permission from REF. 101 © (2003) Macmillan Magazines
Ltd. b | Non-overlapping window analysis figure. The recombination rate in a 10-gene window is plotted against the number of
essential genes in that window. Blocks with many essential genes only ever have low recombination rates.
EFFECTIVE POPULATION SIZE
The number of individuals in a
population that contribute to
the next generation. It never
exceeds the actual population
size.
Summary and outlook
It is no longer tenable to suppose that gene order in
eukaryotes is random. Parallel advances in our understanding of the control of gene expression and their
distribution in the genome have led to a new, more
organized, view. Although we are not yet at a position in
which we can present a complete integration of the bioinformatic results, the understanding of chromatin and
the role of intra-nuclear location, such an integration is
both necessary and realistic.
Nonetheless, if eukaryotic gene order is not random,
what sort of model might take its place? The idea of the
genome as, in part, a series of chromosomal blocks, each
being opened for the potential for transcription or inactivated under particular conditions, seems like a helpful,
guiding new model. It agrees well with the notion that
stem cells generally have open chromatin and that part
of the development of specificity is the inactivation of
particular spans of genes5. It also tallies with the evidence for a region of downregulation of neuronal-specific genes38 and with the division of labour between
histone deacetylases in yeast67.
However, as always, a new model generates new
questions. For example, is the extent of genome organization the same in all species? We have been struck by
the extent to which many patterns are highly discernible
in yeast. Of all the complete genomes, yeast has the
highest degree of linkage of genes that have proteins that
are involved in the same metabolic pathway17, it shows
the most striking clustering of essential genes into
regions of low recombination101 and has many incidences of highly coordinated expression of linked
genes19,52. We can also imagine reasons why genomes
might vary in the extent to which they are organized.
One possibility is that organisms with a large EFFECTIVE
POPULATION SIZE (which we assume yeast must have)
should be able to resist the spread of weakly deleterious
mutations and therefore are expected to be more ‘optimally’ organized114. Alternatively, we might expect the
genomes of more ‘complex’ organisms to be more ‘organized’, and as such, organization might be necessary in
development5. Indeed, could a connection between GC
content and regional transcriptional competence8,72,85,115
explain the evolution of isochores in mammals (FIG. 2)?
The above new model supposes that selection that
favours coordinated control of gene expression is the only
reason for gene-order evolution. However, not only is it
often difficult to eliminate sophisticated neutralist models, but there are counter-examples to indicate that selection can favour linkage for other reasons. Although the
need to evoke alternative models has been advocated87
and evidence that population genetics models are needed
has been provided101, the relevance of prior population
genetics theory for gene-order evolution89,91 is uncertain.
More generally, there is the need to develop a theory
of genome-organization evolution, taking into account
the mechanisms of genome rearrangement, mechanisms of control of gene expression and the evolutionary forces that result from different interactions of loci.
Understanding how genes are rearranged will be important in defining a more appropriate null. Moreover, different mechanisms have different population-genetic
consequences. Inversions alter recombination rates and
duplicates can mask deleterious mutations, whereas
translocations might disrupt meiosis. Modelling the evolution of gene order, from both selective and neutralist
perspectives, represents a considerable challenge.
1. Kleinjan, D. J. & van Heyningen, V. Position effect in human
genetic disease. Hum. Mol. Genet. 7, 1611–1618 (1998).
2. Glinsky, G. V., Krones-Herzig, A. & Glinskii, A. B.
Malignancy-associated regions of transcriptional activation:
gene expression profiling identifies common chromosomal
regions of a recurrent transcriptional activation in human
prostate, breast, ovarian, and colon cancers. Neoplasia 5,
218–228 (2003).
3. Zhou, Y. et al. Genome-wide identification of chromosomal
regions of increased tumor expression by transcriptome
analysis. Cancer Res. 63, 5781–5784 (2003).
4. Joos, S. et al. Variable breakpoints in Burkitt lymphoma cells
with chromosomal t(8;14) translocation separate c-myc and
the IgH locus up to several hundred kb. Hum. Mol. Genet. 1,
625–632 (1992).
5. Akashi, K. et al. Transcriptional accessibility for genes of
multiple tissues and hematopoietic lineages is hierarchically
controlled during early hematopoiesis. Blood 101, 383–389
(2003).
A quality analysis that supports the hypothesis that
stem cells possess a wide-open chromatin structure
to maintain their multipotentiality, which is
progressively quenched as they go down a particular
pathway of differentiation.
6. Zhang, H., Pan, K. H. & Cohen, S. N. Senescence-specific
gene expression fingerprints reveal cell-type-dependent
physical clustering of upregulated chromosomal loci. Proc.
Natl Acad. Sci. USA 100, 3251–3256 (2003).
7. Milot, E. et al. Heterochromatin effects on the frequency and
duration of LCR-mediated gene transcription. Cell 87,
105–114 (1996).
8. Versteeg, R. et al. The human transcriptome map reveals
extremes in gene density, intron length, GC content, and
repeat pattern for domains of highly and weakly expressed
genes. Genome Res. 13, 1998–2004 (2003).
9. Festenstein, R. et al. Locus control region function and
heterochromatin-induced position effect variegation.
Science 271, 1123–1125 (1996).
10. Kuhn, E. J. & Geyer, P. K. Genomic insulators: connecting
properties to mechanism. Curr. Opin. Cell Biol. 15, 259–265
(2003).
11. Reik, W. & Walter, J. Genomic imprinting: parental influence
on the genome. Nature Rev. Genet. 2, 21–32 (2001).
12. Turner, G. in The Eukaryotic Genome (eds Broda, P.,
Oliver, S. G. & Sims, P. F. G.) 107–125 (Cambridge Univ.
Press, Cambridge, 1993).
13. Cooper, D. N. Human Gene Evolution (BIOS Scientific,
Oxford, 1999).
14. Hughes, A. L. & Yeager, M. Molecular evolution of the
vertebrate immune system. Bioessays 19, 777–786
(1997).
15. Korol, A. B., Preigel, I. A. & Preigel, S. I. Recombination
Variability and Evolution (Chapman and Hall, London, 1994).
16. Brayton, K. A. et al. Two genes for de novo purine
nucleotide synthesis on human chromosome 4 are closely
linked and divergently transcribed. J. Biol. Chem. 269,
5313–5321 (1994).
17. Lee, J. M. & Sonnhammer, E. L. L. Genomic gene clustering
analysis of pathways in eukaryotes. Genome Res. 13,
875–882 (2003).
First systematic evidence that, in eukaryotes, genes
from the same metabolic pathway tend to cluster. The
study reveals striking differences between species in
the extent to which this is true.
18. Cho, R. J. et al. A genome-wide transcriptional analysis of
the mitotic cell cycle. Mol. Cell 2, 65–73 (1998).
19. Cohen, B. A., Mitra, R. D., Hughes, J. D. & Church, G. M.
A computational analysis of whole-genome expression data
reveals chromosomal domains of gene expression. Nature
Genet. 26, 183–186 (2000).
Early systematic evidence that adjacent pairs of
genes, as well as nearby non-adjacent pairs of genes,
show correlated expression.
20. Blumenthal, T. et al. A global analysis of Caenorhabditis
elegans operons. Nature 417, 851–854 (2002).
Evidence that in the worm genome, operons are not a
rare peculiarity: it contains at least 1,000 operons that
are 2–8 genes long, which contain approximately 15%
of all C. elegans genes.
21. Lercher, M. J., Blumenthal, T. & Hurst, L. D. Co-expression
of neighboring genes in Caenorhabditis elegans is mostly
due to operons and duplicate genes. Genome Res. 13,
238–243 (2003).
22. Roy, P. J., Stuart, J. M., Lund, J. & Kim, S. K. Chromosomal
clustering of muscle-expressed genes in Caenorhabditis
elegans. Nature 418, 975–979 (2002).
23. Boutanaev, A. M., Kalmykova, A. I., Shevelyov, Y. Y. &
Nurminsky, D. I. Large clusters of co-expressed genes
in the Drosophila genome. Nature 420, 666–669
(2002).
24. Spellman, P. T. & Rubin, G. M. Evidence for large domains of
similarly expressed genes in the Drosophila genome. J. Biol.
1, 5 (2002).
Robust report to show that groups of adjacent and
co-regulated genes, which are not otherwise
functionally related in any obvious way, can be
identified by expression profiling in D. melanogaster.
25. Williams, E. J. B. & Bowles, D. J. Co-expression of
neighbouring genes in the genome of Arabidopsis thaliana.
Genome Res. (in the press).
26. Birnbaum, K. et al. A gene expression map of the
Arabidopsis root. Science 302, 1956–1960 (2003).
27. Zhu, T. Global analysis of gene expression using
GeneChip microarrays. Curr. Opin. Plant Biol. 6, 418–425
(2003).
28. Khavkin, E. & Coe, E. Mapped genomic locations for
developmental functions and QTLs reflect concerted groups
in maize (Zea mays L.). Theor. Appl. Genet. 95, 343–352
(1997).
29. Lercher, M. J., Urrutia, A. O. & Hurst, L. D. Clustering of
housekeeping genes provides a unified model of gene order
in the human genome. Nature Genet. 31, 180–183 (2002).
Evidence that genes that are expressed in most
tissues tend to cluster. This is proposed to explain
why highly expressed genes cluster and why cDNAs
extracted from any given tissue show clustering.
30. Ko, M. S. H. et al. Genome-wide mapping of unselected
transcripts from extraembryonic tissue of 7.5-day mouse
embryos reveals enrichment in the t-complex and underrepresentation on the X chromosome. Hum. Mol. Genet. 7,
1967–1978 (1998).
31. Bortoluzzi, S. et al. A comprehensive, high-resolution
genomic transcript map of human skeletal muscle. Genome
Res. 8, 817–825 (1998).
32. Dempsey, A. A., Pabalan, N., Tang, H. & Liew, C.-C.
Organization of human cardiovascular-expressed genes on
chromosomes 21 and 22. J. Mol. Cell. Cardiol. 33, 587–591
(2001).
33. Gabrielsson, B. L., Carlsson, B. & Carlsson, L. M. S. Partial
genome scale analysis of gene expression in human
adipose tissue using DNA array. Obes. Res. 8, 374–384
(2000).
34. Yang, Y. S. et al. Chromosome localization analysis of genes
strongly expressed in human visceral adipose tissue.
Endocrine 18, 57–66 (2002).
35. Soury, E. et al. Chromosomal assignments of mammalian
genes with an acute inflammation-regulated expression in
liver. Immunogenet. 53, 634–642 (2001).
36. Caron, H. et al. The human transcriptome map: clustering of
highly expressed genes in chromosomal domains. Science
291, 1289–1292 (2001).
First systematic evidence for clustering of genes in
the human genome according to their expression
profile.
37. Surralles, J., Ramirez, M. J., Marcos, R., Natarajan, A. T. &
Mullenders, L. H. Clusters of transcription-coupled repair in
the human genome. Proc. Natl Acad. Sci. USA 99,
10571–10574 (2002).
38. Lunyak, V. V. et al. Co-repressor-dependent silencing of
chromosomal regions encoding neuronal genes. Science
298, 1747–1752 (2002).
Elegant evidence for the existence of a large domain
of downregulation of genes in non-neuronal tissues.
39. Megy, K., Audic, S. & Claverie, J. M. Positional clustering of
differentially expressed genes on human chromosomes 20,
21 and 22. Genome Biol. 4, P1 (2003).
40. Reymond, A. et al. Human chromosome 21 gene
expression atlas in the mouse. Nature 420, 582–586 (2002).
41. Fukuoka, Y., Inaoka, I. & Kohane, I. S. Inter-species
differences of co-expression of neighboring genes in
eukaryotic genomes. BMC Genomics 5, 4 (2004).
42. Vieira, C. P., Vieira, J. & Hartl, D. L. The evolution of small
gene clusters: evidence for an independent origin of the
maltase gene cluster in Drosophila virilis and Drosophila
melanogaster. Mol. Biol. Evol. 14, 985–993 (1997).
43. Teichmann, S. & Veitia, R. Genes encoding subunits of
stable complexes are clustered on the yeast chromosomes.
Genetics (in the press).
44. Tuberosa, R. et al. Mapping QTLs regulating morpho-physiological traits and yield: case studies, shortcomings
and perspectives in drought-stressed maize. Ann. Bot. 89,
941–963 (2002).
45. Santos, C. A. F. & Simon, P. W. QTL analyses reveal
clustered loci for accumulation of major provitamin A
carotenes and lycopene in carrot roots. Mol. Genet.
Genomics 268, 122–129 (2002).
46. Cai, H. W. & Morishima, H. QTL clusters reflect character
associations in wild and cultivated rice. Theor. Appl. Genet.
104, 1217–1228 (2002).
47. Ueda, H. R. et al. Genome-wide transcriptional
orchestration of circadian rhythms in Drosophila. J. Biol.
Chem. 277, 14048–14052 (2002).
48. Képès, F. Periodic epi-organization of the yeast genome
revealed by the distribution of promoter sites. J. Mol. Biol.
329, 859–865 (2003).
Elegant evidence that in yeast, genes that are
controlled by the same sequence-specific
transcription factor tend to be regularly spaced along
the chromosome arms. It is proposed that these
regularities are consistent with a genome-wide loop
model of chromosomes, in which coregulated genes
tend to dynamically co-localize in 3D.
49. Mannila, H., Patrikainen, A., Seppanen, J. K. & Kere, J.
Long-range control of expression in yeast. Bioinformatics
18, 482–483 (2002).
50. Balazsi, G., Kay, K. A., Barabasi, A. L. & Oltvai, Z. N.
Spurious spatial periodicity of co-expression in microarray
data due to printing design. Nucleic Acids Res. 31,
4425–4433 (2003).
51. Papp, B., Pál, C. & Hurst, L. D. Evolution of cis-regulatory
elements in duplicated genes of yeast. Trends Genet. 19,
417–422 (2003).
52. Kruglyak, S. & Tang, H. Regulation of adjacent yeast genes.
Trends Genet. 16, 109–111 (2000).
53. Wright, K. L. et al. Coordinate regulation of the human Tap1
and Lmp2 genes from a shared bidirectional promoter. J.
Exp. Med. 181, 1459–1471 (1995).
54. Trinklein, N. D. et al. An abundance of bidirectional promoters
in the human genome. Genome Res. 14, 62–66 (2004).
55. Gray, T. A., Saitoh, S. & Nicholls, R. D. An imprinted,
mammalian bicistronic transcript encodes two independent
proteins. Proc. Natl Acad. Sci. USA 96, 5616–5621 (1999).
56. Reiss, J. et al. Mutations in a polycistronic nuclear gene
associated with molybdenum cofactor deficiency. Nature
Genet. 20, 51–53 (1998).
57. Nanbru, C. et al. Translation of the human c-myc P0
tricistronic mRNA involves two independent internal
ribosome entry sites. Oncogene 20, 4270–4280 (2001).
58. Hawkins, A. R. The complex Arom locus of Aspergillus
nidulans. Evidence for multiple gene fusions and convergent
evolution. Curr. Genet. 11, 491–498 (1987).
59. Zhang, X. & Smith, T. F. Yeast ‘operons’. Microb. Comp.
Genomics 3, 133–140 (1998).
60. Cremer, T. & Cremer, C. Chromosome territories, nuclear
architecture and gene regulation in mammalian cells. Nature
Rev. Genet. 2, 292–301 (2001).
61. van Driel, R., Fransz, P. F. & Verschure, P. J. The eukaryotic
genome: a system regulated at different hierarchical levels.
J. Cell Sci. 116, 4067–4075 (2003).
62. de Laat, W. & Grosveld, F. Spatial organization of gene
expression: the active chromatin hub. Chromosome Res.
11, 447–459 (2003).
63. Eberharter, A. & Becker, P. B. Histone acetylation: a switch
between repressive and permissive chromatin. Second in
review series on chromatin dynamics. EMBO Rep. 3,
224–229 (2002).
64. Strahl, B. D. & Allis, C. D. The language of covalent histone
modifications. Nature 403, 41–45 (2000).
65. Turner, B. M. Cellular memory and the histone code. Cell
111, 285–291 (2002).
66. Labrador, M. & Corces, V. G. Setting the boundaries of
chromatin domains and nuclear organization. Cell 111,
151–154 (2002).
67. Robyr, D. et al. Microarray deacetylation maps determine
genome-wide functions for yeast histone deacetylases. Cell
109, 437–446 (2002).
Acetylation microarrays are used to uncover a striking
‘division of labour’ for yeast histone deacetylases,
with individual deacetylases controlling highly
specific chromosomal domains.
68. Andrulis, E. D., Neiman, A. M., Zappulla, D. C. & Sternglanz, R.
Perinuclear localization of chromatin facilitates transcriptional
silencing. Nature 394, 592–595 (1998).
A manipulation experiment that shows that
perinuclear localization helps to establish
transcriptionally silent chromatin.
69. Tanabe, H. et al. Evolutionary conservation of chromosome
territory arrangements in cell nuclei from higher primates.
Proc. Natl Acad. Sci. USA 99, 4424–4429 (2002).
70. Shopland, L. S., Johnson, C. V., Byron, M., McNeil, J. &
Lawrence, J. B. Clustering of multiple specific genes and
gene-rich R-bands around SC-35 domains: evidence for
local euchromatic neighborhoods. J. Cell Biol. 162,
981–990 (2003).
Evidence that chromosomal bands relate to nuclear
clustering of genes around SC-35 domains.
71. Saccone, S., Pavlicek, A., Federico, C., Paces, J. &
Bernardi, G. Genes, isochores and bands in human
chromosomes 21 and 22. Chromosome Res. 9, 533–539
(2001).
72. Lercher, M. J., Urrutia, A. O., Pavlicek, A. & Hurst, L. D.
A unification of mosaic structures in the human genome.
Hum. Mol. Genet. 12, 2411–2415 (2003).
73. Thompson, M., Haeusler, R. A., Good, P. D. & Engelke, D. R.
Nucleolar clustering of dispersed tRNA genes. Science 302,
1399–1401 (2003).
tRNA genes are shown to be unclustered in one
dimension (linear order on chromosomes) but highly
clustered when considered in three dimensions (that
is, intra-nuclear location).
74. Schubeler, D. et al. Genome-wide DNA replication profile for
Drosophila melanogaster: a link between transcription and
replication timing. Nature Genet. 32, 438–442 (2002).
75. Raghuraman, M. K. et al. Replication dynamics of the yeast
genome. Science 294, 115–121 (2001).
76. Regev, A., Lamb, M. J. & Jablonka, E. The role of DNA
methylation in invertebrates: developmental regulation or
genome defense? Mol. Biol. Evol. 15, 880–891 (1998).
77. Coghlan, A. & Wolfe, K. H. Fourfold faster rate of genome
rearrangement in nematodes than in Drosophila. Genome
Res. 12, 857–867 (2002).
78. Seoighe, C. et al. Prevalence of small inversions in yeast
gene order evolution. Proc. Natl Acad. Sci. USA 97,
14433–14437 (2000).
79. Ranz, J. M., Gonzalez, J., Casals, F. & Ruiz, A. Low
occurrence of gene transposition events during the evolution
of the genus Drosophila. Evolution 57, 1325–1335 (2003).
80. Gonzalez, J., Ranz, J. M. & Ruiz, A. Chromosomal elements
evolve at different rates in the Drosophila genome. Genetics
161, 1137–1154 (2002).
81. Rynditch, A. V., Zoubak, S., Tsyba, L., Tryapitsina-Guley, N.
& Bernardi, G. The regional integration of retroviral
sequences into the mosaic genomes of mammals. Gene
222, 1–16 (1998).
82. Lefai, E., Fernandez-Moreno, M. A., Kaguni, L. S. &
Garesse, R. The highly compact structure of the
mitochondrial DNA polymerase genomic region of
Drosophila melanogaster: functional and evolutionary
implications. Insect Mol. Biol. 9, 315–322 (2000).
83. Elo, A., Lyznik, A., Gonzalez, D. O., Kachman, S. D. &
Mackenzie, S. A. Nuclear genes that encode
mitochondrial proteins for DNA and RNA metabolism are
clustered in the Arabidopsis genome. Plant Cell 15,
1619–1631 (2003).
84. Glukhova, L. A. et al. Localization of HTLV-1 and HIV-1
proviral sequences in chromosomes of persistently infected
cells. Chromosome Res. 7, 177–183 (1999).
85. Vinogradov, A. E. Isochores and tissue-specificity. Nucleic
Acids Res. 31, 5212–5220 (2003).
86. Lawrence, J. G. & Roth, J. R. Selfish operons: horizontal
transfer might drive the evolution of gene clusters. Genetics
143, 1843–1860 (1996).
87. Lawrence, J. G. Gene organization: selection, selfishness,
and serendipity. Annu. Rev. Microbiol. 57, 419–440 (2003).
88. Pál, C. & Hurst, L. D. Evidence against the selfish operon
hypothesis. Trends Genet. (in the press).
89. Bodmer, W. F. & Parsons, P. A. Linkage and recombination
in evolution. Adv. Genet. 11, 1–100 (1962).
90. Charlesworth, D. & Charlesworth, B. Theoretical genetics of
Batesian mimicry II. Evolution of supergenes. J. Theor. Biol.
55, 305–324 (1975).
91. Nei, M. Modification of linkage intensity by natural selection.
Genetics 57, 625–641 (1967).
92. Otto, S. P. & Lenormand, T. Resolving the paradox of sex
and recombination. Nature Rev. Genet. 3, 252–261 (2002).
93. Fisher, R. A. The Genetical Theory of Natural Selection
(Clarendon, Oxford, 1930).
94. Sinervo, B. & Svensson, E. Correlational selection and the
evolution of genomic architecture. Heredity 89, 329–338
(2002).
95. Ford, E. B. Ecological Genetics (Chapman and Hall,
London, 1971).
96. Ferris, P. J., Armbrust, E. V. & Goodenough, U. W. Genetic
structure of the mating-type locus of Chlamydomonas
reinhardtii. Genetics 160, 181–200 (2002).
97. Hurst, L. D. The evolution of genomic anatomy. Trends Ecol.
Evol. 14, 108–112 (1999).
98. Lyttle, T. W. Segregation distorters. Annu. Rev. Genet. 25,
511–557 (1991).
99. Hogstrand, K. & Bohme, J. Gene conversion can create
new MHC alleles. Immunol. Rev. 167, 305–317 (1999).
100. Hurst, L. D. & Smith, N. G. C. The evolution of concerted
evolution. Proc. R. Soc. Lond. B 265, 121–127 (1998).
101. Pál, C. & Hurst, L. D. Evidence for co-evolution of gene
order and recombination rate. Nature Genet. 33, 392–395
(2003).
Evidence that essential genes reside in regions of low
recombination in yeast and worm. Evidence is also
presented to indicate that this is not the result of
tandem duplicates or co-expression.
102. Kamath, R. S. et al. Systematic functional analysis of the
Caenorhabditis elegans genome using RNAi. Nature 421,
231–237 (2003).
First large-scale assay of gene dispensability in a
multicellular organism showing that the essential
genes cluster in regions of low recombination.
103. Gu, Z. et al. Role of duplicate genes in genetic robustness
against null mutations. Nature 421, 63–66 (2003).
104. Nei, M. Genome evolution: let’s stick together. Heredity 90,
411–412 (2003).
105. Schaeffer, S. W. et al. Evolutionary genomics of inversions in
Drosophila pseudoobscura: evidence for epistasis. Proc.
Natl Acad. Sci. USA 100, 8319–8324 (2003).
106. Gessler, D. D. & Xu, S. On the evolution of recombination
and meiosis. Genet. Res. 73, 119–131 (1999).
VOLUME 5 | APRIL 2004 | 309
107. Lercher, M. J. & Hurst, L. D. Human SNP variability and
mutation rate are higher in regions of high recombination.
Trends Genet. 18, 337–340 (2002).
108. Perry, J. & Ashworth, A. Evolutionary rate of a gene
affected by chromosomal position. Curr. Biol. 9, 987–989
(1999).
109. Chuang, J. H. & Li, H. Functional bias and spatial
organization of genes in mutational hot and cold
regions in the human genome. PLoS Biol. 2, e29 (2004).
110. Ranz, J. M., Casals, F. & Ruiz, A. How malleable is the
eukaryotic genome? Extreme rate of chromosomal
rearrangement in the genus Drosophila. Genome Res. 11,
230–239 (2001).
111. Hurst, L. D., Williams, E. J. B. & Pál, C. Natural selection
promotes the conservation of linkage of co-expressed genes.
Trends Genet. 18, 604–606 (2002).
First evidence that selection acts to preserve linked
pairs of co-expressed genes, even after allowing for
the effect of intergene distance.
112. Huynen, M. A. & Snel, B. in Frontiers in Computational
Genomics (eds Galperin, M. Y. & Koonin, E. V.)
145–166 (Horizon Scientific Press, Wymondham, UK,
2003).
113. Danchin, E. G., Abi-Rached, L., Gilles, A. & Pontarotti, P.
Conservation of the MHC-like region throughout evolution.
Immunogenet. 55, 141–148 (2003).
114. Lynch, M. & Conery, J. S. The origins of genome complexity.
Science 302, 1401–1404 (2003).
115. Vinogradov, A. E. DNA helix: the importance of being GC-rich.
Nucleic Acids Res. 31, 1838–1844 (2003).
116. Durand, D. & Sankoff, D. Tests for gene clustering. J. Comput.
Biol. 10, 453–482 (2003).
117. Lefebvre, J. F., El-Mabrouk, N., Tillier, E. & Sankoff, D.
Detection and validation of single gene inversions.
Bioinformatics 19 (Suppl. 1), I190–I196 (2003).
118. Bradnam, K. R., Seoighe, C., Sharp, P. M. & Wolfe, K. H.
G+C content variation along and among Saccharomyces
cerevisiae chromosomes. Mol. Biol. Evol. 16, 666–675
(1999).
119. Hurst, L. D. Why are there only 2 sexes? Proc. R. Soc.
Lond. B 263, 415–422 (1996).
120. Hutson, V. & Law, R. Four steps to two sexes. Proc. R. Soc.
Lond. B 253, 43–51 (1993).
121. Armbrust, E. V., Ferris, P. J. & Goodenough, U. W. A
mating type-linked gene cluster expressed in
Chlamydomonas zygotes participates in the uniparental
inheritance of the chloroplast genome. Cell 74, 801–811
(1993).
122. Feldman, M. W. & Otto, S. P. A comparative approach
to the theoretical population-genetics theory of
segregation distortion. Am. Nat. 137, 443–456
(1991).
123. Thomson, G. J. & Feldman, M. W. Population genetics of
modifiers of meiotic drive. II. Linkage modification in the
segregation distortion system. Theor. Popul. Biol. 5,
155–162 (1974).
124. Florens, L. et al. A proteomic view of the Plasmodium
falciparum life cycle. Nature 419, 520–526
(2002).
Acknowledgements
We wish to thank two anonymous reviewers and J. Lawrence for
comments on an earlier version of the manuscript. We also thank
F. Grosveld, A. Ward, R. Kelsh and L. Weinert for discussion. M.J.L.
is funded by a Royal Society University Research Fellowship. L.D.H.
and C.P. are funded by the Biotechnology and Biological Sciences
Research Council.
Competing interests statement
The authors declare that they have no competing financial interests.
Online links
DATABASES
The following terms in this article are linked online to:
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink
AIRC | GPAT | Sd
FURTHER INFORMATION
Laurence Hurst’s web page: http://www.bath.ac.uk/biosci/hurst.htm
Martin Lercher’s web page: http://www.bath.ac.uk/biosci/lercher.htm