Phylogenetic Tests of the Hypothesis of Block Duplication of Homologous
Genes on Human Chromosomes 6, 9, and 1
Austin L. Hughes
Department of Biology and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University
There are 10 gene families that have members on both human chromosome 6 (6p21.3, the location of the human
major histocompatibility complex [MHC]) and human chromosome 9 (mostly 9q33–34). Six of these families also
have members on mouse chromosome 17 (the mouse MHC chromosome) and mouse chromosome 2. In addition,
four of these families have members on human chromosome 1 (1q21–25 and 1p13), and two of these have members
on mouse chromosome 1. One hypothesis to explain these patterns is that members of the 10 gene families of
human chromosomes 6 and 9 were duplicated simultaneously as a result of polyploidization or duplication of a
chromosome segment (‘‘block duplication’’). A subsequent block duplication has been proposed to account for the
presence of representatives of four of these families on human chromosome 1. Phylogenetic analyses of the 9 gene
families for which data were available decisively rejected the hypothesis of block duplication as an overall explanation of these patterns. Three to five of the genes on human chromosomes 6 and 9 probably duplicated simultaneously early in vertebrate history, prior to the divergence of jawed and jawless vertebrates, and shortly after that,
all four of the genes on chromosomes 1 and 9 probably duplicated as a block. However, the other genes duplicated
at different times scattered over at least 1.6 billion years. Since the occurrence of these clusters of related genes
cannot be explained by block duplication, one alternative explanation is that they cluster together because of shared
functional characteristics relating to expression patterns.
Introduction
As increasing numbers of genes are sequenced and
mapped in eukaryotes, it has sometimes been found that
a number of genes clustered together show evidence of
an evolutionary relationship (homology) to genes forming a cluster on another chromosome. Frequently, such
a pattern is attributed to an ancient event of block duplication, that is, duplication of an entire chromosome
segment either as a result of a whole-genome duplication (polyploidization) or by duplication of one chromosomal segment followed, perhaps at a much later
time, by its translocation to another chromosome. For
example, in humans, there are 15 genes on chromosome
6 (location 6p21.3), belonging to 10 gene families,
which show evidence of homology to 10 genes on chromosome 9 (9 of them in 9q33–34) (table 1; Kasahara et
al. 1996). Human chromosome 6 bears the genes of the
major histocompatibility complex (MHC) (Klein 1986),
and these 15 genes are located in the MHC class II and
class III regions (Kasahara et al. 1996). Of these 10 gene
families, 6 also have members on chromosome 17 (the
MHC chromosome) of the mouse, Mus musculus (table
1). These 6 families also have representatives on mouse
chromosome 2 (table 1; Kasahara et al. 1996).
Kasahara et al. (1996, p. 9099) attributed this pattern to an ancient block duplication event that took place
‘‘at an early stage of vertebrate evolution, probably at
or before the emergence of bony fish but after the emergence of the jawless fishes.’’ Similarly, four of the gene
families with representatives on human chromosomes 6
and 9 (and with representatives on mouse chromosomes
17 and 2) also include members on human chromosome
1 (1p13, 1q21–25) (table 1; Katsanis, Fitzgibbon, and
Fisher 1996). Two of these four families also have representatives on mouse chromosome 1 (table 1; Katsanis,
Fitzgibbon, and Fisher 1996). This pattern was also attributed to an ancient block duplication by Katsanis,
Fitzgibbon, and Fisher (1996).
Key words: adaptive evolution, gene duplication, genome structure, major histocompatibility complex.
Address for correspondence and reprints: Austin L. Hughes,
Department of Biology, 208 Mueller Laboratory, The Pennsylvania
State University, University Park, Pennsylvania 16803. E-mail:
[email protected].
Mol. Biol. Evol. 15(7):854–870. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
If two or more genes have been duplicated simultaneously as the result of a block duplication event, this
should be revealed by phylogenetic analyses. In the case
of the genes on human chromosomes 6, 9, and 1, few
of the gene families involved have been subjected to
phylogenetic analyses sufficiently rigorous to determine
the time of gene duplication relative to divergence of
major groups of organisms. Katsanis, Fitzgibbon, and
Fisher (1996) presented phylogenetic trees for retinoid
X receptor (RXR) and pre-B-cell-leukemia transcription
factor (PBX) family (table 1) members, but they included only a small number of sequences, so little could be
inferred about divergence times. In addition, they based
their trees on similarity at nucleotide sites in the entire
coding regions of the genes; this is not a reliable basis
for reconstructing the relationships of genes as distantly
related as those analyzed by these authors, because synonymous nucleotide sites are saturated with changes and
thus convey no evolutionary information. Kasahara et
al. (1996) constructed a phylogenetic tree of just one of
the gene families with members on chromosomes 6 and
9, the proteasome component β (PSMB) gene family.
In the case of chromosomes 6 and 9, Kasahara et
al. (1996) proposed a complicated modification of the
hypothesis of block duplication. They proposed that four
tandem duplications involving three of the gene families
occurred prior to the alleged block duplication. According to these authors, the tandemly duplicated genes all
remained linked for many millions of years (in one case
over a billion years) until shortly after the alleged block
duplication, when four of the genes were deleted while
three others were translocated to different chromosomes.
No evidence was presented to support this complex scenario. Furthermore, for no apparent reason, Kasahara et
al. (1996) dated the alleged block duplication of chromosome 6 and 9 homologs by comparing LMP7 (on
human chromosome 6) with its homolog X (on human
chromosome 14). Gould and Lewontin (1979) warn evolutionary biologists against ad hoc modification of a favorite hypothesis in order to explain away discrepancies
between that hypothesis and the data; continued ad hoc
modification yields a hypothesis that is essentially nonfalsifiable and, thus, no longer scientific. Since the complex hypothesis of Kasahara et al. (1996) seems to be
an ad hoc modification of the hypothesis of block duplication, it is desirable to consider simple, testable alternatives. In the case of the pairs of homologous genes
on human chromosomes 6 and 9, the simplest such alternative is the hypothesis of block duplication.
Table 1
Gene Families with Members on Human Chromosomes 6, 9, and 1 and on Mouse Chromosomes 17, 2, and 1
Family                                           Human      Mouse     Human    Mouse   Human 1p13,  Mouse
                                                 6p21.3     17        9q33–34  2       1q21–25      1
Retinoid X receptor (RXR)                        RXRB       RXRB      RXRA     RXRA    RXRG         RXRG
α pro-collagen (COL)                             COL11A2    COL11A2   COL5A1   COL5A1  —            —
ATP-binding cassette transporter (ABC)           TAP1       TAP1      ABC2     ABC2    —            —
                                                 TAP2       TAP2
Proteasome component β (PSMB)                    LMP2       LMP2      PSMB7    PSMB7   —            —
                                                 LMP7       LMP7
Notch (NOTCH)                                    INT3       INT3      NOTCH1   NOTCH1  NOTCH2       —
Pre-B-cell-leukemia transcription factor (PBX)   PBX2       —         PBX3     —       PBX1         PBX1
Tenascin (TEN)                                   TNX        —         TNC      —       TNR          —
C3/C4/C5 complement component (C3/4/5)           C4A        C4        C5       —       —            —
                                                 C4B
Heat shock protein 70 (HSP70)                    HSP70-1    HSP70-1   GRP78    GRP78   —            —
                                                 HSP70-2    HSP70-3
                                                 HSP70-HOM  HSP70t
NOTE.—It is conventional to use lowercase letters to designate mouse genes; that convention has not been followed here in the interest of highlighting orthologous
relationships between human and mouse genes.
The purpose of the present study is to provide a
systematic test of the hypothesis of block duplication
for the gene clusters on human chromosomes 6, 9, and
1 (and for the corresponding gene clusters on mouse
chromosomes 17, 2, and 1) by means of phylogenetic
analysis of representative members of 9 of the 10 gene
families involved. The valine tRNA-synthetase family
was excluded, because no sequence is available in the
database that is known to correspond to the member of
this family mapped to human chromosome 9 (Walter et
al. 1987). By using sequences from major groups of
organisms, such analysis makes it possible to time gene
duplication events relative to major events of speciation
(‘‘cladogenesis’’) in the history of life that have given
rise to new kingdoms, phyla, or classes of organisms.
This, in turn, provides a test of the hypothesis that two
or more genes duplicated simultaneously that is independent of any estimate of the divergence of major taxa
that is derived from the fossil record and also does not
require the assumption of a constant rate of molecular
evolution (or a ‘‘molecular clock’’).
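The logic of this relative dating can be illustrated with a toy example. The Python sketch below is not the author's procedure: it assumes a tree encoded as nested tuples, the gene and taxon labels are hypothetical, and the helper names are invented for illustration. It asks whether some clade contains one paralog from both taxa while excluding the other paralog, which is the topology expected when the duplication predates the split between the two taxa.

```python
# Toy sketch: relative dating of a gene duplication from tree topology.
# A leaf is a (gene, taxon) pair of strings; an internal node is a tuple of
# child nodes. Gene names, taxa, and topologies are hypothetical.

def leaves(node):
    """Return all (gene, taxon) leaves below a node."""
    if isinstance(node[0], str):  # a leaf: ("gene", "taxon")
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def clades(node):
    """Yield the leaf set of every internal node, root included."""
    if isinstance(node[0], str):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def duplication_predates_split(tree, gene_a, gene_b, taxon_1, taxon_2):
    """True if gene_a from both taxa share a clade that contains no gene_b."""
    for clade in clades(tree):
        if any(gene == gene_b for gene, _ in clade):
            continue
        if (gene_a, taxon_1) in clade and (gene_a, taxon_2) in clade:
            return True
    return False


# Each paralog groups with its counterpart in the other taxon (old duplication).
old_tree = ((("A", "human"), ("A", "yeast")), (("B", "human"), ("B", "yeast")))
# The two human paralogs group together (duplication after the split).
recent_tree = ((("A", "human"), ("B", "human")), ("AB", "yeast"))

print(duplication_predates_split(old_tree, "A", "B", "human", "yeast"))
print(duplication_predates_split(recent_tree, "A", "B", "human", "yeast"))
```

The first call prints True and the second prints False. A real analysis would also weigh bootstrap support for the relevant internal branch before drawing the inference.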
Methods
Sequences Analyzed
Phylogenetic analyses involved a total of 165 sequences belonging to the nine multigene families (table
2), each of which was designated by an abbreviated
mnemonic symbol (table 1). The members of these families found on human chromosomes 6, 9, and 1 and on
mouse chromosomes 17, 2, and 1 are listed in table 1;
the gene names used in this table are the same as, or
similar to, those given by Kasahara et al. (1996) and
Katsanis, Fitzgibbon, and Fisher (1996). In the present
study, names for genes of nonhuman vertebrates were
chosen in such a way as to emphasize orthologous relationships (i.e., relationships of homology without gene
duplication). It is conventional to write names of mouse
genes with lowercase letters, in contrast to putatively
orthologous genes of humans, the names of which are
written with uppercase letters. Because of the emphasis
on clarifying orthologous relationships, this convention
was not followed here.
The sequences analyzed included those corresponding to genes involved in the putative block duplications
hypothesized by Kasahara et al. (1996) and Katsanis,
Fitzgibbon, and Fisher (1996) plus selected related sequences of other organisms. These latter sequences were
chosen from among those available in the database to
represent major taxa of organisms. The presence of sequences from major taxa of organisms made it possible
to date gene duplication events relative to major cladogenetic events.
In some cases, the sequences involved in the putative block duplication events were incomplete. The human INT3 and NOTCH1 sequences (the representatives
of the NOTCH family on human chromosomes 6 and 9,
respectively) were incomplete, as was the human ABC2
sequence. In the case of human NOTCH3, but not INT3,
the sequence available was too small to be included in
phylogenetic analyses. In the case of the ABC family,
the phylogenetic analysis was based only on the ATP-binding cassette (ABC cassette) region of the molecule
(see below). Several eukaryotic members of this family
have an internally duplicated structure, such that there
are both N-terminal and C-terminal ABC cassettes. Only
the N-terminal cassette region of human ABC2 is present, but that was included in the analysis.
Table 2
Sequences Used in Analyses (with GenBank accession numbers)
1. RXR family
  Arthropoda
    Drosophila (Drosophila melanogaster) RXR (X52591), HNF4 (U70874)
    Silkworm (Bombyx mori) RXR (U06073)
    Hornworm (Manduca sexta) RXR (U44837)
  Chordata
    Osteichthyes
      Zebrafish (Brachydanio rerio) RXRA (U29894), RXRD (U29941), RXRE (U29942), RXRG-like (U29940)
    Amphibia
      Clawed frog (Xenopus laevis) HNF4 (Z37526), RXRB1 (S73269), RXRB2 (X87366), RXRG (L11443)
    Aves
      Chicken (Gallus gallus) RXRG (X58997)
    Mammalia
      Mouse (Mus musculus) RXRA (X66223), RXRB (M84818), RXRG (M84819)
      Human (Homo sapiens) HNF4A (Z49825), HNF4G (Z49826), RXRA (X52773), RXRB (M84820), RXRG (U38480)
2. COL family
  Porifera
    Sponge (Ephydatia muelleri) COL (M34640)
  Annelida
    Lugworm (Arenicola marina) COL (U68412)
  Echinodermata
    Common sea urchin (Paracentrotus lividus) COLA1 (M25282), COLA2 (J05422)
    Purple sea urchin (Strongylocentrotus purpuratus) COLA1 (M92040), COLA2 (M92041)
  Chordata
    Osteichthyes
      Zebrafish COL2A1 (U23822)
    Aves
      Chicken COL1A1 (V00401), COL1A2 (J00812), COL2A1 (X02663), COL5A1 (M76730), COL11A1 (M88593)
    Mammalia
      Mouse COL2A1 (M65161), COL11A1 (D38162), COL11A2 (U16789)
      Rat (Rattus norvegicus) COL3A1 (X70369), COL11A1 (U20121)
      Chinese hamster (Cricetulus longicaudatus) COL5A1 (M76730)
      Rabbit (Oryctolagus cuniculus) COL1A2 (D49399)
      Human COL2A1 (J00116), COL3A1 (X06700), COL5A1 (M76729), COL11A1 (J04177), COL11A2 (L18987)
3. ABC family
  Archaea
    Sulfolobus solfataricus ABC (Y08256)
  Eubacteria
    Bordetella pertussis CyaB (X14199)
    Escherichia coli HlyB (M81823)
    Pasteurella haemolytica LktB (M20730)
    Pediococcus acidilactici PedD (M83924)
    Rhizobium galegae NodI (X87578)
    Rhizobium loti NodI (X55705)
    Streptococcus pneumoniae ComA (M36180)
    Streptomyces peucetius DRRA (M73758)
  Protista
    Entamoeba histolytica Pgp1 (M88599), Pgp2 (M88598)
  Animalia
    Chordata
      Mouse Pgp1 (M33581), TAP1 (X59615), TAP2 (M90459), ABC1 (X75926), ABC2 (X75927)
      Bovine (Bos taurus) CFTR (M76128)
      Human Pgp1 (X78081), TAP1 (X66401), TAP2 (X66401), ABC2 (U18235), ABC3 (U78735), CFTR (M28668)
4. PSMB family
  Fungi
    Yeast (Saccharomyces cerevisiae) PRG1 (M96667), PUP1 (X61189), PRE3 (X86020)
  Animalia
    Chordata
      Amphibia
        Clawed frog LMP7A (D44540), LMP7B (D44549)
      Mammalia
        Mouse LMP7 (U22035), LMP2 (U35323), PSMB7 (D83585), δ (U13393)
        Rat LMP7 (D10729), LMP2 (D10757), X (D45247), δ (D10754)
        Human LMP7 (Z14982), LMP2 (U01025), X (D29011), MECL-1 (X71874), PSMB7 (D38048), δ (D29012)
5. NOTCH family
  Arthropoda
    Drosophila NOTCH (M16149–M16153), crumbs (M33753)
    Blowfly (Lucilia cuprina) SCL (U58977)
  Chordata
    Osteichthyes
      Zebrafish NOTCH1 (X69088)
      Goldfish (Carassius auratus) NOTCH3 (U09191)
    Amphibia
      Clawed frog NOTCH1 (M33874)
    Aves
      Chicken serrate (X95283)
    Mammalia
      Mouse NOTCH1 (Z11886), NOTCH3 (X74760), NOTCH4 (U43691), INT3 (M80456)
      Rat NOTCH1 (X57405), jagged (L38483)
      Human NOTCH1 (M73980), NOTCH2 (M99437), INT3 (D63395), jagged (U61276)
6. PBX family
  Fungi
    Yeast YGL096W (Z72618), CUP9 (L36815)
  Plantae
    Barley (Hordeum vulgare) KNOX3 (X83518)
  Animalia
    Nematoda
      Caenorhabditis elegans F17A2.5 (Z68114), CEH-20 (U01303)
    Arthropoda
      Drosophila extradenticle (U33747)
    Chordata
      Mouse PBX1 (L27453)
      Human PBX1 (M86546), PBX2 (X59842), PBX3 (X59841)
7. TEN family
  Nematoda
    C. elegans R13F6 (U00046)
  Chordata
    Osteichthyes
      Zebrafish TENC (X89203)
    Aves
      Chicken TENC (M23121), TENY (X99062)
    Mammalia
      Mouse TENC (X56304), TENX (X73959)
      Pig (Sus scrofa) TENC-like (X61599)
      Human TENC (M55618), TENR (Z67996), TENX (X71937)
8. C3/4/5 family
  Chordata
    Agnatha
      Lamprey (Lampetra japonica) C3 (D10087)
    Gnathostomata
      Osteichthyes
        Trout (Oncorhynchus mykiss) C3 (L24433)
      Amphibia
        Clawed frog C3 (U19253), C4 (D78003)
      Reptilia
        Cobra (Naja naja) C3 (L02365)
      Aves
        Chicken C3 (U16848)
      Mammalia
        Guinea pig (Cavia porcellus) C3 (M34054), α2m (D84338)
        Mouse C3 (K02782), C4 (M11789), C5 (M35526), MUG1 (M65736), MUG2 (M65238)
        Rat C3 (X52477), α1m (M77183), α2m (J02635)
        Human C3 (K02765), C4A (M59815), C4B (U24578), C5 (M57729), α2m (M11313)
9. HSP70 family
  Eubacteria
    E. coli DnaK (K01298)
  Fungi
    Yeast SSC1 (M27229), GRP78 (M25064), SSA1 (L22015), SSA2 (X125927)
  Plantae
    Tomato (Lycopersicon esculentum) HSC-1 (X54029)
    Maize (Zea mays) HSP70 (X03658)
    Petunia (Petunia hybrida) HSP70 (X06932)
  Animalia
    Arthropoda
      Drosophila 87C1 (J01104)
    Chordata
      Clawed frog HSP70 (X01102)
      Chicken GRP78 (M27260), HSP70 (J02579)
      Mouse GRP78 (D78645), HSP70-1 (M76613), HSP70-3 (M35021), HSP70t (M32218), HSC70 (M19141), HSC70B (M20567)
      Human GRP78 (M19645), HSP70-1 (M59828), HSP70-2 (M59830), HSP70-HOM (M59829)
In the case of the HSP70 genes, there were conflicts
in nomenclature between various sources. There are
three members of this family on chromosome 17 of the
mouse, given the names Hsp70-1, Hsp70-3, and Hsp70t
by Kasahara et al. (1996) (called HSP70-1, HSP70-3,
and HSP70t in table 1). The first seems to correspond
to the gene called hsp70A1 by Perry et al. (1994), while
the second is the gene called HSP70A2 by those authors.
The protein products of these two genes are in fact identical. The third gene is apparently that called hsc70t by
Matsumoto and Fujimoto (1990).
Statistical Methods
Within each family, amino acid sequences were
aligned by the CLUSTAL V program (Higgins, Bleasby,
and Fuchs 1992), and the alignments were corrected by
eye in some cases. Because the sequences analyzed were
generally highly divergent, frequently representing more
than one kingdom of organisms, the alignment was often
felt to be reliable only for a certain conserved portion
of the protein. Therefore, only such conserved portions
of the protein were used in phylogenetic analyses. (The
alignments of these regions are available upon request
from the author.) The results of previous analyses helped
identify conserved regions of some proteins. Hughes
(1994a), in a study of the ABC family, found that only
the ABC cassette region of the protein, including both
N-terminal and C-terminal cassettes in the case of family members with internally duplicated structures, could
be reliably aligned between very distant members of this
family. In the case of the PSMB family, the region corresponding to the mature protein was previously found
to be more conserved than the cleaved propeptide
(Hughes 1997).
When any set of sequences was compared pairwise, any site at which the alignment postulated a gap
or at which the residue was unknown due to incomplete
sequence information was excluded from all pairwise
distance computations so that a comparable data set was
used in each comparison. The following are the numbers
of amino acid residues compared in phylogenetic analyses for each of the nine families, with a brief description
of the region of the protein used in analyses: (1) RXR:
284 amino acids (most of sequence, including both
DNA- and ligand-binding domains but excluding the
poorly aligned N-terminal region; Leid et al. 1992); (2)
COL: 110 amino acids (conserved C-terminal region);
(3) ABC: 72 amino acids (ABC cassettes; Luciani et al.
1994); (4) PSMB: 188 amino acids (mature protein); (5)
NOTCH: 175 amino acids (conserved central region);
(6) PBX: 95 amino acids (conserved central region of the
protein, including the homeobox domain; Monica et al.
1991); (7) TEN: 123 amino acids (conserved C-terminal region); (8) C3/4/5: 1,016 amino acids (mature protein; Halkier 1991); (9) HSP70: 584 amino acids (complete sequence).
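The exclusion of gapped or unknown sites from all comparisons (often called complete deletion) can be sketched as follows. This is a minimal illustration, not the software used in the study; the function name, the toy sequences, and the choice of `-`, `?`, and `X` as markers for gaps or unknown residues are assumptions.

```python
# Minimal sketch of complete deletion: keep only alignment columns in which
# every sequence has a known residue. The symbols treated as gap/unknown
# ("-", "?", "X") are an assumption for this example.

def complete_deletion(alignment, missing="-?X"):
    """Return a copy of the alignment restricted to fully informative columns."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(len(seqs[0]))
        if all(seq[i] not in missing for seq in seqs)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical five-column alignment of three sequences.
toy = {"seq1": "MK-LV", "seq2": "MKALV", "seq3": "M?ALI"}
filtered = complete_deletion(toy)
print(filtered)
```

In the toy alignment, columns 2 and 3 are dropped for every sequence, so all pairwise distances are computed over the same three sites.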
FIG. 1.—Phylogenetic tree of RXR family members.
Phylogenetic analyses were conducted by two different methods: (1) The neighbor-joining (NJ) method
(Saitou and Nei 1987) was used to construct trees on
the basis of three different distance measures: the uncorrected proportion of amino acid differences (p), the
estimate of the number of amino acid replacements per
site corrected for multiple hits by the Poisson formula
(Nei 1991), and the estimate of the number of amino
acid replacements per site corrected for multiple hits by
the gamma formula (Ota and Nei 1994). (2) Maximum-parsimony (MP) trees were constructed on the basis of
amino acid sequences using a heuristic search algorithm
(Swofford 1990). All of these methods produced essentially the same topologies. Therefore, in the following,
only NJ trees based on p are presented. The sequences
analyzed here were highly divergent from one another;
in such cases, use of p based on amino acids is preferable, because it has a low variance compared with other
distances (Kumar, Tamura, and Nei 1993). The NJ method is known to perform better than most others when
the rate of evolution differs in different branches of a
tree (Saitou and Nei 1987; Saitou and Imanishi 1989; Nei
1991). The reliability of clustering within phylogenetic
trees was tested by bootstrapping (which involves repeated pseudosampling, with replacement, of sites from
the data set and construction of trees based on pseudosamples; Felsenstein 1985); 1,000 bootstrap samples
were used. In the figures, the percentage of bootstrap
samples supporting a given branch is indicated on that
branch; only values of >50% are shown.
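The three distance measures and the site resampling step of the bootstrap can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the programs used in the study: the function names and toy sequences are hypothetical, the Poisson correction is taken as d = -ln(1 - p), the gamma correction as d = a[(1 - p)^(-1/a) - 1], and the gamma shape parameter a = 2 is an assumed value that the text does not report.

```python
import math
import random

def p_distance(seq1, seq2):
    """Uncorrected proportion of amino acid differences between aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def poisson_distance(p):
    """Poisson-corrected replacements per site: d = -ln(1 - p)."""
    return -math.log(1.0 - p)

def gamma_distance(p, a=2.0):
    """Gamma-corrected replacements per site; the shape a = 2 is an assumption."""
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)

def bootstrap_columns(seqs, rng):
    """Resample alignment columns with replacement (one bootstrap pseudosample)."""
    n = len(seqs[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[i] for i in cols) for seq in seqs]

# Hypothetical aligned fragments differing at one of four sites.
p = p_distance("MKLV", "MKIV")
print(p, poisson_distance(p), gamma_distance(p))

# One pseudosample; a full analysis would rebuild a tree from each of many.
rng = random.Random(0)
print(bootstrap_columns(["MKLV", "MKIV"], rng))
```

Both corrections exceed p and grow faster as p increases, which is one reason the uncorrected p has the lower variance for highly divergent sequences.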
For seven of the phylogenetic trees, a known outgroup was used to root the tree of the molecules of
interest. The other two trees (ABC and PSMB) were
rooted at the midpoint of the longest internal branch. In
these cases, even if the root of the entire tree was not
known, subsets of the tree could be considered as rooted
trees, being rooted by other portions of the tree.
Phylogenetic trees provided a way of placing gene
duplications in time relative to the divergence of major
taxa of organisms. This method of relative dating does
not depend on absolute divergence dates estimated from
the fossil record, nor does it depend on the assumption
of a constant rate of molecular evolution (‘‘molecular
clock’’). In addition to the relative dating of gene duplications, estimates of absolute divergence time were
also obtained. These were based on Poisson-corrected
amino acid distances (dAA) (Nei 1987, p. 41). Each was
calibrated by a date estimated from the fossil record
(summarized in Nei 1987). The standard error of mean
dAA for a set of comparisons was estimated by a method
analogous to that developed for nucleotide distances by
Nei and Jin (1989), which takes into account the covariance between distances.
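The arithmetic behind such a calibration can be sketched as follows, assuming the usual relationship d = 2λt between a Poisson-corrected distance d, a replacement rate λ per site per lineage, and a divergence time t. The function names and all numbers below are hypothetical and serve only to show the calculation; they are not values from this study.

```python
# Sketch of calibrating a divergence time from amino acid distances,
# assuming d = 2 * rate * t. All numeric values here are made up.

def rate_from_calibration(d_cal, t_cal_mya):
    """Replacement rate per site per lineage per million years."""
    return d_cal / (2.0 * t_cal_mya)

def divergence_time(d, rate):
    """Estimated divergence time in millions of years ago (MYA)."""
    return d / (2.0 * rate)

# Hypothetical calibration: mean distance 0.20 between two taxa whose
# split is dated at 300 MYA from the fossil record.
rate = rate_from_calibration(0.20, 300.0)

# Hypothetical mean distance of 0.40 between two duplicated genes.
print(divergence_time(0.40, rate))
```

Because the rate cancels, the estimate reduces to the calibration date multiplied by the ratio of the two distances, so any error in the fossil date or departure from rate constancy carries through directly.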
Results
Phylogenetic Analyses
The results of phylogenetic analyses are shown in
figures 1–9. I briefly discuss these results for each gene
family, highlighting the salient points of the tree with
regard to relative timing of duplications of genes on
human chromosomes 6, 9, and 1 and the divergence of
major taxa.
FIG. 2.—Phylogenetic tree of COL family members.
RXR
The RXR phylogeny was rooted with insect and
vertebrate HNF4 sequences (fig. 1). RXR genes from
three insects fell outside all of the vertebrate RXRA,
RXRB, and RXRG genes (fig. 1). The phylogeny suggests that RXRB diverged first, followed by RXRA and
RXRG, as hypothesized by Katsanis, Fitzgibbon, and
Fisher (1996); the bootstrap support for this pattern was
90% (fig. 1). Zebrafish genes were found to cluster with
mammalian RXRB, RXRA, and RXRG, but bootstrap
support for these clustering patterns was not strong.
Frog RXRB and RXRG genes clustered with their mammalian counterparts, and in each of these cases, there
was strong (99%) bootstrap support (fig. 1). The tree
thus suggests that RXRB, RXRA, and RXRG diverged
before the divergence of amphibians and amniotes and
probably before the divergence of tetrapods and bony
fishes.
COL
The COL tree was rooted with a sponge collagen
sequence. The relationships of collagens from annelids
and echinoderms were not well resolved by the tree, but
neither of two strongly supported clusters of vertebrate
collagens included any invertebrate sequence (fig. 2).
Both COL5A1 and COL11A2 fell within the same significantly supported (96%) cluster of vertebrate genes
and were closer to each other than to any invertebrate
genes (fig. 2). This topology suggests that the duplication of these two genes occurred after the divergence of echinoderms and chordates. Chicken and mammalian COL5A1 clustered together, a pattern that received high bootstrap support (fig. 2). Thus, the phylogeny indicates that the COL5A1–COL11A2 divergence
predated the divergence of birds and mammals.
COL11A1 clustered with COL5A1 (on human
chromosome 9), while COL11A2 (on human chromosome 6) clustered outside these two molecules; this pattern received 100% bootstrap support (fig. 2). It is of
interest that the human COL11A1 gene maps on chromosome 1 (Ayad et al. 1994). However, it is located in
1p21, outside the region of chromosome 1 hypothesized
by Katsanis, Fitzgibbon, and Fisher (1996) to have been
involved in a block duplication.
FIG. 3.—Phylogenetic tree of ABC family members.
ABC
The ABC transporters constitute an extensive family or superfamily of molecules found in archaebacteria,
eubacteria, and eukaryotes. In the phylogenetic analysis,
the TAP1 and TAP2 transporters, along with eukaryotic
Pgp molecules, clustered with ABC transporters from
purple bacteria such as Escherichia coli HlyB, a pattern
receiving strong bootstrap support (fig. 3). By contrast,
other bacterial and eukaryotic ABC transporters clustered outside this group. This pattern was observed in a
previous phylogenetic analysis (Hughes 1994a); it is
most easily explained under the hypothesis that the ancestor of TAP and Pgp genes was a mitochondrial gene
that was later translocated to the nuclear genome
(Hughes 1994a).
ABC2, the ABC transporter encoded on human
chromosome 9 and on mouse chromosome 2, clustered
with mammalian ABC1 and ABC2 and a number of
bacterial genes (fig. 3). Luciani et al. (1994) previously
noted the similarity of ABC1 and ABC2 to NodI transporters of Rhizobium, which are involved in nodulation
of plant roots (Evans and Downie 1986). In the present
analysis, the bacterial genes clustering with these mammalian genes included not only NodI, but also another
eubacterial transporter (from Streptomyces peucetius)
and a transporter from the archaebacterium Sulfolobus
solfataricus (fig. 3).
The relationship among eukaryotes, archaebacteria,
and eubacteria remains controversial, although some
molecular data support a eukaryote-archaebacteria clade
(Iwabe et al. 1989). The present data do not address this
issue. Nonetheless, the phylogeny of ABC transporters
(fig. 3) clearly supports the hypothesis that the duplication of ABC2 and the ancestor of TAP1 and TAP2
occurred prior to the divergence of eukaryotes from eubacteria.
PSMB
Kasahara et al. (1996) presented a phylogenetic tree
of the PSMB (β proteasome component) family. Hughes
(1997) analyzed both α and β proteasome components.
In both of these phylogenies, each of the PSMB members on human chromosomes 6 and 9 clustered closer
to a yeast molecule than it did to any other of the human
members of the family. The same pattern was seen in
the present analysis (fig. 4). LMP2 clustered with yeast
PRE3, LMP7 clustered with yeast PRG1, and PSMB7
clustered with yeast PUP1; each of these clusters received 100% bootstrap support (fig. 4). The only way
this phylogeny can be explained is that LMP2, LMP7,
and PSMB7 all diverged from each other before the divergence of animals and fungi.
FIG. 4.—Phylogenetic tree of PSMB family members.
NOTCH
The NOTCH tree was rooted with Drosophila
crumbs and vertebrate homologs (fig. 5). Human and
mouse INT3 and the related NOTCH4 clustered together
outside all other NOTCH sequences, including those of
both vertebrates and insects (fig. 5). This demonstrates
that INT3 diverged from NOTCH1 and NOTCH2 prior
to the divergence of protostomes (including Arthropoda
and Annelida) from deuterostomes (including Echinodermata and Chordata). A zebrafish gene clustered with
human and mouse NOTCH1, a pattern which received
strong (99%) bootstrap support (fig. 5). This pattern is
most consistent with the hypothesis that the duplication
of NOTCH1 and NOTCH2 occurred before the divergence of bony fishes from tetrapods. The phylogeny
supports the hypothesis that INT3 diverged from the ancestor of NOTCH1 and NOTCH2 before they diverged
from each other, as proposed by Katsanis, Fitzgibbon,
and Fisher (1996).
PBX
The PBX tree was rooted with a homologous sequence from yeast (fig. 6). Mammalian PBX1, PBX2,
and PBX3 all clustered together apart from invertebrate
sequences (fig. 6). PBX2 fell outside the cluster of
PBX1 and PBX3, a pattern which received strong (96%)
bootstrap support (fig. 6). The phylogeny thus supports
the hypothesis of Katsanis, Fitzgibbon, and Fisher
(1996) that PBX2 was the first of these three to diverge.
TEN
The TEN family tree was rooted with a homolog
from the nematode Caenorhabditis elegans (fig. 7). This
tree strongly supports the hypothesis that TENX diverged prior to the divergence of TENC from TENR, as
proposed by Katsanis, Fitzgibbon, and Fisher (1996).
The chicken molecule called TENY clustered with human and mouse TENX and is probably orthologous to
mammalian TENX. The fact that both mammalian
TENX and mammalian TENC clustered with bird homologs (fig. 7) supports the hypothesis that all three mammalian genes duplicated prior to the bird–mammal divergence. Mammal TENC and bird TENC also cluster
with a zebrafish gene (fig. 7); this suggests that these
duplications also must have occurred prior to the divergence of bony fishes and tetrapods.
FIG. 5.—Phylogenetic tree of NOTCH family members.
C3/4/5
The phylogeny of the C3/4/5 complement components was rooted with related molecules α2m, α1m, and
mouse MUG1 and MUG2 (fig. 8). As with a previous
analysis involving nonsynonymous sites in DNA sequences (Hughes 1994b), the phylogeny supported the
hypothesis that C5 diverged prior to the divergence of
C3 and C4. In the present phylogeny, this pattern received only modestly strong bootstrap support (85%); it
received much stronger support in a phylogeny of conserved portions of the molecule (Hughes 1994b). Nonaka and Takahashi (1992) proposed, without any phylogenetic analysis, that C4 was the first to diverge; but
this hypothesis received no support from either the previous or present phylogenetic analyses. The C3 genes
of jawed vertebrates clustered with a homolog from lamprey; this cluster received 98% bootstrap support (fig.
8). This suggests that the C3–C4 divergence must have
preceded the divergence of jawed and jawless vertebrates.
HSP70
One of the yeast members of the HSP70 family,
SSC1, is mitochondrially expressed, although encoded
in the nuclear genome (Craig et al. 1989). SSC1 is closely related to bacterial HSP70, supporting the hypothesis
that the SSC1 gene was originally a mitochondrial gene
that was translocated to the nucleus (Hughes 1993). In
the present phylogenetic analysis, SSC1 and E. coli
DnaK were used to root the tree (fig. 9). Vertebrate
GRP78 clustered with yeast GRP78, while other vertebrate members of the family clustered with yeast SSA1
and SSA2 (fig. 9), as was seen in a previous analysis
(Hughes 1993). Bootstrap support for both of these clusters was very strong (fig. 9). This phylogeny thus indicates that the divergence of GRP78 from human HSP70
genes on chromosome 6 (and from their homologs on
mouse chromosome 17) occurred prior to the divergence
of animals from fungi.
Divergence Times
Table 3 summarizes relative divergence times supported by the phylogenetic analyses in figures 1–9. In
each case, the percentage of bootstrap support for the
relevant internal branch in both NJ and MP trees is given. The results are inconsistent with the hypothesis that
864
Hughes
FIG. 6.—Phylogenetic tree of PBX family members.
all of the genes on human chromosomes 6 and 9 were
duplicated simultaneously early in vertebrate history, after the divergence of the jawless and jawed vertebrates
but prior to the divergence of bony fishes and tetrapods.
Rather, these duplications have occurred at different
points over a very long period of time. The ancestor of
TAP1 and TAP2 diverged from the ancestor of ABC2
prior to the divergence of eukaryotes from eubacteria.
The ancestors of both LMP2 and LMP7 diverged from
PSMB7 prior to the divergence of animals and fungi.
Likewise, GRP78 diverged from its homologs on human
chromosome 6 and mouse chromosome 17 prior to the
divergence of animals and fungi. INT3 diverged from
NOTCH1 and NOTCH2 prior to the divergence of protostomes from deuterostomes. Other duplications have
clearly occurred much more recently. There is evidence
that the PBX and RXR duplications occurred after the
divergence of deuterostomes from protostomes.
COL5A1 and COL11A2 appear to have diverged after
the divergence of echinoderms from chordates.
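The relative constraints stated in this paragraph can be turned into rough age bounds. The following is a minimal sketch for illustration only, not part of the original analysis: it pairs each constraint named above with the calibration dates given in the note to table 4, and the names `CALIBRATION_MYA`, `constraints`, and `age_bounds` are hypothetical.

```python
import math

# Calibration dates (MYA) as listed in the note to table 4.
CALIBRATION_MYA = {"Euk-Eub": 1500, "An-Fn": 1200, "Deu-Pro": 800, "Tet-Ost": 400}

# family: (duplication occurred BEFORE this organismal split,
#          duplication occurred AFTER this organismal split)
# Only constraints stated in the paragraph above are encoded here.
constraints = {
    "ABC": ("Euk-Eub", None),
    "PSMB": ("An-Fn", None),
    "HSP70": ("An-Fn", None),
    "NOTCH": ("Deu-Pro", None),
    "PBX": (None, "Deu-Pro"),
    "RXR": (None, "Deu-Pro"),
}


def age_bounds(before, after):
    """Return (minimum age, maximum age) in MYA implied by the constraints."""
    min_age = CALIBRATION_MYA[before] if before else 0
    max_age = CALIBRATION_MYA[after] if after else math.inf
    return min_age, max_age


bounds = {family: age_bounds(*c) for family, c in constraints.items()}
for family, (lo, hi) in bounds.items():
    print(f"{family}: older than {lo} MYA, younger than {hi} MYA")
```

Under these dates, the ABC duplication is older than 1,500 MYA while the PBX and RXR duplications are younger than 800 MYA, so the two sets of bounds cannot describe a single event.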
In spite of the potential pitfalls involved in estimating absolute divergence times, most of the estimates
obtained in the present case are remarkably consistent
with the relative time estimates obtained from phylogenetic analyses (tables 3 and 4). On the basis of these
dates, the related genes on human chromosomes 6 and
9 duplicated at times scattered over at least 1.6 billion
years of the earth’s history. There is one group of genes,
however, which were estimated to have duplicated at
approximately the same time. The duplication of RXR,
PBX, and TEN between chromosome 6 and the ancestors of the chromosome 9 and 1 homologs was estimated
to have occurred about 650 MYA (612–696 MYA). In
addition, the duplication of C5 from the ancestor of C3
and C4 and the duplication of COL11A2 and COL5A1
may have occurred around the same time (table 4). The
duplication of the chromosome 9 and chromosome 1
homologs of RXR, PBX, and TEN occurred shortly
thereafter, around 600 MYA (table 4). In the case of the
fourth gene family with homologs on all three chromosomes, NOTCH, the initial duplication (between the
chromosome 6 gene and the ancestor of the chromosome
9 and 1 genes) was estimated to have occurred much
earlier than the duplications of RXR, PBX, and TEN
family members (table 4). However, the duplication of
NOTCH family members between chromosomes 9 and
1 was estimated to have occurred around the same time
as that of RXR, PBX, and TEN (table 4). Thus, in at
least three, and possibly five, of the gene families, there
was a block duplication between the members on chromosome 6 and the ancestor of the members on chromosomes 9 and 1. Subsequently, three of these families,
plus NOTCH, were probably involved in a block duplication giving rise to chromosome 9 and 1 members.
FIG. 7.—Phylogenetic tree of TEN family members.
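As an informal check on the timing argument above, one can ask whether the 99% confidence intervals in table 4 share a common range. The sketch below is an assumption-laden illustration rather than the test used in the paper: overlap of intervals is only a heuristic for "about the same time", and the names `estimates` and `common_overlap` are hypothetical. The values are the estimates from table 4 for the initial duplication in each family (rows 6-1/9 or 6-9).

```python
# family: (time estimate, half-width of the 99% confidence interval), in MYA
estimates = {
    "RXR": (648, 149),
    "PBX": (612, 167),
    "TEN": (696, 90),
    "COL": (561, 181),
    "C3/4/5": (579, 44),
    "NOTCH": (1896, 365),
}


def common_overlap(families):
    """Return the range shared by all intervals, or None if there is none."""
    lows = [estimates[f][0] - estimates[f][1] for f in families]
    highs = [estimates[f][0] + estimates[f][1] for f in families]
    low, high = max(lows), min(highs)
    return (low, high) if low <= high else None


three = common_overlap(["RXR", "PBX", "TEN"])
five = common_overlap(["RXR", "PBX", "TEN", "COL", "C3/4/5"])
with_notch = common_overlap(["RXR", "PBX", "TEN", "NOTCH"])
print(three, five, with_notch)
```

The intervals for RXR, PBX, and TEN share a common range, as do those for all five families once COL and C3/4/5 are added, whereas the NOTCH interval overlaps none of them.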
Time estimation appeared to be least reliable in the
case of COL. The time estimate shown in table 4 is
based on the assumption that the annelid (lugworm)
COL gene diverged from the vertebrate genes at the time
of divergence of deuterostomes and protostomes (here
estimated at 800 MYA). Because the position of the annelid gene in the COL phylogeny is not very stable (fig.
2), this assumption is difficult to test. If the annelid COL
gene diverged before the protostome–deuterostome divergence, the time estimate given would be an underestimate. Using the same calibration as that in table 4,
the divergence time of vertebrate COL5A1 genes (on
human chromosome 9) and COL11A1 genes (on human
chromosome 1) is estimated at 322 MYA (±85 Myr,
99% confidence interval). This estimate is not unreasonable since the phylogeny (fig. 2) clearly shows that
the duplication predated the bird–mammal divergence
(311 MYA). However, if the same calibration is used to
date the bird–mammal divergence, it gives the impossibly recent figure of 90 MYA. Thus, the data for COL
imply either (1) that the annelid gene is inappropriate
for calibration because it diverged much earlier than the
protostome–deuterostome divergence or (2) that there has
been a substantial slowdown in the rate of evolution of
these genes within the vertebrates. Because of these
problems, it cannot, at present, be decided whether or
not the COL genes on chromosomes 6 and 9 were duplicated at the same time as PBX, RXR, and TEN.
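The dependence of these estimates on the calibration point can be made concrete. The sketch below assumes that each estimate is obtained by simple proportional scaling of the calibration date by the ratio of amino acid distances (a linear molecular clock); the section does not spell out the formula, so this is an assumption, and the function name `divergence_time` is hypothetical. The input values are copied from table 4, and the final line uses an invented calibration date of 1,000 MYA purely to show the direction of the effect.

```python
def divergence_time(d_aa, d_cal, t_cal):
    """Scale a calibration date by a ratio of distances (linear clock assumed)."""
    return t_cal * d_aa / d_cal


# name: (dAA, calibration dAA, calibration date in MYA), from table 4
rows = {
    "RXR 6-1/9": (27.2, 16.8, 400),
    "COL 6-9": (54.7, 78.0, 800),
    "ABC 6-9": (149.2, 104.6, 1500),
    "TEN 6-1/9": (83.3, 47.9, 400),
}

estimates = {name: divergence_time(*values) for name, values in rows.items()}
for name, t in estimates.items():
    print(f"{name}: {t:.0f} MYA")

# Hypothetical: if the annelid COL gene had diverged 1,000 MYA instead of
# 800 MYA, the COL estimate would rise in proportion.
col_if_older = divergence_time(54.7, 78.0, 1000)
print(f"COL 6-9 with an older calibration: {col_if_older:.0f} MYA")
```

For these four rows the scaling reproduces the table 4 estimates (648, 561, 2,140, and 696 MYA). Because the estimate is linear in the calibration date, an annelid divergence older than the assumed 800 MYA would push the COL estimate back, which is the sense in which the published figure would be an underestimate.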
Discussion
On the basis of the analyses reported here, the following conclusions can be made:
1. The clusters of homologous genes on human chromosomes 6 and 9 (and the corresponding clusters on
mouse chromosomes 17 and 2) did not all result from
a block duplication early in vertebrate history or at
any other time. Rather, the duplications occurred at various times
over a period of at least 1.6 billion years.
2. Four of the genes on human chromosomes 6 and 9
(those in the RXR, PBX, TEN, and C3/4/5 families) probably did duplicate as a block, although this
duplication may have occurred very early in vertebrate history, well prior to the divergence of jawed
and jawless vertebrates. The COL genes may also
have duplicated at the same time, although this hypothesis is less well supported. However, the members of the NOTCH family on chromosomes 6 and 9 did
not duplicate at the same time as the other three
genes but much earlier, contrary to the hypothesis of
Katsanis, Fitzgibbon, and Fisher (1996). Shortly
after that block duplication, RXR, NOTCH, PBX, and TEN family
members seem to have duplicated as a block between
chromosomes 9 and 1, as proposed by Katsanis, Fitzgibbon, and Fisher (1996).
3. In general, the existence of two clusters of homologous genes on different chromosomes cannot be
taken as evidence of a simultaneous or ‘‘block’’ duplication event in the absence of a phylogenetic analysis. Therefore, the technique of ‘‘paralogy mapping’’ proposed by Katsanis, Fitzgibbon, and Fisher
(1996), by which known clusters of genes are used
to predict the existence of unknown genes on other
chromosomes, should be used with caution.
FIG. 8.—Phylogenetic tree of C3/4/5 family members.
As mentioned previously, Kasahara et al. (1996)
proposed a very complicated hypothesis to explain the
pattern of relatedness of genes on human chromosomes
6 and 9, involving very ancient tandem duplications,
then block duplication followed by deletion and translocation. They proposed that the ABC2 and TAP genes
duplicated in tandem prior to the divergence of eukaryotes and eubacteria. These two genes remained in tandem until after the alleged block duplication, which occurred over a billion years later. Soon after this event,
however, the TAP gene in the chromosome 9 group was
deleted, as was the ABC2 gene in the chromosome 6
group. Note that this scenario is inconsistent with evidence from phylogenetic analysis suggesting that TAP
has a mitochondrial origin (Hughes 1994a). Likewise,
Kasahara et al. (1996) proposed tandem duplication of
HSP70 and GRP78 prior to the divergence of animals
and fungi, tandem association of these genes for millions of years, and, finally, after the alleged block duplication, deletion of HSP70 in the chromosome 9 group
and of GRP78 in the chromosome 6 group.
FIG. 9.—Phylogenetic tree of HSP70 family members.
The hypothesis of Kasahara et al. (1996) concerning PSMB is still more complex. Here, two events of
tandem duplication are proposed to have occurred prior
to the divergence of animals from fungi. Again, the duplicates are assumed to have remained in tandem until
after the alleged block duplication. At this point, one of
the genes from the chromosome 6 group is supposed to
have been translocated to another chromosome, while
two of the genes from the chromosome 9 group were
translocated independently to yet two other chromosomes.
There are many problems with these complex scenarios. First, no evidence whatsoever is presented in
support of them. Second, they imply that tandemly duplicated genes remained in close linkage for millions of
years—over one billion years in the case of ABC—and
yet no gene conversion occurred between them. The
supposed lack of interlocus recombination contradicts
much of what we know about tandemly arrayed duplicated genes (Ohta 1991; Hughes 1996). To explain the
occurrence on chromosomes 6 and 9 of members of just
three families (ABC, PSMB, and HSP70), Kasahara et
al. (1996) hypothesized no fewer than four tandem duplications, one block duplication, three translocations, and
four independent gene deletions (a total of 12 genetic
events). It is far more parsimonious to hypothesize that
the members of each of these families duplicated independently and that each was then independently translocated to chromosomes 6 and 9; this hypothesis requires at most nine genetic events (three duplications
and six translocations). The present analyses clearly
demonstrate that NOTCH family members on chromosomes 6 and 9 also duplicated at a much earlier time
than the alleged block duplication, indeed, before the
divergence of deuterostomes and protostomes. Rather
than invent yet another ‘‘just-so story’’ (Gould and Lewontin 1979) of tandem duplication and deletion in the
case of the NOTCH family, it is far more reasonable to
reject the hypothesis that ABC, HSP70, PSMB, or
NOTCH family members on chromosomes 6 and 9 were
ever involved in a block duplication event.
If the clusters of homologous genes on human
chromosomes 6 and 9 did not all duplicate simultaneously but, in fact, duplicated at various widely different times, it remains an open question why they are
clustered together. The two remaining hypotheses are (1)
that these associations are the result of chance and (2)
that there is an adaptive significance to this clustering;
in other words, that it has been favored by natural selection. In the case of human chromosomes 6 and 9, at
least four separate pairs of paralogs (from the ABC,
PSMB, NOTCH, and HSP70 families) have independently translocated to the two linkage groups. Such a
large number of independent events seems unlikely to
have occurred by chance and suggests that there is some
functional significance to clustering of these genes.
It has previously been hypothesized that a gene
may be translocated to a chromosomal location that is
selectively advantageous and that this rearranged genotype may become fixed due to natural selection. One
potential example involves the DAZ gene cluster, which
encodes proteins involved in spermatogenesis and was
translocated to the Y chromosome sometime in primate
evolution, before the divergence of humans and orangutans (Saxena et al. 1996; Menke, Mutter, and Page
1997). Some shared functional characteristics may account for the convergent accumulation of members of
10 gene families on chromosomes 6 and 9, although the
nature of these characteristics remains speculative.
Table 3
Summary of Gene Divergences Relative to Organismal Divergences as Indicated by Phylogenetic Analyses

Family    Human Chromosomes    Before^a               After^a
RXR       6-9                  Tet-Ost (95) [100]     Deu-Pro (100) [95]
          1-9                  Tet-Ost (95) [100]     Deu-Pro (100) [95]
COL       1-9                  Mam-Av (100) [100]     —
ABC       6-9                  Euk-Eub (95) [93]      —
PSMB      6-9                  An-Fn (100) [100]      —
NOTCH     6-9                  Deu-Pro (96) [79]      —
          1-9                  Tet-Ost (99) [100]     —
PBX       6-9                  —                      Deu-Pro (45) [87]
          1-9                  —                      Deu-Pro (45) [87]
TEN       6-9                  Tet-Ost (99) [82]      —
          1-9                  Tet-Ost (99) [82]      —
C3/4/5    6-9                  Gna-Ag (98) [71]       —
HSP70     6-9                  An-Fn (100) [99]       —

^a The number in parentheses is the bootstrap percentage for the relevant branch
in the NJ tree; the number in brackets is the bootstrap percentage for the corresponding branch in the MP tree. Organismal divergence events are abbreviated
as follows: An-Fn, Animalia–Fungi; Deu-Pro, Deuterostomes–Protostomes; Euk-Eub, Eukaryotes–Eubacteria; Gna-Ag, Gnathostomata–Agnatha; Mam-Av,
Mammalia–Aves; Tet-Ost, Tetrapoda–Osteichthyes.

Table 4
Estimates of Divergence Times Based on the Number of Amino Acid Replacements per 100 Sites (dAA) Calibrated with Dates from the Fossil Record

Family    Human Chromosomes    dAA ± SE        Calibration (dAA ± SE)    Time Estimate ± 99% Confidence Interval (MYA)
RXR       6-1/9                27.2 ± 2.4      Tet-Ost (16.8 ± 1.7)      648 ± 149
          1-9                  25.1 ± 2.6      Tet-Ost (16.8 ± 1.7)      596 ± 160
COL       6-9                  54.7 ± 6.9      Deu-Pro (78.0 ± 8.8)      561 ± 181
ABC       6-9                  149.2 ± 16.0    Euk-Eub (104.6 ± 7.6)     2,140 ± 592
PSMB      6-9                  134.1 ± 9.9     An-Fn (59.1 ± 5.7)        2,268 ± 430
NOTCH     6-1/9                66.7 ± 5.9      Deu-Pro (33.4 ± 3.8)      1,896 ± 365
          1-9                  25.5 ± 3.4      Deu-Pro (33.4 ± 3.8)      610 ± 209
PBX       6-1/9                18.7 ± 2.0      Deu-Pro (24.4 ± 2.6)      612 ± 167
          1-9                  18.2 ± 2.5      Deu-Pro (24.4 ± 2.6)      599 ± 209
TEN       6-1/9                83.3 ± 4.2      Tet-Ost (47.9 ± 3.2)      696 ± 90
          1-9                  74.5 ± 4.5      Tet-Ost (47.9 ± 3.2)      623 ± 98
C3/4/5    6-9                  131.2 ± 3.9     Gna-Ag (191.9 ± 2.6)      579 ± 44
HSP70     6-9                  44.7 ± 2.9      An-Fn (33.4 ± 2.4)        1,604 ± 268

NOTE.—Abbreviations for organismal divergence events are as in table 3.
Divergence times used in calibration were the following: Euk-Eub, 1,500 MYA;
An-Fn, 1,200 MYA; Deu-Pro, 800 MYA; Tet-Ost, 400 MYA.

One characteristic shared by many of the genes on
human 6p21.3 is a universal or very broad pattern of
expression (table 5). This is true, for example, of the
class I MHC genes and of those genes encoding molecules such as the TAP transporters and proteasome components which interact functionally with the class I molecules. 6p21.3 also includes genes for essential cellular
structural elements, such as histones and β tubulin, and
broadly expressed regulators of transcription, such as
PBX2 and ZNF173 (table 6). 9q33–34 also includes a
number of broadly expressed genes (table 6). The most
interesting of these is the Surfeit housekeeping gene
complex, which was recently described as ‘‘the tightest
mammalian gene cluster described so far’’ (Gilley,
Armes, and Fried 1997). The presence of the MHC in
6p21.3 and the Surfeit complex in 9q34 suggests the
intriguing hypothesis that such complexes of highly expressed genes may act as strong attractors over evolutionary time for other highly expressed genes. It may be
advantageous to locate highly expressed genes in
regions that are likely to be transcriptionally active in
most cells.
Another characteristic of many genes in 6p21.3 and
9q34 is that they encode unusually long polypeptide
chains (table 6). Often, these genes include large numbers of exons. For example, the C4A gene consists of
41 exons, while COL11A2 has 65 exons. It is possible
that it is advantageous to locate such large genes in
regions likely to be continually active transcriptionally,
because the process of transcription and splicing of the
pre-mRNA is quite complicated and presumably relatively time-consuming. Therefore, clusters of highly expressed genes such as the class I MHC and Surfeit clusters may serve to attract genes encoding large proteins.
If so, the fact that the C2, factor B, and C4 complement
components are large proteins may account for their
linkage to the MHC in mammals rather than any advantage arising from the fact that they have an immune
system function, although one that is essentially unrelated to that of the MHC class I and class II molecules
themselves.
Table 5
Genes with Broad to Universal Expression and Genes Encoding Large (>700 aa) Proteins Located in Human 6p21.3 and 9q33–34

Broad–universal expression^a
6p21.3     HLA class Ia, HSP70 homologs, histone H1.D, histone H2A.1, cyclin-dependent kinase inhibitor I, serine kinase, Ndr serine/threonine kinase, ZNF173, β tubulin, valine t-RNA synthetase, PBX2, LMP2, LMP7, TAP1, TAP2, RXRB, RD protein
9q33–34    PSMB7, ABC2, PBX3, GRP78, Surfeit gene cluster, gelsolin, RXRA, ribosomal protein L12, CAN
Large proteins^b
6p21.3     C4A (1,744), C4B (1,699), INT3 (>1,095), TENX (3,536), VARS2 (1,265), COL11A2 (1,629–1,736), phospholipase D (841), complement factor B (764), complement C2 (753), helicase-like (1,245), female sterile homeotic homolog (755)
9q33–34    Gelsolin (782), golgin-97 (767), CAN (2,090), TENC (2,203), C5 (1,676), COL5A1 (1,839), NOTCH1 (>2,444), ABC2 (1,472)^c

^a References for expression patterns: Chu et al. (1995); Dobner et al. (1991);
Eick et al. (1989); Gui, Lane, and Fu (1994); Hall et al. (1983); Harper et al.
(1993); Kraemer et al. (1994); Kwiatkowski et al. (1986); Levi-Strauss et al.
(1988); Milner and Campbell (1990); Millward, Cron, and Hemmings (1995);
Monaco (1992); Williams et al. (1988).
^b Numbers of amino acid residues are given in parentheses where known.
^c Based on mouse homolog.

The hypothesis that the clustering of paralogs on
chromosomes 6 and 9 is adaptive requires testing. One
way to test it would be to gather baseline data regarding
the rate of occurrence of such clusters in the genomes
of vertebrates; however, no genetic maps of vertebrates
are as yet sufficiently detailed to provide these data. It
has been estimated that the human genome contains
about 7 × 10^4 genes (Miklos and Rubin 1996). Assuming that these represent 1,000 gene families (Doolittle
1989), the average gene family would contain 70 members (although, obviously, some gene families contain
many more members; Miklos and Rubin 1996). If one
were to draw two genes at random from the genome,
the chance that they would belong to the same gene
family would be, on average, about 1 × 10^-3. Given
this probability, if one were then to draw two sets of
100 genes (about the number of genes in the MHC region), the chance that four or more genes in one set
would have paralogs in the other set would be less than
10^-5. Although rough, these calculations suggest that the
independent translocation of homologs from the ABC,
PSMB, NOTCH, and HSP70 families to chromosomes
6 and 9 may indeed represent a quite rare event and
therefore one for which an explanation in terms of natural selection is appealing.
Acknowledgments
This research was supported by grants R01-GM43940 and K04-GM000614 from the National Institutes of Health. I am grateful to Federica Verra for comments on the manuscript.
LITERATURE CITED
AYAD, S., R. P. BOOT-HANDFORD, M. J. HUMPHRIES, K. E.
KADLER, and C. A. SHUTTLEWORTH. 1994. The extracellular matrix factsbook. Academic Press, London.
CHU, T. W., A. CAPOSSELA, R. COLEMAN, V. L. GOEI, G. NALLUR, and J. R. GRUEN. 1995. Cloning of a new ‘‘finger’’
protein gene (ZNF173) within the class I region of the
human MHC. Genomics 29:229–239.
CRAIG, E. A., J. KRAMER, J. SHILLING, M. WERNER-WASHBURNE, S. HOLMES, J. KOSIC-SMITHERS, and C. M. NICOLET. 1989. SSC1, an essential member of the yeast
HSP70 multigene family, encodes a mitochondrial protein. Mol. Cell. Biol. 9:3000–3008.
DOBNER, T., I. WOLF, B. MAI, and M. LIPP. 1991. A novel
divergently transcribed human histone H2A/H2B gene
pair. DNA Seq. 1:409–413.
DOOLITTLE, R. F. 1989. Redundancies in protein sequences.
Pp. 599–623 in G. D. FASMAN, ed. Prediction of protein
structure and the principles of protein conformation. Plenum Press, New York.
EICK, S., M. NICOLAI, D. MUMBERG, and D. DOENECKE. 1989.
Human H1 histones: conserved and varied sequence elements in two H1 subtype sequences. J. Cell Biol. 49:110–
115.
EVANS, I. J., and J. A. DOWNIE. 1986. The nod I gene product
of Rhizobium leguminosarum is closely related to ATP-binding bacterial transport proteins: nucleotide sequence
analysis of the nod I and nod J genes. Gene 43:95–105.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an
approach using the bootstrap. Evolution 39:783–791.
GILLEY, J., N. ARMES, and M. FRIED. 1997. Fugu genome is
not a good mammalian model. Nature 385:305–306.
GOULD, S. J., and R. C. LEWONTIN. 1979. The spandrels of
San Marco and the Panglossian paradigm: a critique of the
adaptationist programme. Proc. R. Soc. Lond. Biol. Sci.
205:581–598.
GUI, J.-F., W. S. LANE, and X.-D. FU. 1994. A serine kinase
regulates intracellular localization of splicing factors in the
cell cycle. Nature 369:678–682.
HALL, J. L., L. DUDLEY, P. R. DOBNER, S. A. LEWIS, and N.
J. COWAN. 1983. Identification of two human β-tubulin
isotypes. Mol. Cell. Biol. 3:854–862.
HARPER, J. W., G. R. ADAMI, N. WEI, K. KEYOMARSI, and S.
J. ELLEDGE. 1993. The p21 Cdk-interacting protein Cip1
is a potent inhibitor of G1 cyclin-dependent kinases. Cell
75:805–816.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1992. CLUSTAL V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8:189–191.
HUGHES, A. L. 1993. Nonlinear relationships among evolutionary rates identify regions of functional divergence in
heat-shock protein 70 genes. Mol. Biol. Evol. 10:243–255.
HUGHES, A. L. 1994a. Evolution of the ATP-binding-cassette transmembrane transporters of vertebrates. Mol. Biol. Evol. 11:
899–910.
HUGHES, A. L. 1994b. Phylogeny of the C3/C4/C5 complement
component gene family indicates that C5 diverged first.
Mol. Biol. Evol. 11:417–425.
HUGHES, A. L. 1996. Gene duplication and recombination in the
evolution of mammalian Fc receptors. J. Mol. Evol. 42:
247–256.
HUGHES, A. L. 1997. Evolution of the proteasome components. Immunogenetics 46:82–92.
IWABE, N., K. KUMA, M. HASEGAWA, S. OSAWA, and T. MIYATA. 1989. Evolutionary relationships of archaebacteria,
eubacteria, and eukaryotes inferred from phylogenetic
trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86:
9355–9359.
KASAHARA, M., M. HAYASHI, K. TANAKA, H. INOKO, K. SUGAYA, T. IKEMURA, and T. ISHIBASHI. 1996. Chromosomal
localization of the proteasome Z subunit gene reveals an
ancient chromosomal duplication involving the major histocompatibility complex. Proc. Natl. Acad. Sci. USA 93:
9096–9101.
KATSANIS, N., J. FITZGIBBON, and E. M. C. FISHER. 1996.
Paralogy mapping: identification of a region in the human
MHC triplicated onto human chromosomes 1 and 9 allows
the prediction and isolation of novel PBX and NOTCH
loci. Genomics 35:101–108.
KLEIN, J. 1986. Natural history of the major histocompatibility complex. Wiley, New York.
KRAEMER, D., R. W. WOZNIAK, G. BLOBEL, and A. RADU.
1994. The human CAN protein, a putative oncogene product associated with myeloid leukemogenesis, is a nuclear
pore complex protein that faces the cytoplasm. Proc. Natl.
Acad. Sci. USA 91:1519–1523.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.0. Pennsylvania State University, University Park.
KWIATKOWSKI, D. J., T. P. STOSSEL, S. H. ORKIN, J. E. MOLE,
H. R. COLTEN, and H. L. YIN. 1986. Plasma and cytosolic gelsolins are encoded by a single gene and contain a duplicated actin-binding domain. Nature 323:455–458.
LEID, M., P. KASTNER, R. LYONS et al. (11 co-authors). 1992.
Purification, cloning, and RXR identity of the HeLa cell
factor with which RAR or TR heterodimerizes to bind target sequences efficiently. Cell 68:377–395.
LEVI-STRAUSS, M., M. C. CARROLL, M. STEINMETZ, and T.
MEO. 1988. A previously undetected MHC gene with an
unusual periodic structure. Science 240:201–204.
LUCIANI, M. F., F. DENIZOT, S. SAVARY, M. G. MATTEI, and
G. CHIMINI. 1994. Cloning of two novel ABC transporters
mapping on human chromosome 9. Genomics 21:150–
159.
MATSUMOTO, M., and H. FUJIMOTO. 1990. Cloning of a
hsp70-related gene expressed in mouse spermatids. Biochem. Biophys. Res. Commun. 166:43–49.
MENKE, D. B., G. L. MUTTER, and D. C. PAGE. 1997. Expression of DAZ, an Azoospermia factor candidate, in human spermatogonia. Am. J. Hum. Genet. 60:237–241.
MIKLOS, G. L. G., and G. M. RUBIN. 1996. The role of the
genome project in determining gene function: insight from
model organisms. Cell 86:521–529.
MILLWARD, T., P. CRON, and B. A. HEMMINGS. 1995. Molecular cloning and characterization of a conserved nuclear
serine (threonine) protein kinase. Proc. Natl. Acad. Sci.
USA 92:5022–5026.
MILNER, C. M., and R. D. CAMPBELL. 1990. Structure and
expression of the three MHC-linked HSP70 genes. Immunogenetics 32:242–251.
MONACO, J. J. 1992. A molecular model of MHC class-I-restricted antigen processing. Immunol. Today 13:173–
178.
MONICA, K., N. GALILI, J. NOURSE, D. SALTMAN, and M. L.
CLEARY. 1991. PBX2 and PBX3, new homeobox genes
with extensive homology to the human proto-oncogene
PBX1. Mol. Cell. Biol. 11:6149–6157.
NEI, M. 1987. Molecular evolutionary genetics. Columbia
University Press, New York.
NEI, M. 1991. Relative efficiencies of different tree making
methods for molecular data. Pp. 90–128 in M. M. MIYAMOTO and J. L. CRACRAFT, eds. Recent advances in phylogenetic studies of DNA sequences. Oxford University
Press, Oxford.
NEI, M., and L. JIN. 1989. Variances of the average numbers
of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290–300.
NONAKA, M., and M. TAKAHASHI. 1992. Complete complementary DNA sequence of the third component of complement of lamprey. J. Immunol. 148:3290–3295.
OHTA, T. 1991. Multigene families and the evolution of complexity. J. Mol. Evol. 33:31–41.
OTA, T., and M. NEI. 1994. Estimation of the number of amino acid substitutions per site when the substitution rate
varies among sites. J. Mol. Evol. 38:642–643.
PERRY, M. D., L. AUJAME, S. SHTANG, and L. A. MORAN.
1994. Structure and expression of an inducible HSP70-encoding gene from Mus musculus. Gene 146:273–278.
SAITOU, N., and M. IMANISHI. 1989. Relative efficiencies of
the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining
methods of phylogenetic tree reconstruction in obtaining
the correct tree. Mol. Biol. Evol. 6:514–525.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method:
a new method for reconstructing phylogenetic trees. Mol.
Biol. Evol. 4:406–425.
SAXENA, R., L. G. BROWN, T. HAWKINS et al. (11 co-authors).
1996. The DAZ gene cluster on the human Y chromosome
arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nat. Genet. 14:292–299.
SWOFFORD, D. L. 1990. PAUP: phylogenetic analysis using
parsimony. Illinois Natural History Survey, Champaign.
TOMITA, M., T. KINOSHITA, S. IZUMI, S. TOMINO, and K.
YOSHIZATO. 1994. Characterizations of sea urchin fibrillar
collagen and its cDNA clone. Biochim. Biophys. Acta
1217:131–140.
WALTER, B., A. YEN, J. WASMUTH, and M. SMITH. 1987.
Selection of somatic cell hybrids containing human chromosome 9 using a temperature sensitive CHO valyl-t-RNA
synthetase mutant. Cytogenet. Cell Genet. 46:710.
WILLIAMS, T., J. YOU, C. HUXLEY, and M. FRIED. 1988. The
mouse surfeit locus contains a very tight cluster of four
‘‘housekeeping’’ genes that is conserved through evolution. Proc. Natl. Acad. Sci. USA 85:3527–3530.
MANOLO GOUY, reviewing editor
Accepted March 19, 1998