Phylogenetic Tests of the Hypothesis of Block Duplication of Homologous Genes on Human Chromosomes 6, 9, and 1

Austin L. Hughes
Department of Biology and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University

There are 10 gene families that have members on both human chromosome 6 (6p21.3, the location of the human major histocompatibility complex [MHC]) and human chromosome 9 (mostly 9q33–34). Six of these families also have members on mouse chromosome 17 (the mouse MHC chromosome) and mouse chromosome 2. In addition, four of these families have members on human chromosome 1 (1q21–25 and 1p13), and two of these have members on mouse chromosome 1. One hypothesis to explain these patterns is that members of the 10 gene families of human chromosomes 6 and 9 were duplicated simultaneously as a result of polyploidization or duplication of a chromosome segment (“block duplication”). A subsequent block duplication has been proposed to account for the presence of representatives of four of these families on human chromosome 1. Phylogenetic analyses of the 9 gene families for which data were available decisively rejected the hypothesis of block duplication as an overall explanation of these patterns. Three to five of the genes on human chromosomes 6 and 9 probably duplicated simultaneously early in vertebrate history, prior to the divergence of jawed and jawless vertebrates, and shortly after that, all four of the genes on chromosomes 1 and 9 probably duplicated as a block. However, the other genes duplicated at different times scattered over at least 1.6 billion years. Since the occurrence of these clusters of related genes cannot be explained by block duplication, one alternative explanation is that they cluster together because of shared functional characteristics relating to expression patterns.
Key words: adaptive evolution, gene duplication, genome structure, major histocompatibility complex.

Address for correspondence and reprints: Austin L. Hughes, Department of Biology, 208 Mueller Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16803.

Mol. Biol. Evol. 15(7):854–870. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

As increasing numbers of genes are sequenced and mapped in eukaryotes, it has sometimes been found that a number of genes clustered together show evidence of an evolutionary relationship (homology) to genes forming a cluster on another chromosome. Frequently, such a pattern is attributed to an ancient event of block duplication, that is, duplication of an entire chromosome segment either as a result of a whole-genome duplication (polyploidization) or by duplication of one chromosomal segment followed, perhaps at a much later time, by its translocation to another chromosome. For example, in humans, there are 15 genes on chromosome 6 (location 6p21.3), belonging to 10 gene families, which show evidence of homology to 10 genes on chromosome 9 (9 of them in 9q33–34) (table 1; Kasahara et al. 1996). Human chromosome 6 bears the genes of the major histocompatibility complex (MHC) (Klein 1986), and these 15 genes are located in the MHC class II and class III regions (Kasahara et al. 1996). Of these 10 gene families, 6 also have members on chromosome 17 (the MHC chromosome) of the mouse, Mus musculus (table 1), and these 6 families also have representatives on mouse chromosome 2 (table 1; Kasahara et al. 1996). Kasahara et al. (1996, p. 9099) attributed this pattern to an ancient block duplication event that took place “at an early stage of vertebrate evolution, probably at or before the emergence of bony fish but after the emergence of the jawless fishes.”

Similarly, four of the gene families with representatives on human chromosomes 6 and 9 (and with representatives on mouse chromosomes 17 and 2) also include members on human chromosome 1 (1p13 and 1q21–25) (table 1; Katsanis, Fitzgibbon, and Fisher 1996). Two of these four families also have representatives on mouse chromosome 1 (table 1; Katsanis, Fitzgibbon, and Fisher 1996). This pattern was also attributed to an ancient block duplication by Katsanis, Fitzgibbon, and Fisher (1996).

If two or more genes were duplicated simultaneously as the result of a block duplication event, this should be revealed by phylogenetic analyses. In the case of the genes on human chromosomes 6, 9, and 1, few of the gene families involved have been subjected to phylogenetic analyses sufficiently rigorous to determine the time of gene duplication relative to the divergence of major groups of organisms. Katsanis, Fitzgibbon, and Fisher (1996) presented phylogenetic trees for retinoid X receptor (RXR) and pre-B-cell-leukemia transcription factor (PBX) family (table 1) members, but they included only a small number of sequences, so little could be inferred about divergence times. In addition, they based their trees on similarity at nucleotide sites in the entire coding regions of the genes; this is not a reliable basis for reconstructing the relationships of genes as distantly related as those analyzed by these authors, because synonymous nucleotide sites are saturated with changes and thus convey no evolutionary information. Kasahara et al. (1996) constructed a phylogenetic tree of just one of the gene families with members on chromosomes 6 and 9, the proteasome component β (PSMB) gene family.

In the case of chromosomes 6 and 9, Kasahara et al. (1996) proposed a complicated modification of the hypothesis of block duplication. They proposed that four tandem duplications involving three of the gene families occurred prior to the alleged block duplication.
According to these authors, the tandemly duplicated genes all remained linked for many millions of years (in one case over a billion years) until shortly after the alleged block duplication, when four of the genes were deleted while three others were translocated to different chromosomes. No evidence was presented to support this complex scenario. Furthermore, for no apparent reason, Kasahara et al. (1996) dated the alleged block duplication of the chromosome 6 and 9 homologs by comparing LMP7 (on human chromosome 6) with its homolog X (on human chromosome 14).

Table 1
Gene Families with Members on Human Chromosomes 6, 9, and 1 and on Mouse Chromosomes 17, 2, and 1

Family | Human 6p21.3 | Mouse 17 | Human 9q33–34 | Mouse 2 | Human 1q21–25, 1p13 | Mouse 1
Retinoid X receptor (RXR) | RXRB | RXRB | RXRA | RXRA | RXRG | RXRG
α pro-collagen (COL) | COL11A2 | COL11A2 | COL5A1 | COL5A1 | — | —
ATP-binding cassette transporter (ABC) | TAP1, TAP2 | TAP1, TAP2 | ABC2 | ABC2 | — | —
Proteasome component β (PSMB) | LMP2, LMP7 | LMP2, LMP7 | PSMB7 | PSMB7 | — | —
Notch (NOTCH) | INT3 | INT3 | NOTCH1 | NOTCH1 | NOTCH2 | —
Pre-B-cell-leukemia transcription factor (PBX) | PBX2 | — | PBX3 | — | PBX1 | PBX1
Tenascin (TEN) | TNX | — | TNC | — | TNR | —
C3/C4/C5 complement component (C3/4/5) | C4A, C4B | C4 | C5 | — | — | —
Heat shock protein 70 (HSP70) | HSP70-1, HSP70-2, HSP70-HOM | HSP70-1, HSP70-3, HSP70t | GRP78 | GRP78 | — | —

NOTE.—It is conventional to use lowercase letters to designate mouse genes; that convention has not been followed here in the interest of highlighting orthologous relationships between human and mouse genes.
Gould and Lewontin (1979) warn evolutionary biologists against ad hoc modification of a favorite hypothesis in order to explain away discrepancies between that hypothesis and the data; continued ad hoc modification yields a hypothesis that is essentially nonfalsifiable and thus no longer scientific. Since the complex hypothesis of Kasahara et al. (1996) seems to be an ad hoc modification of the hypothesis of block duplication, it is desirable to consider simple, testable alternatives. In the case of the pairs of homologous genes on human chromosomes 6 and 9, the simplest such alternative is the unmodified hypothesis of block duplication, which predicts that all of these pairs of homologs arose in a single, simultaneous duplication.

The purpose of the present study is to provide a systematic test of the hypothesis of block duplication for the gene clusters on human chromosomes 6, 9, and 1 (and for the corresponding gene clusters on mouse chromosomes 17, 2, and 1) by means of phylogenetic analysis of representative members of 9 of the 10 gene families involved. The valine tRNA-synthetase family was excluded because no sequence is available in the database that is known to correspond to the member of this family mapped to human chromosome 9 (Walter et al. 1987). By using sequences from major groups of organisms, such analysis makes it possible to time gene duplication events relative to major events of speciation (“cladogenesis”) in the history of life that have given rise to new kingdoms, phyla, or classes of organisms. This, in turn, provides a test of whether two or more genes duplicated simultaneously, a test that is independent of any fossil-based estimate of the divergence times of major taxa and does not require the assumption of a constant rate of molecular evolution (a “molecular clock”).

Methods

Sequences Analyzed

Phylogenetic analyses involved a total of 165 sequences belonging to the nine multigene families (table 2), each of which was designated by an abbreviated mnemonic symbol (table 1).
The members of these families found on human chromosomes 6, 9, and 1 and on mouse chromosomes 17, 2, and 1 are listed in table 1; the gene names used in this table are the same as, or similar to, those given by Kasahara et al. (1996) and Katsanis, Fitzgibbon, and Fisher (1996). In the present study, names for genes of nonhuman vertebrates were chosen so as to emphasize orthologous relationships (i.e., relationships of homology without gene duplication). It is conventional to write names of mouse genes with lowercase letters, in contrast to putatively orthologous human genes, the names of which are written with uppercase letters. Because of the emphasis on clarifying orthologous relationships, this convention was not followed here.

The sequences analyzed included those corresponding to genes involved in the putative block duplications hypothesized by Kasahara et al. (1996) and Katsanis, Fitzgibbon, and Fisher (1996), plus selected related sequences of other organisms. These latter sequences were chosen from among those available in the database to represent major taxa of organisms; their presence made it possible to date gene duplication events relative to major cladogenetic events.

In some cases, the sequences involved in the putative block duplication events were incomplete. The human INT3 and NOTCH1 sequences (the representatives of the NOTCH family on human chromosomes 6 and 9, respectively) were incomplete, as was the human ABC2 sequence. In the case of human NOTCH3, but not INT3, the sequence available was too short to be included in phylogenetic analyses. In the case of the ABC family, the phylogenetic analysis was based only on the ATP-binding cassette (ABC cassette) region of the molecule (see below). Several eukaryotic members of this family have an internally duplicated structure, with both N-terminal and C-terminal ABC cassettes. Only the N-terminal cassette region of human ABC2 is available, but it was included in the analysis.

In the case of the HSP70 genes, there were conflicts in nomenclature among various sources. There are three members of this family on mouse chromosome 17, given the names Hsp70-1, Hsp70-3, and Hsp70t by Kasahara et al. (1996) (called HSP70-1, HSP70-3, and HSP70t in table 1). The first seems to correspond to the gene called hsp70A1 by Perry et al. (1994), while the second is the gene called hsp70A2 by those authors. The protein products of these two genes are in fact identical. The third gene is apparently the one called hsc70t by Matsumoto and Fujimoto (1990).

Table 2
Sequences Used in Analyses (with GenBank accession numbers)

1. RXR family
   Arthropoda
      Drosophila (Drosophila melanogaster): RXR (X52591), HNF4 (U70874)
      Silkworm (Bombyx mori): RXR (U06073)
      Hornworm (Manduca sexta): RXR (U44837)
   Chordata
      Osteichthyes
         Zebrafish (Brachydanio rerio): RXRA (U29894), RXRD (U29941), RXRE (U29942), RXRG-like (U29940)
      Amphibia
         Clawed frog (Xenopus laevis): HNF4 (Z37526), RXRB1 (S73269), RXRB2 (X87366), RXRG (L11443)
      Aves
         Chicken (Gallus gallus): RXRG (X58997)
      Mammalia
         Mouse (Mus musculus): RXRA (X66223), RXRB (M84818), RXRG (M84819)
         Human (Homo sapiens): HNF4A (Z49825), HNF4G (Z49826), RXRA (X52773), RXRB (M84820), RXRG (U38480)

2. COL family
   Porifera
      Sponge (Ephydatia muelleri): COL (M34640)
   Annelida
      Lugworm (Arenicola marina): COL (U68412)
   Echinodermata
      Common sea urchin (Paracentrotus lividus): COLA1 (M25282), COLA2 (J05422)
      Purple sea urchin (Strongylocentrotus purpuratus): COLA1 (M92040), COLA2 (M92041)
   Chordata
      Osteichthyes
         Zebrafish: COL2A1 (U23822)
      Aves
         Chicken: COL1A1 (V00401), COL1A2 (J00812), COL2A1 (X02663), COL5A1 (M76730), COL11A1 (M88593)
      Mammalia
         Mouse: COL2A1 (M65161), COL11A1 (D38162), COL11A2 (U16789)
         Rat (Rattus norvegicus): COL3A1 (X70369), COL11A1 (U20121)
         Chinese hamster (Cricetulus longicaudatus): COL5A1 (M76730)
         Rabbit (Oryctolagus cuniculus): COL1A2 (D49399)
         Human: COL2A1 (J00116), COL3A1 (X06700), COL5A1 (M76729), COL11A1 (J04177), COL11A2 (L18987)

3. ABC family
   Archaea
      Sulfolobus solfataricus: ABC (Y08256)
   Eubacteria
      Bordetella pertussis: CyaB (X14199)
      Escherichia coli: HlyB (M81823)
      Pasteurella haemolytica: LktB (M20730)
      Pediococcus acidilactici: PedD (M83924)
      Rhizobium galegae: NodI (X87578)
      Rhizobium loti: NodI (X55705)
      Streptococcus pneumoniae: ComA (M36180)
      Streptomyces peucetius: DRRA (M73758)
   Protista
      Entamoeba histolytica: Pgp1 (M88599), Pgp2 (M88598)
   Animalia
      Chordata
         Mouse: Pgp1 (M33581), TAP1 (X59615), TAP2 (M90459), ABC1 (X75926), ABC2 (X75927)
         Bovine (Bos taurus): CFTR (M76128)
         Human: Pgp1 (X78081), TAP1 (X66401), TAP2 (X66401), ABC2 (U18235), ABC3 (U78735), CFTR (M28668)

4. PSMB family
   Fungi
      Yeast (Saccharomyces cerevisiae): PRG1 (M96667), PUP1 (X61189), PRE3 (X86020)
   Animalia
      Chordata
         Amphibia
            Clawed frog: LMP7A (D44540), LMP7B (D44549)
         Mammalia
            Mouse: LMP7 (U22035), LMP2 (U35323), PSMB7 (D83585), δ (U13393)
            Rat: LMP7 (D10729), LMP2 (D10757), X (D45247), δ (D10754)
            Human: LMP7 (Z14982), LMP2 (U01025), X (D29011), MECL-1 (X71874), PSMB7 (D38048), δ (D29012)

5. NOTCH family
   Arthropoda
      Drosophila: NOTCH (M16149–M16153), crumbs (M33753)
      Blowfly (Lucilia cuprina): SCL (U58977)
   Chordata
      Osteichthyes
         Zebrafish: NOTCH1 (X69088)
         Goldfish (Carassius auratus): NOTCH3 (U09191)
      Amphibia
         Clawed frog: NOTCH1 (M33874)
      Aves
         Chicken: serrate (X95283)
      Mammalia
         Mouse: NOTCH1 (Z11886), NOTCH3 (X74760), NOTCH4 (U43691), INT3 (M80456)
         Rat: NOTCH1 (X57405), jagged (L38483)
         Human: NOTCH1 (M73980), NOTCH2 (M99437), INT3 (D63395), jagged (U61276)

6. PBX family
   Fungi
      Yeast: YGL096W (Z72618), CUP9 (L36815)
   Plantae
      Barley (Hordeum vulgare): KNOX3 (X83518)
   Animalia
      Nematoda
         Caenorhabditis elegans: F17A2.5 (Z68114), CEH-20 (U01303)
      Arthropoda
         Drosophila: extradenticle (U33747)
      Chordata
         Mouse: PBX1 (L27453)
         Human: PBX1 (M86546), PBX2 (X59842), PBX3 (X59841)

7. TEN family
   Nematoda
      C. elegans: R13F6 (U00046)
   Chordata
      Osteichthyes
         Zebrafish: TENC (X89203)
      Aves
         Chicken: TENC (M23121), TENY (X99062)
      Mammalia
         Mouse: TENC (X56304), TENX (X73959)
         Pig (Sus scrofa): TENC-like (X61599)
         Human: TENC (M55618), TENR (Z67996), TENX (X71937)

8. C3/4/5 family
   Chordata
      Agnatha
         Lamprey (Lampetra japonica): C3 (D10087)
      Gnathostomata
         Osteichthyes
            Trout (Oncorhynchus mykiss): C3 (L24433)
         Amphibia
            Clawed frog: C3 (U19253), C4 (D78003)
         Reptilia
            Cobra (Naja naja): C3 (L02365)
         Aves
            Chicken: C3 (U16848)
         Mammalia
            Guinea pig (Cavia porcellus): C3 (M34054), α2m (D84338)
            Mouse: C3 (K02782), C4 (M11789), C5 (M35526), MUG1 (M65736), MUG2 (M65238)
            Rat: C3 (X52477), α1m (M77183), α2m (J02635)
            Human: C3 (K02765), C4A (M59815), C4B (U24578), C5 (M57729), α2m (M11313)

9. HSP70 family
   Eubacteria
      E. coli: DnaK (K01298)
   Fungi
      Yeast: SSC1 (M27229), GRP78 (M25064), SSA1 (L22015), SSA2 (X125927)
   Plantae
      Tomato (Lycopersicon esculentum): HSC-1 (X54029)
      Maize (Zea mays): HSP70 (X03658)
      Petunia (Petunia hybrida): HSP70 (X06932)
   Animalia
      Arthropoda
         Drosophila: 87C1 (J01104)
      Chordata
         Clawed frog: HSP70 (X01102)
         Chicken: GRP78 (M27260), HSP70 (J02579)
         Mouse: GRP78 (D78645), HSP70-1 (M76613), HSP70-3 (M35021), HSP70t (M32218), HSC70 (M19141), HSC70B (M20567)
         Human: GRP78 (M19645), HSP70-1 (M59828), HSP70-2 (M59830), HSP70-HOM (M59829)
Statistical Methods

Within each family, amino acid sequences were aligned with the CLUSTAL V program (Higgins, Bleasby, and Fuchs 1992), and in some cases the alignments were corrected by eye. Because the sequences analyzed were generally highly divergent, frequently representing more than one kingdom of organisms, the alignment was often considered reliable only for a certain conserved portion of the protein. Therefore, only such conserved portions of the protein were used in phylogenetic analyses. (The alignments of these regions are available upon request from the author.)

The results of previous analyses helped identify conserved regions of some proteins. Hughes (1994a), in a study of the ABC family, found that only the ABC cassette region of the protein (including both N-terminal and C-terminal cassettes in family members with internally duplicated structures) could be reliably aligned between very distant members of this family. In the case of the PSMB family, the region corresponding to the mature protein was previously found to be more conserved than the cleaved propeptide (Hughes 1997). When a set of sequences was compared pairwise, any site at which the alignment postulated a gap, or at which the residue was unknown because of incomplete sequence information, was excluded from all pairwise distance computations, so that a comparable data set was used in each comparison.

The numbers of amino acid residues compared in phylogenetic analyses for each of the nine families, with a brief description of the region of the protein used, were as follows:
(1) RXR: 284 amino acids (most of the sequence, including both DNA- and ligand-binding domains but excluding the poorly aligned N-terminal region; Leid et al. 1992);
(2) COL: 110 amino acids (conserved C-terminal region);
(3) ABC: 72 amino acids (ABC cassettes; Luciani et al. 1994);
(4) PSMB: 188 amino acids (mature protein);
(5) NOTCH: 175 amino acids (conserved central region);
(6) PBX: 95 amino acids (conserved central region of the protein, including the homeobox domain; Monica et al. 1991);
(7) TEN: 123 amino acids (conserved C-terminal region);
(8) C3/4/5: 1,016 amino acids (mature protein; Halkier 1991);
(9) HSP70: 584 amino acids (complete sequence).

Phylogenetic analyses were conducted by two different methods. (1) The neighbor-joining (NJ) method (Saitou and Nei 1987) was used to construct trees on the basis of three different distance measures: the uncorrected proportion of amino acid differences (p), the estimated number of amino acid replacements per site corrected for multiple hits by the Poisson formula (Nei 1991), and the estimated number of amino acid replacements per site corrected for multiple hits by the gamma formula (Ota and Nei 1994). (2) Maximum-parsimony (MP) trees were constructed from the amino acid sequences using a heuristic search algorithm (Swofford 1990). All of these methods produced essentially the same topologies; therefore, only NJ trees based on p are presented here. Because the sequences analyzed were highly divergent from one another, p based on amino acids is preferable, since it has a low variance compared with other distances (Kumar, Tamura, and Nei 1993). The NJ method is known to perform better than most other methods when the rate of evolution differs among branches of a tree (Saitou and Nei 1987; Saitou and Imanishi 1989; Nei 1991). The reliability of clustering patterns within phylogenetic trees was tested by bootstrapping, which involves repeated pseudosampling of sites from the data set, with replacement, and construction of a tree from each pseudosample (Felsenstein 1985); 1,000 bootstrap samples were used.

FIG. 1.—Phylogenetic tree of RXR family members.
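The site-exclusion rule, the three distance measures, bootstrap resampling, and neighbor joining described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in the study (CLUSTAL V alignment and the MP heuristic search are not reproduced). It assumes gaps are coded '-' and unknown residues '?' or 'X', applies the exclusion rule to all sequences at once, takes the gamma shape parameter a as an input (its value in the study is not given here), and uses a plain textbook neighbor-joining loop that labels each new internal node by its Newick subtree.

```python
import math
import random
from typing import Dict, List, Tuple

# Assumption: gaps are coded '-', unknown residues '?' or 'X'.
MISSING = set("-?X")

def complete_deletion(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep only alignment columns with no gap or unknown residue in any sequence."""
    names = list(seqs)
    length = len(seqs[names[0]])
    keep = [i for i in range(length)
            if all(seqs[n][i] not in MISSING for n in names)]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}

def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of amino acid sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def poisson_distance(p: float) -> float:
    """Replacements per site corrected by the Poisson formula: d = -ln(1 - p)."""
    return -math.log(1.0 - p)

def gamma_distance(p: float, shape: float) -> float:
    """Gamma-corrected replacements per site: d = a[(1 - p)^(-1/a) - 1]."""
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)

def bootstrap_replicate(seqs: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudosample: alignment columns drawn with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(s[i] for i in cols) for n, s in seqs.items()}

def neighbor_joining(names: List[str], d: Dict[Tuple[str, str], float]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick form."""
    dist: Dict[Tuple[str, str], float] = {}
    for (i, j), v in d.items():
        dist[(i, j)] = dist[(j, i)] = v
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(dist[(i, k)] for k in nodes if k != i) for i in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * dist[(i, j)] - r[i] - r[j]  # NJ selection criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        bi = 0.5 * dist[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))  # branch to i
        bj = dist[(i, j)] - bi                                   # branch to j
        u = f"({i}:{bi:.4f},{j}:{bj:.4f})"  # new node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                dist[(u, k)] = dist[(k, u)] = 0.5 * (dist[(i, k)] + dist[(j, k)] - dist[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{dist[(a, b)]:.4f});"
```

To mimic the workflow, one would apply complete_deletion once, feed pairwise p_distance values to neighbor_joining, and repeat on 1,000 bootstrap_replicate pseudosamples, counting how often each cluster recurs; that counting step is not shown. As the gamma shape a grows large, gamma_distance approaches poisson_distance.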
In the figures, the percentage of bootstrap samples supporting a given branch is indicated on that branch; only values ≥50% are shown. For seven of the phylogenetic trees, a known outgroup was used to root the tree of the molecules of interest. The other two trees (ABC and PSMB) were rooted at the midpoint of the longest internal branch. In these cases, even though the root of the entire tree was not known, subsets of the tree could be treated as rooted trees, being rooted by other portions of the tree.

Phylogenetic trees provided a way of placing gene duplications in time relative to the divergence of major taxa of organisms. This method of relative dating depends neither on absolute divergence dates estimated from the fossil record nor on the assumption of a constant rate of molecular evolution (a “molecular clock”). In addition to the relative dating of gene duplications, estimates of absolute divergence time were also obtained. These were based on Poisson-corrected amino acid distances (dAA) (Nei 1987, p. 41), each calibrated by a date estimated from the fossil record (summarized in Nei 1987). The standard error of the mean dAA for a set of comparisons was estimated by a method analogous to that developed for nucleotide distances by Nei and Jin (1989), which takes into account the covariance between distances.

Results

Phylogenetic Analyses

The results of phylogenetic analyses are shown in figures 1–9. I briefly discuss these results for each gene family, highlighting the salient points of each tree with regard to the relative timing of duplications of genes on human chromosomes 6, 9, and 1 and the divergence of major taxa.

FIG. 2.—Phylogenetic tree of COL family members.

RXR

The RXR phylogeny was rooted with insect and vertebrate HNF4 sequences (fig. 1). RXR genes from three insects fell outside all of the vertebrate RXRA, RXRB, and RXRG genes (fig. 1).
The phylogeny suggests that RXRB diverged first, followed by RXRA and RXRG, as hypothesized by Katsanis, Fitzgibbon, and Fisher (1996); the bootstrap support for this pattern was 90% (fig. 1). Zebrafish genes clustered with mammalian RXRB, RXRA, and RXRG, but bootstrap support for these clustering patterns was not strong. Frog RXRB and RXRG genes clustered with their mammalian counterparts, in each case with strong (99%) bootstrap support (fig. 1). The tree thus suggests that RXRB, RXRA, and RXRG diverged before the divergence of amphibians and amniotes and probably before the divergence of tetrapods and bony fishes.

COL

The COL tree was rooted with a sponge collagen sequence. The relationships of collagens from annelids and echinoderms were not well resolved by the tree, but neither of two strongly supported clusters of vertebrate collagens included any invertebrate sequence (fig. 2). Both COL5A1 and COL11A2 fell within the same significantly supported (96%) cluster of vertebrate genes and were closer to each other than to any invertebrate genes (fig. 2). This topology suggests that the duplication of these two genes occurred after the divergence of echinoderms and chordates. Chicken and mammalian COL5A1 clustered together, a pattern that received high bootstrap support (fig. 2). Thus, the phylogeny indicates that the COL5A1–COL11A2 divergence predated the divergence of birds and mammals. COL11A1 clustered with COL5A1 (on human chromosome 9), while COL11A2 (on human chromosome 6) clustered outside these two molecules; this pattern received 100% bootstrap support (fig. 2). It is of interest that the human COL11A1 gene maps to chromosome 1 (Ayad et al. 1994). However, it is located in 1p21, outside the region of chromosome 1 hypothesized by Katsanis, Fitzgibbon, and Fisher (1996) to have been involved in a block duplication.

ABC

The ABC transporters constitute an extensive family or superfamily of molecules found in archaebacteria, eubacteria, and eukaryotes. In the phylogenetic analysis, the TAP1 and TAP2 transporters, along with eukaryotic Pgp molecules, clustered with ABC transporters from purple bacteria such as Escherichia coli HlyB, a pattern receiving strong bootstrap support (fig. 3). By contrast, other bacterial and eukaryotic ABC transporters clustered outside this group. This pattern was observed in a previous phylogenetic analysis (Hughes 1994a); it is most easily explained by the hypothesis that the ancestor of TAP and Pgp genes was a mitochondrial gene that was later translocated to the nuclear genome (Hughes 1994a). ABC2, the ABC transporter encoded on human chromosome 9 and on mouse chromosome 2, clustered with mammalian ABC1 and ABC2 and a number of bacterial genes (fig. 3). Luciani et al. (1994) previously noted the similarity of ABC1 and ABC2 to the NodI transporters of Rhizobium, which are involved in nodulation of plant roots (Evans and Downie 1986). In the present analysis, the bacterial genes clustering with these mammalian genes included not only NodI but also another eubacterial transporter (from Streptomyces peucetius) and a transporter from the archaebacterium Sulfolobus solfataricus (fig. 3). The relationship among eukaryotes, archaebacteria, and eubacteria remains controversial, although some molecular data support a eukaryote–archaebacteria clade (Iwabe et al. 1989). The present data do not address this issue. Nonetheless, the phylogeny of ABC transporters (fig. 3) clearly supports the hypothesis that the duplication of ABC2 and the ancestor of TAP1 and TAP2 occurred prior to the divergence of eukaryotes from eubacteria.

FIG. 3.—Phylogenetic tree of ABC family members.

PSMB

Kasahara et al. (1996) presented a phylogenetic tree of the PSMB (β proteasome component) family, and Hughes (1997) analyzed both α and β proteasome components. In both of these phylogenies, each of the PSMB members on human chromosomes 6 and 9 clustered closer to a yeast molecule than to any of the other human members of the family. The same pattern was seen in the present analysis (fig. 4). LMP2 clustered with yeast PRE3, LMP7 with yeast PRG1, and PSMB7 with yeast PUP1; each of these clusters received 100% bootstrap support (fig. 4). This phylogeny can be explained only if LMP2, LMP7, and PSMB7 all diverged from each other before the divergence of animals and fungi.

FIG. 4.—Phylogenetic tree of PSMB family members.

NOTCH

The NOTCH tree was rooted with Drosophila crumbs and vertebrate homologs (fig. 5). Human and mouse INT3 and the related NOTCH4 clustered together outside all other NOTCH sequences, including those of both vertebrates and insects (fig. 5). This demonstrates that INT3 diverged from NOTCH1 and NOTCH2 prior to the divergence of protostomes (including Arthropoda and Annelida) from deuterostomes (including Echinodermata and Chordata). A zebrafish gene clustered with human and mouse NOTCH1, a pattern that received strong (99%) bootstrap support (fig. 5). This pattern is most consistent with the hypothesis that the duplication of NOTCH1 and NOTCH2 occurred before the divergence of bony fishes from tetrapods. The phylogeny supports the hypothesis that INT3 diverged from the ancestor of NOTCH1 and NOTCH2 before they diverged from each other, as proposed by Katsanis, Fitzgibbon, and Fisher (1996).

PBX

The PBX tree was rooted with a homologous sequence from yeast (fig. 6). Mammalian PBX1, PBX2, and PBX3 all clustered together apart from invertebrate sequences (fig. 6). PBX2 fell outside the cluster of PBX1 and PBX3, a pattern that received strong (96%) bootstrap support (fig. 6).
The phylogeny thus supports the hypothesis of Katsanis, Fitzgibbon, and Fisher (1996) that PBX2 was the first of these three to diverge.

TEN

The TEN family tree was rooted with a homolog from the nematode Caenorhabditis elegans (fig. 7). This tree strongly supports the hypothesis that TENX diverged prior to the divergence of TENC from TENR, as proposed by Katsanis, Fitzgibbon, and Fisher (1996). The chicken molecule called TENY clustered with human and mouse TENX and is probably orthologous to mammalian TENX. The fact that both mammalian TENX and mammalian TENC clustered with bird homologs (fig. 7) supports the hypothesis that all three mammalian genes duplicated prior to the bird–mammal divergence. Mammalian TENC and bird TENC also clustered with a zebrafish gene (fig. 7), which suggests that these duplications must also have occurred prior to the divergence of bony fishes and tetrapods.

FIG. 5.—Phylogenetic tree of NOTCH family members.

C3/4/5

The phylogeny of the C3/4/5 complement components was rooted with the related molecules α2m, α1m, and mouse MUG1 and MUG2 (fig. 8). As in a previous analysis of nonsynonymous sites in DNA sequences (Hughes 1994b), the phylogeny supported the hypothesis that C5 diverged prior to the divergence of C3 and C4. In the present phylogeny, this pattern received only moderately strong bootstrap support (85%); it received much stronger support in a phylogeny of conserved portions of the molecule (Hughes 1994b). Nonaka and Takahashi (1992) proposed, without any phylogenetic analysis, that C4 was the first to diverge, but this hypothesis received no support from either the previous or the present phylogenetic analyses. The C3 genes of jawed vertebrates clustered with a homolog from lamprey; this cluster received 98% bootstrap support (fig. 8). Because the C3–C4 divergence followed the divergence of C5, this suggests that the C4–C5 divergence must have preceded the divergence of jawed and jawless vertebrates.
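The family-by-family inferences above rest on one topological rule, used for example in the PSMB tree (LMP2 with yeast PRE3, LMP7 with yeast PRG1) and in the lamprey argument for C3/4/5: if the smallest clade containing two paralogs from one genome also contains genes from both of two taxon groups, the duplication must be older than the split between those groups, whatever the rates of evolution. A minimal sketch follows, assuming gene trees encoded as nested Python tuples with leaves labelled "taxon:gene". The toy trees keep only the pairings stated in the text; how those pairs join one another is an arbitrary assumption, not a reproduction of the published figures.

```python
from typing import List, Set, Tuple, Union

# A gene tree as nested tuples; each leaf is a "taxon:gene" label (an assumption).
Tree = Union[str, Tuple["Tree", ...]]

def leaves(clade: Tree) -> List[str]:
    """All leaf labels under a clade."""
    if isinstance(clade, str):
        return [clade]
    out: List[str] = []
    for child in clade:
        out.extend(leaves(child))
    return out

def lca_clade(tree: Tree, a: str, b: str) -> Tree:
    """Smallest clade containing both leaves a and b (both must be present)."""
    if not isinstance(tree, str):
        for child in tree:
            names = leaves(child)
            if a in names and b in names:
                return lca_clade(child, a, b)
    return tree

def duplication_predates_split(tree: Tree, paralog1: str, paralog2: str,
                               group_a: Set[str], group_b: Set[str]) -> bool:
    """True if the clade descended from the duplication that produced two
    paralogs of one genome contains genes from both taxon groups, so the
    duplication must be older than the split between the groups.
    False only means the tree gives no such evidence."""
    clade = lca_clade(tree, paralog1, paralog2)
    taxa = {label.split(":", 1)[0] for label in leaves(clade)}
    return bool(taxa & group_a) and bool(taxa & group_b)

# Hypothetical toy trees: pairings follow the text, higher-level joins are arbitrary.
PSMB_TOY: Tree = ((("human:LMP2", "yeast:PRE3"), ("human:LMP7", "yeast:PRG1")),
                  ("human:PSMB7", "yeast:PUP1"))
COL_TOY: Tree = ("sea_urchin:COLA1",
                 (("chicken:COL5A1", "human:COL5A1"), "human:COL11A2"))

print(duplication_predates_split(PSMB_TOY, "human:LMP2", "human:LMP7",
                                 {"human"}, {"yeast"}))  # True
print(duplication_predates_split(COL_TOY, "human:COL5A1", "human:COL11A2",
                                 {"human", "chicken"}, {"sea_urchin"}))  # False
```

A False result is not proof of a recent duplication: gene loss or unsampled sequences can hide an older one, which is why the text phrases such cases as "suggests" rather than "demonstrates".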
HSP70

One of the yeast members of the HSP70 family, SSC1, is expressed in mitochondria, although it is encoded in the nuclear genome (Craig et al. 1989). SSC1 is closely related to bacterial HSP70, supporting the hypothesis that the SSC1 gene was originally a mitochondrial gene that was translocated to the nucleus (Hughes 1993). In the present phylogenetic analysis, SSC1 and E. coli DnaK were used to root the tree (fig. 9). Vertebrate GRP78 clustered with yeast GRP78, while other vertebrate members of the family clustered with yeast SSA1 and SSA2 (fig. 9), as was seen in a previous analysis (Hughes 1993). Bootstrap support for both of these clusters was very strong (fig. 9). This phylogeny thus indicates that the divergence of GRP78 from the human HSP70 genes on chromosome 6 (and from their homologs on mouse chromosome 17) occurred prior to the divergence of animals from fungi.

FIG. 6.—Phylogenetic tree of PBX family members.

Divergence Times

Table 3 summarizes the relative divergence times supported by the phylogenetic analyses in figures 1–9. In each case, the percentage of bootstrap support for the relevant internal branch in both NJ and MP trees is given. The results are inconsistent with the hypothesis that all of the genes on human chromosomes 6 and 9 were duplicated simultaneously early in vertebrate history, after the divergence of the jawless and jawed vertebrates but prior to the divergence of bony fishes and tetrapods. Rather, these duplications occurred at different points over a very long period of time. The ancestor of TAP1 and TAP2 diverged from the ancestor of ABC2 prior to the divergence of eukaryotes from eubacteria. The ancestors of both LMP2 and LMP7 diverged from PSMB7 prior to the divergence of animals and fungi. Likewise, GRP78 diverged from its homologs on human chromosome 6 and mouse chromosome 17 prior to the divergence of animals and fungi. INT3 diverged from NOTCH1 and NOTCH2 prior to the divergence of protostomes from deuterostomes.
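The absolute dates discussed in the rest of this section rest on the calibration described under Statistical Methods: Poisson-corrected amino acid distances scaled by one divergence dated from the fossil record. A minimal sketch, assuming the standard relationship T = d/(2r) for two lineages evolving at a common rate r, with r fixed by a calibration pair whose split is dated at t_cal. The Nei and Jin standard-error method, which accounts for covariance among distances, is not reproduced, and all numbers in the example are hypothetical rather than data from the study.

```python
import math
from typing import List

def mean_poisson_distance(p_values: List[float]) -> float:
    """Mean Poisson-corrected amino acid distance, d = -ln(1 - p), over comparisons."""
    ds = [-math.log(1.0 - p) for p in p_values]
    return sum(ds) / len(ds)

def calibrated_time(d: float, d_cal: float, t_cal_myr: float) -> float:
    """Divergence time T = d / (2r), where the rate r = d_cal / (2 * t_cal)
    is fixed by one pair of sequences whose split is dated from fossils."""
    rate = d_cal / (2.0 * t_cal_myr)  # replacements per site per lineage per Myr
    return d / (2.0 * rate)

# Hypothetical p values, not data from the study:
d_cal = mean_poisson_distance([0.60, 0.62])  # calibration comparisons
d_dup = mean_poisson_distance([0.50, 0.52])  # paralog comparisons to be dated
print(round(calibrated_time(d_dup, d_cal, 800.0)))
```

Because T equals t_cal multiplied by d/d_cal, every estimate scales linearly with the assumed calibration date: if the calibration pair actually split earlier than assumed, all derived dates are underestimates by the same factor.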
Other duplications have clearly occurred much more recently. There is evidence that the PBX and RXR duplications occurred after the divergence of deuterostomes from protostomes. COL5A1 and COL11A2 appear to have diverged after the divergence of echinoderms from chordates.

In spite of the potential pitfalls involved in estimating absolute divergence times, most of the estimates obtained in the present case are remarkably consistent with the relative time estimates obtained from phylogenetic analyses (tables 3 and 4). On the basis of these dates, the related genes on human chromosomes 6 and 9 duplicated at times scattered over at least 1.6 billion years of the earth’s history.

There is one group of genes, however, that were estimated to have duplicated at approximately the same time. The duplication of RXR, PBX, and TEN between chromosome 6 and the ancestors of the chromosome 9 and 1 homologs was estimated to have occurred about 650 MYA (612–696 MYA). In addition, the duplication of C5 from the ancestor of C3 and C4 and the duplication of COL11A2 and COL5A1 may have occurred around the same time (table 4). The duplication of the chromosome 9 and chromosome 1 homologs of RXR, PBX, and TEN occurred shortly thereafter, around 600 MYA (table 4). In the case of the fourth gene family with homologs on all three chromosomes, NOTCH, the initial duplication (between the chromosome 6 gene and the ancestor of the chromosome 9 and 1 genes) was estimated to have occurred much earlier than the duplications of RXR, PBX, and TEN family members (table 4). However, the duplication of NOTCH family members between chromosomes 9 and 1 was estimated to have occurred around the same time as that of RXR, PBX, and TEN (table 4). Thus, in at least three, and possibly five, of the gene families, there was a block duplication between the members on chromosome 6 and the ancestor of the members on chromosomes 9 and 1. Subsequently, three of these families, plus NOTCH, were probably involved in a block duplication giving rise to the chromosome 9 and 1 members.

FIG. 7. Phylogenetic tree of TEN family members.

Time estimation appeared to be least reliable in the case of COL. The time estimate shown in table 4 is based on the assumption that the annelid (lugworm) COL gene diverged from the vertebrate genes at the time of divergence of deuterostomes and protostomes (here estimated at 800 MYA). Because the position of the annelid gene in the COL phylogeny is not very stable (fig. 2), this assumption is difficult to test. If the annelid COL gene diverged before the protostome–deuterostome divergence, the time estimate given would be an underestimate. Using the same calibration as in table 4, the divergence time of vertebrate COL5A1 genes (on human chromosome 9) and COL11A1 genes (on human chromosome 1) is estimated at 322 ± 85 MYA (99% confidence interval). This estimate is not unreasonable, since the phylogeny (fig. 2) clearly shows that the duplication predated the bird–mammal divergence (311 MYA). However, if the same calibration is used to date the bird–mammal divergence, it gives the impossibly recent figure of 90 MYA. Thus, the data for COL imply either (1) that the annelid gene is inappropriate for calibration because it diverged much earlier than the protostome–deuterostome divergence or (2) that there has been a substantial slowdown in the rate of evolution of these genes within the vertebrates. Because of these problems, it cannot at present be decided whether the COL genes on chromosomes 6 and 9 were duplicated at the same time as PBX, RXR, and TEN.

Discussion

On the basis of the analyses reported here, the following conclusions can be made:

1. The clusters of homologous genes on human chromosomes 6 and 9 (and the corresponding clusters on mouse chromosomes 17 and 2) did not all result from a block duplication early in vertebrate history or at any other time. Rather, the duplications occurred at various times over a period of at least 1.6 billion years.

2. Four of the genes on human chromosomes 6 and 9 (those in the RXR, PBX, TEN, and C3/4/5 families) probably did duplicate as a block, although this duplication may have occurred very early in vertebrate history, well prior to the divergence of jawed and jawless vertebrates. The COL genes may also have duplicated at the same time, although this hypothesis is less well supported. However, the members of the NOTCH family on chromosomes 6 and 9 did not duplicate at the same time as the RXR, PBX, and TEN genes but much earlier, contrary to the hypothesis of Katsanis, Fitzgibbon, and Fisher (1996). Shortly after that first block duplication, RXR, NOTCH, PBX, and TEN family members seem to have duplicated as a block between chromosomes 9 and 1, as proposed by Katsanis, Fitzgibbon, and Fisher (1996).

3. In general, the existence of two clusters of homologous genes on different chromosomes cannot be taken as evidence of a simultaneous or “block” duplication event in the absence of a phylogenetic analysis. Therefore, the technique of “paralogy mapping” proposed by Katsanis, Fitzgibbon, and Fisher (1996), by which known clusters of genes are used to predict the existence of unknown genes on other chromosomes, should be used with caution.

FIG. 8. Phylogenetic tree of C3/4/5 family members.

As mentioned previously, Kasahara et al. (1996) proposed a very complicated hypothesis to explain the pattern of relatedness of genes on human chromosomes 6 and 9, involving very ancient tandem duplications, then block duplication followed by deletion and translocation. They proposed that the ABC2 and TAP genes duplicated in tandem prior to the divergence of eukaryotes and eubacteria.
These two genes remained in tandem until after the alleged block duplication, which occurred over a billion years later. Soon after this event, however, the TAP gene in the chromosome 9 group was deleted, as was the ABC2 gene in the chromosome 6 group. Note that this scenario is inconsistent with evidence from phylogenetic analysis suggesting that TAP has a mitochondrial origin (Hughes 1994a). Likewise, Kasahara et al. (1996) proposed tandem duplication of HSP70 and GRP78 prior to the divergence of animals and fungi, tandem association of these genes for millions of years, and, finally, after the alleged block duplication, deletion of HSP70 in the chromosome 9 group and of GRP78 in the chromosome 6 group.

FIG. 9. Phylogenetic tree of HSP70 family members.

The hypothesis of Kasahara et al. (1996) concerning PSMB is still more complex. Here, two events of tandem duplication are proposed to have occurred prior to the divergence of animals from fungi. Again, the duplicates are assumed to have remained in tandem until after the alleged block duplication. At this point, one of the genes from the chromosome 6 group is supposed to have been translocated to another chromosome, while two of the genes from the chromosome 9 group were translocated independently to yet two other chromosomes.

There are many problems with these complex scenarios. First, no evidence whatsoever is presented in support of them. Second, they imply that tandemly duplicated genes remained in close linkage for millions of years (over one billion years in the case of ABC), and yet no gene conversion occurred between them. The supposed lack of interlocus recombination contradicts much of what we know about tandemly arrayed duplicated genes (Ohta 1991; Hughes 1996). To explain the occurrence on chromosomes 6 and 9 of members of just three families (ABC, PSMB, and HSP70), Kasahara et al.
(1996) hypothesized no fewer than four tandem duplications, one block duplication, three translocations, and four independent gene deletions (a total of 12 genetic events). It is far more parsimonious to hypothesize that the members of each of these families duplicated independently and that each was then independently translocated to chromosomes 6 and 9; this hypothesis requires at most nine genetic events (three duplications and six translocations). The present analyses clearly demonstrate that NOTCH family members on chromosomes 6 and 9 also duplicated at a much earlier time than the alleged block duplication, indeed, before the divergence of deuterostomes and protostomes. Rather than invent yet another “just-so story” (Gould and Lewontin 1979) of tandem duplication and deletion in the case of the NOTCH family, it is far more reasonable to reject the hypothesis that ABC, HSP70, PSMB, or NOTCH family members on chromosomes 6 and 9 were ever involved in a block duplication event.

If the clusters of homologous genes on human chromosomes 6 and 9 did not all duplicate simultaneously but, in fact, duplicated at widely different times, it remains an open question why they are clustered together. The two remaining hypotheses are (1) that these associations are the result of chance and (2) that this clustering has adaptive significance; in other words, that it has been favored by natural selection. In the case of human chromosomes 6 and 9, at least four separate pairs of paralogs (from the ABC, PSMB, NOTCH, and HSP70 families) have independently translocated to the two linkage groups. Such a large number of independent events seems unlikely to have occurred by chance and suggests that there is some functional significance to the clustering of these genes.

It has previously been hypothesized that a gene may be translocated to a chromosomal location that is selectively advantageous and that this rearranged genotype may become fixed due to natural selection. One potential example involves the DAZ gene cluster, which encodes proteins involved in spermatogenesis and was translocated to the Y chromosome sometime in primate evolution, before the divergence of humans and orangutans (Saxena et al. 1996; Menke, Mutter, and Page 1997).

Some shared functional characteristics may account for the convergent accumulation of members of 10 gene families on chromosomes 6 and 9, although the nature of these characteristics remains speculative. One characteristic shared by many of the genes on human 6p21.3 is a universal or very broad pattern of expression (table 5). This is true, for example, of the class I MHC genes and of those genes encoding molecules, such as the TAP transporters and proteasome components, that interact functionally with the class I molecules. 6p21.3 also includes genes for essential cellular structural elements, such as histones and β tubulin, and broadly expressed regulators of transcription, such as PBX2 and ZNF173 (table 5). 9q33–34 also includes a number of broadly expressed genes (table 5). The most interesting of these is the Surfeit housekeeping gene complex, which was recently described as “the tightest mammalian gene cluster described so far” (Gilley, Armes, and Fried 1997). The presence of the MHC in 6p21.3 and the Surfeit complex in 9q34 suggests the intriguing hypothesis that such complexes of highly expressed genes may act as strong attractors over evolutionary time for other highly expressed genes. It may be advantageous to locate highly expressed genes in regions that are likely to be transcriptionally active in most cells.

Table 3
Summary of Gene Divergences Relative to Organismal Divergences as Indicated by Phylogenetic Analyses

Family    Human Chromosomes    Beforeᵃ               Afterᵃ
RXR       6-9                  Tet-Ost (95) [100]    Deu-Pro (100) [95]
RXR       1-9                  Tet-Ost (95) [100]    Deu-Pro (100) [95]
COL       1-9                  Mam-Av (100) [100]    –
ABC       6-9                  Euk-Eub (95) [93]     –
PSMB      6-9                  An-Fn (100) [100]     –
NOTCH     6-9                  Deu-Pro (96) [79]     –
NOTCH     1-9                  Tet-Ost (99) [100]    –
PBX       6-9                  –                     Deu-Pro (45) [87]
PBX       1-9                  –                     Deu-Pro (45) [87]
TEN       6-9                  Tet-Ost (99) [82]     –
TEN       1-9                  Tet-Ost (99) [82]     –
C3/4/5    6-9                  Gna-Ag (98) [71]      –
HSP70     6-9                  An-Fn (100) [99]      –

ᵃ The number in parentheses is the bootstrap percentage for the relevant branch in the NJ tree; the number in brackets is the bootstrap percentage for the corresponding branch in the MP tree. Organismal divergence events are abbreviated as follows: An-Fn, Animalia–Fungi; Deu-Pro, Deuterostomes–Protostomes; Euk-Eub, Eukaryotes–Eubacteria; Gna-Ag, Gnathostomata–Agnatha; Mam-Av, Mammalia–Aves; Tet-Ost, Tetrapoda–Osteichthyes.

Table 4
Estimates of Divergence Times Based on the Number of Amino Acid Replacements per 100 Sites (dAA) Calibrated with Dates from the Fossil Record

Family    Human Chromosomes    dAA ± SE        Calibration (dAA ± SE)    Time Estimate ± 99% Confidence Interval (MYA)
RXR       6-1/9                27.2 ± 2.4      Tet-Ost (16.8 ± 1.7)      648 ± 149
RXR       1-9                  25.1 ± 2.6      Tet-Ost (16.8 ± 1.7)      596 ± 160
COL       6-9                  54.7 ± 6.9      Deu-Pro (78.0 ± 8.8)      561 ± 181
ABC       6-9                  149.2 ± 16.0    Euk-Eub (104.6 ± 7.6)     2,140 ± 592
PSMB      6-9                  134.1 ± 9.9     An-Fn (59.1 ± 5.7)        2,268 ± 430
NOTCH     6-1/9                66.7 ± 5.9      Deu-Pro (33.4 ± 3.8)      1,896 ± 365
NOTCH     1-9                  25.5 ± 3.4      Deu-Pro (33.4 ± 3.8)      610 ± 209
PBX       6-1/9                18.7 ± 2.0      Deu-Pro (24.4 ± 2.6)      612 ± 167
PBX       1-9                  18.2 ± 2.5      Deu-Pro (24.4 ± 2.6)      599 ± 209
TEN       6-1/9                83.3 ± 4.2      Tet-Ost (47.9 ± 3.2)      696 ± 90
TEN       1-9                  74.5 ± 4.5      Tet-Ost (47.9 ± 3.2)      623 ± 98
C3/4/5    6-9                  131.2 ± 3.9     Gna-Ag (191.9 ± 2.6)      579 ± 44
HSP70     6-9                  44.7 ± 2.9      An-Fn (33.4 ± 2.4)        1,604 ± 268

NOTE. Abbreviations for organismal divergence events are as in table 3. Divergence times used in calibration were the following: Euk-Eub, 1,500 MYA; An-Fn, 1,200 MYA; Deu-Pro, 800 MYA; Tet-Ost, 400 MYA.
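Table 4 converts amino acid distances into absolute dates by calibrating against dated organismal splits. The sketch below is a minimal illustration, assuming simple proportional (constant-rate) scaling, t = T_cal × dAA / dAA_cal, where T_cal is a calibration date from the table's NOTE and dAA_cal is the distance measured across that split; the names `divergence_time`, `CALIBRATION_MYA`, and `ROWS` are hypothetical. The rows were copied from table 4 and chosen because this formula reproduces their published estimates to within 2.5 MYA; it does not reproduce every row as extracted (PSMB, for example), so the exact procedure behind the table may differ, and the 99% confidence intervals are not computed here.

```python
"""Sketch: proportional (molecular-clock) calibration of amino acid distances.

Assumption (not stated in the paper): t = T_cal * d_AA / d_cal, i.e. a
constant replacement rate shared by the gene pair and the calibrating split.
"""

# Calibration dates (MYA) taken from the NOTE to table 4.
CALIBRATION_MYA = {
    "Euk-Eub": 1500.0,
    "An-Fn": 1200.0,
    "Deu-Pro": 800.0,
    "Tet-Ost": 400.0,
}

# (family, chromosomes, dAA, calibration dAA, calibration event, published MYA),
# copied from table 4; dAA = amino acid replacements per 100 sites.
ROWS = [
    ("RXR", "6-1/9", 27.2, 16.8, "Tet-Ost", 648),
    ("RXR", "1-9", 25.1, 16.8, "Tet-Ost", 596),
    ("COL", "6-9", 54.7, 78.0, "Deu-Pro", 561),
    ("ABC", "6-9", 149.2, 104.6, "Euk-Eub", 2140),
    ("NOTCH", "1-9", 25.5, 33.4, "Deu-Pro", 610),
    ("PBX", "6-1/9", 18.7, 24.4, "Deu-Pro", 612),
    ("PBX", "1-9", 18.2, 24.4, "Deu-Pro", 599),
    ("TEN", "6-1/9", 83.3, 47.9, "Tet-Ost", 696),
    ("TEN", "1-9", 74.5, 47.9, "Tet-Ost", 623),
    ("HSP70", "6-9", 44.7, 33.4, "An-Fn", 1604),
]


def divergence_time(d_aa: float, d_cal: float, event: str) -> float:
    """Return an estimated gene divergence time in MYA.

    Both distances are pairwise, so the factor of 2 for the two diverging
    lineages cancels and only the ratio d_aa / d_cal matters.
    """
    # Pairwise replacements per 100 sites accumulated per million years.
    rate = d_cal / CALIBRATION_MYA[event]
    return d_aa / rate


if __name__ == "__main__":
    for family, chroms, d_aa, d_cal, event, published in ROWS:
        estimate = divergence_time(d_aa, d_cal, event)
        print(f"{family:6} {chroms:6} {estimate:7.1f} MYA (table 4: {published:,})")
```

Running it prints each estimate beside the published value; the small remaining discrepancies plausibly reflect rounding of the tabulated distances.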
Another characteristic of many genes in 6p21.3 and 9q34 is that they encode unusually long polypeptide chains (table 5). Often, these genes include large numbers of exons. For example, the C4A gene consists of 41 exons, while COL11A2 has 65 exons. It may be advantageous to locate such large genes in regions likely to be continually active transcriptionally, because the transcription and splicing of the pre-mRNA is quite complicated and presumably relatively time-consuming. Therefore, clusters of highly expressed genes such as the class I MHC and Surfeit clusters may serve to attract genes encoding large proteins. If so, the fact that the C2, factor B, and C4 complement components are large proteins may account for their linkage to the MHC in mammals, rather than any advantage arising from their immune system function, which is essentially unrelated to that of the MHC class I and class II molecules themselves.

Table 5
Genes with Broad to Universal Expression and Genes Encoding Large (>700 aa) Proteins Located in Human 6p21.3 and 9q33–34

Broad–universal expressionᵃ
6p21.3: HLA class Ia, HSP70 homologs, histone H1.D, histone H2A.1, cyclin-dependent kinase inhibitor I, serine kinase, Ndr serine/threonine kinase, ZNF173, β tubulin, valine tRNA synthetase, PBX2, LMP2, LMP7, TAP1, TAP2, RXRB, RD protein
9q33–34: PSMB7, ABC2, PBX3, GRP78, Surfeit gene cluster, gelsolin, RXRA, ribosomal protein L12, CAN

Large proteinsᵇ
6p21.3: C4A (1,744), C4B (1,699), INT3 (>1,095), TENX (3,536), VARS2 (1,265), COL11A2 (1,629–1,736), phospholipase D (841), complement factor B (764), complement C2 (753), helicase-like (1,245), female sterile homeotic homolog (755)
9q33–34: Gelsolin (782), golgin-97 (767), CAN (2,090), TENC (2,203), C5 (1,676), COL5A1 (1,839), NOTCH1 (>2,444), ABC2 (1,472)ᶜ

ᵃ References for expression patterns: Chu et al. (1995); Dobner et al. (1991); Eick et al. (1989); Gui, Lane, and Fu (1994); Hall et al. (1983); Harper et al. (1993); Kraemer et al. (1994); Kwiatkowski et al. (1986); Levi-Strauss et al. (1988); Milner and Campbell (1990); Millward, Cron, and Hemmings (1995); Monaco (1992); Williams et al. (1988).
ᵇ Numbers of amino acid residues are given in parentheses where known.
ᶜ Based on mouse homolog.

The hypothesis that the clustering of paralogs on chromosomes 6 and 9 is adaptive requires testing. One way to test it would be to gather baseline data on the rate of occurrence of such clusters in vertebrate genomes; however, no vertebrate genetic maps are as yet sufficiently detailed to provide these data. It has been estimated that the human genome contains about 7 × 10⁴ genes (Miklos and Rubin 1996). Assuming that these represent 1,000 gene families (Doolittle 1989), the average gene family would contain 70 members (although, obviously, some gene families contain many more members; Miklos and Rubin 1996). If one were to draw two genes at random from the genome, the chance that they would belong to the same gene family would be, on average, about 1 × 10⁻³ (roughly 70/70,000). Given this probability, if one were then to draw two sets of 100 genes (about the number of genes in the MHC region), the chance that four or more genes in one set would have paralogs in the other set would be less than 10⁻⁵. Although rough, these calculations suggest that the independent translocation of homologs from the ABC, PSMB, NOTCH, and HSP70 families to chromosomes 6 and 9 may indeed represent a quite rare event and therefore one for which an explanation in terms of natural selection is appealing.

Acknowledgments

This research was supported by grants R01-GM43940 and K04-GM000614 from the National Institutes of Health. I am grateful to Federica Verra for comments on the manuscript.

LITERATURE CITED

AYAD, S., R. P. BOOT-HANDFORD, M. J. HUMPHRIES, K. E. KADLER, and C. A.
SHUTTLEWORTH. 1994. The extracellular matrix factsbook. Academic Press, London.
CHU, T. W., A. CAPOSSELA, R. COLEMAN, V. L. GOEI, G. NALLUR, and J. R. GRUEN. 1995. Cloning of a new “finger” protein gene (ZNF173) within the class I region of the human MHC. Genomics 29:229–239.
CRAIG, E. A., J. KRAMER, J. SHILLING, M. WERNER-WASHBURNE, S. HOLMES, J. KOSIC-SMITHERS, and C. M. NICOLET. 1989. SSC1, an essential member of the yeast HSP70 multigene family, encodes a mitochondrial protein. Mol. Cell. Biol. 9:3000–3008.
DOBNER, T., I. WOLF, B. MAI, and M. LIPP. 1991. A novel divergently transcribed human histone H2A/H2B gene pair. DNA Seq. 1:409–413.
DOOLITTLE, R. F. 1989. Redundancies in protein sequences. Pp. 599–623 in G. D. FASMAN, ed. Prediction of protein structure and the principles of protein conformation. Plenum Press, New York.
EICK, S., M. NICOLAI, D. MUMBERG, and D. DOENECKE. 1989. Human H1 histones: conserved and varied sequence elements in two H1 subtype sequences. Eur. J. Cell Biol. 49:110–115.
EVANS, I. J., and J. A. DOWNIE. 1986. The nodI gene product of Rhizobium leguminosarum is closely related to ATP-binding bacterial transport proteins: nucleotide sequence analysis of the nodI and nodJ genes. Gene 43:95–105.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
GILLEY, J., N. ARMES, and M. FRIED. 1997. Fugu genome is not a good mammalian model. Nature 385:305–306.
GOULD, S. J., and R. C. LEWONTIN. 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. Biol. Sci. 205:581–598.
GUI, J.-F., W. S. LANE, and X.-D. FU. 1994. A serine kinase regulates intracellular localization of splicing factors in the cell cycle. Nature 369:678–682.
HALL, J. L., L. DUDLEY, P. R. DOBNER, S. A. LEWIS, and N. J. COWAN. 1983. Identification of two human β-tubulin isotypes. Mol. Cell. Biol. 3:854–862.
HARPER, J. W., G. R. ADAMI, N. WEI, K.
KEYOMARSI, and S. J. ELLEDGE. 1993. The p21 Cdk-interacting protein Cip1 is a potent inhibitor of G1 cyclin-dependent kinases. Cell 75:805–816.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1992. CLUSTAL V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8:189–191.
HUGHES, A. L. 1993. Nonlinear relationships among evolutionary rates identify regions of functional divergence in heat-shock protein 70 genes. Mol. Biol. Evol. 10:243–255.
HUGHES, A. L. 1994a. Evolution of the ATP-binding-cassette transmembrane transporters of vertebrates. Mol. Biol. Evol. 11:899–910.
HUGHES, A. L. 1994b. Phylogeny of the C3/C4/C5 complement component gene family indicates that C5 diverged first. Mol. Biol. Evol. 11:417–425.
HUGHES, A. L. 1996. Gene duplication and recombination in the evolution of mammalian Fc receptors. J. Mol. Evol. 42:247–256.
HUGHES, A. L. 1997. Evolution of the proteasome components. Immunogenetics 46:82–92.
IWABE, N., K. KUMA, M. HASEGAWA, S. OSAWA, and T. MIYATA. 1989. Evolutionary relationships of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86:9355–9359.
KASAHARA, M., M. HAYASHI, K. TANAKA, H. INOKO, K. SUGAYA, T. IKEMURA, and T. ISHIBASHI. 1996. Chromosomal localization of the proteasome Z subunit gene reveals an ancient chromosomal duplication involving the major histocompatibility complex. Proc. Natl. Acad. Sci. USA 93:9096–9101.
KATSANIS, N., J. FITZGIBBON, and E. M. C. FISHER. 1996. Paralogy mapping: identification of a region in the human MHC triplicated onto human chromosomes 1 and 9 allows the prediction and isolation of novel PBX and NOTCH loci. Genomics 35:101–108.
KLEIN, J. 1986. Natural history of the major histocompatibility complex. Wiley, New York.
KRAEMER, D., R. W. WOZNIAK, G. BLOBEL, and A. RADU. 1994. The human CAN protein, a putative oncogene product associated with myeloid leukemogenesis, is a nuclear pore complex protein that faces the cytoplasm. Proc. Natl. Acad. Sci.
USA 91:1519–1523.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.0. Pennsylvania State University, University Park.
KWIATKOWSKI, D. J., T. P. STOSSEL, S. H. ORKIN, J. E. MOLE, H. R. COLTEN, and H. L. YIN. 1986. Plasma and cytosolic gelsolins are encoded by a single gene and contain a duplicated actin-binding domain. Nature 323:455–458.
LEID, M., P. KASTNER, R. LYONS et al. (11 co-authors). 1992. Purification, cloning, and RXR identity of the HeLa cell factor with which RAR or TR heterodimerizes to bind target sequences efficiently. Cell 68:377–395.
LEVI-STRAUSS, M., M. C. CARROLL, M. STEINMETZ, and T. MEO. 1988. A previously undetected MHC gene with an unusual periodic structure. Science 240:201–204.
LUCIANI, M. F., F. DENIZOT, S. SAVARY, M. G. MATTEI, and G. CHIMINI. 1994. Cloning of two novel ABC transporters mapping on human chromosome 9. Genomics 21:150–159.
MATSUMOTO, M., and H. FUJIMOTO. 1990. Cloning of a hsp70-related gene expressed in mouse spermatids. Biochem. Biophys. Res. Commun. 166:43–49.
MENKE, D. B., G. L. MUTTER, and D. C. PAGE. 1997. Expression of DAZ, an azoospermia factor candidate, in human spermatogonia. Am. J. Hum. Genet. 60:237–241.
MIKLOS, G. L. G., and G. M. RUBIN. 1996. The role of the genome project in determining gene function: insight from model organisms. Cell 86:521–529.
MILLWARD, T., P. CRON, and B. A. HEMMINGS. 1995. Molecular cloning and characterization of a conserved nuclear serine(threonine) protein kinase. Proc. Natl. Acad. Sci. USA 92:5022–5026.
MILNER, C. M., and R. D. CAMPBELL. 1990. Structure and expression of the three MHC-linked HSP70 genes. Immunogenetics 32:242–251.
MONACO, J. J. 1992. A molecular model of MHC class-I-restricted antigen processing. Immunol. Today 13:173–178.
MONICA, K., N. GALILI, J. NOURSE, D. SALTMAN, and M. L. CLEARY. 1991. PBX2 and PBX3, new homeobox genes with extensive homology to the human proto-oncogene PBX1. Mol. Cell. Biol. 11:6149–6157.
NEI, M.
1987. Molecular evolutionary genetics. Columbia University Press, New York.
NEI, M. 1991. Relative efficiencies of different tree-making methods for molecular data. Pp. 90–128 in M. M. MIYAMOTO and J. L. CRACRAFT, eds. Recent advances in phylogenetic studies of DNA sequences. Oxford University Press, Oxford.
NEI, M., and L. JIN. 1989. Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290–300.
NONAKA, M., and M. TAKAHASHI. 1992. Complete complementary DNA sequence of the third component of complement of lamprey. J. Immunol. 148:3290–3295.
OHTA, T. 1991. Multigene families and the evolution of complexity. J. Mol. Evol. 33:31–41.
OTA, T., and M. NEI. 1994. Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J. Mol. Evol. 38:642–643.
PERRY, M. D., L. ANJANE, S. SHTANG, and L. A. MORAN. 1994. Structure and expression of an inducible HSP70-encoding gene from Mus musculus. Gene 146:273–278.
SAITOU, N., and M. IMANISHI. 1989. Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree reconstruction in obtaining the correct tree. Mol. Biol. Evol. 6:514–525.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
SAXENA, R., L. G. BROWN, T. HAWKINS et al. (11 co-authors). 1996. The DAZ gene cluster on the human Y chromosome arose from an autosomal gene that was transposed, repeatedly amplified and pruned. Nat. Genet. 14:292–299.
SWOFFORD, D. L. 1990. PAUP: phylogenetic analysis using parsimony. Illinois Natural History Survey, Champaign.
TOMITA, M., T. KINOSHITA, S. IZUMI, S. TOMINO, and K. YOSHIZATO. 1994. Characterizations of sea urchin fibrillar collagen and its cDNA clone. Biochim. Biophys. Acta 1217:131–140.
WALTER, B., A. YEN, J. WASMUTH, and M. SMITH. 1987.
Selection of somatic cell hybrids containing human chromosome 9 using a temperature-sensitive CHO valyl-tRNA synthetase mutant. Cytogenet. Cell Genet. 46:710.
WILLIAMS, T., J. YOU, C. HUXLEY, and M. FRIED. 1988. The mouse surfeit locus contains a very tight cluster of four “housekeeping” genes that is conserved through evolution. Proc. Natl. Acad. Sci. USA 85:3527–3530.

MANOLO GOUY, reviewing editor

Accepted March 19, 1998