J Mol Evol (1987) 24:309-318 Journal of Molecular Evolution (~) Spfinger-Verlag New York Inc. 1987 Nucleotide Sequence of the Beta-Globin Genes in Gorilla and Macaque: The Origin of Nucleotide Polymorphisms in Human P. Savatier, G. Trabuchet, Y. Chebloune, C. Faure, G. Verdier, and V.M. Nigon D6partement de BiologieG6n6raleet Appliqu~e--UA92, Universit~Claude Bernard--LyonI, 43 Boulevarddu 11 novembre 1918, 69622 VilleurbanneCedex, France Part ofthebeta-globingenes of Macaca cynomolgus and Gorilla gorilla has been cloned and sequenced. Ten putatively neutral nucleotide polymorphisms have been described at the beta-globin locus in humans. They are associated in seven combinations, which define seven different haplotypes o f the beta-globin gene: four major frameworks-- 1, 2, 3, and 3*--and three minor frameworks, which we term KI1, KA1, and OR1. The nucleotide sequences o f these frameworks are compared with those o f homologous sequences in chimpanzee, colobus, macaque, and gorilla. This comparison provides strong evidence that framework 2 was the earliest f r a m e w o r k in the h u m a n lineage. F r o m framework 2, a rooted parsimonious tree for the six other frameworks is constructed. This phylogenetic tree is discussed in terms of the evolution o f nucleotide polymorphisms as well as in terms of genetic affinities between human populations. For each position at which there is base difference in comparing human, gorilla, and chimpanzee betaglobin genes, the phyletic lineage where the corresponding substitution occurred has been identified using the max imu m parsimony procedure. The data provide evidence that polymorphisms may represent a significant component of differences between closely related species. If so, nucleotide polymorphisms may strongly bias estimates o f small evolutionary distances. Summary. Offprint requests to: P. Savatier Key words: D N A sequencing -- D N A evolution - - Macaque -- Gorilla -- Beta-globin gene -- NucIeotide polymorphisms -- Intraspecific versus interspecific variations -- Hominine phylogeny Introduction Many studies based on D N A and protein comparisons have attempted to resolve the dichotomous branching order among the Hominoidea (human, chimpanzee, gorilla, orangutan, and gibbon). Human, chimpanzee, and gorilla are undoubtedly the three most closely related species. However, the resolving power o f most techniques does not allow separation of their respective branching nodes, leading to the suspicion that the branchings occurred within a short time of one another (Sibley and Ahlquist 1984; O'Brien et aL 1985; Templeton 1985; Koop et al. 1986; Savatier et al. 1987). Mitochondrial restriction site data favor a closer relationship between chimpanzee and gorilla (Templeton 1985), but alternative phylogenies cannot be rejected. Recently, the identification o f an additional immunoglobulin C-epsilon pseudogene in human and gorilla DNAs suggested that these two species are the most closely related (Ueda et al. 1985). The first complete nucleotide sequence of the human beta-globin gene was determined by Lawn et al. (1980). Part or all o f about 40 additional nucleotide sequences have since been reported from the D N A of geographically scattered individuals. 310 These sequences show 10 silent, putatively neutral nucleotide polymorphisms (for a review, see Orkin and Kazazian 1984; Antonarakis et al. 1985) that define four major haplotypes of the human betaglobin gene: frameworks I, 2, 3, and 3*. Since the four frameworks are found together in different human populations, their original appearance may have predated racial divergence. In this report, we compare the nucleotide sequences of these various human beta-globin frameworks with the homologous sequences in chimpanzee (Pan troglodytes, Savatier et al. 1985) and colobus monkey (Colobus polykomos, Martin et al. 1983). We also compare them with the homologous sequences in macaque (Macaca cynomolgus) and lowland gorilla (Gorilla gorilla), which are reported in this article. Using the maximum parsimony procedure, we constructed a nucleotide sequence for the beta-globin gene of the common ancestor of Homo, Pan, and Gorilla. Based on this ancestral sequence, we analyze the origin of nucleotide differences among chimpanzee, gorilla, and human at the beta-globin locus and we propose a rooted phylogenetic tree for the human beta-globin frameworks. We also analyze the effects of intraspecific nucleotide variation on estimates of evolutionary distances between closely related species. Base substitution rates and hominoid phylogeny are discussed in light of these results. Materials and Methods The strategyfor sequencingthe gorilla beta-globin gene is presented in Fig. 1. Savatier et al. (1987) gives the DNA isolation, cloning, and sequencingprocedures. Results Nucleotide sequences of a macaque beta-globin gene and a gorilla beta-globin gene were determined and then compared with the orthologous sequences from human (Poncz et al. 1983), chimpanzee (Savatier et al. 1985), and colobus monkey (Martin et al. 1983) (Fig. 2). The comparison was restricted to a 1396nucleotide-long segment (positions + 1 to + 1396 with respect to the cap site). In this comparison, we have included putatively neutral nucleotide polymorphisms that have been identified at the beta-globin locus in human, and excluded m u t a t i o n s responsible for beta-chain hemoglobinopathies and beta-thalassemias. We have also excluded structural variations of the beta-globin polypeptide, even though some of them are putatively neutral (Bunn et al. 1977), because for most of them the replacement substitution has not been identified at the DNA level. We have considered therefore only 10 neutral nucleotide polymorphisms, which have been identified from nucleotide sequencings of about 40 human beta-globin genes carried out by Lawn et al. (1980), Orkin et al. (1982), Kimura et al. (1983), Poncz et al. (1983), Kazazian et al. (1984), Orkin and Kazazian (1984), and Antonarakis et al. (1985). 1. Positions Involved in Nucleotide Polymorphisms in Humans The 10 nucleotide polymorphisms mentioned above are shown in Fig. 2. They occur at nine nucleotide positions, of which eight display two different nucleotides and one (position +569) displays three (Figs. 2 and 3B). Seven combinations of these polymorphisms have been identified in human populations, which define seven haplotypes or frameworks of the human betaglobin gene (Fig. 3B). Four of them (major frameworks 1, 2, 3, and 3*) are found widely among different human populations. The other three occur only once each in the 40 reported human beta-globin gene sequences. One of these three frameworks has been identified by restriction endonuclease analysis in two other individuals. We term the three minor frameworks OR1, KI1, and KA1 in reference to Orkin et al. (1982), Kimura et al. (1983), and Kazazian et al. (1984), who first identified the corresponding nucleotide polymorphisms. The seven frameworks can be derived from each other by more than one branching route. The route illustrated in Fig. 4 corresponds to the maximum parsimony solution. Framework KI 1 can be derived from either framework 1 or framework 2; the two solutions are equally parsimonious (three substitutions are necessary in both cases). The maximum parsimony network costs 10 substitutions. Making use of the comparisons between human and nonhuman primate sequences shown in Fig. 2, we can identify framework 2 as the most primitive. At every position involved in a nucleotide polymorphism (except +59 and + 511), the nucleotides found in human framework 2 are identical to the homologous nucleotides in the macaque, gorilla, chimpanzee, and colobus DNAs (Fig. 3C). Identification of framework 2 as the most primitive framework allows the determination of the nucleotide of the ancestral human--chimpanzee-gorilla gene (which we call "beta AN") at each of the seven polymorphic positions (Fig. 3C). This ancestral sequence remains ambiguous at position +59 (T in macaque and colobus, C in gorilla and chimpanzee). At position + 511, the nucleotide found in gorilla (T) is the only one different from that o f framework 311 , 5" 1 kb i b ~ a I: E I ,k nr-i. 3a t~ I Eco * S -4000 i -3000 r -2000 i -1000 Bam * ECO .I Fig. 1. Linkage maps of the gorilla delta- and beta-globin gene region deduced from blot hybridization of genomic DNA (data not shown). The map of the "Go beta EcoRI 6.4kb" clone is shown expanded. Both sequencing and restriction endonuclease mapping provided evidence that the extra 900 nucleotides found in the gorilla 6.4-kb insert with respect to both the human and chimpanzee 5.5-kb inserts come from an additional segment about 200 nucleotides 3' of the intergenic EcoRI site (data not shown). The strategy for sequencing the gorilla beta-globin gene is indicated by the horizontal arrows below the maps, indicating sequencing either from restriction endonuclease sites (filled circles) or from Bal31-generated ends (open circles). E or Eco, EcoRI; Barn, BamHI. In bar at right, the hatched area represents the 5' untranslated region, the filled areas represent introns, and the open areas represent exons 2 (C). It is also different f r o m the nucleotide f o u n d in the other f r a m e w o r k s (G). Therefore, we suggest that m o s t p r o b a b l y (1) C was present at position + 511 in beta A N , (2) a C -~ T substitution occurred at this position in the gorilla lineage, and (3) a C -~ G substitution occurred in s o m e h u m a n s . A rooted p a r s i m o n i o u s tree was constructed f r o m the seven h u m a n beta-globin f r a m e w o r k s (Fig. 5). F r a m e w o r k 1 and f r a m e w o r k KA1 were derived directly f r o m the root node ( f r a m e w o r k 2) through only one substitution each. F r a m e w o r k s 3", 3, a n d O R I derive f r o m the root b y three, four, a n d five substitutions, respectively. T h i s implies the existence o f two hypothetical i n t e r m e d i a t e f r a m e w o r k s between f r a m e w o r k 2 and f r a m e w o r k 3* that h a v e not yet been identified in h u m a n populations. We call these f r a m e w o r k s H1 and H2. One o f t h e m could be easily investigated since positions + 59 a n d + 511, which define it, can be detected b y the restriction endonucleases HgiA1 a n d A v a i l , respectively. 2. Positions Not Involved in Nucleotide Polymorphisms in Humans At 21 nucleotide positions there are base differences between f r a m e w o r k 2 a n d c h i m p a n z e e or f r a m e w o r k 2 and gorilla (Fig. 2). T h i s implies that the 21 corresponding substitutions arose either in the p h y letic lineages leading to c h i m p a n z e e or gorilla or in the lineage leading to h u m a n p r i o r to the emergence o f h u m a n beta-globin gene alleles. M o s t o f these 21 substitution sites are in introns; five are at silent positions in coding sequences. O n e difference leads to an a m i n o acid r e p l a c e m e n t (residue position 104, Arg ~ Lys) between h u m a n and gorilla, in agreem e n t with the c o r r e s p o n d i n g a m i n o acid sequences ( G o o d m a n et al. 1983). For each o f these 21 nucleotide positions, let us consider the h o m o l o g o u s nucleotides in h u m a n , c h i m p a n z e e , m a c a q u e , gorilla, a n d (as far as known) in colobus (Table 1). Eighteen belong to the onesolution class. T h a t is, for each o f these 18 positions, four o f the five species c o m p a r e d b e a r the s a m e nucleotide; only one o f the three species h u m a n , c h i m p a n z e e , a n d gorilla bears a nucleotide that is different f r o m the one f o u n d in the other four. T h e m a x i m u m p a r s i m o n y p r o c e d u r e depicts the nucleotide that is f o u n d four t i m e s (or three times w h e n the h o m o l o g o u s nucleotide in colobus is u n k n o w n ) as already present in the last c o m m o n ancestor o f h u m a n , c h i m p a n z e e , a n d gorilla. A substitution is taken to h a v e occurred in the phyletic lineage corresponding to the nucleotide represented only once. In the a s s i g n m e n t o f the 18 substitutions to the three phyletic lineages (Table 1), we find 4 substitutions in the gorilla lineage, 6 in the c h i m p a n z e e , a n d 8 in the h u m a n . F o r each o f the three r e m a i n i n g positions ( 4 4 9 4 , + 7 1 5 , and + 1240, T a b l e 1), the nucleotide o f beta A N c a n n o t be d e t e r m i n e d since each nucleotide is represented twice. These three sites a p p e a r to be cladistically i n f o r m a t i v e . U n f o r t u n a t e l y , each position results in a different p a r s i m o n y cladogram. One (position 4 4 9 4 ) groups h u m a n with c h i m p a n zee, a n o t h e r (position + 715) c h i m p a n z e e with gorilla, and the third (position + 1240) h u m a n with gorilla. T h u s the data do nothing to resolve the trichotomy. 312 +1 polymorph. H.sapiens P.troglodytes C.gorille M.eynomolgus C.polykomos Po], HsB. Ptr, Ggo, Mcy. Cpo. t" ACATTTGCTTCTGACACAACTGTCTTCACTAGCAACCTCAAACAGACACCATGgtgcaoctga C C t t 0 T +100 etcctgsggBgeegtotgccqttaotgocetgtggggeaaggtgaacgtggstgaagttggtggtgagge BB g t c CB C Cpo. +200 G cctgggcBggTTGGTATCAAGGTTACAAfiACACGTTTAAGCAGACCAATAGAAACTGGGCATGTGGAGAC C G C TC A GA 0 t C GA G Pol. Hga. Ptr. Ggo. Mmy. Cpo, AGAGAAGACTCTTGCGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCITAGG G fi G xxxxxxxx G Pol. HBS, Ptr, Ggo. Mcy. Pol, Hsa. Ptr, Ggo. +300 etgctggtggtetaeccttggaeeeagaggttctttgagteetttggggatetgtcceetcctgatgetg Hey. Cpo. Pol, Hse, Ptr. Ggo, Mcy. Cpo, Pol. Hsa. Ptr. r,go, Mcy. Cpo. PoZ. HBB, Ptr. Ggo. Hey. t t +400 ttatgggcaaeectaaggtgeeggctcatggeaegeaagtgctcggtgeetttagtgatggoctggetca t B@ cctggacaacctcaagggeacctttgceacactgagtgagctgcactgtgacaagctgeaegtggatcct t oag c tag t t / +500 "G gBgaeottcBggGTGAGTCTATCGGACCCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCAT a a T Number of Nucleotide Differences Between the Ancestral "'Beta A N " Gene and Each Present-Day Beta-Globin Gene T h e results r e p o r t e d in s e c t i o n s 1 a n d 2 are s u m m a r i z e d in Fig. 6. W e shall a s s u m e the t r i c h o t o m y a m o n g Homo, Pan, and Gorilla. In the p h y l e t i c lineage o f Homo, w e h a v e e s t i m a t e d eight differences b e t w e e n beta A N and the s t e m gene ( e x c l u d i n g the three cladistically i n f o r m a t i v e p o s i t i o n s ; see T a b l e 1). Similarly, w e h a v e e s t i m a t e d six n u c l e o t i d e differences b e t w e e n the ancestral b e t a - g l o b i n g e n e a n d the c h i m p a n z e e b e t a - g l o b i n gene a n d five n u c l e o t i d e differences b e t w e e n the ancestral b e t a - g l o b i n g e n e a n d the gorilla b e t a - g l o b i n gene. Finally, b e t w e e n the s t e m g e n e and the h u m a n b e t a - g l o b i n gene w e h a v e c o u n t e d z e r o , o n e , three, four, or five differe n c e s , d e p e n d i n g o n the f r a m e w o r k . A C Fig. 2. Nucleotide sequence of a human (Hsa.) beta-globin gene (Poncz et al. 1983), aligned with beta-globin gene sequences from a chimpanzee (Ptr.), a gorilla (Ggo.), a colobus (Cpo.), and a macaque (Mcy.) (Martin et al. 1983; Savatier et al. 1985). Nucleotide polymorphisms (Pol.) idenlifted in human DNA by Lawn et al. (1980), Orkin et al. (]982), Kimura et al. (1983), Poncz et al. (1983), Kazazian et al. (1984), Orkin and Kazazian (1984), and Antonarakis et al. (1985) are also shown. Numbering is from the human cap site. Gaps used to maximize identities in introns are indicated by x's, and the end of the sequenced colobus segment by a slash. Exons are reported in upper-case letters. At position + 1156, the chimpanzee sequence has G instead of the A as previously reported in Savatier et al. (1985). The complete nucleotide sequence for the human betaglobin gene given by Poncz et al. (1983) is given on the second line. This nucleotide sequence is that of framework 2 (see section 1 of Results). Only nucleotide differences found between either species or alleles are indicated Discussion 1. Implications of Nucleotide Polymorphisms for the Estimation of Genetic Distances Between Closely Related Species Our data g i v e n e w i n f o r m a t i o n c o n c e r n i n g the e v o l u t i o n a r y d i s t a n c e s a m o n g h u m a n , c h i m p a n z e e , and gorilla. W e find that 1 4 - 1 9 differences h a v e a c c u m u l a t e d b e t w e e n h u m a n a n d c h i m p a n z e e at the b e t a - g l o b i n l o c u s d e p e n d i n g on w h i c h h u m a n s e q u e n c e is c o n sidered. T h i s leads to b a s e s u b s t i t u t i o n f r e q u e n c i e s ranging f r o m 1% to 1.4%. T h e range w o u l d p r o b a b l y i n c r e a s e i f n u c l e o t i d e p o l y m o r p h i s m s in c h i m p a n z e e D N A w e r e k n o w n . T h e r e f o r e , o u r data p r o v i d e direct e v i d e n c e that p o l y m o r p h i s m s r e p r e s e n t a n o - 313 Pol. Hsa, Ply. Ggo. Mcy. +60O T GTCATAGGAAGGGGAGAAGTAACAGGQTACAGTTTAGAATGGGAAACAGACGAATGATTGCATCAGTGTG T T A T T G G A A Pol. Hsa. Ptr. Ggo. Hey. AIT GAACTCTCAGGATCGTTTTAGTTTCTTTTATTTGCIGTTCATAACAATTGTTTTCTTTTGTxxTTAATTC C A C A C Ax G T GT Pol. Hsa. Ptr. Cgo. Hey. TTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTTACTATTATACTTAATGCCTTAACATTGTGTATAACAA T C T C C C x T T TGC A +700 +80O Pot. Hsa. Ptr. Cgo. Hey. AAGGAAATATCTCTGAGATACATTAAGTAACTTAAAAAAAAxACTTTACACAGTCTGCCTAGTACATTAC TG G A G T C GC G +900 Pol. Hsa. Ptr. Ggo. Ney. TATTTCGAATATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTIAATT A A G A Cx Pol. Hsa. Ptr. Cgo. Mcy. GATACATAATCATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACAxxxxxCATATTG A T CA A CATTG +I000 Pol, Haa. ACCAAATCAGGGTAATTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAATATACTTTTITGTTT Ptr. Ggo. Mcy. T IT x GC +llOO Pol. Hsa. Ptr. Ggo. Mcy. Pol. Hsa. Ptr. Ggo. Mcy. ATCTTATTTCTAATACTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTT C T A C CC T C TGCACCATTCTAAAGAATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATTTCTGCATATAxxxx G T AGTA +1200 Pol. Hsa, Ptr. Ggo. Mcy. xAATATTTCTGCATATAAATIGTAACTGATGTAAGAGGITTCATATTQCTAATAGCAGCTACAATCCAGC T T G T T +1300 Pol 9 Hsa 9 Ptt. Ggo Mcy. TACCATTCTGCTTTTATTTTATGGTTGGGATAAGGCTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGC T 9 Pol. Hsa, G T T A a TAATCATGTTCATACCTCTTATCTTCCTCCCACAGcte~ Ptr. Ggo. Moy. T +1396 Pol 9 Hsa. Ptr. tcactttggcaaaga Ggo. Mcy, F i g . 2. Continued G G G t C g 314 5'NT exon I IVS I ,,'....-.. , , posit. ; I B +83 Cc C C T T C C C C T C C c C ~ ORI I =" '~ I KAI C C chimp goril colob C C C macaq b e t a AN C C +185 m I I I , .. +59 3 3* '~ .--_. +20 ~ ~'* 112 exon I l l ".'....-... - A IVS II exon I I I , % , .. % , +511 +569 +576 +1161 +1351 A A A A C C G G G T T T C C T C T T C C G G G G C C A A G A G C C T A T T C C C T T G A G C C T T C C C C A A A A C T . C T T C C T T G G C/T C A C . . . . T C T G T C T G Fig. 3A-C. A Positions of the 10 nucleotide polymorphisms identified at the h u m a n beta-globin gene locus between positions + 1 and + 1396 (numbering is that used in Fig. 2). B Combinations of these nucleotide polymorphisms that define the seven frameworks identified in human populations. C Homologous nucleotides in the chimpanzee, gorilla, macaque, and colobus DNAs analyzed in this report. "Beta A N " is the ancestral human-gorilla--chimpanzee beta-globin gene. IVS, intervening sequence; NT, untranslated region ! . . . . . . . !KAI! ' ...... ' !Kn 3 subst, ~ q 3 subst. Irlb I _.4 2 i- Fig. 4. Maximum parsimony network for the seven h u m a n beta-globin frameworks. Major frameworks are enclosed within a continuous line, minor frameworks within a dashed line ticeable part of interspecific differences between closely related species; nucleotide polymorphisms thus introduce a bias into calculations of evolutionary distances between closely related homologous sequences. In sequence comparisons among human, chimpanzee, and gorilla eta-globin pseudogenes, Chang and Slightom (1984) found that both the chimpanzee and the gorilla pseudogenes have accumulated twice as many substitutions as their human counterpart since the time these three species began to diverge. This led these authors to conclude that the base substitution rate was lower in H o m o than in Pan and Gorilla. In contrast, in comparing the deltabeta-globin intergenie sequences of human, chimpanzee, and macaque (Savatier et al. 1987), we did not find any difference between the base substitution frequencies in Homo, 0.73 +_ 0.14, and in Pan, 0.83 315 CCCACTCTG1 ~ . . . . . . . . . . . . . . . . . . . . 1 subst. . n d . . . . . . . . . . framework 1 , . ~ 1 7 6 1 7 6 1 7 6 1 .7 . 6 . 1 . 7. 6. . . 2 . framework 2 ~ . . . . . . . . . . . . . . . . . . . . . . . . . . ~...... J ~ramework KA1 ~ f r a m e w o rH1 k . . o o ~ 1 7 6 1 7 6 1. .7 . 6. . ,. ~ ~ 1 7 6 1 7 6 ~ subst. § , ~ 1 7 6 . . . . . . . . . , , ~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 l ~ framework H2 1 ~ 1 7.6. . . . . 9 ...... 9 .... ..~ .......... ~ ..... ...~ .............. ~176 ! th 3 subst, 9 . . . . . . . . 4 . . 5 ~C~~ - ~ _ - i ~ frameworkKI1 I ~ I framework3* i o ~ 1 7 6 1 7 6 1 7 6. . . . . . . . . ~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1~ 7 6 1 7 ~6 1 1 7 76 61 17 76 61. 71. 67. 1.6 .7 . 6 subst. . . th . . . . . . . subst. . , ~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1. .7 . 6. 1 7 6 1 framework 3 . . . . . . . . . . . . . . . . . . . . . . . . . . i "'i_ . . . . . . . . ~BICAGTTCG ; ' ~-; framework OR1 _ 0.15 (P > 35%). This discrepancy might be explained in part by nucleotide p o l y m o r p h i s m s inside the c o m p a r e d regions, since both the c h i m p a n z e e and the h u m a n whose D N A s were analyzed by Chang and Slightom (1984) were different from ours. Therefore, whether base substitution rates have been equal in h u m a n and African great apes (gorilla and chimpanzee) is still an open question. We suggest that the estimation o f accurate evolutionary distances between closely related species depends on three conditions: (1) T h e main nucleotide polymorphisms inside the compared region must be identified. (2) T h e sequence c o m p a r i s o n must be between species that are as closely related as possible. [The ambiguous status o f positions + 4 9 4 , +715, and + 1 2 4 0 in our data m a y be due in part to the distance o f macaque from hominoids; identification o f the h o m o l o g o u s nucleotides in the orangutan, which is m u c h closer to the African great . . . . . . . . . . . . . . . . . . . . . . Fig. 5. Rooted p a r s i m o n i o u s tree constructed from the seven h u m a n betaglobin frameworks. Only the nine nucleotides that define these frameworks 7 6 are shown. Each connecting line between two nodes corresponds to one substitution. For each node, the position of the substitution is indicated by a lower-case white letter. F r a m e w o r k s H 1 and H2 are hypothetical intermediate frameworks (see text). Each frame- work has been classified as a one-, two-, three-, four-, or five-substitution type descendant according to the number of substitutions since the root apes than the m a c a q u e (Sibley and Ahlquist 1984; K o o p et al. 1986) might make it possible to resolve these ambiguous positions.] (3) T h e c o m p a r i s o n must be between large nucleotide segments so as to give m o r e accurate estimates o f base substitution frequencies. It is n o t e w o r t h y that the overall n u m b e r o f nucleotide p o l y m o r p h i s m s might be m u c h higher in h u m a n s than in chimpanzees because o f the large differences in effective population size. If so, the base substitution frequencies in the Pan and Homo lineages m a y not be strictly comparable. 2. Branching Patterns Within Hominines Based on Nonquantitative Methods Various quantitative m e t h o d s have been used to resolve the branching order a m o n g h u m a n , gorilla, and chimpanzee based on data for b o t h nuclear and 316 Table 1. Nucleotide positions showing base differences between the h u m a n framework 2 sequence and the chimpanzee or gorilla homologous sequence Nucleotide position in the h u m a n Lineage where substitution Nucleotide in sequence" Macaque Chimp Gorilla Colobus Human Beta A N b occurred + 168 + 181 + 254 +494 + 627 +638 +715 + 730 + 748 + 789 + 803 +804 + 810 +872 + 1048 + 1156 + 1207 + 1240 + 1250 + 1331 + 1357 C A G A C A C T G T A G A A A A G T G G C C G G G C A T T G T T G G A C G G T T G C C A G A C A T C G A A G A A A A T C G G T C A G ------------------- T A T G G G C T A T A C A C A A G C G A C C A G A/G C A C/T T G T A G A A A A G Human Chimp Human ? Human Human ? Gorilla Human Gorilla Chimp Human Chimp Human Chimp Chimp Gorilla T/C ? G G C Chimp Human Gorilla = N u m b e r e d as in Fig. 2 "Beta A N " denotes the deduced sequence o f the beta-globin gene o f the c o m m o n ancestor of h u m a n , chimpanzee, and gorilla "beta AN" ~ n m ~ O ~ m t e m genm 2 G O R I L L A C H I M P A N Z E E 3 ~ 3" ii HUMAN mitochondrial DNA: percentages of substitutions, restriction endonuclease mapping, thermal stability of D N A - D N A hybrids, and maximum parsimony analysis of nucleotide substitutions (for review, see Sibley and Ahlquist 1984; Templeton 1985; Koop et al. 1986; see also Savatier et al. 1987). Data were found to support each of the three alternative cladograms but most of these methods were not able to separate the divergence nodes with statistical significance. Why is it that there is not better agreement among these different experimental approaches? First, probably no more than 2 million years separate the or1 = kil k~t! / Fig. 6. N u m b e r o f nucleotide differences found between each present-day beta-globin gene and either the ancestral gene beta AN or the " s t e m gene" human-chimpanzee, human-gorilla, and chimpanzee-gorilla divergence nodes; this represents less than 20% of the time since human-chimpanzee-gorilla divergence [about 8 million years according to Sibley and Ahlquist (1984); see also Koop et al. (I 986)]. Since the accuracy of calculated evolutionary distances is limited either by the length of the compared nucleotide sequences or by the variability of the experimental results (as in D N A - D N A hybridization), the resolving power of any one of these techniques is not sufficient to separate the divergence nodes with statistical significance. Second, we have shown that nucleotide polymorphisms might rep- 317 resent a noticeable part of the interspecific differences among human, gorilla, and chimpanzee at the beta-globin locus. Templeton (1985) has previously noted that these interspecific differences are not evolutionarily significant unless they are much larger than the intraspecific differences. In this context, nonquantitative methods have been interpreted as more reliable. Ueda et al. (1985) found that a truncated immunoglobulin epsilon pseudogene (C epsilon 2) is present in human and gorilla but absent from the DNAs of all other catarrhines analyzed in their report. This led them to conclude that gorilla is more closely related to human than is chimpanzee. In the nucleotide sequence of the beta-globin gene, we found three cladistically informative positions. Unfortunately, these three positions lead to the three alternative hypotheses for the branching order among Homo, Pan, and Gorilla. The base difference found between primates at position +494 (A in both macaque and gorilla, G in both human and chimpanzee) leads to the single amino acid replacement (residue position 104) found between the gorilla and human and chimpanzee beta-globin polypeptides. Similarly, human and chimpanzee have arginine and aspartic acid at position A-gamma-104 and alpha23, respectively; gorilla and all other catarrhines that have been analyzed have lysine and glutamic acid, respectively, at these positions (Goodman et al. 1983). This led Goodman et al. (1983) to conclude that human and chimpanzee probably are more closely related than are human and gorilla or chimpanzee and gorilla. Ueda's and Goodman's conclusions can be reconciled if one assumes that the Homo, Pan, and Gorilla lineages arose from a common population in which the above-mentioned differences were already present as polymorphisms (i.e., both presence and absence of the C epsilon 2 immunoglobulin pseudogene, both arginine and lysine at positions beta- 104 and A-gamma- 104, both aspartic and glutamic acid at position alpha-23). The branching of Homo, Pan, and Gorilla might have preserved different combinations of these ancestral polymorphisms and discarded others. In this case, one might expect to draw different conclusions about their dichotomous branching order depending on the chromosome or recombination region analyzed. 3. Origin and Phylogeny of the Human Beta-Globin Frameworks The data presented here provide strong evidence that framework 2 was the stem or founder framework in which successive substitutions led to the present-day polymorphisms at the beta-globin locus in humans. Other polymorphisms might have ex- isted prior to the diversification of this stem framework. All but one (framework 2) apparently disappeared, although we cannot exclude that some may still be present at low frequencies. Major frameworks have been identified in most human populations so far analyzed (American black, Mediterranean, Indian, Chinese, and Cambodian), which suggests that their appearance predated racial divergence. However, framework 1 is present in more than 95% of Africans (Pagnier et al. 1985). On the other hand, framework 3 has been identified only in Mediterraneans and American blacks, and framework 3* only in South Asian populations. Since framework 3* most probably gave rise to framework 3 (Fig. 5), it might be expected that the former would be as widely distributed as the latter. Two hypotheses can be considered to explain the discrepancy: 1. Genetic distances measured on the basis of both nucleotide polymorphisms at protein loci and restriction enzyme polymorphisms provide evidence that the Negroid group and Caucaso-Mongoloid group diverged from each other first and that the Caucasoid and Mongoloid groups diverged from each other later (Nei and Roychoudhury 1972, 1982; Wainscoat et al. 1986). This proposed phylogeny requires framework 3 to have arisen prior to the Negroid-Caucaso-Mongoloid split and framework 3* to have disappeared independently from both the Negroid and Caucasoid populations. 2. According to the most parsimonious solution, both the appearance of framework 3 and disappearance of framework 3* took place in a NegroidCaucasoid stem population prior to the NegroidCaucasoid split. This implies that the Mongoloid group and the Negroid-Caucasoid group diverged from each other first and the Caucasoid and Negroid groups from each other later. Previously, only blood groups and histocompatibility locus antigens had supported this phylogeny (Cavalli-Sforza and Edwards 1964; Piazza et al. 1975). The minor frameworks (OR1, KI1, and KA1) have been identified only once each in about 40 beta-globin gene sequences. In the parsimonious network (Fig. 4), these minor frameworks appear on exterior branches and in no case do they constitute intermediate forms. Both observations suggest them to be of more recent origin than the major frameworks. Unidentified m i n o r beta-globin frameworks probably exist in human populations. The parsimonious tree presented in Fig. 5 assumes the existence of additional frameworks as intermediates. Only one of them could be easily investigated (framework H1 or H2) since both positions that define it are detectable by restriction endonuclease mapping. However, sequence data suggest that only 10% of individuals may be minor-framework car- 318 tiers, which implies that their detection depends on the improvement of methods that screen genomic DNA directly rather than on sequencing. If interm e d i a t e f r a m e w o r k s H1 a n d H 2 still exist, t h e i r geographical location might provide information on t h e p r e h i s t o r i c m i g r a t i o n s a n d g e n e t i c affinities o f human populations. H o w o l d is t h e h u m a n s t e m gene? A s s u m i n g t h a t an average of two positions have undergone nucleotide substitutions between the stem gene and the present-day frameworks and that nine substit u t i o n s h a v e b e e n fixed b e t w e e n b e t a A N a n d t h e s t e m gene, the e v o l u t i o n a r y d i s t a n c e b e t w e e n t h e stern gene a n d t h e p r e s e n t g e n e s w o u l d r e p r e s e n t a b o u t 20% o f the h u m a n - g o r i l l a - - c h i m p a n z e e d i v e r g e n c e t i m e o f a b o u t 8 m i l l i o n y e a r s , i.e., a p p r o x i m a t e l y 1.5 m i l l i o n y e a r s . N e i a n d R o y c h o u d h u r y (19 82) e s t i m a t e d t h a t t h e N e g r o i d l i n e a g e s e p a r a t e d f r o m the C a u c a s o - M o n g o l o i d l i n e a g e 120,000 y e a r s ago. T h e r e f o r e , a s s u m i n g t h a t all f o u r m a j o r frameworks arose prior to the Negroid-CaucasoM o n g o l o i d split, o u r e s t i m a t e o f 1.5 m i l l i o n y e a r s is fully c o n s i s t e n t w i t h N e i ' s . 4. The Evolution of Polymorphisms As already mentioned, our data show the absence of intermediate forms between beta AN and the s t e m gene as well as b e t w e e n t h e s t e m g e n e a n d t h e p r e s e n t - d a y f r a m e w o r k 3*. T h i s s i t u a t i o n m i g h t b e a d i r e c t c o n s e q u e n c e o f g e n e t i c drift. C o n s i d e r i n g the l i m i t e d n u m b e r o f a l l e l e s t h a t c a n b e m a i n t a i n e d in a finite p o p u l a t i o n , t h e s p r e a d i n g o f n e w a l l e l e s and the disappearance of older ones, whether by neutral drift or by linkage of a polymorphic form to a s e l e c t i v e l y a d v a n t a g e o u s m u t a t i o n c a p a b l e o f i n d u c i n g c h a n g e s in allele f r e q u e n c i e s a n d loss o f g e n e t i c v a r i a b i l i t y , will n e c e s s a r i l y b e c o n c u r r e n t . Acknowledgments. We are grateful to Dr. Ivanoff (Centre de Recherches Mrdicales de Franceville, Gabon) for providing us blood samples from gorilla. This work was supported by research grants from INSERM, CNRS, and the Commission of the European Communities. References Antonarakis SE, Kazazian HH, Orkin SH (1985) DNA polymorphism and molecular pathology of the human globin gene cluster. Hum Genet 69:1-14 Bunn HF, Forget BG, Ranney HN (1977) Human hemoglobins. WB Saunders, Philadelphia Cavalli-Sforza LL, Edwards AWF (1964) Genetics today. Pergamon, Oxford Chang LYE, Slightom JL (1984) Isolation and nucleotide sequence analysis of the beta type globin pseudogene from human, gorilla and chimpanzee. J Mol Biol 180:767-784 Goodman M, Braunitzer G, Stangl A, Schrank B (1983) Evidence on human origins from haemoglobins of African apes. Nature 303:546-548 Kazazian HH Jr, Orkin SH, Antonarakis SE, Sexton JP, Boehm CD, GoffSC, WaberPG (1984) Molecular characterization of seven beta-thalassemia mutations in Asian Indians. EMBO J 3:593-596 Kimura A, Matsunaga E, Ohta Y, Fujiyoshi T, Matsuo T, Nakamura T, Imamura T, Yanase T, Takagi Y (1983) Structure of cloned delta-globin genes from a normal subject and a patient with delta-thalassemia: sequence polymorphisms found in the delta-globin gene region of Japanese individuals. Nucleic Acids Res 10:5725-5732 Koop BF, Goodman M, Xu P, Chen K, Slightom JL (1986) Primate eta-globin DNA sequences and man's place among the great apes. Nature 319:234-237 Lawn RM, Efstratiadis A, O'Connell C, Maniatis T (1980) The nucleotide sequence of the human beta-globin gene. Cell 21: 647-651 Martin SL, Vincent KA, Wilson AC (1983) Rise and fall of the delta globin gene. J Mol Biol 164:513-528 Nei M, Roychoudhury AK (1972) Gene differences between Caucasian, Negro and Japanese populations. Science 177:434-436 Nei M, Roychoudhury AK (1982) Genetic relationship and evolution of human races. Evol Biol 14:1-59 O'Brien SJ, Nash WG, Wildt DE, Bush ME, Benveniste RE (1985) A molecular solution to the riddle of the giant panda's phylogeny. Nature 317:140-144 Orkin SH, Kazazian HH (1984) The mutation and polymorphism of the human beta-globin gene and its surrounding DNA. Annu Rev Genet 18:131-171 Orkin SH, Kazazian HH, Antonarakis SE, GoffSC, Boehm CD, Sexton JP, Waber PG, Giardina JV (1982) Linkage ofbetathalassemia mutations and beta-globin gene polymorphisms with DNA polymorphisms in human beta-globin gene cluster. Nature 296:627-631 Pagnier J, Mears JG, Dunda-Belkhodja O, Schaefer-Rego KE, Beldjord C, Nagel RL, Labie D (1985) Evidence for the multicentric origin of the sickle cell hemoglobin gene in Africa. Proc Natl Acad Sci USA 81:1771-1773 Piazza A, Sgaranella-Zonta L, Gluckman P, Cavalli-Sforza LL (1975) The 5th histoeompatibility workshop gene frequency data: phylogenetic analysis. Tissue Antigens 5:445-463 Poncz M, Schwartz E, Ballantine M, Surrey S (1983) Nucleotide sequence analysis of the delta-beta globin gene region in humans. J Biol Chem 258:11599-11609 Savatier P, Trabuchet G, Faure C, Chebloune Y, Gouy M, Verdict G, Nigon VM (1985) Evolution of the primate betaglobin gene region: high rate of variation in CpG dinucleotides and in short repeated sequences between man and chimpanzee. J Mol Biol 182:21-29 Savatier P, Trabuchet G, Chebloune Y, Faure C, Verdier G, Nigon VM (1987) Nucleotide sequence of the delta-betaglobin intergenic segment in the macaque: structure and evolutionary rates in higher primates. J Mol Evol 24:297-308 Sibley CG, Ahlquist JE (1984) The phylogeny of the hominoid primates as indicated by DNA-DNA hybridization. J Mol Evol 20:2-15 Templeton AR (1985) The phylogeny of the hominoid primates: a statistical analysis of the DNA-DNA hybridization data. Mol Biol Evol 2:420-433 Ueda S, Takenaka O, Honjo T (1985) A truncated immunoglobulin epsilon pseudogene is found in gorilla and man but not in chimpanzee. Proc Natl Acad Sci USA 82:3712-3715 Wainscoat JS, Hill AVS, Boyce AL, Fint J, Hernandez M, Thein SL, Old JM, Lynch JR, Falusi AG, Weatherall DJ, Clegg JB (1986) Evolutionary relationships of human populations from an analysis of nuclear DNA polymorphisms. Nature 319:491493 Received June 17, 1986/Revised September 5, 1986
© Copyright 2026 Paperzz