Nucleotide sequence of the beta-globin genes in gorilla and

J Mol Evol (1987) 24:309-318
Journal of
Molecular Evolution
(~) Spfinger-Verlag New York Inc. 1987
Nucleotide Sequence of the Beta-Globin Genes in Gorilla and Macaque:
The Origin of Nucleotide Polymorphisms in Human
P. Savatier, G. Trabuchet, Y. Chebloune, C. Faure, G. Verdier, and V.M. Nigon
D6partement de BiologieG6n6raleet Appliqu~e--UA92, Universit~Claude Bernard--LyonI, 43 Boulevarddu 11 novembre 1918,
69622 VilleurbanneCedex, France
Part ofthebeta-globingenes of Macaca
cynomolgus and Gorilla gorilla has been cloned and
sequenced. Ten putatively neutral nucleotide polymorphisms have been described at the beta-globin
locus in humans. They are associated in seven combinations, which define seven different haplotypes
o f the beta-globin gene: four major frameworks-- 1,
2, 3, and 3*--and three minor frameworks, which
we term KI1, KA1, and OR1. The nucleotide sequences o f these frameworks are compared with
those o f homologous sequences in chimpanzee, colobus, macaque, and gorilla. This comparison provides strong evidence that framework 2 was the earliest f r a m e w o r k in the h u m a n lineage. F r o m
framework 2, a rooted parsimonious tree for the six
other frameworks is constructed. This phylogenetic
tree is discussed in terms of the evolution o f nucleotide polymorphisms as well as in terms of genetic affinities between human populations.
For each position at which there is base difference
in comparing human, gorilla, and chimpanzee betaglobin genes, the phyletic lineage where the corresponding substitution occurred has been identified
using the max imu m parsimony procedure. The data
provide evidence that polymorphisms may represent a significant component of differences between
closely related species. If so, nucleotide polymorphisms may strongly bias estimates o f small evolutionary distances.
Summary.
Offprint requests to: P. Savatier
Key words: D N A sequencing -- D N A evolution
- - Macaque -- Gorilla -- Beta-globin gene -- NucIeotide polymorphisms -- Intraspecific versus interspecific variations -- Hominine phylogeny
Introduction
Many studies based on D N A and protein comparisons have attempted to resolve the dichotomous
branching order among the Hominoidea (human,
chimpanzee, gorilla, orangutan, and gibbon). Human, chimpanzee, and gorilla are undoubtedly the
three most closely related species. However, the resolving power o f most techniques does not allow
separation of their respective branching nodes, leading to the suspicion that the branchings occurred
within a short time of one another (Sibley and
Ahlquist 1984; O'Brien et aL 1985; Templeton 1985;
Koop et al. 1986; Savatier et al. 1987). Mitochondrial restriction site data favor a closer relationship
between chimpanzee and gorilla (Templeton 1985),
but alternative phylogenies cannot be rejected. Recently, the identification o f an additional immunoglobulin C-epsilon pseudogene in human and gorilla DNAs suggested that these two species are the
most closely related (Ueda et al. 1985).
The first complete nucleotide sequence of the human beta-globin gene was determined by Lawn et
al. (1980). Part or all o f about 40 additional nucleotide sequences have since been reported from
the D N A of geographically scattered individuals.
310
These sequences show 10 silent, putatively neutral
nucleotide polymorphisms (for a review, see Orkin
and Kazazian 1984; Antonarakis et al. 1985) that
define four major haplotypes of the human betaglobin gene: frameworks I, 2, 3, and 3*. Since the
four frameworks are found together in different human populations, their original appearance may have
predated racial divergence.
In this report, we compare the nucleotide sequences of these various human beta-globin frameworks with the homologous sequences in chimpanzee (Pan troglodytes, Savatier et al. 1985) and colobus
monkey (Colobus polykomos, Martin et al. 1983).
We also compare them with the homologous sequences in macaque (Macaca cynomolgus) and lowland gorilla (Gorilla gorilla), which are reported in
this article. Using the maximum parsimony procedure, we constructed a nucleotide sequence for
the beta-globin gene of the common ancestor of
Homo, Pan, and Gorilla. Based on this ancestral
sequence, we analyze the origin of nucleotide differences among chimpanzee, gorilla, and human at
the beta-globin locus and we propose a rooted phylogenetic tree for the human beta-globin frameworks. We also analyze the effects of intraspecific
nucleotide variation on estimates of evolutionary
distances between closely related species. Base substitution rates and hominoid phylogeny are discussed in light of these results.
Materials and Methods
The strategyfor sequencingthe gorilla beta-globin gene is presented in Fig. 1. Savatier et al. (1987) gives the DNA isolation,
cloning, and sequencingprocedures.
Results
Nucleotide sequences of a macaque beta-globin gene
and a gorilla beta-globin gene were determined and
then compared with the orthologous sequences from
human (Poncz et al. 1983), chimpanzee (Savatier et
al. 1985), and colobus monkey (Martin et al. 1983)
(Fig. 2). The comparison was restricted to a 1396nucleotide-long segment (positions + 1 to + 1396
with respect to the cap site).
In this comparison, we have included putatively
neutral nucleotide polymorphisms that have been
identified at the beta-globin locus in human, and
excluded m u t a t i o n s responsible for beta-chain
hemoglobinopathies and beta-thalassemias. We have
also excluded structural variations of the beta-globin polypeptide, even though some of them are putatively neutral (Bunn et al. 1977), because for most
of them the replacement substitution has not been
identified at the DNA level. We have considered
therefore only 10 neutral nucleotide polymorphisms, which have been identified from nucleotide
sequencings of about 40 human beta-globin genes
carried out by Lawn et al. (1980), Orkin et al. (1982),
Kimura et al. (1983), Poncz et al. (1983), Kazazian
et al. (1984), Orkin and Kazazian (1984), and Antonarakis et al. (1985).
1. Positions Involved in Nucleotide Polymorphisms
in Humans
The 10 nucleotide polymorphisms mentioned above
are shown in Fig. 2. They occur at nine nucleotide
positions, of which eight display two different nucleotides and one (position +569) displays three
(Figs. 2 and 3B).
Seven combinations of these polymorphisms have
been identified in human populations, which define
seven haplotypes or frameworks of the human betaglobin gene (Fig. 3B). Four of them (major frameworks 1, 2, 3, and 3*) are found widely among different human populations. The other three occur
only once each in the 40 reported human beta-globin
gene sequences. One of these three frameworks has
been identified by restriction endonuclease analysis
in two other individuals. We term the three minor
frameworks OR1, KI1, and KA1 in reference to
Orkin et al. (1982), Kimura et al. (1983), and Kazazian et al. (1984), who first identified the corresponding nucleotide polymorphisms.
The seven frameworks can be derived from each
other by more than one branching route. The route
illustrated in Fig. 4 corresponds to the maximum
parsimony solution. Framework KI 1 can be derived
from either framework 1 or framework 2; the two
solutions are equally parsimonious (three substitutions are necessary in both cases). The maximum
parsimony network costs 10 substitutions.
Making use of the comparisons between human
and nonhuman primate sequences shown in Fig. 2,
we can identify framework 2 as the most primitive.
At every position involved in a nucleotide polymorphism (except +59 and + 511), the nucleotides
found in human framework 2 are identical to the
homologous nucleotides in the macaque, gorilla,
chimpanzee, and colobus DNAs (Fig. 3C).
Identification of framework 2 as the most primitive framework allows the determination of the nucleotide of the ancestral human--chimpanzee-gorilla
gene (which we call "beta AN") at each of the seven
polymorphic positions (Fig. 3C). This ancestral sequence remains ambiguous at position +59 (T in
macaque and colobus, C in gorilla and chimpanzee).
At position + 511, the nucleotide found in gorilla
(T) is the only one different from that o f framework
311
,
5"
1 kb
i
b ~ a I:
E
I
,k
nr-i.
3a
t~
I
Eco *
S
-4000
i
-3000
r
-2000
i
-1000
Bam
*
ECO
.I
Fig. 1. Linkage maps of the gorilla delta- and beta-globin gene region deduced from blot hybridization of genomic DNA (data not
shown). The map of the "Go beta EcoRI 6.4kb" clone is shown expanded. Both sequencing and restriction endonuclease mapping
provided evidence that the extra 900 nucleotides found in the gorilla 6.4-kb insert with respect to both the human and chimpanzee
5.5-kb inserts come from an additional segment about 200 nucleotides 3' of the intergenic EcoRI site (data not shown). The strategy
for sequencing the gorilla beta-globin gene is indicated by the horizontal arrows below the maps, indicating sequencing either from
restriction endonuclease sites (filled circles) or from Bal31-generated ends (open circles). E or Eco, EcoRI; Barn, BamHI. In bar at
right, the hatched area represents the 5' untranslated region, the filled areas represent introns, and the open areas represent exons
2 (C). It is also different f r o m the nucleotide f o u n d
in the other f r a m e w o r k s (G). Therefore, we suggest
that m o s t p r o b a b l y (1) C was present at position
+ 511 in beta A N , (2) a C -~ T substitution occurred
at this position in the gorilla lineage, and (3) a C -~
G substitution occurred in s o m e h u m a n s .
A rooted p a r s i m o n i o u s tree was constructed f r o m
the seven h u m a n beta-globin f r a m e w o r k s (Fig. 5).
F r a m e w o r k 1 and f r a m e w o r k KA1 were derived
directly f r o m the root node ( f r a m e w o r k 2) through
only one substitution each. F r a m e w o r k s 3", 3, a n d
O R I derive f r o m the root b y three, four, a n d five
substitutions, respectively. T h i s implies the existence o f two hypothetical i n t e r m e d i a t e f r a m e w o r k s
between f r a m e w o r k 2 and f r a m e w o r k 3* that h a v e
not yet been identified in h u m a n populations. We
call these f r a m e w o r k s H1 and H2. One o f t h e m
could be easily investigated since positions + 59 a n d
+ 511, which define it, can be detected b y the restriction endonucleases HgiA1 a n d A v a i l , respectively.
2. Positions Not Involved in Nucleotide
Polymorphisms in Humans
At 21 nucleotide positions there are base differences
between f r a m e w o r k 2 a n d c h i m p a n z e e or f r a m e w o r k 2 and gorilla (Fig. 2). T h i s implies that the 21
corresponding substitutions arose either in the p h y letic lineages leading to c h i m p a n z e e or gorilla or in
the lineage leading to h u m a n p r i o r to the emergence
o f h u m a n beta-globin gene alleles. M o s t o f these 21
substitution sites are in introns; five are at silent
positions in coding sequences. O n e difference leads
to an a m i n o acid r e p l a c e m e n t (residue position 104,
Arg ~ Lys) between h u m a n and gorilla, in agreem e n t with the c o r r e s p o n d i n g a m i n o acid sequences
( G o o d m a n et al. 1983).
For each o f these 21 nucleotide positions, let us
consider the h o m o l o g o u s nucleotides in h u m a n ,
c h i m p a n z e e , m a c a q u e , gorilla, a n d (as far as known)
in colobus (Table 1). Eighteen belong to the onesolution class. T h a t is, for each o f these 18 positions,
four o f the five species c o m p a r e d b e a r the s a m e
nucleotide; only one o f the three species h u m a n ,
c h i m p a n z e e , a n d gorilla bears a nucleotide that is
different f r o m the one f o u n d in the other four. T h e
m a x i m u m p a r s i m o n y p r o c e d u r e depicts the nucleotide that is f o u n d four t i m e s (or three times w h e n
the h o m o l o g o u s nucleotide in colobus is u n k n o w n )
as already present in the last c o m m o n ancestor o f
h u m a n , c h i m p a n z e e , a n d gorilla. A substitution is
taken to h a v e occurred in the phyletic lineage corresponding to the nucleotide represented only once.
In the a s s i g n m e n t o f the 18 substitutions to the three
phyletic lineages (Table 1), we find 4 substitutions
in the gorilla lineage, 6 in the c h i m p a n z e e , a n d 8 in
the h u m a n .
F o r each o f the three r e m a i n i n g positions ( 4 4 9 4 ,
+ 7 1 5 , and + 1240, T a b l e 1), the nucleotide o f beta
A N c a n n o t be d e t e r m i n e d since each nucleotide is
represented twice. These three sites a p p e a r to be
cladistically i n f o r m a t i v e . U n f o r t u n a t e l y , each position results in a different p a r s i m o n y cladogram.
One (position 4 4 9 4 ) groups h u m a n with c h i m p a n zee, a n o t h e r (position + 715) c h i m p a n z e e with gorilla, and the third (position + 1240) h u m a n with
gorilla. T h u s the data do nothing to resolve the trichotomy.
312
+1
polymorph.
H.sapiens
P.troglodytes
C.gorille
M.eynomolgus
C.polykomos
Po],
HsB.
Ptr,
Ggo,
Mcy.
Cpo.
t"
ACATTTGCTTCTGACACAACTGTCTTCACTAGCAACCTCAAACAGACACCATGgtgcaoctga
C
C
t
t
0
T
+100
etcctgsggBgeegtotgccqttaotgocetgtggggeaaggtgaacgtggstgaagttggtggtgagge
BB
g
t
c
CB
C
Cpo.
+200
G
cctgggcBggTTGGTATCAAGGTTACAAfiACACGTTTAAGCAGACCAATAGAAACTGGGCATGTGGAGAC
C
G
C
TC
A GA
0
t
C
GA
G
Pol.
Hga.
Ptr.
Ggo.
Mmy.
Cpo,
AGAGAAGACTCTTGCGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTCTATTTTCCCACCCITAGG
G
fi
G
xxxxxxxx
G
Pol.
HBS,
Ptr,
Ggo.
Mcy.
Pol,
Hsa.
Ptr,
Ggo.
+300
etgctggtggtetaeccttggaeeeagaggttctttgagteetttggggatetgtcceetcctgatgetg
Hey.
Cpo.
Pol,
Hse,
Ptr.
Ggo,
Mcy.
Cpo,
Pol.
Hsa.
Ptr.
r,go,
Mcy.
Cpo.
PoZ.
HBB,
Ptr.
Ggo.
Hey.
t
t
+400
ttatgggcaaeectaaggtgeeggctcatggeaegeaagtgctcggtgeetttagtgatggoctggetca
t
B@
cctggacaacctcaagggeacctttgceacactgagtgagctgcactgtgacaagctgeaegtggatcct
t
oag c
tag
t
t
/
+500
"G
gBgaeottcBggGTGAGTCTATCGGACCCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCAT
a
a
T
Number of Nucleotide Differences Between the
Ancestral "'Beta A N " Gene and Each Present-Day
Beta-Globin Gene
T h e results r e p o r t e d in s e c t i o n s 1 a n d 2 are s u m m a r i z e d in Fig. 6. W e shall a s s u m e the t r i c h o t o m y
a m o n g Homo, Pan, and Gorilla. In the p h y l e t i c lineage o f Homo, w e h a v e e s t i m a t e d eight differences
b e t w e e n beta A N and the s t e m gene ( e x c l u d i n g the
three cladistically i n f o r m a t i v e p o s i t i o n s ; see T a b l e
1). Similarly, w e h a v e e s t i m a t e d six n u c l e o t i d e differences b e t w e e n the ancestral b e t a - g l o b i n g e n e a n d
the c h i m p a n z e e b e t a - g l o b i n gene a n d five n u c l e o t i d e
differences b e t w e e n the ancestral b e t a - g l o b i n g e n e
a n d the gorilla b e t a - g l o b i n gene. Finally, b e t w e e n
the s t e m g e n e and the h u m a n b e t a - g l o b i n gene w e
h a v e c o u n t e d z e r o , o n e , three, four, or five differe n c e s , d e p e n d i n g o n the f r a m e w o r k .
A
C
Fig. 2. Nucleotide sequence of a human (Hsa.)
beta-globin gene (Poncz et al. 1983), aligned with
beta-globin gene sequences from a chimpanzee
(Ptr.), a gorilla (Ggo.), a colobus (Cpo.), and a
macaque (Mcy.) (Martin et al. 1983; Savatier et
al. 1985). Nucleotide polymorphisms (Pol.) idenlifted in human DNA by Lawn et al. (1980), Orkin et al. (]982), Kimura et al. (1983), Poncz et
al. (1983), Kazazian et al. (1984), Orkin and Kazazian (1984), and Antonarakis et al. (1985) are
also shown. Numbering is from the human cap
site. Gaps used to maximize identities in introns
are indicated by x's, and the end of the sequenced
colobus segment by a slash. Exons are reported in
upper-case letters. At position + 1156, the chimpanzee sequence has G instead of the A as previously reported in Savatier et al. (1985). The complete nucleotide sequence for the human betaglobin gene given by Poncz et al. (1983) is given
on the second line. This nucleotide sequence is
that of framework 2 (see section 1 of Results).
Only nucleotide differences found between either
species or alleles are indicated
Discussion
1. Implications of Nucleotide Polymorphisms for the
Estimation of Genetic Distances Between Closely
Related Species
Our data g i v e n e w i n f o r m a t i o n c o n c e r n i n g the e v o l u t i o n a r y d i s t a n c e s a m o n g h u m a n , c h i m p a n z e e , and
gorilla.
W e find that 1 4 - 1 9 differences h a v e a c c u m u l a t e d
b e t w e e n h u m a n a n d c h i m p a n z e e at the b e t a - g l o b i n
l o c u s d e p e n d i n g on w h i c h h u m a n s e q u e n c e is c o n sidered. T h i s leads to b a s e s u b s t i t u t i o n f r e q u e n c i e s
ranging f r o m 1% to 1.4%. T h e range w o u l d p r o b a b l y
i n c r e a s e i f n u c l e o t i d e p o l y m o r p h i s m s in c h i m p a n z e e D N A w e r e k n o w n . T h e r e f o r e , o u r data p r o v i d e
direct e v i d e n c e that p o l y m o r p h i s m s r e p r e s e n t a n o -
313
Pol.
Hsa,
Ply.
Ggo.
Mcy.
+60O
T
GTCATAGGAAGGGGAGAAGTAACAGGQTACAGTTTAGAATGGGAAACAGACGAATGATTGCATCAGTGTG
T
T
A
T
T
G G
A
A
Pol.
Hsa.
Ptr.
Ggo.
Hey.
AIT
GAACTCTCAGGATCGTTTTAGTTTCTTTTATTTGCIGTTCATAACAATTGTTTTCTTTTGTxxTTAATTC
C
A
C
A
C
Ax
G
T
GT
Pol.
Hsa.
Ptr.
Cgo.
Hey.
TTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTTACTATTATACTTAATGCCTTAACATTGTGTATAACAA
T
C
T
C
C
C
x
T
T
TGC A
+700
+80O
Pot.
Hsa.
Ptr.
Cgo.
Hey.
AAGGAAATATCTCTGAGATACATTAAGTAACTTAAAAAAAAxACTTTACACAGTCTGCCTAGTACATTAC
TG
G
A
G
T
C GC
G
+900
Pol.
Hsa.
Ptr.
Ggo.
Ney.
TATTTCGAATATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTIAATT
A
A
G
A
Cx
Pol.
Hsa.
Ptr.
Cgo.
Mcy.
GATACATAATCATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACAxxxxxCATATTG
A
T
CA
A
CATTG
+I000
Pol,
Haa.
ACCAAATCAGGGTAATTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAATATACTTTTITGTTT
Ptr.
Ggo.
Mcy.
T
IT
x
GC
+llOO
Pol.
Hsa.
Ptr.
Ggo.
Mcy.
Pol.
Hsa.
Ptr.
Ggo.
Mcy.
ATCTTATTTCTAATACTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTT
C
T
A
C
CC
T
C
TGCACCATTCTAAAGAATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATTTCTGCATATAxxxx
G
T
AGTA
+1200
Pol.
Hsa,
Ptr.
Ggo.
Mcy.
xAATATTTCTGCATATAAATIGTAACTGATGTAAGAGGITTCATATTQCTAATAGCAGCTACAATCCAGC
T
T
G
T
T
+1300
Pol 9
Hsa 9
Ptt.
Ggo
Mcy.
TACCATTCTGCTTTTATTTTATGGTTGGGATAAGGCTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGC
T
9
Pol.
Hsa,
G T
T
A
a
TAATCATGTTCATACCTCTTATCTTCCTCCCACAGcte~
Ptr.
Ggo.
Moy.
T
+1396
Pol 9
Hsa.
Ptr.
tcactttggcaaaga
Ggo.
Mcy,
F i g . 2.
Continued
G
G
G
t
C
g
314
5'NT exon I
IVS I
,,'....-..
,
,
posit.
; I
B
+83
Cc
C
C
T
T
C
C
C
C
T
C
C
c
C
~ ORI
I
=" '~ I KAI
C
C
chimp
goril
colob
C
C
C
macaq
b e t a AN
C
C
+185
m
I
I
I
,
..
+59
3
3*
'~
.--_.
+20
~ ~'* 112
exon I l l
".'....-...
-
A
IVS II
exon I I
I
,
%
,
..
%
,
+511
+569
+576
+1161
+1351
A
A
A
A
C
C
G
G
G
T
T
T
C
C
T
C
T
T
C
C
G
G
G
G
C
C
A
A
G
A
G
C
C
T
A
T
T
C
C
C
T
T
G
A
G
C
C
T
T
C
C
C
C
A
A
A
A
C
T
.
C
T
T
C
C
T
T
G
G
C/T
C
A
C
.
.
.
.
T
C
T
G
T
C
T
G
Fig. 3A-C. A Positions of the 10 nucleotide polymorphisms identified at the h u m a n beta-globin gene locus between positions + 1
and + 1396 (numbering is that used in Fig. 2). B Combinations of these nucleotide polymorphisms that define the seven frameworks
identified in human populations. C Homologous nucleotides in the chimpanzee, gorilla, macaque, and colobus DNAs analyzed in
this report. "Beta A N " is the ancestral human-gorilla--chimpanzee beta-globin gene. IVS, intervening sequence; NT, untranslated
region
! .
.
.
.
.
.
.
!KAI!
' ......
'
!Kn
3 subst,
~
q
3 subst.
Irlb I _.4
2 i-
Fig. 4. Maximum parsimony network for the seven h u m a n beta-globin frameworks. Major frameworks are enclosed within a
continuous line, minor frameworks within a dashed line
ticeable part of interspecific differences between
closely related species; nucleotide polymorphisms
thus introduce a bias into calculations of evolutionary distances between closely related homologous
sequences.
In sequence comparisons among human, chimpanzee, and gorilla eta-globin pseudogenes, Chang
and Slightom (1984) found that both the chimpanzee and the gorilla pseudogenes have accumulated
twice as many substitutions as their human counterpart since the time these three species began to
diverge. This led these authors to conclude that the
base substitution rate was lower in H o m o than in
Pan and Gorilla. In contrast, in comparing the deltabeta-globin intergenie sequences of human, chimpanzee, and macaque (Savatier et al. 1987), we did
not find any difference between the base substitution
frequencies in Homo, 0.73 +_ 0.14, and in Pan, 0.83
315
CCCACTCTG1
~
. . . . . . . . . . . . . . . . . . . .
1
subst.
.
n d
.
.
.
.
.
.
.
.
.
.
framework 1
, . ~ 1 7 6 1 7 6 1 7 6 1 .7 . 6 . 1 . 7. 6. . .
2
.
framework 2 ~
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . .
~......
J
~ramework
KA1
~ f r a m e w o rH1
k
. . o o ~ 1 7 6 1 7 6 1. .7 . 6. .
,.
~ ~ 1 7 6 1 7 6
~
subst.
§
, ~ 1 7 6 . . . . . . . . .
, , ~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6
l
~ framework H2
1
~ 1 7.6. . . . .
9 ......
9 ....
..~
..........
~
.....
...~
..............
~176
!
th
3
subst,
9 . . . . . . . .
4
.
.
5
~C~~ - ~ _ - i ~
frameworkKI1
I ~ I
framework3*
i
o ~ 1 7 6 1 7 6 1 7 6. . . . . . . . .
~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1~ 7 6 1 7 ~6 1 1 7 76 61 17 76 61. 71. 67. 1.6 .7 . 6
subst.
.
.
th
.
.
.
.
.
.
.
subst.
.
, ~ 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1 7 6 1. .7 . 6. 1 7 6 1
framework 3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
i
"'i_
.
.
.
.
.
.
.
.
~BICAGTTCG ;
'
~-;
framework OR1
_ 0.15 (P > 35%). This discrepancy might be explained in part by nucleotide p o l y m o r p h i s m s inside
the c o m p a r e d regions, since both the c h i m p a n z e e
and the h u m a n whose D N A s were analyzed by Chang
and Slightom (1984) were different from ours.
Therefore, whether base substitution rates have been
equal in h u m a n and African great apes (gorilla and
chimpanzee) is still an open question.
We suggest that the estimation o f accurate evolutionary distances between closely related species
depends on three conditions: (1) T h e main nucleotide polymorphisms inside the compared region must
be identified. (2) T h e sequence c o m p a r i s o n must be
between species that are as closely related as possible. [The ambiguous status o f positions + 4 9 4 ,
+715, and + 1 2 4 0 in our data m a y be due in part
to the distance o f macaque from hominoids; identification o f the h o m o l o g o u s nucleotides in the
orangutan, which is m u c h closer to the African great
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Fig. 5. Rooted p a r s i m o n i o u s tree constructed from the seven h u m a n betaglobin frameworks. Only the nine nucleotides that define these frameworks
7 6
are shown. Each connecting line between two nodes corresponds to one
substitution. For each node, the position of the substitution is indicated by
a lower-case white letter. F r a m e w o r k s
H 1 and H2 are hypothetical intermediate frameworks (see text). Each frame-
work has been classified as a one-,
two-, three-, four-, or five-substitution
type descendant according to the number of substitutions since the root
apes than the m a c a q u e (Sibley and Ahlquist 1984;
K o o p et al. 1986) might make it possible to resolve
these ambiguous positions.] (3) T h e c o m p a r i s o n
must be between large nucleotide segments so as to
give m o r e accurate estimates o f base substitution
frequencies.
It is n o t e w o r t h y that the overall n u m b e r o f nucleotide p o l y m o r p h i s m s might be m u c h higher in
h u m a n s than in chimpanzees because o f the large
differences in effective population size. If so, the
base substitution frequencies in the Pan and Homo
lineages m a y not be strictly comparable.
2. Branching Patterns Within Hominines Based on
Nonquantitative Methods
Various quantitative m e t h o d s have been used to
resolve the branching order a m o n g h u m a n , gorilla,
and chimpanzee based on data for b o t h nuclear and
316
Table 1. Nucleotide positions showing base differences between the h u m a n framework 2 sequence and the chimpanzee or gorilla
homologous sequence
Nucleotide
position in
the h u m a n
Lineage
where
substitution
Nucleotide in
sequence"
Macaque
Chimp
Gorilla
Colobus
Human
Beta A N b
occurred
+ 168
+ 181
+ 254
+494
+ 627
+638
+715
+ 730
+ 748
+ 789
+ 803
+804
+ 810
+872
+ 1048
+ 1156
+ 1207
+ 1240
+ 1250
+ 1331
+ 1357
C
A
G
A
C
A
C
T
G
T
A
G
A
A
A
A
G
T
G
G
C
C
G
G
G
C
A
T
T
G
T
T
G
G
A
C
G
G
T
T
G
C
C
A
G
A
C
A
T
C
G
A
A
G
A
A
A
A
T
C
G
G
T
C
A
G
-------------------
T
A
T
G
G
G
C
T
A
T
A
C
A
C
A
A
G
C
G
A
C
C
A
G
A/G
C
A
C/T
T
G
T
A
G
A
A
A
A
G
Human
Chimp
Human
?
Human
Human
?
Gorilla
Human
Gorilla
Chimp
Human
Chimp
Human
Chimp
Chimp
Gorilla
T/C
?
G
G
C
Chimp
Human
Gorilla
=
N u m b e r e d as in Fig. 2
"Beta A N " denotes the deduced sequence o f the beta-globin gene o f the c o m m o n ancestor of h u m a n , chimpanzee, and gorilla
"beta AN"
~ n m
~ O ~ m t e
m genm
2
G O R I L L A
C H I M P A N Z E E
3
~
3"
ii
HUMAN
mitochondrial DNA: percentages of substitutions,
restriction endonuclease mapping, thermal stability
of D N A - D N A hybrids, and maximum parsimony
analysis of nucleotide substitutions (for review, see
Sibley and Ahlquist 1984; Templeton 1985; Koop
et al. 1986; see also Savatier et al. 1987). Data were
found to support each of the three alternative cladograms but most of these methods were not able to
separate the divergence nodes with statistical significance.
Why is it that there is not better agreement among
these different experimental approaches? First,
probably no more than 2 million years separate the
or1
=
kil
k~t!
/
Fig. 6. N u m b e r o f nucleotide differences found between each present-day
beta-globin gene and either the ancestral gene beta AN or the " s t e m gene"
human-chimpanzee, human-gorilla, and chimpanzee-gorilla divergence nodes; this represents less than
20% of the time since human-chimpanzee-gorilla
divergence [about 8 million years according to Sibley and Ahlquist (1984); see also Koop et al. (I 986)].
Since the accuracy of calculated evolutionary distances is limited either by the length of the compared
nucleotide sequences or by the variability of the
experimental results (as in D N A - D N A hybridization), the resolving power of any one of these techniques is not sufficient to separate the divergence
nodes with statistical significance. Second, we have
shown that nucleotide polymorphisms might rep-
317
resent a noticeable part of the interspecific differences among human, gorilla, and chimpanzee at the
beta-globin locus. Templeton (1985) has previously
noted that these interspecific differences are not evolutionarily significant unless they are much larger
than the intraspecific differences.
In this context, nonquantitative methods have
been interpreted as more reliable. Ueda et al. (1985)
found that a truncated immunoglobulin epsilon
pseudogene (C epsilon 2) is present in human and
gorilla but absent from the DNAs of all other catarrhines analyzed in their report. This led them to
conclude that gorilla is more closely related to human than is chimpanzee.
In the nucleotide sequence of the beta-globin gene,
we found three cladistically informative positions.
Unfortunately, these three positions lead to the three
alternative hypotheses for the branching order among
Homo, Pan, and Gorilla. The base difference found
between primates at position +494 (A in both macaque and gorilla, G in both human and chimpanzee) leads to the single amino acid replacement (residue position 104) found between the gorilla and
human and chimpanzee beta-globin polypeptides.
Similarly, human and chimpanzee have arginine and
aspartic acid at position A-gamma-104 and alpha23, respectively; gorilla and all other catarrhines that
have been analyzed have lysine and glutamic acid,
respectively, at these positions (Goodman et al.
1983). This led Goodman et al. (1983) to conclude
that human and chimpanzee probably are more
closely related than are human and gorilla or chimpanzee and gorilla.
Ueda's and Goodman's conclusions can be reconciled if one assumes that the Homo, Pan, and
Gorilla lineages arose from a common population
in which the above-mentioned differences were already present as polymorphisms (i.e., both presence
and absence of the C epsilon 2 immunoglobulin
pseudogene, both arginine and lysine at positions
beta- 104 and A-gamma- 104, both aspartic and glutamic acid at position alpha-23). The branching of
Homo, Pan, and Gorilla might have preserved different combinations of these ancestral polymorphisms and discarded others. In this case, one might
expect to draw different conclusions about their dichotomous branching order depending on the chromosome or recombination region analyzed.
3. Origin and Phylogeny of the Human Beta-Globin
Frameworks
The data presented here provide strong evidence
that framework 2 was the stem or founder framework in which successive substitutions led to the
present-day polymorphisms at the beta-globin locus
in humans. Other polymorphisms might have ex-
isted prior to the diversification of this stem framework. All but one (framework 2) apparently disappeared, although we cannot exclude that some may
still be present at low frequencies.
Major frameworks have been identified in most
human populations so far analyzed (American black,
Mediterranean, Indian, Chinese, and Cambodian),
which suggests that their appearance predated racial
divergence. However, framework 1 is present in more
than 95% of Africans (Pagnier et al. 1985). On the
other hand, framework 3 has been identified only
in Mediterraneans and American blacks, and framework 3* only in South Asian populations. Since
framework 3* most probably gave rise to framework
3 (Fig. 5), it might be expected that the former would
be as widely distributed as the latter. Two hypotheses can be considered to explain the discrepancy:
1. Genetic distances measured on the basis of
both nucleotide polymorphisms at protein loci and
restriction enzyme polymorphisms provide evidence that the Negroid group and Caucaso-Mongoloid group diverged from each other first and that
the Caucasoid and Mongoloid groups diverged from
each other later (Nei and Roychoudhury 1972, 1982;
Wainscoat et al. 1986). This proposed phylogeny
requires framework 3 to have arisen prior to the
Negroid-Caucaso-Mongoloid split and framework
3* to have disappeared independently from both the
Negroid and Caucasoid populations.
2. According to the most parsimonious solution,
both the appearance of framework 3 and disappearance of framework 3* took place in a NegroidCaucasoid stem population prior to the NegroidCaucasoid split. This implies that the Mongoloid
group and the Negroid-Caucasoid group diverged
from each other first and the Caucasoid and Negroid
groups from each other later. Previously, only blood
groups and histocompatibility locus antigens had
supported this phylogeny (Cavalli-Sforza and Edwards 1964; Piazza et al. 1975).
The minor frameworks (OR1, KI1, and KA1)
have been identified only once each in about 40
beta-globin gene sequences. In the parsimonious
network (Fig. 4), these minor frameworks appear on
exterior branches and in no case do they constitute
intermediate forms. Both observations suggest them
to be of more recent origin than the major frameworks.
Unidentified m i n o r beta-globin frameworks
probably exist in human populations. The parsimonious tree presented in Fig. 5 assumes the existence of additional frameworks as intermediates.
Only one of them could be easily investigated
(framework H1 or H2) since both positions that
define it are detectable by restriction endonuclease
mapping. However, sequence data suggest that only
10% of individuals may be minor-framework car-
318
tiers, which implies that their detection depends on
the improvement of methods that screen genomic
DNA directly rather than on sequencing. If interm e d i a t e f r a m e w o r k s H1 a n d H 2 still exist, t h e i r
geographical location might provide information on
t h e p r e h i s t o r i c m i g r a t i o n s a n d g e n e t i c affinities o f
human populations.
H o w o l d is t h e h u m a n s t e m gene? A s s u m i n g t h a t
an average of two positions have undergone nucleotide substitutions between the stem gene and
the present-day frameworks and that nine substit u t i o n s h a v e b e e n fixed b e t w e e n b e t a A N a n d t h e
s t e m gene, the e v o l u t i o n a r y d i s t a n c e b e t w e e n t h e
stern gene a n d t h e p r e s e n t g e n e s w o u l d r e p r e s e n t
a b o u t 20% o f the h u m a n - g o r i l l a - - c h i m p a n z e e d i v e r g e n c e t i m e o f a b o u t 8 m i l l i o n y e a r s , i.e., a p p r o x i m a t e l y 1.5 m i l l i o n y e a r s . N e i a n d R o y c h o u d h u r y (19 82) e s t i m a t e d t h a t t h e N e g r o i d l i n e a g e s e p a r a t e d f r o m the C a u c a s o - M o n g o l o i d l i n e a g e 120,000
y e a r s ago. T h e r e f o r e , a s s u m i n g t h a t all f o u r m a j o r
frameworks arose prior to the Negroid-CaucasoM o n g o l o i d split, o u r e s t i m a t e o f 1.5 m i l l i o n y e a r s
is fully c o n s i s t e n t w i t h N e i ' s .
4. The Evolution of Polymorphisms
As already mentioned, our data show the absence
of intermediate forms between beta AN and the
s t e m gene as well as b e t w e e n t h e s t e m g e n e a n d t h e
p r e s e n t - d a y f r a m e w o r k 3*. T h i s s i t u a t i o n m i g h t b e
a d i r e c t c o n s e q u e n c e o f g e n e t i c drift. C o n s i d e r i n g
the l i m i t e d n u m b e r o f a l l e l e s t h a t c a n b e m a i n t a i n e d
in a finite p o p u l a t i o n , t h e s p r e a d i n g o f n e w a l l e l e s
and the disappearance of older ones, whether by
neutral drift or by linkage of a polymorphic form
to a s e l e c t i v e l y a d v a n t a g e o u s m u t a t i o n c a p a b l e o f
i n d u c i n g c h a n g e s in allele f r e q u e n c i e s a n d loss o f
g e n e t i c v a r i a b i l i t y , will n e c e s s a r i l y b e c o n c u r r e n t .
Acknowledgments. We are grateful to Dr. Ivanoff (Centre de
Recherches Mrdicales de Franceville, Gabon) for providing us
blood samples from gorilla. This work was supported by research
grants from INSERM, CNRS, and the Commission of the European Communities.
References
Antonarakis SE, Kazazian HH, Orkin SH (1985) DNA polymorphism and molecular pathology of the human globin gene
cluster. Hum Genet 69:1-14
Bunn HF, Forget BG, Ranney HN (1977) Human hemoglobins.
WB Saunders, Philadelphia
Cavalli-Sforza LL, Edwards AWF (1964) Genetics today. Pergamon, Oxford
Chang LYE, Slightom JL (1984) Isolation and nucleotide sequence analysis of the beta type globin pseudogene from human, gorilla and chimpanzee. J Mol Biol 180:767-784
Goodman M, Braunitzer G, Stangl A, Schrank B (1983) Evidence on human origins from haemoglobins of African apes.
Nature 303:546-548
Kazazian HH Jr, Orkin SH, Antonarakis SE, Sexton JP, Boehm
CD, GoffSC, WaberPG (1984) Molecular characterization
of seven beta-thalassemia mutations in Asian Indians. EMBO
J 3:593-596
Kimura A, Matsunaga E, Ohta Y, Fujiyoshi T, Matsuo T, Nakamura T, Imamura T, Yanase T, Takagi Y (1983) Structure
of cloned delta-globin genes from a normal subject and a
patient with delta-thalassemia: sequence polymorphisms found
in the delta-globin gene region of Japanese individuals. Nucleic Acids Res 10:5725-5732
Koop BF, Goodman M, Xu P, Chen K, Slightom JL (1986)
Primate eta-globin DNA sequences and man's place among
the great apes. Nature 319:234-237
Lawn RM, Efstratiadis A, O'Connell C, Maniatis T (1980) The
nucleotide sequence of the human beta-globin gene. Cell 21:
647-651
Martin SL, Vincent KA, Wilson AC (1983) Rise and fall of the
delta globin gene. J Mol Biol 164:513-528
Nei M, Roychoudhury AK (1972) Gene differences between
Caucasian, Negro and Japanese populations. Science 177:434-436
Nei M, Roychoudhury AK (1982) Genetic relationship and
evolution of human races. Evol Biol 14:1-59
O'Brien SJ, Nash WG, Wildt DE, Bush ME, Benveniste RE
(1985) A molecular solution to the riddle of the giant panda's
phylogeny. Nature 317:140-144
Orkin SH, Kazazian HH (1984) The mutation and polymorphism of the human beta-globin gene and its surrounding
DNA. Annu Rev Genet 18:131-171
Orkin SH, Kazazian HH, Antonarakis SE, GoffSC, Boehm CD,
Sexton JP, Waber PG, Giardina JV (1982) Linkage ofbetathalassemia mutations and beta-globin gene polymorphisms
with DNA polymorphisms in human beta-globin gene cluster.
Nature 296:627-631
Pagnier J, Mears JG, Dunda-Belkhodja O, Schaefer-Rego KE,
Beldjord C, Nagel RL, Labie D (1985) Evidence for the
multicentric origin of the sickle cell hemoglobin gene in Africa. Proc Natl Acad Sci USA 81:1771-1773
Piazza A, Sgaranella-Zonta L, Gluckman P, Cavalli-Sforza LL
(1975) The 5th histoeompatibility workshop gene frequency
data: phylogenetic analysis. Tissue Antigens 5:445-463
Poncz M, Schwartz E, Ballantine M, Surrey S (1983) Nucleotide
sequence analysis of the delta-beta globin gene region in humans. J Biol Chem 258:11599-11609
Savatier P, Trabuchet G, Faure C, Chebloune Y, Gouy M, Verdict G, Nigon VM (1985) Evolution of the primate betaglobin gene region: high rate of variation in CpG dinucleotides
and in short repeated sequences between man and chimpanzee. J Mol Biol 182:21-29
Savatier P, Trabuchet G, Chebloune Y, Faure C, Verdier G,
Nigon VM (1987) Nucleotide sequence of the delta-betaglobin intergenic segment in the macaque: structure and evolutionary rates in higher primates. J Mol Evol 24:297-308
Sibley CG, Ahlquist JE (1984) The phylogeny of the hominoid
primates as indicated by DNA-DNA hybridization. J Mol
Evol 20:2-15
Templeton AR (1985) The phylogeny of the hominoid primates: a statistical analysis of the DNA-DNA hybridization
data. Mol Biol Evol 2:420-433
Ueda S, Takenaka O, Honjo T (1985) A truncated immunoglobulin epsilon pseudogene is found in gorilla and man but
not in chimpanzee. Proc Natl Acad Sci USA 82:3712-3715
Wainscoat JS, Hill AVS, Boyce AL, Fint J, Hernandez M, Thein
SL, Old JM, Lynch JR, Falusi AG, Weatherall DJ, Clegg JB
(1986) Evolutionary relationships of human populations from
an analysis of nuclear DNA polymorphisms. Nature 319:491493
Received June 17, 1986/Revised September 5, 1986