Comparative sequencing of a multicopy subtelomeric region

© 2001 Oxford University Press
Human Molecular Genetics, 2001, Vol. 10, No. 21 2363–2372
Comparative sequencing of a multicopy subtelomeric
region containing olfactory receptor genes reveals
multiple interactions between non-homologous
chromosomes
Heather C. Mefford1,2, Elena Linardopoulou1,3, David Coil1, Ger van den Engh5 and
Barbara J. Trask1,2,3,4,*
1Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, 2Department of
Genetics, 3Department of Bioengineering and 4Department of Molecular Biotechnology, University of Washington,
Seattle, WA 98195, USA and 5Institute for Systems Biology, Seattle, WA 98105, USA
Received June 5, 2001; Revised and Accepted August 13, 2001
In this study, we assess the evolutionary relationships among different chromosomal copies of a
subtelomeric block of sequence. This block contains
homology to three olfactory receptor genes and is
dispersed on at least 14 different chromosome ends
in humans. It is single-copy in non-human primates.
We analyzed single nucleotide polymorphisms in two
1 kb subregions and a polymorphic Alu insertion
within 181 copies of this block from 12 chromosome
ends and found evidence for recent interactions
between the subtelomeric regions of non-homologous
chromosomes. First, several sequence haplotypes
are each present on multiple chromosomes, and
several chromosomes each have multiple alleles
with divergent haplotypes. Secondly, the observed
variation clearly indicates that chromosomes 5q, 8p,
11p and/or 15q have each received the block from at
least two different sources by non-homologous
exchange. In addition, we observe at least one
ectopic gene conversion event. Awareness of such
exchange among sequences on non-homologous
chromosomes is critical for accurate analysis of
these complex and dynamic regions of the genome.
INTRODUCTION
The subtelomeric regions of human chromosomes contain
large blocks of sequence that occur on multiple chromosomes.
Many of these blocks have duplicated since humans diverged
from non-human primates. Some subtelomeric blocks are
multi-copy in humans, but single-copy in non-human primates
(1,2 and unpublished data). Others are multi-copy but exhibit
different chromosomal distributions in different species (3 and
unpublished data). Furthermore, the majority of subtelomeric
blocks that have been analyzed in humans are polymorphic in
copy number and location (1,3–8), suggesting that duplications
and/or losses have occurred during recent human evolution.
The exact mechanism(s) by which these subtelomeric blocks
have been duplicated and dispersed among many human
chromosomes is unknown, but may involve several different
processes. These include (i) translocation or recombination,
which would result in the swapping of chromosome ends; (ii)
gene conversion events, which would result in the replacement
of all or part of one subtelomeric region by another; or (iii)
duplication by a transposition-like event. Regardless of the
mechanism of the original duplication events, the end result is
the distribution of highly homologous subtelomeric blocks of
sequence onto multiple chromosome ends. The extensive
homology created by these duplications sets up the opportunity
for further exchange between the subtelomeric regions of non-homologous chromosomes. Gross structural polymorphism of
these regions could further facilitate such exchanges.
Homologous chromosomes can differ by the presence or
absence of up to several hundred kilobases of sequence (1,8),
whereas non-homologous chromosomes can share large
regions of homologous sequences. These unbalanced structures
pose a potential problem for the meiotic pairing machinery
and, if not resolved, can result in exchange between similar
sequences on non-homologous chromosomes.
While the complex structure of subtelomeres suggests that
duplications and exchanges have occurred in the evolutionary
past, it is unknown whether (and how often) such processes
have continued to occur. If interactions such as recombination and
gene conversion occur between the ends of non-homologous
chromosomes, patches of subtelomeric sequence from
different chromosome ends will be highly similar, if not identical. On the other hand, if meiotic recognition and pairing
processes prevent non-homologous exchanges, duplicated
blocks on different chromosomes will evolve independently.
Chromosome-specific changes will accumulate, and
divergence among different copies will correlate with the age
of the duplication. Using sequence analysis, we attempt to
discriminate between these models.
*To whom correspondence should be addressed at: 1100 Fairview Avenue N, Mailstop C3-168, PO Box 19024, Seattle, WA 98109, USA. Tel: +1 206 667 1470;
Fax: +1 206 667 4023; Email: [email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors
In this study, we analyze the evolutionary relationships
among 181 chromosomal copies of a particular multi-copy
block of subtelomeric sequence to determine whether
exchanges occur between non-homologous chromosomes at an
observable rate. This block, identified by the 36 kb cosmid
f7501 (GenBank accession no. L78442), is polymorphic in
copy number and chromosome location in the human genome,
but single-copy in non-human primates. It contains three
olfactory receptor-like sequences, OR-A, OR-B and OR-C (1).
[The prototypic chromosome 19 copies of these genes are
referred to as OR4F19, OR4G8P and OR4G3P by others (9).]
OR-A has an open reading frame and is expressed in olfactory
epithelium and testis (10). OR-C appears to be a pseudogene
on all chromosomes analyzed, and OR-B is a pseudogene on
some chromosomes, but appears intact on others (unpublished
data).
Our earlier analyses of 44 individuals from eight populations
showed that the block can be present at 7 to 11 chromosomal
ends in different individuals (1). There is no evidence for more
than one copy of the block at any chromosome end (see Materials and Methods). The presence of the block on 98–100% of
chromosomes 3q, 15q and 19p analyzed suggested that copies
at these locations are the oldest, having been present there
before humans spread around the world. The polymorphic
nature of the block at other chromosome ends suggests that
these copies might be the consequence of more recent duplication events. The block was commonly observed on 7p, 16p and
16q only among African Pygmy populations. These copies
could either be old alleles that were not carried out of Africa or
new duplications. The phylogenetic relationships among
nucleotide sequences of the various copies could test these
conjectures about the block’s history, if subsequent exchanges
have not blurred the historical record.
By sampling the sequence of this block and analyzing two
polymorphic insertion/deletion events, we ask whether copies
of the f7501 block on different chromosomes have evolved
independently since duplication onto multiple chromosomes,
or whether transfer of subtelomeric genetic information
between non-homologous chromosomes occurs frequently
enough to be observed. Among 181 copies analyzed from
12 chromosome ends, we found several examples of transfers
and exchanges among non-homologous chromosomes. We
find several sequence haplotypes that are present on multiple
chromosomes. In addition, chromosomes 5q, 8p, 11p and/or
15q each appear to have received the block from at least two
separate sources. We also observe at least one ectopic gene
conversion event among the chromosomes analyzed. These
events refute the notion that there is such a thing as a chromosome-specific subtelomeric map and obscure the historical
record of these dynamic regions in the genome.
RESULTS
In order to better understand the evolutionary relationships
among copies of the f7501 block on different chromosomes,
we analyzed sequences in this region in multiple individuals.
175 chromosomal copies of the f7501 block were analyzed in
22 individuals from six different populations. Also included
were six chromosomes from monochromosomal somatic cell
hybrid lines (two each of chromosomes 3, 15 and 19) for a total
of 181 copies from 12 chromosome ends (Table 1). Because
this sequence is present on multiple chromosomes, each
chromosome carrying the f7501 sequence, as previously
identified by fluorescence in situ hybridization (FISH), was
isolated from copies elsewhere in the genome by flow sorting
before PCR amplification and sequence analysis (see Materials
and Methods).
Both coding and non-coding segments within the f7501
region were analyzed at the sequence level: (i) 1 kb of
sequence encompassing the coding exon of OR-A; (ii) a
human-specific Alu repetitive element present in a subset of
chromosomal copies, located 9.2 kb distal (telomeric) to the
coding sequence; and (iii) 1.1 kb of non-coding sequence,
10.4 kb distal to the coding sequence. In addition to the Alu
insertion, a polymorphic deletion in the putative promoter
region of the OR-A gene was analyzed in a subset of chromosomes. These loci are distributed across 12.5 kb of this
subtelomeric block (Fig. 1).
Haplotype analysis at the nucleotide level
Non-coding DNA. Sequence analysis of the 1.1 kb segment of
non-coding DNA reveals the frequency and types of changes
that have arisen in copies of this subtelomeric block. Fifteen
sites within the non-coding segment varied among the chromosomes analyzed. Each site was biallelic, and the ancestral state
was determined by comparison to chimp, gorilla and orangutan
sequences. Of the 15 dimorphic sites, one was observed only
once, and a second was observed on only three of 181 chromosomes. Nine of the variants (60%) were transitions, none of
which are changes at a CpG site, and 40% were transversions.
This ratio is close to the expected 2:1 ratio of transitions to
transversions (11).
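The transition/transversion tally described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names and the toy list of changes are assumptions introduced here, and the input is taken to be (ancestral base, derived base) pairs for biallelic sites.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ancestral, derived):
    """Return 'transition' or 'transversion' for a single-base change."""
    a, d = ancestral.upper(), derived.upper()
    bases = PURINES | PYRIMIDINES
    if a not in bases or d not in bases or a == d:
        raise ValueError(f"not a valid substitution: {ancestral!r} -> {derived!r}")
    pair = {a, d}
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def transition_fraction(changes):
    """Fraction of (ancestral, derived) changes that are transitions."""
    kinds = [classify_substitution(a, d) for a, d in changes]
    return kinds.count("transition") / len(kinds)


# Hypothetical toy data, not the sites observed in this study.
toy_changes = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
toy_fraction = transition_fraction(toy_changes)
```

With the toy list above, three of the five changes are transitions, so `toy_fraction` is 0.6.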
The 15 variable sites define 10 different haplotypes within
this 1.1 kb segment. These haplotypes diverge from chimp by
an average of 0.83% (range 0.64–1.00%). The frequencies of
each of the 10 non-coding haplotypes vary markedly overall
and by chromosome location (Fig. 2A). Five of the 10 haplotypes were found at only one chromosome location, and two of
these were observed only a single time. However, five haplotypes were found on multiple chromosome ends, suggesting
multiple recent duplications and/or exchanges involving non-homologous chromosomes. The most common haplotype (N1)
had an overall frequency of 44% and was observed on eight
different ends.
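As a rough sketch of how variable sites define haplotypes, and of how a haplotype shared among chromosome ends can be flagged, consider the following Python. It assumes aligned sequences of equal length labelled by chromosome end. The function names and the toy sequences are hypothetical and are not the data of this study.

```python
from collections import Counter, defaultdict


def variable_sites(seqs):
    """Indices of alignment columns at which more than one base is seen."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def haplotypes_by_location(copies):
    """copies: list of (chromosome_end, aligned_sequence) pairs.

    Returns the variable sites, the count of each haplotype (written as the
    string of bases at the variable sites) and the set of chromosome ends
    on which each haplotype was seen.
    """
    sites = variable_sites([seq for _, seq in copies])
    counts = Counter()
    locations = defaultdict(set)
    for end, seq in copies:
        hap = "".join(seq[i] for i in sites)
        counts[hap] += 1
        locations[hap].add(end)
    return sites, counts, dict(locations)


def shared_haplotypes(locations):
    """Haplotypes observed on more than one chromosome end."""
    return {hap: ends for hap, ends in locations.items() if len(ends) > 1}


# Hypothetical toy alignment (8 bp), for illustration only.
toy_copies = [
    ("3q", "ACGTACGT"),
    ("19p", "ACGTACGT"),
    ("15q", "ACGAACGT"),
    ("15q", "ACGAACTT"),
]
toy_sites, toy_counts, toy_locations = haplotypes_by_location(toy_copies)
toy_shared = shared_haplotypes(toy_locations)
```

In the toy alignment, columns 3 and 6 vary, giving three haplotypes, one of which is found on both 3q and 19p.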
OR-A coding exon. The sequences of the OR-A coding exon
show lower average divergence than the non-coding sequences
(P < 0.001), but contain more variable sites due to multiple
changes at CpG dinucleotides. Twenty dimorphic sites were
observed within the 972 bp OR-A coding exon. Sixteen of the
variants (80%) were transitions, seven of which occur at CpG
sites, and four (20%) were transversions. Excluding the seven
CpG sites, 69% of the variants are transitions while 31% are
transversions. Eight of the observed variants were seen only
once or twice, and six of these result from changes at CpG
sites (Fig. 2B). The variable sites together define 21 different
OR-A haplotypes that have an average divergence from
chimp of 0.60% (range 0.31–0.93%). The observed
frequencies of each OR-A haplotype are shown in
Figure 2B. Of the 20 variant sites within the coding exon,
eight changes are silent, 11 cause amino acid alterations,
and one results in a premature stop codon. We observed only
a single instance of the latter. The proteins potentially
produced by these various OR-A haplotypes are discussed elsewhere (10).

Table 1. Nucleotide diversity and divergence of haplotypes observed at each chromosome location included in the analysis.
Columns (given for the non-coding DNA and for the OR-A exon): Copies, copies analyzed; Haps, number of DNA haplotypes; Div, nucleotide diversity (%); SD, SD (%); Dvg, average divergence (%); Age, estimated age (Myears)a.

              Non-coding DNA                                  OR-A exon
Chromosome    Copies  Haps   Div    SD     Dvg    Age      Copies  Haps   Div    SD     Dvg    Age
3qter         46      1      0.00   0.00   0.91   0.0      46      3      0.02   0.01   0.51   0.2
  3qter-A     45      1      0.00   0.00   0.91   0.0      45      2      0.01   0.01   0.51   0.0
5qter         2       2      0.46   0.50   0.86   2.6      2       2      0.31   0.36   0.46   3.3
6pter         1       1      –      –      0.91   –        1       1      –      –      0.51   –
6qter         3       1      0.00   0.00   0.91   0.0      3       1      0.00   0.00   0.51   0.0
7pter         5       3      0.13   0.11   0.71   0.9      5       4      0.31   0.23   0.60   2.6
8pter         2       2      0.55   0.59   0.91   3.0      2       2      0.51   0.56   0.61   4.2
9qter         9       3      0.06   0.05   0.92   0.3      9       2      0.04   0.05   0.53   0.4
11pter        16      3      0.17   0.11   0.82   1.0      16      6      0.14   0.10   0.45   1.6
  11pter-A    2       1      0.00   0.00   0.91   0.0      2       2      0.10   0.13   0.57   0.9
  11pter-B    13      1      0.00   0.00   0.82   0.0      13      3      0.05   0.05   0.44   0.5
  11pter-C    1       1      –      –      0.64   –        1       1      –      –      0.41   –
15qter        45      3      0.22   0.13   0.77   1.4      45      3      0.09   0.07   0.39   1.2
  15q-B       25      1      0.00   0.00   0.82   0.0      28      1      0.00   0.00   0.41   0.0
  15q-C       20      2      0.12   0.09   0.70   0.9      17      2      0.08   0.07   0.36   1.1
16pter        2       1      0.00   0.00   0.64   0.0      2       2      0.31   0.36   0.46   3.3
16qter        7       2      0.30   0.20   0.71   2.1      7       3      0.23   0.16   0.57   2.0
  16q-A       2       1      0.00   0.00   0.91   0.0      2       3      0.00   0.00   0.51   0.0
  16q-C       5       2      0.00   0.00   0.64   0.0      5       3      0.09   0.08   0.60   0.7
19pter        43      5      0.35   0.20   0.81   2.2      43      8      0.33   0.19   0.70   2.4
  19pter-A    22      2      0.01   0.02   0.91   0.0      24      4      0.07   0.06   0.56   0.6
  19pter-C    21      3      0.05   0.04   0.67   0.3      19      4      0.12   0.09   0.84   0.7
Overall       181     10     0.36   0.20   0.83   2.2      181     21     0.28   0.16   0.53   2.6

For chromosomes 3q, 11p, 15q, 16q and 19p, subgroups of haplotypes found at those locations are analyzed separately. A, B and C correspond to groups indicated
in Figures 2 and 3.
Age estimates are calculated as age = (nucleotide diversity) × (5 Myears)/(average divergence), assuming a divergence time of 5 Myears between humans and
chimpanzees (18). This calculation assumes that no exchanges between non-homologous chromosomes have occurred since the original placement of the block,
except where subgroups were analyzed separately for chromosomes 3, 11, 15, 16 and 19.
aMyears, million years.
Extended haplotypes across 12.5 kb of the f7501 block
We were able to deduce extended haplotypes encompassing
the OR-A and non-coding sequence haplotypes for most of the
181 chromosomes analyzed. Although both homologs are flow
sorted and analyzed together, prior FISH analysis indicates
whether the f7501 block is present on one or both homologs.
Therefore, extended haplotypes can be determined easily if the
block is present on only one homolog of the sorted
chromosome pair, or, when it is present on both homologs, if
homozygosity is observed at all sequenced loci. Haplotypes
can also be determined if an individual is heterozygous at one
locus, but homozygous at the others. Using these criteria, we
could determine phase of the coding and non-coding haplotypes and extended haplotypes for 157 of the 181 chromosomes analyzed. A total of 26 non-recombinant extended
haplotypes could be defined by the 21 OR-A haplotypes and 10
non-coding haplotypes together (Fig. 3A). For the 24 cases
where heterozygosity was observed at both loci, extended
haplotypes were inferred from the 26 haplotypes observed in
homozygotes. Five recombinant haplotypes were also
observed. Three (A3b-N1, A3c-N1, A4b-N7) can best be
explained by homologous recombination with a crossover
between the two sequenced loci. For example, A3b-N1 was
observed on chromosome 19 and most likely results from
recombination between the more common A3b-N4 and A1a-N1
haplotypes, which are both also found on chromosome 19.
Two additional recombinant haplotypes are likely to result
from ectopic gene conversion or recombination between non-homologous chromosomes (A8-N1 and A4a-N4; see below).
Figure 1. Nineteen kilobases of the f7501 subtelomeric block encompassing the regions analyzed from multiple chromosomes from multiple individuals.
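The phasing criteria given in this section (block on one homolog only, homozygosity at all loci, or heterozygosity at no more than one locus) can be written as a small decision function. The sketch below is a hypothetical illustration for two loci. The function name, its arguments and the example calls are assumptions made here, and the haplotype names serve only as sample inputs, not as genotypes from the study.

```python
def extended_haplotypes(homologs_with_block, or_a, noncoding):
    """Deduce extended haplotypes for one flow-sorted chromosome pair.

    homologs_with_block: 1 or 2, as indicated by prior FISH analysis.
    or_a, noncoding: haplotype names observed at each locus (one or two names).
    Returns a list of 'ORA-NC' names, or None when phase is ambiguous
    (heterozygous at both loci), in which case phase must be inferred.
    """
    a = sorted(set(or_a))
    n = sorted(set(noncoding))
    if homologs_with_block == 1:
        # Only one copy of the block was sorted, so phase is unambiguous.
        if len(a) != 1 or len(n) != 1:
            raise ValueError("one copy cannot carry two haplotypes at a locus")
        return [f"{a[0]}-{n[0]}"]
    if len(a) == 1 and len(n) == 1:
        return [f"{a[0]}-{n[0]}"] * 2       # homozygous at both loci
    if len(a) == 1:
        return [f"{a[0]}-{x}" for x in n]   # heterozygous at non-coding only
    if len(n) == 1:
        return [f"{x}-{n[0]}" for x in a]   # heterozygous at OR-A only
    return None                             # heterozygous at both loci


# Illustrative calls with sample haplotype names.
single_copy = extended_haplotypes(1, ("A1a",), ("N1",))
one_het_locus = extended_haplotypes(2, ("A3b", "A1a"), ("N1",))
ambiguous = extended_haplotypes(2, ("A3b", "A1a"), ("N4", "N1"))
```

Under these rules the third call returns `None`, which corresponds to the cases in which extended haplotypes had to be inferred from those observed in homozygotes.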
Analysis of insertion/deletion polymorphisms
Alu elements are transposable elements that stably integrate
into the genome. Because integration between a particular pair
of nucleotides is likely to have occurred only once in evolution, the presence of an Alu insertion at a particular site can be
used as a molecular marker for regions that are identical by
descent from a common ancestor (12). By comparing three
published sequences (GenBank accession nos L78442,
AC005603, AC005604) containing sequence overlapping the
f7501 block and likely derived from different chromosomes
(1 and unpublished data), we identified an Alu insertion that is
not present in all copies of the block. The Alu insertion was not
detected at this location in non-human primates (data not
shown), and its sequence indicates that it is a member of one of
the youngest Alu subfamilies, the human-specific AluYa5
family (13). The two published sequences containing the Alu
insertion (GenBank accession nos AC005603, AC005604)
also lacked 881 bp ∼2.4 kb upstream of the OR-A coding exon.
This 881 bp sequence is present in non-human primates (data
not shown) and thus appears to have been deleted from some
copies in the human genome. The chromosomal distribution of
these markers was analyzed in order to further define and
verify extended haplotypes and to track the evolutionary
relationships among copies of the f7501 block.
The human-specific Alu-repeat insertion and 881 bp
deletion co-segregate
We tested 133 chromosomes carrying the f7501 block for the
presence of this Alu element using primers matching sequence
flanking the site of insertion. The Alu insertion was found in
some but not all copies of the f7501 block on chromosomes 5q
(one of two f7501-carrying copies analyzed), 8p (one of two),
11p (four of nine), and 15q (23 of 33). It was not found at any
of the following locations: 3q, 6q, 7p, 9q, 16p, 16q or 19p
(34, two, five, seven, two, four and 34 f7501-carrying copies
analyzed, respectively). It is likely that all Alu-containing copies
are derived from the same single insertion event because the Alu
element is situated at the same nucleotide position in all
24 chromosomes for which the insertion site was analyzed at the
sequence level (data not shown). We also analyzed 77 copies
lacking the Alu element and found them all to have identical
sequence across the insertion site, suggesting that the Alu element
had never been present in these copies. Therefore, the dispersal of
the Alu element onto multiple different chromosomes must be the
result of transfer of this region among non-homologous
chromosomes after the initial Alu insertion event.
Of the 133 chromosomes analyzed for the Alu insertion, 105
were also assayed for the 881 bp deletion. We could determine
the phase of the two insertion/deletion events in 97 cases.
Twenty copies carrying the Alu insertion also had the 881 bp
deletion, while 77 copies lacked the Alu insertion and the
881 bp deletion. Four individuals who were heterozygous for
the Alu insertion on chromosome 15q were also heterozygous
for the 881 bp deletion. Together, these results suggest that the
Alu insertion and the 881 bp deletion co-segregate. The Alu
insertion/881 bp deletion combination was observed on
chromosomes 5q, 8p, 11p and 15q on the common sequence
haplotype, A2-N7, and on A7-N7 and A14-N7, which differ
from A2-N7 by 1 to 2 bp over ∼2.1 kb.
Phylogenetic analysis of extended haplotypes
We used phylogenetic analysis to investigate the evolutionary
relationships among the 26 extended haplotypes. One possible
tree representing the evolutionary relationships is shown in
Figure 3A. The haplotypes cluster into three phylogenetic
groups. Group A and C haplotypes lack the Alu insertion, but
form separate phylogenetic clades due to sequence differences.
The three group B haplotypes contain the Alu insertion and
881 bp deletion. Haplotypes within the same group differ on
average by 0.19%, which is significantly less than the average
divergence (0.55%) between haplotypes from different groups
(P < 0.001). We sequenced an additional ∼2 kb in the OR-B
and OR-C main coding exons for a subset of 13 human
chromosomes and three non-human primates. These chromosomes were chosen to represent each of the major clades in
Figure 3A. As shown in Figure 3B, phylogenetic analysis
using this additional sequence (for a total of ∼3800 bp sampled
over 17.5 kb of the f7501 block) provides strong statistical
support for the overall topology of the tree based on OR-A/non-coding haplotypes.
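The comparison of within-group and between-group divergence reported above can be illustrated with a short Python sketch. It assumes aligned haplotype sequences of equal length and a group label for each haplotype. The names and toy sequences are hypothetical, and no significance test is included.

```python
from itertools import combinations


def percent_divergence(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def mean_divergence_by_group(haplotypes, groups):
    """haplotypes: name -> sequence; groups: name -> group label.

    Returns (mean within-group, mean between-group) percent divergence
    over all pairs of haplotypes.
    """
    within, between = [], []
    for (n1, s1), (n2, s2) in combinations(haplotypes.items(), 2):
        d = percent_divergence(s1, s2)
        (within if groups[n1] == groups[n2] else between).append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical 10 bp toy haplotypes, for illustration only.
toy_haps = {"h1": "AAAAAAAAAA", "h2": "AAAAAAAAAT", "h3": "TTTAAAAAAA"}
toy_groups = {"h1": "A", "h2": "A", "h3": "C"}
toy_within, toy_between = mean_divergence_by_group(toy_haps, toy_groups)
```

For the toy data, the single within-group pair differs at 1 of 10 sites and the two between-group pairs differ at 3 and 4 of 10 sites.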
Several relationships among the extended haplotypes provide
evidence for sequence transfer between non-homologous
chromosomes. Six of the extended haplotypes (A1a-N1, A1a-N9,
A2-Alu-N7, A7-Alu-N7, A6-N4, A3b-N4) occur on multiple
chromosome ends (Fig. 3A). The most frequently observed
extended haplotype (A1a-N1) was seen on eight different
chromosome ends (3q, 5q, 6p, 6q, 9q, 11p, 16q, 19p), and
haplotype A1a-N8, which differs from A1a-N1 by a single
basepair, was seen on a ninth chromosome end (8p). The
presence of identical haplotypes on multiple chromosome ends
suggests recent (and numerous) transfers of subtelomeric
sequence among non-homologous chromosomes.
Figure 2. Sequence haplotypes of the 1.1 kb non-coding segment and the 1 kb OR-A exon and the frequency of each haplotype by chromosome and among the 181 chromosomes analyzed overall. Frequencies are calculated using only those chromosomes carrying the f7501 block. The number of copies analyzed for each chromosome is
indicated in parentheses. The reference sequence is the f7501 cosmid (GenBank accession no. L78442). Changes at CpG dinucleotides are indicated by asterisks. The
OR-A DNA haplotypes are named according to which protein form they are predicted to encode (10; e.g. DNA haplotypes A1a and A1b both encode protein P1).
Our analyses indicate that the f7501 block was transferred to
chromosomes 5q, 8p, 11p and/or 15q from at least two separate
sources. We observed two or more divergent haplotypes at
each of these locations, one of which contains the Alu insertion
and one of which does not (Fig. 3A). The situation on chromosomes 5q and 8p is perhaps most striking. The f7501 block is
rarely found at these locations, and we had access to only two
block-containing alleles of these chromosomes for analysis. In
each case, one chromosome has a haplotype from group A, and
one has an Alu-containing haplotype from group B. The two
chromosome 5 haplotypes are 0.39% divergent across the
OR-A exon and non-coding segment. The two chromosome-8p
haplotypes are 0.53% divergent. The Alu must have inserted
into one of the four chromosomes that now have Alu-inserted
alleles (5q, 8p, 11p or 15q) and subsequently dispersed onto
other chromosomes by duplication or exchange. Since we also
find very different blocks on other alleles of these chromosomes
(without the Alu and diverging at the non-coding and OR-A
loci), these chromosomes must have also been the recipient of
the block from a source lacking the AluYa5 insertion.
Figure 3. Maximum parsimony tree of (A) 26 non-recombinant extended haplotypes and (B) seven extended haplotypes using additional sequence from the main
coding exons of OR-B and OR-C. Bootstrap values are indicated for each major branch. Analysis was carried out on concatenated sequence using PAUP* (27).
The block appears to have been transferred from three different
sources to chromosome 11p, as haplotypes from groups A, B and
C were observed on this chromosome. Chromosome 11 haplotypes from the three groups are 0.39–0.48% divergent from
each other, while haplotypes within a group were 0–0.08%
divergent. Haplotypes from two groups were observed on
chromosome 15q. The most common chromosome 15q haplotype contained the Alu insertion (A2-Alu-N7). Two group C
haplotypes, which are 0.29% divergent from the group B
haplotype, were also observed on 15q.
Although evidence for multiple transfer events is strongest
for chromosomes 5q, 8p, 11p and 15q because of the tell-tale
Alu insertion, we also find evidence for multiple transfers onto
chromosomes 16q and 19p. The most frequently observed
haplotypes on 16q are from group C and are 0.14% divergent
from one another. However, a single copy of 16q analyzed
carries a haplotype from group A that averages 0.46% divergence from the other 16q haplotypes and is very similar to the
common A1a-N1 haplotype. Chromosome 19p carries the
most diverse collection of haplotypes. Eleven different
extended haplotypes were observed on chromosome 19: five
fall into group A and six belong to group C. While haplotypes
within a group are 0.11% divergent from each other, divergence between group A and C haplotypes is significantly
higher at 0.68% (P < 0.001).
Separate phylogenetic analyses of the OR-A exon and the
non-coding region reveal at least one, apparently ectopic,
gene conversion event involving chromosome 3q and another
chromosome. While 45 of 46 copies of the 972 bp exon on
chromosome 3q differ by 0 or 1 bp (haplotypes A1a and A9),
one haplotype (A8) differs from all other copies on chromosome 3q by 3 or 4 bp. This A8 haplotype is most similar
(99.8% identical) to haplotypes A4a and A4b, the non-Alu
haplotypes found on chromosome 15q. However, this copy of
chromosome 3q is identical at the non-coding locus to the other
45 3q copies analyzed. Sequence analysis of DNA flanking the
OR-A coding exon indicates that the gene conversion event
extends at least 200 bp, but <700 bp, upstream and at least
800 bp downstream of the exon (data not shown).
Different combinations of OR-A and non-coding sequence
haplotypes on chromosome 15q suggest yet another non-homologous exchange event. On chromosome 15q, the OR-A
haplotype A4a segregates with two different non-coding
haplotypes, N2 and N4. N2 was seen only on 15q, but N4 was
also observed on chromosomes 16p, 16q and 19p. N4 did not
segregate with other chromosome-15q OR-A haplotypes. A
gene conversion or recombination event involving the more
distal (telomeric) non-coding locus between a chromosome
15q with the N2 haplotype and a different, N4-carrying
chromosome could explain this observation.
Analysis of nucleotide diversity
Nucleotide diversity (π) estimates the average number of nucleotide
differences per site between two randomly chosen sequences, and
when combined with divergence from chimp, can be used to
estimate duplication age. Overall nucleotide diversity is 0.36%
in the non-coding DNA segment (range 0–0.55%) and 0.28%
at the OR-A locus (range 0–0.51%; Table 1). Nucleotide diversity in non-coding DNA is greater than in the OR-A coding
exon at most chromosome locations as expected, although the
difference is not significant. The chromosome locations with
the greatest nucleotide diversity are 5qter and 8pter. However,
the high values at these locations are most likely due to the
presence of two divergent haplotypes derived from different
sources, as described above, rather than divergence of the
copies from a common chromosome 5q or chromosome 8p
ancestor. Excluding 5q and 8p, the highest level of nucleotide
diversity is found on chromosome 19p (0.35% at non-coding,
0.33% at OR-A), suggesting that this copy may be older than
the rest. However, if the block was transferred to 19p from two
or more sources, the high level of nucleotide diversity
may cause a false elevation of the estimated duplication age
(see Discussion). High levels of nucleotide diversity on
chromosomes 7p and 16q suggest these locations may be old as
well. In contrast, all copies from chromosomes 3q (n = 45,
excluding haplotype A8) and 6q (n = 3) analyzed differ by
0 to 1 nucleotides over the 2 kb analyzed.
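A minimal sketch of a nucleotide diversity calculation, taken here as the average pairwise difference per site expressed as a percentage, is shown below. The exact estimator and software used for Table 1 are not stated in this section, so the function is an assumption made for illustration. It returns None for a single copy, mirroring the locations in Table 1 where only one copy was analyzed and no diversity is reported.

```python
from itertools import combinations


def nucleotide_diversity_percent(seqs):
    """Average pairwise per-site difference (pi), in percent.

    seqs: aligned sequences of equal length, one per chromosomal copy.
    Returns None when fewer than two copies are available.
    """
    if len(seqs) < 2:
        return None
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return 100.0 * total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: three 4 bp copies, one carrying a variant.
toy_pi = nucleotide_diversity_percent(["AAAA", "AAAT", "AAAA"])
```

In the toy alignment, two of the three pairs differ at one of four sites, so `toy_pi` equals 100 × 2/12.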
DISCUSSION
The subtelomeres are unusual regions of the human genome
where a given chromosome may be more similar to other non-homologous chromosomes than to its homologous partner.
Since meiotic pairing begins near the telomeres of human
chromosomes (14,15), it is possible that regions of high
sequence similarity on non-homologous chromosomes pair
transiently during homology searching in early meiosis (16).
These interactions provide opportunities for exchange of
genetic information among subtelomeric regions of non-homologous chromosomes. How often such interactions occur
is unknown. Certainly the present structure of subtelomeres
suggests that such interactions occurred in the past.
In this study, we analyze 35 single nucleotide polymorphisms (SNPs) and two insertion/deletion polymorphisms in
order to explore the evolutionary history of a subtelomeric
block. Our results provide evidence that subtelomeric structure
has been shaped by multiple events that involve interactions
between non-homologous chromosomes. These include
multiple transfers of a subtelomeric block to different alleles of
the same chromosome and ectopic gene conversion. Below, we
discuss possible models for subtelomere evolution supported
by our results, as well as the difficulties these phenomena
create for analyzing complex regions of the genome.
Several models are consistent with the multichromosomal
distribution of subtelomeric blocks demonstrated by FISH (1–8).
One model for the evolution of human subtelomeres is that,
subsequent to the initial duplications, which must have
occurred after the divergence of humans and non-human
primates, the paralogous segments on different chromosomes
evolved independently, accumulating chromosome-specific
variants. An alternative model is that subtelomeres undergo
recurrent shuffling and rearrangement by processes such as
recombination, gene conversion and duplication between non-homologous chromosomes. This model predicts that a given
sequence haplotype would be present on multiple chromosomes as a result of this exchange of subtelomeric blocks
among chromosomes.
Our analyses of the f7501 block support the latter model.
This block was duplicated onto multiple chromosomes after
humans diverged from other primates (1). By comparing
sequence among extant blocks on multiple chromosomes, we
observe several patterns that are inconsistent with independent
evolution of the blocks on different chromosomes after the
initial duplicative transfers. We find several sequence haplotypes that are present on multiple chromosomes and several
chromosomes carrying multiple, divergent haplotypes. The
conventional explanation for the presence of multiple, divergent haplotypes at a particular genomic location is divergence
(accumulation of mutations) over time. If no exchange occurs
among non-homologous chromosomes, the divergence among
haplotypes at a given chromosomal location would depend on
how long ago the sequence was deposited at that location, and
the presence of similar collections of haplotypes on different
chromosomes would be ascribed to convergent evolution.
However, we see multiple chromosomes with identical haplotypes at two different 1 kb loci separated by 10 kb. The independent accumulation of the same set of mutations at each of
these chromosome locations is highly unlikely.
A more likely explanation for the presence of divergent
alleles of f7501 on the same chromosome is the exchange of
genetic information between non-homologous chromosomes.
The presence of the 36 kb f7501 block on a chromosome may
increase the chance that it will pair with a non-homologous
chromosome carrying the same block, especially if its homolog
does not. Illegitimate pairing might be abetted further by
flanking sequences, which are duplicated on even more
chromosomes than is the f7501 block, and by the extensive
disparity in sequence content among homologs (1,8 and
unpublished data). Such interactions may result in the
swapping of two chromosome ends or in partial or complete
gene conversion of a chromosome end and give rise to the
observed sequence relationships.
Indeed, the distribution of a polymorphic Alu insertion
provides clear indication that the f7501 block has transferred to
some chromosomes more than once and from different
sources. The presence of the same Alu-containing haplotype
on some alleles of chromosomes 5q, 8p, 11p and 15q and
diverged non-Alu haplotypes on other alleles argues strongly
against monochromosomal lineages. In addition, we see clear
evidence of at least one ectopic gene-conversion event among
the 181 chromosomes analyzed.
Exchange of subtelomeric sequences has also been observed
between highly similar repeat arrays on chromosomes 4q and
10q (17,18). Studies in somatic cells also suggest that subtelomeric regions of chromosomes co-localize more often during
interphase than do interstitial regions (19), results supported by
observations of individuals mosaic for 4q/10q subtelomeric
translocations (18). Our study demonstrates non-homologous
interactions during meiosis and indicates that multiple chromosome ends can be involved in non-homologous exchange
events. At least 14 different chromosome ends carry highly
similar copies of the f7501 block, and we have observed non-homologous exchange events involving a total of at least six
different chromosome ends.
Overall nucleotide diversity, combined with the estimated
divergence from chimpanzee over 5 million years (20), gives
an estimate of the age of the f7501 duplications. The average
nucleotide diversity of 0.36% among all chromosomes at the
non-coding locus, combined with an average divergence from
chimp of 0.83% over 5 million years, suggests the human
copies of f7501 have been diverging for an average of
∼2 million years (0.36/0.83*5). The two most divergent copies
of the non-coding segment are 0.82% divergent, suggesting that
these two copies have been diverging for nearly 5 million years.
Given their complex evolutionary history, however, subtelomeric regions are not entirely amenable to standard approaches
for determining the ages of duplications from sequence
divergence. For example, FISH data on individuals from populations dispersed around the world indicate that the block is
almost completely fixed on chromosomes 3q, 15q and 19p in
humans (1). The polymorphic presence of the block on other
chromosomes, such as 5, 7, 11 and 16, suggests that the block
was duplicated and transferred to these chromosomes more
recently than to 3, 15 or 19. However, this prediction is not
supported by comparative sequence data. For example, if 3q is
truly an older site, then we should observe more variation
among 3q copies of the block than among copies on these other
chromosomes. Instead, the sequences of 45 of 46 copies of 3q
are nearly identical at both the non-coding segment and the
OR-A exon. In contrast, the five copies of chromosome 7p
analyzed diverge by up to 0.34% across the two regions,
suggesting that 7p is an old, but incompletely fixed, duplication. Given that the block is common on 7p only in African
Pygmy populations, the block may have been present on 7p for
a long time, but chromosomes 7 carrying the block were not in
the pool of chromosomes that left Africa.
There are various possible explanations for the lack of diversity on chromosome 3q compared with other chromosomes.
The presence of the block on both homologs in all individuals
sampled from multiple populations around the world (1)
suggests that it was present on chromosome 3q before humans
migrated around the world. If duplication onto chromosome 3q
happened just prior to this event, the lack of diversity could be
ascribed to insufficient divergence time. An alternative explanation is that the chromosome 3q copy is older, but one allele
was fixed by genetic drift. It is also possible that the copy on 3q
is old, but a selective sweep, possibly selection for the version
of OR-A on 3q, has led to a homogenization in the region.
While similarity among 3q copies could belie their age, variability among alleles of the block on other chromosomes could
conversely over-estimate the age of duplications onto these
sites. The nucleotide diversities calculated here for non-coding
DNA and/or the OR-A exons on several chromosomes, especially chromosomes 5q, 7p, 8p, 16q and 19p (see Table 1), are
notably high relative to diversity estimates based on SNP
frequencies in single-copy portions of the genome (21–23).
High levels of nucleotide diversity might be explained by a
high mutation rate in the region, as suggested by Baird et al.
(24) for telomere-adjacent sequence. However, the overall
divergence from chimpanzee argues against this explanation
for the f7501 block. High nucleotide diversity could also
reflect the old age of the block at these locations, as postulated
above for the 7p alleles found in African populations. Our
observations of multiple interchromosomal exchanges provide
an alternative interpretation. Multiple transfer events can place
divergent copies at the same location, elevate estimates of
nucleotide diversity, and falsely elevate age estimates. For
example, non-coding segments on chromosome 19p exhibit a
nucleotide diversity of 0.35% and an average divergence from
chimp of 0.81%. At face value, these data suggest that the
copies of f7501 on 19p have been diverging for 2.2 million
years. However, this age estimate is valid only if chromosome
19p was ‘colonized’ by the f7501 block only once in history.
Alternatively, the divergent collection of haplotypes on
chromosome 19p may have resulted from transfer of the f7501
block from two different sources as discussed above. In this
case, some alleles of 19p may be the descendants of an f7501-colonization event that occurred as recently as 0.5 Mya
(Table 1, see values for types A and C haplotypes). If multiple
transfer events are accounted for and nucleotide diversities are
calculated based on those haplotypes putatively derived from a
common ancestral transfer event, nucleotide diversity values
are within ranges seen by others for interstitial regions of the
genome (Table 1, see values for subsets of chromosomes 16q
and 19p).
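The age estimates quoted in this discussion all rest on one proportionality: nucleotide diversity among human copies is scaled by the human-chimpanzee divergence and multiplied by the assumed 5 million year split. The short Python sketch below reproduces that arithmetic for the three pairs of values given in the text. The function name `divergence_age_my` is a hypothetical label introduced here for illustration, and the calculation assumes a constant substitution rate and a single colonization event per location, which is exactly the assumption the discussion cautions against.

```python
def divergence_age_my(diversity_pct, chimp_divergence_pct, split_my=5.0):
    """Estimate how long copies have been diverging, in millions of years.

    diversity_pct        -- nucleotide diversity among human copies (%)
    chimp_divergence_pct -- average divergence from chimpanzee (%)
    split_my             -- assumed human-chimpanzee divergence time (My);
                            5 My is the value used in the text
    """
    if chimp_divergence_pct <= 0:
        raise ValueError("chimpanzee divergence must be positive")
    return diversity_pct / chimp_divergence_pct * split_my


# Values taken from the text (non-coding segment).
all_copies = divergence_age_my(0.36, 0.83)      # all chromosomes
most_divergent = divergence_age_my(0.82, 0.83)  # two most divergent copies
chr19p = divergence_age_my(0.35, 0.81)          # copies on 19p

for label, age in [("all copies", all_copies),
                   ("most divergent pair", most_divergent),
                   ("19p copies", chr19p)]:
    print(f"{label}: {age:.2f} My")
```

If a location received the block from two or more sources, the diversity passed to this function mixes within-lineage and between-lineage differences, so the returned age would overstate the time since any single transfer.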
A practical consequence of the interchromosomal exchange
of subtelomeric sequences—a measurable mobility even
among the small set of chromosomes analyzed here and
assayed using only a small subunit of this complex patchwork
of multicopy duplications—is that one would be hard pressed
to assign a given haplotype to a particular chromosomal
location with any degree of confidence. The processes acting
on subtelomeres may have more far-reaching phenotypic
consequences as well. Analysis of the OR-A coding exon
suggests that the vast majority of copies of the OR-A gene are
potentially functional (10). Individuals carry different numbers
of functional copies that encode proteins differing by 1 to 5 amino
acid substitutions, and individuals may express different
combinations of OR-A gene copies from different (and
multiple) chromosomal locations (10). Therefore, subtelomeric
plasticity may contribute to normal phenotypic diversity. Non-homologous interactions within subtelomeres may also have
pathogenic consequences. It has been estimated that 5–10% of
mental retardation cases are due to subtelomeric translocations
(25). While the subtelomeric rearrangements detected in these
studies include chromosome-specific material, the interaction
of highly homologous subtelomeric sequences may be a major
factor in initiating illegitimate pairing and promoting deleterious
exchanges between non-homologous chromosomes.
MATERIALS AND METHODS
Cell lines
Lymphoblast cell lines were obtained from NIGMS Human
Genetic Mutant Cell Repository (Camden, NJ) for 19 individuals: GM10470, GM10471, GM10493, GM10494, GM10495,
GM10496, GM10539, GM10541, GM10543, GM10966,
GM10977, GM10978, GM11373, GM11374, GM11375,
GM11523, GM11524, GM11525 and CGM1. Data were
obtained using peripheral blood lymphocyte cultures for two
individuals and cultured primary skin fibroblasts from one
individual after appropriate informed consent was obtained.
Somatic hybrid cell lines containing individual human
chromosomes (3, NA10253 and NA11713; 15, NA11418 and
NA11715; 19, NA10449 and NA10612) were obtained from
the NIGMS Repository.
Sequence analysis of flow-sorted chromosomes
Chromosomes were isolated from lymphoblast cell lines or
from phytohaemagglutinin (PHA)-stimulated peripheral blood
cell cultures into a polyamine buffer, stained with Hoechst
33258 and chromomycin A3, and sorted using a custom dual-laser flow cytometer, as described by Mefford et al. (26). In
several cases, we resolved chromosomes 9 and 11 by measuring
the fluorescence intensity of a FITC-labeled polyamide (Prolinx,
Inc., Bothell, WA), which targets a short sequence repeated in
the heterochromatic region of chromosome 9 (M.Gygi et al.,
manuscript in preparation). 2000 copies of each chromosome
carrying the f7501 block were sorted into a 0.5 ml PCR tube
containing 10 µl sterile H2O and stored at –20°C before use.
PCR ingredients [final concentrations: 1× High Fidelity buffer
2, 200 µM each dNTP, 400 nM each primer, 0.7 U Expand High
Fidelity polymerase (Roche Molecular, Indianapolis, IN)]
were added to the tube. After initial denaturation at 94°C for
2 min, the 25 µl reactions were subjected to 35 amplification
cycles of 94°C 30 s, 60°C 30 s, 72°C 90 s. Primers used to
amplify the OR-A coding exon were: F4708 (5′-ATTGAGGCAATGTATGTGGAAG-3′) and OLF-AR (5′-ACACTGAGAAGCCGAGATAACTGAA-3′). Primers for the non-coding
segment were: F16147 (5′-CAAGAAGTCAGAATCAGAAGG-3′) and R17372 (5′-TATTTTCACTCCCTCATCTCA-3′).
As necessary, 1 µl of product was re-amplified using primers
OLA10 (5′-CCAACTTCACTATATTTTGTG-3′) and OLA4
(5′-TCTGACTTCCTTCTCCTTCTC-3′) for the OR-A exon
or primers F16192 (5′-GATCTTTCTCAATAGTGGTCT-3′)
and R17307 (5′-AATGTAGTACCTCAAATCCTT-3′) for the
non-coding segment, using the same PCR conditions for 35
additional cycles. The Alu insertion was assayed using primers
AL14759 (5′-TGGTGTTTGTCTTGGAGTGTGAG-3′) and
ALr15146 (5′-TTGCTTTAAGCCTGAAGGTAACC-3′) and
the 881 bp deletion was assayed with primers U7984 (5′-CTGAATTTGTGCTGCTGAGG-3′) or U8594 (5′-GCCTTAGCTTTCCTGTTTTT-3′) and A9295 (5′-TGGCCAAGGGAAAACTTGTGA-3′). Excess dNTPs and primers from
DNA produced by PCR amplification were removed by purifying 20 µl of PCR product through Sephacryl 300 spin
columns (Sigma-Aldrich, St Louis, MO). Bulk PCR products
were sequenced with Ready Reaction Big-dye terminator
PRISM kits with AmpliTaq FS (Perkin Elmer). Primers used
for sequencing were the same as for PCR amplification.
Sequence haplotypes in heterozygotes were resolved by cloning
and sequencing PCR products. Results of sequence analysis
were consistent with the presence of only one copy of the
subtelomeric f7501 block at each chromosome end in all cases.
Only a single haplotype was apparent in sequence traces from
monochromosomal hybrid lines and from tubes containing two
homologs, only one of which carried the block. In cases where
both sorted homologs carried the block, a maximum of two haplotypes were apparent in sequence traces. Rare variants were
verified by resequencing products amplified in an independent
reaction to rule out PCR error. Phylogenetic analysis was
carried out using PAUP* (27). For extended haplotypes, non-coding and OR-A exon sequence haplotypes were concatenated.
Nucleotide diversity
Nucleotide diversity (π) is the average number of pairwise
nucleotide differences per site between two randomly chosen
sequences. Nucleotide diversity and variance were calculated
as shown below:
\[
\pi = \frac{1}{n(n-1)/2} \sum_{i<j} \pi_{ij}, \qquad
V(\pi) = \frac{(n+1)\,\pi}{3(n-1)L} + \frac{2(n^{2}+n+3)\,\pi^{2}}{9n(n-1)}
\]
where πij is the number of nucleotide differences between the
ith and jth sequence, n is the number of sequences analyzed,
and L is the total length of sequence.
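As a minimal sketch of how the diversity and variance formulas above could be evaluated, the Python example below computes π and V(π) for a set of aligned sequences. It assumes that all sequences are aligned and of equal length L, and that each pairwise difference count is divided by L so that π is expressed per site, in line with the definition at the start of this section. The function names and the toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Return (pi, variance) for a list of aligned sequences.

    pi is the mean number of pairwise differences per site over all
    n(n-1)/2 pairs; the variance uses the two-term expression given above.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    L = len(seqs[0])
    if L == 0:
        raise ValueError("sequences must not be empty")

    pairs = list(combinations(seqs, 2))  # n(n-1)/2 unordered pairs
    total_diffs = sum(pairwise_differences(a, b) for a, b in pairs)
    pi = total_diffs / (len(pairs) * L)  # per-site diversity

    variance = ((n + 1) * pi) / (3 * (n - 1) * L) \
        + (2 * (n * n + n + 3) * pi ** 2) / (9 * n * (n - 1))
    return pi, variance


# Toy alignment for illustration only.
pi, variance = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
print(f"pi = {pi:.4f}, V(pi) = {variance:.4f}")
```

Multiplying the returned π by 100 gives the percentage values used in the text, for example 0.36% for the non-coding locus.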
ACKNOWLEDGEMENTS
We thank Carson Thoreen for helpful discussion, and Melanie
Gygi and Prolinx, Inc. for polyamide probes. This work was
supported in part by NIH grants R01 GM57070 and R01
DC04209 to B.J.T. H.C.M. was supported by T32 HG00035
and a Poncin fellowship.
REFERENCES
1. Trask, B.J., Friedman, C., Martin-Gallardo, A., Rowen, L., Akinbami, C.,
Blankenship, J., Collins, C., Giorgi, D., Iadonato, S., Johnson, F. et al.
(1998) Members of the olfactory receptor gene family are contained in
large blocks of DNA duplicated polymorphically near the ends of human
chromosomes. Hum. Mol. Genet., 7, 13–26.
2. Monfouilloux, S., Avet-Loiseau, H., Amarger, V., Balazs, I., Pourcel, C.
and Vergnaud, G. (1998) Recent human-specific spreading of a subtelomeric
domain. Genomics, 51, 165–176.
3. Hoglund, M., Mitelman, F. and Mandahl, N. (1995) A human 12p-derived
cosmid hybridizing to subsets of human and chimpanzee telomeres.
Cytogenet. Cell Genet., 70, 88–91.
4. Brown, W.R., MacKinnon, P.J., Villasante, A., Spurr, N., Buckle, V.J. and
Dobson, M.J. (1990) Structure and polymorphism of human telomere-associated DNA. Cell, 63, 119–132.
5. Cross, S., Lindsey, J., Fantes, J., McKay, S., McGill, N. and Cooke, H.
(1990) The structure of a subterminal repeated sequence present on many
human chromosomes. Nucleic Acids Res., 18, 6649–6657.
6. Ijdo, J.W., Lindsay, E.A., Wells, R.A. and Baldini, A. (1992) Multiple
variants in subtelomeric regions of normal karyotypes. Genomics, 14,
1019–1025.
7. Martin-Gallardo, A., Lamerdin, J., Sopapan, P., Friedman, C., Fertitta,
A.L., Garcia, E., Carrano, A., Negorev, D., Macina, R.A., Trask, B.J. et al.
(1995) Molecular analysis of a novel subtelomeric repeat with polymorphic
chromosomal distribution. Cytogenet. Cell Genet., 71, 289–295.
8. Wilkie, A.O., Higgs, D.R., Rack, K.A., Buckle, V.J., Spurr, N.K.,
Fischel-Ghodsian, N., Ceccherini, I., Brown, W.R. and Harris, P.C.
(1991) Stable length polymorphism of up to 260 kb at the tip of the short
arm of human chromosome 16. Cell, 64, 595–606.
9. Glusman, G., Yanai, I., Rubin, I. and Lancet, D. (2001) The complete
human olfactory subgenome. Genome Res., 11, 685–702.
10. Linardopoulou, E., Mefford, H.C., Nguyen, O., Friedman, C., van den Engh, G.,
Farwell, G.D., Coltrera, M. and Trask, B.J. (2001) Transcriptional activity
of multiple copies of a subtelomerically located olfactory receptor gene that is
polymorphic in number and location. Hum. Mol. Genet., 10, 2373–2383.
11. Collins, D.W. and Jukes, T.H. (1994) Rates of transition and transversion in
coding sequences since the human-rodent divergence. Genomics, 20, 386–396.
12. Batzer, M.A., Stoneking, M., Alegria-Hartman, M., Bazan, H.,
Kass, D.H., Shaikh, T.H., Novick, G.E., Ioannou, P.A., Scheer, W.D.,
Herrera, R.J. et al. (1994) African origin of human-specific polymorphic
Alu insertions. Proc. Natl Acad. Sci. USA, 91, 12288–12292.
13. Batzer, M.A., Deininger, P.L., Hellmann-Blumberg, U., Jurka, J.,
Labuda, D., Rubin, C.M., Schmid, C.W., Zietkiewicz, E. and
Zuckerkandl, E. (1996) Standardized nomenclature for Alu repeats.
J. Mol. Evol., 42, 3–6.
14. Wallace, B.M. and Hulten, M.A. (1985) Meiotic chromosome pairing in
the normal human female. Ann. Hum. Genet., 49, 215–226.
15. Chandley, A.C. (1989) Asymmetry in chromosome pairing: a major factor
in de novo mutation and the production of genetic disease in man. J. Med.
Genet., 26, 546–552.
16. Speed, R.M. (1988) The possible role of meiotic pairing anomalies in the
atresia of human fetal oocytes. Hum. Genet., 78, 260–266.
17. van Overveld, P.G., Lemmers, R.J., Deidda, G., Sandkuijl, L.,
Padberg, G.W., Frants, R.R. and van der Maarel, S.M. (2000)
Interchromosomal repeat array interactions between chromosomes 4 and
10: a model for subtelomeric plasticity. Hum. Mol. Genet., 9, 2879–2884.
18. van der Maarel, S.M., Deidda, G., Lemmers, R.J., van Overveld, P.G.,
van der Wielen, M., Hewitt, J.E., Sandkuijl, L., Bakker, B.,
van Ommen, G.J., Padberg, G.W. and Frants, R.R. (2000) De novo
facioscapulohumeral muscular dystrophy: frequent somatic mosaicism,
sex-dependent phenotype, and the role of mitotic transchromosomal
repeat interaction between chromosomes 4 and 10. Am. J. Hum. Genet.,
66, 26–35.
19. Stout, K., van der Maarel, S., Frants, R.R., Padberg, G.W., Ropers, H.H.
and Haaf, T. (1999) Somatic pairing between subtelomeric chromosome
regions: implications for human genetic disease? Chromosome Res., 7,
323–329.
20. Horai, S., Satta, Y., Hayasaka, K., Kondo, R., Inoue, T., Ishida, T.,
Hayashi, S. and Takahata, N. (1992) Man’s place in Hominoidea revealed
by mitochondrial DNA genealogy. J. Mol. Evol., 35, 32–43.
21. Li, W.H. and Sadler, L.A. (1991) Low nucleotide diversity in man. Genetics,
129, 513–523.
22. Przeworski, M., Hudson, R.R. and Di Rienzo, A. (2000) Adjusting the
focus on human variation. Trends Genet., 16, 296–302.
23. Kruglyak, L. and Nickerson, D.A. (2001) Variation is the spice of life.
Nat. Genet., 27, 234–236.
24. Baird, D.M., Coleman, J., Rosser, Z.H. and Royle, N.J. (2000) High levels
of sequence polymorphism and linkage disequilibrium at the telomere of
12q: implications for telomere biology and human evolution. Am. J. Hum.
Genet., 66, 235–250.
25. Flint, J., Wilkie, A.O., Buckle, V.J., Winter, R.M., Holland, A.J. and
McDermid, H.E. (1995) The detection of subtelomeric chromosomal
rearrangements in idiopathic mental retardation. Nat. Genet., 9, 132–140.
26. Mefford, H., van den Engh, G., Friedman, C. and Trask, B.J. (1997)
Analysis of the variation in chromosome size among diverse human
populations by bivariate flow karyotyping. Hum. Genet., 100, 138–144.
27. Swofford, D.L. (2000) PAUP*. Phylogenetic Analysis Using Parsimony
(*and Other Methods). Version 4. Sinauer Associates, Sunderland, MA.