© 2001 Oxford University Press
Human Molecular Genetics, 2001, Vol. 10, No. 21, 2363–2372

Comparative sequencing of a multicopy subtelomeric region containing olfactory receptor genes reveals multiple interactions between non-homologous chromosomes

Heather C. Mefford1,2, Elena Linardopoulou1,3, David Coil1, Ger van den Engh5 and Barbara J. Trask1,2,3,4,*

1Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, 2Department of Genetics, 3Department of Bioengineering and 4Department of Molecular Biotechnology, University of Washington, Seattle, WA 98195, USA and 5Institute for Systems Biology, Seattle, WA 98105, USA

Received June 5, 2001; Revised and Accepted August 13, 2001

In this study, we assess the evolutionary relationships among different chromosomal copies of a subtelomeric block of sequence. This block contains homology to three olfactory receptor genes and is dispersed on at least 14 different chromosome ends in humans. It is single-copy in non-human primates. We analyzed single nucleotide polymorphisms in two 1 kb subregions and a polymorphic Alu insertion within 181 copies of this block from 12 chromosome ends and found evidence for recent interactions between the subtelomeric regions of non-homologous chromosomes. First, several sequence haplotypes are each present on multiple chromosomes, and several chromosomes each have multiple alleles with divergent haplotypes. Secondly, the observed variation clearly indicates that chromosomes 5q, 8p, 11p and/or 15q have each received the block from at least two different sources by non-homologous exchange. In addition, we observe at least one ectopic gene conversion event. Awareness of such exchange among sequences on non-homologous chromosomes is critical for accurate analysis of these complex and dynamic regions of the genome.

INTRODUCTION

The subtelomeric regions of human chromosomes contain large blocks of sequence that occur on multiple chromosomes.
Many of these blocks have duplicated since humans diverged from non-human primates. Some subtelomeric blocks are multi-copy in humans, but single-copy in non-human primates (1,2 and unpublished data). Others are multi-copy but exhibit different chromosomal distributions in different species (3 and unpublished data). Furthermore, the majority of subtelomeric blocks that have been analyzed in humans are polymorphic in copy number and location (1,3–8), suggesting that duplications and/or losses have occurred during recent human evolution. The exact mechanism(s) by which these subtelomeric blocks have been duplicated and dispersed among many human chromosomes is unknown, but may involve several different processes. These include (i) translocation or recombination, which would result in the swapping of chromosome ends; (ii) gene conversion events, which would result in the replacement of all or part of one subtelomeric region by another; or (iii) duplication by a transposition-like event. Regardless of the mechanism of the original duplication events, the end result is the distribution of highly homologous subtelomeric blocks of sequence onto multiple chromosome ends. The extensive homology created by these duplications sets up the opportunity for further exchange between the subtelomeric regions of nonhomologous chromosomes. Gross structural polymorphism of these regions could further facilitate such exchanges. Homologous chromosomes can differ by the presence or absence of up to several hundred kilobases of sequence (1,8), whereas non-homologous chromosomes can share large regions of homologous sequences. These unbalanced structures pose a potential problem for the meiotic pairing machinery and, if not resolved, can result in exchange between similar sequences on non-homologous chromosomes. 
While the complex structure of subtelomeres suggests that duplications and exchanges have occurred in the evolutionary past, it is unknown whether (and how often) such processes have continued to occur. If interactions such as recombination and gene conversion occur between the ends of non-homologous chromosomes, patches of subtelomeric sequence from different chromosome ends will be highly similar, if not identical. On the other hand, if meiotic recognition and pairing processes prevent non-homologous exchanges, duplicated blocks on different chromosomes will evolve independently. Chromosome-specific changes will accumulate, and divergence among different copies will correlate with the age of the duplication. Using sequence analysis, we attempt to discriminate between these models.

*To whom correspondence should be addressed at: 1100 Fairview Avenue N, Mailstop C3-168, PO Box 19024, Seattle, WA 98109, USA. Tel: +1 206 667 1470; Fax: +1 206 667 4023; Email: [email protected]
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

In this study, we analyze the evolutionary relationships among 181 chromosomal copies of a particular multi-copy block of subtelomeric sequence to determine whether exchanges occur between non-homologous chromosomes at an observable rate. This block, identified by the 36 kb cosmid f7501 (GenBank accession no. L78442), is polymorphic in copy number and chromosome location in the human genome, but single-copy in non-human primates. It contains three olfactory receptor-like sequences, OR-A, OR-B and OR-C (1). [The prototypic chromosome 19 copies of these genes are referred to as OR4F19, OR4G8P and OR4G3P by others (9).] OR-A has an open reading frame and is expressed in olfactory epithelium and testis (10).
OR-C appears to be a pseudogene on all chromosomes analyzed, and OR-B is a pseudogene on some chromosomes, but appears intact on others (unpublished data). Our earlier analyses of 44 individuals from eight populations showed that the block can be present at 7 to 11 chromosomal ends in different individuals (1). There is no evidence for more than one copy of the block at any chromosome end (see Materials and Methods). The presence of the block on 98–100% of chromosomes 3q, 15q and 19p analyzed suggested that copies at these locations are the oldest, having been present there before humans spread around the world. The polymorphic nature of the block at other chromosome ends suggests that these copies might be the consequence of more recent duplication events. The block was commonly observed on 7p, 16p and 16q only among African Pygmy populations. These copies could either be old alleles that were not carried out of Africa or new duplications. The phylogenetic relationships among nucleotide sequences of the various copies could test these conjectures about the block’s history, if subsequent exchanges have not blurred the historical record. By sampling the sequence of this block and analyzing two polymorphic insertion/deletion events, we ask whether copies of the f7501 block on different chromosomes have evolved independently since duplication onto multiple chromosomes, or whether transfer of subtelomeric genetic information between non-homologous chromosomes occurs frequently enough to be observed. Among 181 copies analyzed from 12 chromosome ends, we found several examples of transfers and exchanges among non-homologous chromosomes. We find several sequence haplotypes that are present on multiple chromosomes. In addition, chromosomes 5q, 8p, 11p and/or 15q each appear to have received the block from at least two separate sources. We also observe at least one ectopic gene conversion event among the chromosomes analyzed. 
These events refute the notion that there is such a thing as a chromosome-specific subtelomeric map and obscure the historical record of these dynamic regions in the genome.

RESULTS

In order to better understand the evolutionary relationships among copies of the f7501 block on different chromosomes, we analyzed sequences in this region in multiple individuals. A total of 175 chromosomal copies of the f7501 block were analyzed in 22 individuals from six different populations. Also included were six chromosomes from monochromosomal somatic cell hybrid lines (two each of chromosomes 3, 15 and 19) for a total of 181 copies from 12 chromosome ends (Table 1). Because this sequence is present on multiple chromosomes, each chromosome carrying the f7501 sequence, as previously identified by fluorescence in situ hybridization (FISH), was isolated from copies elsewhere in the genome by flow sorting before PCR amplification and sequence analysis (see Materials and Methods).

Both coding and non-coding segments within the f7501 region were analyzed at the sequence level: (i) 1 kb of sequence encompassing the coding exon of OR-A; (ii) a human-specific Alu repetitive element present in a subset of chromosomal copies, located 9.2 kb distal (telomeric) to the coding sequence; and (iii) 1.1 kb of non-coding sequence, 10.4 kb distal to the coding sequence. In addition to the Alu insertion, a polymorphic deletion in the putative promoter region of the OR-A gene was analyzed in a subset of chromosomes. These loci are distributed across 12.5 kb of this subtelomeric block (Fig. 1).

Haplotype analysis at the nucleotide level

Non-coding DNA. Sequence analysis of the 1.1 kb segment of non-coding DNA reveals the frequency and types of changes that have arisen in copies of this subtelomeric block. Fifteen sites within the non-coding segment varied among the chromosomes analyzed. Each site was biallelic, and the ancestral state was determined by comparison to chimp, gorilla and orangutan sequences.
Of the 15 dimorphic sites, one was observed only once, and a second was observed on only three of 181 chromosomes. Nine of the variants (60%) were transitions, none of which are changes at a CpG site, and 40% were transversions. This ratio is close to the expected 2:1 ratio of transitions to transversions (11). The 15 variable sites define 10 different haplotypes within this 1.1 kb segment. These haplotypes diverge from chimp by an average of 0.83% (range 0.64–1.00%). The frequencies of each of the 10 non-coding haplotypes vary markedly overall and by chromosome location (Fig. 2A). Five of the 10 haplotypes were found at only one chromosome location, and two of these were observed only a single time. However, five haplotypes were found on multiple chromosome ends, suggesting multiple recent duplications and/or exchanges involving non-homologous chromosomes. The most common haplotype (N1) had an overall frequency of 44% and was observed on eight different ends.

OR-A coding exon. The sequences of the OR-A coding exon show lower average divergence than the non-coding sequences (P > 0.001), but contain more variable sites due to multiple changes at CpG dinucleotides. Twenty dimorphic sites were observed within the 972 bp OR-A coding exon. Sixteen of the variants (80%) were transitions, seven of which occur at CpG sites, and four (20%) were transversions. Excluding the seven CpG sites, 69% of the variants are transitions while 31% are transversions. Eight of the observed variants were seen only once or twice, and six of these result from changes at CpG sites (Fig. 2B). The variable sites together define 21 different OR-A haplotypes that have an average divergence from chimp of 0.60% (range 0.31–0.93%). The observed frequencies of each OR-A haplotype are shown in Figure 2B. Of the 20 variant sites within the coding exon, eight changes are silent, 11 cause amino acid alterations, and one results in a premature stop codon. We observed only a single instance of the latter. The proteins potentially produced by these various OR-A haplotypes are discussed elsewhere (10).

Table 1. Nucleotide diversity and divergence of haplotypes observed at each chromosome location included in the analysis.

Non-coding DNA
Chromosome   Copies    DNA         Nucleotide     SD (%)  Average         Estimated
             analyzed  haplotypes  diversity (%)          divergence (%)  age (Myears)a
3qter        46        1           0.00           0.00    0.91            0.0
3qter-A      45        1           0.00           0.00    0.91            0.0
5qter        2         2           0.46           0.50    0.86            2.6
6pter        1         1           –              –       0.91            –
6qter        3         1           0.00           0.00    0.91            0.0
7pter        5         3           0.13           0.11    0.71            0.9
8pter        2         2           0.55           0.59    0.91            3.0
9qter        9         3           0.06           0.05    0.92            0.3
11pter       16        3           0.17           0.11    0.82            1.0
11pter-A     2         1           0.00           0.00    0.91            0.0
11pter-B     13        1           0.00           0.00    0.82            0.0
11pter-C     1         1           –              –       0.64            –
15qter       45        3           0.22           0.13    0.77            1.4
15q-B        25        1           0.00           0.00    0.82            0.0
15q-C        20        2           0.12           0.09    0.70            0.9
16pter       2         1           0.00           0.00    0.64            0.0
16qter       7         2           0.30           0.20    0.71            2.1
16q-A        2         1           0.00           0.00    0.91            0.0
16q-C        5         2           0.00           0.00    0.64            0.0
19pter       43        5           0.35           0.20    0.81            2.2
19pter-A     22        2           0.01           0.02    0.91            0.0
19pter-C     21        3           0.05           0.04    0.67            0.3
Overall      181       10          0.36           0.20    0.83            2.2

OR-A exon
Chromosome   Copies    DNA         Nucleotide     SD (%)  Average         Estimated
             analyzed  haplotypes  diversity (%)          divergence (%)  age (Myears)a
3qter        46        3           0.02           0.01    0.51            0.2
3qter-A      45        2           0.01           0.01    0.51            0.0
5qter        2         2           0.31           0.36    0.46            3.3
6pter        1         1           –              –       0.51            –
6qter        3         1           0.00           0.00    0.51            0.0
7pter        5         4           0.31           0.23    0.60            2.6
8pter        2         2           0.51           0.56    0.61            4.2
9qter        9         2           0.04           0.05    0.53            0.4
11pter       16        6           0.14           0.10    0.45            1.6
11pter-A     2         2           0.10           0.13    0.57            0.9
11pter-B     13        3           0.05           0.05    0.44            0.5
11pter-C     1         1           –              –       0.41            –
15qter       45        3           0.09           0.07    0.39            1.2
15q-B        28        1           0.00           0.00    0.41            0.0
15q-C        17        2           0.08           0.07    0.36            1.1
16pter       2         2           0.31           0.36    0.46            3.3
16qter       7         3           0.23           0.16    0.57            2.0
16q-A        2         3           0.00           0.00    0.51            0.0
16q-C        5         3           0.09           0.08    0.60            0.7
19pter       43        8           0.33           0.19    0.70            2.4
19pter-A     24        4           0.07           0.06    0.56            0.6
19pter-C     19        4           0.12           0.09    0.84            0.7
Overall      181       21          0.28           0.16    0.53            2.6

For chromosomes 3q, 11p, 15q, 16q and 19p, subgroups of haplotypes found at those locations are analyzed separately. A, B and C correspond to groups indicated in Figures 2 and 3. Age estimates are calculated as age = (nucleotide diversity) × (5 Myears)/(average divergence), assuming a divergence time of 5 Myears between humans and chimpanzees (18). This calculation assumes that no exchanges between non-homologous chromosomes have occurred since the original placement of the block, except where subgroups were analyzed separately for chromosomes 3, 11, 15, 16 and 19.
a Myears, million years.

Extended haplotypes across 12.5 kb of the f7501 block

We were able to deduce extended haplotypes encompassing the OR-A and non-coding sequence haplotypes for most of the 181 chromosomes analyzed. Although both homologs are flow sorted and analyzed together, prior FISH analysis indicates whether the f7501 block is present on one or both homologs. Therefore, extended haplotypes can be determined easily if the block is present on only one homolog of the sorted chromosome pair, or, when it is present on both homologs, if homozygosity is observed at all sequenced loci. Haplotypes can also be determined if an individual is heterozygous at one locus, but homozygous at the others. Using these criteria, we could determine phase of the coding and non-coding haplotypes and extended haplotypes for 157 of the 181 chromosomes analyzed. A total of 26 non-recombinant extended haplotypes could be defined by the 21 OR-A haplotypes and 10 non-coding haplotypes together (Fig. 3A). For the 24 cases where heterozygosity was observed at both loci, extended haplotypes were inferred from the 26 haplotypes observed in homozygotes. Five recombinant haplotypes were also observed. Three (A3b-N1, A3c-N1, A4b-N7) can best be explained by homologous recombination with a crossover between the two sequenced loci.
For example, A3b-N1 was observed on chromosome 19 and most likely results from recombination between the more common A3b-N4 and A1a-N1 haplotypes, which are both also found on chromosome 19. Two additional recombinant haplotypes are likely to result from ectopic gene conversion or recombination between non-homologous chromosomes (A8-N1 and A4a-N4; see below).

Figure 1. Nineteen kilobases of the f7501 subtelomeric block encompassing the regions analyzed from multiple chromosomes from multiple individuals.

Analysis of insertion/deletion polymorphisms

Alu elements are transposable elements that stably integrate into the genome. Because integration between a particular pair of nucleotides is likely to have occurred only once in evolution, the presence of an Alu insertion at a particular site can be used as a molecular marker for regions that are identical by descent from a common ancestor (12). By comparing three published sequences (GenBank accession nos L78442, AC005603, AC005604) containing sequence overlapping the f7501 block and likely derived from different chromosomes (1 and unpublished data), we identified an Alu insertion that is not present in all copies of the block. The Alu insertion was not detected at this location in non-human primates (data not shown), and its sequence indicates that it is a member of one of the youngest Alu subfamilies, the human-specific AluYa5 family (13). The two published sequences containing the Alu insertion (GenBank accession nos AC005603, AC005604) also lacked 881 bp ∼2.4 kb upstream of the OR-A coding exon. This 881 bp sequence is present in non-human primates (data not shown) and thus appears to have been deleted from some copies in the human genome. The chromosomal distribution of these markers was analyzed in order to further define and verify extended haplotypes and to track the evolutionary relationships among copies of the f7501 block.
The human-specific Alu-repeat insertion and 881 bp deletion co-segregate

We tested 133 chromosomes carrying the f7501 block for the presence of this Alu element using primers matching sequence flanking the site of insertion. The Alu insertion was found in some but not all copies of the f7501 block on chromosomes 5q (one of two f7501-carrying copies analyzed), 8p (one of two), 11p (four of nine), and 15q (23 of 33). It was not found at any of the following locations: 3q, 6q, 7p, 9q, 16p, 16q or 19p (34, two, five, seven, two, four and 34 f7501-carrying copies analyzed, respectively). It is likely that all Alu-containing copies are derived from the same single insertion event because the Alu element is situated at the same nucleotide position in all 24 chromosomes for which the insertion site was analyzed at the sequence level (data not shown). We also analyzed 77 copies lacking the Alu element and found them all to have identical sequence across the insertion site, suggesting that the Alu element had never been present in these copies. Therefore, the dispersal of the Alu element onto multiple different chromosomes must be the result of transfer of this region among non-homologous chromosomes after the initial Alu insertion event.

Of the 133 chromosomes analyzed for the Alu insertion, 105 were also assayed for the 881 bp deletion. We could determine the phase of the two insertion/deletion events in 97 cases. Twenty copies carrying the Alu insertion also had the 881 bp deletion, while 77 copies lacked the Alu insertion and the 881 bp deletion. Four individuals who were heterozygous for the Alu insertion on chromosome 15q were also heterozygous for the 881 bp deletion. Together, these results suggest that the Alu insertion and the 881 bp deletion co-segregate. The Alu insertion/881 bp deletion combination was observed on chromosomes 5q, 8p, 11p and 15q on the common sequence haplotype, A2-N7, and on A7-N7 and A14-N7, which differ from A2-N7 by 1 to 2 bp over ∼2.1 kb.
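The divergence values quoted throughout this section (for example, haplotypes that differ by 1 to 2 bp over ∼2.1 kb) are proportions of differing sites between aligned sequences. A minimal Python sketch of that arithmetic follows. The function names and the 2100-site length used for "∼2.1 kb" are illustrative assumptions, not the authors' analysis pipeline.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def percent_divergence(n_differences: int, n_sites: int) -> float:
    """Express a count of differing sites as a percentage of sites compared."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return 100.0 * n_differences / n_sites


# Assumption: "~2.1 kb" is taken as 2100 compared sites.
low = percent_divergence(1, 2100)
high = percent_divergence(2, 2100)
print(f"{low:.3f}% to {high:.3f}%")  # prints "0.048% to 0.095%"
```

On this scale, the Alu-containing haplotypes differ from one another by under 0.1%, well below the between-group divergences reported below.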
Phylogenetic analysis of extended haplotypes

We used phylogenetic analysis to investigate the evolutionary relationships among the 26 extended haplotypes. One possible tree representing the evolutionary relationships is shown in Figure 3A. The haplotypes cluster into three phylogenetic groups. Group A and C haplotypes lack the Alu insertion, but form separate phylogenetic clades due to sequence differences. The three group B haplotypes contain the Alu insertion and 881 bp deletion. Haplotypes within the same group differ on average by 0.19%, which is significantly less than the average divergence (0.55%) between haplotypes from different groups (P < 0.001).

We sequenced an additional ∼2 kb in the OR-B and OR-C main coding exons for a subset of 13 human chromosomes and three non-human primates. These chromosomes were chosen to represent each of the major clades in Figure 3A. As shown in Figure 3B, phylogenetic analysis using this additional sequence (for a total of ∼3800 bp sampled over 17.5 kb of the f7501 block) provides strong statistical support for the overall topology of the tree based on OR-A/non-coding haplotypes.

Several relationships among the extended haplotypes provide evidence for sequence transfer between non-homologous chromosomes. Six of the extended haplotypes (A1a-N1, A1a-N9, A2-Alu-N7, A7-Alu-N7, A6-N4, A3b-N4) occur on multiple chromosome ends (Fig. 3A). The most frequently observed extended haplotype (A1a-N1) was seen on eight different chromosome ends (3q, 5q, 6p, 6q, 9q, 11p, 16q, 19p), and haplotype A1a-N8, which differs from A1a-N1 by a single basepair, was seen on a ninth chromosome end (8p). The presence of identical haplotypes on multiple chromosome ends suggests recent (and numerous) transfers of subtelomeric sequence among non-homologous chromosomes.

Figure 2. Sequence haplotypes of the 1.1 kb non-coding segment and the 1 kb OR-A exon, and the frequency of each haplotype by chromosome and among the 181 chromosomes analyzed overall. Frequencies are calculated using only those chromosomes carrying the f7501 block. The number of copies analyzed for each chromosome is indicated in parentheses. The reference sequence is the f7501 cosmid (GenBank accession no. L78442). Changes at CpG dinucleotides are indicated by asterisks. The OR-A DNA haplotypes are named according to which protein form they are predicted to encode (10; e.g. DNA haplotypes A1a and A1b both encode protein P1).

Figure 3. Maximum parsimony tree of (A) 26 non-recombinant extended haplotypes and (B) seven extended haplotypes using additional sequence from the main coding exons of OR-B and OR-C. Bootstrap values are indicated for each major branch. Analysis was carried out on concatenated sequence using PAUP* (27).

Our analyses indicate that the f7501 block was transferred to chromosomes 5q, 8p, 11p and/or 15q from at least two separate sources. We observed two or more divergent haplotypes at each of these locations, one of which contains the Alu insertion and one of which does not (Fig. 3A). The situation on chromosomes 5q and 8p is perhaps most striking. The f7501 block is rarely found at these locations, and we had access to only two block-containing alleles of these chromosomes for analysis. In each case, one chromosome has a haplotype from group A, and one has an Alu-containing haplotype from group B. The two chromosome 5 haplotypes are 0.39% divergent across the OR-A exon and non-coding segment. The two chromosome-8p haplotypes are 0.53% divergent. The Alu must have inserted into one of the four chromosomes that now have Alu-inserted alleles (5q, 8p, 11p or 15q) and subsequently dispersed onto other chromosomes by duplication or exchange.
Since we also find very different blocks on other alleles of these chromosomes (without the Alu and diverging at the non-coding and OR-A loci), these chromosomes must have also been the recipients of the block from a source lacking the AluYa5 insertion.

The block appears to have been transferred from three different sources to chromosome 11p, as haplotypes from groups A, B and C were observed on this chromosome. Chromosome 11 haplotypes from the three groups are 0.39–0.48% divergent from each other, while haplotypes within a group were 0–0.08% divergent. Haplotypes from two groups were observed on chromosome 15q. The most common chromosome 15q haplotype contained the Alu insertion (A2-Alu-N7). Two group C haplotypes, which are 0.29% divergent from the group B haplotype, were also observed on 15q.

Although evidence for multiple transfer events is strongest for chromosomes 5q, 8p, 11p and 15q because of the tell-tale Alu insertion, we also find evidence for multiple transfers onto chromosomes 16q and 19p. The most frequently observed haplotypes on 16q are from group C and are 0.14% divergent from one another. However, a single copy of 16q analyzed carries a haplotype from group A that averages 0.46% divergence from the other 16q haplotypes and is very similar to the common A1a-N1 haplotype. Chromosome 19p carries the most diverse collection of haplotypes. Eleven different extended haplotypes were observed on chromosome 19: five fall into group A, and six belong to group C. While haplotypes within a group are 0.11% divergent from each other, divergence between group A and group C haplotypes is significantly higher at 0.68% (P < 0.001).

Separate phylogenetic analyses of the OR-A exon and the non-coding region reveal at least one, apparently ectopic, gene conversion event involving chromosome 3q and another chromosome.
While 45 of 46 copies of the 972 bp exon on chromosome 3q differ by 0 or 1 bp (haplotypes A1a and A9), one haplotype (A8) differs from all other copies on chromosome 3q by 3 or 4 bp. This A8 haplotype is most similar (99.8% identical) to haplotypes A4a and A4b, the non-Alu haplotypes found on chromosome 15q. However, this copy of chromosome 3q is identical at the non-coding locus to the other 45 3q copies analyzed. Sequence analysis of DNA flanking the OR-A coding exon indicates that the gene conversion event extends at least 200 bp, but <700 bp, upstream and at least 800 bp downstream of the exon (data not shown).

Different combinations of OR-A and non-coding sequence haplotypes on chromosome 15q suggest yet another non-homologous exchange event. On chromosome 15q, the OR-A haplotype A4a segregates with two different non-coding haplotypes, N2 and N4. N2 was seen only on 15q, but N4 was also observed on chromosomes 16p, 16q and 19p. N4 did not segregate with other chromosome-15q OR-A haplotypes. A gene conversion or recombination event involving the more distal (telomeric) non-coding locus between a chromosome 15q with the N2 haplotype and a different, N4-carrying chromosome could explain this observation.

Analysis of nucleotide diversity

Nucleotide diversity (π) estimates the average number of nucleotide differences per site between two randomly chosen sequences, and when combined with divergence from chimp, can be used to estimate duplication age. Overall nucleotide diversity is 0.36% in the non-coding DNA segment (range 0–0.55%) and 0.28% at the OR-A locus (range 0–0.51%; Table 1). Nucleotide diversity in non-coding DNA is greater than in the OR-A coding exon at most chromosome locations as expected, although the difference is not significant. The chromosome locations with the greatest nucleotide diversity are 5qter and 8pter.
However, the high values at these locations are most likely due to the presence of two divergent haplotypes derived from different sources, as described above, rather than divergence of the copies from a common chromosome 5q or chromosome 8p ancestor. Excluding 5q and 8p, the highest level of nucleotide diversity is found on chromosome 19p (0.35% at non-coding, 0.33% at OR-A), suggesting that this copy may be older than the rest. However, if the block was transferred to 19p from two or more sources, the high level of nucleotide diversity may cause a false elevation of the estimated duplication age (see Discussion). High levels of nucleotide diversity on chromosomes 7p and 16q suggest these locations may be old as well. In contrast, all copies from chromosomes 3q (n = 45, excluding haplotype A8) and 6q (n = 3) analyzed differ by 0 to 1 nucleotides over the 2 kb analyzed.

DISCUSSION

The subtelomeres are unusual regions of the human genome where a given chromosome may be more similar to other non-homologous chromosomes than to its homologous partner. Since meiotic pairing begins near the telomeres of human chromosomes (14,15), it is possible that regions of high sequence similarity on non-homologous chromosomes pair transiently during homology searching in early meiosis (16). These interactions provide opportunities for exchange of genetic information among subtelomeric regions of non-homologous chromosomes. How often such interactions occur is unknown. Certainly the present structure of subtelomeres suggests that such interactions occurred in the past. In this study, we analyze 35 single nucleotide polymorphisms (SNPs) and two insertion/deletion polymorphisms in order to explore the evolutionary history of a subtelomeric block. Our results provide evidence that subtelomeric structure has been shaped by multiple events that involve interactions between non-homologous chromosomes.
These include multiple transfers of a subtelomeric block to different alleles of the same chromosome and ectopic gene conversion. Below, we discuss possible models for subtelomere evolution supported by our results, as well as the difficulties these phenomena create for analyzing complex regions of the genome.

Several models are consistent with the multichromosomal distribution of subtelomeric blocks demonstrated by FISH (1–8). One model for the evolution of human subtelomeres is that, subsequent to the initial duplications, which must have occurred after the divergence of humans and non-human primates, the paralogous segments on different chromosomes evolved independently, accumulating chromosome-specific variants. An alternative model is that subtelomeres undergo recurrent shuffling and rearrangement by processes such as recombination, gene conversion and duplication between non-homologous chromosomes. This model predicts that a given sequence haplotype would be present on multiple chromosomes as a result of this exchange of subtelomeric blocks among chromosomes.

Our analyses of the f7501 block support the latter model. This block was duplicated onto multiple chromosomes after humans diverged from other primates (1). By comparing sequence among extant blocks on multiple chromosomes, we observe several patterns that are inconsistent with independent evolution of the blocks on different chromosomes after the initial duplicative transfers. We find several sequence haplotypes that are present on multiple chromosomes and several chromosomes carrying multiple, divergent haplotypes. The conventional explanation for the presence of multiple, divergent haplotypes at a particular genomic location is divergence (accumulation of mutations) over time.
If no exchange occurs among non-homologous chromosomes, the divergence among haplotypes at a given chromosomal location would depend on how long ago the sequence was deposited at that location, and the presence of similar collections of haplotypes on different chromosomes would be ascribed to convergent evolution. However, we see multiple chromosomes with identical haplotypes at two different 1 kb loci separated by 10 kb. The independent accumulation of the same set of mutations at each of these chromosome locations is highly unlikely. A more likely explanation for the presence of divergent alleles of f7501 on the same chromosome is the exchange of genetic information between non-homologous chromosomes. The presence of the 36 kb f7501 block on a chromosome may increase the chance that it will pair with a non-homologous chromosome carrying the same block, especially if its homolog does not. Illegitimate pairing might be abetted further by flanking sequences, which are duplicated on even more chromosomes than is the f7501 block, and by the extensive disparity in sequence content among homologs (1,8 and unpublished data). Such interactions may result in the swapping of two chromosome ends or in partial or complete gene conversion of a chromosome end and give rise to the observed sequence relationships. Indeed, the distribution of a polymorphic Alu insertion provides clear indication that the f7501 block has transferred to some chromosomes more than once and from different sources. The presence of the same Alu-containing haplotype on some alleles of chromosomes 5q, 8p, 11p and 15q and diverged non-Alu haplotypes on other alleles argues strongly against monochromosomal lineages. In addition, we see clear evidence of at least one ectopic gene-conversion event among the 181 chromosomes analyzed. Exchange of subtelomeric sequences has also been observed between highly similar repeat arrays on chromosomes 4q and 10q (17,18). 
Studies in somatic cells also suggest that subtelomeric regions of chromosomes co-localize more often during interphase than do interstitial regions (19), results supported by observations of individuals mosaic for 4q/10q subtelomeric translocations (18). Our study demonstrates non-homologous interactions during meiosis and indicates that multiple chromosome ends can be involved in non-homologous exchange events. At least 14 different chromosome ends carry highly similar copies of the f7501 block, and we have observed nonhomologous exchange events involving a total of at least six different chromosome ends. Overall nucleotide diversity, combined with the estimated divergence from chimpanzee over 5 million years (20), gives an estimate of the age of the f7501 duplications. The average nucleotide diversity of 0.36% among all chromosomes at the non-coding locus, combined with an average divergence from chimp of 0.83% over 5 million years, suggests the human copies of f7501 have been diverging for an average of ∼2 million years (0.36/0.83*5). The two most divergent copies of the non-coding segment are 0.82% divergent, suggesting that these two copies have been diverging for nearly 5 million years. Given their complex evolutionary history, however, subtelomeric regions are not entirely amenable to standard approaches for determining the ages of duplications from sequence divergence. For example, FISH data on individuals from populations dispersed around the world indicate that the block is almost completely fixed on chromosomes 3q, 15q and 19p in humans (1). The polymorphic presence of the block on other chromosomes, such as 5, 7, 11, 16, etc., suggests that the block was duplicated and transferred to these chromosomes more recently than to 3, 15 or 19. However, this prediction is not supported by comparative sequence data. 
For example, if 3q is truly an older site, then we should observe more variation among 3q copies of the block than among copies on these other chromosomes. Instead, the sequences of 45 of 46 copies of 3q are nearly identical at both the non-coding segment and the OR-A exon. In contrast, the five copies of chromosome 7p analyzed diverge by up to 0.34% across the two regions, suggesting that 7p is an old, but incompletely fixed, duplication. Given that the block is common on 7p only in African Pygmy populations, the block may have been present on 7p for a long time, but chromosomes 7 carrying the block were not in the pool of chromosomes that left Africa. There are various possible explanations for the lack of diversity on chromosome 3q compared with other chromosomes. The presence of the block on both homologs in all individuals sampled from multiple populations around the world (1) suggests that it was present on chromosome 3q before humans migrated around the world. If duplication onto chromosome 3q happened just prior to this event, the lack of diversity could be ascribed to insufficient divergence time. An alternative explanation is that the chromosome 3q copy is older, but one allele was fixed by genetic drift. It is also possible that the copy on 3q is old, but a selective sweep, possibly selection for the version of OR-A on 3q, has led to a homogenization in the region. While similarity among 3q copies could belie their age, variability among alleles of the block on other chromosomes could conversely lead to over-estimates of the age of duplications onto these sites. The nucleotide diversities calculated here for non-coding DNA and/or the OR-A exons on several chromosomes, especially chromosomes 5q, 7p, 8p, 16q and 19p (see Table 1), are notably high relative to diversity estimates based on SNP frequencies in single-copy portions of the genome (21–23). High levels of nucleotide diversity might be explained by a high mutation rate in the region, as suggested by Baird et al.
(24) for telomere-adjacent sequence. However, the overall divergence from chimpanzee argues against this explanation for the f7501 block. High nucleotide diversity could also reflect the old age of the block at these locations, as postulated above for the 7p alleles found in African populations. Our observations of multiple interchromosomal exchanges provide an alternative interpretation. Multiple transfer events can place divergent copies at the same location, elevate estimates of nucleotide diversity, and falsely elevate age estimates. For example, non-coding segments on chromosome 19p exhibit a nucleotide diversity of 0.35% and an average divergence from chimp of 0.81%. At face value, these data suggest that the copies of f7501 on 19p have been diverging for ∼2.2 million years (0.35/0.81*5). However, this age estimate is valid only if chromosome 19p was ‘colonized’ by the f7501 block only once in history. Alternatively, the divergent collection of haplotypes on chromosome 19p may have resulted from transfer of the f7501 block from two different sources as discussed above. In this case, some alleles of 19p may be the descendants of an f7501 colonization event that occurred as recently as 0.5 Mya (Table 1, see values for types A and C haplotypes). If multiple transfer events are accounted for and nucleotide diversities are calculated based on those haplotypes putatively derived from a common ancestral transfer event, nucleotide diversity values are within ranges seen by others for interstitial regions of the genome (Table 1, see values for subsets of chromosomes 16q and 19p).
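The age arithmetic used in this discussion (nucleotide diversity scaled by the human-chimpanzee divergence over an assumed 5 million years) can be sketched in a few lines of Python, together with the diversity and variance formulas given under Nucleotide diversity in Materials and Methods. This is an illustrative sketch only and not the software used in the study: the function names, the toy alignment, and the assumption of gap-free, equal-length aligned sequences are all hypothetical choices made for the example.

```python
from itertools import combinations


def per_site_differences(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ (pi_ij).

    Assumes the sequences are already aligned, gap-free and equal in length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def nucleotide_diversity(seqs):
    """Average pairwise per-site difference over all n(n-1)/2 sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    total = sum(per_site_differences(a, b) for a, b in combinations(seqs, 2))
    return total / (n * (n - 1) / 2)


def diversity_variance(pi, n, length):
    """V(pi) = (n+1)pi / [3(n-1)L] + 2(n^2+n+3)pi^2 / [9n(n-1)]."""
    sampling_term = (n + 1) * pi / (3 * (n - 1) * length)
    stochastic_term = 2 * (n * n + n + 3) * pi ** 2 / (9 * n * (n - 1))
    return sampling_term + stochastic_term


def divergence_age_my(diversity_pct, chimp_divergence_pct, split_my=5.0):
    """Naive age estimate in millions of years.

    Valid only under the single-colonization assumption discussed in the
    text; split_my is the assumed human-chimpanzee divergence time.
    """
    return diversity_pct / chimp_divergence_pct * split_my


if __name__ == "__main__":
    # Hypothetical 10 bp alignment, used only to exercise the functions.
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
    pi = nucleotide_diversity(toy)
    var = diversity_variance(pi, n=len(toy), length=len(toy[0]))
    print(f"toy pi = {pi:.4f}, V(pi) = {var:.4f}")

    # Values quoted in the text: all chromosomes, then chromosome 19p only.
    print(f"all copies: {divergence_age_my(0.36, 0.83):.2f} My")
    print(f"19p copies: {divergence_age_my(0.35, 0.81):.2f} My")
```

Because the age estimate is a simple ratio, pooling haplotypes that arrived by separate transfer events inflates the numerator directly. Recomputing the diversity on only those haplotypes thought to descend from a single transfer, as done for the subsets in Table 1, is the corresponding correction.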
A practical consequence of the interchromosomal exchange of subtelomeric sequences is that one would be hard pressed to assign a given haplotype to a particular chromosomal location with any degree of confidence. This mobility is measurable even among the small set of chromosomes analyzed here, which were assayed using only a small subunit of this complex patchwork of multicopy duplications. The processes acting on subtelomeres may have more far-reaching phenotypic consequences as well. Analysis of the OR-A coding exon suggests that the vast majority of copies of the OR-A gene are potentially functional (10). Individuals carry different numbers of functional copies that encode proteins differing by 1 to 5 amino acid substitutions, and individuals may express different combinations of OR-A gene copies from different (and multiple) chromosomal locations (10). Therefore, subtelomeric plasticity may contribute to normal phenotypic diversity. Nonhomologous interactions within subtelomeres may also have pathogenic consequences. It has been estimated that 5–10% of mental retardation cases are due to subtelomeric translocations (25). While the subtelomeric rearrangements detected in these studies include chromosome-specific material, the interaction of highly homologous subtelomeric sequences may be a major factor in initiating illegitimate pairing and promoting deleterious exchanges between non-homologous chromosomes.

MATERIALS AND METHODS

Cell lines

Lymphoblast cell lines were obtained from the NIGMS Human Genetic Mutant Cell Repository (Camden, NJ) for 19 individuals: GM10470, GM10471, GM10493, GM10494, GM10495, GM10496, GM10539, GM10541, GM10543, GM10966, GM10977, GM10978, GM11373, GM11374, GM11375, GM11523, GM11524, GM11525 and CGM1. Data were obtained using peripheral blood lymphocyte cultures for two individuals and cultured primary skin fibroblasts from one individual after appropriate informed consent was obtained.
Somatic hybrid cell lines containing individual human chromosomes (3, NA10253 and NA11713; 15, NA11418 and NA11715; 19, NA10449 and NA10612) were obtained from the NIGMS Repository.

Sequence analysis of flow-sorted chromosomes

Chromosomes were isolated from lymphoblast cell lines or from phytohaemagglutinin (PHA)-stimulated peripheral blood cell cultures into a polyamine buffer, stained with Hoechst 33258 and chromomycin A3, and sorted using a custom dual-laser flow cytometer, as described by Mefford et al. (26). In several cases, we resolved chromosomes 9 and 11 by measuring the fluorescence intensity of a FITC-labeled polyamide (Prolinx, Inc., Bothell, WA), which targets a short sequence repeated in the heterochromatic region of chromosome 9 (M.Gygi et al., manuscript in preparation). For each chromosome carrying the f7501 block, 2000 copies were sorted into a 0.5 ml PCR tube containing 10 µl sterile H2O and stored at –20°C before use. PCR ingredients [final concentrations: 1× High Fidelity buffer 2, 200 µM each dNTP, 400 nM each primer, 0.7 U Expand High Fidelity polymerase (Roche Molecular, Indianapolis, IN)] were added to the tube. After initial denaturation at 94°C for 2 min, the 25 µl reactions were subjected to 35 amplification cycles of 94°C for 30 s, 60°C for 30 s and 72°C for 90 s. Primers used to amplify the OR-A coding exon were: F4708 (5′-ATTGAGGCAATGTATGTGGAAG-3′) and OLF-AR (5′-ACACTGAGAAGCCGAGATAACTGAA-3′). Primers for the non-coding segment were: F16147 (5′-CAAGAAGTCAGAATCAGAAGG-3′) and R17372 (5′-TATTTTCACTCCCTCATCTCA-3′). As necessary, 1 µl of product was re-amplified using primers OLA10 (5′-CCAACTTCACTATATTTTGTG-3′) and OLA4 (5′-TCTGACTTCCTTCTCCTTCTC-3′) for the OR-A exon or primers F16192 (5′-GATCTTTCTCAATAGTGGTCT-3′) and R17307 (5′-AATGTAGTACCTCAAATCCTT-3′) for the non-coding segment, using the same PCR conditions for 35 additional cycles.
The Alu insertion was assayed using primers AL14759 (5′-TGGTGTTTGTCTTGGAGTGTGAG-3′) and ALr15146 (5′-TTGCTTTAAGCCTGAAGGTAACC-3′) and the 881 bp deletion was assayed with primers U7984 (5′-CTGAATTTGTGCTGCTGAGG-3′) or U8594 (5′-GCCTTAGCTTTCCTGTTTTT-3′) and A9295 (5′-TGGCCAAGGGAAAACTTGTGA-3′). Excess dNTPs and primers from DNA produced by PCR amplification were removed by purifying 20 µl of PCR product through Sephacryl 300 spin columns (Sigma-Aldrich, St Louis, MO). Bulk PCR products were sequenced with Ready Reaction Big-dye terminator PRISM kits with AmpliTaq FS (Perkin Elmer). Primers used for sequencing were the same as for PCR amplification. Sequence haplotypes in heterozygotes were resolved by cloning and sequencing PCR products. Results of sequence analysis were consistent with the presence of only one copy of the subtelomeric f7501 block at each chromosome end in all cases. Only a single haplotype was apparent in sequence traces from monochromosomal hybrid lines and from tubes containing two homologs, only one of which carried the block. In cases where both sorted homologs carried the block, a maximum of two haplotypes were apparent in sequence traces. Rare variants were verified by resequencing products amplified in an independent reaction to rule out PCR error. Phylogenetic analysis was carried out using PAUP* (27). For extended haplotypes, noncoding and OR-A exon sequence haplotypes were concatenated.

Nucleotide diversity

Nucleotide diversity (π) is the average number of pairwise nucleotide differences per site between two randomly chosen sequences. Nucleotide diversity and variance were calculated as shown below:

$$\pi = \frac{1}{n(n-1)/2} \sum_{i<j} \pi_{ij}, \qquad V(\pi) = \frac{(n+1)\,\pi}{3(n-1)L} + \frac{2(n^{2}+n+3)\,\pi^{2}}{9n(n-1)}$$

where $\pi_{ij}$ is the number of nucleotide differences per site between the ith and jth sequence, n is the number of sequences analyzed, and L is the total length of sequence.

ACKNOWLEDGEMENTS

We thank Carson Thoreen for helpful discussion, and Melanie Gygi and Prolinx, Inc. for polyamide probes. This work was supported in part by NIH grants R01 GM57070 and R01 DC04209 to B.J.T. H.C.M. was supported by T32 HG00035 and a Poncin fellowship.

REFERENCES

1. Trask, B.J., Friedman, C., Martin-Gallardo, A., Rowen, L., Akinbami, C., Blankenship, J., Collins, C., Giorgi, D., Iadonato, S., Johnson, F. et al. (1998) Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum. Mol. Genet., 7, 13–26.
2. Monfouilloux, S., Avet-Loiseau, H., Amarger, V., Balazs, I., Pourcel, C. and Vergnaud, G. (1998) Recent human-specific spreading of a subtelomeric domain. Genomics, 51, 165–176.
3. Hoglund, M., Mitelman, F. and Mandahl, N. (1995) A human 12p-derived cosmid hybridizing to subsets of human and chimpanzee telomeres. Cytogenet. Cell Genet., 70, 88–91.
4. Brown, W.R., MacKinnon, P.J., Villasante, A., Spurr, N., Buckle, V.J. and Dobson, M.J. (1990) Structure and polymorphism of human telomere-associated DNA. Cell, 63, 119–132.
5. Cross, S., Lindsey, J., Fantes, J., McKay, S., McGill, N. and Cooke, H. (1990) The structure of a subterminal repeated sequence present on many human chromosomes. Nucleic Acids Res., 18, 6649–6657.
6. Ijdo, J.W., Lindsay, E.A., Wells, R.A. and Baldini, A. (1992) Multiple variants in subtelomeric regions of normal karyotypes. Genomics, 14, 1019–1025.
7. Martin-Gallardo, A., Lamerdin, J., Sopapan, P., Friedman, C., Fertitta, A.L., Garcia, E., Carrano, A., Negorev, D., Macina, R.A., Trask, B.J. et al. (1995) Molecular analysis of a novel subtelomeric repeat with polymorphic chromosomal distribution. Cytogenet. Cell Genet., 71, 289–295.
8.
Wilkie, A.O., Higgs, D.R., Rack, K.A., Buckle, V.J., Spurr, N.K., Fischel-Ghodsian, N., Ceccherini, I., Brown, W.R. and Harris, P.C. (1991) Stable length polymorphism of up to 260 kb at the tip of the short arm of human chromosome 16. Cell, 64, 595–606.
9. Glusman, G., Yanai, I., Rubin, I. and Lancet, D. (2001) The complete human olfactory subgenome. Genome Res., 11, 685–702.
10. Linardopoulou, E., Mefford, H.C., Nguyen, O., Friedman, C., van den Engh, G., Farwell, G.D., Coltrera, M. and Trask, B.J. (2001) Transcriptional activity of multiple copies of a subtelomerically located olfactory receptor gene that is polymorphic in number and location. Hum. Mol. Genet., 10, 2373–2383.
11. Collins, D.W. and Jukes, T.H. (1994) Rates of transition and transversion in coding sequences since the human-rodent divergence. Genomics, 20, 386–396.
12. Batzer, M.A., Stoneking, M., Alegria-Hartman, M., Bazan, H., Kass, D.H., Shaikh, T.H., Novick, G.E., Ioannou, P.A., Scheer, W.D., Herrera, R.J. et al. (1994) African origin of human-specific polymorphic Alu insertions. Proc. Natl Acad. Sci. USA, 91, 12288–12292.
13. Batzer, M.A., Deininger, P.L., Hellmann-Blumberg, U., Jurka, J., Labuda, D., Rubin, C.M., Schmid, C.W., Zietkiewicz, E. and Zuckerkandl, E. (1996) Standardized nomenclature for Alu repeats. J. Mol. Evol., 42, 3–6.
14. Wallace, B.M. and Hulten, M.A. (1985) Meiotic chromosome pairing in the normal human female. Ann. Hum. Genet., 49, 215–226.
15. Chandley, A.C. (1989) Asymmetry in chromosome pairing: a major factor in de novo mutation and the production of genetic disease in man. J. Med. Genet., 26, 546–552.
16. Speed, R.M. (1988) The possible role of meiotic pairing anomalies in the atresia of human fetal oocytes. Hum. Genet., 78, 260–266.
17. van Overveld, P.G., Lemmers, R.J., Deidda, G., Sandkuijl, L., Padberg, G.W., Frants, R.R. and van der Maarel, S.M. (2000) Interchromosomal repeat array interactions between chromosomes 4 and 10: a model for subtelomeric plasticity. Hum. Mol. Genet., 9, 2879–2884.
18. van der Maarel, S.M., Deidda, G., Lemmers, R.J., van Overveld, P.G., van der Wielen, M., Hewitt, J.E., Sandkuijl, L., Bakker, B., van Ommen, G.J., Padberg, G.W. and Frants, R.R. (2000) De novo facioscapulohumeral muscular dystrophy: frequent somatic mosaicism, sex-dependent phenotype, and the role of mitotic transchromosomal repeat interaction between chromosomes 4 and 10. Am. J. Hum. Genet., 66, 26–35.
19. Stout, K., van der Maarel, S., Frants, R.R., Padberg, G.W., Ropers, H.H. and Haaf, T. (1999) Somatic pairing between subtelomeric chromosome regions: implications for human genetic disease? Chromosome Res., 7, 323–329.
20. Horai, S., Satta, Y., Hayasaka, K., Kondo, R., Inoue, T., Ishida, T., Hayashi, S. and Takahata, N. (1992) Man’s place in Hominoidea revealed by mitochondrial DNA genealogy. J. Mol. Evol., 35, 32–43.
21. Li, W.H. and Sadler, L.A. (1991) Low nucleotide diversity in man. Genetics, 129, 513–523.
22. Przeworski, M., Hudson, R.R. and Di Rienzo, A. (2000) Adjusting the focus on human variation. Trends Genet., 16, 296–302.
23. Kruglyak, L. and Nickerson, D.A. (2001) Variation is the spice of life. Nat. Genet., 27, 234–236.
24. Baird, D.M., Coleman, J., Rosser, Z.H. and Royle, N.J. (2000) High levels of sequence polymorphism and linkage disequilibrium at the telomere of 12q: implications for telomere biology and human evolution. Am. J. Hum. Genet., 66, 235–250.
25. Flint, J., Wilkie, A.O., Buckle, V.J., Winter, R.M., Holland, A.J. and McDermid, H.E. (1995) The detection of subtelomeric chromosomal rearrangements in idiopathic mental retardation. Nat. Genet., 9, 132–140.
26. Mefford, H., van den Engh, G., Friedman, C. and Trask, B.J. (1997) Analysis of the variation in chromosome size among diverse human populations by bivariate flow karyotyping. Hum. Genet., 100, 138–144.
27. Swofford, D.L. (2000) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, MA.