Mechanisms and Rates of Birth and Death of Dispersed Duplicated Genes during the Evolution of a Multigene Family in Diploid and Tetraploid Wheats

Eduard D. Akhunov, Alina R. Akhunova, and Jan Dvorak
Department of Plant Sciences, University of California, Davis

A family of 5 genes that evolved within the past 1.9 Myr in diploid wheat was characterized. The ancestral gene, ALP-A1, is on chromosome 1A and encodes an aci-reductone dioxygenase–like protein. The duplicated genes ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2 acquired complete coding sequences but lost the original promoter. They are on chromosomes 4A, 2A, 6A, and 6A, respectively, and evolved sequentially, the youngest duplicated gene always producing the next duplicate. It is shown that the dispersed gene duplication rate consists of the primary rate (duplications of ancestral genes) and the secondary rate (duplications of genes that had been generated by recent duplications). The primary rate was 2.5 × 10⁻³ gene⁻¹ Myr⁻¹ in diploid wheat. The secondary rate was 5.2 × 10⁻² gene⁻¹ Myr⁻¹ in the ALP family. The 20-fold acceleration of the secondary rate was caused by the insertion of the ALP-A2 gene into a novel type of transposon. Only the ALP-A1 and ALP-A3 genes are transcribed. The transcription of ALP-A3 is directed by a promoter within a DNA fragment similar to a CACTA type of DNA transposon, making ALP-A3 a new gene. The ALP-A3 transcript is longer than that of ALP-A1. The half-life of ALP duplicated genes was estimated to be 0.87 Myr. Strong purifying selection acting on the ancestral gene ALP-A1 was undiminished by the evolution of duplicated genes. The evolution of the ALP family shows that repeated elements facilitate both gene duplication and expression of duplicated genes and highlights their importance for the evolution of gene repertoire in large plant genomes.

Introduction

The evolution of new genes by gene duplication is one of the most important processes driving organic evolution.
Polyploidy duplicates the entire gene repertoire of an organism in a single step and is therefore an exceedingly important source of duplicated genes (Ohno 1970). In the plant kingdom, polyploidy is a major evolutionary strategy, and even such classical "diploid" plant models as Arabidopsis, rice, and maize evolved from ancient polyploids (Blanc and Wolfe 2004; Paterson et al. 2004). It is therefore almost unavoidable to assume that a recent or ancient polyploidization is the cause of virtually all duplicated loci in a genome. Studies on wheat clearly showed that such an assumption would greatly distort our understanding of genome evolution in plants (Akhunov, Goodyear, et al. 2003).

Besides polyploidy and segmental chromosome duplications, there are 2 basic types of gene duplications: tandem and dispersed. The former are subjected to unequal crossovers leading to reversions and concerted evolution. The latter are copies of genes or gene fragments translocated to other locations in a genome, giving rise to dispersed duplicated loci and dispersed multigene families. This type of gene duplication is more likely to evolve a new expression pattern because of the physical separation of the duplicated gene from its ancestral locus. Tandem gene duplications are more frequent than dispersed gene duplications. In the Arabidopsis genome, tandem duplication may represent nearly 50% of all recently evolved duplicated gene pairs, whereas dispersed duplication may account for only 6% of them (Moore and Purugganan 2003). In contrast, nearly 20% of wheat unigenes involve loci that originated by recent interchromosomal gene duplications (Akhunov, Goodyear, et al. 2003).

Key words: gene duplications, transposon, transcription, wheat, genome evolution.
E-mail: [email protected].
Mol. Biol. Evol. 24(2):539–550. 2007 doi:10.1093/molbev/msl183 Advance Access publication November 29, 2006
© The Author 2006.
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Wheat species form a classical polyploid series at 3 ploidy levels: diploid (Triticum urartu and Triticum monococcum, genomes AA and AᵐAᵐ, respectively), tetraploid (Triticum turgidum and Triticum timopheevii, genomes AABB and AAGG, respectively), and hexaploid (Triticum aestivum and Triticum zhukovskyi, genomes AABBDD and AAGGAᵐAᵐ, respectively). Both tetraploid wheats originated by hybridization of species in the Aegilops speltoides evolutionary lineage with T. urartu (Dvorak and Zhang 1990; Dvorak et al. 1993). Triticum aestivum originated by hybridization of T. turgidum with Aegilops tauschii (genomes DD) (Kihara 1944; McFadden and Sears 1946; Dvorak et al. 1998). The analysis of 3,159 gene loci in the wheat A and D genomes and their diploid sources uncovered 25 loci that evolved by interchromosomal duplications during the evolution of T. urartu or Ae. tauschii since their divergence about 2.7 MYA (Dvorak and Akhunov 2005). It was estimated from these numbers that new duplicated loci have been evolving at a rate of 2.9 × 10⁻³ gene⁻¹ Myr⁻¹ in these diploid lineages.

Genes that have an open reading frame (ORF) and are expressed at the time of their origin are probably the most likely to result in the evolution of a gene with a new function. The completeness of a nascent duplicated gene and its expression status can seldom be deduced directly from genomic DNA sequence and usually must be inferred experimentally. The half-life of duplicated genes was estimated to be 2.9 Myr in invertebrate lineages and 3.2 Myr in the Arabidopsis lineage (Lynch and Conery 2000). Because of this short life span, most complete duplicated genes are destined to become pseudogenes (Walsh 1995) and gene fragments.
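The duplication rate quoted above can be checked with simple arithmetic. The sketch below assumes the rate is the number of new duplicated loci divided by the number of loci surveyed and by the elapsed time, with a constant rate over the interval. The function name is illustrative and is not taken from the study.

```python
def duplication_rate(new_loci: int, loci_surveyed: int, elapsed_myr: float) -> float:
    """Duplications per gene per Myr, assuming a constant rate (an assumption)."""
    return new_loci / (loci_surveyed * elapsed_myr)


# Counts and divergence time as stated in the text (Dvorak and Akhunov 2005).
rate = duplication_rate(new_loci=25, loci_surveyed=3159, elapsed_myr=2.7)
print(f"{rate:.1e} duplications per gene per Myr")
```

Under this reading, 25 loci among 3,159 over 2.7 Myr gives a value of about 2.9 × 10⁻³, in line with the reported estimate.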
This is consistent with global analyses of animal genomes, which revealed that most duplicated gene copies contained incomplete gene sequences; only 10.7% of duplicated genes in segmental duplications in the human genome are complete (Zhang et al. 2004) and as many as 70% of duplications are shorter than 2 kb in the Caenorhabditis elegans genome (Katju and Lynch 2003). However, duplicated gene fragments and pseudogenes could occasionally produce a new gene by combining several unrelated gene fragments into a single transcript (Brunner et al. 2005). In the rice genome, gene fragments averaging only 325 bp are propagated by the Mutator-like transposable element (MULE) (Jiang et al. 2004). Recently discovered helitrons (Kapitonov and Jurka 2001) are another example of transposons propagating numerous gene fragments (Morgante et al. 2005).

The expression of duplicated genes can be directed by their own promoters or promoters of other genes. An intriguing possibility is that the expression of duplicated genes is directed by promoters furnished by repeated elements (White et al. 1994; Kawasaki and Nitasaka 2004; Brunner et al. 2005). A bioinformatic study suggested that rice MULEs have the potential to direct transcription of gene fragments (Jiang et al. 2004), but a follow-up study suggested that MULE-mediated gene duplication results in the formation of pseudogenes (Juretic et al. 2006).

Duplications of complete genes may have several theoretical outcomes: 1) one of the duplicated genes could become a pseudogene through the acquisition of deleterious mutations, 2) both could continue to fulfill a similar function, 3) one copy could gain a new function, and 4) the original function of a gene could become split between the duplicated copies (subfunctionalization) (Ohno 1970; Walsh 1995; Lynch and Force 2000; Hughes 2002).
Because the probability of the acquisition of deleterious mutations by a duplicated gene is very high, it was suggested that (1) is the primary fate of a duplicated gene (Walsh 1995). The high incidence of duplicated genes in eukaryotic genomes has been interpreted as suggesting that selection plays an important role in the fate of duplicated genes (Otto and Whitton 2000; Moore and Purugganan 2003; Jones et al. 2005; Moore and Purugganan 2005). Low levels of polymorphism in 3 pairs of dispersed Arabidopsis thaliana duplicated genes were interpreted as evidence of positive selection acting on the genes (Moore and Purugganan 2003). If repeated elements, such as MULEs and Helitrons, are able to duplicate gene fragments in the small rice genome and the medium-size maize genome (Jiang et al. 2004; Morgante 2006), the significance of repeated sequences for gene duplication could be far greater in large plant genomes, exemplified by those of wheat and its relatives in the grass tribe Triticeae, and may be the factor responsible for the large difference in the abundance of dispersed duplicated genes between Arabidopsis and wheat pointed out above. The sizes of the genomes of wheat diploid ancestors range from 4 Gb to about 6 Gb (Arumuganathan and Earle 1991) of which more than 90% are repeated nucleotide sequences (Akhunov et al. 2005). To obtain experimental data on the early stages of gene evolution via dispersed gene duplications, we analyzed the mechanisms and rates of duplication of genes corresponding to wheat expressed sequence tag (EST) unigene BF200640. The entire BF200640 family evolved in the A-genome diploid lineage since its divergence from the B and D genome lineages (Akhunov, Akhunova, et al. 2003). The ancestral state is a single locus on chromosome 1. This state is preserved in the wheat B and D genomes and the genomes of other wheat diploid relatives. 
In the wheat B and D genomes, the ancestral locus was mapped in the most distal bin of the 1AL and 1BL arms (Peng et al. 2004). In the A genome, duplicated genes were mapped on chromosomes 2A and 4A and 2 on chromosome 6A (Akhunov, Akhunova, et al. 2003). The recent origin of this family allowed us to infer the mechanisms of duplication, its rates, and the completeness of each gene at the time of the origin of the duplication. Transcription of each gene was analyzed in detail to assess its expression and regulation.

Materials and Methods

Plant Materials

Nuclear DNAs were isolated (Dvorak et al. 1988) from single plants of 7 T. urartu accessions representative of the geographic distribution of this species (table 1) and from single accessions of the following species in the tribe Triticeae of the grass family: Triticum monococcum, Aegilops speltoides, Aegilops sharonensis, Aegilops longissima, Aegilops bicornis, Aegilops searsii, Aegilops caudata, Aegilops comosa, Aegilops uniaristata, Aegilops umbellulata, Aegilops tauschii, Taeniatherum caput-medusae, Heteranthelium piliferum, Secale cereale, Haynaldia villosa, Agropyron cristatum, Lophopyrum elongatum, Pseudoroegneria stipifolia, Thinopyrum bessarabicum, and Psathyrostachys juncea.

Table 1
Triticum urartu Accessions Used in This Study

Label       Accession   Location
Tu-B1       G1791       Mardin, Turkey
Tu-E1       G1895       Urfa, Turkey
Tu-F1       G3159       El Beqaa, Lebanon
Tu-B2       DV2351      Turkey, Sanli Urfa
Tu-D2       DV2122      Syria, Aleppo
Tu-G2       DV2374      Turkey, Sanli Urfa
Tu-G1812    G1812       Mardin, Turkey

Southern Blot Hybridization

The DNAs were digested with EcoRI, electrophoretically fractionated in 1% agarose gels, and transferred to Hybond N+ nylon membranes (Amersham, Piscataway, NJ) by capillary transfer in 0.4 N NaOH overnight (Luo et al. 1998).
The membranes were then rinsed in 2× standard saline citrate (SSC) for 5 min and hybridized with a 32P-labeled probe derived from EST BF200640 (supplied by Olin Anderson, USDA Western Research Center, Albany, CA; cDNA clone WHE0825-0828_L16_L16) amplified by polymerase chain reaction (PCR) from the plasmid. Prehybridization and hybridization were performed as described earlier (Dubcovsky et al. 1996). The membranes were washed in 2× SSC and 0.5% sodium dodecyl sulfate (SDS) for 30 min to 2 h at 65 °C, 1× SSC and 0.5% SDS for 30 min at 65 °C, and 0.5× SSC and 0.5% SDS for 12 min and autoradiographed.

Bacterial Artificial Chromosome Library Screening

Bacterial artificial chromosome (BAC) libraries of T. urartu accession G1812 (Akhunov et al. 2005) and Triticum turgidum ssp. durum cv Langdon (Cenci et al. 2003) were employed in this study. A 32P-labeled probe of wheat EST BF200640 was hybridized with 28 high-density membranes, each containing 18,432 double-printed clones, of the BAC library of T. turgidum ssp. durum (henceforth durum wheat). A total of 33 positive clones were isolated. The DNA of these BAC clones was digested with EcoRI restriction endonuclease, restriction fragments were resolved by 1% agarose gel electrophoresis, and the Southern blot was hybridized with a 32P-labeled BF200640 probe. BAC clones containing the duplicated copies of the gene family were selected by comparing the hybridization profiles of the clones with the hybridization profile of the wheat genomic DNA digested with EcoRI restriction endonuclease.

To sequence T. urartu loci, the EST BF200640 probe was hybridized with 9 screening membranes of the T. urartu BAC library (Akhunov et al. 2005). Eight positive clones were identified.

To sequence insertion sites of inverted repeat 2 (IR-2)–containing transposons, a T. urartu BAC library high-density screening membrane containing 18,432 double-printed clones was screened with a DNA fragment amplified by PCR from the BAC 404H6 DNA. The PCR target was a sequence upstream of the IR-2 terminal repeat (nucleotides 61,247–61,375 of BAC clone 404H6). The PCR product was 32P-labeled. The 3′ insertion sites of the transposon were sequenced by primer walking using DNA of each positive BAC clone as a template.

Clone Sequencing

As a first step, durum wheat BAC EcoRI restriction fragments hybridizing with the BF200640 probe were subcloned into the pGEM-3Zf(+) vector and sequenced using the transposon Tn5 kit (Epicentre, Berkley, CA). Durum wheat BAC clones harboring genes located on chromosomes 2A, 4A, and 6A were completely sequenced using the shotgun approach (Stein et al. 2000). Base calling and assembly of BAC contigs were performed using the Phred/Phrap/Consed software (Gordon et al. 1998). Only the gene and neighboring flanking DNA of a BAC clone harboring the durum wheat gene located on chromosome 1A were sequenced. BigDye v3.1 sequencing chemistry (ABI, Foster City, CA) and capillary electrophoresis with ABI3730xl were used to sequence DNA.

The DNA sequences of genes from durum wheat were used to design 3 pairs of primers spanning the gene region (Table 1, Supplementary Material online). These primers were used to amplify and sequence the T. urartu gene on chromosome 1A using BAC DNA as a template. The total length of the sequenced gene region was about 1,200 bp. Triticum urartu BAC clones harboring genes located on chromosomes 2A, 4A, and 6A were completely sequenced using the shotgun approach (table 2). The same pairs of primers were used to amplify and sequence the gene from the genome of Ae. tauschii and the B genome of durum wheat, using genomic DNA of Ae. tauschii AL8/78 and a BAC clone of durum wheat, respectively, as templates. In all cases, both strands of PCR products were sequenced. Ambiguous base callings were resolved by resequencing.

Table 2
Gene Loci Sequenced in the Study

Chromosome (gene)          Species             BAC Clone (PCR amplicon)   Sequencing Strategy        Length (bp)
1A (ALP-A1)                Triticum durum      285M18                     Primer walk                8,921
1A (ALP-A1)                T. urartu           317A22                     PCR amplicon sequencing    1,736
2A (ALP-A3)                T. durum            466G24                     Shotgun sequencing         99,752
2A (ALP-A3)                T. urartu           404H6                      Shotgun sequencing         106,806
4A (ALP-A2)                T. durum            221H19                     Shotgun sequencing         148,156
4A (ALP-A2)                T. urartu           292N12                     Shotgun sequencing         111,168
6A (ALP-A4.1, ALP-A4.2)    T. durum            219E24                     Shotgun sequencing         168,664
6A (ALP-A4.1, ALP-A4.2)    T. urartu           41C8                       Shotgun sequencing         129,021
1B (ALP-B1)                T. durum            285O20                     PCR amplicon sequencing    1,421
1D (ALP-D1)                Aegilops tauschii   (AL8/78)                   PCR amplicon sequencing    1,738

To annotate repeated elements, BAC DNA sequences were compared with the Triticeae Repeat Sequence (TREP; http://wheat.pw.usda.gov/ITMI/Repeats/) and Genetic Information Research Institute (GIRI) (http://www.girinst.org) databases. The coding potential of sequences was established by comparisons of translated BAC sequences with the National Center for Biotechnology Information (NCBI) nonredundant database using the BlastX (Altschul et al. 1990) program and by comparison of BAC sequences with the NCBI EST database using the BlastN program. Sequence comparisons of the paralogous gene loci were performed with the BlastN program.

In addition to the T. urartu accession G1812, in which the aci-reductone dioxygenase–like protein (ALP) genes were sequenced in their entirety, a fragment of each gene from exons 3 to 5 was sequenced in 6 additional T. urartu accessions (table 1) representative of the geographic distribution of the species. Each gene fragment was PCR amplified and subcloned into the pGEM–T Easy plasmid vector (Promega, Madison, WI), and Escherichia coli DH10B cells were transformed by electroporation.
A minimum of 2 independent clones per gene were sequenced using the M13 forward and reverse primers. Both strands of each clone were sequenced. Sequences were aligned with the ClustalW program. The gaps in the alignment were deleted before analysis. The Close-Neighbor-Interchange algorithm implemented in the MEGA 3.1 program was used to construct the maximum parsimony trees. Confidence levels of the trees were assessed by bootstrap resampling replicated 1,000 times.

Phylogenetic Analysis

The exonic and intronic sequences of durum wheat, T. urartu, and Ae. tauschii were aligned using the ClustalX program followed by manual editing of the alignment. Phylogenetic relationships among the genes were inferred using the maximum parsimony and Neighbor-Joining methods of tree construction implemented in the PAUP program (Swofford 2003). The Ae. tauschii gene sequence was used as an outgroup to root each tree. The bootstrap confidence of individual nodes was based on 1,000 resampling runs.

A total of 890 noncoding and third position nucleotides of each gene were used to time each duplication event. The substitution model parameters were estimated using the hierarchical likelihood ratio test implemented in the Modeltest program (Posada and Crandall 1998). According to the Akaike information criterion (AIC), the best model fitting the observed data was HKY (Hasegawa et al. 1985) without rate variation. The selected model parameters were then used for likelihood estimation of the branch lengths of the tree with the given topology using the PAUP program (Swofford 2003). The branch length estimates and the tree were used to compute the divergence time of the duplicated genes using the semiparametric penalized likelihood method implemented in the r8s program (Sanderson 2002). The smoothing parameter for the penalized likelihood method was estimated as described (Sanderson 2002). The ALP-B1 gene sequence was used as an outgroup to root the tree.
The outgroup was pruned before estimation of the divergence times. The calculations were based on the assumption that the A and D genomes diverged 2.7 MYA (Dvorak and Akhunov 2005).

The dN/dS ratio was used as a measure of selective constraints imposed on duplicated genes. The dN/dS ratio was estimated using the maximum likelihood framework implemented in the HyPhy package (Kosakovsky-Pond et al. 2004). The HyPhy package was also used to perform the relative rate tests. Gene conversion between the genes of the ALP family in T. urartu and durum wheat was tested with the GeneConv program (Sawyer 1989). The length of the alignment used in the gene conversion analysis was 1,787 bp.

Gene Expression and Rapid Amplification of cDNA Ends

To determine the expression of each gene, T. urartu (accession G1812) and durum wheat cv. Langdon were grown in solution tanks containing 300 l of either 0.5× modified Hoagland solution or the same solution containing 125 mM NaCl (salt stress). Salt stress was imposed by a stepwise increase of NaCl concentration to 50, 100, and 125 mM NaCl each third day. Whole roots and leaves were collected from 4-week-old plants, frozen immediately in liquid nitrogen, and stored at −80 °C. RNA was isolated using the RNA isolation kit (Qiagen, Valencia, CA). Reverse transcriptase–PCR was performed using the one-step RT–PCR Kit (Qiagen). A list of primers specific to every member of the ALP gene family is provided in Table 1, Supplementary Material online. The transcription initiation site was determined with the GeneRacer Kit (Invitrogen, Carlsbad, CA). Rapid amplification of cDNA ends (RACE) products were subcloned using the TOPO Cloning Kit (Invitrogen) and sequenced.

Results

Phylogenetic Analysis of the ALP Gene Family

The NCBI protein database was searched with the translated sequence of wheat EST BF200640. The EST showed the highest (75%) similarity at the amino acid level to the aci-reductone dioxygenase–like protein from rice (accession AAP53794).
It is proposed to name wheat proteins encoded by this gene family ALP. In the previous mapping study, a single ALP gene was located on each of chromosomes 1A, 2A, and 4A, and 2 genes were located on chromosome 6A (Akhunov, Akhunova, et al. 2003). Following the rules of nomenclature for wheat genes and to reflect the sequence of gene duplication (see below), these genes were designated as ALP-A1 (chromosome 1A), ALP-A2 (chromosome 4A), ALP-A3 (chromosome 2A), ALP-A4.1 (chromosome 6A), and ALP-A4.2 (chromosome 6A). Genes orthologous to ALP-A1 on chromosomes 1B and 1D were designated ALP-B1 and ALP-D1, respectively. The ancestral gene of this paralogous gene set is ALP-A1, whereas the ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2 genes are duplicated genes (Akhunov, Akhunova, et al. 2003).

To assess the frequency of duplication of the ALP loci in the tribe Triticeae, Southern blots of 12 diploid species of the Triticum/Aegilops alliance and a single species from each of an additional 9 genera were hybridized with the BF200640 probe. Except for T. monococcum, T. urartu, and Aegilops umbellulata, the species showed a single restriction fragment, suggesting that they possessed only the ancestral ALP-1 gene. The number of restriction fragments per profile suggested that there were at least 3 ALP loci in T. monococcum and at least 2 in Ae. umbellulata. Only a single gene was detected in rice. The gene was on rice chromosome 10 and was very likely orthologous to the locus on wheat chromosomes 1A, 1B, and 1D because the distal end of the wheat chromosomes of homoeologous group 1 is homoeologous with rice chromosome 10 (Sorrells et al. 2003).

The 5 ALP genes present in the T. urartu genome were acquired by tetraploid and hexaploid wheats, such as durum wheat and T. aestivum (Akhunov, Akhunova, et al. 2003). Durum wheat and T. 
urartu BAC clones harboring each gene were isolated from BAC libraries and sequenced either by primer walking along the clone or by shotgun sequencing of the entire BAC (table 2). Triticum urartu BAC clones containing the ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2 genes were sequenced completely. The nucleotide sequence of the durum wheat ALP-A1 gene was employed in the design of primers for sequencing of the T. urartu ALP-A1 gene (1,736 bp), Triticum durum ALP-B1 gene (1,421 bp), and Ae. tauschii ALP-D1 gene (1,738 bp) (table 2). The following sequence descriptions are based on data obtained for both the T. urartu and the durum wheat A-genome BAC sequences, unless it is necessary to discuss differences between them.

The ancestral gene of the paralogous set, ALP-A1, is 2,368 bp long from the transcription start to the polyadenylation site and codes for a polypeptide 183 amino acids long. The gene has 5 exons and 4 introns. A 1,792-bp alignment of intronic and exonic sequences was used to infer the phylogeny of the ALP gene family. The nucleotide similarity levels between genes ranged from 90.1% to 98.4%. The Neighbor-Joining tree based on these sequences (not shown) had the same branching pattern as the maximum parsimony tree (fig. 1).

FIG. 1. (A) A maximum parsimony tree of the ALP family based on nucleotide sequences of genes including introns. Bootstrap values based on 1,000 replicates are indicated above the branches. Aegilops tauschii was used as an outgroup species. The lengths of tree branches are proportional to the number of mutations. (B) Amino acid sequence alignment of the ALP gene family. Only variable sites are shown and exon–exon junctions are indicated above the amino acid alignments. Stop codons are indicated by asterisks. The scale bar is 10-nt substitutions. Wheat in the figure stands for durum wheat.

In the maximum parsimony tree, the T. urartu and durum wheat orthologous 
genes located on the same chromosome are clustered together (fig. 1). Each node of the tree had a high bootstrap confidence. The topology of the tree showed that ALP-A1 is the ancestral locus and indicates that the evolution of the ALP gene family proceeded by interchromosomal duplications in the order ALP-A1 → ALP-A2 → ALP-A3 → ALP-A4. The last duplication was followed by an intrachromosomal duplication on chromosome 6A (ALP-A4.1 and ALP-A4.2 genes).

A single conversion event was detected in durum wheat between the tandem duplicated loci ALP-A4.1 and ALP-A4.2. The tract of the gene conversion was 1,088 bp long (P = 0.00661 after Bonferroni correction). As a consequence of the conversion, the terminal branches leading to the durum wheat ALP-A4.1 and ALP-A4.2 genes are disproportionately short (fig. 1). No gene conversions were detected among the ALP genes in T. urartu. Therefore, an absence of gene conversions was assumed in all further computations.

To determine whether or not the relationships observed in T. urartu accession G1812, the source of the BAC library used here, were representative of T. urartu as a whole, a portion of each gene was sequenced in an additional 6 T. urartu accessions representative of the geographic distribution of this species (table 1) and maximum parsimony trees were constructed (Fig. 1, Supplementary Material online). Although the trees were based on only a 1,012-bp sequence, which lowered the confidence in tree branching, the topology of 5 of the 6 trees was identical to that of the tree in figure 1. The remaining tree showed a single-gene switch; ALP-A4.1 clustered with ALP-A3 rather than with its tandem duplication ALP-A4.2.

All genes, except for the ALP-A2 gene on chromosome 4A, had an uninterrupted coding sequence (fig. 1B). The ALP-A2 gene had mutations in the coding sequence resulting in 2 stop codons.
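An interrupted coding sequence of the kind described for ALP-A2 can be detected by scanning the reading frame for stop codons that occur before the final codon. The following is a minimal sketch of that idea, assuming the sequence is supplied in frame from the start codon. The function name and the toy sequences are invented for illustration and are not ALP sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 1-based codon positions of in-frame stops before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [pos for pos, codon in enumerate(codons[:-1], start=1)
            if codon in STOP_CODONS]


# Hypothetical toy sequences (not taken from the study).
intact = "ATGGCTGAAACCTGA"  # ATG GCT GAA ACC TGA
mutant = "ATGGCTTAAACCTGA"  # ATG GCT TAA ACC TGA
print(premature_stops(intact))
print(premature_stops(mutant))
```

The intact toy sequence yields an empty list, whereas the mutant reports a stop at codon 3. Mapping such positions onto exons would require the gene annotation, which this sketch does not model.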
Because these stop codons were absent from the ALP-A3, ALP-A4.1, and ALP-A4.2 genes, the ALP-A2 gene must have acquired these mutations after the next duplication had originated. The 2 stop codons were present in both T. urartu and durum wheat, showing that they occurred before the divergence of the T. urartu and durum wheat haplotypes. Another stop codon in the coding sequence was in the T. urartu ALP-A3 gene. This stop codon was present in the T. urartu haplotype but not in its durum wheat orthologue, indicating that this mutation originated after the divergence of wheat and T. urartu haplotypes. The stop codons in exons 3 and 4 of the ALP-A2 and ALP-A3 genes, respectively, were monomorphic in the 7 investigated T. urartu accessions, but the stop codon in exon 5 was polymorphic, being present in 4 of the 7 accessions.

A total of 890 bp of third-codon positions and intronic sequences were used to estimate the time of the origin of each member of the ALP paralogous set (fig. 2), using 2.7 MYA as the divergence time of the A and D genomes (Dvorak and Akhunov 2005). The ALP-A4.2 and ALP-A4.1 genes were located in tandem on the same BAC clones in T. urartu and durum wheat.

Structure and Evolution of the ALP Gene Family

The 5′ RACE was performed on T. urartu and durum wheat RNAs. Sequencing of 5′ RACE products showed that the 5′ untranslated region (UTR) of the ALP-A1 gene is 185 bp long. The 3′ end of the ALP-A1 gene was inferred from the lengths of the 3′ EST sequences in the NCBI database to be at least 200 bp. No known promoter or enhancer elements were found with the promoter prediction software (www.softberry.com/berry.phtml) within a 1,297-bp sequenced region upstream of the transcription initiation site of ALP-A1.

ALP-A2 Duplication

The fragment of chromosome 1A duplicated to chromosome 4A was 2,094 bp long and included the complete coding sequence and 19 bp of the 5′ UTR and 92 bp of the 3′ UTR of the gene.
In the ALP-A1 gene, the ends of the DNA fragment shared with the ALP-A2 locus were flanked by 9-bp GTTGGTTTC inverted repeats (henceforth IR-1) (fig. 3). The left break point was at the gene-proximal boundary of IR-1, and the right break point was 3 bp inside IR-1 (fig. 3). No target-site duplication was found at the insertion site on chromosome 4A. The entire promoter and 166 bp of the 5′ UTR and 108 bp of the 3′ UTR of the ancestral ALP-A1 gene were lost from the duplicated ALP-A2 gene. A total of 49 bp of new DNA has been inserted and 292 bp deleted from introns of the ALP-A2 gene since its origin. All these indels are present in the ALP-A3 and ALP-A4 genes, indicating that they originated before the other duplications occurred. All deletions are flanked by di- or trinucleotide repeats in ALP-A1, suggesting that they originated by replication slippage (Wicker, Yahiaoui, et al. 2003).

FIG. 2. Reconstruction of the evolution of the ALP gene family. Timing of duplication events in million years is shown on the left. Corresponding regions between loci are connected with gray rectangles.

Sequences flanking the inserted gene fragment on chromosome 4A do not have any similarity to known transposable or repetitive elements, and they do not have any significant match with sequences in the NCBI database. An exception is the Sabrina element (no. 6 in fig. 2) located upstream of the ALP-A2 gene–coding sequence (fig. 2). This element was inserted less than 0.9 MYA because it is absent from all subsequent duplications. A 259-bp insertion occurred downstream of the ALP-A2 gene (fig. 2) and was also inserted less than 0.9 MYA.

ALP-A3 Duplication

During the second duplication, a DNA fragment 7,021 bp long was translocated from chromosome 4A to chromosome 2A, generating the ALP-A3 locus (fig. 2). The fragment included the entire promoterless ALP gene previously duplicated to chromosome 4A from chromosome 1A.
The fragment acquired an additional 862 bp at the 5′ end and 4,563 bp at the 3′ end that bore no similarity to ALP-A1 (fig. 2). The comparison of the 4A BAC sequence with the 2A BAC sequence revealed the following sequences surrounding the 5′ excision site on chromosome 4A (5′ to 3′ order): 1) a 1,273-bp direct repeat ending with 14-bp inverted repeats AGACTATTCTAATCC (henceforth IR-2), 2) (TA)32(GA)12 simple sequence repeat (SSR), 3) IR-2, and 4) a TAT transposon-like sequence (a CACTA-type DNA transposon) truncated from the 5′ end (fig. 2). At the 3′ end of the 7,021-bp fragment was another 1,273-bp direct repeat ending with IR-2 and a TA dinucleotide SSR (fig. 2). The following elements were surrounding the 5′ insertion site on chromosome 2A (5′ to 3′ direction): 1) (TA)4GA SSR, 2) IR-2, and 3) the truncated CACTA-type DNA transposon (figs. 2 and 3). At the 3′ end, there was a 1,273-bp direct repeat ending with IR-2 and a TA dinucleotide SSR (fig. 2). The 14-bp IR-2 is a part of a larger, 30-bp element with an internal 24-bp sequence able to form a perfect hairpin (fig. 3).

ALP-A4.1 Duplication

The third duplication translocated a fragment containing the ALP-A3 gene from chromosome 2A to chromosome 6A, generating the ALP-A4.1 locus (figs. 2 and 3). The 5′ end of the fragment is flanked by a compound SSR consisting of TA, GA, and GT dinucleotide motifs (fig. 3). No SSR was detected at the 3′ end of the 6,959-bp fragment. IR-2 repeats were at both termini of the 6,959-bp fragment (fig. 3).

ALP-A4.2 Duplication

The fourth duplication originated by the insertion of a 6,902-bp ALP-A4.1 fragment immediately downstream of the ALP-A4.1 6,959-bp fragment, creating a tandem duplication (figs. 2 and 3). This second gene is designated ALP-A4.2. No SSR flanks the ALP-A4.2 duplication. The duplicated 6,902-bp fragment terminates with IR-2 at both ends. After the last duplication, the copia-type retrotransposon was inserted upstream of the ALP-A4.1 locus (fig. 2).
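Terminal inverted repeats such as IR-2 are sequences at one end of a fragment that match the reverse complement of the other end, which is what allows the two termini to pair as a hairpin. The sketch below shows one simple way to measure how far such pairing extends inward from the two ends of a fragment. The function names and the toy fragment are hypothetical and serve only to illustrate the check; they are not the IR-2 sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat(fragment: str, max_len: int = 30) -> int:
    """Length of the perfect inverted repeat shared by the two termini."""
    fragment = fragment.upper()
    best = 0
    for k in range(1, min(max_len, len(fragment) // 2) + 1):
        if reverse_complement(fragment[:k]) == fragment[-k:]:
            best = k
        else:
            break  # pairing from the outermost bases inward has ended
    return best


# Hypothetical toy fragment: 9-bp inverted termini around a 10-bp spacer.
toy = "ACACTTGGA" + "CATCATCATC" + "TCCAAGTGT"
print(terminal_inverted_repeat(toy))
```

The check requires perfect complementarity and starts at the outermost bases, so it would miss repeats containing mismatches or repeats offset from the fragment ends. A real analysis would more likely rely on alignment-based tools.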
The ALP-A4.2 duplication was fortunate because it provided unequivocal information about the nucleotide sequence of the insertion site and the end sequences of the duplicated fragment. During the ALP-A4.2 duplication, the 6,902-bp fragment was inserted between the last 2 nucleotides (G and T) of the ALP-A4.1 fragment, as evidenced by the sequence GTTT at the 3′ end of the insertion (figs. 3 and 4). The inserted fragment is 6,902 bp long; it begins with the A of the 5′ ACAC sequence immediately upstream of the 5′ IR-2 repeat and ends with the T of the 3′ GTGT sequence immediately downstream of the 3′ IR-2 repeat. The entire 5′ end, starting with the A, can form a perfect hairpin with the entire 3′ end, ending with the T. Examination of the sequences associated with the ALP-A3 and ALP-A4.1 duplications revealed that they have a structure identical to that of the ALP-A4.2 duplication and, like the ALP-A4.2 insertion, each is flanked by a G at the 5′ end and a T at the 3′ end (fig. 3). These structural characteristics of the duplications suggest that all duplications subsequent to ALP-A2 originated via transposition-like duplication of the same 6,902-bp fragment. No target-site duplications were observed.

FIG. 3. Nucleotide sequences flanking the duplicated DNA fragments. Inverted repeats are shown by arrows.

To verify these inferences, 18,432 T. urartu BAC clones were hybridized with a probe generated for the sequence upstream of the 3′ hairpin containing IR-2 (fig. 3). A total of 299 BAC clones hybridized with the probe, suggesting that 1.6% of the BAC clones in the T. urartu BAC library contained sequences similar to the terminal sequence of this putative transposon. Sequencing was attempted by primer walking using a primer designed from the 3′ end of the probe sequence. Of the 299 BAC templates, 110 generated sequences that had phred scores below 20 and were shorter than 100 bp; these were discarded.
The remaining 189 sequences were aligned, and those producing ambiguous alignments were removed. The remaining 96 clones had a 3′ IR-2 sequence almost identical to that flanking the ALP-A4.2 gene on the 3′ side (fig. 4). In 1 BAC clone, the terminal region suffered a short deletion. Variation among the remaining 95 sequences was very low; 94 of the 95 IR-2 sequences ended with the T of the GTGT motif, and all were flanked by the TTA motif, forming the GTTT sequence observed in ALP-A4.2. In a few BACs, single nucleotide substitutions differentiated the sequence from the consensus (fig. 4). In 89 of the 96 BAC clones, IR-2 was flanked by a TA or TG SSR, some being compound and one consisting of a tetranucleotide motif. In only 6 clones was the insertion site not flanked by an SSR, as at the 3′ end of the ALP-A4.2 insertion.

FIG. 4. The consensus sequence of transposon insertion sites and its comparison with the ALP-A4.2 sequence. The bars indicate the frequency of clones with a different nucleotide at each position relative to the consensus sequence. In SSR sequences (an arrow), the second nucleotide is the next most frequent alternative at that site.

Expression of the ALP Gene Family

The duplicated copy of the ALP-A1 gene lost a part of its 5′ UTR sequence and all upstream regulatory elements during the first duplication, which generated ALP-A2. Surprisingly, a search of the NCBI EST database provided evidence that at least one of the duplicated genes is expressed in wheat because 2 classes of wheat ESTs having different 5′ UTR sequences were found. One class of ESTs corresponded to the ancestral gene ALP-A1 and contained a 5′ UTR sequence similar to the ALP-A1 gene 5′ UTR. The second class had 5′ UTRs similar to the sequences upstream of the 5′ end of the ALP-A3–duplicated segment. To verify this inference, RT–PCR was performed using primers specific to the sequences of every member of the ALP gene family (Table 1, Supplementary Material online).
One of the RT–PCR primers in every primer pair was designed to span a junction of neighboring exons to prevent amplification of contaminating DNA and to allow amplification of only mature, intronless mRNA. RNA isolated from salt-stressed and nonstressed tissues of T. urartu was used as a template. The salt stress and control regimes were investigated because the cDNA libraries from which most of the wheat ALP ESTs originated were prepared from mRNAs of salt- and cold-stressed plants and, hence, it was possible that ALP gene expression could be stress related. RNA isolated from leaves of durum wheat plants grown in nutrient solution without salt (control condition) was also used. Transcription of the ALP-A1 and ALP-A3 genes was detected in both T. urartu and durum wheat (fig. 5A). No difference was observed between plants grown under salt stress and control conditions (data not shown). To detect the boundaries of the 5′ UTRs of the expressed genes, 5′-RACE products generated with primers specific for the ALP-A1 and ALP-A3 genes were sequenced. The first exon of the ALP-A1 gene consisted of a 185-bp 5′ UTR and 12 bp of the coding DNA sequence (fig. 5B). The first exon of the ALP-A3 gene was 398 bp longer due to a change in the location of the start of transcription. The ALP-A3 coding region was of the same length as the coding region of ALP-A1. The lengths of the 3′ UTRs were 200 bp in the ALP-A1 gene and 92 bp in the ALP-A3 gene, as inferred from comparison with the NCBI EST database. Our data are consistent with the expression of the ALP-A3 gene being driven by new regulatory elements located within the DNA segment flanking the 5′ end of the 2,094-bp insertion on chromosome 4A. To confirm the existence of mRNA molecules initiated from a new promoter element on chromosome 2A, RT–PCR was performed with a right primer spanning the first exon–exon junction (R in fig. 5) and a left primer located within the 5′ UTR region (primers 1–3 in fig. 5). The results of RT–PCR (fig.
5C) were consistent with the location of the experimentally detected start of transcription 284 bp downstream of the 5′ end of the 7,021-bp DNA fragment inserted into chromosome 2A, making the 5′ UTR of the new duplicated gene 398 bp longer. As a negative control, a left primer was designed to the region located upstream of the experimentally detected start of transcription (primer 4, fig. 5B). This RT–PCR did not produce any PCR product (fig. 5C). Comparison of the region surrounding the new transcription initiation site with the database of repetitive sequences at GIRI revealed a nucleotide sequence (no. 2 element in fig. 2) similar to the CACTA class of grass DNA transposons immediately upstream of the start of transcription (fig. 5D). This element exists in all of the duplications.

Selection Operating on the ALP Genes

The intensity of selection in paralogs was estimated from the ratio of the number of substitutions per nonsynonymous site (dN) to the number of substitutions per synonymous site (dS). Relaxation of purifying selection causes the dN/dS ratio to approach 1.0. The maximum likelihood analysis implemented in the HyPhy package was used to estimate the rates of evolution of the ALP gene family (Kosakovsky-Pond et al. 2004), using the ALP-B1 gene sequence as an outgroup. A total of 399 nt (133 codons) were analyzed. Both the likelihood ratio test and AIC indicated that the HKY85 codon substitution model (Hasegawa et al. 1985) fit the data best. Using this model, the dN/dS ratio was estimated for every branch of the tree (fig. 6). The maximum likelihood estimation of all model parameters was performed independently for each branch. The relative rate test showed that purifying selection operating on the genes was relaxed after duplication (table 3). The tree was also partitioned into 2 clades (A and B) at internodes N1, N2, and N3 (fig. 6).
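The relative rate test mentioned above reports, for each gene pair, a likelihood ratio statistic and a probability (table 3). The text does not state the reference distribution. As a hedged sketch only, the code below assumes a χ² distribution with 2 degrees of freedom, for which the upper-tail probability has the closed form exp(−LR/2). This choice is our assumption, made because it reproduces the tabulated probabilities to within about 0.001; it is not a description of the HyPhy procedure. The function name is hypothetical.

```python
import math


def chi2_upper_tail_2df(lr: float) -> float:
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom.

    For 2 df the survival function is exactly exp(-x / 2).
    The choice of 2 df is an assumption, not stated in the source.
    """
    if lr < 0:
        raise ValueError("likelihood ratio statistic must be non-negative")
    return math.exp(-lr / 2.0)


# (likelihood ratio, reported probability) as listed in table 3; outgroup ALP-B1.
table3 = {
    ("ALP-A3", "ALP-A2"): (7.070, 0.029),
    ("ALP-A3", "ALP-A1"): (7.033, 0.029),
    ("ALP-A3", "ALP-A4.1"): (3.657, 0.161),
    ("ALP-A3", "ALP-A4.2"): (2.758, 0.252),
    ("ALP-A2", "ALP-A1"): (6.706, 0.035),
    ("ALP-A2", "ALP-A4.1"): (0.996, 0.608),
    ("ALP-A2", "ALP-A4.2"): (3.823, 0.148),
    ("ALP-A1", "ALP-A4.1"): (6.157, 0.046),
    ("ALP-A1", "ALP-A4.2"): (9.762, 0.008),
    ("ALP-A4.1", "ALP-A4.2"): (1.781, 0.411),
}

for pair, (lr, reported) in table3.items():
    computed = chi2_upper_tail_2df(lr)
    flag = "*" if computed < 0.05 else ""
    print(f"{pair[0]:9s} vs {pair[1]:9s} LR={lr:6.3f} p={computed:.4f}{flag} (reported {reported})")
```

Under this assumption, the five comparisons marked as significant in table 3 are the ones whose computed probability falls below 0.05.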
The A clade contained the most recently duplicated genes, and the B clade contained the rest of the tree in each case. Different dN/dS rate models were tested in clade A, clade B, and the internode connecting both clades (Table 2, Supplementary Material online). Except for the case when N3 was selected as the separating internode, models with a rate difference between the 2 clades fit the data better than the model implying equality of dN/dS rates. The results of this analysis are consistent with the results of the relative rate test (table 3). The highest log likelihood value was obtained when the tree was split at internode N1 and when the same dN/dS rate model was used for internode N1 and clade A and a different rate model for the rest of the tree (Table 2, Supplementary Material online). This outcome provided a strong indication that purifying selection was relaxed after the first duplication. When the tree was split at internode N2, the log likelihood still showed a statistically significant difference, indicating an additional relaxation of purifying selection operating on genes ALP-A3 and ALP-A4. The dN/dS ratio for the ALP-A4.2 gene was 1.264 (fig. 6), which was not significantly different from 1.0 (P = 0.36).

FIG. 5. Analysis of ALP gene family expression. (A) RT–PCR with gene-specific primers. (B) Structure of the ancestral gene ALP-A1 and duplicated gene ALP-A3. The length of the first exon is indicated. The start of transcription is indicated by an arrow and labeled +1. The region of the transcribed DNA located between the new start of transcription and the duplicated gene is shown as a crosshatched box. The black and open boxes correspond to exons and UTRs, respectively. Primers are indicated as arrows and numbered 1–4. The reverse primer is indicated by an arrow labeled R. (C) RT–PCR with the primers located within and outside of the 5′ UTR. The numbering corresponds to RT–PCR primers shown in part B of the figure. M is the size standard. (D) Comparison of sequences upstream of the duplicated ALP genes with the sequence of a CACTA-like transposon in the TREP database (bottom). Wheat in the figure stands for durum wheat.

FIG. 6. dN/dS ratio estimates for the gene tree branches. The question mark indicates that the dN/dS ratio is not defined for the branch. N1, N2, and N3 are internodes of the tree used for testing evolution rate models.

Table 3
Pair-wise Relative Evolution Rates of A-Genome Genes Using ALP-B1 Gene as an Outgroup

Gene Triplet                      Likelihood Ratio   Probability
ALP-B1 (ALP-A3, ALP-A2)           7.070              0.029*
ALP-B1 (ALP-A3, ALP-A1)           7.033              0.029*
ALP-B1 (ALP-A3, ALP-A4.1)         3.657              0.161
ALP-B1 (ALP-A3, ALP-A4.2)         2.758              0.252
ALP-B1 (ALP-A2, ALP-A1)           6.706              0.035*
ALP-B1 (ALP-A2, ALP-A4.1)         0.996              0.608
ALP-B1 (ALP-A2, ALP-A4.2)         3.823              0.148
ALP-B1 (ALP-A1, ALP-A4.1)         6.157              0.046*
ALP-B1 (ALP-A1, ALP-A4.2)         9.762              0.008*
ALP-B1 (ALP-A4.1, ALP-A4.2)       1.781              0.411
* Statistically significant.

Discussion

Rates and Mechanisms of Gene Duplication

Of 21 investigated Triticeae species, duplicated ALP loci were detected only in diploid wheats and one diploid Aegilops; only a single gene was detected in the rest of the species. This observation and the fact that there is also only a single gene in rice, located on a chromosome homoeologous with wheat chromosomes 1A, 1B, and 1D, indicate that a single ALP gene is the ancestral state in Triticeae and likely across the entire grass family. Radiation of Triticeae spans 10 Myr (Huang et al. 2002; Ramakrishna et al. 2002; Dvorak et al. 2006). The slow duplication rate of the ancestral locus therefore seems consistent with the slow rate at which interchromosomally duplicated loci have been evolving in diploid species of Triticeae, 2.9 × 10⁻³ gene⁻¹ Myr⁻¹ (Dvorak and Akhunov 2005). However, the interchromosomal duplication rate subsequent to the origin of ALP-A2 was greatly accelerated. The ALP-A2, -A3, -A4.1, and -A4.2 loci are 1.9, 0.9, 0.6, and 0.4 Myr old, respectively. Their average age is 0.95 Myr, within which 2 interchromosomally duplicated genes evolved. Hence, the duplication rate after the first duplication increased to 5.2 × 10⁻² gene⁻¹ Myr⁻¹. This acceleration of the duplication rate was caused by a fortuitous insertion of the ALP-A2 gene into a novel class of transposons containing IR-2. The IR-2 sequences are part of a larger sequence capable of forming a perfect cruciform at each end of a transposon-like element. The IR-2 sequence is an end sequence of the 1,273-bp repeat present in ALP-A2, ALP-A3, ALP-A4.1, and -A4.2. A remarkable characteristic of this element is its propensity to insert itself into simple or compound SSRs, most of them based on the TA or TG motifs (fig. 4). The insertion site is almost always flanked by a TA dinucleotide. These findings are consistent with the inference that IR-2 is the terminus of a transposon and that the duplications of the ALP-A2, ALP-A3, and ALP-A4.1 loci were mediated by the transposon. The terminal sequence of IR-2 includes a CACTA motif, which characterizes a major transposon class in wheat (Wicker, Guyot, et al. 2003). However, we failed to detect a target-site duplication upon insertion of the transposon, which is one of the characteristics of CACTA transposons, and the rest of its sequence bears no similarity to wheat CACTA-type transposons, to CACTA-type transposons of other species, or to any other known transposon. A total of 0.5% of all T. urartu BAC clones contained the IR-2 sequence, and an additional 1.1% hybridized with a sequence derived from the 1,273-bp repeat but very likely had diverged termini. Although the characterization of this mobile element family requires additional work, there is little doubt that it represents an important component of the intergenic space in the T. urartu genome and contributes to its dynamic state. The acceleration of the duplication rate of the ALP genes after the first duplication, caused by recurrent transposition facilitated by an IR-2–containing transposon, provides direct evidence for the importance of DNA transposons for new gene evolution via gene duplication.

The very high rate at which T. urartu intergenic DNA accumulates large indels (Dvorak et al. 2006) may account for the curious observation that it was always the most recently duplicated gene that produced the next duplication. Two of the 4 duplicated genes suffered insertions of large retroelements, and an additional indel occurred in the immediate vicinity of one of the IR-2 sequences at the ALP-A2 locus. It is possible that insertions of large retroelements alter the ability of repeated elements to duplicate. The high rate at which large indels occur in the T. urartu genome may leave only a short time window for duplication. Hence, the youngest element may have the greatest chance to be the source of the next duplication.

The acceleration of the gene duplication process after the first duplication should be taken into account in the estimation of gene duplication rates. The overall duplication rate of 2.9 × 10⁻³ gene⁻¹ Myr⁻¹ (Dvorak and Akhunov 2005) actually consists of 2 very different rates: 1) the primary rate involving duplications of ancestral genes and 2) the secondary rate of duplications of genes and gene fragments themselves generated by recent duplications. After removing all genes that could have been duplicated by the secondary duplication process from the data reported by Dvorak and Akhunov (2005), the primary rate of interchromosomal gene duplication is 2.5 × 10⁻³ gene⁻¹ Myr⁻¹. The secondary rate may vary among gene families; for the ALP family the rate is 5.2 × 10⁻² gene⁻¹ Myr⁻¹. The secondary duplication rate for the ALP gene family is thus about 20 times greater than the primary duplication rate. The propagation of duplicated gene fragments by Helitrons (Morgante et al. 2005) and MULEs (Jiang et al. 2004) is another example of the secondary duplication process and undoubtedly also happens at very high rates.

Lifespan of a Duplicated Gene

The comparison of sequences of duplicated ALP genes with the ancestral gene showed that each duplication produced a gene with a complete coding sequence and each duplicated gene had a complete ORF at the time of duplication. This is unequivocally shown by the full-length mRNA transcribed from the ALP-A3 gene. Since their origin, 3 of the 4 duplicated genes in T. urartu and 2 of the 4 in durum wheat either have acquired stop codons, which truncated their products, or have had large retroelements inserted into their promoters, effectively precluding their expression. Using an average age of the duplicated loci of 0.95 Myr and the fact that in the T. urartu and durum wheat lineages 6/8 of duplicated genes were not expressed, the rate of nonfunctionalization of duplicated genes was 0.79 gene⁻¹ Myr⁻¹. Substituting this rate constant k into the exponential decay equation $0.5 = e^{-kt}$ and solving for t gives a half-life of a duplicated gene of 0.9 Myr. This empirical half-life is about 4-fold shorter than the half-life of 3.2 Myr computed from the genomic sequence of Arabidopsis (Lynch and Conery 2000); that is, the rate of nonfunctionalization is about 4-fold higher. The use of stop codons as indicators of nonfunctionalization could underestimate the actual nonfunctionalization rate. For example, the ALP-A4.1 and ALP-A4.2 genes have no stop codons but are not expressed. The absence of expression of the ALP-A4.1 gene could be explained by the insertion of the retroelement Claudia in the 5′ UTR of the gene. The factors resulting in the absence of ALP-A4.2 gene expression are unknown.
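The rate figures quoted in the Discussion can be retraced with a few lines of arithmetic. The sketch below uses only numbers stated in the text (6 of 8 duplicated genes not expressed, a mean locus age of 0.95 Myr, and the primary and secondary duplication rates) and solves 0.5 = e^(−kt) for t. The variable and function names are ours and purely illustrative.

```python
import math


def half_life(rate_per_myr: float) -> float:
    """Solve 0.5 = exp(-k * t) for t, i.e. t = ln(2) / k."""
    return math.log(2) / rate_per_myr


# Values taken from the text.
nonexpressed, total_duplicates = 6, 8   # T. urartu and durum wheat lineages combined
mean_age_myr = 0.95                     # mean of 1.9, 0.9, 0.6, and 0.4 Myr
primary_rate = 2.5e-3                   # gene^-1 Myr^-1
secondary_rate = 5.2e-2                 # gene^-1 Myr^-1 (ALP family)
arabidopsis_half_life = 3.2             # Myr (Lynch and Conery 2000)

# Rate of nonfunctionalization: fraction of silenced duplicates per unit time.
k = (nonexpressed / total_duplicates) / mean_age_myr
t_half = half_life(k)

print(f"nonfunctionalization rate k = {k:.2f} per gene per Myr")
print(f"half-life = {t_half:.2f} Myr")
print(f"Arabidopsis half-life / ALP half-life = {arabidopsis_half_life / t_half:.1f}")
print(f"secondary / primary duplication rate = {secondary_rate / primary_rate:.1f}")
```

With the rounded rate constant of 0.79 the half-life comes out slightly lower than with the unrounded one, which likely accounts for the small difference between the 0.87 Myr given in the abstract and the 0.9 Myr given here.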
Expression of Duplicated Genes

The analysis of the wheat EST database and transcription analysis of the ALP genes showed that the ancestral gene and the ALP-A3–duplicated gene are abundantly transcribed. Transcription of the duplicated gene is driven by regulatory elements located within a sequence similar to CACTA-type transposons. A similar case has been described in Japanese morning glory, in which transcription of a captured gene was initiated within the sequence of the CACTA-type transposon Tpn1 (Kawasaki and Nitasaka 2004). Transcription from the promoters of transposable elements was hypothesized for gene fragments duplicated by MULEs; however, the analysis of these transcripts showed that all of them were pseudogenes (Juretic et al. 2006). The transcription of the ALP-A3 gene was as abundant as that of the ancestral ALP-A1 gene, but the transcript had a longer 5′ UTR. Both genes were constitutively expressed under the limited number of developmental and environmental conditions tested. The fact that the duplicated gene ALP-A3 has a new promoter qualifies it as a new gene.

Dispersed Duplicated Genes and Selection

The dN/dS ratio of 0.028 along the ALP-A1 gene branch, as compared to the dN/dS ratio of 0.1 along the ALP-B1 gene branch, shows no relaxation of purifying selection acting on ALP-A1 after the origin of duplicated genes. The dN/dS ratio of all duplicated genes was significantly higher than that of the ALP-A1 gene, suggesting a relaxation of purifying selection acting on them. Because all genes were identical after their duplication and because ALP-A3 is expressed, it is very likely that most of the duplicated genes were also expressed after their duplication and may have temporarily been under purifying selection.
It is therefore interesting to note that the expressed duplicated gene ALP-A3 had one of the highest dN/dS ratios (0.73) and that it has accumulated a total of 15 amino acid differences compared to the ancestral gene. Duplicated genes generated by polyploidy reside in their original environment after the whole-genome duplication. The subsequent evolution therefore follows one of the paths described in the Introduction: nonfunctionalization or neofunctionalization of one of the genes or subfunctionalization of both. Duplicated genes produced by interspersed duplications, exemplified by the ALP family, are located in a new genomic environment that is different from that of the ancestral genes. Such genes will in most cases be unequal partners, the ancestral gene maintaining the original function and remaining under strong purifying selection, as shown here for the ALP-A1 gene, and the duplicated genes most often becoming pseudogenes, like ALP-A2, ALP-A4.1, and ALP-A4.2, or, rarely, resulting in the evolution of new genes, as exemplified by the ALP-A3 gene.

Repeated Sequences and New Gene Evolution by Interspersed Gene Duplication

The evolution of the ALP gene family illustrates the importance of the repeated DNA making up the intergenic space in the Triticeae genomes for new gene evolution. Repeated DNA in Triticeae genomes facilitates gene duplication and may also be an inexhaustible source of ready-made promoters to drive the expression of duplicated genes. Repeated DNA thus facilitates both prerequisites for the evolution of new genes via gene duplication. Seen from this perspective, it is hard to believe that this component of the large plant genomes is selectively neutral.

Supplementary Material

Supplementary Figure 1 and Tables 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments

We would like to thank Bhupinder Saini and Paula Goines for assistance with the sequencing of the duplicated ALP genes, Hieu Phan for assistance with the sequencing of the transposon insertion sites, Karen Deal for editorial suggestions during the preparation of the manuscript, and 4 anonymous reviewers for providing very helpful comments on the manuscript. This work was supported by the National Science Foundation Plant Genome Research Program under Contract Agreement No. DBI-9975989.

Literature Cited

Akhunov ED, Akhunova AR, Linkiewicz AM, et al. (31 co-authors). 2003. Synteny perturbations between wheat homoeologous chromosomes by locus duplications and deletions correlate with recombination rates along chromosome arms. Proc Natl Acad Sci USA. 100:10836–10841. Akhunov ED, Akhunova AR, Dvorak J. 2005. BAC libraries of Triticum urartu, Aegilops speltoides and Ae. tauschii, the diploid ancestors of polyploid wheat. Theor Appl Genet. 111:1617–1622. Akhunov ED, Goodyear JA, Geng S, et al. (33 co-authors). 2003. The organization and rate of evolution of the wheat genomes are correlated with recombination rates along chromosome arms. Genome Res. 13:753–763. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410. Arumuganathan K, Earle ED. 1991. Nuclear DNA content of some important plant species. Plant Mol Biol Rep. 9:208–218. Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 16:1667–1678. Brunner S, Pea G, Rafalski A. 2005. Origins, genetic organization and transcription of a family of non-autonomous helitron elements in maize. Plant J. 43:799–810. Cenci A, Chantret N, Kong X, Gu Y, Anderson OD, Fahima T, Distelfeld A, Dubcovsky J. 2003. Construction and characterization of a half million clone BAC library of durum wheat (Triticum turgidum ssp. durum). Theor Appl Genet. 107:931–939.
Dubcovsky J, Luo MC, Zhong GY, Bransteitter R, Desai A, Kilian A, Kleinhofs A, Dvorak J. 1996. Genetic map of diploid wheat, Triticum monococcum L., and its comparison with maps of Hordeum vulgare L. Genetics. 143:983–999. Dvorak J, Akhunov ED. 2005. Tempos of deletions and duplications of gene loci in relation to recombination rate during diploid and polyploid evolution in the Aegilops-Triticum alliance. Genetics. 171:323–332. Dvorak J, Akhunov ED, Akhunova AR, Deal KR, Luo MC. 2006. Molecular characterization of a diagnostic DNA marker for domesticated tetraploid wheat provides evidence for gene flow from wild tetraploid wheat to hexaploid wheat. Mol Biol Evol. 23:1386–1396. Dvorak J, di Terlizzi P, Zhang HB, Resta P. 1993. The evolution of polyploid wheats: identification of the A genome donor species. Genome. 36:21–31. Dvorak J, Luo M-C, Yang Z-L, Zhang H-B. 1998. The structure of Aegilops tauschii genepool and the evolution of hexaploid wheat. Theor Appl Genet. 97:657–670. Dvorak J, McGuire PE, Cassidy B. 1988. Apparent sources of the A genomes of wheats inferred from the polymorphism in abundance and restriction fragment length of repeated nucleotide sequences. Genome. 30:680–689. Dvorak J, Zhang HB. 1990. Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc Natl Acad Sci USA. 87:9640–9644. Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195–202. Hasegawa M, Kishino H, Yano T. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22:160–174. Huang S, Sirikhachornkit A, Su X, Faris J, Gill BS, Haselkorn R, Gornicki P. 2002. Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proc Natl Acad Sci USA. 99:8133–8138. Hughes AL. 2002. Adaptive evolution after gene duplication. Trends Genet. 18:433–434.
Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR. 2004. PackMULE transposable elements mediate gene evolution in plants. Nature. 30:569–573. Jones CD, Custer AW, Begun DJ. 2005. Origin and evolution of a chimeric fusion gene in Drosophila subobscura, D. madeirensis and D. guanche. Genetics. 170:207–219. Juretic N, Hoen DR, Huynh ML, Marrison PM, Bureau TE. 2006. The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15:1292–1297. Kapitonov VV, Jurka J. 2001. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 98:8714–8719. Katju V, Lynch M. 2003. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics. 165:1793–1803. Kawasaki S, Nitasaka E. 2004. Characterization of Tpn1 family in the Japanese morning glory: En/Spm-related transposable elements capturing host genes. Plant Cell Physiol. 45:933–944. Kihara H. 1944. [Discovery of the DD-analyser, one of the ancestors of Triticum vulgare]. Agric Horticulture (Tokyo). 19:13– 14. Japanese. Kosakovsky-Pond SL, Frost SD, Muse SV. 2004. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 21:676–679. Luo MC, Yang ZL, Dvorak J. 1998. Position effects of ribosomal RNA multigene loci on meiotic recombination in wheat. Genetics. 149:1105–1113. Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science. 290:1151–1154. Lynch M, Force A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics. 154:459–473. McFadden ES, Sears ER. 1946. The origin of Triticum spelta and its free-threshing hexaploid relatives. J Hered. 37:81–89, 107–116. Moore RC, Purugganan MD. 2003. The early stages of duplicate gene evolution. Proc Natl Acad Sci USA. 100:15682–15687. Moore RC, Purugganan MD. 2005. The evolutionary dynamics of plant duplicate genes. Curr Opin Plant Biol. 8:122–128. Morgante M. 2006. Plant genome organization and diversity: the year of the junk! 
Curr Opin Biotech. 17:168–173. Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A. 2005. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet. 37:997–1002. Ohno S. 1970. Evolution by gene duplication. Berlin (Germany): Springer. Otto SP, Whitton J. 2000. Polyploid incidence and evolution. Annu Rev Genet. 34:401–437. Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 101:9903–9908. Peng JH, Zadeh H, Lazo GR, et al. (25 co-authors). 2004. Chromosome bin map of expressed sequence tags in homoeologous group 1 of hexaploid wheat and homoeology with rice and Arabidopsis. Genetics. 168:609–623. Posada D, Crandall KA. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics. 14:817–818. Ramakrishna W, Dubcovsky J, Park YJ, Busso C, Emberton J, SanMiguel P, Bennetzen JL. 2002. Different types and rates of genome evolution detected by comparative sequence analysis of orthologous segments from four cereal genomes. Genetics. 162:1389–1400. Sanderson MJ. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol Biol Evol. 19:101–109. Sawyer SA. 1989. Statistical test for determining gene conversion. Mol Biol Evol. 6:526–538. Sorrells ME, La Rota CM, Bermudez-Kandianis CE, et al. (35 co-authors). 2003. Comparative DNA sequence analysis of wheat and rice genomes. Genome Res. 13:1818–1827. Stein N, Feuillet C, Wicker T, Schlagenhauf E, Keller B. 2000. Subgenome chromosome walking in wheat: a 450-kb physical contig in Triticum monococcum L. spans the Lr10 resistance locus in hexaploid wheat (Triticum aestivum L.). Proc Natl Acad Sci USA. 97:13436–13441. Swofford DL. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sunderland (MA): Sinauer Associates. Walsh JB. 1995.
How often do duplicated genes evolve new functions? Genetics. 139:421–428. White SE, Habera LF, Wessler SR. 1994. Retrotransposons in the flanking regions of normal plant genes—a role for copia-like elements in the evolution of gene structure and expression. Proc Natl Acad Sci USA. 91:11792–11796. Wicker T, Guyot R, Yahiaoui N, Keller B. 2003. CACTA transposons in Triticeae. A diverse family of high-copy repetitive elements. Plant Physiol. 132:52–63. Wicker T, Yahiaoui N, Guyot R, Schlagenhauf E, Liu ZD, Dubcovsky J, Keller B. 2003. Rapid genome divergence at orthologous low molecular weight glutenin loci of the A and Am genomes of wheat. Plant Cell. 15:1186–1197. Zhang L, Lu HHS, Chung W-Y, Yang J, Li W-H. 2004. Patterns of segmental duplications in the human genome. Mol Biol Evol. 22:135–141. William Martin, Associate Editor Accepted November 15, 2006