Mechanisms and Rates of Birth and Death of Dispersed Duplicated Genes during
the Evolution of a Multigene Family in Diploid and Tetraploid Wheats
Eduard D. Akhunov, Alina R. Akhunova, and Jan Dvorak
Department of Plant Sciences, University of California, Davis
A family of 5 genes that evolved within the past 1.9 Myr in diploid wheat was characterized. The ancestral gene, ALP-A1, is
on chromosome 1A and encodes an aci-reductone dioxygenase–like protein. The duplicated genes ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2 acquired complete coding sequences but lost the original promoter. They are on chromosomes 4A, 2A,
6A and 6A, respectively, and evolved sequentially, the youngest duplicated gene always producing the next duplicate. It is
shown that dispersed gene duplication rate consists of the primary rate (duplications of ancestral genes) and the secondary
rate (duplications of genes that had been generated by recent duplications). The primary rate was $2.5 \times 10^{-3}$ gene$^{-1}$ Myr$^{-1}$
in diploid wheat. The secondary rate was $5.2 \times 10^{-2}$ gene$^{-1}$ Myr$^{-1}$ in the ALP family. The 20-fold acceleration of the
secondary rate was caused by the insertion of the ALP-A2 gene into a novel type transposon. Only the ALP-A1 and ALP-A3
genes are transcribed. The transcription of ALP-A3 is directed by a promoter within a DNA fragment similar to a CACTA
type of DNA transposons, making ALP-A3 a new gene. The ALP-A3 transcript is longer than that of ALP-A1. The half-life of ALP duplicated genes was estimated to be 0.87 Myr. Strong purifying selection acting on the ancestral gene ALP-A1
was undiminished by the evolution of duplicated genes. The evolution of the ALP family shows that repeated elements
facilitate both gene duplication and expression of duplicated genes and highlights their importance for the evolution of gene
repertoire in large plant genomes.
Introduction
The evolution of new genes by gene duplication is one
of the most important processes driving organic evolution.
Polyploidy duplicates the entire gene repertoire of an organism in a single step and is therefore an exceedingly important source of duplicated genes (Ohno 1970). In the plant
kingdom, polyploidy is a major evolutionary strategy, and
even such classical ‘‘diploid’’ plant models as Arabidopsis,
rice, and maize evolved from ancient polyploids (Blanc and
Wolfe 2004; Paterson et al. 2004). It is therefore almost unavoidable to assume that a recent or ancient polyploidization is the cause of virtually all duplicated loci in a genome.
Studies on wheat clearly showed that such an assumption
would greatly distort our understanding of genome evolution in plants (Akhunov, Goodyear, et al. 2003).
Besides polyploidy and segmental chromosome duplications, there are 2 basic types of gene duplications: tandem
and dispersed. The former are subjected to unequal crossovers leading to reversions and concerted evolution. The
latter are copies of genes or gene fragments translocated
to other locations in a genome, giving rise to dispersed duplicated loci and dispersed multigene families. This type of
gene duplication is more likely to evolve a new expression
pattern because of the physical separation of the duplicated
gene from its ancestral locus. Tandem gene duplications are
more frequent than dispersed gene duplications. In the Arabidopsis genome, tandem duplication may represent nearly
50% of all recently evolved duplicated gene pairs, whereas
dispersed duplication may account for only 6% of them
(Moore and Purugganan 2003). In contrast, in wheat, nearly
20% of wheat unigenes involve loci that originated by
recent interchromosomal gene duplications (Akhunov,
Goodyear, et al. 2003).
Key words: gene duplications, transposon, transcription, wheat,
genome evolution.
E-mail: [email protected].
Mol. Biol. Evol. 24(2):539–550. 2007
doi:10.1093/molbev/msl183
Advance Access publication November 29, 2006
© The Author 2006. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
Wheat species form a classical polyploid series at 3
ploidy levels: diploid (Triticum urartu and Triticum monococcum, genomes AA and AmAm, respectively), tetraploid
(Triticum turgidum and Triticum timopheevii, genomes
AABB and AAGG, respectively), and hexaploid (Triticum
aestivum and Triticum zhukovskyi, genomes AABBDD and
AAGGAmAm, respectively). Both tetraploid wheats originated by hybridization of species in the Aegilops speltoides
evolutionary lineage with T. urartu (Dvorak and Zhang 1990;
Dvorak et al. 1993). Triticum aestivum originated by hybridization of T. turgidum with Aegilops tauschii (genomes DD)
(Kihara 1944; McFadden and Sears 1946; Dvorak et al. 1998).
The analysis of 3,159 gene loci in the wheat A and D
genomes and their diploid sources uncovered 25 loci that
evolved by interchromosomal duplications during the evolution of T. urartu or Ae. tauschii since their divergence
about 2.7 MYA (Dvorak and Akhunov 2005). It was estimated from these numbers that new duplicated loci have
been evolving at a rate of $2.9 \times 10^{-3}$ gene$^{-1}$ Myr$^{-1}$
in these diploid lineages.
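The rate quoted above can be reproduced from the numbers in this paragraph. The following is a minimal sketch, assuming the rate is simply the number of new duplicated loci divided by the product of the loci surveyed and the elapsed time; the function name is hypothetical and the constant-rate assumption is ours.

```python
def duplication_rate(new_loci, loci_surveyed, elapsed_myr):
    """Duplications per gene per Myr, assuming a constant rate over time."""
    return new_loci / (loci_surveyed * elapsed_myr)

# Values taken from the text: 25 duplicated loci among 3,159 loci
# over the 2.7 Myr since the divergence of the two diploid lineages.
rate = duplication_rate(25, 3159, 2.7)
print(f"{rate:.1e} duplications per gene per Myr")
```

With these inputs the estimate rounds to the published figure.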
Genes that have an open reading frame (ORF) and are
expressed at the time of their origin are probably most likely
to result in the evolution of a gene with a new function. The
completeness of a nascent duplicated gene and its expression status can seldom be deduced directly from genomic
DNA sequence and usually must be inferred experimentally. The half-life of duplicated genes was estimated to
be 2.9 Myr in invertebrate lineages and 3.2 Myr in the
Arabidopsis lineage (Lynch and Conery 2000). Because
of this short life span, most complete duplicated genes are
destined to become pseudogenes (Walsh 1995) and gene
fragments. This is consistent with global analyses of animal
genomes, which revealed that most duplicated gene copies
contained incomplete gene sequences; only 10.7% of duplicated genes in segmental duplications in the human genome
are complete (Zhang et al. 2004) and as much as 70% of
duplications are shorter than 2 kb in the Caenorhabditis
elegans genome (Katju and Lynch 2003).
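To make the half-life figures concrete, the sketch below converts a half-life into a per-Myr loss rate and into the expected fraction of duplicates still intact after a given time. It assumes simple exponential decay, which is our simplification for illustration; the function names are hypothetical.

```python
import math

def surviving_fraction(age_myr, half_life_myr):
    """Fraction of duplicates still intact after age_myr (exponential decay assumed)."""
    return 0.5 ** (age_myr / half_life_myr)

def decay_rate(half_life_myr):
    """Per-Myr loss rate equivalent to the given half-life."""
    return math.log(2) / half_life_myr

# Half-lives quoted in the text (Lynch and Conery 2000).
for label, half_life in (("invertebrates", 2.9), ("Arabidopsis", 3.2)):
    print(label, round(decay_rate(half_life), 3), round(surviving_fraction(10.0, half_life), 3))
```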
However, duplicated gene fragments and pseudogenes
could occasionally produce a new gene by combining
several unrelated gene fragments into a single transcript
(Brunner et al. 2005). In the rice genome, gene fragments
averaging only 325 bp are propagated by the ‘‘mutator’’-like transposable element (MULE) (Jiang et al. 2004). Recently discovered helitrons (Kapitonov and Jurka 2001)
are another example of transposons propagating numerous
gene fragments (Morgante et al. 2005).
The expression of duplicated genes can be directed by
their own promoters or promoters of other genes. An intriguing possibility is that the expression of duplicated
genes is directed by promoters furnished by repeated elements (White et al. 1994; Kawasaki and Nitasaka 2004;
Brunner et al. 2005). A bioinformatic study suggested that
rice MULEs have the potential to direct transcription of
gene fragments (Jiang et al. 2004), but a follow-up study
suggested that MULE-mediated gene duplication results
in the formation of pseudogenes (Juretic et al. 2006).
Duplications of complete genes may have several theoretical outcomes. 1) One of the duplicated genes could become a pseudogene through the acquisition of deleterious
mutations, 2) both could continue to fulfill a similar function, 3) one copy could gain a new function, and 4) the
original function of a gene could become split between
the duplicated copies (subfunctionalization) (Ohno 1970;
Walsh 1995; Lynch and Force 2000; Hughes 2002). Because the probability of the acquisition of deleterious mutations by a duplicated gene is very high, it was suggested that
(1) is the primary fate of a duplicated gene (Walsh 1995).
The high incidence of duplicated genes in eukaryotic
genomes has been interpreted as suggesting that selection
plays an important role in the fate of duplicated genes
(Otto and Whitton 2000; Moore and Purugganan 2003; Jones
et al. 2005; Moore and Purugganan 2005). Low levels of
polymorphism in 3 pairs of dispersed Arabidopsis thaliana
duplicated genes were interpreted as evidence of positive selection acting on the genes (Moore and Purugganan 2003).
If repeated elements, such as MULEs and Helitrons,
are able to duplicate gene fragments in the small rice genome and the medium-size maize genome (Jiang et al.
2004; Morgante 2006), the significance of repeated sequences for gene duplication could be far greater in large plant
genomes, exemplified by those of wheat and its relatives in
the grass tribe Triticeae, and may be the factor responsible
for the large difference in the abundance of dispersed duplicated genes between Arabidopsis and wheat pointed out
above. The sizes of the genomes of wheat diploid ancestors
range from 4 Gb to about 6 Gb (Arumuganathan and Earle
1991) of which more than 90% are repeated nucleotide
sequences (Akhunov et al. 2005).
To obtain experimental data on the early stages of gene
evolution via dispersed gene duplications, we analyzed the
mechanisms and rates of duplication of genes corresponding to wheat expressed sequence tag (EST) unigene
BF200640. The entire BF200640 family evolved in the
A-genome diploid lineage since its divergence from the
B and D genome lineages (Akhunov, Akhunova, et al.
2003). The ancestral state is a single locus on chromosome
1. This state is preserved in the wheat B and D genomes
and the genomes of other wheat diploid relatives. In the
wheat B and D genomes, the ancestral locus was mapped
in the most distal bin of the 1AL and 1BL arms (Peng et al.
2004). In the A genome, duplicated genes were mapped
on chromosomes 2A and 4A and 2 on chromosome 6A
(Akhunov, Akhunova, et al. 2003). The recent origin of this
family allowed us to infer the mechanisms of duplication,
its rates, and the completeness of each gene at the time of
the origin of the duplication. Transcription of each gene was
analyzed in detail to assess its expression and regulation.

Table 1
Triticum urartu Accessions Used in This Study
Label       Accession   Location
Tu-B1       G1791       Mardin, Turkey
Tu-E1       G1895       Urfa, Turkey
Tu-F1       G3159       El Beqaa, Lebanon
Tu-B2       DV2351      Turkey, Sanli Urfa
Tu-D2       DV2122      Syria, Aleppo
Tu-G2       DV2374      Turkey, Sanli Urfa
Tu-G1812    G1812       Mardin, Turkey

Materials and Methods
Plant Materials
Nuclear DNAs were isolated (Dvorak et al. 1988) from
single plants of 7 T. urartu accessions representative of the
geographic distribution of this species (table 1) and from
single accessions of the following species in the tribe Triticeae of the grass family: Triticum monococcum, Aegilops
speltoides, Aegilops sharonensis, Aegilops longissima,
Aegilops bicornis, Aegilops searsii, Aegilops caudata,
Aegilops comosa, Aegilops uniaristata, Aegilops umbellulata, Aegilops tauschii, Taeniatherum caput-medusae, Heteranthelium piliferum, Secale cereale, Haynaldia villosa,
Agropyron cristatum, Lophopyrum elongatum, Pseudoroegneria stipifolia, Thinopyrum bessarabicum, and Psathyrostachys juncea.
Southern Blot Hybridization
The DNAs were digested with EcoRI, electrophoretically fractionated in 1% agarose gels, and transferred to
Hybond N+ nylon membranes (Amersham, Piscataway,
NJ) by capillary transfer in 0.4 N NaOH overnight (Luo
et al. 1998). The membranes were then rinsed in 2× standard saline citrate (SSC) for 5 min and hybridized with a 32P-labeled probe derived from EST BF200640 (supplied by
Olin Anderson, USDA Western Research Center, Albany,
CA; cDNA clone WHE0825-0828_L16_L16) amplified by
polymerase chain reaction (PCR) from the plasmid. Prehybridization and hybridization were performed as described
earlier (Dubcovsky et al. 1996). The membranes were
washed in 2× SSC and 0.5% sodium dodecyl sulfate
(SDS) for 30 min to 2 h at 65 °C, 1× SSC and 0.5% SDS
for 30 min at 65 °C, and 0.5× SSC and 0.5% SDS for
12 min and autoradiographed.
Bacterial Artificial Chromosome Library Screening
Bacterial artificial chromosome (BAC) libraries of
T. urartu accession G1812 (Akhunov et al. 2005) and Triticum turgidum ssp. durum cv Langdon (Cenci et al. 2003)
were employed in this study. A 32P-labeled probe of wheat
EST BF200640 was hybridized with 28 high-density membranes, each containing 18,432 double-printed clones, of
the BAC library of T. turgidum ssp. durum (henceforth durum
wheat). A total of 33 positive clones were isolated. The
DNA of these BAC clones was digested with EcoRI restriction endonuclease, restriction fragments were resolved by
1% agarose gel electrophoresis, and the Southern blot was hybridized with a 32P-labeled BF200640 probe. BAC clones
containing the duplicated copies of the gene family were
selected by comparing the hybridization profiles of the
clones with the hybridization profile of the wheat genomic
DNA digested with EcoRI restriction endonuclease.
To sequence T. urartu loci, the EST BF200640 probe
was hybridized with 9 screening membranes of the T. urartu
BAC library (Akhunov et al. 2005). Eight positive clones
were identified.
To sequence insertion sites of inverted repeat 2 (IR-2)–
containing transposons, a T. urartu BAC library high-density
screening membrane containing 18,432 double-printed
clones was screened with a DNA fragment amplified by
PCR from the BAC 404H6 DNA. The PCR target was a sequence upstream of the IR-2 terminal repeat (nucleotides
61,247–61,375 of BAC clone 404H6). The PCR product
was 32P-labeled. The 3′ insertion sites of the transposon were
sequenced by primer walking using DNA of each positive
BAC clone as a template.
Clone Sequencing
As a first step, durum wheat BAC EcoRI restriction
fragments hybridizing with the BF200640 probe were
subcloned into the pGEM3Zf(+) vector and sequenced
using the transposon Tn5 kit (Epicentre, Berkley, CA).
Durum wheat BAC clones harboring genes located on
chromosomes 2A, 4A, and 6A were completely sequenced
using the shotgun approach (Stein et al. 2000). Base calling
and assembly of BAC contigs were performed using the
Phred/Phrap/Consed software (Gordon et al. 1998). Only
the gene and neighboring flanking DNA of a BAC clone
harboring the durum wheat gene located on chromosome
1A were sequenced. BigDye v3.1 sequencing chemistry
(ABI, Foster City, CA) and capillary electrophoresis with
an ABI3730xl were used to sequence DNA. The DNA sequences of genes from durum wheat were used to design 3 pairs
of primers spanning the gene region (Table 1, Supplementary Material online). These primers were used to amplify
and sequence the T. urartu gene on chromosome 1A using
BAC DNA as a template. The total length of the sequenced
gene region was about 1,200 bp. Triticum urartu BAC
clones harboring genes located on chromosomes 2A, 4A,
and 6A were completely sequenced using the shotgun approach (table 2).

Table 2
Gene Loci Sequenced in the Study
Chromosome (gene)           Species             BAC Clone (PCR amplicon)   Sequencing Strategy        Length (bp)
1A (ALP-A1)                 Triticum durum      285M18                     Primer walk                8,921
                            T. urartu           317A22                     PCR amplicon sequencing    1,736
2A (ALP-A3)                 T. durum            466G24                     Shotgun sequencing         99,752
                            T. urartu           404H6                      Shotgun sequencing         106,806
4A (ALP-A2)                 T. durum            221H19                     Shotgun sequencing         148,156
                            T. urartu           292N12                     Shotgun sequencing         111,168
6A (ALP-A4.1, ALP-A4.2)     T. durum            219E24                     Shotgun sequencing         168,664
                            T. urartu           41C8                       Shotgun sequencing         129,021
1B (ALP-B1)                 T. durum            285O20                     PCR amplicon sequencing    1,421
1D (ALP-D1)                 Aegilops tauschii   (AL8/78)                   PCR amplicon sequencing    1,738

The same pairs of primers were used to amplify and
sequence the gene from the genome of Ae. tauschii and
the B-genome of durum wheat, using genomic DNA of
Ae. tauschii AL8/78 and a BAC clone of durum wheat, respectively, as templates. In all cases, both strands of PCR
products were sequenced. Ambiguous base callings were
resolved by resequencing.
To annotate repeated elements, BAC DNA sequences
were compared with the Triticeae Repeat Sequence database
(TREP; http://wheat.pw.usda.gov/ITMI/Repeats/) and the Genetic Information Research Institute (GIRI) (http://www.girinst.org)
databases. The coding potential of sequences
was established by comparisons of translated BAC sequences with the National Center for Biotechnology Information
(NCBI) nonredundant database using the BlastX (Altschul
et al. 1990) program and by comparison of BAC sequences
with the NCBI EST database using the BlastN program. Sequence comparisons of the paralogous gene loci were performed with the BlastN program.
In addition to the T. urartu accession G1812, in which
the aci-reductone dioxygenase–like protein (ALP) genes
were sequenced in their entirety, a fragment of each gene
from exons 3 to 5 was sequenced in 6 additional T. urartu
accessions (table 1) representative of the geographic distribution of the species. Each gene fragment was PCR amplified and subcloned into the pGEM–T Easy plasmid vector
(Promega, Madison, WI), and Escherichia coli DH10B cells
were transformed by electroporation. A minimum of 2 independent clones per gene were sequenced using the M13
forward and reverse primers. Both strands of each clone were
sequenced. Sequences were aligned with the ClustalW program.
The gaps in the alignment were deleted before analysis. The
Close-Neighbor-Interchange algorithm implemented in the
MEGA 3.1 program was used to construct the maximum
parsimony trees. Confidence levels of the trees were assessed
by bootstrap resampling replicated 1,000 times.
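The bootstrap step mentioned above resamples alignment columns with replacement before each tree is rebuilt. The following is only a sketch of that resampling step, not the MEGA implementation; the toy alignment and the function name are hypothetical, and tree construction is omitted.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

# Toy alignment (hypothetical sequences, not data from this study).
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(replicates[0][0]), "columns each")
```

Each replicate keeps the same number of sequences and columns as the input, so the same tree-building method can be applied to every replicate and node support tallied across them.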
Phylogenetic Analysis
The exonic and intronic sequences of durum wheat,
T. urartu, and Ae. tauschii were aligned using the ClustalX
program followed by manual editing of the alignment. Phylogenetic relationships among the genes were inferred using
the maximum parsimony and Neighbor-Joining methods
of tree construction implemented in the PAUP program
(Swofford 2003). The Ae. tauschii gene sequence was used
as an outgroup to root each tree. The bootstrap confidence
of individual nodes was based on 1,000 resampling runs.
A total of 890 noncoding and third position nucleotides of each gene were used to time each duplication event.
The substitution model parameters were estimated using the hierarchical likelihood ratio test implemented in the Modeltest program (Posada and Crandall 1998). According to the
Akaike information criterion (AIC), the best model fitting
the observed data was HKY (Hasegawa et al. 1985) without
rate variation. The selected model parameters were then
used for likelihood estimation of the branch lengths of
the tree with the given topology using the PAUP program
(Swofford 2003). The branch length estimates and the tree
were used to compute the divergence time of the duplicated
genes using the semiparametric penalized likelihood method
implemented in the r8s program (Sanderson 2002). The
smoothing parameter for the penalized likelihood method
was estimated as described (Sanderson 2002). The ALP-B1 gene sequence was used as an outgroup to root the tree.
The outgroup was pruned before estimation of the divergence times. The calculations were based on the assumption
that the A and D genomes diverged 2.7 MYA (Dvorak and
Akhunov 2005).
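The HKY model named above allows unequal base frequencies and distinguishes transitions from transversions through a single ratio parameter. As a rough illustration only (this is not the Modeltest or PAUP implementation, and the base frequencies and kappa below are hypothetical values, not estimates from this study), its unnormalized instantaneous rate matrix can be sketched as:

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def is_transition(a, b):
    """True when both bases are purines or both are pyrimidines."""
    return (a in PURINES) == (b in PURINES)

def hky_rate_matrix(freqs, kappa):
    """Unnormalized HKY85 rate matrix as a dict of dicts keyed by base."""
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                # Rate a -> b is proportional to the target base frequency,
                # multiplied by kappa when the change is a transition.
                q[a][b] = freqs[b] * (kappa if is_transition(a, b) else 1.0)
        # Diagonal entry makes each row sum to zero.
        q[a][a] = -sum(q[a].values())
    return q

# Hypothetical parameter values chosen for illustration.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = hky_rate_matrix(freqs, kappa=2.0)
for a in BASES:
    print(a, [round(q[a][b], 2) for b in BASES])
```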
The dN/dS ratio was used as a measure of selective constraints imposed on duplicated genes. The dN/dS ratio was
estimated using the maximum likelihood framework implemented in the HyPhy package (Kosakovsky-Pond et al.
2004). The HyPhy package was also used to perform the
relative rate tests.
The gene conversion between the genes of the ALP
family in T. urartu and durum wheat was tested by GeneConv program (Sawyer 1989). The length of the alignment
used in gene conversion analysis was 1,787 bp.
Gene Expression and Rapid Amplification
of cDNA Ends
To determine the expression of each gene, T. urartu
(accession G1812) and durum wheat cv. Langdon were grown
in solution tanks containing 300 l of either 0.5× modified
Hoagland solution or the same solution containing 125 mM
NaCl (salt stress). Salt stress was imposed by stepwise increase
of NaCl concentration to 50, 100, and 125 mM NaCl every third
day. Whole roots and leaves were collected from 4-week-old
plants, frozen immediately in liquid nitrogen, and stored
at -80 °C. RNA was isolated using the RNA isolation kit
(Qiagen, Valencia, CA). Reverse transcriptase–PCR was performed using the one-step RT–PCR Kit (Qiagen). A list of primers specific to every member of the ALP gene family is
provided in Table 1, Supplementary Material online.
The transcription initiation site was determined with the GeneRacer Kit (Invitrogen, Carlsbad, CA). Rapid amplification of
cDNA ends (RACE) products were subcloned using the
TOPO Cloning Kit (Invitrogen) and sequenced.
Results
Phylogenetic Analysis of the ALP Gene Family
The NCBI protein database was searched with the
translated sequence of wheat EST BF200640. The EST
showed the highest (75%) similarity at the amino acid level
to the aci-reductone dioxygenase–like protein from rice (accession AAP53794). It is proposed to name wheat proteins
encoded by this gene family as ALP. In the previous mapping study, a single ALP gene was located on chromosomes
1A, 2A, and 4A, and 2 genes were located on chromosome
6A (Akhunov, Akhunova, et al. 2003). Following the rules
of nomenclature for wheat genes and to reflect the sequence
of gene duplication (see below), these genes were designated as ALP-A1 (chromosome 1A), ALP-A2 (chromosome
4A), ALP-A3 (chromosome 2A), ALP-A4.1 (chromosome
6A), and ALP-A4.2 (chromosome 6A). Genes orthologous
to ALP-A1 on chromosomes 1B and 1D were designated
ALP-B1 and ALP-D1, respectively. The ancestral gene of
this paralogous gene set is ALP-A1, whereas ALP-A2,
ALP-A3, ALP-A4.1, and ALP-A4.2 genes are duplicated
genes (Akhunov, Akhunova, et al. 2003).
To assess the frequency of duplication of the ALP loci
in the tribe Triticeae, Southern blots of 12 diploid species of
the Triticum/Aegilops alliance and a single species from an
additional 9 genera were hybridized with the BF200640
probe. Except for T. monococcum, T. urartu, and Aegilops
umbellulata, the remaining species showed a single restriction fragment, suggesting that they possessed only
the ancestral ALP-1 gene. The number of restriction fragments per profile suggested that there were at least 3 ALP
loci in T. monococcum and at least 2 in Ae. umbellulata.
Only a single gene was detected in rice. The gene was
on rice chromosome 10 and was very likely orthologous to
the locus on wheat chromosomes 1A, 1B, and 1D because
the distal end of the wheat chromosomes of homoeologous
group 1 is homoeologous with rice chromosome 10
(Sorrells et al. 2003).
The 5 ALP genes present in the T. urartu genome were
acquired by tetraploid and hexaploid wheats, such as durum
wheat and T. aestivum (Akhunov, Akhunova, et al. 2003).
Durum wheat and T. urartu BAC clones harboring each
gene were isolated from BAC libraries and sequenced using
either primer walking along the clone or by shotgun sequencing of the entire BAC (table 2). Triticum urartu
BAC clones containing the ALP-A2, ALP-A3, ALP-A4.1,
and ALP-A4.2 genes were sequenced completely. Nucleotide sequence of the durum wheat ALP-A1 gene was employed in the design of primers for sequencing of the T.
urartu ALP-A1 gene (1,736 bp), Triticum durum ALP-B1
gene (1,421 bp), and Ae. tauschii ALP-D1 gene (1,738
bp) (table 2). The following sequence descriptions are
based on data obtained for both the T. urartu and the durum
wheat A-genome BAC sequences, unless it is necessary to
discuss differences between them.
The ancestral gene of the paralogous set, ALP-A1, is
2,368 bp long from the transcription start to the polyadenylation site and codes for a polypeptide 183 amino acids
long. The gene has 5 exons and 4 introns.
A 1,792-bp alignment of intronic and exonic sequences was used to infer the phylogeny of the ALP gene family.
The nucleotide similarity levels between genes ranged from
90.1% to 98.4%. The Neighbor-Joining tree based on these sequences (not shown) had the same branching pattern as the
maximum parsimony tree (fig. 1). In the maximum parsimony tree, the T. urartu and durum wheat orthologous
genes located on the same chromosome are clustered together (fig. 1). Each node of the tree had a high bootstrap
confidence. The topology of the tree showed that ALP-A1 is
the ancestral locus and indicates that the evolution of the
ALP gene family proceeded by interchromosomal duplications in the order ALP-A1 → ALP-A2 → ALP-A3 →
ALP-A4. The last duplication was followed by an intrachromosomal duplication on chromosome 6A (ALP-A4.1 and
ALP-A4.2 genes).
FIG. 1.—(A) A maximum parsimony tree of the ALP family based on nucleotide sequences of genes including introns. Bootstrap values based on
1,000 replicates are indicated above the branches. Aegilops tauschii was used as an outgroup species. The lengths of tree branches are proportional to the
number of mutations. (B) Amino acid sequence alignment of the ALP gene family. Only variable sites are shown and exon–exon junctions are indicated
above the amino acid alignments. Stop codons are indicated by asterisks. The scale bar is 10-nt substitutions. Wheat in the figure stands for durum wheat.
A single conversion event was detected in durum
wheat between the tandem duplicated loci ALP-A4.1 and
ALP-A4.2. The tract of the gene conversion was 1,088
bp long (P = 0.00661 after Bonferroni correction). As
a consequence of the conversion, the terminal branches
leading to the durum wheat ALP-A4.1 and ALP-A4.2 genes
are disproportionately short (fig. 1). No gene conversions
were detected among the ALP genes in T. urartu. Therefore,
an absence of gene conversions was assumed in all further
computations.
To determine whether or not the relationships observed in T. urartu accession G1812, the source of the
BAC library used here, were representative of T. urartu
as a whole, a portion of each gene was sequenced in an
additional 6 T. urartu accessions representative of the
geographic distribution of this species (table 1) and
maximum parsimony trees were constructed (Fig. 1, Supplementary Material online). Although the trees were based
on only a 1,012-bp sequence, which lowered the confidence
in tree branching, the topology of 5 of the 6 trees was identical to that of the tree in figure 1. The remaining tree
showed a single-gene switch; ALP-A4.1 clustered with
ALP-A3 rather than with its tandem duplication ALP-A4.2.
All genes, except for the ALP-A2 gene on chromosome
4A, had an uninterrupted coding sequence (fig. 1B). The
ALP-A2 gene had mutations in the coding sequence resulting
in 2 stop codons. Because these stop codons were absent
from the ALP-A3, ALP-A4.1 and ALP-A4.2 genes, the
ALP-A2 gene must have acquired these mutations after
the next duplication had originated. The 2 stop codons were
present in both T. urartu and durum wheat, showing that
they occurred before the divergence of the T. urartu and
durum wheat haplotypes. Another stop codon in the coding
sequence was in the T. urartu ALP-A3 gene. This stop codon
was present in the T. urartu haplotype but not in its durum
wheat orthologue, indicating that this mutation originated
after the divergence of wheat and T. urartu haplotypes.
The stop codons in exons 3 and 4 of the ALP-A2 and
ALP-A3 genes, respectively, were monomorphic in the 7
investigated T. urartu accessions, but the stop codon in exon
5 was polymorphic, being present in 4 of the 7 accessions.
A total of 890 bp of third-codon positions and intronic
sequences were used to estimate the time of the origin of
each member of the ALP paralogous set (fig. 2), using
2.7 MYA as the divergence time of the A and D genomes (Dvorak and Akhunov 2005). The ALP-A4.2
and ALP-A4.1 genes were located on the same BAC clones
in T. urartu and durum wheat in tandem.
Structure and Evolution of the ALP Gene Family
The 5′ RACE was performed on T. urartu and durum
wheat RNAs. Sequencing of 5′ RACE products showed that
the 5′ untranslated region (UTR) of the ALP-A1 gene is 185 bp
long. The 3′ UTR of the ALP-A1 gene was inferred from the
lengths of the 3′ EST sequences in the NCBI database to be
at least 200 bp long. No known promoter or enhancer elements
were found with the promoter prediction software (www.softberry.com/berry.phtml)
within a 1,297-bp sequenced region upstream of the transcription initiation site of ALP-A1.
ALP-A2 Duplication
The fragment of chromosome 1A duplicated to chromosome 4A was 2,094 bp long and included the complete
coding sequence and 19 bp of the 5′ UTR and 92 bp of the
3′ UTR of the gene. In the ALP-A1 gene, the ends of the
DNA fragment shared with the ALP-A2 locus were flanked
by 9-bp GTTGGTTTC inverted repeats (henceforth IR-1)
(fig. 3). The left break point was at the gene-proximal
boundary of IR-1, and the right break point was 3 bp inside
IR-1 (fig. 3). No target-site duplication was found at the insertion site on chromosome 4A. The entire promoter and
166 bp of the 5′ UTR and 108 bp of the 3′ UTR of the
ancestral ALP-A1 gene were lost from the duplicated
ALP-A2 gene. A total of 49 bp of new DNA has been
inserted and 292 bp deleted from introns of the ALP-A2
gene since its origin. All these indels are present in the
ALP-A3 and -A4 genes, indicating that they originated before the other duplications occurred. All deletions are
flanked by di- or trinucleotide repeats in ALP-A1, suggesting that they originated by replication slippage (Wicker,
Yahiaoui, et al. 2003). Sequences flanking the inserted gene
fragment on chromosome 4A do not have any similarity to
known transposable or repetitive elements, and they do not
have any significant match with sequences in the NCBI database. An exception is the Sabrina element (no. 6 in fig. 2)
located upstream of the ALP-A2 gene–coding sequence (fig.
2). This element was inserted less than 0.9 MYA because it
is absent from all subsequent duplications. A 259-bp insertion occurred downstream of the ALP-A2 gene (fig. 2) and
was also inserted less than 0.9 MYA.
FIG. 2.—Reconstruction of the evolution of ALP gene family. Timing of duplication events in million years is shown on the left. Corresponding
regions between loci are connected with gray rectangles.
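The UTR lengths reported for the ALP-A2 duplication can be checked against the UTR lengths reported for the ancestral ALP-A1 gene. This is a small bookkeeping sketch using only numbers from the text; the variable names are hypothetical, and the 3′ total is compared with a lower bound because the 3′ UTR was inferred to be at least 200 bp.

```python
# Lengths in bp, all taken from the text.
utr5 = {"retained_in_ALP_A2": 19, "lost_from_ALP_A2": 166}
utr3 = {"retained_in_ALP_A2": 92, "lost_from_ALP_A2": 108}

total5 = sum(utr5.values())  # should equal the 185-bp 5' UTR of ALP-A1
total3 = sum(utr3.values())  # should match the 3' UTR lower bound of 200 bp

print("5' UTR accounted for:", total5, "bp")
print("3' UTR accounted for:", total3, "bp")
```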
ALP-A3 Duplication
During the second duplication, a DNA fragment 7,021
bp long was translocated from chromosome 4A to chromosome 2A, generating the ALP-A3 locus (fig. 2). The fragment included the entire promoterless ALP gene previously
duplicated to chromosome 4A from chromosome 1A. The
fragment acquired an additional 862 bp at the 5′ end and
4,563 bp at the 3′ end that bore no similarity to ALP-A1 (fig.
2). The comparison of the 4A BAC sequence with the 2A
BAC sequence revealed the following sequences surrounding the 5′ excision site on chromosome 4A (5′ to 3′ order):
1) a 1,273-bp direct repeat ending with 14-bp inverted
repeats AGACTATTCTAATCC (henceforth IR-2), 2) a
(TA)32(GA)12 simple sequence repeat (SSR), 3) IR-2,
and 4) a TAT transposon-like sequence (a CACTA-type
DNA transposon) truncated from the 5′ end (fig. 2). At
the 3′ end of the 7,021-bp fragment was another 1,273-bp
direct repeat ending with IR-2 and a TA dinucleotide SSR
(fig. 2). The following elements were surrounding the 5′
insertion site on chromosome 2A (5′ to 3′ direction): 1) a
(TA)4GA SSR, 2) IR-2, and 3) the truncated CACTA-type
DNA transposon (figs. 2 and 3). At the 3′ end, there was
a 1,273-bp direct repeat ending with IR-2 and a TA dinucleotide SSR (fig. 2). The 14-bp IR-2 is a part of a larger,
30-bp element with an internal 24-bp sequence able to form
a perfect hairpin (fig. 3).
ALP-A4.1 Duplication
The third duplication translocated a fragment containing the ALP-A3 gene from chromosome 2A to chromosome
6A, generating the ALP-A4.1 locus (figs. 2 and 3). The 5′
end of the fragment is flanked by a compound SSR consisting of TA, GA, and GT dinucleotide motifs (fig. 3). No SSR
was detected at the 3′ end of the 6,959-bp fragment. IR-2
repeats were at both termini of the 6,959-bp fragment (fig. 3).
ALP-A4.2 Duplication
The fourth duplication originated by the insertion of
a 6,902-bp ALP-A4.1 fragment immediately downstream
of the ALP-A4.1 6,959-bp fragment, creating a tandem duplication (figs. 2 and 3). This second gene is designated
ALP-A4.2. No SSR flanks the ALP-A4.2 duplication. The
duplicated 6,902-bp fragment terminates with IR-2 at both
ends. After the last duplication, a copia-type retrotransposon was inserted upstream of the ALP-A4.1 locus (fig. 2).
FIG. 3. Nucleotide sequences flanking the duplicated DNA fragments. Inverted repeats are shown by arrows.
The ALP-A4.2 duplication was fortunate because it provided unequivocal
information about the nucleotide sequence of the insertion site and the
end sequences of the duplicated fragment. During the ALP-A4.2
duplication, the 6,902-bp fragment was inserted between the last 2
nucleotides (G and T) of the ALP-A4.1 fragment, as evidenced by the
sequence GTTT at the 3′ end of the insertion (figs. 3 and 4). The
inserted fragment is 6,902 bp long; it begins with the A of the 5′ ACAC
sequence immediately upstream of the 5′ IR-2 repeat and ends with the T
of the 3′ GTGT sequence immediately downstream of the 3′ IR-2 repeat. The
entire 5′ end, starting with the A, can form a perfect hairpin with the
entire 3′ end, ending with the T. Examination of the sequences associated
with the ALP-A3 and ALP-A4.1 duplications revealed that they have
a structure identical to that of the ALP-A4.2 duplication and, like
the ALP-A4.2 insertion, each is flanked by a G at the 5′ end and a T at
the 3′ end (fig. 3). These structural characteristics suggest that all
duplications subsequent to ALP-A2 originated via transposition-like
duplication of the same 6,902-bp fragment. No target-site
duplications were observed.
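The statement that the 5′ end of the fragment can pair with its 3′ end amounts to saying that the two termini are reverse complements of each other, as the reported 5′ ACAC and 3′ GTGT tetranucleotides are. A minimal Python sketch of such a check follows; the function names and the toy sequence are assumptions made for illustration, and the real 6,902-bp fragment is not reproduced here.

```python
# Hypothetical helpers for checking terminal inverted repeats in a DNA string.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the longest 5' prefix that can base-pair with the 3' suffix.

    The first n bases of reverse_complement(seq) are the reverse complement
    of the last n bases of seq, so a position-by-position comparison of seq
    with its reverse complement measures the paired terminal stretch.
    """
    rc = reverse_complement(seq)
    n = 0
    limit = len(seq) // 2
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n

if __name__ == "__main__":
    # Toy element (illustrative only): ACAC terminus, unpaired filler, GTGT terminus.
    toy = "ACAC" + "TTTTTT" + "GTGT"
    print(reverse_complement("ACAC"))
    print(terminal_inverted_repeat_length(toy))
```

Applied to a full element sequence, the same comparison would report how many terminal nucleotides can form the hairpin described in the text.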
FIG. 4. The consensus sequence of transposon insertion sites and its
comparison with the ALP-A4.2 sequence. The bars indicate the frequency
of clones with a different nucleotide at the nucleotide position relative to the
consensus sequence. In SSR sequences (an arrow), the second nucleotide is
the next most frequent alternative at that site.
To verify these inferences, 18,432 T. urartu BAC clones were hybridized
with a probe generated for the sequence upstream of the 3′ hairpin
containing IR-2 (fig. 3). A total of 299 BAC clones hybridized with the
probe, suggesting that 1.6% of the BAC clones in the T. urartu BAC
library contained sequences similar to the terminal sequence of this
putative transposon. Sequencing was attempted by primer walking using
a primer designed from the 3′ end of the probe sequence. Of 299 BAC
templates, 110 generated sequences with phred score below 20 and shorter
than 100 bp and were discarded. The remaining 189 sequences were
aligned, and those producing ambiguous alignments were removed. The
remaining 96 clones had a 3′ IR-2 sequence almost identical to that
flanking the ALP-A4.2 gene on the 3′ side (fig. 4). In 1 BAC clone, the
terminal region suffered a short deletion. Variation among the remaining
95 sequences was very low: 94 of the 95 IR-2 sequences ended with the T
of the GTGT motif, and all were flanked by the TTA motif, forming the
GTTT sequence observed in ALP-A4.2. In a few BACs, single nucleotide
substitutions differentiated the sequence from the consensus (fig. 4). In
89 of the 96 BAC clones, IR-2 was flanked by a TA or TG SSR, some being
compound and one consisting of a tetranucleotide motif. In only 6 clones
was the insertion site not flanked by an SSR, like
the 3′ end of the ALP-A4.2 insertion.
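The clone counts reported above can be turned into the library fractions quoted in the text with simple arithmetic. The Python sketch below uses only the counts given in this section; the variable names are illustrative assumptions.

```python
# Counts taken from the text; names are illustrative.
TOTAL_CLONES = 18_432   # T. urartu BAC clones screened
HYBRIDIZED = 299        # clones hybridizing with the probe
LOW_QUALITY = 110       # reads discarded (phred < 20 and < 100 bp)
WITH_IR2 = 96           # clones retained with a near-identical 3' IR-2

def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

aligned_reads = HYBRIDIZED - LOW_QUALITY                      # reads entering alignment
hybridized_pct = percent(HYBRIDIZED, TOTAL_CLONES)            # share of library hybridizing
ir2_pct = percent(WITH_IR2, TOTAL_CLONES)                     # share with confirmed IR-2
other_pct = percent(HYBRIDIZED - WITH_IR2, TOTAL_CLONES)      # hybridizing, IR-2 not confirmed

if __name__ == "__main__":
    print(f"aligned reads: {aligned_reads}")
    print(f"hybridizing clones: {hybridized_pct:.1f}%")
    print(f"clones with IR-2: {ir2_pct:.1f}%")
    print(f"hybridizing clones without confirmed IR-2: {other_pct:.1f}%")
```

These values reproduce the 189 aligned sequences and the 1.6% figure given here, and they match the 0.5% and additional 1.1% of clones cited in the Discussion.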
Expression of the ALP Gene Family
The ALP-A1 gene lost a part of its 5′ UTR sequence
and all upstream regulatory elements during the first duplication that generated ALP-A2. Surprisingly, a search of the NCBI
EST database provided evidence that at least one of the duplicated genes is expressed in wheat because 2 classes of
wheat ESTs having different 5′ UTR sequences were
found. One class of ESTs corresponded to the ancestral
gene ALP-A1 and contained a 5′ UTR sequence similar
to the ALP-A1 gene 5′ UTR. The second class had 5′ UTRs
similar to the sequences upstream of the 5′ end of the ALP-A3–duplicated segment. To verify this inference, RT–PCR
was performed using primers specific to the sequences of
every member of the ALP gene family (Table 1, Supplementary Material online). One of the RT–PCR primers
in every primer pair set was designed to span a junction
of neighboring exons to prevent amplification of contaminating DNA and to allow only mature intronless mRNA
amplification. RNA isolated from salt-stressed and nonstressed tissues of T. urartu was used as a template. The
salt stress and control regimes were investigated because
cDNA libraries from which most of the wheat ALP ESTs
originated were prepared from salt- and cold-stressed plant
mRNAs and, hence, it was possible that ALP gene expression could be stress related. RNA isolated from leaves of
durum wheat plants grown in nutrient solution without salt
(control condition) was also used.
Transcription of the ALP-A1 and ALP-A3 genes was
detected in both T. urartu and durum wheat (fig. 5A).
No difference was observed between plants grown under
salt stress and control conditions (data not shown).
To detect the boundaries of the 5′ UTRs of the expressed genes, 5′-RACE
products generated with primers specific for the ALP-A1 and ALP-A3 genes
were sequenced. The first exon of the ALP-A1 gene consisted of a 185-bp
5′ UTR and 12 bp of the coding DNA sequence (fig. 5B). The first exon of
the ALP-A3 gene was 398 bp longer owing to a change in the location of
the start of transcription. The ALP-A3–coding region was of the same
length as the coding region of ALP-A1. The lengths of the 3′ UTRs were
200 bp in the ALP-A1 gene and 92 bp in the ALP-A3 gene, as
inferred from comparison with the NCBI EST database.
Our data are consistent with the expression of the ALP-A3 gene being
driven by new regulatory elements located within the DNA segment
flanking the 5′ end of the 2,094-bp insertion on chromosome 4A. To
confirm the existence of mRNA molecules initiated from a new promoter
element on chromosome 2A, RT–PCR was performed with a right primer
spanning the first exon–exon junction (R in fig. 5) and a left primer
located within the 5′ UTR region (primers 1–3 in fig. 5). The results of
RT–PCR (fig. 5C) were consistent with the location of the experimentally
detected start of transcription 284 bp downstream of the 5′ end of the
7,021-bp DNA fragment inserted into chromosome 2A, making the 5′ UTR of
the new duplicated gene 398 bp longer. As a negative control, a left
primer (primer 4 in fig. 5B) was designed to the region located upstream
of the experimentally detected start of transcription.
This RT–PCR did not produce any PCR product (fig. 5C).
Comparison of the region surrounding the new transcription
initiation site with the database of repetitive sequences at
GIRI revealed a nucleotide sequence (no. 2 element in
fig. 2) similar to the CACTA class of grass DNA transposons immediately upstream of the start of transcription (fig.
5D). This element exists in all of the duplications.
Selection Operating on the ALP Genes
The intensity of selection in paralogs was estimated
from the ratio of the number of substitutions per nonsynonymous site (dN) and the number of substitutions per synonymous site (dS). Relaxation of purifying selection causes
the dN/dS ratio to approach 1.0. The maximum likelihood
analysis implemented in the HyPhy package was used
to estimate the rates of evolution of the ALP gene family
(Kosakovsky-Pond et al. 2004), using the ALP-B1 gene sequence as an outgroup. A total of 399 nt (133 codons) were
analyzed. Both likelihood ratio test and AIC indicated that
the HKY85 codon substitution model (Hasegawa et al.
1985) fit the data best. Using this model, the dN/dS ratio
was estimated for every branch of the tree (fig. 6). The maximum likelihood estimation of all model parameters was
performed independently for each branch.
The relative rate test showed that purifying selection
operating on the genes was relaxed after duplication (table
3). The tree was also partitioned into 2 clades (A and B) at
internodes N1, N2, and N3 (fig. 6). The A clade contained
the most recently duplicated genes, and the B clade contained the rest of the tree in each case. Different dN/dS rate
models were tested in clade A, clade B, and the internode
connecting both clades (Table 2, Supplementary Material
online). Except for the case when N3 was selected as the
separating internode, models with rate difference for the
2 clades fit data better than the model implying the equality
of dN/dS rates. The results of this analysis are consistent
with the results of the relative rate test (table 3). The highest
log likelihood value was obtained when the tree was split at
internode N1 and when the same dN/dS rate model was used
for internode N1 and clade A and a different rate model for
the rest of the tree (Table 2, Supplementary Material online). This outcome provided a strong indication that purifying selection was relaxed after the first duplication. When
the tree was split at internode N2, the log likelihood still
showed a statistically significant difference, indicating an
additional relaxation of purifying selection operating on
genes ALP-A3 and ALP-A4. The dN/dS ratio for the ALP-A4.2 gene was 1.264 (fig. 6), which was not significantly
different from 1.0 (P = 0.36).
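The probabilities reported for the relative rate tests (table 3) can be related to their likelihood ratio statistics through a chi-square distribution. The text does not state the degrees of freedom; the Python sketch below assumes 2, because under that assumption the upper-tail probability has the closed form exp(-LR/2) and reproduces the tabulated values to within about 0.001. The dictionary simply restates the likelihood ratios and probabilities from table 3, and the function name is illustrative.

```python
import math

def chi2_upper_tail_df2(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom (assumed here; the source does not state the df)."""
    return math.exp(-statistic / 2.0)

# (likelihood ratio, reported probability) per gene pair, from table 3.
TABLE3 = {
    ("ALP-A3", "ALP-A2"): (7.070, 0.029),
    ("ALP-A3", "ALP-A1"): (7.033, 0.029),
    ("ALP-A3", "ALP-A4.1"): (3.657, 0.161),
    ("ALP-A3", "ALP-A4.2"): (2.758, 0.252),
    ("ALP-A2", "ALP-A1"): (6.706, 0.035),
    ("ALP-A2", "ALP-A4.1"): (0.996, 0.608),
    ("ALP-A2", "ALP-A4.2"): (3.823, 0.148),
    ("ALP-A1", "ALP-A4.1"): (6.157, 0.046),
    ("ALP-A1", "ALP-A4.2"): (9.762, 0.008),
    ("ALP-A4.1", "ALP-A4.2"): (1.781, 0.411),
}

# Largest absolute gap between the recomputed and the reported probability.
max_gap = max(abs(chi2_upper_tail_df2(lr) - p) for lr, p in TABLE3.values())

if __name__ == "__main__":
    for pair, (lr, p_reported) in TABLE3.items():
        p = chi2_upper_tail_df2(lr)
        flag = "*" if p < 0.05 else ""
        print(f"{pair}: LR = {lr:.3f}, P = {p:.3f}{flag} (reported {p_reported:.3f})")
```

With this assumption, the five pairs marked as significant in table 3 are the ones whose recomputed probability falls below 0.05, and each of them involves ALP-A1 or the ALP-A3 versus ALP-A2 comparison.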
Discussion
Rates and Mechanisms of Gene Duplication
Of 21 investigated Triticeae species, duplicated ALP loci were detected
only in diploid wheats and one diploid Aegilops; only a single gene was
detected in the rest of the species. This observation and the fact that
there is also only a single gene in rice, located on a chromosome
homoeologous with wheat chromosomes 1A, 1B, and 1D, indicate that
a single ALP gene is the ancestral state in Triticeae and
likely across the entire grass family.
FIG. 5. Analysis of ALP gene family expression. (A) RT–PCR with gene-specific primers. (B) Structure of the ancestral gene ALP-A1 and duplicated
gene ALP-A3. The length of the first exon is indicated. The start of transcription is indicated by an arrow and labeled +1. The region of the transcribed
DNA located between the new start of transcription and duplicated gene is shown as a crosshatched box. The black and open boxes correspond to exons
and UTRs, respectively. Primers are indicated as arrows and numbered 1–4. The reverse primer is indicated by an arrow labeled R. (C) RT–PCR with the
primers located within and outside of the 5′ UTR. The numbering corresponds to RT–PCR primers shown in part B of the figure. M is the size standard.
(D) Comparison of sequences upstream of the duplicated ALP genes with the sequence of a CACTA-like transposon in the TREP database (bottom).
Wheat in the figure stands for durum wheat.
Radiation of Triticeae spans 10 Myr (Huang et al.
2002; Ramakrishna et al. 2002; Dvorak et al. 2006). The
slow duplication rate of the ancestral locus seems therefore
consistent with the slow rate with which interchromosomally
duplicated loci have been evolving in diploid species of
Triticeae, $2.9 \times 10^{-3}$ gene$^{-1}$ Myr$^{-1}$ (Dvorak and Akhunov
2005). However, the interchromosomal duplication rate
subsequent to the origin of ALP-A2 was greatly accelerated.
The ALP-A2, -A3, -A4.1, and -A4.2 loci are 1.9, 0.9, 0.6,
and 0.4 Myr old, respectively. Their average age is 0.95
Myr within which 2 interchromosomally duplicated genes
evolved. Hence, the duplication rate after the first duplication increased to $5.2 \times 10^{-2}$ gene$^{-1}$ Myr$^{-1}$. This acceleration of the duplication rate was caused by a fortuitous
insertion of the ALP-A2 gene into a novel class of transposons
containing IR-2. The IR-2 sequences are part of a larger
sequence capable of forming a perfect cruciform at each end
of a transposon-like element. The IR-2 sequence is an end
sequence of the 1,273-bp repeat present in ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2. A remarkable characteristic of
this element is its propensity to insert itself into simple
or compound SSRs, most of them based on the TA or TG motifs (fig. 4).
The insertion site is almost always flanked by a TA dinucleotide. These
findings are consistent with the inference that IR-2 is the terminus of
a transposon and that the duplications of the ALP-A2, ALP-A3, and
ALP-A4.1 loci were mediated by the transposon. The terminal sequence of
IR-2 includes a CACTA motif, which characterizes a major transposon
class in wheat (Wicker, Guyot, et al. 2003). However, we failed to detect
a target-site duplication upon insertion of the transposon, which is one
of the characteristics of CACTA transposons, and the rest of its sequence
bears no similarity to the wheat or any other CACTA-type
transposons or any other known transposon.
A total of 0.5% of all T. urartu BAC clones contained an IR-2 sequence,
and an additional 1.1% hybridized with a sequence derived from the
1,273-bp repeat but very likely had diverged termini. Although the
characterization of this mobile element family requires additional work,
there is little doubt that it represents an important component of the
intergenic space in the T. urartu genome and contributes to its dynamic
state. The acceleration of the duplication rate of the ALP genes after
the first duplication, caused by recurrent transposition facilitated by
an IR-2–containing transposon, provides direct evidence for the
importance of DNA transposons for new gene evolution via gene duplication.
The very high rate with which T. urartu intergenic DNA accumulates large
indels (Dvorak et al. 2006) may account for the curious observation that
it was always the most recently duplicated gene that produced the next
duplication. Two of the 4 duplicated genes suffered insertions of large
retroelements, and an additional indel occurred in the immediate vicinity
of one of the IR-2 sequences at the ALP-A2 locus. It is possible that
insertions of large retroelements alter the ability of repeated elements
to duplicate. The high rate with which large indels occur in the
T. urartu genome may leave only a short time window for duplication.
Hence, the youngest element may have the greatest chance to be the
source of the next duplication.
The acceleration of the gene duplication process after the first
duplication should be taken into account in the estimation of gene
duplication rates. The overall duplication rate of $2.9 \times 10^{-3}$
gene$^{-1}$ Myr$^{-1}$ (Dvorak and Akhunov 2005) actually consists of 2
very different rates: 1) the primary rate involving duplications of
ancestral genes and 2) the secondary rate of duplications of genes and
gene fragments themselves generated by recent duplications. After
removing all genes that could have been duplicated by the secondary
duplication process from the data reported by Dvorak and Akhunov (2005),
the primary rate of interchromosomal gene duplication is
$2.5 \times 10^{-3}$ gene$^{-1}$ Myr$^{-1}$. The secondary rate may vary
among gene families; for the ALP family the rate is $5.2 \times 10^{-2}$
gene$^{-1}$ Myr$^{-1}$. The secondary duplication rate for the ALP gene
family is thus 20 times greater than the primary duplication rate. The
propagation of duplicated gene fragments by Helitrons (Morgante et al.
2005) and by MULEs (Jiang et al. 2004) are other examples of the
secondary duplication process and undoubtedly also occur at very high
rates.
FIG. 6. dN/dS ratio estimates for the gene tree branches. The question
mark indicates that the dN/dS ratio is not defined for the branch. N1, N2,
and N3 are internodes of the tree used for testing evolution rate models.
Table 3
Pair-wise Relative Evolution Rates of A-Genome Genes
Using ALP-B1 Gene as an Outgroup
Gene Triplet                      Likelihood Ratio   Probability
ALP-B1 (ALP-A3, ALP-A2)           7.070              0.029*
ALP-B1 (ALP-A3, ALP-A1)           7.033              0.029*
ALP-B1 (ALP-A3, ALP-A4.1)         3.657              0.161
ALP-B1 (ALP-A3, ALP-A4.2)         2.758              0.252
ALP-B1 (ALP-A2, ALP-A1)           6.706              0.035*
ALP-B1 (ALP-A2, ALP-A4.1)         0.996              0.608
ALP-B1 (ALP-A2, ALP-A4.2)         3.823              0.148
ALP-B1 (ALP-A1, ALP-A4.1)         6.157              0.046*
ALP-B1 (ALP-A1, ALP-A4.2)         9.762              0.008*
ALP-B1 (ALP-A4.1, ALP-A4.2)       1.781              0.411
* Statistically significant.
Lifespan of a Duplicated Gene
The comparison of sequences of duplicated ALP genes
with the ancestral gene showed that each duplication produced a gene with a complete coding sequence and each
duplicated gene had a complete ORF at the time of duplication. This is unequivocally shown by the full-length
mRNA transcribed from the ALP-A3 gene. Since their origin, 3 of the 4 duplicated genes in T. urartu and 2 of the 4 in
durum wheat either have acquired stop codons, which truncated their products, or large retroelements have been inserted into their promoters effectively precluding their
expression. Using an average age of the duplicated loci of 0.95 Myr and
the fact that in the T. urartu and durum wheat lineages 6/8 of
duplicated genes were not expressed, the rate of nonfunctionalization of
duplicated genes was estimated as $k = 0.79$ gene$^{-1}$ Myr$^{-1}$.
Substituting this constant into the exponential equation
$0.5 = e^{-kt}$ and solving for $t$ gives a half-life of a duplicated
gene of 0.9 Myr. The empirical rate of nonfunctionalization of duplicated
genes is therefore about 4-fold higher than that implied by the
half-life of 3.2 Myr computed from the genomic sequence of Arabidopsis
(Lynch and Conery 2000). The use of stop codons as indicators of
nonfunctionalization could underestimate the actual nonfunctionalization
rate. For example, the ALP-A4.1 and ALP-A4.2 genes have no stop codons
but are not expressed. The absence of expression of the ALP-A4.1 gene
could be explained by the insertion of retroelement Claudia in the 5′
UTR of the gene. The factors resulting in the absence of
ALP-A4.2 gene expression are unknown.
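The half-life estimate above follows from a few lines of arithmetic, sketched here in Python. The locus ages (1.9, 0.9, 0.6, and 0.4 Myr) and the 6/8 fraction of nonexpressed genes are taken from the text; treating the rate constant as that fraction divided by the mean age is an inference that reproduces the reported 0.79 gene$^{-1}$ Myr$^{-1}$, and the variable names are illustrative.

```python
import math

# Ages (Myr) of ALP-A2, ALP-A3, ALP-A4.1 and ALP-A4.2 as given in the text.
LOCUS_AGES_MYR = [1.9, 0.9, 0.6, 0.4]
SILENCED = 6            # nonexpressed duplicated genes in T. urartu + durum wheat
TOTAL_DUPLICATES = 8    # duplicated genes examined in the two lineages
ARABIDOPSIS_HALF_LIFE_MYR = 3.2  # Lynch and Conery (2000), as cited in the text

mean_age = sum(LOCUS_AGES_MYR) / len(LOCUS_AGES_MYR)

# Assumed estimator: fraction silenced per unit time (reproduces k = 0.79).
k = (SILENCED / TOTAL_DUPLICATES) / mean_age

# Solve 0.5 = exp(-k * t) for t.
half_life = math.log(2) / k

# How much shorter the ALP half-life is than the Arabidopsis estimate.
fold_difference = ARABIDOPSIS_HALF_LIFE_MYR / half_life

if __name__ == "__main__":
    print(f"mean age: {mean_age:.2f} Myr")
    print(f"k: {k:.2f} per gene per Myr")
    print(f"half-life: {half_life:.2f} Myr")
    print(f"fold difference vs. Arabidopsis: {fold_difference:.1f}")
```

The computed ratio of the two half-lives is somewhat below 4, so the 4-fold figure quoted in the text is best read as a rounded value.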
Expression of Duplicated Genes
The analysis of the wheat EST database and transcription analysis of the ALP genes showed that the ancestral
gene and the ALP-A3–duplicated gene are abundantly transcribed. Transcription of the duplicated gene is driven by
regulatory elements located within the sequence having
similarity to CACTA type of transposons. A similar case
has been described in Japanese morning glory in which
transcription of a captured gene was initiated within the sequence of the CACTA-type transposon Tpn1 (Kawasaki
and Nitasaka 2004). Transcription from the promoters of
the transposable elements was hypothesized for gene fragments duplicated by MULEs; however, the analysis of these
transcripts showed that all of them were pseudogenes
(Juretic et al. 2006).
The transcription of the ALP-A3 gene was as abundant
as that of the ancestral ALP-A1 gene, but the transcript had
a longer 5′ UTR. Both genes were constitutively expressed
under the limited number of developmental and environmental conditions tested. The fact that the duplicated gene
ALP-A3 has a new promoter qualifies it as a new gene.
Dispersed Duplicated Genes and Selection
The dN/dS ratio of 0.028 along the ALP-A1 gene
branch, compared with the dN/dS ratio of 0.1 along the
ALP-B1 gene branch, shows no relaxation of purifying selection acting on ALP-A1 after the origin of duplicated genes.
The dN/dS ratio of all duplicated genes was significantly
higher than that of the ALP-A1 gene, suggesting a relaxation
of purifying selection acting on them. Because all genes
were identical after their duplication and because ALP-A3 is expressed, it is very likely that most of the duplicated
genes were also expressed after their duplication and may
have temporarily been under purifying selection. It is therefore interesting to note that the expressed duplicated gene
ALP-A3 had one of the highest dN/dS ratios (0.73) and that
it has accumulated a total of 15 amino acid differences compared to the ancestral gene.
Duplicated genes generated by polyploidy reside in
their original environment after the whole-genome duplication. The subsequent evolution therefore follows one of the
paths described in Introduction: nonfunctionalization or
neofunctionalization of one of the genes or subfunctionalization of both. Duplicated genes produced by interspersed
duplications, exemplified by the ALP family, are located in
a new genomic environment that is different from that of the
ancestral genes. Such genes will in most cases be unequal
partners, the ancestral gene maintaining the original function and remaining under strong purifying selection, as
shown here for the ALP-A1 gene, and the duplicated genes
most often becoming pseudogenes, like ALP-A2, ALP-A4.1, and ALP-A4.2, or, rarely, resulting in the evolution of
new genes, exemplified by the ALP-A3 gene.
Repeated Sequences and New Gene Evolution by
Interspersed Gene Duplication
The evolution of the ALP gene family illustrates the
importance of repeated DNA making up the intergenic space
in the Triticeae genomes for new gene evolution. Repeated
DNA in Triticeae genomes facilitates gene duplication and
may also be an inexhaustible source of ready-made promoters to drive the expression of duplicated genes. Repeated
DNA thus facilitates both prerequisites for the evolution of
new genes via gene duplication. Viewed from this perspective,
it is hard to believe that this genomic
component of the large plant genomes is selectively neutral.
Supplementary Material
Supplementary Figure 1 and Tables 1 and 2 are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).
Acknowledgments
We would like to thank Bhupinder Saini and Paula
Goines for assistance with the sequencing of the duplicated
ALP genes, Hieu Phan for assistance with the sequencing
of the transposon insertion sites, Karen Deal for editorial
suggestions during the preparation of manuscript, and 4
anonymous reviewers for providing very helpful comments
on the manuscript. This work was supported by National
Science Foundation Plant Genome Research Program
under Contract Agreement No. DBI-9975989.
Literature Cited
Akhunov ED, Akhunova AR, Linkiewicz AM, et al. (31 co-authors). 2003. Synteny perturbations between wheat homoeologous chromosomes by locus duplications and deletions
correlate with recombination rates along chromosome arms.
Proc Natl Acad Sci USA. 100:10836–10841.
Akhunov ED, Akhunova AR, Dvorak J. 2005. BAC libraries of
Triticum urartu, Aegilops speltoides and Ae. tauschii, the diploid ancestors of polyploid wheat. Theor Appl Genet.
111:1617–1622.
Akhunov ED, Goodyear JA, Geng S, et al. (33 co-authors). 2003.
The organization and rate of evolution of the wheat genomes
are correlated with recombination rates along chromosome
arms. Genome Res. 13:753–763.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990.
Basic local alignment search tool. J Mol Biol. 215:403–410.
Arumuganathan K, Earle ED. 1991. Nuclear DNA content of
some important plant species. Plant Mol Biol Rep. 9:208–218.
Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model
plant species inferred from age distributions of duplicate genes.
Plant Cell. 16:1667–1678.
Brunner S, Pea G, Rafalski A. 2005. Origins, genetic organization
and transcription of a family of non-autonomous helitron elements in maize. Plant J. 43:799–810.
Cenci A, Chantret N, Kong X, Gu Y, Anderson OD, Fahima T,
Distelfeld A, Dubcovsky J. 2003. Construction and characterization of a half million clone BAC library of durum wheat
(Triticum turgidum ssp. durum). Theor Appl Genet.
107:931–939.
Dubcovsky J, Luo MC, Zhong GY, Bransteitter R, Desai A, Kilian
A, Kleinhofs A, Dvorak J. 1996. Genetic map of diploid wheat,
Triticum monococcum L., and its comparison with maps of
Hordeum vulgare L. Genetics. 143:983–999.
Dvorak J, Akhunov ED. 2005. Tempos of deletions and duplications of gene loci in relation to recombination rate during diploid and polyploid evolution in the Aegilops-Triticum alliance.
Genetics. 171:323–332.
Dvorak J, Akhunov ED, Akhunova AR, Deal KR, Luo MC. 2006.
Molecular characterization of a diagnostic DNA marker for domesticated tetraploid wheat provides evidence for gene flow
from wild tetraploid wheat to hexaploid wheat. Mol Biol Evol.
23:1386–1396.
Dvorak J, di Terlizzi P, Zhang HB, Resta P. 1993. The evolution of
polyploid wheats: identification of the A genome donor species. Genome. 36:21–31.
Dvorak J, Luo M-C, Yang Z-L, Zhang H-B. 1998. The structure of
Aegilops tauschii genepool and the evolution of hexaploid
wheat. Theor Appl Genet. 97:657–670.
Dvorak J, McGuire PE, Cassidy B. 1988. Apparent sources of the
A genomes of wheats inferred from the polymorphism in abundance and restriction fragment length of repeated nucleotide
sequences. Genome. 30:680–689.
Dvorak J, Zhang HB. 1990. Variation in repeated nucleotide
sequences sheds light on the phylogeny of the wheat B and
G genomes. Proc Natl Acad Sci USA. 87:9640–9644.
Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool for
sequence finishing. Genome Res. 8:195–202.
Hasegawa M, Kishino H, Yano T. 1985. Dating the human-ape
splitting by a molecular clock of mitochondrial DNA. J Mol
Evol. 22:160–174.
Huang S, Sirikhachornkit A, Su X, Faris J, Gill BS, Haselkorn R,
Gornicki P. 2002. Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops
complex and the evolutionary history of polyploid wheat. Proc
Natl Acad Sci USA. 99:8133–8138.
Hughes AL. 2002. Adaptive evolution after gene duplication.
Trends Genet. 18:433–434.
Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR. 2004. Pack-MULE transposable elements mediate gene evolution in
plants. Nature. 431:569–573.
Jones CD, Custer AW, Begun DJ. 2005. Origin and evolution of
a chimeric fusion gene in Drosophila subobscura, D. madeirensis and D. guanche. Genetics. 170:207–219.
Juretic N, Hoen DR, Huynh ML, Harrison PM, Bureau TE. 2006.
The evolutionary fate of MULE-mediated duplications of host
gene fragments in rice. Genome Res. 15:1292–1297.
Kapitonov VV, Jurka J. 2001. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 98:8714–8719.
Katju V, Lynch M. 2003. The structure and early evolution of
recently arisen gene duplicates in the Caenorhabditis elegans
genome. Genetics. 165:1793–1803.
Kawasaki S, Nitasaka E. 2004. Characterization of Tpn1 family in
the Japanese morning glory: En/Spm-related transposable elements capturing host genes. Plant Cell Physiol. 45:933–944.
Kihara H. 1944. [Discovery of the DD-analyser, one of the ancestors of Triticum vulgare]. Agric Horticulture (Tokyo). 19:13–
14. Japanese.
Kosakovsky-Pond SL, Frost SD, Muse SV. 2004. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 21:676–679.
Luo MC, Yang ZL, Dvorak J. 1998. Position effects of ribosomal
RNA multigene loci on meiotic recombination in wheat. Genetics. 149:1105–1113.
Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science. 290:1151–1154.
Lynch M, Force A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics. 154:459–473.
McFadden ES, Sears ER. 1946. The origin of Triticum spelta
and its free-threshing hexaploid relatives. J Hered. 37:81–89,
107–116.
Moore RC, Purugganan MD. 2003. The early stages of duplicate
gene evolution. Proc Natl Acad Sci USA. 100:15682–15687.
Moore RC, Purugganan MD. 2005. The evolutionary dynamics of
plant duplicate genes. Curr Opin Plant Biol. 8:122–128.
Morgante M. 2006. Plant genome organization and diversity: the
year of the junk! Curr Opin Biotech. 17:168–173.
Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A.
2005. Gene duplication and exon shuffling by helitron-like
transposons generate intraspecies diversity in maize. Nat
Genet. 37:997–1002.
Ohno S. 1970. Evolution by gene duplication. Berlin (Germany):
Springer.
Otto SP, Whitton J. 2000. Polyploid incidence and evolution.
Annu Rev Genet. 34:401–437.
Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA.
101:9903–9908.
Peng JH, Zadeh H, Lazo GR, et al. (25 co-authors). 2004. Chromosome bin map of expressed sequence tags in homoeologous
group 1 of hexaploid wheat and homoeology with rice and Arabidopsis. Genetics. 168:609–623.
Posada D, Crandall KA. 1998. Modeltest: testing the model of
DNA substitution. Bioinformatics. 14:817–818.
Ramakrishna W, Dubcovsky J, Park YJ, Busso C, Emberton J,
SanMiguel P, Bennetzen JL. 2002. Different types and rates of
genome evolution detected by comparative sequence analysis
of orthologous segments from four cereal genomes. Genetics.
162:1389–1400.
Sanderson MJ. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach.
Mol Biol Evol. 19:101–109.
Sawyer SA. 1989. Statistical test for determining gene conversion.
Mol Biol Evol. 6:526–538.
Sorrells ME, La Rota CM, Bermudez-Kandianis CE, et al. (35 co-authors). 2003. Comparative DNA sequence analysis of wheat
and rice genomes. Genome Res. 13:1818–1827.
Stein N, Feuillet C, Wicker T, Schlagenhauf E, Keller B. 2000.
Subgenome chromosome walking in wheat: a 450-kb physical
contig in Triticum monococcum L. spans the Lr10 resistance
locus in hexaploid wheat (Triticum aestivum L.). Proc Natl
Acad Sci USA. 97:13436–13441.
Swofford DL. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sunderland (MA): Sinauer Associates.
Walsh JB. 1995. How often do duplicated genes evolve new functions? Genetics. 139:421–428.
White SE, Habera LF, Wessler SR. 1994. Retrotransposons in the
flanking regions of normal plant genes—a role for copia-like
elements in the evolution of gene structure and expression.
Proc Natl Acad Sci USA. 91:11792–11796.
Wicker T, Guyot R, Yahiaoui N, Keller B. 2003. CACTA transposons in Triticeae. A diverse family of high-copy repetitive
elements. Plant Physiol. 132:52–63.
Wicker T, Yahiaoui N, Guyot R, Schlagenhauf E, Liu ZD,
Dubcovsky J, Keller B. 2003. Rapid genome divergence at
orthologous low molecular weight glutenin loci of the A
and Am genomes of wheat. Plant Cell. 15:1186–1197.
Zhang L, Lu HHS, Chung W-Y, Yang J, Li W-H. 2004. Patterns of
segmental duplications in the human genome. Mol Biol Evol.
22:135–141.
William Martin, Associate Editor
Accepted November 15, 2006