Evolution of the Australian Lungfish (Neoceratodus forsteri) Genome: A Major Role for CR1 and L2 LINE Elements

Cushla J. Metcalfe,‡,1 Jonathan Filée,1 Isabelle Germon,1 Jean Joss,2 and Didier Casane*,1
1 Laboratoire Evolution, Génomes et Spéciation, Centre National de la Recherche Scientifique, Gif-sur-Yvette, and Université Paris Diderot, Paris, France
2 Department of Biological Sciences, Macquarie University, New South Wales, Australia
‡Present address: Universidade de São Paulo, Instituto de Biociências, Cidade Universitária, São Paulo, Brazil
*Corresponding author: E-mail: [email protected].
Associate editor: Todd Oakley

Haploid genomes greater than 25,000 Mb are rare; within the animals, only the lungfish and some of the salamanders and crustaceans are known to have genomes this large. There are very few data on the structure of genomes this size. It is known, however, that for animal genomes up to 3,000 Mb, there is in general a good correlation between genome size and the percentage of the genome composed of repetitive sequence and that this repetitive component is highly dynamic. In this study, we sampled the Australian lungfish genome using three mini-genomic libraries and found that with very little sequence, the results converged on an estimate of 40% of the genome being composed of recognizable transposable elements (TEs), chiefly from the CR1 and L2 long interspersed nuclear element clades. We further characterized the CR1 and L2 elements in the lungfish genome and show that although most CR1 elements probably represent recent amplifications, the L2 elements are more diverse and are more likely the result of a series of amplifications. We suggest that our sampling method has probably underestimated the recognizable TE content. However, on the basis of the most likely sources of error, we suggest that this very large genome is not largely composed of recently amplified, undetected TEs but may instead include a large component of older degenerate TEs.
Based on these estimates, and on Thomson’s (Thomson K. 1972. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J Exp Zool. 180:363–372) inference that in the lineage leading to the extant Australian lungfish, there was a massive increase in genome size between 350 and 200 mya, after which the size of the genome changed little, we speculate that the very large Australian lungfish genome may be the result of a massive amplification of TEs followed by a long period with a very low rate of sequence removal and some ongoing TE activity.

Key words: lungfish, genome size, transposable elements, CR1-like element.

Introduction
The lungfishes are likely the closest extant relatives to the tetrapods (Brinkmann et al. 2004). They are, therefore, important to our understanding of the evolutionary transition from water to air. The lungfishes are of additional interest because all extant lungfish have very large genomes: the smallest of them is that of Neoceratodus forsteri (the Australian lungfish), with a haploid genome of 50,000 Mb, and the largest is the enormous 130,000 Mb genome of Protopterus aethiopicus (the Marbled lungfish). Within the metazoans examined, only some of the salamanders and amphipods have comparably sized genomes (>25,000 Mb) (Gregory 2010).

Genome size is linked with features at the organismal level, such as cell size and cell division, metabolic rate, and developmental rate (Gregory 2005). Of particular interest is the link between developmental complexity and genome size, best described in the salamanders, where normal metamorphosis is associated with the smallest genomes, whereas obligate neotenic development is associated with the largest genomes (Gregory 2005). It has been hypothesized that extant lungfish are actually obligate neotenics (Joss 2006), a hypothesis based on deficiencies in the thyroid axis but entirely consistent with the very large genome sizes of extant lungfish.
The lack of correlation between genome size and the number of genes it contains or the complexity of the organism in which it is found is known as the "C-value paradox" (Thomas 1971). This "paradox" can be answered at the most basic level by the observation that, above a genome size of 100 Mb, the larger the genome, the greater the contribution of transposable elements (TEs) relative to other sources of length variation (Kidwell 2002; Lynch and Conery 2003). For example, TEs make up 45% of the human genome (3,000 Mb) but only 2.7% of the pufferfish genome (330 Mb) (Hua-Van et al. 2011). In larger genomes, therefore, cellular genes are a tiny fraction of the genome, whereas TEs form the bulk. The variation in genome size is not a one-way street of increasing genome size by TE amplification but more likely an interplay between the rate of removal and addition of DNA. For example, in insects, the small Drosophila melanogaster genome (175 Mb) removes DNA 40 times faster than the 11 times larger genomes of the Laupala crickets (Petrov et al. 2000), whereas in plants, the size and frequency of deletions is greater in Arabidopsis (125 Mb) than in tobacco (5,100 Mb). TEs can impact the genome in many diverse ways, not only by affecting genome size but also by, for example, providing a template for recombination, by inserting within

© The Author 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 29(11):3529–3539 doi:10.1093/molbev/mss159 Advance Access publication June 24, 2012
Downloaded from http://mbe.oxfordjournals.org/ at INIST-CNRS on November 6, 2012

tested whether such a small amount of sequence could give a broad estimate of the TE composition of a genome by an in silico simulation of random sequencing in the well-characterized human genome.
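The in silico random sequencing mentioned here can be illustrated with a short sketch. The authors report using a Perl script, which is not shown; the Python below is not that script but a minimal stand-in that draws 150 fragments of 900 bp uniformly over all valid start positions. The function name, the toy chromosome names and lengths, and the seed are hypothetical and chosen only for illustration.

```python
import random

def sample_fragments(chrom_lengths, n=150, length=900, seed=None):
    """Draw n random (chrom, start, end) windows of a fixed length.

    chrom_lengths maps a sequence name to its length in bp. Each sequence
    is weighted by its number of valid start positions, so every possible
    window in the genome is equally likely. Coordinates are 0-based and
    half-open. This is an illustrative sketch, not the authors' script.
    """
    rng = random.Random(seed)
    # Sequences shorter than the window cannot be sampled.
    names = [c for c, size in chrom_lengths.items() if size >= length]
    weights = [chrom_lengths[c] - length + 1 for c in names]
    picks = []
    for _ in range(n):
        chrom = rng.choices(names, weights=weights, k=1)[0]
        start = rng.randrange(chrom_lengths[chrom] - length + 1)
        picks.append((chrom, start, start + length))
    return picks

# Hypothetical toy "genome"; real lengths would come from the assembly.
toy_lengths = {"chrA": 5_000, "chrB": 20_000, "chrTiny": 500}
fragments = sample_fragments(toy_lengths, seed=1)
total_bp = sum(end - start for _, start, end in fragments)
```

With the default settings, the sampled total is 150 × 900 = 135,000 bp, the same order of magnitude as the 131.6 kb sequenced from the lungfish libraries.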
We compare our results for the Australian lungfish with those from the salamanders, which have comparable genome sizes (Sun et al. 2011).

Materials and Methods
Unless otherwise noted, all kits were used according to the manufacturer’s instructions.

Sample Collection and DNA Extraction
Animals were held at Macquarie University (Australia) under license no. 2009/039. Blood was taken using the protocol under license no. 2006/020, and DNA was extracted using the DNeasy Tissue Kit (Qiagen). Tissue samples were taken using the protocol under license no. 2006/020 and DNA extracted using either a standard phenol chloroform extraction method (Sambrook et al. 1989) or the High Pure PCR Template Preparation Kit (Roche).

Construction and Sequencing of Mini-Genomic Libraries
The composition of the genome was estimated by random sequencing of three mini-genomic libraries. For all three libraries, 6 µg of gDNA was digested. The first library was constructed by full digest with PvuII and EcoRV (New England BioLabs) of gDNA extracted from tissue. The digested DNA was migrated on a gel and a smear between 10 and 1 kb excised and purified using the NucleoTraPCR kit (Macherey-Nagel). The other two libraries were constructed using partial restriction digests following the protocol from the Vosshall Laboratory (Rockefeller University) website, http://vosshall.rockefeller.edu/protocols.php. The first partial digest library was created using DNA extracted from blood and using AluI (New England Biolabs). The second partial digest library was constructed using DNA extracted from tissue and MboI (New England Biolabs). For both enzymes, a 1:20 dilution resulting in a smear between 10 and 1 kb was purified using the High Pure PCR Template Preparation Kit (Roche), precipitated using standard methods, and resuspended in 20 µl of 10 mM Tris. The 5′ overhang of the MboI digest was blunted using T4 DNA polymerase (Expand Cloning kit from Roche).
Adenine overhangs were created for all three digests using Go Taq Flexi DNA Polymerase (Promega) according to the Promega Subcloning Notebook protocol, purified using the High Pure PCR Template Preparation Kit (Roche) and precipitated using standard methods (Sambrook et al. 1989); 200–400 ng was used in a ligation reaction with pGEM-T Easy Vector (Promega). The insert length of a random sample was estimated by PCR, and inserts greater than 1 kb were sequenced (National Center for Biotechnology Information [NCBI] accession numbers JJ725300–JJ725335 for sequences from the AluI library, JJ725336–JJ725427 for sequences from the PvuII/EcoRV library, and HR872187–HR872226 for the sequences from the MboI library). A total

or near coding regions, thereby disrupting gene expression or modifying the transcription of neighboring genes, or by providing sequence that is co-opted by the genome for essential functions (reviewed in Venner et al. 2009). An understanding of the dynamics between the host genome and TEs is, therefore, crucial to our understanding of genome structure and function.

The timing of the expansion of the Australian lungfish genome may shed light on the dynamics of this genome: was it a single rapid expansion, was it recent or very ancient, or was there a series of expansions and contractions? Based on the correlation between genome size and cell size in extant organisms (Gregory 2001), it is possible to estimate the genome size of extinct organisms using fossil cell size as a proxy (Morris and Harper 1988; Organ et al. 2007). Thomson (1972) used the comprehensive lungfish fossil record to estimate the evolutionary history of lungfish genome sizes and inferred that genome size remained small and constant until 350 mya, rapidly increased between 350 and 200 mya, and then changed little in the lineage leading to the extant Australian lungfish.
On the basis of these results of Thomson (1972) and their estimation that 0.05% of the Australian lungfish genome is recognizable TEs, Sirijovski et al. (2005) proposed that this genome is "a cemetery of TEs" resulting from an ancient burst of transposition followed by a long period of very low activity, resulting in massive amounts of unique sequences. On the assumption that any major bands from a restriction enzyme digest would be a major component of the genome, Sirijovski et al. (2005) sequenced the only two bands resulting from multiple restriction digests of Australian lungfish genomic DNA (gDNA). The chief component of both bands was fragments of a TE, a CR1 element, the copy number of which they estimated using quantitative polymerase chain reaction (qPCR). If the lungfish genome is not chiefly composed of TEs, could the large genome be the result of a polyploidy event? The lungfish karyotype (Rock et al. 1996) and a phylogeny of several Hox genes (Longhurst and Joss 1999) suggest that the genome has not undergone a recent polyploidy event. In addition, polyploidy does not necessarily result in large genomes, due to the phenomenon of genome downsizing (Ozkan et al. 2003; Leitch and Bennett 2004). The data of Sirijovski et al. (2005), however, are indirect evidence of how the Australian lungfish genome is organized. In addition, a qPCR, because it uses specific primers, may only identify a small proportion of the number of elements present. We, therefore, decided to revisit the question of the composition of the Australian lungfish genome. As a first estimate to determine what this genome is chiefly composed of, we randomly sequenced three mini-genomic libraries. With a very small amount of sequence, the results from all three libraries converged, i.e., a large percentage, between 36.7 and 42.3%, of the sequence data was recognizable TEs, between 49.8 and 69.1% of this from the closely related long interspersed nuclear element (LINE) CR1 and L2 clades. 
We further characterized these elements to obtain an almost full-length representative copy of each and confirm that the random sequences are CR1 and L2 LINE elements. We

of 131.6 kb was sequenced. Sequences were examined using Chromatogram Explorer (Heracle Software) and trimmed to remove low-quality ends.

MEGA5 (Tamura et al. 2011) for two regions: 1) the contig created by BioEdit and the genome-walking sequences at each step and 2) the sequences in the overlapping regions between each step.

In Silico Random Sequencing of Human Genome
The sequence of the complete human genome (version GRCh37) was downloaded from the UCSC Genome Browser website (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/). A Perl script was written to extract 150 sequences of 900 bp at random. We also checked that the AluI, EcoRV, MboI, and PvuII restriction sites are frequent and quite evenly distributed throughout the human genome (data not shown).

Sequence Analysis

PCR Genome Walking
Genome walking was performed using the GenomeWalker Universal Kit (Clontech). Specific primers (supplementary table S1, Supplementary Material online) were designed using Primer3Plus (Untergasser et al. 2007); 10 µl PCR reactions were performed using Advantage Genomic LA Polymerase Mix (Clontech) according to Clontech’s instructions. PCR reactions were visualized using gel electrophoresis, and single bands were excised and purified using the NucleoSpin Extract II kit (Macherey-Nagel). Purified DNA was then cloned into pGEM-T Easy Vector (Promega). Eight clones were screened by PCR, and two to four clones, except for one step for the L2 sequence, were sequenced at each step (NCBI accession numbers JF501664–JF501688 and JN935285–JN935284). For each set of genome-walked sequences and for the genome-walked fragments published by Sirijovski et al.
(2005), a single contig was created using the CAP contig assembly program in BioEdit (Hall 1999) with a minimum base overlap of 20 and a minimum percent match of 85. These were named including the appropriate prefix from Wicker et al. (2007) RIJ_NfCR1_PE112GW, RIJ_NfCR1_PE114GW, and RIJ_NfL2_PE19GW, and for the genome-walked fragments published by Sirijovski et al. (2005), RIJ_NfCR1_GW. To check that the contig was representative of closely related CR1-like elements, the mean p distance was estimated using Degenerate primers were designed against amino acid motifs to amplify the endonuclease and reverse transcriptase domains for 20 CR1 (RIJ_NfCR1_F1 and F5 clones) and 9 L2 sequences (RIJ_NfL2_49A and L2_59A clones). Primer sequences are shown in supplementary table S1, Supplementary Material online. The fragments were amplified using DNA extracted from blood using the Advantage Genomic LA Polymerase Mix (Clontech). Ten RIJ_NfCR1_F1 and 10 RIJ_NfCR1_F5 and nine RIJ-NfL2_49A/59A clones were fully sequenced using internal primers and aligned with BioEdit (Hall 1999). NCBI accession numbers are JF501689– JF501708 and JN935276–JN935284. Phylogenetic Analysis of CR1 and L2 Sequences Ninety-four percent of the top hits of the mini-genomic libraries CR1 and L2 sequences were to sequences from other chordates, therefore the phylogenetic analysis was done with just chordate sequences. All CR1 and L2 nucleotide sequences plus those from the other clades within the Jockey group (Kapitonov et al. 2009) were retrieved from Repbase (Jurka et al. 2005). A single lungfish PCR-amplified CR1 and L2 sequence was also used as a separate tblastx query on the NCBI website. Incomplete sequences, that is, those that did not include sequence from domain II in the endonuclease domain to domain 8 in the reverse transcriptase domain were removed. 
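The mean p distance used above to check that each contig is representative is simply the proportion of aligned sites at which two sequences differ, averaged over all pairs. The authors computed it in MEGA5; the sketch below is not MEGA5's implementation. It is a minimal illustration that assumes the sequences are already aligned to equal length and that skips gap and N columns pair by pair (a pairwise-deletion choice made here for simplicity). The function names are hypothetical.

```python
from itertools import combinations

def p_distance(a, b, skip=("-", "N")):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence carries a gap or an N are ignored
    (pairwise deletion, an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in skip or y in skip:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def mean_p_distance(aligned_seqs):
    """Average p distance over all pairs of aligned sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Percent identity, as reported for the genome-walking overlaps, corresponds to 100 × (1 − p) under the same treatment of gaps.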
For the remaining CR1 sequences only, sequences were grouped together according to which organism they came from and aligned using ClustalW within BioEdit (Hall 1999). For sequences with a pairwise percent identity > 95% at the nucleotide level, redundant sequences were removed, so that only one remained. For organisms where there were more than five sequences, a neighbor-joining (NJ) phylogeny was inferred, and three to five sequences representing their diversity were selected (supplementary fig. S1, Supplementary Material online). All sequences retrieved from the NCBI website and Repbase (Jurka et al. 2005) were submitted to RepeatMasker (Smit et al. 2010). Sequences were renamed according to the type of element as identified by RepeatMasker, a two-letter identifier for the species, and then the original sequence name or identifier from NCBI. The original CR1-like element from the Australian lungfish (Sirijovski et al. 2005) was named "NfCR1." The sequence used here is a contig using the genome-walked fragments and was renamed "CR1-NfCR1_GW" to distinguish it from the original element described. The lungfish CR1-like genome-walked contigs (RIJ_NfCR1_PE112GW, RIJ_NfCR1_PE114GW and RIJ_NfCR1_GW, and RIJ_NfL2_PE19GW), PCR-amplified elements and the collection of elements from Repbase and NCBI were conceptually translated, aligned with MUSCLE (Edgar 2004), and manually

The 150 sequences of 900 bp from the human genome were analyzed only using RepeatMasker (Smit et al. 2010) against the human library. Lungfish sequences were submitted to CENSOR on the Repbase website (Kohany et al. 2006) using the forced translate and report simple repeats option. Sequences masked by CENSOR were screened for de novo repeats using RepeatModeler (Smit and Hubley 2010). CENSOR hits were reclassified by submitting them to RepeatMasker and totals calculated using an Excel spreadsheet.
This was done because, although a reclassification of some LINEs in the Repbase database has been published (Kapitonov et al. 2009), the actual database does not reflect it. CENSOR-masked lungfish sequences were submitted to self-Blastn and a Blastx against nonredundant protein sequences (cutoff < e−15) on the NCBI website (Sayers et al. 2011) in an attempt to find further repetitive sequences and "orphan" coding sequences, respectively.

Amplification of Endonuclease and Reverse Transcriptase Domains

adjusted by eye using BioEdit (Hall 1999). All sequence apart from the endonuclease domains I–IX and the reverse transcriptase domains 1–8 was removed and the remaining sequence concatenated (supplementary fig. S2, Supplementary Material online). Using MEGA5 (Tamura et al. 2011), we estimated the optimal model of amino acid substitution. NJ and maximum likelihood (ML) trees were inferred using a Jones-Taylor-Thornton (JTT) substitution matrix (Jones et al. 1992) with a gamma distribution of substitution rate variation across sites (optimal model: JTT + Γ with α = 2.8). In both cases, the robustness of the nodes was estimated by 1,000 bootstrap replicates.

Phylogenetic Analysis of Random-Sequenced Fragments

Results
Estimated Composition of the Australian Lungfish (Neoceratodus forsteri) Genome
Analysis of each of the three mini-genomic libraries resulted in a similar estimated composition of the Australian lungfish genome (table 1 and supplementary table S2, Supplementary Material online). Approximately 40% of the genome was estimated to be composed of repetitive sequences. Of these, the largest component was non–long terminal repeat (LTR) retrotransposons, 77% of the identified repeats. Within the non-LTR retrotransposons, the most highly represented superfamily was Jockey (the CR1, L2, and Rex-Babar clades).
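The percentages quoted in this section follow directly from the lengths listed in table 1. As a worked check, the sketch below recomputes them from those lengths (in bp); the variable and function names are illustrative only, and the rounding to one decimal place is an assumption made to match the precision of the table.

```python
# Lengths in bp, copied from table 1.
TOTAL_SEQUENCE = 131_622
GROUP_TOTALS = {
    "satellite": 1_045,
    "DNA transposon": 1_724,
    "LTR retrotransposon": 9_557,
    "non-LTR retrotransposon": 40_539,
}
CLADES = {"CR1": 20_035, "L2": 9_430, "DIRS": 5_836}

def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

total_repeats = sum(GROUP_TOTALS.values())   # all identified repeats
cr1_l2 = CLADES["CR1"] + CLADES["L2"]        # the two dominant LINE clades

summary = {
    "repeats / sampled sequence": pct(total_repeats, TOTAL_SEQUENCE),
    "non-LTR / repeats": pct(GROUP_TOTALS["non-LTR retrotransposon"], total_repeats),
    "CR1+L2 / sampled sequence": pct(cr1_l2, TOTAL_SEQUENCE),
    "CR1+L2 / repeats": pct(cr1_l2, total_repeats),
    "DIRS / repeats": pct(CLADES["DIRS"], total_repeats),
    "DNA transposons / repeats": pct(GROUP_TOTALS["DNA transposon"], total_repeats),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

The recomputed values (40.2%, 76.7%, 22.4%, 55.7%, 11.0%, and 3.3%) agree with the rounded figures given in the text: about 40%, 77%, 22%, 56%, 11%, and 3%.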
The CR1 and L2 clades alone were estimated to make up 22% of the genome, that is, 56% of the repetitive component (table 1 and supplementary table S2, Supplementary Material online). Within the LTR retrotransposons, the Dictyostelium intermediate repeat sequence (DIRS) superfamily is the most highly represented (11% of repeats). DNA transposons are the smallest TE component (3% of repeats), whereas simple repeats were estimated to be 0.8% of the genome or 2% of repeats identified. No coding sequences, apart from those from TEs, were identified using Blastx (NCBI), leaving 60% of the sequences unidentifiable.

Repeat class                      Length (bp)   Fraction of repeats   Fraction of sampled   Fraction of sampled sequence,
                                                identified (%)        sequence (%)          min–max (%)
Satellite
  Low complexity                          403                   0.8                   0.3   0.0–0.5
  Simple repeat                           642                   1.2                   0.5   0.0–0.7
  Total satellite                       1,045                   2.0                   0.8   0.0–1.2
Transposable element
 DNA transposon
  hAT                                     155                   0.3                   0.1   0.0–0.5
  Academ                                1,274                   2.4                   1.0   0.0–2.8
  Harbinger                               295                   0.6                   0.2   0.0–0.3
  Total DNA transposon                  1,724                   3.3                   1.3   0.5–3.0
 LTR retrotransposon
  ERV                                     245                   0.5                   0.2   0.0–0.3
  DIRS                                  5,836                  11.0                   4.4   4.3–4.8
  Gypsy                                 1,390                   2.6                   1.1   0.0–2.8
  Ngaro                                 2,086                   3.9                   1.6   0.0–3.0
  Total LTR retrotransposon             9,557                  18.1                   7.3   6.3–7.6
 Non-LTR retrotransposon
  HER1                                     57                   0.1                   0.0   0.0–0.1
  CR1                                  20,035                  37.9                  15.2   13.5–18.8
  L2                                    9,430                  17.8                   7.2   6.6–8.9
  Tx1                                   2,963                   5.6                   2.3   1.7–2.5
  RTE-X                                 2,179                   4.1                   1.7   0.0–2.6
  Rex-babar                                89                   0.2                   0.1   0.0–0.1
  L1                                    4,379                   8.3                   3.3   0.4–4.4
  Penelope                                673                   1.3                   0.5   0.0–1.0
  SINE                                    734                   1.4                   0.6   0.0–1.0
  Total non-LTR retrotransposon        40,539                  76.7                  30.8   27.4–34.2
Total TEs                              51,820                  98.0                  39.4   36.7–42.3
Total repeats                          52,865                 100.0                  40.2   36.7–43.0
Total sequence                        131,622

A total of 167 genomic fragments (131,622 bp) were sequenced. Min–Max: the lowest and highest percentage found in the three genomic libraries.

Characterization of CR1 and L2 Elements by PCR Genome Walking
Fourteen CR1 and L2 fragments were genome walked from the largest mini-genomic (PvuII/EcoRV) library, 8 of the 30 CR1 sequences and 6 of the 14 L2 sequences.
We chose to genome walk not only the top-scoring elements but also some lower-scoring elements, in case the lungfish genome had unusual CR1 and L2 elements. After one to two genome walking steps, most of these elements were found to be nested or incomplete. For three sequences, two CR1 sequences and one L2 sequence, genome walking resulted in sequence that covered the open reading frame (ORF) 2 and the 3′-untranslated region (UTR) (fig. 1). The zinc finger/leucine zipper and an esterase domain (Kapitonov and Jurka 2003) in ORF 1 (ORF1) were also retrieved for the CR1 elements, but we were unable to find the 5′-end. The inverted repeat and 8-bp tandem

The random-sequenced fragments of CR1 elements were aligned against the three genome-walked contigs, RIJ_NfCR1_PE112GW, RIJ_NfCR1_PE114GW, and RIJ_NfCR1_GW. The random-sequenced fragments of L2 elements were aligned against the genome-walked contig RIJ_NfL2_PE19GW, L2-Ambystoma [email protected]:43792-48294, and Branchiostoma floridae@Crack-17_BF sequences. With the optimal model defined earlier, for each randomly sequenced fragment, an NJ and an ML tree (100 bootstrap replicates) were inferred with the sequence and the corresponding reference sequences using the "complete deletion" option in MEGA5 (Tamura et al. 2011). Each phylogenetic analysis has thus been performed using only the region in the reference sequences that is homologous to the random sequence. The trees were rooted at the midpoint. We counted how many random sequences were inside, and outside, the clade of the three reference sequences (supplementary fig. S3, Supplementary Material online).

Table 1. Repetitive elements identified in the lungfish genome.
[Figure 1. Panel A: element structure showing the 5′ UTR, ORF1 (zf/lz and esterase domains), ORF2 (endonuclease and reverse transcriptase domains), and 3′ UTR, with a 1 kb scale bar. Panel B: genome-walked contigs with percent identities.]

FIG. 1. Representation of the structure of CR1 and L2 elements found in the Australian lungfish genome and diagrammatic summary of genome walking. (A) Structure of CR1 and L2 elements. Zf/lz = zinc finger/leucine zipper domain. Domains are indicated by gray shading. Sequence obtained by genome walking is outlined in black. We were unable to obtain the 5′-UTR for any elements, shown as a gray box with no outline. The region used for phylogenetic analysis is shown by dashed lines. Arrows indicate the position of PCR primers used to obtain this region. (B) Contigs of elements genome walked by us (i, ii, and iv) and Sirijovski et al. (2005) (iii). Double lines indicate sequence obtained by random sequencing (this article) or restriction digest (Sirijovski et al. 2005). Single lines indicate sequence obtained by genome walking. Percentages above the line indicate mean percent identity at the overlaps. Percentages below the line indicate mean percent identity between the contig created by the CAP contig assembly program in BioEdit (Hall 1999) and the genome-walked fragments.

repeats (Haas et al. 2001) were identified within the 3′-UTR of the CR1 elements by aligning with 3′-UTR regions from the Anolis, painted turtle and zebrafish (data kindly given by Andrew Shedlock) (data not shown). One of the CR1 elements, RIJ_NfCR1_PE114GW, had an additional 0.6 kb stretch of unidentifiable sequence between the zinc finger/leucine zipper and the esterase domain in ORF1.
For the L2 sequence, a (TAAA)5 repeat was identified at the end of the 3′-UTR.

Phylogenetic Analysis of CR1 and L2 PCR Genome Walked and PCR-Amplified Sequences
Phylogenetic relationships inferred using the ML and NJ methods based on the endonuclease and reverse transcriptase domains (supplementary fig. S2, Supplementary Material online) resulted in similar topologies (fig. 2). The CR1 sequences, as defined by RepeatMasker (Smit et al. 2010), formed a monophyletic group, with two main well-supported clades, the first with all the lungfish sequences and the second with most of the sequences from the birds, the turtles, and the lizard. Within the lungfish clade, the genome-walked contigs, RIJ_NfCR1_PE114GW and RIJ_NfCR1_GW (Sirijovski et al. 2005), fell out with one set of PCR-amplified elements ("F5" elements), whereas RIJ_NfCR1_PE112GW fell out with the other set of PCR-amplified elements ("F1" elements). The L2 sequences, on the other hand, did not form a monophyletic group (fig. 2). However, sequences fell into

FIG. 2. (A) Neighbor-joining and (B) maximum likelihood phylogenies of Jockey elements. Both phylogenies are based on a concatenation of the endonuclease domains I–IX and the reverse transcriptase domains 1–8 (720 amino acids) and inferred using the JTT substitution matrix and a gamma distribution of the substitution rate across sites (JTT + Γ with α = 2.8). Robustness of the nodes was estimated by 1,000 bootstrap replications. Bootstrap values less than 70 are not shown. Colored shading indicates clades shared by both inferred phylogenies. The dashed line indicates the boundary between the CR1 and the L2 sequences. All nonlungfish sequence names are prefixed with the lineage name according to RepeatMasker (Smit et al.
2010) and an abbreviated species name. Abbreviations for species are as follows: Ac, Anolis carolinensis; Aj, Anguilla japonica; Am, Ambystoma mexicanum; Bf, Branchiostoma floridae; Bt, Bos taurus; Cf, Canis familiaris; Cm, Callorhinchus milii; Cp, Chrysemys picta; Dr, Danio rerio; Ga, Gallus gallus; Hf, Heterodontus francisci; Hs, Homo sapiens; Lm, Latimeria menadoensis; Md, Mus musculus domesticus; Oa, Ornithorhynchus anatinus; Ol, Oryzias latipes; Pf, Passeriformes; Pm, Petromyzon marinus; Ps, Platemys spixii; Ss, Sus scrofa; Tf, Takifugu rubripes; Xm, Xiphophorus maculatus; Xt, Xenopus tropicalis.

the same six clades in both phylogenies, except for the L2-Bf@Crack-16_BF sequence. The first and second clades contained all sequences from the lungfish and the Mexican axolotl, respectively, the third most of the "Crack" sequences from the lancelet, and the fourth lancelet "Crack" and Danio rerio "CR1" sequences. The last two clades included a mixture of fish and shark sequences; the coelacanth sequence fell into one of these last two clades, whereas sequences from the platypus, eel, and shark fell into the other. The topology of the relationships between the L2 clades was not consistent between the two phylogenies, and most branches were poorly supported.

Phylogenetic Analysis of Random-Sequenced Fragments

In Silico Simulation of Random Sequencing of Human Genome
The 150 sequences of 900 bp were classified as approximately 21% LINEs, chiefly L1, 9.6% short interspersed nuclear elements (SINEs), chiefly Alu, and 3.7% DNA transposons (supplementary table S3, Supplementary Material online).

Discussion
In most eukaryotes, the protein-coding non-TE sequences represent a small fraction of the genome and are its most stable component, whereas the bulk of the genome is composed of repetitive DNA, often TEs, and is considered highly dynamic (Pritham 2009).
In an examination of the lungfish genome, Sirijovski et al. (2005) used restriction digests of whole gDNA to isolate the single largest identifiable component, which was fragments of a CR1 element, and using qPCR determined that it constituted only 0.05% of the genome. Sirijovski et al. (2005) suggest that most of the genome is degenerate TEs, which implies that the enormous TE component of the genome is no longer dynamic but instead has become fossilized.

We reapproached the question of genome composition in the Australian lungfish. We faced a number of challenges. Not only is the Australian lungfish genome very large but extant lungfish are also evolutionarily distant from other extant organisms, last sharing a common ancestor with the tetrapods and coelacanths approximately 400 mya (Blair and Hedges 2005). There is no database of repetitive elements from lungfish or a closely related organism. We constructed three mini-genomic libraries, one a full-digest library and the other two partial-digest libraries. Clones were randomly sequenced and analyzed. We used CENSOR on the Repbase website (Kohany et al. 2006) with the forced translate option because this method is more likely to identify related sequences when no repeat database for the organism in question is available. Results from the three libraries quickly converged, using only a small amount of sequence, on an estimate of 40% repetitive sequences, of which about half were CR1 and closely related L2 sequences (table 1 and supplementary table S2, Supplementary Material online). We were unable to use RepeatModeler (Smit and Hubley 2010) to further identify repeats, because of the small number and length of the sequences. No further highly repetitive sequences were identified using a self-Blastn of masked sequences nor were any cellular genes or "orphan" TE coding sequences identified by a Blastx of masked sequences (Sayers et al. 2011).
Given the size of the genome and our sample size, we would not have expected to identify cellular genes. The discrepancy in the percentage of the genome estimated to be CR1 elements by us and Sirijovski et al. (2005) is because a PCR-based method using specific primers may underestimate CR1 copy number. However, their restriction digest photograph does broadly support our results: on the agarose gel is 6 µg of gDNA; 20% of 6 µg is 1.2 µg, which would be clearly visible, whereas 0.05% is 3 ng, which would not.

We tested whether such little sequence could give a broad estimate of the repetitive content of a genome by doing an in silico simulation of our random sequencing in the well-characterized human genome. To analyze this sequence, we used RepeatMasker (Smit et al. 2010) with the human library because the repetitive component of the human genome is well annotated. The results (supplementary table S3, Supplementary Material online) were similar to published results for the whole genome. For example, in Lander et al. (2001), classes of interspersed repeats are reported as 21% LINEs, 13% SINEs, and 3% DNA transposons, compared with our estimate of 21% LINEs, 10% SINEs, and 4% DNA transposons based on 135 kb of random sequence. We could not simulate our identification methods because there is no database of repetitive elements from lungfish, or from a closely related organism, but the simulation does show that a broad estimate can be obtained using a sampling approach with very little sequence.

Approximately 30% of the Australian lungfish random sequences were LINE (non-LTR) elements (table 1). These were reclassified into lineages using RepeatMasker (Smit et al. 2010); almost all of them were identified as CR1 and the closely related L2 elements. We further characterized these elements by PCR genome walking to obtain representative elements. LINEs are frequently 5′ truncated (Luan et al. 1993; Wicker et al.
2005), and we were unable to get full-length copies for either type of element. In the case of CR1 elements, we obtained two sequences that lacked only the 5′-UTR, and for the L2 elements, we obtained one sequence which

Downloaded from http://mbe.oxfordjournals.org/ at INIST-CNRS on November 6, 2012

The random sequences from the mini-genomic libraries were short (990 bp) and did not all cover the same region, and therefore could not be used in a single phylogenetic analysis. Forty-three random sequences were analyzed. The CR1 elements were analyzed against the three CR1 genome-walked contigs, whereas the L2 elements were analyzed against the L2 genome-walked contig and two sequences representing the most closely related clades to the PCR-amplified L2 elements, [email protected]:43792-48294, and L2-Bf@Crack-17_BF. Seventy-five percent of the CR1 random sequences (21/28) were found within the clade of genome-walked contigs (fig. 3), suggesting that the elements we genome walked are representative of the majority of CR1 elements found in the Australian lungfish genome, whereas only 30% of the L2 elements were more closely related to the L2 genome-walked contig than to the other two elements (fig. 3).

Metcalfe et al. doi:10.1093/molbev/mss159

[Figure 3, panels A and B: phylogenies with tips RIJ_NfCR1_GW, RIJ_NfCR1_PE114GW, RIJ_NfCR1_PE112GW, RIJ_NfL2_PE19GW, L2-Branchiostoma@Crack-17_BF, and [email protected]:43792-48294; numbered groups of random sequences are marked on the branches.]

FIG. 3. Maximum likelihood and neighbor-joining phylogenies of elements from random sequencing of mini-genomic libraries against genome-walked elements. Most of the phylogenies had identical topologies with both methods. RIJ_NfCR1_PE112GW, RIJ_NfCR1_PE114GW, and RIJ_NfL2_PE19GW are from this article, and RIJ_NfCR1_GW is a contig generated from the fragments published by Sirijovski et al. (2005). Dashed lines indicate the position of randomly sequenced fragments.
The number in the circle is the group number for random sequences at that position. The number to the right of the number in the circle is the number of randomly sequenced fragments in that group: those not in brackets are inferred from maximum likelihood phylogenies and those in brackets from neighbor-joining phylogenies.

sequence, but we were unable to classify 60% of the lungfish sequences. However, TE diversity is highly variable from one species to another (Hua-Van et al. 2011), so it is difficult to predict what the profile of the "missing" 60% of the sequences may be. There are several possibilities, some that we can rule out, but many that we cannot. There are two reasons why it is not likely to be a single conserved highly repetitive component. First, from multiple restriction digests, Sirijovski et al. (2005) identified just two bands, both of which contained chiefly CR1 fragments. Second, a self-Blastn of the masked random sequences did not identify any highly repetitive sequences. It is, therefore, more likely that we have underestimated the percentage of the genome composed of TEs and that at least part of the "missing" 60% of the genome is composed of diverse TE components. These could include a previously unidentified type of TE. A Blastx of masked sequences was used to search for "orphan" TE coding regions, such as the conserved reverse transcriptase region (Xiong and Eickbush 1990). None were identified. However, a sequence from a region that is not highly conserved, such as a gag domain or a currently unknown domain, may not have been retrieved. Some repetitive elements, particularly those with no known coding region, are difficult to identify by homology-based methods if a database of closely related sequences is not available. For example, CACTA elements in wheat can be composed entirely of various repeats flanked by short terminal repeats and be over 5 kb (Wicker et al. 2003).
Similarly, SINEs and miniature inverted repeat transposable elements can be difficult to identify. Some regions within TEs, such as large long terminal repeats or spacer regions, also would not be detected, leading to an underestimate of the space that the TE occupies. Finally, TE sequences that are highly divergent, in particular older, degenerate copies, would not have been identified. Using next-generation sequencing, Sun et al. (2011) examined the genomes of six salamanders with haploid genome sizes ranging from 15,000 to 47,000 Mb, the larger genomes being almost as large as the Australian lungfish genome. They identified 25–47% of the sequence as TEs, the most abundant elements being those from the LTR Gypsy superfamily, in contrast to both the lungfish and other examined animal

lacked the ORF1. A phylogenetic analysis of PCR-amplified elements and genome-walked elements confirmed that they are CR1 and L2 elements (fig. 2). Phylogenetic analyses showed that although the CR1 genome-walked elements are representative of the CR1 sequences in the lungfish genome, the L2 sequences are more diverse (fig. 3). Our results suggest, therefore, that the lungfish genome is 22% CR1 and L2 elements. The majority of identified CR1 elements formed a monophyletic group and may be the result of an amplification burst specific to the lungfish lineage. Most of the L2 elements, on the other hand, do not form a monophyletic lineage and are probably the result of a series of bursts (fig. 3).

LINEs (non-LTR elements) are the predominant TE in most of the animal genomes examined (Wicker et al. 2007). The percentage (30%) of lungfish sequence composed of LINE elements, despite the much larger size of the genome, is comparable with that found in other vertebrates, for example, humans (21%) (Lander et al. 2001), monodelphis (29%) (Gentles et al. 2007), and the platypus (21%) (Warren et al. 2008).
LINE clades, however, show varying success within vertebrate lineages. In the monotreme (platypus), L2 elements predominate (Warren et al. 2008), whereas in marsupials (monodelphis) and in eutherians, it is L1 elements that have been more successful (Li et al. 2001; Gibbs et al. 2004; Gentles et al. 2007). In nonavian reptiles and in the chicken, CR1 elements dominate: they make up 81% of repeats in the alligator and 71% in the anole (Shedlock et al. 2007). On the basis of previous data and large-scale sequencing of genomic clones of a turtle, alligator, and lizard, Shedlock et al. (2007) proposed that the repetitive component of the amniote ancestor genome was CR1 dominated. The lungfish and coelacanths are considered to be the extant species most closely related to tetrapods (Zardoya and Meyer 1996). If the unidentified component of the lungfish genome comprises diverse elements, as our data suggest, and the single largest component is CR1 elements, then CR1 elements may also have predominated in the sarcopterygian ancestor genome.

An extension of the correlation between genome size and the percentage of the genome composed of TEs (Kidwell 2002; Lynch and Conery 2003) would predict that this genome would be almost entirely composed of repetitive

[Figure 4, panels A and B: x-axis, haploid genome size (Gb); y-axis, % sequence identified as TEs; legend: plants, animals, salamanders, Australian lungfish.]

FIG. 4. The relationship between the percentage of sequence identified as transposable elements and genome size. Data are from Hua-Van et al. (2011), Sun et al. (2011), de Koning et al. (2011), and this study. (A) All genomes are shown. (B) Genomes < 3.5 Gb are shown (shaded in gray in A). 1, Homo sapiens (Hua-Van et al. 2011); 2, Homo sapiens (de Koning et al. 2011); and 3, Zea mays (Hua-Van et al. 2011).
Other taxa shown are Arabidopsis thaliana, Oryza sativa, Vitis vinifera, Sorghum bicolor, Caenorhabditis elegans, Drosophila ananassae, Drosophila melanogaster, Fugu rubripes, Branchiostoma floridae, Gallus gallus, Mus musculus, Desmognathus ochrophaeus, Eurycea tynerensis, Batrachoseps nigriventris, Aneides flavipunctatus, Bolitoglossa occidentalis, Bolitoglossa rostrata, and Neoceratodus forsteri.

genomes, in which non-LTR retrotransposons are more prevalent (Sun et al. 2011). We have plotted genome size versus percent TEs identified for plant and animal genomes, including our data for the Australian lungfish (fig. 4). There is a well-described correlation between genome size and percent TEs identified in plant and animal genomes (Kidwell 2002), and this correlation seems to hit a "ceiling" of 50% TEs in the animal genomes examined that are larger than 3,500 Mb (fig. 4A). The height of this ceiling is almost certainly an artifact resulting from an underestimation of percent TEs in the salamander and Australian lungfish genomes. However, the types of TEs or TE regions that are most difficult to identify, such as SINEs or noncoding spaces, are unlikely, based on estimates of TE composition in other well-characterized genomes, to comprise the remaining approximately 50% of the genome.

A recent publication (de Koning et al. 2011) examining TE content in the human genome may shed light on the "missing" 50% of the large metazoan genomes examined by Sun et al. (2011) and us. Both the human and the maize genomes are 3,000 Mb, but the maize genome is reported as 85% TEs and the human genome as 45% (Hua-Van et al. 2011). de Koning et al. (2011), using a novel method, estimated that the human genome repetitive fraction is at least 66–69%, chiefly TEs (fig. 4B), which is much closer to the estimate for the maize genome. de Koning et al.
(2011) attribute the differences in estimated TE content in the human genome to the ability of their approach to better detect short sequences and sequences from older and more diverse TE families. This finding supports the suggestion that the underestimation of TE content in the salamander and lungfish genomes is due to the presence of undetected older and diverse TEs.

In conclusion, there are several lines of evidence which can shed light on the evolution of this very large genome: 1) Thomson’s (1972) inference that in the lineage leading to the extant Australian lungfish, there was a massive increase in genome size between 350 and 200 mya, after which the size of the genome changed little, 2) our estimate of 40% of the Australian lungfish genome being recognizable TEs, and 3) identification of previously unidentified older diverse TE families by de Koning et al. (2011) in the human genome. We make two propositions. First, a portion of this very large genome, about 40%, is the result of recent amplifications, largely of CR1 and L2 elements. Second, the high percentage of the genome we were unable to identify is likely to be chiefly older, degenerate TEs resulting from ancient amplification bursts. We, therefore, speculate that the very large lungfish genomes may be the result of a massive amplification of TEs followed by a long period with a very low rate of sequence removal and some ongoing TE activity. Future work on a much larger scale, using full sequences of bacterial artificial chromosomes to build a database of lungfish-specific repeats and to examine the structure of a sample of the genome, combined with next-generation sequencing and improved TE identification algorithms and pipelines, should allow us to develop an improved picture of this extraordinary genome.

Supplementary Material

Supplementary figures S1–S3 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments

The authors thank David Ogereau, Jean-Luc Da Lage, and Gaëlle Claisse for technical help. They also thank Aurelie Hua-Van for helpful comments and two anonymous reviewers for their comments. They also thank the staff at the Fauna Park, Macquarie University, Sydney, Australia for care of the lungfish. A very special thanks to Marie-Louise Cariou for help with funding. This work was supported by Centre National de la Recherche Scientifique under the program "Action Thématique Incitative sur Programme" awarded to Didier Casane from 2006 to 2009.

References

Blair JE, Hedges SB. 2005. Molecular phylogeny and divergence times of deuterostome animals. Mol Biol Evol. 22:2275–2284.
Brinkmann H, Venkatesh B, Brenner S, Meyer A. 2004. Nuclear protein-coding genes support lungfish and not the coelacanth as the closest living relatives of land vertebrates. Proc Natl Acad Sci U S A. 101:4900–4905.
de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. 2011. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 7(12):e1002384.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797.
Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J. 2007. Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res. 17:992–1004.
Gibbs RA, Weinstock GM, Metzker ML, et al. (230 co-authors). 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493–521.
Gregory TR. 2001. Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol Rev Camb Philos Soc. 76:65–101.
Gregory TR. 2005. Genome size evolution in animals. In: Gregory TR, editor. The evolution of the genome. San Diego (CA): Elsevier. p. 3–87.
Gregory TR. 2010. Animal genome size database. Available from: http://www.genomesize.com.
Haas NB, Grabowski JM, North J, Moran JV, Kazazian HH, Burch JB. 2001. Subfamilies of CR1 non-LTR retrotransposons have different 5’UTR sequences but are otherwise conserved. Gene 265:175–183.
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 41:95–98.
Hua-Van A, Le Rouzic A, Boutin TS, Filée J, Capy P. 2011. The struggle for life of the genome’s selfish architects. Biol Direct. 6:19.
Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 8:275–282.
Joss JM. 2006. Lungfish evolution and development. Gen Comp Endocrinol. 148:285–289.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O. 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110:462–467.
Kapitonov VV, Jurka J. 2003. The esterase and PHD domains in CR1-Like Non-LTR retrotransposons. Mol Biol Evol. 20:38–46.
Kapitonov VV, Tempel S, Jurka J. 2009. Simple and fast classification of non-LTR retrotransposons based on phylogeny of their RT domain protein sequences. Gene 448:207–213.
Kidwell MG. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.
Kohany O, Gentles AJ, Hankus L, Jurka J. 2006. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and censor. BMC Bioinformatics. 7:474.
Lander ES, Linton LM, Birren B, et al. (2753 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
Leitch I, Bennett M. 2004. Genome downsizing in polyploid plants. Biol J Linn Soc Lond. 82:651–663.
Li WH, Gu Z, Wang H, Nekrutenko A. 2001. Evolutionary analyses of the human genome. Nature 409:847–849.
Longhurst TJ, Joss JM. 1999. Homeobox genes in the Australian lungfish, Neoceratodus forsteri. J Exp Zool B Mol Dev Evol. 285:140–145.
Luan DD, Korman MH, Jakubczak JL, Eickbush TH. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.
Lynch M, Conery JS. 2003. The origins of genome complexity. Science 302:1401–1404.
Morris SC, Harper E. 1988. Genome size in conodonts (chordata): inferred variations during 270 million years. Science 241:1230–1232.
Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature 446:180–184.
Ozkan H, Tuna M, Arumuganathan K. 2003. Nonadditive changes in genome size during allopolyploidization in the wheat (Aegilops-Triticum) group. J Hered. 94:260–264.
Petrov D, Sangster TA, Johnston JS, Hartl DL, Shaw KL. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060–1062.
Pritham EJ. 2009. Transposable elements and factors influencing their success in eukaryotes. J Hered. 100:648–655.
Rock J, Eldridge M, Champion A, Johnston P, Joss J. 1996. Karyotype and nuclear DNA content of the Australian lungfish, Neoceratodus forsteri (Ceratodidae: Dipnoi). Cytogenet Cell Genet. 73:187–189.
Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular cloning: a laboratory manual. In: Irwin N, Ford N, Nolan C, Ferguson M, editors. Molecular cloning: a laboratory manual, 2nd ed. New York: Cold Spring Harbor Laboratory Press.
Sayers EW, Barrett T, Benson DA, et al. (42 co-authors). 2011. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39:D38–D51.
Shedlock AM, Botka CW, Zhao S, Shetty J, Zhang T, Liu JS, Deschavanne PJ, Edwards SV. 2007. Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome. Proc Natl Acad Sci U S A. 104:2767–2772.
Sirijovski N, Woolnough C, Rock J, Joss JM. 2005. NfCR1, the first non-LTR retrotransposon characterized in the Australian lungfish genome, Neoceratodus forsteri, shows similarities to CR1-like elements. J Exp Zool B Mol Dev Evol. 304:40–49.
Smit A, Hubley R. 2010. RepeatModeler Open-1.0. Available from: http://www.repeatmasker.org.
Smit A, Hubley R, Green P. 2010. RepeatMasker Open-3.0. Available from: http://www.repeatmasker.org.
Sun C, Shepard DB, Chong RA, Lopez-Arriaza J, Hall K, Castoe TA, Feschotte C, Pollock DD, Mueller RL. 2011. LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biol Evol. 4:168–183.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731–2739.
Thomas CA. 1971. The genetic organization of chromosomes. Annu Rev Genet. 5:237–256.
Thomson K. 1972. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J Exp Zool. 180:363–372.
Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JA. 2007. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35:W71–W74.
Venner S, Feschotte C, Biémont C. 2009. Dynamics of transposable elements: towards a community ecology of the genome. Trends Genet. 25:317–323.
Warren WC, Hillier LW, Marshall Graves JA, et al. (102 co-authors). 2008. Genome analysis of the platypus reveals unique signatures of evolution. Nature 453:175–183.
Wicker T, Guyot R, Yahiaoui N, Keller B. 2003. CACTA transposons in Triticeae. A diverse family of high-copy repetitive elements. Plant Physiol. 132:52–63.
Wicker T, Robertson JS, Schulze SR, et al. (11 co-authors). 2005. The repetitive landscape of the chicken genome. Genome Res. 15:126–136.
Wicker T, Sabot F, Hua-Van A, et al. (13 co-authors). 2007. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 8:973–982.
Xiong Y, Eickbush TH. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.
Zardoya R, Meyer A. 1996. Evolutionary relationships of the coelacanth, lungfishes, and tetrapods based on the 28S ribosomal RNA gene. Proc Natl Acad Sci U S A. 93:5449–5454.