Published February 26, 2016

Original Research

Exploiting the Repetitive Fraction of the Wheat Genome for High-Throughput Single-Nucleotide Polymorphism Discovery and Genotyping

Nelly Cubizolles, Elodie Rey, Frédéric Choulet, Hélène Rimbert, Christel Laugier, François Balfourier, Jacques Bordes, Charles Poncet, Peter Jack, Chris James, Jan Gielen, Odile Argillier, Jean-Pierre Jaubertie, Jérôme Auzanneau, Antje Rohde, Pieter B.F. Ouwerkerk, Viktor Korzun, Sonja Kollers, Laurent Guerreiro, Delphine Hourcade, Olivier Robert, Pierre Devaux, Anna-Maria Mastrangelo, Catherine Feuillet, Pierre Sourdille, and Etienne Paux*

Abstract

Transposable elements (TEs) account for more than 80% of the wheat genome. Although they represent a major obstacle for genomic studies, TEs are also a source of polymorphism and consequently of molecular markers such as insertion site-based polymorphism (ISBP) markers. Insertion site-based polymorphisms have been found to be a great source of genome-specific single-nucleotide polymorphisms (SNPs) in the hexaploid wheat (Triticum aestivum L.) genome. Here, we report on the development of a high-throughput SNP discovery approach based on sequence capture of ISBP markers. By applying this approach to the reference sequence of chromosome 3B from hexaploid wheat, we designed 39,077 SNPs that are evenly distributed along the chromosome. We demonstrate that these SNPs can be efficiently scored with the KASPar (Kompetitive allele-specific polymerase chain reaction) genotyping technology. Finally, through genetic diversity and genome-wide association studies, we also demonstrate that ISBP-derived SNPs can be used in marker-assisted breeding programs.

Published in The Plant Genome 9. doi: 10.3835/plantgenome2015.09.0078. © Crop Science Society of America, 5585 Guilford Rd., Madison, WI 53711 USA. This is an open access article distributed under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
The Plant Genome, March 2016, Vol. 9, No. 1

N. Cubizolles, E. Rey, F. Choulet, H. Rimbert, C. Laugier, F. Balfourier, J. Bordes, C. Poncet, C. Feuillet, P. Sourdille, and E. Paux, Institut National de la Recherche Agronomique, Joint Research Unit 1095 Génétique, Diversité et Ecophysiologie des Céréales, 5 Chemin de Beaulieu, F-63039 Clermont-Ferrand, France; N. Cubizolles, E. Rey, F. Choulet, H. Rimbert, C. Laugier, F. Balfourier, J. Bordes, C. Poncet, C. Feuillet, P. Sourdille, and E. Paux, Université Blaise Pascal, UMR 1095 Génétique, Diversité et Ecophysiologie des Céréales, 5 Chemin de Beaulieu, F-63177 Aubière Cedex, France; P. Jack and C. James, RAGT Seeds Ltd., Grange Rd., Ickleton, Essex, CB10 1TA, UK; J. Gielen and O. Argillier, Syngenta France, 12 Chemin de l’Hobit, F-31790 Saint Sauveur, France; J.-P. Jaubertie and J. Auzanneau, Agri-Obtentions, Chemin de la Petite Minière, F-78280 Guyancourt, France; A. Rohde and P.B.F. Ouwerkerk, Bayer CropScience NV, Innovation Center Technologiepark 38, B-9052 Ghent, Belgium; V. Korzun and S. Kollers, KWS LOCHOW GMBH, Ferdinand-von-Lochow-Straße 5, D-29303 Bergen, Germany; L. Guerreiro, ARVALIS – Institut du Végétal, Station Expérimentale, F-91720 Boigneville, France; D. Hourcade, ARVALIS – Institut du Végétal, 6 Chemin de la Côté Vieille, F-31450 Baziège, France; O. Robert and P. Devaux, Florimond Desprez, BP 41, F-59242 Cappelle-en-Pévèle, France; A.-M. Mastrangelo, Council for Agricultural Research and Economics – Cereal Research Centre, S.S. 673 km 25.200, I-71122 Foggia, Italy; N. Cubizolles and C. Laugier, current address: Biogemma, Route d’Ennezat, F-63720 Chappes, France; E. Rey, current address: Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371 Olomouc, Czech Republic; L. Guerreiro, current address: RAGT 2n, Rue Emile Singla, BP 3331, F-12033 Rodez, France; and C.
Feuillet, current address: Bayer CropScience, 3500 Paramount Pkwy., Morrisville, NC 27560, USA. Received 1 Sept. 2015. Accepted 20 Oct. 2015. *Corresponding author ([email protected]).

Abbreviations: DArT, diversity array technology; GC, guanine–cytosine; INRA, Institut National de la Recherche Agronomique (French National Institute for Agricultural Research); ISBP, insertion site-based polymorphism; ISBP-SNP, ISBP-derived single-nucleotide polymorphism; KASPar, Kompetitive allele-specific polymerase chain reaction; LOD, logarithm of the odds; PCR, polymerase chain reaction; PIC, polymorphism index content; SNP, single-nucleotide polymorphism; SSR, simple sequence repeats; TE, transposable elements.

In the past decade, SNPs have emerged as the marker of choice in genetics because of their abundance and high-throughput detection capacities. Single-nucleotide polymorphisms are commonly used for a wide range of applications in plants, including generation of high-density genetic maps, physical map or sequence anchoring and ordering, quantitative trait locus mapping, genome-wide association studies, map-based gene cloning, characterization of genetic resources, and marker-assisted breeding and genomic selection (Ganal et al., 2009, 2012). Unlike other types of markers such as simple sequence repeats (SSRs) or diversity array technology (DArTs), SNPs require the comparison of homologous sequences between genotypes to identify variations at the sequence level. The recent advent of next-generation sequencing systems (Liu et al., 2012) has opened the way to whole-genome resequencing of several plant species for SNP discovery or direct genotyping. Although this approach is affordable for plants with small to moderate genome sizes such as Arabidopsis thaliana L. Heynh. (Cao et al., 2011), rice (Oryza sativa L.) (Xu et al., 2012), or soybean [Glycine max (L.) Merr.]
(Lam et al., 2010), whole-genome resequencing remains too expensive for plants with large genomes such as bread wheat (17,000 Mb; International Wheat Genome Sequencing Consortium, 2014) or loblolly pine (Pinus taeda L.) (22,000 Mb; Zimin et al., 2014) and therefore cannot be applied to a large set of lines. For such species, complexity reduction approaches can be an alternative for efficiently and reproducibly sampling the same sequence fraction between genotypes (Paux et al., 2011). Two types of approaches, “random” and “targeted”, can be used to reduce genome complexity. Random approaches include transcriptome resequencing, methyl filtration, renaturation-based (high-Cot) filtration, or restriction enzyme-based techniques such as complexity reduction of polymorphic sequences. Targeted approaches include polymerase chain reaction (PCR) amplification of similar targets in different cultivars or hybridization using oligonucleotide capture technologies. Most of these approaches rely on the filtration of the repetitive fraction of the genome to reduce complexity. Even whole-genome resequencing approaches tend to focus on low-copy DNA regions, as repeat-originating reads can hardly be mapped to their correct locus in the genome. However, because they are less subject to selection pressure, TEs can be a large source of polymorphism. In wheat, TEs account for more than 80% of the genome size (Choulet et al., 2010; Wicker et al., 2011).

Over the past few years, tremendous efforts in SNP discovery have been made in wheat, mainly targeting the coding fraction of the genome. Various approaches have been used, including cDNA sequencing (Allen et al., 2011), targeted resequencing of the wheat exome (Allen et al., 2013; Winfield et al., 2012), transcriptome sequencing (Lai et al., 2012), and whole-genome resequencing (Lai et al., 2015; International Wheat Genome Sequencing Consortium, 2014). These projects have led to the discovery of several million SNPs.
Concomitant to SNP discovery, several technologies have been implemented for SNP genotyping in hexaploid wheat, including KASPar technology (Allen et al., 2011, 2013), Illumina GoldenGate (Akhunov et al., 2009), and Illumina iSelect (Cavanagh et al., 2013; Wang et al., 2014). However, the allohexaploid nature of the wheat genome, combined with its high level of gene duplication (Choulet et al., 2014), poses significant challenges (Akhunov et al., 2009). Indeed, during the SNP discovery process, true allelic SNPs have to be discriminated from homoeologous and paralogous SNPs. In addition, the presence of homoeologous or paralogous loci creates a shift of genotype cluster plots compared to their position in diploid species (Wang et al., 2014). Access to unique loci might help to circumvent these challenges, both in terms of SNP discovery and SNP genotyping.

Insertion site-based polymorphism markers have been shown to be a very efficient and straightforward way to design unique genome-specific markers (Paux et al., 2006). Insertion site-based polymorphism markers were initially described as PCR-based markers for the amplification of a DNA fragment spanning a junction between a TE and its flanking sequence (another TE or a low-copy sequence). They represent a way to access the repetitive fraction of the wheat genome that has not been exploited for SNP discovery so far. Indeed, we recently demonstrated the potential of ISBP markers to saturate the wheat genome efficiently and to be used in marker-assisted selection (Paux et al., 2010). With average densities of one ISBP per 5.4 kb in the hexaploid wheat genome and one SNP every ~100 bp in ISBPs, we estimated that about six million genome-specific SNPs could be designed from ISBPs in the whole wheat genome.
However, the full potential of ISBPs for SNP discovery can be achieved only with high-throughput technologies that allow for the efficient resequencing of several thousand ISBPs in different cultivars. So far, SNP discovery within ISBPs has been done through PCR amplification, which is not compatible with high-throughput approaches. Here, we report on the development of a high-throughput ISBP-derived SNP (ISBP-SNP) discovery approach. We also demonstrate the efficiency of the combination of ISBP-SNP and KASPar genotyping and its potential application to wheat breeding.

Material and Methods

Plant Material and DNA Extraction

Ten wheat cultivars and lines were selected for SNP discovery: ‘Apache’, ‘Autan’, ‘Aztec’, ‘Cezanne’, ‘Chinese Spring’, ‘INRA3’, ‘Premio’, ‘Renan’, ‘Robigus’, and ‘Xi19’. The elite panel consisted of a set of 93 European elite varieties, which had mainly been registered during the last decades in France, the United Kingdom, and Germany (Supplemental Table S1). The Institut National de la Recherche Agronomique (INRA; French National Institute for Agricultural Research) worldwide bread wheat core collection is made up of 367 accessions that capture more than 98% of the total allelic diversity observed at 38 polymorphic SSR loci in a sample of 4000 bread wheat accessions (Balfourier et al., 2007) (Supplemental Table S1). Accessions within the latter collection originate from 70 different countries and include both landraces from the 19th century and old or modern cultivars from the 20th century. DNA was extracted as described in Graner et al. (1990).

IsbpProbeDesign

The IsbpProbeDesign program is written in Perl. It uses data generated by IsbpFinder (Paux et al., 2010) and retrieves only the ISBP markers that correspond to a junction between a TE and a low-copy sequence. It then uses Tallymer (Kurtz et al., 2008) to mask k-mers that are repeated x times or more at the whole-genome level.
Finally, it extracts subsequences of unmasked nucleotides in the low-copy part of the ISBP marker flanking the TE sequence. Parameters such as k-mer length, number of k-mer occurrences in the Tallymer index, and subsequence length are customizable. In this study, the Tallymer index was computed on a 1× genome coverage sampled in the reads produced on the cultivar Chinese Spring (Brenchley et al., 2012). The k-mer length was set to 20, the number of occurrences to 10, and the subsequence length to 120 nt. IsbpProbeDesign is freely available on request.

Insertion Site-based Polymorphism Sequence Capture

Sequence capture was performed using the SureSelect XT Target Enrichment system for Illumina Paired-end Sequencing Library (Agilent, Santa Clara, CA). A total of 52,265 120-nt baits were designed to target 6272 kb of sequence. For each wheat accession, 3 µg of genomic DNA dissolved in 100 µL of 1× TE buffer were fragmented to an average size of 150 to 200 bp on a Covaris S2 (Covaris Inc., Woburn, MA) using the following parameters: duty factor, 10%; intensity, 5; cycles per burst, 200; time, 180 s; set mode, frequency sweeping; temperature, 4°C. Fragment sizes were checked on 2% agarose gel. The following steps were performed according to the standard Agilent protocol. Libraries were tagged during the post-capture ligation-mediated PCR step, using Index 2, 4, 5, 6, or 7 (two samples per index). Following DNA quantitation, two equimolar pools of five samples carrying five different index sequences were prepared. Each pool was sequenced in paired-end 2 × 100-bp reads on an Illumina HiSeq2000 lane (Illumina Inc., San Diego, CA).

Sequence Alignment and SNP Discovery

Read mapping was performed using the Mosaik assembler (Strömberg, 2009). MosaikBuild was used with FASTQ mate files for each accession. A multi-FASTA file containing 52,265 ISBP sequences was used as a reference.
Reads were aligned with MosaikAligner using the following criteria: maximum number of mismatches, 3; alignment candidate threshold, 60; minimum percentage of the read length that should be aligned, 0.6; use of aligned read length when counting errors. MosaikSort, MosaikMerge, and MosaikAssembler were then used to generate a single assembly file. This file was then submitted to the GigaBayes package (Hillier et al., 2008; Marth et al., 1999) for SNP discovery using the following parameters: ploidy, diploid; sample, multiple; SNP lower probability threshold for reporting polymorphism candidates, 1; output detail level for candidate polymorphism report, 4; minimum total read coverage for position to be considered, 20. GigaBayes results were then processed with a Perl script to classify SNPs into four different classes. These classes were based on the homozygous or heterozygous nature of the SNP in the selected panel. A SNP was considered as heterozygous if more than 10% of the reads from a given accession showed a variation in its sequence. Four different classes were defined: (i) the locus showed only homozygous AA and BB alleles among all 10 lines, (ii) the locus showed homozygous AA and BB and heterozygous AB alleles, (iii) the locus showed only homozygous AA and heterozygous AB alleles, and (iv) the locus was heterozygous AB in all 10 lines. To be conservative, SNPs were considered only if the position was covered by at least 20 reads (from 10 lines).

KASPar Genotyping

Single-nucleotide polymorphism context sequences were submitted to LGC Genomics (Teddington, UK) for KASPar assay design. In addition to the SNP itself, the position of the junction was identified on the sequence as an x. KASPar assays were designed so that the allele-specific primers were on the SNP itself and the locus-specific primer was on the other side of the boundary between the TE and the low-copy sequence. Genotyping was conducted at LGC Genomics following their own protocol.
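The four-class SNP classification described in the Sequence Alignment and SNP Discovery section can be illustrated with a short Python sketch. This is not the Perl script used in the study: the function names and the input layout (one pair of read counts per line) are hypothetical, and two points are assumptions, namely that a line is called heterozygous when its minor-allele read fraction exceeds 10%, and that Class 3 means a single homozygous state plus heterozygous calls.

```python
HET_THRESHOLD = 0.10  # from the text: >10% variant reads in an accession means heterozygous
MIN_COVERAGE = 20     # from the text: at least 20 reads at the position, summed over lines


def call_genotype(a_reads, b_reads, het_threshold=HET_THRESHOLD):
    """Call 'AA', 'BB', or 'AB' for one line at one SNP; None if the line has no reads."""
    total = a_reads + b_reads
    if total == 0:
        return None
    # Assumption: the 10% rule is applied to the minor-allele read fraction.
    if min(a_reads, b_reads) / total > het_threshold:
        return "AB"
    return "AA" if a_reads >= b_reads else "BB"


def classify_snp(read_counts, min_coverage=MIN_COVERAGE):
    """Return the class (1 to 4) of a SNP, or None if it is not retained.

    read_counts: list of (a_reads, b_reads) tuples, one per line (hypothetical layout).
    """
    if sum(a + b for a, b in read_counts) < min_coverage:
        return None  # insufficient total coverage
    calls = {call_genotype(a, b) for a, b in read_counts}
    calls.discard(None)
    homozygous = calls - {"AB"}
    has_het = "AB" in calls
    if len(homozygous) == 2:
        return 2 if has_het else 1   # AA and BB, with (2) or without (1) AB
    if len(homozygous) == 1 and has_het:
        return 3                     # one homozygous state plus AB
    if not homozygous and has_het:
        return 4                     # AB in every line that has reads
    return None                      # monomorphic position


if __name__ == "__main__":
    print(classify_snp([(30, 0), (0, 25), (28, 1)]))  # Class 1
```

Under these assumptions, only Class 1 and Class 2 calls would be carried forward, which mirrors the selection made later in the Results.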
Variability Estimation and Diversity Analysis

Variability for each locus was measured using the polymorphism index content (PIC) (Anderson et al., 1993):

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2,$$

where $p_i$ is the frequency of the $i$th allele and $n$ is the number of alleles at the locus.

Genotyping data were used to describe diversity within both the elite and diversity panels. For each sample set, a dissimilarity matrix was built using the simple matching coefficient between each pair of accessions. For the elite panel, a Neighbor-Joining tree was created using this dissimilarity matrix and 1000 bootstrapped trees were calculated to assess the uncertainty of the tree structure. For the INRA core collection, dissimilarities were calculated between individuals and the geographical structure of the wheat germplasm diversity was analyzed by a Ward dendrogram between accessions. Data analyses were conducted using DARwin software (Perrier et al., 2003; Perrier and Jacquemoud-Collet, 2006).

Association Studies

Dough consistency was estimated in 2005 and 2006 from the mixograph test as described in Bordes et al. (2011). Accessions of the INRA core collection were grown in 2005 and 2006 in 10-m² plots in a complete randomized block design at the INRA plant breeding station in Clermont-Ferrand (France) using regional farming methods, which include applying appropriate fertilizers and pesticides. To test dough quality, white flour was obtained with the Chopin Dubois CD1 mill (Tripette & Renaud, Villeneuve la Garenne, France), which gives a 70% extraction rate. Mixograph tests were assessed on 10 g of flour. Mixograph parameters were estimated from the height and width of the mixograph curve as described by Martinant et al. (1998). The average height of the curve measured at peak time, peak time plus 2 min, and peak time plus 8 min was used as a measure of the dough consistency (mixing consistency).
Association between markers and mixing consistency was tested for the mean of the 2 yr with the TASSEL version 5.0 software (Bradbury et al., 2007) using the mixed linear model (MLM-Q-K) with controls for population structure (Q-matrix) and kinship between accessions (K-matrix). The kinship matrix was estimated with 527 of the 554 DArT markers described in Bordes et al. (2011) located on all chromosomes except chromosome 3B, according to the method proposed by Rincent et al. (2014). The power of association of a marker with mixing consistency is represented by a logarithm of the odds (LOD) score, computed as −log10(P), where P is the p-value of the test of association between a SNP and the trait. Markers were considered associated with the trait when their LOD score reached 3 (significance threshold) or 4 (high-significance threshold).

Results and Discussion

Transposable Elements are a Rich Source of SNPs in Wheat

IsbpFinder (Paux et al., 2010) was used to mine for ISBP markers in 16,159 chromosome 3B scaffolds (Choulet et al., 2014). Overall, a total of 410,124 ISBPs were found, including 178,535 high-confidence, 160,473 medium-confidence, and 71,116 low-confidence markers (Supplemental Table S2). To develop a high-throughput approach, we investigated the use of target enrichment strategies for efficient and high-throughput ISBP sequence capture. Our main problem lay in the repetitive nature of these markers. Indeed, the use of a repeat as bait would have led to the unspecific capture of all the repeats from the same family. To overcome this limitation, we focused on a specific subset of ISBP markers corresponding to the insertion of a TE in a low-copy sequence (either coding or noncoding). Out of 410,124 ISBPs, 184,452 corresponded to a junction between a TE and a low-copy sequence (58,676 high-confidence and 125,776 medium-confidence).
A de novo repeat detection approach based on k-mer frequency was further applied to identify and mask any potential repeated stretches of DNA in the low-copy sequence. The purpose was to ensure that the template used for bait design was free from repeated DNA (to prevent cross-hybridization). We finally designed 120-nt baits in the unrepeated DNA flanking a TE for target enrichment. This process was fully automated in a program called IsbpProbeDesign (see the Material and Methods section). Eventually, 57,705 120-nt baits were identified in the 184,452 TE low-copy ISBPs. These baits were filtered according to their guanine–cytosine (GC) content to avoid the presence of too GC-poor or GC-rich sequences in the capture kit. Indeed, GC-poor or GC-rich sequences are known to affect both PCR amplification and capture efficiency (Asan et al., 2011; Chilamakuri et al., 2014). Baits with a GC content ranging from 33.3 to 70.0% were submitted to Agilent for the construction of a SureSelect sequence capture kit. In contrast to most of the previously reported studies that used tiling designs with several overlapping or nonoverlapping baits to capture a large contiguous region, our design targeted 52,265 discrete loci (average length = 221 bp) with one single bait per locus (Supplemental Table S3). The total capture space was 6272 kb (52,265 × 120 bp), corresponding to a targeted space of 11,556 kb (52,265 × 221 bp). The corresponding SureSelect kit was used for target enrichment in 10 hexaploid wheat cultivars and lines. These lines correspond to European elite varieties, except for Chinese Spring, which is the reference cultivar used for genome sequencing (Choulet et al., 2014; International Wheat Genome Sequencing Consortium, 2014) and was used as control. Out of the 52,265 targeted ISBPs, 51,437 (98.4%) were efficiently captured in at least one of the 10 lines. The vast majority (71.3%) of ISBPs were captured in all lines.
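The bait design steps described above (k-mer-based repeat masking, extraction of an unmasked window in the low-copy flank, and GC filtering) can be sketched in Python as follows. This is a minimal illustration, not IsbpProbeDesign itself: an in-memory `Counter` stands in for the Tallymer index, all function names are hypothetical, and taking the first unmasked window is an assumption, since the text does not state which window the program selects. The defaults (k = 20, 10 occurrences, 120 nt, GC between 33.3 and 70.0%) are the values reported in this study.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer in a set of sequences (simple stand-in for a Tallymer index)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mask_repeats(seq, counts, k, max_occ):
    """Replace with 'N' every base covered by a k-mer seen max_occ times or more."""
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= max_occ:
            masked[i:i + k] = "N" * k
    return "".join(masked)


def first_unmasked_window(masked, length):
    """Return the first stretch of `length` bases without any 'N', or None."""
    run_start = 0
    for i, base in enumerate(masked):
        if base == "N":
            run_start = i + 1
        elif i - run_start + 1 == length:
            return masked[run_start:i + 1]
    return None


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_bait(low_copy_flank, counts, k=20, max_occ=10, length=120,
                gc_min=0.333, gc_max=0.70):
    """Return a bait from the low-copy flank of an ISBP, or None if none qualifies."""
    masked = mask_repeats(low_copy_flank, counts, k, max_occ)
    bait = first_unmasked_window(masked, length)
    if bait is None or not (gc_min <= gc_fraction(bait) <= gc_max):
        return None
    return bait


if __name__ == "__main__":
    toy_index = kmer_counts(["AAAAAA", "AAAAAA"], 4)
    print(design_bait("GCTAGCAAAAAATCGGATCC", toy_index, k=4, max_occ=2, length=6))
```

With one 120-nt bait per locus, the capture space follows directly from the bait count: 52,265 × 120 nt = 6,271,800 nt, i.e., the 6272 kb reported above.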
On average, 87.2 ± 0.8% of the 51,437 ISBPs were captured in a given accession, except for Chinese Spring, in which 95.4% were efficiently captured. This demonstrates the reliability of this approach and, in addition, suggests that the absence of a targeted ISBP in a given line might reflect the absence of the corresponding locus in its genome. The 828 loci that were not captured in any accession were clearly biased toward GC-rich sequences, with 51.4% of the baits having a GC content greater than 60% (compared to 10.7% for all baits and 9.7% for captured ones). This result was not unexpected, as it has been shown that the optimal GC content for sequence capture baits was 40 to 60% (Asan et al., 2011).

A total of 49,836 SNPs were identified in 18,134 ISBPs (Supplemental Table S4). These SNPs were classified into four categories: (i) 33,220 (66.7%) loci showing only homozygous AA and BB alleles among all 10 lines, (ii) 5857 (11.8%) loci with homozygous AA and BB and heterozygous AB alleles, (iii) 8858 (17.8%) with only homozygous AA and heterozygous AB alleles, and (iv) 1901 (3.8%) with only heterozygous AB in all 10 lines. The majority of these SNPs (69.7%) corresponded to transitions and 30.3% to transversions. This rate is consistent with previous studies and was expected, as the high methylation level in TEs leads to an increase in mutation frequency at deaminated sites (Clark et al., 2005; Paux et al., 2010; Rabinowicz et al., 2005). For subsequent analyses, only Class 1 (AA and BB) and Class 2 (AA, AB, and BB) SNPs were selected. These 39,077 SNPs were identified in 16,556 ISBP markers (Supplemental Table S5). The number of SNPs per ISBP ranged from 1 to 12, with an average of 2.36 ± 1.58, corresponding to a density of one SNP every 98 bp, which was highly similar to our previous estimate of one SNP every 99 bp (Paux et al., 2010).
This density was found to be comparable between the TE and low-copy sides of ISBPs (one SNP every 95 bp and one every 100 bp, respectively), suggesting that if TE methylation is responsible for the higher mutation rate of ISBPs compared to genes, methylation is likely to spread to neighboring regions, as has already been reported in other studies (for examples, see Martin et al., 2009; Eichten et al., 2012 and the references therein). In hexaploid wheat, the B genome is known to be the most polymorphic, accounting for ~47.5% of all polymorphisms, compared to 40.8 and 11.7% for the A and D genomes, respectively (International Wheat Genome Sequencing Consortium, 2014). Based on these proportions, one can extrapolate that this approach would lead to the discovery of more than 400,000 ISBP-SNPs in the whole wheat genome. As this number is relatively low compared to whole-genome resequencing approaches (Lai et al., 2015; International Wheat Genome Sequencing Consortium, 2014), this complexity reduction approach allows for SNPs to be mined in a larger set of lines, thus reducing ascertainment bias.

The PIC values ranged from 0.18 to 0.50, with an average of 0.34, which is slightly higher than the average PIC value observed for transcript-specific SNPs in a set of 23 wheat varieties representing a broad cross-section of wheat germplasm (Allen et al., 2011). In addition, SNPs located on the same ISBP can be used to define haplotypes (Rafalski, 2002). To assess the number of haplotypes in our dataset, we focused on a subset of 25,832 Class 1 SNPs without missing data. The number of SNPs per ISBP ranged from 1 to 11. Among the 6830 multi-SNP ISBPs, the number of haplotypes ranged from two to seven and was highly correlated with the SNP number (R = 0.977, p = 10⁻⁷). When haplotypes were considered, PIC values ranged from 0.18 to 0.82, with an average of 0.42 ± 0.15.
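To make the PIC figures above concrete, the following Python sketch computes the PIC (one minus the sum of squared allele frequencies, as defined in the Material and Methods) for a single SNP and for haplotypes built from several SNPs of the same ISBP. The function names and the toy genotypes are hypothetical and serve only as an illustration; they are not data from this study.

```python
from collections import Counter


def pic(alleles):
    """PIC = 1 - sum of squared frequencies of alleles (or haplotypes); None = missing."""
    observed = [a for a in alleles if a is not None]
    if not observed:
        raise ValueError("no observed alleles")
    n = len(observed)
    return 1.0 - sum((c / n) ** 2 for c in Counter(observed).values())


def haplotypes(snp_columns):
    """Join per-SNP allele calls (one list per SNP, same line order) into one string per line."""
    return ["".join(calls) for calls in zip(*snp_columns)]


if __name__ == "__main__":
    # Toy example: two SNPs of one ISBP scored in 10 homozygous lines.
    snp1 = list("AAAAAGGGGG")  # two alleles at frequency 0.5 each
    snp2 = list("CCCCTCCCCT")  # frequencies 0.8 and 0.2
    print(round(pic(snp1), 2))                      # 0.5
    print(round(pic(snp2), 2))                      # 0.32
    print(round(pic(haplotypes([snp1, snp2])), 2))  # 0.66
```

A biallelic SNP cannot exceed a PIC of 0.50, whereas haplotypes combining several SNPs can, which is consistent with the higher upper bound reported when haplotypes were considered. Likewise, with 10 lines, a minor allele carried by a single line gives 1 − (0.9² + 0.1²) = 0.18, the lower bound observed here.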
Insertion Site-based Polymorphism SNPs are Amenable to KASPar Genotyping

To validate our SNP discovery process and assess the compatibility of ISBP-SNPs with genotyping technologies, a set of 3300 markers that were evenly distributed on chromosome 3B scaffolds was submitted to KBioscience for the synthesis of KASPar assays. To keep the genome-specificity of ISBP markers, the design was done so that the allele-specific primer was located on the SNP but the locus-specific primer was designed on the other side of the boundary between the TE and the low-copy sequence. Eventually, 3284 (99.5%) KASPar assays were successfully designed and used to genotype a panel of 460 wheat lines (Supplemental Table S6). These lines comprised 93 European elite cultivars (the elite panel) and the INRA core collection (Balfourier et al., 2007). Genotyping data on the elite panel and the INRA core collection were used to calculate performance metrics commonly applied to SNP genotyping, namely conversion rate, call rate, completeness, and accuracy (Hardenbol et al., 2005). Out of the 3284 KASPar assays synthesized, 2842 could be examined, corresponding to a “conversion rate” of 86.6%. This rate is higher than the one observed for gene-derived SNPs using the KASPar technology (Allen et al., 2011). This difference is probably caused by the hexaploid nature of the wheat genome in which the presence of homoeologous target sites can result in genotype call clustering failure (Akhunov et al., 2009). Here, because of the uniqueness of the ISBPs as well as the way the KASPar probes were designed, most of the assays behaved in a diploid manner, with homozygous clusters clearly separated on each side of the diagonal defined by the heterozygous cluster. The overall heterozygosity rate was found to be lower than 1%, as expected for homozygous wheat lines, with only 3% of the accessions having a heterozygosity rate greater than 3%.
The number of missing data per genotype ranged from 17 to 1058, with only 44 out of 460 lines (9.6%) having more than 10% missing data. This can be extrapolated as a 90.5% “call rate” over the whole analysis. Finally, the percentage of total genotypes returned across all markers, defined as the completeness, was 94.6%, with 5.4% missing data. However, it is worth noting that, as previously stated, these missing data might reflect the fact that the corresponding loci are absent from the genome of some lines. In this case, it might be interesting to consider these data as presence or absence variations, which are known to occur frequently in plant genomes, especially in the noncoding repetitive fraction where TE insertions are not necessarily conserved across all accessions (Saxena et al., 2014). To assess the accuracy of KASPar results, the genotyping data were compared with the sequencing data of 7 out of the 10 lines used for SNP discovery. In total, 94.8% of the two sets were consistent. Most of the discrepancies (3.3% out of 5.2%) were caused by missing data either in the sequencing or in the genotyping results. An additional 1.7% of discrepancies was caused by genotypes that were called heterozygous by one technique and homozygous by the other one. The remaining 0.2% corresponded to genotypes with two different homozygous alleles with the two techniques. Interestingly, out of the 519 SureSelect missing data, almost one half (243) were confirmed by KASPar data, reinforcing the hypothesis that these missing data might be true presence or absence variations. Over the past few years, KASPar technology has become one of the most widely used technologies in plant breeding because of its flexibility, relatively low cost and high throughput (Michael, 2014). 
However, it is worth noting that ISBP-SNPs have already been proven to be compatible with other genotyping technologies such as melting curve analysis, the Life Technologies SNaPshot Multiplex System, and the Illumina BeadArray technology (Paux et al., 2010), as well as high-resolution melting analysis (Schulman et al., 2012), Illumina iSelect, and Affymetrix Axiom (unpublished data). In addition, one could consider using the ISBP sequence capture approach as a complexity reduction step before sequencing-based genotyping. Indeed, thanks to its high reproducibility, this technique might circumvent the high level of missing data typically observed with classical genotyping-by-sequencing approaches (Fu, 2014).

Insertion Site-based Polymorphism SNPs Allow for the Saturation of Wheat Chromosomes

To assess the distribution of total targeted ISBPs, polymorphic ISBPs carrying Class 1 or Class 2 SNPs as well as ISBP-SNPs in the wheat genome, ISBP sequences were mapped back to the chromosome 3B pseudomolecule. Since ISBPs were designed at an early stage of the chromosome 3B sequencing and considering that some scaffolds were not anchored on the pseudomolecule, 96% of the targeted ISBPs (50,408 out of 52,265), polymorphic ISBPs (15,869 out of 16,556) and SNPs (37,544 out of 39,077) have a position on this sequence. The distribution of targeted ISBPs was found to follow a gradient along the centromere–telomere axis (Fig. 1A). Considering that we focused on TE low-copy DNA ISBP markers, this result was not unexpected, as the percentage of low-copy sequences is higher in distal regions than in proximal ones (Choulet et al., 2014). The distribution of polymorphic ISBPs was found to be correlated with that of total ISBPs (R = 0.79, p = 0). The average polymorphic ISBP density was one every 48.8 kb.
About three-fourths (77.4%) of the inter-ISBP distances were shorter than 50 kb, the largest region without any ISBP marker being 2.16 Mb. On average, 32.6% of ISBPs were polymorphic. Although this percentage was found to be quite even on the short arm of chromosome 3B, it appeared to be much more variable on the long arm (Fig. 1B). In particular, two regions of ~30 Mb were identified in which this percentage dropped down to 11 to 12%. Such regions might indicate domestication or selection footprints (Nielsen et al., 2005). Similarly, the SNP distribution was found to be highly correlated with that of polymorphic ISBPs (R = 0.98, p = 0), with an average of 2.36 SNPs per ISBP. The number of SNPs per ISBP was quite even along the chromosome, except in some regions (including the two aforementioned ones), reinforcing the hypothesis that they might correspond to regions that have been fixed through domestication or selection (Fig. 1C). However, the overall results demonstrate that the polymorphism level of ISBP markers (both in terms of polymorphic versus nonpolymorphic loci and in terms of SNP number) was not impacted by their position on the chromosome. In contrast, allele frequencies were found to vary greatly along the chromosome. In the elite panel and the INRA core collection, the PIC values ranged from 0.004 to 0.500, with an average value of 0.286 (0.268 in the elite panel and 0.285 in the INRA core collection). This figure is comparable to other estimates in wheat (Allen et al., 2011; Chao et al., 2009; Edwards et al., 2009). The distribution of PIC values was not even along the chromosome (Fig. 1D). A strong decrease was observed in the proximal part of chromosome 3B, which is consistent with previous studies in wheat (Akhunov et al., 2010; Jordan et al., 2015) as well as in maize (Zea mays L.) (Gore et al., 2009).
This region fits almost perfectly with the centromeric–pericentromeric region (265–387 Mb), defined previously on the basis of the absence of meiotic recombination and high linkage disequilibrium (Choulet et al., 2014), as well as high TE density (Daron et al., 2014) and enrichment in constitutively expressed genes (Pingault et al., 2015) and ancestrally conserved genes (Glover et al., 2015). Strikingly, in this region that encompasses more than 400 SNPs, two main haplotypes were observed. The less frequent one was found in a set of roughly 30 lines, mostly originating from Asia (China and Japan) as well as from the European elite germplasm pool. The genetic proximity of Asian and European lines in the pericentromeric region of chromosome 3B might be explained by the use of Japanese lines such as ‘Akakomugi’ or ‘Norin 10’ in European breeding programs for the transfer of Rht genes (Borojevic and Borojevic, 2005).

Insertion Site-based Polymorphism SNPs Can be Used Efficiently for Wheat Breeding

To assess the usefulness of ISBP-SNPs to discriminate between wheat lines, genotyping data from 2842 KASPar assays were used to describe diversity within the elite panel and the INRA core collection. Figure 2 shows the Neighbor-Joining tree for the 93 lines of the elite panel. As indicated by the high bootstrap values, each accession was clearly differentiated from its genetically nearest neighbor. It is worth noting that a group of nine varieties (at the top of the tree) is differentiated from the others. These correspond to the European material carrying an Asian-related pericentromere, as discussed previously. Figure S1 illustrates the Ward dendrogram of the 367 accessions of the INRA core collection, which was originally designed to capture more than 98% of the total allelic diversity observed at 38 polymorphic SSR loci in a sample of 4000 bread wheat accessions.
As previously described (Horvath et al., 2009), the analysis of the population structure within the core collection using a set of whole-genome DArT markers led to five groups of accessions that can be related to their geographical origins: Western Europe, Eastern Europe, Mediterranean, Asia, and Nepal. In the present study, ISBP-SNPs succeeded in recovering this geographical structure in five groups, with a clear separation between accessions from the Asian gene pool (Asia and Nepal) and those from the European gene pool (Western Europe, Eastern Europe, and Mediterranean). Some discrepancies in the clustering of accessions were observed when our results were compared with those obtained from the whole-genome DArT markers. However, one should keep in mind that our SNPs originated from one chromosome only, which might have led to a slightly different genetic structure. Consistent with this is the bottom branch of the tree, which contains accessions harboring the minor Asian pericentromeric haplotype mentioned previously.

Fig. 1. Distribution of polymorphism-related features along bread wheat chromosome 3B. (A) Average number of targeted insertion site-based polymorphisms (ISBPs). (B) Average percentage of polymorphic ISBPs. (C) Average number of single-nucleotide polymorphisms (SNPs) per ISBP. (D) Average polymorphism index content (PIC) value. All features are calculated in a 10-Mb sliding window (step 1 Mb). The x-axis represents the position on the chromosome 3B pseudomolecule (in Mb).

Fig. 2. Neighbor-Joining tree showing phylogenetic relationships between 93 wheat elite lines revealed by insertion site-based polymorphism-derived single-nucleotide polymorphisms (ISBP-SNPs). Bootstrapped values are given in blue.
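The features in Fig. 1 are averaged in a 10-Mb sliding window moved in 1-Mb steps. A minimal sketch of such a windowed average is given below; the function name `sliding_window_means`, the half-open window convention, and the toy values are assumptions made for illustration and do not reproduce the authors' scripts.

```python
from typing import List, Optional, Sequence, Tuple


def sliding_window_means(positions_mb: Sequence[float],
                         values: Sequence[float],
                         chrom_length_mb: float,
                         window_mb: float = 10.0,
                         step_mb: float = 1.0
                         ) -> List[Tuple[float, Optional[float]]]:
    """Mean of `values` in each window [start, start + window_mb).

    Returns (window_start_mb, mean) pairs; the mean is None for a window
    that contains no marker.
    """
    if len(positions_mb) != len(values):
        raise ValueError("positions and values must have the same length")
    if chrom_length_mb < window_mb:
        return []
    n_windows = int((chrom_length_mb - window_mb) / step_mb) + 1
    results = []
    for i in range(n_windows):
        start = i * step_mb
        in_window = [v for p, v in zip(positions_mb, values)
                     if start <= p < start + window_mb]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((start, mean))
    return results


# Toy per-marker PIC values on a 12-Mb segment (hypothetical data).
windows = sliding_window_means([0.5, 5.0, 10.5, 11.5],
                               [0.2, 0.4, 0.6, 0.8],
                               chrom_length_mb=12.0)
print(windows)
```

The same function could be applied to any per-marker feature, for example a 0/1 polymorphism flag to obtain the fraction of polymorphic ISBPs per window.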
KASPar genotyping data from 2745 ISBP-SNPs with a minor allele frequency greater than 5% were also used to conduct a genome-wide association study for dough rheology on chromosome 3B. Phenotypic data derived from the INRA core collection were retrieved from Bordes et al. (2011). A strong association (LOD score >3) was found with 34 SNPs. These SNPs defined a region of ~3 Mb located on the long arm of chromosome 3B (Fig. 3). With 34 biallelic SNPs, one might theoretically expect more than 17 billion possible haplotypes (2^34). Instead, only 11 haplotypes were observed, the two most common ones representing 52.3 and 37.8% of the lines, respectively. The average mixograph consistency values of these two haplotypes were 32.3 and 34.6% of the curve width, respectively, suggesting a 6.9% difference between the two groups.

The 3-Mb region contains 47 annotated protein-coding genes. Their expression profiles were investigated in 15 conditions covering the entire plant development process (Pingault et al., 2015). Sixteen genes did not show any expression, whereas three showed an increased expression in grain 30 d after anthesis (Zadoks scale 85), including two putative α/β hydrolases and one protein of unknown function. These genes might be good candidates for dough rheology.

Because of their ability to distinguish efficiently between genetically close accessions as well as their potential association with traits of interest, ISBP-SNPs are useful tools for marker-assisted breeding, both for describing genetic diversity and for detecting genomic regions involved in agronomic traits. This is also exemplified by the recent use of these markers in marker-based phenology modeling (Bogard et al., 2014) and growth stage prediction (Bogard et al., 2015).

Fig. 3.
Genome-wide association study results using 2745 single-nucleotide polymorphisms (SNPs) located on chromosome 3B in the Institut National de la Recherche Agronomique core collection for dough consistency. The black horizontal dashed line indicates the threshold of significance. The x-axis represents the position on the chromosome 3B pseudomolecule (in Mb).

Conclusions

In this study, we developed a high-throughput method for SNP discovery in the repetitive fraction of the wheat genome. By using sequence capture, this approach allows for SNP discovery both in regions of interest and at the whole-genome level in species for which genome size makes whole-genome resequencing too expensive. In addition, thanks to the genome specificity of ISBP markers, this approach makes SNP discovery and genotyping much more straightforward by reducing the confounding effect of homoeologous and paralogous SNPs in polyploid and highly duplicated genomes. Although ISBP-SNPs are not derived from coding regions, their high density allows virtually all genetic loci to be in linkage disequilibrium with at least one marker. In addition, their compatibility with KASPar genotyping technology, as well as their ability to discriminate between closely related wheat lines, makes them suitable markers for molecular breeding.

Supplemental Material

Supplemental Figure S1. Ward dendrogram showing the phylogenetic relationships among wheat accessions of the INRA core collection revealed by ISBP-SNPs. Accessions are named according to their genetic resource number (ERGE) in the INRA Biological Resources Center. They are colored according to five groups related to previous structure analysis (Horvath et al., 2009): Asia (orange), Nepal (pink), Northwest Europe (blue), Eastern Europe (red), and Mediterranean (green).
Supplemental Table S1. List of wheat lines genotyped with KASPar assays and the corresponding panel.
Supplemental Table S2.
List of 412,124 ISBPs detected on chromosome 3B with the IsbpFinder program.
Supplemental Table S3. List of 52,265 ISBPs used for bait design.
Supplemental Table S4. List of 49,836 SNPs with their corresponding alleles in the 10 resequenced lines and their SNP category.
Supplemental Table S5. List of 39,077 Class 1 and Class 2 SNPs with their context sequence.
Supplemental Table S6. List of 3284 KASPar assays.

Acknowledgments

This work was supported by two grants from the French National Research Agency (ANR-09-GENM-025 3BSEQ and ANR-09-GENM-027 GAIN-SPEED), grants from France Agrimer, and the private-public consortium DIGITAL (INRA, AgriObtentions, ARVALIS – Institut du Végétal, Bayer CropScience, Florimond-Desprez, KWS LOCHOW GMBH, RAGT, and Syngenta). Sequence capture experiments were conducted on the genotyping platform GENTYANE at INRA Clermont-Ferrand (gentyane.clermont.inra.fr, accessed 2 Dec. 2015). Seeds from the INRA core collection were provided by the Biological Resources Centre on small-grain cereals at INRA Clermont-Ferrand (www6.clermont.inra.fr/umr1095_eng/Teams/Experimental-Infrastructure/Biological-ResourcesCentre, accessed 2 Dec. 2015).

References

Akhunov, E., C. Nicolet, and J. Dvorak. 2009. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theor. Appl. Genet. 119:507–517. doi:10.1007/s00122-009-1059-5
Akhunov, E.D., A.R. Akhunova, O.D. Anderson, J.A. Anderson, N. Blake, M.T. Clegg, et al. 2010. Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes. BMC Genomics 11:702. doi:10.1186/1471-2164-11-702
Allen, A.M., G.L. Barker, P. Wilkinson, A. Burridge, M. Winfield, J. Coghill, et al. 2013. Discovery and development of exome-based, co-dominant single nucleotide polymorphism markers in hexaploid wheat (Triticum aestivum L.). Plant Biotechnol. J. 11:279–295.
doi:10.1111/pbi.12009
Allen, A.M., G.L.A. Barker, S.T. Berry, J.A. Coghill, R. Gwilliam, S. Kirby, et al. 2011. Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol. J. 9:1086–1099. doi:10.1111/j.1467-7652.2011.00628.x
Anderson, J.A., G.A. Churchill, J.E. Autrique, S.D. Tanksley, and M.E. Sorrells. 1993. Optimizing parental selection for genetic-linkage maps. Genome 36:181–186. doi:10.1139/g93-024
Asan, Y. Xu, H. Jiang, C. Tyler-Smith, Y. Xue, T. Jiang, et al. 2011. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 12:R95. doi:10.1186/gb-2011-12-9-r95
Balfourier, F., V. Roussel, P. Strelchenko, F. Exbrayat-Vinson, P. Sourdille, G. Boutet, et al. 2007. A worldwide bread wheat core collection arrayed in a 384-well plate. Theor. Appl. Genet. 114:1265–1275. doi:10.1007/s00122-007-0517-1
Bogard, M., J.-B. Pierre, B. Huguenin-Bizot, D. Hourcade, E. Paux, X. Le Bris, et al. 2015. A simple approach to predict growth stages in winter wheat (Triticum aestivum L.) combining prediction of a crop model and marker based prediction of the deviation to a reference cultivar: A case study in France. Eur. J. Agron. 68:57–68. doi:10.1016/j.eja.2015.04.007
Bogard, M., C. Ravel, E. Paux, J. Bordes, F. Balfourier, S. Chapman, et al. 2014. Predictions of heading date in bread wheat (Triticum aestivum L.) using QTL-based parameters of an ecophysiological model. J. Exp. Bot. 65:5849–5865. doi:10.1093/jxb/eru328
Bordes, J., C. Ravel, J. Le Gouis, A. Lapierre, G. Charmet, and F. Balfourier. 2011. Use of a global wheat core collection for association analysis of flour and dough quality traits. J. Cereal Sci. 54:137–147. doi:10.1016/j.jcs.2011.03.004
Borojevic, K., and K. Borojevic. 2005.
The transfer and history of “reduced height genes” (Rht) in wheat from Japan to Europe. J. Hered. 96:455–459. doi:10.1093/jhered/esi060
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and E.S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633–2635. doi:10.1093/bioinformatics/btm308
Brenchley, R., M. Spannagl, M. Pfeifer, G. Barker, R. D’Amore, A. Allen, et al. 2012. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491:705–710. doi:10.1038/nature11650
Cao, J., K. Schneeberger, S. Ossowski, T. Gunther, S. Bender, J. Fitz, et al. 2011. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat. Genet. 43:956–963. doi:10.1038/ng.911
Cavanagh, C.R., S. Chao, S. Wang, B.E. Huang, S. Stephen, S. Kiani, et al. 2013. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proc. Natl. Acad. Sci. USA 110:8057–8062. doi:10.1073/pnas.1217133110
Chao, S., W. Zhang, E. Akhunov, J. Sherman, Y. Ma, M.-C. Luo, et al. 2009. Analysis of gene-derived SNP marker polymorphism in US wheat (Triticum aestivum L.) cultivars. Mol. Breed. 23:23–33. doi:10.1007/s11032-008-9210-6
Chilamakuri, C.S., S. Lorenz, M.-A. Madoui, D. Vodak, J. Sun, E. Hovig, et al. 2014. Performance comparison of four exome capture systems for deep sequencing. BMC Genomics 15:449. doi:10.1186/1471-2164-15-449
Choulet, F., A. Alberti, S. Theil, N. Glover, V. Barbe, J. Daron, et al. 2014. Structural and functional partitioning of bread wheat chromosome 3B. Science 345:1249721. doi:10.1126/science.1249721
Choulet, F., T. Wicker, C. Rustenholz, E. Paux, J. Salse, P. Leroy, et al. 2010. Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22:1686–1701. doi:10.1105/tpc.110.074187
Clark, R.M., S. Tavare, and J. Doebley. 2005.
Estimating a nucleotide substitution rate for maize from polymorphism at a major domestication locus. Mol. Biol. Evol. 22:2304–2312. doi:10.1093/molbev/msi228
Daron, J., N. Glover, L. Pingault, S. Theil, V. Jamilloux, E. Paux, et al. 2014. Organization and evolution of transposable elements along the bread wheat chromosome 3B. Genome Biol. 15:546. doi:10.1186/s13059-014-0546-4
Edwards, K.J., A.L. Reid, J.A. Coghill, S.T. Berry, and G.L. Barker. 2009. Multiplex single nucleotide polymorphism (SNP)-based genotyping in allohexaploid wheat using padlock probes. Plant Biotechnol. J. 7:375–390. doi:10.1111/j.1467-7652.2009.00413.x
Eichten, S.R., N.A. Ellis, I. Makarevitch, C.-T. Yeh, J.I. Gent, L. Guo, et al. 2012. Spreading of heterochromatin is limited to specific families of maize retrotransposons. PLoS Genet. 8:e1003127. doi:10.1371/journal.pgen.1003127
Fu, Y.-B. 2014. Genetic diversity analysis of highly incomplete SNP genotype data with imputations: An empirical assessment. G3 (Bethesda) 4:891–900. doi:10.1534/g3.114.010942
Ganal, M.W., T. Altmann, and M.S. Röder. 2009. SNP identification in crop plants. Curr. Opin. Plant Biol. 12:211–217. doi:10.1016/j.pbi.2008.12.009
Ganal, M.W., A. Polley, E.M. Graner, J. Plieske, R. Wieseke, H. Luerssen, et al. 2012. Large SNP arrays for genotyping in crop plants. J. Biosci. 37:821–828. doi:10.1007/s12038-012-9225-3
Glover, N., J. Daron, L. Pingault, K. Vandepoele, E. Paux, C. Feuillet, et al. 2015. Small-scale gene duplications played a major role in the recent evolution of wheat chromosome 3B. Genome Biol. 16:188. doi:10.1186/s13059-015-0754-6
Gore, M.A., J.M. Chia, R.J. Elshire, Q. Sun, E.S. Ersoz, B.L. Hurwitz, et al. 2009. A first-generation haplotype map of maize. Science 326:1115–1117. doi:10.1126/science.1177837
Graner, A., H. Siedler, A. Jahoor, R.G. Herrmann, and G. Wenzel. 1990. Assessment of the degree and the type of restriction-fragment-length-polymorphism in barley (Hordeum vulgare). Theor. Appl.
Genet. 80:826–832. doi:10.1007/BF00224200
Hardenbol, P., F. Yu, J. Belmont, J. MacKenzie, C. Bruckner, T. Brundage, et al. 2005. Highly multiplexed molecular inversion probe genotyping: Over 10,000 targeted SNPs genotyped in a single tube assay. Genome Res. 15:269–275. doi:10.1101/gr.3185605
Hillier, L.W., G.T. Marth, A.R. Quinlan, D. Dooling, G. Fewell, D. Barnett, et al. 2008. Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5:183–188. doi:10.1038/nmeth.1179
Horvath, A., A. Didier, J. Koenig, F. Exbrayat, G. Charmet, and F. Balfourier. 2009. Analysis of diversity and linkage disequilibrium along chromosome 3B of bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 119:1523–1537. doi:10.1007/s00122-009-1153-8
International Wheat Genome Sequencing Consortium. 2014. A chromosome-based draft sequence of the hexaploid bread wheat genome. Science 345:1251788. doi:10.1126/science.1251788
Jordan, K., S. Wang, Y. Lun, L.-J. Gardiner, R. MacLachlan, P. Hucl, et al. 2015. A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome Biol. 16:48. doi:10.1186/s13059-015-0606-4
Kurtz, S., A. Narechania, J.C. Stein, and D. Ware. 2008. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics 9:517. doi:10.1186/1471-2164-9-517
Lai, K., C. Duran, P.J. Berkman, M.T. Lorenc, J. Stiller, S. Manoli, et al. 2012. Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol. J. 10:743–749. doi:10.1111/j.1467-7652.2012.00718.x
Lai, K., M.T. Lorenc, H.C. Lee, P.J. Berkman, P.E. Bayer, P. Visendi, et al. 2015. Identification and characterization of more than 4 million intervarietal SNPs across the group 7 chromosomes of bread wheat. Plant Biotechnol. J. 13:97–104. doi:10.1111/pbi.12240
Lam, H.-M., X. Xu, X. Liu, W. Chen, G. Yang, F.-L. Wong, et al. 2010.
Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat. Genet. 42:1053–1059. doi:10.1038/ng.715
Liu, L., Y. Li, S. Li, N. Hu, Y. He, R. Pong, et al. 2012. Comparison of next-generation sequencing systems. J. Biomed. Biotechnol. 2012:251364. doi:10.1155/2012/251364
Marth, G.T., I. Korf, M.D. Yandell, R.T. Yeh, Z. Gu, H. Zakeri, et al. 1999. A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23:452–456. doi:10.1038/70570
Martin, A., C. Troadec, A. Boualem, M. Rajab, R. Fernandez, H. Morin, et al. 2009. A transposon-induced epigenetic change leads to sex determination in melon. Nature 461:1135–1138. doi:10.1038/nature08498
Martinant, J.P., Y. Nicolas, A. Bouguennec, Y. Popineau, L. Saulnier, and G. Branlard. 1998. Relationships between mixograph parameters and indices of wheat grain quality. J. Cereal Sci. 27:179–189. doi:10.1006/jcrs.1997.0156
Michael, J.T. 2014. High-throughput SNP genotyping to accelerate crop improvement. Plant Breeding and Biotechnology 2:195–212. doi:10.9787/PBB.2014.2.3.195
Nielsen, R., S. Williamson, Y. Kim, M.J. Hubisz, A.G. Clark, and C. Bustamante. 2005. Genomic scans for selective sweeps using SNP data. Genome Res. 15:1566–1575. doi:10.1101/gr.4252305
Paux, E., S. Faure, F. Choulet, D. Roger, V. Gauthier, J.P. Martinant, et al. 2010. Insertion site-based polymorphism markers open new perspectives for genome saturation and marker-assisted selection in wheat. Plant Biotechnol. J. 8:196–210. doi:10.1111/j.1467-7652.2009.00477.x
Paux, E., D. Roger, E. Badaeva, G. Gay, M. Bernard, P. Sourdille, et al. 2006. Characterizing the composition and evolution of homoeologous genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 48:463–474. doi:10.1111/j.1365-313X.2006.02891.x
Paux, E., P. Sourdille, I. Mackay, and C. Feuillet. 2011.
Sequence-based marker development in wheat: Advances and applications to breeding. Biotechnol. Adv. 30:1071–1088. doi:10.1016/j.biotechadv.2011.09.015
Perrier, X., A. Flori, and F. Bonnot. 2003. Data analysis methods. In: P. Hamon, M. Seguin, X. Perrier, and J.-C. Glaszmann, editors, Genetic diversity of cultivated tropical plants. Enfield Science Publishers, Montpellier, VT. p. 43–76.
Perrier, X., and J.P. Jacquemoud-Collet. 2006. DARwin software. CIRAD (French Agricultural Research Centre for International Development) http://darwin.cirad.fr/darwin (accessed 2 Dec. 2015).
Pingault, L., F. Choulet, A. Alberti, N. Glover, P. Wincker, C. Feuillet, et al. 2015. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome. Genome Biol. 16:29. doi:10.1186/s13059-015-0601-9
Rabinowicz, P.D., R. Citek, M.A. Budiman, A. Nunberg, J.A. Bedell, N. Lakey, et al. 2005. Differential methylation of genes and repeats in land plants. Genome Res. 15:1431–1440. doi:10.1101/gr.4100405
Rafalski, A. 2002. Applications of single nucleotide polymorphisms in crop genetics. Curr. Opin. Plant Biol. 5:94–100. doi:10.1016/S1369-5266(02)00240-6
Rincent, R., L. Moreau, H. Monod, E. Kuhn, A.E. Melchinger, R.A. Malvar, et al. 2014. Recovering power in association mapping panels with variable levels of linkage disequilibrium. Genetics 197:375–387. doi:10.1534/genetics.113.159731
Saxena, R.K., D. Edwards, and R.K. Varshney. 2014. Structural variations in plant genomes. Brief. Funct. Gen. 13:296–307. doi:10.1093/bfgp/elu016
Schulman, A., A. Flavell, E. Paux, and T. Ellis. 2012. The application of LTR retrotransposons as molecular markers in plants. In: Y. Bigot, editor, Mobile genetic elements: Methods in molecular biology. Springer, New York. p. 115–153.
Strömberg, M. 2009. Mosaik assembler. Release 1.0. Boston College, Chestnut Hill, MA.
Wang, S., D. Wong, K. Forrest, A. Allen, S. Chao, B.E. Huang, et al. 2014.
Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array. Plant Biotechnol. J. 12:787–796. doi:10.1111/pbi.12183
Wicker, T., K. Mayer, H. Gundlach, M. Martis, B. Steuernagel, U. Scholz, et al. 2011. Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell 5:1706–1718. doi:10.1105/tpc.111.086629
Winfield, M.O., P.A. Wilkinson, A.M. Allen, G.L. Barker, J.A. Coghill, A. Burridge, et al. 2012. Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol. J. 10:733–742. doi:10.1111/j.1467-7652.2012.00713.x
Xu, X., X. Liu, S. Ge, J.D. Jensen, F. Hu, X. Li, et al. 2012. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 30:105–111. doi:10.1038/nbt.2050
Zimin, A., K.A. Stevens, M.W. Crepeau, A. Holtz-Morris, M. Koriabine, G. Marçais, et al. 2014. Sequencing and assembly of the 22-Gb loblolly pine genome. Genetics 196:875–890. doi:10.1534/genetics.113.159715