Published February 26, 2016
Original Research
Exploiting the Repetitive Fraction of the Wheat
Genome for High-Throughput Single-Nucleotide
Polymorphism Discovery and Genotyping
Nelly Cubizolles, Elodie Rey, Frédéric Choulet, Hélène Rimbert, Christel Laugier,
François Balfourier, Jacques Bordes, Charles Poncet, Peter Jack, Chris James,
Jan Gielen, Odile Argillier, Jean-Pierre Jaubertie, Jérôme Auzanneau, Antje Rohde,
Pieter B.F. Ouwerkerk, Viktor Korzun, Sonja Kollers, Laurent Guerreiro,
Delphine Hourcade, Olivier Robert, Pierre Devaux, Anna-Maria Mastrangelo,
Catherine Feuillet, Pierre Sourdille, and Etienne Paux*
Abstract
Transposable elements (TEs) account for more than 80% of the
wheat genome. Although they represent a major obstacle for
genomic studies, TEs are also a source of polymorphism and
consequently of molecular markers such as insertion site-based
polymorphism (ISBP) markers. Insertion site-based polymorphisms
have been found to be a great source of genome-specific single-nucleotide polymorphisms (SNPs) in the hexaploid wheat (Triticum
aestivum L.) genome. Here, we report on the development of a
high-throughput SNP discovery approach based on sequence
capture of ISBP markers. By applying this approach to the reference sequence of chromosome 3B from hexaploid wheat, we
designed 39,077 SNPs that are evenly distributed along the
chromosome. We demonstrate that these SNPs can be efficiently
scored with the KASPar (Kompetitive allele-specific polymerase
chain reaction) genotyping technology. Finally, through genetic
diversity and genome-wide association studies, we also demonstrate that ISBP-derived SNPs can be used in marker-assisted
breeding programs.
Published in The Plant Genome 9
doi: 10.3835/plantgenome2015.09.0078
© Crop Science Society of America
5585 Guilford Rd., Madison, WI 53711 USA
This is an open access article distributed under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
The Plant Genome, March 2016, Vol. 9, No. 1
N. Cubizolles, E. Rey, F. Choulet, H. Rimbert, C. Laugier, F. Balfourier,
J. Bordes, C. Poncet, C. Feuillet, P. Sourdille, and E. Paux, Institut
National de la Recherche Agronomique, Joint Research Unit 1095
Génétique, Diversité et Ecophysiologie des Céréales, 5 Chemin de
Beaulieu, F-63039 Clermont-Ferrand, France; N. Cubizolles, E. Rey,
F. Choulet, H. Rimbert, C. Laugier, F. Balfourier, J. Bordes, C. Poncet,
C. Feuillet, P. Sourdille, and E. Paux, Université Blaise Pascal, UMR
1095 Génétique, Diversité et Ecophysiologie des Céréales, 5 Chemin de Beaulieu, F-63177 Aubière Cedex, France; P. Jack and C.
James, RAGT Seeds Ltd., Grange Rd., Ickleton, Essex, CB10 1TA, UK;
J. Gielen and O. Argillier, Syngenta France, 12 Chemin de l’Hobit,
F-31790 Saint Sauveur, France; J.-P. Jaubertie and J. Auzanneau,
Agri-Obtentions, Chemin de la Petite Minière, F-78280 Guyancourt,
France; A. Rohde and P.B.F. Ouwerkerk, B. CropScience NV, Innovation Center Technologiepark 38, B-9052 Ghent, Belgium; V. Korzun
and S. Kollers, KWS LOCHOW GMBH, Ferdinand-von-Lochow-Straße 5, D-29303 Bergen, Germany; L. Guerreiro, ARVALIS – Institut
du Végétal, Station Expérimentale, F-91720 Boigneville, France; D.
Hourcade, ARVALIS – Institut du Végétal, 6 Chemin de la Côté Vieille,
F-31450 Baziège, France; O. Robert and P. Devaux, Florimond Desprez, BP 41, F-59242 Cappelle-en-Pévèle, France; A.-M.
Mastrangelo, Council for Agricultural Research and Economics – Cereal Research Centre, S.S. 673 km 25.200, I-71122 Foggia, Italy;
N. Cubizolles and C. Laugier, current address: Biogemma, Route
d’Ennezat, F-63720 Chappes, France; E. Rey, current address: Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371 Olomouc,
Czech Republic; L. Guerreiro, current address: RAGT 2n, Rue Emile
Singla, BP 3331, F-12033 Rodez, France; and C. Feuillet, current
address: Bayer CropScience, 3500 Paramount Pkwy., Morrisville,
NC 27560, USA. Received 1 Sept. 2015. Accepted 20 Oct. 2015.
*Corresponding author ([email protected]).
Abbreviations: DArT, diversity array technology; GC, guanine–
cytosine; INRA, Institut National de la Recherche Agronomique (French
National Institute for Agricultural Research); ISBP, insertion site-based
polymorphism; ISBP-SNP, ISBP-derived single-nucleotide polymorphism;
KASPar, Kompetitive allele-specific polymerase chain reaction;
LOD, logarithm of the odds; PCR, polymerase chain reaction; PIC,
polymorphism index content; SNP, single-nucleotide polymorphism;
SSR, simple sequence repeats; TE, transposable elements.
In the past decade, SNPs have emerged as the marker
of choice in genetics because of their abundance and
high-throughput detection capacities. Single-nucleotide
polymorphisms are commonly used for a wide range of
applications in plants, including generation of high-density genetic maps, physical map or sequence anchoring
and ordering, quantitative trait locus mapping, genome-wide association studies, map-based gene cloning, characterization of genetic resources, and marker-assisted
breeding and genomic selection (Ganal et al., 2009, 2012).
Unlike other types of markers such as simple sequence
repeats (SSRs) or diversity array technology (DArTs),
SNPs require the comparison of homologous sequences
between genotypes to identify variations at the sequence
level. The recent advent of next-generation sequencing
systems (Liu et al., 2012) has opened the way to whole-genome resequencing of several plant species for SNP
discovery or direct genotyping. Although this approach
is affordable for plants with small to moderate genome
sizes such as Arabidopsis thaliana (L.) Heynh. (Cao et al.,
2011), rice (Oryza sativa L.) (Xu et al., 2012), or soybean
[Glycine max (L.) Merr.] (Lam et al., 2010), whole-genome
resequencing remains too expensive for plants with large
genomes such as bread wheat (17,000 Mb; International
Wheat Genome Sequencing Consortium, 2014) or loblolly pine (Pinus taeda L.) (22,000 Mb; Zimin et al., 2014)
and therefore cannot be applied to a large set of lines. For
such species, complexity reduction approaches can be an
alternative for efficiently and reproducibly sampling the
same sequence fraction between genotypes (Paux et al.,
2011). Two types of approaches, “random” and “targeted”,
can be used to reduce genome complexity. Random
approaches include transcriptome resequencing, methyl-filtration, renaturation-based (high-Cot) filtration, or
restriction enzyme-based techniques such as complexity
reduction of polymorphic sequences. Targeted approaches
include polymerase chain reaction (PCR) amplification
of similar targets in different cultivars or hybridization
using oligonucleotide capture technologies. Most of these
approaches rely on the filtration of the repetitive fraction
of the genome to reduce complexity. Even whole-genome
resequencing approaches tend to focus on low-copy DNA
regions as repeat-originating reads can hardly be mapped
to their correct locus in the genome. However, because
they are less subject to selection pressure, TEs can be a
large source of polymorphism.
In wheat, TEs account for more than 80% of the
genome size (Choulet et al., 2010; Wicker et al., 2011).
Over the past few years, tremendous efforts in SNP discovery have been made in wheat, mainly targeting the
coding fraction of the genome. Various approaches have
been used, including cDNA sequencing (Allen et al.,
2011), targeted resequencing of the wheat exome (Allen et
al., 2013; Winfield et al., 2012), transcriptome sequencing
(Lai et al., 2012), and whole-genome resequencing (Lai et
al., 2015; International Wheat Genome Sequencing Consortium, 2014). These projects have led to the discovery of
several millions of SNPs. Concomitant to SNP discovery,
several technologies have been implemented for SNP
genotyping in hexaploid wheat, including KASPar technology (Allen et al., 2011, 2013), Illumina GoldenGate
(Akhunov et al., 2009), and Illumina iSelect (Cavanagh et
al., 2013; Wang et al., 2014).
However, the allohexaploid nature of the wheat
genome, combined with its high level of gene duplication (Choulet et al., 2014), poses significant challenges
(Akhunov et al., 2009). Indeed, during the SNP discovery
process, true allelic SNPs have to be discriminated from
homoeologous and paralogous SNPs. In addition, the
presence of homoeologous or paralogous loci creates a
shift of genotype cluster plots compared to their position
in diploid species (Wang et al., 2014).
Access to unique loci might help to circumvent these
challenges, both in terms of SNP discovery and SNP
genotyping. Insertion site-based polymorphism markers have been shown to be a very efficient and straightforward way to design unique genome-specific markers
(Paux et al., 2006). Insertion site-based polymorphism
markers were initially described as PCR-based markers
for the amplification of a DNA fragment spanning a junction between a TE and its flanking sequence (another TE
or a low-copy sequence). They represent a way to access
the repetitive fraction of the wheat genome that has
not been exploited for SNP discovery so far. Indeed, we
recently demonstrated the potential of ISBP markers to
saturate the wheat genome efficiently and to be used in
marker-assisted selection (Paux et al., 2010). With average
densities of one ISBP per 5.4 kb in the hexaploid wheat
genome and one SNP every ~100 bp in ISBPs, we estimated that about six million genome-specific SNPs could
be designed from ISBPs in the whole wheat genome.
However, the full potential of ISBPs for SNP discovery can be achieved only with high-throughput technologies that allow for the efficient resequencing of several
thousand ISBPs in different cultivars. So far, SNP discovery within ISBPs has been done through PCR amplification, which is not compatible with high-throughput
approaches. Here, we report on the development of a
high-throughput ISBP-derived SNP (ISBP-SNP) discovery approach. We also demonstrate the efficiency of the
combination of ISBP-SNP and KASPar genotyping and
its potential application to wheat breeding.
Material and Methods
Plant Material and DNA Extraction
Ten wheat cultivars and lines were selected for SNP discovery: ‘Apache’, ‘Autan’, ‘Aztec’, ‘Cezanne’, ‘Chinese Spring’,
‘INRA3’, ‘Premio’, ‘Renan’, ‘Robigus’, and ‘Xi19’. The elite
panel consisted of a set of 93 European elite varieties,
which had mainly been registered during the last decades
in France, the United Kingdom, and Germany (Supplemental Table S1). The Institut National de la Recherche
Agronomique (INRA; French National Institute for Agricultural Research) worldwide bread wheat core collection
is made up of 367 accessions that capture more than 98%
of the total allelic diversity observed at 38 polymorphic
SSR loci in a sample of 4000 bread wheat accessions (Balfourier et al., 2007) (Supplemental Table S1). Accessions
within the latter collection originate from 70 different
countries and include both landraces from the 19th century and old or modern cultivars from the 20th century.
DNA was extracted as described in Graner et al. (1990).
IsbpProbeDesign
The IsbpProbeDesign program is written in Perl. It uses
data generated by IsbpFinder (Paux et al., 2010) and
retrieves only the ISBP markers that correspond to a
junction between a TE and a low-copy sequence. It then
uses Tallymer (Kurtz et al., 2008) to mask k-mers that
are repeated x times or more at the whole genome level.
Finally, it extracts subsequences of unmasked nucleotides
in the low-copy part of the ISBP marker flanking the TE
sequence. Parameters such as k-mer length, number of
k-mer occurrences in the Tallymer index, and subsequence
length are customizable. In this study, the Tallymer index
was computed on a 1× genome coverage sampled in the
reads produced on the cultivar Chinese Spring (Brenchley
et al., 2012). K-mer length was set to 20, the number of
occurrences to 10, and the subsequence length to 120 nt.
IsbpProbeDesign is freely available on request.
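The masking and extraction steps described above can be illustrated with a minimal Python sketch. It is not the IsbpProbeDesign program (which is written in Perl) and it does not use Tallymer: the k-mer index is replaced by an in-memory counter, the function names are hypothetical, and the position of the TE junction is ignored for simplicity. The defaults mirror the parameters reported in this study (k = 20, 10 occurrences, 120 nt).

```python
from collections import Counter


def count_kmers(reads, k=20):
    """Count every k-mer in a set of reads (a stand-in for a Tallymer index)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(seq, counts, k=20, max_occ=10):
    """Return one boolean per base; True marks bases covered by a k-mer
    that occurs max_occ times or more in the index."""
    masked = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= max_occ:
            for j in range(i, i + k):
                masked[j] = True
    return masked


def first_unmasked_window(masked, length=120):
    """Return the start of the first run of `length` unmasked bases, or None."""
    run = 0
    for i, is_masked in enumerate(masked):
        run = 0 if is_masked else run + 1
        if run == length:
            return i - length + 1
    return None


# Toy example with short parameters so that the behaviour is easy to follow.
toy_counts = count_kmers(["AAAAAAAA", "CGTACGGT"], k=4)
toy_mask = mask_repeats("AAAAACGTACGGT", toy_counts, k=4, max_occ=3)
toy_start = first_unmasked_window(toy_mask, length=6)
```

In the toy example, the poly-A stretch is masked and the first 6-nt unmasked window starts at position 5 (0-based).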
Insertion Site-based Polymorphism
Sequence Capture
Sequence capture was performed using the SureSelect
XT Target Enrichment system for Illumina Paired-end
Sequencing Library (Agilent, Santa Clara, CA). A total
of 52,265 120-nt baits were designed to target 6272 kb of
sequence. For each wheat accession, 3 µg of genomic DNA
dissolved in 100 µL of 1× TE buffer was fragmented to
an average size of 150 to 200 bp on a Covaris S2 (Covaris
Inc., Woburn, MA) using the following parameters: duty
factor, 10%; intensity, 5; cycles per burst, 200; time, 180 s;
set mode, frequency sweeping; temperature, 4°C. Fragment sizes were checked on 2% agarose gel. The following
steps were performed according to the standard Agilent
protocol. Libraries were tagged during the post-capture
ligation-mediated PCR step, using Index 2, 4, 5, 6 or 7
(two samples per index). Following DNA quantitation,
two equimolar pools of five samples carrying five different index sequences were prepared. Each pool was
sequenced in paired-end 2× 100-bp reads on an Illumina
HiSeq2000 lane (Illumina Inc., San Diego, CA).
Sequence Alignment and SNP Discovery
Read mapping was performed using the Mosaik assembler (Strömberg, 2009). MosaikBuild was used with
FASTQ mate files for each accession. A multi-FASTA file
containing 52,265 ISBP sequences was used as a reference. Reads were aligned with MosaikAligner using the
following criteria: maximum number of mismatches, 3;
alignment candidate threshold, 60; minimum percentage of the read length that should be aligned, 0.6; use of
aligned read length when counting errors. MosaikSort,
MosaikMerge, and MosaikAssembler were then used to
generate a single assembly file. This file was then submitted to the GigaBayes package (Hillier et al., 2008;
Marth et al., 1999) for SNP discovery using the following
parameters: ploidy, diploid; sample, multiple; SNP lower
probability threshold for reporting polymorphism candidates, 1; output detail level for candidate polymorphism
report, 4; minimum total read coverage for position to
be considered, 20. GigaBayes results were then processed
with a Perl script to classify SNPs into four different
classes. These classes were based on the homozygous or
heterozygous nature of the SNP in the selected panel. A
SNP was considered as heterozygous if more than 10% of
the reads from a given accession showed a variation in its
sequence. Four different classes were defined: (i) the locus
showed only homozygous AA and BB alleles among all
10 lines, (ii) the locus showed homozygous AA and BB
and heterozygous AB alleles, (iii) the locus showed only
homozygous AA and heterozygous AB alleles, and (iv)
the locus was heterozygous AB in all 10 lines. To be conservative, SNPs were considered only if the position was
covered by at least 20 reads (from 10 lines).
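The classification rules above can be sketched in Python as follows. This is an illustration only, not the Perl script used in the study. It assumes that each line is summarized by a pair of read counts for the two alleles, that "more than 10% of the reads" refers to the minor-allele read fraction within a line, and that the 20-read minimum applies to the sum over all lines. The function names are hypothetical.

```python
def call_genotype(a_reads, b_reads, het_threshold=0.10):
    """Call AA, BB, or AB for one line from its allele read counts.
    Assumption: a line is heterozygous when the minor-allele read
    fraction exceeds het_threshold."""
    total = a_reads + b_reads
    if total == 0:
        return None  # locus not covered in this line
    if min(a_reads, b_reads) / total > het_threshold:
        return "AB"
    return "AA" if a_reads >= b_reads else "BB"


def classify_snp(read_counts, min_coverage=20):
    """Assign a SNP to Class 1-4 from per-line (a_reads, b_reads) pairs.
    Returns None when coverage is too low or no polymorphism is seen."""
    if sum(a + b for a, b in read_counts) < min_coverage:
        return None
    observed = {call_genotype(a, b) for a, b in read_counts} - {None}
    homozygous = observed - {"AB"}
    has_het = "AB" in observed
    if len(homozygous) == 2:
        return 2 if has_het else 1   # AA and BB, with or without AB
    if len(homozygous) == 1 and has_het:
        return 3                     # one homozygous state plus AB
    if not homozygous and has_het:
        return 4                     # AB in every covered line
    return None                      # monomorphic
```

For example, `classify_snp([(30, 0), (0, 25), (28, 1)])` yields Class 1, because the third line carries the alternative allele in fewer than 10% of its reads and is therefore called homozygous.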
KASPar Genotyping
Single-nucleotide polymorphism context sequences were
submitted to LGC Genomics (Teddington, UK) for KASPar assay design. In addition to the SNP itself, the position of the junction was identified on the sequence as an
x. KASPar assays were designed so that the allele-specific
primers were on the SNP itself and the locus-specific
primer was on the other side of the boundary between
the TE and the low-copy sequence. Genotyping was conducted at LGC Genomics following their own protocol.
Variability Estimation and Diversity Analysis
Variability for each locus was measured using the polymorphism index content (PIC) (Anderson et al., 1993):
$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^{2}.$$
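As an illustration of this index, the short Python sketch below computes it for one locus, assuming (as in the usual definition) that p_i is the frequency of the i-th allele among the n alleles observed at the locus and that missing calls are simply ignored. The function name and the input format are hypothetical.

```python
from collections import Counter


def pic(alleles):
    """PIC = 1 - sum(p_i ** 2) over the alleles observed at one locus.
    `alleles` holds one allele call per line; None marks missing data."""
    observed = [a for a in alleles if a is not None]
    if not observed:
        return None
    n = len(observed)
    return 1.0 - sum((count / n) ** 2 for count in Counter(observed).values())


balanced = pic(["A"] * 5 + ["B"] * 5)   # two alleles at equal frequency
skewed = pic(["A"] * 9 + ["B"])         # one rare allele among 10 lines
```

For a biallelic SNP the index cannot exceed 0.5, which is reached when both alleles are equally frequent; a 9:1 split among 10 lines gives 0.18.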
Genotyping data were used to describe diversity within
both the elite and diversity panels. For each sample set,
a dissimilarity matrix was built using simple matching
coefficient between each pair of accessions. For the elite
panel, a Neighbor-Joining tree was created using this
dissimilarity matrix and 1000 bootstrapped trees were
calculated to assess the uncertainty of the tree structure.
For the INRA core collection, dissimilarities were calculated between individuals and the geographical structure of the wheat germplasm diversity was analyzed by
a Ward dendrogram between accessions. Data analyses
were conducted using DARwin software (Perrier et al.,
2003; Perrier and Jacquemoud-Collet, 2006).
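The dissimilarity underlying both trees can be sketched as follows. The analyses were run in DARwin; the Python code below is only a minimal illustration of a simple matching dissimilarity, and it assumes that markers with missing data in either accession are skipped and that a heterozygous call counts as a distinct state. DARwin may treat these cases differently. The function names are hypothetical.

```python
def simple_matching_dissimilarity(calls_1, calls_2):
    """Fraction of markers at which two accessions differ,
    computed over the markers scored in both (None = missing)."""
    compared = 0
    mismatches = 0
    for a, b in zip(calls_1, calls_2):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else None


def dissimilarity_matrix(genotypes):
    """Build a {(name_1, name_2): dissimilarity} table from a
    {name: [genotype calls]} dictionary."""
    names = list(genotypes)
    return {
        (x, y): simple_matching_dissimilarity(genotypes[x], genotypes[y])
        for x in names
        for y in names
    }


toy = {
    "line1": ["AA", "BB", "AA", None],
    "line2": ["AA", "AA", "AA", "BB"],
    "line3": ["BB", "BB", "AA", "BB"],
}
toy_matrix = dissimilarity_matrix(toy)
```

The resulting table can then be passed to any tree-building method (Neighbor-Joining or Ward clustering in this study).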
Association Studies
Dough consistency was estimated in 2005 and 2006 from
the mixograph test as described in Bordes et al. (2011).
Cubizolles et al.: Mining SNPs in the Repetitive Fraction of the Wheat Genome
Accessions of the INRA core collection were grown in
2005 and 2006 in 10-m² plots in a complete randomized block design at the INRA plant breeding station
in Clermont-Ferrand (France) using regional farming
methods, which include applying appropriate fertilizers
and pesticides. To test dough quality, white flour was
obtained with the Chopin Dubois CD1 mill (Tripette &
Renaud, Villeneuve la Garenne, France), which gives a
70% extraction rate. Mixograph tests were assessed on 10
g of flour. Mixograph parameters were estimated from the
height and width of the mixograph curve as described by
Martinant et al. (1998). The average of the curve heights
measured at peak time, peak time plus 2 min, and peak
time plus 8 min was used as a measure of the dough consistency (mixing consistency). Association between markers and mixing consistency was tested for the mean of the
2 yr with the TASSEL version 5.0 software (Bradbury et
al., 2007) using the mixed linear model (MLM-Q-K) with
controls for population structure (Q-matrix) and kinship
between accessions (K-matrix). The kinship matrix was
estimated with 527 of the 554 DArT markers described
in Bordes et al. (2011) located on all chromosomes except
chromosome 3B, according to the method proposed by
Rincent et al. (2014). The strength of association of a marker
with mixing consistency is represented by a logarithm
of the odds (LOD) score, computed as -log10(P), where P is
the P value of the test of association between the SNP and the trait.
Markers were considered associated when the LOD score
reached 3 or 4, the significance and high-significance
thresholds, respectively.
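The scoring rule can be written as a short Python sketch. It assumes that the LOD score is the negative base-10 logarithm of the association P value, the reading under which the thresholds of 3 and 4 correspond to P = 0.001 and P = 0.0001. The function names are hypothetical and are not part of TASSEL.

```python
import math


def lod_score(p_value):
    """LOD score taken as -log10 of the association P value (assumption)."""
    return -math.log10(p_value)


def significance(lod, significant=3.0, highly_significant=4.0):
    """Label a marker according to the two LOD thresholds used in this study."""
    if lod >= highly_significant:
        return "highly significant"
    if lod >= significant:
        return "significant"
    return "not significant"


label = significance(lod_score(2e-4))  # LOD is about 3.7
```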
Results and Discussion
Transposable Elements are a Rich Source
of SNPs in Wheat
IsbpFinder (Paux et al., 2010) was used to mine for ISBP
markers in 16,159 chromosome 3B scaffolds (Choulet et al., 2014). Overall, a total of 410,124 ISBPs were
found, including 178,535 high-confidence, 160,473
medium-confidence, and 71,116 low-confidence markers
(Supplemental Table S2). To develop a high-throughput
approach, we investigated the use of target enrichment strategies for efficient and high-throughput ISBP
sequence capture. Our main problem lay in the repetitive nature of these markers. Indeed, the use of a repeat
as bait would have led to the nonspecific capture of all the
repeats from the same family. To overcome this limitation, we focused on a specific subset of ISBP markers
corresponding to the insertion of a TE in a low-copy
sequence (either coding or noncoding). Out of 410,124
ISBPs, 184,452 corresponded to a junction between a TE
and a low-copy sequence (58,676 high-confidence and
125,776 medium-confidence). A de novo repeat detection
approach based on k-mer frequency was further applied
to identify and mask any potential repeated stretches
of DNA in the low-copy sequence. The purpose was to
ensure that the template used for bait design was free
from repeated DNA (to prevent cross-hybridization).
We finally designed 120-nt baits in the unrepeated DNA
flanking a TE for target enrichment. This process was
fully automated in a program called IsbpProbeDesign
(see the Material and Methods section). Eventually,
57,705 120-nt baits were identified in the 184,452 TE
low-copy ISBPs. These baits were filtered according to
their guanine–cytosine (GC) content to avoid the presence of too GC-poor or GC-rich sequences in the capture
kit. Indeed, GC-poor or GC-rich sequences are known
to affect both PCR amplification and capture efficiency
(Asan et al., 2011; Chilamakuri et al., 2014). Baits with a
GC content ranging from 33.3 to 70.0% were submitted
to Agilent for the construction of a SureSelect sequence
capture kit. In contrast to most of the previously
reported studies that used tiling designs with several
overlapping or nonoverlapping baits to capture a large
contiguous region, our design targeted 52,265 discrete
loci (average length = 221 bp) with one single bait per
locus (Supplemental Table S3). The total capture space
was 6272 kb (52,265 × 120 bp), corresponding to a targeted space of 11,556 kb (52,265 × 221 bp).
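The GC filter and the capture-space arithmetic can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the bounds of 33.3 and 70.0% are taken as inclusive, and ambiguous bases are not handled.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def filter_baits(baits, gc_min=33.3, gc_max=70.0):
    """Keep baits whose GC content lies within [gc_min, gc_max] percent."""
    return [bait for bait in baits if gc_min <= gc_percent(bait) <= gc_max]


# Capture space: one 120-nt bait per targeted locus.
n_baits = 52_265
bait_length_nt = 120
capture_space_kb = n_baits * bait_length_nt / 1000  # 6271.8, reported as 6272 kb

kept = filter_baits(["ATATAT", "ATGCGC", "GGGCCC"])  # toy 6-nt "baits"
```

For a 120-nt bait, these bounds correspond to between 40 and 84 G or C bases.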
The corresponding SureSelect kit was used for target
enrichment in 10 hexaploid wheat cultivars and lines.
These lines correspond to European elite varieties, except
for Chinese Spring, which is the reference cultivar used
for genome sequencing (Choulet et al., 2014; International Wheat Genome Sequencing Consortium, 2014)
and was used as control.
Out of the 52,265 targeted ISBPs, 51,437 (98.4%)
were efficiently captured in at least one of the 10 lines.
The vast majority (71.3%) of ISBPs were captured in all
lines. On average, 87.2 ± 0.8% of the 51,437 ISBPs were
captured in a given accession, except for Chinese Spring,
in which 95.4% were efficiently captured. This demonstrates the reliability of this approach and, in addition,
suggests that the absence of a targeted ISBP in a given
line might reflect the absence of the corresponding locus
in its genome. The 828 loci that were not captured in any
accession were clearly biased toward GC-rich sequences,
with 51.4% of the baits having a GC content greater
than 60% (compared to 10.7% for all baits and 9.7% for
captured ones). This result was not unexpected, as it has
been shown that the optimal GC content for sequence
capture baits was 40 to 60% (Asan et al., 2011).
A total of 49,836 SNPs were identified in 18,134
ISBPs (Supplemental Table S4). These SNPs were classified into four categories: (i) 33,220 (66.7%) loci showing
only homozygous AA and BB alleles among all 10 lines,
(ii) 5857 (11.8%) loci with homozygous AA and BB and
heterozygous AB alleles, (iii) 8858 (17.8%) with only
homozygous AA and heterozygous AB alleles, and (iv)
1901 (3.8%) with only heterozygous AB in all 10 lines.
The majority of these SNPs (69.7%) corresponded to
transitions and 30.3% to transversions. This ratio is consistent with previous studies and was expected, as the high
methylation level in TEs leads to an increase in mutation
frequency at deaminated sites (Clark et al., 2005; Paux et
al., 2010; Rabinowicz et al., 2005).
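The transition and transversion proportions can be tallied as in the Python sketch below, which classifies a substitution by whether both alleles belong to the same base class (purine or pyrimidine). The function names and the input format, a list of (allele 1, allele 2) pairs, are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_type(allele_1, allele_2):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine) for a biallelic SNP."""
    pair = {allele_1.upper(), allele_2.upper()}
    if len(pair) != 2 or not pair <= NUCLEOTIDES:
        raise ValueError("expected two different bases among A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def transition_percent(snps):
    """Percentage of transitions in a list of (allele_1, allele_2) pairs."""
    n_transitions = sum(substitution_type(a, b) == "transition" for a, b in snps)
    return 100.0 * n_transitions / len(snps)


toy_percent = transition_percent([("C", "T"), ("A", "G"), ("A", "C"), ("G", "T")])
```

C/T and G/A substitutions, the expected outcome of deamination at methylated cytosines, fall in the transition class.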
For subsequent analyses, only Class 1 (AA and BB)
and Class 2 (AA, AB, and BB) SNPs were selected. These
39,077 SNPs were identified in 16,556 ISBP markers (Supplemental Table S5). The number of SNPs per ISBP ranged
from 1 to 12, with an average of 2.36 ± 1.58, corresponding to a density of one SNP every 98 bp, which was highly
similar to our previous estimate of one SNP every 99 bp
(Paux et al., 2010). This density was found to be comparable between the TE and low-copy sides of ISBPs (one
SNP every 95 bp and one every 100 bp, respectively), suggesting that if TE methylation is responsible for the higher
mutation rate of ISBPs compared to genes, methylation is
likely to spread to neighboring regions, as has already been
reported in other studies (for examples, see Martin et al.,
2009; Eichten et al., 2012 and the references therein).
In hexaploid wheat, the B genome is known to be the
most polymorphic, accounting for ~47.5% of all polymorphisms, compared to 40.8 and 11.7% for the A and
D genomes, respectively (International Wheat Genome
Sequencing Consortium, 2014). Based on these proportions, one can extrapolate that this approach would lead
to the discovery of more than 400,000 ISBP-SNPs in the
whole wheat genome. As this number is relatively low
compared to whole-genome resequencing approaches
(Lai et al., 2015; International Wheat Genome Sequencing Consortium, 2014), this complexity reduction
approach allows for SNPs to be mined in a larger set of
lines, thus reducing ascertainment bias.
The PIC values ranged from 0.18 to 0.50, with an
average of 0.34, which is slightly higher than the average
PIC value observed for transcript-specific SNPs in a set of
23 wheat varieties representing a broad cross-section of
wheat germplasm (Allen et al., 2011). In addition, SNPs
located on the same ISBP can be used to define haplotypes (Rafalski, 2002). To assess the number of haplotypes
in our dataset, we focused on a subset of 25,832 Class 1
SNPs without missing data. The number of SNPs per ISBP
ranged from 1 to 11. Among the 6830 multi-SNP ISBPs,
the number of haplotypes ranged from two to seven and
was highly correlated with the SNP number (R = 0.977,
p = 10⁻⁷). When haplotypes were considered, PIC values
ranged from 0.18 to 0.82, with an average of 0.42 ± 0.15.
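The gain obtained by treating the SNPs of one ISBP as a haplotype can be illustrated with the Python sketch below. It assumes homozygous calls without missing data (as for the Class 1 subset used here) and represents each line by a tuple of alleles, one per SNP of the ISBP. The function name and the toy data are hypothetical.

```python
from collections import Counter


def haplotype_pic(calls_by_line):
    """Return (number of haplotypes, PIC) for one ISBP.
    `calls_by_line` maps each line to a tuple of alleles, one per SNP."""
    haplotypes = Counter(calls_by_line.values())
    n_lines = sum(haplotypes.values())
    pic = 1.0 - sum((count / n_lines) ** 2 for count in haplotypes.values())
    return len(haplotypes), pic


# Toy ISBP carrying two SNPs, scored in 10 lines.
toy_isbp = {}
for i in range(5):
    toy_isbp[f"line{i}"] = ("A", "C")
for i in range(5, 8):
    toy_isbp[f"line{i}"] = ("A", "T")
for i in range(8, 10):
    toy_isbp[f"line{i}"] = ("G", "T")

n_haplotypes, pic_value = haplotype_pic(toy_isbp)
```

In this toy case, the two SNPs taken separately have PIC values of 0.32 and 0.50, whereas the three haplotypes they define give 0.62, which shows how haplotypes can exceed the 0.50 ceiling of a biallelic marker.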
Insertion Site-based Polymorphism SNPs are
Amenable to KASPar Genotyping
To validate our SNP discovery process and assess the
compatibility of ISBP-SNPs with genotyping technologies, a set of 3300 markers that were evenly distributed
on chromosome 3B scaffolds was submitted to KBioscience for the synthesis of KASPar assays. To keep the
genome-specificity of ISBP markers, the design was done
so that the allele-specific primer was located on the SNP
but the locus-specific primer was designed on the other
side of the boundary between the TE and the low-copy
sequence. Eventually, 3284 (99.5%) KASPar assays were
successfully designed and used to genotype a panel of
460 wheat lines (Supplemental Table S6). These lines
comprised 93 European elite cultivars (the elite panel)
and the INRA core collection (Balfourier et al., 2007).
Genotyping data on the elite panel and the INRA
core collection were used to calculate performance
metrics commonly applied to SNP genotyping, namely
conversion rate, call rate, completeness, and accuracy
(Hardenbol et al., 2005). Out of the 3284 KASPar assays
synthesized, 2842 could be examined, corresponding to
a “conversion rate” of 86.6%. This rate is higher than the
one observed for gene-derived SNPs using the KASPar
technology (Allen et al., 2011). This difference is probably caused by the hexaploid nature of the wheat genome
in which the presence of homoeologous target sites can
result in genotype call clustering failure (Akhunov et
al., 2009). Here, because of the uniqueness of the ISBPs
as well as the way the KASPar probes were designed,
most of the assays behaved in a diploid manner, with
homozygous clusters clearly separated on each side of
the diagonal defined by the heterozygous cluster. The
overall heterozygosity rate was found to be lower than
1%, as expected for homozygous wheat lines, with only
3% of the accessions having a heterozygosity rate greater
than 3%. The number of missing data per genotype
ranged from 17 to 1058, with only 44 out of 460 lines
(9.6%) having more than 10% missing data. This can be
extrapolated as a 90.5% “call rate” over the whole analysis. Finally, the percentage of total genotypes returned
across all markers, defined as the completeness, was
94.6%, with 5.4% missing data. However, it is worth noting that, as previously stated, these missing data might
reflect the fact that the corresponding loci are absent
from the genome of some lines. In this case, it might be
interesting to consider these data as presence or absence
variations, which are known to occur frequently in plant
genomes, especially in the noncoding repetitive fraction
where TE insertions are not necessarily conserved across
all accessions (Saxena et al., 2014). To assess the accuracy
of KASPar results, the genotyping data were compared
with the sequencing data of 7 out of the 10 lines used for
SNP discovery. In total, 94.8% of the two sets were consistent. Most of the discrepancies (3.3% out of 5.2%) were
caused by missing data either in the sequencing or in the
genotyping results. An additional 1.7% of discrepancies
was caused by genotypes that were called heterozygous
by one technique and homozygous by the other one. The
remaining 0.2% corresponded to genotypes with two different homozygous alleles with the two techniques. Interestingly, out of the 519 SureSelect missing data, almost
one half (243) were confirmed by KASPar data, reinforcing the hypothesis that these missing data might be true
presence or absence variations.
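The metrics discussed in this section can be computed along the lines of the Python sketch below. It is an illustration on toy data rather than the procedure used by the authors: the function names are hypothetical, missing calls are encoded as None, and any marker missing in either dataset is counted in a separate "missing" category of the comparison.

```python
from collections import Counter


def conversion_rate(n_scorable, n_synthesized):
    """Percentage of synthesized assays that give interpretable clusters."""
    return 100.0 * n_scorable / n_synthesized


def completeness(calls):
    """Percentage of genotypes returned across all lines and markers.
    `calls` maps each line to a list of genotype calls (None = missing)."""
    total = sum(len(line_calls) for line_calls in calls.values())
    returned = sum(c is not None for line_calls in calls.values() for c in line_calls)
    return 100.0 * returned / total


def compare_calls(genotyping, sequencing):
    """Tally the agreement between two call sets for the same line."""
    tally = Counter()
    for g, s in zip(genotyping, sequencing):
        if g is None or s is None:
            tally["missing"] += 1
        elif g == s:
            tally["consistent"] += 1
        elif "AB" in (g, s):
            tally["het_vs_hom"] += 1   # heterozygous in one, homozygous in the other
        else:
            tally["hom_vs_hom"] += 1   # opposite homozygous alleles
    return tally


toy_tally = compare_calls(
    ["AA", "BB", "AA", None, "AB", "AA"],
    ["AA", "BB", "BB", "AA", "AA", "AA"],
)
```

Separating the three sources of discrepancy mirrors the breakdown reported above (missing data, heterozygous versus homozygous calls, and conflicting homozygous calls).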
Over the past few years, KASPar technology has
become one of the most widely used technologies in plant
breeding because of its flexibility, relatively low cost and
high throughput (Michael, 2014). However, it is worth
noting that ISBP-SNPs have already been proven to be
compatible with other genotyping technologies such as
melting curve analysis, the Life Technologies SNaPshot
Multiplex System, and the Illumina BeadArray technology (Paux et al., 2010), as well as high-resolution melting
analysis (Schulman et al., 2012), Illumina iSelect, and
Affymetrix Axiom (unpublished data). In addition, one
could consider using the ISBP sequence capture approach
as a complexity reduction step before sequencing-based
genotyping. Indeed, thanks to its high reproducibility,
this technique might circumvent the high level of missing data typically observed with classical genotyping-by-sequencing approaches (Fu, 2014).
Insertion Site-based Polymorphism SNPs Allow
for the Saturation of Wheat Chromosomes
To assess the distribution of total targeted ISBPs, polymorphic ISBPs carrying Class 1 or Class 2 SNPs as well
as ISBP-SNPs in the wheat genome, ISBP sequences were
mapped back to the chromosome 3B pseudomolecule.
Since ISBPs were designed at an early stage of the chromosome 3B sequencing and considering that some scaffolds were not anchored on the pseudomolecule, 96% of
the targeted ISBPs (50,408 out of 52,265), polymorphic
ISBPs (15,869 out of 16,556) and SNPs (37,544 out of
39,077) have a position on this sequence.
The distribution of targeted ISBPs was found to follow
a gradient along the centromere–telomere axis (Fig. 1A).
Considering that we focused on TE low-copy DNA ISBP
markers, this result was not unexpected, as the percentage
of low-copy sequences is higher in distal regions than in
proximal ones (Choulet et al., 2014). The distribution of
polymorphic ISBPs was found to be correlated with that of
total ISBPs (R = 0.79, p = 0). The average polymorphic ISBP
density was one every 48.8 kb. About three-fourths (77.4%)
of the inter-ISBP distances were shorter than 50 kb, the
largest region without any ISBP marker being 2.16 Mb. On
average, 32.6% of ISBPs were polymorphic. Although this
percentage was found to be quite even on the short arm
of chromosome 3B, it appeared to be much more variable
on the long arm (Fig. 1B). In particular, two regions of
~30 Mb were identified in which this percentage dropped
down to 11 to 12%. Such regions might indicate domestication or selection footprints (Nielsen et al., 2005).
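A distribution of this kind can be summarized with a simple windowed count, as in the Python sketch below. The 10-Mb window size, the function names, and the input format, a list of (position in bp, polymorphic or not) pairs, are assumptions made for illustration; the window size used for Fig. 1 is not stated here.

```python
from collections import defaultdict


def polymorphic_percent_by_window(isbps, window_bp=10_000_000):
    """Return {window start (bp): percentage of polymorphic ISBPs}.
    `isbps` is an iterable of (position_bp, is_polymorphic) pairs."""
    totals = defaultdict(int)
    polymorphic = defaultdict(int)
    for position, is_polymorphic in isbps:
        start = (position // window_bp) * window_bp
        totals[start] += 1
        if is_polymorphic:
            polymorphic[start] += 1
    return {s: 100.0 * polymorphic[s] / totals[s] for s in sorted(totals)}


def inter_marker_distances(positions):
    """Distances between consecutive markers once sorted by position."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


toy_isbps = [
    (1_000_000, True), (4_000_000, False),
    (12_000_000, False), (15_000_000, False),
    (18_000_000, True), (19_000_000, False),
]
toy_windows = polymorphic_percent_by_window(toy_isbps)
```

Windows in which the percentage falls well below the chromosome-wide average are the kind of regions highlighted above as candidate selection footprints.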
Similarly, the SNP distribution was found to be
highly correlated with that of polymorphic ISBPs (R
= 0.98, p = 0), with an average of 2.36 SNPs per ISBP.
The number of SNPs per ISBP was quite even along the
chromosome, except in some regions (including the two
aforementioned ones), reinforcing the hypothesis that
they might correspond to regions that have been fixed
through domestication or selection (Fig. 1C). However,
the overall results demonstrate that the polymorphism
level of ISBP markers (both in terms of polymorphic versus nonpolymorphic loci and in terms of SNP number)
was not impacted by their position on the chromosome.
In contrast, allele frequencies were found to vary
greatly along the chromosome. In the elite panel and the
INRA core collection, the PIC values ranged from 0.004
to 0.500, with an average value of 0.286 (0.268 in the elite
panel and 0.285 in the INRA core collection). This figure
is comparable to other estimates in wheat (Allen et al.,
2011; Chao et al., 2009; Edwards et al., 2009). The distribution of PIC values was not even along the chromosome
(Fig. 1D). A strong decrease was observed in the proximal part of chromosome 3B, which is consistent with
previous studies in wheat (Akhunov et al., 2010; Jordan
et al., 2015) as well as in maize (Zea mays L.) (Gore et
al., 2009). This region coincides almost exactly with the centromeric–pericentromeric region (265–387 Mb), defined
previously on the basis of the absence of meiotic recombination and high linkage disequilibrium (Choulet et al.,
2014), as well as high TE density (Daron et al., 2014) and
enrichment in constitutively expressed genes (Pingault et
al., 2015) and ancestrally conserved genes (Glover et al.,
2015). Strikingly, in this region that encompasses more
than 400 SNPs, two main haplotypes were observed. The
less frequent one was found in a set of roughly 30 lines,
mostly originating from Asia (China and Japan) as well
as from the European elite germplasm pool. The genetic
proximity of Asian and European lines in the pericentromeric region of chromosome 3B might be explained by
the use of Japanese lines such as ‘Akakomugi’ or ‘Norin
10’ in European breeding programs for the transfer of
Rht genes (Borojevic and Borojevic, 2005).
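As an illustration of the PIC values discussed above, the sketch below assumes the definition PIC = 1 - sum(p_i^2) from Anderson et al. (1993), which appears in the reference list. This assumption is consistent with the reported upper bound of 0.500, the maximum of that formula for a biallelic marker, but the excerpt itself does not restate the formula. Function names and the input format are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def pic(allele_freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) over the allele frequencies at one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def pic_from_calls(calls: Sequence[Optional[str]]) -> float:
    """Compute PIC from one allele call per line; None marks missing data.

    Assumes inbred lines scored as homozygous, so each line contributes
    a single allele.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no genotype calls available")
    counts = Counter(observed)
    n = len(observed)
    return pic([k / n for k in counts.values()])


balanced = pic([0.5, 0.5])                        # biallelic maximum
skewed = pic_from_calls(["A", "A", "A", "G"])     # frequencies 0.75 / 0.25
with_missing = pic_from_calls(["A", "A", "G", None, "G"])
```

Under this definition a monomorphic locus has a PIC of 0, and a SNP approaches 0.5 as the two alleles become equally frequent.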
Insertion Site-based Polymorphism SNPs Can
be Used Efficiently for Wheat Breeding
To assess the usefulness of ISBP-SNPs to discriminate
between wheat lines, genotyping data from 2842 KASPar
assays were used to describe diversity within the elite
panel and the INRA core collection. Figure 2 shows the
Neighbor-Joining tree among the 93 elite lines of the
elite panel. As indicated by the high bootstrap values,
each accession was clearly differentiated from its genetically nearest neighbor. Notably, a group of nine
varieties (at the top of the tree) is separated from the
others. These correspond to the European material carrying an Asian-related pericentromere, as discussed
previously. Figure S1 shows the Ward dendrogram
of the 367 accessions of the INRA core collection, which
was originally designed to capture more than 98% of the
total allelic diversity observed at 38 polymorphic SSR
loci in a sample of 4000 bread wheat accessions. As previously described (Horvath et al., 2009), the analysis of
the population structure within the core collection using
a set of whole-genome DArT markers led to five groups
of accessions which can be related to their geographical
origins: Western Europe, Eastern Europe, Mediterranean, Asia, and Nepal. In the present study, ISBP-SNPs
recovered this geographical structure in
five groups, with a clear separation between accessions
from the Asian gene pool (Asia and Nepal) and from the
European gene pool (Western Europe, Eastern Europe,
and Mediterranean). Compared with the results
obtained from the set of whole-genome DArT markers, some discrepancies in the clustering of accessions
were observed. However, one should keep in mind that
The Plant Genome, March 2016, Vol. 9, No. 1
Fig. 1. Distribution of polymorphism-related features along bread wheat chromosome 3B. (A) Average number of targeted insertion site-based polymorphisms (ISBPs). (B) Average percentage of polymorphic ISBPs. (C) Average number of single-nucleotide polymorphisms
(SNPs) per ISBP. (D) Average polymorphism information content (PIC) value. All features are calculated in a 10-Mb sliding window (step of 1
Mb). The x-axis represents the position on the chromosome 3B pseudomolecule (in Mb).
Cubizolles et al.: Mining SNPs in the repetitive fraction of the wheat genome
Fig. 2. Neighbor-Joining tree showing phylogenetic relationships among 93 wheat elite lines revealed by insertion site-based polymorphism-derived single-nucleotide polymorphisms (ISBP-SNPs). Bootstrap values are given in blue.
our SNPs originated from one chromosome only and this
might have led to a slightly different genetic structure.
Consistent with this, the bottom branch of the tree
contains accessions harboring the minor Asian pericentromeric haplotype mentioned previously.
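Trees such as those in Fig. 2 and Fig. S1 are built from a matrix of pairwise dissimilarities between lines. The excerpt does not state which dissimilarity index was used, so the sketch below assumes a simple-matching index (the proportion of compared SNPs at which two lines differ) purely for illustration. All names are hypothetical, and the clustering step itself (Neighbor-Joining or Ward) is left to dedicated software.

```python
from typing import List, Optional, Sequence

Genotype = Sequence[Optional[str]]  # one allele call per SNP; None = missing


def simple_matching_dissimilarity(a: Genotype, b: Genotype) -> float:
    """Proportion of SNPs, among those called in both lines, that differ."""
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip SNPs with a missing call in either line
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no SNP is called in both lines")
    return mismatches / compared


def dissimilarity_matrix(lines: List[Genotype]) -> List[List[float]]:
    """Symmetric matrix of pairwise dissimilarities, zeros on the diagonal."""
    n = len(lines)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = simple_matching_dissimilarity(lines[i], lines[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Three toy lines scored at four SNPs (illustrative data only).
toy_lines = [
    ["A", "G", "T", "C"],
    ["A", "G", "T", "T"],
    ["G", "A", None, "T"],
]
dist = dissimilarity_matrix(toy_lines)
```

Because all markers here come from a single chromosome, a block of tightly linked SNPs such as the pericentromeric haplotype can dominate these distances, which is one way the clustering could differ from a whole-genome marker set.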
KASPar genotyping data from 2745 ISBP-SNPs with
a minor allele frequency greater than 5% were also used
to conduct a genome-wide association study for dough
rheology on chromosome 3B. Phenotypic data derived
from the INRA core collection were retrieved from
Bordes et al. (2011). A strong association (LOD score >3)
was found with 34 SNPs. These SNPs defined a region of
~3 Mb located on the long arm of chromosome 3B (Fig.
3). With 34 SNPs, one might theoretically expect more
than 17 billion possible haplotypes. Instead, only 11
haplotypes were observed, the two most common ones
representing 52.3 and 37.8%, respectively. The average
mixograph consistency values of these two haplotypes
were 32.3 and 34.6% of the curve width, respectively,
corresponding to a relative difference of about 6.9% between the two groups.
The 3-Mb region contains 47 annotated protein-coding
genes. Their expression profiles were investigated in 15
conditions covering the entire plant development process
(Pingault et al., 2015). Sixteen genes did not show any
expression, whereas three others showed increased
expression in grain 30 d after anthesis (Zadoks scale 85):
two putative α/β hydrolases and one protein
of unknown function. These three genes might be good candidates for dough rheology.
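The arithmetic behind the haplotype figures in this paragraph can be checked with a short sketch. It assumes biallelic SNPs, so that n SNPs give 2^n possible combinations. The text does not state which denominator was used for the 6.9% difference, so three common conventions are computed side by side; with the rounded values given in the text, only the difference relative to the mean of the two groups rounds to 6.9%. Function names are hypothetical.

```python
def possible_haplotypes(n_snps: int) -> int:
    """Number of allele combinations for n biallelic SNPs."""
    return 2 ** n_snps


def relative_differences(low: float, high: float) -> dict:
    """Percent difference between two values under three conventions."""
    diff = high - low
    return {
        "relative_to_lower": 100.0 * diff / low,
        "relative_to_higher": 100.0 * diff / high,
        "relative_to_mean": 100.0 * diff / ((low + high) / 2.0),
    }


n_possible = possible_haplotypes(34)           # 17,179,869,184
observed = 11                                  # haplotypes reported in the text
combined_top_two = 52.3 + 37.8                 # percent of lines, two main haplotypes

# Mean mixograph consistency of the two main haplotypes (from the text).
rel = relative_differences(32.3, 34.6)
rounded = {name: round(value, 1) for name, value in rel.items()}
```

The gap between 11 observed haplotypes and more than 17 billion possible ones reflects strong linkage disequilibrium across the ~3-Mb interval.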
Because of their ability to distinguish efficiently
between genetically close accessions as well as their
potential association with traits of interest, ISBP-SNPs
are useful tools for marker-assisted breeding, both for
describing genetic diversity and for detecting genomic
regions involved in agronomic traits. This is also exemplified by the recent use of these markers in marker-based phenology modeling (Bogard et al., 2014) or
growth stage prediction (Bogard et al., 2015).
Fig. 3. Genome-wide association study results using 2745 single-nucleotide polymorphisms (SNPs) located on chromosome 3B in the
Institut National de la Recherche Agronomique core collection for dough consistency. The black horizontal dashed line indicates the
threshold of significance. The x-axis represents the position on the chromosome 3B pseudomolecule (in Mb).
Conclusions
In this study, we developed a high-throughput method
for SNP discovery in the repetitive fraction of the wheat
genome. By using sequence capture, this approach allows
for SNP discovery in both regions of interest and at the
whole-genome level in species for which genome size
makes whole-genome resequencing too expensive. In addition, thanks to the genome specificity of ISBP markers,
this approach makes SNP discovery and genotyping much
more straightforward, by reducing the confounding effect
of homoeologous and paralogous SNPs in polyploid and
highly duplicated genomes. Although ISBP-derived SNPs do
not originate from coding regions, their high density should allow
virtually all genetic loci to be in linkage disequilibrium
with at least one marker. In addition, their compatibility
with KASPar genotyping technology, as well as their ability to discriminate between closely related wheat lines,
makes them suitable markers for molecular breeding.
Supplemental Material
Supplemental Figure S1. Ward dendrogram showing the
phylogenetic relationships among wheat accessions of the
INRA core collection revealed by ISBP-SNPs. Accessions
are named according to their genetic resource number
(ERGE) in the INRA Biological Resources Center. They
are colored according to five groups related to previous
structure analysis (Horvath et al., 2009): Asia (orange),
Nepal (pink), Northwest Europe (blue), Eastern Europe
(red), and Mediterranean (green).
Supplemental Table S1. List of wheat lines genotyped
with KASPar assays and the corresponding panel.
Supplemental Table S2. List of 412,124 ISBPs detected
on chromosome 3B with the IsbpFinder program.
Supplemental Table S3. List of 52,265 ISBPs used for
bait design.
Supplemental Table S4. List of 49,836 SNPs with
their corresponding alleles in the 10 resequenced lines
and their SNP category.
Supplemental Table S5. List of 39,077 Class 1 and
Class 2 SNPs with their context sequence.
Supplemental Table S6. List of 3284 KASPar assays.
Acknowledgments
This work was supported by two grants from the French National
Research Agency (ANR-09-GENM-025 3BSEQ and ANR-09-GENM-027
GAIN-SPEED), grants from France Agrimer, and the private-public consortium DIGITAL (INRA, AgriObtentions, ARVALIS – Institut du Végétal, Bayer CropScience, Florimond-Desprez, KWS LOCHOW GMBH,
RAGT, and Syngenta). Sequence capture experiments were conducted
on the genotyping platform GENTYANE at INRA Clermont-Ferrand
(gentyane.clermont.inra.fr, accessed 2 Dec. 2015). Seeds from the INRA
core collection were provided by the Biological Resources Centre on
small-grain cereals at INRA Clermont-Ferrand (www6.clermont.inra.fr/
umr1095_eng/Teams/Experimental-Infrastructure/Biological-ResourcesCentre, accessed 2 Dec. 2015).
References
Akhunov, E., C. Nicolet, and J. Dvorak. 2009. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate
assay. Theor. Appl. Genet. 119:507–517. doi:10.1007/s00122-009-1059-5
Akhunov, E.D., A.R. Akhunova, O.D. Anderson, J.A. Anderson, N. Blake,
M.T. Clegg, et al. 2010. Nucleotide diversity maps reveal variation in
diversity among wheat genomes and chromosomes. BMC Genomics
11:702. doi:10.1186/1471-2164-11-702
Allen, A.M., G.L. Barker, P. Wilkinson, A. Burridge, M. Winfield, J.
Coghill, et al. 2013. Discovery and development of exome-based,
co-dominant single nucleotide polymorphism markers in hexaploid wheat (Triticum aestivum L.). Plant Biotechnol. J. 11:279–295.
doi:10.1111/pbi.12009
Allen, A.M., G.L.A. Barker, S.T. Berry, J.A. Coghill, R. Gwilliam, S. Kirby,
et al. 2011. Transcript-specific, single-nucleotide polymorphism
discovery and linkage analysis in hexaploid bread wheat (Triticum
aestivum L.). Plant Biotechnol. J. 9:1086–1099. doi:10.1111/j.1467-7652.2011.00628.x
Anderson, J.A., G.A. Churchill, J.E. Autrique, S.D. Tanksley, and M.E. Sorrells. 1993. Optimizing parental selection for genetic-linkage maps.
Genome 36:181–186. doi:10.1139/g93-024
Asan, Y. Xu, H. Jiang, C. Tyler-Smith, Y. Xue, T. Jiang, et al. 2011. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biol. 12:R95. doi:10.1186/gb-2011-12-9-r95
Balfourier, F., V. Roussel, P. Strelchenko, F. Exbrayat-Vinson, P. Sourdille,
G. Boutet, et al. 2007. A worldwide bread wheat core collection
arrayed in a 384-well plate. Theor. Appl. Genet. 114:1265–1275.
doi:10.1007/s00122-007-0517-1
Bogard, M., J.-B. Pierre, B. Huguenin-Bizot, D. Hourcade, E. Paux, X. Le
Bris, et al. 2015. A simple approach to predict growth stages in winter
wheat (Triticum aestivum L.) combining prediction of a crop model
and marker based prediction of the deviation to a reference cultivar: A case study in France. Eur. J. Agron. 68:57–68.
doi:10.1016/j.eja.2015.04.007
Bogard, M., C. Ravel, E. Paux, J. Bordes, F. Balfourier, S. Chapman, et al.
2014. Predictions of heading date in bread wheat (Triticum aestivum
L.) using QTL-based parameters of an ecophysiological model. J. Exp.
Bot. 65:5849–5865. doi:10.1093/jxb/eru328
Bordes, J., C. Ravel, J. Le Gouis, A. Lapierre, G. Charmet, and F. Balfourier.
2011. Use of a global wheat core collection for association analysis of
flour and dough quality traits. J. Cereal Sci. 54:137–147.
doi:10.1016/j.jcs.2011.03.004
Borojevic, K., and K. Borojevic. 2005. The transfer and history of “reduced
height genes” (Rht) in wheat from Japan to Europe. J. Hered. 96:455–
459. doi:10.1093/jhered/esi060
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and
E.S. Buckler. 2007. TASSEL: Software for association mapping of
complex traits in diverse samples. Bioinformatics 23:2633–2635.
doi:10.1093/bioinformatics/btm308
Brenchley, R., M. Spannagl, M. Pfeifer, G. Barker, R. D’Amore, A. Allen,
et al. 2012. Analysis of the bread wheat genome using whole-genome
shotgun sequencing. Nature 491:705–710. doi:10.1038/nature11650
Cao, J., K. Schneeberger, S. Ossowski, T. Gunther, S. Bender, J. Fitz, et al.
2011. Whole-genome sequencing of multiple Arabidopsis thaliana
populations. Nat. Genet. 43:956–963. doi:10.1038/ng.911
Cavanagh, C.R., S. Chao, S. Wang, B.E. Huang, S. Stephen, S. Kiani, et al.
2013. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and
cultivars. Proc. Natl. Acad. Sci. USA 110:8057–8062.
doi:10.1073/pnas.1217133110
Chao, S., W. Zhang, E. Akhunov, J. Sherman, Y. Ma, M.-C. Luo, et al. 2009.
Analysis of gene-derived SNP marker polymorphism in US wheat
(Triticum aestivum L.) cultivars. Mol. Breed. 23:23–33.
doi:10.1007/s11032-008-9210-6
Chilamakuri, C.S., S. Lorenz, M.-A. Madoui, D. Vodak, J. Sun, E. Hovig, et al.
2014. Performance comparison of four exome capture systems for deep
sequencing. BMC Genomics 15:449. doi:10.1186/1471-2164-15-449
Choulet, F., A. Alberti, S. Theil, N. Glover, V. Barbe, J. Daron, et al. 2014.
Structural and functional partitioning of bread wheat chromosome
3B. Science 345:1249721. doi:10.1126/science.1249721
Choulet, F., T. Wicker, C. Rustenholz, E. Paux, J. Salse, P. Leroy, et al. 2010.
Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant
Cell 22:1686–1701. doi:10.1105/tpc.110.074187
Clark, R.M., S. Tavare, and J. Doebley. 2005. Estimating a nucleotide substitution rate for maize from polymorphism at a major domestication
locus. Mol. Biol. Evol. 22:2304–2312. doi:10.1093/molbev/msi228
Daron, J., N. Glover, L. Pingault, S. Theil, V. Jamilloux, E. Paux, et al. 2014.
Organization and evolution of transposable elements along the bread
wheat chromosome 3B. Genome Biol. 15:546. doi:10.1186/s13059-014-0546-4
Edwards, K.J., A.L. Reid, J.A. Coghill, S.T. Berry, and G.L. Barker. 2009.
Multiplex single nucleotide polymorphism (SNP)-based genotyping
in allohexaploid wheat using padlock probes. Plant Biotechnol. J.
7:375–390. doi:10.1111/j.1467-7652.2009.00413.x
Eichten, S.R., N.A. Ellis, I. Makarevitch, C.-T. Yeh, J.I. Gent, L. Guo, et al.
2012. Spreading of heterochromatin is limited to specific families of
maize retrotransposons. PLoS Genet. 8:e1003127.
doi:10.1371/journal.pgen.1003127
Fu, Y.-B. 2014. Genetic diversity analysis of highly incomplete SNP genotype data with imputations: An empirical assessment. G3 (Bethesda)
4:891–900. doi:10.1534/g3.114.010942
Ganal, M.W., T. Altmann, and M.S. Röder. 2009. SNP identification
in crop plants. Curr. Opin. Plant Biol. 12:211–217.
doi:10.1016/j.pbi.2008.12.009
Ganal, M.W., A. Polley, E.M. Graner, J. Plieske, R. Wieseke, H. Luerssen,
et al. 2012. Large SNP arrays for genotyping in crop plants. J. Biosci.
37:821–828. doi:10.1007/s12038-012-9225-3
Glover, N., J. Daron, L. Pingault, K. Vandepoele, E. Paux, C. Feuillet, et al.
2015. Small-scale gene duplications played a major role in the recent
evolution of wheat chromosome 3B. Genome Biol. 16:188.
doi:10.1186/s13059-015-0754-6
Gore, M.A., J.M. Chia, R.J. Elshire, Q. Sun, E.S. Ersoz, B.L. Hurwitz, et al.
2009. A first-generation haplotype map of maize. Science 326:1115–
1117. doi:10.1126/science.1177837
Graner, A., H. Siedler, A. Jahoor, R.G. Herrmann, and G. Wenzel. 1990.
Assessment of the degree and the type of restriction-fragment-length-polymorphism in barley (Hordeum vulgare). Theor. Appl. Genet.
80:826–832. doi:10.1007/BF00224200
Hardenbol, P., F. Yu, J. Belmont, J. MacKenzie, C. Bruckner, T. Brundage, et
al. 2005. Highly multiplexed molecular inversion probe genotyping:
Over 10,000 targeted SNPs genotyped in a single tube assay. Genome
Res. 15:269–275. doi:10.1101/gr.3185605
Hillier, L.W., G.T. Marth, A.R. Quinlan, D. Dooling, G. Fewell, D. Barnett,
et al. 2008. Whole-genome sequencing and variant discovery in C.
elegans. Nat. Methods 5:183–188. doi:10.1038/nmeth.1179
Horvath, A., A. Didier, J. Koenig, F. Exbrayat, G. Charmet, and F. Balfourier. 2009. Analysis of diversity and linkage disequilibrium along
chromosome 3B of bread wheat (Triticum aestivum L.). Theor. Appl.
Genet. 119:1523–1537. doi:10.1007/s00122-009-1153-8
International Wheat Genome Sequencing Consortium. 2014. A chromosome-based draft sequence of the hexaploid bread wheat genome.
Science 345:1251788. doi:10.1126/science.1251788
Jordan, K., S. Wang, Y. Lun, L.-J. Gardiner, R. MacLachlan, P. Hucl, et al.
2015. A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome Biol. 16:48.
doi:10.1186/s13059-015-0606-4
Kurtz, S., A. Narechania, J.C. Stein, and D. Ware. 2008. A new method
to compute K-mer frequencies and its application to annotate large
repetitive plant genomes. BMC Genomics 9:517. doi:10.1186/1471-2164-9-517
Lai, K., C. Duran, P.J. Berkman, M.T. Lorenc, J. Stiller, S. Manoli, et al. 2012.
Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol. J. 10:743–749. doi:10.1111/j.1467-7652.2012.00718.x
Lai, K., M.T. Lorenc, H.C. Lee, P.J. Berkman, P.E. Bayer, P. Visendi, et al.
2015. Identification and characterization of more than 4 million intervarietal SNPs across the group 7 chromosomes of bread wheat. Plant
Biotechnol. J. 13:97–104. doi:10.1111/pbi.12240
Lam, H.-M., X. Xu, X. Liu, W. Chen, G. Yang, F.-L. Wong, et al. 2010.
Resequencing of 31 wild and cultivated soybean genomes identifies
patterns of genetic diversity and selection. Nat. Genet. 42:1053–1059.
doi:10.1038/ng.715
Liu, L., Y. Li, S. Li, N. Hu, Y. He, R. Pong, et al. 2012. Comparison of next-generation sequencing systems. J. Biomed. Biotechnol. 2012:251364.
doi:10.1155/2012/251364
Marth, G.T., I. Korf, M.D. Yandell, R.T. Yeh, Z. Gu, H. Zakeri, et al. 1999. A
general approach to single-nucleotide polymorphism discovery. Nat.
Genet. 23:452–456. doi:10.1038/70570
Martin, A., C. Troadec, A. Boualem, M. Rajab, R. Fernandez, H. Morin, et
al. 2009. A transposon-induced epigenetic change leads to sex determination in melon. Nature 461:1135–1138. doi:10.1038/nature08498
Martinant, J.P., Y. Nicolas, A. Bouguennec, Y. Popineau, L. Saulnier, and
G. Branlard. 1998. Relationships between mixograph parameters and
indices of wheat grain quality. J. Cereal Sci. 27:179–189.
doi:10.1006/jcrs.1997.0156
Michael, J.T. 2014. High-throughput SNP genotyping to accelerate
crop improvement. Plant Breeding and Biotechnology 2:195–212.
doi:10.9787/PBB.2014.2.3.195
Nielsen, R., S. Williamson, Y. Kim, M.J. Hubisz, A.G. Clark, and C. Bustamante. 2005. Genomic scans for selective sweeps using SNP data.
Genome Res. 15:1566–1575. doi:10.1101/gr.4252305
Paux, E., S. Faure, F. Choulet, D. Roger, V. Gauthier, J.P. Martinant, et al.
2010. Insertion site-based polymorphism markers open new perspectives for genome saturation and marker-assisted selection in wheat.
Plant Biotechnol. J. 8:196–210. doi:10.1111/j.1467-7652.2009.00477.x
Paux, E., D. Roger, E. Badaeva, G. Gay, M. Bernard, P. Sourdille, et al. 2006.
Characterizing the composition and evolution of homoeologous
genomes in hexaploid wheat through BAC-end sequencing on chromosome 3B. Plant J. 48:463–474. doi:10.1111/j.1365-313X.2006.02891.x
Paux, E., P. Sourdille, I. Mackay, and C. Feuillet. 2011. Sequence-based
marker development in wheat: Advances and applications to breeding.
Biotechnol. Adv. 30:1071–1088. doi:10.1016/j.biotechadv.2011.09.015
Perrier, X., A. Flori, and F. Bonnot. 2003. Data analysis methods. In: P.
Hamon, M. Seguin, X. Perrier, and J.-C. Glaszmann, editors, Genetic
diversity of cultivated tropical plants. Enfield Science Publishers,
Montpellier, VT. p. 43–76.
Perrier, X., and J.P. Jacquemoud-Collet. 2006. DARwin software. CIRAD
(French Agricultural Research Centre for International Development)
http://darwin.cirad.fr/darwin (accessed 2 Dec. 2015).
Pingault, L., F. Choulet, A. Alberti, N. Glover, P. Wincker, C. Feuillet, et al.
2015. Deep transcriptome sequencing provides new insights into the
structural and functional organization of the wheat genome. Genome
Biol. 16:29. doi:10.1186/s13059-015-0601-9
Rabinowicz, P.D., R. Citek, M.A. Budiman, A. Nunberg, J.A. Bedell, N.
Lakey, et al. 2005. Differential methylation of genes and repeats in
land plants. Genome Res. 15:1431–1440. doi:10.1101/gr.4100405
Rafalski, A. 2002. Applications of single nucleotide polymorphisms in
crop genetics. Curr. Opin. Plant Biol. 5:94–100. doi:10.1016/S1369-5266(02)00240-6
Rincent, R., L. Moreau, H. Monod, E. Kuhn, A.E. Melchinger, R.A. Malvar, et al. 2014. Recovering power in association mapping panels
with variable levels of linkage disequilibrium. Genetics 197:375–387.
doi:10.1534/genetics.113.159731
Saxena, R.K., D. Edwards, and R.K. Varshney. 2014. Structural variations in
plant genomes. Brief. Funct. Gen. 13:296–307. doi:10.1093/bfgp/elu016
Schulman, A., A. Flavell, E. Paux, and T. Ellis. 2012. The application of LTR
retrotransposons as molecular markers in plants. In: Y. Bigot, editor,
Mobile genetic elements—Methods in molecular biology. Springer,
New York. p. 115–153.
Strömberg, M. 2009. Mosaik assembler. Release 1.0. Boston College, Chestnut Hill, MA.
Wang, S., D. Wong, K. Forrest, A. Allen, S. Chao, B.E. Huang, et al. 2014.
Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array. Plant Biotechnol. J. 12:787–796. doi:10.1111/pbi.12183
Wicker, T., K. Mayer, H. Gundlach, M. Martis, B. Steuernagel, U. Scholz, et
al. 2011. Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their
relatives. Plant Cell 5:1706–1718. doi:10.1105/tpc.111.086629
Winfield, M.O., P.A. Wilkinson, A.M. Allen, G.L. Barker, J.A. Coghill, A.
Burridge, et al. 2012. Targeted re-sequencing of the allohexaploid
wheat exome. Plant Biotechnol. J. 10:733–742. doi:10.1111/j.1467-7652.2012.00713.x
Xu, X., X. Liu, S. Ge, J.D. Jensen, F. Hu, X. Li, et al. 2012. Resequencing 50
accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 30:105–111.
doi:10.1038/nbt.2050
Zimin, A., K.A. Stevens, M.W. Crepeau, A. Holtz-Morris, M. Koriabine, G.
Marçais, et al. 2014. Sequencing and assembly of the 22-Gb loblolly
pine genome. Genetics 196:875–890. doi:10.1534/genetics.113.159715