Genetic Diversity, Population Structure, and Linkage Disequilibrium

Published September, 2010
ORIGINAL RESEARCH
Genetic Diversity, Population Structure, and Linkage
Disequilibrium in U.S. Elite Winter Wheat
Dadong Zhang, Guihua Bai,* Chengsong Zhu, Jianming Yu, and Brett F. Carver
Information on genetic diversity and population structure of elite
wheat (Triticum aestivum L.) breeding lines promotes effective use
of genetic resources. We analyzed 205 elite wheat breeding
lines from major winter wheat breeding programs in the USA
using 245 markers across the wheat genomes. This collection
showed a high level of genetic diversity as reflected by allele
number per locus (7.2) and polymorphism information content
(0.54). However, the diversity of U.S. modern wheat appeared
to be lower than previously reported diversity levels in worldwide
germplasm collections. As expected, this collection was highly
structured according to geographic origin and market class with
soft and hard wheat clearly separated from each other. Hard
wheat accessions were further divided into three subpopulations.
Linkage disequilibrium (LD) was primarily distributed around
centromere regions. The mean genome-wide LD decay estimate
was 10 cM (r² > 0.1), although the extent of LD was highly
variable throughout the genome. Our results on genetic diversity
of different gene pools and the distribution of LD facilitate
the effective use of genetic resources for wheat breeding and
the choice of marker density in gene mapping and marker-assisted breeding.
Published in The Plant Genome 3:117–127. Published 15 Sept. 2010.
doi: 10.3835/plantgenome2010.03.0004
© Crop Science Society of America
5585 Guilford Rd., Madison, WI 53711 USA
An open-access publication
All rights reserved. No part of this periodical may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher.
Permission for printing and for reprinting the material contained herein
has been obtained by the publisher.
THE PLANT GENOME · SEPTEMBER 2010 · VOL. 3, NO. 2
Wheat,
rice (Oryza sativa L.), and maize (Zea mays
L.) are the three staple food crops in the world.
As the human population increases and more maize is
dedicated to a growing biofuel industry, the demand for
high yields of good quality wheat grain will increase. In
the past century, wheat breeders worldwide have made
remarkable progress in improving disease resistance,
grain yield, and end-use quality of wheat. However,
continuous breeding activities may result in erosion of
genetic diversity as a result of intense selection pressure,
recurrent use of a few adapted elite germplasm lines as
parents, and adoption of breeding schemes that do not
perpetuate genetic recombination (Hoisington et al.,
1999). The degree of genetic diversity in contemporary
germplasm from breeding programs may indirectly
reflect the level of genetic progress achievable in future
cultivars. Therefore, evaluation of the genetic diversity
resident in current breeding programs at the molecular
level and integration of this information into cultivar
development programs are essential to using genetic
resources effectively in breeding programs (Chao et
al., 2007).
Dadong Zhang, Chengsong Zhu, and Jianming Yu, Dep. of
Agronomy, Kansas State Univ., 2004 Throckmorton Hall, Manhattan,
KS 66506; Guihua Bai, USDA-ARS Hard Winter Wheat Genetics
Research Unit, Manhattan, KS 66506; Brett Carver, Dep. of Plant
and Soil Sciences, Oklahoma State Univ., 368 Agricultural Hall,
Stillwater, OK 74078. Received 21 Dec. 2009. *Corresponding
author ([email protected]).
Abbreviations: AMOVA, an analysis of molecular variance; LD,
linkage disequilibrium; PCoA, principal coordinate analysis; PIC,
polymorphism information content; QTL, quantitative trait locus(i);
SNP, single-nucleotide polymorphism; STS, sequence-tagged site;
SSR, simple sequence repeat; UPGMA, unweighted pair group
method with arithmetic mean.
Genetic diversity, population structure, and LD of a
population provide strategic information for association
mapping and marker-assisted breeding. The resolution of
association mapping and effectiveness of marker-assisted
breeding are mostly determined by LD, the nonrandom
association of alleles at different loci (Flint-Garcia et al.,
2003; Kim et al., 2007). High LD in a population often
facilitates QTL detection with a reduced number of
marker requirements, whereas low LD usually implies
that high mapping resolution can be achieved, provided
that enough markers are available in the target chromosome region. In a self-pollinated species such as wheat,
recombination frequency is obviously low in populations
undergoing a rapid rate of inbreeding with selfing. Thus,
the estimated LD of wheat (0.5–50 cM) (Breseghello and
Sorrells, 2006; Maccaferri et al., 2006; Chao et al., 2007;
Somers et al., 2007) is relatively high compared with
that of maize (200–2000 bp) (Tenaillon et al., 2001) and
Arabidopsis (less than 10 kb) (Kim et al., 2007; Song et
al., 2009). In wheat, LD estimates also vary with chromosome regions across genomes in a population (Breseghello and Sorrells, 2006; Chao et al., 2007). To date, the
extent of LD among contemporary wheat accessions resident in breeding programs has not been reported.
The degree of marker polymorphism and genome
coverage is also important to association mapping and
candidate gene detection. Because of its large genome
size, wheat has the poorest marker coverage compared
with other crops such as maize, rice, barley (Hordeum
vulgare L.), and sorghum [Sorghum bicolor (L.) Moench].
Simple sequence repeat (SSR) markers remain one of the
best marker systems for wheat research because of their
chromosome specificity, high polymorphism (Plaschke
et al., 1995; Huang et al., 2002), high reproducibility, and
codominant inheritance patterns (Röder et al., 1998).
Elite breeding lines tested in regional performance
nurseries represent the most advanced materials derived
from breeding programs and best represent the pool of
candidates for cultivar release or use as parents in future
crosses. A collection of elite breeding lines from major
U.S. regional performance nurseries may represent the
current status of genetic diversity available to major U.S.
hard and soft winter wheat breeding programs. Our objectives
were to estimate the levels of genetic diversity and
LD and to characterize the population structure of these
two major contemporary U.S. wheat gene pools.
MATERIALS AND METHODS
Plant Materials
After removing full-sib lines, this study used 205 wheat
accessions combined from the 2008 U.S. Southern
(SRPN, n = 42) and Northern (NRPN, n = 23) Hard
Winter Wheat Regional Performance Nurseries, 2008
Hard Winter Wheat Regional Germplasm Observation Nursery (RGON, n = 37), a 2008 elite hard winter
wheat nursery at Oklahoma State University (OSU, n =
18), 2008 U.S. Uniform Eastern Soft Red Winter Wheat
Nursery (UESRWWN, n = 32), and 2008 Uniform
Southern Soft Red Winter Wheat Nursery (USSRWWN,
n = 31), plus 22 major cultivars recently released in the
hard winter wheat region (Supp. Table 1). Among these
accessions, 115 hard red winter (HRW) wheat and 22
hard white winter (HWW) wheat were from eight major
hard wheat growing states, and 68 soft winter wheat lines
were from 15 major soft wheat growing states in the eastern and midwestern USA (Supp. Table 1). DNA for each
accession was derived from a single plant grown in the greenhouse to minimize within-line heterogeneity.
Extraction of DNA and Marker Analysis
Leaf tissue from a single plant was sampled at the two-leaf
stage by means of 1.5-mL strip tubes and dried for 2 d in a
freeze drier (Thermo Fisher, Waltham, MA) for DNA isolation. Tubes containing a 3.2-mm stainless steel bead and
dried tissue were shaken in a Mixer Mill (Retsch GmbH,
Germany) at 25 times s−1 for 5 min. Genomic DNA was
extracted by the cetyltrimethyl ammonium bromide
(CTAB) method (Saghai-Maroof et al., 1984). Polymerase
chain reaction (PCR) amplifications were performed in a
Tetrad Peltier DNA Engine (Bio-Rad Lab, Hercules, CA,
USA) with a 12-μL PCR mixture containing 1.2 μL 10×
PCR buffer (Bioline, Taunton, MA, USA), 2.5 mM MgCl2,
200 μM of each dNTP, 50 nM forward tailed primer, 250
nM reverse primer, 200 nM M13 fluorescent-dye-labeled
primer, 0.6 U Taq DNA polymerase, and about 80 ng of
template DNA. A touchdown PCR program was used for
PCR amplification. Briefly, the reaction was incubated at
95°C for 5 min then continued for five cycles of 1 min at
96°C, 5 min at 68°C with a decrease of 2°C in each subsequent cycle, and 1 min at 72°C. For another five cycles, the
annealing temperature started at 58°C for 2 min with a
decrease of 2°C for each subsequent cycle. Reactions then
went through an additional 25 cycles of 1 min at 96°C, 1
min at 50°C, and 1 min at 72°C with a final extension at
72°C for 5 min. PCR products were analyzed on an ABI
PRISM 3730 DNA Analyzer (Applied Biosystems, Foster
City, CA).
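The touchdown program described above can be written out cycle by cycle, which makes the annealing steps easier to check. The following is a minimal sketch in Python; the function name `touchdown_schedule` is hypothetical, and the denaturation and extension settings for the second five cycles are assumed to match the first five because the text gives only the annealing step for that phase.

```python
def touchdown_schedule():
    """Return one dict per PCR cycle; each step is (temperature in C, minutes)."""
    cycles = []
    # Phase 1: five cycles, annealing starts at 68 C and drops 2 C per cycle.
    for i in range(5):
        cycles.append({"denature": (96, 1), "anneal": (68 - 2 * i, 5), "extend": (72, 1)})
    # Phase 2: five cycles, annealing starts at 58 C (2 min) and drops 2 C per cycle.
    # ASSUMPTION: denaturation and extension are the same as in phase 1.
    for i in range(5):
        cycles.append({"denature": (96, 1), "anneal": (58 - 2 * i, 2), "extend": (72, 1)})
    # Phase 3: 25 cycles with a fixed 50 C annealing step.
    for _ in range(25):
        cycles.append({"denature": (96, 1), "anneal": (50, 1), "extend": (72, 1)})
    return cycles


schedule = touchdown_schedule()
anneal_temps = [cycle["anneal"][0] for cycle in schedule]
print(anneal_temps[:10])  # annealing temperatures of the two touchdown phases
print(len(schedule))      # total number of amplification cycles
```

The initial 5-min incubation at 95°C and the final 5-min extension at 72°C sit outside the cycling loop and are therefore not part of the returned list.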
A total of 245 markers covering all 21 wheat chromosomes were selected to analyze the entire population
(Supp. Table 2). All primers were selected according to
currently available genetic maps (Matthews et al., 2003)
(http://www.graingenes.org; verified 11 Aug. 2010) and
previously reported polymorphism information content
(PIC) (Breseghello and Sorrells, 2006; Chao et al., 2007).
Consideration was also given to markers linked to QTL
for several important traits such as resistance to Fusarium
head blight (caused by Fusarium graminearum Schwabe),
wheat rusts [caused by Puccinia graminis Pers. f. sp.
tritici Eriks. & Henn.; Puccinia striiformis Westend. f. sp.
tritici Eriks.; and Puccinia triticina Erikss. (= P. recondita
Roberge ex Desmaz. f. sp. tritici)], soil aluminum toxicity,
preharvest sprouting, and several end-use quality traits.
Included were 86 GWM, 73 WMC, 48 BARC, 12 CFD, 7
GDM, and 3 CFA markers, plus 16 sequence-tagged site
(STS) markers for specific genes or QTL.
Data collected from an ABI DNA Analyzer (Applied
Biosystems, Foster City, CA) were processed by GeneMarker version 1.6 (SoftGenetics LLC, State College, PA)
and manually checked twice for accuracy. Marker allele
scoring followed Breseghello and Sorrells (2006). All data
from the 245 SSR markers were used in cluster analysis,
but only 79 markers, covering all 42 chromosome arms
at a genetic distance of at least 15 cM apart, were selected
for structure calculation. A linkage map was assembled
according to information from the SSR consensus map
(Somers et al., 2004). Markers that were mapped to
more than one position were assigned by aligning the
Ta-SSR-2004 map with either the wheat composite 2004
map (http://wheat.pw.usda.gov; verified 11 Aug. 2010) or
the Ta-Synthetic/Opata-SSR map (Song et al., 2005) and
by comparing fragment size for markers with known
position on GrainGenes if available. A total of 186 markers were
selected for LD analysis after discarding markers that
could not be positioned by either of the above methods.
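Choosing a marker subset with a minimum spacing, as done for the structure calculation, can be illustrated with a simple greedy pass along each chromosome. This is only a sketch under stated assumptions: the marker names and positions are invented, the function `thin_markers` is hypothetical, and the sketch ignores the additional requirement of covering all 42 chromosome arms and any preference for informative markers.

```python
def thin_markers(markers, min_gap_cm=15.0):
    """Greedy left-to-right thinning.

    markers: list of (name, chromosome, position_cM) tuples.
    Returns the names of markers kept so that, within a chromosome,
    consecutive kept markers are at least min_gap_cm apart.
    """
    selected = []
    last_kept = {}  # chromosome -> position of the most recently kept marker
    for name, chrom, pos in sorted(markers, key=lambda m: (m[1], m[2])):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_gap_cm:
            selected.append(name)
            last_kept[chrom] = pos
    return selected


# Hypothetical positions, for illustration only.
toy_markers = [
    ("m1", "1A", 0.0),
    ("m2", "1A", 8.0),
    ("m3", "1A", 16.0),
    ("m4", "1A", 30.0),
    ("m5", "1B", 5.0),
]
kept = thin_markers(toy_markers)
print(kept)
```

A greedy pass does not guarantee the largest possible subset, but it is easy to audit against a consensus map.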
Data Analysis
PowerMarker software version 3.25 (Liu and Muse,
2005) was used to calculate number of alleles and values
of gene diversity and PIC (Botstein, 1980) of each locus
from 245 SSR markers. Distance-based cluster analysis,
using Nei’s distance (Nei, 1973), was conducted separately for hard and soft wheat groups because they were
clearly demarcated in structure analysis. PIC was calculated using the formula PIC = 1 − Σ(Pi)², where Pi is the
proportion of the population carrying the ith allele.
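As printed, the expression 1 − Σ(Pi)² is the quantity usually called gene diversity (expected heterozygosity), whereas the PIC of Botstein et al. (1980) subtracts a further cross term. Because Table 1 lists gene diversity and PIC as separate values, the sketch below computes both for one locus. The allele frequencies are invented for illustration, the function names are hypothetical, and the sketch makes no claim about how PowerMarker was configured.

```python
def gene_diversity(freqs):
    """1 - sum(p_i^2): the formula as stated in the text."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """PIC after Botstein et al. (1980):
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    cross = 0.0
    n = len(freqs)
    for i in range(n - 1):
        for j in range(i + 1, n):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# Illustrative allele frequencies at a single locus (must sum to 1).
example_freqs = [0.5, 0.3, 0.2]
print(round(gene_diversity(example_freqs), 4))
print(round(pic_botstein(example_freqs), 4))
```

The second value is never larger than the first, which is consistent with the PIC values in Table 1 being slightly lower than the gene diversity values.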
A model-based (Bayesian) method was used to assess
the number of subpopulations among all accessions
with the software STRUCTURE 2.2 (Pritchard et al.,
2000). Structure was analyzed by means of k-values (an
assumed fixed number of subpopulations) from 1 to 10
in the entire population. Six independent analyses were
conducted for each k-value with the burn-in time and
replication number set at 5 × 10⁵ and 1 × 10⁵, respectively.
All accessions were assigned to a subpopulation according to the largest probability estimated by the program.
All subpopulations derived from either cluster analysis
or model-based structure analysis were also evaluated
by principal coordinate analysis (PCoA) implemented
by the MVSP 3.13 program (Kovach, 1998). An analysis of molecular variance (AMOVA) (Weir, 1996) was
conducted by Arlequin v. 3.11 (Excoffier et al., 2005) to
partition the variation among and within clusters. The
threshold for statistical significance was determined by
running 1000 permutations.
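Assigning each accession to the subpopulation with the largest estimated membership probability amounts to taking the maximum of each row of the membership matrix. A minimal sketch follows; the accession labels and coefficients are invented, and the function `assign_subpopulations` is a hypothetical helper, not part of STRUCTURE.

```python
def assign_subpopulations(q_matrix):
    """Assign each accession to its most probable subpopulation.

    q_matrix: dict mapping accession name -> list of membership
    coefficients (one per subpopulation, summing to about 1).
    Returns dict: name -> (1-based subpopulation index, coefficient).
    """
    assignments = {}
    for name, coeffs in q_matrix.items():
        best = max(range(len(coeffs)), key=lambda k: coeffs[k])
        assignments[name] = (best + 1, coeffs[best])
    return assignments


# Hypothetical membership coefficients for k = 4.
toy_q = {
    "line_A": [0.82, 0.10, 0.05, 0.03],
    "line_B": [0.05, 0.15, 0.70, 0.10],
    "line_C": [0.30, 0.28, 0.22, 0.20],  # admixed: assigned, but weakly
}
assignments = assign_subpopulations(toy_q)
for name, (subpop, coeff) in assignments.items():
    print(name, subpop, coeff)
```

Keeping the winning coefficient alongside the assignment makes weakly assigned accessions easy to spot, such as an accession placed in a group with a low membership coefficient.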
One hundred eighty-six marker loci that were
assembled in a consensus map, mainly based on linkage
information from the Ta-SSR-2004 map, were used for
genome-wide LD analysis (Supp. Table 2). Pairwise LD
was calculated by TASSEL 1.9.4 (http://www.maize
genetics.net; verified 11 Aug. 2010) (Bradbury et al.,
2007). The comparison-wise significance was computed
by 1000 permutations. LD was estimated separately for
each chromosome, then the genome-wide LD decay
was estimated by plotting the pairwise r² against the
genetic distance from all 21 chromosomes. A trend line
was drawn by second-degree loess (Cleveland, 1979) by
means of the statistical program R (http://www.r-project.
org; verified 11 Aug. 2010).
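For readers unfamiliar with the r² statistic, the following sketch computes it for a single pair of biallelic loci. It is a simplification and not the procedure implemented in TASSEL: SSR loci are multi-allelic and require an extension of this formula that is not shown here. The sketch also assumes that each inbred line contributes one haplotype; the function name and data are hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), coded 0 or 1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # at least one locus is monomorphic
    return d * d / denom


# Complete association: allele 1 at locus 1 always occurs with allele 1 at locus 2.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# No association: all four haplotypes are equally frequent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(r_squared(linked), r_squared(unlinked))
```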
RESULTS
Diversity of Marker Alleles
All 245 primers tested were polymorphic across 205
accessions and amplified a total of 2153 alleles at 301 loci,
confirming that most of the selected markers were highly
informative. The total number of alleles per locus varied from 2 to 25, averaging 7.2 alleles per locus and 8.8
alleles per primer (Table 1 and Fig. 1). Primer WMC644
on chromosome 2A amplified the most alleles. The average PIC across all loci was 0.54 with a range from 0.01
(BARC123) to 0.92 (GWM484). Among 137 hard wheat
accessions, 243 primers were polymorphic and amplified 1918 alleles. Only two primers (BARC85 and BYAGi)
were monomorphic (Supp. Table 2 and Fig. 1). The number of alleles per locus ranged from 1 to 22, averaging
6.4 alleles per locus across all hard wheat accessions.
Among 68 soft wheat accessions, 242 primers showed
polymorphism and amplified 1574 alleles. Three primers
(BARC123, CFD106, and PinaD1) were monomorphic.
The number of alleles per locus ranged from 2 to 21 with
an average of 5.2 (Table 1).
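The per-locus and per-primer averages differ because some primers amplified more than one locus. The arithmetic can be checked directly from the counts reported above. The sketch below uses only those counts; the per-locus value is recomputed for the total collection only, because locus counts for the hard and soft groups are not given in the text.

```python
# Counts reported in the text and in Table 1.
total_alleles = {"Total": 2153, "Hard wheat": 1918, "Soft wheat": 1574}
n_primers = 245
n_loci_total = 301  # reported for the whole collection only

# Mean alleles per primer for each group.
per_primer = {group: round(count / n_primers, 1) for group, count in total_alleles.items()}

# Mean alleles per locus for the whole collection.
per_locus_total = round(total_alleles["Total"] / n_loci_total, 1)

print(per_primer)
print(per_locus_total)
```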
Four SSR markers for the aluminum tolerance gene
ALMT1 (WMC331, GDM125, SSR3a, and SSR3b) were
screened across 205 accessions. SSR3a amplified 15 alleles
in hard wheat and nine in soft wheat, indicating high
genetic variation for this gene marker sequence in the
U.S. wheat germplasm. SSR primer WMC331 amplified
six alleles in hard wheat and three in soft wheat. A high
level of polymorphism was also found for other markers
linked to specific genes or QTLs including Sr2-x3b061c22
(a marker for stem rust resistance gene Sr2), SWM10 (a
marker for a leaf rust resistance gene), csLV34-Lr34 (a
marker for leaf rust resistance gene Lr34), GWM526 (a
marker for Barley yellow dwarf virus resistance gene Bdv2),
GWM533 (a marker for a major wheat Fusarium head
blight resistance QTL FHB1), GWM261 (a marker for the
reduced plant height gene Rht8), GWM111 [a marker for the
Russian wheat aphid, Diuraphis noxia (Mordvilko),
resistance gene Dn], GDM33 (a marker for Lr21), CFD132 (a
marker for a Hessian fly resistance gene H13), BARC321
(a marker for seed dormancy), BARC164 (a marker for an
aluminum tolerance QTL), and WMC474 (a marker for
stem rust resistance gene Sr40) (Supp. Table 2).
Table 1. Summary of allelic diversity for 205 U.S. wheat accessions.
                              Total       Hard wheat   Soft wheat
Sample size                   205         137          68
No. of polymorphic markers    245         243          242
No. of alleles scored         2153        1918         1574
Average no. of alleles†       7.2 (8.8)   6.4 (7.8)    5.2 (6.4)
Gene diversity                0.57        0.55         0.54
PIC‡                          0.54        0.51         0.50
† Mean allele number per marker locus; value in parentheses is mean allele number per primer.
‡ PIC: Polymorphism information content.
Figure 1. Frequency distribution of number of markers in each allele category across hard wheat (Hard), soft wheat (Soft), and combined hard and soft (Total) accessions.
Genetic Relationship among the Accessions
Distance-based cluster analysis clearly identified two
major clusters corresponding to their geographic origins and market classes, soft vs. hard (Fig. 2). Within the
hard wheat class, accessions could be classified into three subclusters of 23, 52, and 62 accessions (Fig. 2). The first
two subclusters primarily contained accessions from the
southern and central Great Plains states, with 20 of 23
and 45 of 52 accessions from Kansas, Oklahoma, Texas,
and Colorado. ‘Jagger’, ‘Ogallala’, and ‘Trego’ were the
prevailing parents in these subclusters. These parents
respectively appeared in the pedigrees of 18, six, and
five of 75 accessions. The third subcluster contained
accessions originating throughout the Great Plains and
could be further divided into two groups with 29 accessions in Group 1 from the southern Great Plains and 33
accessions in Group 2 from the two northern states of
Nebraska and South Dakota (Fig. 2). ‘Wesley’ was a frequent parent in these two states, appearing in the pedigrees of eight accessions.
Soft wheat accessions were classified into three
subclusters. The first subcluster consisted of 35 accessions from 13 of the 14 soft wheat states. The other two
subclusters contained accessions almost all from four
midwestern states (Ohio, Indiana, Illinois, and Missouri)
with 13 of 14 accessions in the second group and 15 of
19 accessions in the third group from these four states.
About 75% of the Indiana accessions (11 out of 14) and
86% of the Ohio accessions (6 out of 7) were grouped in
these two subclusters (Fig. 2).
Population Structure
The number of subpopulations was determined by
model-based structure analysis with model parameter
k from 1 to 10. The maximum likelihood values showed
a typical curvilinear response to increasing k, such that
k = 4 (four subpopulations) was defined to provide the
optimal structure for further analysis. Similar to cluster
analysis, structure analysis clearly separated soft wheat
from hard wheat accessions. Almost all soft wheat accessions (67 of 68) were classified into one group with the
only exception of Atlas 66, which was structured into the
hard group with the lowest membership coefficient. Hard
wheat accessions were further divided into three subpopulations: two south-central hard subpopulations and one north hard subpopulation (Fig. 3).
The two south-central hard wheat subpopulations consisted
of 41 and 39 accessions. The first subpopulation had 30
accessions from Colorado, Kansas, Oklahoma, and Texas
with 21 from Oklahoma alone, and the second subpopulation contained 34 accessions from the same four states
with 21 accessions from Kansas alone. Thirty-seven
accessions were grouped in the north hard subpopulation
with 19 accessions from Nebraska and seven from South
Dakota (Fig. 3).
Similar to structure analysis, the first principal coordinate (PCo1) also clearly separated hard wheat from soft
wheat accessions. Among the three hard wheat subpopulations, the south-central hard subpopulation (majority
of accessions from Kansas) could be singled out as one
subpopulation, whereas the other two showed mixed
geographic origin (Fig. 4).
AMOVA attributed 10% of the total genetic variation
to differences among subclusters and 90% to variation within
subclusters in the distance-based analysis. Similar variance
partitioning was obtained among the groupings on the
basis of the structure (model-based) analysis (Table 2).
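The percentages of variation follow directly from the variance components listed in Table 2: each component is divided by the sum of the components. The sketch below repeats that calculation. Because the published components are rounded to two decimals, the recomputed percentages can differ from the table in the last digit.

```python
# Variance components as listed in Table 2.
amova_components = {
    "distance-based": {"among": 7.38, "within": 67.83},
    "model-based": {"among": 7.89, "within": 67.75},
}


def percent_variation(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {source: 100.0 * value / total for source, value in components.items()}


amova_percent = {method: percent_variation(c) for method, c in amova_components.items()}
for method, pct in amova_percent.items():
    print(method, {source: round(value, 2) for source, value in pct.items()})
```

In both groupings roughly one tenth of the variation lies among groups, which is the basis for the 10% versus 90% split described above.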
Figure 2. Dendrogram showing the relationships among 137 hard wheat accessions and 68 soft wheat accessions as revealed by cluster analysis based on Nei (1973) genetic distance. The colors on the map show the geographical distribution of subgroups estimated from hard and soft wheat on the basis of their genetic distance. The number in the map indicates the total number of accessions collected from the state. The digits on the cluster tree represent the accessions corresponding to Supp. Table 1.
Figure 3. Four subpopulations inferred from structure analysis. The vertical coordinate of each subpopulation indicates the membership coefficients for each individual, and the digits on the horizontal coordinate represent the accessions corresponding to Supp. Table 1.
Linkage Disequilibrium
A genome-wide linkage map was assembled on the basis
of a previously reported map (Somers et al., 2004)
(http://wheat.pw.usda.gov/ggpages/map_shortlist.html; verified 11 Aug. 2010). A total of 186 marker loci (60 markers
from the A genome, 65 from the B genome, and 61 from
the D genome) was assembled in the linkage map for
LD detection. In the map, 62 and 41 genome-wide LD
blocks were identified at P < 0.01 and P < 0.0001 (Fig.
5), respectively. Most LD blocks (79.0%) were <10 cM in
genetic distance. Long-distance LD blocks were identified on three chromosomes (1D, 2A, and 6A); the longest (>40 cM) was on chromosome 6A. LD blocks were
often detected in regions encompassing the centromere,
especially on chromosomes 4D, 5A, and throughout the
B genome (Fig. 5). Combined analysis of all pairwise
r² values from the 21 chromosomes indicated that the r² values
declined rapidly to 0.1 within 10 cM (Fig. 6).
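One simple way to read a decay distance from pairwise data is to average r² within distance bins and report the first bin whose mean falls below the chosen threshold. The sketch below does this with binned means as a stand-in for the loess trend line used in this study, so it is an approximation rather than a reimplementation. The function name, bin width, and data points are assumptions made for illustration.

```python
def decay_distance(pairs, threshold=0.1, bin_width=5.0):
    """Estimate where mean r^2 first drops below a threshold.

    pairs: list of (genetic_distance_cM, r2) for marker pairs.
    Returns the lower edge (cM) of the first distance bin whose mean r^2
    is below the threshold, or None if no bin qualifies.
    """
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_width), []).append(r2)
    for idx in sorted(bins):
        mean_r2 = sum(bins[idx]) / len(bins[idx])
        if mean_r2 < threshold:
            return idx * bin_width
    return None


# Hypothetical (distance, r^2) pairs, not data from this study.
toy_pairs = [(1, 0.60), (3, 0.30), (6, 0.15), (8, 0.12),
             (11, 0.08), (14, 0.05), (22, 0.02)]
print(decay_distance(toy_pairs))
```

The result depends on the bin width, which is one reason a smoothed trend line is often preferred for real data.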
DISCUSSION
Genetic Diversity in Modern U.S. Winter Wheat
The wheat accessions used in this study represent the
diversity in the most advanced breeding stages of major
publicly and privately operated variety development programs from the major hard winter wheat and soft winter
wheat regions (Fig. 2) and encompass the marketable gene
pools available in both regions. Therefore, information on
genetic diversity, population structure, and LD in this collection should provide relevant guidelines for designing
new breeding strategies for wheat cultivar improvement
and assembly of association mapping populations.
Compared with previous reports, the genetic diversity
discovered across 205 U.S. hard and soft wheat accessions in this study was high, as reflected in gene diversity
value (0.54) and an average number of alleles per locus (7.2).
Other researchers reported averages of 4.2 to 6.9 alleles
per locus (Plaschke et al., 1995; Dreisigacker et al., 2004;
Breseghello and Sorrells, 2006; Maccaferri et al., 2005)
among varied sources of wheat collections. Chao et al.
(2007) reported a level of genetic diversity similar to that
in this study by surveying 42 mapping parents from all
major U.S. wheat breeding programs. These mapping
parents included cultivars from hard winter, soft winter,
and hard spring wheat classes, as well as some hallmark
unadapted germplasm. Similar magnitudes of genetic
diversity between wheat collections from hard and soft
winter wheat regions alone versus from all U.S. wheat
growing areas indicate no erosion of diversity within
gene pools tied to a specific market class. This result is
not unexpected given the extensive sharing of germplasm
among U.S. breeding programs across different regions.
However, higher levels of diversity have been reported
elsewhere. Roussel et al. (2004) evaluated 559 French
wheat accessions with 42 SSR markers and identified
14.5 alleles per locus and a PIC value of 0.66. Huang et al.
(2002) studied a world collection of 998 accessions and
found a gene diversity value of 0.77 and 18.1 alleles per
locus at 26 SSR loci. It is possible that the higher genetic
diversity in these studies resulted from use of a highly
selective set of markers with greatest PIC. However, when
markers common to both sets of studies were compared,
the genetic diversity parameters from their studies were
still significantly higher than our results. For example,
the average number of alleles and PIC in Roussel et al.
(2004) were 16.1 and 0.72, respectively, compared with
10.3 and 0.69 in this study based on 14 common primers.
For the 11 markers shared with Huang et al. (2002), this
study detected about eight fewer alleles per primer: Huang
et al. (2002) reported an average of 17.4 alleles per primer
and a gene diversity of 0.76, versus 9.3 alleles and a gene
diversity of 0.69 in this study. These studies also included collections from
wider geographic areas, especially the report by Huang et
al. (2002). In addition, researchers using DArT markers
recently reported a lower level of genetic diversity within
U.S. wheat in comparison with Australian wheat (White
et al., 2008). Therefore, continuous introduction of wheat
germplasm from other countries into U.S. breeding programs has potential to enhance the genetic diversity of
U.S. breeding materials.
Figure 4. Principal coordinate analysis (PCoA) of 205 U.S. wheat accessions. The different colors represent the four subpopulations inferred by structure analysis. Hard was clearly separated from soft by the first principal component, whereas south-central Hard and north Hard could be roughly separated by the second principal component.
Table 2. Summarized result of AMOVA of subpopulations generated by the model-based method (STRUCTURE 2.2) and distance-based method (Nei, 1973).
Cluster method                       Source of variation   df    Sum of squares   Variance components   Percentage of variation
Distance based (Cluster, Nei 1973)   among populations     5     1522.25          7.38                  9.82
                                     within populations    199   13498.94         67.83                 90.18
                                     total                 204   15021.19         75.21                 100.0
Model based (STRUCTURE)              among populations     3     1403.35          7.89                  10.43
                                     within populations    201   13617.84         67.75                 89.57
                                     total                 204   15021.19         75.64                 100.0
Hard wheat appeared to have slightly higher genetic
diversity than soft wheat on the basis of average number
of alleles per locus or degree of dispersion based on population structure and PCoA. The soft wheat accessions
averaged 5.2 alleles per locus, whereas hard wheat averaged 6.4 alleles per locus. About one third of the soft
wheat accessions had ≤3 alleles per locus, whereas most
hard wheat accessions had ≥4 alleles per locus. The lower
genetic diversity in soft wheat was also reflected in fewer
marker alleles (4%) that were specific to the soft wheat
class. A previous study revealed a similar low number of
alleles per locus (4.8) in eastern U.S. soft winter wheat
lines (Breseghello and Sorrells, 2006). Relatively lower
polymorphism among soft wheat accessions than hard
wheat was also observed for some important gene markers, such as Sr2-X3b061c22 and SSR3a. The narrower
latitudinal distribution of soft wheat (Fig. 2) may partly
explain the fewer alleles per locus observed in this study,
as less sequence variation might be needed for adaptation
to a narrower latitude area. In addition, structure analysis resulted in three subpopulations in hard wheat, but
only one in soft wheat, which also suggests lower polymorphism among soft wheat than hard wheat. However,
the smaller sample size of the soft wheat collection may
underrepresent the soft wheat gene pool.
Figure 5. Wheat consensus map assembled from 186 SSR markers and genome-wide distribution of LD of 205 U.S. wheat accessions. Red: P < 0.0001; Green: P < 0.001; Blue: P < 0.01; No fill: P > 0.01. Centromere region for each chromosome is marked with an ellipse. *Markers used for structure analysis.
Figure 6. Genome-wide LD (r²) distribution against the genetic distance. The simulation trend line within 20 cM showed that the r² declined to below 0.1 within 10 cM.
Genetic Relationship and Population Structure
Population structure analysis enables understanding
genetic diversity in a given collection and identifying the
appropriate population for association mapping. In this
study, all wheat accessions were consistently separated,
whether by cluster, structure, or PCo analyses, according
to both grain texture (hard versus soft) and geographical region (Fig. 2 and Fig. 3). Grain texture, a major
delineator for U.S. wheat market classes, is controlled
by puroindoline genes at two loci on chromosome 5DS,
but these genes alone are unlikely to be the key factor by which the
collection was grouped (Hogg et al., 2004). One of the
sources of divergence between soft and hard wheat can
be traced back to their entirely different founder populations. The hard red winter wheat was primarily founded
on Turkey Red and its derivatives, whereas soft red winter wheat was
derived from the landrace Mediterranean and a number
of other Northern European landraces (Cox et al., 1986).
In addition, hard and soft wheat cultivars are typically
grown in distinct ecogeographic zones featuring different climatic patterns for optimum grain production,
thereby requiring different genes for localized adaptation. Regional adaptation may be another primary determinant for major group separation, as reported in several
studies on different wheat classes and traits (Bai et al.,
2003; Maccaferri et al., 2005; White et al., 2008). Therefore, the population from the USA used in this study was
also highly structured according to geographic origin or
class, and hard and soft wheat accessions should be analyzed separately in an association study.
In this study, hard wheat accessions were consistently
divided into three subpopulations in both distance-based
cluster analysis and model-based structure analysis.
Within each subgroup, some cultivars or lines were frequently used as parents in different breeding programs.
For example, Jagger, Ogallala, and Trego could be found
in many accessions from the southern and central Great
Plains breeding programs, and Wesley could be found in
many accessions from Nebraska and South Dakota breeding programs. Jagger has many desirable fitness traits such
as resistance to aluminum toxicity, stripe rust (caused
by Puccinia striiformis), Soil-borne wheat mosaic virus,
Wheat spindle streak mosaic virus, tan spot [caused by
Pyrenophora tritici-repentis (Died.) Drechs], and speckled leaf
blotch (caused by Septoria tritici Roberge), along with broad
adaptation across the southern Great Plains
(http://www.ars-grin.gov/; verified 11 Aug. 2010), any one of which
would contribute to its popularity in southern Great Plains
breeding programs. Similarly, Wesley has desirable breadmaking quality, resistance to a series of diseases, high yield
potential, and good adaptation in the north central Great
Plains. These results suggested that breeding activities
with different objectives in different geographic regions
had a significant impact on the genetic structure of
breeding populations.
Genome-Wide Linkage Disequilibrium
Genome-wide LD and the magnitude of its distribution
across genomes and chromosomes may significantly
affect the power of association mapping and effectiveness
of marker-assisted breeding (Sorrells and Yu, 2009). Usually, a higher level of LD is expected in hexaploid wheat
than in maize and sorghum because of the rapid rate
of inbreeding in wheat with a high degree of self-pollination. In this study, genome-wide LD was reflected by
r²-values that abruptly declined to 0.1 within 10 cM when
all mapped loci were analyzed (Fig. 6). The LD estimated
in this study was similar to that reported by Chao et al.
(2007) but still larger than that reported by Somers et
al. (2007) for Canadian hard red spring wheat breeding
lines. Therefore, a high level of LD in many chromosome
regions of the population suggests that association
mapping with SSR markers can be an effective method
for QTL identification and validation in these regions.
Marker-assisted selection can also be an effective tool for
integrating these QTL into elite breeding lines. However,
varied levels of LD were observed across chromosomes
or genomes. Low LD observed in the majority of each
genome suggests that map resolution can be improved in
many important regions of chromosomes by association
mapping, especially for chromosomes 4D, 5A, and all B
chromosomes where LD blocks were mainly distributed
near the centromere (Devos and Gale, 2000). Sandhu
and Gill (2002) reported that gene density was low in the
centromere region but high in the distal part of chromosomes. Thus, it is possible to improve map resolution
through association mapping in these gene-rich regions,
especially as more single-nucleotide polymorphism
(SNP) markers become available in wheat.
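As an illustration of the r² statistic discussed above, the following is a minimal sketch for a single pair of loci. It assumes fully inbred lines and biallelic markers coded 0 or 1, which is a simplification: the SSR markers in this study are multi-allelic, and this sketch is not the procedure of the software used here. The function name `r_squared` is hypothetical.

```python
import math
from typing import Sequence


def r_squared(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Squared allele-frequency correlation (r^2) between two biallelic loci.

    Each sequence holds one 0/1 allele code per inbred line (assumed coding).
    Returns NaN when either locus is monomorphic, because r^2 is undefined.
    """
    if len(locus_a) != len(locus_b) or len(locus_a) == 0:
        raise ValueError("loci must be non-empty and of equal length")
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    # Frequency of lines carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan
    return d * d / denom


# Toy data only, not values from this study
print(r_squared([0, 1, 0, 1], [0, 1, 0, 1]))  # complete LD
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # no LD
```

Multi-allelic markers would require either recoding each allele as present or absent or a weighted average over allele pairs, which this sketch does not attempt.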
Genetic distance used for LD calculations in this
study was based on the consensus map from mapping
populations (Somers et al., 2004), and thus the distances
may not reflect actual genetic distances in the population. Further, actual LD may differ among populations
and needs to be evaluated for each population on a case-by-case basis (Breseghello and Sorrells, 2006). Some alien
chromosome translocations may also affect LD in the
population. High-density SNP maps, which would be
ideal for LD estimation, are not yet available in wheat
because the whole genome sequence is unavailable
and the wheat genome is complex. In practice, nearly all LD studies in wheat are based on
genetic distances in the SSR consensus map (Somers et
al., 2004). The consensus map has proven useful for linkage mapping of many important genes and QTL
(Arbelbide and Bernardo, 2006; Pushpendra et al., 2007;
Santra et al., 2008), and although genetic distance among
markers may vary between populations, it is still the
most appropriate map currently available for LD analysis
in wheat. Therefore, the LD estimates from this study
provide a valuable addition to the current literature on
LD in wheat. A more precise estimation can be generated
when a high-resolution SNP map is available.
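The dependence of an LD decay estimate on map distances can be made concrete with a small sketch. It assumes pairwise r² values have already been computed and paired with consensus-map distances in cM, and it uses simple distance bins rather than a fitted smoothing curve, so it is a stand-in and not the procedure used in this study. The function name `ld_decay_distance`, the bin width, and the toy values are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def ld_decay_distance(
    pairs: Iterable[Tuple[float, float]],
    bin_width: float = 5.0,
    threshold: float = 0.1,
) -> Optional[float]:
    """Lower edge (cM) of the first distance bin whose mean r^2 is below threshold.

    `pairs` holds (map distance in cM, r^2) for syntenic marker pairs.
    Returns None if no bin falls below the threshold.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        index = int(distance // bin_width)  # bin by consensus-map distance
        sums[index] += r2
        counts[index] += 1
    for index in sorted(counts):
        if sums[index] / counts[index] < threshold:
            return index * bin_width
    return None


# Toy (distance, r^2) pairs, not data from this study
toy_pairs = [(1, 0.6), (3, 0.4), (6, 0.2), (8, 0.15),
             (11, 0.05), (14, 0.08), (22, 0.02)]
print(ld_decay_distance(toy_pairs))  # 10.0 with 5-cM bins
```

Because the bins are defined on consensus-map positions, any difference between the consensus map and the true genetic distances in a given population shifts the estimated decay distance directly.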
In summary, this study demonstrated that modern
breeding practices still maintain reasonable genetic
diversity in major U.S. winter wheat gene pools. However, introduction of useful genes from other countries
or gene pools may broaden U.S. wheat genetic diversity.
Two subpopulations, hard vs. soft, were clearly structured, matching their market classification
and geographic distribution. Higher genetic diversity in
hard winter wheat than in soft winter wheat suggests that
gene flow from hard wheat to soft may enhance genetic
diversity of the soft gene pool. LD analysis identified significant LD blocks, especially in centromere regions, and
low-density markers in these regions may be sufficient
for association mapping and marker-assisted selection.
However, the majority of each genome has low LD, especially in many gene-rich regions, and thus association
mapping in these regions has potential to significantly
improve map resolution for cloning many important
genes when higher density marker coverage with new
marker systems such as SNP is available in wheat.
Acknowledgments
This project is partly funded by the NRI of the USDA CSREES, CAP grant
number 2006-55606-16629. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S.
Department of Agriculture. Contribution no. 10-166-J from the Kansas
Agricultural Experiment Station, Manhattan, KS.
References
Arbelbide, M., and R. Bernardo. 2006. Mixed-model QTL mapping for
kernel hardness and dough strength in bread wheat. Theor. Appl.
Genet. 112:885–890.
Bai, G., P. Guo, and F.L. Kolb. 2003. Genetic relationships among head
blight resistant cultivars of wheat assessed on the basis of molecular
markers. Crop Sci. 43:498–507.
Botstein, D. 1980. A theory of modular evolution for bacteriophages. Ann.
N.Y. Acad. Sci. 354:484–490.
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and
E.S. Buckler. 2007. TASSEL: Software for association mapping of
complex traits in diverse samples. Bioinformatics 23:2633–2635.
Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel
size and milling quality in wheat (Triticum aestivum L.) cultivars.
Genetics 172:1165–1177.
Chao, S., W. Zhang, J. Dubcovsky, and M. Sorrells. 2007. Evaluation of
genetic diversity and genome-wide linkage disequilibrium among
U.S. wheat (Triticum aestivum L.) germplasm representing different
market class. Crop Sci. 47:1018–1030.
Cleveland, W.S. 1979. Robust locally weighted regression and smoothing
scatterplots. J. Am. Statist. Assoc. 74:829–836.
Cox, T.S., J.P. Murphy, and D.M. Rodgers. 1986. Changes in genetic diversity in the red winter wheat regions of the United States. Proc. Natl.
Acad. Sci. USA 83:5583–5586.
Devos, K.M., and M.D. Gale. 2000. Genome relationships: The grass
model in current research. Plant Cell 12:637–646.
Dreisigacker, S., P. Zhang, M.L. Warburton, M.V. Ginkel, D. Hoisington,
M. Bohn, and A.E. Melchinger. 2004. SSR and pedigree analyses of
genetic diversity among CIMMYT wheat lines targeted to different
megaenvironments. Crop Sci. 44:381–388.
Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An
integrated software package for population genetics data analysis.
Evol. Bioinform. Online 1:47–50.
Flint-Garcia, S.A., J.M. Thornsberry, and E.S.T. Buckler. 2003. Structure of
linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54:357–374.
Hogg, A.C., T. Sripo, B. Beecher, J.M. Martin, and M.J. Giroux. 2004.
Wheat puroindolines interact to form friabilin and control wheat
grain hardness. Theor. Appl. Genet. 108:1089–1097.
Hoisington, D., M. Khairallah, T. Reeves, J.M. Ribaut, B. Skovmand, S.
Taba, and M. Warburton. 1999. Plant genetic resources: What can
they contribute toward increased crop productivity? Proc. Natl.
Acad. Sci. USA 96:5937–5943.
Huang, Q., A. Borner, S. Roder, and W. Ganal. 2002. Assessing genetic
diversity of wheat (Triticum aestivum L.) germplasm using microsatellite markers. Theor. Appl. Genet. 105:699–707.
Kim, S., V. Plagnol, T.T. Hu, C. Toomajian, R.M. Clark, S. Ossowski,
J.R. Ecker, D. Weigel, and M. Nordborg. 2007. Recombination
and linkage disequilibrium in Arabidopsis thaliana. Nat. Genet.
39:1151–1155.
Kovach, W.L. 1998. MVSP: A Multivariate Statistical Package for Windows. Release 3.0. Kovach, W.L., Pentraeth, Wales, UK.
Liu, K., and S.V. Muse. 2005. PowerMarker: An integrated analysis environment for genetic marker analysis. Bioinformatics 21:2128–2129.
Maccaferri, M., M.C. Sanguineti, V. Natoli, J.L. Ortega, M.B. Salem, J.
Bort, C. Chenenaoui, E. De Ambrogio, L.G. del Moral, and A. De
Montis. 2006. A panel of elite accessions of durum wheat (Triticum
durum Desf.) suitable for association mapping studies. Plant Genet.
Resour. 4:79–85.
Maccaferri, M., M.C. Sanguineti, E. Noli, and R. Tuberosa. 2005. Population structure and long-range linkage disequilibrium in a durum
wheat elite collection. Mol. Breed. 15:271–289.
Matthews, D.E., V.L. Carollo, G.R. Lazo, and O.D. Anderson. 2003.
GrainGenes, the genome database for small-grain crops. Nucleic
Acids Res. 31:183–186.
Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proc.
Natl. Acad. Sci. USA 70:3321–3323.
Plaschke, J., M.W. Ganal, and M.S. Röder. 1995. Detection of genetic
diversity in closely related bread wheat using microsatellite markers.
Theor. Appl. Genet. 91:1001–1007.
Pritchard, J.K., M. Stephens, and P. Donnelly. 2000. Inference of population
structure using multilocus genotype data. Genetics 155:945–959.
Pushpendra, K.G., S.B. Harindra, L.K. Pawan, K. Neeraj, K. Ajay, R.M.
Reyazul, M. Amita, and K. Jitendra. 2007. QTL analysis for some
quantitative traits in bread wheat. J. Zhejiang Univ. Sci. B 8:807–814.
Roder, M.S., V. Korzun, K. Wendehake, J. Plaschke, M.H. Tixier, P. Leroy,
and M.W. Ganal. 1998. A microsatellite map of wheat. Genetics
149:2007–2023.
Roussel, V., J. Koenig, M. Beckert, and F. Balfourier. 2004. Molecular
diversity in French bread wheat accessions related to temporal
trends and breeding programmes. Theor. Appl. Genet. 108:920–930.
Saghai-Maroof, M.A., K.M. Soliman, R.A. Jorgensen, and R.W. Allard.
1984. Ribosomal DNA spacer-length polymorphisms in barley:
Mendelian inheritance, chromosomal location, and population
dynamics. Proc. Natl. Acad. Sci. USA 81:8014–8018.
Sandhu, D., and K.S. Gill. 2002. Gene-containing regions of wheat and
the other grass genomes. Plant Physiol. 128:803–811.
Santra, D.K., X.M. Chen, M. Santra, K.G. Campbell, and K.K. Kidwell.
2008. Identification and mapping QTL for high-temperature
adult-plant resistance to stripe rust in winter wheat (Triticum aestivum L.) cultivar ‘Stephens’. Theor. Appl. Genet. 117:793–802.
Somers, D.J., T. Banks, R. Depauw, S. Fox, J. Clarke, C. Pozniak, and C.
McCartney. 2007. Genome-wide linkage disequilibrium analysis in
bread wheat and durum wheat. Genome 50:557–567.
Somers, D.J., P. Isaac, and K. Edwards. 2004. A high-density microsatellite consensus map for bread wheat (Triticum aestivum L.). Theor.
Appl. Genet. 109:1105–1114.
Song, B.H., A.J. Windsor, K.J. Schmid, S. Ramos-Onsins, M.E. Schranz,
A.J. Heidel, and T. Mitchell-Olds. 2009. Multilocus patterns of
nucleotide diversity, population structure and linkage disequilibrium in Boechera stricta, a wild relative of Arabidopsis. Genetics
181:1021–1033.
Song, Q.J., J.R. Shi, S. Singh, E.W. Fickus, J.M. Costa, J. Lewis, B.S. Gill, R.
Ward, and P.B. Cregan. 2005. Development and mapping of microsatellite (SSR) markers in wheat. Theor. Appl. Genet. 110:550–560.
Sorrells, M.E., and J. Yu. 2009. Linkage disequilibrium and association
mapping in the Triticeae. p. 655–683. In C. Feuillet and G.J. Muehlbauer (ed.) Genetics and genomics of the Triticeae, Vol. 7. Springer.
Tenaillon, M.I., M.C. Sawkins, A.D. Long, R.L. Gaut, J.F. Doebley, and
B.S. Gaut. 2001. Patterns of DNA sequence polymorphism along
chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad.
Sci. USA 98:9161–9166.
Weir, B.S. 1996. Genetic data analysis II. Sinauer Associates, Inc., Sunderland, MA.
White, J., J.R. Law, I. MacKay, K.J. Chalmers, J.S. Smith, A. Kilian, and
W. Powell. 2008. The genetic diversity of UK, US and Australian
cultivars of Triticum aestivum measured by DArT markers and
considered by genome. Theor. Appl. Genet. 116:439–453.