
Supporting information for
Safeguarding our genetic resources with libraries of doubled-haploid lines
Albrecht E. Melchinger*, Pascal Schopp*, Dominik Müller*, Tobias A. Schrag*, Eva Bauer†,
Sandra Unterseer†, Linda Homann*, Wolfgang Schipprack*, Chris-Carolin Schön†
*Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim,
Fruwirthstraße 21, 70593 Stuttgart, Germany
†Technische Universität München, TUM School of Life Sciences Weihenstephan,
Liesel-Beckmann-Straße 2, 85354 Freising, Germany
Correspondence should be sent to:
Prof. Dr. A. E. Melchinger
University of Hohenheim
Institute of Plant Breeding, Seed Science and Population Genetics
Fruwirthstraße 21
70593 Stuttgart, Germany
E-mail: [email protected]
or
Prof. Dr. Chris-Carolin Schön
Technische Universität München, Plant Breeding
Liesel-Beckmann-Straße 2
85354 Freising, Germany
E-mail: [email protected]
Supporting information Melchinger et al
CONTENTS
Supplemental Figures
Figure S1 Working steps in production of doubled-haploid lines.
Figure S2 Gene diversity in the original landraces and the DH libraries derived from them.
Figure S3 Tests for Hardy-Weinberg equilibrium in the original landraces.
Figure S4 Comparison of allele frequencies in the original landraces and doubled-haploid
(DH) libraries derived from them.
Supplemental Tables
Table S1 Success rates in production of doubled-haploid lines from landraces versus elite
germplasm.
Table S2 Costs in production of DH lines from individual landraces.
Supplemental Notes
File S1 Process of DH production.
File S2 Permutation tests for statistics comparing two populations across multiple markers
simultaneously.
File S3 Theory for effects of hitchhiking under selection at the haploid and/or diploid stage
during the development of doubled-haploid (DH) lines.
File S4 Translation of allele designation between different SNP arrays.
Figure S1 Working steps in production of doubled-haploid lines; for details, see File S1.
Figure S2 Nei’s (1973) gene diversity Hs in (A) the original landraces (S0 generation) and (B) the doubled-haploid (DH) lines (D1 generation) derived from them, averaged across all markers in a sliding window
of 10 Mb width along the chromosomes for five European flint maize landraces (BU, GB, RT, SC, SF).
The heat map at the bottom, calculated on the basis of the 28,133 SNPs analyzed, indicates the marker
density within the window (Mb⁻¹). Centromeres are indicated by grey vertical lines. (C) Genome-wide
means of Hs values for the S0 and D1 generations.
Figure S3 Fisher’s exact test for deviations from Hardy-Weinberg equilibrium in the original landraces
(S0 generation), averaged across markers in a sliding window of 10 Mb width along the chromosomes
for five European maize landraces (BU, GB, RT, SC, SF). The heat map at the bottom, calculated on the
basis of the 28,133 SNPs analyzed, indicates the marker density within the window (Mb⁻¹). Centromeres
are indicated by grey vertical lines.
Figure S4 Allele frequencies in the original landrace (S0 generation) plotted against the
corresponding allele frequencies in the population of doubled-haploid (DH) lines (D1 generation)
derived from it for each of five European maize landraces (BU, GB, RT, SC, SF), shown for the
28,133 SNPs analyzed in this study. Allele frequencies refer to the major allele determined in the
combined data set.
Table S1 Success rates in different stages i of production of doubled-haploid (DH) lines for five European maize
landraces and elite crosses from the flint germplasm pool. For definition of stages and counts Ni in each stage, see
Figure S1.

Success rate in different stages† (%)

Source germplasm      | N2/N1 | N3/N2 | N4/N3 | N5/N4 | N6/N5 | N7/N6  | N8/N7
Landraces (LR)
Gelber Badischer (GB) | 1.38d | 77.2c | 93.2c | 90.1b | 21.5c | 60.4a  | 44.8b
Rheintaler (RT)       | 2.03c | 80.0c | 88.3d | 91.7a | 46.2a | 30.6c  | 51.8b
Strenzfelder (SF)     | 3.02b | 83.4b | 92.2c | 93.0a | 29.2b | 46.0b  | 50.4b
Satu Mare (SM)        | 2.09c | 79.5c | 93.4c | 93.7a | 20.7c | 55.5a  | 56.5b
Walliser (WA)         | 3.43a | 87.0a | 95.1b | 82.5c | 22.1c | 58.1a  | 57.1b
Mean                  | 2.39* | 81.4  | 92.4* | 90.2* | 27.9  | 50.1** | 52.1**
Elite crosses (EC)    | 2.72a | 83.3a | 97.0a | 93.3a | 26.7a | 71.0a  | 81.0a

† Values followed by the same letter are not significantly different at Bonferroni-corrected P < 0.05.
*, ** Means of the landraces and the elite crosses differed at P < 0.05 and P < 0.01, respectively.
Table S2 Production costs† of one doubled-haploid (DH) line for five European maize landraces (GB, RT, SF, SM, WA) on the basis of the success rates shown
in Table S1.

Column groups: 10³ × costs† per unit (Labor†, Consum.); units‡ required in stage i for obtaining one propagatable D1 line (GB, RT, SF, SM, WA); costs† of working step (GB, RT, SF, SM, WA).

Stage i§ | Activity from stage i to i+1 | Labor† | Consum. | Units GB | Units RT | Units SF | Units SM | Units WA | Costs GB | Costs RT | Costs SF | Costs SM | Costs WA
1 | Production of induction crosses GB | 12.02 | 9.45¶ | 2,066.23 | | | | | 41.32 | | | |
1 | Production of induction crosses RT | 15.67 | 12.24¶ | | 1,112.01 | | | | | 28.92 | | |
1 | Production of induction crosses SF | 8.37 | 6.55¶ | | | 735.26 | | | | | 10.22 | |
1 | Production of induction crosses SM | 6.12 | 4.83¶ | | | | 1,137.77 | | | | | 11.60 |
1 | Production of induction crosses WA | 10.20 | 8.05¶ | | | | | 626.85 | | | | | 10.66
2 | Identification of haploid seeds | 194.82 | 0.00 | 28.55 | 22.65 | 22.22 | 23.72 | 21.47 | 5.17 | 4.10 | 4.03 | 4.31 | 3.91
3 | Germination, colchicine treatment, transplanting to jiffy pots | 423.87 | 175.39 | 22.00 | 18.14 | 18.46 | 18.89 | 18.68 | 12.29 | 10.10 | 10.34 | 10.54 | 10.44
4 | Transplanting from greenhouse to field | 339.18 | 85.65 | 20.50 | 15.99 | 17.07 | 17.60 | 17.82 | 8.11 | 6.32 | 6.75 | 6.98 | 7.04
5 | Verification of true H/DH plants and rogueing of F1 plants | 142.65 | 0.00 | 2.04 | 1.29 | 1.18 | 1.07 | 3.11 | 0.27 | 0.17 | 0.16 | 0.15 | 0.42
6 | Isolation of H/DH plants | 215.10 | 13.52 | 18.46 | 14.71 | 15.89 | 16.53 | 14.71 | 3.93 | 3.12 | 3.38 | 3.52 | 3.12
7 | Pollination of fertile H/DH plants | 872.97 | 46.58 | 3.97 | 6.77 | 4.62 | 3.43 | 3.22 | 3.40 | 5.80 | 3.96 | 2.93 | 2.77
8 | Harvest of D1 ears | 1,395.48 | 3.00 | 2.36 | 2.04 | 2.15 | 1.93 | 1.93 | 3.12 | 2.69 | 2.77 | 2.47 | 2.45
9 | Self-pollination of D1 lines | 3.86 | 4.19¶ | 1.07 | 1.07 | 1.07 | 1.07 | 1.07 | 8.06 | 8.06 | 8.06 | 8.06 | 8.06
Total | | | | | | | | | 85.67 | 69.28 | 49.67 | 50.56 | 48.87

† Costs (in USD) are based on wages, machinery, consumables and land rent in Germany.
‡ Units refer to seeds, seedlings or plants.
§ For detailed description see Materials and Methods.
¶ Includes taxes and proportional costs for handling, travel, shipping etc. (~24% of total costs).
# Induction rate of 7.5%.
Supporting Files

File S1: Process of DH production.

The entire production process of doubled-haploid (DH) lines by the in vivo haploid method
applied in our study (Prigge and Melchinger 2012) can be subdivided into the following eight
steps (see also Figure S1):
1. Provision of seeds from induction crosses produced by emasculating plants from the
source germplasm (female parent) and pollinating them with pollen from inducer UH400
(https://plant-breeding.uni-hohenheim.de/84531); harvesting of all seeds from each
induction cross in bulk.
2. Identification of all putative haploid seeds in each induction cross by selecting seeds
that show (i) purple coloration of the aleurone to check expression of the R1-nj
marker gene and (ii) absence of a purple scutellum on the embryo.
3. Germination of putative haploid seeds in a growth cabinet at 28 °C and 90% humidity for
3 to 5 days; treating the seedlings with colchicine for 8 h after cutting their coleoptile
tips; subsequently, transplanting the seedlings into jiffy pots filled with soil and
cultivation in the greenhouse until growth stage V3 (Abendroth et al. 2011).
4. Transplanting of the surviving plants into the field.
5. Verification of genuine haploid (H) or doubled-haploid (DH) plants on the basis of visual
scoring (compared with the hybrid phenotype, the H/DH phenotype is characterized by
a shorter stature, erect and narrow leaves and reduced growth and fertility) and
rogueing of false positives (F1 plants resulting from hybrid seeds of the induction cross that
were misclassified due to absence of a purple scutellum on the embryo) before
pollination.
6. Shoot bagging and self-pollination of D0 plants that produced both silks and filled
anthers.
7. Harvest of D1 ears with seed set.
8. Growing the seeds of D1 ears ear-to-row; checking the D1 lines for phenotypic
uniformity; elimination of off-types; line multiplication by self-pollination of individual
plants in each row.
We recorded for each induction cross the number $N_i$ of units (seeds, seedlings, plants, D0
plants, D1 ears with seed set, propagated D1 lines) present in each stage ($i = 1, \ldots, 8$) and
determined the success rate for each working step $i$ as the ratio $SR_i = N_{i+1}/N_i$. Together with
the production costs for each step, the expected total production costs $TCosts$ per D1 line for
each landrace and the elite crosses were calculated as follows:

$$TCosts = \sum_{i=1}^{8} C_i \times n_i$$

Here, $C_i$ refers to all variable costs per unit (plants in induction cross, seeds, seedlings, plants
or lines) in stage $i$, and $n_i = N_i/N_8$ refers to the number of units required in working step $i$ in
order to obtain one D1 line. In stages $i = 1$ and 2, the costs $C_i$ varied among the different
source germplasm depending on the efforts required for sorting of haploid and hybrid seeds
in induction crosses due to variable expression levels of the R1-nj marker. After stage 2, costs
$C_i$ were identical for all source germplasm except for isolation of silks and pollination in the
landraces (working step 5), but the success rates differed among landraces and the elite
materials. Costs of labor per processed unit were based on long-term data gathered in the
maize breeding program of the University of Hohenheim (W. Schipprack, unpublished data,
2016) and are based on current wages and costs for consumables as well as land rent in
Germany. Costs of induction crosses and line multiplication by selfing in the winter nursery in
Chile were taken from the price lists of companies offering this service.
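
The cost calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `units_required` and `total_costs` are hypothetical, the success rates in the usage example are the GB values from Table S1, and the per-unit costs are made-up placeholders rather than values from the study.

```python
def units_required(success_rates):
    """Return n_i = N_i / N_8 for i = 1..8.

    success_rates holds SR_i = N_{i+1} / N_i for the working steps i = 1..7,
    given as fractions (not percentages).
    """
    n = [1.0]  # n_8 = N_8 / N_8 = 1 by definition
    for sr in reversed(success_rates):
        # n_i = n_{i+1} / SR_i, working backwards from the last stage
        n.append(n[-1] / sr)
    return list(reversed(n))


def total_costs(costs_per_unit, units):
    """TCosts = sum over stages of C_i * n_i."""
    return sum(c * n for c, n in zip(costs_per_unit, units))


# Usage with the GB success rates of Table S1 (converted to fractions).
sr_gb = [0.0138, 0.772, 0.932, 0.901, 0.215, 0.604, 0.448]
n_gb = units_required(sr_gb)

# Hypothetical per-unit costs C_1..C_8 in USD, for illustration only.
c_example = [0.02, 0.2, 0.6, 0.4, 0.2, 0.9, 1.4, 8.0]
print([round(x, 2) for x in n_gb])
print(round(total_costs(c_example, n_gb), 2))
```

Working backwards from the final stage keeps the definition $n_8 = 1$ explicit, so each earlier $n_i$ is simply the later one divided by the success rate of the step that connects them.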
File S2: Permutation tests for statistics comparing two populations across multiple markers
simultaneously.

For comparing two population samples for a statistic $\theta$ (e.g., the $F_{ST}$ statistic or the absolute
difference in allele frequencies) calculated from the allele frequencies at marker set $M$, the
following problems can exist:

1. The allele frequencies at different loci in $M$ are not stochastically independent. This
occurs, for example, if markers are in linkage disequilibrium.
2. The two populations differ in their population structure. In our study, the S0 and D1
generations differ in their degree of homozygosity: DH lines, subsequently referred to as
D1 lines, are completely homozygous and, hence, both parental gametes are identical,
whereas the parental gametes of S0 genotypes can be assumed to be stochastically
independent, because the S0 generation was produced by random mating within each
landrace.

A solution to Problem 1 can be obtained by a permutation test, in which the test statistic $\theta$ is
calculated as a function (in our case as the mean) of all markers in set $M$ ($M$ could be (i) a
single marker, (ii) all markers in a given bin, or (iii) all markers over the entire genome). To
obtain the distribution of $\theta$ under the null hypothesis $H_0$ (the two populations compared do
not differ in the allele frequencies at any marker in set $M$), $\theta$ is calculated for each of a large
number ($N$ = 10,000 in our study) of permutations of the genotypes from the two populations.
Comparing the observed value of $\theta$ with the distribution of $\theta$ obtained for the permutations
yields the corresponding P-value. An advantage of this test is that it can be applied irrespective
of whether the markers in set $M$ are stochastically independent or not.
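
A minimal Python sketch of such a permutation test is given below. It rests on several assumptions that are not taken from the study: genotypes are coded as counts (0, 1, 2) of a reference allele per marker, the statistic is the mean absolute difference in allele frequencies across markers, and the P-value uses the common "+1" correction. The function names are hypothetical. Whole multi-marker genotypes are shuffled between the two samples, so dependencies among markers are preserved in every permutation.

```python
import random


def allele_freq(genotypes):
    """Reference-allele frequency per marker for diploid genotypes coded 0/1/2."""
    n = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(g[m] for g in genotypes) / (2 * n) for m in range(n_markers)]


def mean_abs_diff(pop_a, pop_b):
    """Mean absolute allele-frequency difference across all markers in the set."""
    fa, fb = allele_freq(pop_a), allele_freq(pop_b)
    return sum(abs(a - b) for a, b in zip(fa, fb)) / len(fa)


def permutation_p_value(pop_a, pop_b, statistic=mean_abs_diff,
                        n_perm=10000, seed=0):
    """One-sided permutation P-value for the observed statistic."""
    rng = random.Random(seed)
    observed = statistic(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    n_a = len(pop_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle individuals, not single markers, to keep marker dependencies.
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # "+1" correction (an assumption here) avoids a P-value of exactly zero.
    return (hits + 1) / (n_perm + 1)


# Usage: two small samples with three markers each.
pop_a = [[0, 0, 0]] * 10
pop_b = [[2, 2, 2]] * 10
print(permutation_p_value(pop_a, pop_b, n_perm=999))
```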
A solution to Problem 2 was obtained by using so-called pseudo-S0 (PS0) individuals in the
permutation test instead of D1 lines when calculating $\theta$ for the different permutations to
obtain the distribution of $\theta$ under the null hypothesis that the S0 and D1 generations do not
differ in their allele frequencies at marker set $M$. PS0 individuals are obtained by sampling
at random, without replacement, $\lfloor 0.5 N_{D1} \rfloor$ pairs of lines from the $N_{D1}$ D1 lines, where
$\lfloor 0.5 N_{D1} \rfloor$ is the largest integer $\le 0.5 N_{D1}$. The genotype of each PS0 individual is
obtained from the genotypes of the two “parental” D1 lines used in its formation and
corresponds exactly to the genotype that would be obtained if the two gametes from which
the two D1 lines originated had been combined in the S0 generation by random mating. Thus,
the PS0 genotypes have exactly the same allele frequencies as the original D1 lines (except for
minor deviations if $N_{D1}$ is odd, because one line then remains unpaired) and have the same
population structure as the S0 generation, from which the D1 lines were generated.
Consequently, the S0 and PS0 populations can be compared in permutation tests without
complications arising from different population structure due to different degrees of
homozygosity. In our study, we used in each permutation run a new set of PS0 genotypes
obtained by random union of D1 lines for calculating $\theta$.
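
The formation of PS0 individuals can be illustrated with the following Python sketch. It assumes, purely for illustration, that each D1 line is stored as a list of 0/1 codes (one per marker) for the allele it carries in homozygous state, so that a line also represents the gamete it originated from; the function name `make_pseudo_s0` is hypothetical.

```python
import random


def make_pseudo_s0(d1_lines, rng):
    """Pair D1 lines at random without replacement to form pseudo-S0 genotypes.

    Each D1 line is a list of 0/1 allele codes per marker. The returned
    genotypes are reference-allele counts (0, 1, 2) per marker.
    """
    n_pairs = len(d1_lines) // 2  # floor(0.5 * N_D1)
    # Draw 2 * n_pairs lines without replacement; one line is left out if N_D1 is odd.
    sampled = rng.sample(d1_lines, 2 * n_pairs)
    return [
        [a + b for a, b in zip(sampled[2 * k], sampled[2 * k + 1])]
        for k in range(n_pairs)
    ]


# Usage: five D1 lines with three markers give two PS0 individuals.
d1 = [[0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 1]]
ps0 = make_pseudo_s0(d1, random.Random(42))
print(ps0)
```

Calling the function anew in every permutation run, with the same random generator, would mirror the use of a fresh set of PS0 genotypes per run.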
File S3: Theory for effects of hitchhiking under selection at the haploid and/or diploid stage
during the development of doubled-haploid (DH) lines.

Let $p_1$ and $q_1$ be the allele frequencies of alleles A and a at the A locus and let $p_2$ and $q_2$ be
the allele frequencies of alleles B and b at the B locus in the array of gametes used for
production of doubled-haploid (DH) lines, i.e., before selection. Let $p_1^*$, $q_1^*$, $p_2^*$ and $q_2^*$ be the
corresponding allele frequencies in the population of DH lines produced
from them, i.e., after selection. The latter correspond to the frequencies of genotypes AA, aa,
BB and bb, respectively, in the D1 generation in this study. Let $D$ denote the linkage
disequilibrium before selection.

We assume that the A locus is subject to selection during the DH process either already at the
haploid level (e.g., the haploid embryo does not survive) or at the diploid level (e.g., the diploid
DH plant is not fertile), whereas the B locus is selectively neutral. Let $w_1$ be the “overall” fitness
of genotypes AB and Ab at the haploid stage and genotypes AABB and AAbb at the diploid stage,
and let $w_2$ be the “overall” fitness of genotypes aB and ab or aaBB and aabb.

Then, we get the following table for the two-locus frequencies before selection:

B locus, haploid (diploid) genotype | A locus: A (AA)   | A locus: a (aa)   | Frequencies of gametes or DH lines
B (BB)                              | $p_1 p_2 + D$     | $q_1 p_2 - D$     | $p_2$
b (bb)                              | $p_1 q_2 - D$     | $q_1 q_2 + D$     | $q_2$
Frequencies of gametes/DH genotypes | $p_1$             | $q_1$             |
Fitness                             | $w_1$             | $w_2$             |
Then, the average fitness of the population at the haploid or diploid homozygous state before
selection is $\bar{w} = w_1 p_1 + w_2 q_1$.

Defining $v_1 = w_1/\bar{w}$ and $v_2 = w_2/\bar{w}$, we obtain after selection the following frequencies at the A
locus:

$$p_1^* = p_1 v_1 \quad \text{and} \quad q_1^* = q_1 v_2. \tag{1}$$

For the B locus, we get the following frequencies of DH lines after selection:

$$p_2^* = (p_1 p_2 + D) v_1 + (q_1 p_2 - D) v_2 = p_2 + D (v_1 - v_2), \tag{2}$$

$$q_2^* = (p_1 q_2 - D) v_1 + (q_1 q_2 + D) v_2 = q_2 + D (v_2 - v_1). \tag{3}$$

For the change in frequencies after selection, we get:

A locus: $$\Delta p_1 = p_1^* - p_1 = p_1 (v_1 - 1) \tag{4}$$

B locus: $$\Delta p_2 = p_2^* - p_2 = D (v_1 - v_2) \tag{5}$$

The linkage disequilibrium after selection is:

$$D^* = (p_1 p_2 + D) v_1 - p_1^* p_2^* = (p_1 p_2 + D) v_1 - p_1 v_1 [p_2 + D (v_1 - v_2)]$$

$$D^* = D v_1 [1 - p_1 (v_1 - v_2)] \tag{6}$$

and for the change in linkage disequilibrium after selection, we get:

$$\Delta D = D^* - D = D \{ v_1 [1 - p_1 (v_1 - v_2)] - 1 \} = D [v_1 - 1 - v_1 p_1 (v_1 - v_2)]$$

From this result, we can draw the following conclusions:
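
As a numerical cross-check of the derivation, the following Python sketch applies selection directly to the four gamete classes and compares the outcome with the closed-form expressions for $p_2^*$ and $D^*$. The function name `after_selection` and the parameter values in the usage example are arbitrary illustrative choices, not data from the study.

```python
def after_selection(p1, p2, D, w1, w2):
    """Apply selection at the A locus to two-locus gamete frequencies.

    Returns (p1_star, p2_star, D_star, v1, v2).
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    w_bar = w1 * p1 + w2 * q1          # average fitness before selection
    v1, v2 = w1 / w_bar, w2 / w_bar    # relative fitnesses

    # Two-locus frequencies before selection.
    f_AB, f_Ab = p1 * p2 + D, p1 * q2 - D
    f_aB, f_ab = q1 * p2 - D, q1 * q2 + D

    # Frequencies after selection: carriers of A weighted by v1, of a by v2.
    g_AB, g_Ab = f_AB * v1, f_Ab * v1
    g_aB, g_ab = f_aB * v2, f_ab * v2

    p1_star = g_AB + g_Ab
    p2_star = g_AB + g_aB
    D_star = g_AB - p1_star * p2_star
    return p1_star, p2_star, D_star, v1, v2


# Usage: compare the direct calculation with the closed forms.
p1, p2, D, w1, w2 = 0.7, 0.4, 0.05, 1.0, 0.5
p1s, p2s, Ds, v1, v2 = after_selection(p1, p2, D, w1, w2)
print(abs(p2s - (p2 + D * (v1 - v2))) < 1e-12)
print(abs(Ds - D * v1 * (1 - p1 * (v1 - v2))) < 1e-12)
```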
1. If a selectively neutral locus (B locus) has linkage disequilibrium $D$ with a locus (A locus)
under selection at the haploid and/or diploid homozygous state during production of DH
lines, it follows from Eqn. (5) that the change in allele frequency at the B locus ($\Delta p_2$) is
a linear function of $D$. Thus, if $D = 0$, i.e., both loci are in linkage equilibrium, the allele
frequency at the selectively neutral locus will not change.
2. Suppose the allele a at locus A is lethal, i.e., $w_2 = v_2 = 0$, $\bar{w} = w_1 p_1$, and
$v_1 = w_1/(w_1 p_1) = 1/p_1$. Thus, $p_1^* = 1$, $p_2^* = p_2 + D/p_1$, and we get $\Delta p_2 = D/p_1$.
A lethal allele a will most likely be close to extinction, i.e., $p_1 \approx 1.0$. Hence,
$\Delta p_2 \approx D$ and the change in allele frequency at this locus depends almost exclusively on
$D$. In our study (see Figure 3), the decay of linkage disequilibrium, measured as $r^2$,
reached $r^2 = 0.10$ (i.e., $r = 0.3162$) at a physical distance of 3 Mb between loci.
Assuming without loss of generality $r \ge 0$, we get from the equation
$r = D / \sqrt{p_i q_i p_j q_j}$ that $D = r \sqrt{p_i q_i p_j q_j} \le 0.25\, r$, because each product $p q$ is at most 0.25.
Thus, for a selectively neutral gene that is 3 Mb distant from a lethal allele, the expected
change in allele frequency is at most $\Delta p_2 \approx 0.25 \times 0.3162 = 0.079$, i.e., very small.
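
The upper bound in conclusion 2 can be reproduced with a short Python sketch. The function name `max_delta_p2` is hypothetical; the calculation only combines $\Delta p_2 = D/p_1$ with the bound $D \le 0.25\, r$ and, by default, sets $p_1 = 1$ as an approximation for a nearly extinct lethal allele.

```python
import math


def max_delta_p2(r_squared, p1=1.0):
    """Upper bound on the allele-frequency change at a neutral locus
    linked to a lethal allele, given the linkage disequilibrium r^2."""
    r = math.sqrt(r_squared)
    d_max = 0.25 * r  # sqrt(p_i q_i p_j q_j) is at most 0.25
    return d_max / p1  # Delta p_2 = D / p_1


# Usage: r^2 = 0.10, the value reported at 3 Mb distance, gives roughly 0.08.
print(round(max_delta_p2(0.10), 3))
```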
File S4: Translation of allele designation between different SNP arrays.

For the set of 36,209 SNPs in common between markers of the class “PolyHighResolution” on
the 600k Affymetrix Axiom® Maize Genotyping Array (Unterseer et al. 2014) and the 50k
Illumina® MaizeSNP50 BeadChip, a “translation” of allele coding was necessary because in
about half of the cases, the two array platforms targeted opposite strands of the template
DNA. The “translation” of the allele coding used in the manufacturer’s annotation file of the
Affymetrix 600k chip to the “Forward” allele coding given in the allele report table of the
Illumina 50k chip was based on data from 29 maize inbred lines from a sequence variant
discovery panel (3), for which genotyping data from both arrays were available. First, we
identified for each SNP the major allele in the 600k data set of these lines and
related it to one of the alleles on the 50k chip, using as a basis the subset of lines
carrying this allele in the 600k data set. Afterwards, the second allele on the 600k chip was
assigned to the second allele on the 50k chip. Second, on the basis of this “translation” rule,
we translated the 600k genotype data of all 29 lines, resulting in the so-called 50kT data. Third,
the 50k and 50kT genotype data were compared for each line, and only those SNPs and their
“translation” that met both of the following quality criteria were accepted for further analyses:

1. Data from both the 50k and 50kT genotyping were available for at least 25 out of the
29 lines.
2. Genotyping results for the 50k and 50kT data matched for all available lines, with at
most one mismatch.

This yielded a set of 33,039 markers, of which 28,133 were polymorphic across the entire set
of 380 genotypes and fulfilled all quality criteria described in the Materials and Methods
section.
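
The two quality criteria can be expressed as a simple filter, sketched here in Python. The data layout is an assumption made for illustration: for one SNP, each data set is a dictionary mapping a line name to its genotype call, with `None` marking a missing call. The function name `passes_quality` and its parameters are hypothetical.

```python
def passes_quality(calls_50k, calls_50kT, min_lines=25, max_mismatches=1):
    """Check one SNP against both quality criteria.

    calls_50k and calls_50kT map line names to genotype calls
    (None for a missing call).
    """
    # Criterion 1: calls available in both data sets for enough lines.
    shared = [
        line for line, call in calls_50k.items()
        if call is not None and calls_50kT.get(line) is not None
    ]
    if len(shared) < min_lines:
        return False
    # Criterion 2: at most one mismatch among the available lines.
    mismatches = sum(calls_50k[line] != calls_50kT[line] for line in shared)
    return mismatches <= max_mismatches


# Usage: 29 lines with identical calls pass both criteria.
lines = [f"line{i}" for i in range(29)]
snp_50k = {line: "AA" for line in lines}
snp_50kT = dict(snp_50k)
print(passes_quality(snp_50k, snp_50kT))
```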
References

Abendroth, L. J., R. W. Elmore, M. J. Boyer, and S. K. Marlay, 2011 Corn growth and
development. Iowa State University Extension, Ames.
Nei, M., 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci.
U.S.A. 70: 3321-3323.
Prigge, V., and A. E. Melchinger, 2012 Production of haploids and doubled haploids in
maize, pp. 161-172 in Plant Cell Culture Protocols, Methods in Molecular Biology, 3rd
edition, edited by V. M. Loyola-Vargas and N. Ochoa-Alejo. Humana Press, Totowa.
Unterseer, S., E. Bauer, G. Haberer, M. Seidel, C. Knaak, et al., 2014 A powerful tool for
genome analysis in maize: development and evaluation of the high density 600 k SNP
genotyping array. BMC Genomics 15: 823.