Genotyping-by-sequencing, linkage mapping,
protoplasting, and mating of the Termitomyces symbiont of
Macrotermes natalensis
MSc thesis report of: Bas Jacobs
Registration number: 940221384050
Study programme: MSc Biotechnology
Year: 2015/2016
Chair group: Genetics
Supervisor: Sabine Vreeburg
Examiner: Duur Aanen
Table of Contents
Construction of the first genetic map of the Termitomyces symbiont associated with Macrotermes natalensis using a genotyping-by-sequencing approach
  Abstract
  Introduction
  Methods
    Mapping population
    DNA isolation and RNA degradation
    Detection of heterokaryons in the mapping population and choice of samples for GBS
    Concentration measurement and quality control of DNA samples
    Genotyping-by-sequencing
    SNP discovery
    SNP filtering
    Linkage mapping
    Genotyping and mapping of the mating type locus
    Alignment error detection of markers from scaffolds split across linkage groups
    Genome coverage estimation and marker distribution
    Comparison physical and genetic distance and analysis number of recombination events
  Results
    Detection of heterokaryons in the mapping population
    Quality control of DNA samples
    GBS analysis and SNP filtering
    Linkage map
    Mapping of the mating type locus
    Alignment errors
    Genome Coverage
    Marker distribution
    Physical and genetic distance
    Recombination events
  Discussion
  Conclusion
  Acknowledgements
  References
A simple, working protoplasting protocol for the Termitomyces symbiont of Macrotermes natalensis
  Abstract
  Introduction
  Methods
    Strain and growth medium
    Growth and harvesting of mycelium for protoplasting
    Protoplast production
    Protoplast regeneration
  Results
  Discussion
  Conclusion
  Acknowledgements
  References
Determining the mating system of the Termitomyces symbiont of Macrotermes natalensis
  Abstract
  Introduction
  Methods
    Set-up of crosses
    Purification of cross products
  Results
  Discussion
  Acknowledgements
  References
The search for homeodomain genes involved in mating in Termitomyces sp.
  Abstract
  Introduction
  Methods
    Template DNA and primer design
    PCR reactions
  Results
  Discussion
  Acknowledgements
  References
Appendix
  1. Probability calculations
    A. Probability of having at least one of all four different mating types in a sample of n homokaryons
    B. Probability of having at least three out of four different mating types in a sample of n homokaryons
Construction of the first genetic map of the Termitomyces
symbiont associated with Macrotermes natalensis using a
genotyping-by-sequencing approach
Abstract
Genotyping-by-sequencing (GBS) is a cost-effective approach to SNP marker discovery that
has been used in many genetic mapping studies in plants, but so far only rarely in fungi.
Termitomyces is a genus of basidiomycetes that lives in a mutualistic symbiosis with fungus-growing termites and produces mushrooms that are edible for humans and have high nutritive
value. Here, the GBS approach was used to construct the first genetic linkage map of the
Termitomyces symbiont of Macrotermes natalensis. The map, based on 88 haploid offspring
from a single heterokaryon, consists of 16 linkage groups, containing 586 markers and
spanning a total length of 1303 Haldane cM. A preliminary mapping of the mating type locus
embedded it firmly in one of the larger linkage groups. Due to the fragmented nature of the
reference genome, total genome coverage cannot be guaranteed, but the map does provide an
order for many of the scaffolds from the genome assembly. Analysis of the numbers of
recombination events indicates that crossover interference may not play a large role in the
recombination behaviour of this fungus. In addition, comparison of physical and genetic
distances indicates that the recombination landscape of the species may be dominated by
hotspots and coldspots. A more complete reference genome assembly will, however, be
necessary to make this conclusion stronger. The genetic map will be a useful resource for
future genetic and genomic studies on Termitomyces, providing a framework for mapping
interesting traits and QTLs.
Introduction
All known species of the fungal genus Termitomyces grow in a remarkable mutualistic
symbiosis with fungus-growing termites of the subfamily Macrotermitinae (Aanen et al.,
2002). This mutualistic relation is obligate; neither partner can survive for long without the
other (Sands, 1956; De Fine Licht et al., 2005). The symbiosis is thought to have evolved a
single time in the African rainforest (Aanen and Eggleton, 2005) and no reversals to a free-living
state have thus far been found for either termites or fungi (Aanen et al., 2002).
In addition to the ecological and evolutionary interests in the genus Termitomyces, the edible
mushrooms produced by this basidiomycete fungus also make it an attractive organism to
study. Not only are these mushrooms a local delicacy (Oso, 1975), they also contain many
important nutrients (Botha and Eicker, 1992; Kansci et al., 2003; Malek et al., 2012;
Ogundana and Fagade, 1982). Furthermore, consumption of the mushroom may help lower
blood cholesterol levels (Nabubuya et al., 2010) and some of its components may have
medical applications (Chatterjee et al., 2013). Unfortunately, these mushrooms appear to be
quite rare. There are indications that they are only formed when the termite symbiont forms
new colonies and needs to obtain spores to reinitiate the symbiosis, which may help explain
this rarity (Johnson et al., 1981). Therefore, it would be valuable if the fungus could be
cultivated and mushroom formation could be induced in the lab (and later on a commercial
scale) without requiring the termite. Unfortunately, laboratory cultivation of the fungus,
although possible, is still quite difficult and induction of fruiting bodies in culture has been
reported only once in a single species of Termitomyces (De, 1982).
Many studies on Termitomyces have used the symbiont associated with the termite
Macrotermes natalensis. Since M. natalensis has only ever been found associated with a
single specific lineage of Termitomyces (De Fine Licht et al., 2006; Aanen et al., 2007), it is
likely that these studies all involved the same Termitomyces species. This has been confirmed
by pairing between homokaryons isolated from three different heterokaryotic strains
associated with M. natalensis (Nobre et al., 2014). A reference genome of this species has
recently been published (Poulsen et al., 2014). Further genetic studies of this fungus may
provide useful insights into its life cycle, facilitating future attempts to cultivate its
mushrooms as well as providing a better understanding of its symbiosis.
Genetic maps are useful tools for such studies, facilitating for example the localisation of
genes on the genome. Genetic mapping uses the recombination frequency between genetic
markers as a measure of the distance between these markers. If sufficient markers are used,
the markers can be assigned to different linkage groups corresponding to the different
chromosomes and the relative positions of the markers can be determined, allowing the
creation of a genetic map. A genetic map of Termitomyces would be useful for many
purposes, such as (1) the ordering of the rather large number of scaffolds and contigs of the
current Termitomyces genome assembly (Poulsen et al., 2014), (2) the study of the
recombination behaviour of this fungus, (3) the identification of genes and quantitative trait
loci (QTLs), and in the future (4) the improvement of the (cultivated) mushrooms by marker-assisted
selection (Foulongne-Oriol, 2012).
One locus that can be identified using a genetic map is the mating type locus. Knowledge of
the mating system of Termitomyces might help future attempts to induce fruiting bodies, since
two homokaryons (the haploid mycelia that arise when sexual spores germinate) can only
successfully form a heterokaryon (the mycelium that forms the mushrooms, containing two
distinct types of nuclei) when they have compatible mating types. Preliminary crosses indicate
that Termitomyces has a bipolar mating system (unpublished results) and therefore only one
mating type locus (for a review on mating systems in basidiomycetes see: Kües et al., 2011).
By determining the recombination frequency between the markers on the linkage map and the
mating type locus, this locus can be placed on the genetic map. This method has been
successfully used to map the mating type locus in several other species of basidiomycetes (Xu
et al., 1993; van der Nest et al., 2009; Okuda et al., 2009).
For the construction of a genetic map and the subsequent isolation of genes, markers are
required. SNP markers have the advantage of being non-anonymous (i.e. their sequence is
known and they can be directly linked to a place on the genome) and codominant (Rowe et
al., 2011). Strategies employing reduced genome representation in combination with next-generation
sequencing have been shown to generate many SNP markers in a fast and cost-effective way (Davey et al., 2011). The ability to find a large number of markers allows for
the construction of high-density linkage maps, which is favourable for the localisation of
genes on the genome (Jones et al., 2009).
The genotyping-by-sequencing (GBS) approach developed by Elshire et al. (2011) has been used
to construct genetic linkage maps in many plant species (Bielenberg et al., 2015;
Guajardo et al., 2015; İpek et al., 2016; Ma et al., 2012), although only rarely in fungi. In
this method, genomic DNA is digested with a restriction enzyme to generate many different
fragments of varying sizes. Two different adapters, a barcoded adapter and a common
adapter, are then ligated to the fragments. The use of barcoded adapters allows the
samples to be pooled before sequencing, making the method more cost-effective. After
pooling, the smaller fragments are amplified by PCR using primers that bind to the adapters
and have an extended region that binds to the oligonucleotides in an Illumina genome
sequencer. The ends of the amplified fragments that have two different adapters attached are
then sequenced with an Illumina sequencer. In theory, this allows for a representative subset
of SNPs to be identified on the sequenced ends (Elshire et al., 2011).
In this study, the first genetic map of the Termitomyces sp. associated with M. natalensis was
made using SNP markers discovered by GBS. A preliminary mapping of the mating type
locus to this map was performed using mating types derived from phenotypic analysis of
crosses performed with a subset of the mapping population. In addition, genetic and physical
maps were compared and results indicate that the recombination landscape of Termitomyces is
mainly governed by recombination hotspots and coldspots. Although the completeness of the
current map cannot be guaranteed, it does provide an ordering for part of the scaffolds from
the very fragmented genome assembly and it should prove to be a useful resource for future
gene and QTL mapping studies.
Methods
Mapping population
The population used for linkage mapping consisted of single-spore isolates from a single
Termitomyces heterokaryon isolated in South Africa from an M. natalensis colony. Since this
species of Termitomyces has been shown to be mostly outbreeding in nature (De Fine Licht et
al., 2006), this natural isolate was expected to show sufficient variation for use in genetic
mapping.
DNA isolation and RNA degradation
Genomic DNA for use in subsequent steps was isolated from part of the mapping population
as well as from the parent heterokaryon using the CTAB method. Any RNA present in the
samples after extraction was degraded by adding 3 μL RNase I (Thermo Scientific) and
incubating for two hours at 37 °C. Samples were subsequently incubated for 15 min at 70 °C
to inactivate the RNase.
Detection of heterokaryons in the mapping population and choice of samples for GBS
Since some of the single-spore isolates may actually be heterokaryons that originated by
fusion of two germinating spores, a marker was developed to test some of the suspect
heterokaryons in the mapping population. This marker was developed by PCR amplification
of a highly variable part of the nuclear Elongation Factor 1 alpha (EF1α) from the parent
heterokaryon using primers EF595F and EF1160R described by De Fine Licht et al. (2006).
The reaction volume of the PCR was 25 μL containing 5 μL 5x GoTaq PCR buffer
(Promega), 2 μL 25 mM MgCl2, 1 μL 10 mM dNTPs, 1 μL of each primer, 0.1 μL GoTaq
polymerase (Promega), 2 μL tenfold diluted template DNA, and 12.9 μL Milli-Q water. The
PCR was performed on a MyCycler Thermal Cycler (Bio-Rad), starting with a denaturation
step of 5 min at 94 °C, followed by 35 PCR cycles (1 min denaturation at 94 °C, 1 min
annealing at 53 °C and 1 min extension at 72 °C), after which a final extension of 10 min at
72 °C was performed. The PCR product was checked under UV, after running 3 μL for 60
min at 70 V on a 1% agarose gel with EtBr, and purified using a NucleoSpin Gel and PCR
Clean-up kit (MACHEREY-NAGEL) according to the instructions provided by the
manufacturer. Sanger sequencing of the purified product was performed by Eurofins.
A SNP marker was detected in this sequence by looking for a double peak in the
chromatogram. This SNP disrupted an NdeI restriction site, allowing for the conversion of the
SNP marker into a PCR-RFLP marker. Twelve suspect heterokaryons and 25 putative
homokaryons from the mapping population were analysed for heterozygosity using this
marker. The parent heterokaryon was included in this analysis as a positive control. PCRs
were performed on these samples as before and PCR products were digested in 10 μL
volumes with 5 μL PCR product, 3 μL Milli-Q water, 1 μL digestion buffer and 1 μL NdeI
(New England Biolabs). Digestions were incubated for one hour at 37 °C. All digested
product was checked by running on a 1% agarose gel with EtBr for one hour at 80 V. Since
roughly half of the suspect heterokaryons showed heterozygosity (which would be expected if
all of them were heterokaryons resulting from a mating of sibling homokaryons), these and all
other suspect heterokaryons in the mapping population were excluded from further
analyses. From the remaining presumed homokaryons in the mapping population, 92 were
chosen at random for analysis by GBS. In addition, the parent heterokaryon was included in
three replicates as a control.
Concentration measurement and quality control of DNA samples
DNA concentrations were measured using a Qubit 2.0 fluorometer (Life Technologies)
according to the instructions provided by the manufacturer. For GBS, DNA concentrations
needed to be between 30 and 100 ng/μL. Therefore, samples with concentrations higher than
100 ng/μL were diluted with Milli-Q water and samples with concentrations lower than 30
ng/μL were concentrated by evaporation in a vacuum.
To test if the quality of the DNA was sufficient for GBS, trial digests were performed on the
parent heterokaryon sample and nine randomly chosen samples from the homokaryon
population. The digestions were performed using restriction enzyme HindIII in 20 μL
volumes containing 10 μL DNA, 7.7 μL Milli-Q water, 2 μL digestion buffer, and 0.3 μL
HindIII (Promega). The reactions were incubated for two hours at 37 °C. All 20 μL of the trial
digest was subsequently run on a 1% agarose gel with EtBr for three hours at 40 V along with
3 μL of undigested sample and 5 μL of a λ HindIII digest as size standard.
Genotyping-by-sequencing
GBS was performed at the Genomic Diversity Facility of Cornell University according to the
protocol described by Elshire et al. (2011). The enzyme used for the restriction step was
EcoT22I (a six-base cutter) rather than ApeKI (a five-base cutter with one wobble base),
because the less frequent cutting results in fewer different sequenced fragments and therefore
higher coverage per sequenced fragment. Although this is expected to decrease the number of
SNPs that can be identified, it should increase the probability that a SNP can be scored for all
samples, which is favourable for linkage mapping. After adapter ligation, all 95 samples (three
replicates of the parent heterokaryon and 92 presumed homokaryons from the mapping
population) and one blank sample (no DNA) were pooled and sequenced in a single Illumina
sequencing lane (after the PCR-step).
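The effect of the enzyme choice on fragment number can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes equal base frequencies and independent positions, which real genomes violate, and uses the recognition sequences ATGCAT for EcoT22I and GCWGC for ApeKI. The function name is hypothetical and the calculation was not part of the original analysis.

```python
# Expected spacing between restriction sites, assuming equal base
# frequencies and independent positions (an illustrative simplification).
IUPAC_OPTIONS = {"A": 1, "C": 1, "G": 1, "T": 1, "W": 2}  # W = A or T

def expected_site_spacing(recognition_site):
    """Return the expected number of bp between sites for a recognition sequence."""
    probability = 1.0
    for base in recognition_site:
        probability *= IUPAC_OPTIONS[base] / 4
    return 1 / probability

ecot22i_spacing = expected_site_spacing("ATGCAT")  # six-base cutter
apeki_spacing = expected_site_spacing("GCWGC")     # five-base cutter, one wobble base
print(ecot22i_spacing, apeki_spacing)
```

Under these assumptions EcoT22I is expected to cut roughly eight times less often than ApeKI, so the same number of reads is spread over far fewer fragment ends.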
SNP discovery
Processing of raw sequence reads was performed using version 2 of the GBS analysis pipeline
introduced by Glaubitz et al. (2014) implemented in TASSEL 5 (Bradbury et al., 2007). The
genome assembly reported by Poulsen et al. (2014) was used as reference genome. Default
settings were used with two exceptions: the maximum memory setting was increased to 16 GB,
and the minimum minor allele frequency in the Discovery SNP Caller was increased to 0.1.
Real SNPs in the mapping population are expected to have an allele frequency of around 0.5,
which makes low minor allele frequencies suspicious. The alignment was
performed with BWA version 0.7.13 (Li and Durbin, 2009), using default settings. Final SNP
calls made by the GBS pipeline were exported in HDF5 format for further filtering in
TASSEL.
SNP filtering
Using TASSEL 5, all genotype calls with a read depth lower than five were set to missing,
retaining only the more reliable calls. Genotypes were exported in hapmap format and further
filtering was performed in R version 3.1.2 (R Core Team, 2014). Firstly, the genotype calls
were converted to the ‘a, h, b, u’ format required for the linkage mapping step. Because the
genotype of the individual nuclei of the parent heterokaryon was unknown, assignment of ‘a’
and ‘b’ was arbitrary. If more than two alleles were found for a marker, the marker was
discarded if there was more than one occurrence of the least frequent allele. Otherwise, the
single conflicting genotype call was set to unknown. Markers were named according to their
scaffold and the position on that scaffold in base pairs.
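The recoding rules described above can be sketched as follows. This is a minimal Python illustration, not the R code used in this study. It assumes that homozygous calls are written as single nucleotides, heterozygous calls as IUPAC ambiguity codes and missing calls as 'N'. The function name convert_marker is hypothetical, and discarding markers with four alleles is an additional assumption.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}

def convert_marker(calls):
    """Recode one marker as a list of 'a', 'b', 'h', 'u'; return None to discard it."""
    counts = Counter(call for call in calls if call in BASES)
    ranked = [allele for allele, _ in counts.most_common()]
    if len(ranked) > 3 or (len(ranked) == 3 and counts[ranked[2]] > 1):
        return None  # third allele seen more than once, or four alleles (assumption)
    codes = dict(zip(ranked[:2], "ab"))  # which allele becomes 'a' is arbitrary
    recoded = []
    for call in calls:
        if call in codes:
            recoded.append(codes[call])
        elif call == "N" or call in BASES:
            recoded.append("u")  # missing call, or the single conflicting allele
        else:
            recoded.append("h")  # ambiguity code, i.e. a heterozygous call
    return recoded

example = convert_marker(["A", "G", "A", "R", "N", "G", "T"])
print(example)
```

In the example, the single 'T' call is the lone conflicting genotype and is therefore set to unknown rather than causing the marker to be discarded.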
Secondly, all markers with more than 80% missing genotype calls were removed, to get rid of
the most poorly scored tags. Then, markers and samples were filtered in an iterative way
based on the percentage of non-missing genotype calls that were heterozygous: First all
markers where this was more than 40% were removed, then all samples where it was more
than 40% were discarded and these steps were then repeated with cut-off values of 20%, 10%,
5% and 2%. This was done because heterozygosity should not be present in a population of
haploid individuals. Markers that are mostly heterozygous are likely the result of paralogous
sequences aligning to the same part of the genome assembly and therefore not really SNPs.
Samples that are largely heterozygous are likely to be heterokaryons, either because they were
samples from the parent heterokaryon (i.e. the controls), or because they were the result of a
mating between two single-spore isolates. These samples are not part of the intended mapping
population and should therefore not be included in the mapping process. At the end of this
filtering step, all remaining heterozygous calls were set to unknown.
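The iterative heterozygosity filter can be sketched as below. This is a hedged Python illustration of the procedure described above, not the original R script. The genotype matrix is assumed to be a list of marker rows holding one 'a', 'b', 'h' or 'u' code per sample, and the function names are hypothetical.

```python
def het_fraction(calls):
    """Fraction of non-missing calls that are heterozygous ('h')."""
    scored = [c for c in calls if c != "u"]
    return scored.count("h") / len(scored) if scored else 0.0

def filter_heterozygosity(matrix, cutoffs=(0.40, 0.20, 0.10, 0.05, 0.02)):
    """Alternately drop markers (rows) and samples (columns) above each cut-off."""
    for cutoff in cutoffs:
        # First remove markers with too many heterozygous calls.
        matrix = [row for row in matrix if het_fraction(row) <= cutoff]
        if not matrix:
            return matrix
        # Then remove samples with too many heterozygous calls.
        keep = [j for j in range(len(matrix[0]))
                if het_fraction([row[j] for row in matrix]) <= cutoff]
        matrix = [[row[j] for j in keep] for row in matrix]
    # Finally, set all remaining heterozygous calls to unknown.
    return [["u" if c == "h" else c for c in row] for row in matrix]

toy = [
    ["a", "b", "h", "a"],
    ["h", "h", "h", "b"],  # mostly heterozygous marker (paralog-like)
    ["b", "a", "h", "b"],
    ["a", "a", "h", "u"],
]
filtered = filter_heterozygosity(toy)
print(filtered)
```

In this toy matrix the second marker and the third sample are removed in the first round, after which no heterozygous calls remain.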
In the next step, all markers that no longer had two different alleles were removed, after
which the density of the genotype matrix was increased by removing markers and samples
with more than a certain percentage of missing calls. From here, two different datasets were
created: a strictly filtered dataset and a mildly filtered one. For the first dataset, markers with
more than 50% missing data were removed, followed by samples with more than 50%
missing data and this was repeated with cut-off values of 30% and 10%. For the second
dataset, the last step with 10% was omitted. After this, markers showing a significant
segregation distortion (according to a chi-squared test) were removed. For the strictly filtered
dataset the significance level of this test was set to 0.01 and for the mildly filtered dataset it
was set to 0.001. These steps were performed because missing data and distorted markers
can have a negative effect on the accuracy of the genetic map.
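For a haploid population, the segregation distortion test amounts to a chi-squared test of a 1:1 ratio with one degree of freedom. The sketch below is an illustrative Python version (the function name is hypothetical, and the original test was run in R). It uses the fact that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math

def segregation_p_value(calls):
    """Chi-squared test (1 df) of a 1:1 ratio of 'a' to 'b' calls; 'u' is ignored."""
    n_a, n_b = calls.count("a"), calls.count("b")
    total = n_a + n_b
    if total == 0:
        return 1.0
    # For expected counts of total / 2 each, the statistic simplifies to this.
    statistic = (n_a - n_b) ** 2 / total
    # Upper-tail probability of a chi-squared distribution with 1 df.
    return math.erfc(math.sqrt(statistic / 2))

balanced = segregation_p_value(["a"] * 44 + ["b"] * 44)
skewed = segregation_p_value(["a"] * 58 + ["b"] * 30 + ["u"] * 4)
print(balanced, skewed)
```

A marker with 58 'a' and 30 'b' calls would be removed at the 0.01 level of the strictly filtered dataset but retained at the 0.001 level of the mildly filtered dataset.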
Finally, SNPs separated by less than 64 base pairs (bp) were linked together into groups, and
from each group only the SNP with the fewest missing genotypes was kept as a marker. This was
done because markers this close together are practically identical and probably
originated from the same tag. The resulting datasets were written to CSV files.
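The thinning step can be sketched as a single pass over the markers sorted by scaffold and position. In this Python illustration, markers are assumed to be (scaffold, position, number of missing genotypes) tuples and the function name is hypothetical. SNPs are chained together as long as each is less than 64 bp from the previous one.

```python
def thin_markers(markers, max_gap=64):
    """Keep one marker (fewest missing genotypes) per chain of nearby SNPs."""
    kept = []
    cluster = []
    for marker in sorted(markers):
        scaffold, position, _ = marker
        if cluster:
            last_scaffold, last_position, _ = cluster[-1]
            if scaffold != last_scaffold or position - last_position >= max_gap:
                # Gap too large or new scaffold: close the current cluster.
                kept.append(min(cluster, key=lambda m: m[2]))
                cluster = []
        cluster.append(marker)
    if cluster:
        kept.append(min(cluster, key=lambda m: m[2]))
    return kept

toy_markers = [
    ("scaffold1", 100, 5),
    ("scaffold1", 150, 2),
    ("scaffold1", 200, 7),  # chained to the first SNP via the second
    ("scaffold1", 300, 0),
    ("scaffold2", 310, 1),
]
thinned = thin_markers(toy_markers)
print(thinned)
```

Note that chaining means the first and last SNP of a group can be more than 64 bp apart, as in the first three toy markers.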
Linkage mapping
Linkage mapping was performed using Joinmap 4 (Van Ooijen, 2006), using the ‘HAP’
population type. Linkage mapping was performed in two rounds, as this has been shown to
reduce the impact of missing genotypes and segregation distortions (Foulongne-Oriol et al.,
2010). In the first round, the strictly filtered dataset (few missing genotypes, little segregation
distortion) was mapped. Grouping of markers into linkage groups was performed using the
independence LOD-score. To determine a reasonable cut-off value for grouping by this
statistic, a permutation test was performed using the mildly filtered dataset (the larger of the
two datasets). This was done by randomly redistributing the genotype calls over the samples
for each marker and computing the grouping tree for this permuted dataset fifty times. After
permutation, none of the markers should be linked anymore and therefore all observed
linkages are merely the result of chance. For each of the fifty permutations the highest LOD-score at which linkage was still observed was recorded, to obtain an estimate of the highest
LOD-score at which spurious linkage can still be expected if no real linkage is present. Since
it is highly unlikely that there will be no real linkage between any of the markers, this
estimate is probably quite conservative. Since spurious linkage occurred in less than 10% of
the cases at a LOD-score of 6, this score was chosen for determining linkage groups. In
addition, if this grouping resulted in gaps larger than 40 cM (Haldane, corresponding to a
recombination frequency of 0.275), groups were split up again.
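The correspondence between 40 cM and a recombination frequency of 0.275 follows from the Haldane mapping function, r = (1 - exp(-2d)) / 2 with d in Morgans. A small Python check, with illustrative function names:

```python
import math

def haldane_r_from_cm(distance_cm):
    """Recombination frequency for a map distance in Haldane centimorgans."""
    morgans = distance_cm / 100
    return 0.5 * (1 - math.exp(-2 * morgans))

def haldane_cm_from_r(recombination_frequency):
    """Map distance in Haldane centimorgans for a recombination frequency below 0.5."""
    return -50 * math.log(1 - 2 * recombination_frequency)

gap_r = haldane_r_from_cm(40)
print(round(gap_r, 3))
```

The inverse function recovers the map distance from a recombination frequency, which is the direction used when distances are estimated from observed recombinants.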
Within each of the linkage groups, identically segregating loci were assigned to the group to
make them show up on the map and marker orders were calculated using the maximum
likelihood (ML) algorithm. Although the ML algorithm can only be used with the Haldane
mapping function and tends to produce inflated map lengths, especially in the presence of
genotyping errors, it is better at ordering markers correctly than the regression algorithm
(Hackett and Broadfoot, 2003).
In the second mapping round, the mildly filtered dataset was mapped using the same strategy.
This time, however, any markers that increased the map length by more than 10 cM, or got
placed in between two markers from a single, different scaffold, were removed.
Genotyping and mapping of the mating type locus
Preliminary genotyping of the mating type locus was performed for 29 of the homokaryons
used in linkage mapping, by scoring the success of crosses based on phenotypic
changes at the contact zone. The genotype data obtained in this way were added to the mildly
filtered dataset and linkage mapping was performed again as described above in order to place
the mating type locus.
Alignment error detection of markers from scaffolds split across linkage groups
Linkage groups were manually screened for markers from the same scaffold that were not
grouped together on the same linkage group. Since individual markers that had positions on
the linkage map that were far apart from other markers from the same scaffold could have
been erroneously assigned to this scaffold by misalignment, the alignment of these deviant
markers was tested. GBS-tags from these markers were retrieved from the SAM file created in
the alignment step of the GBS pipeline and aligned to the genome using BLASTN (Camacho
et al., 2009) with default settings except for the ‘-task’ flag, which was set to ‘blastn’.
Genome coverage estimation and marker distribution
Total coverage of the genome by markers on the linkage map was estimated in two different
ways. Firstly, the combined length of all scaffolds represented on the linkage map was
calculated. Secondly, the total physical map length represented by the genetic map was
estimated from the total genetic length of the map and an estimate of the average physical to
genetic distance ratio (kb / cM). This ratio was estimated by adding up all the physical and
genetic distances between markers from the same scaffold for all scaffolds represented by at
least four markers (markers that were assigned to the wrong scaffold due to misalignment
were excluded from this analysis). The distance ratio was then determined by dividing the
total added physical distance by the total added genetic distance.
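The ratio calculation described above can be sketched as follows. This is a minimal illustration, not the script used in this study: the function name, the tuple layout of the input, and the choice to sum distances between physically adjacent markers are assumptions made for the example.

```python
from collections import defaultdict


def physical_to_genetic_ratio(markers, min_markers=4):
    """Return a pooled kb/cM ratio over scaffolds with enough markers.

    markers: iterable of (scaffold, physical_bp, genetic_cM) tuples.
    Misaligned markers are assumed to have been removed beforehand.
    """
    by_scaffold = defaultdict(list)
    for scaffold, bp, cm in markers:
        by_scaffold[scaffold].append((bp, cm))

    total_kb = 0.0
    total_cm = 0.0
    for positions in by_scaffold.values():
        if len(positions) < min_markers:
            continue  # scaffold has too few markers to contribute
        positions.sort()  # order markers by physical position
        for (bp1, cm1), (bp2, cm2) in zip(positions, positions[1:]):
            total_kb += (bp2 - bp1) / 1000.0
            total_cm += abs(cm2 - cm1)

    if total_cm == 0:
        raise ValueError("no genetic distance between the selected markers")
    # Total physical distance divided by total genetic distance.
    return total_kb / total_cm
```

Pooling the distances before dividing, rather than averaging per-scaffold ratios, keeps scaffolds without any observed recombination from producing undefined ratios.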
The distribution of the mapped markers over the scaffolds and contigs of the genome
assembly was examined by testing for each scaffold whether it was over- or underrepresented.
This was done using two-tailed hypergeometric tests for each of the scaffolds and contigs,
where the observed markers were considered a sample from a population of all positions
where markers could have been found. The total population size was estimated as two times
the number of restriction sites, because, in the GBS procedure, every restriction site gives rise
to two ends on which a SNP may or may not be found. The results (at α = 0.05) were
compared with the results of Bonferroni and Benjamini-Hochberg multiple testing
corrections, which may be overly strict, since the tests are not independent.
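A minimal sketch of such a test for a single scaffold is given below, together with the two corrections. It is an illustration only: the function names are hypothetical, and the two-tailed p-value is taken here as twice the smaller tail (capped at 1), which is one common convention and an assumption about how the test was set up.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items from N, of which K are 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_two_tailed(k, N, K, n):
    """Two-tailed p-value, assumed here to be twice the smaller tail.

    N: 2 x number of restriction sites in the whole assembly
    K: 2 x number of restriction sites on the scaffold of interest
    n: total number of mapped markers
    k: number of mapped markers on the scaffold of interest
    """
    lower = sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))
    upper = sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))
    return min(1.0, 2.0 * min(lower, upper))


def bonferroni(pvalues, alpha=0.05):
    """Reject only tests with p <= alpha / number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Reject the tests selected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected
```

Exact integer binomial coefficients are used so that no external statistics library is needed; for very large assemblies a log-space implementation would be faster.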
Distribution of markers over the linkage groups was tested by comparing the observed
number of markers on a linkage group with the expected number under a Poisson distribution
(Remington et al., 1999). The expected number of markers for each linkage group was
calculated by multiplying the total number of markers with the ratio between the length of the
linkage group and the total map length. P-values were calculated as the probability of finding
the observed number of markers or a more extreme number and compared to α/2 (α = 0.05),
because it is a two-tailed test.
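The Poisson comparison can be illustrated with the sketch below. The function names are hypothetical and the code is not taken from the analysis itself; it only follows the description above, returning the one-tailed probability in the direction of the deviation, which is then compared to α/2.

```python
from math import exp


def expected_markers(total_markers, group_length_cm, total_length_cm):
    """Expected marker count if markers are spread evenly over the map."""
    return total_markers * group_length_cm / total_length_cm


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    p = exp(-mu)
    for i in range(1, k + 1):
        p *= mu / i
    return p


def poisson_one_tailed(observed, expected):
    """Return (direction, p) for the observed count or a more extreme one."""
    if observed <= expected:
        lower = sum(poisson_pmf(i, expected) for i in range(observed + 1))
        return "fewer", lower
    # Sum the upper tail directly; terms shrink once i exceeds the mean.
    term = poisson_pmf(observed, expected)
    total = 0.0
    i = observed
    while term > total * 1e-17:
        total += term
        i += 1
        term *= expected / i
    return "more", total


alpha = 0.05
direction, p = poisson_one_tailed(64, 88.4)  # hypothetical linkage group
print(direction, p < alpha / 2)
```

Summing the upper tail term by term, instead of taking one minus the lower tail, avoids loss of precision when the p-value is very small.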
The distribution of markers within each linkage group was examined by comparing the
marker distribution with a uniform distribution using Q-Q plots and Kolmogorov-Smirnov
tests.
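As an illustration, a one-sample Kolmogorov-Smirnov comparison against a uniform distribution could look like the sketch below. It assumes that positions are standardised by dividing by the length of the linkage group, and it uses a standard asymptotic approximation for the p-value; both are assumptions of this example and not necessarily what the statistics software did. Because many markers share the same position, the p-values should be read as approximate.

```python
from math import exp, sqrt


def ks_uniform(positions_cm, group_length_cm):
    """KS statistic D between marker positions and Uniform(0, 1)."""
    xs = sorted(p / group_length_cm for p in positions_cm)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Largest gap between the empirical and the uniform CDF at x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value with a small-sample correction on lambda."""
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here; p is essentially 1
    s = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))
```

A linkage group with most markers stacked on one position gives a large D and a small p-value, which is the pattern the Q-Q plots show as jumps.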
Comparison of physical and genetic distance and analysis of the number of recombination events
To examine the relation between physical and genetic distance, genetic positions were plotted
against physical positions for all scaffolds from which at least ten markers were represented
on the linkage map (markers that were assigned to the wrong scaffold due to misalignment
were again excluded from the analysis).
Genotypes of individual homokaryons ordered according to their position on the linkage map
and coloured according to the parent of origin were visualised in Joinmap. The number of
recombination events was determined for each of the homokaryons and each of the linkage
groups by counting the number of changes from one parental genotype to the other. Changes
that were immediately reverted at the next scored marker on the linkage group were not
counted, because they were likely the result of genotype scoring errors. The average number
of recombination events per homokaryon was calculated for each of the linkage groups and
compared to the total genetic length of the linkage group in both Haldane and Kosambi cM.
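The counting rule can be made explicit with a short sketch. The counts in this study were made from the Joinmap visualisation; the function below is only one possible reading of the rule, in which a single scored marker that differs from both of its scored neighbours is treated as a scoring error. The genotype coding ('A', 'B', None for missing) is an assumption of the example.

```python
def count_recombinations(genotypes):
    """Count parental switches along one linkage group for one homokaryon.

    genotypes: parental origin per marker in map order, e.g. 'A' or 'B',
    with None for missing calls.
    """
    scored = [g for g in genotypes if g is not None]  # skip missing calls
    cleaned = list(scored)
    for i in range(1, len(scored) - 1):
        # A change that is reverted at the next scored marker is ignored.
        if scored[i - 1] == scored[i + 1] != scored[i]:
            cleaned[i] = scored[i - 1]
    return sum(1 for a, b in zip(cleaned, cleaned[1:]) if a != b)
```

For example, the sequence A A B A A counts as zero events under this rule, whereas A A B B A counts as two.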
Results
Detection of heterokaryons in the mapping population
The PCR-RFLP assay to test individuals for heterozygosity suffered from some non-specific
PCR amplification. An extra band roughly 100 bp larger than the expected full length
fragment of 591 bp could be found when the full length amplicon was present (Figure 1). In
addition, in samples where the intended amplicon was cleaved, this band disappeared and a
band of about 100 bp longer than the intended largest cleavage product of 417 bp appeared.
The extra bands may have been the result of one of the primers annealing non-specifically to a
nearby sequence, lengthening the intended product by 100 bp. However, samples that are
heterozygous for the marker can still be identified because they have all bands, while the
homozygotes have only the top two or the lower two bands. This way, five of the twelve
individuals suspected to be heterokaryons and none of the individuals presumed to be
homokaryons were found to be heterozygous. Since only half of the heterokaryons formed by
sibling matings are expected to be heterozygous for this marker, it is quite likely that most of the other
suspect heterokaryons are heterokaryons as well. Therefore, all individuals from the mapping
population that were expected to be heterokaryons based on their phenotype were excluded
from the further analyses.
Figure 1. Heterozygosity test of a subset of the mapping population. PCR-RFLP analysis was
performed on 12 suspect heterokaryons (top left) and 25 presumed homokaryons (bottom), using a
marker for which the parent heterokaryon (top right) was heterozygous. Sizes of the 100 base pair (bp)
ladder (lane M) are indicated on the left. The length of the undigested fragment targeted by PCR was
591 bp and digestion products of 417 bp and 173 bp were expected for one of the two alleles.
Quality control of DNA samples
All ten DNA samples that were chosen for the quality test showed clear single bands where
the full genomic DNA was loaded on the gel and a smear where the digested genomic DNA
was loaded (Figure 2). Therefore, the isolated DNA was expected to be mostly intact, clean
and readily digestible, indicating that it was suitable for GBS.
Figure 2. Quality control and trial digestion of several GBS samples. Samples of full genomic DNA
(single bands) and genomic DNA digested with HindIII (smears) were analysed by gel electrophoresis
for nine homokaryons (HM) and the parent heterokaryon (HT). Sizes in base pairs (bp) of the HindIII
digested lambda DNA marker (M) are indicated on the left.
GBS analysis and SNP filtering
The GBS procedure performed on the 92 homokaryons and 3 replicates of the parent
heterokaryon yielded on average 2.9 million reads per sample, with a standard deviation of
1.1 million, a minimum of 0.8 million, and a maximum of 5.9 million. For the blank control
sample, only 6011 reads were found.
The GBS analysis pipeline initially yielded 9835 SNP markers, which were subsequently
filtered in two different ways. Strict filtering, which was harder on missing data and deviating
segregation ratios, yielded a total of 489 high quality SNP markers, while milder filtering
yielded 591 markers (for an overview of all filtering steps see Table 1). The greatest loss in
the number of markers was observed when filtering against heterozygosity, which is probably
the result of tags from different positions aligning to the same place on the reference genome.
In addition, the blank sample, the three replicates of the parent heterokaryon, and four
individuals from the mapping population were filtered out in this step, leaving 88 samples for
linkage mapping. The removal of the parent heterokaryon control samples in this step
indicates that this filter is capable of successfully filtering out heterokaryotic samples. It is,
therefore, likely that the four other samples that were removed in this step were heterokaryons
that were missed by the initial phenotypic screening of the mapping population.
Table 1. Numbers of markers and individuals that remained after each of the filtering steps and the
mapping step for both strictly and mildly filtered datasets. Mild filtering was less strict against missing
data and deviations from the expected segregation ratio.
Step                                          Strict filtering            Mild filtering
                                              Markers    Individuals      Markers    Individuals
                                              left       left             left       left
GBS pipeline                                  9835       96               9835       96
Removal of markers with too many alleles      6136       96               6136       96
  and markers with more than 80% missing
  genotypes
Iterative filtering against heterozygosity    3773       88               3773       88
Removal of all non-polymorphic markers        1432       88               1432       88
Iterative filtering against missing genotypes 969        88               1107       88
Removal of markers with severely distorted    807        88               974        88
  segregation
Removal of markers with close physical        489        88               591        88
  positions
Removal of markers that could not be          487        88               586        88
  reliably placed on the linkage map
Linkage map
The first round of linkage mapping, using only the most reliable SNP markers, yielded fifteen
linkage groups. Most of these groups fell apart at a LOD threshold of 6, with the exception of
linkage groups 8 and 11 which were grouped together up to LOD 8, but were split apart
because this grouping resulted in an interval of approximately 70 cM. Two markers could not
be placed on any of the linkage groups and were left out.
In the second mapping round, some of the less reliable markers were included, to obtain the
final linkage map (Figure 3). This resulted in an extra linkage group (LG16), formed by new
markers that were linked together but not to any of the existing linkage groups. The two
markers that were left out in the previous round still could not be placed anywhere on the
map. In addition, three of the newly added markers were removed because they inflated the
map length by more than 10 cM, or because their placement interrupted two markers from the
same scaffold. This resulted in a final linkage map with 586 markers based on 88 haploid
offspring (Table 1), covering a total length of 1303 Haldane cM (for some summary statistics
see Table 2).
Figure 3. Linkage map of Termitomyces sp. based on 88 homokaryons, consisting of 586 SNP
markers. Genetic positions of markers (Haldane cM) are indicated on the left. Marker names
indicating scaffold of origin and position on that scaffold in base pairs are indicated on the right. LOD
support (up to LOD 10) for various groupings is indicated by curly brackets. The sixteen linkage
groups (LG1 – LG16) were numbered arbitrarily, since the chromosomes they belong to are unknown.
Mapping of the mating type locus
The mating type locus was added to the map in a third mapping round, because it was only
scored for 29 individuals and because determining the success of matings based on
phenotype may not be very reliable. It mapped to LG3 (up to a LOD score of 8) and fit quite
well without significantly distorting or lengthening the map (Figure 4).
[Figure 4: linkage map of LG3 without the mating type locus (left) and with the mating type locus, MAT (right).]
Figure 4. Comparison between maps of linkage group LG3 with (right) and without (left) mating type
locus (MAT). Genetic positions of markers (Haldane cM) are indicated on the left. Marker names
indicating scaffold of origin and position on that scaffold in base pairs are indicated on the right. Red
lines connect identical markers.
Alignment errors
Ten scaffolds were found to have markers that did not all cluster to the same place on the
map. In nine cases, all but one of the markers were found clustered at the same location, while
a single deviant marker mapped somewhere else. In the remaining case, there were only two
markers from scaffold 292 and they both mapped to different locations. Alignment of the
GBS tags from the lone markers (one for each of the first nine cases and both for the last one)
to the reference genome revealed that all except one of the two markers from scaffold 292
(the one on LG2) had many strong hits on a wide variety of different scaffolds. Therefore,
these markers were likely assigned to the wrong scaffold by misalignment during the
alignment step of the GBS pipeline and these occurrences do not indicate that any of the
linkage groups should be connected.
Genome Coverage
In total 198 scaffolds were represented on the linkage map. Together, these scaffolds make up
67% of the total length of the genome assembly. Furthermore, 78% of all scaffolds larger than
200 kb and 93% of all scaffolds larger than 500 kb were represented on the linkage map. By
combining the information on the physical and genetic position for all scaffolds represented
by at least four markers, the average ratio of physical to genetic distance was estimated to be
29.3 kb/cM. Using this estimate and the total length of the linkage map, the total physical
distance represented by the map was estimated to be roughly 56% of the total length of the
genome assembly.
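The second estimate can be written out from the two numbers reported above; the symbol for the physical length is introduced here for illustration only.

```latex
\[
L_{\text{physical}} \approx 1303\ \text{cM} \times 29.3\ \frac{\text{kb}}{\text{cM}}
\approx 38\,178\ \text{kb} \approx 38.2\ \text{Mb}
\]
```

Dividing this length by the total length of the genome assembly gives the coverage of roughly 56% reported above.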
Marker distribution
Two-tailed hypergeometric tests indicated that 22 scaffolds were significantly overrepresented
and three scaffolds (scaffolds 12, 100, and 177) were significantly underrepresented in the
dataset used for mapping (α = 0.05). The significantly underrepresented scaffolds together
make up 2.5% of the reference genome. After correction for multiple testing by Bonferroni or
Benjamini-Hochberg correction, no significant results remained, probably reflecting the low
power resulting from the small number of observations compared to the number of tests.
Two-tailed Poisson tests to test the distribution of markers over the linkage groups indicated
that LG3 had significantly fewer markers than expected and LG6, LG10, LG12, and LG14
had significantly more markers than expected (Table 2).
Kolmogorov-Smirnov tests revealed that the distribution of markers within linkage groups
deviated significantly from the uniform expectation for linkage groups 2, 4, 5, 6, 7, 9, 10, 11,
14, 15, and 16. Q-Q plots revealed that for many linkage groups the genetic position increased
in jumps rather than in a continuous fashion, with many markers sharing the same genetic
position and relatively large gaps between clusters of markers (Figure 5). This could be the
result of a clustering of markers on the physical map, but may also be the result of large
differences in the recombination rate across the chromosome.
Table 2. Some summary statistics of the Termitomyces sp. linkage map. Average interval sizes were
calculated as the average of all non-zero distances between markers. The expected number of markers
was based on the Poisson expectation (total number of markers multiplied by the ratio between the
length of the linkage group and the total map length). P-values were computed from one tail of the
Poisson distribution and need to be compared to α/2 for a two-sided test.
Linkage   Observed  Longest   Average       Average    Number   Expected   One-tailed
group     length    interval  marker        interval   of       number of  Poisson
          (cM)      (cM)      spacing (cM)  size (cM)  markers  markers    p-value
LG1       156       30.3      2.6           7.8        59       70.1       1.0 × 10^-1
LG2       88        24.4      2.4           5.2        37       39.5       3.9 × 10^-1
LG3       197       25.7      3.1           6.8        64       88.4       4.0 × 10^-3
LG4       155       24.5      2.1           6.4        74       69.6       3.1 × 10^-1
LG5       104       30.3      2.7           9.5        39       46.8       1.4 × 10^-1
LG6       61        22.6      1.4           5.6        43       27.6       4.0 × 10^-3
LG7       86        22.6      2.1           12.3       41       38.8       3.8 × 10^-1
LG8       98        15.9      3.0           7.0        32       43.8       3.8 × 10^-2
LG9       70        24.4      2.6           11.7       27       31.5       2.4 × 10^-1
LG10      12        8.6       0.9           2.0        13       5.4        4.1 × 10^-3
LG11      121       22.6      2.0           6.0        60       54.3       2.4 × 10^-1
LG12      15        4.8       0.5           2.6        30       6.9        7.8 × 10^-11
LG13      36        17.5      4.0           9.0        9        16.1       4.1 × 10^-2
LG14      49        14.4      1.4           6.1        35       22.1       6.9 × 10^-3
LG15      17        8.7       1.3           4.3        13       7.7        5.0 × 10^-2
LG16      39        30.3      3.9           7.7        10       17.4       4.1 × 10^-2
Total     1303                                         586      586
Average   81        20.5      2.3           6.9        37       36.6
Figure 5. Q-Q plots of standardised observed genetic positions of markers (Observed) against their
expectations under a uniform distribution (Expected) for all sixteen linkage groups (LG1 – LG16).
The straight line indicates the cases where observation and expectation are the same. Stars
indicate linkage groups where the marker distribution deviates significantly from a uniform
distribution according to a Kolmogorov-Smirnov test (α = 0.05).
Physical and genetic distance
To further examine the possibility of an uneven recombination rate across the chromosomes,
genetic distances were plotted against physical distances for all scaffolds of which at least ten
markers were represented on the linkage map (Figure 6). These plots show the same pattern of
stepwise increase of the genetic distance as the physical distance increases. The magnitude of
these steps (up to 30 cM) cannot just be explained by the limited resolution of the map, which
should be between 1 and 2 cM for 88 samples with up to 30% missing genotypes. Therefore,
the steps seen in Figure 5 are at least not entirely due to a physical clustering of markers.
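The resolution bounds quoted above can be reproduced with a short calculation, assuming that the smallest detectable non-zero distance corresponds to a single recombinant among the scored haploid progeny, and that for such small recombination fractions the distance in cM is approximately 100 times the recombination fraction. This is a reconstruction of the reasoning, not a formula taken from the mapping software.

```latex
\[
\frac{100\ \text{cM}}{88} \approx 1.1\ \text{cM}
\qquad\text{and}\qquad
\frac{100\ \text{cM}}{0.7 \times 88} \approx 1.6\ \text{cM}
\]
```

The first value applies to a pair of markers scored in all 88 homokaryons, the second to a pair scored in only 70% of them.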
Figure 6. Genetic position (cM) as a function of physical position (kb) for all scaffolds from
which at least ten markers are represented on the linkage map.
Recombination events
For each linkage group the number of recombination events was counted and the average
number of recombination events was estimated from these counts (Table 3). This estimate
should reflect the genetic length of the linkage group (in Morgan), but is probably an
underestimation, because any double crossovers that may have taken place between two
markers will be missed. The observed genetic lengths of the linkage groups were converted
from Haldane cM to Kosambi cM for comparison. This revealed that the Haldane length
estimates were consistently higher than expected based on the mean number of recombination
events, while the Kosambi estimates were consistently lower.
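The comparison can be illustrated for a few linkage groups using the counts from Table 3. The sketch below is not the original analysis; it simply recomputes the mean number of recombination events from the tabulated counts, converts it to cM (one event per homokaryon corresponding to 1 Morgan, i.e. 100 cM), and prints it next to the two map lengths. The dictionary layout and function name are chosen for this example.

```python
# Counts copied from Table 3: number of homokaryons with 0..5 recombination
# events, followed by the observed length in Haldane cM and Kosambi cM.
TABLE3_SUBSET = {
    "LG1": ([19, 33, 28, 8, 0, 0], 156, 95),
    "LG3": ([16, 32, 26, 11, 2, 1], 197, 115),
    "LG6": ([45, 39, 4, 0, 0, 0], 61, 44),
}


def mean_events(counts):
    """Mean number of events, where counts[k] homokaryons show k events."""
    total = sum(counts)
    return sum(k * c for k, c in enumerate(counts)) / total


for group, (counts, haldane_cm, kosambi_cm) in TABLE3_SUBSET.items():
    count_based_cm = 100.0 * mean_events(counts)
    print(f"{group}: {count_based_cm:.0f} cM from counts, "
          f"{haldane_cm} Haldane cM, {kosambi_cm} Kosambi cM")
```

For these three groups the count-based length falls between the Kosambi and the Haldane length.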
Discussion
In this study, a genotyping-by-sequencing approach was used to discover SNP markers for the
construction of the first genetic map of the Termitomyces species associated with M.
natalensis. After filtering, 591 SNPs remained for use in linkage mapping, far fewer than
reported in previous studies (Guajardo et al., 2015; İpek et al., 2016; Ma et al., 2012). This
result can be partly explained by the use of a less frequently cutting restriction enzyme in
GBS library preparation, since this will lead to fewer fragments that can be sequenced. In
addition, the filtering performed here was quite strict and more markers may have been
obtained by relaxing the filtering conditions. However, this would result in many unreliable
markers with much missing data, which would reduce the accuracy of the linkage map
(Foulongne-Oriol, 2012). Another possible explanation may be that the natural isolate from
which the mapping population was derived did not contain that much genetic variation, due to
e.g. inbreeding. This would be unexpected, since this species of Termitomyces has been found to be
largely outbreeding (De Fine Licht et al., 2006), but it could still have happened purely by
chance.
Table 3. Numbers of recombination events per linkage group, average number of recombination
events and observed linkage group lengths in both Haldane and Kosambi cM.
Linkage   Number of recombination events      Mean number of   Observed length   Observed length
group     per linkage group                   recombination    (Haldane cM)      (Kosambi cM)
          0     1     2     3     4     5     events
LG1       19    33    28    8     0     0     1.28             156               95
LG2       35    40    12    1     0     0     0.76             88                59
LG3       16    32    26    11    2     1     1.48             197               115
LG4       14    39    30    4     1     0     1.31             155               94
LG5       29    44    14    1     0     0     0.85             104               68
LG6       45    39    4     0     0     0     0.53             61                44
LG7       38    38    11    1     0     0     0.72             86                58
LG8       29    42    16    1     0     0     0.88             98                64
LG9       49    28    11    0     0     0     0.57             70                49
LG10      78    10    0     0     0     0     0.11             12                11
LG11      21    43    20    4     0     0     1.08             121               77
LG12      76    11    1     0     0     0     0.15             15                14
LG13      60    28    0     0     0     0     0.32             36                28
LG14      51    35    2     0     0     0     0.44             49                37
LG15      74    14    0     0     0     0     0.16             17                15
LG16      62    26    0     0     0     0     0.30             39                30
Total                                                          1303              857
Average                                       0.68             81                54
The GBS analysis also yielded many loci containing heterozygous genotype calls. Since
heterozygosity should not be present in a population of homokaryons, these calls are probably
the result of alignments of paralogous or repetitive sequences to the same place on the
reference genome. The many occurrences of this phenomenon may indicate problems with the
alignment to or the assembly of the reference genome. Fortunately, the nature of the mapping
population allowed for reliable detection of these events and therefore they should not
influence the quality of the genetic map.
The GBS analysis pipeline used here is not the only pipeline that can be used for the analysis
of GBS data. Other pipelines, such as the UNEAK pipeline, which does not require a
reference genome (Lu et al., 2013), and the recently developed reference-optional GBS-SNP-CROP pipeline (Melo et al., 2016), can also be used. Reanalysing the data using these
pipelines may improve the number of SNPs identified, since the overlap in the SNPs
discovered by each of these pipelines tends to be small (Melo et al., 2016).
Using the markers identified by GBS, a linkage map was constructed. The map was based on
88 haploid progeny and consisted of 586 markers. The map currently consists of sixteen
linkage groups, which seems to reasonably correspond with haploid chromosome numbers
reported for other basidiomycetes, such as Agaricus bisporus (13; Royer et al., 1992) and
Schizophyllum commune (11; Carmi et al., 1978). Unfortunately, the haploid chromosome
numbers of all Termitomyces species are still unknown, and therefore cannot be used as
comparison.
The current groupings at LOD thresholds of 6 and higher appear to be quite reliable, since
many cases can be found where linkage across the larger intervals is backed up by
information from the scaffold on which the marker was found. In addition, all cases in which
a single marker did not map near the other markers from the scaffold that it was thought to be
part of, could be attributed to errors in the alignment step of the GBS pipeline.
Some of the current linkage groups may, however, still belong together. For example, linkage
groups 8 and 11 were split apart here, because of the large interval between the two groups,
even though they were still grouped together at a LOD threshold of 8. In addition, some of the
current linkage groups are quite small and may well be incomplete, or belong to other linkage
groups. New rounds of GBS to add more individuals or find additional markers may provide
additional evidence for some of these groups to be linked together. Moreover, the constituent
homokaryons of the parent heterokaryon could be included in a new round of GBS, which
would improve the accuracy of the map, because the parental linkage phases would no longer
need to be estimated from the data. Since the parent heterokaryon was a natural isolate, its
constituent homokaryons are not available, but they may be recreated by protoplasting.
A preliminary mapping of the mating type locus to the linkage map indicates with reasonable
certainty (up to a LOD threshold of 8) that this locus belongs on linkage group 3. The exact
position of the locus on this linkage group may, however, be unreliable, because only 29 of
the 88 individuals used in mapping were genotyped for this trait. Furthermore, genotyping
was based on the phenotypic examination of the product of crosses, which may be error-prone. Therefore, further studies are needed, in which all individuals used in mapping are
genotyped in a more reliable way, e.g. using molecular markers that differ between the two
crossed individuals to detect the successful formation of a heterokaryon.
Due to the fragmented nature of the current reference genome (consisting of hundreds of
scaffolds and thousands of contigs), it is impossible to be certain if the linkage map covers the
entire genome. Estimates based on the lengths of the scaffolds represented on the map and the
average physical to genetic distance ratio indicate that roughly half the genome is represented
by the linkage map. This might indicate that part of the genome is not represented, possibly
due to a lack of heterozygosity of the parent heterokaryon at certain regions of the genome.
However, since most scaffolds are quite small and most of the larger scaffolds were
represented on the linkage map, many scaffolds that were not represented on the map may
actually fall in between scaffolds that were. In addition, the estimate of the physical to genetic
distance ratio is based on many small pieces of the linkage map and may be quite inaccurate,
especially if the recombination rate is not constant across the genome. Also, there do not
appear to be many scaffolds that have significantly fewer markers than expected by chance.
For these reasons, there is no conclusive evidence that large parts of the genome are missing
from the linkage map, although it cannot be ruled out. Future studies aimed at improving the
reference genome assembly may help resolve this issue. Such efforts may be informed by the
current linkage map, which already provides an order for a large number of scaffolds,
illustrating how genetic and physical mapping approaches can complement each other.
Markers appear to be mostly evenly distributed across the linkage groups, with a few
exceptions. Distribution of markers within the linkage groups is, however, mostly not
uniform. Instead, many linkage groups contain clusters of many markers at the same genetic
position, spaced by relatively large intervals without any markers. This could be explained by
a physical clustering of markers, or by large differences in the recombination rate along the
chromosome. Plots of the genetic distance as a function of the physical distance (for the
scaffolds where this is possible), indicate that the presence of strong recombination hotspots
and coldspots is at least part of the answer. The fact that these ‘jumps’ in genetic distance
along the physical chromosome are visible at all of the examined positions may indicate that
the recombination landscape of Termitomyces is largely governed by these hotspots and
coldspots. Further evidence for this hypothesis would require a more complete, less
fragmented reference genome assembly, which would allow comparison of physical and
genetic distance over larger regions of the chromosomes.
The genetic lengths of the linkage groups estimated from the average number of
recombination events were systematically lower than the estimates of the mapping software in
Haldane cM and systematically higher than those estimates in Kosambi cM. Since using the
number of recombination events likely underestimates the genetic lengths (double crossovers
between adjacent markers are missed) and the estimates from the maximum likelihood
algorithm tend to be inflated (Hackett and Broadfoot, 2003), the Haldane mapping function
probably produces more accurate estimates of the true lengths than the Kosambi mapping
function. This indicates that crossover interference is at least less strong than assumed by the
Kosambi function and may even be completely absent.
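The difference between the two mapping functions can be made concrete with the standard formulas, sketched below. The function names are chosen for this example; the formulas are the textbook Haldane function (no crossover interference) and Kosambi function (moderate interference), valid for recombination fractions between 0 and 0.5.

```python
from math import log


def haldane_cm(r):
    """Map distance in cM for recombination fraction r, no interference."""
    return -50.0 * log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM for recombination fraction r, Kosambi interference."""
    return 25.0 * log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.05, 0.10, 0.20, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.1f} cM, "
          f"Kosambi {kosambi_cm(r):.1f} cM")
```

For the same recombination fraction the Haldane distance is always the larger of the two, and the gap widens as the interval grows, which is why maps with large intervals differ most between the two functions.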
In addition to providing an order for many of the scaffolds from the reference genome and
offering insight into the recombination behaviour of the species, the linkage map will be a
useful tool for future genetic analyses. Although it is possible that it is not yet complete, it has
already proven useful in narrowing down the location of the mating type locus. Other
interesting genes and QTLs segregating in the mapping population may be mapped in similar
ways, potentially allowing the discovery of genes involved in the symbiosis with M.
natalensis as well as the formation of mushrooms. Such mapping studies have already been
successfully performed for several commercially important traits such as yield (Foulongne-Oriol et al., 2012), bruising sensitivity (Gao et al., 2015), and disease resistance (Moquet et
al., 1999) in the cultivated white button mushroom (Agaricus bisporus). When cultivation of
Termitomyces mushrooms becomes feasible, the map may also become a useful tool in
breeding, for example through marker assisted selection. Future efforts should focus on
improving the reference genome assembly and determining the haploid chromosome number
to help find the physical location and exact sequence of mapped genes, as well as gain
additional insights into the recombination landscape of Termitomyces.
Conclusion
Here, the first genetic map of the Termitomyces species associated with M. natalensis is
presented. The map, based on 88 haploid progeny of a single heterokaryon, consists of 586
SNP markers discovered by GBS, indicating that GBS is not only a cost-effective way of
marker discovery for linkage mapping in plants, but also in fungi. The map was used to
narrow down the location of the mating type locus and will be a useful tool for the
identification of other loci. In addition, it provides indications that the recombination
landscape of Termitomyces is dominated by hotspots and coldspots and that crossover
interference plays a relatively small role. Future efforts to improve the assembly of the
reference genome will be necessary to confirm these indications.
Acknowledgements
I would like to thank Sabine Vreeburg for supervising me, isolating the mapping population,
and performing the preliminary genotyping of the mating type locus. In addition, I would like
to thank Bertha Koopmanschap and Marijke Slakhorst for help in the lab, Lennart van de
Peppel for help with the PCR, Alex Grum Grzhimaylo for help with the RNase, Bart
Pannebakker for help with the Linux computer, Erik Wijnker for useful discussions about
linkage mapping, and Duur Aanen for useful discussions and comments on the report. Also, I
would like to thank the people from the Genomic Diversity Facility at Cornell University for
performing the GBS analysis and useful discussions on the sample preparation.
References
Aanen, D. K., & Eggleton, P. (2005). Fungus-growing termites originated in African rain
forest. Current biology, 15(9), 851-855.
Aanen, D. K., Eggleton, P., Rouland-Lefevre, C., Guldberg-Frøslev, T., Rosendahl, S., &
Boomsma, J. J. (2002). The evolution of fungus-growing termites and their mutualistic
fungal symbionts. Proceedings of the National Academy of Sciences, 99(23), 14887-14892.
Aanen, D. K., Ros, V. I., de Fine Licht, H. H., Mitchell, J., De Beer, Z. W., Slippers, B., ... &
Boomsma, J. J. (2007). Patterns of interaction specificity of fungus-growing termites
and Termitomyces symbionts in South Africa. BMC evolutionary biology, 7(115).
Bielenberg, D. G., Rauh, B., Fan, S., Gasic, K., Abbott, A. G., Reighard, G. L., ... & Wells, C.
E. (2015). Genotyping by Sequencing for SNP-Based Linkage Map Construction and
QTL Analysis of Chilling Requirement and Bloom Date in Peach [Prunus persica (L.)
Batsch]. PloS one, 10(10), e0139406.
Botha, W. J., & Eicker, A. (1992). Nutritional value of Termitomyces mycelial protein and
growth of mycelium on natural substrates. Mycological research, 96(5), 350-354.
Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., & Buckler, E. S.
(2007). TASSEL: software for association mapping of complex traits in diverse
samples. Bioinformatics, 23(19), 2633-2635.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden,
T. L. (2009). BLAST+: architecture and applications. BMC bioinformatics, 10(421).
Carmi, P., Holm, P. B., Koltin, Y., Rasmussen, S. W., Sage, J., & Zickler, D. (1978). The
pachytene karyotype of Schizophyllum commune analyzed by three dimensional
reconstruction of synaptonemal complexes. Carlsberg Research Communications,
43(2), 117-132.
Chatterjee, A., Khatua, S., Chatterjee, S., Mukherjee, S., Mukherjee, A., Paloi, S., ... &
Bandyopadhyay, S. K. (2013). Polysaccharide-rich fraction of Termitomyces eurhizus
accelerate healing of indomethacin induced gastric ulcer in mice. Glycoconjugate
journal, 30(8), 759-768.
Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., & Blaxter, M. L.
(2011). Genome-wide genetic marker discovery and genotyping using next-generation
sequencing. Nature Reviews Genetics, 12(7), 499-510.
De, A. B. (1983). Basidiocarp production by Termitomyces microcarpus (Berk. and Br.) Heim
in culture. Current Science, 52(10), 494-495.
De Fine Licht, H. H., Andersen, A., & Aanen, D. K. (2005). Termitomyces sp. associated with
the termite Macrotermes natalensis has a heterothallic mating system and
multinucleate cells. Mycological research, 109(3), 314-318.
De Fine Licht, H. H., Boomsma, J. J., & Aanen, D. K. (2006). Presumptive horizontal
symbiont transmission in the fungus‐growing termite Macrotermes natalensis.
Molecular ecology, 15(11), 3131-3138.
Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., &
Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach
for high diversity species. PloS one, 6(5), e19379.
Foulongne-Oriol, M. (2012). Genetic linkage mapping in fungi: current state, applications,
and future trends. Applied microbiology and biotechnology, 95(4), 891-904.
Foulongne-Oriol, M., Rodier, A., Rousseau, T., & Savoie, J. M. (2012). Quantitative Trait
Locus Mapping of Yield-Related Components and Oligogenic Control of the Cap
Color of the Button Mushroom, Agaricus bisporus. Applied and environmental
microbiology, 78(7), 2422–2434.
Foulongne-Oriol, M., Spataro, C., Cathalot, V., Monllor, S., & Savoie, J. M. (2010). An
expanded genetic linkage map of an intervarietal Agaricus bisporus var. bisporus× A.
bisporus var. burnettii hybrid based on AFLP, SSR and CAPS markers sheds light on
the recombination behaviour of the species. Fungal Genetics and Biology, 47(3), 226-236.
Gao, W., Weijn, A., Baars, J. J., Mes, J. J., Visser, R. G., & Sonnenberg, A. S. (2015).
Quantitative trait locus mapping for bruising sensitivity and cap color of Agaricus
bisporus (button mushrooms). Fungal Genetics and Biology, 77, 69-81.
Glaubitz, J. C., Casstevens, T. M., Lu, F., Harriman, J., Elshire, R. J., Sun, Q., & Buckler, E.
S. (2014). TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline.
PLoS One, 9(2), e90346.
Guajardo, V., Solís, S., Sagredo, B., Gainza, F., Muñoz, C., Gasic, K., & Hinrichsen, P.
(2015). Construction of high density sweet cherry (Prunus avium L.) linkage maps
using microsatellite markers and SNPs detected by genotyping-by-sequencing (GBS).
PloS one, 10(5), e0127750.
Hackett, C. A., & Broadfoot, L. B. (2003). Effects of genotyping errors, missing values and
segregation distortion in molecular marker data on the construction of linkage maps.
Heredity, 90(1), 33-38.
İpek, A., Yılmaz, K., Sıkıcı, P., Tangu, N. A., Öz, A. T., Bayraktar, M., ... & Gülen, H.
(2016). SNP Discovery by GBS in Olive and the Construction of a High-Density
Genetic Linkage Map. Biochemical genetics, 54(3), 313-325.
Johnson, R. A., Thomas, R. J., Wood, T. G., & Swift, M. J. (1981). The inoculation of the
fungus comb in newly founded colonies of some species of the Macrotermitinae
(Isoptera) from Nigeria. Journal of Natural History, 15(5), 751-756.
Jones, N., Ougham, H., Thomas, H., & Pašakinskienė, I. (2009). Markers and mapping
revisited: finding your gene. New Phytologist, 183(4), 935-966.
Kansci, G., Mossebo, D. C., Selatsa, A. B., & Fotso, M. (2003). Nutrient content of some
mushroom species of the genus Termitomyces consumed in Cameroon.
Food/Nahrung, 47(3), 213-216.
Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar,
Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.),
The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin:
Springer.
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows–Wheeler
transform. Bioinformatics, 25(14), 1754-1760.
Lu, F., Lipka, A. E., Glaubitz, J., Elshire, R., Cherney, J. H., Casler, M. D., ... & Costich, D.
E. (2013). Switchgrass genomic diversity, ploidy, and evolution: novel insights from a
network-based SNP discovery protocol. PLoS Genetics, 9(1), e1003215.
Ma, X. F., Jensen, E., Alexandrov, N., Troukhan, M., Zhang, L., Thomas-Jones, S., ... &
Flavell, R. (2012). High resolution genetic mapping by genome sequencing reveals
genome duplication and tetraploid genetic structure of the diploid Miscanthus
sinensis. PloS one, 7(3), e33821.
Malek, S. N. A., Kanagasabapathy, G., Sabaratnam, V., Abdullah, N., & Yaacob, H. (2012).
Lipid components of a Malaysian edible mushroom, Termitomyces heimii natarajan.
International Journal of Food Properties, 15(4), 809-814.
Melo, A. T., Bartaula, R., & Hale, I. (2016). GBS-SNP-CROP: a reference-optional pipeline
for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data. BMC bioinformatics, 17(29).
Moquet, F., Desmerger, C., Mamoun, M., Ramos-Guedes-Lafargue, M., & Olivier, J. M.
(1999). A quantitative trait locus of Agaricus bisporus resistance to Pseudomonas
tolaasii is closely linked to natural cap color. Fungal Genetics and Biology, 28(1), 34-42.
Nabubuya, A., Muyonga, J. H., & Kabasa, J. D. (2010). Nutritional and hypocholesterolemic
properties of Termitomyces microcarpus mushrooms. African Journal of Food,
Agriculture, Nutrition and Development, 10(3), 2235-2257.
Nobre, T., Koopmanschap, B., Baars, J. J., Sonnenberg, A. S., & Aanen, D. K. (2014). The
scope for nuclear selection within Termitomyces fungi associated with fungus-growing
termites is limited. BMC evolutionary biology, 14(121).
Ogundana, S. K., & Fagade, O. E. (1982). Nutritive value of some Nigerian edible
mushrooms. Food chemistry, 8(4), 263-268.
Okuda, Y., Murakami, S., & Matsumoto, T. (2009). A genetic linkage map of Pleurotus
pulmonarius based on AFLP markers, and localization of the gene region for the
sporeless mutation. Genome, 52(5), 438-446.
Oso, B. A. (1975). Mushrooms and the Yoruba people of Nigeria. Mycologia, 67(2), 311-319.
Poulsen, M., Hu, H., Li, C., Chen, Z., Xu, L., Otani, S., ... & Zhang, G. (2014).
Complementary symbiont contributions to plant decomposition in a fungus-farming
termite. Proceedings of the National Academy of Sciences, 111(40), 14500-14505.
R Core Team (2014). R: A language and environment for statistical computing. R Foundation
for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Remington, D. L., Whetten, R. W., Liu, B. H., & O’Malley, D. M. (1999). Construction of an
AFLP genetic map with nearly complete genome coverage in Pinus taeda. Theoretical
and Applied Genetics, 98(8), 1279-1292.
Rowe, H. C., Renaut, S., & Guggisberg, A. (2011). RAD in the realm of next‐generation
sequencing technologies. Molecular Ecology, 20(17), 3499-3502.
Royer, J. C., Hintz, W. E., Kerrigan, R. W., & Horgen, P. A. (1992). Electrophoretic
karyotype analysis of the button mushroom, Agaricus bisporus. Genome, 35(4), 694-698.
Sands, W. A. (1956). Some factors affecting the survival of Odontotermes badius. Insectes
sociaux, 3(4), 531-536.
Van der Nest, M. A., Slippers, B., Steenkamp, E. T., De Vos, L., Van Zyl, K., Stenlid, J., ... &
Wingfield, B. D. (2009). Genetic linkage map for Amylostereum areolatum reveals an
association between vegetative growth and sexual and self-recognition. Fungal
Genetics and Biology, 46(9), 632-641.
Van Ooijen, J. W. (2016). JoinMap® 4, Software for the calculation of genetic linkage maps in
experimental populations. Kyazma B.V., Wageningen, Netherlands.
Xu, J., Kerrigan, R. W., Horgen, P. A., & Anderson, J. B. (1993). Localization of the mating
type gene in Agaricus bisporus. Applied and environmental microbiology, 59(9),
3044-3049.
A simple, working protoplasting protocol for the
Termitomyces symbiont of Macrotermes natalensis
Abstract
Including the constituent homokaryons of the parent heterokaryon from the linkage mapping
study described in the previous chapter in a new mapping study may help improve the
accuracy of the genetic map. Obtaining these constituent homokaryons requires the ability to
make protoplasts of the parent heterokaryon, but the only efficient protoplasting protocol for
Termitomyces is quite complicated. Here, a simple, working protoplasting protocol for the
Termitomyces symbiont of Macrotermes natalensis is presented, yielding up to 5∙10⁶
protoplasts/mL. Protoplasts could be regenerated on plates with sucrose as osmotic stabiliser,
but not on plates with KCl. Homokaryons among regenerated protoplasts may be identified
and used in future linkage mapping studies. For other purposes, such as transformation, the
current protocol will need to be optimised to improve its efficiency.
Introduction
To improve the accuracy of the genetic map described in the first part of this report, the
constituent homokaryons of the parent heterokaryon of the mapping population may be
included in a future round of GBS and linkage mapping. This would allow the direct
determination of the parental genotypes and therefore the parental linkage phases would no
longer need to be estimated from the data on the recombined progeny. Unfortunately, the
parent heterokaryon was a natural isolate and not the result of an artificial cross between two
homokaryons. Therefore, its constituent homokaryons are not available. However, the parent
heterokaryon still contains two separate types of nuclei, each with the genome of one of its two
homokaryotic parents. Degrading the cell wall of this heterokaryon to liberate protoplasts
(which may by chance occasionally contain only one of the two different types of nuclei)
therefore allows for the regeneration of the constituent homokaryons.
Efficient protoplasting protocols for filamentous fungi have already been described for many
organisms, including ascomycetes Aspergillus niger (Arentshorst et al., 2012) and
Cochliobolus heterostrophus (Turgeon et al., 2010), as well as basidiomycetes Agaricus
bisporus and Agaricus bitorquis (Sonnenberg et al., 1988). A protoplasting protocol has also
been published for Termitomyces clypeatus (Mukherjee and Sengupta, 1988), but this protocol
is quite complicated and impractical.
Here, a practical, working protoplasting protocol for the Termitomyces symbiont of
Macrotermes natalensis is presented. In this protocol, young mycelium from either liquid or
solid medium is used as starting material. Using material from a solid culture yielded the
highest concentration of protoplasts. The current protocol will be useful for the recreation of
the constituent homokaryons of the parent heterokaryon from the mapping population. Further
optimisation of the protocol will be needed to make the procedure efficient enough for e.g.
transformations.
Methods
Strain and growth medium
The strain used for protoplasting was the parent heterokaryon from which the mapping
population for the construction of the genetic linkage map was obtained (see the methods
from the first part of this report). The liquid growth medium was malt yeast extract (MY)
medium, consisting of 20 g/L malt extract and 2 g/L yeast extract. Solid growth medium (malt
yeast extract agar; MYA) consisted of 20 g/L malt extract, 2 g/L yeast extract and 15 g/L
agar.
Growth and harvesting of mycelium for protoplasting
One gram of mycelium (wet weight) from an old liquid culture was crushed in 1 mL saline
solution (8 g/L NaCl) to obtain a homogeneous suspension. For growth in liquid culture, 1
mL of this suspension was added to 100 mL MY medium in a sterile 500 mL Erlenmeyer flask.
For growth on solid medium, 300 μL of the saline suspension was spread on an MYA plate
with a 76 mm polycarbonate membrane with a pore size of 0.1 μm (Profiltra, catalog number
K01CP07600) to prevent the mycelium from growing into the agar. Erlenmeyer flasks were
incubated for one week at 25 °C and 100 rpm, and plates for two days at 25 °C. Young
mycelium from liquid cultures was harvested using a sterile Büchner funnel with a sterile
nylon filter, washed with 0.6 M sucrose and scraped into a pre-weighed petri dish for
weighing. Young mycelium from plates was scraped directly from the agar plate into a
pre-weighed petri dish. Harvested material was weighed, and 1.2 g of material from the liquid
culture and 0.75 g of material from the solid culture were used for protoplasting.
Protoplast production
The harvested mycelium was added to 10 mL of protoplasting solution (0.6 M sucrose and 20
g/L Novozym 234 (Novo Nordisk), filter sterilised) in a 50 mL tube. The protoplasting
mixture was incubated for 2-3 hours at 30 °C and 80 rpm, shaking horizontally. The resulting
mixture with protoplasts was filtered over a glass wool plug in a funnel pre-rinsed with 0.6 M
sucrose into a fresh tube, after which the filter was rinsed again with 0.6 M sucrose. The
protoplasts in the filtrate were collected by centrifugation (10 min, 2000 × g, 10 °C). The
supernatant was discarded, the protoplasts were resuspended in 5 mL 0.6 M sucrose and
collected again by centrifugation (5 min, 3000 × g, room temperature). Again, the supernatant
was discarded and the protoplasts were resuspended in 5 mL 0.6 M sucrose, after which their
concentration was determined using a Neubauer haemocytometer (Brand GmbH + Co KG).
The mixture was then again centrifuged (5 min, 3000 × g, room temperature), the supernatant
was discarded and the protoplasts were resuspended in 0.6 M sucrose to a concentration of
approximately 5∙10⁶ protoplasts/mL.
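The conversion from haemocytometer counts to a protoplast concentration, and from there to the final resuspension volume, can be illustrated with a minimal sketch. It assumes that protoplasts are counted in the large 1 mm × 1 mm squares of a Neubauer chamber with a depth of 0.1 mm (0.1 μL per square). The counts and the function names are hypothetical and were not part of the original protocol.

```python
# Volume above one large (1 mm x 1 mm) Neubauer square at 0.1 mm depth:
# 0.1 uL = 1e-4 mL (assumption: counts are taken in these large squares).
LARGE_SQUARE_VOLUME_ML = 1e-4


def protoplasts_per_ml(counts, dilution_factor=1.0):
    """Concentration estimated from counts in several large squares."""
    mean_count = sum(counts) / len(counts)
    return mean_count * dilution_factor / LARGE_SQUARE_VOLUME_ML


def resuspension_volume_ml(conc_per_ml, suspension_volume_ml, target_per_ml=5e6):
    """Volume in which the pelleted protoplasts reach the target concentration."""
    total_protoplasts = conc_per_ml * suspension_volume_ml
    return total_protoplasts / target_per_ml


counts = [48, 52, 50, 50]                  # hypothetical counts per large square
conc = protoplasts_per_ml(counts)          # about 5e5 protoplasts/mL
volume = resuspension_volume_ml(conc, 5.0)  # about 0.5 mL for a 5 mL suspension
print(f"{conc:.2e} protoplasts/mL -> resuspend in {volume:.2f} mL")
```

With these hypothetical counts, the protoplasts from 5 mL of suspension would be resuspended in about 0.5 mL to reach the target concentration.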
Protoplast regeneration
Dilutions of factor 10 and 100 were made from the protoplast suspension derived from the
liquid culture and dilutions of factor 10, 100, and 1000 were made from the protoplast
suspension derived from the solid culture. Of each dilution, 100 μL was spread on each of two
different types of regeneration plates: MYA with 0.6 M sucrose and MYA with 0.5 M KCl.
Regeneration plates were incubated for 6 days at 25 °C. From the plates where regeneration
had been successful after the incubation period, individual outgrowths from regenerated
protoplasts were transferred to fresh MYA plates, which were incubated at 25 °C.
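As a rough guide to the quantities involved, the following sketch estimates how many protoplasts are spread per plate at each dilution and how much osmotic stabiliser is needed per litre of regeneration medium. It assumes a stock of exactly 5∙10⁶ protoplasts/mL and molar masses of 342.30 g/mol for sucrose and 74.55 g/mol for KCl; the function names are illustrative only.

```python
# Molar masses in g/mol (sucrose C12H22O11, potassium chloride KCl).
MOLAR_MASS_G_PER_MOL = {"sucrose": 342.30, "KCl": 74.55}


def grams_per_litre(compound, molarity):
    """Mass of osmotic stabiliser needed per litre of medium."""
    return MOLAR_MASS_G_PER_MOL[compound] * molarity


def protoplasts_per_plate(stock_per_ml, dilution_factor, plated_volume_ul=100):
    """Number of protoplasts spread on one plate for a given dilution."""
    return stock_per_ml / dilution_factor * plated_volume_ul / 1000


stock = 5e6  # protoplasts/mL, approximate value from the protocol
for factor in (10, 100, 1000):
    print(f"dilution {factor}: {protoplasts_per_plate(stock, factor):.0f} per plate")

print(f"0.6 M sucrose: {grams_per_litre('sucrose', 0.6):.1f} g/L")
print(f"0.5 M KCl: {grams_per_litre('KCl', 0.5):.1f} g/L")
```

The numbers per plate are upper bounds for the number of colonies that can appear, since regeneration is rarely complete.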
Results
The mycelium from the liquid culture yielded 500 μL of 5∙10⁶ protoplasts/mL, while the
mycelium from solid culture yielded 5 mL of 5∙10⁶ protoplasts/mL. Protoplast regeneration
was observed on all plates with sucrose as osmotic stabiliser, but on none of the plates where
KCl was used as osmotic stabiliser.
Discussion
Although less mycelium from the solid culture than from the liquid culture was
used in protoplasting, the mycelium from the solid culture yielded roughly ten times as many
protoplasts. A reason for this may be that the solid culture grew much faster, allowing for its
use after only two days, as compared to one week for the liquid culture. Therefore, the
mycelium from the solid culture was much younger at the time of protoplasting. Young
mycelium tends to yield more protoplasts, because there has been less time for the fungus to
form a thick cell wall (Turgeon et al., 2010).
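The comparison between the two cultures can be made explicit by converting the reported volumes and concentrations into total and per-gram yields. The sketch below uses only numbers stated in the Methods and Results; since the concentration of 5∙10⁶ protoplasts/mL is approximate, the outcomes are approximate as well. Variable names are illustrative.

```python
def total_yield(volume_ml, conc_per_ml=5e6):
    """Total number of protoplasts in the final suspension."""
    return volume_ml * conc_per_ml


# Final suspension volume and amount of mycelium used, as reported in this chapter.
cultures = {
    "liquid": {"volume_ml": 0.5, "mycelium_g": 1.2},
    "solid": {"volume_ml": 5.0, "mycelium_g": 0.75},
}

yields = {name: total_yield(c["volume_ml"]) for name, c in cultures.items()}
per_gram = {name: yields[name] / cultures[name]["mycelium_g"] for name in cultures}

fold_total = yields["solid"] / yields["liquid"]
fold_per_gram = per_gram["solid"] / per_gram["liquid"]
print(f"total yield: {fold_total:.0f}-fold, per gram: {fold_per_gram:.0f}-fold")
```

Expressed per gram of mycelium, the difference between solid and liquid culture is therefore about 16-fold rather than ten-fold.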
Protoplasts failed to regenerate on any of the plates using 0.5 M KCl as osmotic stabiliser.
This is surprising, since an efficient protoplasting protocol has been published for
Termitomyces clypeatus, in which the same concentration of KCl is used as osmotic stabiliser in
the regeneration plates (Mukherjee and Sengupta, 1988). The fact that KCl was combined
with MYA here and with a different medium in the previous study may be the cause of the
different outcome.
The regenerated protoplasts that were picked up from the regeneration plates with 0.6 M
sucrose can be tested for homozygosity using the PCR-RFLP marker developed in the linkage
mapping study from the previous part of this report. This way, with a bit of luck, the two
different constituent homokaryons may be found in the population of regenerated protoplasts
and used in a future round of GBS to improve the genetic map.
The current protocol provides a relatively simple, working method for the generation of
protoplasts. For some purposes, such as protoplast transformation, which requires about 10⁸
protoplasts/mL (Turgeon et al., 2010), the current method is, however, not yet efficient
enough. Further optimisation of the protocol will be needed to improve its efficiency.
Possibilities for optimisation include adapting the amount as well as the youth of the
mycelium to be treated with lytic enzymes. Using more and younger mycelium may result in
the release of more protoplasts. In addition, other lytic enzymes could be added to the
protoplasting solution. Mukherjee and Sengupta (1988) showed that a combination of
cellulase, chitinase, and Novozym 234 was far more effective than Novozym 234 by itself.
Therefore, adding cellulase and chitinase may also improve the protoplasting efficiency.
However, the high concentrations of these enzymes that were used in the previous study may
make their use more expensive than can be justified by the increase in efficiency.
Conclusion
Here, a simple, working protoplasting protocol for the Termitomyces symbiont of M.
natalensis was presented. The protocol is most efficient with relatively young mycelium
grown on solid medium. The current efficiency of the protocol may be enough for the
isolation of the constituent homokaryons of a heterokaryon, but for other purposes, such as
transformations, it still needs to be optimised.
Acknowledgements
I would like to thank Sabine Vreeburg for supervising me, Marijke Slakhorst for assistance in
the lab, and Linda van Oosten for continuing with the identification of homokaryons among
the regenerated protoplasts.
References
Arentshorst, M., Ram, A. F. J., & Meyer, V. (2012). Using non-homologous end-joining-deficient
strains for functional gene analyses in filamentous fungi. In M. D. Bolton &
B. P. H. J. Thomma (Eds.), Plant fungal pathogens: methods and protocols, methods
in molecular biology (pp. 133-150). New York: Springer Science + Business Media
LLC.
Mukherjee, M., & Sengupta, S. (1988). Isolation and regeneration of protoplasts from
Termitomyces clypeatus. Canadian journal of microbiology, 34(12), 1330-1332.
Sonnenberg, A. S., Wessels, J. G., & van Griensven, L. J. (1988). An efficient
protoplasting/regeneration system for Agaricus bisporus and Agaricus bitorquis.
Current Microbiology, 17(5), 285-291.
Turgeon, B. G., Condon, B., Liu, J., & Zhang, N. (2010). Protoplast transformation of
filamentous fungi. In A. Sharon (Ed.), Molecular and Cell Biology Methods for Fungi
(pp. 3-19). Totowa: Humana Press.
Determining the mating system of the Termitomyces
symbiont of Macrotermes natalensis
Abstract
To reliably determine the mating system of the Termitomyces symbiont of Macrotermes
natalensis and develop tester strains to determine the mating type of any homokaryon from
this species, a series of crosses was set up. The products of these crosses were purified and are
now ready for DNA isolation and further analysis using molecular markers to determine
which crosses were successful.
Introduction
To map the mating type genes of the Termitomyces symbiont of Macrotermes natalensis more
reliably than was done in the linkage mapping study from the first part of this report, the
success of crosses should not be determined based on phenotypic observations, which may
not always be conclusive. Instead, molecular markers could be used. In addition, the mating
system should be determined in a more systematic way, to make sure that there is really only
one mating type locus, as was assumed in the mapping study.
From a previous study, it is known that the Termitomyces symbiont of M. natalensis has a
heterothallic mating system (De Fine Licht et al., 2005). In this type of system, only
homokaryons of a compatible mating type can successfully form a heterokaryon together.
Heterothallic mating systems can be subdivided into bipolar and tetrapolar systems. The
tetrapolar system is more common in basidiomycetes and is believed to be the ancestral state.
In this system, there are two multi-allelic mating type loci and compatibility occurs only if the
alleles of two homokaryons are different at both loci (Kües et al., 2011). The function of such
a mating system is to restrict inbreeding and promote outbreeding (Casselton and Economou,
1985). In a tetrapolar mating system, homokaryons are never compatible with themselves and
are only compatible with 25% of the homokaryons that originated from the same
heterokaryon, because both of the two unlinked mating type loci need to be different.
Outbreeding, on the other hand, can remain largely unrestricted as long as there are many
different mating type alleles present in the population (Casselton and Economou, 1985).
In contrast to species with a tetrapolar system, species with a bipolar mating system only have
one mating type locus that needs to be heteroallelic for compatibility to occur. The inbreeding
restriction in species with this system is, therefore, only 50%. Bipolarity is thought to have
evolved from tetrapolarity several times independently in basidiomycetes. In some cases,
bipolarity arose because the two different mating type loci became linked, resulting in
effectively one mating type locus (Bakkeren and Kronstad, 1994; Nieuwenhuis et al., 2013).
In other cases, one of the two mating type loci lost its function in determining mating
compatibility (Aimi et al., 2005; James et al., 2006).
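The inbreeding restrictions of 25% and 50% mentioned above follow from a simple enumeration of sibling genotypes. The sketch below assumes that each mating type locus is heteroallelic in the parent heterokaryon (alleles labelled 1 and 2), that the loci are unlinked, and that all sibling mating types are equally frequent; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def sibling_compatibility(n_loci):
    """Fraction of sibling pairs that differ at every mating type locus."""
    # Each sibling inherits allele 1 or 2 at every locus of the parent heterokaryon.
    mating_types = list(product((1, 2), repeat=n_loci))
    compatible_pairs = sum(
        all(a != b for a, b in zip(first, second))
        for first in mating_types
        for second in mating_types
    )
    return Fraction(compatible_pairs, len(mating_types) ** 2)


print("bipolar:", sibling_compatibility(1))     # one mating type locus
print("tetrapolar:", sibling_compatibility(2))  # two unlinked mating type loci
```

Under these assumptions, half of all sibling pairs are compatible in a bipolar system and a quarter in a tetrapolar system.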
Here, a series of crosses is performed using a subset of the homokaryons from the mapping
population used in linkage mapping as described in the first chapter. By determining which
combinations of these homokaryons can form a stable heterokaryon, the type of the mating
system can be deduced. This strategy has already been successfully used in previous studies to
identify the mating system of other species of basidiomycetes (Gordon and Petersen, 1991;
Aanen and Kuyper, 1999). In addition, tester strains from each of the two or four different
mating types could be identified using these crosses. Such tester strains could be used to
determine the mating type of all homokaryons in the mapping population, allowing for an
accurate mapping of the mating type locus or loci to the genetic map.
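How a mating system might be deduced from a pattern of compatible and incompatible crosses can be sketched as follows. The example assumes an error-free and complete compatibility matrix (every strain crossed with every other strain), which is more than the crossing scheme used here provides, and it uses hypothetical mating type genotypes to generate the matrix. Strains with identical compatibility profiles are grouped into one putative mating type.

```python
def compatible(type_a, type_b):
    """Two mating types are compatible when they differ at every locus."""
    return all(a != b for a, b in zip(type_a, type_b))


def mating_type_classes(compat):
    """Group strains whose compatibility profiles (matrix rows) are identical."""
    classes = {}
    for strain, row in enumerate(compat):
        classes.setdefault(tuple(row), []).append(strain)
    return list(classes.values())


# Hypothetical tetrapolar example: alleles at (locus A, locus B) for six strains.
types = [(1, 1), (2, 2), (1, 2), (2, 1), (1, 1), (2, 2)]
compat = [[compatible(s, t) for t in types] for s in types]
classes = mating_type_classes(compat)
print(len(classes), "putative mating types:", classes)
```

Two classes among siblings would point to a bipolar system and four classes to a tetrapolar system. If not all mating types are represented in the sample, the number of classes is underestimated, which is why the number of homokaryons included in the crosses matters.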
Usually, heterokaryons can be distinguished from homokaryons by the formation of clamp
connections or the possession of two nuclei per cell (Kües, 2000). Unfortunately,
Termitomyces heterokaryons do not form clamp connections and have multinucleate cells,
making it hard to distinguish homokaryons from heterokaryons morphologically (De Fine
Licht et al., 2005). Heterokaryons do grow slightly faster than homokaryons (Kües, 2000), a
characteristic that was used in the preliminary mapping of the mating type locus, but this is
not a very accurate criterion for distinguishing between the two. Therefore, molecular markers
will need to be used to detect the successful formation of a heterokaryon. To this end, samples
from the contact zones of the crosses have been purified and prepared for DNA extraction.
Methods
Set-up of crosses
Crosses were made between twenty homokaryons used in the construction of the linkage map.
The first ten homokaryons were crossed in all possible combinations, while the last ten were
only crossed to each of the first ten (Figure 7). These numbers were chosen such that, in case
of a tetrapolar mating system, the probability of finding three of the four mating types in the
first ten samples would be greater than 99% and the probability of all four mating types being
present in all twenty samples would be greater than 98% (for calculations see Appendix 1).
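The two probabilities can be checked with a short calculation. The sketch below is not necessarily the calculation from Appendix 1; it assumes four equally frequent mating types and independent sampling of homokaryons, and counts the favourable outcomes exactly by inclusion-exclusion. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def surjections(n_samples, k):
    """Number of ways to assign n samples to k types so that every type occurs."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n_samples for i in range(k + 1))


def prob_at_least(k_min, n_samples, n_types=4):
    """Probability that at least k_min of n_types occur among n_samples."""
    favourable = sum(
        comb(n_types, k) * surjections(n_samples, k)
        for k in range(k_min, n_types + 1)
    )
    return Fraction(favourable, n_types ** n_samples)


p_three_in_ten = prob_at_least(3, 10)     # at least three types in ten samples
p_four_in_twenty = prob_at_least(4, 20)   # all four types in twenty samples
print(f"{float(p_three_in_ten):.4f}, {float(p_four_in_twenty):.4f}")
```

Under these assumptions, the probabilities are approximately 0.9941 and 0.9873, in line with the thresholds of 99% and 98% stated above.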
Sample  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
1          x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
2             x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
3                x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
4                   x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
5                      x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
6                         x  x  x  x  x  x  x  x  x  x  x  x  x  x
7                            x  x  x  x  x  x  x  x  x  x  x  x  x
8                               x  x  x  x  x  x  x  x  x  x  x  x
9                                  x  x  x  x  x  x  x  x  x  x  x
10                                    x  x  x  x  x  x  x  x  x  x
Figure 7. Scheme of all crosses for the identification of the mating system. Crosses that have been
performed are indicated with an “x”.
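The crossing scheme of Figure 7 can also be written out as a list of pairs, which makes it easy to count the crosses or to generate plate labels. This is a minimal sketch; the sample numbers follow the figure, and the variable names are illustrative.

```python
from itertools import combinations, product

first_ten = range(1, 11)   # crossed in all pairwise combinations
last_ten = range(11, 21)   # crossed only to each of the first ten

crosses = list(combinations(first_ten, 2)) + list(product(first_ten, last_ten))
print(len(crosses), "crosses in total")
```

This gives 45 crosses among the first ten homokaryons and 100 crosses between the first and last ten, 145 in total.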
Crosses were made on plates with malt yeast extract agar (MYA; 20 g/L malt extract, 2 g/L
yeast extract and 15 g/L agar). All crosses were done in duplicate on the same plate in such a
configuration that the mating between two different homokaryons could immediately be
compared to that between two identical homokaryons (Figure 8). Spacing between the
different inoculation sites was approximately 5 mm. Crosses were incubated for three weeks
at 25 °C.
Figure 8. Configuration of inoculation of the homokaryon strains for the crossing experiments. Blue
squares (number 1) indicate one parent for the cross and red squares (number 2) indicate the other
parent.
Purification of cross products
To isolate the products of the crosses and make sure that they consist of only a single
mycelium (heterokaryon or homokaryon), rather than a mixture of two homokaryons, the
cross products had to be purified. To this end, a sample from the contact area of each cross
was transferred to the middle of a fresh MYA plate. After three weeks of incubation at 25 °C, a
sample from the edge of the newly grown colony was again transferred to a fresh MYA plate.
If several morphologically distinct areas were observed in the new colonies, a sample from
each of these different areas was transferred. New plates with purified colonies were again
incubated at 25 °C. Samples from the purified colonies were grown on liquid malt yeast
extract medium (20 g/L malt extract and 2 g/L yeast extract) in a shaker at 25 °C and 90 rpm,
after which they were frozen for later DNA extraction and further analysis.
Results
Purified products of most of the crosses have been obtained and are ready for DNA extraction
and further analysis.
Discussion
To determine the outcome of the crosses performed here, DNA will need to be extracted from
the purified products and analysed for heterozygosity of markers for which the two crossed
homokaryons had different alleles. SNP markers that can be used for this analysis can be
selected from the filtered GBS dataset used for the construction of the genetic map. Since
SNP marker genotypes of all of the homokaryons used in the crosses are available, markers
can be selected in such a way that all twenty homokaryons can be distinguished from one
another using a minimal number of markers. The DNA isolated from the products of the
crosses can then be genotyped for these markers using a high-throughput method such as
KASP (Kompetitive Allele Specific PCR) genotyping (He et al., 2014). From the pattern of
compatible and incompatible matings, the mating system can then be deduced, and tester
strains of each of the mating types can be identified. These tester strains can then be crossed
with each of the homokaryons from the mapping population. By analysing the outcome of
these crosses in the same way as before, the mating type of all homokaryons in the mapping
population can be determined. These data can then be used to map the mating type locus more
accurately than before.
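Selecting a small set of markers that distinguishes all homokaryons from one another is a set cover problem, for which a greedy heuristic is a common starting point. The sketch below is one possible approach, not the method used in this study: it repeatedly picks the marker that separates the most strain pairs that are still indistinguishable. The result is small but not guaranteed to be minimal. The genotype table, marker names, and function name are hypothetical.

```python
from itertools import combinations


def select_markers(genotypes):
    """Greedy choice of markers until every strain pair differs at a chosen marker.

    genotypes maps a marker name to a list of allele calls per strain
    (None = missing call). Returns the chosen markers and any unresolved pairs.
    """
    n_strains = len(next(iter(genotypes.values())))
    unresolved = set(combinations(range(n_strains), 2))
    chosen = []

    def separates(calls, i, j):
        return calls[i] is not None and calls[j] is not None and calls[i] != calls[j]

    while unresolved:
        # Number of still-unresolved pairs each marker would separate.
        gains = {
            marker: sum(separates(calls, i, j) for i, j in unresolved)
            for marker, calls in genotypes.items()
        }
        best = max(gains, key=gains.get)
        if gains[best] == 0:
            break  # remaining pairs cannot be separated with the available markers
        chosen.append(best)
        unresolved = {
            (i, j) for i, j in unresolved if not separates(genotypes[best], i, j)
        }
    return chosen, unresolved


# Hypothetical haploid genotypes (0/1 alleles) of four strains at three SNP markers.
example = {"snp1": [0, 0, 1, 1], "snp2": [0, 1, 0, 1], "snp3": [0, 0, 0, 1]}
markers, unresolved_pairs = select_markers(example)
print(markers, unresolved_pairs)
```

In this example, two markers are enough to distinguish all four strains. With real GBS data, missing calls and genotyping errors would argue for selecting some redundant markers as well.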
Acknowledgements
I would like to thank Sabine Vreeburg for supervising me, Marijke Slakhorst for help in the
lab, and Linda van Oosten for transferring the purified products of the crosses to liquid
culture.
References
Aanen, D. K., & Kuyper, T. W. (1999). Intercompatibility tests in the Hebeloma
crustuliniforme complex in northwestern Europe. Mycologia, 91(5), 783-795.
Aimi, T., Yoshida, R., Ishikawa, M., Bao, D., & Kitamoto, Y. (2005). Identification and
linkage mapping of the genes for the putative homeodomain protein (hox1) and the
putative pheromone receptor protein homologue (rcb1) in a bipolar basidiomycete,
Pholiota nameko. Current genetics, 48(3), 184-194.
Bakkeren, G., & Kronstad, J. W. (1994). Linkage of mating-type loci distinguishes bipolar
from tetrapolar mating in basidiomycetous smut fungi. Proceedings of the National
Academy of Sciences, 91(15), 7085-7089.
Casselton, L. A., & Economou, A. (1985). Dikaryon formation. In D. Moore, L. A. Casselton,
D. A. Wood & J. C. Frankland (Eds.), Developmental Biology of Higher Fungi (pp.
213-230). Cambridge: Cambridge University Press.
De Fine Licht, H. H., Andersen, A., & Aanen, D. K. (2005). Termitomyces sp. associated with
the termite Macrotermes natalensis has a heterothallic mating system and
multinucleate cells. Mycological research, 109(3), 314-318.
Gordon, S. A., & Petersen, R. H. (1991). Mating systems in Marasmius. Mycotaxon, 41(2),
371-386.
He, C., Holme, J., & Anthony, J. (2014). SNP genotyping: the KASP assay. In D. Fleury &
R. Whitford (Eds.), Crop Breeding: Methods and Protocols (pp. 75-86). New York:
Springer.
James, T. Y., Srivilai, P., Kües, U., & Vilgalys, R. (2006). Evolution of the bipolar mating
system of the mushroom Coprinellus disseminatus from its tetrapolar ancestors
involves loss of mating-type-specific pheromone receptor function. Genetics, 172(3),
1877-1891.
Kües, U. (2000). Life history and developmental processes in the basidiomycete Coprinus
cinereus. Microbiology and molecular biology reviews, 64(2), 316-353.
Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar,
Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.),
The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin:
Springer.
Nieuwenhuis, B. P. S., Billiard, S., Vuilleumier, S., Petit, E., Hood, M. E., & Giraud, T.
(2013). Evolution of uni- and bifactorial sexual compatibility systems in fungi.
Heredity, 111(6), 445-455.
The search for homeodomain genes involved in mating in
Termitomyces sp.
Abstract
In this study, several attempts were made to amplify homologues of homeodomain genes from
mating type loci of other basidiomycetes in the Termitomyces symbiont of Macrotermes
natalensis using PCR. All attempts failed to yield the expected PCR product. Future studies to
identify the mating type loci of Termitomyces may try different primers or PCR conditions, or
use alternative strategies, such as searching for genes known to be closely linked to the
mating type locus in other basidiomycetes.
Introduction
Although not much is known about the mating system and mating type loci of Termitomyces,
many studies on mating have been done for other basidiomycetes. These studies have
identified the genes involved and unravelled the underlying molecular mechanism of the
mating type loci. In tetrapolar species, one of the two loci contains genes encoding
homeodomain transcription factors of two different types (HD1 and HD2). Generally, an HD1
gene is tightly linked to an HD2 gene and transcribed in opposite direction. The HD1 gene
products can form functional heterodimeric transcription factors only with their HD2 partner
from a different allele (Kües and Casselton, 1992). These functional transcription factors
regulate genes involved in the formation of stable heterokaryons. The other mating type locus
contains pheromone precursor (Ph) and pheromone receptor (STE3) genes. As with the HD1
and HD2 genes, these genes are tightly linked and the pheromones can only activate
receptors encoded by a different allele. Activation of the receptor leads to a signal
transduction cascade that together with the dimeric homeodomain transcription factors
regulates the formation of a stable heterokaryon (Kües et al., 2011). There is usually very
little sequence conservation between the different alleles of both mating type loci. This lack of
sequence similarity prevents homologous recombination, which would lead to
self-compatibility (Stankis et al., 1992; Specht et al., 1994). In bipolar mating systems, either the
HD and Ph-STE loci became closely linked, forming effectively one locus (Bakkeren and
Kronstad, 1994; Nieuwenhuis et al., 2013), or one of the two loci lost its function in mating
(Aimi et al., 2005; James et al., 2006).
Since the mating type genes in related species are known, an alternative to the linkage
mapping approach for localising the mating type loci of Termitomyces would be to search for
homologs to the genes from other species in the genome sequence of Termitomyces. To prove
that the homologues found are really part of the mating type locus, segregation of these genes
with mating type would then need to be shown. This strategy has been used successfully in
the identification of the mating type genes in several other basidiomycetes (James et al., 2006;
Idnurm et al., 2008).
In a previous study (Master's thesis of Jens Ringelberg, ‘Adaptations to symbiosis in the fungus
cultivated by fungus-growing termites’), such homologues have already been identified. If both
alleles of these genes could be found, a marker could be developed for these alleles.
Segregation of this marker with the mating type would be a strong indication that the
identified gene is indeed part of the mating type locus. Here, an attempt was made to amplify
two of the HD gene homologues using PCR.
Methods
Template DNA and primer design
Template DNA used in the PCR reactions was isolated from the parent heterokaryon and one
of the homokaryons from the mapping population described in the first part of this report,
using the CTAB method. Primers were designed to amplify a 656 bp fragment from an HD
gene homolog on scaffold 259 (HD1) and a 360 bp fragment from an HD gene homolog on
scaffold 418 (HD2). For HD1 a forward primer with sequence 5’-TGGTATCGTAAGCCTGCCAC-3’
and a reverse primer with sequence 5’-ACCGAGGAAGCAAGATCGTC-3’
were used. For HD2 a forward primer with sequence 5’-TGTTAATGCTGCCACCCGAT-3’
and a reverse primer with sequence 5’-ACCGGCTCATCGGAAATGTT-3’ were used.
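As a quick sanity check on the four primers listed above, the following Python sketch computes their length, GC content and a rough melting temperature. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a rule of thumb and is an assumption of this sketch; the report does not state how melting temperatures were estimated during primer design. All function and dictionary names are illustrative.

```python
# Primer sequences (5'->3') as given in the text.
PRIMERS = {
    "HD1_forward": "TGGTATCGTAAGCCTGCCAC",
    "HD1_reverse": "ACCGAGGAAGCAAGATCGTC",
    "HD2_forward": "TGTTAATGCTGCCACCCGAT",
    "HD2_reverse": "ACCGGCTCATCGGAAATGTT",
}


def gc_fraction(seq):
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule, an assumption here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_summary(primers=PRIMERS):
    """Collect length, GC fraction and rough Tm for every primer."""
    return {
        name: {
            "length": len(seq),
            "gc_fraction": gc_fraction(seq),
            "tm_wallace_c": wallace_tm(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, stats in primer_summary().items():
        print(name, stats)
```

Such a check only confirms that the two primers of each pair are similar in length and base composition; it says nothing about whether they match the template in the strains used here.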
PCR reactions
In a first attempt, PCR reactions were performed on both samples with both primer sets in
25 μL reaction volume, consisting of 5 μL of GoTaq PCR buffer (Promega), 2 μL 25 mM
MgCl2, 1 μL 10 mM dNTPs, 1 μL of each primer, 0.1 μL GoTaq polymerase (Promega), 12.9
μL Milli-Q water, and 2 μL ten-fold diluted template DNA or 2 μL Milli-Q water (negative
control). The PCR cycles were run on a MyCycler Thermal Cycler (Bio-Rad), with an initial
denaturation step of 5 min at 94 °C, followed by 35 PCR cycles (1 min denaturation at 94 °C,
1 min annealing at 55 °C, and 1 min extension at 72 °C), after which a final extension of 10
min at 72 °C was performed. Resulting amplified fragments were examined under UV after
running 3 μL of PCR product for one hour at 80 V on a 1% agarose gel with EtBr.
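The component volumes listed above can be checked to add up to the stated 25 μL, and the final concentrations of the added MgCl2 and dNTPs follow from C1·V1 = C2·V2. The Python sketch below does this bookkeeping; the dictionary keys and function name are illustrative, and any magnesium already present in the PCR buffer is not counted.

```python
import math

TOTAL_VOLUME_UL = 25.0

# Volumes (in microlitres) of one reaction, as listed in the text.
REACTION_UL = {
    "GoTaq PCR buffer": 5.0,
    "MgCl2 (25 mM stock)": 2.0,
    "dNTPs (10 mM stock)": 1.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "GoTaq polymerase": 0.1,
    "Milli-Q water": 12.9,
    "template DNA or water (negative control)": 2.0,
}


def final_concentration(stock_conc, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock_conc * volume_ul / total_ul


if __name__ == "__main__":
    total = sum(REACTION_UL.values())
    # The listed components should fill the reaction volume exactly.
    assert math.isclose(total, TOTAL_VOLUME_UL)
    print("total volume (uL):", round(total, 2))
    print("added MgCl2 (mM):", final_concentration(25.0, 2.0))
    print("dNTPs (mM):", final_concentration(10.0, 1.0))
```

In the later reactions where MgCl2 was replaced by water, the same calculation applies with the MgCl2 volume moved to the water entry, so the total volume is unchanged.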
To improve the amount of product obtained with the HD2 primer set, the PCR was repeated
with 40 cycles instead of 35. All fragments obtained in the first (for HD1) or the second (for
HD2) PCR were isolated by running the entire PCR product for 90 min at 60 V on a 1%
agarose gel, cutting out all bands under a UV lamp and purifying the DNA using a NucleoSpin
Gel and PCR Clean-up kit (MACHEREY-NAGEL), according to the instructions provided by
the manufacturer. Sanger sequencing of purified fragments was performed by Eurofins.
To reduce the amount of non-specific PCR amplification, the initial PCR was repeated once more,
but this time the MgCl2 was replaced by Milli-Q water and the annealing temperature was raised
to 57 °C, to increase the stringency of the PCR. Gel electrophoresis was performed in the
same way as the first time.
Finally, a touchdown PCR was performed, which may help circumvent the problem of
non-specific amplification (Don et al., 1991). This PCR was performed using the same reaction
mixtures as in the first attempt, as well as reaction mixtures in which the MgCl2 was replaced by Milli-Q
water. The PCR program consisted of an initial denaturation step of 5 min at 94 °C, followed
by 10 touchdown cycles (1 min denaturation at 94 °C, 1 min annealing at 65 °C - 1°C/cycle,
and 1 min extension at 72 °C) and 25 normal PCR cycles (1 min denaturation at 94 °C, 1 min
annealing at 55 °C, and 1 min extension at 72 °C), after which a final extension of 10 min at
72 °C was performed. Resulting amplified fragments were separated by gel-electrophoresis in
the same way as before. Products containing only a single amplified fragment were purified
using a NucleoSpin Gel and PCR Clean-up kit (MACHEREY-NAGEL), according to the
instructions provided by the manufacturer. Purified fragments were sent to Eurofins for Sanger
sequencing.
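The touchdown program described above can be written out cycle by cycle. The Python sketch below assumes that the first touchdown cycle anneals at 65 °C and that each following touchdown cycle is 1 °C lower, so the tenth cycle anneals at 56 °C before the 25 cycles at 55 °C begin; the text does not state whether the decrement is applied before or after the first cycle. Function names are illustrative, and the run time ignores temperature ramping.

```python
def touchdown_annealing_schedule(start_c=65.0, step_c=1.0, touchdown_cycles=10,
                                 final_c=55.0, normal_cycles=25):
    """Annealing temperature (degrees C) for every cycle of the program.

    Assumption: the first touchdown cycle runs at start_c itself.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * normal_cycles


def program_minutes(n_cycles, initial_min=5, denature_min=1, anneal_min=1,
                    extend_min=1, final_extension_min=10):
    """Nominal hold time of the whole program in minutes (ramping excluded)."""
    per_cycle = denature_min + anneal_min + extend_min
    return initial_min + n_cycles * per_cycle + final_extension_min


if __name__ == "__main__":
    schedule = touchdown_annealing_schedule()
    for cycle, temp in enumerate(schedule, start=1):
        print(f"cycle {cycle:2d}: anneal at {temp:.0f} C")
    print("nominal run time (min):", program_minutes(len(schedule)))
```

Under this assumption the touchdown program has the same total number of cycles (35) as the first PCR attempt, so differences in product are more likely to reflect the annealing regime than the cycle count.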
Results
The first attempt to amplify the HD gene homologs yielded multiple fragments for both
primer sets (Figure 9A). Since the negative control only showed a single band that was likely
caused by primer dimers, at least some of the fragments must have been the result of
non-specific amplification. Repeating the PCR reactions with the HD2 primer set with five
additional PCR cycles yielded the same fragments in slightly higher concentrations (Figure
9B), allowing for all fragments to be purified from gel. Subsequent sequencing of purified
fragments failed for the middle band of the reaction with homokaryon template DNA and
HD1 primers and both middle bands of the PCR with HD2 primers. Of the successfully
sequenced fragments, none matched the intended product.
A repetition of the PCR under more stringent annealing conditions yielded (apart from primer
dimers and a fragment due to template contamination of the negative control) a single
fragment that was too short to be the intended product for two of the reactions (Figure 10).
Figure 9. PCR products from the first attempts to amplify two HD gene homologs. Primers targeting
two different homeodomain genes (HD1 and HD2) were used in PCR reactions with template DNA
from the parent heterokaryon (HT), or one of the homokaryons (HM), or without template (-). To
obtain more product with the HD2 primers, the PCRs with these primers were repeated with 40 cycles
instead of 35 (B). Sizes in base pairs (bp) of the bands from the 100 bp ladder (M) are indicated on the
left of each panel. The intended products were 656 bp (HD1) and 360 bp (HD2) long.
Figure 10. PCR products from a more stringent reaction than before with primers targeting two HD
gene homologs (HD1 and HD2). PCRs were performed with template DNA from the parent
heterokaryon (HT), or one of the homokaryons (HM), or without template DNA (-). Sizes of the
fragments from the 100 bp ladder (M) in base pairs (bp) are indicated on the left. The intended
products were 656 bp (HD1) and 360 bp (HD2) long.
A touchdown PCR yielded many different fragments for the HD1 primer set in presence of
MgCl2, none of which matched the length of the intended product (Figure 11). Apart from
that, it only yielded a single band in one of the negative controls and a single band where the
HD1 primer set was used on homokaryon template DNA in absence of MgCl2. Sequencing of
the latter fragment revealed that it was not the intended product either.
Figure 11. Products from a touchdown PCR with primers targeting two HD gene homologs (HD1 and
HD2). PCRs were performed both in presence (MgCl2) and absence (No MgCl2) of MgCl2, on DNA
from the parent heterokaryon (HT), on DNA from one of the homokaryons (HM), and in absence of
template DNA (-). Sizes in base pairs (bp) of the bands from the 100 bp ladder (M) are indicated on
the left. The intended products were 656 bp (HD1) and 360 bp (HD2) long.
Discussion
None of the attempts to amplify either of the two HD gene homologs was successful.
There are several possible explanations for this failure.
Firstly, it is possible that the PCR conditions tested here were simply not optimal for the
amplification of the desired fragment. Since there are many different parameters that can be
altered to optimise a PCR reaction, it is not possible to rule out this explanation. Secondly, it
is possible that the primers used here do not anneal efficiently, for instance because the intended target in the
individuals used here does not have the exact same sequence as in the reference genome. In
addition, if there are many sequences in the genome that are somewhat similar to the intended
target, that may explain the many non-specific amplifications that were found. No such
similar sequences were found in the reference genome during primer design, but the reference
genome is still quite fragmented and repetitive sequences tend to be difficult to assemble.
If one of the two HD gene homologs really is involved in mating, it is also possible that this
gene could not be amplified, because the allele from the reference genome was different from
the two alleles of the parent heterokaryon. In this case, the alleles from the parent
heterokaryon may not be recognised by primers designed for the allele in the reference
genome, because sequence conservation at the mating type locus is generally low, even
between different alleles of the same species (Stankis et al., 1992; Specht et al., 1994). If this
is the case, a different strategy for finding the mating type locus may be needed. A possible
strategy may be to look for the mitochondrial intermediate peptidase (MIP) gene, a gene
closely linked to the HD gene mating type locus in many basidiomycetes (James et al., 2004).
If MIP is also linked to the mating type locus in Termitomyces, cosegregation of this gene and
the mating type may be shown and the mating type locus may be identified by looking for HD
gene homologs in close proximity to the MIP gene.
Acknowledgements
I would like to thank Sabine Vreeburg for supervising me, Bertha Koopmanschap and Lennart
van de Peppel for help with the PCRs, and Jens Ringelberg for finding the mating type gene
homologs and coming over to explain his methods and findings.
References
Aimi, T., Yoshida, R., Ishikawa, M., Bao, D., & Kitamoto, Y. (2005). Identification and
linkage mapping of the genes for the putative homeodomain protein (hox1) and the
putative pheromone receptor protein homologue (rcb1) in a bipolar basidiomycete,
Pholiota nameko. Current genetics, 48(3), 184-194.
Bakkeren, G., & Kronstad, J. W. (1994). Linkage of mating-type loci distinguishes bipolar
from tetrapolar mating in basidiomycetous smut fungi. Proceedings of the National
Academy of Sciences, 91(15), 7085-7089.
Don, R. H., Cox, P. T., Wainwright, B. J., Baker, K., & Mattick, J. S. (1991). 'Touchdown'
PCR to circumvent spurious priming during gene amplification. Nucleic acids
research, 19(14), 4008.
Idnurm, A., Walton, F. J., Floyd, A., & Heitman, J. (2008). Identification of the sex genes in
an early diverged fungus. Nature, 451(7175), 193-196.
James, T. Y., Kües, U., Rehner, S. A., & Vilgalys, R. (2004). Evolution of the gene encoding
mitochondrial intermediate peptidase and its cosegregation with the A mating-type
locus of mushroom fungi. Fungal Genetics and Biology, 41(3), 381-390.
James, T. Y., Srivilai, P., Kües, U., & Vilgalys, R. (2006). Evolution of the bipolar mating
system of the mushroom Coprinellus disseminatus from its tetrapolar ancestors
involves loss of mating-type-specific pheromone receptor function. Genetics, 172(3),
1877-1891.
Kües, U., & Casselton, L. A. (1992). Homeodomains and regulation of sexual development in
basidiomycetes. Trends in Genetics, 8(5), 154-155.
Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar,
Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.),
The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin:
Springer.
Nieuwenhuis, B. P. S., Billiard, S., Vuilleumier, S., Petit, E., Hood, M. E., & Giraud, T.
(2013). Evolution of uni- and bifactorial sexual compatibility systems in fungi.
Heredity, 111(6), 445-455.
Specht, C. A., Stankis, M. M., Novotny, C. P., & Ullrich, R. C. (1994). Mapping the
heterogeneous DNA region that determines the nine Aα mating-type specificities of
Schizophyllum commune. Genetics, 137(3), 709-714.
Stankis, M. M., Specht, C. A., Yang, H., Giasson, L., Ullrich, R. C., & Novotny, C. P. (1992).
The Aα mating locus of Schizophyllum commune encodes two dissimilar multiallelic
homeodomain proteins. Proceedings of the National Academy of Sciences, 89(15),
7169-7173.
Appendix
1. Probability calculations
A. Probability of having at least one of all four different mating types in a sample of n
homokaryons
Let A be the event of having at least one representative of the first mating type, B the event of
having at least one of the second, C the event of having at least one of the third, and D the
event of having at least one of the fourth. Then $\bar{A}$ (not A) is the event of not having a single
representative of the first mating type in the sample population. The probability of this event
can be calculated as follows:
$$P(\bar{A}) = \left(\frac{3}{4}\right)^n$$
This probability is the same as $P(\bar{B})$, $P(\bar{C})$, and $P(\bar{D})$. In addition, the probability of $\bar{B}$ given
$\bar{A}$ can be calculated, because, if it is known that the first mating type is not present in the
sample, the probability of drawing any particular one of the other three mating types is one in
three for every draw, so the probability of not drawing the second mating type is two in three
for every draw. Therefore:
$$P(\bar{B} \mid \bar{A}) = \left(\frac{2}{3}\right)^n$$
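Using the definition of a conditional probability, the probability of $\bar{A}$ and $\bar{B}$ can be calculated:
$$P(\bar{A} \cap \bar{B}) = P(\bar{B} \mid \bar{A}) \cdot P(\bar{A}) = \left(\frac{2}{3}\right)^n \cdot \left(\frac{3}{4}\right)^n = \left(\frac{1}{2}\right)^n$$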
The probability of having at least one of all four different mating types can be written as the
probability of A, B, C and D occurring at the same time:
$$P(A \cap B \cap C \cap D)$$
Since “having at least one of each” is the complement of “lacking the first, or the second, or the
third, or the fourth”, this formula can be rewritten as follows:
$$P(A \cap B \cap C \cap D) = P\left(\overline{\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}}\right) = 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D})$$
The probability of the union of $\bar{A}$, $\bar{B}$, $\bar{C}$, and $\bar{D}$ can be found by adding the probabilities of the
individual events, subtracting the probabilities of all pairwise intersections between
the events, adding the probabilities of all triple intersections and subtracting
the probability of the quadruple intersection:
$$
\begin{aligned}
P(A \cap B \cap C \cap D) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}) \\
&= 1 - \big( P(\bar{A}) + P(\bar{B}) + P(\bar{C}) + P(\bar{D}) \\
&\qquad - P(\bar{A} \cap \bar{B}) - P(\bar{A} \cap \bar{C}) - P(\bar{A} \cap \bar{D}) - P(\bar{B} \cap \bar{C}) - P(\bar{B} \cap \bar{D}) - P(\bar{C} \cap \bar{D}) \\
&\qquad + P(\bar{A} \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap \bar{D}) + P(\bar{A} \cap \bar{C} \cap \bar{D}) + P(\bar{B} \cap \bar{C} \cap \bar{D}) \\
&\qquad - P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \big)
\end{aligned}
$$
Since all four mating types are equally likely, this formula can be simplified:
$$
\begin{aligned}
P(A \cap B \cap C \cap D) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}) \\
&= 1 - \big( 4 \cdot P(\bar{A}) - 6 \cdot P(\bar{A} \cap \bar{B}) + 4 \cdot P(\bar{A} \cap \bar{B} \cap \bar{C}) - P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \big)
\end{aligned}
$$
In this formula, $P(\bar{A})$ and $P(\bar{A} \cap \bar{B})$ are already known and $P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) = 0$, because
having none of the four mating types is only possible if the sample size is zero. $P(\bar{A} \cap \bar{B} \cap \bar{C})$
is the same as the probability of getting only the fourth mating type, which can be calculated
as follows:
$$P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\text{only mating type 4}) = \left(\frac{1}{4}\right)^n$$
Substituting these expressions in the overall formula results in an expression for the
probability of having each of the four mating types represented at least once as a function of
the sample size:
$$P(A \cap B \cap C \cap D) = 1 - \left(4 \cdot \left(\frac{3}{4}\right)^n - 6 \cdot \left(\frac{1}{2}\right)^n + 4 \cdot \left(\frac{1}{4}\right)^n\right)$$
Using this formula, it can be calculated that a sample size of at least 16 is needed to have at
least 95% certainty that all four mating types will be represented at least once. For a sample
size of 20 this probability is more than 98%.
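The sample sizes quoted above can be reproduced numerically. The Python sketch below evaluates the closed-form expression with exact fractions and searches for the smallest sample size that reaches a target probability; it assumes, as the derivation does, that the four mating types are equally likely and that homokaryons are sampled independently. The function names are illustrative.

```python
from fractions import Fraction


def p_all_four(n):
    """Probability that all four mating types occur in a sample of n homokaryons."""
    return 1 - (4 * Fraction(3, 4) ** n
                - 6 * Fraction(1, 2) ** n
                + 4 * Fraction(1, 4) ** n)


def min_sample_size(prob_fn, target):
    """Smallest n (starting from 1) for which prob_fn(n) reaches the target."""
    n = 1
    while prob_fn(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    n95 = min_sample_size(p_all_four, Fraction(95, 100))
    print("smallest n with P >= 0.95:", n95)
    print("P at that n:", float(p_all_four(n95)))
    print("P at n = 20:", float(p_all_four(20)))
```

For samples smaller than four the expression evaluates to exactly zero, as it must, which is a convenient check that the signs of the inclusion-exclusion terms are right.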
B. Probability of having at least three out of four different mating types in a sample of n
homokaryons
The probability that we are looking for can be written as follows:
$$P\big((A \cap B \cap C) \cup (A \cap B \cap D) \cup (A \cap C \cap D) \cup (B \cap C \cap D)\big)$$
This can be rewritten by adding the probabilities of the four parts in brackets and subtracting
three times their mutual intersection as can be deduced from the Venn diagram (Figure 12).
Figure 12. Venn diagram showing all possible combinations of four different sets. The union of
events (A ∩ B ∩ C), (A ∩ B ∩ D), (A ∩ C ∩ D), and (B ∩ C ∩ D) is indicated in red.
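Because all four mating types are equally likely, this probability can be written as:
$$4 \cdot P(A \cap B \cap C) - 3 \cdot P(A \cap B \cap C \cap D)$$
$P(A \cap B \cap C \cap D)$ has already been found in appendix 1A and $P(A \cap B \cap C)$ can be
calculated in much the same way:
$$
\begin{aligned}
P(A \cap B \cap C) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C}) \\
&= 1 - \big( P(\bar{A}) + P(\bar{B}) + P(\bar{C}) - P(\bar{A} \cap \bar{B}) - P(\bar{A} \cap \bar{C}) - P(\bar{B} \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap \bar{C}) \big) \\
&= 1 - \big( 3 \cdot P(\bar{A}) - 3 \cdot P(\bar{A} \cap \bar{B}) + P(\bar{A} \cap \bar{B} \cap \bar{C}) \big) \\
&= 1 - \left( 3 \cdot \left(\frac{3}{4}\right)^n - 3 \cdot \left(\frac{1}{2}\right)^n + \left(\frac{1}{4}\right)^n \right)
\end{aligned}
$$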
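Taken together, the result is an expression for the probability of having at least three of the
four mating types represented in a sample of size n:
$$
\begin{aligned}
&P\big((A \cap B \cap C) \cup (A \cap B \cap D) \cup (A \cap C \cap D) \cup (B \cap C \cap D)\big) \\
&\quad = 4 \cdot \left(1 - \left(3 \cdot \left(\frac{3}{4}\right)^n - 3 \cdot \left(\frac{1}{2}\right)^n + \left(\frac{1}{4}\right)^n\right)\right) - 3 \cdot \left(1 - \left(4 \cdot \left(\frac{3}{4}\right)^n - 6 \cdot \left(\frac{1}{2}\right)^n + 4 \cdot \left(\frac{1}{4}\right)^n\right)\right) \\
&\quad = 1 - 6 \cdot \left(\frac{1}{2}\right)^n + 8 \cdot \left(\frac{1}{4}\right)^n
\end{aligned}
$$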
Using this formula, it can be calculated that a sample size of at least 7 is needed to have at
least 95% certainty that at least three of the four mating types will be represented at least
once. If a sample size of 10 is used, this probability is more than 99%.
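The closed-form result for at least three mating types can be cross-checked by brute force. The Python sketch below enumerates every possible sequence of n draws from four equally likely mating types (feasible only for small n, since there are 4^n sequences) and compares the exact fraction with the formula; equal probabilities and independent draws are assumed, as in the derivation, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def p_at_least_three(n):
    """Closed-form probability of seeing at least three of four mating types."""
    return 1 - 6 * Fraction(1, 2) ** n + 8 * Fraction(1, 4) ** n


def p_at_least_k_bruteforce(n, k, n_types=4):
    """Exact probability by enumerating all n_types**n equally likely samples."""
    hits = sum(
        1
        for sample in product(range(n_types), repeat=n)
        if len(set(sample)) >= k
    )
    return Fraction(hits, n_types ** n)


if __name__ == "__main__":
    # Compare formula and enumeration for small sample sizes.
    for n in range(1, 8):
        assert p_at_least_three(n) == p_at_least_k_bruteforce(n, 3)
    n = 1
    while p_at_least_three(n) < Fraction(95, 100):
        n += 1
    print("smallest n with P >= 0.95:", n)
    print("P at n = 10:", float(p_at_least_three(10)))
```

The same enumeration function with k = 4 can be used to check the expression from appendix 1A for small n.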