Genotyping-by-sequencing, linkage mapping, protoplasting, and mating of the Termitomyces symbiont of Macrotermes natalensis

MSc thesis report of: Bas Jacobs
Registration number: 940221384050
Study programme: MSc Biotechnology
Year: 2015/2016
Chair group: Genetics
Supervisor: Sabine Vreeburg
Examiner: Duur Aanen

Table of Contents

Construction of the first genetic map of the Termitomyces symbiont associated with Macrotermes natalensis using a genotyping-by-sequencing approach
  Abstract
  Introduction
  Methods
    Mapping population
    DNA isolation and RNA degradation
    Detection of heterokaryons in the mapping population and choice of samples for GBS
    Concentration measurement and quality control of DNA samples
    Genotyping-by-sequencing
    SNP discovery
    SNP filtering
    Linkage mapping
    Genotyping and mapping of the mating type locus
    Alignment error detection of markers from scaffolds split across linkage groups
    Genome coverage estimation and marker distribution
    Comparison of physical and genetic distance and analysis of the number of recombination events
  Results
    Detection of heterokaryons in the mapping population
    Quality control of DNA samples
    GBS analysis and SNP filtering
    Linkage map
    Mapping of the mating type locus
    Alignment errors
    Genome coverage
    Marker distribution
    Physical and genetic distance
    Recombination events
  Discussion
  Conclusion
  Acknowledgements
  References

A simple, working protoplasting protocol for the Termitomyces symbiont of Macrotermes natalensis
  Abstract
  Introduction
  Methods
    Strain and growth medium
    Growth and harvesting of mycelium for protoplasting
    Protoplast production
    Protoplast regeneration
  Results
  Discussion
  Conclusion
  Acknowledgements
  References

Determining the mating system of the Termitomyces symbiont of Macrotermes natalensis
  Abstract
  Introduction
  Methods
    Set-up of crosses
    Purification of cross products
  Results
  Discussion
  Acknowledgements
  References

The search for homeodomain genes involved in mating in Termitomyces sp.
  Abstract
  Introduction
  Methods
    Template DNA and primer design
    PCR reactions
  Results
  Discussion
  Acknowledgements
  References

Appendix
  1. Probability calculations
    A. Probability of having at least one of all four different mating types in a sample of n homokaryons
    B. Probability of having at least three out of four different mating types in a sample of n homokaryons
Construction of the first genetic map of the Termitomyces symbiont associated with Macrotermes natalensis using a genotyping-by-sequencing approach

Abstract

Genotyping-by-sequencing (GBS) is a cost-effective approach to SNP marker discovery that has been used in many genetic mapping studies in plants, but so far only rarely in fungi. Termitomyces is a genus of basidiomycetes that lives in a mutualistic symbiosis with fungus-growing termites and produces mushrooms that are edible for humans and have high nutritive value. Here, the GBS approach was used to construct the first genetic linkage map of the Termitomyces symbiont of Macrotermes natalensis. The map, based on 88 haploid offspring from a single heterokaryon, consists of 16 linkage groups, containing 586 markers and spanning a total length of 1303 Haldane cM. A preliminary mapping of the mating type locus embedded it firmly in one of the larger linkage groups. Due to the fragmented nature of the reference genome, total genome coverage cannot be guaranteed, but the map does provide an order for many of the scaffolds from the genome assembly. Analysis of the numbers of recombination events indicates that crossover interference may not play a large role in the recombination behaviour of this fungus. In addition, comparison of physical and genetic distances indicates that the recombination landscape of the species may be dominated by hotspots and coldspots. A more complete reference genome assembly will, however, be necessary to strengthen this conclusion. The genetic map will be a useful resource for future genetic and genomic studies on Termitomyces, providing a framework for mapping interesting traits and QTLs.

Introduction

All known species of the fungal genus Termitomyces grow in a remarkable mutualistic symbiosis with fungus-growing termites of the subfamily Macrotermitinae (Aanen et al., 2002).
This mutualistic relation is obligate; neither partner can survive for long without the other (Sands, 1956; De Fine Licht et al., 2005). The symbiosis is thought to have evolved a single time in the African rainforest (Aanen and Eggleton, 2005) and no reversals to a free-living state have thus far been found for either termites or fungi (Aanen et al., 2002).

In addition to the ecological and evolutionary interests in the genus Termitomyces, the edible mushrooms produced by this basidiomycete fungus also make it an attractive organism to study. Not only are these mushrooms a local delicacy (Oso, 1975), they also contain many important nutrients (Botha and Eicker, 1992; Kansci et al., 2003; Malek et al., 2012; Ogundana and Fagade, 1982). Furthermore, consumption of the mushroom may help lower blood cholesterol levels (Nabubuya et al., 2010) and some of its components may have medical applications (Chatterjee et al., 2013). Unfortunately, these mushrooms appear to be quite rare. There are indications that they are only formed when the termite symbiont forms new colonies and needs to obtain spores to reinitiate the symbiosis, which may help explain this rarity (Johnson et al., 1981). Therefore, it would be valuable if the fungus could be cultivated and mushroom formation could be induced in the lab (and later on a commercial scale) without requiring the termite. Unfortunately, laboratory cultivation of the fungus, although possible, is still quite difficult and induction of fruiting bodies in culture has been reported only once in a single species of Termitomyces (De, 1982).

Many studies on Termitomyces have used the symbiont associated with the termite Macrotermes natalensis. Since M. natalensis has only ever been found associated with a single specific lineage of Termitomyces (De Fine Licht et al., 2006; Aanen et al., 2007), it is likely that these studies all involved the same Termitomyces species.
This has been confirmed by pairing between homokaryons isolated from three different heterokaryotic strains associated with M. natalensis (Nobre et al., 2014). A reference genome of this species has recently been published (Poulsen et al., 2014). Further genetic studies of this fungus may provide useful insights into its life cycle, facilitating future attempts to cultivate its mushrooms as well as providing a better understanding of its symbiosis.

Genetic maps are useful tools for such studies, facilitating for example the localisation of genes on the genome. Genetic mapping uses the recombination frequency between genetic markers as a measure of the distance between these markers. If sufficient markers are used, the markers can be assigned to different linkage groups corresponding to the different chromosomes and the relative positions of the markers can be determined, allowing the creation of a genetic map. A genetic map of Termitomyces would be useful for many purposes, such as (1) the ordering of the rather large number of scaffolds and contigs of the current Termitomyces genome assembly (Poulsen et al., 2014), (2) the study of the recombination behaviour of this fungus, (3) the identification of genes and quantitative trait loci (QTLs), and in the future (4) the improvement of the (cultivated) mushrooms by marker-assisted selection (Foulongne-Oriol, 2012).

One locus that can be identified using a genetic map is the mating type locus. Knowledge of the mating system of Termitomyces might help future attempts to induce fruiting bodies, since two homokaryons (the haploid mycelia that arise when sexual spores germinate) can only successfully form a heterokaryon (the mycelium that forms the mushrooms, containing two distinct types of nuclei) when they have compatible mating types.
Preliminary crosses indicate that Termitomyces has a bipolar mating system (unpublished results) and therefore only one mating type locus (for a review on mating systems in basidiomycetes see: Kües et al., 2011). By determining the recombination frequency between the markers on the linkage map and the mating type locus, this locus can be placed on the genetic map. This method has been successfully used to map the mating type locus in several other species of basidiomycetes (Xu et al., 1993; van der Nest et al., 2009; Okuda et al., 2009).

For the construction of a genetic map and the subsequent isolation of genes, markers are required. SNP markers have the advantage of being non-anonymous (i.e. their sequence is known and they can be directly linked to a place on the genome) and codominant (Rowe et al., 2011). Strategies employing reduced genome representation in combination with next-generation sequencing have been shown to generate many SNP markers in a fast and cost-effective way (Davey et al., 2011). The ability to find a large number of markers allows for the construction of high-density linkage maps, which is favourable for the localisation of genes on the genome (Jones et al., 2009).

For constructing genetic linkage maps, the genotyping-by-sequencing (GBS) approach developed by Elshire et al. (2011) has been used in many plant species (Bielenberg et al., 2015; Guajardo et al., 2015; İpek et al., 2016; Ma et al., 2012), although only rarely in fungi. In this method, genomic DNA is digested with a restriction enzyme to generate many different fragments of varying sizes. Two different adapters, a barcoded adapter and a common adapter, are then ligated to the fragments. The use of barcoded adapters allows the samples to be pooled before sequencing, making the method more cost-effective.
After pooling, the smaller fragments are amplified by PCR using primers that bind to the adapters and have an extended region that binds to the oligonucleotides in an Illumina genome sequencer. The ends of the amplified fragments that have two different adapters attached are then sequenced with an Illumina sequencer. In theory, this allows for a representative subset of SNPs to be identified on the sequenced ends (Elshire et al., 2011).

In this study, the first genetic map of the Termitomyces sp. associated with M. natalensis was made using SNP markers discovered by GBS. A preliminary mapping of the mating type locus to this map was performed using mating types derived from phenotypic analysis of crosses performed with a subset of the mapping population. In addition, genetic and physical maps were compared and the results indicate that the recombination landscape of Termitomyces is mainly governed by recombination hotspots and coldspots. Although the completeness of the current map cannot be guaranteed, it does provide an ordering for part of the scaffolds from the very fragmented genome assembly and it should prove to be a useful resource for future gene and QTL mapping studies.

Methods

Mapping population

The population used for linkage mapping consisted of single-spore isolates from a single Termitomyces heterokaryon isolated in South Africa from an M. natalensis colony. Since this species of Termitomyces has been shown to be mostly outbreeding in nature (De Fine Licht et al., 2006), this natural isolate was expected to show sufficient variation for use in genetic mapping.

DNA isolation and RNA degradation

Genomic DNA for use in subsequent steps was isolated from part of the mapping population as well as from the parent heterokaryon using the CTAB method. Any RNA present in the samples after extraction was degraded by adding 3 μL RNase I (Thermo Scientific) and incubating for two hours at 37 °C.
Samples were subsequently incubated for 15 min at 70 °C to inactivate the RNase.

Detection of heterokaryons in the mapping population and choice of samples for GBS

Since some of the single-spore isolates may actually be heterokaryons that originated by fusion of two germinating spores, a marker was developed to test some of the suspect heterokaryons in the mapping population. This marker was developed by PCR amplification of a highly variable part of the nuclear Elongation Factor 1 alpha (EF1α) gene from the parent heterokaryon using primers EF595F and EF1160R described by De Fine Licht et al. (2006). The reaction volume of the PCR was 25 μL, containing 5 μL 5x GoTaq PCR buffer (Promega), 2 μL 25 mM MgCl2, 1 μL 10 mM dNTPs, 1 μL of each primer, 0.1 μL GoTaq polymerase (Promega), 2 μL tenfold diluted template DNA, and 12.9 μL Milli-Q water. The PCR was performed on a MyCycler Thermal Cycler (Bio-Rad), starting with a denaturation step of 5 min at 94 °C, followed by 35 PCR cycles (1 min denaturation at 94 °C, 1 min annealing at 53 °C and 1 min extension at 72 °C), after which a final extension of 10 min at 72 °C was performed. The PCR product was checked under UV, after running 3 μL for 60 min at 70 V on a 1% agarose gel with EtBr, and purified using a Nucleospin gel and PCR clean-up kit (Macherey-Nagel) according to the instructions provided by the manufacturer. Sanger sequencing of the purified product was performed by Eurofins.

A SNP marker was detected in this sequence by looking for a double peak in the chromatogram. This SNP disrupted an NdeI restriction site, allowing for the conversion of the SNP marker into a PCR-RFLP marker. Twelve suspect heterokaryons and 25 putative homokaryons from the mapping population were analysed for heterozygosity using this marker. The parent heterokaryon was included in this analysis as a positive control.
PCRs were performed on these samples as before and PCR products were digested in 10 μL volumes with 5 μL PCR product, 3 μL Milli-Q water, 1 μL digestion buffer and 1 μL NdeI (New England Biolabs). Digestions were incubated for one hour at 37 °C. All digested product was checked by running on a 1% agarose gel with EtBr for one hour at 80 V.

Since roughly half of the suspect heterokaryons showed heterozygosity (which would be expected if all of them were heterokaryons resulting from a mating of sibling homokaryons), these and all other suspect heterokaryons in the mapping population were excluded from the further analyses. From the remaining presumed homokaryons in the mapping population, 92 were chosen at random for analysis by GBS. In addition, the parent heterokaryon was included in three replicates as a control.

Concentration measurement and quality control of DNA samples

DNA concentrations were measured using a Qubit 2.0 fluorometer (Life Technologies) according to the instructions provided by the manufacturer. For GBS, DNA concentrations needed to be between 30 and 100 ng/μL. Therefore, samples with concentrations higher than 100 ng/μL were diluted with Milli-Q water and samples with concentrations lower than 30 ng/μL were concentrated by evaporation in a vacuum.

To test if the quality of the DNA was sufficient for GBS, trial digests were performed on the parent heterokaryon sample and nine randomly chosen samples from the homokaryon population. The digestions were performed using restriction enzyme HindIII in 20 μL volumes containing 10 μL DNA, 7.7 μL Milli-Q water, 2 μL digestion buffer, and 0.3 μL HindIII (Promega). The reactions were incubated for two hours at 37 °C. All 20 μL of the trial digest was subsequently run on a 1% agarose gel with EtBr for three hours at 40 V along with 3 μL of undigested sample and 5 μL of a λ HindIII digest as size standard.
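Diluting an over-concentrated sample into the 30 to 100 ng/μL window follows the usual C1·V1 = C2·V2 relation. The sketch below is only an illustration of that arithmetic: the function name `dilution_volumes` and the example values (a 150 ng/μL stock, a 75 ng/μL target and a 20 μL final volume) are assumptions and are not taken from the protocol used here.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, water volume) for a simple dilution.

    Uses C1 * V1 = C2 * V2. Concentrations share one unit (e.g. ng/uL)
    and volumes share another (e.g. uL). Hypothetical helper, for
    illustration only.
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("dilution cannot increase the concentration")
    stock_volume = target_conc * final_volume / stock_conc
    water_volume = final_volume - stock_volume
    return stock_volume, water_volume


# Example with assumed values: 150 ng/uL stock, 75 ng/uL target, 20 uL total.
stock_ul, water_ul = dilution_volumes(150.0, 75.0, 20.0)
print(f"{stock_ul:.1f} uL sample + {water_ul:.1f} uL water")
```

Samples below the lower limit cannot be handled this way, which is why the text concentrates them by evaporation instead.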
Genotyping-by-sequencing

GBS was performed at the Genomic Diversity Facility of Cornell University according to the protocol described by Elshire et al. (2011). The enzyme used for the restriction step was EcoT22I (a six-base cutter) rather than ApeKI (a five-base cutter with one wobble base), because the less frequent cutting results in fewer different sequenced fragments and therefore higher coverage per sequenced fragment. Although this is expected to decrease the number of SNPs that can be identified, it should increase the probability that a SNP can be scored for all samples, which is favourable for linkage mapping. After adapter ligation, all 95 samples (three replicates of the parent heterokaryon and 92 presumed homokaryons from the mapping population) and one blank sample (no DNA) were pooled and, after the PCR step, sequenced in a single Illumina sequencing lane.

SNP discovery

Processing of raw sequence reads was performed using version 2 of the GBS analysis pipeline introduced by Glaubitz et al. (2014) implemented in TASSEL 5 (Bradbury et al., 2007). The genome assembly reported by Poulsen et al. (2014) was used as reference genome. Default settings were used except for the maximum memory setting, which was increased to 16 GB, and the minimum minor allele frequency setting in the Discovery SNP Caller, which was increased to 0.1. The latter was done because the allele frequency of real SNPs in the mapping population is expected to be around 0.5, making low minor allele frequencies suspicious. The alignment was performed with BWA version 0.7.13 (Li and Durbin, 2009), using default settings. Final SNP calls made by the GBS pipeline were exported in HDF5 format for further filtering in TASSEL.

SNP filtering

Using TASSEL 5, all genotype calls with a read depth lower than five were set to missing, retaining only the more reliable calls. Genotypes were exported in hapmap format and further filtering was performed in R version 3.1.2 (R Core Team, 2014).
Firstly, the genotype calls were converted to the ‘a, h, b, u’ format required for the linkage mapping step. Because the genotype of the individual nuclei of the parent heterokaryon was unknown, assignment of ‘a’ and ‘b’ was arbitrary. If more than two alleles were found for a marker, the marker was discarded if there was more than one occurrence of the least frequent allele. Otherwise, the single conflicting genotype call was set to unknown. Markers were named according to their scaffold and the position on that scaffold in base pairs.

Secondly, all markers with more than 80% missing genotype calls were removed, to get rid of the most poorly scored tags. Then, markers and samples were filtered iteratively based on the percentage of non-missing genotype calls that were heterozygous: first all markers where this was more than 40% were removed, then all samples where it was more than 40% were discarded, and these steps were then repeated with cut-off values of 20%, 10%, 5% and 2%. This was done because heterozygosity should not be present in a population of haploid individuals. Markers that are mostly heterozygous are likely the result of paralogous sequences aligning to the same part of the genome assembly and are therefore not real SNPs. Samples that are largely heterozygous are likely to be heterokaryons, either because they were samples from the parent heterokaryon (i.e. the controls), or because they were the result of a mating between two single-spore isolates. These samples are not part of the intended mapping population and should therefore not be included in the mapping process. At the end of this filtering step, all remaining heterozygous calls were set to unknown.

In the next step, all markers that no longer had two different alleles were removed, after which the density of the genotype matrix was increased by removing markers and samples with more than a certain percentage of missing calls.
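The iterative heterozygosity filter described above can be sketched as follows. This is a minimal illustration and not the R code used for the analysis: the nested-dictionary layout (marker, then sample, then one of the calls ‘a’, ‘b’, ‘h’, ‘u’) and the function names are assumptions made for the example.

```python
def het_fraction(calls):
    """Fraction of non-missing calls ('a', 'b', 'h') that are heterozygous."""
    scored = [c for c in calls if c != "u"]
    if not scored:
        return 0.0
    return sum(c == "h" for c in scored) / len(scored)


def filter_heterozygosity(genotypes, cutoffs=(0.40, 0.20, 0.10, 0.05, 0.02)):
    """Iteratively drop markers, then samples, with too many 'h' calls.

    genotypes: dict mapping marker -> dict mapping sample -> call
    (assumed layout). Every marker is expected to hold the same samples.
    """
    markers = list(genotypes)
    samples = list(genotypes[markers[0]]) if markers else []
    for cutoff in cutoffs:
        # Markers first, judged over the samples that are still present.
        markers = [m for m in markers
                   if het_fraction([genotypes[m][s] for s in samples]) <= cutoff]
        # Then samples, judged over the markers that are still present.
        samples = [s for s in samples
                   if het_fraction([genotypes[m][s] for m in markers]) <= cutoff]
    # Remaining heterozygous calls are set to unknown.
    return {m: {s: ("u" if genotypes[m][s] == "h" else genotypes[m][s])
                for s in samples}
            for m in markers}


example = {
    "m1": {"s1": "a", "s2": "b", "s3": "h", "s4": "a"},
    "m2": {"s1": "h", "s2": "h", "s3": "h", "s4": "u"},  # paralog-like marker
    "m3": {"s1": "b", "s2": "a", "s3": "h", "s4": "b"},
}
filtered = filter_heterozygosity(example)
print(sorted(filtered), sorted(filtered["m1"]))
```

In the toy data, marker `m2` is removed in the first round and sample `s3`, which is heterozygous at both remaining markers, is then discarded as a likely heterokaryon.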
From here, two different datasets were created: a strictly filtered dataset and a mildly filtered one. For the first dataset, markers with more than 50% missing data were removed, followed by samples with more than 50% missing data, and this was repeated with cut-off values of 30% and 10%. For the second dataset, the last step with 10% was omitted. After this, markers showing a significant segregation distortion (according to a chi-squared test) were removed. For the strictly filtered dataset the significance level of this test was set to 0.01 and for the mildly filtered dataset it was set to 0.001. These steps were performed because missing data and distorted markers can have a negative effect on the accuracy of the genetic map.

Finally, SNPs separated by physical distances of less than 64 base pairs (bp) were linked together into groups, and from each group only one SNP (the one with the fewest missing genotypes) was kept as a marker. This was done because markers that close together are effectively identical and probably originated from the same tag. The resulting datasets were written to CSV files.

Linkage mapping

Linkage mapping was performed using JoinMap 4 (Van Ooijen, 2006), using the ‘HAP’ population type. Linkage mapping was performed in two rounds, as this has been shown to reduce the impact of missing genotypes and segregation distortions (Foulongne-Oriol et al., 2010). In the first round, the strictly filtered dataset (few missing genotypes, little segregation distortion) was mapped. Grouping of markers into linkage groups was performed using the independence LOD-score. To determine a reasonable cut-off value for grouping by this statistic, a permutation test was performed using the mildly filtered dataset (the larger of the two datasets). This was done by randomly redistributing the genotype calls over the samples for each marker and computing the grouping tree for this permuted dataset fifty times.
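The permutation step itself can be sketched as below. Only the shuffling is shown; computing the grouping tree was done in the mapping software and is not reproduced here. The data layout (marker mapped to a list of calls in a fixed sample order), the function name and the use of a seed are assumptions made for the example.

```python
import random


def permute_genotypes(genotypes, seed=None):
    """Shuffle the calls of every marker independently over the samples.

    genotypes: dict mapping marker -> list of calls in sample order
    (assumed layout). Allele counts per marker are preserved, but any
    association between markers is broken.
    """
    rng = random.Random(seed)
    permuted = {}
    for marker, calls in genotypes.items():
        shuffled = list(calls)   # copy, so the input stays untouched
        rng.shuffle(shuffled)
        permuted[marker] = shuffled
    return permuted


observed = {
    "scaffold1_100": list("aaaabbbbu"),
    "scaffold1_900": list("aaaabbbbu"),  # perfectly linked to the first
}
# Fifty permuted datasets, one per repetition of the test.
permutations = [permute_genotypes(observed, seed=i) for i in range(50)]
print(len(permutations), "".join(permutations[0]["scaffold1_100"]))
```

Because each marker is shuffled on its own, segregation ratios and missing-data fractions stay exactly as observed, so only the linkage signal is removed.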
After permutation, none of the markers should be linked anymore and therefore all observed linkages are merely the result of chance. For each of the fifty permutations the highest LOD-score at which linkage was still observed was recorded, to obtain an estimate of the highest LOD-score at which spurious linkage can still be expected if no real linkage is present. Since it is highly unlikely that there will be no real linkage between any of the markers, this estimate is probably quite conservative. Since spurious linkage occurred in less than 10% of the cases at a LOD-score of 6, this score was chosen for determining linkage groups. In addition, if this grouping resulted in gaps larger than 40 cM (Haldane, corresponding to a recombination frequency of 0.275), groups were split up again.

Within each of the linkage groups, identically segregating loci were assigned to the group to make them show up on the map and marker orders were calculated using the maximum likelihood (ML) algorithm. Although the ML algorithm can only be used with the Haldane mapping function and tends to produce inflated map lengths, especially in the presence of genotyping errors, it is better at ordering markers correctly than the regression algorithm (Hackett and Broadfoot, 2003).

In the second mapping round, the mildly filtered dataset was mapped using the same strategy. This time, however, any markers that increased the map length by more than 10 cM, or were placed in between two markers from a single, different scaffold, were removed.

Genotyping and mapping of the mating type locus

Preliminary genotyping of the mating type locus was performed for 29 of the homokaryons used in linkage mapping, by analysing the success of crosses based on phenotypic changes at the contact zone. The genotype data obtained in this way were added to the mildly filtered dataset and linkage mapping was performed again as described above in order to place the mating type locus.
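The correspondence between a gap of 40 Haldane cM and a recombination frequency of 0.275, quoted in the linkage mapping section, follows from the Haldane mapping function r = ½(1 − e^(−2d)), with d in Morgans. The short sketch below evaluates it in both directions; the function names are chosen for the example only.

```python
import math


def haldane_cm_to_r(distance_cm):
    """Recombination frequency for a map distance in Haldane cM."""
    d_morgan = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))


def haldane_r_to_cm(r):
    """Map distance in Haldane cM for a recombination frequency r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


r_gap = haldane_cm_to_r(40.0)
print(round(r_gap, 3))                   # 0.275
print(round(haldane_r_to_cm(r_gap), 1))  # 40.0
```

Because the function assumes no crossover interference, distances grow faster than recombination frequencies as r approaches 0.5, which is one reason Haldane map lengths exceed Kosambi lengths for the same data.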
Alignment error detection of markers from scaffolds split across linkage groups

Linkage groups were manually screened for markers from the same scaffold that were not grouped together on the same linkage group. Since individual markers that had positions on the linkage map that were far apart from other markers from the same scaffold could have been erroneously assigned to this scaffold by misalignment, the alignment of these deviant markers was tested. GBS tags from these markers were retrieved from the SAM file created in the alignment step of the GBS pipeline and aligned to the genome using BLASTN (Camacho et al., 2009) with default settings except for the ‘-task’ flag, which was set to ‘blastn’.

Genome coverage estimation and marker distribution

Total coverage of the genome by markers on the linkage map was estimated in two different ways. Firstly, the combined length of all scaffolds represented on the linkage map was calculated. Secondly, the total physical map length represented by the genetic map was estimated from the total genetic length of the map and an estimate of the average physical to genetic distance ratio (kb / cM). This ratio was estimated by adding up all the physical and genetic distances between markers from the same scaffold for all scaffolds represented by at least four markers (markers that were assigned to the wrong scaffold due to misalignment were excluded from this analysis). The distance ratio was then determined by dividing the total added physical distance by the total added genetic distance.

The distribution of the mapped markers over the scaffolds and contigs of the genome assembly was examined by testing for each scaffold whether it was over- or underrepresented. This was done using two-tailed hypergeometric tests for each of the scaffolds and contigs, where the observed markers were considered a sample from a population of all positions where markers could have been found.
The total population size was estimated as two times the number of restriction sites, because, in the GBS procedure, every restriction site gives rise to two ends on which a SNP may or may not be found. The results (at α = 0.05) were compared with the results of Bonferroni and Benjamini-Hochberg multiple testing corrections, which may be overly strict, since the tests are not independent.

Distribution of markers over the linkage groups was tested by comparing the observed number of markers on a linkage group with the expected number under a Poisson distribution (Remington et al., 1999). The expected number of markers for each linkage group was calculated by multiplying the total number of markers by the ratio between the length of the linkage group and the total map length. P-values were calculated as the probability of finding the observed number of markers or a more extreme number and compared to α/2 (α = 0.05), because it is a two-tailed test. The distribution of markers within each linkage group was examined by comparing the marker distribution with a uniform distribution using Q-Q plots and Kolmogorov-Smirnov tests.

Comparison physical and genetic distance and analysis number of recombination events

To examine the relation between physical and genetic distance, genetic positions were plotted against physical positions for all scaffolds from which at least ten markers were represented on the linkage map (markers that were assigned to the wrong scaffold due to misalignment were again excluded from the analysis). Genotypes of individual homokaryons ordered according to their position on the linkage map and coloured according to the parent of origin were visualised in Joinmap. The number of recombination events was determined for each of the homokaryons and each of the linkage maps by counting the number of changes from one parental genotype to the other.
Changes that were immediately reverted at the next scored marker on the linkage group were not counted, because they were likely the result of genotype scoring errors. The average number of recombination events per homokaryon was calculated for each of the linkage groups and compared to the total genetic length of the linkage group in both Haldane and Kosambi cM.

Results

Detection of heterokaryons in the mapping population

The PCR-RFLP assay to test individuals for heterozygosity suffered from some non-specific PCR amplification. An extra band roughly 100 bp larger than the expected full-length fragment of 591 bp could be found when the full-length amplicon was present (Figure 1). In addition, in samples where the intended amplicon was cleaved, this band disappeared and a band about 100 bp longer than the intended largest cleavage product of 417 bp appeared. The extra bands may have been the result of one of the primers annealing non-specifically to a nearby sequence, lengthening the intended product by 100 bp. However, samples that are heterozygous for the marker can still be identified because they have all bands, while the homozygotes have only the top two or the lower two bands. In this way, five of the twelve individuals suspected to be heterokaryons and none of the individuals presumed to be homokaryons were found to be heterozygous. Since only half of the heterokaryons formed by sibling matings are expected to be heterozygous, it is quite likely that most of the other suspect heterokaryons are also heterokaryotic. Therefore, all individuals from the mapping population that were expected to be heterokaryons based on their phenotype were excluded from the further analyses.

Figure 1. Heterozygosity test of a subset of the mapping population. PCR-RFLP analysis was performed on 12 suspect heterokaryons (top left) and 25 presumed homokaryons (bottom), using a marker for which the parent heterokaryon (top right) was heterozygous.
Sizes of the 100 base pair (bp) ladder (lane M) are indicated on the left. The length of the undigested fragment targeted by PCR was 591 bp and digestion products of 417 bp and 173 bp were expected for one of the two alleles.

Quality control of DNA samples

All ten DNA samples that were chosen for the quality test showed clear single bands where the full genomic DNA was loaded on the gel and a smear where the digested genomic DNA was loaded (Figure 2). Therefore, the isolated DNA was expected to be mostly intact, clean and readily digestible, indicating that it was suitable for GBS.

Figure 2. Quality control and trial digestion of several GBS samples. Samples of full genomic DNA (single bands) and genomic DNA digested with HindIII (smears) were analysed by gel electrophoresis for nine homokaryons (HM) and the parent heterokaryon (HT). Sizes in base pairs (bp) of the HindIII digested lambda DNA marker (M) are indicated on the left.

GBS analysis and SNP filtering

The GBS procedure performed on the 92 homokaryons and 3 replicates of the parent heterokaryon yielded on average 2.9 million reads per sample, with a standard deviation of 1.1 million, a minimum of 0.8 million, and a maximum of 5.9 million. Of the blank control sample only 6011 reads were found. The GBS analysis pipeline initially yielded 9835 SNP markers which were subsequently filtered in two different ways. Strict filtering, which was harder on missing data and deviating segregation ratios, yielded a total of 489 high quality SNP markers, while milder filtering yielded 591 markers (for an overview of all filtering steps see Table 1). The greatest loss in the number of markers was observed when filtering against heterozygosity, which is probably the result of tags from different positions aligning to the same place on the reference genome.
In addition, the blank sample, the three replicates of the parent heterokaryon, and four individuals from the mapping population were filtered out in this step, leaving 88 samples for linkage mapping. The removal of the parent heterokaryon control samples in this step indicates that this filter is capable of successfully filtering out heterokaryotic samples. It is, therefore, likely that the four other samples that were removed in this step were heterokaryons that were missed by the initial phenotypic screening of the mapping population.

Table 1. Numbers of markers and individuals that remained after each of the filtering steps and the mapping step for both strictly and mildly filtered datasets. Mild filtering was less strict against missing data and deviations from the expected segregation ratio.

                                                Strict filtering          Mild filtering
Step                                            Markers      Individuals  Markers      Individuals
                                                left         left         left         left
GBS pipeline                                    9835         96           9835         96
Removal of markers with too many alleles        6136         96           6136         96
  and markers with more than 80% missing genotypes
Iterative filtering against heterozygosity      3773         88           3773         88
Removal of all non-polymorphic markers          1432         88           1432         88
Iterative filtering against missing genotypes   969          88           1107         88
Removal of markers with severely distorted      807          88           974          88
  segregation
Removal of markers with close physical          489          88           591          88
  positions
Removal of markers that could not be            487          88           586          88
  reliably placed on the linkage map

Linkage map

The first round of linkage mapping, using only the most reliable SNP markers, yielded fifteen linkage groups. Most of these groups fell apart at a LOD threshold of 6, with the exception of linkage groups 8 and 11 which were grouped together up to LOD 8, but were split apart because this grouping resulted in an interval of approximately 70 cM. Two markers could not be placed on any of the linkage groups and were left out.
In the second mapping round, some of the less reliable markers were included, to obtain the final linkage map (Figure 3). This resulted in an extra linkage group (LG16), formed by new markers that were linked together but not to any of the existing linkage groups. The two markers that were left out in the previous round still could not be placed anywhere on the map. In addition, three of the newly added markers were removed because they inflated the map length by more than 10 cM, or because their placement interrupted two markers from the same scaffold. This resulted in a final linkage map with 586 markers based on 88 haploid offspring (Table 1), covering a total length of 1303 Haldane cM (for some summary statistics see Table 2).

Figure 3. Linkage map of Termitomyces sp. based on 88 homokaryons, consisting of 586 SNP markers. Genetic positions of markers (Haldane cM) are indicated on the left. Marker names indicating scaffold of origin and position on that scaffold in base pairs are indicated on the right. LOD support (up to LOD 10) for various groupings is indicated by curly brackets. The sixteen linkage groups (LG1 – LG16) were numbered arbitrarily, since the chromosomes they belong to are unknown.

Mapping of the mating type locus

The mating type locus was added to the map in a third mapping round, because it was only scored for 29 individuals and because determining the success of matings based on phenotype may not be very reliable. It mapped to LG3 (up to a LOD score of 8) and fit quite well without significantly distorting or lengthening the map (Figure 4).
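To illustrate why scoring only 29 individuals limits the linkage support that can be reached for the mating type locus, the sketch below computes a classical two-point LOD score for haploid progeny. This is an assumption made for illustration: the mapping software may use a different test statistic for grouping, and the function name is hypothetical.

```python
import math


def two_point_lod(n_scored, n_recombinant):
    """Classical two-point LOD for haploid progeny.

    Compares the likelihood at the estimated recombination frequency
    with the likelihood under free recombination (r = 0.5). The linkage
    phase is taken as the one that gives r <= 0.5.
    """
    k = min(n_recombinant, n_scored - n_recombinant)
    r = k / n_scored
    lod = n_scored * math.log10(2.0)
    if k > 0:
        lod += k * math.log10(r)
    lod += (n_scored - k) * math.log10(1.0 - r)
    return lod


# With 29 scored individuals and no recombinants, the score cannot
# exceed 29 * log10(2).
print(round(two_point_lod(29, 0), 2))  # 8.73

# The same situation with all 88 mapped homokaryons scored.
print(round(two_point_lod(88, 0), 2))
```

Under this formula, even perfect co-segregation in 29 individuals gives a LOD score below 9, so support much beyond the reported LOD of 8 could not be expected without scoring more individuals.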
Figure 4. Comparison between maps of linkage group LG3 with (right) and without (left) mating type locus (MAT). Genetic positions of markers (Haldane cM) are indicated on the left. Marker names indicating scaffold of origin and position on that scaffold in base pairs are indicated on the right. Red lines connect identical markers.

Alignment errors

Ten scaffolds were found to have markers that did not all cluster to the same place on the map. In nine cases, all but one of the markers were found clustered at the same location, while a single deviant marker mapped somewhere else. In the remaining case, there were only two markers from scaffold 292 and they both mapped to different locations. Alignment of the GBS tags from the lone markers (one for each of the first nine cases and both for the last one) to the reference genome revealed that all except one of the two markers from scaffold 292 (the one on LG2) had many strong hits on a wide variety of different scaffolds. Therefore, these markers were likely assigned to the wrong scaffold by misalignment during the alignment step of the GBS pipeline and these occurrences do not indicate that any of the linkage groups should be connected.

Genome Coverage

In total 198 scaffolds were represented on the linkage map. Together, these scaffolds make up 67% of the total length of the genome assembly.
Furthermore, 78% of all scaffolds larger than 200 kb and 93% of all scaffolds larger than 500 kb were represented on the linkage map. By combining the information on the physical and genetic position for all scaffolds represented by at least four markers, the average ratio of physical to genetic distance was estimated to be 29.3 kb/cM. Using this estimate and the total length of the linkage map, the total physical distance represented by the map was estimated to be roughly 56% of the total length of the genome assembly.

Marker distribution

Two-tailed hypergeometric tests indicated that 22 scaffolds were significantly overrepresented and three scaffolds (scaffolds 12, 100, and 177) were significantly underrepresented in the dataset used for mapping (α = 0.05). The significantly underrepresented scaffolds together make up 2.5% of the reference genome. After correction for multiple testing by Bonferroni or Benjamini-Hochberg correction, no significant results remained, probably reflecting the low power resulting from the small number of observations compared to the number of tests. Two-tailed Poisson tests of the distribution of markers over the linkage groups indicated that LG3 had significantly fewer markers than expected and LG6, LG10, LG12, and LG14 had significantly more markers than expected (Table 2). Kolmogorov-Smirnov tests revealed that the distribution of markers within the linkage group deviated significantly from the uniform expectation for linkage groups 2, 4, 5, 6, 7, 9, 10, 11, 14, 15, and 16. Q-Q plots revealed that for many linkage groups the genetic position increased in jumps rather than in a continuous fashion, with many markers sharing the same genetic position and relatively large gaps between clusters of markers (Figure 5). This could be the result of a clustering of markers on the physical map, but may also be the result of large differences in the recombination rate across the chromosome.

Table 2.
Some summary statistics of the Termitomyces sp. linkage map. Average interval sizes were calculated as the average of all non-zero distances between markers. The expected number of markers was based on the Poisson expectation (total number of markers multiplied by the ratio between the length of the linkage group and the total map length). P-values were computed from one tail of the Poisson distribution and need to be compared to α/2 for a two-sided test.

Linkage  Observed     Longest        Average marker  Average interval  Number of  Expected number  One-tailed
group    length (cM)  interval (cM)  spacing (cM)    size (cM)         markers    of markers       Poisson p-value
LG1      156          30.3           2.6             7.8               59         70.1             1.0 × 10^-1
LG2      88           24.4           2.4             5.2               37         39.5             3.9 × 10^-1
LG3      197          25.7           3.1             6.8               64         88.4             4.0 × 10^-3
LG4      155          24.5           2.1             6.4               74         69.6             3.1 × 10^-1
LG5      104          30.3           2.7             9.5               39         46.8             1.4 × 10^-1
LG6      61           22.6           1.4             5.6               43         27.6             4.0 × 10^-3
LG7      86           22.6           2.1             12.3              41         38.8             3.8 × 10^-1
LG8      98           15.9           3.0             7.0               32         43.8             3.8 × 10^-2
LG9      70           24.4           2.6             11.7              27         31.5             2.4 × 10^-1
LG10     12           8.6            0.9             2.0               13         5.4              4.1 × 10^-3
LG11     121          22.6           2.0             6.0               60         54.3             2.4 × 10^-1
LG12     15           4.8            0.5             2.6               30         6.9              7.8 × 10^-11
LG13     36           17.5           4.0             9.0               9          16.1             4.1 × 10^-2
LG14     49           14.4           1.4             6.1               35         22.1             6.9 × 10^-3
LG15     17           8.7            1.3             4.3               13         7.7              5.0 × 10^-2
LG16     39           30.3           3.9             7.7               10         17.4             4.1 × 10^-2
Total    1303                                                          586        586
Average  81           20.5           2.3             6.9               37         36.6

Figure 5. Q-Q plots of standardised observed genetic positions of markers (Observed) against their expectations under a uniform distribution (Expected) for all sixteen linkage groups (LG1 – LG16). The straight line indicates the cases where observation and expectation are the same. Stars indicate linkage groups where the marker distribution deviates significantly from a uniform distribution according to a Kolmogorov-Smirnov test (α = 0.05).
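The Poisson test on marker counts per linkage group summarised in Table 2 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure actually run for the thesis; the function names are hypothetical. Expected counts computed from the rounded lengths in the table can differ slightly from the tabulated values, presumably because of that rounding.

```python
import math


def expected_markers(total_markers, group_length_cm, total_length_cm):
    """Poisson expectation: markers proportional to linkage group length."""
    return total_markers * group_length_cm / total_length_cm


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam (log-space)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def one_tailed_poisson_p(observed, expected):
    """Probability of the observed count or a more extreme one.

    Uses the lower tail when observed <= expected, else the upper tail.
    """
    if observed <= expected:
        return sum(poisson_pmf(i, expected) for i in range(observed + 1))
    # Upper tail: terms shrink monotonically once the count exceeds the mean,
    # so summation stops when further terms no longer change the total.
    total, i = 0.0, observed
    while True:
        term = poisson_pmf(i, expected)
        total += term
        if term <= total * 1e-17:
            return total
        i += 1


ALPHA = 0.05
# Observed and expected marker numbers taken from Table 2.
for name, observed, expected in [("LG3", 64, 88.4), ("LG8", 32, 43.8), ("LG12", 30, 6.9)]:
    p = one_tailed_poisson_p(observed, expected)
    verdict = "significant" if p < ALPHA / 2 else "not significant"
    print(name, f"{p:.1e}", verdict)
```

Comparing each one-tailed p-value with α/2 rather than α keeps the overall test two-sided, as described in the table caption.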
Physical and genetic distance

To further examine the possibility of an uneven recombination rate across the chromosomes, genetic distances were plotted against physical distances for all scaffolds from which at least ten markers were represented on the linkage map (Figure 6). These plots show the same pattern of stepwise increase of the genetic distance as the physical distance increases. The magnitude of these steps (up to 30 cM) cannot be explained solely by the limited resolution of the map, which should be between 1 and 2 cM for 88 samples with up to 30% missing genotypes. Therefore, the steps seen in Figure 5 are at least not entirely due to a physical clustering of markers.

Figure 6. Genetic position (cM) as a function of physical position (kb) for all scaffolds from which at least ten markers are represented on the linkage map.

Recombination events

For each linkage group the number of recombination events was counted and the average number of recombination events was estimated from these counts (Table 3). This estimate should reflect the genetic length of the linkage group (in Morgan), but is probably an underestimation, because any double crossovers that may have taken place between two markers will be missed. The observed genetic lengths of the linkage groups were converted from Haldane cM to Kosambi cM for comparison. This revealed that the Haldane length estimates were consistently higher than expected based on the mean number of recombination events, while the Kosambi estimates were consistently lower.

Discussion

In this study, a genotyping-by-sequencing approach was used to discover SNP markers for the construction of the first genetic map of the Termitomyces species associated with M. natalensis. After filtering, 591 SNPs remained for use in linkage mapping, far fewer than reported in previous studies (Guajardo et al., 2015; İpek et al., 2016; Ma et al., 2012).
This result can be partly explained by the use of a less frequently cutting restriction enzyme in GBS library preparation, since this will lead to fewer fragments that can be sequenced. In addition, the filtering performed here was quite strict and more markers may have been obtained by relaxing the filtering conditions. However, this would result in many unreliable markers with much missing data, which would reduce the accuracy of the linkage map (Foulongne-Oriol, 2012). Another possible explanation may be that the natural isolate from which the mapping population was derived did not contain that much genetic variation, due to e.g. inbreeding. This would be unexpected, since this species of Termitomyces has been found to be largely outbreeding (De Fine Licht et al., 2006), but it could still have happened purely by chance.

Table 3. Numbers of recombination events per linkage group, average number of recombination events and observed linkage group lengths in both Haldane and Kosambi cM.

Linkage  Number of recombination events      Mean number of  Observed       Observed
group    per linkage group                   recombination   length         length
         0     1     2     3     4     5     events          (Haldane cM)   (Kosambi cM)
LG1      19    33    28    8     0     0     1.28            156            95
LG2      35    40    12    1     0     0     0.76            88             59
LG3      16    32    26    11    2     1     1.48            197            115
LG4      14    39    30    4     1     0     1.31            155            94
LG5      29    44    14    1     0     0     0.85            104            68
LG6      45    39    4     0     0     0     0.53            61             44
LG7      38    38    11    1     0     0     0.72            86             58
LG8      29    42    16    1     0     0     0.88            98             64
LG9      49    28    11    0     0     0     0.57            70             49
LG10     78    10    0     0     0     0     0.11            12             11
LG11     21    43    20    4     0     0     1.08            121            77
LG12     76    11    1     0     0     0     0.15            15             14
LG13     60    28    0     0     0     0     0.32            36             28
LG14     51    35    2     0     0     0     0.44            49             37
LG15     74    14    0     0     0     0     0.16            17             15
LG16     62    26    0     0     0     0     0.30            39             30
Total                                                        1303           857
Average                                      0.68            81             54

The GBS analysis also yielded many loci containing heterozygous genotype calls. Since heterozygosity should not be present in a population of homokaryons, these calls are probably the result of alignments of paralogous or repetitive sequences to the same place on the reference genome.
The many occurrences of this phenomenon may indicate problems with the alignment to or the assembly of the reference genome. Fortunately, the nature of the mapping population allowed for reliable detection of these events and therefore they should not influence the quality of the genetic map.

The GBS analysis pipeline used here is not the only pipeline that can be used for the analysis of GBS data. Other pipelines, such as the UNEAK pipeline, which does not require a reference genome (Lu et al., 2013), and the recently developed reference-optional GBS-SNP-CROP pipeline (Melo et al., 2016), can also be used. Reanalysing the data using these pipelines may increase the number of SNPs identified, since the overlap in the SNPs discovered by each of these pipelines tends to be small (Melo et al., 2016).

Using the markers identified by GBS, a linkage map was constructed. The map was based on 88 haploid progeny and consisted of 586 markers. The map currently consists of sixteen linkage groups, which seems to correspond reasonably well with haploid chromosome numbers reported for other basidiomycetes, such as Agaricus bisporus (13; Royer et al., 1992) and Schizophyllum commune (11; Carmi et al., 1978). Unfortunately, the haploid chromosome numbers of all Termitomyces species are still unknown, and therefore cannot be used for comparison.

The current groupings at LOD thresholds of 6 and higher appear to be quite reliable, since many cases can be found where linkage across the larger intervals is backed up by information from the scaffold on which the marker was found. In addition, all cases in which a single marker did not map near the other markers from the scaffold that it was thought to be part of could be attributed to errors in the alignment step of the GBS pipeline. Some of the current linkage groups may, however, still belong together.
For example, linkage groups 8 and 11 were split apart here, because of the large interval between the two groups, even though they were still grouped together at a LOD threshold of 8. In addition, some of the current linkage groups are quite small and may well be incomplete, or belong to other linkage groups. New rounds of GBS to add more individuals or find additional markers may provide additional evidence for some of these groups to be linked together. Moreover, the constituent homokaryons of the parent heterokaryon could be included in a new round of GBS, which would improve the accuracy of the map, because the parental linkage phases would no longer need to be estimated from the data. Since the parent heterokaryon was a natural isolate, its constituent homokaryons are not available, but they may be recreated by protoplasting.

A preliminary mapping of the mating type locus to the linkage map indicates with reasonable certainty (up to a LOD threshold of 8) that this locus belongs on linkage group 3. The exact position of the locus on this linkage group may, however, be unreliable, because only 29 of the 88 individuals used in mapping were genotyped for this trait. Furthermore, genotyping was based on the phenotypic examination of the product of crosses, which may be error-prone. Therefore, further studies are needed, in which all individuals used in mapping are genotyped in a more reliable way, e.g. using molecular markers that differ between the two crossed individuals to detect the successful formation of a heterokaryon.

Due to the fragmented nature of the current reference genome (consisting of hundreds of scaffolds and thousands of contigs), it is impossible to be certain whether the linkage map covers the entire genome. Estimates based on the lengths of the scaffolds represented on the map and the average genetic to physical distance ratio indicate that roughly half the genome is represented by the linkage map.
This might indicate that part of the genome is not represented, possibly due to a lack of heterozygosity of the parent heterokaryon at certain regions of the genome. However, since most scaffolds are quite small and most of the larger scaffolds were represented on the linkage map, many scaffolds that were not represented on the map may actually fall in between scaffolds that were. In addition, the estimate of the genetic to physical distance ratio is based on many small pieces of the linkage map and may be quite inaccurate, especially if the recombination rate is not constant across the genome. Also, there do not appear to be many scaffolds that have significantly fewer markers than expected by chance. For these reasons, there is no conclusive evidence that large parts of the genome are missing from the linkage map, although it cannot be ruled out. Future studies aimed at improving the reference genome assembly may help resolve this issue. Such efforts may be informed by the current linkage map, which already provides an order for a large number of scaffolds, illustrating how genetic and physical mapping approaches can complement each other.

Markers appear to be distributed fairly evenly across the linkage groups, with a few exceptions. Distribution of markers within the linkage groups is, however, mostly not uniform. Instead, many linkage groups contain clusters of many markers at the same genetic position, separated by relatively large intervals without any markers. This could be explained by a physical clustering of markers, or by large differences in the recombination rate along the chromosome. Plots of the genetic distance as a function of the physical distance (for the scaffolds where this is possible) indicate that the presence of strong recombination hotspots and coldspots is at least part of the answer.
The fact that these ‘jumps’ in genetic distance along the physical chromosome are visible at all of the examined positions may indicate that the recombination landscape of Termitomyces is largely governed by these hotspots and coldspots. Further evidence for this hypothesis would require a more complete, less fragmented reference genome assembly, which would allow comparison of physical and genetic distance over larger regions of the chromosomes.

The genetic lengths of the linkage groups estimated from the average number of recombination events were systematically lower than the estimates of the mapping software in Haldane cM and systematically higher than those estimates in Kosambi cM. Since using the number of recombination events likely underestimates the genetic lengths (double crossovers between adjacent markers are missed) and the estimates from the maximum likelihood algorithm tend to be inflated (Hackett and Broadfoot, 2003), the Haldane mapping function probably produces more accurate estimates of the true lengths than the Kosambi mapping function. This indicates that crossover interference is at least less strong than assumed by the Kosambi function and may even be completely absent.

In addition to providing an order for many of the scaffolds from the reference genome and offering insight into the recombination behaviour of the species, the linkage map will be a useful tool for future genetic analyses. Although it is possible that it is not yet complete, it has already proven useful in narrowing down the location of the mating type locus. Other interesting genes and QTLs segregating in the mapping population may be mapped in similar ways, potentially allowing the discovery of genes involved in the symbiosis with M. natalensis as well as the formation of mushrooms.
Such mapping studies have already been successfully performed for several commercially important traits such as yield (Foulongne-Oriol et al., 2012), bruising sensitivity (Gao et al., 2015), and disease resistance (Moquet et al., 1999) in the cultivated white button mushroom (Agaricus bisporus). When cultivation of Termitomyces mushrooms becomes feasible, the map may also become a useful tool in breeding, for example through marker assisted selection. Future efforts should focus on improving the reference genome assembly and determining the haploid chromosome number to help find the physical location and exact sequence of mapped genes, as well as gain additional insights into the recombination landscape of Termitomyces.

Conclusion

Here, the first genetic map of the Termitomyces species associated with M. natalensis is presented. The map, based on 88 haploid progeny of a single heterokaryon, consists of 586 SNP markers discovered by GBS, indicating that GBS is a cost-effective way of marker discovery for linkage mapping not only in plants, but also in fungi. The map was used to narrow down the location of the mating type locus and will be a useful tool for the identification of other loci. In addition, it provides indications that the recombination landscape of Termitomyces is dominated by hotspots and coldspots and that crossover interference plays a relatively small role. Future efforts to improve the assembly of the reference genome will be necessary to confirm these indications.

Acknowledgements

I would like to thank Sabine Vreeburg for supervising me, isolating the mapping population, and performing the preliminary genotyping of the mating type locus.
In addition, I would like to thank Bertha Koopmanschap and Marijke Slakhorst for help in the lab, Lennart van de Peppel for help with the PCR, Alex Grum Grzhimaylo for help with the RNase, Bart Pannebakker for help with the Linux computer, Erik Wijnker for useful discussions about linkage mapping, and Duur Aanen for useful discussions and comments on the report. Also, I would like to thank the people from the Genomic Diversity Facility at Cornell University for performing the GBS analysis and useful discussions on the sample preparation.

References

Aanen, D. K., & Eggleton, P. (2005). Fungus-growing termites originated in African rain forest. Current biology, 15(9), 851-855.
Aanen, D. K., Eggleton, P., Rouland-Lefevre, C., Guldberg-Frøslev, T., Rosendahl, S., & Boomsma, J. J. (2002). The evolution of fungus-growing termites and their mutualistic fungal symbionts. Proceedings of the National Academy of Sciences, 99(23), 14887-14892.
Aanen, D. K., Ros, V. I., de Fine Licht, H. H., Mitchell, J., De Beer, Z. W., Slippers, B., ... & Boomsma, J. J. (2007). Patterns of interaction specificity of fungus-growing termites and Termitomyces symbionts in South Africa. BMC evolutionary biology, 7(115).
Bielenberg, D. G., Rauh, B., Fan, S., Gasic, K., Abbott, A. G., Reighard, G. L., ... & Wells, C. E. (2015). Genotyping by Sequencing for SNP-Based Linkage Map Construction and QTL Analysis of Chilling Requirement and Bloom Date in Peach [Prunus persica (L.) Batsch]. PloS one, 10(10), e0139406.
Botha, W. J., & Eicker, A. (1992). Nutritional value of Termitomyces mycelial protein and growth of mycelium on natural substrates. Mycological research, 96(5), 350-354.
Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., & Buckler, E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics, 23(19), 2633-2635.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009).
BLAST+: architecture and applications. BMC bioinformatics, 10(421).
Carmi, P., Holm, P. B., Koltin, Y., Rasmussen, S. W., Sage, J., & Zickler, D. (1978). The pachytene karyotype of Schizophyllum commune analyzed by three dimensional reconstruction of synaptonemal complexes. Carlsberg Research Communications, 43(2), 117-132.
Chatterjee, A., Khatua, S., Chatterjee, S., Mukherjee, S., Mukherjee, A., Paloi, S., ... & Bandyopadhyay, S. K. (2013). Polysaccharide-rich fraction of Termitomyces eurhizus accelerate healing of indomethacin induced gastric ulcer in mice. Glycoconjugate journal, 30(8), 759-768.
Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., & Blaxter, M. L. (2011). Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 12(7), 499-510.
De, A. B. (1983). Basidiocarp production by Termitomyces microcarpus (Berk. and Br.) Heim in culture. Current Science, 52(10), 494-495.
De Fine Licht, H. H., Andersen, A., & Aanen, D. K. (2005). Termitomyces sp. associated with the termite Macrotermes natalensis has a heterothallic mating system and multinucleate cells. Mycological research, 109(3), 314-318.
De Fine Licht, H. H., Boomsma, J. J., & Aanen, D. K. (2006). Presumptive horizontal symbiont transmission in the fungus-growing termite Macrotermes natalensis. Molecular ecology, 15(11), 3131-3138.
Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., & Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS one, 6(5), e19379.
Foulongne-Oriol, M. (2012). Genetic linkage mapping in fungi: current state, applications, and future trends. Applied microbiology and biotechnology, 95(4), 891-904.
Foulongne-Oriol, M., Rodier, A., Rousseau, T., & Savoie, J. M. (2012). Quantitative Trait Locus Mapping of Yield-Related Components and Oligogenic Control of the Cap Color of the Button Mushroom, Agaricus bisporus.
Applied and environmental microbiology, 78(7), 2422-2434.
Foulongne-Oriol, M., Spataro, C., Cathalot, V., Monllor, S., & Savoie, J. M. (2010). An expanded genetic linkage map of an intervarietal Agaricus bisporus var. bisporus × A. bisporus var. burnettii hybrid based on AFLP, SSR and CAPS markers sheds light on the recombination behaviour of the species. Fungal Genetics and Biology, 47(3), 226-236.
Gao, W., Weijn, A., Baars, J. J., Mes, J. J., Visser, R. G., & Sonnenberg, A. S. (2015). Quantitative trait locus mapping for bruising sensitivity and cap color of Agaricus bisporus (button mushrooms). Fungal Genetics and Biology, 77, 69-81.
Glaubitz, J. C., Casstevens, T. M., Lu, F., Harriman, J., Elshire, R. J., Sun, Q., & Buckler, E. S. (2014). TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One, 9(2), e90346.
Guajardo, V., Solís, S., Sagredo, B., Gainza, F., Muñoz, C., Gasic, K., & Hinrichsen, P. (2015). Construction of high density sweet cherry (Prunus avium L.) linkage maps using microsatellite markers and SNPs detected by genotyping-by-sequencing (GBS). PloS one, 10(5), e0127750.
Hackett, C. A., & Broadfoot, L. B. (2003). Effects of genotyping errors, missing values and segregation distortion in molecular marker data on the construction of linkage maps. Heredity, 90(1), 33-38.
İpek, A., Yılmaz, K., Sıkıcı, P., Tangu, N. A., Öz, A. T., Bayraktar, M., ... & Gülen, H. (2016). SNP Discovery by GBS in Olive and the Construction of a High-Density Genetic Linkage Map. Biochemical genetics, 54(3), 313-325.
Johnson, R. A., Thomas, R. J., Wood, T. G., & Swift, M. J. (1981). The inoculation of the fungus comb in newly founded colonies of some species of the Macrotermitinae (Isoptera) from Nigeria. Journal of Natural History, 15(5), 751-756.
Jones, N., Ougham, H., Thomas, H., & Pašakinskienė, I. (2009). Markers and mapping revisited: finding your gene. New Phytologist, 183(4), 935-966.
Kansci, G., Mossebo, D. C., Selatsa, A. B., & Fotso, M. (2003).
Nutrient content of some mushroom species of the genus Termitomyces consumed in Cameroon. Food/Nahrung, 47(3), 213-216.
Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar, Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.), The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin: Springer.
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754-1760.
Lu, F., Lipka, A. E., Glaubitz, J., Elshire, R., Cherney, J. H., Casler, M. D., ... & Costich, D. E. (2013). Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genetics, 9(1), e1003215.
Ma, X. F., Jensen, E., Alexandrov, N., Troukhan, M., Zhang, L., Thomas-Jones, S., ... & Flavell, R. (2012). High resolution genetic mapping by genome sequencing reveals genome duplication and tetraploid genetic structure of the diploid Miscanthus sinensis. PloS one, 7(3), e33821.
Malek, S. N. A., Kanagasabapathy, G., Sabaratnam, V., Abdullah, N., & Yaacob, H. (2012). Lipid components of a Malaysian edible mushroom, Termitomyces heimii Natarajan. International Journal of Food Properties, 15(4), 809-814.
Melo, A. T., Bartaula, R., & Hale, I. (2016). GBS-SNP-CROP: a reference-optional pipeline for SNP discovery and plant germplasm characterization using variable length, paired-end genotyping-by-sequencing data. BMC bioinformatics, 17(29).
Moquet, F., Desmerger, C., Mamoun, M., Ramos-Guedes-Lafargue, M., & Olivier, J. M. (1999). A quantitative trait locus of Agaricus bisporus resistance to Pseudomonas tolaasii is closely linked to natural cap color. Fungal Genetics and Biology, 28(1), 34-42.
Nabubuya, A., Muyonga, J. H., & Kabasa, J. D. (2010). Nutritional and hypocholesterolemic properties of Termitomyces microcarpus mushrooms.
African Journal of Food, Agriculture, Nutrition and Development, 10(3), 2235-2257.
Nobre, T., Koopmanschap, B., Baars, J. J., Sonnenberg, A. S., & Aanen, D. K. (2014). The scope for nuclear selection within Termitomyces fungi associated with fungus-growing termites is limited. BMC evolutionary biology, 14(121).
Ogundana, S. K., & Fagade, O. E. (1982). Nutritive value of some Nigerian edible mushrooms. Food chemistry, 8(4), 263-268.
Okuda, Y., Murakami, S., & Matsumoto, T. (2009). A genetic linkage map of Pleurotus pulmonarius based on AFLP markers, and localization of the gene region for the sporeless mutation. Genome, 52(5), 438-446.
Oso, B. A. (1975). Mushrooms and the Yoruba people of Nigeria. Mycologia, 67(2), 311-319.
Poulsen, M., Hu, H., Li, C., Chen, Z., Xu, L., Otani, S., ... & Zhang, G. (2014). Complementary symbiont contributions to plant decomposition in a fungus-farming termite. Proceedings of the National Academy of Sciences, 111(40), 14500-14505.
R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Remington, D. L., Whetten, R. W., Liu, B. H., & O’Malley, D. M. (1999). Construction of an AFLP genetic map with nearly complete genome coverage in Pinus taeda. Theoretical and Applied Genetics, 98(8), 1279-1292.
Rowe, H. C., Renaut, S., & Guggisberg, A. (2011). RAD in the realm of next-generation sequencing technologies. Molecular Ecology, 20(17), 3499-3502.
Royer, J. C., Hintz, W. E., Kerrigan, R. W., & Horgen, P. A. (1992). Electrophoretic karyotype analysis of the button mushroom, Agaricus bisporus. Genome, 35(4), 694-698.
Sands, W. A. (1956). Some factors affecting the survival of Odontotermes badius. Insectes sociaux, 3(4), 531-536.
Van der Nest, M. A., Slippers, B., Steenkamp, E. T., De Vos, L., Van Zyl, K., Stenlid, J., ... & Wingfield, B. D. (2009).
Genetic linkage map for Amylostereum areolatum reveals an association between vegetative growth and sexual and self-recognition. Fungal Genetics and Biology, 46(9), 632-641.
Van Ooijen, J. W. (2016). JoinMap® 4, Software for the calculation of genetic linkage maps in experimental populations. Kyazma B.V., Wageningen, Netherlands.
Xu, J., Kerrigan, R. W., Horgen, P. A., & Anderson, J. B. (1993). Localization of the mating type gene in Agaricus bisporus. Applied and environmental microbiology, 59(9), 3044-3049.

A simple, working protoplasting protocol for the Termitomyces symbiont of Macrotermes natalensis

Abstract

Including the constituent homokaryons of the parent heterokaryon from the linkage mapping study described in the previous chapter in a new mapping study may help improve the accuracy of the genetic map. Obtaining these constituent homokaryons requires the ability to make protoplasts of the parent heterokaryon, but the only efficient protoplasting protocol for Termitomyces is quite complicated. Here, a simple, working protoplasting protocol for the Termitomyces symbiont of Macrotermes natalensis is presented, yielding up to 5∙10⁶ protoplasts/mL. Protoplasts could be regenerated on plates with sucrose as osmotic stabiliser, but not on plates with KCl. Homokaryons among regenerated protoplasts may be identified and used in future linkage mapping studies. For other purposes, such as transformation, the current protocol will need to be optimised to improve its efficiency.

Introduction

To improve the accuracy of the genetic map described in the first part of this report, the constituent homokaryons of the parent heterokaryon of the mapping population may be included in a future round of GBS and linkage mapping. This would allow the direct determination of the parental genotypes and therefore the parental linkage phases would no longer need to be estimated from the data on the recombined progeny.
Unfortunately, the parent heterokaryon was a natural isolate and not the result of an artificial cross between two homokaryons. Therefore, its constituent homokaryons are not available. However, the parent heterokaryon still contains two separate types of nuclei, each with the genome of one of its two homokaryotic parents. Degrading the cell wall of this heterokaryon to liberate protoplasts (which may by chance occasionally contain only one of the two different types of nuclei) therefore allows for the regeneration of the constituent homokaryons.

Efficient protoplasting protocols for filamentous fungi have already been described for many organisms, including the ascomycetes Aspergillus niger (Arentshorst et al., 2012) and Cochliobolus heterostrophus (Turgeon et al., 2010), as well as the basidiomycetes Agaricus bisporus and Agaricus bitorquis (Sonnenberg et al., 1988). A protoplasting protocol has also been published for Termitomyces clypeatus (Mukherjee and Sengupta, 1988), but this protocol is quite complicated and impractical.

Here, a practical, working protoplasting protocol for the Termitomyces symbiont of Macrotermes natalensis is presented. In this protocol, young mycelium from either liquid or solid medium is used as starting material. Using material from a solid culture yielded the highest concentration of protoplasts. The current protocol will be useful for the recreation of the constituent homokaryons of the parent heterokaryon from the mapping population. Further optimisation of the protocol will be needed to make the procedure efficient enough for e.g. transformations.

Methods

Strain and growth medium

The strain used for protoplasting was the parent heterokaryon from which the mapping population for the construction of the genetic linkage map was obtained (see the methods from the first part of this report). The liquid growth medium was malt yeast extract (MY) medium, consisting of 20 g/L malt extract and 2 g/L yeast extract.
Solid growth medium (malt yeast extract agar; MYA) consisted of 20 g/L malt extract, 2 g/L yeast extract and 15 g/L agar.

Growth and harvesting of mycelium for protoplasting

One gram of mycelium (wet weight) from an old liquid culture was crushed in 1 mL saline solution (8 g/L NaCl) to obtain a homogeneous suspension. For growth in liquid culture, 1 mL of this suspension was added to 100 mL MY medium in a sterile 500 mL Erlenmeyer flask. For growth on solid medium, 300 μL of the saline suspension was spread on an MYA plate covered with a 76 mm polycarbonate membrane with a pore size of 0.1 μm (Profiltra, catalog number K01CP07600) to prevent the mycelium from growing into the agar. Erlenmeyer flasks were incubated for one week at 25 °C and 100 rpm and plates for two days at 25 °C. Young mycelium from liquid cultures was harvested using a sterile Büchner funnel with a sterile nylon filter, washed with 0.6 M sucrose and scraped into a pre-weighed Petri dish for weighing. Young mycelium from plates was scraped directly from the membrane into a pre-weighed Petri dish. Harvested material was weighed, and 1.2 g of material from the liquid culture and 0.75 g of material from the solid culture were used for protoplasting.

Protoplast production

The harvested mycelium was added to 10 mL of protoplasting solution (0.6 M sucrose and 20 g/L Novozym 234 (Novo Nordisk), filter sterilised) in a 50 mL tube. The protoplasting mixture was incubated for 2-3 hours at 30 °C and 80 rpm, shaking horizontally. The resulting mixture with protoplasts was filtered over a glass wool plug in a funnel pre-rinsed with 0.6 M sucrose into a fresh tube, after which the filter was rinsed again with 0.6 M sucrose. The protoplasts in the filtrate were collected by centrifugation (10 min, 2000 × g, 10 °C). The supernatant was discarded, the protoplasts were resuspended in 5 mL 0.6 M sucrose and collected again by centrifugation (5 min, 3000 × g, room temperature).
Again, the supernatant was discarded and the protoplasts were resuspended in 5 mL 0.6 M sucrose, after which their concentration was determined using a Neubauer haemocytometer (Brand GmbH + Co KG). The mixture was then again centrifuged (5 min, 3000 × g, room temperature), the supernatant was discarded and the protoplasts were resuspended in 0.6 M sucrose to a concentration of approximately 5∙10⁶ protoplasts/mL.

Protoplast regeneration

Dilutions of factor 10 and 100 were made from the protoplast suspension derived from the liquid culture and dilutions of factor 10, 100, and 1000 were made from the protoplast suspension derived from the solid culture. Of each dilution, 100 μL was spread on each of two different types of regeneration plates: MYA with 0.6 M sucrose and MYA with 0.5 M KCl. Regeneration plates were incubated for 6 days at 25 °C. From the plates where regeneration had been successful after the incubation period, individual outgrowths from regenerated protoplasts were transferred to fresh MYA plates, which were incubated at 25 °C.

Results

The mycelium from the liquid culture yielded 500 μL of 5∙10⁶ protoplasts/mL, while the mycelium from solid culture yielded 5 mL of 5∙10⁶ protoplasts/mL. Protoplast regeneration was observed on all plates with sucrose as osmotic stabiliser, but on none of the plates where KCl was used as osmotic stabiliser.

Discussion

In spite of the fact that less mycelium from the solid culture than from the liquid culture was used in protoplasting, the mycelium from the solid culture yielded roughly ten times as many protoplasts. A reason for this may be that the solid culture grew much faster, allowing for its use after only two days, as compared to one week for the liquid culture. Therefore, the mycelium from the solid culture was much younger at the time of protoplasting. Young mycelium tends to yield more protoplasts, because there has been less time for the fungus to form a thick cell wall (Turgeon et al., 2010).
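The comparison of the two yields can be made explicit with a short calculation. The sketch below only uses the volumes, the concentration, and the wet weights reported above; the function and variable names are illustrative and not part of the protocol.

```python
# Reported values: final suspension volume (mL) and wet weight of mycelium (g).
# Both suspensions were adjusted to the same concentration.
CONCENTRATION = 5e6  # protoplasts/mL, as reported in the Results

cultures = {
    "liquid": {"volume_ml": 0.5, "mycelium_g": 1.2},
    "solid": {"volume_ml": 5.0, "mycelium_g": 0.75},
}


def total_protoplasts(volume_ml, concentration=CONCENTRATION):
    """Total number of protoplasts in a suspension."""
    return volume_ml * concentration


def yield_per_gram(volume_ml, mycelium_g, concentration=CONCENTRATION):
    """Protoplasts obtained per gram of mycelium (wet weight)."""
    return total_protoplasts(volume_ml, concentration) / mycelium_g


totals = {name: total_protoplasts(c["volume_ml"]) for name, c in cultures.items()}
per_gram = {
    name: yield_per_gram(c["volume_ml"], c["mycelium_g"])
    for name, c in cultures.items()
}

# Solid versus liquid culture, in total and per gram of starting material
fold_total = totals["solid"] / totals["liquid"]
fold_per_gram = per_gram["solid"] / per_gram["liquid"]

print(f"total: {fold_total:.0f}-fold, per gram: {fold_per_gram:.0f}-fold")
```

In total numbers the solid culture gave ten times as many protoplasts; per gram of mycelium the difference is about sixteen-fold, because less starting material was used.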
Protoplasts failed to regenerate on any of the plates using 0.5 M KCl as osmotic stabiliser. This is surprising, since an efficient protoplasting protocol has been published for Termitomyces clypeatus, in which the same concentration of KCl is used as osmotic stabiliser in the regeneration plates (Mukherjee and Sengupta, 1988). The fact that KCl was combined with MYA here and with a different medium in the previous study may be the cause of the different outcome.

The regenerated protoplasts that were picked up from the regeneration plates with 0.6 M sucrose can be tested for homozygosity using the PCR-RFLP marker developed in the linkage mapping study from the previous part of this report. This way, with a bit of luck, the two different constituent homokaryons may be found in the population of regenerated protoplasts and used in a future round of GBS to improve the genetic map.

The current protocol provides a relatively simple, working method for the generation of protoplasts. For some purposes, such as protoplast transformation, which requires about 10⁸ protoplasts/mL (Turgeon et al., 2010), the current method is, however, not yet efficient enough. Further optimisation of the protocol will be needed to improve its efficiency. Possibilities for optimisation include adapting the amount as well as the age of the mycelium to be treated with lytic enzymes. Using more and younger mycelium may result in the release of more protoplasts. In addition, other lytic enzymes could be added to the protoplasting solution. Mukherjee and Sengupta (1988) showed that a combination of cellulase, chitinase, and Novozym 234 was far more effective than Novozym 234 by itself. Therefore, adding cellulase and chitinase may also improve the protoplasting efficiency. However, the high concentrations of these enzymes that were used in the previous study may make their use more expensive than can be justified by the increase in efficiency.
Conclusion

Here, a simple, working protoplasting protocol for the Termitomyces symbiont of M. natalensis was presented. The protocol is most efficient with relatively young mycelium grown on solid medium. The current efficiency of the protocol may be enough for the isolation of the constituent homokaryons of a heterokaryon, but for other purposes, such as transformations, it still needs to be optimised.

Acknowledgements

I would like to thank Sabine Vreeburg for supervising me, Marijke Slakhorst for assistance in the lab, and Linda van Oosten for continuing with the identification of homokaryons among the regenerated protoplasts.

References

Arentshorst, M., Ram, A. F. J., & Meyer, V. (2012). Using non-homologous end-joining-deficient strains for functional gene analyses in filamentous fungi. In M. D. Bolton & B. P. H. J. Thomma (Eds.), Plant fungal pathogens: methods and protocols, methods in molecular biology (pp. 133-150). New York: Springer Science + Business Media LLC.
Mukherjee, M., & Sengupta, S. (1988). Isolation and regeneration of protoplasts from Termitomyces clypeatus. Canadian journal of microbiology, 34(12), 1330-1332.
Sonnenberg, A. S., Wessels, J. G., & van Griensven, L. J. (1988). An efficient protoplasting/regeneration system for Agaricus bisporus and Agaricus bitorquis. Current Microbiology, 17(5), 285-291.
Turgeon, B. G., Condon, B., Liu, J., & Zhang, N. (2010). Protoplast transformation of filamentous fungi. In A. Sharon (Ed.), Molecular and Cell Biology Methods for Fungi (pp. 3-19). Totowa: Humana Press.

Determining the mating system of the Termitomyces symbiont of Macrotermes natalensis

Abstract

To reliably determine the mating system of the Termitomyces symbiont of Macrotermes natalensis and develop tester strains to determine the mating type of any homokaryon from this species, a series of crosses was set up.
The products of these crosses were purified and are now ready for DNA isolation and further analysis using molecular markers to determine which crosses were successful.

Introduction

To map the mating type genes of the Termitomyces symbiont of Macrotermes natalensis more reliably than was done in the linkage mapping study from the first part of this report, the success of crosses should not be determined based on phenotypic observations, which may not always be conclusive. Instead, molecular markers could be used. In addition, the mating system should be determined in a more systematic way, to make sure that there is really only one mating type locus, as was assumed in the mapping study.

From a previous study, it is known that the Termitomyces symbiont of M. natalensis has a heterothallic mating system (De Fine Licht et al., 2005). In this type of system, only homokaryons of a compatible mating type can successfully form a heterokaryon together. Heterothallic mating systems can be subdivided into bipolar and tetrapolar systems. The tetrapolar system is more common in basidiomycetes and is believed to be the ancestral state. In this system, there are two multi-allelic mating type loci and compatibility occurs only if the alleles of two homokaryons are different at both loci (Kües et al., 2011). The function of such a mating system is to restrict inbreeding and promote outbreeding (Casselton and Economou, 1985). In a tetrapolar mating system, homokaryons are never compatible with themselves and are only compatible with 25% of the homokaryons that originated from the same heterokaryon, because both of the two unlinked mating type loci need to be different. Outbreeding, on the other hand, can remain largely unrestricted as long as there are many different mating type alleles present in the population (Casselton and Economou, 1985).
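The 25% figure can be checked by enumerating all pairs of sibling homokaryons. The sketch below assumes that the parent heterokaryon carries two different alleles at each mating type locus, that the loci are unlinked, and that all mating types are produced in equal proportions; the allele labels 1 and 2 are arbitrary. For comparison, the same function is also evaluated for a single mating type locus.

```python
from itertools import product


def sibling_compatibility(n_loci):
    """Fraction of sibling homokaryon pairs that are compatible.

    Assumes n_loci unlinked mating type loci, each with two alleles
    (labelled 1 and 2) inherited from the same parent heterokaryon,
    and equal frequencies of all resulting mating types.
    """
    mating_types = list(product((1, 2), repeat=n_loci))
    pairs = list(product(mating_types, repeat=2))
    # A pair is compatible only if the alleles differ at every locus
    compatible = [
        (a, b) for a, b in pairs if all(x != y for x, y in zip(a, b))
    ]
    return len(compatible) / len(pairs)


two_loci = sibling_compatibility(2)  # two unlinked loci (tetrapolar)
one_locus = sibling_compatibility(1)  # a single locus

print(two_loci, one_locus)
```

With two loci there are four mating types among the siblings and each is compatible with exactly one of them, which gives the fraction of one in four.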
In contrast to species with a tetrapolar system, species with a bipolar mating system only have one mating type locus that needs to be heteroallelic for compatibility to occur. The inbreeding restriction in species with this system is, therefore, only 50%. Bipolarity is thought to have evolved from tetrapolarity several times independently in basidiomycetes. In some cases, bipolarity arose because the two different mating type loci became linked, resulting in effectively one mating type locus (Bakkeren and Kronstad, 1994; Nieuwenhuis et al., 2013). In other cases, one of the two mating type loci lost its function in determining mating compatibility (Aimi et al., 2005; James et al., 2006).

Here, a series of crosses is performed using a subset of the homokaryons from the mapping population used in linkage mapping as described in the first chapter. By determining which combinations of these homokaryons can form a stable heterokaryon, the type of the mating system can be deduced. This strategy has already been successfully used in previous studies to identify the mating system of other species of basidiomycetes (Gordon and Petersen, 1991; Aanen and Kuyper, 1999). In addition, tester strains from each of the two or four different mating types could be identified using these crosses. Such tester strains could be used to determine the mating type of all homokaryons in the mapping population, allowing for an accurate mapping of the mating type locus or loci to the genetic map.

Usually, heterokaryons can be distinguished from homokaryons by the formation of clamp connections or the possession of two nuclei per cell (Kües, 2000). Unfortunately, Termitomyces heterokaryons do not form clamp connections and have multinucleate cells, making it hard to distinguish homokaryons from heterokaryons morphologically (De Fine Licht et al., 2005).
Heterokaryons do grow slightly faster than homokaryons (Kües, 2000), a characteristic that was used in the preliminary mapping of the mating type locus, but this is not a very accurate criterion for distinguishing between the two. Therefore, molecular markers will need to be used to detect the successful formation of a heterokaryon. To this end, samples from the contact zones of the crosses have been purified and prepared for DNA extraction.

Methods

Set-up of crosses

Crosses were made between twenty homokaryons used in the construction of the linkage map. The first ten homokaryons were crossed in all possible combinations, while the last ten were only crossed to each of the first ten (Figure 7). These numbers were chosen such that, in case of a tetrapolar mating system, the probability of finding three of the four mating types in the first ten samples would be greater than 99% and the probability of all four mating types being present in all twenty samples would be greater than 98% (for calculations see Appendix 1).

Sample   1   2   3   4   5   6   7   8   9   10
1
2        x
3        x   x
4        x   x   x
5        x   x   x   x
6        x   x   x   x   x
7        x   x   x   x   x   x
8        x   x   x   x   x   x   x
9        x   x   x   x   x   x   x   x
10       x   x   x   x   x   x   x   x   x
11       x   x   x   x   x   x   x   x   x   x
12       x   x   x   x   x   x   x   x   x   x
13       x   x   x   x   x   x   x   x   x   x
14       x   x   x   x   x   x   x   x   x   x
15       x   x   x   x   x   x   x   x   x   x
16       x   x   x   x   x   x   x   x   x   x
17       x   x   x   x   x   x   x   x   x   x
18       x   x   x   x   x   x   x   x   x   x
19       x   x   x   x   x   x   x   x   x   x
20       x   x   x   x   x   x   x   x   x   x

Figure 7. Scheme of all crosses for the identification of the mating system. Crosses that have been performed are indicated with an “x”.

Crosses were made on plates with malt yeast extract agar (MYA; 20 g/L malt extract, 2 g/L yeast extract and 15 g/L agar). All crosses were done in duplicate on the same plate in such a configuration that the mating between two different homokaryons could immediately be compared to that between two identical homokaryons (Figure 8). Spacing between the different inoculation sites was approximately 5 mm. Crosses were incubated for three weeks at 25 °C.
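The two probabilities given for the choice of sample numbers can be reproduced with an inclusion-exclusion count. The sketch below is not taken from Appendix 1; it assumes that, under a tetrapolar system, each homokaryon independently has one of four mating types with probability 1/4. The function names are illustrative.

```python
from math import comb


def prob_exactly_k_types(n, k, m=4):
    """P(exactly k of m equally likely mating types occur among n samples)."""
    # Number of ways to assign n samples to k given types so that every
    # type is used at least once (inclusion-exclusion)
    onto = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return comb(m, k) * onto / m ** n


def prob_at_least_k_types(n, k, m=4):
    """P(at least k of m equally likely mating types occur among n samples)."""
    return sum(prob_exactly_k_types(n, i, m) for i in range(k, m + 1))


# At least three of the four mating types among the first ten samples
p_three_in_ten = prob_at_least_k_types(10, 3)
# All four mating types among all twenty samples
p_four_in_twenty = prob_at_least_k_types(20, 4)

print(f"{p_three_in_ten:.4f} {p_four_in_twenty:.4f}")
```

Under these assumptions, both values exceed the thresholds of 99% and 98% mentioned in the text. Deviations from equal segregation of the mating types would lower them.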
Figure 8. Configuration of inoculation of the homokaryon strains for the crossing experiments. Blue squares (number 1) indicate one parent for the cross and red squares (number 2) indicate the other parent.

Purification of cross products

To isolate the products of the crosses and make sure that they consist of only a single mycelium (heterokaryon or homokaryon), rather than a mixture of two homokaryons, the cross products had to be purified. To this end, a sample from the contact area of each cross was transferred to the middle of a fresh MYA plate. After three weeks of incubation at 25 °C, a sample from the edge of the newly grown colony was again transferred to a fresh MYA plate. If several morphologically distinct areas were observed in the new colonies, a sample from each of these different areas was transferred. New plates with purified colonies were again incubated at 25 °C. Samples from the purified colonies were grown in liquid malt yeast extract medium (20 g/L malt extract and 2 g/L yeast extract) in a shaker at 25 °C and 90 rpm, after which they were frozen for later DNA extraction and further analysis.

Results

Purified products of most of the crosses have been obtained and are ready for DNA extraction and further analysis.

Discussion

To determine the outcome of the crosses performed here, DNA will need to be extracted from the purified products and analysed for heterozygosity of markers for which the two crossed homokaryons had different alleles. SNP markers that can be used for this analysis can be selected from the filtered GBS dataset used for the construction of the genetic map. Since SNP marker genotypes of all of the homokaryons used in the crosses are available, markers can be selected in such a way that all twenty homokaryons can be distinguished from one another using a minimal number of markers.
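One possible way to select such a marker set is a greedy search that repeatedly picks the marker separating the most strain pairs that are still indistinguishable. The sketch below is only an illustration: the marker names and genotypes are made up, missing genotype calls are not handled, and a greedy search does not guarantee the smallest possible set.

```python
from itertools import combinations


def select_markers(genotypes):
    """Greedy selection of markers that together distinguish all strains.

    genotypes: dict mapping marker name -> tuple of alleles, one per strain
    (homokaryons are haploid, so one allele per strain and marker).
    Returns the chosen marker names in order of selection.
    """
    n_strains = len(next(iter(genotypes.values())))
    unresolved = set(combinations(range(n_strains), 2))
    chosen = []
    while unresolved:
        # Marker that separates the most still-unresolved strain pairs
        best = max(
            genotypes,
            key=lambda m: sum(
                genotypes[m][i] != genotypes[m][j] for i, j in unresolved
            ),
        )
        resolved = {
            (i, j) for i, j in unresolved
            if genotypes[best][i] != genotypes[best][j]
        }
        if not resolved:
            raise ValueError("some strains cannot be distinguished")
        chosen.append(best)
        unresolved -= resolved
    return chosen


# Hypothetical toy data: four strains, four biallelic SNPs (alleles 0 and 1)
toy = {
    "snp_a": (0, 0, 1, 1),
    "snp_b": (0, 1, 0, 1),
    "snp_c": (0, 0, 0, 1),
    "snp_d": (1, 1, 1, 1),
}
panel = select_markers(toy)
print(panel)
```

Because k biallelic markers can define at most 2^k different haplotypes, at least five such markers are needed to tell twenty homokaryons apart (2^4 = 16 < 20 ≤ 32 = 2^5).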
The DNA isolated from the products of the crosses can then be genotyped for these markers using a high-throughput method such as KASP (Kompetitive Allele Specific PCR) genotyping (He et al., 2014). From the pattern of compatible and incompatible matings, the mating system can then be deduced, and tester strains of each of the mating types can be identified. These tester strains can then be crossed with each of the homokaryons from the mapping population. By analysing the outcome of these crosses in the same way as before, the mating type of all homokaryons in the mapping population can be determined. These data can then be used to map the mating type locus more accurately than before.

Acknowledgements

I would like to thank Sabine Vreeburg for supervising me, Marijke Slakhorst for help in the lab, and Linda van Oosten for transferring the purified products of the crosses to liquid culture.

References

Aanen, D. K., & Kuyper, T. W. (1999). Intercompatibility tests in the Hebeloma crustuliniforme complex in northwestern Europe. Mycologia, 91(5), 783-795.
Aimi, T., Yoshida, R., Ishikawa, M., Bao, D., & Kitamoto, Y. (2005). Identification and linkage mapping of the genes for the putative homeodomain protein (hox1) and the putative pheromone receptor protein homologue (rcb1) in a bipolar basidiomycete, Pholiota nameko. Current genetics, 48(3), 184-194.
Bakkeren, G., & Kronstad, J. W. (1994). Linkage of mating-type loci distinguishes bipolar from tetrapolar mating in basidiomycetous smut fungi. Proceedings of the National Academy of Sciences, 91(15), 7085-7089.
Casselton, L. A., & Economou, A. (1985). Dikaryon formation. In D. Moore, L. A. Casselton, D. A. Wood & J. C. Frankland (Eds.), Developmental Biology of Higher Fungi (pp. 213-230). Cambridge: Cambridge University Press.
De Fine Licht, H. H., Andersen, A., & Aanen, D. K. (2005). Termitomyces sp. associated with the termite Macrotermes natalensis has a heterothallic mating system and multinucleate cells.
Mycological research, 109(3), 314-318.
Gordon, S. A., & Petersen, R. H. (1991). Mating systems in Marasmius. Mycotaxon, 41(2), 371-386.
He, C., Holme, J., & Anthony, J. (2014). SNP genotyping: the KASP assay. In D. Fleury & R. Whitford (Eds.), Crop Breeding: Methods and Protocols (pp. 75-86). New York: Springer.
James, T. Y., Srivilai, P., Kües, U., & Vilgalys, R. (2006). Evolution of the bipolar mating system of the mushroom Coprinellus disseminatus from its tetrapolar ancestors involves loss of mating-type-specific pheromone receptor function. Genetics, 172(3), 1877-1891.
Kües, U. (2000). Life history and developmental processes in the basidiomycete Coprinus cinereus. Microbiology and molecular biology reviews, 64(2), 316-353.
Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar, Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.), The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin: Springer.
Nieuwenhuis, B. P. S., Billiard, S., Vuilleumier, S., Petit, E., Hood, M. E., & Giraud, T. (2013). Evolution of uni- and bifactorial sexual compatibility systems in fungi. Heredity, 111(6), 445-455.

The search for homeodomain genes involved in mating in Termitomyces sp.

Abstract

In this study, several attempts were made to amplify homologs of homeodomain genes from mating type loci of other basidiomycetes in the Termitomyces symbiont of Macrotermes natalensis using PCR. All attempts failed to yield the expected PCR product. Future studies to identify the mating type loci of Termitomyces may try different primers or PCR conditions, or use alternative strategies, such as searching for genes known to be closely linked to the mating type locus in other basidiomycetes.

Introduction

Although not much is known about the mating system and mating type loci of Termitomyces, many studies on mating have been done for other basidiomycetes.
These studies have identified the genes involved and unravelled the underlying molecular mechanism of the mating type loci. In tetrapolar species, one of the two loci contains genes encoding homeodomain transcription factors of two different types (HD1 and HD2). Generally, an HD1 gene is tightly linked to an HD2 gene and transcribed in the opposite direction. The HD1 gene products can form functional heterodimeric transcription factors only with an HD2 partner from a different allele (Kües and Casselton, 1992). These functional transcription factors regulate genes involved in the formation of stable heterokaryons. The other mating type locus contains pheromone precursor (Ph) and pheromone receptor (STE3) genes. As with the HD1 and HD2 genes, these genes are tightly linked and the pheromone products can only activate a receptor partner from a different allele. Activation of the receptor leads to a signal transduction cascade that, together with the dimeric homeodomain transcription factors, regulates the formation of a stable heterokaryon (Kües et al., 2011). There is usually very little sequence conservation between the different alleles of both mating type loci. This lack of sequence similarity prevents homologous recombination, which would lead to self-compatibility (Stankis et al., 1992; Specht et al., 1994). In bipolar mating systems, either the HD and Ph-STE loci became closely linked, forming effectively one locus (Bakkeren and Kronstad, 1994; Nieuwenhuis et al., 2013), or one of the two loci lost its function in mating (Aimi et al., 2005; James et al., 2006).

Since the mating type genes in related species are known, an alternative to the linkage mapping approach for localising the mating type loci of Termitomyces would be to search for homologs to the genes from other species in the genome sequence of Termitomyces.
To prove that the homologues found are really part of the mating type locus, segregation of these genes with mating type would then need to be shown. This strategy has been used successfully in the identification of the mating type genes in several other basidiomycetes (James et al., 2006; Idnurm et al., 2008). In a previous study (Master thesis of Jens Ringelberg, ‘Adaptations to symbiosis in the fungus cultivated by fungus-growing termites’), homologues have already been identified. If both alleles of these genes could be found, a marker could be developed for these alleles. Segregation of this marker with the mating type would be a strong indication that the identified gene is indeed part of the mating type locus. Here, an attempt was made to amplify two of the HD gene homologues using PCR.

Methods

Template DNA and primer design

Template DNA used in the PCR reactions was isolated from the parent heterokaryon and one of the homokaryons from the mapping population described in the first part of this report, using the CTAB method. Primers were designed to amplify a 656 bp fragment from an HD gene homolog on scaffold 259 (HD1) and a 360 bp fragment from an HD gene homolog on scaffold 418 (HD2). For HD1 a forward primer with sequence 5’-TGGTATCGTAAGCCTGCCAC-3’ and a reverse primer with sequence 5’-ACCGAGGAAGCAAGATCGTC-3’ were used. For HD2 a forward primer with sequence 5’-TGTTAATGCTGCCACCCGAT-3’ and a reverse primer with sequence 5’-ACCGGCTCATCGGAAATGTT-3’ were used.

PCR reactions

In a first attempt, PCR reactions were performed on both samples with both primer sets in a 25 μL reaction volume, consisting of 5 μL of GoTaq PCR buffer (Promega), 2 μL 25 mM MgCl₂, 1 μL 10 mM dNTPs, 1 μL of each primer, 0.1 μL GoTaq polymerase (Promega), 12.9 μL Milli-Q water, and 2 μL ten times diluted template DNA or 2 μL Milli-Q water (negative control).
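The composition of the reaction mixture can be checked and scaled with a few lines of code. The sketch below uses the volumes listed above; the 10% overage and the use of one master mix per primer set (three reactions: heterokaryon, homokaryon, and negative control) are assumptions made for illustration and are not taken from the original protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text
REACTION_UL = {
    "GoTaq PCR buffer": 5.0,
    "25 mM MgCl2": 2.0,
    "10 mM dNTPs": 1.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "GoTaq polymerase": 0.1,
    "Milli-Q water": 12.9,
    "template or water": 2.0,
}


def master_mix(n_reactions, overage=0.10, exclude=("template or water",)):
    """Volumes (microlitres) of a master mix for n_reactions.

    overage is an assumed pipetting surplus; the template is added to
    each tube separately and is therefore excluded from the mix.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name not in exclude
    }


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


total_per_reaction = round(sum(REACTION_UL.values()), 2)
mix = master_mix(3)
added_mgcl2_mm = final_concentration(25.0, 2.0, total_per_reaction)

print(total_per_reaction, mix["Milli-Q water"], added_mgcl2_mm)
```

The listed components add up to the stated 25 μL, and the added MgCl₂ corresponds to 2 mM in the final reaction, not counting any magnesium already present in the buffer.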
The PCR cycles were run on a MyCycler Thermal Cycler (Bio-Rad), with an initial denaturation step of 5 min at 94 °C, followed by 35 PCR cycles (1 min denaturation at 94 °C, 1 min annealing at 55 °C, and 1 min extension at 72 °C), after which a final extension of 10 min at 72 °C was performed. Resulting amplified fragments were examined under UV after running 3 μL of PCR product for one hour at 80 V on a 1% agarose gel with EtBr.

To improve the amount of product obtained with the HD2 primer set, the PCR was repeated with 40 cycles instead of 35. All fragments obtained in the first (for HD1) or the second (for HD2) PCR were isolated by running the entire PCR product for 90 min at 60 V on a 1% agarose gel, cutting out all bands under a UV lamp and purifying the DNA using a Nucleospin gel and PCR clean-up kit (MACHEREY-NAGEL), according to the instructions provided by the manufacturer. Sanger sequencing of purified fragments was performed by Eurofins.

To reduce the amount of non-specific PCR amplification, the initial PCR was repeated, but this time Milli-Q water was used instead of MgCl2 and an annealing temperature of 57 °C was chosen, to improve the stringency of the PCR. Gel electrophoresis was performed in the same way as the first time.

Finally, a touchdown PCR was performed, which may help circumvent the problem of non-specific amplification (Don et al., 1991). This PCR was performed using the same reaction mixtures as before, as well as reaction mixtures in which the MgCl2 was replaced by Milli-Q water. The PCR program consisted of an initial denaturation step of 5 min at 94 °C, followed by 10 touchdown cycles (1 min denaturation at 94 °C, 1 min annealing at 65 °C, decreasing by 1 °C per cycle, and 1 min extension at 72 °C) and 25 normal PCR cycles (1 min denaturation at 94 °C, 1 min annealing at 55 °C, and 1 min extension at 72 °C), after which a final extension of 10 min at 72 °C was performed.
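The annealing temperatures of the touchdown program above can be listed cycle by cycle with a short Python sketch. It assumes that the first touchdown cycle anneals at 65 °C and that each following touchdown cycle is 1 °C lower; the text does not state this explicitly. The function name `touchdown_schedule` and its parameters are hypothetical.

```python
def touchdown_schedule(start_temp=65, step=1, touchdown_cycles=10,
                       final_temp=55, normal_cycles=25):
    """Return the annealing temperature (in degrees C) for every cycle.

    Assumption: the first touchdown cycle anneals at start_temp and each
    later touchdown cycle is `step` degrees lower than the previous one.
    """
    # Touchdown phase: start_temp, start_temp - step, ...
    temps = [start_temp - step * cycle for cycle in range(touchdown_cycles)]
    # Normal phase: constant annealing temperature.
    temps += [final_temp] * normal_cycles
    return temps


schedule = touchdown_schedule()
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: annealing at {temp} C")
```

Under this assumption the last touchdown cycle anneals at 56 °C, so the program steps down smoothly to the 55 °C used in the remaining 25 cycles, for 35 cycles in total.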
Resulting amplified fragments were separated by gel electrophoresis in the same way as before. Products containing only a single amplified fragment were purified using a Nucleospin gel and PCR clean-up kit (MACHEREY-NAGEL), according to the instructions provided by the manufacturer. Purified fragments were sent to Eurofins for Sanger sequencing.

Results

The first attempt to amplify the HD gene homologues yielded multiple fragments for both primer sets (Figure 9A). Since the negative control only showed a single band that was likely caused by primer dimers, at least some of the fragments must have been the result of non-specific amplification. Repeating the PCR reactions with the HD2 primer set with five additional PCR cycles yielded the same fragments in slightly higher concentrations (Figure 9B), allowing for all fragments to be purified from gel. Subsequent sequencing of purified fragments failed for the middle band of the reaction with homokaryon template DNA and HD1 primers and both middle bands of the PCR with HD2 primers. Of the successfully sequenced fragments, none matched the intended product. A repetition of the PCR under more stringent annealing conditions yielded (apart from primer dimers and a fragment due to template contamination of the negative control) a single fragment that was too short to be the intended product for two of the reactions (Figure 10).

Figure 9. PCR products from the first attempts to amplify two HD gene homologues. Primers targeting two different homeodomain genes (HD1 and HD2) were used in PCR reactions with template DNA from the parent heterokaryon (HT), or one of the homokaryons (HM), or without template (-). To obtain more product with the HD2 primers, the PCRs with these primers were repeated with 40 cycles instead of 35 (B). Sizes in base pairs (bp) of the bands from the 100 bp ladder (M) are indicated on the left of each panel. The intended products were 656 bp (HD1) and 360 bp (HD2) long.

Figure 10.
PCR products from a more stringent reaction than before with primers targeting two HD gene homologs (HD1 and HD2). PCRs were performed with template DNA from the parent heterokaryon (HT), or one of the homokaryons (HM), or without template DNA (-). Sizes of the fragments from the 100 bp ladder (M) in base pairs (bp) are indicated on the left. The intended products were 656 bp (HD1) and 360 bp (HD2) long.

A touchdown PCR yielded many different fragments for the HD1 primer set in presence of MgCl2, none of which matched the length of the intended product (Figure 11). Apart from that, it only yielded a single band in one of the negative controls and a single band where the HD1 primer set was used on homokaryon template DNA in absence of MgCl2. Sequencing of the latter fragment revealed that it was not the intended product either.

Figure 11. Products from a touchdown PCR with primers targeting two HD gene homologs (HD1 and HD2). PCRs were performed both in presence (MgCl2) and absence (No MgCl2) of MgCl2, on DNA from the parent heterokaryon (HT), on DNA from one of the homokaryons (HM), and in absence of template DNA (-). Sizes in base pairs (bp) of the bands from the 100 bp ladder (M) are indicated on the left. The intended products were 656 bp (HD1) and 360 bp (HD2) long.

Discussion

None of the attempts to amplify either of the two HD gene homologs were successful. Unfortunately, there are many possible explanations for why this might have happened. Firstly, it is possible that the PCR conditions tested here were simply not optimal for the amplification of the desired fragment. Since there are many different parameters that can be altered to optimise a PCR reaction, it is not possible to rule out this explanation. Secondly, it is possible that the primers used here do not work, possibly because the intended target in the individuals used here does not have the exact same sequence as in the reference genome.
In addition, if there are many sequences in the genome that are somewhat similar to the intended target, that may explain the many non-specific amplifications that were found. No such similar sequences were found in the reference genome during primer design, but the reference genome is still quite fragmented and repetitive sequences tend to be difficult to assemble.

If one of the two HD gene homologs really is involved in mating, it is also possible that this gene could not be amplified, because the allele from the reference genome was different from the two alleles of the parent heterokaryon. In this case, the alleles from the parent heterokaryon may not be recognised by primers designed for the allele in the reference genome, because sequence conservation at the mating type locus is generally low, even between different alleles of the same species (Stankis et al., 1992; Specht et al., 1994). If this is the case, a different strategy for finding the mating type locus may be needed. A possible strategy may be to look for the mitochondrial intermediate peptidase (MIP) gene, a gene closely linked to the HD gene mating type locus in many basidiomycetes (James et al., 2004). If MIP is also linked to the mating type locus in Termitomyces, cosegregation of this gene and the mating type may be shown and the mating type locus may be identified by looking for HD gene homologs in close proximity to the MIP gene.

Acknowledgements

I would like to thank Sabine Vreeburg for supervising me, Bertha Koopmanschap and Lennart van de Peppel for help with the PCRs, and Jens Ringelberg for finding the mating type gene homologs and coming over to explain his methods and findings.

References

Aimi, T., Yoshida, R., Ishikawa, M., Bao, D., & Kitamoto, Y. (2005). Identification and linkage mapping of the genes for the putative homeodomain protein (hox1) and the putative pheromone receptor protein homologue (rcb1) in a bipolar basidiomycete, Pholiota nameko.
Current Genetics, 48(3), 184-194.

Bakkeren, G., & Kronstad, J. W. (1994). Linkage of mating-type loci distinguishes bipolar from tetrapolar mating in basidiomycetous smut fungi. Proceedings of the National Academy of Sciences, 91(15), 7085-7089.

Don, R. H., Cox, P. T., Wainwright, B. J., Baker, K., & Mattick, J. S. (1991). 'Touchdown' PCR to circumvent spurious priming during gene amplification. Nucleic Acids Research, 19(14), 4008.

Idnurm, A., Walton, F. J., Floyd, A., & Heitman, J. (2008). Identification of the sex genes in an early diverged fungus. Nature, 451(7175), 193-196.

James, T. Y., Kües, U., Rehner, S. A., & Vilgalys, R. (2004). Evolution of the gene encoding mitochondrial intermediate peptidase and its cosegregation with the A mating-type locus of mushroom fungi. Fungal Genetics and Biology, 41(3), 381-390.

James, T. Y., Srivilai, P., Kües, U., & Vilgalys, R. (2006). Evolution of the bipolar mating system of the mushroom Coprinellus disseminatus from its tetrapolar ancestors involves loss of mating-type-specific pheromone receptor function. Genetics, 172(3), 1877-1891.

Kües, U., & Casselton, L. A. (1992). Homeodomains and regulation of sexual development in basidiomycetes. Trends in Genetics, 8(5), 154-155.

Kües, U., James, T. Y., & Heitman, J. (2011). Mating Type in Basidiomycetes: Unipolar, Bipolar, and Tetrapolar Patterns of Sexuality. In S. Pöggeler & J. Wöstemeyer (Eds.), The Mycota XIV: Evolution of fungi and fungal-like organisms (pp. 97-160). Berlin: Springer.

Nieuwenhuis, B. P. S., Billiard, S., Vuilleumier, S., Petit, E., Hood, M. E., & Giraud, T. (2013). Evolution of uni- and bifactorial sexual compatibility systems in fungi. Heredity, 111(6), 445-455.

Specht, C. A., Stankis, M. M., Novotny, C. P., & Ullrich, R. C. (1994). Mapping the heterogeneous DNA region that determines the nine Aα mating-type specificities of Schizophyllum commune. Genetics, 137(3), 709-714.

Stankis, M. M., Specht, C. A., Yang, H., Giasson, L., Ullrich, R.
C., & Novotny, C. P. (1992). The Aα mating locus of Schizophyllum commune encodes two dissimilar multiallelic homeodomain proteins. Proceedings of the National Academy of Sciences, 89(15), 7169-7173.

Appendix 1. Probability calculations

A. Probability of having at least one of all four different mating types in a sample of n homokaryons

Let A be the event of having at least one representative of the first mating type, B the event of having at least one of the second, C the event of having at least one of the third, and D the event of having at least one of the fourth. Then $\bar{A}$ (not A) is the event of not having a single representative of the first mating type in the sample population. The probability of this event can be calculated as follows:

\[
P(\bar{A}) = \left(\frac{3}{4}\right)^{n}
\]

This probability is the same as $P(\bar{B})$, $P(\bar{C})$, and $P(\bar{D})$. In addition, the probability of $\bar{B}$ given $\bar{A}$ can be calculated, because, if it is known that the first mating type is not present in the sample, the probability of drawing any particular one of the other three mating types is one in three for every draw, so the probability of not drawing the second mating type is two in three for every draw.
Therefore:

\[
P(\bar{B} \mid \bar{A}) = \left(\frac{2}{3}\right)^{n}
\]

Using the definition of a conditional probability, the probability of $\bar{A}$ and $\bar{B}$ can be calculated:

\[
P(\bar{A} \cap \bar{B}) = P(\bar{B} \mid \bar{A}) \cdot P(\bar{A}) = \left(\frac{2}{3}\right)^{n} \cdot \left(\frac{3}{4}\right)^{n} = \left(\frac{1}{2}\right)^{n}
\]

The probability of having at least one of all four different mating types can be written as the probability of A, B, C and D occurring at the same time:

\[
P(A \cap B \cap C \cap D)
\]

Since "having at least one of each" is the complement of "lacking the first, or the second, or the third, or the fourth mating type", this formula can be rewritten as follows:

\[
P(A \cap B \cap C \cap D) = P\left(\overline{\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}}\right) = 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D})
\]

The probability of the union of $\bar{A}$, $\bar{B}$, $\bar{C}$, and $\bar{D}$ can be found by adding the probabilities of the individual events, subtracting the probabilities of all pairwise intersections between the events, adding the probabilities of all triple intersections and subtracting the probability of the quadruple intersection:

\[
\begin{aligned}
P(A \cap B \cap C \cap D) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}) \\
&= 1 - \big( P(\bar{A}) + P(\bar{B}) + P(\bar{C}) + P(\bar{D}) \\
&\qquad - P(\bar{A} \cap \bar{B}) - P(\bar{A} \cap \bar{C}) - P(\bar{A} \cap \bar{D}) - P(\bar{B} \cap \bar{C}) - P(\bar{B} \cap \bar{D}) - P(\bar{C} \cap \bar{D}) \\
&\qquad + P(\bar{A} \cap \bar{B} \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap \bar{D}) + P(\bar{A} \cap \bar{C} \cap \bar{D}) + P(\bar{B} \cap \bar{C} \cap \bar{D}) \\
&\qquad - P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \big)
\end{aligned}
\]

Since all four mating types are equally likely, this formula can be simplified:

\[
\begin{aligned}
P(A \cap B \cap C \cap D) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C} \cup \bar{D}) \\
&= 1 - \big( 4 \cdot P(\bar{A}) - 6 \cdot P(\bar{A} \cap \bar{B}) + 4 \cdot P(\bar{A} \cap \bar{B} \cap \bar{C}) - P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) \big)
\end{aligned}
\]

In this formula, $P(\bar{A})$ and $P(\bar{A} \cap \bar{B})$ are already known and $P(\bar{A} \cap \bar{B} \cap \bar{C} \cap \bar{D}) = 0$, because having none of the four mating types is only possible if the sample size is zero.
$P(\bar{A} \cap \bar{B} \cap \bar{C})$ is the same as the probability of getting only the fourth mating type, which can be calculated as follows:

\[
P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\text{only mating type 4}) = \left(\frac{1}{4}\right)^{n}
\]

Substituting these expressions in the overall formula results in an expression for the probability of having each of the four mating types represented at least once as a function of the sample size:

\[
P(A \cap B \cap C \cap D) = 1 - \left( 4 \cdot \left(\frac{3}{4}\right)^{n} - 6 \cdot \left(\frac{1}{2}\right)^{n} + 4 \cdot \left(\frac{1}{4}\right)^{n} \right)
\]

Using this formula, it can be calculated that a sample size of at least 16 is needed to have at least 95% certainty that all four mating types will be represented at least once. For a sample size of 20 this probability is more than 98%.

B. Probability of having at least three out of four different mating types in a sample of n homokaryons

The probability that we are looking for can be written as follows:

\[
P\big( (A \cap B \cap C) \cup (A \cap B \cap D) \cup (A \cap C \cap D) \cup (B \cap C \cap D) \big)
\]

This can be rewritten by adding the probabilities of the four parts in brackets and subtracting three times their mutual intersection, as can be deduced from the Venn diagram (Figure 12).

Figure 12. Venn diagram showing all possible combinations of four different sets. The union of events (A ∩ B ∩ C), (A ∩ B ∩ D), (A ∩ C ∩ D), and (B ∩ C ∩ D) is indicated in red.
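Because all four mating types are equally likely, this probability can be written as:

\[
4 \cdot P(A \cap B \cap C) - 3 \cdot P(A \cap B \cap C \cap D)
\]

$P(A \cap B \cap C \cap D)$ has already been found in Appendix 1A and $P(A \cap B \cap C)$ can be calculated in much the same way:

\[
\begin{aligned}
P(A \cap B \cap C) &= 1 - P(\bar{A} \cup \bar{B} \cup \bar{C}) \\
&= 1 - \big( P(\bar{A}) + P(\bar{B}) + P(\bar{C}) - P(\bar{A} \cap \bar{B}) - P(\bar{A} \cap \bar{C}) - P(\bar{B} \cap \bar{C}) + P(\bar{A} \cap \bar{B} \cap \bar{C}) \big) \\
&= 1 - \big( 3 \cdot P(\bar{A}) - 3 \cdot P(\bar{A} \cap \bar{B}) + P(\bar{A} \cap \bar{B} \cap \bar{C}) \big) \\
&= 1 - \left( 3 \cdot \left(\frac{3}{4}\right)^{n} - 3 \cdot \left(\frac{1}{2}\right)^{n} + \left(\frac{1}{4}\right)^{n} \right)
\end{aligned}
\]

Taken together, the result is an expression for the probability of having at least three of the four mating types represented in a sample of size n:

\[
\begin{aligned}
&P\big( (A \cap B \cap C) \cup (A \cap B \cap D) \cup (A \cap C \cap D) \cup (B \cap C \cap D) \big) \\
&\quad = 4 \cdot \left( 1 - \left( 3 \cdot \left(\frac{3}{4}\right)^{n} - 3 \cdot \left(\frac{1}{2}\right)^{n} + \left(\frac{1}{4}\right)^{n} \right) \right) - 3 \cdot \left( 1 - \left( 4 \cdot \left(\frac{3}{4}\right)^{n} - 6 \cdot \left(\frac{1}{2}\right)^{n} + 4 \cdot \left(\frac{1}{4}\right)^{n} \right) \right) \\
&\quad = 1 - 6 \cdot \left(\frac{1}{2}\right)^{n} + 8 \cdot \left(\frac{1}{4}\right)^{n}
\end{aligned}
\]

Using this formula, it can be calculated that a sample size of at least 7 is needed to have at least 95% certainty that at least three of the four mating types will be represented at least once. If a sample size of 10 is used, this probability is more than 99%.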
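The two closed-form expressions from Appendix 1A and 1B can be evaluated numerically to recover the sample sizes quoted in the text. The following is a minimal Python sketch using exact fractions; it relies on the same assumption as the derivation, namely that all four mating types are equally likely and that homokaryons are sampled independently. The function names `p_all_four`, `p_at_least_three`, and `min_sample_size` are illustrative and do not come from the thesis.

```python
from fractions import Fraction


def p_all_four(n):
    """Appendix 1A: P(all four mating types occur among n homokaryons)."""
    return 1 - (4 * Fraction(3, 4) ** n
                - 6 * Fraction(1, 2) ** n
                + 4 * Fraction(1, 4) ** n)


def p_at_least_three(n):
    """Appendix 1B: P(at least three mating types occur among n homokaryons)."""
    return 1 - 6 * Fraction(1, 2) ** n + 8 * Fraction(1, 4) ** n


def min_sample_size(prob, threshold):
    """Smallest n for which prob(n) reaches the given threshold."""
    n = 1
    while prob(n) < threshold:
        n += 1
    return n


threshold = Fraction(95, 100)
n_all_four = min_sample_size(p_all_four, threshold)
n_three = min_sample_size(p_at_least_three, threshold)

print("minimum n, all four types:      ", n_all_four)
print("minimum n, at least three types:", n_three)
print("P(all four | n = 20)        =", float(p_all_four(20)))
print("P(at least three | n = 10)  =", float(p_at_least_three(10)))
```

Exact fractions are used so that the comparison with the 95% threshold is not affected by floating-point rounding; for small n the results can also be checked by hand, for example `p_all_four(4)` equals 4!/4^4.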