University of Colorado, Boulder
CU Scholar
Ecology & Evolutionary Biology Graduate Theses & Dissertations
Spring 1-1-2015

Genomics of Adaptation and Diversification
Ryan C. Lynch, University of Colorado Boulder

Recommended Citation
Lynch, Ryan C., "Genomics of Adaptation and Diversification" (2015). Ecology & Evolutionary Biology Graduate Theses & Dissertations. 68. http://scholar.colorado.edu/ebio_gradetds/68

GENOMICS OF ADAPTATION AND DIVERSIFICATION
by Ryan Lynch
B.A., University of Colorado, 2004

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirement for the degree of Doctor of Philosophy
Department of Ecology and Evolutionary Biology
2015

This thesis entitled: Genomics of Adaptation and Diversification, written by Ryan Lynch, has been approved for the Department of Ecology and Evolutionary Biology

Dr. Nolan Kane
Dr. Noah Fierer
Date

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.
Lynch, Ryan (Ph.D., Department of Ecology and Evolutionary Biology)
Genomics of Adaptation and Diversification
Thesis directed by Assistant Professor Nolan Kane

ABSTRACT

Recent and ongoing advances in DNA sequencing, coupled with computational developments, have opened new frontiers for understanding the structure and function of biological diversity. For my dissertation, I first addressed questions related to a low diversity community of uncultured Atacama Desert bacteria, using a variety of sequencing approaches. The principal goal was to infer which metabolic traits allow these bacteria to survive harsh desert conditions that other bacteria cannot. Through detailed genome assembly, annotation and comparative analyses I developed a working hypothesis that trace gas metabolism (H2, CO and several organic C1 compounds) may sustain these microorganisms in their habitat, although many aspects of their metabolic capacity remain undetermined. In the second component of my dissertation, I used whole genome sequencing of diverse Cannabis accessions to infer the phylogenetic lineages of this genus. These findings show support for at least one major clade of hemp and two clades of drug-type Cannabis, as well as hybrid origins of many commercially available modern drug-type cultivars. The levels of divergence among these clades suggest that multiple independent domestication events may have occurred, though extensive breeding for hemp and drug-type strains from a single domestication origin cannot be ruled out at present, due to the lack of known true wild populations. This work has relevance for future cultivar development, but also reflects long forgotten events that have occurred during 6,000 years of the Cannabis-human relationship.
Together, my dissertation demonstrates several ways that DNA sequencing technology and analytical approaches can address questions in ecology and evolutionary biology, but also highlights the limitations of these methods and underscores the importance of complementary non-sequencing approaches.

ACKNOWLEDGMENTS

I thank my committee members Dr. Nolan Kane, Dr. Noah Fierer, Dr. Andy Martin, Dr. Diana Nemergut and Dr. Erin Trip for their guidance, support and inspiration through my doctoral work and career development. The time spent getting to know and work with each of you has been a privilege I will never forget. I also thank former committee member Dr. Patrik Nosil. Dr. Mike Robeson, Jack Darcy, Jon Leff and Dr. Daniela Vergara have also been great friends and collaborators through my time here in the EBIO department. I need to thank Dr. Chris Jung, Dr. Collin Becker and Ryan Artale for all the years of adventures in Colorado and beyond; you can’t work all the time if you want to get anything done. However, the original inspiration to follow my curiosity is my father, Dr. James F. Lynch (PhD, CU Mathematics 1977), who himself has never stopped in the pursuit of understanding biological complexity.

CONTENTS

1 INTRODUCTION
2 THE POTENTIAL FOR MICROBIAL LIFE IN THE HIGHEST-ELEVATION (>6000 M.A.S.L.)
MINERAL SOILS OF THE ATACAMA REGION
2.1 Abstract
2.2 Introduction
2.3 Methods
2.4 Results
2.5 Discussion
2.6 References
3 METAGENOMIC EVIDENCE FOR METABOLISM OF TRACE ATMOSPHERIC GASES BY HIGH-ELEVATION DESERT ACTINOBACTERIA
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.4 Results
3.5 Discussion
3.6 References
4 GENOMIC DIVERSITY IN CANNABIS
4.1 Introduction
4.2 Materials and Methods
4.3 Results and Discussion
5 SUMMARY
6 BIBLIOGRAPHY
7 APPENDIX

CHAPTER 1
INTRODUCTION

I started my Ph.D. at a time of optimistic rhetoric for DNA sequencing and genomic science (Varshney et al., 2009). In the early years of DNA sequencing, starting with the method of Sanger et al. (1978), data generation was clearly a major limiting factor. Through the 1980s, entire Ph.D. dissertations were sometimes based on sequencing only a small gene (Dovichi and Zhang, 2000). Driven by the promise of revolutions in human medicine, various new high-throughput sequencing platforms came and went through the market during the 1990s and 2000s (Shendure et al., 2004). The early years of high-throughput sequencing were limited to a few specialized and well-funded labs equipped with the required technical expertise. But the early insights (and papers) from these efforts paved the way for the broad democratization of high-throughput sequencing technology and analysis tools.
Software development for analysis of large sequence datasets has continued to accelerate, and many new training programs for computationally oriented biologists were launched in the 2000s. By 2010, when I started towards a Ph.D., high-throughput sequencing techniques and buzzwords were already dominating many areas of biological research and discourse. Now, nearly five years later, these trends continue, with affordable computing power and continuous improvements to software facilitating the analysis of terabase-scale population genomic datasets. Neither data quantity nor computer horsepower can currently be considered a limiting factor for developing biological insight from DNA sequence data; however, other factors now appear to constrain our ability to penetrate biological complexity. The connection between genotype and phenotype remains unclear in many situations (Tomasetti and Vogelstein, 2014), and various projects and techniques have come under fire for overinterpreting genomic evidence (Graur et al., 2013; Hanage, 2014). Improving the statistical framework used to identify neutral and adaptive diversity across large populations of whole genome scale datasets remains the major challenge facing the field of ecological and evolutionary genomics (Leinonen et al., 2013; Flaxman et al., 2014). Understanding the current analytical methods for interpreting genomic data, and their current limitations, was a major objective of my Ph.D. work, and though these efforts span many projects and study systems, the question remained the same: how can DNA sequence data be used to understand biological diversity and function?

The microbial ecology portion of my work started from one simple observation: high elevation Atacama Desert volcano samples host a different, and simpler, community of bacteria compared to anything else. Even without statistical analyses I could simply see the difference in the DNA sequence data from my first PCR based study of this gravelly material.
What about these bacteria makes them able to survive conditions that other bacteria could not? From my naive observations as a lab technician, I decided to address this question using genomic techniques. My rationale was that if I could determine the metabolic traits these bacteria possess, then I could compare them to the traits of other related bacteria, and thus develop a picture of how high-elevation bacteria live in such an environment. Because almost nothing is known about this desert mountaintop habitat, the trait based inferences could also be used to understand how this environment shapes the community and its resident organisms. What environmental stressors limited life here so drastically? Thus my Ph.D. started from five one-gallon zip lock bags that were partially filled with crumbled dusty volcanic debris; I knew little about the organisms present in the rocky matrix, but even less about their far-away habitat.

To address these questions, I first present my initial biogeochemical descriptions of the Atacama Llullaillaco Volcano sites, as well as a brief amplicon based assay of the bacterial, microbial eukaryotic and archaeal communities (Chapter 2). I also introduce the hypothesis that bacterial trace gas metabolism serves as an energy source for this microbial community, and present limited support in the form of partial carbon monoxide dehydrogenase genes that were also PCR amplified from bulk soil genomic DNA extractions. Expanding the breadth and depth of genomic analyses in Chapter 3, I analyze full shotgun metagenomic datasets, some derived from bulk Llullaillaco Volcano soil genomic DNA extractions. These analyses include: statistical tests of the community structure, gene functional category comparisons to other diverse microbial ecosystems, de novo genome assembly of high abundance community members, metabolic pathway analysis and selection detection across genomes through synonymous to nonsynonymous mutation rate ratio estimations in protein coding genes.
Ultimately these efforts produced details regarding some aspects of the metabolic potential of Llullaillaco Volcano bacteria, and supported the further development of the trace gas metabolism hypothesis to include H2, CO and several organic C1 compounds. However, this work fails to fully answer the question of why these organisms were found at 21,000 feet elevation above the driest desert on Earth, rather than any of the countless alternative bacterial types or species known. Overall this work highlights both the strengths and the weaknesses of DNA sequence based biology and ecology, and reinforces the importance of study design and use of complementary non-sequencing approaches.

The Cannabis diversity and evolution portion of my work represents a step towards realizing my long term goal of utilizing genomic methods to understand and manipulate biological traits. Even though Cannabis is one of the earliest domesticated crops and produces the most widely used illicit drug in the world, scientific study has been limited to only a few research groups scattered around the globe. Cannabis occupies a unique cultural, political and scientific position. It is a polarizing but charismatic plant, both widely dismissed as harmful, and praised for its curative potential. Although the future legal status of Cannabis remains hazy, it is here to stay in one form or another. The long history of human domestication, dispersal to every continent and divergent selection for both drug and hemp types makes for a rare set of scientific opportunities. In Chapter 4, I present a whole genome re-mapping analysis of 43 Cannabis accessions from geographically and morphologically diverse hemp and drug-type strains.
Through analyzing over 10 million single nucleotide polymorphisms (SNPs) across the single copy portion of the genome I address basic questions related to Cannabis diversity and population structure: how many Cannabis lineages are contained within the colloquially used terms sativa and indica? And which genomic regions from modern cultivars originate from different known landrace populations?

CHAPTER 2
THE POTENTIAL FOR MICROBIAL LIFE IN THE HIGHEST-ELEVATION (>6000 M.A.S.L.) MINERAL SOILS OF THE ATACAMA REGION¹

2.1 Abstract

Here we present the first culture-independent microbiological and biogeochemical study of the mineral soils from 6000 m above sea level (m.a.s.l.) on some of the highest volcanoes in the Atacama region of Argentina and Chile. These soils experience some of the harshest environmental conditions on Earth, including daily temperature fluctuations across the freezing point (with an amplitude of up to 70°C) and intense solar radiation. Soil carbon and water levels are among the lowest yet measured for a terrestrial ecosystem, and enzyme activity was near or below detection limits for all microbial enzymes measured. The soil microbial communities were among the simplest yet studied in a terrestrial environment and contained novel Bacteria and Fungi and only one Archaeal phylotype. No photosynthetic organisms were detected, but several of the dominant bacterial phylotypes are related to organisms involved in carbon monoxide oxidation on other volcanoes (e.g., Pseudonocardia and Ktedonobacter spp.). Focused studies of a gene responsible for carbon monoxide oxidation, the large subunit of carbon monoxide dehydrogenase (coxL of CODH), revealed several novel lineages and a broad diversity of coxL genes. Overall our results suggest that a unique microbial community, sustained by diffuse atmospheric and volcanic gases, is barely functioning on these volcanoes, which represent the highest terrestrial ecosystems yet studied.
¹ Published as: Lynch, R. C., King, A. J., Farías, M. E., Sowell, P., Vitry, C., and Schmidt, S. K. (2012). The potential for microbial life in the highest elevation (>6000 m.a.s.l.) mineral soils of the Atacama region. J. Geophys. Res. 117, G02028.

2.2 Introduction

Studies of microbial life in extremely dry environments have focused mostly on low elevation areas such as the Dry Valleys of Antarctica [Cary et al., 2010] and the Atacama Desert [Connon et al., 2007; Lester et al., 2007]. Due to its status as the driest desert of the planet, the lower elevation regions of the Atacama have served as a natural testing ground for the dry-limit of microbial life [Navarro-González et al., 2003]. In the hyper-arid core of the Atacama, mean annual rainfall is less than 5 mm/year (with decadal periods of no rainfall), which appears to be below the threshold of water availability required to support soil phototrophic life [Warren-Rhodes et al., 2006]. At slightly higher elevations in the Atacama region, precipitation allows for sparse vegetation in a zone between 3000 and 4900 m.a.s.l. [Arroyo et al., 1988; Richter and Schmidt, 2002]. At elevations above 5000 m.a.s.l., extreme conditions create a Mars-like landscape (totally devoid of plant life) that receives intermittent snowfall, most of which sublimates back to the atmosphere [Richter and Schmidt, 2002]. Very little work has been done on unvegetated soils above 5000 m.a.s.l. [Schmidt et al., 2009, 2011], especially on the large stratovolcanoes that dot the Atacama region [Costello et al., 2009; Halloy, 1991]. Volcán Llullaillaco (6739 m.a.s.l.) and Volcán Socompa (6051 m.a.s.l.) are part of a chain of stratovolcanoes that comprise the Andean Central Volcanic Zone [Stern, 2004], which rise above the “true desert” zone [Arroyo et al., 1988] and altiplano of the Atacama region.
Although these volcanoes receive snowfall, they are at present largely unglaciated [Richards and Villeneuve, 2001], making their upper reaches some of the highest-elevation exposed soil and lithic environments on Earth (Figure 2.1).

Figure 2.1. Photographs of Volcán Llullaillaco (6739 m.a.s.l.) taken during the February 2009 expedition. (a) The upper plant-free zone as viewed from an elevation of ∼5400 m.a.s.l. during the climb (from the east). (b) Southeastward view toward Cerro Rosado from the high sample site at 6330 m.a.s.l. These soils were deposited at least 48,000 years ago [Richards and Villeneuve, 2001] but are still unvegetated. (Photo credit: Preston Sowell)

In spite of the interest in the Atacama region [Connon et al., 2007; Lester et al., 2007; Warren-Rhodes et al., 2006], the upper band of unvegetated mineral soil and rock that extends from 5000 to over 6700 m.a.s.l. has received little attention except from archeologists [Wilson et al., 2007]. Initial exploration of the upper unvegetated zone on Volcán Socompa in 2005 revealed a low diversity microbial community, and low levels of organic matter (0.03%) in the mineral soils at 5235 m.a.s.l. [Costello et al., 2009]. The present study was undertaken to determine if soils at significantly higher elevations (>6000 m.a.s.l.) in this region are similarly depauperate, or if the increased snowfall at higher elevations counterbalances the harsh conditions in a way that increases either diversity or activity of microbial communities. Here we report on the results of the first biogeochemical and cultivation-independent exploration of the potential for microbial activity in mineral soils above 6000 m.a.s.l. in the Andean Central Volcanic Zone. These data suggest that a low diversity, low energy ecosystem of unique and previously uncharacterized microbes may function during periodic episodes of favorable conditions.
Although oxygenic phototrophs are absent from all samples, we suggest the ecosystem has at most two trophic levels and is subsisting on both aeolian organic carbon inputs as well as chemoautotrophic CO2 fixation and trace gas oxidation.

2.3 Methods

Soil samples and data logger data were collected during the austral summer in mid-February 2009 at elevations ranging from 5500 to 6330 m.a.s.l. on Volcán Socompa and Volcán Llullaillaco. Soils used for biogeochemical and microbial diversity measurements were collected on February 14 from six spatially separated samples (to 4 cm depth) in a semi-nested sampling scheme [King et al., 2008, 2010b] at elevations of 6034 m.a.s.l. and 6330 m.a.s.l. on Volcán Llullaillaco. Soil temperatures at 4 cm depth and the soil surface were recorded every 15 min at two sites on Volcán Socompa and Volcán Llullaillaco using HOBO Pendant data loggers (UA-002-08, Onset Computer Corp., Bourne, Mass.). The data from the loggers were also used to calculate sub-zero rates of soil cooling, a parameter that can profoundly affect microbial survival in soils [Henry, 2007; Lipson et al., 2000; Schmidt et al., 2009]. Rates of sub-zero soil cooling were estimated by using linear regressions of soil temperatures after soils dropped below 0°C. The rates obtained were deemed reliable if the R² value from the regression was greater than 0.96 and the regression included at least five data points from the linear cooling period. This sampling expedition was part of a broader global study of biodiversity at high elevation sites in the Andes, Rockies, and Himalayan mountain ranges, and more information about sampling protocol and sites has been published previously [Freeman et al., 2009; King et al., 2010a, 2010b; Schmidt et al., 2011]. Dissolved organic carbon (DOC) and nitrogen (DON) and microbial biomass carbon (MBC) and nitrogen (MBN) were determined using a Shimadzu TOC-V CSN Analyzer with a previously described protocol [King et al., 2008].
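The reliability criteria for the sub-zero cooling rates can be illustrated with a short sketch. This is not the analysis code used in the study: the function name `subzero_cooling_rate`, its parameters, and the synthetic readings are hypothetical, and the sketch assumes that the input covers only the cooling phase of a single night, so that every sub-zero reading belongs to the linear cooling period.

```python
def subzero_cooling_rate(times_h, temps_c, min_points=5, min_r2=0.96):
    """Least-squares fit to the sub-zero part of a cooling curve.

    Returns (cooling rate in deg C per hour, R^2), or None when the fit
    does not meet the criteria (at least min_points, R^2 > min_r2).
    Assumes the input covers only the cooling phase.
    """
    # Keep only readings taken once the soil has dropped below 0 deg C.
    pts = [(t, c) for t, c in zip(times_h, temps_c) if c < 0]
    if len(pts) < min_points:
        return None
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_c = sum(c for _, c in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in pts)
    syy = sum((c - mean_c) ** 2 for _, c in pts)
    if sxx == 0 or syy == 0:
        return None  # degenerate data, no line can be fitted
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    if r2 <= min_r2:
        return None
    return -slope, r2  # cooling reported as a positive rate


# Synthetic logger series: one reading every 15 min (0.25 h).
times = [i * 0.25 for i in range(12)]
temps = [1.0 - 1.5 * t for t in times]
print(subzero_cooling_rate(times, temps))
```

With real logger data, the readings would first need to be trimmed to the overnight cooling period, because morning readings that are still below 0°C would otherwise enter the fit.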
Total nitrogen (N) and carbon (C) measurements were performed according to the method of Nemergut et al. [2007], wherein soils were dried and sieved to 2 mm, then ground to a fine powder and measured for percent C and N by mass using a Carlo-Erba combustion-reduction elemental analyzer (CE Elantech, USA). Soil water content was measured gravimetrically as the difference between the weight of the soils at field conditions and the weight after drying at 80°C for 48 h. Soil pH was determined using a glass pH probe (Oakton Instruments, Vernon Hills, IL, USA) in soil slurries consisting of 2 g soil and 2 ml of water that were shaken for 1 h. Levels of common microbially produced extracellular enzymes were also measured using standard techniques adapted for cold soils as described by King et al. [2008, 2010b]. Enzyme activities assayed were: N-acetylglucosaminase, cellulase (β-glucosidase), α-glucosidase, β-xylase, cellobiosidase, leucine aminopeptidase and phosphatase. For each sample, 2 g of soil was added to 150 ml of buffer (adjusted to the pH of the soil) and homogenized at 3000 rpm for 1 min using an Ultra-Turrax homogenizer (IKA Works Inc., USA). Soil slurries were incubated for 20 h at 14°C using the controls, fluorescent substrates, and volumes as described in King et al. [2008].

DNA was extracted from the soils using the MO BIO Power Soil bead beating kit (MO BIO Laboratories, Inc., Carlsbad, CA, USA). Community small-subunit ribosomal DNA was PCR amplified using the 18S/16S primers 4Fa-short (5′-ATCCGGTTGATCCTGC-3′) and 1492R (5′-GGTTACCTTGTTACGACTT-3′), and 16S primers 8F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1391R (5′-GACGGGCGGTGWGTRCA-3′). The large subunit of the carbon monoxide dehydrogenase gene (coxL) was targeted for PCR amplification using the primers Ompf (5′-GGCGGCTTYGGSAASAAGGT-3′) and O/Br (5′-YTCGAYGATCATCGGRTTGA-3′) [King, 2003b]. Amplicons were then gel purified, and cloned as described elsewhere [Schmidt et al., 2011].
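Several of the primers listed above contain degenerate positions written in IUPAC ambiguity codes (W, R, Y and S). As an illustration only, the sketch below expands a degenerate primer into the explicit sequences it represents; the helper name `expand_primer` is hypothetical and was not part of the laboratory workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(seq):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


# 1391R has two two-fold degenerate positions (W and R).
for variant in expand_primer("GACGGGCGGTGWGTRCA"):
    print(variant)
```

The number of variants is the product of the degeneracy at each position, so the two coxL primers, each with three two-fold degenerate positions, both expand to eight sequences.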
Cell pellets were sent to Functional Biosciences (Madison, WI, USA) for plasmid extraction and bidirectional Sanger sequencing. Sequences were vector trimmed and assembled into contigs using SEQUENCHER 4.6 (Gene Codes Co., Ann Arbor, MI, USA). For the ribosomal small subunit data, the full contigs were then aligned with the SINA aligner tool [Pruesse et al., 2007]. The parsimony insertion function of ARB (5.1) was then utilized to determine the nearest relatives in the Silva 108 database, which formed the basis for taxonomy assignment [Ludwig et al., 2004]. An iterative process of calculating neighbor-joining trees using the Felsenstein correction and a 35% minimum identity per residue filter in ARB, with National Center for Biotechnology Information (NCBI) web-based BLASTN homology tests [Altschul et al., 1990], was used to refine our sequence classifications. We then clustered our sequences with the select database guide sequences into 97% identity operational taxonomic units (OTUs) using the average neighbor algorithm implementation in mothur [Schloss et al., 2009]. For the coxL data set, our multiple sequence alignment (MSA) of translated amino acids was built in ClustalX (2.0) [Larkin et al., 2007], and anchored around the essential active site motifs. The ‘Form I’ (OMP) motif (AYXCSFR), which seems to be restricted to functional coxL genes [King and Weber, 2007], is 100% conserved in the MSA. A final uncorrected neighbor-joining tree with 1000 bootstrap replicates was calculated in ClustalX (2.0) after top scoring NCBI BLASTP hits were added into the MSA. Comparisons of microbial community beta diversity among sites were done using weighted UniFrac analysis [Lozupone and Knight, 2005].

2.4 Results

During our expedition in February of 2009 we were able to deploy data loggers at two high elevation sites on Volcán Socompa and Volcán Llullaillaco to gain a preliminary indication of how soil temperatures vary on a diurnal basis.
Due to time and weather restrictions, these data loggers were deployed at the highest camps on each volcano at elevations slightly lower than the sampling sites. Nonetheless, the data paint an extraordinary picture of the temperature fluctuations faced by life in these high elevation soils. Temperatures on Volcán Socompa at 5500 m.a.s.l. dropped to overnight lows of −10°C and reached highs of 56°C by midday on the soil surface (Figure 2.2a). On Volcán Llullaillaco we were only able to deploy data loggers for 16 h at 5737 m.a.s.l., but they show a similar trajectory of subfreezing overnight lows (−15°C) followed by a rapid rise in temperatures in the morning (Figure 2.2b). Linear rates of sub-zero temperature decline (at 4 cm depth) were 1.15°C h−1 (R² = 0.996) and 1.50°C h−1 (R² = 0.991) on Volcán Socompa and Volcán Llullaillaco, respectively.

Figure 2.2. Diurnal temperature fluctuations on Volcán Socompa and Volcán Llullaillaco. (a) Surface soil temperatures at base camp (5500 m.a.s.l.) on Volcán Socompa ranged from a low of −10.2°C to a high of 56.2°C with an amplitude of 66.4°C. Temperature extremes at 4 cm depth were dampened with an amplitude of 48.2°C. (b) Due to an incoming storm, we were unable to capture the full diurnal temperature cycle at high camp on Volcán Llullaillaco (5737 m.a.s.l.), but nighttime lows reached −14.5°C and −9.4°C at the surface and 4 cm depth, respectively.

Our analyses demonstrate for the first time the truly oligotrophic status of these soils, with levels of carbon similar to other almost lifeless soils. In addition, total nitrogen values were below detection limits in all samples, indicating that nitrogen levels in these soils are less than 25 μg N g soil−1. Likewise, microbial biomass C and N were extremely low (Table 2.1). Water levels in the soils at the time of sampling were also extremely low, and the soils were quite acidic (Table 2.1).
Levels of common microbial extracellular enzymes were also mostly undetectable despite the fact that methods were employed to increase the sensitivity of these measurements for cold oligotrophic soils.

Our comprehensive 16S and 18S targeted surveys of the soil community revealed a microbial community noteworthy for overall low diversity and the phylogenetic uniqueness of the component community members (Figure 2.3). The species richness Chao1 estimate for bacteria, pooled from five sample sites, is 95 OTUs (97% identity). Nearly 75% of that total bacterial diversity is contained within just four OTUs, and our sampling effort recovered representatives from only nine bacterial phyla. All between-site comparisons of community beta diversity (weighted UniFrac) showed significant differences at both 5 m and 300 m scales (P < 0.05). Of particular interest is the shift from the dominance of a Pseudonocardia-like OTU at our lower elevation (6034 m.a.s.l.) sites to the dominance of a relative of the Ktedonobacter genus at the 6330 m.a.s.l. sites.

Figure 2.3. Normalized rank-abundance plot for the three domains of life. Each bar represents a single operational taxonomic unit (OTU), and vertical bar height is proportional to the total number of sequences for all OTUs. OTUs were assembled using a 3% maximum difference as determined by the average-neighbor algorithm. Values in parentheses following OTU names are the uncorrected genetic distance to the nearest National Center for Biotechnology Information (NCBI) database match. Number of sequences: 512 Bacteria, 318 Eukaryotes, and 81 Archaea.

Table 2.1. Biogeochemical Properties of the High-Elevation Mineral Soils of Volcán Llullaillaco (a)

                                      Low Site               High Site
M.A.S.L.                              6034                   6330
UTM coordinates                       19J 0548168 7266202    19J 0547614 7266157
Percent water                         0.24 (0.1)             0.25 (0.2)
TOC (%)                               0.017 (0.006)          0.005 (0.005)
TON (%)                               <d.l. (b)              <d.l.
Extractable DOC (μg g dry soil−1)     1.3 (0.9)              2.0 (1.2)
Extractable TDN (μg g dry soil−1)     0.7 (0.4)              0.6 (0.5)
pH                                    4.2 (0.03)             4.6 (0.1)
Microbial biomass C (μg g−1)          30.61 (30.61)          58.07 (24.6)
Microbial biomass N (μg g−1)          2.24 (0.52)            1.15 (0.9)
BG (nmol h−1 g−1)                     0.24 (0.1)             0.25 (0.2)
NAG (nmol h−1 g−1)                    0.02 (0.02)            0.05 (0.006)
PHO (nmol h−1 g−1)                    0.17 (0.16)            0.26 (0.09)

(a) Enzyme activities are abbreviated as BG for β-glucosidase, NAG for N-acetylglucosaminase, and PHO for phosphatase. Activity of α-glucosidase, β-xylase, cellobiosidase, and leucine aminopeptidase was below detection limit. All values are the means of at least 3 replicates with the standard error of the mean in parentheses.
(b) Below detection limit.

Eukaryotic diversity was restricted to only seven 18S OTUs (97% identity), and 92% of the sequences recovered (>300 sequences) belonged to a single novel OTU. This dominant OTU is most closely related to endolithic and xerotolerant members of the Cryptococcus albidus clade (Figure 2.4). Archaeal diversity was limited to just one 16S OTU across all sites, which is most closely related to the obligate oligotrophs of the phylum Thaumarchaeota.

Figure 2.4. Bayesian consensus tree of basidiomycetous yeasts from Llullaillaco and Socompa volcanoes, with established representatives of the Cryptococcus clades shown for reference. The length of the rectangles is relative to the abundance of sequences within each 1% OTU, with the smallest rectangle equaling one sequence. Asterisks indicate node support of >70% posterior probability. Data are from Schmidt et al. [2012].

Absent from our data are any known chlorophyll containing clades of bacteria or algae.
The lack of traditional photoautotrophs was partially confirmed by the lack of observable autofluorescence (680 nm) using the same methods that detected very low levels of chlorophyll containing algae and cyanobacteria in high elevation soils of the Himalayas.

Given the lack of evidence for phototrophic primary production, we began a preliminary exploration of other means of carbon and energy acquisition in these soils. Sequences of the large subunit of the carbon monoxide dehydrogenase gene (coxL of CODH) from Volcán Llullaillaco soils are at minimum 5% different, and in one instance up to 22% different, compared to their nearest database relatives (Figure 2.5). These nearest relatives are for the most part uncultured representatives from other oligotrophic volcanic deposits and cultured Actinobacteria (rather than common CO oxidizing Proteobacteria), which reinforces the general phylogenetic signal from our SSU rDNA data. Additionally, despite these large genetic distances, we are confident in the coxL homology of these sequences, due to the 100% conservation of the primary catalytic site motif (as well as four other separate sites) that contacts an essential molybdopterin cytosine dinucleotide cofactor.

Figure 2.5. Carbon monoxide can be oxidized by the metalloprotein carbon monoxide dehydrogenase (CODH). Here we show an uncorrected neighbor-joining tree of the translated proteins of the catalytic coxL sequences that were PCR amplified from high-elevation Volcán Llullaillaco soils (with the nearest NCBI database relatives). These sequences may represent the first reported instances of psychrotolerant or psychrophilic CO oxidizers.

2.5 Discussion

Taken together, these results suggest that conditions in the high-mountain mineral soils above 6000 m.a.s.l. are more restrictive to life than nearly anywhere on the surface of Earth.
Despite potentially higher water availability due to orographic snowfall compared to the lower elevation portions of the Atacama, high elevations pose additional challenges to life. The thinness of the atmosphere exposes any surface life to severe solar radiation [Farías et al., 2009] and massive daily temperature cycles across the freezing point (Figure 2.2). UV exposure for only 1 day, combined with extreme aridity, has been previously shown to sterilize both monolayers of Chroococcidiopsis and dormant Bacillus endospores at just 1000 m.a.s.l. in the Atacama [Cockell et al., 2008]. Given that UV intensity increases 4–10% every 1000 m in elevation gained [Cabrera et al., 1995], our sites above 6000 m.a.s.l. may be subjected to the most UV exposure of any terrestrial soil environment studied to date. Daily temperature cycling across the freezing point is considered a key challenge that severely limits net primary productivity in the similarly extreme Dry Valleys of Antarctica [Cary et al., 2010]. At Dry Valley sites temperatures vary more than 20°C per day during the austral summer, resulting in annual net primary productivity (NPP) in the 1–20 g carbon m−2 yr−1 range [Aislabie et al., 2006; Novis et al., 2007]. During our 2009 expedition to the mountains of the Atacama region, mineral soils at 5500 m.a.s.l. experienced triple the diurnal temperature fluctuations of Antarctic Dry Valley soils (Figure 2.2). While winter Antarctic Dry Valley soil temperatures stay well below freezing, with daily minimums of −40 to −60°C, insulating mountain-top snow cover could potentially offer a dark microbial niche [Freeman et al., 2009; Ley et al., 2004], but no data currently exist for the duration and depth of snow cover (or wintertime temperatures) above 6000 m in the Atacama region.
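As a rough worked example of the UV argument above, the sketch below converts the cited 4–10% increase per 1000 m into a relative UV intensity between the 1000 m.a.s.l. site of Cockell et al. [2008] and a 6000 m.a.s.l. site. Whether the increase compounds per kilometer or accumulates linearly is not stated in the text, so both readings are computed as assumptions; the function name `relative_uv` is hypothetical, and no attempt is made to account for latitude, ozone or albedo.

```python
def relative_uv(low_m, high_m, pct_per_km, compound=True):
    """UV intensity at high_m relative to low_m for a fixed percent gain per km.

    compound=True treats the gain as multiplicative per kilometer;
    compound=False treats it as additive. Both are illustrative assumptions.
    """
    km = (high_m - low_m) / 1000.0
    rate = pct_per_km / 100.0
    if compound:
        return (1.0 + rate) ** km
    return 1.0 + rate * km


for pct in (4, 10):
    print(pct,
          round(relative_uv(1000, 6000, pct), 2),
          round(relative_uv(1000, 6000, pct, compound=False), 2))
```

Under either assumption, a site at 6000 m.a.s.l. would receive roughly 1.2 to 1.6 times the UV intensity of the 1000 m.a.s.l. site where sterilization within one day was reported.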
Previous work has also shown that the rate of freezing is an important parameter determining the survivability of microbes in cold terrestrial ecosystems [Henry, 2007; Lipson et al., 2000; Schmidt et al., 2009]. For example, Lipson et al. [2000] showed that alpine tundra microbial biomass levels were significantly depressed by cooling rates of over 1.4°C h⁻¹ (measured at the soil surface) but were largely unaffected by slower rates of soil cooling. The linear cooling rates recorded on Volcán Llullaillaco (1.50°C h⁻¹ at 4 cm depth) exceeded this 1.4°C h⁻¹ threshold and were comparable to the highest rates of subzero soil cooling yet reported (1.83°C h⁻¹), measured during the austral winter at 5400 m.a.s.l. in barren, periglacial soils of the Peruvian Andes [Schmidt et al., 2009]. The rate of soil freezing on Volcán Llullaillaco is also much faster than that measured in limited studies of high-elevation soils (5000 m.a.s.l.) in the Himalayas and Tibetan Plateau [cf. King et al., 2010a; Yang et al., 2003].

The average organic carbon value from our six sites on Volcán Llullaillaco (163 mg C g soil⁻¹) classifies these high-mountain soils as highly oligotrophic, at the low end of the range found in other extreme deserts [Drees et al., 2006; Parsons et al., 2004]. Soils on the hyper-arid desert floor of the Atacama contain organic carbon values consistently below those of the samples studied here. However, pyrolysis-GC-MS analysis of the desert floor organic carbon revealed a much simpler mixture of organic compounds than that released from living microbes [Navarro-González et al., 2003]. This, and other evidence, suggests that life is rarely if ever active in some parts of the soil in the hyper-arid core of the Atacama.
Conversely, 18-year-old volcanic deposits on Kilauea volcano of the Hawaiian archipelago reportedly contain only slightly more organic carbon (200 mg C g soil⁻¹) than the >6000 m.a.s.l. soils, yet conclusively demonstrate in situ biological uptake of CO2, CO, and H2 [King, 2003a]. Although exact ages for the parent volcanic deposits of our samples are currently undetermined, we know they are much older than 0.048 ± 0.012 Ma, based on the work of Richards and Villeneuve [2001]. On Volcán Llullaillaco the early colonizers appear to have gained a foothold but, unlike in less restrictive environments, are never supplanted by later successional communities, even after tens of thousands of years.

In addition to low TOC values for Llullaillaco soils, our estimates of microbial biomass carbon (MBC) were also extremely low (Table 1). These values are similar to those measured (using the same method) in soils of the Dry Valleys of Antarctica (26 mg C g⁻¹) [Ball et al., 2009] and high-elevation soils of the Himalayas (21 mg C g⁻¹) [Schmidt et al., 2011]. They are also lower than MBC values (140 mg C g⁻¹ averaged across many sites) in a plant-free, recently deglaciated landscape in the high Andes of Perú [King et al., 2008]. For comparison, vegetated soils usually have MBC levels that are two orders of magnitude higher than those reported here [Cleveland et al., 2004; Weintraub et al., 2007]. Another indication of the extreme nature of Llullaillaco soils is that the levels of measurable enzyme activities (Table 1) were 3 to 80 times lower than values from the driest sites studied by Zeglin et al. [2009] in the Antarctic Dry Valleys.

Aside from revealing a low-diversity community that lacks obvious phototrophs, our molecular phylogenetic analyses hint at a set of traits necessary for survival in the >6000 m.a.s.l. soil environment.
For example, the dominant actinobacterial OTU is closely related (94% identity) to Pseudonocardia asaccharolytica (Y08536), which can oxidize dimethyl sulfide (DMS) for energy [Reichert et al., 1998]. Nearer uncultured database relatives are from Icelandic and Azorean volcanic deposits (GQ495403, HM445437). Likewise, the dominant Chloroflexi lineage branches from Ktedonobacter racemifer (AM180159), a putative facultative 'carboxydovore' [Chang et al., 2011], which may be able to use carbon monoxide (CO) as an electron donor and carbon source, in addition to a wide array of organic carbon substrates [Cavaletti et al., 2006]. Other uncultured relatives are from dry Antarctic soils (FR749824, FR749772). Both of these distantly related clades seem to share a number of convergent traits that confer success in these oligotrophic environments: a mixotrophic lifestyle, filamentous morphology, and the ability to sporulate.

The extremely limited eukaryotic and archaeal diversity mirrors the organic carbon restriction, which can support only the most efficient of secondary trophic consumers. Members of the Cryptococcus albidus clade (basidiomycetous yeasts, Figure 2.4) seem well suited to this role. They have radiated widely into xeric environments, where they can occupy the endolithic niche as highly competitive heterotrophs due, in part, to abundant carbohydrate capsule production [Vishniac, 2006]. Although knowledge is limited regarding the archaeal phylum Thaumarchaeota [Brochier-Armanet et al., 2008], multiple lines of evidence suggest they can aerobically oxidize trace quantities of ammonia for energy [Könneke et al., 2005] and have a broad distribution in soil environments [Bates et al., 2011; Oline et al., 2006].

Overall, our analyses suggest that energy and carbon sources for microbial activity above 6000 m.a.s.l.
could be derived from a combination of heterotrophic respiration of aeolian-deposited organic carbon and chemoautotrophic carbon fixation driven by aerobic oxidation of ammonia, DMS, and CO. Although the energy yield from trace gas oxidation is limited, trace gases are a constantly available substrate, even in the dark deeper layers of soil and rock where microbes can avoid the massive diurnal temperature swings, rapid cooling and UV exposure of the surface environment. Additionally, even though global atmospheric CO concentrations are only in the 5–350 ppb range, proximity to fumaroles may increase CO availability on volcanoes [King, 1999; Symonds et al., 1994]. The last unofficially reported activity of Volcán Llullaillaco dates to 1887 (www.volcano.si.edu), but it is unknown whether the local atmosphere is currently being enriched with volcanic gases. Either way, our coxL data (Figure 2.5) are genetic novelties that represent either divergent natural selection driven by this unique environment or genetic drift by geographic isolation, both of which support the hypothesis that soils above 6000 m.a.s.l. harbor functioning microbial ecosystems.

As discussed above, our results suggest that an endogenous community of novel microbes may be periodically active in this understudied high-elevation setting. However, it is also possible that continuous atmospheric deposition of microbial propagules is responsible for some of the genetic diversity seen in these soils. Microbes are well known to be globally dispersed in the upper atmosphere [Darcy et al., 2011; Mladenov et al., 2011], and it is possible that there is a constant input of ice-nucleating [Christner et al., 2008] and other microbes to these soils. But the unique and extreme environmental conditions on Volcán Llullaillaco are likely to be highly selective for specific microbes.
Indeed, the limited diversity of the microbial community on Llullaillaco suggests strong selection, because the microbial groups present do not match the profiles of atmospheric microbial communities. For example, the Llullaillaco soils contain less than 1% of the common groups Betaproteobacteria, Firmicutes and Pseudomonas, and of 7 out of 10 major groups of bacteria that are abundant in atmospheric samples [Bowers et al., 2012]. Likewise, the limited fungal diversity on Llullaillaco is very different from the profile of fungal spores found in atmospheric samples; out of the 22 different fungal genera present in high-elevation atmospheric samples [Amato et al., 2007], only 2 were present on Volcán Llullaillaco. However, studies of the connection between atmospheric and terrestrial microbes are in their infancy, and much more work is needed to determine both the origin and function of the microbial communities of high-elevation soils [Meyer et al., 2004; Schmidt et al., 2011].

Like the chemosynthetic ecosystems of the deep sea and deep subsurface biosphere [e.g., Connelly et al., 2012; Lin et al., 2006], life on the Earth's highest volcanoes may not be supported by in situ photosynthesis but rather by the oxidation of gaseous substrates. Our work suggests that the highest sites on Volcán Llullaillaco are devoid of photosynthetic primary producers and contain unique microbial communities that may be partially supported by the oxidation of carbon monoxide, but more work is needed to test this hypothesis. Future research at sites above 6000 m.a.s.l. will focus on isolation of the dominant microbes from high-elevation sites, and determination of survival and growth under conditions that mimic the extreme temperature fluctuations and low energy inputs of the environment. It is expected that these organisms are reservoirs of uncharacterized biological traits that allow adaptation to the unique challenges of this dynamic and oligotrophic environment.
Deeper insight into these outer limits of biological adaptive capacity will inform our understanding of biogeochemical processes under conditions never before examined in terrestrial ecosystems. This work may also inform the search for life on other planets, especially in light of recent analyses that suggest seasonal near-surface water flow on Mars [McEwen et al., 2011].

2.6 References

Aislabie, J., K. Chhour, D. Saul, S. Miyauchi, J. Ayton, R. Paetzold, and M. Balks (2006), Dominant bacteria in soils of Marble Point and Wright Valley, Victoria Land, Antarctica, Soil Biol. Biochem., 38, 3041–3056, doi:10.1016/j.soilbio.2006.02.018.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman (1990), Basic local alignment search tool, J. Mol. Biol., 215, 403–410.
Amato, P., M. Parazols, M. Sancelme, P. Laj, G. Mailhot, and A.-M. Delort (2007), Microorganisms isolated from the water phase of tropospheric clouds at the Puy de Dôme: Major groups and growth abilities at low temperatures, FEMS Microbiol. Ecol., 59, 242–254, doi:10.1111/j.1574-6941.2006.00199.x.
Arroyo, M. T. K., F. A. Squeo, J. J. Armesto, and C. Villagrán (1988), Effects of aridity on plant diversity in the northern Chilean Andes: Results of a natural experiment, Ann. Mo. Bot. Gard., 75, 55–78, doi:10.2307/2399466.
Ball, B. A., R. A. Virginia, J. E. Barrett, A. N. Parsons, and D. H. Wall (2009), Interactions between physical and biotic factors influence CO2 flux in Antarctic Dry Valley soils, Soil Biol. Biochem., 41, 1510–1517, doi:10.1016/j.soilbio.2009.04.011.
Bates, S. T., D. Berg-Lyons, J. G. Caporaso, W. Walters, R. Knight, and N. Fierer (2011), Examining the global distribution of dominant archaeal populations in soil, ISME J., 5, 908–917, doi:10.1038/ismej.2010.171.
Bowers, R. M., I. B. McCubbin, A. G. Hallar, and N. Fierer (2012), Seasonal variability in airborne bacterial communities at a high-elevation site, Atmos. Environ., 50, 41–49, doi:10.1016/j.atmosenv.2012.01.005.
Brochier-Armanet, C., B. Boussau, S. Gribaldo, and P. Forterre (2008), Mesophilic Crenarchaeota: Proposal for a third archaeal phylum, the Thaumarchaeota, Nat. Rev. Microbiol., 6, 245–252, doi:10.1038/nrmicro1852.
Cabrera, S., S. Bozzo, and H. Fuenzalida (1995), Variations in UV radiation in Chile, J. Photochem. Photobiol., 28, 137–142, doi:10.1016/1011-1344(94)07103-U.
Cary, S. C., I. R. McDonald, J. E. Barrett, and D. A. Cowan (2010), On the rocks: The microbiology of Antarctic Dry Valley soils, Nat. Rev. Microbiol., 8, 129–138, doi:10.1038/nrmicro2281.
Cavaletti, L., P. Monciardini, R. Bamonte, P. Schumann, M. Rohde, M. Sosio, and S. Donadio (2006), New lineage of filamentous, spore-forming, gram-positive bacteria from soil, Appl. Environ. Microbiol., 72, 4360–4369, doi:10.1128/AEM.00132-06.
Chang, Y. J., et al. (2011), Non-contiguous finished genome sequence and contextual data of the filamentous soil bacterium Ktedonobacter racemifer type strain (SOSP1–21), Stand. Genomic Sci., 5, 97–111, doi:10.4056/sigs.2114901.
Christner, B. C., C. E. Morris, C. M. Foreman, R. Cai, and D. C. Sands (2008), Ubiquity of biological ice nucleators in snowfall, Science, 319, 1214, doi:10.1126/science.1149757.
Cleveland, C. C., et al. (2004), Soil microbial dynamics in Costa Rica: Seasonal and biogeochemical constraints, Biotropica, 36, 184–195.
Cockell, C. S., C. P. McKay, K. Warren-Rhodes, and G. Horneck (2008), Ultraviolet radiation-induced limitation to epilithic microbial growth in arid deserts – Dosimetric experiments in the hyperarid core of the Atacama Desert, J. Photochem. Photobiol., 90, 79–87, doi:10.1016/j.jphotobiol.2007.11.009.
Connelly, D. P., et al. (2012), Hydrothermal vent fields and chemosynthetic biota on the world's deepest seafloor spreading centre, Nat. Commun., 3, 620, doi:10.1038/ncomms1636.
Connon, S. A., E. D. Lester, H. S. Shafaat, D. C. Obenhuber, and A. Ponce (2007), Bacterial diversity in hyperarid Atacama Desert soils, J. Geophys.
Res., 112, G04S17, doi:10.1029/2006JG000311.
Costello, E. K., S. R. P. Halloy, S. C. Reed, P. Sowell, and S. K. Schmidt (2009), Fumarole-supported islands of biodiversity within a hyperarid, high-elevation landscape on Socompa Volcano, Puna de Atacama, Andes, Appl. Environ. Microbiol., 75, 735–747, doi:10.1128/AEM.01469-08.
Darcy, J. L., R. C. Lynch, A. J. King, M. S. Robeson, and S. K. Schmidt (2011), Global distribution of Polaromonas phylotypes – Evidence for a highly successful dispersal capacity, PLoS ONE, 6, e23742, doi:10.1371/journal.pone.0023742.
Drees, K. P., J. W. Neilson, J. L. Betancourt, J. Quade, D. A. Henderson, B. M. Pryor, and R. M. Maier (2006), Bacterial community structure in the hyperarid core of the Atacama Desert, Chile, Appl. Environ. Microbiol., 72, 7902–7908, doi:10.1128/AEM.01305-06.
Farías, M. E., V. Fernández-Zenoff, R. Flores, O. Ordóñez, and C. Estévez (2009), Impact of solar radiation on bacterioplankton in Laguna Vilama, a hypersaline Andean lake (4650 m), J. Geophys. Res., 114, G00D04, doi:10.1029/2008JG000784.
Freeman, K. R., A. P. Martin, D. Karki, R. C. Lynch, M. S. Mitter, A. F. Meyer, J. E. Longcore, D. R. Simmons, and S. K. Schmidt (2009), Evidence that chytrids dominate fungal communities in high-elevation soils, Proc. Natl. Acad. Sci. U. S. A., 106, 18,315–18,320, doi:10.1073/pnas.0907303106.
Halloy, S. R. P. (1991), Islands of life at 6000 m altitude: The environment of the highest autotrophic communities on Earth (Socompa Volcano, Andes), Arct. Alp. Res., 23, 247–262, doi:10.2307/1551602.
Henry, H. A. L. (2007), Soil freeze–thaw cycle experiments: Trends, methodological weaknesses and suggested improvements, Soil Biol. Biochem., 39, 977–986, doi:10.1016/j.soilbio.2006.11.017.
King, A. J., A. F. Meyer, and S. K. Schmidt (2008), High levels of microbial biomass and activity in unvegetated tropical and temperate alpine soils, Soil Biol. Biochem., 40, 2605–2610, doi:10.1016/j.soilbio.2008.06.026.
King, A. J., D. Karki, L.
Nagy, A. Racoviteanu, and S. K. Schmidt (2010a), Microbial biomass and activity in high elevation (>5100 meters) soils from the Annapurna and Sagarmatha regions of the Nepalese Himalayas, Himalayan J. Sci., 6, 11–18, doi:10.3126/hjs.v6i8.2303.
King, A. J., K. R. Freeman, K. F. McCormick, R. C. Lynch, C. A. Lozupone, R. Knight, and S. K. Schmidt (2010b), Biogeography and habitat modelling of high-alpine bacteria, Nat. Commun., 1, 53, doi:10.1038/ncomms1055.
King, G. M. (1999), Characteristics and significance of atmospheric carbon monoxide consumption by soils, Chemosphere Global Change Sci., 1, 53–63, doi:10.1016/S1465-9972(99)00021-5.
King, G. M. (2003a), Contributions of atmospheric CO and hydrogen uptake to microbial dynamics on recent Hawaiian volcanic deposits, Appl. Environ. Microbiol., 69, 4067–4075, doi:10.1128/AEM.69.7.4067-4075.2003.
King, G. M. (2003b), Molecular and culture-based analyses of aerobic carbon monoxide oxidizer diversity, Appl. Environ. Microbiol., 69, 7257–7265, doi:10.1128/AEM.69.12.7257-7265.2003.
King, G. M., and C. F. Weber (2007), Distribution, diversity and ecology of aerobic CO-oxidizing bacteria, Nat. Rev. Microbiol., 5, 107–118, doi:10.1038/nrmicro1595.
Könneke, M., A. E. Bernhard, J. R. de la Torre, C. B. Walker, J. B. Waterbury, and D. A. Stahl (2005), Isolation of an autotrophic ammonia-oxidizing marine archaeon, Nature, 437, 543–546, doi:10.1038/nature03911.
Larkin, M. A., et al. (2007), Clustal W and Clustal X version 2.0, Bioinformatics, 23, 2947–2948, doi:10.1093/bioinformatics/btm404.
Lester, E. D., M. Satomi, and A. Ponce (2007), Microflora of extreme arid Atacama Desert soils, Soil Biol. Biochem., 39, 704–708, doi:10.1016/j.soilbio.2006.09.020.
Ley, R. E., M. W. Williams, and S. K. Schmidt (2004), Microbial population dynamics in an extreme environment: Controlling factors in talus soils at 3750 m in the Colorado Rocky Mountains, Biogeochemistry, 68, 297–311, doi:10.1023/B:BIOG.0000031032.58611.d0.
Lin, L.-H., et al.
(2006), Long-term sustainability of a high-energy, low-diversity crustal biome, Science, 314, 479–482, doi:10.1126/science.1127376.
Lipson, D. A., S. K. Schmidt, and R. K. Monson (2000), Carbon availability and temperature control the post-snowmelt decline in alpine soil microbial biomass, Soil Biol. Biochem., 32, 441–448, doi:10.1016/S0038-0717(99)00068-1.
Lozupone, C., and R. Knight (2005), UniFrac: A new phylogenetic method for comparing microbial communities, Appl. Environ. Microbiol., 71, 8228–8235, doi:10.1128/AEM.71.12.8228-8235.2005.
Ludwig, W., et al. (2004), ARB: A software environment for sequence data, Nucleic Acids Res., 32, 1363–1371, doi:10.1093/nar/gkh293.
McEwen, A. S., L. Ojha, C. M. Dundas, S. S. Mattson, S. Byrne, J. J. Wray, S. C. Cull, S. L. Murchie, N. Thomas, and V. C. Gulick (2011), Seasonal flows on warm Martian slopes, Science, 333, 740–743, doi:10.1126/science.1204816.
Meyer, A. F., et al. (2004), Molecular and metabolic characterization of cold tolerant, alpine soil Pseudomonas, sensu stricto, Appl. Environ. Microbiol., 70, 483–489, doi:10.1128/AEM.70.1.483-489.2004.
Mladenov, N., et al. (2011), Dust inputs and bacteria influence dissolved organic matter in clear alpine lakes, Nat. Commun., 2, 405, doi:10.1038/ncomms1411.
Navarro-González, R., et al. (2003), Mars-like soils in the Atacama Desert, Chile, and the dry limit of microbial life, Science, 302, 1018–1021, doi:10.1126/science.1089143.
Nemergut, D. R., S. P. Anderson, C. C. Cleveland, A. P. Martin, A. E. Miller, A. Seimon, and S. K. Schmidt (2007), Microbial community succession in unvegetated, recently deglaciated soils, Microb. Ecol., 53, 110–122, doi:10.1007/s00248-006-9144-7.
Novis, P. M., D. Whitehead, E. G. Gregorich, J. E. Hunt, A. D. Sparrow, D. W. Hopkins, B. Elberling, and L. G.
Greenfield (2007), Annual carbon fixation in terrestrial populations of Nostoc commune (Cyanobacteria) from an Antarctic dry valley is driven by temperature regime, Global Change Biol., 13(6), 1224–1237, doi:10.1111/j.1365-2486.2007.01354.x.
Oline, D. K., S. K. Schmidt, and M. C. Grant (2006), Biogeography and landscape-scale diversity of the dominant Crenarchaeota of soil, Microb. Ecol., 52, 480–490, doi:10.1007/s00248-006-9101-5.
Parsons, A. N., J. E. Barrett, D. H. Wall, and R. A. Virginia (2004), Soil carbon dioxide flux in Antarctic dry valley ecosystems, Ecosystems, 7, 286–295, doi:10.1007/s10021-003-0132-1.
Pruesse, E., C. Quast, K. Knittel, B. Fuchs, W. Ludwig, J. Peplies, and F. O. Glöckner (2007), SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB, Nucleic Acids Res., 35, 7188–7196, doi:10.1093/nar/gkm864.
Reichert, K., A. Lipski, S. Pradella, E. Stackebrandt, and K. Altendorf (1998), Pseudonocardia asaccharolytica sp. nov. and Pseudonocardia sulfidoxydans sp. nov., two new dimethyl disulfide-degrading actinomycetes and emended description of the genus Pseudonocardia, Int. J. Syst. Evol. Microbiol., 48, 441–449, doi:10.1099/00207713-48-2-441.
Richards, J., and M. Villeneuve (2001), The Llullaillaco volcano, northwest Argentina: Construction by Pleistocene volcanism and destruction by sector collapse, J. Volcanol. Geotherm. Res., 105, 77–105, doi:10.1016/S0377-0273(00)00245-6.
Richter, M., and D. Schmidt (2002), Cordillera de la Atacama. Das trockenste Hochgebirge der Welt, Petermanns Geogr. Mitt., 146, 48–57.
Schloss, P. D., et al. (2009), Introducing mothur: Open-source, platform independent, community-supported software for describing and comparing microbial communities, Appl. Environ. Microbiol., 75, 7537–7541, doi:10.1128/AEM.01541-09.
Schmidt, S. K., D. R. Nemergut, A. E. Miller, K. R. Freeman, A. J. King, and A.
Seimon (2009), Microbial activity and diversity during extreme freeze-thaw cycles in periglacial soils, 5400 m elevation, Cordillera Vilcanota, Perú, Extremophiles, 13, 807–816, doi:10.1007/s00792-009-0268-9.
Schmidt, S. K., R. C. Lynch, A. J. King, D. Karki, M. S. Robeson, L. Nagy, M. W. Williams, M. S. Mitter, and K. R. Freeman (2011), Phylogeography of microbial phototrophs in the dry valleys of the high Himalayas and Antarctica, Proc. R. Soc. London, Ser. B, 278, 702–708, doi:10.1098/rspb.2010.1254.
Schmidt, S. K., C. S. Naff, and R. C. Lynch (2012), Fungal communities at the edge: Ecological lessons from high alpine fungi, Fungal Ecol., 5, 443–452, doi:10.1016/j.funeco.2011.10.005.
Stern, C. R. (2004), Active Andean volcanism: Its geologic and tectonic setting, Rev. Geol. Chile, 31, 161–206.
Symonds, R. B., W. I. Rose, G. Bluth, and T. M. Gerlach (1994), Volcanic gas studies: Methods, results, and applications, Rev. Mineral. Geochem., 30, 1–66.
Vishniac, H. S. (2006), A multivariate analysis of soil yeasts isolated from a latitudinal gradient, Microb. Ecol., 52, 90–103, doi:10.1007/s00248-006-9066-4.
Warren-Rhodes, K. A., K. L. Rhodes, S. B. Pointing, S. A. Ewing, D. C. Lacap, B. Gómez-Silva, R. Amundson, E. I. Friedmann, and C. P. McKay (2006), Hypolithic cyanobacteria, dry limit of photosynthesis, and microbial ecology in the hyperarid Atacama Desert, Microb. Ecol., 52, 389–398, doi:10.1007/s00248-006-9055-7.
Weintraub, M. N., L. E. Scott-Denton, S. K. Schmidt, and R. K. Monson (2007), The effects of tree rhizodeposition on soil exoenzyme activity, dissolved organic carbon, and nutrient availability in a subalpine forest ecosystem, Oecologia, 154, 327–338, doi:10.1007/s00442-007-0804-1.
Wilson, A. S., et al. (2007), Stable isotope and DNA evidence for ritual sequences in Inca child sacrifice, Proc. Natl. Acad. Sci. U. S. A., 104, 16,456–16,461, doi:10.1073/pnas.0704276104.
Yang, M., T. Yao, X. Gou, T. Koike, and Y.
He (2003), The soil moisture distribution, thawing–freezing processes and their effects on the seasonal transition on the Qinghai–Xizang (Tibetan) plateau, J. Asian Earth Sci., 21, 457–465, doi:10.1016/S1367-9120(02)00069-X.
Zeglin, L. H., R. L. Sinsabaugh, J. E. Barrett, M. N. Gooseff, and C. D. Takacs-Vesbach (2009), Landscape distribution of microbial activity in the McMurdo Dry Valleys: Linked biotic processes, hydrology, and geochemistry in a cold desert ecosystem, Ecosystems, 12, 562–573, doi:10.1007/s10021-009-9242-8.

CHAPTER 3

METAGENOMIC EVIDENCE FOR METABOLISM OF TRACE ATMOSPHERIC GASES BY HIGH-ELEVATION DESERT ACTINOBACTERIA²

3.1 Abstract

Previous surveys of very dry Atacama Desert mineral soils have consistently revealed sparse communities of non-photosynthetic microbes. The functional nature of these microorganisms remains debatable given the harshness of the environment and low levels of biomass and diversity. The aim of this study was to gain an understanding of the phylogenetic community structure and metabolic potential of a low-diversity mineral soil metagenome that was collected from a high-elevation Atacama Desert volcano debris field. We pooled DNA extractions from over 15 g of volcanic material and, using whole genome shotgun sequencing, observed only 75–78 total 16S rRNA gene OTUs (3% threshold). The phylogenetic structure of this community is significantly under-dispersed, with actinobacterial lineages making up 97.9–98.6% of the 16S rRNA genes, suggesting a high degree of environmental selection. Due to this low diversity and uneven community composition, we assembled and analyzed the metabolic pathways of the most abundant genome, a Pseudonocardia sp. (56–72% of total 16S genes). Our assembly and binning efforts yielded almost 4.9 Mb of Pseudonocardia sp.
contigs, which accounts for an estimated 99.3% of its non-repetitive genomic content. This genome contains a limited array of carbohydrate catabolic pathways, but encodes CO2 fixation via the Calvin cycle. The genome also encodes complete pathways for the catabolism of various trace gases (H2, CO and several organic C1 compounds) and the assimilation of ammonia and nitrate. We compared genomic content among related Pseudonocardia spp. and estimated rates of non-synonymous and synonymous nucleic acid substitutions between protein-coding homologs. Collectively, these comparative analyses suggest that the community structure and various functional genes have undergone strong selection in the nutrient-poor desert mineral soils and high-elevation atmospheric conditions.

² Published as: Lynch RC, Darcy JL, Kane NC, Nemergut DR and Schmidt SK (2014) Metagenomic evidence for metabolism of trace atmospheric gases by high-elevation desert Actinobacteria. Front. Microbiol. 5:698. doi: 10.3389/fmicb.2014.00698

3.2 Introduction

The Atacama Desert is the driest and perhaps oldest desert on Earth, where an estimated 150 My of sustained aridity and 3–4 My of hyperaridity across the central plateau have shaped the landscape (Hartley et al., 2005). The Atacama region is bounded by the Andes to the east and by the coastal mountain range and the cold-water Pacific Humboldt current to the west (Gómez-Silva et al., 2008). These barriers restrict the flow of atmospheric moisture, which in turn results in some of the most inhospitable proto-mineral soils on the planet, containing nearly undetectable organic carbon stocks and microbial biomass pools (Navarro-González et al., 2003). The eastern boundary of the region hosts large volcanoes that are situated in the leeward rain shadow of the Andes.
The upper plant-free reaches of these peaks are distinct from other, better-studied Atacama geographic zones in that the higher elevation increases rates of precipitation, yet also increases rates of evaporation, sublimation, solar incidence and freeze-thaw cycling (Schmidt et al., 2009). Despite these additional stressors, the barren high volcanic deposits are a habitat still principally limited by water availability (Costello et al., 2009). Photoatmospheric processes (e.g., lightning-derived nitrate deposition; Michalski et al., 2004) likely play defining roles in these gravel-like mineral soils, where biotic geochemical cycling is constrained to nearly undetectable levels. Although meteorological data from the high-elevation reaches of the Atacama volcanoes are sparse (Richter and Schmidt, 2002), the restrictiveness of the conditions to biological activity is manifest in the biomass levels of the mineral soils, which are barely above detection limits, as well as in microbial diversity estimates that rival the lowest ever sampled for exposed terrestrial systems (Costello et al., 2009; Lynch et al., 2012).

The physical conditions that exclude nearly all microbial life seem to have been overcome by a limited spectrum of bacterial and fungal lineages that may have evolved the capacity for in situ activity. The most abundant of these organisms are Chloroflexi and certain Actinobacteria, mainly of the Actinomycetales, Acidimicrobiales and Rubrobacterales orders (Costello et al., 2009; Lynch et al., 2012). Based on our initial molecular survey of these volcanic samples (Costello et al., 2009; Lynch et al., 2012), and work carried out in other areas of the Atacama where plant and microbial phototrophs are absent (Neilson et al., 2012), we hypothesized that chemoautotrophic microbes may be supplying organic carbon to simple, low-energy-flux communities.
Previous studies elsewhere have demonstrated the biological uptake of trace gases (CO and H2, but not CH4) in 26-year-old plant-free and carbon-limited Hawaiian volcanic deposits (King, 2003a), implying trace gases may be important energy sources where organic carbon accumulations are limited. The present metagenomic study was undertaken to develop a more comprehensive understanding of the potential metabolic traits, particularly those related to energy and nutrient acquisition, that the few community members found at the Llullaillaco Volcano study sites possess. The functional hypotheses developed through this study will be considered in light of the known environmental conditions present at these sites, and support the ongoing development of realistic growth conditions for culture-based experiments.

Here we present a shotgun metagenomic study of a low-diversity and phylogenetically under-dispersed community, composed almost exclusively of Actinobacteria (>98% of all bacteria), found in the high-elevation (>6000 m elevation) Atacama Desert volcanic deposits. By leveraging the natural low diversity of these samples with deep coverage from long-read whole metagenome shotgun sequencing, we were able to characterize the genomic makeup of the community members at a high level of detail through reference database classification of raw sequence reads. Our high sequencing depth and coverage also enabled de novo assembly-based analyses of selection through estimation of non-synonymous and synonymous mutation rates for protein-coding genes of the most abundant community member's genome.

3.3 Materials and Methods

Sample Collection and Preservation

Two snow-free mineral soil samples located approximately 5 m apart were collected from the Llullaillaco Volcano (−24.718, −68.529) at an elevation of 6034 m above sea level (m.a.s.l.) during the austral summer in mid-February 2009.
The top 4 cm of surface material, excluding rocks larger than 2 cm in diameter, were aseptically collected and frozen the same day in the field using blue ice packs. By the evening of the day the samples were collected, they were transferred to a −20°C freezer at the army barracks (on the Chile-Argentina border) near the field site. The next day they were driven (on ice in a cooler) to Salta, Argentina, where they were again placed in a −20°C freezer until they were hand carried to Colorado in a thick-walled cooler on blue ice packs. They arrived in Boulder, Colorado within 24 h of being taken out of the freezer in Salta and were still frozen upon arrival (i.e., the ice packs had not melted). The samples have since been continuously stored at −20°C. Further details regarding these and other samples collected from the Llullaillaco Volcano can be found in Lynch et al. (2012).

DNA Extraction, Sequencing, and Quality Control

We utilized a modified serial silica filter binding protocol (Fierer et al., 2012) to overcome the low DNA yields of these low-biomass samples and to avoid the potential biases introduced by random genomic amplification techniques. DNA extractions were quantified using PicoGreen dsDNA fluorometry (Thermo Fisher Scientific Inc.). We recovered 1 μg of gDNA from each of the samples, which required 10.4 g of volcanic debris from sample 1 and 4.8 g from sample 3 (Table 3.1). Negative extraction controls were run with the same batch of extraction reagents, but with no soil added. These negative control extractions were excluded from the sequencing libraries due to insufficient quantities of dsDNA. Samples were shipped to the Duke University Genome Sequencing and Analysis Core Resource, where the long-read 454 GS FLX+ platform was used to sequence randomly fragmented bulk nucleic acid extractions.

Table 3.1 Summary of sample characteristics for volcano metagenomes.
Library parsing and removal of the 454 MIDs was achieved with the sfffiles package (454 Life Sciences) and manually confirmed using the Geneious (6.1.3) viewer. Reads were trimmed so they contained no more than five bases with quality scores of 15 or lower (Cox et al., 2010). Sequence length was required to be within two standard deviations of the mean length, and no more than five ambiguous bases per read were permitted. We found very low rates of artificial read duplication (0.31 and 0.13% for the site 1 and 3 libraries, respectively; Gomez-Alvarez et al., 2009), which was tested using CD-HIT (Fu et al., 2012) with settings 1 1 3, which require 100% sequence identity and length. We used a 15-mer spectrum analysis (Supplementary Figure 1, Marçais and Kingsford, 2011) to visualize how sequencing depth relates to the total metagenomic complexity of the samples. Additional desert and non-desert metagenomes were downloaded from the MG-RAST server (Meyer et al., 2008): ID 4446153.3 and all datasets from Fierer et al. (2012).

rDNAs

A closed reference operational taxonomic unit (OTU) picking method (pick_closed_reference_otus.py, Caporaso et al., 2010) was applied to a UCLUST (Edgar, 2010) identified set of candidate 16S rRNA genes. This method overcomes the problem that the shotgun technique sequences different regions of the 16S rRNA gene. A 97% similarity was required for each candidate sequence alignment to the most current Greengenes reference dataset available (Release 13_5, McDonald et al., 2012). For the analysis of phylogenetic dispersion, previously published near full length 16S rRNA gene sequences (JX098304 to JX098810) were used to construct a maximum likelihood tree (Price et al., 2009) with the Greengenes reference dataset (13_5) clustered into 5088 OTUs at the 85% level.
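The read-level quality criteria described above (low-quality base count, ambiguous base count, and length within two standard deviations of the mean) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names are hypothetical, ambiguous bases are assumed to be encoded as N, the population standard deviation is assumed, and the low-quality criterion is applied here as a pass/fail filter rather than as the trimming step the text describes.

```python
from statistics import mean, pstdev

def passes_qc(seq, quals, mean_len, sd_len,
              max_low_q=5, low_q=15, max_ambiguous=5):
    """Return True if a read meets the three criteria quoted in the text.

    Assumptions (not from the source): 'N' marks an ambiguous base, and the
    low-quality rule is applied as a filter instead of a trim.
    """
    low_quality_bases = sum(1 for q in quals if q <= low_q)
    ambiguous_bases = sum(1 for base in seq.upper() if base == "N")
    within_length = abs(len(seq) - mean_len) <= 2 * sd_len
    return (low_quality_bases <= max_low_q
            and ambiguous_bases <= max_ambiguous
            and within_length)

def filter_reads(reads):
    """reads: list of (sequence, list_of_phred_scores) tuples."""
    lengths = [len(seq) for seq, _ in reads]
    mean_len, sd_len = mean(lengths), pstdev(lengths)
    return [r for r in reads if passes_qc(r[0], r[1], mean_len, sd_len)]

# Toy input: eight clean reads, one with six Ns, one with six low-quality
# bases, and one length outlier.
reads = ([("A" * 100, [30] * 100)] * 8
         + [("N" * 6 + "A" * 94, [30] * 100),
            ("A" * 100, [10] * 6 + [30] * 94),
            ("A" * 400, [30] * 400)])
kept = filter_reads(reads)
```

In this toy input only the eight clean reads survive; each of the other three fails exactly one criterion.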
Phylocom 4.2 (Webb et al., 2008) was used to calculate a net relatedness index (NRI) value and associated one-tailed P-values with 999 randomization iterations and null hypothesis setting 2 (sample OTUs are drawn at random from the total species pool without replacement). This null hypothesis is intended to model the homogenizing effects of long distance atmospheric transport and deposition of bacterial cells from diverse sources, with a total absence of selection. Fine scale phylogenetic trees were constructed with OTUs at the 1% level of the full length 16S sequences, determined by the QIIME pick_de_novo_otus.py workflow. SINA alignments (Pruesse et al., 2012) were built with Silva (115) reference database representatives (Quast et al., 2013) and maximum likelihood phylogenies were inferred with PhyML 3.0 (Guindon et al., 2010) using a GTR model of nucleic acid evolution.

Genetic Inventory

The SEED database (Overbeek et al., 2005) uses a hierarchical classification system where the broadest level (level 1) includes many anabolic and catabolic pathways and their associated single enzyme catalyzed intermediaries. Pairwise t-tests were used to calculate the significance of gene category count differences (level 1) between the Llullaillaco Volcano libraries and a collection of desert and non-desert metagenomes, using the pooled SD option and a Bonferroni correction for multiple comparisons (α = 0.05/(28 × 2) = 0.0009) in R (http://www.r-project.org/). Gene calls were made based on a minimum identity of 60% and a maximum e-value of 1e−5 for all BLAT alignments that were generated from MG-RAST and the SEED database.
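The Bonferroni threshold quoted above can be reproduced with a short calculation. The sketch below is in Python rather than the R used in the study. The reading that the factor of 2 corresponds to the two contrasts (volcano versus desert, volcano versus non-desert) is our assumption, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

n_categories = 28  # level 1 categories compared (from the text)
n_contrasts = 2    # assumed: volcano vs desert, volcano vs non-desert
threshold = bonferroni_alpha(0.05, n_categories * n_contrasts)

def is_significant(p_value, cutoff=threshold):
    """True if a raw P-value passes the corrected threshold."""
    return p_value < cutoff

print(round(threshold, 4))  # 0.0009
```

The unrounded value is 0.05/56, or about 0.00089, so a raw P-value of 0.0009 itself would narrowly fail the exact cutoff.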
Assembly

De novo assembly was attempted on each of the two separate Llullaillaco site metagenomes with the MIRA V3.4.0 (Chevreux et al., 1999) signal trace assembly platform using the following settings: --job=denovo,genome,accurate,454 --highlyrepetitive --noclipping -notraceinfo --fasta -project=RL1All -SK:not=46 -AS:sep=yes 454_SETTINGS -ED:ace=yes AL:mo=40:ms=30 -CL:bsqc=yes -LR:lsd=yes:ft=fastq. These settings require that each fragment addition to a contig have at least 40 high quality scoring bases of overlap and minimum quality scores of 30. They also restrict the variance of coverage levels across each contig to reflect the expectation that random shotgun sampling of each community member's genome should result in a unique coverage level that reflects its natural relative abundance in the community of genomes. This assembly approach assumes a theoretical copy number of one per unique genomic element, leading to the exclusion of repetitive elements, and also assumes that the main community members have significantly different relative abundances.

Assembly Evaluation and Annotation

Tetramer-based emergent self-organizing maps (ESOMs; http://databionicesom.sourceforge.net/) were used to help evaluate contig binning (Dick et al., 2009) in conjunction with analysis of coverage levels. Descriptions of the databionic ESOM settings and the Perl scripts used to calculate tetramer frequencies can be found at https://github.com/tetramerfreqs/binning. Consensus sequences from contigs were called with a majority rule to filter out all but the most abundant strains, and low coverage ends were trimmed. Bins of contigs that represent draft genomes and associated metadata were uploaded to the JGI IMG/ER database (Markowitz et al., 2012) for initial annotation. The phylogenetic origins of the JGI protein annotations were inspected and annotations for select coding DNA sequences (CDS) were checked manually.
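The tetramer frequencies that feed the ESOM binning step can be illustrated with a minimal Python sketch. The study used Perl scripts for this step; the version below is a simplified stand-in with hypothetical names. As stated assumptions, it counts overlapping 4-mers on one strand only, ignores windows containing non-ACGT characters, and does not split contigs into fixed-size windows or merge reverse complements, as production binning scripts often do.

```python
from collections import Counter
from itertools import product

# All 256 possible tetramers in a fixed order, so every contig maps to a
# vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetramer_frequencies(seq):
    """Return a 256-element list of relative tetramer frequencies."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only unambiguous tetramers contribute to the total.
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]

freqs = tetramer_frequencies("ACGTACGT")
```

Each contig then becomes one 256-dimensional point; contigs from the same genome tend to cluster because oligonucleotide composition is largely genome-specific, which is what the map-based binning exploits.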
Completeness of the metagenome assemblies was assessed by comparing protein family database (Punta et al., 2012) annotations to the list of conserved single copy genes (CSCGs, Rinke et al., 2013). Putative genes involved in major metabolic pathways were manually curated by evaluating blastx alignments and through literature-based refinement of functional annotations.

Comparative Genomics and Analysis of Selection

Clusters of orthologous groups (COGs, Tatusov, 1997) for the three publicly available Pseudonocardia sp. genomes were downloaded from the IMG/ER database. COG count data were subjected to hierarchical centroid clustering with Cluster 3.0 (http://bonsai.hgc.jp/mdehoon/software/cluster/software.htm) and visualized with heatmaps drawn in TreeView (Saldanha, 2004).

Even when genes share clear homologous relationships they may perform divergent functions. One way to detect the signature of divergent selection between orthologous genes is through the comparison of rates of non-synonymous (Ka) to synonymous (Ks) mutations. When selection is weak or absent, Ka:Ks ratios should be close to one, since genetic drift should have an equal chance of causing either non-synonymous or synonymous mutations. However, when divergent selection drives altered amino acid coding potential, rates of non-synonymous mutations should be elevated relative to synonymous mutations (Yang, 1998). A Perl pipeline was used to link the following steps together for an iterative Ka:Ks analysis. Pairs of candidate CDS orthologs between our best volcano Pseudonocardia sp. draft genome and the Pseudonocardia asaccharolytica (IMG ID 13496) draft genome were identified as reciprocal blastn hits (with ≥70% identity for 100 bp). Protein-guided DNA alignments were generated for each CDS pair through the TranslatorX approach (Abascal et al., 2010), which relies on Muscle (Edgar, 2004) to align predicted amino acid sequences.
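The reciprocal-hit step used to nominate candidate ortholog pairs can be sketched as follows. This is an illustration in Python, not the Perl pipeline used in the study. The tuple layout, the function names, the use of bit score to choose each query's best hit, and the reading of the 100 bp figure as a minimum alignment length are all assumptions made for the example.

```python
def best_hits(hits, min_identity=70.0, min_length=100):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, percent_identity,
                       alignment_length, bitscore) tuples.
    Hits below the identity or length thresholds are ignored.
    """
    best = {}
    for query, subject, identity, length, bitscore in hits:
        if identity < min_identity or length < min_length:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a, **thresholds):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, **thresholds)
    reverse = best_hits(b_vs_a, **thresholds)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Toy example: geneA1 and geneB1 are reciprocal; geneA2's best hit (geneB1)
# prefers geneA1, and geneA3's only hit fails the identity threshold.
a_vs_b = [("geneA1", "geneB1", 85.0, 900, 1200.0),
          ("geneA2", "geneB1", 75.0, 400, 300.0),
          ("geneA3", "geneB2", 65.0, 500, 250.0)]
b_vs_a = [("geneB1", "geneA1", 85.0, 900, 1200.0),
          ("geneB2", "geneA3", 65.0, 500, 250.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Requiring the hit to be reciprocal is a conservative choice: it discards one-directional matches that often arise from paralogs or shared domains rather than true orthology.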
Codeml (PAML 4.7, Yang, 2007) was then used to estimate rates of non-synonymous (Ka) and synonymous (Ks) nucleic acid substitutions for each ortholog pair alignment, using the WAG model of amino acid evolution. Ortholog pairs found with signatures of positive selection for amino acid substitutions (Ka:Ks ratios ≥ 1) were checked manually and annotated with a database of genes from the P. asaccharolytica draft genome using blastx.

Hydrogenase Phylogenetics

To place the [NiFe]-hydrogenase genes from our volcano Pseudonocardia sp. assembly into a broader phylogenetic context, we constructed a phylogeny using available sequence data from other studies. A broad sampling of [NiFe]-hydrogenase large subunit amino acid sequences was obtained from the list of sequences provided by Vignais and Billoud (2007), along with their subgroup annotations. Sequences for a fifth subgroup were obtained through blastn searches using our assembled sequence, as well as from Constant et al. (2010). Incomplete sequences were not included in our analysis. All amino acid sequences were aligned with ClustalW2 (Larkin et al., 2007) using default parameters, and a phylogeny was built with the neighbor-joining algorithm implemented in MEGA 6 (Tamura et al., 2013) using the Poisson model with 1000 bootstrap replications.

3.4 Results

Sequencing and rDNA Diversity

After trimming we were left with 3.85 million reads that total 1.3 Gb of DNA sequence data for downstream analysis. Each of the two site libraries contained nearly identical distributions of bacterial (99.2%), eukaryotic (0.5%) and archaeal (0.3%) reads, based on all MG-RAST annotation databases. We found a low diversity community populated mostly by Actinobacteria (Table 3.1), which make up 98.6 and 97.9% of the 16S rRNA genes from the site 1 and 3 libraries, respectively.
This highly uneven community structure is significantly under-dispersed (P < 0.001 and 0.01 for the phylogenetic randomization tests on the two samples), indicating a likely non-random assemblage of bacterial lineages. All lineages shown in Figure 3.1 belong within the Actinomycetales, other than one OTU at the 3% level belonging to the Acidimicrobiales order (Supplementary Figure 1) that makes up 15.6% of the site 3 library, but only 1.9% of the site 1 library. The Pseudonocardia are by far the most abundant lineages (72.2% of site 1 and 56.3% of site 3 total 16S reads), and the Saccharopolyspora (Pseudonocardiaceae) also make up 8.8% and 12.6% of total 16S rRNA gene reads from sites 1 and 3, respectively.

Figure 3.1 Community profile. (A) All taxonomic assignments of OTUs at the 3% level from each site that represent at least 1% of the total metagenome 16S gene reads. These 12 OTUs constitute 94.3 and 92.6% of the total 16S gene reads from sites 1 (gray bars) and 3 (black bars), respectively, and are all members of Actinomycetales other than the single Acidimicrobiales OTU. (B) Maximum likelihood phylogeny of the most abundant Pseudonocardia OTU at the 3% level, split into sub-OTUs at the 1% level. The scale bar represents 1% divergence between nucleic acid sequences.

Genetic Inventory

The Llullaillaco metagenomes show a pronounced reduction in genes associated with carbohydrate metabolism compared with other desert and non-desert metagenomes (Figure 3.2). By contrast, we found significant enrichment of pathways categorized as membrane transport, nucleotide metabolism, regulation and cell signaling, nitrogen metabolism, and virulence and defense. Examining the presence and absence of metabolic pathways within the total metagenome, we found no evidence for complete photosynthetic pathways, yet found complete gene sets for the oxidation of CO and H2, and for CO2 fixation with the Calvin cycle.
Methylotrophic pathways also suggest a role for the oxidation and assimilation of other C1 compounds, including methanol, formaldehyde, formate and perhaps methane. No nitrogen (N2) fixation or ammonia monooxygenase genes were identified, but genes for nitrate (NO3−) reduction (nitrate reductase) and ammonia (NH3) assimilation (glutamine synthetase) were found in high abundance.

Figure 3.2 Inventories of gene functional categories, comparing non-desert (gray) and desert (black) biomes to the high-elevation volcano metagenomes (blue). Asterisks indicate Bonferroni corrected significant differences (P < 0.0009) between the volcano data and desert or non-desert data (desert to non-desert comparisons not shown) for all pairwise t-tests.

Genome Assembly Results

We were able to assemble and bin contigs (Supplementary Figures 3, 4) that represent composite genomes of the most abundant Pseudonocardia sp. (Table 3.2), as well as of other lower abundance community members, such as a member of the Acidimicrobiales (Supplementary Figures 1, 3). The best Pseudonocardia assembly appears to represent a nearly complete set of non-repetitive genomic elements, since it contains 138/139 CSCGs (missing a DNA uptake competence gene, PF03772). None of the CSCGs were present in more than one copy in the metagenome assemblies, suggesting we did not greatly over-assemble this genome. The CSCGs are 139 protein coding genes that were found to occur only once in at least 90% of the 1515 finished bacterial genomes available in the IMG/ER database (Rinke et al., 2013). Within each of the new Pseudonocardia sp. assemblies, 2–3 single nucleotide polymorphisms (SNPs) were present in many of the CDS regions, which are likely indicative of strain and population level variation.

Table 3.2. Summary of metagenome Pseudonocardia sp. assemblies and nearest phylogenetic reference genome, P. asaccharolytica (JGI IMG id 13496).

COG Comparisons

COG counts from our highest quality Pseudonocardia sp.
assembly (68–115× coverage bin from site 1) and the three other publicly available genomes for named Pseudonocardia spp. (Figure 3.3) highlight some of the specific differences in genome content. We found that certain COGs, like those needed for CO oxidation, are conserved at high copy numbers across all the Pseudonocardia spp., and that COGs such as those required for assimilatory nitrate reduction and carbon fixation (RuBisCO) show relatively higher counts in both our metagenome assembly and P. asaccharolytica. Other highly abundant gene clusters within our metagenome assembly bear resemblance to the more phylogenetically distant Pseudonocardia spp. These clusters include the antibiotic producing non-ribosomal peptide synthesis pathway (NRPS), various ABC peptide importers, cytochrome P450 monooxygenase, and several recombinases.

Figure 3.3 Venn diagram of the shared and unique genes (COGs) among named Pseudonocardia spp. with complete genomes and the volcano Pseudonocardia sp. genome assembly. Although most of the 50 COGs unique to the volcano Pseudonocardia sp. are classified as “function unknown” or “general function prediction only,” the six additional defense mechanism related COGs and the nine fewer carbohydrate transport and metabolism COGs in the volcano Pseudonocardia sp. stand out as potentially relevant functional differences with other Pseudonocardia spp.

Signatures of Selection Analysis

Of the 5024 annotated CDS from the draft P. asaccharolytica genome, we were able to initially align 1722 orthologous coding sequences from our best metagenome Pseudonocardia sp. assembly with at least 70% nucleotide identity. Of these, manual inspection filtered out 462 gene pairs that were poorly aligned or were not true homologs across the entire sequence.
There were 59 remaining ortholog pairs (4.7%) with estimated Ka:Ks ratios ≥ 1, which reflects elevated rates of non-synonymous mutations brought about through strong divergent selection acting upon the amino acid sequences (Figure 3.4, Supplementary Table 1).

Figure 3.4 Distribution of Ka:Ks ratios for 1260 pairwise orthologous protein coding sequences between the best volcano Pseudonocardia sp. assembly and its closest fully-sequenced relative, Pseudonocardia asaccharolytica, showing the majority of genes (95.3%) to be under purifying or relaxed selection regimes, where synonymous substitutions that do not alter the amino acid coding potential dominate the gene. However, some outliers (4.7%) display higher levels of non-synonymous mutations (Ka:Ks ≥ 1), likely driven by divergent selection from the harsh high-elevation desert conditions. This analysis was limited to 23% of total volcano Pseudonocardia sp. genes due to the high degree of overall genomic divergence between these two species.

Characteristics of the Volcano Pseudonocardia sp. Genome

The volcano Pseudonocardia sp. genome is at least 4.9 Mb (Table 3.2) and contains many of the pathways that define the total community metabolic potential (e.g., aerobic heterotrophic metabolism, NO3− and NH3 utilization, H2 and CO oxidation, CO2 fixation and methylotrophic pathways, Figure 3.5). Many genes (33%) were found with multiple copies in the genome, suggesting a possible role for gene duplication events during the divergence of this genome. Potential carbohydrate oxidation pathways are quite limited, with genes present only for the utilization of glucose, mannose, ribose, gluconate, maltose, trehalose, lactose, and galactose that feed into the Embden-Meyerhof-Parnas pathway or the pentose phosphate pathway. Carbohydrate uptake potential is apparently even more restricted, as only a single annotated maltose ABC importer was identified.
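The counts reported for the selection analysis (1722 initial alignments, 462 removed on manual inspection, 59 pairs with Ka:Ks ≥ 1, and 23% of genes analyzed) can be cross-checked with simple arithmetic. The short Python sketch below does this; the variable names are hypothetical, and the implied total gene count at the end is a rough back-calculation from the rounded 23% figure, not a number reported in the text.

```python
initial_pairs = 1722   # orthologous CDS initially aligned
removed_pairs = 462    # removed after manual inspection
elevated_pairs = 59    # pairs with Ka:Ks >= 1

retained_pairs = initial_pairs - removed_pairs         # pairs analyzed
pct_elevated = 100 * elevated_pairs / retained_pairs   # share with Ka:Ks >= 1
pct_other = 100 - pct_elevated                         # share with Ka:Ks < 1

# Rough back-calculation (assumption): if the retained pairs are 23% of all
# volcano Pseudonocardia sp. genes, the total is on the order of 5,500.
implied_total_genes = retained_pairs / 0.23

print(retained_pairs, round(pct_elevated, 1), round(pct_other, 1))
# 1260 4.7 95.3
```

The three printed values match the 1260 pairs, 4.7% and 95.3% given in the Figure 3.4 caption.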
A complete list of putative gene annotations can be found in the IMG/ER database (id 45716).

Figure 3.5 Ecophysiological overview of the volcano Pseudonocardia sp. metabolic pathways as inferred from assembled metagenomic data. sMMO, soluble methane monooxygenase; MDH, (PQQ)-dependent methanol dehydrogenase; FDH, formaldehyde dehydrogenase; FoDH, formate dehydrogenase-O; NDH, group 5 high-affinity NiFe hydrogenase; ATPS, ATP synthase; ETC, electron transport chain; COD, form I carbon monoxide dehydrogenase; AsE, arsenite efflux; CYP, cyanate permease; CYL, cyanate lyase; AMI, ammonium importer; NAS, assimilatory nitrate reductase; NAR, respiratory nitrate reductase; NIE, nitrite extrusion protein; NIR, nitrite reductase; GS, glutamine synthetase; SPM, sulfate permease; 3PG, 3-phosphoglyceric acid; PHB, polyhydroxybutyrate; Gln, glutamine.

Hydrogenase Phylogeny Results

Our phylogenetic analysis of [NiFe]-hydrogenase sequences confirmed that the volcano Pseudonocardia sp. assembly includes a group 5 [NiFe]-hydrogenase gene (Figure 3.6). Our phylogeny resolved a monophyletic clade for hydrogenase group 5, which includes the group 5 hydrogenase sequences from Constant et al. (2010) as well as several other Actinobacterial phylotypes. [NiFe]-hydrogenase protein sequences that are most closely related to the volcano Pseudonocardia sp. came from P. asaccharolytica, Pseudonocardia spinosispora, and Actinomycetospora chiangmaiensis.

Figure 3.6 Neighbor-joining phylogenetic tree of [NiFe]-hydrogenase amino acid sequences. The phylotype from our Pseudonocardia sp. assembly (star) falls into the same clade as sequences shown in Constant et al. (2010), which are marked with circles. Sequences from other [NiFe]-hydrogenase large subunit subclades (L1–L4, Vignais and Billoud, 2007) are shown as the outgroup. Bootstrap support values are shown for nodes present in over 80% of bootstrapped trees. The scale bar represents 20% divergence between amino acid sequences.
3.5 Discussion

The conditions present in the most extreme Atacama Desert soils exclude most life and leave open the questions of if and how microbes may survive there. Previous studies of Atacama Desert soil microbiota have used either 16S gene based culture-independent approaches (Navarro-González et al., 2003; Costello et al., 2009; Lynch et al., 2012; Neilson et al., 2012), or to a limited extent culture-dependent methods (Lester et al., 2007; Okoro et al., 2009). Taken together, the pioneering work done on Atacama soils indicates that low diversity microbial communities are present at many sites, though few details have emerged regarding the origins and functional nature of these microorganisms. In this study, we used a deep metagenomic sequencing strategy to examine the structure and functional potential of the Llullaillaco Volcano microbial community (Lynch et al., 2012). Difficulty with extracting DNA from very low biomass mineral soils required us to pool roughly the equivalent of 60 standard 0.25 g soil DNA extractions to achieve the quantity of genomic DNA necessary for shotgun metagenomic sequencing. As a result, this dataset is less spatially expansive than our previous amplicon-based analysis (Lynch et al., 2012), yet still demonstrates that the low-diversity community structure extends throughout a relatively large volume of soil. Despite the limitations of this study, the approach allowed for a more thorough description of the Llullaillaco Volcano microbial community structure, and provides an initial insight into the protein coding potential of the metagenome as well as the most abundant community member's genome.

Through this approach we found an extremely low-diversity community of organisms (Figure 3.1, Table 3.1) that host an unusual inventory of functional genes (Figure 3.2), including an absence of phototrophic pathways and limited capacity for heterotrophic carbohydrate metabolism.
The low diversity community lacks many of the clades previously recovered from high-elevation air (Bowers et al., 2012) and dust (Stres et al., 2013) microbiome studies, suggesting a high degree of environmental selection that could occur during atmospheric transport to these Atacama sites, or during active or dormant residence in the mineral soils. The most abundant 16S gene OTU (Pseudonocardia sp.) recovered from the two sites used in this study (and from the third “low site” from Lynch et al., 2012) shares a relationship with Pseudonocardia sp. detected in other high elevation samples from Himalayan and Antarctic mineral soils (Rhodes et al., 2013), as well as with isolates from Icelandic volcanic deposits (Cockell et al., 2013), leaving open the possibilities that it may be native to these sites or that it could be present at the Llullaillaco Volcano sites as a consequence of atmospheric transport (Stres et al., 2013). It is noteworthy that the Acidimicrobiales OTU at the 3% level (Figure 3.1) found in this environment (15.6% of the site 3 library, and 1.9% of the site 1 library) is related to known inhabitants of fumaroles (Supplementary Figure 1, Benson et al., 2011; Itoh et al., 2011), so it is likely that at least some of the organisms present at our research sites are the result of regional wind transport from active fumaroles on nearby Socompa Volcano (Costello et al., 2009), or from as yet undiscovered fumarolic activity on Llullaillaco Volcano. Indeed, we found Acidimicrobiales 16S gene sequences identical to those from the Llullaillaco Volcano in warm fumaroles of Socompa Volcano (Costello et al., 2009). It is also possible that the presence of known fumarole inhabitants indicates that our research sites are located on soils that were originally fumarolic and that the organisms found there are relics that have survived as dormant
spores. This would explain the presence of genes for the utilization of gases that are found in fumarolic emissions (e.g., CO and H2), rather than the idea that they serve to metabolize the exceedingly low concentrations of atmospheric gases found at elevations above 6000 m.a.s.l.

Energetics

Detailed examination of the most abundant community member's genome assembly reveals unique genetic content (Figure 3.3), evidence for divergent natural selection acting on certain homologs (Figure 3.4, Supplementary Table 2) and complete metabolic pathways related to trace atmospheric substance metabolism (Figure 3.5). Unidentified soil oligotrophs have long been suspected of oxidizing ubiquitous trace gases like H2, CO, and CH4 based on evidence from bulk soil process studies (Conrad, 1996; Constant et al., 2011). Although unequivocal demonstrations of bacterial growth and cell division from trace gas metabolism have been elusive, several actinobacterial isolates have been shown to oxidize ambient H2 and CO at atmospheric concentrations (Constant et al., 2008; King, 2003b). In certain actinobacteria, ambient H2 oxidation has now been conclusively tied to the activity of high-affinity group 5 [NiFe] hydrogenases (Greening et al., 2014). [NiFe] hydrogenases are membrane-bound enzymes that catalyze the splitting of periplasmic H2, facilitating the production of a proton gradient for ATP synthesis (Figure 3.5, “NDH”). A novel group 5 [NiFe] hydrogenase gene set is present in our genome assembly of the most abundant volcano Pseudonocardia sp. (Figure 3.6), indicating that the dominant organism at this site likely has the ability to utilize atmospheric concentrations of H2 (0.53 ppmv at sea level, but about 0.24 ppmv at 6000 m.a.s.l.) for energy production. Greening et al.
(2014) also found that Mycobacterium smegmatis group 5 [NiFe] hydrogenase expression levels increased under carbon starvation conditions, implicating the oxidation of H2 as a source of electrons during low metabolic states. Given the low levels of organic carbon measured at the volcano sites (Table 3.1), and the phylogenetic affiliation between the group 5 volcano Pseudonocardia sp. [NiFe] hydrogenase and the M. smegmatis group 5 [NiFe] hydrogenase (sharing 80% amino acid identity) studied by Greening et al. (2014), oxidation of trace H2 seems to be a plausible energy source for the new Pseudonocardia sp.

However, [NiFe] hydrogenase genes are not the only genes we observed that could be used to metabolize atmospheric substrates. Previous studies have correlated a widespread occurrence of carbon monoxide dehydrogenase genes with soil CO uptake (King, 2003a; Weber and King, 2010; Quiza et al., 2014), and various soil bacterial isolates have been confirmed to oxidize CO at atmospheric concentrations (<400 ppbv at sea level, Hardy and King, 2001; King, 2003b). Carbon monoxide dehydrogenase functions similarly to [NiFe] hydrogenase, in that it is a membrane-bound enzyme that facilitates the generation of a proton gradient. In this case, the enzyme oxidizes CO and reduces H2O, forming CO2 and two periplasmic protons (Figure 3.5, “COD”). M. smegmatis has been shown to be capable of trace CO uptake, and hosts canonical type I carbon monoxide dehydrogenase genes (Quiza et al., 2014), similar to the CO dehydrogenase genes present in the volcano Pseudonocardia sp. assembly. However, it is not yet clear how this activity affects cellular physiology. It is likely that tropospheric CO oxidation is often a supplemental energy source, contributing to a mixotrophic metabolism (King and Weber, 2007).
Thus, physiological work focused on high-affinity CO oxidizing bacteria must carefully consider the possible requirements and roles of organic carbon sources, in addition to tracking low-concentration CO uptake (King and King, 2014).

The volcano Pseudonocardia sp. genome encodes complete pathways for the oxidation and assimilation of methanol, formaldehyde, and formate (Figure 3.5). The atmosphere contains very low concentrations of these gases, mainly due to plant volatile emission and photochemical reactions (Hu et al., 2011; Stavrakou et al., 2011; Luecken et al., 2012). The study of bacterial metabolism of atmospheric concentrations of these C1 compounds is limited, although efforts are underway to develop an understanding of the distributions of methylotrophs and how they influence the global methanol cycle (Kolb and Stacheter, 2013). Furthermore, some evidence suggests that various Actinobacteria (e.g., Streptococcus and Rhodococcus spp., Yoshida et al., 2007) are capable of “CO2 dependent oligotrophic growth” under laboratory carbon starvation conditions by oxidizing ambient methanol and formaldehyde (Yoshida et al., 2011), suggesting these C1 gases can be atmospheric sources of energy and carbon for some bacteria.

Methane is the most abundant of the trace gases at 1.79 ppmv (or 0.80 ppmv at 6000 m.a.s.l.), so it would seem to be a likely target for trace gas oxidizers. However, the Llullaillaco Volcano metagenome lacks any identifiable particulate methane monooxygenase (pMMO) genes, which have been previously identified as likely coding for the high-affinity methane oxidation enzymes in various soils (Bull et al., 2000; Kolb, 2009). Likewise, the study of early-successional Kilauea Volcano soils by King (2003a) detected CO and H2 uptake, but not CH4. Yet the volcano Pseudonocardia sp.
does encode all genes required for a putative iron-dependent soluble methane monooxygenase (sMMO) enzyme that could function to oxidize methane to methanol, which would then be fed into the above-mentioned methylotrophic pathways. sMMOs are notoriously non-specific enzymes (Green and Dalton, 1989), and atmospheric concentrations of methane have not yet been reported to support bacterial growth (Theisen and Murrell, 2005; Conrad, 2009). Nevertheless, the evidence for widespread ambient methane oxidation (McDonald et al., 2008) and experimental confirmation of methane oxidation by members of the phylum Verrucomicrobia (Dunfield et al., 2007) illustrate the continued need to explore the phylogenetic and geographic distributions of methane oxidizers.

Given the presence of these various gas utilization pathways in the volcano Pseudonocardia sp. genome (Figure 3.5), and the constant availability of these substrates at low concentrations in the atmosphere, the high-elevation volcanic deposit community may rely on a mixture of diffuse atmospheric substrates in the absence of direct photosynthetic inputs to at least maintain redox balance, or perhaps even to drive carbon fixation. However, it is important to note that the volcano Pseudonocardia sp. shares nearly all of these aforementioned trace gas oxidation pathways (Figures 3.5, 3.6) with P. asaccharolytica, its nearest phylogenetic relative (Figure 3.1). P. asaccharolytica does lack a (PQQ)-dependent methanol dehydrogenase gene, but these were present in other Pseudonocardia spp. (Figure 3.3). While no studies to date have tested P. asaccharolytica for trace gas metabolism either in situ or in culture (Reichert et al., 1998), the trace gas metabolism related genes common to the P. asaccharolytica and the volcano Pseudonocardia sp. genomes have been shown to confer trace gas metabolism capacity in other bacteria (Figure 3.6), making it a plausible trait shared by various members of this genus.
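The paired sea-level and 6000 m concentrations quoted earlier in this section for H2 (0.53 and 0.24 ppmv) and CH4 (1.79 and 0.80 ppmv) can be compared with a short calculation. The Python sketch below shows that both pairs imply nearly the same reduction factor of about 0.45. That this factor reflects scaling by air pressure is our assumption, not something stated in the text, and the isothermal barometric approximation with a 7600 m scale height is likewise an illustrative assumption.

```python
import math

# Values quoted in the text (ppmv).
sea_level_ppmv = {"H2": 0.53, "CH4": 1.79}
at_6000m_ppmv = {"H2": 0.24, "CH4": 0.80}

# Reduction factor implied by each pair of quoted values.
ratios = {gas: at_6000m_ppmv[gas] / sea_level_ppmv[gas]
          for gas in sea_level_ppmv}

def pressure_fraction(elevation_m, scale_height_m=7600.0):
    """Fraction of sea-level pressure under an isothermal atmosphere.

    scale_height_m is an assumed round value, not taken from the source.
    """
    return math.exp(-elevation_m / scale_height_m)

for gas, ratio in ratios.items():
    print(gas, round(ratio, 2))            # both round to 0.45
print(round(pressure_fraction(6000), 2))   # 0.45
```

Under these assumptions the 6000 m figures are best read as sea-level-equivalent values reflecting reduced air density, since the mixing ratio of a well-mixed trace gas is itself roughly constant with elevation.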
Consequently, the relevance of trace gas utilization as a potential metabolic strategy in the harsh Atacama Desert mineral soils of this study is difficult to interpret, since trace gas metabolism genes are not exclusive to Pseudonocardia sp. recovered from desert environments.

Atmospheric gas metabolism is not mutually exclusive with other trophic strategies. The volcano Pseudonocardia sp. hosts fully encoded aerobic heterotrophic and autotrophic carbon acquisition pathways, and several energy storage pathways (Figure 3.5). The large and small RuBisCO subunit genes of the volcano Pseudonocardia sp. both cluster within the form IC clade, which contains other known bacterial facultative autotrophs (Yuan et al., 2012), including various Actinobacteria such as P. asaccharolytica, further suggesting a flexibility in carbon and energy acquisition physiology. It is certainly possible that this organism is opportunistic, capable of survival at low metabolic rates through the utilization of a variety of low-concentration and constantly replenished atmospheric gases, but perhaps also capable of capitalizing on pulses of other multi-carbon nutrients and water when they become available, such as after a snow melt event. Further understanding of the environmental conditions and how they vary through annual cycles at these difficult-to-access field sites, combined with direct experimental growth assays, will be required to test if and how this bacterium, or other members of the community, may grow under, and respond to, variable and stressful conditions.
Stress Tolerance and Other Traits

Metabolism of various trace atmospheric substrates may be an important adaptation to survival in the harsh and nutrient limited desert volcano environment, but the reduced and under-dispersed phylogenetic diversity of the microbial community (Figure 3.1, Table 3.1) suggests that other traits must be important for fitness, given that H2 and CO oxidizing genes are present in many species of several bacterial phyla. Actinobacteria have a seemingly ubiquitous distribution across varied terrestrial and aquatic environments (Dinsdale et al., 2008), but are relatively most abundant in cold-desert soil environments (Fierer et al., 2012). Some obvious traits of the actinobacteria are likely linked to desert fitness, such as Gram-positive cell wall architecture, which is perhaps an original adaptation to ancient terrestrial colonization (Battistuzzi and Hedges, 2009; Rinke et al., 2013), and the ability of many lineages to sporulate. However, given the metabolic diversity and rapid genomic evolution found within this phylum (Zaneveld et al., 2010), the full scope of desert actinobacteria traits remains largely uncharacterized.

The volcano Pseudonocardia sp. assembly contains COGs with relatively high copy number compared to other species of the genus that could possibly underlie stress tolerance adaptations, including DNA replication and repair machinery, transcriptional regulators, response regulators, cytochrome P450, arabinose efflux permeases, ABC-type multidrug transport systems and non-ribosomal peptide synthesis pathways (NRPS). It is not possible to determine the exact functional roles these genes play without experimental confirmation, but it is conceivable they could be linked to adaptations to the stresses of wet-dry or freeze-thaw cycling or UV exposure. The multiple copies (≥18) of the NRPS genes are notable because they share sequence homology most similar to the antibiotic gramicidin D gene set (Kessler et al., 2004).
Considering the known importance of extrapolymeric substance production as a xerotolerance trait for many microorganisms (Lennon et al., 2012), and the presence of arabinose and polysaccharide export genes in the volcano Pseudonocardia sp. genome, it is not surprising that investment in antibiotic defense mechanisms that may ward off scavengers of these vulnerable carbon sources (e.g., fungi, Schmidt et al., 2012) may also be necessary. We compared all well aligned homologs between the volcano Pseudonocardia sp. and P. asaccharolytica in order to identify how selection may have affected the amino acid sequences (and functions) of certain genes. P. asaccharolytica was isolated from a dimethyl sulfide and tree-bark biofilter enrichment experiment (Reichert et al., 1998), but little else is known about its ecology or physiology other than the lack of ability to oxidize any of the single carbohydrates tested in the original report, and that it can be grown at moderate rates on TSA media at mesophilic temperatures. Our analysis identified 59 volcano Pseudonocardia sp. genes (4.7% of all analyzed homolog pairs, Supplementary Table 1) that have higher rates of non-synonymous mutations when compared to their homolog in Pseudonocardia asaccharolytica (Ka:Ks ≥ 1), a pattern consistent with evolution under a strong divergent selection regime (Figure 3.4). These genes fall into categories of protein translation (four tRNA methyltransferase modification enzymes and a ribosomal modulation protein), respiration (succinate dehydrogenase), energy storage (acyl CoA dehydrogenase) and membrane transport (polysaccharide, multidrug, potassium, phosphate and cyanate). Other annotations of genes found with a Ka:Ks ratio ≥ 1 are more difficult to interpret, such as 13 uncharacterized conserved proteins and three transposases, but underscore the potential for discovery of novel microbial traits from understudied environments and taxa.
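To make the Ka:Ks comparison described above concrete, the following Python sketch estimates Ka and Ks for one pair of aligned, in-frame coding sequences using the simple Nei-Gojobori counting approach with a Jukes-Cantor correction. It is an illustrative stand-in and not necessarily the procedure used for the analysis reported here, which may have relied on likelihood-based estimates. The function names (syn_sites, count_diffs, ka_ks), the toy sequences, and the decision to treat changes to stop codons as non-synonymous are assumptions made for illustration only.

```python
from itertools import permutations
from math import log

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in one codon: per position, the fraction of the
    three possible single-base changes that keep the amino acid."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def count_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over every order in which the differing bases could change."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = non = 0.0
    for path in paths:
        current = c1
        for pos in path:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
    return syn / len(paths), non / len(paths)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    n = min(len(seq1), len(seq2))
    S = N = sd = nd = 0.0
    for i in range(0, n - n % 3, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons with gaps or ambiguous bases
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        d_syn, d_non = count_diffs(c1, c2)
        sd += d_syn
        nd += d_non
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or non-synonymous sites")

    def jukes_cantor(p):
        # Raises ValueError when p >= 0.75 (sequences too divergent).
        return -0.75 * log(1 - 4 * p / 3)

    return jukes_cantor(nd / N), jukes_cantor(sd / S)


# Hypothetical toy pair: one silent change (F -> F) and one
# replacement (K -> E); real input would be a full gene alignment.
ka, ks = ka_ks("TTTCTGAAA", "TTCCTGGAA")
print(ka, ks, ka / ks >= 1)
```

A gene pair would be flagged in the sense used above when the ratio of the two returned values is at least 1. Pairs with Ks equal to zero, or with too few aligned codons, give undefined or unstable ratios and would need to be handled separately.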
Although this analysis cannot determine the particulars of how these genes differ in terms of the reaction kinetics or substrate specificities of the enzymes they code for, functions like membrane transport and energy storage could plausibly underlie important survival traits for conditions in the nutrient limited high-elevation volcanic deposits of this study. Another interesting aspect of the Ka:Ks ratio analysis is that only 23% of total volcano Pseudonocardia sp. protein coding genes could be unambiguously aligned to homologs from P. asaccharolytica. The remaining 77% of genes are too divergent to analyze with this method. This limits the power of the analysis somewhat, but highlights the genetic novelty of each of these organisms, and suggests that further genomic and culture work on the Pseudonocardia spp. is warranted. We find the most abundant genome in the community is intermediately sized (4.9 Mb, not including highly repetitive content, Table 3.2), and codes for diverse metabolic potential. This size is not unexpected though, as work by Konstantinidis and Tiedje (2004) shows evidence that heterogeneous, variable, and low nutrient niches in soils select for larger genomes, which often contain enhanced regulatory and secondary metabolite synthesis pathways. Barberán et al. (2014) recently expanded this concept by showing that, to some extent, genome size is a reflection of the complexity and variability of terrestrial bacterial niches. Thus, even though utilization of low concentration atmospheric substrates may be an important trait for the volcano Pseudonocardia sp., we did not expect to find signatures of genome streamlining, as have been documented in oceanic bacteria that specialize in low concentration nutrient uptake (Giovannoni et al., 2014).
Given the variability of a high mountain top environment (Lynch et al., 2012) that experiences frequent wet-dry and freeze-thaw cycling stresses (Stres et al., 2010), we are not surprised to find significantly higher numbers of genes classified in the regulation and cell signaling categories in the total metagenome (Figure 3.2), as well as specific examples of transcription and response regulator genes with high copy numbers, and with high Ka:Ks ratios in the genome of the most abundant community member.
Conclusions
The functional inferences drawn from this culture-independent study can now serve as testable hypotheses for ongoing culture-based experiments. Although a modest collection of bacteria and fungi have been cultured and isolated from these volcano samples using a variety of selection techniques (unpublished), the most abundant lineages observed from culture-independent approaches have thus far resisted isolation. Nevertheless, the results we present here can inform future culture-based physiological analyses by providing information on potential electron donors and growth conditions. The atmosphere interfaces with diverse terrestrial and aquatic environments, so it is possible that the pathways and signatures of selection we have detected result from activity and replication elsewhere. Selective dispersal and dormancy processes cannot be ruled out either; perhaps we have recovered genomic material from the most well-dispersing or longest surviving spores. Although there is little evidence to suggest that the most abundant organism from the Llullaillaco Volcano study sites is native to another environment, or is an exceptional spore producer, these are possibilities that cannot yet be rejected, especially considering the evidence for wind-borne transport of other lower abundance lineages of the community (Supplementary Figure 1).
Overall, our initial analyses of these metagenomes indicate that despite, or perhaps because of, the intense solar radiation, this sparsely populated high-elevation microbial community lacks endogenous photosynthesizing primary producers, but possesses the genetic potential for utilization of various low molecular weight atmospheric substrates and CO2 fixation. This seems to support our hypothesis that chemoautotrophic, rather than photoautotrophic, microbes may be supplying organic carbon to simple and low-energy flux communities at these sites, but does not allow us to determine the relative roles that heterotrophic or mixotrophic metabolism may play. Bacterial growth on trace gases and aerosols is difficult to study and can likely support only low rates of metabolism. Answering whether or not the intriguing combination of metabolic pathways found in the volcano Pseudonocardia sp. genome indicates an actual dependency for growth on one or more atmospheric substrates requires direct physiological experimentation at relevant gas concentrations. These pathways could also be supplemental to more standard heterotrophic metabolism, and may not by themselves support growth and cell division. Future studies of these high-elevation actinobacteria and their relatives (Cockell et al., 2013; Rhodes et al., 2013) should consider the possibility that a mixture of atmospheric, precipitation and soil derived substrates may be required for growth, or that these organisms are but remnants of extinct ecosystems or windblown transients.
3.6 References
Abascal, F., Zardoya, R., and Telford, M. J. (2010). TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38, 7–13. doi: 10.1093/nar/gkq291
Barberán, A., Ramirez, K. S., Leff, J. W., Bradford, M. A., Wall, D. H., and Fierer, N. (2014). Why are some microbes more ubiquitous than others? Predicting the habitat breadth of soil bacteria. Ecol. Lett. 17, 794–802.
doi: 10.1111/ele.12282
Battistuzzi, F. U., and Hedges, S. B. (2009). A major clade of prokaryotes with ancient adaptations to life on land. Mol. Biol. Evol. 26, 335–343. doi: 10.1093/molbev/msn247
Benson, C. A., Bizzoco, R. W., Lipson, D. A., and Kelley, S. T. (2011). Microbial diversity in non-sulfur, sulfur and iron geothermal steam vents. FEMS Microbiol. Ecol. 76, 74–88. doi: 10.1111/j.1574-6941.2011.01047.x
Bowers, R. M., McCubbin, I. B., Hallar, A. G., and Fierer, N. (2012). Seasonal variability in airborne bacterial communities at a high-elevation site. Atmos. Environ. 50, 41–49. doi: 10.1016/j.atmosenv.2012.01.005
Bull, I. D., Parekh, N. R., Hall, G. H., Ineson, P., and Evershed, R. P. (2000). Detection and classification of atmospheric methane oxidizing bacteria in soil. Nature 405, 175–178. doi: 10.1038/35012061
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336. doi: 10.1038/nmeth.f.303
Chevreux, B., Wetter, T., and Suhai, S. (1999). Genome sequence assembly using trace signals and additional sequence information. Comput. Sci. Biol. 99, 45–56.
Cockell, C. S., Kelly, L. C., and Marteinsson, V. (2013). Actinobacteria—an ancient phylum active in volcanic rock weathering. Geomicrobiol. J. 30, 706–720. doi: 10.1080/01490451.2012.758196
Conrad, R. (1996). Soil microorganisms as controllers of atmospheric trace gases (H2, CO, CH4, OCS, N2O, and NO). Microbiol. Rev. 60:609.
Conrad, R. (2009). The global methane cycle: recent advances in understanding the microbial processes involved. Environ. Microbiol. Rep. 1, 285–292. doi: 10.1111/j.1758-2229.2009.00038.x
Constant, P., Chowdhury, S. P., Hesse, L., Pratscher, J., and Conrad, R. (2011). Genome data mining and soil survey for the novel group 5 [NiFe]-hydrogenase to explore the diversity and ecological importance of presumptive high-affinity H2-oxidizing bacteria.
Appl. Environ. Microbiol. 77, 6027–6035. doi: 10.1128/AEM.00673-11
Constant, P., Chowdhury, S. P., Pratscher, J., and Conrad, R. (2010). Streptomycetes contributing to atmospheric molecular hydrogen soil uptake are widespread and encode a putative high-affinity [NiFe]-hydrogenase. Environ. Microbiol. 12, 821–829. doi: 10.1111/j.1462-2920.2009.02130.x
Constant, P., Poissant, L., and Villemur, R. (2008). Isolation of Streptomyces sp. PCB7, the first microorganism demonstrating high-affinity uptake of tropospheric H2. ISME J. 2, 1066–1076. doi: 10.1038/ismej.2008.59
Costello, E. K., Halloy, S. R. P., Reed, S. C., Sowell, P., and Schmidt, S. K. (2009). Fumarole-supported islands of biodiversity within a hyperarid, high-elevation landscape on Socompa Volcano, Puna de Atacama, Andes. Appl. Environ. Microbiol. 75, 735–747. doi: 10.1128/AEM.01469-08
Cox, M. P., Peterson, D. A., and Biggs, P. J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11:485. doi: 10.1186/1471-2105-11-485
Dick, G. J., Andersson, A. F., Baker, B. J., Simmons, S. L., Thomas, B. C., Yelton, A. P., et al. (2009). Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10:R85. doi: 10.1186/gb-2009-10-8-r85
Dinsdale, E. A., Edwards, R. A., Hall, D., Angly, F., Breitbart, M., Brulc, J. M., et al. (2008). Functional metagenomic profiling of nine biomes. Nature 452, 629–632. doi: 10.1038/nature06810
Dunfield, P. F., Yuryev, A., Senin, P., Smirnova, A. V., Stott, M. B., Hou, S. B., et al. (2007). Methane oxidation by an extremely acidophilic bacterium of the phylum Verrucomicrobia. Nature 450, 879–882. doi: 10.1038/nature06411
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. doi: 10.1093/nar/gkh340
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461.
doi: 10.1093/bioinformatics/btq461
Fierer, N., Leff, J. W., Adams, B. J., Nielsen, U. N., Thomas, S., Lauber, C. L., et al. (2012). Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proc. Natl. Acad. Sci. U.S.A. 109, 21390–21395. doi: 10.1073/pnas.1215210110
Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152. doi: 10.1093/bioinformatics/bts565
Giovannoni, S. J., Cameron Thrash, J., and Temperton, B. (2014). Implications of streamlining theory for microbial ecology. ISME J. 8, 1553–1565. doi: 10.1038/ismej.2014.60
Gomez-Alvarez, V., Teal, T. K., and Schmidt, T. M. (2009). Systematic artifacts in metagenomes from complex microbial communities. ISME J. 3, 1314–1317. doi: 10.1038/ismej.2009.72
Gómez-Silva, B., Rainey, F. A., Warren-Rhodes, K. A., McKay, C. P., and Navarro-González, R. (2008). “Atacama desert soil microbiology,” in Microbiology of Extreme Soils, eds P. Dion and C. S. Nautiyal (Berlin; Heidelberg: Springer-Verlag), 117–132.
Green, J., and Dalton, H. (1989). Substrate specificity of soluble methane monooxygenase. J. Biol. Chem. 264, 17698–17703.
Greening, C., Berney, M., Hards, K., Cook, G. M., and Conrad, R. (2014). A soil actinobacterium scavenges atmospheric H2 using two membrane-associated, oxygen-dependent [NiFe] hydrogenases. Proc. Natl. Acad. Sci. U.S.A. 111, 4257–4261. doi: 10.1073/pnas.1320586111
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321. doi: 10.1093/sysbio/syq010
Hardy, K. R., and King, G. M. (2001). Enrichment of high-affinity CO oxidizers in Maine forest soil. Appl. Environ. Microbiol. 67, 3671–3676. doi: 10.1128/AEM.67.8.3671-3676.2001
Hartley, A., Chong, G., Houston, J., and Mather, A. E. (2005).
150 million years of climatic stability: evidence from the Atacama Desert, northern Chile. J. Geol. Soc. 162, 421–424. doi: 10.1144/0016-764904-071
Hu, L., Millet, D. B., Mohr, M. J., Wells, K. C., Griffis, T. J., and Helmig, D. (2011). Sources and seasonality of atmospheric methanol based on tall tower measurements in the US Upper Midwest. Atmos. Chem. Phys. 11, 11145–11156. doi: 10.5194/acp-11-11145-2011
Itoh, T., Yamanoi, K., Kudo, T., Ohkuma, M., and Takashina, T. (2011). Aciditerrimonas ferrireducens gen. nov., sp. nov., an iron-reducing thermoacidophilic actinobacterium isolated from a solfataric field. Int. J. Syst. Evol. Microbiol. 61, 1281–1285. doi: 10.1099/ijs.0.023044-0
Kessler, N., Schuhmann, H., Morneweg, S., Linne, U., and Marahiel, M. A. (2004). The linear pentadecapeptide gramicidin is assembled by four multimodular nonribosomal peptide synthetases that comprise 16 modules with 56 catalytic domains. J. Biol. Chem. 279, 7413–7419. doi: 10.1074/jbc.M309658200
King, C. E., and King, G. M. (2014). Description of Thermogemmatispora carboxidivorans sp. nov., a carbon-monoxide-oxidizing member of the class Ktedonobacteria isolated from a geothermally heated biofilm, and analysis of carbon monoxide oxidation by members of the class Ktedonobacteria. Int. J. Syst. Evol. Microbiol. 64, 1244–1251. doi: 10.1099/ijs.0.059675-0
King, G. M. (2003a). Contributions of atmospheric CO and hydrogen uptake to microbial dynamics on recent Hawaiian volcanic deposits. Appl. Environ. Microbiol. 69, 4067–4075. doi: 10.1128/AEM.69.7.4067-4075.2003
King, G. M. (2003b). Uptake of carbon monoxide and hydrogen at environmentally relevant concentrations by mycobacteria. Appl. Environ. Microbiol. 69, 7266–7272. doi: 10.1128/AEM.69.12.7266-7272.2003
King, G. M., and Weber, C. F. (2007). Distribution, diversity and ecology of aerobic CO-oxidizing bacteria. Nat. Rev. Microbiol. 5, 107–118. doi: 10.1038/nrmicro1595
Kolb, S. (2009).
The quest for atmospheric methane oxidizers in forest soils. Environ. Microbiol. Rep. 1, 336–346. doi: 10.1111/j.1758-2229.2009.00047.x
Kolb, S., and Stacheter, A. (2013). Prerequisites for amplicon pyrosequencing of microbial methanol utilizers in the environment. Front. Microbiol. 4:268. doi: 10.3389/fmicb.2013.00268
Konstantinidis, K. T., and Tiedje, J. M. (2004). Trends between gene content and genome size in prokaryotic species with larger genomes. Proc. Natl. Acad. Sci. U.S.A. 101, 3160–3165. doi: 10.1073/pnas.0308653100
Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948. doi: 10.1093/bioinformatics/btm404
Lennon, J. T., Aanderud, Z. T., Lehmkuhl, B. K., and Schoolmaster, D. R. (2012). Mapping the niche space of soil microorganisms using taxonomy and traits. Ecology 93, 1867–1879. doi: 10.1890/11-1745.1
Lester, E. D., Satomi, M., and Ponce, A. (2007). Microflora of extreme arid Atacama Desert soils. Soil Biol. Biochem. 39, 704–708. doi: 10.1016/j.soilbio.2006.09.020
Luecken, D. J., Hutzell, W. T., Strum, M. L., and Pouliot, G. A. (2012). Regional sources of atmospheric formaldehyde and acetaldehyde and implications for atmospheric modeling. Atmos. Environ. 47, 477–490. doi: 10.1016/j.atmosenv.2011.10.005
Lynch, R. C., King, A. J., Farías, M. E., Sowell, P., Vitry, C., and Schmidt, S. K. (2012). The potential for microbial life in the highest elevation (>6000 m.a.s.l.) mineral soils of the Atacama region. J. Geophys. Res. 117, G02028. doi: 10.1029/2012JG001961
Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770. doi: 10.1093/bioinformatics/btr011
Markowitz, V. M., Chen, I.-M. A., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., et al. (2012). IMG: the Integrated Microbial Genomes database and comparative analysis system.
Nucleic Acids Res. 40, 115–122. doi: 10.1093/nar/gkr1044
McDonald, D., Price, M. N., Goodrich, J., Nawrocki, E. P., DeSantis, T. Z., Probst, A., et al. (2012). An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618. doi: 10.1038/ismej.2011.139
McDonald, I. R., Bodrossy, L., Chen, Y., and Murrell, J. C. (2008). Molecular ecology techniques for the study of aerobic methanotrophs. Appl. Environ. Microbiol. 74, 1305–1315. doi: 10.1128/AEM.02233-07
Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E. M., Kubal, M., et al. (2008). The metagenomics RAST server: a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386. doi: 10.1186/1471-2105-9-386
Michalski, G., Böhlke, J. K., and Thiemens, M. (2004). Long term atmospheric deposition as the source of nitrate and other salts in the Atacama Desert, Chile: new evidence from mass-independent oxygen isotopic compositions. Geochim. Cosmochim. Acta 68, 4023–4038. doi: 10.1016/j.gca.2004.04.009
Navarro-González, R., Rainey, F. A., Molina, P., Bagaley, D. R., Hollen, B. J., de la Rosa, J., et al. (2003). Mars-like soils in the Atacama Desert, Chile, and the dry limit of microbial life. Science 302, 1018–1021. doi: 10.1126/science.1089143
Neilson, J. W., Quade, J., Ortiz, M., Nelson, W. M., Legatzki, A., Tian, F., et al. (2012). Life at the hyperarid margin: novel bacterial diversity in arid soils of the Atacama Desert, Chile. Extremophiles 16, 553–566. doi: 10.1007/s00792-012-0454-z
Okoro, C. K., Brown, R., Jones, A. L., Andrews, B. A., Asenjo, J. A., Goodfellow, M., et al. (2009). Diversity of culturable actinomycetes in hyper-arid soils of the Atacama Desert, Chile. Antonie Van Leeuwenhoek 95, 121–133. doi: 10.1007/s10482-008-9295-2
Overbeek, R., Begley, T., Butler, R. M., Choudhuri, J. V., Chuang, H.-Y., Cohoon, M., et al. (2005).
The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691–5702. doi: 10.1093/nar/gki866
Price, M. N., Dehal, P. S., and Arkin, A. P. (2009). FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650. doi: 10.1093/molbev/msp077
Pruesse, E., Peplies, J., and Glöckner, F. O. (2012). SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823–1829. doi: 10.1093/bioinformatics/bts252
Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., et al. (2012). The Pfam protein families database. Nucleic Acids Res. 40, 290–301. doi: 10.1093/nar/gkr1065
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., et al. (2013). The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596. doi: 10.1093/nar/gks1219
Quiza, L., Lalonde, I., Guertin, C., and Constant, P. (2014). Land-use influences the distribution and activity of high affinity CO-oxidizing bacteria associated to type I-coxL genotype in soil. Front. Microbiol. 5:271. doi: 10.3389/fmicb.2014.00271
Reichert, K., Lipski, A., Pradella, S., Stackebrandt, E., and Altendorf, K. (1998). New dimethyl disulfide-degrading actinomycetes and emended description of the genus Pseudonocardia. Int. J. Syst. Bacteriol. 48, 441–449. doi: 10.1099/00207713-48-2-441
Rhodes, M., Knelman, J., Lynch, R. C., Darcy, J. L., Nemergut, D. R., and Schmidt, S. K. (2013). “Alpine and arctic soil microbial communities,” in The Prokaryotes, eds E. Rosenberg, E. F. DeLong, E. Stackebrandt, S. Lory, and F. Thompson (Berlin: Springer), 44–56.
Richter, M., and Schmidt, D. (2002). Cordillera de la Atacama. Das trockenste Hochgebirge der Welt. Petermanns Geographische Mitteilungen 146, 48–57.
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J.-F., et al. (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437. doi: 10.1038/nature12352
Saldanha, A. J. (2004). Java Treeview–extensible visualization of microarray data. Bioinformatics 20, 3246–3248. doi: 10.1093/bioinformatics/bth349
Schmidt, S. K., Naff, C. S., and Lynch, R. C. (2012). Fungal communities at the edge: ecological lessons from high alpine fungi. Fungal Ecol. 5, 443–452. doi: 10.1016/j.funeco.2011.10.005
Schmidt, S. K., Nemergut, D. R., Miller, A. E., Freeman, K. R., King, A. J., and Seimon, A. (2009). Microbial activity and diversity during extreme freeze-thaw cycles in periglacial soils, 5400 m elevation, Cordillera Vilcanota, Perú. Extremophiles 13, 807–816. doi: 10.1007/s00792-009-0268-9
Stavrakou, T., Müller, J.-F., Peeters, J., Razavi, A., Clarisse, L., Clerbaux, C., et al. (2011). Satellite evidence for a large source of formic acid from boreal and tropical forests. Nat. Geosci. 5, 26–30. doi: 10.1038/ngeo1354
Stres, B., Philippot, L., Faganeli, J., and Tiedje, J. M. (2010). Frequent freeze-thaw cycles yield diminished yet resistant and responsive microbial communities in two temperate soils: a laboratory experiment. FEMS Microbiol. Ecol. 74, 323–335. doi: 10.1111/j.1574-6941.2010.00951.x
Stres, B., Sul, W. J., Murovec, B., and Tiedje, J. M. (2013). Recently deglaciated high-altitude soils of the Himalaya: diverse environments, heterogenous bacterial communities and long-range dust inputs from the upper troposphere. PLoS ONE 8:e76440. doi: 10.1371/journal.pone.0076440
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729. doi: 10.1093/molbev/mst197
Tatusov, R. L. (1997). A genomic perspective on protein families. Science 278, 631–637. doi: 10.1126/science.278.5338.631
Theisen, A. R., and Murrell, J. C.
(2005). Facultative Methanotrophs Revisited. J. Bacteriol. 187, 4303–4305. doi: 10.1128/JB.187.13.4303-4305.2005
Vignais, P. M., and Billoud, B. (2007). Occurrence, classification, and biological function of hydrogenases: an overview. Chem. Rev. 107, 4206–4272. doi: 10.1021/cr050196r
Webb, C. O., Ackerly, D. D., and Kembel, S. W. (2008). Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 24, 2098–2100. doi: 10.1093/bioinformatics/btn358
Weber, C. F., and King, G. M. (2010). Distribution and diversity of carbon monoxide-oxidizing bacteria and bulk bacterial communities across a succession gradient on a Hawaiian volcanic deposit. Environ. Microbiol. 12, 1855–1867. doi: 10.1111/j.1462-2920.2010.02190.x
Yang, Z. (1998). Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15, 568–573. doi: 10.1093/oxfordjournals.molbev.a025957
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. doi: 10.1093/molbev/msm088
Yoshida, N., Hayasaki, T., and Takagi, H. (2011). Gene expression analysis of methylotrophic oxidoreductases involved in the oligotrophic growth of Rhodococcus erythropolis N9T-4. Biosci. Biotechnol. Biochem. 75, 123–127. doi: 10.1271/bbb.100700
Yoshida, N., Ohhata, N., Yoshino, Y., Katsuragi, T., Tani, Y., and Takagi, H. (2007). Screening of carbon dioxide-requiring extreme oligotrophs from soil. Biosci. Biotechnol. Biochem. 71, 2830–2832. doi: 10.1271/bbb.70042
Yuan, H., Ge, T., Chen, C., O'Donnell, A. G., and Wu, J. (2012). Significant role for microbial autotrophy in the sequestration of soil carbon. Appl. Environ. Microbiol. 78, 2328–2336. doi: 10.1128/AEM.06881-11
Zaneveld, J. R., Lozupone, C., Gordon, J. I., and Knight, R. (2010). Ribosomal RNA diversity predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res. 38, 3869–3879.
doi: 10.1093/nar/gkq066
CHAPTER 4
GENOMIC DIVERSITY IN CANNABIS
4.1 Introduction
Plants of the genus Cannabis (Cannabaceae; hemp, drug-type) have been used for thousands of years for fiber, nutritional seed oil and medicinal or psychoactive effects. Archaeological evidence for hemp fiber textile production in China dates to at least as early as 6000 years ago (Li 1973), but possibly as early as 12,000 years ago (Russo, 2007), suggesting Cannabis was one of the first domesticated plants. Evidence for ancient medicinal or shamanistic use of Cannabis has been found at Chinese, Indian, Middle Eastern, Greek and sub-Saharan African sites (reviewed in: Russo, 2007), illustrating the extent of Cannabis use throughout human history. A central Asian site of domestication is commonly cited (Schultes et al. 1974), although genetic analyses suggest two independent domestication events may have occurred separately for Cannabis sativa and Cannabis indica (Hillig, 2005). Cannabis plants are usually annual wind-pollinated dioecious herbs, though individuals may live more than a year in subtropical climates (Cherniak 1982) and monoecious populations exist (de Meijer et al., 2003). The taxonomic composition of the genus remains unresolved, with two species (C. indica and C. sativa) commonly cited (Habib et al. 2013), although C. ruderalis is sometimes proposed as a third species that contains northern short-day or auto-flowering plants (Small and Cronquist 1976). Monospecific treatment of the genus as Cannabis sativa L. is also common (van Bakel et al., 2011) and various alternative species and subspecies names (e.g. C. chinensis or Cannabis sativa subsp. indica var. kafiristanica) are sometimes still referenced (reviewed in: Schultes et al. 1974). Given that no broad-scale samplings of Cannabis genomic diversity are published to date, these classification schemes remain debatable.
Domesticated and feral populations are currently growing on every continent except Antarctica, in addition to the putatively native wild populations found across Eurasia. These populations contain expansive phenotypic diversity in terpenoid and cannabinoid profiles (Hillig, 2004; Hillig and Mahlberg, 2004), as well as many morphological and life-history characteristics, further fueling debate regarding Cannabis taxonomic status (Russo, 2007). One unusual feature of the Cannabis genus is the production of a tremendous diversity of compounds called cannabinoids, so named because they are not produced at high levels in any other plant species. Cannabinoids are a group of over 74 known C21 terpenophenolic compounds (Radwan et al., 2008; ElSohly and Slade, 2005) responsible for many reported medicinal and psychoactive effects of Cannabis consumption (Poklis et al., 2010). The plants synthesize a carboxylic acid form of these compounds, and heating is required to produce the pH-neutral forms that are most active in humans. Interestingly, these compounds have pronounced neurological effects on a wide range of vertebrate and invertebrate taxa, suggesting an ancient origin of the endocannabinoid receptors, perhaps as old as the last common ancestor of all extant bilaterians, over 500 MYA (Salzet et al., 2000; McPartland et al., 2006). The plant compounds thus produced have the potential to affect a broad range of metazoans, though their ecological function in nature is not well understood. Indeed, suggested roles for these compounds include many biotic and abiotic defenses, such as suppression of pathogens and herbivores, protection from UV radiation damage, and attraction of seed dispersers. These hypotheses about the selective benefits of cannabinoid production in nature remain speculative, as none have been experimentally verified to date. We do know more, however, about the more recent evolution of the plants under human cultivation.
High delta-9-tetrahydrocannabinol (THC) content has been bred into many strains due to its potent psychoactive, appetite-stimulating, analgesic and antiemetic effects (Mechoulam and Gaoni 1967), which are mediated through interactions with human endocannabinoid receptors CB1 and CB2 (Matsuda et al., 1990; Di Marzo et al., 2004). After several decades of accelerated clandestine cultivation and breeding improvements, modern high-THC strains can currently yield dried unpollinated female flower material that contains over 30% THC by dry weight (Swift et al., 2013). Plants with high cannabidiol (CBD) content have historically been used in some hashish preparations (Anderson 1980), and are presently in high demand on the US market as an antiseizure therapy (Mechoulam et al., 2002). A single locus with co-dominant THC or CBD production alleles (Staginnus et al., 2014; de Meijer et al., 2003) provides easy control for breeding high or ~50% CBD production plants. Other cannabinoids such as cannabigerol (CBG) (Borrelli et al., 2014), cannabichromene (CBC) (Izzo et al., 2012) or delta-9-tetrahydrocannabivarin (THCV) (McPartland et al., 2015) demonstrate pharmacological promise, and can also be produced at high levels by the plant (de Meijer et al., 2008, 2009; de Meijer and Hammond, 2005). Other Cannabis secondary metabolites such as terpenes and flavonoids likely contribute to therapeutic or psychoactive effects (Russo, 2011), with myrcene, for example, being proposed to produce sedative effects associated with specific strains (Hazekamp and Fischedick, 2012). Plants that produce low levels of THC are herein referred to as hemp, while high THC producing varietals used in this study are described as drug-type strains. Hemp strains typically have a distinct set of growth characteristics, with fiber varieties reaching up to 6 meters in height during a growing season, exhibiting reduced flower set and increased internodal spacing compared to drug-type relatives.
Despite the widespread prohibition of drug-type Cannabis cultivation from the 1930s to present (Bonnie and Whitebread 1970), hemp cultivation and breeding continued in parts of Europe, Canada, and China through this period, as well as for a brief comeback during WWII in the USA through the Hemp for Victory campaign. Studies to date have found hemp varieties are genetically distinct from drug-type strains (van Bakel et al., 2011; Gilmore et al., 2007), though Hillig (2005) interestingly found southeastern Asian hemp landraces are more closely related to Afghani drug-type strains than to European hemp strains. Cannabis has a diploid genome (2n = 20), and an XY/XX chromosomal sex determining system (Divashuk et al., 2014; Moliterni et al., 2004). Genome size is estimated to be 818 Mb for female plants and 843 Mb for male plants (Sakamoto et al. 1998). Currently a draft genome consisting of 60,029 scaffolds is available for a Purple Kush drug-type from the National Center for Biotechnology Information (NCBI accessions: JH226140-JH286168). Additional whole genome data are available from NCBI for the Finola (SRP008728) and USO31 (SRP008730) hemp strains. Presently Cannabis is the only multi-billion dollar crop without a genetic linkage or physical genome map available (Semagn et al., 2006). Previous studies of Cannabis genetic diversity have used either many samples with few molecular markers (Hillig, 2005; Gilmore et al., 2007) or genome-wide data for relatively few sample types (van Bakel et al., 2011). In this study we present a genome-wide re-mapping analysis for 43 Cannabis individuals sampled from diverse hemp and drug-type plants, the largest to date.
The aim of this study was to assess the genomic diversity and phylogenetic relationships among Cannabis plants that have distinct phenotypes, and that were described a priori by plant breeders as various landrace indica, sativa, hemp and drug-types, as well as commercially available hemp and drug-types with unclear pedigrees. These data and analyses will pave the way for the development of modernized breeding and quality assurance tools (Collard and Mackill, 2008), which are lacking in the nascent US Cannabis breeding and production industry. Cannabis genomics also offers an ideal system for understanding plant domestication and hybridization events (Baute et al., 2015), as well as the evolution of separate sexes (Divashuk et al., 2014).

4.2 Materials and Methods

Forty plant tissue samples were collected from a variety of breeding and production facilities in Colorado. The strain names, descriptions and putative origins used in this paper were recorded from the providers of the source material (Table 4.1). DNA extractions were performed using the Qiagen DNeasy Plant Mini Kit (Valencia, CA) according to the manufacturer's protocol. Whole genome shotgun sequencing was performed using standard Illumina multiplexed library preparation protocols for a 2 x 125 bp HiSeq 2500 lane and a 2 x 150 bp NextSeq 500 run. Sequencing efforts were targeted to approximately 4-6x coverage of the Cannabis genome per sample. Trimmomatic (Bolger et al., 2014) was used to trim any remaining adapter sequence from raw fastq reads and to remove sequences with low-quality regions or ambiguous base calls, using the following settings: ILLUMINACLIP:IlluminaAdapters:2:20:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:5:15 MINLEN:100.
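To make the read-level effect of these settings concrete, the following is a minimal Python sketch of the LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps applied to a single read. It is a simplified illustration and not Trimmomatic's own implementation: adapter clipping (ILLUMINACLIP) is omitted, quality scores are assumed to be already decoded to integer Phred values, and the function name `trim_read` and its parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def trim_read(
    seq: str,
    quals: List[int],
    leading: int = 20,
    trailing: int = 20,
    window: int = 5,
    min_avg: float = 15.0,
    min_len: int = 100,
) -> Optional[Tuple[str, List[int]]]:
    """Simplified quality trimming of one read (hypothetical helper).

    Returns the trimmed (sequence, qualities) pair, or None if the read
    ends up shorter than min_len and would be discarded.
    """
    # LEADING: drop bases from the 5' end while quality is below threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1

    # TRAILING: drop bases from the 3' end while quality is below threshold.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # SLIDINGWINDOW: scan 5' to 3' and cut the read at the first window
    # whose mean quality falls below min_avg.
    cut = len(quals)
    for i in range(0, len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: discard reads that are now too short.
    if len(seq) < min_len:
        return None
    return seq, quals


if __name__ == "__main__":
    # Made-up read: 3 poor leading bases, 110 good bases, 7 poor trailing bases.
    example_quals = [10] * 3 + [30] * 110 + [5] * 7
    result = trim_read("A" * len(example_quals), example_quals)
    print("kept" if result else "discarded")
```

With MINLEN set to 100 on 125 bp and 150 bp reads, a single low-quality stretch in the first two thirds of a read is usually enough to discard it, which is one reason a conservative length cutoff reduces usable coverage.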
Trimmed reads from our 40 new samples, plus raw reads from the three publicly available Cannabis genomes (Purple Kush, Finola and USO31), were then aligned to the only publicly available draft genome, that of Purple Kush (JH226140-JH286168), using the Burrows-Wheeler Alignment tool (BWA mem) (Li and Durbin, 2009). Chloroplast and mitochondrial regions were excluded. Individual alignments were then collated and used to produce a single variant call format table (.vcf) for all samples using samtools mpileup -uf | bcftools view -bvcg (Li et al., 2009). This table was then filtered to include only high-quality informative SNP sites from the single copy portion of the genome (Supplementary Figure 5) using bash and awk routines on the following standard .vcf fields: QC (>200), GQ (<30), DP (75-300) and AF1 (0.1-0.9). The remaining SNP sites were then reformatted and downsampled for further analysis using bash and awk scripts. To visualize genetic relationships, divergence, and ancestral hybridization among lineages, a phylogenetic neighbor network was inferred using 100,000 SNPs from all 43 available datasets with simple p-distance calculations (Huson and Bryant, 2006). The Structure 2.3.4 admixture model (Pritchard et al., 2000) was used to calculate the likelihood of various numbers of populations (K = 1-10) given our data, using 5000 MCMC replications and a burn-in of 500 per run. The Evanno method was then used to determine the most probable value of K (Evanno et al., 2005).

Table 4.1 Sample details.
Cultivar name             Sex      Reproductive type  Supplier classification  Heterozygous SNPs
Afghan Kush 1             M        D                  indica                   1425638
Afghan Kush 2             M        D                  indica                   1251378
Afghan Kush 3             M        D                  indica                   1326196
Afghan Kush 4             F        D                  indica                   1200949
Afghan Kush 5             F        D                  indica                   1365756
Afghan Kush 6             F        D                  indica                   1349126
Carmagnola 1              F        D                  hemp                     1649463
Carmagnola 2              F        D                  hemp                     1782259
Carmagnola 3              F        D                  hemp                     1954731
Carmagnola 4              M        D                  hemp                     2111915
Carmagnola 5              M        D                  hemp                     2113380
Carmagnola 6              M        D                  hemp                     2145951
Dagestani hemp            F        M                  hemp                     1387341
Chem91                    F        D                  hybrid                   1423733
Original Sour Diesel      F        D                  sativa                   1441569
Durban Poison 1           F        D                  sativa                   1529428
Hawaiian                  F        D                  sativa                   1737720
Lebanese                  F        D                  unknown                  1489262
Tora Bora                 F        D                  indica                   1490365
G13                       F        D                  indica                   1728251
Harlequin¹                F        D                  sativa                   1899375
Cannatonic¹               F        D                  hybrid                   1707990
Auto AK47*                F        D                  hybrid                   1825464
Low Ryder*                F        D                  hybrid                   1601477
Pre-98 Bubba Kush         F        D                  indica                   1619790
Maui Waui                 F        D                  sativa                   1490231
Super Lemon Haze          F        D                  sativa                   1286703
Hindu Kush                F        D                  indica                   1245063
Somali Taxi Cab           F        D                  sativa                   1877605
Durban Poison 2           F        D                  sativa                   1889544
Rocky Mountain Blueberry  Unknown  D                  hybrid                   1783563
R4 (Charlotte's Web)¹     F        D                  indica                   1474524
Kunduz                    M        D                  indica                   1651702
Kansas feral              Unknown  D                  hemp                     1852936
Kompolti 1                F        D                  hemp                     1528469
Kompolti 2                M        D                  hemp                     1764527
Euro Seed Oil             F        D/M                hemp                     1854976
Chinese hemp              F        D                  hemp                     1789975
Colombia Rio Negro        F        D                  sativa                   1660096
Mexican                   F        D                  sativa                   1621453
Finola                    F        D                  hemp                     1350964
Purple Kush               F        D                  indica                   1683486
USO31                     F        D                  hemp                     1701596

Sex: M = male, F = female. Reproductive type: D = dioecious, M = monoecious. * = autoflowering; ¹ = CBD producer.

4.3 Results and Discussion

Summary information and raw sequencing libraries are publicly available from the NCBI short read archive (accessions pending). Alignments to the Purple Kush reference scaffolds were quality filtered to include 10,392,741 SNPs from the single copy portion of the nuclear genome.
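The site-level filtering described in the Materials and Methods was carried out with bash and awk routines that are not reproduced here. As an illustration only, a comparable filter can be sketched in Python. This sketch rests on several assumptions: the "QC" threshold is taken to refer to the QUAL column of the .vcf file, DP and AF1 are read from the INFO column as written by samtools mpileup and bcftools, and the per-sample GQ criterion is left out because the way it was aggregated across samples is not described. The names `parse_info` and `keep_site`, and the example scaffold name, are hypothetical.

```python
from typing import Dict, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=120;AF1=0.5' into a dict."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def keep_site(
    vcf_line: str,
    min_qual: float = 200.0,
    dp_range: Tuple[int, int] = (75, 300),
    af_range: Tuple[float, float] = (0.1, 0.9),
) -> bool:
    """Return True if a VCF record passes the site-level filters.

    Assumption: the QUAL column stands in for the 'QC' field named in
    the text; the GQ criterion is deliberately not applied here.
    """
    if vcf_line.startswith("#"):
        return False  # header line, not a variant record
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8 or fields[5] == ".":
        return False  # malformed record or missing quality
    qual = float(fields[5])
    info = parse_info(fields[7])
    if "DP" not in info or "AF1" not in info:
        return False
    depth = int(info["DP"])
    af1 = float(info["AF1"])
    return (
        qual > min_qual
        and dp_range[0] <= depth <= dp_range[1]
        and af_range[0] <= af1 <= af_range[1]
    )


if __name__ == "__main__":
    # Made-up record on a hypothetical scaffold, for illustration only.
    record = "scaffold1\t101\t.\tA\tG\t250\t.\tDP=120;AF1=0.5\tGT\t0/1"
    print(keep_site(record))
```

The depth window is applied to the total depth across all samples, so with 43 samples sequenced at roughly 4-6x each, the 75-300 range brackets the expected total and removes both poorly covered sites and likely multi-copy regions.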
Further filtering for sites that had a variant frequency of at least 10% and at most 90% left 8,538,516 SNPs for downstream analysis.

Phylogenetic relationships are commonly represented as bifurcating trees that explicitly model mutation-driven divergence and speciation events. Genome-wide sequence datasets also include information about recombination, hybridization, and gene loss or genesis events, some of which may be incongruent with one another (Huson and Bryant, 2006). Phylogenetic networks can represent incompatible phylogenetic signals across large character matrices, and thus offer a more appropriate model for the variety of events that drive genomic evolution. Our phylogenetic neighbor network of 43 Cannabis nuclear genomes (Figure 4.1) shows that all European hemp strains form a distinct clade, separated from all drug-type strains by a consistent band of parallel branches. Clustering of various wide-leaf blade drug-type strains (Figure 4.2) around our population of Afghan Kush landrace samples provides further support for the designation of C. indica (Hillig, 2005), although narrow-leaf blade drug-type strains appear to form several clades, perhaps influenced by the inclusion of hybrid strains in the analysis.

Figure 4.1 Phylogenetic neighbor network of 100,000 SNPs from the single-copy portion of the Cannabis genome.

Figure 4.2 Example of narrow-leaf blade type (left) and wide-leaf blade type (right) strains.

To determine the statistical likelihood of various population scenarios for our dataset, we applied the admixture model-based Bayesian clustering method of Structure (Pritchard et al., 2000) to 100,000 randomly sampled SNP loci. Even though the Structure admixture model assumes an absence of linkage disequilibrium within populations, genome-wide datasets that include closely linked markers can be appropriate for this approach when the signal from independent sites outweighs that of the linked sites (Conrad et al., 2006).
Thus by downsampling from the more than 10 million SNPs identified through our initial alignments, we reduced the number of closely linked sites, making the data more appropriate given the Structure model assumptions (in addition to constraining the data to a more computationally practical size). Our most likely population structure (K = 3, mean Ln likelihood = -2327993.3; Figure 4.3) again shows clear separation between hemp and drug-type strains, except for a wide-leaf blade type Chinese hemp sample that does not cluster strongly with any other hemp or drug-type strain. Hillig's (2005) analysis of alloenzymes concluded that Asian hemp strains were more similar to Asian C. indica drug-type strains than they were to European hemp. While we did not find full support for this conclusion, it is apparent that Asian and European hemp strains are highly dissimilar, possibly reflecting independent domestication events. We again found strong support for the putative C. indica clade anchored by the Afghan Kush samples, but now also resolved a third clade of drug-type strains comprising the narrow-leaf blade varieties Super Lemon Haze, Maui Waui, Hawaiian, and Durban Poison. It remains unclear whether this narrow-leaf blade drug-type clade and the European hemp clade fit together within the C. sativa concept, given the distance between them. Proportions of hybrid drug-type genomes were also inferred, showing a range of ratios across strains.

Figure 4.3 Structure plot for K = 3. Only one or two individuals from each strain were used in this analysis in order to avoid biased cluster inferences (33 of 43 samples).

Much more Cannabis diversity likely remains to be sampled. Notably absent from our sample set are putative C. ruderalis samples, although Finola is an autoflowering hemp strain, and Low Ryder and Auto AK47 are autoflowering drug-type strains, all with purported C. ruderalis ancestry.
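The Evanno method used to choose K (see Materials and Methods) compares the rate of change of the log likelihood between successive values of K. A minimal sketch of that calculation follows, assuming several replicate Structure runs per K. Here delta K is computed as |L(K+1) - 2L(K) + L(K-1)| / s[L(K)], with L(K) taken as the mean over replicates and s as their standard deviation. The function name `evanno_delta_k` and the example likelihoods are hypothetical and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K from replicate log likelihoods for each K.

    lnp_by_k maps each tested K to the Ln P(D) values of its replicate
    runs (at least two per K). Delta K is undefined for the smallest and
    largest K, which lack a neighbour on one side, so they are skipped.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    sd_l = {k: stdev(lnp_by_k[k]) for k in ks}

    delta: Dict[int, float] = {}
    for k in ks:
        if k - 1 in mean_l and k + 1 in mean_l and sd_l[k] > 0:
            # Absolute second-order rate of change of the mean likelihood.
            second = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            delta[k] = second / sd_l[k]
    return delta


if __name__ == "__main__":
    # Made-up replicate likelihoods, for illustration only.
    example_runs = {
        1: [-1000.0, -1002.0, -998.0],
        2: [-900.0, -902.0, -898.0],
        3: [-700.0, -701.0, -699.0],
        4: [-690.0, -695.0, -685.0],
        5: [-685.0, -686.0, -684.0],
    }
    delta_k = evanno_delta_k(example_runs)
    best_k = max(delta_k, key=delta_k.get)
    print(best_k, delta_k)
```

Because delta K cannot be evaluated at the ends of the tested range, the method can never select K = 1, which is worth keeping in mind when interpreting a peak at low K.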
‘Indica’ and ‘sativa’ are commonly used terms ascribed to plants that have certain characteristics, often leaf morphology and the perceived effects of consuming the plant (Habib et al., 2013). However, these names are also rooted in taxonomic traditions dating to Linnaeus, who first classified the genus as monotypic (Cannabis sativa L., Linnaeus 1753), and Lamarck, who subsequently designated Cannabis indica to accommodate the short-stature, potent drug-type plants from the Indian subcontinent (Lamarck, 1783). Final resolution of Cannabis taxonomy will require complete assessment of standing global genetic diversity and experimental evaluation of reproductive compatibility across all major genetic groups (Rieseberg and Willis, 2007), in conjunction with morphological circumscriptions.

One major complication obscuring the understanding of Cannabis diversity and history is the lack of a known native range or ranges of Cannabis spp. In addition to divergent breeding efforts and human-vectored transport of seeds, the tendency of Cannabis to escape into feral populations wherever human cultivation occurs (Small et al., 2003; Haney and Kutscheid, 1975), coupled with wind pollination biology and no known post-zygotic reproductive barriers, makes the existence of pure wild native Cannabis populations unlikely, or at least difficult to confirm. The weedy tendencies of Cannabis are exemplified by the mid-western US populations of feral hemp that flourish despite the eradication efforts of the Drug Enforcement Administration, which have for decades removed millions of plants per year (http://www.dea.gov/ops/cannabis.shtml). We were unable to access putative wild or feral Eurasian landrace material for this study, but a comprehensive evaluation of Cannabis diversity that includes feral and wild Eurasian populations is required to ascertain whether levels of divergence and gene flow are consistent with one or more origins of domestication (Hillig, 2005).
Even if these extant populations are highly admixed with modern cultivars, their study promises to offer insight into Cannabis ecology and evolution, given how different the selective regime of the feral setting is compared to that of agricultural fields (Kane and Rieseberg, 2008).

Cannabis genomics offers a window into the past, but also a road forward. Although historical and clandestine breeding efforts have clearly been successful in many regards (Swift et al., 2013; Mehmedic et al., 2010), Cannabis lags decades behind other major crop species in other respects (Collard and Mackill, 2008). Developing stable Cannabis lines capable of producing the full range of potentially therapeutic non-psychoactive cannabinoids is important for the research community, which currently lacks access to diverse and high-quality material in the US (Nutt et al., 2013). In addition to breeding resource development, Cannabis genome science offers an attractive study system for understanding the evolution of separate sexes in plants (Charlesworth, 2006; Barrett, 2002). Only 6-7% of angiosperms are dioecious like Cannabis (Renner and Ricklefs, 1995), and the existence of monoecious populations (de Meijer et al., 2003) that can be intercrossed with dioecious individuals presents an excellent experimental system for studying the genetics and evolution of separate sexes.

In this paper we extended the initial Cannabis genome study (van Bakel et al., 2011) by re-mapping whole genome sequence reads to the existing Purple Kush draft scaffolds, to understand diversity and evolutionary relationships among the major lineages. Analyses of a subset of the 10.3 million SNPs from the single copy portion of the genome lend support for the existence of at least three major Cannabis lineages (Figure 4.3).
Deep and consistent separation between European hemp and drug-type strains was found across the nuclear genome (Figure 4.1), while moderate evidence was found for at least two drug-type lineages and numerous hybrids (Figure 4.3). Overall, we hope the publicly available data and analyses from this study will help pave the way for continued exploration of the origins and history of this controversial plant, and help unlock the full agricultural and therapeutic potential of Cannabis.

CHAPTER 5 SUMMARY

In Chapter 2 I used culture-independent techniques to characterize a low-diversity microbial ecosystem, including Bacteria, Eukaryotes and Archaea, from volcanic mineral soils with extremely low organic carbon levels found at > 6,000 meters elevation above the Atacama Desert. In the absence of plant life or microbial phototrophs, this led me to propose that trace gas oxidation may supply energy for microbial activity, although precipitation-derived dissolved organic carbon and atmospheric dust are also likely sources of nutrients. In Chapter 3 I used comparative community metagenomics, and genomics of the most abundant bacterial community member, to explore whether trace gas metabolism, or other inferred metabolic traits, might explain the uneven and low-diversity community structure found in these mineral soils. Owing to the culture-free nature of these analyses, and the limited geographic extent that my samples covered, I was ultimately unable to reject or confirm the trace gas hypothesis or the dissolved organic carbon and dust hypotheses. Nonetheless, Chapter 3 provided some evidence that the most abundant bacterial community member uses a variety of trace gases that were not considered in Chapter 2 (H2, CO and several organic C1 compounds). I also found that heterotrophic metabolism of organic carbon may play a lesser role in this ecosystem than in other desert ecosystems (Figure 3.2).
In retrospect I have concluded that an experimental evaluation of metabolic uptake of substrates, whether in situ, in sample microcosms, and/or with cultured isolates, would have provided a more direct and compelling test of putative energy sources. Even though this type of ecologically contextualized microbial physiology is challenging to execute and is not free of potential confounding issues (Theisen and Murrell, 2005), it is in my opinion an important method for answering questions about specific microbial metabolic functions. This is especially true since microbes transported as inactive forms over long distances (Stres et al., 2013) can influence community structure (Nemergut et al., 2013) and functional gene presence. That most microbes are unculturable is often cited as a rationale for using sequencing-based culture-independent approaches (Pace, 1997), and while sequencing methods are far more high-throughput and have many advantages, understanding why most environmental microbes resist standard culturing efforts remains an intriguing question. Indeed, it appears standard culturing techniques are simply the wrong methods for the majority of bacteria (Tanaka et al., 2014). Furthermore, standard plate or liquid dilution and isolation methods are lethal to many bacteria because those bacteria require metabolites produced by other community members (Ling et al., 2015). Unraveling the evolutionary co-dependencies within microbial communities (Morris et al., 2012) is a key and often overlooked challenge that remains for a more complete understanding of microbial diversity and function.

Figure 5.1 Examples of isolates from the Llullaillaco Volcano samples used in Chapters 2 and 3. Left: Blastococcus sp. nov., right: Rhodanobacter sp. nov. Putative taxonomy is based on full-length 16S rRNA gene homology. The Blastococcus sp. nov. colony measured approximately 3 mm in diameter after 11 months of incubation at room temperature. The Rhodanobacter sp. nov.
measured approximately 2 mm in diameter after 6 months of incubation at room temperature.

After several years of experiments and dozens of isolates, I failed to recover any of the five most abundant lineages shown in Figure 3.1. Whether this is because the Pseudonocardia spp. could not grow in isolation, because the laboratory conditions were not conducive to growth, or because they were non-viable remains unknown.

The Cannabis genome project is the start of a lifelong goal to understand plant traits and breed novel varieties. In Chapter 4 I analyzed the genomic diversity and population structure of 43 Cannabis accessions using a whole genome re-mapping analysis. These efforts revealed evidence for three major clusters of genotypic diversity (Figure 4.3), and extensive hybridization of many modern drug-type strains. This study is limited by the lack of samples from Eurasian wild or feral populations, including northern putative C. ruderalis samples, due to lack of access to this material in Colorado, but nonetheless provides the largest comparative analysis of Cannabis genomic diversity to date. These data and analyses can further serve as a foundation for understanding the origins of Cannabis domestication events with the additional sequencing of Eurasian wild plants and the full range of cultivar diversity (www.leafly.com lists 1,224 drug-type strains). Assaying and describing Cannabis diversity is only the first step, however. In addition to answering basic questions about the evolution of Cannabis lineages and dioecy, the broader applied goal of my work is to develop genomic tools for accelerating the throughput of breeding projects (Collard and Mackill, 2008; Pacifico et al., 2006). This will be the work of many labs for years to come and will include developing genetic linkage maps and trait-based associations (Korte and Farlow, 2013) before marker-assisted selection efforts can even begin.
Mapping populations and trait association datasets are currently being developed through various collaborations, but are for the time being limited by the absence of large-scale pollen-controlled breeding facilities in Colorado.

6. Bibliography

Anderson, L. C. (1974). A study of systematic wood anatomy in Cannabis. Harvard University Botanical Museum Leaflets 24, 29–36.
van Bakel, H., Stout, J. M., Cote, A. G., Tallon, C. M., Sharpe, A. G., Hughes, T. R., and Page, J. E. (2011). The draft genome and transcriptome of Cannabis sativa. Genome Biol. 12, R102. doi:10.1186/gb-2011-12-10-r102.
Barrett, S. C. H. (2002). The evolution of plant sexual diversity. Nat. Rev. Genet. 3, 274–284. doi:10.1038/nrg776.
Baute, G. J., Kane, N. C., Grassa, C. J., Lai, Z., and Rieseberg, L. H. (2015). Genome scans reveal candidate domestication and improvement genes in cultivated sunflower, as well as post-domestication introgression with wild relatives. New Phytologist. doi:10.1111/nph.13255.
Beutler, J. A., and Dermarderosian, A. H. (1978). Chemotaxonomy of Cannabis I. Crossbreeding between Cannabis sativa and C. ruderalis, with analysis of cannabinoid content. Econ. Bot. 32, 387–394.
Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. doi:10.1093/bioinformatics/btu170.
Bonnie, R. J., and Whitebread, C. H. (1970). The forbidden fruit and the tree of knowledge: an inquiry into the legal history of American marijuana prohibition. Virginia Law Rev. 56, 971–1203.
Borrelli, F., Pagano, E., Romano, B., Panzera, S., Maiello, F., Coppola, D., De Petrocellis, L., Buono, L., Orlando, P., and Izzo, A. A. (2014). Colon carcinogenesis is inhibited by the TRPM8 antagonist cannabigerol, a Cannabis-derived non-psychotropic cannabinoid. Carcinogenesis 35, 2787–2797. doi:10.1093/carcin/bgu205.
Casano, S., Grassi, G., Martini, V., and Michelozzi, M. (2011).
Variations in terpene profiles of different strains of Cannabis sativa L. XXVIII International Horticultural Congress on Science and Horticulture for People (IHC2010): A New Look at Medicinal and Aromatic Plants Seminar 925, 115–121.
Charlesworth, D. (2006). Evolution of plant breeding systems. Curr. Biol. 16, 726–735. doi:10.1016/j.cub.2006.07.068.
Collard, B. C. Y., and Mackill, D. J. (2008). Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 363, 557–572. doi:10.1098/rstb.2007.2170.
Conrad, D. F., Jakobsson, M., Coop, G., Wen, X., Wall, J. D., Rosenberg, N. A., and Pritchard, J. K. (2006). A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat. Genet. 38, 1251–1260. doi:10.1038/ng1911.
Divashuk, M. G., Alexandrov, O. S., Razumova, O. V., Kirov, I. V., and Karlov, G. I. (2014). Molecular cytogenetic characterization of the dioecious Cannabis sativa with an XY chromosome sex determination system. PLoS One 9, e85118. doi:10.1371/journal.pone.0085118.
Dovichi, N. J., and Zhang, J. (2000). How capillary electrophoresis sequenced the human genome. Angew. Chemie - Int. Ed. 39, 4463–4468. doi:10.1002/1521-3773(20001215)39:24<4463::AID-ANIE4463>3.0.CO;2-8.
ElSohly, M. A., and Slade, D. (2005). Chemical constituents of marijuana: the complex mixture of natural cannabinoids. Life Sci. 78, 539–548. doi:10.1016/j.lfs.2005.09.011.
Evanno, G., Regnaut, S., and Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620. doi:10.1111/j.1365-294X.2005.02553.x.
Flaxman, S. M., Wacholder, A. C., Feder, J. L., and Nosil, P. (2014). Theoretical models of the influence of genomic architecture on the dynamics of speciation. Mol. Ecol. 23, 4074–4088. doi:10.1111/mec.12750.
Gilmore, S., Peakall, R., and Robertson, J. (2007).
Organelle DNA haplotypes reflect crop-use characteristics and geographic origins of Cannabis sativa. Forensic Sci. Int. 172, 179–190. doi:10.1016/j.forsciint.2006.10.025.
Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., and Elhaik, E. (2013). On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol. 5, 578–590. doi:10.1093/gbe/evt028.
Habib et al. (2013). http://liq.wa.gov/publications/Drug-type/BOTEC%20reports/1c-Testing-forPsychoactive-Agents-Final.pdf
Hanage, W. P. (2014). Microbiome science needs a healthy dose of scepticism. Nature 512, 247–248. doi:10.1038/512247a.
Haney, A., and Kutscheid, B. B. (1975). An ecological study of naturalized hemp (Cannabis sativa L.) in east-central Illinois. Am. Midl. Nat. 93, 1–24.
Hillig, K. W. (2004). A chemotaxonomic analysis of terpenoid variation in Cannabis. Biochem. Syst. Ecol. 32, 875–891. doi:10.1016/j.bse.2004.04.004.
Hillig, K. W. (2005). Genetic evidence for speciation in Cannabis (Cannabaceae). Genet. Resour. Crop Evol. 52, 161–180. doi:10.1007/s10722-003-4452-y.
Hillig, K. W., and Mahlberg, P. G. (2004). A chemotaxonomic analysis of cannabinoid variation in Cannabis (Cannabaceae). Am. J. Bot. 91, 966–975. doi:10.3732/ajb.91.6.966.
Huson, D. H., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267. doi:10.1093/molbev/msj030.
Itoh, T., Yamanoi, K., Kudo, T., Ohkuma, M., and Takashina, T. (2011). Aciditerrimonas ferrireducens gen. nov., sp. nov., an iron-reducing thermoacidophilic actinobacterium isolated from a solfataric field. Int. J. Syst. Evol. Microbiol. 61, 1281–1285. doi:10.1099/ijs.0.023044-0.
Izzo, A. A., Capasso, R., Aviello, G., Borrelli, F., Romano, B., Piscitelli, F., Gallo, L., Capasso, F., Orlando, P., and Di Marzo, V. (2012).
Inhibitory effect of cannabichromene, a major non-psychotropic cannabinoid extracted from Cannabis sativa, on inflammation-induced hypermotility in mice. Br. J. Pharmacol. 166, 1444–1460. doi:10.1111/j.1476-5381.2012.01879.x.
Kane, N. C., and Rieseberg, L. H. (2008). Genetics and evolution of weedy Helianthus annuus populations: adaptation of an agricultural weed. Mol. Ecol. 17, 384–394. doi:10.1111/j.1365-294X.2007.03467.x.
Korte, A., and Farlow, A. (2013). The advantages and limitations of trait analysis with GWAS: a review. Plant Methods 9, 29. doi:10.1186/1746-4811-9-29.
Lamarck, J. B. (1783). Encyclopédie méthodique botanique. Chez Panckoucke.
Leinonen, T., McCairns, R. J. S., O'Hara, R. B., and Merilä, J. (2013). Q(ST)-F(ST) comparisons: evolutionary and ecological insights from genomic heterogeneity. Nat. Rev. Genet. 14, 179–190. doi:10.1038/nrg3395.
Li, H. L. (1973). An archaeological and historical account of Cannabis in China. Econ. Bot. 28, 437–444.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. doi:10.1093/bioinformatics/btp324.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. doi:10.1093/bioinformatics/btp352.
Ling, L. L., Schneider, T., Peoples, A. J., Spoering, A. L., Engels, I., Conlon, B. P., Mueller, A., Hughes, D. E., Epstein, S., Jones, M., et al. (2015). A new antibiotic kills pathogens without detectable resistance. Nature 517, 455–461. doi:10.1038/nature14098.
Linnaeus, C. (1753). Species plantarum. Laurentii Salvii, Holmiae, p. 1200.
Di Marzo, V., Bifulco, M., and De Petrocellis, L. (2004). The endocannabinoid system and its therapeutic exploitation. Nat. Rev. Drug Discov. 3, 771–784. doi:10.1038/nrd1495.
Matsuda, L. A., Lolait, S. J., Brownstein, M. J., Young, A. C., and Bonner, T. I. (1990).
Structure of a cannabinoid receptor and functional expression of the cloned cDNA. Nature 346, 561–564. doi:10.1038/346561a0.
McPartland, J. M., Duncan, M., Di Marzo, V., and Pertwee, R. G. (2015). Are cannabidiol and Δ9-tetrahydrocannabivarin negative modulators of the endocannabinoid system? A systematic review. Br. J. Pharmacol. 172, 737–753. doi:10.1111/bph.12944.
McPartland, J. M., Matias, I., Di Marzo, V., and Glass, M. (2006). Evolutionary origins of the endocannabinoid system. Gene 370, 64–74. doi:10.1016/j.gene.2005.11.004.
Mechoulam, R., and Gaoni, Y. (1967). Recent advances in the chemistry of hashish. Fortschritte der Chemie Organischer Naturstoffe 25, 175–213.
Mechoulam, R., Parker, L. A., and Gallily, R. (2002). Cannabidiol: an overview of some pharmacological aspects. J. Clin. Pharmacol. 42, 11S–19S. doi:10.1177/0091270002238789.
Mehmedic, Z., Chandra, S., Slade, D., Denham, H., Foster, S., Patel, A. S., Ross, S. A., Khan, I. A., and ElSohly, M. A. (2010). Potency trends of Δ9-THC and other cannabinoids in confiscated Cannabis preparations from 1993 to 2008. J. Forensic Sci. 55, 1209–1217. doi:10.1111/j.1556-4029.2010.01441.x.
de Meijer, E. P. M., Bagatta, M., Carboni, A., Crucitti, P., Moliterni, V. M. C., Ranalli, P., and Mandolino, G. (2003). The inheritance of chemical phenotype in Cannabis sativa L. Genetics 163, 335–346.
de Meijer, E. P. M., and Hammond, K. M. (2005). The inheritance of chemical phenotype in Cannabis sativa L. (II): cannabigerol predominant plants. Euphytica 145, 189–198. doi:10.1007/s10681-005-1164-8.
de Meijer, E. P. M., Hammond, K. M., and Micheler, M. (2008). The inheritance of chemical phenotype in Cannabis sativa L. (III): variation in cannabichromene proportion. Euphytica 165, 293–311. doi:10.1007/s10681-008-9787-1.
de Meijer, E. P. M., Hammond, K. M., and Sutton, A. (2009). The inheritance of chemical phenotype in Cannabis sativa L. (IV): cannabinoid-free plants. Euphytica 168, 95–112. doi:10.1007/s10681-009-9894-7.
Moliterni, V. M.
C., Cattivelli, L., Ranalli, P., and Mandolino, G. (2004). The sexual differentiation of Cannabis sativa L.: a morphological and molecular study. Euphytica 140, 95–106. doi:10.1007/s10681-004-4758-7.
Morris, J. J., Lenski, R. E., and Zinser, E. R. (2012). The black queen hypothesis: evolution of dependencies through adaptive gene loss. mBio 3, e00036-12. doi:10.1128/mBio.00036-12.
Nemergut, D. R., Schmidt, S. K., Fukami, T., O'Neill, S. P., Bilinski, T. M., Stanish, L. F., Knelman, J. E., Darcy, J. L., Lynch, R. C., Wickey, P., et al. (2013). Patterns and processes of microbial community assembly. Microbiol. Mol. Biol. Rev. 77, 342–356. doi:10.1128/MMBR.00051-12.
Nutt, D. J., King, L. A., and Nichols, D. E. (2013). Effects of Schedule I drug laws on neuroscience research and treatment innovation. Nat. Rev. Neurosci. 14, 577–585. doi:10.1038/nrn3530.
Pace, N. R. (1997). A molecular view of microbial diversity and the biosphere. Science 276, 734–740.
Pacifico, D., Miselli, F., Micheler, M., Carboni, A., Ranalli, P., and Mandolino, G. (2006). Genetics and marker-assisted selection of the chemotype in Cannabis sativa L. Mol. Breed. 17, 257–268. doi:10.1007/s11032-005-5681-x.
Poklis, J. L., Thompson, C. C., Long, K. A., Lichtman, A. H., and Poklis, A. (2010). Disposition of cannabichromene, cannabidiol, and Δ9-tetrahydrocannabinol and its metabolites in mouse brain following marijuana inhalation determined by high-performance liquid chromatography-tandem mass spectrometry. J. Anal. Toxicol. 34, 516–520.
Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics 155, 945–959.
Radwan, M. M., Ross, S. A., Slade, D., Ahmed, S. A., Zulfiqar, F., and ElSohly, M. A. (2008). Isolation and characterization of new cannabis constituents from a high potency variety. Planta Med. 74, 267–272. doi:10.1055/s-2008-1034311.
Renner, S. S., and Ricklefs, R. E.
(1995). Dioecy and its correlates in the flowering plants. Am. J. Bot. 82, 596. doi:10.2307/2445418.
Rieseberg, L. H., and Willis, J. H. (2007). Plant speciation. Science 317, 910–914. doi:10.1126/science.1137729.
Russo, E. B. (2007). History of Cannabis and its preparations in saga, science, and sobriquet. ChemInform 38. doi:10.1002/chin.200747224.
Russo, E. B. (2011). Taming THC: potential Cannabis synergy and phytocannabinoid-terpenoid entourage effects. Br. J. Pharmacol. 163, 1344–1364. doi:10.1111/j.1476-5381.2011.01238.x.
Sakamoto, K., Akiyama, Y., Fukui, K., Kamada, H., and Satoh, S. (1998). Characterization; genome sizes and morphology of sex chromosomes in hemp (Cannabis sativa L.). Cytologia 63, 459–464.
Salzet, M., Breton, C., Bisogno, T., and Di Marzo, V. (2000). Comparative biology of the endocannabinoid system possible role in the immune response. Eur. J. Biochem. 267, 4917–4927. doi:ejb1550.
Sanger, F., Coulson, A. R., Friedmann, T., Air, G. M., Barrell, B. G., Brown, N. L., Fiddes, J. C., Hutchison, C. A., Slocombe, P. M., and Smith, M. (1978). The nucleotide sequence of bacteriophage phiX174. J. Mol. Biol. 125, 225–246. doi:10.1016/0022-2836(78)90346-7.
Schultes, R. E., Klein, W. M., Plowman, T., and Lockwood, T. E. (1974). Cannabis: an example of taxonomic neglect. Harvard University Botanical Museum Leaflets 23, 337–367.
Semagn, K., Bjørnstad, Å., and Ndjiondjop, M. N. (2006). Principles, requirements and prospects of genetic mapping in plants. African J. Biotechnol. 5, 2569–2587.
Shendure, J., Mitra, R. D., Varma, C., and Church, G. M. (2004). Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344. doi:10.1038/nrg1325.
Small, E., and Cronquist, A. (1976). A practical and natural taxonomy for Cannabis. Taxon 25, 405–435.
Small, E., Pocock, T., and Cavers, P. (2003). The biology of Canadian weeds. 119. Cannabis sativa L. Can. J. Plant Sci. 83, 217–237.
Staginnus, C., Zörntlein, S., and de Meijer, E. (2014).
A PCR marker linked to a THCA synthase polymorphism is a reliable tool to discriminate potentially THC-rich plants of Cannabis sativa L. J. Forensic Sci. 59, 919–26. doi:10.1111/1556-4029.12448.
Stres, B., Sul, W. J., Murovec, B., and Tiedje, J. M. (2013). Recently deglaciated high-altitude soils of the Himalaya: diverse environments, heterogenous bacterial communities and long-range dust inputs from the upper troposphere. PLoS One 8, e76440. doi:10.1371/journal.pone.0076440.
Swift, W., Wong, A., Li, K. M., Arnold, J. C., and McGregor, I. S. (2013). Analysis of Cannabis seizures in NSW, Australia: Cannabis potency and cannabinoid profile. PLoS One 8, 1–9. doi:10.1371/journal.pone.0070052.
Tanaka, T., Kawasaki, K., Daimon, S., Kitagawa, W., Yamamoto, K., Tamaki, H., Tanaka, M., Nakatsu, C. H., and Kamagata, Y. (2014). A hidden pitfall in the preparation of agar media undermines microorganism cultivability. Appl. Environ. Microbiol. 80, 7659–7666. doi:10.1128/AEM.02741-14.
Theisen, A. R., and Murrell, J. C. (2005). Facultative methanotrophs revisited. J. Bacteriol. 187, 4303–4305. doi:10.1128/JB.187.13.4303-4305.2005.
Tomasetti, C., and Vogelstein, B. (2015). Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347, 78–81. doi:10.1126/science.1260825.
Varshney, R. K., Nayak, S. N., May, G. D., and Jackson, S. A. (2009). Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol. 27, 522–530. doi:10.1016/j.tibtech.2009.05.006.

7. APPENDIX

Supplementary Figure 1. Maximum likelihood phylogeny of Acidimicrobiae OTU1% lineages. Aciditerrimonas ferrireducens (Itoh et al., 2011) was isolated from geothermal volcanic soils and can respire heterotrophically using a limited spectrum of sugars or reduce ferric iron under anaerobic conditions.

Supplementary Figure 2. K-mer analyses of genomic complexity, sampling effort and identity.
A) 15-mer spectrum plot including both Llullaillaco volcano metagenome libraries (yellow and blue) and various publicly available reference desert (black) and non-desert (grey) metagenomes. Datasets are publicly available from MG-RAST: all datasets from Fierer et al. (2012) and the Luquillo rainforest metagenome (4446153.3).

Supplementary Figure 3. Tetramer-based emergent self-organizing map, built from the volcano Pseudonocardia sp. contigs (large bright green squares) and the volcano Acidimicrobiae contigs (small green squares), plus the Streptomyces coelicolor genome (AL645882, purple squares) as an outgroup. The topology of this self-organizing map confirms the discriminatory power of the assembly coverage-level-based bins (Figure S3), particularly for the volcano Pseudonocardia sp. The Acidimicrobiae bin contigs appear to contain a higher degree of mixing across topological features, likely representing both misclassifications and transposon-driven horizontal gene transfer events.

Supplementary Figure 4. Histograms of average coverage levels of contigs from metagenome assembly.

Supplementary Figure 5. Alignment depths of all 43 Cannabis samples for all high-quality SNP loci.

Supplementary Table 1. Genes with Ka:Ks ratios ≥ 1 when comparing Pseudonocardia asaccharolytica with the site 1 volcano Pseudonocardia sp.
2_polyprenylphenol_hydroxylase_and_related_flavodoxin_oxidoreductases_CDS
hypothetical_protein_CDS
D_alanine_D_alanine_ligase_(EC_6_3_2_4)_CDS
SSU_ribosomal_protein_S30P_sigma_54_modulation_protein_CDS
ATPases_involved_in_chromosome_partitioning_CDS
ferrochelatase_(EC_4_99_1_1)_CDS
Cyanate_permease_CDS
NADH_dehydrogenase_subunit_J_(EC_1_6_5_3)_CDS
Predicted_phosphohydrolases_CDS
tRNA_(Guanine37_N(1)_)_methyltransferase_(EC_2_1_1_31)_CDS
Acyl_CoA_dehydrogenases_CDS
phosphoribosyl_AMP_cyclohydrolase_(EC_3_5_4_19)_CDS
alanine_racemase_(EC_5_1_1_1)_CDS
Protein_of_unknown_function_(DUF3159)_CDS
hypothetical_protein_CDS
succinyldiaminopimelate_aminotransferase_apoenzyme_(EC_2_6_1_17)_CDS
Leucyl_aminopeptidase_CDS
RNAse_PH_(EC_2_7_7_56)_CDS
2_3_dihydro_2_3_dihydroxybenzoyl_CoA_ring_cleavage_enzyme_CDS
Transposase_DDE_domain_CDS
pantothenate_synthetase_(EC_6_3_2_1)_CDS
Uncharacterized_conserved_protein_CDS
Protein_of_unknown_function_(DUF3000)_CDS
Prephenate_dehydrogenase_CDS
Protein_of_unknown_function_(DUF2029)_CDS
Formamidopyrimidine_DNA_glycosylase_CDS
Predicted_acyltransferase_CDS
probable_S_adenosylmethionine_dependent_methyltransferase_YraL_family_CDS
phosphomethylpyrimidine_kinase_CDS
ABC_type_multidrug_transport_system_ATPase_component_CDS
ABC_type_polysaccharide_polyol_phosphate_export_systems_permease_component_CDS
hypothetical_protein_CDS
Anaerobic_dehydrogenases_typically_selenocysteine_containing_CDS
coenzyme_F420_0_gamma_glutamyl_ligase_(EC_6_3_2_31)_CDS
6_phosphogluconolactonase_(EC_3_1_1_31)_CDS
tRNA_(5_methylaminomethyl_2_thiouridylate)_methyltransferase_(EC_2_1_1_61)_CDS
serine_threonine_protein_kinase_CDS
transketolase_(EC_2_2_1_1)_CDS
Uncharacterized_conserved_protein_(some_members_contain_a_von_Willebrand_factor_type_A_(vWA)_domain)_CDS
Helicase_conserved_C_terminal_domain_CDS
3_dehydroquinate_synthase_(EC_4_2_3_4)_CDS
methionyl_tRNA_formyltransferase_(EC_2_1_2_9)_CDS
hypothetical_protein_CDS
K+_transport_systems_NAD_binding_component_CDS
selenocysteine_specific_translation_elongation_factor_SelB_CDS
DNA_segregation_ATPase_FtsK_SpoIIIE_and_related_proteins_CDS
hypothetical_protein_CDS
Protein_of_unknown_function_(DUF3263)_CDS
leucyl_tRNA_synthetase_(EC_6_1_1_4)_CDS
6_7_dimethyl_8_ribityllumazine_synthase_(EC_2_5_1_78)_CDS
succinate_dehydrogenase_subunit_A_(EC_1_3_5_1)_CDS
phosphate_ABC_transporter_ATP_binding_protein_PhoT_family_(TC_3_A_1_7_1)_CDS
Uncharacterized_conserved_protein_CDS
Uridine_kinase_CDS
hypothetical_protein_CDS
ribonuclease_Rne_Rng_family_CDS
Transposase_IS116_IS110_IS902_family_Transposase_CDS
1_deoxy_D_xylulose_5_phosphate_reductoisomerase_(EC_1_1_1_267)_CDS
selenophosphate_synthase_(EC_2_7_9_3)_CDS
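
The selection criterion behind Supplementary Table 1 (Ka:Ks ≥ 1) can be illustrated with a minimal sketch. It assumes that per-gene Ka and Ks estimates have already been obtained from a codon-aware pairwise comparison of orthologs. The function name `filter_ka_ks`, the gene labels and all numeric values below are hypothetical placeholders for illustration only; they are not values or code from this study.

```python
from typing import Dict, List, Tuple


def filter_ka_ks(
    estimates: Dict[str, Tuple[float, float]],
    threshold: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (gene, Ka/Ks) pairs with a ratio >= threshold, highest ratio first.

    Genes with Ks <= 0 are skipped because the ratio is undefined for them.
    """
    hits = []
    for gene, (ka, ks) in estimates.items():
        if ks <= 0:
            continue  # no synonymous divergence: ratio cannot be computed
        ratio = ka / ks
        if ratio >= threshold:
            hits.append((gene, ratio))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical (Ka, Ks) estimates; not data from the thesis.
example = {
    "gene_A": (0.75, 0.5),   # ratio 1.5, retained
    "gene_B": (0.25, 0.5),   # ratio 0.5, dropped
    "gene_C": (0.10, 0.0),   # Ks = 0, skipped
    "gene_D": (0.50, 0.5),   # ratio 1.0, retained (threshold is inclusive)
}

result = filter_ka_ks(example)
print(result)
```

Skipping genes with Ks of zero is one possible design choice; an alternative would be to report them separately, since a zero Ks with a non-zero Ka may still be of interest.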