University of Colorado, Boulder
CU Scholar
Ecology & Evolutionary Biology Graduate Theses &
Dissertations
Ecology & Evolutionary Biology
Spring 1-1-2015
Genomics of Adaptation and Diversification
Ryan C. Lynch
University of Colorado Boulder
Part of the Biodiversity Commons, Bioinformatics Commons, Desert Ecology Commons,
Evolution Commons, and the Genomics Commons
Recommended Citation
Lynch, Ryan C., "Genomics of Adaptation and Diversification" (2015). Ecology & Evolutionary Biology Graduate Theses & Dissertations.
68.
http://scholar.colorado.edu/ebio_gradetds/68
This Dissertation is brought to you for free and open access by Ecology & Evolutionary Biology at CU Scholar. It has been accepted for inclusion in
Ecology & Evolutionary Biology Graduate Theses & Dissertations by an authorized administrator of CU Scholar.
GENOMICS OF ADAPTATION AND DIVERSIFICATION
by
Ryan Lynch
B.A., University of Colorado, 2004
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirement for the degree of
Doctor of Philosophy
Department of Ecology and Evolutionary Biology
2015
This thesis entitled:
Genomics of Adaptation and Diversification
written by Ryan Lynch
has been approved for the Department of Ecology and Evolutionary Biology
Dr. Nolan Kane
Dr. Noah Fierer
Date
The final copy of this thesis has been examined by the signatories, and we
find that both the content and the form meet acceptable presentation standards
of scholarly work in the above mentioned discipline.
Lynch, Ryan (Ph.D., Department of Ecology and Evolutionary Biology)
Genomics of Adaptation and Diversification
Thesis directed by Assistant Professor Nolan Kane
ABSTRACT
Recent and ongoing advances in DNA sequencing, coupled with computational developments,
have opened new frontiers for understanding the structure and function of biological
diversity. For my dissertation, I first addressed questions related to a low diversity community
of un-cultured Atacama Desert bacteria, using a variety of sequencing approaches. The principal
goal was to infer what metabolic traits these bacteria possess that allow them to survive harsh
desert conditions that other bacteria could not. Through detailed genome assembly, annotation
and comparative analyses I developed a working hypothesis that trace gas metabolism (H2, CO
and several organic C1 compounds) may sustain these microorganisms in their habitat, although
many aspects of their metabolic capacity remain undetermined. In the second component of my
dissertation, I used whole genome sequencing of diverse Cannabis accessions to infer the
phylogenetic lineages of this genus. These findings show support for at least one major clade of
hemp and two clades of drug-type Cannabis, as well as hybrid origins of many commercially
available modern drug-type cultivars. The levels of divergence among these clades suggest
multiple independent domestication events may have occurred, though extensive breeding for
hemp and drug-type strains from a single domestication origin cannot be ruled out at present, due
to the lack of known true wild populations. This work has relevance for future cultivar
development, but also reflects long forgotten events that have occurred during 6,000 years of the
Cannabis-human relationship. Together, my dissertation demonstrates several ways that DNA
sequencing technology and analytical approaches can address questions in ecology and
evolutionary biology, but also highlights the limitations of these methods and underscores the
importance of complementary non-sequencing approaches.
ACKNOWLEDGMENTS
I thank my committee members Dr. Nolan Kane, Dr. Noah Fierer, Dr. Andy Martin, Dr. Diana
Nemergut and Dr. Erin Trip for their guidance, support and inspiration through my doctoral work
and career development. The time spent getting to know and work with each of you has been a
privilege I will never forget. I also thank former committee member Dr. Patrik Nosil. Dr. Mike
Robeson, Jack Darcy, Jon Leff and Dr. Daniela Vergara have also been great friends and
collaborators through my time here in the EBIO department. I need to thank Dr. Chris Jung, Dr.
Collin Becker and Ryan Artale for all the years of adventures in Colorado and beyond; you can’t
work all the time if you want to get anything done. However, the original inspiration to follow
my curiosity is my father, Dr. James F. Lynch (PhD, CU Mathematics 1977)--who himself has
never stopped in the pursuit of understanding biological complexity.
CONTENTS
1 INTRODUCTION
2 THE POTENTIAL FOR MICROBIAL LIFE IN THE HIGHEST-ELEVATION (>6000 M.A.S.L.) MINERAL SOILS OF THE ATACAMA REGION
2.1 Abstract
2.2 Introduction
2.3 Methods
2.4 Results
2.5 Discussion
2.6 References
3 METAGENOMIC EVIDENCE FOR METABOLISM OF TRACE ATMOSPHERIC GASES BY HIGH-ELEVATION DESERT ACTINOBACTERIA
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.4 Results
3.5 Discussion
3.6 References
4 GENOMIC DIVERSITY IN CANNABIS
4.1 Introduction
4.2 Materials and Methods
4.3 Results and Discussion
5 SUMMARY
6 BIBLIOGRAPHY
7 APPENDIX
CHAPTER 1
INTRODUCTION
I started my Ph.D. at a time of optimistic rhetoric for DNA sequencing and genomic science
(Varshney et al., 2009). In the early years of DNA sequencing, starting with the method of
Sanger et al. (1978), data generation was clearly a major limiter. Through the 1980s entire Ph.D.
dissertations were sometimes based on sequencing only a small gene (Dovichi and Zhang, 2000).
Driven by the promise of revolutions in human medicine, various new high-throughput
sequencing platforms came and went through the market during the 1990s and 2000s (Shendure
et al., 2004). The early years of high-throughput sequencing were limited to a few specialized
and well-funded labs equipped with the required technical expertise. But the early insights (and
papers) from these efforts paved the way for the broad democratization of high-throughput
sequencing technology and analysis tools. Software development for analysis of large sequence
datasets has continued to accelerate and many new training programs for computationally
oriented biologists were launched in the 2000s. By 2010, when I started towards a Ph.D., high-throughput sequencing techniques and buzzwords were already dominating many areas of
biological research and discourse. Now, nearly five years later, these trends continue, with
affordable computing power and continuous improvements to software facilitating the analysis
of terabase-scale population genomic datasets. Neither data quantity nor computer horsepower
can currently be considered a limiting factor for developing biological insight from DNA
sequence data; however, other factors now appear to constrain our ability to penetrate biological
complexity.
The connection between genotype and phenotype remains unclear in many situations
(Tomasetti and Vogelstein, 2014), and various projects and techniques have come under fire for
overinterpreting genomic evidence (Graur et al., 2013; Hanage, 2014). Improving the statistical
framework used to identify neutral and adaptive diversity across large populations of whole
genome scale datasets remains the major challenge facing the field of ecological and
evolutionary genomics (Leinonen et al., 2013; Flaxman et al., 2014). Understanding the current
analytical methods for interpreting genomic data, and their current limitations, was a major
objective of my Ph.D. work, and though these efforts span many projects and study systems, the
question remained the same: how can DNA sequence data be used to understand biological
diversity and function?
The microbial ecology portion of my work started from one simple observation: high
elevation Atacama Desert volcano samples host a different, and simpler, community of bacteria
compared to anything else. Even without statistical analyses I could simply see the difference in
the DNA sequence data from my first PCR-based study of this gravelly material. What about
these bacteria makes them able to survive conditions that other bacteria could not? From my
naive observations as a lab technician, I decided to address this question using genomic
techniques. My rationale was that if I could determine the metabolic traits these bacteria possess,
then I could compare these to traits that other related bacteria possess, and thus develop a
picture of how high-elevation bacteria live in such an environment. Because almost nothing is
known about this desert mountaintop habitat, the trait-based inferences could also be used to
understand how this environment shapes the community and its resident organisms. What
environmental stressors limited life here so drastically? Thus my Ph.D. started from five one-
gallon zip lock bags that were partially filled with crumbled dusty volcanic debris--I knew little
about the organisms present in the rocky matrix, but even less about their far-away habitat.
To address these questions I present first my initial biogeochemical descriptions of the
Atacama Llullaillaco Volcano sites, as well as a brief amplicon-based assay of the bacterial,
microbial eukaryotic and archaeal communities (Chapter 2). I also introduce the hypothesis that
bacterial trace gas metabolism serves as an energy source for this microbial community, and present
limited support in the form of partial carbon monoxide dehydrogenase genes that were PCR amplified
from bulk soil genomic DNA extractions. Expanding the breadth and depth of genomic analyses
in Chapter 3, I analyze full shotgun metagenomic datasets, some derived from bulk Llullaillaco
Volcano soil genomic DNA extractions. These analyses include: statistical tests of the
community structure, gene functional category comparisons to other diverse microbial
ecosystems, de novo genome assembly of high abundance community members, metabolic
pathway analysis and selection detection across genomes through synonymous to nonsynonymous
mutation rate ratio estimations in protein coding genes. Ultimately these efforts
produced details regarding some aspects of the metabolic potential of Llullaillaco Volcano
bacteria, and supported the further development of the trace gas metabolism hypothesis to
include H2, CO and several organic C1 compounds. However, this work fails to fully answer the
question of why these organisms were found at 21,000 feet elevation above the driest desert on
Earth, rather than any of the countless alternative bacterial types or species known. Overall this
work highlights both the strengths and the weaknesses of DNA sequence based biology and
ecology, and reinforces the importance of study design and use of complementary non-sequencing approaches.
The Cannabis diversity and evolution portion of my work represents a step towards
realizing my long term goal of utilizing genomic methods to understand and manipulate
biological traits. Even though Cannabis is one of the earliest domesticated crops and produces the
most widely used illicit drug in the world, scientific study has been limited to only a few research
groups scattered around the globe. Cannabis occupies a unique cultural, political and scientific
position. It is a polarizing but charismatic plant, both widely dismissed as harmful, and praised
for its curative potential. Although the future legal status of Cannabis remains hazy, it is here to
stay in one form or another. The long history of human domestication, dispersal to every
continent and divergent selection for both drug and hemp types makes for a rare set of scientific
opportunities. In Chapter 4, I present a whole genome re-mapping analysis of 43 Cannabis
accessions from geographically and morphologically diverse hemp and drug-type
strains. Through analyzing over 10 million single nucleotide polymorphisms (SNPs) across the
single copy portion of the genome I address basic questions related to Cannabis diversity and
population structure: how many Cannabis lineages are contained within the colloquially used
terms sativa and indica? And which genomic regions from modern cultivars originate from
different known landrace populations?
CHAPTER 2
THE POTENTIAL FOR MICROBIAL LIFE IN THE HIGHEST-ELEVATION
(>6000 M.A.S.L.) MINERAL SOILS OF THE ATACAMA REGION¹
2.1 Abstract
Here we present the first culture-independent microbiological and biogeochemical study of the
mineral soils from 6000 m above sea level (m.a.s.l.) on some of the highest volcanoes in the
Atacama region of Argentina and Chile. These soils experience some of the harshest
environmental conditions on Earth including daily temperature fluctuations across the freezing
point (with an amplitude of up to 70 °C) and intense solar radiation. Soil carbon and water levels
are among the lowest yet measured for a terrestrial ecosystem and enzyme activity was near or
below detection limits for all microbial enzymes measured. The soil microbial communities were
among the simplest yet studied in a terrestrial environment and contained novel Bacteria and
Fungi and only one Archaeal phylotype. No photosynthetic organisms were detected but several
of the dominant bacterial phylotypes are related to organisms involved in carbon monoxide
oxidation on other volcanoes (e.g., Pseudonocardia and Ktedonobacter spp.). Focused studies of
a gene responsible for carbon monoxide oxidation, the large subunit of carbon monoxide
dehydrogenase (coxL of CODH), revealed several novel lineages and a broad diversity of
coxL genes. Overall our results suggest that a unique microbial community, sustained by diffuse
atmospheric and volcanic gases, is barely functioning on these volcanoes, which represent the
highest terrestrial ecosystems yet studied.
¹ Published as: Lynch, R. C., King, A. J., Farías, M. E., Sowell, P., Vitry, C., and Schmidt, S. K. (2012). The potential for
microbial life in the highest elevation (>6000 m.a.s.l.) mineral soils of the Atacama region. J. Geophys. Res. 117,
G02028.
2.2 Introduction
Studies of microbial life in extremely dry environments have focused mostly on low elevation
areas such as the Dry Valleys of Antarctica [Cary et al., 2010] and the Atacama Desert [Connon
et al., 2007; Lester et al., 2007]. Due to its status as the driest desert of the planet, the lower
elevation regions of the Atacama have served as a natural testing ground for the dry-limit of
microbial life [Navarro-González et al., 2003]. In the hyper-arid core of the Atacama, mean
annual rainfall is less than 5 mm/year (with decadal periods of no rainfall), which appears to be
below the threshold of water availability required to support soil phototrophic life [Warren-Rhodes et al., 2006]. At slightly higher elevations in the Atacama region precipitation allows for
sparse vegetation in a zone between 3000 and 4900 m.a.s.l. [Arroyo et al., 1988; Richter and
Schmidt, 2002]. At elevations above 5000 m.a.s.l., extreme conditions create a Mars-like
landscape (totally devoid of plant life) that receives intermittent snowfall, most of which
sublimates back to the atmosphere [Richter and Schmidt, 2002]. Very little work has been done
on un-vegetated soils above 5000 m.a.s.l. [Schmidt et al., 2009, 2011], especially on the large
stratovolcanoes that dot the Atacama region [Costello et al., 2009; Halloy, 1991]. Volcán
Llullaillaco (6739 m.a.s.l.) and Volcán Socompa (6051 m.a.s.l.) are part of a chain of
stratovolcanoes that comprise the Andean Central Volcanic Zone [Stern, 2004], which rise above
the “true desert” zone [Arroyo et al., 1988] and altiplano of the Atacama region. Although these
volcanoes receive snowfall, they are at present largely un-glaciated [Richards and Villeneuve,
2001], making their upper reaches some of the highest-elevation exposed soil and lithic
environments on Earth (Figure 2.1).
Figure 2.1. Photographs of the Volcán Llullaillaco (6739 m.a.s.l.) taken during the February
2009 expedition. (a) The upper plant-free zone as viewed from an elevation of ∼5400 m.a.s.l.
during the climb (from the east). (b) Southeastward view toward Cerro Rosado from the high
sample site at 6330 m.a.s.l. These soils were deposited at least 48,000 years ago [Richards and
Villeneuve, 2001] but are still unvegetated. (Photo credit: Preston Sowell)
In spite of the interest in the Atacama region [Connon et al., 2007; Lester et al., 2007;
Warren-Rhodes et al., 2006], the upper band of unvegetated mineral soil and rock that extends
from 5000 to over 6700 m.a.s.l. has received little attention except from archeologists [Wilson et
al., 2007]. Initial exploration of the upper unvegetated zone on Volcán Socompa in 2005
revealed a low diversity microbial community, and low levels of organic matter (0.03%) in the
mineral soils at 5235 m.a.s.l. [Costello et al., 2009]. The present study was undertaken to
determine if soils at significantly higher elevations (>6000 m.a.s.l.) in this region are similarly
depauperate, or if the increased snowfall at higher elevations counterbalances the harsh
conditions in a way that increases either diversity or activity of microbial communities.
Here we report on the results of the first biogeochemical and cultivation-independent
exploration of the potential for microbial activity in mineral soils above 6000 m.a.s.l. in the
Andean Central Volcanic Zone. These data suggest that a low diversity, low energy ecosystem of
unique and previously uncharacterized microbes may function during periodic episodes of
favorable conditions. Although oxygenic phototrophs are absent from all samples, we suggest the
ecosystem has at most two trophic levels and is subsisting on both aeolian organic carbon inputs
as well as chemoautotrophic CO2 fixation and trace gas oxidation.
2.3 Methods
Soil samples and data logger data were collected during the austral summer in mid-February
2009 at elevations ranging from 5500 to 6330 m.a.s.l. on Volcán Socompa and Volcán
Llullaillaco. Soils used for biogeochemical and microbial diversity measurements were collected
on February 14 from six spatially separated samples (to four cm depth) in a semi-nested
sampling scheme [King et al., 2008, 2010b] at elevations of 6034 m.a.s.l. and 6330 m.a.s.l. on
Volcán Llullaillaco. Soil temperatures at four cm depth and the soil surface were recorded every
15 min at two sites on Volcán Socompa and Volcán Llullaillaco using HOBO Pendant data
loggers (UA-002-08, Onset Computer Corp., Bourne, Mass.). The data from the loggers were
also used to calculate rates of sub-zero soil cooling, a parameter that can profoundly affect
microbial survival in soils [Henry, 2007; Lipson et al., 2000; Schmidt et al., 2009]. Rates of sub-zero soil cooling were estimated by using linear regressions of soil temperatures after soils
dropped below 0 °C. The rates obtained were deemed reliable if the R² value from the regression
was greater than 0.96 from at least five data points during the linear cooling period. This
sampling expedition was part of a broader global study of biodiversity at high elevation sites in
the Andes, Rockies, and Himalayan mountain ranges and more information about sampling
protocol and sites has been published previously [Freeman et al., 2009; King et al., 2010a,
2010b; Schmidt et al., 2011].
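The reliability criterion for the sub-zero cooling rates described in this section (a linear regression with R² greater than 0.96 from at least five data points) can be illustrated with a short sketch. This is a minimal illustration only, not the analysis script used for this study: the function names are hypothetical, the readings in the usage note are invented, and reporting cooling as a positive rate in °C per hour is an assumed convention.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("observations have no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def subzero_cooling_rate(hours, temps_c, min_points=5, min_r2=0.96):
    """Cooling rate in degrees C per hour from readings below 0 degrees C.

    Returns None when the fit does not meet the reliability criteria
    (too few sub-zero points, or R^2 not above the threshold).
    Assumption: the caller passes readings from the cooling period only.
    """
    points = [(h, t) for h, t in zip(hours, temps_c) if t < 0.0]
    if len(points) < min_points:
        return None
    xs, ys = zip(*points)
    slope, _, r_squared = linear_fit(xs, ys)
    if r_squared <= min_r2:
        return None
    return -slope  # positive value means the soil is cooling
```

For example, six hypothetical readings taken every 15 min (`hours = [0, 0.25, 0.5, 0.75, 1.0, 1.25]`) that fall steadily from −1.0 °C at 1.5 °C per hour pass both criteria, whereas the first four readings alone are rejected for having fewer than five points.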
Dissolved organic carbon (DOC) and nitrogen (DON) and microbial biomass carbon
(MBC) and nitrogen (MBN) were determined using a Shimadzu TOC-V CSN Analyzer with a
previously described protocol [King et al., 2008]. Total nitrogen (N) and carbon (C)
measurements were performed according to the method of Nemergut et al. [2007], wherein soils
were dried and sieved to 2 mm then ground to a fine powder and measured for percent C and N
by mass using a Carlo-Erba combustion-reduction elemental analyzer (CE Elantech, USA). Soil
water content was measured gravimetrically as the difference between the weight of the soils at
field conditions and the weight after drying at 80 °C for 48 h. Soil pH was determined using a
glass pH probe (Oakton Instruments, Vernon Hills, IL, USA) in soil slurries consisting of 2 g soil
and 2 ml of water that were shaken for 1 h. Levels of common microbially produced
extracellular enzymes were also measured using standard techniques adapted for cold soils as
described by King et al. [2008, 2010b]. Enzyme activities assayed were: N-acetylglucosaminase,
cellulase (β-glucosidase), α-glucosidase, β-xylase, cellobiosidase, leucine aminopeptidase and
phosphatase. For each sample, 2 g of soil was added to 150 ml of buffer (adjusted to the pH of
the soil) and homogenized at 3000 rpm for 1 min using an Ultra-Turrax homogenizer (IKA
Works Inc., USA). Soil slurries were incubated for 20 h at 14 °C using the controls, fluorescent
substrates, and volumes as described in King et al. [2008].
DNA was extracted from the soils using the MO BIO Power Soil bead beating kit (MO
BIO Laboratories, Inc., Carlsbad, CA, USA). Community small-subunit ribosomal DNA was
PCR amplified using the 18S/16S primers 4Fa-short (5′-ATCCGGTTGATCCTGC-3′) and 1492R
(5′-GGTTACCTTGTTACGACTT-3′), and 16S primers 8F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1391R (5′-GACGGGCGGTGWGTRCA-3′). The large subunit of the carbon monoxide
dehydrogenase gene (coxL) was targeted for PCR amplification using the primers Ompf (5′-GGCGGCTTYGGSAASAAGGT-3′) and O/Br (5′-YTCGAYGATCATCGGRTTGA-3′) [King,
2003b]. Amplicons were then gel purified, and cloned as described elsewhere [Schmidt et al.,
2011]. Cell pellets were sent to Functional Biosciences (Madison, WI, USA) for plasmid
extraction, and bidirectional Sanger sequencing. Sequences were vector trimmed and assembled
into contigs using SEQUENCHER 4.6 (Gene Codes Co., Ann Arbor, MI, USA). For the
ribosomal small subunit data, the full contigs were then aligned with the SINA aligner tool
[Pruesse et al., 2007]. The parsimony insertion function of ARB (5.1) was then utilized to
determine the nearest relatives in the Silva 108 database, which formed the basis for taxonomy
assignment [Ludwig et al., 2004]. An iterative process of calculating neighbor joining trees using
the Felsenstein correction and a 35% minimum identity per residue filter in ARB, with National
Center for Biotechnology Information (NCBI) web-based BLASTN homology tests [Altschul et
al., 1990], was used to refine our sequence classifications. We then clustered our sequences with
the select database guide sequences into 97% identity operational taxonomic units (OTUs) using
the average neighbor algorithm implementation in mothur [Schloss et al., 2009]. For the coxL
data set, our multiple sequence alignment (MSA) of translated amino acids was built in ClustalX
(2.0) [Larkin et al., 2007], and anchored around the essential active site motifs. The ‘Form I’
(OMP) motif (AYXCSFR), which appears to be restricted to functional coxL genes [King and
Weber, 2007], is 100% conserved in the MSA. A final uncorrected neighbor-joining tree with
1000 bootstrap replicates was calculated in ClustalX (2.0) after top scoring NCBI BLASTP hits
were added into the MSA. Comparisons of microbial community beta diversity among sites were
done using weighted UniFrac analysis [Lozupone and Knight, 2005].
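The clustering of sequences into 97% identity OTUs with the average neighbor algorithm, as described above, can be illustrated with a small average-linkage routine. This is a generic sketch of the idea and not mothur's implementation: the function name, the toy distance matrix in the usage note, and the rule of merging while the average distance is at or below the 0.03 cutoff are all assumptions made for illustration.

```python
def average_neighbor_otus(dist, cutoff=0.03):
    """Group sequences into OTUs by average-linkage clustering.

    dist is a symmetric matrix (list of lists) of pairwise distances,
    for example uncorrected distances where 0.03 means 3% difference.
    Two clusters are merged while the mean of all pairwise distances
    between their members does not exceed the cutoff.
    Returns a sorted list of OTUs, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        total = sum(dist[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > cutoff:
            break  # no remaining pair is similar enough to merge
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)
```

With a hypothetical four-sequence matrix in which sequences 0, 1 and 2 differ from each other by 1 to 2% and sequence 3 differs from all of them by about 20%, the routine returns two OTUs: `[[0, 1, 2], [3]]`.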
2.4 Results
During our expedition in February of 2009 we were able to deploy data loggers at two high
elevation sites on Volcán Socompa and Volcán Llullaillaco to gain a preliminary indication of
how soil temperatures vary on a diurnal basis. Due to time and weather restrictions, these data
loggers were deployed at the highest camps on each volcano at elevations slightly lower than the
sampling sites. Nonetheless, the data paint an extraordinary picture of the temperature
fluctuations faced by life in these high elevation soils. Temperatures on Socompa volcano at
5500 m.a.s.l. dropped to overnight lows of −10 °C and reached highs of 56 °C by midday on the
soil surface (Figure 2.2a). On Volcán Llullaillaco we were only able to deploy data loggers for
16 h at 5737 m, but they show a similar trajectory of subfreezing overnight lows (−15 °C)
followed by a rapid rise in temperatures in the morning (Figure 2.2b). Linear rates of subzero
temperature decline (at 4 cm depth) were 1.15 °C h⁻¹ (R² = 0.996) and 1.50 °C h⁻¹ (R² = 0.991) on
Volcán Socompa and Volcán Llullaillaco, respectively.
Figure 2.2 Diurnal temperature fluctuations on Volcán Socompa and Volcán Llullaillaco. (a)
Surface soil temperatures at base camp (5500 m.a.s.l.) on Socompa Volcano ranged from a low
of −10.2°C to a high of 56.2°C with an amplitude of 66.4°C. Temperature extremes at 4 cm
depth were dampened with an amplitude of 48.2°C. (b) Due to an incoming storm, we were
unable to capture the full diurnal temperature cycle at high camp on Volcán Llullaillaco
(5737 m.a.s.l.), but nighttime lows reached −14.5°C and −9.4°C at the surface and 4 cm depth,
respectively.
Our analyses demonstrate for the first time the truly oligotrophic status of these soils,
with levels of carbon similar to other almost lifeless soils. In addition, total nitrogen values were
below detection limits in all samples, indicating that nitrogen levels in these soils are less than 25
μg N g soil⁻¹. Likewise, microbial biomass C and N were extremely low (Table 2.1). Water
levels in the soils at the time of sampling were also extremely low, and the soils were quite
acidic (Table 2.1). Levels of common microbial extracellular enzymes were also mostly
undetectable despite the fact that methods were employed to increase the sensitivity of these
measurements for cold oligotrophic soils. Our comprehensive 16S and 18S targeted surveys of
the soil community revealed a microbial community noteworthy for overall low diversity and the
phylogenetic uniqueness of the component community members (Figure 2.3). The Chao1 species
richness estimate for bacteria, pooled from five sample sites, is 95 OTUs (97% identity).
Nearly 75% of that total bacterial diversity is contained within just four OTUs, and our sampling
effort recovered representatives from only nine bacterial phyla. All between-site comparisons of
community beta diversity (weighted UniFrac) were significantly different at both 5 m and 300 m scales (P <
0.05). Of particular interest is the shift in dominance of a Pseudonocardia-like OTU at our lower
elevation (6034 m.a.s.l.) sites, to the dominance of a relative of the Ktedonobacter genus at the
6330 m.a.s.l. sites.
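The Chao1 richness estimate reported above extrapolates the total number of OTUs from how many OTUs were observed exactly once and exactly twice. The sketch below uses the bias-corrected form of the estimator; which variant was used for the value of 95 OTUs is not stated here, so the choice of formula is an assumption, and the function name and example counts are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    otu_counts holds the number of sequences assigned to each OTU.
    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are
    the numbers of OTUs observed exactly once and exactly twice.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))
```

For a hypothetical community with counts `[120, 80, 40, 10, 2, 2, 1, 1, 1]`, nine OTUs are observed, three are singletons and two are doubletons, so the estimate is 9 + (3 × 2) / (2 × 3) = 10 OTUs.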
Figure 2.3 Normalized rank-abundance plot for the three domains of life. Each bar represents a
single operational taxonomic unit (OTU), and vertical bar height is proportional to the total
number of sequences for all OTUs. OTUs were assembled using a 3% maximum difference as
determined by the average-neighbor algorithm. Values in parentheses following OTU names are
the uncorrected genetic distance to the nearest National Center for Biotechnology Information
(NCBI) database match. Number of sequences: 512 Bacteria, 318 Eukaryotes, and 81 Archaea.
Table 2.1. Biogeochemical Properties of the High-Elevation Mineral Soils of Volcán Llullaillaco (a)

Property                             Low Site               High Site
M.A.S.L.                             6034                   6330
UTM coordinates                      19J 0548168 7266202    19J 0547614 7266157
Percent water                        0.24 (0.1)             0.25 (0.2)
TOC (%)                              0.017 (0.006)          0.005 (0.005)
TON (%)                              <d.l. (b)              <d.l.
Extractable DOC (μg g dry soil⁻¹)    1.3 (0.9)              2.0 (1.2)
Extractable TDN (μg g dry soil⁻¹)    0.7 (0.4)              0.6 (0.5)
pH                                   4.2 (0.03)             4.6 (0.1)
Microbial biomass C (μg g⁻¹)         30.61 (30.61)          58.07 (24.6)
Microbial biomass N (μg g⁻¹)         2.24 (0.52)            1.15 (0.9)
BG (nmol h⁻¹ g⁻¹)                    0.24 (0.1)             0.25 (0.2)
NAG (nmol h⁻¹ g⁻¹)                   0.02 (0.02)            0.05 (0.006)
PHO (nmol h⁻¹ g⁻¹)                   0.17 (0.16)            0.26 (0.09)

(a) Enzyme activities are abbreviated as BG for β-glucosidase, NAG for N-acetylglucosaminase, and PHO for
phosphatase. Activity of α-glucosidase, β-xylase, cellobiosidase, and leucine aminopeptidase was below
detection limit. All values are the means of at least 3 replicates with the standard error of the mean in
parentheses.
(b) Below detection limit.
Eukaryotic diversity was restricted to only seven 18S OTUs (97% identity), and 92% of
the sequences recovered (>300 in total) belonged to a single novel OTU. This dominant OTU is
most closely related to endolithic and xerotolerant members of the Cryptococcus albidus clade
(Figure 2.4). Archaeal diversity was limited to just one 16S OTU across all sites, which is most
closely related to the obligate oligotrophs of the phylum Thaumarchaeota.
Figure 2.4 Bayesian consensus tree of basidiomycetous yeasts from Llullaillaco and Socompa
volcanoes, with established representatives of the Cryptococcus clades shown for reference. The
length of the rectangles is relative to the abundance of sequences within each 1% OTU, with the
smallest rectangle equaling one sequence. Asterisks indicate node support of >70% posterior
probability. Data are from Schmidt et al. [2012].
Absent from our data are any known chlorophyll-containing clades of bacteria or algae.
The lack of traditional photoautotrophs was partially confirmed by the lack of observable
autofluorescence (680 nm) using the same methods that detected very low levels of chlorophyll-containing
algae and cyanobacteria in high elevation soils of the Himalayas. Given the lack of
evidence for phototrophic primary production, we began a preliminary exploration of other
means of carbon and energy acquisition in these soils. Sequences of the large subunit of the
carbon monoxide dehydrogenase gene (coxL of CODH) from Volcán Llullaillaco soils are at
minimum 5% different, and in one instance up to 22% different, compared to their nearest
database relatives (Figure 2.5). These nearest relatives are for the most part uncultured
representatives from other oligotrophic volcanic deposits and cultured Actinobacteria (rather
than common CO oxidizing Proteobacteria), which reinforces the general phylogenetic signal
from our SSU rDNA data. Additionally, despite these large genetic distances, we are confident in
the coxL homology of these sequences, due to the 100% conservation of the primary catalytic
site motif (as well as four other separate sites) that contact an essential molybdopterin cytosine
dinucleotide cofactor.
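The conservation check on the catalytic site motif can be illustrated with a simple pattern search over translated sequences. This sketch assumes that the X in the Form I motif AYXCSFR (given in the Methods) stands for any single amino acid; the function names and the example sequences in the usage note are hypothetical and are not the Llullaillaco coxL sequences.

```python
import re

# Form I (OMP) active site motif AYXCSFR, with X read as any one residue.
FORM_I_MOTIF = re.compile(r"AY[A-Z]CSFR")


def has_form_i_motif(protein_seq):
    """Return True if the translated sequence contains the Form I motif.

    Alignment gap characters ("-") are removed before searching.
    """
    ungapped = protein_seq.upper().replace("-", "")
    return FORM_I_MOTIF.search(ungapped) is not None


def motif_conservation(protein_seqs):
    """Fraction of sequences that carry the Form I motif."""
    if not protein_seqs:
        raise ValueError("no sequences supplied")
    hits = sum(1 for seq in protein_seqs if has_form_i_motif(seq))
    return hits / len(protein_seqs)
```

A made-up fragment such as `"MKTAYRCSFRGG"` contains the motif, while `"MKTAYRCSYRGG"` does not, so `motif_conservation` on those two sequences returns 0.5. A value of 1.0 across a clone library would correspond to the 100% conservation described above.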
Figure 2.5 Carbon monoxide can be oxidized by the metalloprotein carbon monoxide
dehydrogenase (CODH). Here we show an uncorrected neighbor-joining tree of the translated
proteins of the catalytic coxL sequences that were PCR amplified from high-elevation Volcán
Llullaillaco soils (with the nearest NCBI database relatives). These sequences may represent the
first reported instances of psychrotolerant or psychrophilic CO oxidizers.
2.5 Discussion
Taken together, these results suggest that conditions in the high-mountain mineral soils above
6000 m.a.s.l. are more restrictive to life than nearly anywhere on the surface of Earth. Despite
potentially higher water availability due to orographic snowfall compared to the lower elevation
portions of the Atacama, high-elevations pose additional challenges to life. The thinness of the
atmosphere exposes any surface life to severe solar radiation [Farías et al., 2009], and massive
daily temperature cycles across the freezing point (Figure 2.2) UV exposure for only 1 day,
combined with extreme aridity, has been previously shown to sterilize both monolayers of
Chroococcidiopsis, as well as dormant Bacillus endospores at just 1000 m.a.s.l. in the Atacama
[Cockell et al., 2008]. Given that UV intensity increases 4–10% every 1000 m in elevation
gained [Cabrera et al., 1995], our sites above 6000 m.a.s.l. may be subjected to the most UV
exposure of any terrestrial soil environment studied to date.
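To make the elevation scaling concrete, the rough sketch below compares UV intensity at 6000 m.a.s.l. with that at 1000 m.a.s.l. It assumes that the 4–10% gain compounds over each 1000 m step (a linear reading would give somewhat smaller numbers); the function name and the choice of baseline are illustrative only.

```python
def uv_multiplier(elev_low_m: float, elev_high_m: float, gain_per_km: float) -> float:
    """Relative UV intensity at elev_high_m versus elev_low_m.

    Assumes the fractional gain per 1000 m compounds multiplicatively.
    """
    steps = (elev_high_m - elev_low_m) / 1000.0
    return (1.0 + gain_per_km) ** steps


# Lower and upper bounds of the 4-10% per 1000 m range cited in the text.
for gain in (0.04, 0.10):
    ratio = uv_multiplier(1000, 6000, gain)
    print(f"{gain:.0%} per 1000 m -> {ratio:.2f}x the intensity at 1000 m")
```

Under this assumption, the 6000 m sites would receive roughly 1.2 to 1.6 times the UV intensity of the 1000 m sites where sterilization was observed.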
Daily temperature cycling across the freezing point is considered a key challenge that
severely limits net primary productivity in the similarly extreme Dry Valleys of Antarctica [Cary
et al., 2010]. At Dry Valley sites, temperatures vary more than 20°C per day during the austral
summer, resulting in annual net primary productivity (NPP) in the 1–20 g carbon m⁻² yr⁻¹ range
[Aislabie et al., 2006; Novis et al., 2007]. During our 2009 expedition to the mountains of the
Atacama region, mineral soils at 5500 m.a.s.l. experienced triple the diurnal temperature
fluctuations of Antarctic Dry Valley soils (Figure 2.2). While winter Antarctic Dry Valley soil
temperatures stay well below freezing, with daily minimums of −40 to −60°C, insulating
mountain-top snow cover could potentially offer a dark microbial niche [Freeman et al., 2009;
Ley et al., 2004], but no data currently exist for the duration and depth of snow cover (or
wintertime temperatures) above 6000 m in the Atacama region.
Previous work has also shown that the rate of freezing is an important parameter
determining the survivability of microbes in cold terrestrial ecosystems [Henry, 2007; Lipson et
al., 2000; Schmidt et al., 2009]. For example, Lipson et al. [2000] showed that alpine tundra
microbial biomass levels were significantly depressed by cooling rates of over 1.4°C h⁻¹
(measured at the soil surface) but were largely unaffected by slower rates of soil cooling. The
linear cooling rates recorded on Volcán Llullaillaco (1.50°C h⁻¹ at 4 cm depth) were faster than
1.4°C h⁻¹ and were comparable to the highest rates of subzero soil cooling yet reported (1.83°C h⁻¹),
measured during the austral winter at 5400 m.a.s.l. in barren, peri-glacial soils of the Peruvian
Andes [Schmidt et al., 2009]. The rate of soil freezing on Volcán Llullaillaco is also much faster
than that measured in limited studies of high elevation soils (5000 m.a.s.l.) in the Himalayas and
Tibetan Plateau [cf. King et al., 2010a, Yang et al., 2003].
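A linear cooling rate of the kind compared above can be estimated as the least-squares slope of temperature against time. The sketch below is only an illustration: the function name and the logger readings are hypothetical, and it is not a reconstruction of how the rates reported here were derived.

```python
def linear_cooling_rate(hours, temps_c):
    """Least-squares slope of temperature (deg C) against time (h), with the
    sign flipped so that cooling is reported as a positive rate in deg C per hour."""
    n = len(hours)
    if n != len(temps_c) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(hours) / n
    mean_y = sum(temps_c) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, temps_c))
    return -sxy / sxx


# Hypothetical hourly readings from a buried temperature logger.
hours = [0, 1, 2, 3, 4]
temps = [0.0, -1.5, -3.0, -4.5, -6.0]
rate = linear_cooling_rate(hours, temps)

# 1.4 deg C per hour is the threshold reported by Lipson et al. [2000].
print(f"cooling rate: {rate:.2f} deg C per hour; exceeds 1.4: {rate > 1.4}")
```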
The average organic carbon value from our six sites on Volcán Llullaillaco (163 μg C g
soil⁻¹) classifies these high-mountain soils as highly oligotrophic, at the low end of the range
found in other extreme deserts [Drees et al., 2006; Parsons et al., 2004]. Soils on the hyper-arid
desert floor of the Atacama contain organic carbon values consistently below those of the samples
studied here. However, pyrolysis-GC-MS analysis of the desert floor organic carbon revealed a
much simpler mixture of organic compounds than that released from living microbes
[Navarro-González et al., 2003]. This, and other evidence, suggests that life is rarely if ever active in some
parts of the soil in the hyper-arid core of the Atacama. Conversely, 18-year-old volcanic deposits
on Kilauea volcano of the Hawaiian archipelago reportedly contain only slightly more (200 μg C
g soil⁻¹) organic carbon than the >6000 m.a.s.l. soils, yet conclusively demonstrate in situ
biological uptake of CO2, CO, and H2 [King, 2003a]. Although exact ages for the parent
volcanic deposits of our samples are currently undetermined, we know they are much older than
0.048 ± 0.012 Ma, based on the work of Richards and Villeneuve [2001]. On Volcán
Llullaillaco the early colonizers appear to have gained a foothold, but unlike less restrictive
environments, are never supplanted by later successional communities even after tens of
thousands of years.
In addition to low TOC values for Llullaillaco soils, our estimates of microbial biomass
carbon (MBC) were also extremely low (Table 1). These values are similar to those measured
(using the same method) in soils of the Dry Valleys of Antarctica (26 μg C g⁻¹) [Ball et al.,
2009] and high elevation soils of the Himalayas (21 μg C g⁻¹) [Schmidt et al., 2011]. They are
also lower than MBC values (140 μg C g⁻¹) averaged across many sites in a plant-free, recently
de-glaciated landscape in the high Andes of Perú [King et al., 2008]. For comparison, vegetated
soils usually have MBC levels that are two orders of magnitude higher than those reported here
[Cleveland et al., 2004; Weintraub et al., 2007]. Another indication of the extreme nature of
Llullaillaco soils is that the levels of measurable enzyme activities (Table 1) were 3 to 80 times
lower than values from the driest sites studied by Zeglin et al. [2009] in the Antarctic Dry
Valleys.
Aside from revealing a low diversity community, which lacks obvious phototrophs, our
molecular phylogenetic analyses hint at a set of traits necessary for survival in the >6000 m.a.s.l.
soil environment. For example, the dominant Actinobacterial OTU is closely related (94%
identity) to Pseudonocardia asaccharolytica (Y08536), which can oxidize dimethyl sulfide
(DMS) for energy [Reichert et al., 1998]. Nearer uncultured database relatives are from
Icelandic and Azorean volcanic deposits (GQ495403, HM445437). Likewise, the dominant
Chloroflexi lineage branches from Ktedonobacter racemifer (AM180159), a putative facultative
‘carboxydovore’ [Chang et al., 2011], which may be able to use carbon monoxide (CO) as an
electron donor and carbon source, in addition to a wide array of organic carbon substrates
[Cavaletti et al., 2006]. Other uncultured relatives are from dry Antarctic soils (FR749824,
FR749772). Both of these distantly related clades seem to share a number of convergent traits
that confer success in these oligotrophic environments: a mixotrophic lifestyle, filamentous
morphology, and the ability to sporulate.
The extremely limited eukaryotic and archaeal diversity mirrors the organic carbon
restriction, which can support only the most efficient of secondary trophic consumers.
Members of the Cryptococcus albidus clade (basidiomycetous yeasts, Figure 2.4) seem well
suited to this role. They have radiated widely into xeric environments, where they can occupy the
endolithic niche as highly competitive heterotrophs due, in part, to abundant carbohydrate
capsule production [Vishniac, 2006]. Although knowledge is limited regarding the archaeal
phylum Thaumarchaeota [Brochier-Armanet et al., 2008], multiple lines of evidence suggest they
can aerobically oxidize trace quantities of ammonia for energy [Könneke et al., 2005], and have a
broad distribution in soil environments [Bates et al., 2011; Oline et al., 2006]. Overall, our
analyses suggest that energy and carbon sources for microbial activity above 6000 m.a.s.l. could
be derived from a combination of heterotrophic respiration of aeolian deposited organic carbon,
and chemoautotrophic carbon fixation driven by aerobic oxidation of ammonia, DMS, and CO.
Although the energy yield from trace gas oxidation is limited, trace gases are a constantly available
substrate, even in the dark deeper layers of soil and rock where microbes can avoid the massive
diurnal temperature swings, rapid cooling and UV exposure of the surface environment.
Additionally, even though global atmospheric CO concentrations are only in the 5–350 ppb
range, proximity to fumaroles may increase CO availability on volcanoes [King, 1999; Symonds
et al., 1994]. The last unofficially reported activity of Volcán Llullaillaco dates to 1887
(www.volcano.si.edu), but it is unknown whether the local atmosphere is currently being
enriched with volcanic gases. Either way, our coxL data (Figure 2.5) are genetic novelties that
represent either divergent natural selection driven by this unique environment, or genetic drift by
geographic isolation, both of which support the hypothesis that soils above 6000 m.a.s.l. harbor
functioning microbial ecosystems.
As discussed above, our results suggest that an endogenous community of novel
microbes may be periodically active in this understudied high-elevation setting. However, it is
also possible that continuous atmospheric deposition of microbial propagules is responsible for
some of the genetic diversity seen in these soils. Microbes are well known to be globally
dispersed in the upper atmosphere [Darcy et al., 2011; Mladenov et al., 2011] and it is possible
that there is a constant input of ice nucleating [Christner et al., 2008] and other microbes to these
soils. But the unique and extreme environmental conditions on Volcán Llullaillaco are likely to
be highly selective for specific microbes. Indeed, the limited diversity of the microbial
community on Llullaillaco suggests strong selection because the microbial groups present do not
match the profiles of atmospheric microbial communities. For example, the Llullaillaco soils
contain less than 1% of the common groups Betaproteobacteria, Firmicutes and Pseudomonas
and 7 out of 10 major groups of bacteria that are abundant in atmospheric samples [Bowers et al.,
2012]. Likewise, the limited fungal diversity on Llullaillaco is very different from the profile of
fungal spores found in atmospheric samples; out of the 22 different fungal genera present in high
elevation atmospheric samples [Amato et al., 2007], only 2 were present on Volcán Llullaillaco.
However, studies of the connection between atmospheric and terrestrial microbes are in their
infancy and much more work is needed to determine both the origin and function of the
microbial communities of high elevation soils [Meyer et al., 2004; Schmidt et al., 2011].
Like the chemosynthetic ecosystems of the deep sea and deep subsurface biosphere [e.g.,
Connelly et al., 2012; Lin et al., 2006], life on the Earth’s highest volcanoes may not be
supported by in situ photosynthesis but rather by the oxidation of gaseous substrates. Our work
suggests that the highest sites on Volcán Llullaillaco are devoid of photosynthetic primary
producers and contain unique microbial communities that may be partially supported by the
oxidation of carbon monoxide, but more work is needed to test this hypothesis. Future research
at sites above 6000 m.a.s.l. will focus on isolation of the dominant microbes from high elevation
sites, and determination of survival and growth under conditions that mimic the extreme
temperature fluctuations and low energy inputs of the environment. It is expected that these
organisms are reservoirs of uncharacterized biological traits that allow adaptation to the unique
challenges of this dynamic and oligotrophic environment. Deeper insight into these outer limits
of biological adaptive capacity will inform our understanding of biogeochemical processes under
conditions never before examined in terrestrial ecosystems. This work may also be informative
for the search for life on other planets, especially in light of recent analyses that suggest seasonal
near-surface water flow on Mars [McEwen et al., 2011].
2.6 References
Aislabie, J., K. Chhour, D. Saul, S. Miyauchi, J. Ayton, R. Paetzold, and M. Balks (2006),
Dominant bacteria in soils of Marble Point and Wright Valley, Victoria Land, Antarctica, Soil
Biol. Biochem., 38, 3041–3056, doi:10.1016/j.soilbio.2006.02.018.
Altschul, S. F., W. Gish, W. Miller, E. W. Meyers, and D. J. Lipman (1990), Basic local
alignment search tool, J. Mol. Biol., 215, 403–410.
Amato, P., M. Parazols, M. Sancelme, P. Laj, G. Mailhot, and A.-M. Delort (2007),
Microorganisms isolated from the water phase of tropospheric clouds at the Puy de Dôme: Major
groups and growth abilities at low temperatures, FEMS Microbiol. Ecol., 59, 242–254,
doi:10.1111/j.1574-6941.2006.00199.x.
Arroyo, M. T. K., F. A. Squeo, J. J. Armesto, and C. Villagrán (1988), Effects of aridity on plant
diversity in the northern Chilean Andes: Results of a natural experiment, Ann. Mo. Bot. Gard.,
75, 55–78, doi:10.2307/2399466.
Ball, B. A., R. A. Virginia, J. E. Barrett, A. N. Parsons, and W. H. Wall (2009), Interactions
between physical and biotic factors influence CO2 flux in Antarctic Dry Valley soils, Soil Biol.
Biochem., 41, 1510–1517, doi:10.1016/j.soilbio.2009.04.011.
Bates, S. T., D. Berg-Lyons, J. G. Caporaso, W. Walters, R. Knight, and N. Fierer (2011),
Examining the global distribution of dominant archaeal populations in soil, ISME J., 5, 908–917,
doi:10.1038/ismej.2010.171.
Bowers, R. M., I. B. McCubbin, A. G. Hallar, and N. Fierer (2012), Seasonal variability in
airborne bacterial communities at a high-elevation site, Atmos. Environ., 50, 41–49,
doi:10.1016/j.atmosenv.2012.01.005.
Brochier-Armanet, C., B. Boussau, S. Gribaldo, and P. Forterre (2008), Mesophilic
Crenarchaeota: Proposal for a third archaeal phylum, the Thaumarchaeota, Nat. Rev. Microbiol.,
6, 245–252, doi:10.1038/nrmicro1852.
Cabrera, S., S. Bozzo, and H. Fuenzalida (1995), Variations in UV radiation in Chile, J.
Photochem. Photobiol., 28, 137–142, doi:10.1016/1011-1344(94)07103-U.
Cary, S. C., I. R. McDonald, J. E. Barrett, and D. A. Cowan (2010), On the rocks: The
microbiology of Antarctic Dry Valley soils, Nat. Rev. Microbiol., 8, 129–138,
doi:10.1038/nrmicro2281.
Cavaletti, L., P. Monciardini, R. Bamonte, P. Schumann, M. Rohde, M. Sosio, and S. Donadio
(2006), New lineage of filamentous, sporeforming, gram-positive bacteria from soil, Appl.
Environ. Microbiol., 72, 4360–4369, doi:10.1128/AEM.00132-06.
Chang, Y. J., et al. (2011), Non-contiguous finished genome sequence and contextual data of the
filamentous soil bacterium Ktedonobacter racemifer type strain (SOSP1–21), Stand. Genomic
Sci., 5, 97–111, doi:10.4056/sigs.2114901.
Christner, B. C., C. E. Morris, C. M. Foreman, R. Cai, and D. C. Sands (2008), Ubiquity of
biological ice nucleators in snowfall, Science, 319, 1214, doi:10.1126/science.1149757.
Cleveland, C. C., et al. (2004), Soil microbial dynamics in Costa Rica: Seasonal and
biogeochemical constraints, Biotropica, 36, 184–195.
Cockell, C. S., C. P. McKay, K. Warren-Rhodes, and G. Horneck (2008), Ultraviolet radiation-induced
limitation to epilithic microbial growth in arid deserts: Dosimetric experiments in the
hyperarid core of the Atacama Desert, J. Photochem. Photobiol., 90, 79–87,
doi:10.1016/j.jphotobiol.2007.11.009.
Connelly, D. P., et al. (2012), Hydrothermal vent fields and chemosynthetic biota on the world’s
deepest seafloor spreading centre, Nat. Commun., 3, 620, doi:10.1038/ncomms1636.
Connon, S. A., E. D. Lester, H. S. Shafaat, D. C. Obenhuber, and A. Ponce (2007), Bacterial
diversity in hyperarid Atacama Desert soils, J. Geophys. Res., 112, G04S17,
doi:10.1029/2006JG000311.
Costello, E. K., S. R. P. Halloy, S. C. Reed, P. Sowell, and S. K. Schmidt (2009), Fumarole-supported
islands of biodiversity within a hyperarid, high-elevation landscape on Socompa
Volcano, Puna de Atacama, Andes, Appl. Environ. Microbiol., 75, 735–747,
doi:10.1128/AEM.01469-08.
Darcy, J. L., R. C. Lynch, A. J. King, M. S. Robeson, and S. K. Schmidt (2011), Global
distribution of Polaromonas phylotypes—Evidence for a highly successful dispersal capacity,
PLoS ONE, 6, e23742, doi:10.1371/journal.pone.0023742.
Drees, K. P., J. W. Neilson, J. L. Betancourt, J. Quade, D. A. Henderson, B. M. Pryor, and R. M.
Maier (2006), Bacterial community structure in the hyperarid core of the Atacama Desert, Chile,
Appl. Environ. Microbiol., 72, 7902–7908, doi:10.1128/AEM.01305-06.
Farías, M. E., V. Fernández-Zenoff, R. Flores, O. Ordóñez, and C. Estévez (2009), Impact of
solar radiation on bacterioplankton in Laguna Vilama, a hypersaline Andean lake (4650 m), J.
Geophys. Res., 114, G00D04, doi:10.1029/2008JG000784.
Freeman, K. R., A. P. Martin, D. Karki, R. C. Lynch, M. S. Mitter, A. F. Meyer, J. E. Longcore,
D. R. Simmons, and S. K. Schmidt (2009), Evidence that chytrids dominate fungal communities
in high-elevation soils, Proc. Natl. Acad. Sci. U. S. A., 106, 18,315–18,320,
doi:10.1073/pnas.0907303106.
Halloy, S. R. P. (1991), Islands of life at 6000 m altitude: The environment of the highest
autotrophic communities on Earth (Socompa Volcano, Andes), Arct. Alp. Res., 23, 247–262,
doi:10.2307/1551602.
Henry, H. A. L. (2007), Soil freeze–thaw cycle experiments: Trends, methodological weaknesses
and suggested improvements, Soil Biol. Biochem., 39, 977–986,
doi:10.1016/j.soilbio.2006.11.017.
King, A. J., A. F. Meyer, and S. K. Schmidt (2008), High levels of microbial biomass and
activity in unvegetated tropical and temperate alpine soils, Soil Biol. Biochem., 40, 2605–2610,
doi:10.1016/j.soilbio.2008.06.026.
King, A. J., D. Karki, L. Nagy, A. Racoviteanu, and S. K. Schmidt (2010a), Microbial biomass
and activity in high elevation (>5100 meters) soils from the Annapurna and Sagarmatha regions
of the Nepalese Himalayas, Himalayan J. Sci., 6, 11–18, doi:10.3126/hjs.v6i8.2303.
King, A. J., K. R. Freeman, K. F. McCormick, R. C. Lynch, C. A. Lozupone, R. Knight, and S.
K. Schmidt (2010b), Biogeography and habitat modelling of high-alpine bacteria, Nat.
Commun., 1, 53, doi:10.1038/ncomms1055.
King, G. M. (1999), Characteristics and significance of atmospheric carbon monoxide
consumption by soils, Chemosphere Global Change Sci., 1, 53–63,
doi:10.1016/S1465-9972(99)00021-5.
King, G. M. (2003a), Contributions of atmospheric CO and hydrogen uptake to microbial
dynamics on recent Hawaiian volcanic deposits, Appl. Environ. Microbiol., 69, 4067–4075,
doi:10.1128/AEM.69.7.4067-4075.2003.
King, G. M. (2003b), Molecular and culture-based analyses of aerobic carbon monoxide oxidizer
diversity, Appl. Environ. Microbiol., 69, 7257–7265, doi:10.1128/AEM.69.12.7257-7265.2003.
King, G. M., and C. F. Weber (2007), Distribution, diversity and ecology of aerobic CO-oxidizing
bacteria, Nat. Rev. Microbiol., 5, 107–118, doi:10.1038/nrmicro1595.
Könneke, M., A. E. Bernhard, J. R. de la Torre, C. B. Walker, J. B. Waterbury, and D. A. Stahl
(2005), Isolation of an autotrophic ammonia-oxidizing marine archaeon, Nature, 437, 543–546,
doi:10.1038/nature03911.
Larkin, M. A., et al. (2007), Clustal W and Clustal X version 2.0, Bioinformatics, 23, 2947–
2948, doi:10.1093/bioinformatics/btm404.
Lester, E. D., M. Satomi, and A. Ponce (2007), Microflora of extreme arid Atacama Desert soils,
Soil Biol. Biochem., 39, 704–708, doi:10.1016/j.soilbio.2006.09.020.
Ley, R. E., M. W. Williams, and S. K. Schmidt (2004), Microbial population dynamics in an
extreme environment: Controlling factors in talus soils at 3750 m in the Colorado Rocky
Mountains, Biogeochemistry, 68, 297–311, doi:10.1023/B:BIOG.0000031032.58611.d0.
Lin, L.-H., et al. (2006), Long-term sustainability of a high-energy, low-diversity crustal biome,
Science, 314, 479–482, doi:10.1126/science.1127376.
Lipson, D. A., S. K. Schmidt, and R. K. Monson (2000), Carbon availability and temperature
control the post-snowmelt decline in alpine soil microbial biomass, Soil Biol. Biochem., 32,
441–448, doi:10.1016/S0038-0717(99)00068-1.
Lozupone, C., and R. Knight (2005), UniFrac: A new phylogenetic method for comparing
microbial communities, Appl. Environ. Microbiol., 71, 8228–8235,
doi:10.1128/AEM.71.12.8228-8235.2005.
Ludwig, W., et al. (2004), ARB: A software environment for sequence data, Nucleic Acids Res.,
32, 1363–1371, doi:10.1093/nar/gkh293.
McEwen, A. S., L. Ojha, C. M. Dundas, S. S. Mattson, S. Byrne, J. J. Wray, S. C. Cull, S. L.
Murchie, N. Thomas, and V. C. Gulick (2011), Seasonal flows on warm Martian slopes, Science,
333, 740–743, doi:10.1126/science.1204816.
Meyer, A. F., et al. (2004), Molecular and metabolic characterization of cold tolerant, alpine soil
Pseudomonas, sensu stricto, Appl. Environ. Microbiol., 70, 483–489,
doi:10.1128/AEM.70.1.483-489.2004.
Mladenov, N., et al. (2011), Dust inputs and bacteria influence dissolved organic matter in clear
alpine lakes, Nat. Commun., 2, 405, doi:10.1038/ncomms1411.
Navarro-González, R., et al. (2003), Mars-like soils in the Atacama Desert, Chile, and the dry
limit of microbial life, Science, 302, 1018–1021, doi:10.1126/science.1089143.
Nemergut, D. R., S. P. Anderson, C. C. Cleveland, A. P. Martin, A. E. Miller, A. Seimon, and S.
K. Schmidt (2007), Microbial community succession in unvegetated, recently deglaciated soils,
Microb. Ecol., 53, 110–122, doi:10.1007/s00248-006-9144-7.
Novis, P. M., D. Whitehead, E. G. Gregorich, J. E. Hunt, A. D. Sparrow, D. W. Hopkins, B.
Elberling, and L. G. Greenfield (2007), Annual carbon fixation in terrestrial populations of
Nostoc commune (Cyanobacteria) from an Antarctic dry valley is driven by temperature regime,
Global Change Biol., 13(6), 1224–1237, doi:10.1111/j.1365-2486.2007.01354.x.
Oline, D. K., S. K. Schmidt, and M. C. Grant (2006), Biogeography and landscape-scale
diversity of the dominant Crenarchaeota of soil, Microb. Ecol., 52, 480–490,
doi:10.1007/s00248-006-9101-5.
Parsons, A. N., J. E. Barrett, D. H. Wall, and R. A. Virginia (2004), Soil carbon dioxide flux in
Antarctic dry valley ecosystems, Ecosystems, 7, 286–295, doi:10.1007/s10021-003-0132-1.
Pruesse, E., C. Quast, K. Knittel, B. Fuchs, W. Ludwig, J. Peplies, and F. O. Glöckner (2007),
SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA
sequence data compatible with ARB, Nucleic Acids Res., 35, 7188–7196,
doi:10.1093/nar/gkm864.
Reichert, K., A. Lipski, S. Pradella, E. Stackebrandt, and K. Altendorf (1998), Pseudonocardia
asaccharolytica sp. nov. and Pseudonocardia sulfidoxydans sp. nov., two new dimethyl
disulfide-degrading actinomycetes and emended description of the genus Pseudonocardia, Int. J.
Syst. Evol. Microbiol., 48, 441–449, doi:10.1099/00207713-48-2-441.
Richards, J., and M. Villeneuve (2001), The Llullaillaco volcano, northwest Argentina:
Construction by Pleistocene volcanism and destruction by sector collapse, J. Volcanol.
Geotherm. Res., 105, 77–105, doi:10.1016/S0377-0273(00)00245-6.
Richter, M., and D. Schmidt (2002), Cordillera de la Atacama. Das trockenste Hochgebirge der
Welt, Petermanns Geogr. Mitt., 146, 48–57.
Schloss, P. D., et al. (2009), Introducing mothur: Open-source, platform independent,
community-supported software for describing and comparing microbial communities, Appl.
Environ. Microbiol., 75, 7537–7541, doi:10.1128/AEM.01541-09.
Schmidt, S. K., D. R. Nemergut, A. E. Miller, K. R. Freeman, A. J. King, and A. Seimon (2009),
Microbial activity and diversity during extreme freeze-thaw cycles in periglacial soils, 5400 m
elevation, Cordillera Vilcanota, Perú, Extremophiles, 13, 807–816,
doi:10.1007/s00792-009-0268-9.
Schmidt, S. K., R. C. Lynch, A. J. King, D. Karki, M. S. Robeson, L. Nagy, M. W. Williams, M.
S. Mitter, and K. R. Freeman (2011), Phylogeography of microbial phototrophs in the dry valleys
of the high Himalayas and Antarctica, Proc. R. Soc. London, Ser. B, 278, 702–708,
doi:10.1098/rspb.2010.1254.
Schmidt, S. K., C. S. Naff, and R. C. Lynch (2012), Fungal communities at the edge: Ecological
lessons from high alpine fungi, Fungal Ecol., 5, 443–452, doi:10.1016/j.funeco.2011.10.005.
Stern, C. R. (2004), Active Andean volcanism: Its geologic and tectonic setting, Rev. Geol.
Chile, 31, 161–206.
Symonds, R. B., W. I. Rose, G. Bluth, and T. M. Gerlach (1994), Volcanic gas studies: Methods,
results, and applications, Rev. Mineral. Geochem., 30, 1–66.
Vishniac, H. S. (2006), A multivariate analysis of soil yeasts isolated from a latitudinal gradient,
Microb. Ecol., 52, 90–103, doi:10.1007/s00248-006-9066-4.
Warren-Rhodes, K. A., K. L. Rhodes, S. B. Pointing, S. A. Ewing, D. C. Lacap, B. Gómez-Silva,
R. Amundson, E. I. Friedmann, and C. P. McKay (2006), Hypolithic cyanobacteria, dry limit of
photosynthesis, and microbial ecology in the hyperarid Atacama Desert, Microb. Ecol., 52, 389–
398, doi:10.1007/s00248-006-9055-7.
Weintraub, M. N., L. E. Scott-Denton, S. K. Schmidt, and R. K. Monson (2007), The effects of
tree rhizodeposition on soil exoenzyme activity, dissolved organic carbon, and nutrient
availability in a subalpine forest ecosystem, Oecologia, 154, 327–338,
doi:10.1007/s00442-007-0804-1.
Wilson, A. S., et al. (2007), Stable isotope and DNA evidence for ritual sequences in Inca child
sacrifice, Proc. Natl. Acad. Sci. U. S. A., 104, 16,456–16,461, doi:10.1073/pnas.0704276104.
Yang, M., T. Yao, X. Gou, T. Koike, and Y. He (2003), The soil moisture distribution, thawing–
freezing processes and their effects on the seasonal transition on the Qinghai–Xizang (Tibetan)
plateau, J. Asian Earth Sci., 21, 457–465, doi:10.1016/S1367-9120(02)00069-X.
Zeglin, L. H., R. L. Sinsabaugh, J. E. Barrett, M. N. Gooseff, and C. D. Takacs-Vesbach (2009),
Landscape distribution of microbial activity in the McMurdo Dry Valleys: Linked biotic
processes, hydrology, and geochemistry in a cold desert ecosystem, Ecosystems, 12, 562–573,
doi:10.1007/s10021-009-9242-8.
CHAPTER 3
METAGENOMIC EVIDENCE FOR METABOLISM OF TRACE ATMOSPHERIC GASES
BY HIGH-ELEVATION DESERT ACTINOBACTERIA²
3.1 Abstract
Previous surveys of very dry Atacama Desert mineral soils have consistently revealed sparse
communities of non-photosynthetic microbes. The functional nature of these microorganisms
remains debatable given the harshness of the environment and low levels of biomass and
diversity. The aim of this study was to gain an understanding of the phylogenetic community
structure and metabolic potential of a low-diversity mineral soil metagenome that was collected
from a high-elevation Atacama Desert volcano debris field. We pooled DNA extractions from
over 15 g of volcanic material, and using whole genome shotgun sequencing, observed only 75–78
total 16S rRNA gene OTUs (defined at 3% sequence divergence). The phylogenetic structure of this
community is significantly under-dispersed, with actinobacterial lineages making up 97.9–98.6% of the 16S rRNA genes,
suggesting a high degree of environmental selection. Due to this low diversity and uneven
community composition, we assembled and analyzed the metabolic pathways of the most
abundant genome, a Pseudonocardia sp. (56–72% of total 16S genes). Our assembly and binning
efforts yielded almost 4.9 Mb of Pseudonocardia sp. contigs, which accounts for an estimated
² Published as: Lynch RC, Darcy JL, Kane NC, Nemergut DR and Schmidt SK (2014) Metagenomic evidence for
metabolism of trace atmospheric gases by high-elevation desert Actinobacteria. Front. Microbiol. 5:698.
doi:10.3389/fmicb.2014.00698
99.3% of its non-repetitive genomic content. This genome contains a limited array of
carbohydrate catabolic pathways, but encodes CO2 fixation via the Calvin cycle. The genome
also encodes complete pathways for the catabolism of various trace gases (H2, CO and several
organic C1 compounds) and the assimilation of ammonia and nitrate. We compared genomic
content among related Pseudonocardia spp. and estimated rates of non-synonymous and
synonymous nucleic acid substitutions between protein coding homologs. Collectively, these
comparative analyses suggest that the community structure and various functional genes have
undergone strong selection in the nutrient poor desert mineral soils and high-elevation
atmospheric conditions.
3.2 Introduction
The Atacama Desert is the driest and perhaps oldest desert on Earth, where an estimated 150 My
of sustained aridity and 3–4 My of hyperaridity across the central plateau have shaped the
landscape (Hartley et al., 2005). The Atacama region is bounded by the Andes to the east and by
the coastal mountain range and the cold water Pacific Humboldt current to the west (Gómez-Silva
et al., 2008). These barriers restrict the flow of atmospheric moisture, which in turn results
in some of the most inhospitable proto-mineral soils on the planet that contain nearly
undetectable organic carbon stocks and microbial biomass pools (Navarro-González et al., 2003).
The eastern boundary of the region hosts large volcanoes that are situated in the leeward rain
shadow of the Andes. The upper plant-free reaches of these peaks are distinct from other
better-studied Atacama geographic zones in that the higher elevation increases rates of
precipitation, yet also increases rates of evaporation, sublimation, solar incidence and
freeze-thaw cycling (Schmidt et al., 2009). Despite these additional stressors, the barren high volcanic
deposits are a habitat still principally limited by water availability (Costello et al., 2009).
Photo-atmospheric processes (e.g., lightning-derived nitrate deposition; Michalski et al., 2004) likely
play defining roles in these gravel-like mineral soils where biotic geochemical cycling is
constrained to nearly undetectable levels.
Although meteorological data from the high-elevation reaches of the Atacama volcanoes
are sparse (Richter and Schmidt, 2002), the restrictiveness of the conditions to biological activity
is manifest in the biomass levels of the mineral soils, which are barely above detection limits, as
well as microbial diversity estimates that rival the lowest ever sampled for exposed terrestrial
systems (Costello et al., 2009; Lynch et al., 2012). The physical conditions that exclude nearly
all microbial life seem to have been overcome by a limited spectrum of bacterial and fungal
lineages that may have evolved the capacity for in situ activity. The most abundant of these
organisms are Chloroflexi and certain Actinobacteria, mainly of the Actinomycetales,
Acidimicrobiales and Rubrobacterales orders (Costello et al., 2009; Lynch et al., 2012).
Based on our initial molecular survey of these volcanic samples (Costello et al.,
2009; Lynch et al., 2012), and work carried out in other areas of the Atacama where plant and
microbial phototrophs are absent (Neilson et al., 2012), we hypothesized that chemoautotrophic
microbes may be supplying organic carbon to simple and low-energy flux communities. Previous
studies elsewhere have demonstrated the biological uptake of trace gases (CO and H2, but not
CH4) in 26-year-old plant-free and carbon-limited Hawaiian volcanic deposits (King, 2003a),
implying trace gases may be important energy sources where organic carbon accumulations are
limited. The present metagenomic study was undertaken to develop a more comprehensive
understanding of the potential metabolic traits, particularly focused on energy and nutrient
acquisition, which the few community members found at the Llullaillaco Volcano study sites
possess. The functional hypotheses developed through this study will be considered in light of
the known environmental conditions present at these sites, and support the ongoing development
of realistic growth conditions for culture based experiments.
Here we present a shotgun metagenomic study of a low-diversity and phylogenetically
under-dispersed community, composed almost exclusively of Actinobacteria (>98% of all
bacteria) found in the high-elevation (>6000 m elevation) Atacama Desert volcanic deposits. By
leveraging the natural low diversity of these samples with deep coverage from long-read whole
metagenome shotgun sequencing, we were able to characterize the genomic makeup of the
community members at a high level of detail through reference database classification of raw
sequence reads. Our high sequencing depth and coverage also enabled de novo assembly based
analyses of selection through estimation of non-synonymous and synonymous mutation rates for
protein coding genes of the most abundant community member's genome.
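The selection analyses mentioned above rest on comparing non-synonymous and synonymous changes between homologous coding sequences. As a rough illustration only, the sketch below classifies codon differences between two aligned coding sequences using the standard genetic code. It counts only codons that differ at a single position and does not normalize by the number of synonymous and non-synonymous sites, so it is not a full dN/dS estimator and is not the method used in this study; the function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(cds_a: str, cds_b: str):
    """Return (synonymous, non_synonymous) counts for single-site codon differences.

    Codons containing gaps or ambiguous bases, and codons differing at more
    than one position, are skipped (a simplification for illustration).
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy codon-aligned sequences (hypothetical): GCT->GCC is silent, AAA->AGA is K->R.
seq_a = "ATGGCTAAAGGT"
seq_b = "ATGGCCAGAGGT"
print(count_substitutions(seq_a, seq_b))
```

A proper estimate would additionally count the synonymous and non-synonymous sites in each sequence and correct for multiple substitutions, which dedicated packages handle.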
3.3 Materials and Methods
Sample Collection and Preservation
Two snow-free mineral soil samples located approximately 5 m apart were collected from
the Llullaillaco Volcano (−24.718, −68.529) at an elevation of 6034 m above sea level
(m.a.s.l.) during the austral summer in mid-February 2009. The top 4 cm of surface
material, excluding rocks larger than 2 cm in diameter, were aseptically collected and
frozen the same day in the field using blue ice packs. By the evening of the day the samples
were collected, they were transferred to a −20°C freezer at the army barracks (on the Chile-Argentina
border) near the field site. The next day they were driven (on ice in a cooler) to
Salta, Argentina where they were again placed in a −20°C freezer until they were hand
carried to Colorado in a thick-walled cooler on blue ice packs. They arrived in Boulder,
Colorado within 24 h of being taken out of the freezer in Salta and were still frozen upon
arrival (i.e., the ice packs hadn't melted). The samples have since been continuously stored
at −20°C. Further details regarding these and other samples collected from the Llullaillaco
Volcano can be found in Lynch et al. (2012).
DNA Extraction, Sequencing, and Quality Control
We utilized a modified serial silica filter binding protocol (Fierer et al., 2012) to overcome
the low DNA yields of these low biomass samples and to avoid the potential biases
introduced from random genomic amplification techniques. DNA extractions were
quantified using PicoGreen dsDNA fluorometry (Thermo Fisher Scientific Inc.). We
recovered 1 μg of gDNA from each of the samples, which required 10.4 g of volcanic debris
from sample 1 and 4.8 g from sample 3 (Table 3.1). Negative extraction controls were run
with the same batch of extraction reagents, but no soils were added. These negative control
extractions were excluded from the sequencing libraries due to insufficient quantities of
dsDNA. Samples were shipped to the Duke University Genome Sequencing and Analysis
Core Resource where the long-read 454 GS FLX+ platform was used to sequence randomly
fragmented bulk nucleic acid extractions.
Table 3.1 Summary of sample characteristics for volcano metagenomes.
Library parsing and removal of the 454 MIDs was achieved with the sfffiles package
(454 Life Sciences) and manually confirmed using the Geneious (6.1.3) viewer. Reads were
trimmed so they contained no more than five bases with quality scores of 15 or lower (Cox
et al., 2010). Sequence length was required to be within two standard deviations of the
mean length, and no more than five ambiguous bases per read were permitted. We found
very low rates of artificial read duplication (0.31 and 0.13% for the site 1 and site 3
libraries, respectively; Gomez-Alvarez et al., 2009), which were tested using CD-HIT (Fu et al., 2012),
with settings 1 1 3 that require 100% sequence identity and length.
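The read-filtering rules above can be sketched in a few lines of Python. This is an illustrative reading of the criteria, not the pipeline actually used: the function names are hypothetical, and it is an assumption that "trimmed" means truncating each read just before its sixth base with a quality score of 15 or lower.

```python
from statistics import mean, pstdev

def trim_read(seq, quals, max_low=5, low_q=15):
    """Truncate a read just before its (max_low + 1)-th base with quality <= low_q."""
    low = 0
    for i, q in enumerate(quals):
        if q <= low_q:
            low += 1
            if low > max_low:
                return seq[:i], quals[:i]
    return seq, quals

def filter_reads(reads, max_ambiguous=5, n_sd=2.0):
    """Trim reads, then drop length outliers and reads with too many ambiguous bases.

    `reads` is a list of (sequence, quality_scores) tuples.
    """
    trimmed = [trim_read(seq, quals) for seq, quals in reads]
    lengths = [len(seq) for seq, _ in trimmed]
    mu, sd = mean(lengths), pstdev(lengths)
    kept = []
    for seq, quals in trimmed:
        if abs(len(seq) - mu) > n_sd * sd:
            continue  # length more than n_sd standard deviations from the mean
        if seq.upper().count("N") > max_ambiguous:
            continue  # too many ambiguous base calls
        kept.append((seq, quals))
    return kept
```

Computing the length threshold after trimming, rather than before, is a design choice of this sketch; the text does not specify the order.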
We used a 15-mer spectrum analysis (Supplementary Figure 1, Marçais and
Kingsford, 2011) to visualize how sequencing depth relates to the total metagenomic
complexity of the samples. Additional desert and non-desert metagenomes were
downloaded from the MG-RAST server (Meyer et al., 2008), ID 4446153.3 and all datasets
from Fierer et al. (2012).
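To make the idea of a k-mer spectrum concrete, the following minimal Python sketch counts every 15-mer in a set of reads and then tabulates how many distinct 15-mers occur once, twice, and so on. It is a toy illustration only: a dedicated counter is needed at real data volumes, and this sketch does not merge reverse-complement k-mers, which is an assumption made for simplicity.

```python
from collections import Counter

def kmer_spectrum(reads, k=15):
    """Return a Counter mapping multiplicity -> number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())
```

In a low-complexity community sequenced deeply, most distinct k-mers should fall at high multiplicities, whereas a complex, under-sampled community is dominated by k-mers seen only once.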
rDNAs
A closed reference operational taxonomic unit (OTU) picking method
(pick_closed_reference_otus.py, Caporaso et al., 2010) was applied to a UCLUST (Edgar,
2010) identified set of candidate 16S rRNA genes. This method overcomes the issue that
shotgun sequencing recovers different regions of the 16S rRNA gene. A 97%
similarity was required for each candidate sequence alignment to the most current
Greengenes reference dataset available (Release 13_5, McDonald et al., 2012). For the analysis of
phylogenetic dispersion, near full length 16S rRNA gene sequences that have been
previously published (JX098304–JX098810) were used to construct a maximum
likelihood tree (Price et al., 2009) with the Greengenes reference dataset (13_5) clustered
into 5088 OTUs at 85% sequence identity. Phylocom 4.2 (Webb et al., 2008) was used to calculate a net
relatedness index (NRI) value and associated one-tail P-values with 999 randomization
iterations and the null hypothesis setting 2 (sample OTUs are drawn at random from the
total species pool without replacement). This null hypothesis is intended to model the
homogenizing effects of long distance atmospheric transport and deposition of bacterial
cells from diverse sources, with a total absence of selection.
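The logic of the net relatedness index can be sketched as follows: the observed mean pairwise phylogenetic distance (MPD) of the sample is compared with MPD values for equally sized sets of taxa drawn at random, without replacement, from the species pool. This is a minimal illustration under stated assumptions, not Phylocom's implementation: the function names, the distance lookup keyed by frozenset pairs, and the use of the sample standard deviation are all choices made here for clarity.

```python
import random
from itertools import combinations
from statistics import mean, stdev

def mean_pairwise_distance(taxa, dist):
    """Mean phylogenetic distance over all pairs of taxa; dist is keyed by frozenset pairs."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def net_relatedness_index(sample, pool, dist, runs=999, seed=0):
    """Return (NRI, one-tailed P) against random draws from the pool without replacement."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(sample, dist)
    null = [mean_pairwise_distance(rng.sample(pool, len(sample)), dist)
            for _ in range(runs)]
    sd = stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    nri = -(observed - mean(null)) / sd
    # Fraction of randomizations at least as clustered as the observed sample.
    p_value = (sum(m <= observed for m in null) + 1) / (runs + 1)
    return nri, p_value
```

A positive NRI indicates that sampled taxa are more closely related than expected by chance (phylogenetic under-dispersion), which is the pattern tested for here.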
Fine scale phylogenetic trees were constructed with OTUs1% of the full length 16S
sequences determined by the QIIME pick_de_novo_otus.py workflow. SINA alignments
(Pruesse et al., 2012) were built with Silva (115) reference database representatives (Quast
et al., 2013) and maximum likelihood phylogenies were inferred with PhyML 3.0 (Guindon
et al., 2010) using a GTR model of nucleic acid evolution.
Genetic Inventory
The SEED database (Overbeek et al., 2005) uses a hierarchical classification system where
the broadest level (level 1) includes many anabolic and catabolic pathways and their
associated single enzyme catalyzed intermediaries. Pairwise t-tests were used to calculate
significance of gene category count differences (level 1) between the Llullaillaco Volcano
libraries and a collection of desert and non-desert metagenomes, using the pooled SD
option and a Bonferroni correction for multiple comparisons (α = 0.05/ (28 × 2) = 0.0009)
in R (http://www.r-project.org/). Gene calls were made based on minimum ID of 60% and
a maximum e-value of 1e−5 for all BLAT alignments generated by MG-RAST against
the SEED database.
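The two numerical thresholds in this section can be written out explicitly. The sketch below reproduces the Bonferroni-corrected significance level (28 level 1 categories tested against two comparison groups) and the gene-call cutoffs; the function names are hypothetical and the code only illustrates the arithmetic, since the analysis itself was run in R and MG-RAST.

```python
def bonferroni_alpha(alpha, n_categories, n_comparisons):
    """Per-test significance level after Bonferroni correction."""
    return alpha / (n_categories * n_comparisons)

def passes_gene_call(percent_identity, e_value, min_identity=60.0, max_e_value=1e-5):
    """Apply the minimum identity and maximum e-value cutoffs to one alignment."""
    return percent_identity >= min_identity and e_value <= max_e_value

# 0.05 / (28 x 2) is approximately 0.00089, reported in the text as 0.0009.
alpha_corrected = bonferroni_alpha(0.05, 28, 2)
```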
Assembly
De novo assembly was attempted on each of the two separate Llullaillaco site metagenomes
with the MIRA V3.4.0 (Chevreux et al., 1999) signal trace assembly platform using the
following settings: --job=denovo,genome,accurate,454 --highlyrepetitive --noclipping --notraceinfo --fasta --project=RL1All -SK:not=46 -AS:sep=yes 454_SETTINGS -ED:ace=yes -AL:mo=40:ms=30 -CL:bsqc=yes -LR:lsd=yes:ft=fastq. These settings require that each
fragment addition to a contig have at least 40 high quality scoring bases of overlap and
minimum quality scores of 30. They also restrict the variance of coverage levels across each
contig to reflect the expectation that random shotgun sampling of each community
member's genome should result in a unique coverage level that reflects its natural relative
abundance in the community of genomes. This assembly approach assumes a theoretical
copy number of one per unique genomic element leading to exclusion of repetitive
elements, and also assumes that the main community members have significantly different
relative abundances.
Assembly Evaluation and Annotation
Tetramer-based emergent self-organizing maps (ESOMs; http://databionic-esom.sourceforge.net/) were used to help evaluate contig binning (Dick et al., 2009) in
conjunction with analysis of coverage levels. Descriptions of the databionic ESOM settings and
the Perl scripts used to calculate tetramer frequencies can be found
at https://github.com/tetramerfreqs/binning. Consensus sequences from contigs were
called with a majority rule to filter out all but the most abundant strains and low coverage
ends were trimmed.
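The tetramer signature that feeds an ESOM is simply the relative frequency of each of the 256 possible 4-mers in a contig. The sketch below shows that calculation in Python for illustration; it is not a port of the Perl scripts referenced above, and unlike many binning workflows it does not merge reverse-complement tetramers, which is a simplifying assumption.

```python
from itertools import product

def tetramer_frequencies(seq):
    """Return relative frequencies of all 256 tetramers in a nucleotide sequence."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}
```

Contigs from the same genome tend to share a similar frequency vector, so they cluster together on the map; coverage then provides an independent line of evidence for each bin.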
Bins of contigs that represent draft genomes and associated metadata were
uploaded to the JGI IMG/ER database (Markowitz et al., 2012) for initial annotation. The
phylogenetic origins of the JGI protein annotations were inspected and annotations for
select coding DNA sequences (CDS) were checked manually. Completeness of the
metagenome assemblies was assessed by comparing protein family database (Punta et al.,
2012) annotations to the list of conserved single copy genes (CSCGs, Rinke et al., 2013).
Putative genes involved in major metabolic pathways were manually curated by evaluating
blastx alignments and through literature-based refinement of functional annotations.
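The completeness check against conserved single-copy genes described above amounts to counting how many marker families are present in a bin and how many appear more than once. A minimal sketch follows; the function name and the marker identifiers used in the usage note are hypothetical placeholders, not the actual CSCG list.

```python
from collections import Counter

def cscg_summary(pfam_hits, marker_set):
    """Return (fraction of markers present, number of markers found more than once)."""
    counts = Counter(hit for hit in pfam_hits if hit in marker_set)
    completeness = len(counts) / len(marker_set)
    duplicated = sum(1 for c in counts.values() if c > 1)
    return completeness, duplicated
```

For example, with a hypothetical four-marker set and annotations `["PF00001", "PF00002", "PF00002", "PF09999"]`, the function reports half of the markers present and one marker duplicated. Missing markers suggest an incomplete bin, while duplicated markers suggest that sequence from more than one genome has been merged.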
Comparative Genomics and Analysis of Selection
Clusters of orthologous groups (COGs; Tatusov, 1997) for the three publicly
available Pseudonocardia spp. genomes were downloaded from the IMG/ER database. COG
count data were subjected to hierarchical centroid clustering with Cluster
3.0 (http://bonsai.hgc.jp/mdehoon/software/cluster/software.htm) and visualized with
heatmaps drawn in TreeView (Saldanha, 2004).
Even when genes share clear homologous relationships they may perform divergent
functions. One way to detect the signature of divergent selection between orthologous
genes is through the comparison of rates of non-synonymous (Ka) to synonymous (Ks)
mutations. When selection is weak or absent Ka:Ks ratios should be close to one since
genetic drift should have an equal chance of causing either non-synonymous or
synonymous mutations. However, when divergent selection drives altered amino acid
coding potential, rates of non-synonymous mutations should be elevated relative to
synonymous mutations (Yang, 1998). A Perl pipeline was used to link the following steps
together for an iterative Ka:Ks analysis. Pairs of candidate CDS orthologs between our best
volcano Pseudonocardia sp. draft genome and the Pseudonocardia asaccharolytica (IMG ID
13496) draft genome were identified as reciprocal blastn hits (with ≥70% identity for 100
bp). Protein guided DNA alignments were generated for each CDS pair through the
TranslatorX approach (Abascal et al., 2010), which relies on Muscle (Edgar, 2004) to align
predicted amino acid sequences. Codeml (PAML 4.7, Yang, 2007) was then used to estimate
rates of non-synonymous (Ka) and synonymous (Ks) nucleic acid substitutions for each
ortholog pair alignment, using the WAG model of amino acid evolution. Ortholog pairs
found with signatures of positive selection for amino acid substitutions (Ka:Ks ratios of ≥
1) were checked manually and annotated with a database of genes from the P.
asaccharolytica draft genome using blastx.
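The ortholog-pairing and classification steps of this pipeline can be illustrated with a short Python sketch. It is not the Perl pipeline itself: the function names and the tuple layout for hits are hypothetical, ranking hits by bit score is an assumption, and "≥70% identity for 100 bp" is read here as a minimum alignment length of 100 bp. Estimating Ka and Ks themselves is left to codeml.

```python
def best_hits(hits, min_identity=70.0, min_length=100):
    """Map each query to its highest-scoring subject among hits passing the filters.

    Each hit is (query, subject, percent_identity, alignment_length, bitscore).
    """
    best = {}
    for query, subject, identity, length, bitscore in hits:
        if identity < min_identity or length < min_length:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit in both directions."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

def selection_class(ka, ks):
    """Label an ortholog pair by its Ka:Ks ratio, using the threshold of 1 from the text."""
    if ks <= 0:
        return "undefined"
    return "candidate divergent selection" if ka / ks >= 1 else "purifying or relaxed"
```

Requiring reciprocity guards against pairing a gene with a paralog, which would otherwise inflate apparent divergence.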
Hydrogenase Phylogenetics
To place the [NiFe]-hydrogenase genes from our volcano Pseudonocardia sp. assembly into
a broader phylogenetic context, we constructed a phylogeny using available sequence data
from other studies. A broad sampling of [NiFe]-hydrogenase large subunit amino acid
sequences was obtained from the list of sequences provided by Vignais and Billoud (2007),
along with their subgroup annotations. Sequences for a fifth subgroup were obtained
through blastn searches using our assembled sequence, as well as from Constant et al.
(2010). Incomplete sequences were not included in our analysis. All amino acid sequences
were aligned with ClustalW2 (Larkin et al., 2007) under default parameters, and a
phylogeny was built with the neighbor-joining algorithm implemented in MEGA 6
(Tamura et al., 2013) using the Poisson model with 1000 bootstrap replications.
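For readers unfamiliar with the Poisson model, the distance between two aligned amino acid sequences is d = −ln(1 − p), where p is the proportion of differing sites; the correction accounts for multiple substitutions at the same site. The sketch below illustrates this for one sequence pair. The function name is hypothetical, and ignoring any column with a gap in either sequence is an assumption of this sketch rather than a statement about the gap treatment used in the analysis.

```python
import math

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned amino acid sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is the input to neighbor-joining, and bootstrap support comes from repeating the calculation on resampled alignment columns.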
3.4 Results
Sequencing and rDNA Diversity
After trimming we were left with 3.85 million reads that total 1.3 Gb of DNA sequence data for
downstream analysis. Each of the two site libraries contained nearly identical distributions of
bacterial (99.2%), eukaryotic (0.5%) and archaeal (0.3%) reads, based on all MG-RAST
annotation databases. We found a low-diversity community populated mostly by Actinobacteria
(Table 3.1), which make up 98.6 and 97.9% of the 16S rRNA genes from the site 1 and 3
libraries, respectively. This highly uneven community structure is significantly under-dispersed
(P < 0.001 and 0.01 for the phylogenetic randomization tests on the two samples), indicating a
likely non-random assemblage of bacterial lineages. All lineages shown in Figure 3.1 belong
within the Actinomycetales, other than an OTU3% belonging to the Acidimicrobiales order
(Supplementary Figure 1) that makes up 15.6% of the site 3 library, but only 1.9% of the site 1
library. The Pseudonocardia are by far the most abundant lineages (72.2% of site 1 and 56.3% of
site 3 total 16S reads) and the Saccharopolyspora (Pseudonocardiaceae) also make up 8.8% and
12.6% of total 16S rRNA gene reads from sites 1 and 3, respectively.
Figure 3.1 Community profile. (A) All OTU3% taxonomic assignments from each site that
represent at least 1% of the total metagenome 16S gene reads. These 12 OTUs constitute
94.3 and 92.6% of the total 16S gene reads from sites 1 (gray bars) and 3 (black bars)
respectively and are all members of Actinomycetales other than the single Acidimicrobiales
OTU3%. (B) Maximum likelihood phylogeny of the most abundant Pseudonocardia OTU3%, split
into sub OTUs1%. The scale bar represents 1% divergence between nucleic acid sequences.
Genetic Inventory
The Llullaillaco metagenomes show a pronounced reduction in genes associated with
carbohydrate metabolism compared with other desert and non-desert metagenomes (Figure 3.2).
By contrast we found significant enrichment of pathways categorized as membrane transport,
nucleotide metabolism, regulation and cell signaling, nitrogen metabolism and virulence and
defense. Examining the presence and absence of metabolic pathways within the total
metagenome, we found no evidence for complete photosynthetic pathways, yet found complete
gene sets for the oxidation of CO and H2, and for CO2 fixation with the Calvin cycle.
Methylotrophic pathways also suggest a role for other C1 compound oxidation and assimilation
including: methanol, formaldehyde, formate and perhaps methane. No nitrogen (N2) fixation or
ammonia monooxygenase genes were identified, but genes for nitrate (NO3−) reduction (nitrate
reductase) and ammonia (NH3) assimilation (glutamine synthetase) were found in high
abundance.
Figure 3.2 Inventories of gene functional categories, comparing non-desert (gray), desert
(black) biomes to the high-elevation volcano metagenomes (blue). Asterisks indicate
Bonferroni corrected significant differences (P < 0.0009) between the volcano data and desert or
non-desert data (desert to non-desert comparisons not shown) for all pairwise t-tests.
Genome Assembly Results
We were able to assemble and bin contigs (Supplementary Figures 3, 4) that represent composite
genomes of the most abundant Pseudonocardia sp. (Table 3.2), as well as the other lower
abundance community members, such as a member of the Acidimicrobiales (Supplementary
Figures 1, 3). The best Pseudonocardia assembly appears to represent a nearly complete set of
non-repetitive genomic elements since it contains 138/139 CSCGs (missing a DNA uptake
competence gene, PF03772). None of the CSCGs were present in more than one copy in the
metagenome assemblies, suggesting we did not greatly over-assemble this genome. The CSCGs
are 139 protein coding genes that were found to occur only once in at least 90% of the 1515
finished bacterial genomes available in the IMG/ER database (Rinke et al., 2013). Within each of
the new Pseudonocardia sp. assemblies, 2–3 single nucleotide polymorphisms (SNPs) were
present in many of the CDS regions, which are likely indicative of strain and population level
variation.
Table 3.2. Summary of metagenome Pseudonocardia sp. assemblies and nearest
phylogenetic reference genome, P. asaccharolytica (JGI IMG id 13496).
COG Comparisons
COG counts from our highest quality Pseudonocardia sp. assembly (68–115 × coverage bin
from site 1) and the three other publicly available genomes for named Pseudonocardia spp.
(Figure 3.3) highlight some of the specific differences in genome content. We found certain
COGs like those needed for CO oxidation are conserved at high copy numbers across all the
Pseudonocardia spp., and that COGs such as those required for assimilatory nitrate reduction
and carbon fixation (RuBisCO) show relatively higher counts in both our metagenome assembly
and P. asaccharolytica. Other highly abundant gene clusters within our metagenome assembly
bear resemblance to the more phylogenetically distant Pseudonocardia spp. These clusters
include the antibiotic producing non-ribosomal peptide synthesis pathway (NRPS), various ABC
peptide importers, cytochrome P450 monooxygenase, and several recombinases.
Figure 3.3 Venn diagram of the shared and unique genes (COGs) among named
Pseudonocardia spp. with complete genomes and the volcano Pseudonocardia sp. genome
assembly. Although most of the 50 COGs unique to the volcano Pseudonocardia sp. are
classified as “function unknown” or “general function prediction only,” the six additional
defense mechanism related COGs and the nine fewer carbohydrate transport and metabolism
COGs in the volcano Pseudonocardia sp. stand out as potentially relevant functional
differences with other Pseudonocardia spp.
Signatures of Selection Analysis
Of the 5024 annotated CDS from the draft P. asaccharolytica genome we were able to initially
align 1722 orthologous coding sequences from our best metagenome Pseudonocardia sp.
assembly with at least 70% nucleotide identity. Of these, manual inspection filtered out 462 gene
pairs that were poorly aligned or were not true homologs across the entire sequence. There were
59 remaining ortholog pairs (4.7%) with estimated Ka:Ks ratios ≥ 1, which reflects elevated rates
of non-synonymous mutations brought about through strong divergent selection acting upon the
amino acid sequences (Figure 3.4, Supplementary Table 1).
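The counts reported above can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the function name is hypothetical.

```python
def ortholog_filter_summary(aligned, removed, elevated):
    """Return (pairs retained, percent with Ka:Ks >= 1, percent with Ka:Ks < 1)."""
    retained = aligned - removed
    pct_elevated = 100.0 * elevated / retained
    return retained, pct_elevated, 100.0 - pct_elevated

# 1722 initial alignments, 462 removed on manual inspection, 59 pairs with Ka:Ks >= 1.
retained, pct_elevated, pct_other = ortholog_filter_summary(1722, 462, 59)
```

This gives 1260 retained ortholog pairs, of which about 4.7% have Ka:Ks ≥ 1 and about 95.3% do not.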
Figure 3.4 Distribution of Ka:Ks ratios for 1260 pairwise orthologous protein coding
sequences between the best volcano Pseudonocardia sp. assembly and its closest
fully-sequenced relative, Pseudonocardia asaccharolytica, showing the majority of genes
(95.3%) to be under purifying or relaxed selection regimes, where synonymous substitutions
that do not alter the amino acid coding potential dominate the gene. However, some outliers
(4.7%) display higher levels of non-synonymous mutations (Ka:Ks ≥ 1) likely driven by
divergent selection from the harsh high-elevation desert conditions. This analysis was limited
to 23% of total volcano Pseudonocardia sp. genes due to the high degree of overall genomic
divergence between these two species.
Characteristics of the Volcano Pseudonocardia sp. Genome
The volcano Pseudonocardia sp. genome is at least 4.9 Mb (Table 3.2) and contains many of the
pathways that define the total community metabolic potential (e.g., aerobic heterotrophic
metabolism, NO3− and NH3 utilization, H2 and CO oxidation, CO2 fixation and methylotrophic
pathways, Figure 3.5). Many genes (33%) were found with multiple copies in the genome,
suggesting a possible role for gene duplication events during the divergence of this genome.
Potential carbohydrate oxidation pathways are quite limited, with genes present only for the
utilization of glucose, mannose, ribose, gluconate, maltose, trehalose, lactose, and galactose that
feed into the Embden-Meyerhof-Parnas pathway or the pentose phosphate pathway.
Carbohydrate uptake potential is apparently even more restricted as only a single annotated
maltose ABC importer was identified. A complete list of putative gene annotations can be found
in the IMG/ER database (id 45716).
Figure 3.5 Ecophysiological overview of the volcano Pseudonocardia sp. metabolic
pathways as inferred from assembled metagenomic data. sMMO, soluble methane
monooxygenase; MDH, (PQQ)-dependent methanol dehydrogenase; FDH, formaldehyde
dehydrogenase; FoDH, formate dehydrogenase-O; NDH, group 5 high-affinity [NiFe]
hydrogenase; ATPS, ATP synthase; ETC, electron transport chain; COD, form I carbon
monoxide dehydrogenase; AsE, arsenite efflux; CYP, cyanate permease; CYL, cyanate lyase;
AMI, ammonium importer; NAS, assimilatory nitrate reductase; NAR, respiratory nitrate
reductase; NIE, nitrite extrusion protein; NIR, nitrite reductase; GS, glutamine synthetase; SPM,
sulfate permease; 3PG, 3-phosphoglyceric acid; PHB, polyhydroxybutyrate; Gln, glutamine.
Hydrogenase Phylogeny Results
Our phylogenetic analysis of [NiFe]-hydrogenase sequences confirmed that the volcano
Pseudonocardia sp. assembly includes a group 5 [NiFe]-hydrogenase gene (Figure 3.6). Our
phylogeny resolved a monophyletic clade for hydrogenase group 5, which includes the group 5
hydrogenase sequences from Constant et al. (2010) as well as several other actinobacterial
phylotypes. [NiFe]-hydrogenase protein sequences that are most closely related to the
volcano Pseudonocardia sp. came from P. asaccharolytica, Pseudonocardia spinosispora,
and Actinomycetospora chiangmaiensis.
Figure 3.6 Neighbor-joining phylogenetic tree of [NiFe]-hydrogenase amino acid
sequences. The phylotype from our Pseudonocardia sp. assembly (star) falls into the same clade
as sequences shown in Constant et al. (2010), which are marked with circles. Sequences from
other [NiFe]-hydrogenase large subunit subclades (L1–L4, Vignais and Billoud, 2007) are shown
as the outgroup. Bootstrap support values are shown for nodes present in over 80% of
bootstrapped trees. The scale bar represents 20% divergence between amino acid sequences.
3.5 Discussion
The conditions present in the most extreme Atacama Desert soils exclude most life and leave
open the questions of whether and how microbes may survive there. Previous studies of Atacama
Desert soil microbiota have used either 16S gene based culture-independent approaches
(Navarro-González et al., 2003; Costello et al., 2009; Lynch et al., 2012; Neilson et al., 2012), or
to a limited extent culture-dependent methods (Lester et al., 2007; Okoro et al., 2009). Taken
together, the pioneering work done on Atacama soils indicates that low diversity microbial
communities are present at many sites, though few details have emerged regarding the origins
and functional nature of these microorganisms. In this study, we used a deep metagenomic
sequencing strategy to examine the structure and functional potential of the Llullaillaco Volcano
microbial community (Lynch et al., 2012). Difficulty with extracting DNA from very low
biomass mineral soils required us to pool roughly the equivalent of 60 standard 0.25 g soil DNA
extractions to achieve the quantity of genomic DNA necessary for shotgun metagenomic
sequencing. As a result, this dataset is less spatially expansive than our previous amplicon based
analysis (Lynch et al., 2012), yet still demonstrates the low-diversity community structure
extends throughout a relatively large volume of soil. Despite the limitations of this study, the
approach allowed for a more thorough description of the Llullaillaco Volcano microbial
community structure, and provides an initial insight into the protein coding potential of the
metagenome as well as the most abundant community member's genome.
Through this approach we found an extremely low-diversity community of organisms
(Figure 3.1, Table 3.1) that host an unusual inventory of functional genes (Figure 3.2), including
an absence of phototrophic pathways and limited capacity for heterotrophic carbohydrate
metabolism. The low diversity community lacks many of the clades previously recovered from
high-elevation air (Bowers et al., 2012) and dust (Stres et al., 2013) microbiome studies,
suggesting a high degree of environmental selection that could occur during atmospheric
transport to these Atacama sites, or during active or dormant residence in the mineral soils.
The most abundant 16S gene OTU (Pseudonocardia sp.) recovered from the two sites
used in this study (and from the third “low site” from Lynch et al., 2012), shares a relationship
with Pseudonocardia sp. detected in other high elevation samples from Himalayan and Antarctic
mineral soils (Rhodes et al., 2013), as well as with isolates from Icelandic volcanic deposits
(Cockell et al., 2013) leaving open the possibilities it may be native to these sites or that it could
be present at the Llullaillaco Volcano sites as a consequence of atmospheric transport (Stres et
al., 2013). It is noteworthy that the Acidimicrobiales OTU3% (Figure 3.1) found in this
environment (15.6% of the site 3 library, and 1.9% of the site 1 library) is related to known
inhabitants of fumaroles (Supplementary Figure 1, Benson et al., 2011; Itoh et al., 2011), so it is
likely that at least some of the organisms present at our research sites are the result of regional
wind transport from active fumaroles on nearby Socompa Volcano (Costello et al., 2009), or
from as yet undiscovered fumarolic activity on Llullaillaco Volcano. Indeed, we found
Acidimicrobiales 16S gene sequences identical to those from the Llullaillaco Volcano in warm
fumaroles of Socompa Volcano (Costello et al., 2009). It is also possible that the presence of
known fumarole inhabitants indicates that our research sites are located on soils that were
originally fumarolic and that the organisms found there are relics that have survived as dormant
spores. This would explain the presence of genes for the utilization of gases that are found in
fumarolic emissions (e.g., CO and H2), rather than the idea that they serve to metabolize the
exceedingly low concentrations of atmospheric gases found at elevations above 6000 m.a.s.l.
Energetics
Detailed examination of the most abundant community member's genome assembly reveals
unique genetic content (Figure 3.3), evidence for divergent natural selection acting on certain
homologs (Figure 3.4, Supplementary Table 2) and complete metabolic pathways related to trace
atmospheric substance metabolism (Figure 3.5). Unidentified soil oligotrophs have long been
suspected of oxidizing ubiquitous trace gases like H2, CO, and CH4 based on evidence from bulk
soil process studies (Conrad, 1996; Constant et al., 2011). Although unequivocal demonstrations
of bacterial growth and cell division from trace gas metabolism have been elusive, several
actinobacterial isolates have been shown to oxidize ambient H2 and CO at atmospheric
concentrations (Constant et al., 2008; King, 2003b). In certain actinobacteria, ambient
H2 oxidation has now been conclusively tied to the activity of high-affinity group 5 [NiFe]
hydrogenases (Greening et al., 2014).
[NiFe] hydrogenases are membrane-bound enzymes that catalyze the splitting of
periplasmic H2, facilitating the production of a proton gradient for ATP synthesis (Figure 3.5,
“NDH”). A novel group 5 [NiFe] hydrogenase gene set is present in our genome assembly of the
most abundant volcano Pseudonocardia sp. (Figure 3.6), indicating that the dominant organism
at this site likely has the ability to utilize atmospheric concentrations of H2 (0.53 ppmv, at sea
level, but about 0.24 ppmv at 6000 m.a.s.l.) for energy production. Greening et al. (2014) also
found that Mycobacterium smegmatis group 5 [NiFe] hydrogenase expression levels increased
under carbon starvation conditions, implicating the oxidation of H2 as a source of electrons
during low metabolic states. Given the low levels of organic carbon measured at the volcano
sites (Table 3.1), and the phylogenetic affiliation between the group 5
volcano Pseudonocardia sp. [NiFe] hydrogenase and the M. smegmatis group 5 [NiFe]
hydrogenase (sharing 80% amino acid identity) studied by Greening et al. (2014), oxidation of
trace H2 seems to be a plausible energy source for the new Pseudonocardia sp. However, [NiFe]
hydrogenase genes are not the only genes we observed that could be used to metabolize
atmospheric substrates.
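The text does not state how the 6000 m value for H2 was derived. One simple way to arrive at a similar figure is to scale the sea-level value by the fall in total air pressure with elevation, using an isothermal barometric approximation. The sketch below does this; the 7500 m scale height is an assumption chosen for illustration and is not a value from this study. Strictly, it is the partial pressure of a well-mixed gas that falls with elevation rather than its mixing ratio, so the result is best read as a sea-level-equivalent concentration.

```python
import math

def scale_to_elevation(sea_level_ppmv, elevation_m, scale_height_m=7500.0):
    """Scale a sea-level trace gas level by the barometric pressure ratio at elevation.

    scale_height_m is an assumed atmospheric scale height, not a measured value.
    """
    return sea_level_ppmv * math.exp(-elevation_m / scale_height_m)

# Sea-level H2 of 0.53 ppmv scaled to 6000 m gives roughly 0.24.
h2_at_6000_m = scale_to_elevation(0.53, 6000.0)
```

Under this assumption roughly 45% of the sea-level gas supply remains available at the study sites, which is the relevant quantity when judging whether high-affinity enzymes could still operate there.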
Previous studies have correlated a widespread occurrence of carbon monoxide
dehydrogenase genes with soil CO uptake (King, 2003a; Weber and King, 2010; Quiza et al.,
2014), and various soil bacterial isolates have been confirmed to oxidize CO at atmospheric
concentrations (<400 ppbv at sea level, Hardy and King, 2001; King, 2003b). Carbon monoxide
dehydrogenase functions similarly to [NiFe] hydrogenase, in that it is a membrane-bound
enzyme that facilitates the generation of a proton gradient. In this case, the enzyme oxidizes CO
and reduces H2O, forming CO2 and two periplasmic protons (Figure 3.5, “COD”). M.
smegmatis has been shown to be capable of trace CO uptake, and hosts canonical type I carbon
monoxide dehydrogenase genes (Quiza et al., 2014), similar to the CO dehydrogenase genes
present in the volcano Pseudonocardia sp. assembly. However, it is not yet clear how this
activity affects cellular physiology. It is likely that tropospheric CO oxidation is often a
supplemental energy source, contributing to a mixotrophic metabolism (King and Weber, 2007).
Thus, physiological work focused on high-affinity CO oxidizing bacteria must carefully consider
the possible requirements and roles of organic carbon sources, in addition to tracking low-concentration CO uptake (King and King, 2014).
The volcano Pseudonocardia sp. genome encodes complete pathways for the oxidation
and assimilation of methanol, formaldehyde, and formate (Figure 3.5). The atmosphere contains
very low concentrations of these gases mainly due to plant volatile emission and photochemical
reactions (Hu et al., 2011; Stavrakou et al., 2011; Luecken et al., 2012). The study of bacterial
metabolism of atmospheric concentrations of these C1 compounds is limited, although efforts are
underway to develop an understanding of the distributions of methylotrophs and how they
influence the global methanol cycle (Kolb and Stacheter, 2013). Furthermore, some evidence
suggests that various Actinobacteria (e.g., Streptococcus and Rhodococcus spp., Yoshida et al.,
2007) are capable of “CO2 dependent oligotrophic growth” under laboratory carbon starvation
conditions by oxidizing ambient methanol and formaldehyde (Yoshida et al., 2011), suggesting
these C1 gases can be atmospheric sources of energy and carbon for some bacteria.
Methane is the most abundant of the trace gases at 1.79 ppmv (or 0.80 ppmv at 6000
m.a.s.l.), so would seem to be a likely target for trace gas oxidizers. However, the Llullaillaco
Volcano metagenome lacks any identifiable particulate methane monooxygenase (pMMO)
genes, which have been previously identified as likely coding for the high-affinity methane
oxidation enzymes in various soils (Bull et al., 2000; Kolb, 2009). Likewise, the study of early-successional Kilauea Volcano soils by King (2003a) detected CO and H2 uptake, but not CH4.
Yet the volcano Pseudonocardia sp. does encode all genes required for a putative iron-dependent
soluble methane monooxygenase (sMMO) enzyme that could function to oxidize methane to
methanol, which would then be fed into the abovementioned methylotrophic pathways. sMMOs
are notoriously non-specific enzymes (Green and Dalton, 1989), and atmospheric concentrations
of methane have not yet been reported to support bacterial growth (Theisen and Murrell,
2005; Conrad, 2009). Nevertheless, the evidence for widespread ambient methane oxidation
(McDonald et al., 2008) and experimental confirmation of methane oxidation by members of the
phylum Verrucomicrobia (Dunfield et al., 2007) illustrate the continued need to explore the
phylogenetic and geographic distributions of methane oxidizers.
Given the presence of these various gas utilization pathways in the volcano
Pseudonocardia sp. genome (Figure 3.5), and the constant availability of these substrates at low
concentrations in the atmosphere, the high-elevation volcanic deposit community may rely on a
mixture of diffuse atmospheric substrates in the absence of direct photosynthetic inputs to at least
maintain redox balance, or perhaps even to drive carbon fixation. However, it is important to
note the volcano Pseudonocardia sp. shares nearly all of these aforementioned trace gas
oxidation pathways (Figures 3.5, 3.6) with P. asaccharolytica, its nearest phylogenetic relative
(Figure 3.1). P. asaccharolytica does lack a (PQQ)-dependent methanol dehydrogenase gene, but
these were present in other Pseudonocardia spp. (Figure 3.3). While no studies to date have
tested P. asaccharolytica for trace gas metabolism either in situ or in culture (Reichert et al.,
1998), the trace gas metabolism related genes common to the P. asaccharolytica and the
volcano Pseudonocardia sp. genomes have been shown to confer trace gas metabolism capacity
in other bacteria (Figure 3.6), making it a plausible trait shared by various members of this
genus. Consequently, the relevance of trace gas utilization as a potential metabolic strategy in the
harsh Atacama Desert mineral soils of this study is difficult to interpret, since trace gas
metabolism genes are not exclusive to Pseudonocardia sp. recovered from desert environments.
Atmospheric gas metabolism is not mutually exclusive with other trophic strategies. The
volcano Pseudonocardia sp. hosts fully encoded aerobic heterotrophic and autotrophic carbon
acquisition pathways, and several energy storage pathways (Figure 3.5). The large and small
RuBisCO subunit genes of the volcano Pseudonocardia sp. both cluster within the form IC
clade, which contains other known bacterial facultative autotrophs (Yuan et al., 2012) including
various Actinobacteria such as P. asaccharolytica, further suggesting a flexibility in carbon and
energy acquisition physiology. It is certainly possible this organism is opportunistic, capable of
survival at low metabolic rates through the utilization of a variety of low-concentration and
constantly replenished atmospheric gases, but perhaps is also capable of capitalizing on pulses of
other multi-carbon nutrients and water when they become available, such as after a snow melt
event. A better understanding of the environmental conditions at these difficult-to-access field
sites, and of how they vary through annual cycles, combined with direct experimental growth assays,
will be required to test whether and how this bacterium, or other members of the community, may grow
under, and respond to, variable and stressful conditions.
Stress Tolerance and Other Traits
Metabolism of various trace atmospheric substrates may be an important adaptation to survival in
the harsh and nutrient-limited desert volcano environment, but the reduced and under-dispersed
phylogenetic diversity of the microbial community (Figure 3.1, Table 3.1) suggests that other
traits must be important for fitness, given that H2 and CO oxidizing genes are present in many
species of several bacterial phyla. Actinobacteria have a seemingly ubiquitous distribution across
varied terrestrial and aquatic environments (Dinsdale et al., 2008), but reach their highest
relative abundance in cold-desert soil environments (Fierer et al., 2012). Some obvious traits of the
actinobacteria are likely linked to desert fitness, such as Gram-positive cell wall architecture,
which is perhaps an original adaptation to ancient terrestrial colonization (Battistuzzi and
Hedges, 2009; Rinke et al., 2013), and the ability of many lineages to sporulate. However, given
the metabolic diversity and rapid genomic evolution found within this phylum (Zaneveld et al.,
2010), the full scope of desert actinobacteria traits remains largely uncharacterized.
The volcano Pseudonocardia sp. assembly contains COGs with relatively high copy
number compared to other species of the genus that could possibly underlie stress tolerance
adaptations including: DNA replication and repair machinery, transcriptional regulators,
response regulators, cytochrome P450, arabinose efflux permeases, ABC-type multidrug
transport systems and non-ribosomal peptide synthesis pathways (NRPS). It is not possible to
determine the exact functional roles these genes play without experimental confirmation, but it is
conceivable they could be linked to adaptations to the stresses of wet-dry or freeze-thaw cycling
or UV exposure. The multiple copies (≥18) of the NRPS genes are notable because their closest
sequence matches are to the antibiotic gramicidin D gene set (Kessler et al., 2004).
Considering the known importance of extracellular polymeric substance production as a xerotolerance
trait for many microorganisms (Lennon et al., 2012), and the presence of arabinose and
polysaccharide export genes in the volcano Pseudonocardia sp. genome, it is not surprising that
investment in antibiotic defense mechanisms that may ward off scavengers of these vulnerable
carbon sources (e.g., fungi, Schmidt et al., 2012) may also be necessary.
We compared all well-aligned homologs between the volcano Pseudonocardia sp. and P.
asaccharolytica in order to identify how selection may have affected the amino acid sequences
(and functions) of certain genes. P. asaccharolytica was isolated from a dimethyl sulfide and
tree-bark biofilter enrichment experiment (Reichert et al., 1998), but little else is known about its
ecology or physiology other than the lack of ability to oxidize any of the single carbohydrates
tested in the original report, and that it can be grown at moderate rates on TSA media at
mesophilic temperatures. Our analysis identified 59 volcano Pseudonocardia sp. genes (4.7% of
all analyzed homolog pairs, Supplementary Table 1) that have higher rates of non-synonymous
than synonymous substitution relative to their homologs in P. asaccharolytica (Ka:Ks ≥ 1),
consistent with evolution under a strong divergent selection regime (Figure 3.4). These genes fall
into categories of protein translation (four tRNA methyltransferase modification enzymes and a
ribosomal modulation protein), respiration (succinate dehydrogenase), energy storage (acyl CoA
dehydrogenase) and membrane transport (polysaccharide, multidrug, potassium, phosphate and
cyanate). Other annotations of genes with a Ka:Ks ratio ≥ 1, such as 13 uncharacterized
conserved proteins and three transposases, are more difficult to interpret, but underscore the
potential for discovery of novel microbial traits from understudied environments and taxa.
Although this analysis cannot determine the particulars of how these genes differ in terms of the
reaction kinetics or substrate specificities of the enzymes they code for, functions like membrane
transport and energy storage could plausibly underlie important survival traits for conditions in
the nutrient-limited high-elevation volcanic deposits of this study.
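The screening step described above, flagging homolog pairs whose Ka:Ks ratio is at least 1, can be illustrated with a minimal Python sketch. This is not the pipeline used in this study (the Ka and Ks estimates themselves would come from a codon-model tool); the `HomologPair` type, the function names, and all numeric values are hypothetical and serve only to show the filtering logic, including the case where Ks is zero and the ratio is undefined.

```python
from dataclasses import dataclass


@dataclass
class HomologPair:
    gene_id: str
    ka: float  # non-synonymous substitutions per non-synonymous site
    ks: float  # synonymous substitutions per synonymous site


def ka_ks_ratio(pair):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if pair.ks == 0:
        return None
    return pair.ka / pair.ks


def flag_divergent(pairs, threshold=1.0):
    """Return gene ids whose Ka:Ks ratio meets or exceeds the threshold."""
    flagged = []
    for pair in pairs:
        ratio = ka_ks_ratio(pair)
        if ratio is not None and ratio >= threshold:
            flagged.append(pair.gene_id)
    return flagged


# Hypothetical values for illustration only; not data from this study.
pairs = [
    HomologPair("gene_a", ka=0.30, ks=0.20),
    HomologPair("gene_b", ka=0.05, ks=0.50),
    HomologPair("gene_c", ka=0.10, ks=0.00),
    HomologPair("gene_d", ka=0.40, ks=0.40),
]
flagged = flag_divergent(pairs)
fraction = len(flagged) / len(pairs)
print(flagged, f"{fraction:.1%}")
```

Pairs with Ks of zero are skipped rather than treated as infinitely divergent, which is one common conservative choice; a real analysis would also need to handle saturated Ks values and alignment quality.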
Another interesting aspect of the Ka:Ks ratio analysis is that only 23% of total volcano
Pseudonocardia sp. protein coding genes could be unambiguously aligned to homologs from P.
asaccharolytica. The remaining 77% of genes are too divergent to analyze with this method.
This limits the power of the analysis somewhat, but highlights the genetic novelty of each of
these organisms, and suggests that further genomic and culture work on the Pseudonocardia spp.
is warranted.
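The percentages reported above imply approximate gene counts, which the short Python calculation below works through. It is a back-of-envelope sketch only: it assumes the analyzed homolog pairs are the same set as the 23% of protein coding genes that could be aligned, and because the inputs are rounded percentages the results are rough estimates rather than reported values.

```python
# Figures reported in the text (rounded percentages).
flagged_genes = 59          # genes with Ka:Ks >= 1
flagged_fraction = 0.047    # 4.7% of analyzed homolog pairs
aligned_fraction = 0.23     # 23% of protein coding genes were alignable

# Back-of-envelope estimates; rounding in the inputs limits their precision.
# Assumption: the analyzed pairs are exactly the alignable 23% of genes.
analyzed_pairs = flagged_genes / flagged_fraction
total_genes = analyzed_pairs / aligned_fraction
unaligned_genes = total_genes - analyzed_pairs

print(round(analyzed_pairs), round(total_genes), round(unaligned_genes))
```

Under these assumptions, the Ka:Ks analysis covers on the order of a thousand homolog pairs, while several thousand genes remain outside its reach.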
We find that the most abundant genome in the community is intermediately sized (4.9 Mb,
not including highly repetitive content, Table 3.2), and codes for diverse metabolic potential.
This size is not unexpected though, as work by Konstantinidis and Tiedje (2004) shows evidence
that heterogeneous, variable, and low nutrient niches in soils select for larger genomes, which
often contain enhanced regulatory and secondary metabolite synthesis pathways. Barberán et al.
(2014) recently expanded this concept by showing that, to some extent, genome size is a
reflection of the complexity and variability of terrestrial bacterial niches. Thus, even though
utilization of low concentration atmospheric substrates may be an important trait for the
volcano Pseudonocardia sp., we did not expect to find signatures of genome streamlining, as
have been documented in oceanic bacteria that specialize in low concentration nutrient uptake
(Giovannoni et al., 2014). Given the variability of a high mountain top environment (Lynch et
al., 2012) that experiences frequent wet-dry and freeze-thaw cycling stresses (Stres et al., 2010),
we are not surprised to find significantly higher numbers of genes classified in the regulation and
cell signaling categories in the total metagenome (Figure 3.2), as well as specific examples of
transcription and response regulator genes with high copy numbers, and with high Ka:Ks ratios in
the genome of the most abundant community member.
Conclusions
The functional inferences drawn from this culture-independent study can now serve as testable
hypotheses for ongoing culture-based experiments. Although a modest collection of bacteria and
fungi have been cultured and isolated from these volcano samples using a variety of selection
techniques (unpublished), the most abundant lineages observed from culture-independent
approaches have thus far resisted isolation. Nevertheless, the results we present here can inform
future culture-based physiological analyses by providing information on potential electron
donors and growth conditions.
The atmosphere interfaces with diverse terrestrial and aquatic environments, so it is
possible that the pathways and signatures of selection we have detected result from activity and
replication elsewhere. Selective dispersal and dormancy processes cannot be ruled out either;
perhaps we have recovered genomic material from the most well-dispersing or longest surviving
spores. Although there is little evidence to suggest that the most abundant organism from the
Llullaillaco Volcano study sites is native to another environment, or is an exceptional spore
producer, these are possibilities that cannot yet be rejected, especially considering the evidence
for wind borne transport of other lower abundance lineages of the community (Supplementary
Figure 1).
Overall, our initial analyses of these metagenomes indicate that despite, or perhaps
because of, the intense solar radiation, this sparsely populated high-elevation microbial
community lacks endogenous photosynthesizing primary producers, but possesses the genetic
potential for utilization of various low molecular weight atmospheric substrates and
CO2 fixation. This seems to support our hypothesis that chemoautotrophic, rather than
photoautotrophic, microbes may be supplying organic carbon to simple and low-energy flux
communities at these sites, but does not allow us to determine the relative roles that heterotrophic
or mixotrophic metabolism may play. Bacterial growth on trace gases and aerosols is difficult to
study and can likely support only low rates of metabolism. Answering whether or not the
intriguing combination of metabolic pathways found in the volcano Pseudonocardia sp. genome
indicates an actual dependency for growth on one or more atmospheric substrates requires direct
physiological experimentation at relevant gas concentrations. These pathways could also be
supplemental to more standard heterotrophic metabolism, and may not by themselves support
growth and cell division. Future studies of these high-elevation actinobacteria and their relatives
(Cockell et al., 2013; Rhodes et al., 2013) should consider the possibility that a mixture of
atmospheric, precipitation and soil derived substrates may be required for growth, or that these
organisms are but remnants of extinct ecosystems or windblown transients.
3.6 References
Abascal, F., Zardoya, R., and Telford, M. J. (2010). TranslatorX: multiple alignment of
nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38, 7–13. doi:
10.1093/nar/gkq291
Barberán, A., Ramirez, K. S., Leff, J. W., Bradford, M. A., Wall, D. H., and Fierer, N. (2014).
Why are some microbes more ubiquitous than others? Predicting the habitat breadth of soil
bacteria. Ecol. Lett. 17, 794–802. doi: 10.1111/ele.12282
Battistuzzi, F. U., and Hedges, S. B. (2009). A major clade of prokaryotes with ancient
adaptations to life on land. Mol. Biol. Evol. 26, 335–343. doi: 10.1093/molbev/msn247
Benson, C. A., Bizzoco, R. W., Lipson, D. A., and Kelley, S. T. (2011). Microbial diversity in
non-sulfur, sulfur and iron geothermal steam vents. FEMS Microbiol. Ecol. 76, 74–88. doi:
10.1111/j.1574-6941.2011.01047.x
Bowers, R. M., McCubbin, I. B., Hallar, A. G., and Fierer, N. (2012). Seasonal variability in
airborne bacterial communities at a high-elevation site. Atmos. Environ. 50, 41–49. doi:
10.1016/j.atmosenv.2012.01.005
Bull, I. D., Parekh, N. R., Hall, G. H., Ineson, P., and Evershed, R. P. (2000). Detection and
classification of atmospheric methane oxidizing bacteria in soil. Nature 405, 175–178. doi:
10.1038/35012061
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., et
al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat.
Methods. 7, 335–336. doi: 10.1038/nmeth.f.303
Chevreux, B., Wetter, T., and Suhai, S. (1999). Genome sequence assembly using trace signals
and additional sequence information. Comput. Sci. Biol. 99, 45–56.
Cockell, C. S., Kelly, L. C., and Marteinsson, V. (2013). Actinobacteria —an ancient phylum
active in volcanic rock weathering. Geomicrobiol. J. 30, 706–720. doi:
10.1080/01490451.2012.758196
Conrad, R. (1996). Soil microorganisms as controllers of atmospheric trace gases (H2, CO, CH4,
OCS, N2O, and NO). Microbiol. Rev. 60:609.
Conrad, R. (2009). The global methane cycle: recent advances in understanding the microbial
processes involved. Environ. Microbiol. Rep. 1, 285–292. doi: 10.1111/j.1758-2229.2009.00038.x
Constant, P., Chowdhury, S. P., Hesse, L., Pratscher, J., and Conrad, R. (2011). Genome data
mining and soil survey for the novel group 5 [NiFe]-hydrogenase to explore the diversity and
ecological importance of presumptive high-affinity H2-oxidizing bacteria. Appl. Environ.
Microbiol. 77, 6027–6035. doi: 10.1128/AEM.00673-11
Constant, P., Chowdhury, S. P., Pratscher, J., and Conrad, R. (2010). Streptomycetes
contributing to atmospheric molecular hydrogen soil uptake are widespread and encode a
putative high-affinity [NiFe]-hydrogenase. Environ. Microbiol. 12, 821–829. doi:
10.1111/j.1462-2920.2009.02130.x
Constant, P., Poissant, L., and Villemur, R. (2008). Isolation of Streptomyces sp. PCB7, the first
microorganism demonstrating high-affinity uptake of tropospheric H2. ISME J. 2, 1066–1076.
doi: 10.1038/ismej.2008.59
Costello, E. K., Halloy, S. R. P., Reed, S. C., Sowell, P., and Schmidt, S. K. (2009). Fumarole-supported
islands of biodiversity within a hyperarid, high-elevation landscape on Socompa
Volcano, Puna de Atacama, Andes. Appl. Environ. Microbiol. 75, 735–747. doi:
10.1128/AEM.01469-08
Cox, M. P., Peterson, D. A., and Biggs, P. J. (2010). SolexaQA: At-a-glance quality assessment
of Illumina second-generation sequencing data. BMC Bioinformatics 11:485. doi: 10.1186/1471-2105-11-485
Dick, G. J., Andersson, A. F., Baker, B. J., Simmons, S. L., Thomas, B. C., Yelton, A. P., et al.
(2009). Community-wide analysis of microbial genome sequence signatures. Genome Biol.
10:R85. doi: 10.1186/gb-2009-10-8-r85
Dinsdale, E. A., Edwards, R. A., Hall, D., Angly, F., Breitbart, M., Brulc, J. M., et al. (2008).
Functional metagenomic profiling of nine biomes. Nature 452, 629–632. doi:
10.1038/nature06810
Dunfield, P. F., Yuryev, A., Senin, P., Smirnova, A. V., Stott, M. B., Hou, S. B., et al. (2007).
Methane oxidation by an extremely acidophilic bacterium of the phylum Verrucomicrobia.
Nature 450, 879–882. doi: 10.1038/nature06411
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res. 32, 1792–1797. doi: 10.1093/nar/gkh340
Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST.
Bioinformatics 26, 2460–2461. doi: 10.1093/bioinformatics/btq461
Fierer, N., Leff, J. W., Adams, B. J., Nielsen, U. N., Thomas, S., Lauber, C. L., et al. (2012).
Cross-biome metagenomic analyses of soil microbial communities and their functional attributes.
Proc. Natl. Acad. Sci. U.S.A. 109, 21390–21395. doi: 10.1073/pnas.1215210110
Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the
next-generation sequencing data. Bioinformatics 28, 3150–3152. doi: 10.1093/bioinformatics/bts565
Giovannoni, S. J., Cameron Thrash, J., and Temperton, B. (2014). Implications of streamlining
theory for microbial ecology. ISME J. 8, 1553–1565. doi: 10.1038/ismej.2014.60
Gomez-Alvarez, V., Teal, T. K., and Schmidt, T. M. (2009). Systematic artifacts in
metagenomes from complex microbial communities. ISME J. 3, 1314–1317. doi:
10.1038/ismej.2009.72
Gómez-Silva, B., Rainey, F. A., Warren-Rhodes, K. A., McKay, C. P., and Navarro-González,
R. (2008). “Atacama desert soil microbiology,” in Microbiology of Extreme Soils, eds P. Dion
and C. S. Nautiyal (Berlin; Heidelberg: Springer-Verlag), 117–132.
Green, J., and Dalton, H. (1989). Substrate specificity of soluble methane monooxygenase. J.
Biol. Chem. 264, 17698–17703.
Greening, C., Berney, M., Hards, K., Cook, G. M., and Conrad, R. (2014). A soil
actinobacterium scavenges atmospheric H2 using two membrane-associated, oxygen-dependent
[NiFe] hydrogenases. Proc. Natl. Acad. Sci. U.S.A. 111, 4257–4261. doi:
10.1073/pnas.1320586111
Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010).
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the
performance of PhyML 3.0. Syst. Biol. 59, 307–321. doi: 10.1093/sysbio/syq010
Hardy, K. R., and King, G. M. (2001). Enrichment of high-affinity CO oxidizers in Maine forest
soil. Appl. Environ. Microbiol. 67, 3671–3676. doi: 10.1128/AEM.67.8.3671-3676.2001
Hartley, A., Chong, G., Houston, J., and Mather, A. E. (2005). 150 million years of climatic
stability: evidence from the Atacama Desert, northern Chile. J. Geol. Soc. 162, 421–424. doi:
10.1144/0016-764904-071
Hu, L., Millet, D. B., Mohr, M. J., Wells, K. C., Griffis, T. J., and Helmig, D. (2011). Sources
and seasonality of atmospheric methanol based on tall tower measurements in the US Upper
Midwest. Atmos. Chem. Phys. 11, 11145–11156. doi: 10.5194/acp-11-11145-2011
Itoh, T., Yamanoi, K., Kudo, T., Ohkuma, M., and Takashina, T. (2011). Aciditerrimonas
ferrireducens gen. nov., sp. nov., an iron-reducing thermoacidophilic actinobacterium isolated
from a solfataric field. Int. J. Syst. Evol. Microbiol. 61, 1281–1285. doi: 10.1099/ijs.0.023044-0
Kessler, N., Schuhmann, H., Morneweg, S., Linne, U., and Marahiel, M. A. (2004). The linear
pentadecapeptide gramicidin is assembled by four multimodular nonribosomal peptide
synthetases that comprise 16 modules with 56 catalytic domains. J. Biol. Chem. 279, 7413–7419.
doi: 10.1074/jbc.M309658200
King, C. E., and King, G. M. (2014). Description of Thermogemmatispora carboxidivorans sp.
nov., a carbon-monoxide-oxidizing member of the class Ktedonobacteria isolated from a
geothermally heated biofilm, and analysis of carbon monoxide oxidation by members of the class
Ktedonobacteria. Int. J. Syst. Evol. Microbiol. 64, 1244–1251. doi: 10.1099/ijs.0.059675-0
King, G. M. (2003a). Contributions of atmospheric CO and hydrogen uptake to microbial
dynamics on recent Hawaiian volcanic deposits. Appl. Environ. Microbiol. 69, 4067–4075. doi:
10.1128/AEM.69.7.4067-4075.2003
King, G. M. (2003b). Uptake of carbon monoxide and hydrogen at environmentally relevant
concentrations by mycobacteria. Appl. Environ. Microbiol. 69, 7266–7272. doi:
10.1128/AEM.69.12.7266-7272.2003
King, G. M., and Weber, C. F. (2007). Distribution, diversity and ecology of aerobic
CO-oxidizing bacteria. Nat. Rev. Microbiol. 5, 107–118. doi: 10.1038/nrmicro1595
Kolb, S. (2009). The quest for atmospheric methane oxidizers in forest soils. Environ. Microbiol.
Rep. 1, 336–346. doi: 10.1111/j.1758-2229.2009.00047.x
Kolb, S., and Stacheter, A. (2013). Prerequisites for amplicon pyrosequencing of microbial
methanol utilizers in the environment. Front. Microbiol. 4:268. doi: 10.3389/fmicb.2013.00268
Konstantinidis, K. T., and Tiedje, J. M. (2004). Trends between gene content and genome size in
prokaryotic species with larger genomes. Proc. Natl. Acad. Sci. U.S.A. 101, 3160–3165. doi:
10.1073/pnas.0308653100
Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., et
al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948. doi:
10.1093/bioinformatics/btm404
Lennon, J. T., Aanderud, Z. T., Lehmkuhl, B. K., and Schoolmaster, D. R. (2012). Mapping the
niche space of soil microorganisms using taxonomy and traits. Ecology 93, 1867–1879. doi:
10.1890/11-1745.1
Lester, E. D., Satomi, M., and Ponce, A. (2007). Microflora of extreme arid Atacama Desert
soils. Soil Biol. Biochem. 39, 704–708. doi: 10.1016/j.soilbio.2006.09.020
Luecken, D. J., Hutzell, W. T., Strum, M. L., and Pouliot, G. A. (2012). Regional sources of
atmospheric formaldehyde and acetaldehyde and implications for atmospheric modeling. Atmos.
Environ. 47, 477–490. doi: 10.1016/j.atmosenv.2011.10.005
Lynch, R. C., King, A. J., Farías, M. E., Sowell, P., Vitry, C., and Schmidt, S. K. (2012). The
potential for microbial life in the highest elevation (>6000 m.a.s.l.) mineral soils of the Atacama
region. J. Geophys. Res. 117, G02028. doi: 10.1029/2012JG001961
Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting
of occurrences of k-mers. Bioinformatics 27, 764–770. doi: 10.1093/bioinformatics/btr011
Markowitz, V. M., Chen, I.-M. A., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., et al.
(2012). IMG: the Integrated Microbial Genomes database and comparative analysis system.
Nucleic Acids Res. 40, 115–122. doi: 10.1093/nar/gkr1044
McDonald, D., Price, M. N., Goodrich, J., Nawrocki, E. P., DeSantis, T. Z., Probst, A., et al.
(2012). An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary
analyses of bacteria and archaea. ISME J. 6, 610–618. doi: 10.1038/ismej.2011.139
McDonald, I. R., Bodrossy, L., Chen, Y., and Murrell, J. C. (2008). Molecular ecology
techniques for the study of aerobic methanotrophs. Appl. Environ. Microbiol. 74, 1305–1315.
doi: 10.1128/AEM.02233-07
Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E. M., Kubal, M., et al. (2008). The
metagenomics RAST server: a public resource for the automatic phylogenetic and functional
analysis of metagenomes. BMC Bioinformatics 9:386. doi: 10.1186/1471-2105-9-386
Michalski, G., Böhlke, J. K., and Thiemens, M. (2004). Long term atmospheric deposition as the
source of nitrate and other salts in the Atacama Desert, Chile: new evidence from
mass-independent oxygen isotopic compositions. Geochim. Cosmochim. Acta 68, 4023–4038. doi:
10.1016/j.gca.2004.04.009
Navarro-González, R., Rainey, F. A., Molina, P., Bagaley, D. R., Hollen, B. J., de la Rosa, J., et
al. (2003). Mars-like soils in the Atacama Desert, Chile, and the dry limit of microbial life.
Science 302, 1018–1021. doi: 10.1126/science.1089143
Neilson, J. W., Quade, J., Ortiz, M., Nelson, W. M., Legatzki, A., Tian, F., et al. (2012). Life at
the hyperarid margin: novel bacterial diversity in arid soils of the Atacama Desert, Chile.
Extremophiles 16, 553–566. doi: 10.1007/s00792-012-0454-z
Okoro, C. K., Brown, R., Jones, A. L., Andrews, B. A., Asenjo, J. A., Goodfellow, M., et al.
(2009). Diversity of culturable actinomycetes in hyper-arid soils of the Atacama Desert, Chile.
Antonie Van Leeuwenhoek 95, 121–133. doi: 10.1007/s10482-008-9295-2
Overbeek, R., Begley, T., Butler, R. M., Choudhuri, J. V., Chuang, H.-Y., Cohoon, M., et al.
(2005). The subsystems approach to genome annotation and its use in the project to annotate
1000 genomes. Nucleic Acids Res. 33, 5691–5702. doi: 10.1093/nar/gki866
Price, M. N., Dehal, P. S., and Arkin, A. P. (2009). FastTree: computing large minimum
evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650. doi:
10.1093/molbev/msp077
Pruesse, E., Peplies, J., and Glöckner, F. O. (2012). SINA: accurate high-throughput multiple
sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823–1829. doi:
10.1093/bioinformatics/bts252
Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., et al. (2012). The
Pfam protein families database. Nucleic Acids Res. 40, 290–301. doi: 10.1093/nar/gkr1065
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., et al. (2013). The SILVA
ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic
Acids Res. 41, D590–D596. doi: 10.1093/nar/gks1219
Quiza, L., Lalonde, I., Guertin, C., and Constant, P. (2014). Land-use influences the distribution
and activity of high affinity CO-oxidizing bacteria associated to type I-coxL genotype in soil.
Front. Microbiol. 5:271. doi: 10.3389/fmicb.2014.00271
Reichert, K., Lipski, A., Pradella, S., Stackebrandt, E., and Altendorf, K. (1998). New dimethyl
disulfide-degrading actinomycetes and emended description of the genus Pseudonocardia. Int. J.
Syst. Bacteriol. 48, 441–449. doi: 10.1099/00207713-48-2-441
Rhodes, M., Knelman, J., Lynch, R. C., Darcy, J. L., Nemergut, D. R., and Schmidt, S. K.
(2013). “Alpine and arctic soil microbial communities,” in The Prokaryotes, eds E. Rosenberg,
E. F. DeLong, E. Stackebrandt, S. Lory, and F. Thompson (Berlin: Springer), 44–56.
Richter, M., and Schmidt, D. (2002). Cordillera de la Atacama. Das trockenste Hochgebirge der
Welt. Petermanns Geographische Mitteilungen 146, 48–57.
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J.-F., et al.
(2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature 499,
431–437. doi: 10.1038/nature12352
Saldanha, A. J. (2004). Java Treeview–extensible visualization of microarray data.
Bioinformatics 20, 3246–3248. doi: 10.1093/bioinformatics/bth349
Schmidt, S. K., Naff, C. S., and Lynch, R. C. (2012). Fungal communities at the edge: ecological
lessons from high alpine fungi. Fungal Ecol. 5, 443–452. doi: 10.1016/j.funeco.2011.10.005
Schmidt, S. K., Nemergut, D. R., Miller, A. E., Freeman, K. R., King, A. J., and Seimon, A.
(2009). Microbial activity and diversity during extreme freeze-thaw cycles in periglacial soils,
5400 m elevation, Cordillera Vilcanota, Perú. Extremophiles 13, 807–816. doi:
10.1007/s00792-009-0268-9
Stavrakou, T., Müller, J.-F., Peeters, J., Razavi, A., Clarisse, L., Clerbaux, C., et al. (2011).
Satellite evidence for a large source of formic acid from boreal and tropical forests. Nat. Geosci.
5, 26–30. doi: 10.1038/ngeo1354
Stres, B., Philippot, L., Faganeli, J., and Tiedje, J. M. (2010). Frequent freeze-thaw cycles yield
diminished yet resistant and responsive microbial communities in two temperate soils: a
laboratory experiment. FEMS Microbiol. Ecol. 74, 323–335. doi: 10.1111/j.1574-6941.2010.00951.x
Stres, B., Sul, W. J., Murovec, B., and Tiedje, J. M. (2013). Recently deglaciated high-altitude
soils of the Himalaya: diverse environments, heterogenous bacterial communities and long-range
dust inputs from the upper troposphere. PLoS ONE 8:e76440. doi: 10.1371/journal.pone.0076440
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: molecular
evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729. doi:
10.1093/molbev/mst197
Tatusov, R. L. (1997). A genomic perspective on protein families. Science 278, 631–637. doi:
10.1126/science.278.5338.631
Theisen, A. R., and Murrell, J. C. (2005). Facultative Methanotrophs Revisited. J. Bacteriol. 187,
4303–4305. doi: 10.1128/JB.187.13.4303-4305.2005
Vignais, P. M., and Billoud, B. (2007). Occurrence, classification, and biological function of
hydrogenases: an overview. Chem. Rev. 107, 4206–4272. doi: 10.1021/cr050196r
Webb, C. O., Ackerly, D. D., and Kembel, S. W. (2008). Phylocom: software for the analysis of
phylogenetic community structure and trait evolution. Bioinformatics 24, 2098–2100. doi:
10.1093/bioinformatics/btn358
Weber, C. F., and King, G. M. (2010). Distribution and diversity of carbon monoxide-oxidizing
bacteria and bulk bacterial communities across a succession gradient on a Hawaiian volcanic
deposit. Environ. Microbiol. 12, 1855–1867. doi: 10.1111/j.1462-2920.2010.02190.x
Yang, Z. (1998). Likelihood ratio tests for detecting positive selection and application to primate
lysozyme evolution. Mol. Biol. Evol. 15, 568–573. doi: 10.1093/oxfordjournals.molbev.a025957
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24,
1586–1591. doi: 10.1093/molbev/msm088
Yoshida, N., Hayasaki, T., and Takagi, H. (2011). Gene expression analysis of methylotrophic
oxidoreductases involved in the oligotrophic growth of Rhodococcus erythropolis N9T-4. Biosci.
Biotechnol. Biochem. 75, 123–127. doi: 10.1271/bbb.100700
Yoshida, N., Ohhata, N., Yoshino, Y., Katsuragi, T., Tani, Y., and Takagi, H.
(2007). Screening of carbon dioxide-requiring extreme oligotrophs from soil. Biosci. Biotechnol.
Biochem. 71, 2830–2832. doi: 10.1271/bbb.70042
Yuan, H., Ge, T., Chen, C., O'Donnell, A. G., and Wu, J. (2012). Significant role for microbial
autotrophy in the sequestration of soil carbon. Appl. Environ. Microbiol. 78, 2328–2336. doi:
10.1128/AEM.06881-11
Zaneveld, J. R., Lozupone, C., Gordon, J. I., and Knight, R. (2010). Ribosomal RNA diversity
predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res. 38, 3869–3879.
doi: 10.1093/nar/gkq066
CHAPTER 4
GENOMIC DIVERSITY IN CANNABIS
4.1 Introduction
Plants of the genus Cannabis (Cannabaceae; hemp, drug-type) have been used for thousands of
years for fiber, nutritional seed oil and medicinal or psychoactive effects. Archaeological evidence
for hemp fiber textile production in China dates to at least 6000 years ago (Li, 1973),
and possibly to as early as 12,000 years ago (Russo, 2007), suggesting Cannabis was one of the first
domesticated plants. Evidence for ancient medicinal or shamanistic use of Cannabis has been
found at Chinese, Indian, Middle Eastern, Greek and sub-Saharan African sites (reviewed in:
Russo, 2007), illustrating the extent of Cannabis use throughout human history. A central Asian
site of domestication is commonly cited (Schultes et al., 1974), although genetic analyses suggest
two independent domestication events may have occurred separately for Cannabis sativa and
Cannabis indica (Hillig, 2005).
Cannabis plants are usually annual wind-pollinated dioecious herbs, though individuals
may live more than a year in subtropical climates (Cherniak, 1982) and monoecious populations
exist (de Meijer et al., 2003). The taxonomic composition of the genus remains unresolved, with
two species (C. indica and C. sativa) commonly cited (Habib et al., 2013), although C. ruderalis is
sometimes proposed as a third species that contains northern short-day or auto-flowering plants
(Small and Cronquist, 1976). Monospecific treatment of the genus as Cannabis sativa L. is also
common (van Bakel et al., 2011) and various alternative species and subspecies names (e.g., C.
chinensis or Cannabis sativa subsp. indica var. kafiristanica) are sometimes still referenced
(reviewed in: Schultes et al., 1974). Given that no broad-scale samplings of Cannabis genomic
diversity are published to date, these classification schemes remain debatable. Domesticated and
feral populations are currently growing on every continent except Antarctica, in addition to the
putatively native wild populations found across Eurasia. These populations contain expansive
phenotypic diversity in terpenoid and cannabinoid profiles (Hillig, 2004; Hillig and Mahlberg,
2004), as well as many morphological and life-history characteristics, further fueling debate
regarding Cannabis taxonomic status (Russo, 2007).
One unusual feature of the Cannabis genus is the production of a tremendous diversity of
compounds called cannabinoids, so named because they are not produced at high levels in any
other plant species. Cannabinoids are a group of over 74 known C21 terpenophenolic compounds
(Radwan et al., 2008; ElSohly and Slade, 2005) responsible for many reported medicinal and
psychoactive effects of Cannabis consumption (Poklis et al., 2010). The plants synthesize a
carboxylic acid form of these compounds, and heating is required to produce the pH-neutral forms
that are most active in humans. Interestingly, these compounds have pronounced neurological
effects on a wide range of vertebrate and invertebrate taxa, suggesting an ancient origin of the
endocannabinoid receptors, perhaps as old as the last common ancestor of all extant bilaterians,
over 500 MYA (Salzet et al., 2000; McPartland et al., 2006). The plant compounds thus produced
have the potential to affect a broad range of metazoans, though their ecological function in nature
is not well understood. Indeed, suggested roles for these compounds include many biotic and
abiotic defenses, such as suppression of pathogens and herbivores, protection from UV radiation
damage, and attraction of seed dispersers. These hypotheses about the selective benefits of
cannabinoid production in nature remain speculative, as none have been experimentally verified
to date. We do know more, however, about the more recent evolution of the plants under human
cultivation.
High delta-9-tetrahydrocannabinol (THC) content has been bred into many strains due to
its potent psychoactive, appetite-stimulating, analgesic and antiemetic effects (Mechoulam and
Gaoni 1967), which are mediated through interactions with human endocannabinoid receptors
CB1 and CB2 (Matsuda et al., 1990; Di Marzo et al., 2004). After several decades of accelerated
clandestine cultivation and breeding improvements, modern high-THC strains can currently yield
dried unpollinated female flower material that contains over 30% THC by dry weight (Swift et
al., 2013). Plants producing high levels of cannabidiol (CBD) were historically used in some hashish
preparations (Anderson 1980), and are presently in high demand on the US market as an
anti-seizure therapy (Mechoulam et al., 2002). A single locus with codominant THC or CBD
production alleles (Staginnus et al., 2014; de Meijer et al., 2003) provides straightforward control for
breeding plants with high or ~50% CBD production. Other cannabinoids such as cannabigerol (CBG) (Borrelli
et al., 2014), cannabichromene (CBC) (Izzo et al., 2012) or delta-9-tetrahydrocannabivarin (THCV)
(McPartland et al., 2015) demonstrate pharmacological promise, and can also be produced at high
levels by the plant (de Meijer et al., 2008, 2009; de Meijer and Hammond, 2005). Other Cannabis
secondary metabolites such as terpenes and flavonoids likely contribute to therapeutic or
psychoactive effects (Russo, 2011), with myrcene for example being proposed to produce sedative
effects associated with specific strains (Hazekamp and Fischedick, 2012).
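The single-locus model of THC and CBD production described above implies simple Mendelian expectations in breeding crosses. The following is a minimal sketch of those expectations, assuming two codominant alleles labeled here "T" (THC-producing) and "D" (CBD-producing); the labels and the function name are illustrative only and are not taken from the cited studies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Expected offspring genotype frequencies for a single-locus cross.

    Each parent is a two-character string of allele labels, e.g. "TD".
    Genotypes are returned with their alleles sorted, so "TD" and "DT"
    are counted as the same heterozygote ("DT").
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Crossing two heterozygotes (mixed THC/CBD producers) yields the 1:2:1
# genotype ratio expected for codominant alleles at one locus.
ratios = cross("TD", "TD")
for genotype, frequency in sorted(ratios.items()):
    print(genotype, frequency)
```

Under this model, a cross between the two homozygous classes yields only heterozygous offspring, which corresponds to the intermediate (~50% CBD) chemotype mentioned in the text.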
Plants that produce low levels of THC are herein referred to as hemp, while high THC
producing varietals used in this study are described as drug-type strains. Hemp strains typically
have a distinct set of growth characteristics, with fiber varieties reaching up to 6 meters in height
during a growing season, exhibiting reduced flower set and increased internodal spacing compared
to drug-type relatives. Despite the widespread prohibition of drug-type Cannabis cultivation from
the 1930s to present (Bonnie and Whitebread 1970), hemp cultivation and breeding continued in
parts of Europe, Canada, and China through this period, and hemp made a brief comeback in the
USA during WWII through the Hemp for Victory campaign. Studies to date have found hemp
varieties are genetically distinct from drug-type strains (van Bakel et al., 2011; Gilmore et al.,
2007), though Hillig (2005) found that southeastern Asian hemp landraces are more
closely related to Afghani drug-type strains than to European hemp strains.
Cannabis has a diploid genome (2n = 20), and an XY/XX chromosomal sex determining
system (Divashuk et al., 2014; Moliterni et al., 2004). Genome size is estimated to be 818 Mb for
female plants and 843 Mb for male plants (Sakamoto et al. 1998). Currently a draft genome
consisting of 60,029 scaffolds is available for a Purple Kush drug-type from the National Center
for Biotechnology Information (NCBI accessions: JH226140-JH286168). Additional whole
genome data is available from NCBI for the Finola (SRP008728) and USO31 (SRP008730) hemp
strains. Presently Cannabis is the only multi-billion dollar crop without a genetic linkage or
physical genome map available (Semagn et al., 2006).
Previous studies of Cannabis genetic diversity have used either many samples with few
molecular markers (Hillig, 2005; Gilmore et al., 2007) or genome-wide data for relatively
few sample types (van Bakel et al., 2011). In this study we present a genome-wide re-mapping
analysis for 43 Cannabis individuals sampled from diverse hemp and drug-type plants, the
largest to date. The aim of this study was to assess the genomic diversity and phylogenetic
relationships among Cannabis plants that have distinct phenotypes, and that were described a
priori by plant breeders as various landrace indica, sativa, hemp and drug-types, as well as
commercially available hemp and drug-types with unclear pedigrees. These data and analyses
will pave the way for the development of modernized breeding and quality assurance tools
(Collard and Mackill, 2008), which are lacking in the nascent US Cannabis breeding and
production industry. Cannabis genomics also offers an ideal system for understanding plant
domestication and hybridization events (Baute et al., 2015), as well as the evolution of separate
sexes (Divashuk et al., 2014).
4.2 Materials and Methods
Forty plant tissue samples were collected from a variety of breeding and production facilities in
Colorado. The strain names, descriptions and putative origins used in this paper were recorded
from the providers of the source material (Table 4.1). DNA extractions were performed using
the Qiagen DNeasy Plant Mini Kit (Valencia, CA) according to the manufacturer’s protocol.
Whole genome shotgun sequencing was performed using standard Illumina multiplexed library
preparation protocols for a 2 x 125 HiSeq 2500 lane and 2 x 150 NextSeq 500 run. Sequencing
efforts were targeted to approximately 4-6x coverage of the Cannabis genome per sample.
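The coverage target above follows from the usual relation coverage = sequenced bases / genome size. The sketch below works through that arithmetic, assuming the 818 Mb female genome size estimate cited earlier and 5x as the midpoint of the stated 4-6x range; the function name is hypothetical, and losses from trimming, duplicates and organellar reads are ignored.

```python
def read_pairs_for_coverage(genome_size_bp, target_coverage, read_length_bp):
    """Number of paired-end read pairs needed to reach a target coverage.

    coverage = (read pairs * 2 * read length) / genome size, so the number
    of pairs is the required bases divided by the bases per pair, rounded up.
    """
    bases_needed = genome_size_bp * target_coverage
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division avoids floating-point rounding.
    return -(-bases_needed // bases_per_pair)


GENOME_SIZE_BP = 818_000_000  # assumed: female genome size estimate

for read_length in (125, 150):  # 2 x 125 HiSeq and 2 x 150 NextSeq reads
    pairs = read_pairs_for_coverage(GENOME_SIZE_BP, 5, read_length)
    print(f"2 x {read_length}: {pairs:,} read pairs for ~5x coverage")
```

By this estimate each sample needs on the order of 13-17 million read pairs, depending on read length, before any quality filtering.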
Trimmomatic (Bolger et al., 2014) was used to trim any remaining adaptor sequence
from raw fastq reads and remove sequences with low quality regions or ambiguous base calls
using the following settings: ILLUMINACLIP:IlluminaAdapters:2:20:10 LEADING:20
TRAILING:20 SLIDINGWINDOW:5:15 MINLEN:100. Trimmed raw reads from our 40 new
samples, plus raw reads from the three publicly available Cannabis genomes (Purple Kush,
Finola and USO31) were then aligned to the only publicly available draft genome of Purple Kush
(JH226140-JH286168) using the Burrows-Wheeler Alignment tool (BWA mem) (Li and Durbin,
2009). Chloroplast and mitochondrial regions were excluded. Individual alignments were then
collated and used to produce a single variant call format table (.vcf) for all samples using
samtools mpileup -uf | bcftools view -bvcg (Li et al., 2009). This table was then filtered to
include only high quality informative SNP sites from the single copy portion of the genome
(Supplementary Figure 5) using bash and awk routines on the following standard .vcf fields: QC
(> 200), GQ (< 30), DP (75 - 300) and AF1 (0.1 - 0.9). The remaining SNP sites were then
reformatted and downsampled for further analysis using bash and awk scripts.
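The site-level filtering step can be illustrated with a short sketch. The original filtering was done with bash and awk; the Python version below is a substitute written for illustration, not the scripts used in this study. It assumes that "QC" refers to the VCF QUAL column, that DP and AF1 are read from the INFO column, and that the listed ranges are inclusion criteria. The per-sample GQ field is left out because the direction of that threshold is not certain from the text. All function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=120;AF1=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def passes_filters(vcf_line, min_qual=200.0, dp_range=(75, 300),
                   af1_range=(0.1, 0.9)):
    """Return True if a tab-delimited VCF data line meets all thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8 or fields[5] == ".":
        return False  # malformed line or missing QUAL
    info = parse_info(fields[7])
    if "DP" not in info or "AF1" not in info:
        return False
    qual = float(fields[5])
    depth = int(info["DP"])
    af1 = float(info["AF1"])
    return (qual > min_qual
            and dp_range[0] <= depth <= dp_range[1]
            and af1_range[0] <= af1 <= af1_range[1])


def filter_vcf(lines):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line):
            yield line
```

Bounding total depth on both sides is a common way to approximate the single copy portion of a genome, since collapsed repeats tend to show excess depth; the specific bounds here are simply those stated in the text.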
To visualize genetic relationships, divergence, and ancestral hybridization among
lineages, a phylogenetic neighbor network was inferred using 100,000 SNPs from all 43
available datasets with simple p-distance calculations (Huson and Bryant, 2006). The Structure
2.3.4 admixture model (Pritchard et al., 2000) was used to calculate the likelihood of various
numbers of populations (K=1-10) given our data, using 5000 MCMC replications and a burn-in of
500 per run. The Evanno method was then used to determine the most probable value of K
(Evanno et al., 2005).
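The Evanno method selects K from the rate of change in the log likelihood between successive K values, ΔK = |L(K+1) - 2L(K) + L(K-1)| / s[L(K)], where s is the standard deviation across replicate runs. The sketch below follows the common implementation that applies this formula to the mean log likelihood at each K. It assumes several replicate runs per K; the replicate values shown are toy numbers chosen for illustration and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Compute Evanno's delta K from replicate Structure runs.

    log_likelihoods maps each K to a list of Ln P(D) values, one per
    replicate run. Delta K is defined only for K values that have both
    K-1 and K+1 available, so the smallest and largest K are skipped.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    delta_k = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(log_likelihoods[k])
        if spread == 0:
            continue  # delta K is undefined without variation among runs
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta_k[k] = second_difference / spread
    return delta_k


# Toy replicate log likelihoods (illustrative values only).
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -902.0, -898.0],
    3: [-700.0, -702.0, -698.0],
    4: [-690.0, -694.0, -686.0],
    5: [-685.0, -690.0, -680.0],
}
delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)
print(best_k, delta)
```

Because ΔK needs both neighbours, a scan over K=1-10 can only evaluate K=2-9, and K=1 can never be selected by this criterion.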
Table 4.1 Sample details.
Cultivar name             Sex      Reproductive  Supplier classification  Heterozygous SNPs
Afghan Kush 1             M        D             indica                   1425638
Afghan Kush 2             M        D             indica                   1251378
Afghan Kush 3             M        D             indica                   1326196
Afghan Kush 4             F        D             indica                   1200949
Afghan Kush 5             F        D             indica                   1365756
Afghan Kush 6             F        D             indica                   1349126
Carmagnola 1              F        D             hemp                     1649463
Carmagnola 2              F        D             hemp                     1782259
Carmagnola 3              F        D             hemp                     1954731
Carmagnola 4              M        D             hemp                     2111915
Carmagnola 5              M        D             hemp                     2113380
Carmagnola 6              M        D             hemp                     2145951
Dagestani hemp            F        M             hemp                     1387341
Chem91                    F        D             hybrid                   1423733
Original Sour Diesel      F        D             sativa                   1441569
Durban Poison 1           F        D             sativa                   1529428
Hawaiian                  F        D             sativa                   1737720
Lebanese                  F        D             unknown                  1489262
Tora Bora                 F        D             indica                   1490365
G13                       F        D             indica                   1728251
Harlequin¹                F        D             sativa                   1899375
Cannatonic¹               F        D             hybrid                   1707990
Auto AK47*                F        D             hybrid                   1825464
Low Ryder*                F        D             hybrid                   1601477
Pre-98 Bubba Kush         F        D             indica                   1619790
Maui Waui                 F        D             sativa                   1490231
Super Lemon Haze          F        D             sativa                   1286703
Hindu Kush                F        D             indica                   1245063
Somali Taxi Cab           F        D             sativa                   1877605
Durban Poison 2           F        D             sativa                   1889544
Rocky Mountain Blueberry  Unknown  D             hybrid                   1783563
R4 (Charlotte's Web)¹     F        D             indica                   1474524
Kunduz                    M        D             indica                   1651702
Kansas feral              Unknown  D             hemp                     1852936
Kompolti 1                F        D             hemp                     1528469
Kompolti 2                M        D             hemp                     1764527
Euro Seed Oil             F        D/M           hemp                     1854976
Chinese hemp              F        D             hemp                     1789975
Colombia Rio Negro        F        D             sativa                   1660096
Mexican                   F        D             sativa                   1621453
Finola                    F        D             hemp                     1350964
Purple Kush               F        D             indica                   1683486
USO31                     F        D             hemp                     1701596
Column two, sex: M = male, F = female. Column three, reproductive type: D = dioecious, M = monoecious. * = autoflowering; ¹ = CBD producer.
4.3 Results and Discussion
Summary information and raw sequencing libraries are publicly available from the NCBI short
read archive (accessions pending). Alignments to the Purple Kush reference scaffolds were
quality filtered to include 10,392,741 SNPs from the single copy portion of the nuclear genome.
Further filtering for sites that had at minimum 10% and at maximum 90% variant frequency left
8,538,516 SNPs for downstream analysis.
Phylogenetic relationships are commonly represented as bifurcating trees that explicitly
model mutation driven divergence and speciation events. Whole genome wide sequence datasets
include information about recombination, hybridization, and gene loss or genesis events, some of
which may be incongruent with one another (Huson and Bryant, 2006). Phylogenetic networks
can represent incompatible phylogenetic signals across large character matrices, and thus offer a
more appropriate model for the variety of events that drive genomic evolution. Our phylogenetic
neighbor network of 43 Cannabis nuclear genomes (Figure 4.1) shows that all European hemp
strains form a distinct clade, separated from all drug-type strains by a consistent band of parallel
branches. Clustering of various wide-leaf blade drug-type strains (Figure 4.2) around our
population of Afghan Kush landrace samples provides further support for the designation of C.
indica (Hillig, 2005), although narrow-leaf blade drug-type strains appear to form several clades,
perhaps influenced by the inclusion of hybrid strains in the analysis.
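The p-distance underlying the neighbor network is simply the proportion of compared sites at which two samples differ. A minimal sketch follows, assuming each sample is represented as a string of base calls over the same ordered SNP sites, with "N" marking missing data; how heterozygous calls were encoded is not specified here, so characters are compared only for equality. The function names are hypothetical, and the network construction itself is not shown.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, missing="N"):
    """Proportion of sites that differ, skipping sites missing in either."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same sites")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no sites with data in both sequences")
    return differing / compared


def distance_matrix(samples):
    """Pairwise p-distances for a dict mapping sample name to sequence."""
    matrix = {name: {name: 0.0} for name in samples}
    for name_a, name_b in combinations(samples, 2):
        d = p_distance(samples[name_a], samples[name_b])
        matrix[name_a][name_b] = d
        matrix[name_b][name_a] = d
    return matrix
```

A symmetric matrix of this kind is the typical input to distance-based network methods; because p-distance applies no substitution model, it is best suited to closely related samples such as those compared here.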
Figure 4.1 Phylogenetic neighbor network of 100,000 SNPs from the single-copy portion of the
Cannabis genome.
Figure 4.2 Example of narrow-leaf blade type (left) and wide-leaf blade type (right) strains.
To determine the statistical likelihood of various population scenarios for our dataset, we applied
the admixture model-based Bayesian clustering method of Structure (Pritchard et al., 2000) to
100,000 randomly sampled SNP loci. Even though the Structure admixture model assumes an
absence of linkage disequilibrium within populations, genome wide datasets that include closely
linked markers can be appropriate for this approach when the signal from independent sites
outweighs those of the linked sites (Conrad et al., 2006). Thus by downsampling from the more
than 10 million SNPs identified through our initial alignments, we reduced the number of closely
linked sites, making the data more appropriate given the Structure model assumptions (in
addition to constraining the data to a more computationally practical size). Our most likely
population structure analysis (K = 3, mean Ln likelihood = -2327993.3; Figure 4.3) again shows
clear separation between hemp and drug-type strains, except for a wide-leaf blade type Chinese
hemp sample that does not cluster strongly with any other hemp or drug-type strain. Hillig’s
(2005) analysis of allozymes concluded that Asian hemp strains were more similar to Asian C.
indica drug-type strains than they were to European hemp, and while we did not find support for
this conclusion entirely, it is apparent that Asian and European hemp strains are highly
dissimilar, possibly reflecting independent domestication events. We again also found strong
support for the putative C. indica clade anchored by the Afghan Kush samples, but now also
resolved a third clade of drug-type strains comprised of narrow-leaf blade varieties Super Lemon
Haze, Maui Waui, Hawaiian, and Durban Poison. It remains unclear if both this narrow-leaf
blade drug-type clade, and the European hemp clade, fit together within the C. sativa concept,
given the distance between these clades. Proportions of hybrid drug-type genomes were also
inferred, showing a range of ratios across strains.
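The downsampling step used to thin closely linked sites before the Structure analysis can be sketched as a uniform random draw of site indices. This is an illustration in Python rather than the bash and awk scripts used in this study; the function name and the fixed seed are assumptions added for reproducibility.

```python
import random


def downsample_sites(n_sites, n_keep, seed=0):
    """Pick n_keep distinct site indices uniformly at random from n_sites.

    Indices are returned in ascending order so the retained SNPs keep
    their original order along the scaffolds.
    """
    if n_keep > n_sites:
        raise ValueError("cannot keep more sites than are available")
    rng = random.Random(seed)  # local generator, leaves global state alone
    return sorted(rng.sample(range(n_sites), n_keep))


# Draw 100,000 of the 8,538,516 frequency-filtered SNP sites.
kept = downsample_sites(8_538_516, 100_000, seed=1)
print(len(kept), kept[:5])
```

Retaining roughly 1 in 85 sites increases the average spacing between markers, which reduces, but does not eliminate, linkage among the sampled loci.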
Figure 4.3 Structure plot for K = 3. Only one or two individuals from each strain were used in
this analysis in order to avoid biased cluster inferences (33 of 43 samples).
Much more Cannabis diversity likely remains to be sampled. Notably absent from our sample set
are putative C. ruderalis samples, although Finola is an auto-flowering hemp strain, and Low
Ryder and Auto AK-47 are auto-flowering drug-type strains, all with purported C. ruderalis
ancestry. ‘Indica’ and ‘sativa’ are commonly used terms ascribed to plants that have certain
characteristics, often leaf morphology and perceived effects of consuming the plant (Habib et
al. 2013). However, these names are also rooted in taxonomic traditions dating to
Linnaeus who first classified the genus as monotypic (Cannabis sativa L., Linnaeus 1753), and
Lamarck who subsequently designated Cannabis indica to accommodate the short stature potent
drug-type plants from the Indian subcontinent (Lamarck 1783). Final resolution of Cannabis
taxonomy will require complete assessment of standing global genetic diversity and
experimental evaluation of reproductive compatibility across all major genetic groups (Rieseberg
and Willis, 2007), in conjunction with morphological circumscriptions.
One major complication obscuring the understanding of Cannabis diversity and history is
the lack of a known native range or ranges of Cannabis spp. In addition to divergent breeding
efforts and human-vectored transport of seeds, the tendency of Cannabis to escape into feral
populations wherever human cultivation occurs (Small et al., 2003; Haney and Kutscheid, 1975),
coupled with wind pollination biology and no known post-zygotic reproductive barriers, makes the
existence of pure wild native Cannabis populations unlikely, or at least difficult to confirm. The
weedy tendencies of Cannabis are exemplified by the mid-western US populations of feral hemp
that flourish despite the eradication efforts of the Drug Enforcement Administration, which have for
decades totaled millions of plants removed per year (http://www.dea.gov/ops/cannabis.shtml).
We were unable to access putative wild or feral Eurasian landrace material for this study, but a
comprehensive evaluation of Cannabis diversity that includes feral and wild Eurasian
populations is required to ascertain if levels of divergence and gene flow are consistent with one
or more origins of domestication (Hillig, 2005). Even if these extant populations are highly
admixed with modern cultivars, their study promises to offer insight into Cannabis ecology and
evolution, given how different the selective regime of the feral setting is compared to that of
agricultural fields (Kane and Rieseberg, 2008).
Cannabis genomics offers a window into the past, but also a road forward.
Although historical and clandestine breeding efforts have been clearly successful in many
regards (Swift et al., 2013; Mehmedic et al., 2010), Cannabis lags decades behind other major
plant crop species in other respects (Collard and Mackill, 2008). Developing stable Cannabis
lines capable of producing the full range of potentially therapeutic non-psychoactive
cannabinoids is important for the research community, which currently lacks access to diverse
and high-quality material in the US (Nutt et al., 2013). In addition to breeding resource
development, Cannabis genome science offers an attractive study system for understanding the
evolution of separate sexes in plants (Charlesworth, 2006; Barrett, 2002). Only 6-7% of
angiosperms are dioecious like Cannabis (Renner and Ricklefs, 1995), and the existence of
monoecious populations (de Meijer et al., 2003) that can be intercrossed with dioecious
individuals presents an excellent experimental system for studying the genetics and evolution of
separate sexes.
In this paper we extended the initial Cannabis genome study (van Bakel et al., 2011), by
re-mapping whole genome sequence reads to the existing Purple Kush draft scaffolds, to
understand diversity and evolutionary relationships among the major lineages. Analyses of a
subset of the 10.3 million SNPs from the single copy portion of the genome lend support for the
existence of at least three major Cannabis lineages (Figure 4.3). Deep and consistent separation
between European hemp and drug-type strains was found across the nuclear genome (Figure
4.1), while moderate evidence was found for at least two drug-type lineages, and numerous
hybrids (Figure 4.3). Overall, we hope the publicly available data and analyses from this study
will help pave the way for continued exploration of the origins and history of this controversial
plant, and to unlock the full agricultural and therapeutic potential of Cannabis.
CHAPTER 5
SUMMARY
In Chapter 2 I used culture independent techniques to characterize a low diversity microbial
ecosystem, including Bacteria, Eukaryotes and Archaea, from volcanic mineral soils with
extremely low organic carbon levels found at > 6,000 meters elevation above the Atacama
Desert. In the absence of plant life or microbial phototrophs this led me to propose that trace gas
oxidation may supply energy for microbial activity, although precipitation derived dissolved
organic carbon and atmospheric dust are also likely sources of nutrients. In Chapter 3 I used
comparative community metagenomics, and genomics of the most abundant bacterial community
member, to explore if trace gas metabolism, or other inferred metabolic traits, might explain the
uneven and low diversity community structure found in these mineral soils. Owing to the
culture-free nature of these analyses, and the limited geographic extent that my samples covered,
I was ultimately unable to reject or confirm the trace gases hypothesis or the dissolved organic
carbon and dusts hypotheses. Nonetheless, Chapter 3 provided some evidence for the use of a
variety of trace gases by the most abundant bacterial community member that were not
considered in Chapter 2 (H2, CO and several organic C1 compounds). I also found that
heterotrophic metabolism of organic carbon may play a lesser role for this ecosystem compared
to other desert ecosystems (Figure 3.2). In retrospect I have concluded that an experimental
evaluation of metabolic uptake of substrates either in situ, by sample microcosms and/or by
cultured isolates would have provided a more direct and compelling test of putative energy
sources.
Even though this type of ecologically contextualized microbial physiology is challenging
to execute and is not free of potential confounding issues (Theisen and Murrell, 2005), it is in my
opinion an important method for answering questions about specific microbial metabolic
functions. This is especially true since microbes transported as inactive forms over long
distances (Stres et al., 2013) can influence community structure (Nemergut et al., 2013) and
functional gene presence. That most microbes are un-culturable is often cited as a rationale for
using sequencing based culture-independent approaches (Pace, 1997), and while sequencing
methods are by far more high-throughput and have many advantages, understanding why most
environmental microbes resist standard culturing efforts remains an intriguing question. Indeed,
it appears standard culturing techniques are simply the wrong methods for the majority of
bacteria (Tanaka et al., 2014). Furthermore, standard plate or liquid dilution and isolation
methods are lethal to many bacteria because they require metabolites produced by other
community members (Ling et al. 2015). Unraveling the evolutionary co-dependencies within
microbial communities (Morris et al., 2012) is a key and often overlooked challenge that
remains for a more complete understanding of microbial diversity and function.
Figure 5.1 Examples of isolates from the Llullaillaco Volcano samples used in Chapters 2 and 3.
Left: Blastococcus sp. nov., right: Rhodanobacter sp. nov. Putative taxonomy is based on
full-length 16S rRNA gene homology. The Blastococcus sp. nov. colony measured approximately 3
mm in diameter after 11 months of incubation at room temperature. The Rhodanobacter sp. nov.
measured approximately 2 mm in diameter after 6 months of incubation at room temperature.
After several years of experiments and dozens of isolates, I failed to recover any of the top five
most abundant lineages from Figure 3.1. Whether this is because the Pseudonocardia spp.
could not grow in isolation, or the laboratory conditions were not conducive to growth, or if they
were non-viable remains unknown.
The Cannabis genome project is the start of a lifelong goal to understand plant traits and
breed novel varieties. In Chapter 4 I analyzed genomic diversity and population structure of 43
Cannabis accessions using a whole genome re-mapping analysis. These efforts revealed evidence
for three major clusters of genotypic diversity (Figure 4.3), and extensive hybridization of many
modern drug-type strains. This study is limited by the lack of samples from Eurasian wild or feral
populations, including northern putative C. ruderalis samples, due to lack of access to this
material in Colorado, but nonetheless provides the largest comparative analysis of Cannabis
genomic diversity to date. These data and analysis can further serve as a foundation for
understanding the origins of Cannabis domestication events with the additional sequencing of
Eurasian wild plants, and the full range of cultivar diversity (www.leafly.com lists 1,224
drug-type strains).
Assaying and describing Cannabis diversity is only the first step, however. In addition to
answering basic questions about the evolution of Cannabis lineages and dioecy, the broader
applied goal of my work is to develop genomic tools for accelerating the throughput of breeding
projects (Collard and Mackill, 2008; Pacifico et al., 2006). This will be the work of many labs
for years to come and will include developing genetic linkage maps and trait based associations
(Korte and Farlow, 2013) before marker assisted selection efforts can even begin. Mapping
populations and trait association datasets are currently being developed through various
collaborations, but are for the time being limited by the absence of large scale pollen-controlled
breeding facilities in Colorado.
6. Bibliography
Anderson, L. C. 1974. A study of systematic wood anatomy in Cannabis. Harvard University
Botanical Museum Leaflets 24, 29–36.
van Bakel, H., Stout, J. M., Cote, A. G., Tallon, C. M., Sharpe, A. G., Hughes, T. R., and Page, J.
E. (2011). The draft genome and transcriptome of Cannabis sativa. Genome Biol. 12, R102.
doi:10.1186/gb-2011-12-10-r102.
Barrett, S. C. H. (2002). The evolution of plant sexual diversity. Nat. Rev. Genet. 3, 274–284.
doi:10.1038/nrg776.
Baute, G. J., Kane, N. C., Grassa, C. J., Lai, Z., and Rieseberg, L. H. (2015). Genome scans
reveal candidate domestication and improvement genes in cultivated sunflower, as well as
post-domestication introgression with wild relatives. New Phytologist. doi:10.1111/nph.13255.
Beutler, J. A. and A. H. Dermarderosian. 1978. Chemotaxonomy of Cannabis I. Crossbreeding
between Cannabis sativa and C. ruderalis, with analysis of cannabinoid content. Econ.
Bot. 32, 387–394.
Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible
trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.
doi:10.1093/bioinformatics/btu170.
Borrelli, F., Pagano, E., Romano, B., Panzera, S., Maiello, F., Coppola, D., Petrocellis, L. De,
Buono, L., Orlando, P., and Izzo, A. A. (2014). Colon carcinogenesis is inhibited by the
TRPM8 antagonist cannabigerol, a Cannabis-derived non-psychotropic cannabinoid.
Carcinogenesis 35, 2787–2797. doi:10.1093/carcin/bgu205.
Bonnie, R. J. and Whitebread, C. H. (1970). The forbidden fruit and the tree of knowledge: an
inquiry into the legal history of American marijuana prohibition. Virginia Law Rev. 56, 971–1203.
Casano, S., G. Grassi, V. Martini, and M. Michelozzi. 2011. Variations in Terpene Profiles of
Different Strains of Cannabis sativa L. Xxviii International Horticultural Congress on
Science and Horticulture for People (Ihc2010): A New Look at Medicinal and Aromatic
Plants Seminar 925:115-121.
Charlesworth, D. (2006). Evolution of Plant Breeding Systems. Curr. Biol. 16, 726–735.
doi:10.1016/j.cub.2006.07.068.
Collard, B. C. Y., and Mackill, D. J. (2008). Marker-assisted selection: an approach for precision
plant breeding in the twenty-first century. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 363,
557–572. doi:10.1098/rstb.2007.2170.
Conrad, D. F., Jakobsson, M., Coop, G., Wen, X., Wall, J. D., Rosenberg, N. A., and Pritchard, J.
K. (2006). A worldwide survey of haplotype variation and linkage disequilibrium in the
human genome. Nat. Genet. 38, 1251–1260. doi:10.1038/ng1911.
Divashuk, M. G., Alexandrov, O. S., Razumova, O. V., Kirov, I. V., and Karlov, G. I. (2014).
Molecular cytogenetic characterization of the dioecious Cannabis sativa with an XY
chromosome sex determination system. PLoS One 9, e85118.
doi:10.1371/journal.pone.0085118.
Dovichi, N. J., and Zhang, J. (2000). How capillary electrophoresis sequenced the human
genome. Angew. Chemie - Int. Ed. 39, 4463–4468. doi:10.1002/1521-3773(20001215)39:24<4463::AID-ANIE4463>3.0.CO;2-8.
ElSohly, M. A., and Slade, D. (2005). Chemical constituents of marijuana: The complex mixture
of natural cannabinoids. Life Sci. 78, 539–548. doi:10.1016/j.lfs.2005.09.011.
Evanno, G., Regnaut, S., and Goudet, J. (2005). Detecting the number of clusters of individuals
using the software STRUCTURE: A simulation study. Mol. Ecol. 14, 2611–2620.
doi:10.1111/j.1365-294X.2005.02553.x.
Flaxman, S. M., Wacholder, A. C., Feder, J. L., and Nosil, P. (2014). Theoretical models of the
influence of genomic architecture on the dynamics of speciation. Mol. Ecol. 23, 4074–4088.
doi:10.1111/mec.12750.
Gilmore, S., Peakall, R., and Robertson, J. (2007). Organelle DNA haplotypes reflect crop-use
characteristics and geographic origins of Cannabis sativa. Forensic Sci. Int. 172, 179–190.
doi:10.1016/j.forsciint.2006.10.025.
Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., and Elhaik, E. (2013). On the
immortality of television sets: “Function” in the human genome according to the evolution-free
gospel of ENCODE. Genome Biol. Evol. 5, 578–590. doi:10.1093/gbe/evt028.
Habib et al. 2013: http://liq.wa.gov/publications/Drug-type/BOTEC%20reports/1c-Testing-forPsychoactive-Agents-Final.pdf
Hanage, W. P. (2014). Microbiome science needs a healthy dose of scepticism. Nature 512, 247–
248. doi:10.1038/512247a.
Haney, A. and Kutscheid, B. B. (1975) An ecological study of naturalized hemp (Cannabis
sativa L.) in east-central Illinois. Am. Midl. Nat. 93, 1–24.
Hillig, K. W. (2004). A chemotaxonomic analysis of terpenoid variation in Cannabis. Biochem.
Syst. Ecol. 32, 875–891. doi:10.1016/j.bse.2004.04.004.
Hillig, K. W. (2005). Genetic evidence for speciation in Cannabis (Cannabaceae). Genet.
Resour. Crop Evol. 52, 161–180. doi:10.1007/s10722-003-4452-y.
Hillig, K. W., and Mahlberg, P. G. (2004). A chemotaxonomic analysis of cannabinoid variation
in Cannabis (Cannabaceae). Am. J. Bot. 91, 966–975. doi:10.3732/ajb.91.6.966.
Huson, D. H., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary
studies. Mol. Biol. Evol. 23, 254–267. doi:10.1093/molbev/msj030.
Itoh, T., Yamanoi, K., Kudo, T., Ohkuma, M., and Takashina, T. (2011). Aciditerrimonas
ferrireducens gen. nov., sp. nov., an iron-reducing thermoacidophilic actinobacterium
isolated from a solfataric field. Int. J. Syst. Evol. Microbiol. 61, 1281–5.
doi:10.1099/ijs.0.023044-0.
Izzo, A. A., Capasso, R., Aviello, G., Borrelli, F., Romano, B., Piscitelli, F., Gallo, L., Capasso,
F., Orlando, P., and Di Marzo, V. (2012). Inhibitory effect of cannabichromene, a major
non-psychotropic cannabinoid extracted from Cannabis sativa, on inflammation-induced
hypermotility in mice. Br. J. Pharmacol. 166, 1444–1460. doi:10.1111/j.1476-5381.2012.01879.x.
Kane, N. C., and Rieseberg, L. H. (2008). Genetics and evolution of weedy Helianthus annuus
populations: Adaptation of an agricultural weed. Mol. Ecol. 17, 384–394.
doi:10.1111/j.1365-294X.2007.03467.x.
Korte, A., and Farlow, A. (2013). The advantages and limitations of trait analysis with GWAS: a
review. Plant Methods 9, 29. doi:10.1186/1746-4811-9-29.
Lamarck, J. B. (1783). Encyclopédie méthodique botanique, Chez Panckoucke.
Leinonen, T., McCairns, R. J. S., O’Hara, R. B., and Merilä, J. (2013). Q(ST)-F(ST)
comparisons: evolutionary and ecological insights from genomic heterogeneity. Nat. Rev.
Genet. 14, 179–90. doi:10.1038/nrg3395.
Li, H. L. (1973). An archaeological and historical account of Cannabis in China. Econ. Bot.
1973, 28, 437-444.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25, 1754–1760. doi:10.1093/bioinformatics/btp324.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G.,
and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools.
Bioinformatics 25, 2078–2079. doi:10.1093/bioinformatics/btp352.
Ling, L. L., Schneider, T., Peoples, A. J., Spoering, A. L., Engels, I., Conlon, B. P., Mueller, A.,
Hughes, D. E., Epstein, S., Jones, M., et al. (2015). A new antibiotic kills pathogens without
detectable resistance. Nature 517, 455-461. doi:10.1038/nature14098.
Linnaeus, C. (1753). Species plantarum. Laurentii Salvii, Holmiae, p. 1200.
Di Marzo, V., Bifulco, M., and De Petrocellis, L. (2004). The endocannabinoid system and its
therapeutic exploitation. Nat. Rev. Drug Discov. 3, 771–784. doi:10.1038/nrd1495.
Matsuda, L. A., Lolait, S. J., Brownstein, M. J., Young, A. C., and Bonner, T. I. (1990). Structure
of a cannabinoid receptor and functional expression of the cloned cDNA. Nature 346, 561–
564. doi:10.1038/346561a0.
McPartland, J. M., Duncan, M., Di Marzo, V., and Pertwee, R. G. (2015). Are cannabidiol and
Δ9-tetrahydrocannabivarin negative modulators of the endocannabinoid system? A
systematic review. Br. J. Pharmacol. 172, 737–753. doi:10.1111/bph.12944.
McPartland, J. M., Matias, I., Di Marzo, V., and Glass, M. (2006). Evolutionary origins of the
endocannabinoid system. Gene 370, 64–74. doi:10.1016/j.gene.2005.11.004.
Mechoulam, R., Gaoni, Y., (1967) Recent advances in the chemistry of hashish. Fortschritte der
Chemie Organischer Naturstoffe. 25, 175– 213
Mechoulam, R., Parker, L. A. and Gallily, R. (2002). Cannabidiol: an overview of some
pharmacological aspects. J. Clin. Pharmacol. 42, 11S–19S.
doi:10.1177/0091270002238789.
Mehmedic, Z., Chandra, S., Slade, D., Denham, H., Foster, S., Patel, A. S., Ross, S. A., Khan, I.
A., and ElSohly, M. A. (2010). Potency trends of Δ9-THC and other cannabinoids in
confiscated Cannabis preparations from 1993 to 2008. J. Forensic Sci. 55, 1209–1217.
doi:10.1111/j.1556-4029.2010.01441.x.
de Meijer, E. P. M., Bagatta, M., Carboni, A., Crucitti, P., Moliterni, V. M. C., Ranalli, P.,
and Mandolino, G. (2003). The inheritance of chemical phenotype in Cannabis sativa L.
Genetics 163, 335–346.
de Meijer, E. P. M., and Hammond, K. M. (2005). The inheritance of chemical phenotype in
Cannabis sativa L. (II): Cannabigerol predominant plants. Euphytica 145, 189–198.
doi:10.1007/s10681-005-1164-8.
de Meijer, E. P. M., Hammond, K. M., and Micheler, M. (2008). The inheritance of chemical
phenotype in Cannabis sativa L. (III): variation in cannabichromene proportion. Euphytica
165, 293–311. doi:10.1007/s10681-008-9787-1.
de Meijer, E. P. M., Hammond, K. M., and Sutton, A. (2009). The inheritance of chemical
phenotype in Cannabis sativa L. (IV): cannabinoid-free plants. Euphytica 168, 95–112.
doi:10.1007/s10681-009-9894-7.
Moliterni, V. M. C., Cattivelli, L., Ranalli, P., and Mandolino, G. (2004). The sexual
differentiation of Cannabis sativa L.: A morphological and molecular study. Euphytica 140,
95–106. doi:10.1007/s10681-004-4758-7.
Morris, J. J., Lenski, R. E., and Zinser, E. R. (2012). The black queen hypothesis:
evolution of dependencies through adaptive gene loss. mBio 3, e00036-12.
doi:10.1128/mBio.00036-12.
Nemergut, D. R., Schmidt, S. K., Fukami, T., O’Neill, S. P., Bilinski, T. M., Stanish, L. F.,
Knelman, J. E., Darcy, J. L., Lynch, R. C., Wickey, P., et al. (2013). Patterns and Processes
of Microbial Community Assembly. Microbiol. Mol. Biol. Rev. 77, 342–356.
doi:10.1128/MMBR.00051-12.
Nutt, D. J., King, L. A., and Nichols, D. E. (2013). Effects of Schedule I drug laws on
neuroscience research and treatment innovation. Nat. Rev. Neurosci. 14, 577–585.
doi:10.1038/nrn3530.
Pace, N. R. (1997). A molecular view of microbial diversity and the biosphere. Science 276,
734–40.
Pacifico, D., Miselli, F., Micheler, M., Carboni, A., Ranalli, P., and Mandolino, G. (2006).
Genetics and Marker-assisted Selection of the Chemotype in Cannabis sativa L. Mol.
Breed. 17, 257–268. doi:10.1007/s11032-005-5681-x.
Poklis, J. L., Thompson, C. C., Long, K. A., Lichtman, A. H., and Poklis, A. (2010). Disposition
of cannabichromene, cannabidiol, and Δ9-tetrahydrocannabinol and its metabolites in mouse
brain following marijuana inhalation determined by high-performance liquid
chromatography-tandem mass spectrometry. J. Anal. Toxicol. 34, 516–20.
Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). Inference of population structure using
multilocus genotype data. Genetics 155, 945–959.
Radwan, M. M., Ross, S. A., Slade, D., Ahmed, S. A., Zulfiqar, F., and ElSohly, M. A. (2008).
Isolation and characterization of new cannabis constituents from a high potency variety.
Planta Med. 74, 267–272. doi:10.1055/s-2008-1034311.
Renner, S. S., and Ricklefs, R. E. (1995). Dioecy and its correlates in the flowering plants. Am. J.
Bot. 82, 596. doi:10.2307/2445418.
Rieseberg, L. H., and Willis, J. H. (2007). Plant speciation. Science 317, 910–914.
doi:10.1126/science.1137729.
Russo, E. B. (2007). History of Cannabis and its preparations in saga, science, and sobriquet.
ChemInform 38. doi:10.1002/chin.200747224.
Russo, E. B. (2011). Taming THC: potential Cannabis synergy and phytocannabinoid-terpenoid
entourage effects. Br. J. Pharmacol. 163, 1344–64. doi:10.1111/j.1476-5381.2011.01238.x.
Sakamoto, K., Akiyama, Y., Fukui, K., Kamada, H., and Satoh, S. (1998). Characterization;
genome sizes and morphology of sex chromosomes in hemp (Cannabis sativa L.).
Cytologia 63, 459–464.
Salzet, M., Breton, C., Bisogno, T., and Di Marzo, V. (2000). Comparative biology of the
endocannabinoid system: possible role in the immune response. Eur. J. Biochem. 267, 4917–
27. doi:ejb1550.
Sanger, F., Coulson, A. R., Friedmann, T., Air, G. M., Barrell, B. G., Brown, N. L., Fiddes, J. C.,
Hutchison, C. A., Slocombe, P. M., and Smith, M. (1978). The nucleotide sequence of
bacteriophage phiX174. J. Mol. Biol. 125, 225–246. doi:10.1016/0022-2836(78)90346-7.
Schultes, R. E., Klein, W. M., Plowman, T., and Lockwood, T. E. (1974). Cannabis: an example of
taxonomic neglect. Harvard University Botanical Museum Leaflets 23, 337–367.
Semagn, K., Bjørnstad, Å., and Ndjiondjop, M. N. (2006). Principles, requirements and
prospects of genetic mapping in plants. African J. Biotechnol. 5, 2569–2587.
Shendure, J., Mitra, R. D., Varma, C., and Church, G. M. (2004). Advanced sequencing
technologies: methods and goals. Nat. Rev. Genet. 5, 335–344. doi:10.1038/nrg1325.
Small, E., and Cronquist, A. (1976). A practical and natural taxonomy for Cannabis. Taxon 25,
405–435.
Small, E., Pocock, T., and Cavers, P. (2003). The biology of Canadian weeds. 119. Cannabis
sativa L. Can. J. Plant Sci. 83, 217–237.
Staginnus, C., Zörntlein, S., and de Meijer, E. (2014). A PCR marker linked to a THCA synthase
polymorphism is a reliable tool to discriminate potentially THC-rich plants of Cannabis
sativa L. J. Forensic Sci. 59, 919–26. doi:10.1111/1556-4029.12448.
Stres, B., Sul, W. J., Murovec, B., and Tiedje, J. M. (2013). Recently deglaciated high-altitude
soils of the Himalaya: diverse environments, heterogenous bacterial communities and
long-range dust inputs from the upper troposphere. PLoS One 8, e76440.
doi:10.1371/journal.pone.0076440.
Swift, W., Wong, A., Li, K. M., Arnold, J. C., and McGregor, I. S. (2013). Analysis of Cannabis
seizures in NSW, Australia: Cannabis potency and cannabinoid profile. PLoS One 8, 1–9.
doi:10.1371/journal.pone.0070052.
Tanaka, T., Kawasaki, K., Daimon, S., Kitagawa, W., Yamamoto, K., Tamaki, H., Tanaka, M.,
Nakatsu, C. H., and Kamagata, Y. (2014). A hidden pitfall in the preparation of agar media
undermines microorganism cultivability. Appl. Environ. Microbiol. 80, 7659–7666.
doi:10.1128/AEM.02741-14.
Theisen, A. R., and Murrell, J. C. (2005). Facultative Methanotrophs Revisited. J. Bacteriol. 187,
4303–4305. doi:10.1128/JB.187.13.4303-4305.2005.
Tomasetti, C., and Vogelstein, B. (2015). Variation in cancer risk among tissues can be
explained by the number of stem cell divisions. Science 347, 78–81.
doi:10.1126/science.1260825.
Varshney, R. K., Nayak, S. N., May, G. D., and Jackson, S. A. (2009). Next-generation
sequencing technologies and their implications for crop genetics and breeding. Trends
Biotechnol. 27, 522–530. doi:10.1016/j.tibtech.2009.05.006.
7. APPENDIX
Supplementary Figure 1. Maximum likelihood phylogeny of Acidimicrobiae OTU1% lineages.
Aciditerrimonas ferrireducens (Itoh et al., 2011) was isolated from geothermal volcanic soils and
can respire heterotrophically using a limited spectrum of sugars or reduce ferric iron under
anaerobic conditions.
Supplementary Figure 2. K-mer analyses of genomic complexity, sampling effort and identity.
A) 15-mer spectrum plot including both Llullaillaco volcano metagenome libraries (yellow and
blue) and various publicly available reference desert (black) and non-desert (grey)
metagenomes. Datasets are publicly available from MG-RAST: all datasets from (Fierer et al.,
2012) and the Luquillo rainforest metagenome (4446153.3).
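
As an illustration of what a k-mer spectrum summarizes, the following minimal Python sketch counts how many distinct k-mers occur at each multiplicity in a set of reads. It is not the pipeline used to produce the figure; the function name `kmer_spectrum` and the choice to skip k-mers containing ambiguous bases are assumptions made for this example, and, unlike most dedicated k-mer counters, it does not merge reverse-complement k-mers.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def kmer_spectrum(reads, k=15):
    """Return {multiplicity: number of distinct k-mers seen that many times}.

    Hypothetical helper for illustration only: k-mers containing any
    character other than A, C, G or T are skipped, and the two strands
    are not collapsed into canonical k-mers.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    # Histogram of the per-k-mer counts is the spectrum that gets plotted.
    return dict(Counter(counts.values()))
```

Plotting multiplicity against the number of distinct k-mers for each library gives curves of the kind compared in panel A: a library dominated by low-multiplicity k-mers is more complex or less deeply sampled than one with many high-multiplicity k-mers.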
Supplementary Figure 3. Tetramer-based emergent self-organizing map, built from the volcano
Pseudonocardia sp. contigs (large bright green squares), volcano Acidimicrobiae contigs (small
green squares), plus the Streptomyces coelicolor genome (AL645882, purple squares) as an
outgroup. The topology of this self-organizing map confirms the discriminatory power of the
assembly coverage level based bins (Figure S3), particularly for the volcano Pseudonocardia sp.
The Acidimicrobiae bin contigs appear to contain a higher degree of mixing across topological
features, likely representing both misclassifications and transposon-driven horizontal gene
transfer events.
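
The input to a tetramer-based map of this kind is a vector of tetranucleotide frequencies per contig. The Python sketch below shows one simple way such a vector could be computed; it is an assumption-laden illustration rather than the method used here. The names `TETRAMERS`, `INDEX` and `tetramer_frequencies` are hypothetical, windows containing ambiguous bases are ignored, and reverse complements are not merged.

```python
from itertools import product

# All 256 possible tetramers in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}

def tetramer_frequencies(contig):
    """Return a 256-element list of relative tetramer frequencies.

    Hypothetical helper for illustration only: windows with characters
    other than A, C, G or T are skipped; a contig with no valid window
    yields an all-zero vector.
    """
    counts = [0] * len(TETRAMERS)
    seq = contig.upper()
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]
```

Contigs with similar frequency vectors land near one another on the map, which is why contigs from one genome tend to cluster and why an outgroup genome with a different composition separates from them.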
Supplementary Figure 4. Histograms of average coverage levels of contigs from metagenome
assembly.
Supplementary Figure 5. Alignment depths of all 43 Cannabis samples for all high-quality SNP
loci.
Supplementary Table 1. Genes with Ka:Ks ratios ≥ 1 when comparing Pseudonocardia
asaccharolytica with the site 1 volcano Pseudonocardia sp.
2_polyprenylphenol_hydroxylase_and_related_flavodoxin_oxidoreductases_CDS
hypothetical_protein_CDS
D_alanine_D_alanine_ligase_(EC_6_3_2_4)_CDS
SSU_ribosomal_protein_S30P_sigma_54_modulation_protein_CDS
ATPases_involved_in_chromosome_partitioning_CDS
ferrochelatase_(EC_4_99_1_1)_CDS
Cyanate_permease_CDS
NADH_dehydrogenase_subunit_J_(EC_1_6_5_3)_CDS
Predicted_phosphohydrolases_CDS
tRNA_(Guanine37_N(1)_)_methyltransferase_(EC_2_1_1_31)_CDS
Acyl_CoA_dehydrogenases_CDS
phosphoribosyl_AMP_cyclohydrolase_(EC_3_5_4_19)_CDS
alanine_racemase_(EC_5_1_1_1)_CDS
Protein_of_unknown_function_(DUF3159)_CDS
hypothetical_protein_CDS
succinyldiaminopimelate_aminotransferase_apoenzyme_(EC_2_6_1_17)_CDS
Leucyl_aminopeptidase_CDS
RNAse_PH_(EC_2_7_7_56)_CDS
2_3_dihydro_2_3_dihydroxybenzoyl_CoA_ring_cleavage_enzyme_CDS
Transposase_DDE_domain_CDS
pantothenate_synthetase_(EC_6_3_2_1)_CDS
Uncharacterized_conserved_protein_CDS
Protein_of_unknown_function_(DUF3000)_CDS
Prephenate_dehydrogenase_CDS
Protein_of_unknown_function_(DUF2029)_CDS
Formamidopyrimidine_DNA_glycosylase_CDS
Predicted_acyltransferase_CDS
probable_S_adenosylmethionine_dependent_methyltransferase_YraL_family_CDS
phosphomethylpyrimidine_kinase_CDS
ABC_type_multidrug_transport_system_ATPase_component_CDS
ABC_type_polysaccharide_polyol_phosphate_export_systems_permease_component_CDS
hypothetical_protein_CDS
Anaerobic_dehydrogenases_typically_selenocysteine_containing_CDS
coenzyme_F420_0_gamma_glutamyl_ligase_(EC_6_3_2_31)_CDS
6_phosphogluconolactonase_(EC_3_1_1_31)_CDS
tRNA_(5_methylaminomethyl_2_thiouridylate)_methyltransferase_(EC_2_1_1_61)_CDS
serine_threonine_protein_kinase_CDS
transketolase_(EC_2_2_1_1)_CDS
Uncharacterized_conserved_protein_(some_members_contain_a_von_Willebrand_factor_type_A_(vWA)_domain)_CDS
Helicase_conserved_C_terminal_domain_CDS
3_dehydroquinate_synthase_(EC_4_2_3_4)_CDS
methionyl_tRNA_formyltransferase_(EC_2_1_2_9)_CDS
hypothetical_protein_CDS
K+_transport_systems_NAD_binding_component_CDS
selenocysteine_specific_translation_elongation_factor_SelB_CDS
DNA_segregation_ATPase_FtsK_SpoIIIE_and_related_proteins_CDS
hypothetical_protein_CDS
Protein_of_unknown_function_(DUF3263)_CDS
leucyl_tRNA_synthetase_(EC_6_1_1_4)_CDS
6_7_dimethyl_8_ribityllumazine_synthase_(EC_2_5_1_78)_CDS
succinate_dehydrogenase_subunit_A_(EC_1_3_5_1)_CDS
phosphate_ABC_transporter_ATP_binding_protein_PhoT_family_(TC_3_A_1_7_1)_CDS
Uncharacterized_conserved_protein_CDS
Uridine_kinase_CDS
hypothetical_protein_CDS
ribonuclease_Rne_Rng_family_CDS
Transposase_IS116_IS110_IS902_family_Transposase_CDS
1_deoxy_D_xylulose_5_phosphate_reductoisomerase_(EC_1_1_1_267)_CDS
selenophosphate_synthase_(EC_2_7_9_3)_CDS