De Novo Genome Assembly of the

De Novo Genome Assembly of the Economically
Important Weed Horseweed Using Integrated Data from
Multiple Sequencing Platforms1[C][W][OPEN]
Yanhui Peng, Zhao Lai, Thomas Lane, Madhugiri Nageswara-Rao, Miki Okada, Marie Jasieniuk,
Henriette O’Geen, Ryan W. Kim, R. Douglas Sammons, Loren H. Rieseberg, and C. Neal Stewart, Jr.*
Department of Plant Science, University of Tennessee, Knoxville, Tennessee 37996 (Y.P., T.L., M.N.-R., C.N.S.);
Department of Biology, Indiana University, Bloomington, Indiana 47405 (Z.L., L.H.R.); Department of Plant
Sciences (M.O., M.J.) and Genome Center (H.O., R.W.K.), University of California, Davis, California 95616;
Monsanto, Inc., St. Louis, Missouri 63130 (R.D.S.); and Department of Botany, University of British Columbia,
Vancouver, British Columbia, Canada V6T 1Z4 (L.H.R.)
Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve
resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with
the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the
genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from
multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion
sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From
116.3 Gb (approximately 3503 coverage) of data, the genome was assembled into 13,966 scaffolds with 50% of the assembly =
33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a
nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592
protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were
assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed.
Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be
useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The
genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first
published draft genome of an agricultural weed.
In the past few years, genomic approaches have revolutionized plant biology. Complete or draft genome data
are currently available for tens of plant species (http://
www.phytozome.net). From the model plant Arabidopsis (Arabidopsis thaliana; Arabidopsis Genome Initiative, 2000) to important crop plants such as rice (Oryza
sativa; Goff et al., 2002; Yu et al., 2002; International Rice
Genome Sequencing Project, 2005), soybean (Glycine max;
Schmutz et al., 2010), maize (Zea mays; Schnable et al.,
2009), and chickpea (Cicer arietinum; Varshney et al., 2013)
1
This work was supported by the U.S. Department of AgricultureNational Institute of Food and Agriculture, Monsanto, the National
Science Foundation (grant no. DBI–0820451), and the University of
California, Davis, Genome Center.
* Address correspondence to [email protected].
The author responsible for distribution of materials integral to the
findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is:
C. Neal Stewart, Jr. ([email protected]).
[C]
Some figures in this article are displayed in color online but in
black and white in the print edition.
[W]
The online version of this article contains Web-only data.
[OPEN]
Articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.114.247668
to economic woody plants such as wine grape (Vitis vinifera; Jaillon et al., 2007) and poplar (Populus trichocarpa;
Tuskan et al., 2006), powerful complete genome data sets
and tools allow the unprecedented ability to explore
the genetics and genomics of plant form and function.
Furthermore, genome sequences of additional plant
species will soon be available from in-progress largescale sequencing projects (http://www.ncbi.nlm.nih.
gov/genomes/PLANTS/PlantList.html).
One critical group of plants that has been largely
overlooked in the genomics revolution consists of economically significant agricultural weeds (Stewart, 2009;
Stewart et al., 2009). Weeds cause about $36 billion annual damage in the United States alone (Pimentel et al.,
2000). The cost is even higher if one includes weeds in
pastures, golf courses, aquatic environments, etc. Unchecked, weeds effectively outcompete crops for resources and decrease the yield of food, feed, and fiber.
Despite their profound economic significance, weedy
plant genomics data are very scarce relative to other
economically important plants.
Weeds experienced surreptitious accidental domestication (Warwick and Stewart, 2005) and adaptation to
changing agricultural environments to become among
the world’s most successful plants. They are persistent,
Plant PhysiologyÒ, November 2014, Vol. 166, pp. 1241–1254, www.plantphysiol.org Ó 2014 American Society of Plant Biologists. All Rights Reserved. 1241
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
adaptable, stress tolerant, competitive, and capable of
extreme levels of reproduction (e.g. some species can
produce one million seeds per plant). Herbicide application has been the dominant weed-control measure in
recent decades, but increasingly, the evolution of herbicide resistance is challenging this practice. More than
400 biotypes of weeds have evolved resistance to one or
more of all the major groups of herbicides (Heap, 2014),
among which resistance to glyphosate is currently of
greatest concern. Horseweed (Conyza canadensis), which
is in the Compositae (Asteraceae) family, was the first
eudicot weedy species to evolve glyphosate resistance
and is also one of the most widely distributed glyphosateresistant species in the world (VanGessel, 2001; Heap,
2014). Horseweed is a true diploid (2n = 18), with a selffertilizing mating system and the smallest genome of any
agricultural weed (335 Mb; Stewart et al., 2009). We
sought to describe the reference genome of horseweed
in order to enable genomic approaches to elucidate the
mechanism of nontarget herbicide resistance and to gain a
better understanding of the evolution and spread of herbicide resistance across agricultural landscapes. Weed
genomics is critical to understand weed biology, and
understanding weed biology is critical in weed management. To date, transcriptomes using next-generation sequencing of a few weedy species have been produced to
aid the discovery of genes that contribute to herbicide
resistance and other weediness traits, with most progress
occurring in Conyza and Amaranthus spp. and Lolium
rigidum (Lee et al., 2009; Peng et al., 2010; Riggins et al.,
2010; Lai et al., 2012; Gaines et al., 2014).
Horseweed shares many other weediness features
with the world’s most damaging weeds. These features
include a relatively short vegetative phase, indeterminate flowering, self-compatibility, high seed output,
long-distance seed dispersal, competitiveness with crop
plants, a deep root system, discontinuous dormancy,
environmental plasticity, and allelopathy (Basu et al.,
2004; Chao et al., 2005). Also, horseweed is amenable to
genetic transformation and regeneration (Halfhill et al.,
2007), which facilitates overexpression or knockdown
analysis of potential gene targets. Autogamous, a single
horseweed plant may produce well over 200,000 seeds.
The small seeds have pappi that enable wind and
water dispersal (Bhowmik and Bekech, 1993; Weaver
2001) to sometimes very long distances (DeVlaming and
Proctor, 1968; Weaver and McWilliams, 1980; Andersen,
1993; Dauer et al., 2006; Shields et al., 2006). Moreover,
horseweed seeds are not dormant and can germinate
immediately after maturity in the fall, spring, or midsummer (Bhowmik and Bekech, 1993; Weaver, 2001).
Thus, horseweed is a bona fide weed with model plant
qualities.
The Compositae is the largest and most diverse plant
family, with over 24,000 described species. Despite
the evolutionary success and economic importance of
plants in this family, a draft whole-genome publication
for any member of the family or for any plants from
closely related families is lacking. However, genomes
for lettuce (Lactuca sativa; https://lgr.genomecenter.
ucdavis.edu), sunflower (Helianthus annuus; http://
www.sunflowergenome.org), and several other members of the family are available, but de novo genome
assembly can often be arduous in many cases when
genomes are large and complex (Kane et al., 2011; Truco
et al., 2013). A draft sequence of a streamlined Compositae genome would greatly advance the genomics and
genetics of economically, ecologically, and evolutionarily
important plants within this important plant family.
In this study, in order to further increase sequence
resources for horseweed, we performed de novo genome
sequencing and assembly of the Tennessee glyphosateresistant horseweed biotype TN-R using multiple sequencing platforms, including 454 GS-FLX, Illumina
HiSeq 2000 and PacBio RS. Using a modified strategy
that has proven effective in vertebrates (Li et al.,
2010a), date palm (Phoenix dactylifera; Al-Dous et al.,
2011), and flax (Linum usitatissimum; Wang et al., 2012),
the draft genome of horseweed was assembled into
scaffolds. Genomes and transcriptomes of this and another seven horseweed biotypes were also sequenced,
and the data were used to investigate genome variation and to facilitate molecular marker development
and gene discovery. The primary goal of this project
was to build substantial genomic resources for horseweed, which will subsequently be useful to elucidate
the genomic basis of weediness traits.
RESULTS
De Novo Sequencing and Assembly
The reference genome source was DNA extracted
from six individual plants of a single glyphosate-resistant
biotype that was originally collected in 2002 from a
soybean field in Jackson, TN (TN-R; Mueller et al.,
2003). Comprehensive genome shotgun sequences were
obtained using seven libraries with insert sizes that
ranged from 350 bp to 10 kb and three sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS;
Table I). Two plates of 454 GS-FLX runs on three independent libraries (approximately 600 bp) produced
2,238,807 raw reads with an average read length of approximately 390 bp, for a yield total of approximately 860
Mb data. Two lanes of HiSeq 2000 runs (2 3 100 bp) of
two short paired-end libraries produced 786,389,990 raw
reads, for a yield total of approximately 77 Gb data. One
lane of a HiSeq 2000 run (2 3 100 bp) of one 3-kb matepaired library produced 381,844,926 raw reads, for a
yield total of approximately 37 Gb data. Ten SMRT cells
of PacBio RS runs produced 513,084 sequences with an
average read length of 3.1 kb, which yielded 1.44 Gb of
total data (Table II). All the sequence data together represented approximately 3503 coverage of the horseweed
genome.
De novo genome assembly is often not straightforward (Baker, 2012), in that many factors impact the
integrity and accuracy of the draft assembly. To integrate data from multiple sequencing platforms, we
used a de novo assembly strategy (Supplemental Fig.
1242
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
Table I. Summary of sequencing read yield results from multiple platforms and mate-paired libraries of various sizes (approximately 350 bp, 600 bp,
3 kb, and 10 kb)
Platform
Libraries
Read Length
Filtered Reads
Data (Approximate Coverage)
454 GS-FLX
Illumina PE
Illumina MP
PacBio RS
Total
600 bp (3)
350 bp (2)
3 kb (1)
10 kb (1)
Seven libraries
390 bp
2 3 100 bp
2 3 100 bp
3.1 kb
0.1–10 kb
2,238,807
786,389,990
381,844,926
513,084
1,170,986,807
860 Mb (2.53)
77 Gb (2303)
37 Gb (1103)
1.4 Gb (4.53)
116.3 Gb (3503)
S1). Factors that could impact the assembly quality
were tested, including Kmer (the possible subsequences
of length k from reads obtained through DNA sequencing effort) size, sequencing depth and coverage, and the
hybrid assembly of multiplex data (Supplemental Fig.
S2). Increased coverage could increase assembly quality
by increasing the 50% of the assembly (N50) value and
assembled genome size and may also reduce the numbers of contigs. However, as the coverage reached 503,
the effect of this factor was limited. For a given coverage,
lower Kmer size translated into increased sensitivity, but
with more assembly errors, while higher Kmer size
translated into highly specific, accurate, and reliable assembly (Supplemental Fig. S2). Therefore, given adequate coverage, higher Kmer provided better assembly.
To mitigate potential problems, we chose deep coverage
(greater than 503), high Kmer (more than 25 nucleotides) cutoff for accuracy, and multiple sequencing platforms and library types. Indeed, the data that were
integrated among multiple Next Generation Sequencing
platforms and libraries with different insert sizes increased
the quality of the de novo assembly (Supplemental Fig.
S2). Newbler was used, since it was designed specifically
for assembling sequence data generated by the 454 GSFLX (Margulies et al., 2005), whereas SOAPdenovo is
among the most popular tools to assemble Illumina short
reads (Li et al., 2010b). Using the PacBio RS sequencing
platform, reads of over 10 kb in length were produced.
However, the higher error rate prevented the direct use of
PacBio reads in assembly (Supplemental Fig. S3). After
error correction by mapping high-quality, but short, Illumina reads, the inclusion of the long reads from PacBio
increased the quality of de novo genome assembly by
using it for either the primary assembly or the guidance of
scaffolding (Supplemental Figs. S3–S5). The filtered and
high-quality 454 GS-FLX single reads and high-coverage
Illumina paired-end reads were combined to generate de
novo contigs. Mate-pair libraries included true mates
mixed with chimeras and also paired ends. To find those
true mates with correct orientation and distance, the de
novo-assembled contigs were used as reference for the
mapping of mate-pair reads (Supplemental Table S1;
Supplemental Fig. S6). The total size of the assembled
contigs was 311.3 Mb, in which N50 was composed of
5,122 sequences of 20,764 bp or longer (Fig. 1; Table II).
The contigs were further assembled to 344.9-Mb scaffolds,
in which the N50 increased to 33,561 bp (3,575 sequences).
Without gaps, the assembly contained 92.3% genome
coverage. The horseweed genome contained a 34.9% G/C
base ratio, which was higher than sequenced woody
eudicot genomes but equivalent to other eudicot genomes and much lower than monocot genomes (Fig.
1C; Supplemental Table S2). The G/C content of coding regions was 38.6%, which was higher than that of
the entire genome (Fig. 1D). The assembled draft of
horseweed genome data has been submitted to the
National Center for Biotechnology Information with
accession number SUB535309.
To inspect the de novo assembly accuracy, multiple
independent sources of horseweed DNA and complementary DNA sequences were used to align with the
assembled genome (Table III). These sequences included
EST reads from Sanger sequencing, transcriptome shotgun reads from 454 and Illumina sequencing, and partial
genomic reads, which were used for the genome assembly (Zhou et al., 2009; Peng et al., 2010; Yuan et al.,
2010; sample identifiers WSYE and NUSE from the OneThousand Plant Transcriptomes project [www.onekp.
com]). Over 90% of sequences could be aligned to the
genome. The mapping and BLAST varied among sequence reads. However, in the aligned sequences, 91.2%
or more had over 90% coverage and identity. The nonperfect alignments indicated that EST reads could be
used to further improve assembly accuracy and gene
prediction.
Identification of Repetitive Elements and Annotation of
Protein-Coding Genes
By using the plant repeat database (RepeatMasker
libraries 20140131) and RepeatMasker 4.0.5, a total of
233,521 loci (19.5 Mb) were identified to be various nucleotide repeat elements. These sequences represented
6.25% of the assembled genome (Table IV) and were
composed of 172,825 simple sequence repeats (SSRs),
Table II. Summary of the de novo genome assembly of horseweed
N90 and N70, 90% and 70% of the assembly, respectively.
Parameter
Contigs
Scaffolds
Sequence number
20,075
13,966
8,698; 14,599 13,506; 10,318
N90 (bp; n)
N70 (bp; n)
14,517; 8,886 23,243; 6,219
N50 (bp; n)
20,764; 5,122 33,561; 3,575
Average length (bp)
16,258
26,546
Maximum single sequence (bp)
102,072
182,395
Base pairs (Mb)
311.27
344.88
Plant Physiol. Vol. 166, 2014
1243
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
Figure 1. De novo genome assembly of horseweed. A, Assembly units
were sorted by descending length,
and the cumulative DNA content
was plotted as a function of the total
number of assembly units. B, Length
distributions of de novo-assembled
contigs. C, G/C content distributions
in horseweed computed over a bin
size of approximately 500 bp. D, G/C
content distribution of unique coding
DNA sequences.
28,494 low-complexity elements, 23,425 different types of
retroelements, 5,424 small RNA structures, 2,283 DNA
transposons, and 675 unclassified elements. Retroelements were the largest mobile elements found in the
genome (2.61%), of which most were long terminal
repeat-type retroelements (2.54%). Small RNAs contributed 0.61% to the genome, whereas identified DNA
transposons represented just 0.16% of the genome.
Masked SSR loci were analyzed using the Simple
Sequence Repeat Identification Tool (http://archive.
gramene.org/db/markers/ssrtool) with a threshold
minimum of six repeats. The results included 51,892
loci (Supplemental Table S3) containing 44,589 dimer
(85.9%), 6,864 trimer (13.2%), 299 tetramer (0.58%), 60
pentamer (0.12%), and 80 hexamer (0.15%) motifs. The
majority of these loci contained six to nine repeats
(Supplemental Fig. S7). Considering the position in
scaffold, repeat number, and length, 1,316 loci were
identified to have appropriate sequences to develop
PCR primers, which could be used as SSR markers
(Okada et al., 2013).
Using previous EST and transcriptome data as a
starting point (Zhou et al., 2009; Peng et al., 2010; Yuan
et al., 2010), 44,592 horseweed gene models were
predicted. A total of 25,439 unique transcripts were
returned after removing redundancies. The longest
gene (using coding sequence only and without tallying
introns) was composed of 17,120 bp. It was annotated
to encode a midasin-like protein. The mean and median transcript lengths were 2,210 and 1,830 bp, respectively. Over 95% of the transcripts were longer
than 715 bp (Fig. 2A). The predicted gene sequences
were further compared with the UniProt and National
Center for Biotechnology Information nonredundant
protein databases for assigning biological information.
Using a threshold e value of 1024 or less, 79.6% of these
Table III. Alignment of ESTs, transcriptome sequencing (RNAseq), and genomic shotgun sequences to scaffolds
The threshold of alignment coverage and identity was 90% or greater.
Sequence Type
Sequences
Aligned
Aligned
Alignment Coverage
n
Sanger ESTs (more than 600 bp)
454 GS-FLX RNAseq (more than 200 bp)
Illumina RNAseq (100 bp)
454 GS-FLX genomic reads (more than 390 bp)
Illumina genomic reads (100 bp)
5,482
145,796
3,987,575
499,955
9,315,014
Alignment Identity
%
5,012
132,579
3,591,820
450,754
8,476,173
91.4
90.9
90.1
90.2
91.0
91.2
92.1
91.6
$95
$95
1244
93.7
92.6
94.5
$99
$99
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
Table IV. Repeat element analysis in the horseweed genome
For elements, most repeats were fragmented by insertions or deletions and have been counted as one element.
Repeat Elements
Elements
Length
n
bp
Retroelements
23,425
8,509,514
Short interspersed
29
1,433
elements
Long interspersed
902
219,700
elements
Long terminal repeat
22,889
8,288,381
elements
Yeast transposon
15,599
4,999,694
family1/Copia
Gypsy/Dictyostelium
7,110
3,276,883
intermediate repeat
sequence1
DNA transposons
2,283
531,765
Hobo-Activator
337
61,039
Tc1-IS630-Pogo
719
166,937
Enhancer/suppressor
2
644
mutator
Tourist/Harbinger
418
137,376
Unclassified
675
278,725
Small RNAs
5,424
2,003,521
Simple repeats
172,825
7,669,927
Low complexity
28,494
1,400,133
Total
233,521 20,393,585
Percentage of
Genome
2.61
0.00
0.07
2.54
1.54
1.00
0.16
0.02
0.05
0.00
0.04
0.09
0.61
2.35
0.43
6.25
(20,248 of 25,439) had at least one hit to the database.
The majority of the alignments had an identity of 50%
or greater (Fig. 2C). Within the annotated genes, 18,406
returned hits at e values of 10210 or less. Moreover,
most of the hits had only one high-scoring segment
pair with the aligned genes (Fig. 2, B and D). Overall, the
majority of the predicted genes in the assembled horseweed genome could be aligned with hits in a known
protein database with high identity and specificity, which
indicated that most of these genes were likely proteincoding sequences.
Gene Ontology (GO) annotation resulted in 14,897 GOannotated genes. The largest gene category with molecular function was kinase activity (1,780), following by
nucleotide binding (1,740), transporter activity (1,031),
and another 15 subgroups with 50 or more terms
(Supplemental Fig. S8A). Among genes annotated to
various biological processes, the largest subgroup was
stress-response genes (5,446), followed by cellular component organization-related genes (4,582), transport
(3,799), and 27 additional groups with 50 or more terms
(Supplemental Fig. S8B). The potential cellular localization
of the genes was predicted by assigning 17,878 cellular
component GO terms to various queries (Supplemental
Fig. S8C). Genes that encoded proteins with enzyme
activity were further divided into six groups (transferases, hydrolases, oxidoreductases, ligases, lyases, and
isomerases; Supplemental Fig. S8D).
Fisher’s exact test showed 10 GO terms that were
significantly overrepresented in horseweed compared
with Arabidopsis (Fig. 3). Enrichment analysis using
the GO terms as well as related gene families included
vacuolar transport, photosynthetic acclimation, detoxification of nitrogenous compounds, response to UV-B
light, chloroplast thylakoid lumen, protein targeting to
chloroplasts, protein peptidyl-prolyl isomerization,
peptidyl-prolyl cis-trans-isomerase activity, hydrolase
activity (acting on carbon-nitrogen), glycolysis, and
NAD(P)H dehydrogenase complex assembly. Four
transporter subgroup GO terms were overrepresented
in the horseweed genome at the P , 0.001 level: drug
transmembrane transporters (GO:0015238), xanthine
transmembrane transporters (GO:0042907), uracil transmembrane transporters (GO:0015210), and allantoin uptake transporters (GO:0005274; Supplemental Fig. S9).
However, these data are not sufficient to draw conclusions about biological function and the evolution of
herbicide resistance; further gene expression analysis and
other follow-on experiments need to be performed.
Gene families that are commonly associated with
nontarget herbicide resistance include cytochrome P450s,
glutathione S-transferases (GSTs), glycosyltransferases
(GTs), and ATP-binding cassette (ABC) transporters
(Yuan at al., 2007). Relative to Arabidopsis, the horseweed genome has more members in each of these families (Table V). There were 323 unique cytochrome P450
genes at 401 loci in horseweed, which represents a 26%
increase over Arabidopsis. Also, 155 ABC transporter
genes at 213 loci were found in horseweed, which is 14%
more than in Arabidopsis. Horseweed had 6% more GTs
than Arabidopsis. There were 54 horseweed GSTs, which
is one more gene than is found in Arabidopsis (Table V).
Plastid and Mitochondrial Genomes
Plastid genome sequences from 454 GS-FLX reads
were isolated by searching against known plastid genome databases and then subjected to de novo assembly.
The assembled plastid genomes were further mapped
with accurate Illumina reads to fix homopolymer errors.
The entire chloroplast genome (approximately 153 kb;
Supplemental Fig. S10) and most of the mitochondrial
genome (approximately 450 kb in scaffolds) were obtained (Supplemental Table S4; Supplemental Data Sets S1
and S2). The horseweed chloroplast genome contained
two inverted repeats (24,936 bp each), a large single-copy
fragment (84,634 bp), and a small single-copy fragment
(18,063 bp). The orientation of the two inverted repeats
was determined by PCR. A total of 95 protein-coding
genes (88 unique), 39 tRNA genes (28 unique), and
eight ribosomal RNA genes (four unique) were annotated (Supplemental Data Set S1). These genes comprised 59.7% of the chloroplast genome. The remainder,
the noncoding portion of the chloroplast genome,
was composed of introns, intergenic spacers, and pseudogenes. The largest gene was the ycf2 gene, which was
6,085 bp. The smallest gene was a 34-bp tRNA gene. The
G/C content in the chloroplast genome was 37.2%. The
chloroplast genome was nearly identical to that of lettuce
Plant Physiol. Vol. 166, 2014
1245
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
Figure 2. Annotation of protein-coding transcripts. A, Distribution of lengths of predicted transcripts. B, Identity distribution of
alignments with hits in the database. C, E value distribution of aligned hits. D, Distribution of high-scoring segment pairs (HSPs)
from aligned hits.
and sunflower, except in the orientation of the two
inverted repeat elements (Supplemental Fig. S11). The
lettuce chloroplast has reverse inverted repeat elements compared with horseweed and sunflower.
The horseweed mitochondrial genome reads from
454 GS-FLX, Illumina paired-end, and true-mate pairs
were parsed and assembled into 123 scaffolds (N50 =
10,057 bp), with the help of plant mitochondrial databases. The scaffold sizes ranged from 315 to 43,498
bp, for a total of 453,334 bp in the mitochondrial genome (Supplemental Fig. S12; Supplemental Data Set
S2), which was moderately sized compared with other
plant mitochondrial genomes.
Genome Resequencing and Genomic Variation Analysis
We resequenced four population pairs that were
either glyphosate resistant (R) or susceptible (S) from
the United States. The representative populations used
for resequencing were from California (CA-R versus
CA-S), Delaware (DE-R versus DE-S), Indiana (IN-R
versus IN-S), and Tennessee (TN-R versus TN-S). Individual seedlings surviving a glyphosate application
were used for eventual DNA donors, wherein DNA was
pooled for sequencing (Table VI; Supplemental Fig. S13).
Within two HiSeq 2000 flow cell lanes, a total 55.6 Gb
(1703 coverage) of paired-end reads was produced, and
each of the seven additional genomes was assembled
using TN-R as the reference genome (Table VI). Interpopulation genome variation included multiple-nucleotide
variation, which was defined as 4 bp or less, with equal
numbers of nucleotides at each locus, single-nucleotide
variation, insertions/deletions, and replacement (Table
VII). When a minimum cutoff value (20 times or greater
coverage at the variant loci, 20% or greater frequency) was
applied to avoid calling errors, the frequency of detected
variants in the genome varied among biotypes, ranging
from 0.75 to 1.59 counts per kb (mean of 1.01; Table VII).
Compared with the TN-R reference genome, CA-R had
the fewest variants, followed by, in ascending order, IN-R,
CA-S, IN-S, DE-R, TN-S, and DE-S. IN-R, IN-S, and TN-S
had fewer variants relative to each other compared
with the rest of the other biotypes. Compared with their
1246
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
Figure 3. Differential GO term distribution by enrichment analysis of the horseweed and Arabidopsis genomes using the
Fisher’s exact test. The entire transcriptome in horseweed was set as the test data set, and the entire transcriptome in Arabidopsis
was set as the reference data set (P # 0.001). [See online article for color version of this figure.]
partners (DE-S and CA-S, respectively), DE-R and CA-R
had the fewest variants. Similarly, IN-R had fewer variants
than IN-S (Supplemental Table S5). The minimum average
quality score for the position of each variant was 30 (one
error in 1,000; Fig. 4A), which ensured that the results
were quite reliable. To determine whether a variant was
homozygous, we applied a threshold of 90% probability
and found that IN-R and CA-R had the highest frequency
of heterozygosity (nearly 80%), whereas only 50% of variants were heterozygous in the DE-R, DE-S, and TN-S
biotypes (Fig. 4B). Most of the sequence variation among
samples was composed of single-nucleotide variations/
single-nucleotide polymorphisms (SNPs) in all eight biotypes (Fig. 4C). Fisher’s exact test showed that a total of
2,010 specific variants were found in all four glyphosateresistant biotypes, which were distributed among 1,370
loci. Only 539 specific variants were found in all four
glyphosate-susceptible biotypes, and these were located at
425 loci (Fig. 4D). Since the physical map of the horseweed
genome was not available, the location of each specific
variant in the sorted scaffold arrays instead of in the genome was listed (Supplemental Fig. S14). A genome-wide
association study requires SNP information from the genomes of many control and test individuals. Specific SNPs
in glyphosate-resistant and glyphosate-susceptible genomes
were located in hotspots, which were defined as the
ones with low P values (P # 0.01), and might be associated with the phenotypes of glyphosate resistance
(Supplemental Fig. S14). However, these findings are
not definitive, given the low sample sizes of biotypes
analyzed.
Phylogenetic analysis of whole-genome SNP variation provided a phylogeographic hypothesis regarding
the evolution of glyphosate resistance and its spread
(Fig. 5). Biotypes from proximate locations generally
shared the same clades, such as the Delaware pairs
and the Indiana pairs, which is consistent with our
previous studies (Yuan et al., 2010; Okada et al., 2013).
Table V. Analysis of gene families (cytochrome P450s, GSTs, ABC
transporters, and GTs) that are potentially involved in nontarget
glyphosate resistance in horseweed compared with those tallied for
Arabidopsis
Loci data refer to the total potential copy numbers in the gene family.
Gene Family
Arabidopsis
Cytochrome P450s
GSTs
ABC transporters
GTs
256
53
136
361
Plant Physiol. Vol. 166, 2014
Horseweed
323
54
155
381
(401 Loci)
(54 Loci)
(213 Loci)
(457 Loci)
1247
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
Table VI. Summary of genome resequencing of seven additional
horseweed biotypes using the Illumina HiSeq 2000 platform
R, Resistant; S, susceptible.
Population Identifier
Accession Location
Phenotype
Coverage
CA-S
CA-R
DE-S
DE-R
IN-S
IN-R
TN-S
Fresno, CA
Fresno, CA
Georgetown, DE
Georgetown, DE
Knox County, IN
Knox County, IN
Jackson, TN
S
R
S
R
S
R
S
363
193
243
283
163
163
303
Although not in the same clade, the TN-R branch was
proximate to TN-S. The position of CA-S on the phylogram was basal, which might be of biological importance with regard to gene flow in the species. The
phylogenetic pattern suggests that glyphosate resistance in horseweed has evolved independently multiple times (Fig. 5).
5-Enolpyruvylshikimate-3-Phosphate Synthase Genes
In the horseweed genome, there are two 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene copies (EPSPS1 and EPSPS2); thus, no gene amplification
was observed. Eleven EPSPS1 and 32 EPSPS2 variants
were detected (Supplemental Fig. S15). However, no
EPSPS mutation cosegregated with glyphosate resistance
in our data. Furthermore, none of the observed EPSPS
sequence variants have been previously reported to be
associated with resistance. There also were no significant
differences in EPSPS gene expression level among these
eight biotypes in response to glyphosate treatment,
based on an RNAseq study (Y. Peng, Y. Sang, S. Allen,
and C.N. Stewart, unpublished data). Therefore, we
conclude that glyphosate resistance in horseweed is
conferred by a non-target-site mechanism.
DISCUSSION
To our knowledge, horseweed is the first economically important weed to have its genome sequenced
and assembled. The de novo strategy using wholegenome shotgun sequencing in this study was similar
to those used in several recent reports (Al-Dous et al.,
2011; Wang et al., 2012; Varshney et al., 2013), except
that we also included third-generation sequencing
reads in our study. The 92.3% coverage of 13,996 scaffolds should be considered to be a draft genome comparable to rice, soybean, maize, chickpea, grape, poplar,
and date palm (Goff et al., 2002; Yu et al., 2002; Tuskan
et al., 2006; Jaillon et al., 2007; Schnable et al., 2009;
Schmutz et al., 2010; Al-Dous et al., 2011; Varshney et al.,
2013). Further improvements in assembly accuracy and
backfilling will require the use of either long-insertion
mate-pair reads or physical maps (International Rice
Genome Sequencing Project, 2005; Wang et al., 2012; AlMssallem et al., 2013).
The draft horseweed genome has proven to be immediately useful. We have successfully cloned 10 target promoters of interest to study glyphosate resistance
and response (data not shown). Moreover, 12 SSR loci
have already been tested and successfully used in a recent horseweed population genetics study (Okada et al.,
2013). Also, the genome sequences have been used to
annotate the explosive composition B (hexahydro-1,3,5trinitro-1,3,5-triazine and 2,4,6-trinitrotoluene) response
genes in Baccharis halimifolia (Ali et al., 2014). Furthermore, we observed that the horseweed chloroplast genome is nearly identical to those of sunflower and lettuce
(https://lgr.genomecenter.ucdavis.edu). Higher plants
have various mitochondrial genome architectures with
respect to size and construction compared with the chloroplast genome, which is more conserved (Tian et al.,
2006). Brassica carinata has a small mitochondrial genome
(232,241 bp), whereas Cucurbita pepo has a much larger
chloroplast genome (982,832 bp; Supplemental Table S4).
Next-generation sequencing technologies have been
widely used in whole-genome sequencing and resequencing, which has led to the development of rapid
genome-wide SNP detection applications in model and
crop plant species for exploring within-species diversity, genotyping by sequencing, construction of haplotype maps, and performing genome-wide association
studies of interesting traits (Craig et al., 2008; Huang
Table VII. Summary of genomic variation among the sequenced horseweed biotypes
For each biotype, genomic DNA was isolated from six individual plants. Variants having frequencies of 90% or greater were considered to be
homozygous, whereas variants having frequencies between 20% and 90% were considered to be heterozygous. MNV, Multiple-nucleotide variation;
SNV, single-nucleotide variation. *Value of average frequency.
Biotype
MNV
SNV
Deletion
Insertion
Replacement
Heterozygous
Homozygous
Total No.
Size
kb per Variant
bp
CA-R
CA-S
DE-R
DE-S
IN-R
IN-S
TN-R
TN-S
Total
7,069
13,253
11,215
13,024
9,890
10,394
6,650
9,725
81,220
188,525
279,016
352,016
419,703
273,165
298,866
174,915
330,880
2,317,086
35,475
10,329
33,265
41,670
16,479
20,132
48,417
58,920
264,687
3,664
5,407
10,712
11,843
6,049
7,360
3,646
10,432
59,113
621
1,035
1,210
1,255
832
965
617
1,043
7,578
184,596
235,416
228,343
234,044
248,044
270,989
155,282
205,585
1,762,299
50,758
73,624
180,075
253,451
58,371
66,728
78,963
205,335
967,305
235,354
309,040
408,418
487,495
306,415
337,717
234,245
410,920
2,729,604
1248
247,452
329,025
432,327
516,307
323,209
356,007
246,577
433,873
2,884,777
1.32
0.99
0.76
0.63
1.01
0.92
1.33
0.75
0.96*
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
Figure 4. Summary of genome variants and distribution among biotypes. A, Total variant counts in different biotypes. B, Percentage of heterozygous and homozygous variants among biotypes. C, Distribution of variant types. Del, Deletion; Ins, insertion;
MNV, multiple-nucleotide variation; Rep, replacement; SNV, single-nucleotide variation. D, Number of specific variants in
glyphosate-resistant (GR) and glyphosate-susceptible (GS) populations. [See online article for color version of this figure.]
et al., 2009, 2010; Atwell et al., 2010; Todesco et al., 2010;
Wu et al., 2010; Kump et al., 2011; Deschamps et al., 2012).
With the reported horseweed genome data, 51,892 SSR
markers and over 2.7 million SNPs, as well as chloroplast
DNA markers, were identified. Although glyphosatesusceptible and glyphosate-resistant plants, obviously,
had opposite phenotypic responses to glyphosate treatment, 99.93% of their genomes were identical. We found
many SNPs in glyphosate-resistant or glyphosatesusceptible genomes that were located in hot pots,
which were defined as those with low P values (P #
0.01), and might be associated with glyphosate resistance. However, many SNPs will be silent at the protein
sequence level and not of interest for further study in
terms of biological functions. Other mutations involved
in gene expression regulation (e.g. within promoter regions) are of potential interest for elucidating resistance
and other traits. Interestingly, horseweed exhibited considerable morphological variability among accessions
(Supplemental Fig. S13); thus; integrating genomic information with resistance phenomena could be useful in
weed science not only in studying the rapid evolution of
herbicide resistance but also other traits.
The evolution of glyphosate resistance is a critical
process that, in most cases, has been elucidated with
regard to genomics and molecular mechanisms. For over
a decade, we have known that glyphosate resistance in
horseweed is a semidominant trait that is determined by
a single simple Mendelian locus (Zelaya et al., 2004). The
major focus of weed scientists to combat resistance has
been to explore more herbicide control strategies while
essentially ignoring the genomic and evolutionary bases
of resistance. With few genomic resources for weeds and
little expertise to utilize the available resources among
the weed science community (Stewart, 2009), discovering
the molecular mechanisms underlying nontarget resistance has been extremely difficult. However, now that
the genomics era has found its way to weed science, we
can begin to answer fundamental questions about what
makes weeds so weedy and capable of adapting to
control measures and, thereby, design approaches that
reduce the further evolution of herbicide resistance in
weeds (Basu et al., 2004; Stewart et al., 2009). The currently known mechanisms of glyphosate resistance
in weedy plants include EPSPS target-site mutations
(Baerson et al., 2002; Collavo and Sattin, 2012; Sammons
Plant Physiol. Vol. 166, 2014
1249
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
Figure 5. Phylogenetic tree of sequenced horseweed biotypes based on
whole-genome single-nucleotide variation profiles using the shrunk-genomes
method of the program PhyloSNP (Faison et al., 2014) with a position delta of
zero. Bootstrap supports were all 100%, from 1,000 iterations. The tree was
built by using the neighbor-joining method in Phylip 3.695. The scale bar
indicates the number of genetic changes per unit of length.
and Gaines, 2014), EPSPS gene amplification (Gaines
et al., 2010, 2013; Jugulam et al., 2014), and vacuolar
sequestration of glyphosate (Ge et al., 2010, 2012, 2014);
however, in the last case, the genes conferring resistance
have not been identified.
Non-target-site resistance, especially via altered sequestration, is interesting both biologically and practically for weed management, but is not well characterized
at the genomic or molecular level for most weeds (Yuan
et al., 2007). This is the case with the many biotypes of
glyphosate-resistant horseweed as well as several other
glyphosate-resistant weedy plant species, such as johnsongrass (Sorghum halepense), ryegrass (Lolium spp.), velvet bean (Mucuna pruriens), and wild radish (Raphanus
raphanistrum; Mueller et al., 2003; Feng et al., 2004; Main
et al., 2004; Zelaya et al., 2004; Koger and Reddy, 2005;
Owen and Zelaya 2005; Preston and Wakelin, 2008; Ge
et al., 2010, 2012; Riar et al., 2011; Rojano-Delgado et al.,
2012; Vila-Aiub et al., 2012; Ashworth et al., 2014;
Sammons and Gaines, 2014). Physiologically, we know
that non-target-site resistance can involve a plethora of
metabolic, conversion, sequestration, and reduced translocation processes, including oxidation, conjugation, or
compartmentation of the herbicide molecules (Yuan
et al., 2007; Cummins et al., 2013; Iwakami et al., 2014).
Horseweed represents a model weedy plant with a nontarget-site glyphosate-resistant mechanism via altered
translocation or transport (Feng et al., 2004; Stewart et al.,
2009; Ge et al., 2010).
An important goal of effective weed management is
to stop or slow the evolution and spread of herbicide
resistance in weeds. Thus, it is vital to understand whether
glyphosate resistance originated once and spread from a
single source horseweed population or originated multiple
independent times within distinct populations. If resistance originated once and spread, say, via seeds, then the
molecular mechanisms would be expected to be the same
or similar among populations because of identical descent.
In that case, the prevention of seed dispersal and movement by machinery or other means would be an effective resistance management strategy. Alternatively, if
resistance originated multiple times and spread from
multiple sources, reduction in both seed dispersal and
selection pressure will be needed. Management would
also benefit from understanding evolutionary dynamics. In the case of horseweed, the evolution of glyphosate resistance occurred independently in multiple
locations and might be caused by more than one nontarget resistance gene (Yuan et al., 2010; Okada et al., 2013;
this study). Moreover, approximate Bayesian computation
(ABC) analyses of microsatellite marker variation indicated that resistant populations underwent expansion after greatly increased glyphosate use in California in the
1990s, but many years before it was detected, strongly
suggesting that diversity in weed-control practices prior to
herbicide regulation probably kept resistance frequencies
low (Okada et al., 2013). Data from this study also seem
not to support the long-range rapid spread of resistance
from a single source. At one time, the application of a
combination of herbicides simultaneously or in sequence
was considered to be an effective resistance management
strategy. However, resistance to multiple herbicides is
becoming increasingly common (Ashworth et al., 2014;
Heap, 2014), and this approach is no longer consistently
effective. Herbicides with new modes of action or more
diversified control measures are desperately needed. Understanding the genomic mechanisms underlying nontarget resistance would allow the development of novel
approaches, such as allelopathy and the expanded use of
safeners.
For the first time, we have genomics data to begin to
address the genetic basis of weediness. Do weeds have
unique genes that endow weediness or merely variants of
genes common to all plants? Most researchers have posited that the latter situation is most plausible (Basu et al.,
2004; Stewart et al., 2009). Indeed, our finding lends credence to the gene family expansion hypothesis, given that
nontarget resistance candidate gene families, ABC transporters, and cytochrome P450 genes, and to a lesser extent
GST and GT genes, were overrepresented in horseweed
relative to other published plant genomes. Weeds, at least
some of them, might have an inordinate adeptness for
rapid evolution that is manifested by herbicide resistance;
gene family expansion and/or changes in gene expression might be why weeds are weeds (Yuan et al., 2007;
Shaner, 2009; Ge et al., 2010; Peng et al., 2010; Gaines
et al., 2014; Sammons and Gaines, 2014). Other genes of
interest with enriched GO terms, such as vacuole transport, photosynthetic acclimation, and detoxification of
nitrogenous compounds, were found (Fig. 3). Finally,
horseweed seems to have relatively more protein-coding
genes (44,592; also supported by EST data) in comparison
with other sequenced plant genomes while maintaining
the smallest genome of all economically important weeds
(Sterck et al., 2007; Stewart et al., 2009; Paterson et al.,
2010). However, de novo genome assembly (contigs or
scaffolds) tends to underestimate repeat numbers and
overestimate gene numbers, because repeats are compacted on each other and genes are somewhat fragmentary and get annotated as two unique genes in some cases
(Chaisson and Pevzner, 2008; Baker, 2012; Seabury et al.,
1250
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
2013). Repeat numbers in the horseweed genome are
lower than in most other sequenced plant genomes, in
part because horseweed has a relatively compact genome.
However, read-mapping analyses revealed that the read
depth of repeated regions was much higher than for
nonrepeats. This finding implies that repeat numbers are
underestimated in the horseweed genome and that a de
novo repeat-searching model will be needed for further
analysis.
The small genome size of horseweed and other economically important agricultural weeds might have a
practical and evolutionary significance. Twenty-three of
the 25 most-studied weedy plant species have 1C genome
sizes of less than 5,000 Mb, and 13 of these have genome
sizes of less than 2,000 Mb (Stewart et al., 2009). Recently,
a meta-analysis was performed in which invasiveness
among plant species was negatively associated with genome size and positively associated with chromosome
number (Pandit et al., 2014). This study analyzed 890
species among 62 genera. Even though it represents one
datum, horseweed appears to be, potentially, an archetypical weed in that it has a very streamlined genome but
with a large number of genes, which could give it the
capacity for rapid evolution in changing environments.
Thus, beyond its use for practical agricultural research,
the horseweed genome might shed light on standing
questions of invasion, evolution, and changing climates
(Caplat et al., 2013). The bane of farmers could be the
harbinger of genomic enlightenment.
MATERIALS AND METHODS
Plants and Phenotyping
Seeds sampled from horseweed (Conyza canadensis) populations were
germinated and grown in potting medium (3B potting medium, 10-cm-diameter
pot, one plant per pot; Supplemental Fig. S13) in a greenhouse under a 16-/8-h
light/dark photoperiod at ambient temperatures (25°C 6 2°C). Twenty-four
3-month-old plants from each population were treated with glyphosate at the
field rate (0.84 kg ha21; Roundup Weathermax; Monsanto). Treatment occurred
at the rosette stage when plants were 6 to 8 cm in diameter, and glyphosatetreated plants were considered to be resistant if they were alive at the end of
3 weeks (Yuan et al., 2010).
DNA Isolation and Genome Sequencing Using
Multiple Platforms
Genomic DNA was extracted from six individual plants of the glyphosateresistant biotype that was collected originally from a soybean field in Jackson,
TN (TN-R), using Plant DNeasy kits (Qiagen). Whole-genome shotgun sequencing was performed using the Illumina HiSeq 2000, 454 GS-FLX Titanium,
and PacBio RS sequencing platforms. A total of three sequencing libraries were
constructed with insert sizes of approximately 600 bp for the 454 GS-FLX
Titanium system. Two paired-end sequencing libraries with insert sizes of
approximately 350 bp and one mate-pair library with insert sizes of approximately 3,000 bp was constructed for the Illumina HiSeq 2000 system. One
library with insert sizes of approximately 10,000 bp was constructed for the
PacBio RS sequencing system.
De Novo Genome Assembly
The de novo genome assembly strategy is shown in Supplemental Figure
S1. The 454 GS-FLX reads were assembled with Newbler. The Illumina pairedend sequence reads were divided by sequencing depth (approximately 503)
using error-corrected PacBio long reads as guidance (Koren et al., 2012), and
subsets were assembled using SOAPdenovo and CLC Genomic Workbench.
NGen was used to combine the primary assembled contigs. CLC Genomic
Workbench was used to find the true mate pairs by mapping with the total
contigs. NGen was used for the final scaffold construction and gap closure. All
the parameter settings were described previously (Varshney et al., 2012; Wang
et al., 2012). To check the completeness of the assembly, we mapped ESTs and
transcriptome data to the genome assembly using BLASTN (with e value
cutoff of 10220).
Assembly of Chloroplast and Mitochondrial Genomes
The sequencing reads were screened against custom plant chloroplast and
mitochondrial genome databases. The subset sequences for chloroplast and mitochondria were assembled respectively. The complete chloroplast genome was
annotated using DOGMA (Wyman et al., 2004). To remove nonmitochondrial
assembly, the mitochondrial contigs were further screened against plant mitochondrial genome databases using BLASTN with e value cutoff of 1025.
Gene and Repeat Annotation
Gene prediction was performed using homology-based and transcriptbased methods. Previous horseweed ESTs and transcriptome data were
aligned to the genome assembly using BLAT (blat-34; identity $ 0.98, coverage $
0.98) to generate spliced alignments (Kent, 2002), which were linked according
to their overlap using PASA (Haas et al., 2003). Plant proteins of other species
were also mapped to the genome using TBLASTN with e value cutoff of 1025.
Predicted transcripts were assigned biological information by searching the
UniProt protein database using BLASTX with e value cutoff of 1023.
RepeatMasker version 4.0.5 (http://www.repeatmasker.org/) was used to
search putative transposable element regions of the horseweed genome
against an updated set of RepeatMasker libraries (20140131). To develop
SSR markers for further population genetic studies, the masked SSR loci
were further analyzed using the Simple Sequence Repeat Identification Tool
(http://archive.gramene.org/db/markers/ssrtool) with a threshold minimum of six repeats.
GO Annotation and Enrichment Analysis
Results of BLASTX of horseweed transcripts were further mapped using the
GO database to assign GO terms and carry out GO annotation using the
Blast2Go program integrated in CLC Genomic Workbench (http://www.
blast2go.com). Enrichment analysis was performed by using the entire set of
available Arabidopsis (Arabidopsis thaliana) transcripts as a reference. The
P value of Fisher’s exact test was set to P # 0.001 to reduce noise and output
the most specific terms.
Genome Resequencing and Variation Analysis
DNA samples of another seven horseweed biotypes were isolated using the
method described above. Paired-end DNA libraries (approximately 350-bp
insertion) were constructed for each biotype, bar coded, and subjected to
Illumina sequencing using the HiSeq 2000 platform. After trimming and
quality-control steps, the remaining data for each population were assembled
into genomes with the TN-R assembly as reference using CLC Genomic
Workbench 7.03. The probabilistic variant detection model was chosen to report
the variants. A minimum cutoff value (20 times or greater coverage at the
variant loci, 20% or greater frequency) was applied to avoid issues of sequencing and assembly errors in variant calling. Fisher’s exact test was carried
out to detect specific variants in glyphosate-susceptible or glyphosate-resistant
groups. The cutoff sample frequency was 90% or greater, which meant that
only variants present in all test samples, but not in any reference samples,
would be reported. To carry out phylogenetic analysis based on wholegenome single-nucleotide variation profiles among sequenced horseweed biotypes, the shrunk-genomes method of the program PhyloSNP (Faison et al.,
2014) with a position delta of zero surrounding each SNP was chosen. The
algorithm of a quantitative method was used, in which the created matrix of
presence/absence and the position of each SNP from the shrunk-genome
alignment can be used directly to generate a distance matrix among the biotypes. Finally, the tree was built by using the neighbor-joining method in
Phylip 3.695.
Plant Physiol. Vol. 166, 2014
1251
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Data flow of de novo genome assembly.
Supplemental Figure S2. Factors effect on the quality of de novo genome
assembly.
Supplemental Figure S3. Length and quality distribution of PacBio reads.
Supplemental Figure S4. PacBio reads rescue.
Supplemental Figure S5. The benefit of including PacBio long reads.
Supplemental Figure S6. True and false mate-paired reads.
Supplemental Figure S7. Repeats distribution of all SSR loci.
Supplemental Figure S8. GO annotation.
Supplemental Figure S9. Enriched GO terms in transporter subgroup.
Supplemental Figure S10. Map of horseweed chloroplast genome.
Supplemental Figure S11. Syntenic analysis of the horseweed chloroplast
genome.
Supplemental Figure S12. Horseweed mitochondrial genome.
Supplemental Figure S13. Morphological variation among horseweed
populations.
Supplemental Figure S14. Distribution of specific SNPs.
Supplemental Figure S15. Alignment of EPSPS protein sequences from
different horseweed biotypes.
Supplemental Table S1. Mapping of mate-paired reads.
Supplemental Table S2. GC content of plant genomes.
Supplemental Table S3. Summary of SSR markers.
Supplemental Table S4. Summary of plant mitochondrial genomes.
Supplemental Table S5. Variants within genomes among horseweed
biotypes.
Supplemental Data Set S1. Annotation of horseweed chloroplast genome.
Supplemental Data Set S2. Horseweed mitochondrial genome sequences.
ACKNOWLEDGMENTS
We thank Sara Allen, Laura Abercrombie, Qidong Jia, Guanglin Li, and
Nolan Kane for assistance with material sampling, sequencing, and data
collection. We also thank the providers of the original sources of germplasm,
including Anil Shrestha, David Grantz, Vince Davis, Robert M. Hayes, and
Barbara Scott.
Received July 29, 2014; accepted September 9, 2014; published September 10,
2014.
LITERATURE CITED
Al-Dous EK, George B, Al-Mahmoud ME, Al-Jaber MY, Wang H,
Salameh YM, Al-Azwani EK, Chaluvadi S, Pontaroli AC, DeBarry J,
et al (2011) De novo genome sequencing and comparative genomics of
date palm (Phoenix dactylifera). Nat Biotechnol 29: 521–527
Ali A, Zinnert JC, Muthukumar B, Peng Y, Chung SM, Stewart CN Jr
(2014) Physiological and transcriptional responses of Baccharis halimifolia to the explosive “composition B” (RDX/TNT) in amended soil.
Environ Sci Pollut Res Int 21: 8261–8270
Al-Mssallem IS, Hu S, Zhang X, Lin Q, Liu W, Tan J, Yu X, Liu J, Pan L,
Zhang T, et al (2013) Genome sequence of the date palm Phoenix dactylifera L. Nat Commun 4: 2274
Andersen MC (1993) Diaspore morphology and seed dispersal in several
wind-dispersed Asteraceae. Am J Bot 80: 487–492
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of
the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Ashworth MB, Walsh MJ, Flower KC, Powles SB (2014) Identification of
the first glyphosate-resistant wild radish (Raphanus raphanistrum L.)
populations. Pest Manag Sci 70: 1432–1436
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng
D, Platt A, Tarone AM, Hu TT, et al (2010) Genome-wide association
study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:
627–631
Baerson SR, Rodriguez DJ, Tran M, Feng Y, Biest NA, Dill GM (2002)
Glyphosate-resistant goosegrass: identification of a mutation in the target enzyme 5-enolpyruvylshikimate-3-phosphate synthase. Plant Physiol 129:
1265–1275
Baker M (2012) De novo genome assembly: what every biologist should
know. Nat Methods 9: 333–337
Basu C, Halfhill MD, Mueller TC, Stewart CN Jr (2004) Weed genomics:
new tools to understand weed biology. Trends Plant Sci 9: 391–398
Bhowmik PC, Bekech MM (1993) Horseweed (Conyza canadensis) seed
production, emergence, and distribution in no-tillage and conventional
tillage corn (Zea mays). Agron Trends Agric Sci 1: 67–71
Caplat P, Cheptou PO, Diez J, Guisan A, Larson BMH, MacDougall AS,
Peltzer DA, Richardson DM, Shea K, van Kleunen M, et al (2013)
Movement, impacts and management of plant distributions in response
to climate change: insights from invasions. Oikos 122: 1265–1274
Chaisson MJ, Pevzner PA (2008) Short read fragment assembly of bacterial
genomes. Genome Res 18: 324–330
Chao WS, Horvath DP, Anderson JV, Foley MP (2005) Potential model
weeds to study genomics, ecology, and physiology in the 21st century.
Weed Sci 53: 929–937
Collavo A, Sattin M (2012) Resistance to glyphosate in Lolium rigidum selected in Italian perennial crops: bioevaluation, management and molecular bases of target-site resistance. Weed Res 52: 16–24
Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ,
Pawlowski TL, Laub T, Nunn G, Stephan DA, et al (2008) Identification of genetic variants using bar-coded multiplexed sequencing. Nat
Methods 5: 887–893
Cummins I, Wortley DJ, Sabbadin F, He Z, Coxon CR, Straker HE, Sellars
JD, Knight K, Edwards L, Hughes D, et al (2013) Key role for a glutathione transferase in multiple-herbicide resistance in grass weeds. Proc
Natl Acad Sci USA 110: 5812–5817
Dauer JT, Mortensen DA, Humston R (2006) Controlled experiments to
predict horseweed (Conyza canadensis) dispersal distances. Weed Sci 54:
484–489
Deschamps S, Llaca V, May GD (2012) Genotyping-by-sequencing in
plants. Biology (Basel) 1: 460–483
DeVlaming V, Proctor VW (1968) Dispersal of aquatic organisms: viability
of seeds recovered from the droppings of captive killdeer and mallard
ducks. Am J Bot 55: 20–26
Faison WJ, Rostovtsev A, Castro-Nallar E, Crandall KA, Chumakov K,
Simonyan V, Mazumder R (2014) Whole genome single-nucleotide
variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes. Genomics 104: 1–7
Feng PCC, Tran M, Chiu T, Sammons RD, Heck GR, Jacob CA (2004)
Investigations into glyphosate-resistant horseweed (Conyza canadensis):
retention, uptake, translocation, and metabolism. Weed Sci 52: 498–505
Gaines TA, Lorentz L, Figge A, Herrmann J, Maiwald F, Ott MC, Han H,
Busi R, Yu Q, Powles SB, et al (2014) RNA-Seq transcriptome analysis
to identify genes involved in metabolism-based diclofop resistance in
Lolium rigidum. Plant J 78: 865–876
Gaines TA, Wright AA, Molin WT, Lorentz L, Riggins CW, Tranel PJ,
Beffa R, Westra P, Powles SB (2013) Identification of genetic elements
associated with EPSPs gene amplification. PLoS ONE 8: e65819
Gaines TA, Zhang W, Wang D, Bukun B, Chisholm ST, Shaner DL,
Nissen SJ, Patzoldt WL, Tranel PJ, Culpepper AS, et al (2010) Gene
amplification confers glyphosate resistance in Amaranthus palmeri. Proc
Natl Acad Sci USA 107: 1029–1034
Ge X, d’Avignon DA, Ackerman JJH, Collavo A, Sattin M, Ostrander EL,
Hall EL, Sammons RD, Preston C (2012) Vacuolar glyphosatesequestration correlates with glyphosate resistance in ryegrass (Lolium
spp.) from Australia, South America, and Europe: a 31P NMR investigation. J Agric Food Chem 60: 1243–1250
Ge X, d’Avignon DA, Ackerman JJH, Sammons D (2014) In vivo 31Pnuclear magnetic resonance studies of glyphosate uptake, vacuolar sequestration, and tonoplast pump activity in glyphosate-resistant horseweed. Plant
Physiol 166: 1255–1268
1252
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
De Novo Genome Assembly of Horseweed
Ge X, d’Avignon DA, Ackerman JJH, Sammons RD (2010) Rapid vacuolar
sequestration: the horseweed glyphosate resistance mechanism. Pest
Manag Sci 66: 345–348
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J,
Sessions A, Oeller P, Varma H, et al (2002) A draft sequence of the rice
genome (Oryza sativa L. ssp. japonica). Science 296: 92–100
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI,
Maiti R, Ronning CM, Rusch DB, Town CD, et al (2003) Improving the
Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31: 5654–5666
Halfhill MD, Good LL, Basu C, Burris J, Main CL, Mueller TC, Stewart
CN Jr (2007) Transformation and segregation of GFP fluorescence and
glyphosate resistance in horseweed (Conyza canadensis) hybrids. Plant
Cell Rep 26: 303–311
Heap I (2014) The International Survey of Herbicide Resistant Weeds.
www.weedscience.com (May 2014)
Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, Guan J, Fan D,
Weng Q, Huang T, et al (2009) High-throughput genotyping by wholegenome resequencing. Genome Res 19: 1068–1076
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T,
Zhang Z, et al (2010) Genome-wide association studies of 14 agronomic
traits in rice landraces. Nat Genet 42: 961–967
International Rice Genome Sequencing Project (2005) The map-based
sequence of the rice genome. Nature 436: 793–800
Iwakami S, Uchino A, Kataoka Y, Shibaike H, Watanabe H, Inamura T
(2014) Cytochrome P450 genes induced by bispyribac-sodium treatment
in a multiple-herbicide-resistant biotype of Echinochloa phyllopogon. Pest
Manag Sci 70: 549–558
Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne
N, Aubourg S, Vitulo N, Jubin C, et al (2007) The grapevine genome
sequence suggests ancestral hexaploidization in major angiosperm
phyla. Nature 449: 463–467
Jugulam M, Niehues K, Godar AS, Koo DH, Danilova T, Friebe B, Sehgal
S, Varanasi VK, Wiersma A, Westra P, et al (2014) Tandem amplification of a chromosomal segment harboring EPSPS locus confers glyphosate resistance in Kochia scoparia. Plant Physiol 166: 1200–1207
Kane NC, Gill N, King MG, Bowers JE, Berges H, Gouzy J, Rieseberg LH
(2011) Progress towards a reference genome for sunflower. Botany 89: 429–437
Kent WJ (2002) BLAT: the BLAST-like alignment tool. Genome Res 12:
656–664
Koger CH, Reddy KN (2005) Role of absorption and translocation in the
mechanism of glyphosate resistance in horseweed (Conyza canadensis).
Weed Sci 53: 84–89
Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G,
Wang Z, Rasko DA, McCombie WR, Jarvis ED, et al (2012) Hybrid
error correction and de novo assembly of single-molecule sequencing
reads. Nat Biotechnol 30: 693–700
Kump KL, Bradbury PJ, Wisser RJ, Buckler ES, Belcher AR, OropezaRosas MA, Zwonitzer JC, Kresovich S, McMullen MD, Ware D, et al
(2011) Genome-wide association study of quantitative resistance to
southern leaf blight in the maize nested association mapping population. Nat Genet 43: 163–168
Lai Z, Kane NC, Kozik A, Hodgins KA, Dlugosch KM, Barker MS,
Matvienko M, Yu Q, Turner KG, Pearl SA, et al (2012) Genomics of
Compositae weeds: EST libraries, microarrays, and evidence of introgression. Am J Bot 99: 209–218
Lee RM, Thimmapuram J, Thinglum KA, Gong G, Hernandez AG,
Wright CL, Kim RW, Mikel M, Tranel PJ (2009) Sampling the waterhemp (Amaranthus tuberculatus) genome using pyrosequencing technology. Weed Sci 57: 463–469
Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, et al
(2010a) The sequence and de novo assembly of the giant panda genome.
Nature 463: 311–317
Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G,
Kristiansen K, et al (2010b) De novo assembly of human genomes with
massively parallel short read sequencing. Genome Res 20: 265–272
Main CL, Mueller TC, Hayes RM, Wilkerson JB (2004) Response of selected horseweed (Conyza canadensis (L.) Cronq.) populations to glyphosate. J Agric Food Chem 52: 879–883
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA,
Berka J, Braverman MS, Chen YJ, Chen Z, et al (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:
376–380
Mueller TC, Massey JH, Hayes RM, Main CL, Stewart CN Jr (2003) Shikimate accumulates in both glyphosate-sensitive and glyphosateresistant horseweed (Conyza canadensis L. Cronq.). J Agric Food Chem
51: 680–684
Okada M, Hanson BD, Hembree KJ, Peng Y, Shrestha A, Stewart CN Jr,
Wright SD, Jasieniuk M (2013) Evolution and spread of glyphosate
resistance in Conyza canadensis in California. Evol Appl 6: 761–777
Owen MD, Zelaya IA (2005) Herbicide-resistant crops and weed resistance
to herbicides. Pest Manag Sci 61: 301–311
Pandit MK, White SM, Pocock MJO (2014) The contrasting effects of genome size, chromosome number and ploidy level on plant invasiveness:
a global analysis. New Phytol 203: 697–703
Paterson AH, Freeling M, Tang H, Wang X (2010) Insights from the
comparison of plant genome sequences. Annu Rev Plant Biol 61: 349–372
Peng Y, Abercrombie LLG, Yuan JS, Riggins CW, Sammons RD, Tranel
PJ, Stewart CN Jr (2010) Characterization of the horseweed (Conyza
canadensis) transcriptome using GS-FLX 454 pyrosequencing and its
application for expression analysis of candidate non-target herbicide
resistance genes. Pest Manag Sci 66: 1053–1062
Pimentel D, Lach L, Zuniga R, Morrison D (2000) Environmental and
economic costs of nonindigenous species in the United States. Bioscience
50: 53–65
Preston C, Wakelin AM (2008) Resistance to glyphosate from altered
herbicide translocation patterns. Pest Manag Sci 64: 372–376
Riar DS, Norsworthy JK, Johnson DB, Scott RC, Bagavathiannan M
(2011) Glyphosate resistance in a johnsongrass (Sorghum halepense) biotype from Arkansas. Weed Sci 59: 299–304
Riggins CW, Peng Y, Stewart CN Jr, Tranel PJ (2010) Characterization of
waterhemp transcriptome using 454 pyrosequencing and its application
for studies of herbicide target-site genes. Pest Manag Sci 66: 1042–1052
Rojano-Delgado AM, Cruz-Hipolito H, De Prado R, Luque de Castro
MD, Franco AR (2012) Limited uptake, translocation and enhanced
metabolic degradation contribute to glyphosate tolerance in Mucuna
pruriens var. utilis plants. Phytochemistry 73: 34–41
Sammons RD, Gaines TA (2014) Glyphosate resistance: state of knowledge. Pest Manag Sci 70: 1367–1377
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL,
Song Q, Thelen JJ, Cheng J, et al (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C,
Zhang J, Fulton L, Graves TA, et al (2009) The B73 maize genome:
complexity, diversity, and dynamics. Science 326: 1112–1115
Seabury CM, Dowd SE, Seabury PM, Raudsepp T, Brightsmith DJ,
Liboriussen P, Halley Y, Fisher CA, Owens E, Viswanathan G, et al
(2013) A multi-platform draft de novo genome assembly and comparative
analysis for the scarlet macaw (Ara macao). PLoS ONE 8: e62415
Shaner DL (2009) the role of translocation as a mechanism of resistance to
glyphosate. Weed Sci 57: 118–123
Shields EJ, Dauer JT, VanGessel MJ, Neumann G (2006) Horseweed
(Conyza canadensis) seed collected in the planetary boundary layer. Weed
Sci 54: 1063–1067
Sterck L, Rombauts S, Vandepoele K, Rouzé P, Van de Peer Y (2007) How
many genes are there in plants (...and why are they there)? Curr Opin
Plant Biol 10: 199–203
Stewart CN Jr, editor (2009) Weedy and Invasive Plant Genomics. WileyBlackwell, Ames, IA
Stewart CN Jr, Tranel PJ, Horvath DP, Anderson JV, Rieseberg LH,
Westwood JH, Mallory-Smith CA, Zapiola ML, Dlugosch KM (2009)
Evolution of weediness and invasiveness: charting the course for weed
genomics. Weed Sci 57: 451–462
Tian X, Zheng J, Hu S, Yu J (2006) The rice mitochondrial genomes and
their variations. Plant Physiol 140: 401–410
Todesco M, Balasubramanian S, Hu TT, Traw MB, Horton M, Epple P,
Kuhns C, Sureshkumar S, Schwartz C, Lanz C, et al (2010) Natural
allelic variation underlying a major fitness trade-off in Arabidopsis thaliana.
Nature 465: 632–636
Truco MJ, Ashrafi H, Kozik A, van Leeuwen H, Bowers J, Wo SRC,
Michelmore RW (2013) An ultra-high-density, transcript-based, genetic
map of lettuce. G3 (Bethesda) 3: 617–631
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U,
Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome
of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:
1596–1604
Plant Physiol. Vol. 166, 2014
1253
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
Peng et al.
VanGessel MJ (2001) Glyphosate-resistant horseweed from Delaware.
Weed Sci 49: 703–705
Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA,
Donoghue MT, Azam S, Fan G, Whaley AM, et al (2012) Draft genome
sequence of pigeonpea (Cajanus cajan), an orphan legume crop of
resource-poor farmers. Nat Biotechnol 30: 83–89
Varshney RK, Song C, Saxena RK, Azam S, Yu S, Sharpe AG, Cannon S,
Baek J, Rosen BD, Tar’an B, et al (2013) Draft genome sequence of
chickpea (Cicer arietinum) provides a resource for trait improvement.
Nat Biotechnol 31: 240–246
Vila-Aiub MM, Balbi MC, Distéfano AJ, Fernández L, Hopp E, Yu Q,
Powles SB (2012) Glyphosate resistance in perennial Sorghum halepense
(johnsongrass), endowed by reduced glyphosate translocation and leaf
uptake. Pest Manag Sci 68: 430–436
Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S,
Neutelings G, Datla R, et al (2012) The genome of flax (Linum usitatissimum)
assembled de novo from short shotgun sequence reads. Plant J 72: 461–473
Warwick SI, Stewart CN Jr (2005) Crops come from wild plants: how domestication, transgenes, and linkage effects shape ferality. In J Gressel, ed,
Crop Ferality and Volunteerism. CRC Press, Boca Raton, FL, pp 9–30
Weaver SE (2001) The biology of Canadian weeds. 115. Conyza canadensis.
Can J Plant Sci 81: 867–875
Weaver SE, McWilliams EL (1980) The biology of Canadian weeds. 44.
Amaranthus retroflexus L., A. powellii S. Wats. and A. hybridus L. Can J
Plant Sci 60: 1215–1234
Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT (2010) SNP discovery
by high-throughput sequencing in soybean. BMC Genomics 11: 469
Wyman SK, Jansen RK, Boore JL (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255
Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang
X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp.
indica). Science 296: 79–92
Yuan JS, Abercrombie LG, Cao Y, Halfhill MD, Zhou X, Peng Y, Hu J,
Rao MR, Heck GR, Larosa TJ, et al (2010) Functional genomics analysis
of glyphosate resistance in Conyza canadensis (horseweed). Weed Sci 58:
109–117
Yuan JS, Tranel PJ, Stewart CN Jr (2007) Non-target-site herbicide resistance: a family business. Trends Plant Sci 12: 6–13
Zelaya IA, Owen MDK, Vangessel MJ (2004) Inheritance of evolved
glyphosate resistance in Conyza canadensis (L.) Cronq. Theor Appl Genet
110: 58–70
Zhou X, Su Z, Sammons RD, Peng Y, Tranel PJ, Stewart CN Jr, Yuan JS
(2009) Novel software package for cross-platform transcriptome analysis
(CPTRA). BMC Bioinformatics (Suppl 11) 10: S16
1254
Plant Physiol. Vol. 166, 2014
Downloaded from on June 17, 2017 - Published by www.plantphysiol.org
Copyright © 2014 American Society of Plant Biologists. All rights reserved.
CORRECTIONS
Vol. 166: 1241–1254, 2014
Peng Y., Lai Z., Lane T., Nageswara-Rao M., Okada M., Jasieniuk M., O’Geen H., Kim R.W.,
Sammons R.D., Rieseberg L.H., and Stewart C.N. Jr. De Novo Genome Assembly of the
Economically Important Weed Horseweed Using Integrated Data from Multiple Sequencing Platforms.
On p. 1243 of this article, the assembled draft of horseweed genome data submitted to the
National Center for Biotechnology Information is incorrectly listed as having the accession
number SUB535309.
The correct accession number for the Whole Genome Shotgun project deposited at DDBJ/
EMBL/GenBank is JSWR00000000.
www.plantphysiol.org/cgi/doi/10.1104/pp.114.900500
2218
Plant PhysiologyÒ, December 2014, Vol. 166, p. 2218, www.plantphysiol.org Ó 2014 American Society of Plant Biologists. All Rights Reserved.