Plant Cell Advance Publication. Published on November 1, 2016, doi:10.1105/tpc.16.00353
LARGE-SCALE BIOLOGY ARTICLE
Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic
and Transcriptome Diversity in Maize
Candice N. Hirsch (a,1), Cory D. Hirsch (b), Alex B. Brohammer (a), Megan J. Bowman (c), Ilya Soifer (d),
Omer Barad (e), Doron Shem-Tov (e), Kobi Baruch (e), Fei Lu (f), Alvaro G. Hernandez (g),
Christopher J. Fields (g), Chris L. Wright (g), Klaus Koehler (h), Nathan M. Springer (i),
Edward Buckler (f,j), C. Robin Buell (c,k), Natalia de Leon (l,m), Shawn M. Kaeppler (l,m),
Kevin L. Childs (c,n), Mark A. Mikel (g,o)
(a) Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
(b) Department of Plant Pathology, University of Minnesota, St. Paul, MN 55108
(c) Department of Plant Biology, Michigan State University, East Lansing, MI 48824
(d) Calico Labs, San Francisco, CA, 94080
(e) NRGENE LTD., Ness-Ziona, Israel, 7403648
(f) Institute for Genome Diversity, Cornell University, Ithaca NY 14850
(g) Roy J. Carver Biotechnology Center, University of Illinois, Urbana, IL 61801
(h) Dow AgroSciences, Indianapolis, IN 46268
(i) Department of Plant Biology, University of Minnesota, St. Paul, MN 55108
(j) United States Department of Agriculture/Agricultural Research Services, Ithaca NY 14850
(k) DOE Great Lakes Bioenergy Research Center, East Lansing, MI, 48824
(l) Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706
(m) DOE Great Lakes Bioenergy Research Center, Madison, WI, 53706
(n) Center for Genomics-Enabled Plant Sciences, Michigan State University, East Lansing, MI 48824
(o) Department of Crop Sciences, University of Illinois, Urbana, IL 61801
(1) Corresponding Author: [email protected]
Short title: PH207 genome assembly
One-sentence summary: Comparative analyses of the maize reference B73 genome assembly
and the newly assembled PH207 genome and their transcriptomes provide insights into
variation between heterotic groups of elite maize.
The author responsible for distribution of materials integral to the findings presented in this
article in accordance with the policy described in the Instructions for Authors
(www.plantcell.org) is: Candice Hirsch ([email protected])
ABSTRACT
Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred
lines that combine to produce high-yielding hybrids. To further our understanding of how
genome and transcriptome variation contribute to the production of high-yielding hybrids, we
generated a draft genome assembly of the inbred line PH207 to complement and compare with
the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while
PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the
production of temperate commercial maize and are combined to make heterotic hybrids.
Comparison of these two assemblies revealed over 2,500 genes present in only one of the two
genotypes and 136 gene families that have undergone extensive expansion or contraction.
Transcriptome profiling revealed extensive expression variation, with as many as 10,564
differentially expressed transcripts and 7,128 transcripts expressed in only one of the two
genotypes in a single tissue. Genotype-specific genes were more likely to have
tissue/condition-specific expression and lower transcript abundance. The availability of a
high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the
breadth of natural genome and transcriptome variation in elite maize inbred lines across
heterotic pools.
©2016 American Society of Plant Biologists. All Rights Reserved
INTRODUCTION
Access to genome assemblies has ushered in a new era in plant biology beginning with the
genome assembly of the model species Arabidopsis thaliana (Arabidopsis Genome, 2000). Since
that time, over 80 A. thaliana genome assemblies have been completed, revealing genome
content and allelic variation, the importance of genotype-specific annotation, and the regulation
of gene expression (http://1001genomes.org; Gan et al., 2011). Rice (Oryza sativa) is the only
plant species other than Arabidopsis with a ‘gold standard’ genome assembly (International Rice
Genome Sequencing, 2005). As with A. thaliana, multiple de novo rice genome assemblies have
now been generated, and through these assemblies, several megabases of sequences present in
only one individual have been identified, many of which were annotated to contain genes (Schatz
et al., 2014). Beyond these studies, extensive structural variation including differences in gene
copy number and presence/absence variation has been documented in many species (Brunner et
al., 2005; Morgante et al., 2007; Ossowski et al., 2008; Gore et al., 2009; Springer et al., 2009;
Weigel and Mott, 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Cao et al., 2011; Gan et
al., 2011; Chia et al., 2012; Hansey et al., 2012; Anderson et al., 2014; Hirsch et al., 2014; Schatz
et al., 2014; Hirakawa et al., 2015; Hardigan et al., 2016).

In maize (Zea mays), comparative genome hybridization revealed pervasive copy number
variation (CNV) and presence/absence variation (PAV) between two heterotic inbred lines (B73
and Mo17), including a ~2.6 Mb region on chromosome 6 that was present in B73 and absent in
Mo17 (Springer et al., 2009; Belo et al., 2010). In an expanded panel of 34 maize and teosinte
lines examined using comparative genome hybridization, nearly 4,000 instances of CNV/PAV
were observed (Swanson-Wagner et al., 2010). Likewise, the first-generation maize HapMap
(haplotype map) estimated that the B73 genome represents only about 70% of the total low copy
sequence in the maize pan-genome (Gore et al., 2009), and the second-generation maize HapMap
also identified pervasive structural variation (Chia et al., 2012). Re-sequencing of six elite maize
inbred lines identified several hundred genes that exhibit PAV, and in some cases, show heterotic
group specificity (Lai et al., 2010). Transcriptome profiling of maize lines has also revealed
extensive differences in transcriptome content. Profiling of 21 diverse inbred lines identified
1,321 non-reference loci, of which 145 were heterotic-group specific (Hansey et al., 2012).
Transcript profiling of seedling tissue from 503 diverse inbred lines was used to characterize the
maize pan-genome and pan-transcriptome (Hirsch et al., 2014), and nearly 9,000 novel loci
absent in the B73 reference genome assembly were identified. Furthermore, a genome-wide
association study using transcript presence/absence and transcript abundance as the independent
variable showed complementary and unique loci compared to those obtained from a
genome-wide association study using single nucleotide polymorphism (SNP) markers (Hirsch et
al., 2014).

Maize has been intensively bred in the hybrid seed industry for nearly a century. Breeding
efforts in maize focus on improving inbred lines that combine well to produce high-yielding
hybrids and have resulted in the development of distinct heterotic germplasm pools. Indeed, elite
maize inbred lines have been generated that combine to form hybrid genotypes that yield 70-fold
more than the open pollinated varieties from which they originated (Troyer, 2006). Increases in
yield performance continue at a consistent rate, indicating persistence of extensive genetic
variation in elite maize germplasm. Maize heterotic pools consist primarily of a stiff stalk
heterotic pool from which the existing reference maize genome assembly, B73, is derived and
non-stiff stalk heterotic pools. In the past 20 years, Iodent germplasm has been a main
contributor to the non-stiff stalk heterotic pool. Iodent germplasm used today originated from
Pioneer Hi-Bred International germplasm in the 1940s (Troyer, 1999) and now permeates across
proprietary breeding programs. A key founder line to Iodent germplasm is the inbred line PH207.
The hybrid generated by crossing B73 and PH207 produces high parent heterosis across a
number of phenotypic traits (Table 1). Among the 788 U.S. Plant Variety Protection/utility
patent commercial corn inbred lines registered between 2009 and 2013, B73 (stiff stalk) was in
the pedigree of 327 lines and PH207 (Iodent) was in the pedigree of 441 lines (Mikel, 2011).
Among stiff stalks, 92% had B73 as an ancestor, and 91% of the non-stiff stalk lines were
descendants of the Iodent PH207.

A de novo genome assembly of the maize inbred line B73 was released in 2009 (Schnable
and Ware et al., 2009), and to date this has been the only publicly available complete maize
whole genome assembly. Previous studies of diversity in maize have been conducted in the
context of this single reference genome assembly, which has the potential to introduce a
reference bias. Here, we present the assembly of elite inbred line PH207, as well as a
comparative analysis between the genomes and transcriptomes of these elite maize inbred lines,
to further our understanding of natural variation present in elite maize inbred lines that have been
selected to combine and produce high-yielding hybrids. This study provides a high-quality
assembly to interrogate genome and transcriptome variation in maize, an important crop and
model species, and to begin to understand how this genomic variation contributes to the
phenotypic diversity and heterosis that are observed between elite inbred lines.

RESULTS
Assembly of the PH207 genome
Over 550 Gb of short read sequences from the inbred line PH207 were generated through
whole-genome shotgun sequencing using paired-end (PE), mate-pair (MP), and TruSeq synthetic
long-read genomic libraries with estimated fragment sizes ranging from 330 bp to 15 kb
(Supplemental Table 1). On the basis of 23-mer analysis, the genome size of PH207 was
estimated to be ~2.45 Gb (Supplemental Figure 1), which is comparable to the 2.3 Gb estimated
genome size of B73 reported by the maize genome sequencing consortium (Schnable and Ware
et al., 2009). Using the estimated size of 2.45 Gb, the short read sequences generated for the
PH207 assembly provide ~230X theoretical coverage of the genome.
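
The theoretical coverage quoted above is the total number of sequenced bases divided by the estimated genome size. A minimal sketch of that arithmetic follows; the function name is hypothetical, and 550 Gb is used only because it is the lower bound given in the text ("over 550 Gb"), so the result falls slightly below the reported ~230X.

```python
def theoretical_coverage(total_bases: float, genome_size: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: 550e9 is the stated lower bound on sequenced bases, not the exact total.
coverage = theoretical_coverage(550e9, 2.45e9)
print(f"at least {coverage:.0f}X theoretical coverage")
```

Because the exact read total is reported only in Supplemental Table 1, this value should be read as a floor rather than as a reproduction of the published figure.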

A de novo assembly with an N50 scaffold size of ~654 kb with approximately 16% unfilled
gaps was generated (Supplemental Table 2). Most of the gaps spanned repetitive regions that
could not be assembled into scaffolds. The total size of the assembly was 2.1 Gb, of which 1.7
Gb was ungapped genomic sequence and 0.4 Gb was unfilled gaps. The PH207 de novo
assembled scaffolds were then organized into pseudomolecules based on the B73 reference
genome assembly using alignment of the scaffolds to the B73 assembly. In total, 1.99 Gb of the
PH207 assembly was placed into the pseudomolecules based on alignment to the B73 reference
assembly. Lu et al. (2015) described a genotyping-by-sequencing anchoring pipeline that utilized
linkage disequilibrium mapping coupled with machine learning to generate a set of 4.4M tags
that can be used as genetic anchors. These 4.4M tags provided a median resolution of
approximately 10 kb (average tag every 500 bp) and have been shown to be accurate for a given
line 95% of the time, with the other 5% most likely reflecting a species consensus position that is
different from the line. Overall, about 54% of the anchors are within 10 kb regions of their true
position, 95% within 1 Mb, and 98.6% within 10 Mb (Lu et al., 2015). To assess the quality of the
assembly, the PH207 de novo assembled scaffolds were processed using the pan-genome genetic
anchor pipeline (Lu et al., 2015) to flag scaffolds that may be the result of chimeras from
disparate chromosome locations; approximately 21% of the tags could be aligned. Using this
method, as well as the alignments of the PH207 scaffolds to the B73 genome, only 120 scaffolds
were identified as putative misassemblies and were corrected by splitting them at the junction of
the putative misassembly. This marginally decreased the assembly N50 scaffold size to ~630 kb
(Supplemental Table 2). This final assembly has been named
Zm-PH207-REFERENCE_NS-UIUC_UMN-1.0 and is available for download at the Dryad Digital
Repository (DOI 10.5061/dryad.8vj84) and is also available at the Maize Genetics and Genomics
Database (MaizeGDB; http://www.maizegdb.org) and at Phytozome (https://phytozome.jgi.doe.gov).
Comparison of the B73 reference genome assembly and the final PH207 genome assembly
revealed numerous structural variants between the two genomes. However, few large gaps in the
assemblies were observed (Figure 1A).
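
The N50 statistic used above is the scaffold length at which scaffolds of that length or longer contain at least half of the assembled bases. The sketch below computes it in the standard way and shows, on hypothetical toy lengths rather than PH207 data, why splitting a scaffold at a putative misassembly can lower N50.

```python
def n50(lengths):
    """Return the length L such that scaffolds of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths (not taken from the PH207 assembly).
before_split = [500, 400, 300, 200, 100]
# Splitting the 500-unit scaffold into two 250-unit pieces keeps the total unchanged.
after_split = [250, 250, 400, 300, 200, 100]

print(n50(before_split), n50(after_split))
```

The total assembly size is unchanged by splitting, so only the length distribution, and therefore N50, is affected.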

Several approaches were used to further validate the PH207 assembly completeness and
error rate. Genomic sequence reads used to generate the assembly were aligned back to the
assembly, and 97.1% of the paired-end short read sequences could align to at least one position
in the genome, demonstrating the completeness of the assembly. For the B73 reference genome
assembly, comparable mapping was observed, with 98.0% of paired-end short read sequences
aligning to at least one position in the genome. From these alignments, only 33,812 SNPs or
insertion/deletions (InDels) were identified from alignment of the paired-end short-read
sequences to the PH207 assembly, which equates to less than 0.002% of the total genome
assembly, and indicates a low error rate in the assembly. To evaluate completeness of the genic
space, two approaches were used: alignment of RNAseq reads to the assembly and the Core
Eukaryotic Genes Mapping Approach pipeline (Parra et al., 2007). From alignment of RNAseq
reads from six PH207 tissues (leaf blade, root cortical parenchyma, germinating kernel, root tip,
whole seedling, and root stele), 96.3% of reads on average could map to at least one position in
the genome, and 87.1% mapped to a single unique position (Supplemental Figure 2). Alignment
of B73 RNAseq reads from these same tissues to the B73 genome assembly resulted in 92.2% of
reads mapping to at least one position on average. The Core Eukaryotic Genes Mapping
Approach pipeline was run for both the PH207 and B73 assemblies and revealed similar
representation of the gene space between the assemblies (PH207: 91.5% complete genes and
99.2% partial genes; B73: 91.5% complete genes and 98.4% partial genes). Taken together, these
analyses demonstrate the high-quality nature of the PH207 genome assembly and support its
utility in downstream comparative analyses with the existing B73 reference genome assembly.
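
The "less than 0.002%" figure above can be checked with simple arithmetic. The sketch below assumes each SNP or InDel counts as one discordant site and tries two denominators from the text, the 2.1 Gb total assembly and the 1.7 Gb of ungapped sequence; the function name is hypothetical.

```python
def discordant_percentage(n_variants: int, assembly_bp: float) -> float:
    """Percentage of assembly positions flagged as a SNP or InDel (one site per variant assumed)."""
    return 100.0 * n_variants / assembly_bp


# Values taken from the text; which denominator the authors used is an assumption here.
pct_total = discordant_percentage(33_812, 2.1e9)     # total assembly size
pct_ungapped = discordant_percentage(33_812, 1.7e9)  # ungapped sequence only

print(f"{pct_total:.5f}% of total, {pct_ungapped:.5f}% of ungapped")
```

Under either denominator the value stays below 0.002%.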

Gene annotation of the PH207 genome
Structural gene annotation of the PH207 genome assembly was performed on all PH207
scaffolds greater than 500 bp using the MAKER-P pipeline (Campbell et al., 2014b). De novo
PH207 transcript assemblies of RNAseq reads from six different tissues (leaf blade, root cortical
parenchyma, root stele, germinating kernel, root tip, whole seedling) were used as transcript
evidence to aid in gene identification. Additionally, the predicted O. sativa proteome and
UniProtKB/Swiss-Prot plant proteins (minus Z. mays sp. proteins) were aligned to the PH207
genome assembly and used as evidence for gene prediction (Kawahara et al., 2013; UniProt
Consortium, 2014). Evidence from other maize accessions was not used in this annotation, so
that the annotation would be genotype specific and bias would be reduced in downstream
comparisons between the PH207 and B73 gene space. In total, we identified 40,557 high-quality
gene models in PH207 supported by aligned transcript evidence, protein evidence, or the
presence of a Pfam domain (Finn et al., 2014). These gene models were distributed throughout
the genome (Figure 1B) in a distribution similar to that observed in the B73 reference genome
assembly (Schnable and Ware et al., 2009; Law et al., 2015). The annotation edit distance
analysis of the predicted gene set indicated that the gene annotations were generally well
supported (Supplemental Figure 3) (Eilbeck et al., 2009; Yandell and Ence, 2012). The annotated
transcripts from the PH207 gene set had an average length of 1,294 bp, with an N50 size of
1,685 bp, slightly smaller than the average annotated transcript in the B73 v3 filtered gene set
(1,559 bp with an N50 size of 1,910 bp). This difference is likely due to better representation of
the untranslated regions in B73 resulting from the extensive transcript evidence that was used for
the B73 annotation. Indeed, the average predicted protein length of PH207 (353 amino acids) is
nearly identical to that of B73 (352 amino acids), suggesting that the PH207 sequence represents
a similar coding sequence as compared to B73. Regardless, the number of annotated genes in
PH207 was very similar to that of the B73 v3 gene set, which contains 39,301 nuclear genes.

Putative functional annotation of the predicted PH207 gene models was generated using
transitive annotations based on the best BLAST hit to the A. thaliana (33,388), Brachypodium
distachyon (37,364), O. sativa (37,201), and Sorghum bicolor (37,998) reference genome
annotations and the UniRef100 database (38,710). The functional annotations for many of the
genes in these plant species are derived from A. thaliana annotation directly or indirectly. Thus, a
plurality of agreement among datasets can in some cases be misleading for assigning functional
annotations. Additionally, the multigene family nature of many genes in plant species can result
in misannotation based on transitive annotation from best BLAST hits. Gene Ontology terms
were also assigned to 18,430 of the PH207 gene models (Supplemental Data Set 1).

Elite maize inbred lines exhibit extensive genome content variation including massive
expansion of gene families
Access to two high-quality genome assemblies allowed us to comprehensively explore
genome content variation between heterotic maize inbred lines, both in terms of
presence/absence as well as copy number variation. To identify dispensable genes between these
two lines, representative transcript sequences (longest transcript) from each genotype were
aligned to both genome assemblies. Within the subset of transcripts that could map to their
cognate genome, 5,291 B73 transcripts did not align to PH207 and 5,029 PH207 transcripts did
not align to B73. While both the B73 assembly and our PH207 assembly are high quality, these
assemblies are not 100% complete (Schnable and Ware et al., 2009; Lai et al., 2010; Hansey et
al., 2012; Hirsch et al., 2014). As such, direct comparison of the genome assemblies
overestimates the number of PAVs between the two inbred lines. Indeed, approximately half of
the PAVs identified through direct comparison of the assemblies had sequence reads from whole
genome sequencing that did not support the PAV classification.

To determine how many genes represent sequences absent in their respective genomes and to
provide a conservative estimation of genome PAV between B73 and PH207, all genes from each
genotype were categorized as present or absent based on the resequencing data (~11x average
realized coverage from B73 and ~5x average realized coverage from PH207). If a gene had
greater than 75% coverage in its cognate genotype and less than 25% coverage in the reciprocal
genotype at a minimum of 1x coverage, it was considered to be a genotype-specific gene. This is
a very strict criterion and eliminated a number of genes that are unmappable with short read
sequencing technologies; however, it provides a high confidence method for identifying
dispensable genes. Based on this criterion, 1,169 genes were B73 specific and 1,545 were PH207
specific. An enrichment of cellular response to stress was observed among the enriched Gene
Ontology terms within this set of PAV genes (Supplemental Data Set 2). Additionally,
dispensable genes tended to be shorter than genes that were present in both genotypes (Figure
2A). While a large number of genes demonstrated clear PAV, there are many genes that appear
to be partial deletions in the reciprocal genome (Supplemental Figure 4). Indeed, 12.2% of the
B73 genes were partially deleted in PH207 and 16.2% of the PH207 genes were partially deleted
in B73 (Figure 2B).
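
The coverage criterion described above can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' pipeline: per-base read depths are taken as plain lists, the function names are hypothetical, and the "shared" and "unclassified" labels are added here only to make the function total (the text defines just the genotype-specific rule).

```python
def covered_fraction(depths, min_depth=1):
    """Fraction of gene-model positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must be non-empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def classify_gene(cognate_depths, reciprocal_depths,
                  present_min=0.75, absent_max=0.25):
    """Apply the >75% cognate / <25% reciprocal coverage rule from the text."""
    cognate = covered_fraction(cognate_depths)
    reciprocal = covered_fraction(reciprocal_depths)
    if cognate > present_min and reciprocal < absent_max:
        return "genotype-specific"
    if cognate > present_min and reciprocal > present_min:
        return "shared"  # label assumed for illustration
    return "unclassified"  # e.g. partial deletions or unmappable genes


# Hypothetical 10-bp gene models: fully covered in the cognate genotype,
# one covered position (10%) in the reciprocal genotype.
print(classify_gene([5] * 10, [1] + [0] * 9))
```

Genes that fall between the two thresholds in the reciprocal genotype land in the catch-all class, which is one way to see why the criterion is described as strict.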

It has previously been suggested that dispensable genes provide an extreme case of
complementation, supporting the dominance model of heterosis, in which inferior recessive
alleles contributed by one parent are complemented by superior dominant alleles contributed by
the other parent (Fu and Dooner, 2002; Lai et al., 2010; Swanson-Wagner et al., 2010). Thus,
there should be an enrichment of dispensable genes in low recombination regions of the genome
(i.e., near centromeres). A significant enrichment of PAV genes in the pericentromeric regions
(30.9% of PAV genes) relative to all genes in the pericentromeric regions (25.9%) was observed
(chi-square p-value < 0.0001). However, in terms of total gene number, more genes annotated as
PAV between B73 and PH207 were located within high recombination regions of the genome
(Figure 2C).
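
A chi-square comparison of this kind can be sketched with the standard library alone. The example below is a one-degree-of-freedom goodness-of-fit test under loud assumptions: the PAV gene total is taken as 1,169 + 1,545 from the text, the pericentromeric count is back-calculated from the reported 30.9%, and the genome-wide 25.9% is treated as a fixed expectation. The exact counts and test design used by the authors are not given in this section.

```python
import math


def chi_square_gof_1df(observed_in, total, expected_frac):
    """Chi-square goodness-of-fit for a two-category split (1 degree of freedom)."""
    expected_in = total * expected_frac
    expected_out = total - expected_in
    observed_out = total - observed_in
    stat = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


total_pav = 1169 + 1545                      # genotype-specific genes from the text
pericentromeric = round(0.309 * total_pav)   # assumed count, back-calculated from 30.9%
stat, p_value = chi_square_gof_1df(pericentromeric, total_pav, 0.259)
print(f"chi-square = {stat:.1f}, p = {p_value:.2e}")
```

With these assumed counts the p-value is well below 0.0001.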

Genome content variation can also arise from differences in CNV leading to variation in gene
family size between genotypes. To evaluate the difference in gene family size between B73 and
PH207, genes from both genotypes were clustered using the OrthoMCL algorithm (Li et al.,
2003; Chen et al., 2007). As was expected, consistent gene family size was observed for the
majority of the genes in both genomes, with 19,697 genes showing a direct one-to-one
relationship (Figure 3A). Another 1,720 families showed only modest changes in gene family
size (sizes between 1 and 5 in both individuals; Figure 3B). Interestingly, substantial expansion
and contraction of gene families was observed for 136 families, where large expansion was
defined as being present in both individuals but having a difference of five or more family
members between the genotypes (Figure 3C). In fact, one family in PH207 contained 214 family
members, while B73 had only six genes in this family. Within the genotype that had the
expanded family, on average 39.7% of the genes within the family were expressed in at least one
of six tissues tested (leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip,
whole seedling) when requiring reads to align uniquely. This number is likely an underestimation
of the true frequency of expression within these families due to the limitations of unique
alignments with short reads in large gene families (Hirsch et al., 2015). Across the families that
showed large changes in gene family size, an enrichment of Gene Ontology terms related to stress
response was observed, among other functions (Supplemental Data Set 2). A large proportion of
the additional gene family members were located in non-syntenic positions relative to the copies
found in the genotype with the smaller family size (Figure 3D). There are a number of helitrons
that are currently annotated as genes but are likely nonfunctional and susceptible to deletion.
These non-syntenic expanded families might be a reflection of the action of helitrons. However,
examples of expanded families that were completely clustered in a single location (Figure 3E),
both clustered and dispersed (Figure 3F), and completely dispersed (Figure 3G) were all
observed, indicating that these results cannot be explained simply by the annotation of helitrons
or fragments from helitrons as genes. The number of introns was evaluated to determine if the
additional copies were primarily the result of retrotransposed transcripts. Within each class
(clustered, clustered and dispersed, and dispersed), both genes without introns as well as genes
with at least one intron were observed (Figure 3E-G), indicating that retrotransposition is not the
primary mechanism driving the expansion of these gene families.
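
The family-size categories described above can be expressed as a small rule set. The sketch below assumes that per-genotype member counts for each family are already available (for example, tallied from a clustering output); the function name and the "genotype-specific" and "other" labels are hypothetical additions for completeness and are not categories named in the text.

```python
def classify_family(b73_count: int, ph207_count: int) -> str:
    """Categorize a gene family by its member counts in B73 and PH207."""
    if b73_count == 0 or ph207_count == 0:
        return "genotype-specific"          # assumed label: absent from one genotype
    if b73_count == 1 and ph207_count == 1:
        return "one-to-one"
    if abs(b73_count - ph207_count) >= 5:
        return "large expansion/contraction"  # present in both, difference of five or more
    if b73_count != ph207_count and max(b73_count, ph207_count) <= 5:
        return "modest change"              # sizes between 1 and 5 in both genotypes
    return "other"                          # assumed label for remaining cases


# The extreme family from the text: six members in B73, 214 in PH207.
print(classify_family(6, 214))
```

Because both sizes in the modest class lie between 1 and 5, their difference is at most four, so the modest and large categories cannot overlap.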

Genome content variation drives transcriptional variation between elite maize inbred lines
The availability of two maize genome sequence assemblies provided an opportunity to
evaluate sources of transcriptional variation between genotypes. We evaluated variation in the
transcriptome profiles of six tissues (described above) in the context of both the B73 and PH207
gene models (Supplemental Data Set 3). In total, 84.9% of genes were expressed in at least one
of the tissues in at least one of the two genotypes (Table 2). A relatively large number of genes
(9.4%) were consistently expressed in a genotype-specific manner, while another 23.3% were
genotype specific only in a subset of tissues. Furthermore, 52.5% of genes showed quantitative
differential expression between the two genotypes in at least one tissue, and 4.5% of genes
showed quantitative differential expression between the two genotypes across all of the tissues.
In total, 69.8% of genes showed either quantitative or qualitative variation in transcript
abundance in at least one of the six tissues that were evaluated, demonstrating that extensive
expression variation is present in the transcriptomes of elite maize inbred lines.

The comparison of the B73 and PH207 genome assemblies provided an opportunity to
evaluate the contribution of various types of genomic differences to the extensive transcriptional
variation that is present in highly selected complementary maize inbred lines. In total, 9.0% of
B73 models and 9.8% of PH207 models showed genotype-specific expression across all of the
tissues. For 11.6% of these genes, genomic-level PAV was also observed and was the basis for
the observed transcriptional variation. Genes showing genomic-level PAV had more
tissue-specific expression (Figure 4A) and had lower average expression when compared to genes
that were present in the genomes of both individuals (Figure 4B). This observation is further
supported using the B73 gene atlas, which includes transcript abundance estimates for 79 B73
tissues throughout development (Supplemental Figure 5; Stelpflug et al., 2016). Small
transcripts (less than 300 bp) are undersampled and therefore have lower estimated transcript
abundance due to technical biases resulting from size selection during library preparation (Hirsch
et al., 2015). However, based on the distribution of transcript size for genotype-specific and
shared genes, the lower observed expression of genotype-specific genes was not a product of
this bias (Figure 4C). The number of genes that were not expressed in any tissue was slightly
higher for single exon genes (29.0%) relative to the total number of non-expressed genes
(18.5%), and there are therefore other contributing factors to this observed distribution.

Extensive quantitative variation in transcript abundance was also observed. Multiple levels of
genomic variation that were observed between B73 and PH207 could underlie this variation,
such as partial fragmentation of genes (Figure 2B, Supplemental Figure 4), variation in gene
family size that may lead to neo- and sub-functionalization at the expression level (Figure 3), and
polymorphisms in the promoters between the two genotypes. Of the 5,383 B73 and 7,991 PH207
genes that showed partial deletions in the reciprocal genome (Figure 2B), 1,883 and 2,835 were
differentially expressed in at least one tissue, respectively. However, transcript abundance
estimates and differential expression were calculated assuming common transcript lengths
between the genotypes. To test the impact of this on differential gene expression, we calculated
coverage-corrected transcript abundance estimates. Based on this correction, 20.5% of the genes
that were originally differentially expressed in at least one tissue were no longer differentially
expressed in any of the tissues for genes that had 25% or more of the gene model deleted and/or
unmappable. These results demonstrate how differences in genome annotation can create biases
that can affect downstream analyses such as differential gene expression analysis. Still, a large
number of genes that show partial deletion between the two genomes were differentially
expressed even after implementing a coverage correction.
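
To illustrate why an assumed common transcript length can create apparent differential expression, the sketch below compares a standard FPKM calculation with a version that scales the gene length by the fraction of the model that is present and mappable. This is one plausible form of a coverage correction, given as an assumption; the exact correction used in the study is not described in this section, and all names and numbers are hypothetical.

```python
def fpkm(read_count: float, length_bp: float, library_size: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return read_count * 1e9 / (length_bp * library_size)


def coverage_corrected_fpkm(read_count, length_bp, library_size, covered_fraction):
    """FPKM using only the portion of the gene model that is present/mappable (assumed correction)."""
    if not 0 < covered_fraction <= 1:
        raise ValueError("covered_fraction must be in (0, 1]")
    return fpkm(read_count, length_bp * covered_fraction, library_size)


# Hypothetical 2,000-bp gene: intact in genotype A, half deleted in genotype B.
a = fpkm(100, 2000, 1e6)
b_uncorrected = fpkm(50, 2000, 1e6)
b_corrected = coverage_corrected_fpkm(50, 2000, 1e6, covered_fraction=0.5)
print(a, b_uncorrected, b_corrected)
```

In this toy case the uncorrected values differ two-fold, while the corrected values are equal, mirroring how a partial deletion alone could produce a differential expression call.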

Relationships between copy number variation, transcript abundance, and phenotypic
variation have been documented in multiple plant species on a single gene basis (Sutton et al.,
2007; Gaines et al., 2010; Cook et al., 2012; Maron et al., 2013; Wang et al., 2015). On a
genome-wide scale, within both B73 and PH207, we see a relationship between gene family size
and the number of tissues in which expression is observed (Figure 5). As family size increases,
the number of tissues in which expression is observed tends to decrease. Additionally, 29.6% of
genes that were differentially expressed in at least one tissue were from variably sized gene
families.

Finally, comparison of the B73 and PH207 assemblies uniquely allowed us to evaluate the
impact of context sequence, specifically promoter SNP and InDel variation, on transcriptional
variation on a genome-wide scale. For 6,379 of the one-to-one genes (Figure 3A), the 1 kb of
promoter sequence immediately upstream of the transcription start site could be accurately
compared, and sequence similarity between the promoters ranged from 82.1% to
100%, with 4,138 having at least one SNP or InDel in the 1 kb of promoter sequence (Figure
6A). Of these 4,138, 75.3% of the genes were differentially expressed in at least one tissue. As
the number of promoter variants increased, differential expression was observed in a larger
number of tissues for both the number of SNPs (Figure 6B) and the number of InDels (Figure
6C), indicating the importance of this local variation in driving transcriptional variation in these
elite inbred lines on a genome-wide scale.

DISCUSSION
Extensive genome content variation within maize has previously been documented (Gore et
al., 2009; Springer et al., 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Chia et al., 2012;
Hansey et al., 2012; Hirsch et al., 2014). However, these studies were limited by ascertainment
bias either by investigating only in the context of the reference B73 genome assembly (Springer
et al., 2009; Swanson-Wagner et al., 2010) or by the reduced representation approaches that were
implemented (Gore et al., 2009; Hansey et al., 2012; Hirsch et al., 2014). Additionally, these
studies focused on the broad diversity within maize, while the question of how much variation is
present within highly selected elite inbred lines remained unanswered. Access to genome
assemblies of the highly selected maize inbred lines B73 and PH207 allowed for a
comprehensive analysis of the extensive genome and transcriptome variation present in elite
maize germplasm following a century of intense selective pressure to improve grain yield.

The search for large-effect yield genes has been the target of many studies. In some cases,
these endeavors have proven fruitful (Park et al., 2014; Weber et al., 2014). However, extensive
genome content variation was observed between B73 and PH207 (3.4% of genes show
genomic-level PAV). Extensive PAV has previously been observed in other elite maize germplasm
(Springer et al., 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Chia et al., 2012; Hirsch et
al., 2014; Lu et al., 2015), and this estimate of PAV between maize inbred lines is quite similar
to a previous report where on average 2,044 non-reference transcripts were assembled in each of
503 inbred lines (Hirsch et al., 2014). Additionally, we observed extensive transcriptional
variation between B73 and PH207, with 32.7% of genes showing transcript PAV in at least one
tissue, and 52.5% of genes differentially expressed in at least one tissue. The presence of this
variation in elite maize inbred lines indicates that there are multiple combinations of genes that
can be combined to achieve high-yielding varieties. High yields and heterosis that have been
obtained in hybrid varieties to date could be rooted in the extensive genome and transcriptome
variation that persists in elite inbred lines from opposite heterotic groups, as has been previously
hypothesized (Birchler et al., 2003; Song and Messing, 2003; Birchler et al., 2006; Springer and
Stupar, 2007; Birchler et al., 2010; Lai et al., 2010; Hansey et al., 2012; Yao et al., 2013) and is
supported by comparative analysis of B73 and PH207 genome and transcriptome variation.
In addition to understanding variation in elite maize inbred lines, comparisons between the B73 and PH207 assemblies also allowed for a number of practical lessons to be learned. Comparative genomics studies between species are often conducted in the context of the reference genome assemblies for each species (Dong et al., 2004; Goodstein et al., 2012; Monaco et al., 2014). However, there is extensive genome content variation between individuals within a species not accounted for in these analyses. Comparison of the B73 and PH207 assemblies revealed the impact of a reference-genome-centric approach on interspecies comparative genomic studies. Homologous gene clusters were identified between B. distachyon, maize B73, maize PH207, O. sativa, and S. bicolor proteins encoded by the representative transcript (longest) for each gene. Of the subset that included at least one maize protein (24,954 clusters), 2,703 clusters contained only a B73 protein and 2,025 contained only a PH207 protein sequence (Figure 7A). This finding highlights the importance of expanding comparative genomic studies beyond single reference-to-reference comparisons, as the maize pan-genome is substantially larger than the genome sequence represented by B73 and PH207 (Hirsch et al., 2014), and already a large number of new maize homologs have been identified.
Within a species, there is a loss of duplicate genes following whole genome duplication events through a process known as fractionation. Maize is an ancient tetraploid that has undergone genome fractionation. A previous study identified and classified genes in the B73 genome into two maize subgenomes, Maize1 and Maize2, and determined there to be differential fractionation between Maize1 and Maize2 (Schnable et al., 2011). To evaluate the presence of differential fractionation between individuals, the PH207 genome was mined for the presence of missing B73 syntelogs in Maize1 or Maize2. Indeed, there is substantial differential fractionation between B73 and PH207. For example, B73 gene GRMZM2G129575, located in a syntenic block on Chr2 in Maize2, is missing from the Maize1 syntenic block on Chr7 (Schnable et al., 2011). Based on sequence clustering using OrthoMCL, this gene clustered with two PH207 genes, one located on Chr2 and the other on Chr7, both in syntenic blocks with genes neighboring GRMZM2G129575 (Figure 7B). Throughout the genome, over 1,265 families were present in two copies in one of the two genomes and one copy in the other genome, likely due to on-going fractionation between these genomes. This supports previous work conducted in the context of the B73 genome assembly (Schnable et al., 2011) showing differential fractionation between individuals following the maize whole genome duplication. This differential fractionation is an important mechanism that generates natural variation within a species and provides the genetic basis for selection to drive genome content variation between maize inbreds from opposite heterotic groups. Additionally, these results highlight the importance of thinking outside of a single reference genome in studies relating to genome fractionation following polyploidization. Interestingly, many of the gene PAVs that were observed between B73 and PH207 were found outside of the Maize1 and Maize2 syntenic blocks, indicating the presence of additional mechanisms driving genome content variation in maize.
Finally, comparison of the two assemblies provided the means to assess two pitfalls that often arise from the use of single reference genomes to make inferences across multiple individuals within a species. There are many biases that exist with RNAseq for transcript abundance estimates when using a single reference genotype (Hirsch et al., 2015); these biases are further confounded outside of the reference genotype by the alternative gene model structures and partial gene deletions that exist between individuals within a species. For example, transcript abundance estimates for 15.9% of genes were highly influenced by deletion of more than 25% of the gene model in the opposite genome, and 20.5% were falsely identified as differentially expressed in one or more tissues based on this bias. Variable gene models between individuals have also been shown to compensate for large-effect mutations in A. thaliana (Gan et al., 2011). Based on alignment of PH207 resequencing reads to the B73 genome assembly, 53,377 moderate-to-large effect mutations were identified, of which 12,855 (24.1%) were located in intronic sequences based on the PH207-specific annotation of the PH207 genome assembly (Figure 7C; Supplemental Figure 6).
Access to multiple genome assemblies of elite maize inbred lines has expanded our knowledge of the breadth of natural genome and transcriptome variation that persists in elite maize inbred lines following over a century of intense breeding. This genome-wide characterization of genomic variation and the relationship with transcriptomic variation provides important information in our understanding of the molecular basis of heterosis and the breeding community’s ability to continue to make gains from selection in highly selected maize germplasm. With access to only two whole genome assemblies, many lessons were learned regarding limitations in inter- and intra-species comparisons, transcriptome profiling, and characterization of allelic variation in the context of a single reference genome assembly and annotation that extend beyond maize.

METHODS
Plant Material and DNA and RNA Extraction
DNA was extracted from maize (Zea mays) inbred lines PH207 (PI 601005; Lot 03ncai01) and B73 (PI 550473; Lot 08ncai02) from seedlings germinated from seed provided by the National Plant Germplasm System Regional Plant Introduction Station in Ames, Iowa, using a modified CTAB procedure (Murray and Thompson, 1980). Plants were grown under greenhouse conditions (27°C/24°C day/night and 16 h light/8 h dark) in Metro-Mix 300 (Sun Gro Horticulture) with no additional fertilizer and under fluorescent lights. DNA was quantified using picogreen, absorbance at 260 nm/280 nm using a NanoDrop, and gel bands using 1.2% E-Gel.
RNAseq reads for B73 leaf blade, cortical parenchyma, germinating kernel, root tip, whole seedling, and stele were downloaded from the National Center for Biotechnology Information (accession numbers PRJNA171684 and SRP010680). For each tissue, sequencing reads for three biological replicates were obtained. Each biological replicate consisted of pooled tissue from three individual plants. Tissue sampling and RNA extraction for PH207 leaf blade, cortical parenchyma, germinating kernel, root tip, whole seedling, and stele were conducted as previously described for the comparable B73 tissues (Stelpflug et al., 2016). For each tissue, two biological replicates were collected and processed, again with three individual plants pooled per biological replicate.

DNA and RNA Sequencing
PH207 DNA Sequencing. Shotgun genomic libraries of 300 bp and 800 bp DNA fragment sizes were prepared using the TruSeq DNA Sample Preparation Kit version 2 according to the manufacturer’s protocol (Illumina, San Diego, CA). A third shotgun library was made using the same kit from DNA template fragments size-selected from ~350 bp to ~450 bp with no PCR amplification (PCR-free). This fragment size was designed to produce a sequencing overlap of the fragments to be sequenced on the MiSeq as paired-end (PE) sequencing 250 nt per end, thus creating an opportunity to produce ‘stitched’ reads of approximately 350 nt to 400 nt in length. Multiple mate-pair (MP) libraries per jump were made with the objective of increasing sequence diversity and genome coverage. Four separate MP libraries were constructed for each of the 3 kb and 8 kb jumps, and 2 MP libraries for the 15 kb jump using the Illumina Nextera Mate-Pair Sample Preparation Kit (Illumina, San Diego, CA).
The 300 bp and 800 bp shotgun libraries and the MP libraries were sequenced on an Illumina HiSeq2000 as 100 nt PE reads. The pool of MP libraries was further sequenced on an Illumina HiSeq2500 as 150 nt PE reads. The PCR-free shotgun library was sequenced on an Illumina MiSeq as 250 nt reads. All sequencing was conducted at the Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois, except for the 300 nt PE genomic library, which was sequenced at Dow AgroSciences (Indianapolis, IN).
The synthetic long-read libraries were constructed, sequenced, and assembled by Illumina Inc. Sequencing Services (San Diego, California). The PH207 genomic libraries were made using the TruSeq Synthetic Long-Read DNA library preparation kit materials and workflow (http://supportres.illumina.com/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqsynthlongreaddna/truseq-synth-long-read-dna-library-prep-guide-15047264-a.pdf). Each library consists of a barcoded pool of 384 indexes (one per each well of 10 kb genomic DNA template fragments), and each of the 10 libraries was sequenced individually on one HiSeq2000 lane as 100 nt PE reads.

B73 DNA Sequencing. A 300 bp DNA fragment size shotgun genomic library was prepared using a TruSeq DNA Sample Preparation Kit version 2 according to the manufacturer’s protocol (Illumina, San Diego, CA). The B73 genomic library was sequenced by Dow AgroSciences on an Illumina HiSeq2000 as 100 nt PE reads.

PH207 RNA Sequencing. Approximately 5 µg of total RNA was processed for mRNA isolation, fragmented, converted to cDNA, and PCR amplified according to the Illumina TruSeq RNA Sample Prep Kit protocol and sequenced on an Illumina HiSeq 2500 (San Diego, CA) at the Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois as 150 nt PE reads.

PH207 Genome Assembly
Read Pre-processing and Error Correction. For all PH207 genomic libraries, with the exception of TruSeq synthetic long-reads, PCR duplicates were removed using FastUniq software (Xu et al., 2012). The Illumina HiSeq2000 adaptor AGATCGGAAGAGC was removed, and reads were error corrected using the Corrector_HA module of SOAPdenovo (using kmer size 23 and cutoff of 6) (Luo et al., 2012). For the PCR-free library (MiSeq stitched reads), following adaptor truncation, overlapping reads were merged using FLASH (Magoc and Salzberg, 2011) with a minimal required overlap of 10 bp to create the stitched reads. Processing of MP reads consisted of filtering out putative false mate pairs by searching for the Nextera linker (10 nucleotides of CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG) sequence on either end of the MP. Mate pairs for which the linker was not found were sorted into a separate file for restricted scaffolding application. Mate pairs that did not hit the linker were used only in support of links found with the filtered MPs, but were not used to create links independently. The TruSeq synthetic long-read assembly pipeline application was processed by Illumina (Illumina IGN FastTrack Long Reads version 41; Illumina, San Diego, CA) to create synthetic long-read builds that represent long contiguous template fragments.

PH207 De Novo Scaffold Assembly. In the first step of assembly, SOAPdenovo v1.05 (Luo et al., 2012) was used to construct a De Bruijn graph of contigs from the single end reads of the PE library using very conservative settings (no bubble merge and no repeat masking, but removing low coverage kmers and edges). A kmer of size 63 bp was optimal for De Bruijn graph construction. To scaffold the contigs of the De Bruijn graph, non-repetitive contigs within the graph were identified and assembled into scaffolds based on mapping information of the single end reads, followed by application of synthetic long-reads to resolve long repeats in a similar way. The mapping of the single read ends (mapping without gaps) and long-read builds facilitated scaffolding by linking contigs mapping to the same read.
Scaffolding was completed using a directed graph with scaffolds longer than 200 bp as nodes and edges based on the PE and MP links. Erroneous connections were filtered out to generate unconnected sub-graphs that were ordered into scaffolds. PE reads were used to find reliable paths in the graph for additional repeat resolving. This was accomplished through searching the De Bruijn graph for a unique path connecting pairs of reads mapping to two different scaffolds. At this phase of scaffolding, scaffolds had an average size of thousands of bases and almost no gaps. The scaffolds were then further ordered and linked using the MP libraries, estimating gaps between the contigs according to the distance of MP links. Linking scaffolds with MP reads required confirmation of at least three filtered MPs or at least one filtered MP with supporting confirmation from two or more filter-failed MPs where the Nextera adaptor was not found. Scaffolds shorter than 200 bp were masked and links between non-repetitive contigs mapping to the same scaffolds were united, generating a directed scaffold graph. In agreement with previous reports, a significant number of erroneous MPs were observed. Since these pairs link between long non-branched components in the scaffold graph, the scaffolding procedure identified the non-branched components and filtered out the rare connections between them.
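The mate-pair support rule for linking scaffolds can be written as a small predicate. The following is a minimal sketch of that rule only, not the assembler's actual implementation; the function and argument names are hypothetical.

```python
def link_is_confirmed(filtered_mp_count: int, filter_failed_mp_count: int) -> bool:
    """Apply the stated support rule for a candidate link between two scaffolds.

    filtered_mp_count: mate pairs in which the Nextera linker was found.
    filter_failed_mp_count: mate pairs in which the linker was not found.
    """
    # Three or more filtered mate pairs are sufficient on their own.
    if filtered_mp_count >= 3:
        return True
    # Otherwise at least one filtered pair must be backed by two or more
    # filter-failed pairs; filter-failed pairs never create a link alone.
    return filtered_mp_count >= 1 and filter_failed_mp_count >= 2
```

For example, a link with one filtered and two filter-failed mate pairs passes, whereas a link supported only by filter-failed pairs does not.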

Reference Guided Construction of Pseudomolecules. Further ordering of scaffolds was achieved through scaffold alignment to the B73 reference genome using BWA-SW version 0.6.1 (Li and Durbin, 2010) requiring alignment quality of at least one. The most probable genomic location was assigned to the ordered scaffold based on the alignments of the unordered scaffolds that generated it. The ordered scaffolds were then placed into pseudomolecules to maximize synteny between the PH207 and B73 genomes. Finally, small scaffolds that mapped inside the unfilled gap of ordered scaffolds replaced these gaps if MP links existed between them.
TruSeq synthetic long reads were aligned to scaffolds generated without usage of the TruSeq synthetic long-reads. Scaffolds were identified that had a significant alignment of greater than 30 kb to two different locations greater than 10 Mb apart on the B73 reference genome. For those scaffolds where the block aligning to the alternative location was between two blocks that aligned to the same location, it was assumed to be a translocation between the two genomes, and it was considered unlikely there were two misassemblies at the same region. If a block aligning to one location was followed by a block aligning to another distant genomic location, it was considered a misassembly. Suspicious scaffolds were further resolved using a previously defined genotyping-by-sequencing anchor tag pipeline (Lu et al., 2015), and those flagged by both criteria were manually reviewed for misassembly.
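The block-order logic used to separate putative translocations from putative misassemblies can be illustrated as follows. This is a simplified sketch, not the pipeline that was run: the tuple layout and names are hypothetical, and treating blocks on different chromosomes as distant is an assumption not stated in the text.

```python
from typing import List, Tuple

# (B73 chromosome, B73 start position, aligned length in bp), in scaffold order
Block = Tuple[str, int, int]

MIN_BLOCK_BP = 30_000         # alignments must exceed 30 kb
MIN_DISTANCE_BP = 10_000_000  # locations must be more than 10 Mb apart


def is_distant(a: Block, b: Block) -> bool:
    # Assumption: blocks on different chromosomes count as distant.
    return a[0] != b[0] or abs(a[1] - b[1]) > MIN_DISTANCE_BP


def classify_scaffold(blocks: List[Block]) -> str:
    big = [b for b in blocks if b[2] > MIN_BLOCK_BP]
    # Collapse consecutive blocks that align to the same B73 neighborhood.
    regions: List[Block] = []
    for block in big:
        if not regions or is_distant(regions[-1], block):
            regions.append(block)
    if len(regions) < 2:
        return "consistent"
    # An alternative block flanked by two blocks from the same location.
    if len(regions) == 3 and not is_distant(regions[0], regions[2]):
        return "putative translocation"
    return "putative misassembly"
```

A scaffold whose blocks align to chromosome 1, then chromosome 5, then back to the same region of chromosome 1 would be labeled a putative translocation, while one that switches location once would be labeled a putative misassembly.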
To assess the completeness of the genome assembly, PH207 genomic paired-end reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 70 and a minimum quality of 10, and aligned to the PH207 genome assembly using Bowtie version 2.2.4 (Langmead and Salzberg, 2012) with default parameters as single end reads to determine the percentage of reads that could map to the assembly. Variants were called using Samtools mpileup (Li et al., 2009) with a minimum depth of three and a maximum depth of 200. Only variants exceeding a quality score of 20 were retained. To assess the completeness of the gene space in the assemblies, PH207 RNAseq reads were aligned to the PH207 genome assembly and B73 RNAseq reads were aligned to the B73 version 3.21 genome assembly (ftp://ftp.ensemblgenomes.org/pub/plants/release-21). Reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 75 nt and a minimum quality of 20. All reads were trimmed to 75 nt using fastx_trimmer within the FASTX toolkit version 0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/index.html) for consistency between the B73 and PH207 reads. Reads were aligned using Bowtie2 version 2.2.4 (Langmead and Salzberg, 2012) and TopHat2 version 2.0.12 (Kim et al., 2013) with the maximum number of multihits set to 20, minimum intron size of 10 bp, and maximum intron size of 60,000 bp. Additionally, the Core Eukaryotic Genes Mapping Approach pipeline version 2.4 (Parra et al., 2007) was run using default parameters. Comparison of the B73 and PH207 genome assemblies was completed using the NUCmer program within MUMmer version 3.23 (Kurtz et al., 2004). The B73 version 3.21 assembly was used for the comparison, and the minimum length of a cluster match was set to 5,000 for the genome-wide plot and 250 for the regional view of chromosome 1.

PH207 Genome Annotation
Gene Model Structural Annotation. The MAKER genome annotation pipeline was used to generate gene annotations (Campbell et al., 2014a; Campbell et al., 2014b; Law et al., 2015). The PH207 genome assembly was first masked using REPEATMASKER and a maize custom repeat library (Smit et al., 2013-2015; Law et al., 2015). Six PH207 transcript assemblies (leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and whole seedling) generated using Trinity (Grabherr et al., 2011), the predicted O. sativa proteome, and UniProtKB/Swiss-Prot plant proteins (minus maize sp. proteins) were used as evidence during each stage of the MAKER pipeline (Kawahara et al., 2013; UniProt Consortium, 2014). Initially, a Hidden Markov Model (HMM) was trained for the SNAP ab initio gene prediction program using high-quality transcript assembly alignments as gene proxies (Korf, 2004). MAKER was run using the initial SNAP HMM, and these gene predictions were used to train SNAP a second time. MAKER was run again using the second SNAP HMM to make gene predictions, and these gene models were used to train an HMM for the Augustus gene prediction program (Stanke and Waack, 2003). MAKER was run a final time using both the second SNAP HMM and the Augustus HMM. The genes identified by MAKER included models with and without transcript or protein alignment evidence. The predicted proteins from the MAKER gene set were analyzed with hmmscan to identify Pfam domains (Eddy, 2011; Finn et al., 2014). A high-quality gene set was created using all gene predictions that were supported by transcript or protein evidence or that coded for a protein with a Pfam domain. Using this analysis pipeline, a single transcript was annotated for each locus.

Gene Model Functional Annotation. All PH207 transcripts were functionally annotated using transitive annotation based on best BLAST hits from A. thaliana TAIR10 (Lamesch et al., 2012), B. distachyon v2.1 (International Brachypodium, 2010), O. sativa release 7 (Ouyang et al., 2007), and S. bicolor v1.4 (Paterson et al., 2009) annotations. For each species, TBLASTN was used within WUBLAST (v2.0) (Altschul et al., 1990b) to search PH207 protein sequences against a nucleotide database for each species. The results were filtered to retain only the top hit, which was required to have an E-value < 1e-5. The same criteria were used to assign a UniRef100 release 2015_05, 29-Apr-2015 (Suzek et al., 2007) identifier to each PH207 protein, except that BLASTP was used. The -goterms option of InterProScan version 5.0 (Zdobnov and Apweiler, 2001) was used to assign Gene Ontology terms to each PH207 transcript.
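The top-hit filter used for transitive annotation can be sketched as below. This is an illustrative reimplementation under assumptions: hits are taken to be already parsed into (query, subject, E-value) tuples, the "top hit" is taken to be the hit with the lowest E-value, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, E-value)


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep one hit per query: the lowest E-value hit, required to be < max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # fails the E-value < 1e-5 requirement
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

Queries with no hit below the threshold are simply absent from the result and would receive no transitive annotation from that species.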

Genome Content Variation
To identify genotype-specific genes from each assembly, B73 version 3.21 and PH207 transcript sequences were aligned to both the B73 version 3.21 and PH207 genome assemblies using GMAP version 2012-06-02 (Wu and Watanabe, 2005). Transcripts with an alignment with >85% coverage and >85% identity to both the cognate genome and the reciprocal genome were considered present in both genomes, those with >85% coverage and >85% identity to the cognate genome and <85% coverage and <85% identity to the reciprocal genome were considered absent in the reciprocal genome, and those with <85% coverage and <85% identity to the cognate genome were not classified. Due to gaps in the genome assemblies, PAVs were also determined using mapping of B73 and PH207 resequencing reads to both the B73 version 3.21 and PH207 genome assemblies. B73 and PH207 resequencing reads were mapped to the B73 version 3.21 assembly and the PH207 assembly using Bowtie2 version 2.2.3 (Langmead and Salzberg, 2012) requiring a mapping quality score greater than 20. The portion of the bases with coverage was determined using the intersect program within BEDTools version 2.19.0 (Quinlan and Hall, 2010). A bi-modal distribution of coverage was observed, with the majority of genes having <25% coverage or greater than 75% coverage. Thus, any gene that had >75% of positions covered by its own reads and <25% of positions covered by the opposite genotype reads was considered PAV. This method was also used to classify partially deleted genes and the percent of the gene that was deleted. For partial gene deletions, the percent of the gene that was deleted was calculated as the percentage of the gene with coverage in the reciprocal genome divided by the percent of the gene with coverage in the cognate genome. The comparison of gene density in high and low recombination regions was determined using a chi-square test with previously determined boundaries based on previously described B73/Mo17 near isogenic lines (Swanson-Wagner et al., 2010; Eichten et al., 2011). Boundary coordinates were converted to B73 version 3.21 coordinates using NCBI BLASTN with 400 bp of context sequence.
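The two PAV criteria described above (transcript alignment and resequencing read coverage) can be expressed as simple threshold rules. The sketch below is illustrative only; function names are hypothetical, and the "unspecified" label for combinations that the text does not cover is an assumption.

```python
def classify_transcript(cog_cov: float, cog_id: float,
                        rec_cov: float, rec_id: float,
                        cutoff: float = 85.0) -> str:
    """Classify a transcript from percent coverage/identity of its alignments
    to the cognate (own) and reciprocal (other) genome assemblies."""
    cognate_ok = cog_cov > cutoff and cog_id > cutoff
    if cognate_ok and rec_cov > cutoff and rec_id > cutoff:
        return "present in both"
    if cognate_ok and rec_cov < cutoff and rec_id < cutoff:
        return "absent in reciprocal"
    if cog_cov < cutoff and cog_id < cutoff:
        return "not classified"
    # Assumption: mixed cases are not described in the text.
    return "unspecified"


def is_read_based_pav(own_fraction: float, opposite_fraction: float) -> bool:
    """Read-coverage rule: >75% of positions covered by the genotype's own
    reads and <25% covered by reads from the opposite genotype."""
    return own_fraction > 0.75 and opposite_fraction < 0.25
```

For instance, a gene with 90% of positions covered by its own reads and 10% covered by the opposite genotype's reads satisfies the read-based rule.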
To identify gene families that were expanded or contracted in PH207 relative to B73, an all-versus-all BLAST was performed using WU BLASTN (Altschul et al., 1990a) with the B73 version 3.21 transcripts and PH207 transcripts, requiring an E-value threshold of 1e-10 and allowing up to 5,000 hits per sequence. The transcripts were clustered using OrthoMCL version 1.4 (Li et al., 2003; Chen et al., 2007) in mode 4 with default parameters to identify putative paralogous/homologous gene families between the two genotypes. This analysis was also used to identify high confidence one-to-one genes between the two individuals for subsequent analyses. The enrichment of Gene Ontology terms was analyzed using AgriGO (Du et al., 2010) using Fisher’s exact test to calculate an enrichment P-value, with a value of <0.05 being considered significant after using the Yekutieli method for multiple test correction.
Promoter SNP and InDel variation was explored for one-to-one genes that were located on the same chromosome within 20 Mb of each other between the two genome assemblies and had at least 1 kb of promoter sequence upstream of the transcription start site in both assemblies (N=18,798). Sequences with one or more N’s in either assembly were removed, resulting in 12,537 promoter comparisons. For each pair-wise comparison, the 1 kb promoter sequences from each assembly were aligned using NCBI BLASTN version 2.2.28 (Altschul et al., 1990b). Promoter comparisons with alignments shorter than 700 bp (N=6,114) were removed from downstream analysis. Alignments were parsed to identify SNPs and InDels using the blastn2snp.jar scripts within Jvarkit (Lindenbaum, 2015).
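To make the promoter filtering and variant counting concrete, the sketch below applies the two stated filters (no N's, alignment of at least 700 bp) and counts SNPs and InDels from a pair of gapped alignment strings. It is an illustration under assumptions and does not reproduce the behavior of blastn2snp; in particular, counting each contiguous gap run as one InDel is an assumption, and all names are hypothetical.

```python
from typing import Tuple


def keep_promoter_pair(b73_seq: str, ph207_seq: str, aligned_bp: int,
                       min_aligned_bp: int = 700) -> bool:
    """Drop pairs containing N's in either sequence or aligning over < 700 bp."""
    if "N" in b73_seq.upper() or "N" in ph207_seq.upper():
        return False
    return aligned_bp >= min_aligned_bp


def count_variants(aln_a: str, aln_b: str) -> Tuple[int, int]:
    """Return (SNPs, InDels) for two equal-length gapped alignment strings."""
    snps = indels = 0
    in_gap = False
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            if not in_gap:
                indels += 1  # a new gap run starts here
            in_gap = True
        else:
            in_gap = False
            if a != b:
                snps += 1
    return snps, indels
```

The resulting per-promoter SNP and InDel counts are the kind of values that could then be compared against the number of tissues showing differential expression.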

Transcriptional Analysis
All RNAseq reads were trimmed to 75 nt to remove low quality bases and aligned to both the B73 version 3.21 reference genome assembly and the PH207 genome assembly using Bowtie2 version 2.1.0 (Langmead and Salzberg, 2012) and TopHat2 version 2.0.10 (Kim et al., 2013). The minimum intron size was set to five bp, and the maximum intron size was set to 60,000 bp. All other mapping parameters were set to the default values. Transcript abundance estimates (measured as fragments per kilobase of exon model per million fragments mapped) were determined using Cufflinks2 version 2.1.1 (Trapnell et al., 2010) with a minimum and maximum intron size of five bp and 60,000 bp, respectively, and providing genome assemblies for bias correction. Transcript abundance counts for the representative transcript (longest transcript) were generated with HTSeq version 0.6.1p1 (Anders et al., 2015) at the gene level using the union mode and a minimum mapping quality of 20 with non-strand specific counting (--stranded no). Counts corrected for genome coverage were determined by dividing the raw counts by the percent exon coverage from the resequencing data and rounding to the nearest whole number. A gene was defined as expressed in a sample if it had a count of one or more. Differential expression analysis was conducted using DESeq2 version 1.8.2 (Love et al., 2014) within R version 3.2.1 (R Development Core Team, 2011) using Bioconductor version 3.1 (Gentleman et al., 2004) with default parameters. Genes with an adjusted p-value <0.05 and a fold change >2 in either direction were considered differentially expressed. Differential expression analysis was conducted for B73 RNAseq reads mapped to the B73 genome assembly versus PH207 RNAseq reads mapped to the B73 genome assembly as well as for B73 RNAseq reads mapped to the PH207 genome assembly versus PH207 RNAseq reads mapped to the PH207 genome assembly within each tissue. Differential expression analysis was conducted in this way because of our documented difference in untranslated region representation between the annotation for B73 and PH207, which would bias expression downwardly in PH207 one-to-one genes relative to the corresponding B73 gene model.
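The count correction and the differential expression cutoffs can be sketched as follows. This is a minimal illustration, assuming exon coverage is supplied as a fraction between 0 and 1 and fold change as a log2 value; the function names, the handling of zero coverage, and the treatment of missing adjusted p-values are assumptions rather than details from the analysis.

```python
import math
from typing import Optional


def corrected_count(raw_count: int, exon_coverage_fraction: float) -> int:
    """Divide the raw count by the fraction of exon bases covered in the
    resequencing data and round to a whole number."""
    if exon_coverage_fraction <= 0:
        return 0  # assumption: no correction is possible without coverage
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return int(round(raw_count / exon_coverage_fraction))


def is_expressed(count: int) -> bool:
    """A gene is expressed in a sample if it has a count of one or more."""
    return count >= 1


def is_differentially_expressed(padj: Optional[float], log2_fold_change: float) -> bool:
    """Adjusted p-value < 0.05 and fold change > 2 in either direction."""
    if padj is None or math.isnan(padj):
        return False  # assumption: untested genes are not called
    return padj < 0.05 and abs(log2_fold_change) > 1.0
```

A fold change greater than 2 in either direction corresponds to an absolute log2 fold change greater than 1, which is why the check is written on the log2 scale.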

Comparative Genomics Analysis
Homologous grass genes that were present in only one assembly (either B73 or PH207) were identified using OrthoMCL version 1.4 (Li et al., 2003; Chen et al., 2007). An all-versus-all BLAST analysis was performed using WU BLASTP (Altschul et al., 1990a) requiring an E-value threshold of 1e-10 and allowing up to 5,000 hits per sequence with the B. distachyon v2.1 (International Brachypodium, 2010), maize B73 version 3.21, maize PH207, O. sativa release 7 (Ouyang et al., 2007), and S. bicolor v1.4 (Paterson et al., 2009) representative protein sequences defined as the protein encoded by the longest transcript. Homologous gene families were identified by OrthoMCL version 1.4 in mode 4 with default parameters.
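Once clusters are available, tallying those that contain maize proteins from only one assembly is straightforward. The sketch below assumes each cluster has already been parsed into (genome label, protein id) pairs; the labels "B73" and "PH207", the input layout, and the function name are hypothetical and do not reflect the OrthoMCL output format.

```python
from typing import Dict, Iterable, Set, Tuple

Member = Tuple[str, str]  # (genome label, protein id)


def summarize_clusters(clusters: Iterable[Set[Member]]) -> Dict[str, int]:
    """Count clusters with any maize protein, and those whose maize members
    come from only B73 or only PH207 (other grasses may also be present)."""
    summary = {"maize": 0, "b73_only": 0, "ph207_only": 0}
    for members in clusters:
        genomes = {genome for genome, _ in members}
        has_b73 = "B73" in genomes
        has_ph207 = "PH207" in genomes
        if not (has_b73 or has_ph207):
            continue  # cluster without maize proteins
        summary["maize"] += 1
        if has_b73 and not has_ph207:
            summary["b73_only"] += 1
        elif has_ph207 and not has_b73:
            summary["ph207_only"] += 1
    return summary
```

Applied to the full clustering, counts of this kind correspond to the cluster totals summarized in Figure 7A.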
To evaluate differential fractionation in B73 and PH207 following the maize whole genome duplication, the previously defined B73 Maize1 and Maize2 classifications (Schnable et al., 2011) from B73 v2 were converted to B73 v3 coordinates using CrossMap (Zhao et al., 2014) and overlapping gene models were identified. The converted Maize1 and Maize2 classifications were compared with the homologous gene clustering generated using OrthoMCL (Li et al., 2003; Chen et al., 2007) with the B73 version 3.21 and PH207 nucleotide sequences described above.

Analysis of Deleterious Mutations
PH207 genomic reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 70 nt and a minimum quality of 10. Cleaned reads were aligned to the B73 version 3.21 reference assembly using Bowtie version 2.2.4 (Langmead and Salzberg, 2012) with default parameters as single end reads. Variants were called using Samtools mpileup (Li et al., 2009) with a minimum depth of three and a maximum depth of 200. Only variants exceeding a quality score of 20 were retained. Variants were then annotated via SnpEff version 4.1 H (Cingolani et al., 2012) using default parameters. The annotated VCF file was filtered to retain only the predicted high and moderate impact variants that were located within the representative transcript of one-to-one genes based on the B73 model. As defined by the VCF annotation standard, high impact effects included: frameshift variants, splice acceptor/donor variants, start/stop lost, and stop gained. Moderate impact predicted effects included: inframe InDels and missense variants. BLASTN 2.2.25+ (Altschul et al., 1990b) was used to align B73 genic sequence from one-to-one genes to the PH207 assembly. Variants were parsed from the alignment to determine the position of annotated variants from the Bowtie alignment in the context of the PH207 assembly by scanning for each variant in the parsed BLAST output. Only variants that had a consistent alignment position between the Bowtie and BLAST alignment algorithms were retained. The ‘intersect’ sub-command within the BEDTools suite version 2.17.0 (Quinlan and Hall, 2010) was used to determine whether the retained variants were located within PH207 exon sequences.
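The impact-based filter can be illustrated with a small lookup. The effect names below are Sequence Ontology-style terms chosen to mirror the categories listed above; the exact strings emitted by the SnpEff version used are an assumption, and the function name is hypothetical.

```python
from typing import Optional

# Assumed effect-term spellings for the categories named in the text.
HIGH_IMPACT = {
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "stop_gained",
}
MODERATE_IMPACT = {
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
}


def impact_class(effect: str) -> Optional[str]:
    """Return 'HIGH' or 'MODERATE' for retained effects, else None."""
    if effect in HIGH_IMPACT:
        return "HIGH"
    if effect in MODERATE_IMPACT:
        return "MODERATE"
    return None  # variant is dropped by the filter
```

Variants for which this function returns None (for example, synonymous or intergenic changes) would be excluded before the position check against the PH207 annotation.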

Accession Numbers
Sequence data from this article can be found in the Sequence Read Archive at the National Center for Biotechnology Information under accession number PRJNA258455. The PH207 genome multifasta file, GFF annotation file, transcript multifasta file, and protein multifasta file are available for download from the Dryad Digital Repository (DOI: 10.5061/dryad.8vj84) and are also available at the Maize Genetics and Genomics Database (MaizeGDB; http://www.maizegdb.org) and at Phytozome (https://phytozome.jgi.doe.gov). Additionally, the PH207 genome assembly results as Integrative Genomics Viewer (Broad Institute) tracks are available for public access at http://nrgene.com.
681
682
Supplemental Data
Supplemental Figure 1. Distribution of 23-mers in the PH207 sequence reads.
Supplemental Figure 2. PH207 and B73 RNAseq read mapping statistics.
Supplemental Figure 3. Annotation edit distance curve for the PH207 MakerP annotation.
Supplemental Figure 4. Resequencing-based approach to identify partial gene deletions.
Supplemental Figure 5. Density distribution of expression in B73 and PH207 for shared and
genotype-specific genes.
Supplemental Figure 6. Distribution of predicted impact of PH207 variants.
Supplemental Table 1. Reads generated from the PH207 genome assembly.
Supplemental Table 2. Assembly statistics at each progressive assembly step.
Supplemental Data Set 1. Functional annotation of the PH207 gene models.
Supplemental Data Set 2. Enriched Gene Ontology terms for B73 and PH207 genes that are
members of gene families that have expanded or contracted between the two genotypes and
genes that show PAV between the two genotypes.
Supplemental Data Set 3. Transcript abundance estimates for six distinct tissues in the inbred
lines B73 and PH207.
AUTHOR CONTRIBUTIONS
CNH, KK, NMS, EB, CRB, NdL, SMK, and MM designed the study. AGH, CJF, and CLW
acquired data. CNH, CDH, ABB, MJB, IS, OB, DST, KB, FL, CJF, and KLC analyzed and
interpreted data. CNH and MM wrote the manuscript. All authors read and approved the final
manuscript.
ACKNOWLEDGEMENTS
This work was funded in part by the DOE Great Lakes Bioenergy Research Center (DOE BER
Office of Science DE-FC02-07ER64494), Dow AgroSciences, LLC, and by the National Science
Foundation (grant no. IOS–1126998 to KLC). The Minnesota Supercomputing Institute (MSI) at
the University of Minnesota provided computational resources that contributed to the research
results reported within this paper. CDH was supported by a National Science Foundation
National Plant Genome Initiative Postdoctoral Fellowship in Biology Fellowship (Grant No.
1202724). ABB was supported by the DuPont Pioneer Bill Kuhn Honorary Fellowship.
REFERENCES
Abendroth, L.J., Elmore, R.W., Boyer, M.J., and Marlay, S.K. (2011). Corn growth and
development. PMR 1009. Iowa State University Extension, Ames Iowa.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990a). Basic local
alignment search tool. J Mol Biol 215, 403-410.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990b). Basic Local
Alignment Search Tool. Journal of Molecular Biology 215, 403-410.
Anders, S., Pyl, P.T., and Huber, W. (2015). HTSeq-a Python framework to work with
high-throughput sequencing data. Bioinformatics 31, 166-169.
Anderson, J.E., Kantar, M.B., Kono, T.Y., Fu, F., Stec, A.O., Song, Q., Cregan, P.B.,
Specht, J.E., Diers, B.W., Cannon, S.B., McHale, L.K., and Stupar, R.M. (2014). A
roadmap for functional structural variants in the soybean genome. G3 4, 1307-1318.
Arabidopsis Genome, I. (2000). Analysis of the genome sequence of the flowering plant
Arabidopsis thaliana. Nature 408, 796-815.
Belo, A., Beatty, M.K., Hondred, D., Fengler, K.A., Li, B., and Rafalski, A. (2010). Allelic
genome structural variations in maize detected by array comparative genome
hybridization. Theor Appl Genet 120, 355-367.
Birchler, J.A., Auger, D.L., and Riddle, N.C. (2003). In search of the molecular basis of
heterosis. Plant Cell 15, 2236-2239.
Birchler, J.A., Yao, H., and Chudalayandi, S. (2006). Unraveling the genetic basis of hybrid
vigor. Proc Natl Acad Sci U S A 103, 12957-12958.
Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., and Veitia, R.A. (2010). Heterosis.
Plant Cell 22, 2105-2112.
Brunner, S., Fengler, K., Morgante, M., Tingey, S., and Rafalski, A. (2005). Evolution of
DNA sequence nonhomologies among maize inbreds. Plant Cell 17, 343-360.
Campbell, M.S., Holt, C., Moore, B., and Yandell, M. (2014a). Genome Annotation and
Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics 48, 4.11.1-4.11.39.
Campbell, M.S., Law, M., Holt, C., Stein, J.C., Moghe, G.D., Hufnagel, D.E., Lei, J.,
Achawanantakun, R., Jiao, D., Lawrence, C.J., Ware, D., Shiu, S.H., Childs, K.L.,
Sun, Y., Jiang, N., and Yandell, M. (2014b). MAKER-P: a tool kit for the rapid
creation, management, and quality control of plant genome annotations. Plant Physiol
164, 513-524.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz,
C., Stegle, O., Lippert, C., Wang, X., Ott, F., Muller, J., Alonso-Blanco, C.,
Borgwardt, K., Schmid, K.J., and Weigel, D. (2011). Whole-genome sequencing of
multiple Arabidopsis thaliana populations. Nat Genet.
Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. (2007). Assessing Performance of
Orthology Detection Strategies Applied to Eukaryotic Genomes. PLoS ONE 2, e383.
Chia, J.M., Song, C., Bradbury, P.J., Costich, D., de Leon, N., Doebley, J., Elshire, R.J.,
Gaut, B., Geller, L., Glaubitz, J.C., Gore, M., Guill, K.E., Holland, J., Hufford,
M.B., Lai, J., Li, M., Liu, X., Lu, Y., McCombie, R., Nelson, R., Poland, J.,
Prasanna, B.M., Pyhajarvi, T., Rong, T., Sekhon, R.S., Sun, Q., Tenaillon, M.I.,
Tian, F., Wang, J., Xu, X., Zhang, Z., Kaeppler, S.M., Ross-Ibarra, J., McMullen,
M.D., Buckler, E.S., Zhang, G., Xu, Y., and Ware, D. (2012). Maize HapMap2
identifies extant variation from a genome in flux. Nat Genet 44, 803-807.
Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X.,
and Ruden, D.M. (2012). A program for annotating and predicting the effects of single
nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster
strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92.
Cook, D.E., Lee, T.G., Guo, X., Melito, S., Wang, K., Bayless, A.M., Wang, J., Hughes, T.J.,
Willis, D.K., Clemente, T.E., Diers, B.W., Jiang, J., Hudson, M.E., and Bent, A.F.
(2012). Copy number variation of multiple genes at Rhg1 mediates nematode resistance
in soybean. Science 338, 1206-1209.
Dong, Q., Schlueter, S.D., and Brendel, V. (2004). PlantGDB, plant genome database and
analysis tools. Nucleic acids research 32, D354-359.
Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the
agricultural community. Nucleic acids research 38, W64-70.
Eddy, S.R. (2011). Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195.
Eichten, S.R., Foerster, J.M., de Leon, N., Kai, Y., Yeh, C.T., Liu, S., Jeddeloh, J.A.,
Schnable, P.S., Kaeppler, S.M., and Springer, N.M. (2011). B73-Mo17 near-isogenic
lines demonstrate dispersed structural variation in maize. Plant Physiol 156, 1679-1690.
Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the
management and comparison of annotated genomes. BMC Bioinformatics 10, 67.
Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A.,
Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L., Tate, J., and Punta, M.
(2014). Pfam: the protein families database. Nucleic acids research 42, D222-230.
Fu, H., and Dooner, H.K. (2002). Intraspecific violation of genetic colinearity and its
implications in maize. Proc Natl Acad Sci U S A 99, 9573-9578.
Gaines, T.A., Zhang, W., Wang, D., Bukun, B., Chisholm, S.T., Shaner, D.L., Nissen, S.J.,
Patzoldt, W.L., Tranel, P.J., Culpepper, A.S., Grey, T.L., Webster, T.M., Vencill,
W.K., Sammons, R.D., Jiang, J., Preston, C., Leach, J.E., and Westra, P. (2010).
Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad
Sci U S A 107, 1029-1034.
Gan, X., Stegle, O., Behr, J., Steffen, J.G., Drewe, P., Hildebrand, K.L., Lyngsoe, R.,
Schultheiss, S.J., Osborne, E.J., Sreedharan, V.T., Kahles, A., Bohnert, R., Jean, G.,
Derwent, P., Kersey, P., Belfield, E.J., Harberd, N.P., Kemen, E., Toomajian, C.,
Kover, P.X., Clark, R.M., Ratsch, G., and Mott, R. (2011). Multiple reference
genomes and transcriptomes for Arabidopsis thaliana. Nature 477, 419-423.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B.,
Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S.,
Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C.,
Smyth, G., Tierney, L., Yang, J.Y., and Zhang, J. (2004). Bioconductor: open software
development for computational biology and bioinformatics. Genome Biol 5, R80.
Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks,
W., Hellsten, U., Putnam, N., and Rokhsar, D.S. (2012). Phytozome: a comparative
platform for green plant genomics. Nucleic acids research 40, D1178-1186.
Gore, M.A., Chia, J.M., Elshire, R.J., Sun, Q., Ersoz, E.S., Hurwitz, B.L., Peiffer, J.A.,
McMullen, M.D., Grills, G.S., Ross-Ibarra, J., Ware, D.H., and Buckler, E.S. (2009).
A first-generation haplotype map of maize. Science 326, 1115-1117.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis,
X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N.,
Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K.,
Friedman, N., and Regev, A. (2011). Full-length transcriptome assembly from
RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652.
Hansey, C.N., Vaillancourt, B., Sekhon, R.S., de Leon, N., Kaeppler, S.M., and Buell, C.R.
(2012). Maize (Zea mays L.) Genome Diversity as Revealed by RNA-Sequencing. PLoS
ONE 7, e33071.
Hardigan, M.A., Crisovan, E., Hamilton, J.P., Kim, J., Laimbeer, P., Leisner, C.P.,
Manrique-Carpintero, N.C., Newton, L., Pham, G.M., Vaillancourt, B., Yang, X.,
Zeng, Z., Douches, D., Jiang, J., Veilleux, R.E., and Buell, C.R. (2016). Genome
reduction uncovers a large dispensable genome and adaptive role for copy number
variation in asexually propagated Solanum tuberosum. The Plant Cell.
Hirakawa, H., Okada, Y., Tabuchi, H., Shirasawa, K., Watanabe, A., Tsuruoka, H.,
Minami, C., Nakayama, S., Sasamoto, S., Kohara, M., Kishida, Y., Fujishiro, T.,
Kato, M., Nanri, K., Komaki, A., Yoshinaga, M., Takahata, Y., Tanaka, M., Tabata,
S., and Isobe, S.N. (2015). Survey of genome sequences in a wild sweet potato, Ipomoea
trifida (H. B. K.) G. Don. DNA Research 22, 171-179.
Hirsch, C.D., Springer, N.M., and Hirsch, C.N. (2015). Genomic Limitations to RNAseq
Expression Profiling. The Plant Journal 84, 491-503.
Hirsch, C.N., Foerster, J.M., Johnson, J.M., Sekhon, R.S., Muttoni, G., Vaillancourt, B.,
Penagaricano, F., Lindquist, E., Pedraza, M.A., Barry, K., de Leon, N., Kaeppler,
S.M., and Buell, C.R. (2014). Insights into the maize pan-genome and
pan-transcriptome. Plant Cell 26, 121-135.
International Brachypodium, I. (2010). Genome sequencing and analysis of the model grass
Brachypodium distachyon. Nature 463, 763-768.
International Rice Genome Sequencing, P. (2005). The map-based sequence of the rice
genome. Nature 436, 793-800.
Kawahara, Y., de la Bastide, M., Hamilton, J.P., Kanamori, H., McCombie, W.R., Ouyang,
S., Schwartz, D.C., Tanaka, T., Wu, J., Zhou, S., Childs, K.L., Davidson, R.M., Lin,
H., Quesada-Ocampo, L., Vaillancourt, B., Sakai, H., Lee, S.S., Kim, J., Numa, H.,
Itoh, T., Buell, C.R., and Matsumoto, T. (2013). Improvement of the Oryza sativa
Nipponbare reference genome using next generation sequence and optical map data. Rice
(N Y) 6, 4.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013).
TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions
and gene fusions. Genome Biol 14, R36.
Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics 5, 59.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and
Salzberg, S.L. (2004). Versatile and open software for comparing large genomes.
Genome Biol 5, R12.
Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., Zhang, M.,
Jiao, Y., Ni, P., Zhang, J., Li, D., Guo, X., Ye, K., Jian, M., Wang, B., Zheng, H.,
Liang, H., Zhang, X., Wang, S., Chen, S., Li, J., Fu, Y., Springer, N.M., Yang, H.,
Wang, J., Dai, J., and Schnable, P.S. (2010). Genome-wide patterns of genetic variation
among elite maize inbred lines. Nat Genet 42, 1027-1030.
Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., Muller, R.,
Dreher, K., Alexander, D.L., Garcia-Hernandez, M., Karthikeyan, A.S., Lee, C.H.,
Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E. (2012). The Arabidopsis
Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids
research 40, D1202-1210.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature
methods 9, 357-359.
Law, M., Childs, K.L., Campbell, M.S., Stein, J.C., Olson, A.J., Holt, C., Panchy, N., Lei, J.,
Jiao, D., Andorf, C.M., Lawrence, C.J., Ware, D., Shiu, S.H., Sun, Y., Jiang, N., and
Yandell, M. (2015). Automated update, revision, and quality control of the maize
genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and
identifies new genes. Plant Physiol 167, 25-39.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler
transform. Bioinformatics 26, 589-595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis,
G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools.
Bioinformatics 25, 2078-2079.
Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog
groups for eukaryotic genomes. Genome Res 13, 2178-2189.
Lindenbaum, P. (2015). JVarkit: java-based utilities for Bioinformatics. figshare
http://dx.doi.org/10.6084/m9.figshare.1425030.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550.
Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang, T., Li, Y., Li, Y.,
Semagn, K., Zhang, X., Hernandez, A.G., Mikel, M.A., Soifer, I., Barad, O., and
Buckler, E.S. (2015). High-resolution genetic mapping of maize pan-genome sequence
anchors. Nature communications 6, 6914.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y.,
Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C.,
Cheung, D.W., Yiu, S.M., Peng, S., Xiaoqian, Z., Liu, G., Liao, X., Li, Y., Yang, H.,
Wang, J., Lam, T.W., and Wang, J. (2012). SOAPdenovo2: an empirically improved
memory-efficient short-read de novo assembler. Gigascience 1, 18.
Magoc, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve
genome assemblies. Bioinformatics 27, 2957-2963.
Maron, L.G., Guimaraes, C.T., Kirst, M., Albert, P.S., Birchler, J.A., Bradbury, P.J.,
Buckler, E.S., Coluccio, A.E., Danilova, T.V., Kudrna, D., Magalhaes, J.V., Pineros,
M.A., Schatz, M.C., Wing, R.A., and Kochian, L.V. (2013). Aluminum tolerance in
maize is associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A
110, 5241-5246.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing
reads. EMBnet.journal 17, 10-12.
Mikel, M.A. (2011). Genetic Composition of Contemporary U.S. Commercial Dent Corn
Germplasm. Crop Sci. 51, 592-599.
Monaco, M.K., Stein, J., Naithani, S., Wei, S., Dharmawardhana, P., Kumari, S.,
Amarasinghe, V., Youens-Clark, K., Thomason, J., Preece, J., Pasternak, S., Olson,
A., Jiao, Y., Lu, Z., Bolser, D., Kerhornou, A., Staines, D., Walts, B., Wu, G.,
D'Eustachio, P., Haw, R., Croft, D., Kersey, P.J., Stein, L., Jaiswal, P., and Ware, D.
(2014). Gramene 2013: comparative plant genomics resources. Nucleic acids research 42,
D1193-1199.
Morgante, M., De Paoli, E., and Radovic, S. (2007). Transposable elements and the plant
pan-genomes. Curr Opin Plant Biol 10, 149-155.
Murray, M.G., and Thompson, W.F. (1980). Rapid isolation of high molecular weight plant
DNA. Nucleic acids research 8, 4321-4325.
Ossowski, S., Schneeberger, K., Clark, R.M., Lanz, C., Warthmann, N., and Weigel, D.
(2008). Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome
Res 18, 2024-2033.
Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F.,
Malek, R.L., Lee, Y., Zheng, L., Orvis, J., Haas, B., Wortman, J., and Buell, C.R.
(2007). The TIGR Rice Genome Annotation Resource: improvements and new features.
Nucleic acids research 35, D883-887.
Park, S.J., Jiang, K., Tal, L., Yichie, Y., Gar, O., Zamir, D., Eshed, Y., and Lippman, Z.B.
(2014). Optimization of crop productivity in tomato using induced mutations in the
florigen pathway. Nat Genet 46, 1337-1342.
Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core
genes in eukaryotic genomes. Bioinformatics 23, 1061-1067.
Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H.,
Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang,
H., Wang, X., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U.,
Grigoriev, I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P.,
Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L., Carpita, N.C., Freeling, M.,
Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming,
R., Peterson, D.G., Mehboob ur, R., Ware, D., Westhoff, P., Mayer, K.F., Messing,
J., and Rokhsar, D.S. (2009). The Sorghum bicolor genome and the diversification of
grasses. Nature 457, 551-556.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics 26, 841-842.
R Development Core Team. (2011). R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria.
Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E.,
Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M.H., Chia, J.M., Ware, D.,
McCouch, S.R., and McCombie, W.R. (2014). Whole genome de novo assemblies of
three divergent strains of rice, Oryza sativa, document novel gene space of aus and
indica. Genome Biol 15, 506.
Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize
subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl
Acad Sci U S A 108, 4069-4074.
Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang,
J., Fulton, L., Graves, T.A., Minx, P., Reily, A.D., Courtney, L., Kruchowski, S.S.,
Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S.M.,
Belter, E., Du, F., Kim, K., Abbott, R.M., Cotton, M., Levy, A., Marchetto, P.,
Ochoa, K., Jackson, S.M., Gillam, B., Chen, W., Yan, L., Higginbotham, J.,
Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K.,
Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N.,
Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado,
B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D.,
Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G.,
Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E.,
Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A.,
Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T.,
Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S.,
Kumari, S., Faga, B., Levy, M.J., McMahan, L., Van Buren, P., Vaughn, M.W.,
Ying, K., Yeh, C.-T., Emrich, S.J., Jia, Y., Kalyanaraman, A., Hsia, A.-P.,
Barbazuk, W.B., Baucom, R.S., Brutnell, T.P., Carpita, N.C., Chaparro, C., Chia,
J.-M., Deragon, J.-M., Estill, J.C., Fu, Y., Jeddeloh, J.A., Han, Y., Lee, H., Li, P.,
Lisch, D.R., Liu, S., Liu, Z., Nagel, D.H., McCann, M.C., SanMiguel, P., Myers,
A.M., Nettleton, D., Nguyen, J., Penning, B.W., Ponnala, L., Schneider, K.L.,
Schwartz, D.C., Sharma, A., Soderlund, C., Springer, N.M., Sun, Q., Wang, H.,
Waterman, M., Westerman, R., Wolfgruber, T.K., Yang, L., Yu, Y., Zhang, L.,
Zhou, S., Zhu, Q., Bennetzen, J.L., Dawe, R.K., Jiang, J., Jiang, N., Presting, G.G.,
Wessler, S.R., Aluru, S., Martienssen, R.A., Clifton, S.W., McCombie, W.R., Wing,
R.A., and Wilson, R.K. (2009). The B73 Maize Genome: Complexity, Diversity, and
Dynamics. Science 326, 1112-1115.
Smit, A., Hubley, R., and Green, P. (2013-2015). RepeatMasker Open-4.0
http://www.repeatmasker.org.
Song, R., and Messing, J. (2003). Gene expression of a gene family in maize based on
noncollinear haplotypes. Proc Natl Acad Sci U S A 100, 9055-9060.
Springer, N.M., and Stupar, R.M. (2007). Allelic variation and heterosis in maize: how do two
halves make more than a whole? Genome Res 17, 264-275.
Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman,
J., Rosenbaum, H., Iniguez, A.L., Barbazuk, W.B., Jeddeloh, J.A., Nettleton, D., and
Schnable, P.S. (2009). Maize inbreds exhibit high levels of copy number variation
(CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5,
e1000734.
Stanke, M., and Waack, S. (2003). Gene prediction with a hidden Markov model and a new
intron submodel. Bioinformatics 19 Suppl 2, ii215-225.
Stelpflug, S.C., Sekhon, R.S., Vaillancourt, B., Hirsch, C.N., Buell, C.R., de Leon, N., and
Kaeppler, S.M. (2016). An Expanded Maize Gene Expression Atlas based on RNA
Sequencing and its Use to Explore Root Development. The Plant Genome 9.
Sutton, T., Baumann, U., Hayes, J., Collins, N.C., Shi, B.J., Schnurbusch, T., Hay, A.,
Mayo, G., Pallotta, M., Tester, M., and Langridge, P. (2007). Boron-toxicity tolerance
in barley arising from efflux transporter amplification. Science 318, 1446-1449.
Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C.H. (2007). UniRef:
comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23,
1282-1288.
Swanson-Wagner, R.A., Eichten, S.R., Kumari, S., Tiffin, P., Stein, J.C., Ware, D., and
Springer, N.M. (2010). Pervasive gene content variation and copy number variation in
maize and its undomesticated progenitor. Genome Res 20, 1689-1699.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J.,
Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and
quantification by RNA-Seq reveals unannotated transcripts and isoform switching during
cell differentiation. Nat Biotechnol 28, 511-515.
Troyer, A.F. (1999). Background of U.S. Hybrid Corn. Crop Sci. 39, 601-626.
Troyer, A.F. (2006). Adaptedness and Heterosis in Corn and Mule Hybrids. Crop Science 46,
528-543.
UniProt Consortium. (2014). Activities at the Universal Protein Resource (UniProt). Nucleic
acids research 42, D191-198.
Wang, Y., Xiong, G., Hu, J., Jiang, L., Yu, H., Xu, J., Fang, Y., Zeng, L., Xu, E., Xu, J., Ye,
W., Meng, X., Liu, R., Chen, H., Jing, Y., Wang, Y., Zhu, X., Li, J., and Qian, Q.
(2015). Copy number variation at the GL7 locus contributes to grain size diversity in rice.
Nat Genet.
Weber, R.L., Wiebke-Strohm, B., Bredemeier, C., Margis-Pinheiro, M., de Brito, G.G.,
Rechenmacher, C., Bertagnolli, P.F., de Sa, M.E., Campos Mde, A., de Amorim,
R.M., Beneventi, M.A., Margis, R., Grossi-de-Sa, M.F., and Bodanese-Zanettini,
M.H. (2014). Expression of an osmotin-like protein from Solanum nigrum confers
drought tolerance in transgenic soybean. BMC plant biology 14, 343.
Weigel, D., and Mott, R. (2009). The 1001 genomes project for Arabidopsis thaliana. Genome
Biol 10, 107.
Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for
mRNA and EST sequences. Bioinformatics 21, 1859-1875.
Xu, H., Luo, X., Qian, J., Pang, X., Song, J., Qian, G., Chen, J., and Chen, S. (2012).
FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7,
e52249.
Yandell, M., and Ence, D. (2012). A beginner's guide to eukaryotic genome annotation. Nat
Rev Genet 13, 329-342.
Yao, H., Dogra Gray, A., Auger, D.L., and Birchler, J.A. (2013). Genomic dosage effects on
heterosis in triploid maize. Proc Natl Acad Sci U S A 110, 2665-2669.
Zdobnov, E.M., and Apweiler, R. (2001). InterProScan--an integration platform for the
signature-recognition methods in InterPro. Bioinformatics 17, 847-848.
Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.P., and Wang, L. (2014). CrossMap: a
versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30,
1006-1007.
Table 1. Phenotypic variation between B73, PH207 and the F1 reciprocal hybrids between B73
and PH207. Phenotypic measurements were collected on greenhouse-grown plants at the
seedling vegetative 1 developmental stage (Abendroth et al., 2011).

Trait                            B73      B73 x PH207   PH207 x B73   PH207
Fresh Aboveground Biomass (g)    0.93     1.43          1.25          0.97
Fresh Belowground Biomass (g)    0.83     1.09          1.33          0.91
Plant Height (cm)                12.84    17.32         17.30         13.82
Root Length (cm)                 24.66    29.84         30.01         26.97
Table 2. Transcriptome expression variation across six tissues in the context of B73 and PH207
annotated genes. B73 and PH207 RNAseq reads were mapped to both their cognate genome and
the reciprocal genome and expression was summarized as read counts in each of the mapping
scenarios. Tissues included in this analysis were leaf blade, root cortical parenchyma, root stele,
germinating kernel, root tip, and whole seedling.

                                             B73^a     PH207^a
Expression Competence
  Never Expressed                            6,180     5,885
  Expressed in ≥1 tissue                     33,121    34,672
Differential expression between genotypes
  Always Differentially Expressed            1,655     1,968
  Sometimes Differentially Expressed         18,913    19,363
  Never Differentially Expressed             18,733    19,226
Genotype-specific expression pattern
  Always Genotype Specific if Expressed      3,532     3,964
  Sometimes Genotype Specific                8,719     9,910
  Never Genotype Specific                    27,050    26,683

^a The total numbers of genes are 39,301 (B73) and 40,557 (PH207).
FIGURE LEGENDS
Figure 1. Summary of the PH207 genome assembly. A) Genome-wide comparison of the B73
genome assembly and the PH207 genome assembly. The inset shows a zoomed in view of the
first Mb of chromosome one from each genome. Forward matches, plotted first, are colored red
and reverse matches are colored blue. B) Density of annotated genes in the PH207 genome
assembly.
Figure 2. Genome content distribution in elite maize inbred lines B73 and PH207. A)
Distribution of gene size for all B73 and PH207 genes and genes showing presence/absence
variation (PAV). The whiskers are plotted at a maximum of 1.5 times the interquartile range
away from the end of the box in each direction. B) Distribution of partial gene deletions in B73
(right) and PH207 (left) genes relative to the genome in which the gene was annotated based on
the ratio of coverage from resequencing data. C) Genome-wide distribution of B73 and PH207
unique genes.
Figure 3. Variation in gene family size in elite maize inbred lines B73 and PH207. A)
Distribution of gene families of the same size in B73 and PH207. B) Distribution of gene
families with moderate changes in gene family size between B73 and PH207. C) Distribution of
gene families with extreme changes in gene family size between B73 and PH207. D) Density
plot of the proportion of genes in the inbred line (B73 or PH207) that has expanded gene family
members in syntenic positions relative to the reciprocal inbred line, with fewer copies for gene
families with extreme changes. E) Physical location of genes in a family that had only clustered
expansion. F) Physical locations of genes in a family that had both clustered and distributed
expansion. G) Physical locations of genes in a family that had distributed expansion only. In
plots E-G, B73 genes are above the axis, and PH207 genes are below the axis. White circles
represent genes with no introns, and black circles represent genes with 1 or more introns.
Figure 4. Distribution of expression in B73 and PH207 for shared and genotype-specific genes.
For each B73 tissue used in this analysis the average of three biological replicates was used, and
the average of two biological replicates was used for PH207 tissues. For both B73 and PH207
the biological replicates consisted of pooled tissue from three plants. A) Distribution of number
of tissues with expression. B) Distribution of average transcript abundance measured in
fragments per kilobase of exon models per million fragments mapped (FPKM). The whiskers are
plotted at a maximum of 1.5 times the interquartile range away from the end of the box in each
direction. For both A and B, expression was measured in leaf blade, root cortical parenchyma,
germinating kernel, root tip, whole seedling, and root stele. C) Distribution of transcript size in
each class of genes.
Figure 5. Relationship between gene family size and transcriptional variation. Gene families
were determined using homologous gene clustering with B73 and PH207 gene models. Heat
maps show the proportion of genes in families of size N with expression in zero to six tissues,
which include leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and
whole seedling. White color indicates no genes are expressed and red indicates all genes are
expressed.
Figure 6. Relationship between variation in promoter sequence and transcriptional variation. A)
Distribution of percent similarity of promoter sequences between B73 and PH207 genes in the
region 1 kb upstream of the transcription start site for genes that show a one-to-one relationship
in the two genome assemblies. B) Relationship between number of tissues with differential
expression and number of SNPs per kb in the promoter sequence. C) Relationship between
number of tissues with differential expression and number of InDels per kb in the promoter
sequence.
Figure 7. Lessons learned from comparisons between genome assemblies of elite maize inbred
lines. A) Homologous gene clusters between Brachypodium distachyon, maize B73, maize
PH207, Oryza sativa, and Sorghum bicolor that contain a maize gene from only B73 or PH207
and at least one other species. B) Example of a duplicated gene showing differential fractionation
between B73 and PH207. Maize is an ancient tetraploid that has undergone genome
fractionation. A previous study identified and classified genes in the B73 genome into two maize
subgenomes, Maize1 and Maize2 (Schnable et al., 2011). The middle genes in these syntenic
blocks are showing differential fractionation between B73 and PH207, with PH207 retaining
both copies and B73 losing the copy in the Maize2 subgenome (Gene Positions:
GRMZM2G003937 chr2:208,869,363..208,870,921; GRMZM2G129575 chr2:208,887,428..208,891,800;
GRMZM2G454474 chr2:208,993,547..208,994,705; Zm00008a009655 chr2:212,248,102..212,249,027;
Zm00008a009660 chr2:212,373,566..212,385,564; Zm00008a009663 chr2:212,427,728..212,428,519;
GRMZM2G066197 chr7:161,658,677..161,660,464; GRMZM2G179777 chr7:161,843,771..161,845,175;
Zm00008a029217 chr7:161,948,327..161,949,627; Zm00008a029226 chr7:162,105,478..162,112,256;
Zm00008a029228 chr7:162,164,382..162,166,725). C) Example of a gene containing a putative frame
shift that is compensated by alternative intron/exon boundaries when using annotation specific to
the individual. PH207 reads were aligned to the B73 reference genome assembly and putative large
effect variants were identified based on the B73 annotation. The putative frame shift variant
shown in this figure is corrected for by an alternative gene model in PH207 that was identified
using PH207-specific evidence to annotate genes in the PH207 genome assembly.
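The gene positions in the Figure 7 legend follow a "gene chromosome:start..end" pattern, with or without thousands separators. The sketch below shows one way such strings could be parsed into structured records. The regular expression, the `GenePosition` type, and the assumption that coordinates are 1-based and inclusive are illustrative choices, not part of the published methods.

```python
import re
from typing import List, NamedTuple


class GenePosition(NamedTuple):
    gene: str
    chrom: str
    start: int
    end: int

    @property
    def length_bp(self) -> int:
        # Assumes 1-based, inclusive coordinates (an assumption, not stated in the legend).
        return self.end - self.start + 1


# Gene ID, whitespace, chromosome, colon, then start..end with optional commas.
_POSITION = re.compile(
    r"(?P<gene>\S+)\s+(?P<chrom>chr\w+):\s*(?P<start>[\d,]+)\.\.(?P<end>[\d,]+)"
)


def parse_positions(text: str) -> List[GenePosition]:
    """Extract every 'GENE chrN:start..end' record found in the text."""
    records = []
    for match in _POSITION.finditer(text):
        records.append(
            GenePosition(
                gene=match["gene"],
                chrom=match["chrom"],
                start=int(match["start"].replace(",", "")),
                end=int(match["end"].replace(",", "")),
            )
        )
    return records


if __name__ == "__main__":
    legend = (
        "GRMZM2G003937 chr2:208,869,363..208,870,921; "
        "GRMZM2G129575 chr2:208887428..208891800; "
        "Zm00008a009663 chr2: 212,427,728..212,428,519"
    )
    for position in parse_positions(legend):
        print(position.gene, position.chrom, position.start, position.end, position.length_bp)
```

Normalizing the coordinates to integers makes it straightforward to compare the B73 and PH207 intervals within each syntenic block.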
Figure 1. Summary of the de novo PH207 genome assembly. A) Genome-wide comparison of the B73 v2
reference genome assembly and the PH207 de novo genome assembly. The inset shows a zoomed-in view
of the first Mb of chromosome one. Forward matches, plotted first, are colored red and reverse
matches are colored blue. B) Density of annotated genes in the PH207 genome assembly.
[Figure 1 image not reproduced. Axis labels: A) B73 v3 Genome Assembly and PH207 Genome Assembly,
chromosomes 1-10; B) Genomic Position (chromosomes 1-10) and Genes per Mb (0-80).]
Figure 2. Genome content distribution in elite maize inbred lines B73 and PH207. A) Distribution
of gene size for all B73 and PH207 genes and genes showing presence/absence variation (PAV). The
whiskers are plotted at a maximum of 1.5 times the interquartile range away from the end of the
box in each direction. B) Distribution of partial gene deletions in B73 (right) and PH207 (left)
genes relative to the genome in which the gene was annotated based on the ratio of coverage from
resequencing data. C) Genome-wide distribution of B73 and PH207 unique genes.
[Figure 2 image not reproduced. Panel labels: A) Gene Size (bp) for All Genes and PAV Genes in B73
and PH207; B) coverage classes 0%, <25%, 25-75%, and >75% Coverage; C) B73 Unique Genes and PH207
Unique Genes across chromosomes 1-10.]
Figure 3. Variation in gene family size in elite maize inbred lines B73 and PH207. A) Distribution of gene families of the
same size in B73 and PH207. B) Distribution of gene families with moderate changes in gene family size between B73
and PH207. C) Distribution of gene families with extreme changes in gene family size between B73 and PH207. D)
Density plot of the proportion of genes in the inbred line (B73 or PH207) that has expanded gene family members in
syntenic positions relative to the reciprocal inbred line with fewer copies for gene families with extreme changes. E)
Physical location of genes in a family that had only clustered expansion. F) Physical location of genes in a family that
had both clustered and distributed expansion. G) Physical location of genes in a family that had distributed expansion
only. In plots E-G, B73 genes are above the axis, and PH207 genes are below the axis. White circles represent genes
with no introns and black circles represent genes with 1 or more introns.
[Figure 3 image not reproduced. Panel labels: A) N=20,843 Families, Number of Groups by Number of B73 and PH207
Genes (family sizes 1, 2, 3, 4, 5, 6, 7, 9, 10, and 20 with 19,697, 1,040, 74, 18, 6, 4, 1, 1, 1, and 1 groups);
B) N=1,720 Families, Number of B73 Genes and Number of PH207 Genes; C) N=136 Families, Number of Groups by B73 and
PH207 family size classes 1-5, 6-30, and >30; D) Density by Proportion of High Parent Genes at Positions Syntenic
to Low Parent (0.0-1.0).]
Figure 4. Distribution of expression in B73 and PH207 shared and genotype-specific genes. For each
B73 tissue used in this analysis, the average of three biological replicates was used, and the
average of two biological replicates was used for PH207 tissue. For both B73 and PH207, the
biological replicates consisted of pooled tissue from three plants. A) Distribution of number of
tissues with expression. B) Distribution of average transcript abundance measured in fragments per
kilobase of exon models per million fragments mapped (FPKM). The whiskers are plotted at a maximum
of 1.5 times the interquartile range away from the end of the box in each direction. For both A
and B, expression was measured in leaf blade, root cortical parenchyma, germinating kernel, root
tip, whole seedling, and root stele. C) Distribution of transcript size in each class of genes.
[Figure 4 image not reproduced. Gene classes in all panels: PH207 Shared, PH207 Specific, B73
Shared, B73 Specific. Axis labels: A) Density by Number of Tissues with Expression (0-6);
B) Transcript Abundance (FPKM); C) Transcript Size (bp).]
Figure 5. Relationship between gene family size and transcriptional variation. Gene families were
determined using homologous gene clustering with B73 and PH207 gene models. Heat maps show the
proportion of genes in families of size N with expression in zero to six tissues, which include
leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and whole seedling.
White color indicates no genes are expressed and red indicates all genes are expressed.
[Figure 5 image not reproduced. Heat map titles: B73 Genes and PH207 Genes. Axis labels: Gene
Family Size and Number of Tissues with Expression (0-6).]
Figure 6. Relationship between variation in promoter sequence and transcriptional variation. A)
Distribution of percent similarity of promoter sequences between B73 and PH207 genes in the 1 kb
upstream of the transcription start site for genes that show a one-to-one relationship in the two
genome assemblies. B) Relationship between number of tissues with differential expression and
number of SNPs per kb in the promoter sequence. C) Relationship between number of tissues with
differential expression and number of Insertions/Deletions per kb in the promoter sequence.
[Figure 6 image not reproduced. Axis labels: A) Count by Percent Similarity (80-100); B) and
C) Percent of Genes by No. Tissues with Differential Expression (0-6), with legend bins 0-5, 5-20,
>20 and 0-2, 2-5, >5.]
Figure 7. Lessons learned from comparisons between genome assemblies of elite maize inbred lines. A)
Homologous gene clusters between Brachypodium distachyon, maize B73, maize PH207, Oryza sativa,
and Sorghum bicolor that contain a maize gene from only B73 or PH207 and at least one other species.
B) Example of a duplicated gene showing differential fractionation between B73 and PH207. Maize is an
ancient tetraploid that has undergone genome fractionation. A previous study identified and classified
genes in the B73 genome into two maize subgenomes, Maize1 and Maize2 (Schnable et al., 2011). The
middle genes in these syntenic blocks are showing differential fractionation between B73 and PH207,
with PH207 retaining both copies and B73 losing the copy in the Maize2 subgenome (Gene Positions:
GRMZM2G003937 chr2:208,869,363..208,870,921; GRMZM2G129575 chr2:208,887,428..208,891,800;
GRMZM2G454474 chr2:208,993,547..208,994,705; Zm00008a009655 chr2:212,248,102..212,249,027;
Zm00008a009660 chr2:212,373,566..212,385,564; Zm00008a009663 chr2:212,427,728..212,428,519;
GRMZM2G066197 chr7:161,658,677..161,660,464; GRMZM2G179777 chr7:161,843,771..161,845,175;
Zm00008a029217 chr7:161,948,327..161,949,627; Zm00008a029226 chr7:162,105,478..162,112,256;
Zm00008a029228 chr7:162,164,382..162,166,725). C) Example of a gene containing a putative frame
shift that is compensated by alternative intron/exon boundaries when using annotation specific to the
individual. PH207 reads were aligned to the B73 reference genome assembly and putative large effect
variants were identified based on the B73 annotation. The putative frame shift variant shown in this figure
is corrected for by an alternative gene model in PH207 that was identified using PH207-specific evidence
to annotate genes in the PH207 genome assembly.
[Figure 7 image not reproduced. Panel labels: A) Number of Clusters by Taxa Per Cluster (4, 3, 2,
Total), with series B73 Absent From Cluster and PH207 Absent From Cluster; B) "Maize2" Chr2 and
"Maize1" Chr7 tracks for B73 and PH207, with the B73 Chr2 window spanning 208,857 kb to 209,007 kb;
C) a 9,726 bp region showing the GRMZM2G704285 B73 Gene Model, the Zm00008a034742 PH207 Gene Model,
PH207 Read Alignments, B73 Genome Sequence, B73 Protein, B73 Based PH207 Protein, and PH207 Protein.]
Parsed Citations
Abendroth, L.J., Elmore, R.W., Boyer, M.J., and Marlay, S.K. (2011). Corn growth and development. PMR 1009. Iowa State University
Extension, Ames Iowa.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990a). Basic local alignment search tool. J Mol Biol 215, 403-410.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990b). Basic Local Alignment Search Tool. Journal of Molecular
Biology 215, 403-410.
Anders, S., Pyl, P.T., and Huber, W. (2015). HTSeq-a Python framework to work with high-throughput sequencing data.
Bioinformatics 31, 166-169.
Anderson, J.E., Kantar, M.B., Kono, T.Y., Fu, F., Stec, A.O., Song, Q., Cregan, P.B., Specht, J.E., Diers, B.W., Cannon, S.B., McHale,
L.K., and Stupar, R.M. (2014). A roadmap for functional structural variants in the soybean genome. G3 4, 1307-1318.
Arabidopsis Genome, I. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Belo, A., Beatty, M.K., Hondred, D., Fengler, K.A., Li, B., and Rafalski, A. (2010). Allelic genome structural variations in maize
detected by array comparative genome hybridization. Theor Appl Genet 120, 355-367.
Birchler, J.A., Auger, D.L., and Riddle, N.C. (2003). In search of the molecular basis of heterosis. Plant Cell 15, 2236-2239.
Birchler, J.A., Yao, H., and Chudalayandi, S. (2006). Unraveling the genetic basis of hybrid vigor. Proc Natl Acad Sci U S A 103,
12957-12958.
Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., and Veitia, R.A. (2010). Heterosis. Plant Cell 22, 2105-2112.
Brunner, S., Fengler, K., Morgante, M., Tingey, S., and Rafalski, A. (2005). Evolution of DNA sequence nonhomologies among
maize inbreds. Plant Cell 17, 343-360.
Campbell, M.S., Holt, C., Moore, B., and Yandell, M. (2014a). Genome Annotation and Curation Using MAKER and MAKER-P. Curr
Protoc Bioinformatics 48, 4.11.1-4.11.39.
Campbell, M.S., Law, M., Holt, C., Stein, J.C., Moghe, G.D., Hufnagel, D.E., Lei, J., Achawanantakun, R., Jiao, D., Lawrence, C.J.,
Ware, D., Shiu, S.H., Childs, K.L., Sun, Y., Jiang, N., and Yandell, M. (2014b). MAKER-P: a tool kit for the rapid creation,
management, and quality control of plant genome annotations. Plant Physiol 164, 513-524.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., Wang, X., Ott,
F., Muller, J., Alonso-Blanco, C., Borgwardt, K., Schmid, K.J., and Weigel, D. (2011). Whole-genome sequencing of multiple
Arabidopsis thaliana populations. Nat Genet.
Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. (2007). Assessing Performance of Orthology Detection Strategies Applied to
Eukaryotic Genomes. PLoS ONE 2, e383.
Chia, J.M., Song, C., Bradbury, P.J., Costich, D., de Leon, N., Doebley, J., Elshire, R.J., Gaut, B., Geller, L., Glaubitz, J.C., Gore, M.,
Guill, K.E., Holland, J., Hufford, M.B., Lai, J., Li, M., Liu, X., Lu, Y., McCombie, R., Nelson, R., Poland, J., Prasanna, B.M., Pyhajarvi,
T., Rong, T., Sekhon, R.S., Sun, Q., Tenaillon, M.I., Tian, F., Wang, J., Xu, X., Zhang, Z., Kaeppler, S.M., Ross-Ibarra, J., McMullen,
M.D., Buckler, E.S., Zhang, G., Xu, Y., and Ware, D. (2012). Maize HapMap2 identifies extant variation from a genome in flux. Nat
Genet 44, 803-807.
Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., and Ruden, D.M. (2012). A program for
annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila
melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92.
Cook, D.E., Lee, T.G., Guo, X., Melito, S., Wang, K., Bayless, A.M., Wang, J., Hughes, T.J., Willis, D.K., Clemente, T.E., Diers, B.W.,
Jiang, J., Hudson, M.E., and Bent, A.F. (2012). Copy number variation of multiple genes at Rhg1 mediates nematode resistance in
soybean. Science 338, 1206-1209.
Dong, Q., Schlueter, S.D., and Brendel, V. (2004). PlantGDB, plant genome database and analysis tools. Nucleic acids research 32,
D354-359.
Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic acids
research 38, W64-70.
Eddy, S.R. (2011). Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195.
Eichten, S.R., Foerster, J.M., de Leon, N., Kai, Y., Yeh, C.T., Liu, S., Jeddeloh, J.A., Schnable, P.S., Kaeppler, S.M., and Springer,
N.M. (2011). B73-Mo17 near-isogenic lines demonstrate dispersed structural variation in maize. Plant Physiol 156, 1679-1690.
Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated
genomes. BMC Bioinformatics 10, 67.
Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J.,
Sonnhammer, E.L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research 42, D222-230.
Fu, H., and Dooner, H.K. (2002). Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci U S A
99, 9573-9578.
Gaines, T.A., Zhang, W., Wang, D., Bukun, B., Chisholm, S.T., Shaner, D.L., Nissen, S.J., Patzoldt, W.L., Tranel, P.J., Culpepper, A.S.,
Grey, T.L., Webster, T.M., Vencill, W.K., Sammons, R.D., Jiang, J., Preston, C., Leach, J.E., and Westra, P. (2010). Gene amplification
confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad Sci U S A 107, 1029-1034.
Gan, X., Stegle, O., Behr, J., Steffen, J.G., Drewe, P., Hildebrand, K.L., Lyngsoe, R., Schultheiss, S.J., Osborne, E.J., Sreedharan,
V.T., Kahles, A., Bohnert, R., Jean, G., Derwent, P., Kersey, P., Belfield, E.J., Harberd, N.P., Kemen, E., Toomajian, C., Kover, P.X.,
Clark, R.M., Ratsch, G., and Mott, R. (2011). Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477,
419-423.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K.,
Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G.,
Tierney, L., Yang, J.Y., and Zhang, J. (2004). Bioconductor: open software development for computational biology and
bioinformatics. Genome Biol 5, R80.
Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., and
Rokhsar, D.S. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic acids research 40, D1178-1186.
Gore, M.A., Chia, J.M., Elshire, R.J., Sun, Q., Ersoz, E.S., Hurwitz, B.L., Peiffer, J.A., McMullen, M.D., Grills, G.S., Ross-Ibarra, J.,
Ware, D.H., and Buckler, E.S. (2009). A first-generation haplotype map of maize. Science 326, 1115-1117.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q.,
Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N.,
and Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652.
Hansey, C.N., Vaillancourt, B., Sekhon, R.S., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2012). Maize (Zea mays L.) Genome
Diversity as Revealed by RNA-Sequencing. PLoS ONE 7, e33071.
Hardigan, M.A., Crisovan, E., Hamilton, J.P., Kim, J., Laimbeer, P., Leisner, C.P., Manrique-Carpintero, N.C., Newton, L., Pham,
G.M., Vaillancourt, B., Yang, X., Zeng, Z., Douches, D., Jiang, J., Veilleux, R.E., and Buell, C.R. (2016). Genome reduction uncovers
a large dispensable genome and adaptive role for copy number variation in asexually propagated Solanum tuberosum. The Plant
Cell.
Hirakawa, H., Okada, Y., Tabuchi, H., Shirasawa, K., Watanabe, A., Tsuruoka, H., Minami, C., Nakayama, S., Sasamoto, S., Kohara, M.,
Kishida, Y., Fujishiro, T., Kato, M., Nanri, K., Komaki, A., Yoshinaga, M., Takahata, Y., Tanaka, M., Tabata, S., and Isobe, S.N. (2015).
Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Research 22, 171-179.
Hirsch, C.D., Springer, N.M., and Hirsch, C.N. (2015). Genomic Limitations to RNAseq Expression Profiling. The Plant Journal 84,
491-503.
Hirsch, C.N., Foerster, J.M., Johnson, J.M., Sekhon, R.S., Muttoni, G., Vaillancourt, B., Penagaricano, F., Lindquist, E., Pedraza,
M.A., Barry, K., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2014). Insights into the maize pan-genome and pan-transcriptome.
Plant Cell 26, 121-135.
International Brachypodium, I. (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463,
763-768.
International Rice Genome Sequencing, P. (2005). The map-based sequence of the rice genome. Nature 436, 793-800.
Kawahara, Y., de la Bastide, M., Hamilton, J.P., Kanamori, H., McCombie, W.R., Ouyang, S., Schwartz, D.C., Tanaka, T., Wu, J., Zhou,
S., Childs, K.L., Davidson, R.M., Lin, H., Quesada-Ocampo, L., Vaillancourt, B., Sakai, H., Lee, S.S., Kim, J., Numa, H., Itoh, T., Buell,
C.R., and Matsumoto, T. (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence
and optical map data. Rice (N Y) 6, 4.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes
in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics 5, 59.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software
for comparing large genomes. Genome Biol 5, R12.
Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., Zhang, M., Jiao, Y., Ni, P., Zhang, J., Li, D., Guo, X., Ye, K.,
Jian, M., Wang, B., Zheng, H., Liang, H., Zhang, X., Wang, S., Chen, S., Li, J., Fu, Y., Springer, N.M., Yang, H., Wang, J., Dai, J., and
Schnable, P.S. (2010). Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet 42, 1027-1030.
Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., Muller, R., Dreher, K., Alexander, D.L.,
Garcia-Hernandez, M., Karthikeyan, A.S., Lee, C.H., Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E. (2012). The
Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research 40, D1202-1210.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359.
Law, M., Childs, K.L., Campbell, M.S., Stein, J.C., Olson, A.J., Holt, C., Panchy, N., Lei, J., Jiao, D., Andorf, C.M., Lawrence, C.J.,
Ware, D., Shiu, S.H., Sun, Y., Jiang, N., and Yandell, M. (2015). Automated update, revision, and quality control of the maize genome
annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol 167, 25-39.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence
Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res
13, 2178-2189.
Lindenbaum, P. (2015). JVarkit: java-based utilities for Bioinformatics. figshare http://dx.doi.org/10.6084/m9.figshare.1425030.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
Genome Biol 15, 550.
Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang, T., Li, Y., Li, Y., Semagn, K., Zhang, X., Hernandez, A.G.,
Mikel, M.A., Soifer, I., Barad, O., and Buckler, E.S. (2015). High-resolution genetic mapping of maize pan-genome sequence
anchors. Nature communications 6, 6914.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu,
C., Wang, B., Lu, Y., Han, C., Cheung, D.W., Yiu, S.M., Peng, S., Xiaoqian, Z., Liu, G., Liao, X., Li, Y., Yang, H., Wang, J., Lam, T.W.,
and Wang, J. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18.
Magoc, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics
27, 2957-2963.
Maron, L.G., Guimaraes, C.T., Kirst, M., Albert, P.S., Birchler, J.A., Bradbury, P.J., Buckler, E.S., Coluccio, A.E., Danilova, T.V.,
Kudrna, D., Magalhaes, J.V., Pineros, M.A., Schatz, M.C., Wing, R.A., and Kochian, L.V. (2013). Aluminum tolerance in maize is
associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A 110, 5241-5246.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10-12.
Mikel, M.A. (2011). Genetic Composition of Contemporary U.S. Commercial Dent Corn Germplasm. Crop Sci. 51, 592-599.
Monaco, M.K., Stein, J., Naithani, S., Wei, S., Dharmawardhana, P., Kumari, S., Amarasinghe, V., Youens-Clark, K., Thomason, J.,
Preece, J., Pasternak, S., Olson, A., Jiao, Y., Lu, Z., Bolser, D., Kerhornou, A., Staines, D., Walts, B., Wu, G., D'Eustachio, P., Haw,
R., Croft, D., Kersey, P.J., Stein, L., Jaiswal, P., and Ware, D. (2014). Gramene 2013: comparative plant genomics resources. Nucleic
acids research 42, D1193-1199.
Morgante, M., De Paoli, E., and Radovic, S. (2007). Transposable elements and the plant pan-genomes. Curr Opin Plant Biol 10,
149-155.
Murray, M.G., and Thompson, W.F. (1980). Rapid isolation of high molecular weight plant DNA. Nucleic acids research 8, 4321-4325.
Ossowski, S., Schneeberger, K., Clark, R.M., Lanz, C., Warthmann, N., and Weigel, D. (2008). Sequencing of natural strains of
Arabidopsis thaliana with short reads. Genome Res 18, 2024-2033.
Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., Orvis, J.,
Haas, B., Wortman, J., and Buell, C.R. (2007). The TIGR Rice Genome Annotation Resource: improvements and new features.
Nucleic acids research 35, D883-887.
Park, S.J., Jiang, K., Tal, L., Yichie, Y., Gar, O., Zamir, D., Eshed, Y., and Lippman, Z.B. (2014). Optimization of crop productivity in
tomato using induced mutations in the florigen pathway. Nat Genet 46, 1337-1342.
Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.
Bioinformatics 23, 1061-1067.
Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T.,
Poliakov, A., Schmutz, J., Spannagl, M., Tang, H., Wang, X., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U., Grigoriev,
I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P., Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L., Carpita, N.C.,
Freeling, M., Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming, R., Peterson, D.G., Mehboob-ur-Rahman,
Ware, D., Westhoff, P., Mayer, K.F., Messing, J., and Rokhsar, D.S. (2009). The Sorghum bicolor genome and the diversification of
grasses. Nature 457, 551-556.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.
R Development Core Team. (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria.
Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban,
E., Wright, M.H., Chia, J.M., Ware, D., McCouch, S.R., and McCombie, W.R. (2014). Whole genome de novo assemblies of three
divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 15, 506.
Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both
ancient and ongoing gene loss. Proc Natl Acad Sci U S A 108, 4069-4074.
Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., Minx, P., Reily,
A.D., Courtney, L., Kruchowski, S.S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S.M., Belter, E., Du,
F., Kim, K., Abbott, R.M., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S.M., Gillam, B., Chen, W., Yan, L., Higginbotham,
J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J.,
Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K.,
Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R.,
Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J.,
Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B.,
Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M.J., McMahan, L., Van Buren, P.,
Vaughn, M.W., Ying, K., Yeh, C.-T., Emrich, S.J., Jia, Y., Kalyanaraman, A., Hsia, A.-P., Barbazuk, W.B., Baucom, R.S., Brutnell, T.P.,
Carpita, N.C., Chaparro, C., Chia, J.-M., Deragon, J.-M., Estill, J.C., Fu, Y., Jeddeloh, J.A., Han, Y., Lee, H., Li, P., Lisch, D.R., Liu, S.,
Liu, Z., Nagel, D.H., McCann, M.C., SanMiguel, P., Myers, A.M., Nettleton, D., Nguyen, J., Penning, B.W., Ponnala, L., Schneider,
K.L., Schwartz, D.C., Sharma, A., Soderlund, C., Springer, N.M., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T.K.,
Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J.L., Dawe, R.K., Jiang, J., Jiang, N., Presting, G.G., Wessler, S.R., Aluru,
S., Martienssen, R.A., Clifton, S.W., McCombie, W.R., Wing, R.A., and Wilson, R.K. (2009). The B73 Maize Genome: Complexity,
Diversity, and Dynamics. Science 326, 1112-1115.
Smit, A., Hubley, R., and Green, P. (2013-2015). RepeatMasker Open-4.0 http://www.repeatmasker.org.
Song, R., and Messing, J. (2003). Gene expression of a gene family in maize based on noncollinear haplotypes. Proc Natl Acad Sci
U S A 100, 9055-9060.
Springer, N.M., and Stupar, R.M. (2007). Allelic variation and heterosis in maize: how do two halves make more than a whole?
Genome Res 17, 264-275.
Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A.L., Barbazuk,
W.B., Jeddeloh, J.A., Nettleton, D., and Schnable, P.S. (2009). Maize inbreds exhibit high levels of copy number variation (CNV) and
presence/absence variation (PAV) in genome content. PLoS Genet 5, e1000734.
Stanke, M., and Waack, S. (2003). Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19
Suppl 2, ii215-225.
Stelpflug, S.C., Sekhon, R.S., Vaillancourt, B., Hirsch, C.N., Buell, C.R., de Leon, N., and Kaeppler, S.M. (2016). An Expanded Maize
Gene Expression Atlas based on RNA Sequencing and its Use to Explore Root Development. The Plant Genome 9.
Sutton, T., Baumann, U., Hayes, J., Collins, N.C., Shi, B.J., Schnurbusch, T., Hay, A., Mayo, G., Pallotta, M., Tester, M., and
Langridge, P. (2007). Boron-toxicity tolerance in barley arising from efflux transporter amplification. Science 318, 1446-1449.
Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C.H. (2007). UniRef: comprehensive and non-redundant UniProt
reference clusters. Bioinformatics 23, 1282-1288.
Swanson-Wagner, R.A., Eichten, S.R., Kumari, S., Tiffin, P., Stein, J.C., Ware, D., and Springer, N.M. (2010). Pervasive gene content
variation and copy number variation in maize and its undomesticated progenitor. Genome Res 20, 1689-1699.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010).
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell
differentiation. Nat Biotechnol 28, 511-515.
Troyer, A.F. (1999). Background of U.S. Hybrid Corn. Crop Sci. 39, 601-626.
Troyer, A.F. (2006). Adaptedness and Heterosis in Corn and Mule Hybrids. Crop Sci. 46, 528-543.
UniProt Consortium. (2014). Activities at the Universal Protein Resource (UniProt). Nucleic acids research 42, D191-198.
Wang, Y., Xiong, G., Hu, J., Jiang, L., Yu, H., Xu, J., Fang, Y., Zeng, L., Xu, E., Xu, J., Ye, W., Meng, X., Liu, R., Chen, H., Jing, Y.,
Wang, Y., Zhu, X., Li, J., and Qian, Q. (2015). Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat
Genet.
Weber, R.L., Wiebke-Strohm, B., Bredemeier, C., Margis-Pinheiro, M., de Brito, G.G., Rechenmacher, C., Bertagnolli, P.F., de Sa,
M.E., Campos Mde, A., de Amorim, R.M., Beneventi, M.A., Margis, R., Grossi-de-Sa, M.F., and Bodanese-Zanettini, M.H. (2014).
Expression of an osmotin-like protein from Solanum nigrum confers drought tolerance in transgenic soybean. BMC plant biology
14, 343.
Weigel, D., and Mott, R. (2009). The 1001 genomes project for Arabidopsis thaliana. Genome Biol 10, 107.
Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences.
Bioinformatics 21, 1859-1875.
Xu, H., Luo, X., Qian, J., Pang, X., Song, J., Qian, G., Chen, J., and Chen, S. (2012). FastUniq: a fast de novo duplicates removal tool
for paired short reads. PLoS One 7, e52249.
Yandell, M., and Ence, D. (2012). A beginner's guide to eukaryotic genome annotation. Nat Rev Genet 13, 329-342.
Yao, H., Dogra Gray, A., Auger, D.L., and Birchler, J.A. (2013). Genomic dosage effects on heterosis in triploid maize. Proc Natl
Acad Sci U S A 110, 2665-2669.
Zdobnov, E.M., and Apweiler, R. (2001). InterProScan--an integration platform for the signature-recognition methods in InterPro.
Bioinformatics 17, 847-848.
Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.P., and Wang, L. (2014). CrossMap: a versatile tool for coordinate conversion
between genome assemblies. Bioinformatics 30, 1006-1007.