Plant Cell Advance Publication. Published on November 1, 2016, doi:10.1105/tpc.16.00353

LARGE-SCALE BIOLOGY ARTICLE

Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize

Candice N. Hirsch (a,1), Cory D. Hirsch (b), Alex B. Brohammer (a), Megan J. Bowman (c), Ilya Soifer (d), Omer Barad (e), Doron Shem-Tov (e), Kobi Baruch (e), Fei Lu (f), Alvaro G. Hernandez (g), Christopher J. Fields (g), Chris L. Wright (g), Klaus Koehler (h), Nathan M. Springer (i), Edward Buckler (f,j), C. Robin Buell (c,k), Natalia de Leon (l,m), Shawn M. Kaeppler (l,m), Kevin L. Childs (c,n), Mark A. Mikel (g,o)

a Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
b Department of Plant Pathology, University of Minnesota, St. Paul, MN 55108
c Department of Plant Biology, Michigan State University, East Lansing, MI 48824
d Calico Labs, San Francisco, CA, 94080
e NRGENE LTD., Ness-Ziona, Israel, 7403648
f Institute for Genome Diversity, Cornell University, Ithaca NY 14850
g Roy J. Carver Biotechnology Center, University of Illinois, Urbana, IL 61801
h Dow AgroSciences, Indianapolis, IN 46268
i Department of Plant Biology, University of Minnesota, St. Paul, MN 55108
j United States Department of Agriculture/Agricultural Research Services, Ithaca NY 14850
k DOE Great Lakes Bioenergy Research Center, East Lansing, MI, 48824
l Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706
m DOE Great Lakes Bioenergy Research Center, Madison, WI, 53706
n Center for Genomics-Enabled Plant Sciences, Michigan State University, East Lansing, MI 48824
o Department of Crop Sciences, University of Illinois, Urbana, IL 61801
1 Corresponding Author: [email protected]

Short title: PH207 genome assembly

One-sentence summary: Comparative analyses of the maize reference B73 genome assembly and the newly assembled PH207 genome and their transcriptomes provide insights into variation between heterotic groups of elite maize.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Candice Hirsch ([email protected])

©2016 American Society of Plant Biologists. All Rights Reserved

ABSTRACT

Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2,500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7,128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.

INTRODUCTION

Access to genome assemblies has ushered in a new era in plant biology beginning with the genome assembly of the model species Arabidopsis thaliana (Arabidopsis Genome, 2000). Since that time, over 80 A.
thaliana genome assemblies have been completed, revealing genome content and allelic variation, the importance of genotype-specific annotation, and the regulation of gene expression (http://1001genomes.org; Gan et al., 2011). Rice (Oryza sativa) is the only plant species other than Arabidopsis with a ‘gold standard’ genome assembly (International Rice Genome Sequencing, 2005). As with A. thaliana, multiple de novo rice genome assemblies have now been generated, and through these assemblies, several megabases of sequences present in only one individual have been identified, many of which were annotated to contain genes (Schatz et al., 2014). Beyond these studies, extensive structural variation including differences in gene copy number and presence/absence variation has been documented in many species (Brunner et al., 2005; Morgante et al., 2007; Ossowski et al., 2008; Gore et al., 2009; Springer et al., 2009; Weigel and Mott, 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Cao et al., 2011; Gan et al., 2011; Chia et al., 2012; Hansey et al., 2012; Anderson et al., 2014; Hirsch et al., 2014; Schatz et al., 2014; Hirakawa et al., 2015; Hardigan et al., 2016).

In maize (Zea mays), comparative genome hybridization revealed pervasive copy number variation (CNV) and presence/absence variation (PAV) between two heterotic inbred lines (B73 and Mo17), including a ~2.6 Mb region on chromosome 6 that was present in B73 and absent in Mo17 (Springer et al., 2009; Belo et al., 2010). In an expanded panel of 34 maize and teosinte lines examined using comparative genome hybridization, nearly 4,000 instances of CNV/PAV were observed (Swanson-Wagner et al., 2010).
Likewise, the first-generation maize HapMap (haplotype map) estimated that the B73 genome represents only about 70% of the total low copy sequence in the maize pan-genome (Gore et al., 2009), and the second-generation maize HapMap also identified pervasive structural variation (Chia et al., 2012). Re-sequencing of six elite maize inbred lines identified several hundred genes that exhibit PAV, and in some cases, show heterotic group specificity (Lai et al., 2010). Transcriptome profiling of maize lines has also revealed extensive differences in transcriptome content. Profiling of 21 diverse inbred lines identified 1,321 non-reference loci, of which 145 were heterotic-group specific (Hansey et al., 2012). Transcript profiling of seedling tissue from 503 diverse inbred lines was used to characterize the maize pan-genome and pan-transcriptome (Hirsch et al., 2014), and nearly 9,000 novel loci absent in the B73 reference genome assembly were identified. Furthermore, a genome-wide association study using transcript presence/absence and transcript abundance as the independent variable showed complementary and unique loci compared to those obtained from a genome-wide association study using single nucleotide polymorphism (SNP) markers (Hirsch et al., 2014).

Maize has been intensively bred in the hybrid seed industry for nearly a century. Breeding efforts in maize focus on improving inbred lines that combine well to produce high-yielding hybrids and have resulted in the development of distinct heterotic germplasm pools. Indeed, elite maize inbred lines have been generated that combine to form hybrid genotypes that yield 70-fold more than the open pollinated varieties from which they originated (Troyer, 2006). Increases in yield performance continue at a consistent rate, indicating persistence of extensive genetic variation in elite maize germplasm.
Maize heterotic pools consist primarily of a stiff stalk heterotic pool, from which the existing reference maize genome assembly, B73, is derived, and non-stiff stalk heterotic pools. In the past 20 years, Iodent germplasm has been a main contributor to the non-stiff stalk heterotic pool. Iodent germplasm used today originated from Pioneer Hi-Bred International germplasm in the 1940s (Troyer, 1999) and now permeates proprietary breeding programs. A key founder line of Iodent germplasm is the inbred line PH207. The hybrid generated by crossing B73 and PH207 produces high-parent heterosis across a number of phenotypic traits (Table 1). Among the 788 U.S. Plant Variety Protection/utility patent commercial corn inbred lines registered between 2009 and 2013, B73 (stiff stalk) was in the pedigree of 327 lines and PH207 (Iodent) was in the pedigree of 441 lines (Mikel, 2011). Among stiff stalks, 92% had B73 as an ancestor, and 91% of the non-stiff stalk lines were descendants of the Iodent PH207.

A de novo genome assembly of the maize inbred line B73 was released in 2009 (Schnable and Ware et al., 2009), and to date this has been the only publicly available complete maize whole genome assembly. Previous studies of diversity in maize have been conducted in the context of this single reference genome assembly, which has the potential to introduce a reference bias. Here, we present the assembly of elite inbred line PH207, as well as a comparative analysis between the genomes and transcriptomes of these elite maize inbred lines, to further our understanding of natural variation present in elite maize inbred lines that have been selected to combine and produce high-yielding hybrids.
This study provides a high-quality assembly to interrogate genome and transcriptome variation in maize, an important crop and model species, and to begin to understand how this genomic variation contributes to the phenotypic diversity and heterosis that is observed between elite inbred lines.

RESULTS

Assembly of the PH207 genome

Over 550 Gb of short read sequences from the inbred line PH207 were generated through whole-genome shotgun sequencing using paired-end (PE), mate-pair (MP), and TruSeq synthetic long-read genomic libraries with estimated fragment sizes ranging from 330 bp to 15 kb (Supplemental Table 1). On the basis of 23-mer analysis, the genome size of PH207 was estimated to be ~2.45 Gb (Supplemental Figure 1), which is comparable to the 2.3 Gb estimated genome size of B73 reported by the maize genome sequencing consortium (Schnable and Ware et al., 2009). Using the estimated size of 2.45 Gb, the short read sequences generated for the PH207 assembly provide ~230X theoretical coverage of the genome.

A de novo assembly with an N50 scaffold size of ~654 kb with approximately 16% unfilled gaps was generated (Supplemental Table 2). Most of the gaps spanned repetitive regions that could not be assembled into scaffolds. The total size of the assembly was 2.1 Gb, of which 1.7 Gb was ungapped genomic sequence and 0.4 Gb was unfilled gaps. The PH207 de novo assembled scaffolds were then organized into pseudomolecules based on the B73 reference genome assembly using alignment of the scaffolds to the B73 assembly. In total, 1.99 Gb of the PH207 assembly was placed into the pseudomolecules based on alignment to the B73 reference assembly. Lu et al. (2015) described a genotyping-by-sequencing anchoring pipeline that utilized linkage disequilibrium mapping coupled with machine learning to generate a set of 4.4M tags that can be used as genetic anchors.
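As a rough check on the assembly figures reported in this section, theoretical coverage is the total number of sequenced bases divided by the estimated genome size, and the scaffold N50 is the scaffold length at which the cumulative length, counted from the longest scaffold down, first reaches half of the assembly. The sketch below is illustrative only: the function names and the toy scaffold lengths are assumptions made here, not part of the pipeline used to assemble PH207.

```python
from typing import Iterable


def theoretical_coverage(total_bases: float, genome_size: float) -> float:
    """Return sequencing depth as total sequenced bases over genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Return the scaffold length at which the cumulative length,
    summed from longest to shortest, first reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half")


# Lower bound from the text: 550 Gb of reads over a 2.45 Gb genome estimate.
coverage = theoretical_coverage(550e9, 2.45e9)

# Toy scaffold lengths (hypothetical values, not PH207 data).
toy_n50 = n50([10, 20, 30, 40, 50, 70, 80])
```

Taking 550 Gb as a lower bound gives a ratio of about 224X, which is in line with the ~230X reported for "over 550 Gb" of sequence.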
These 4.4M tags provided a median resolution of approximately 10 kb (average tag every 500 bp) and have been shown to be accurate for a given line 95% of the time, with the other 5% most likely reflecting a species consensus position that is different from the line. Overall, about 54% of the anchors are within 10 kb of their true position, 95% within 1 Mb, and 98.6% within 10 Mb (Lu et al., 2015). To assess the quality of the assembly, the PH207 de novo assembled scaffolds were processed using the pan-genome genetic anchor pipeline (Lu et al., 2015) to flag scaffolds that may be the result of chimeras from disparate chromosome locations; approximately 21% of the tags could be aligned. Using this method, as well as the alignments of the PH207 scaffolds to the B73 genome, only 120 scaffolds were identified as putative misassemblies and were corrected by splitting them at the junction of the putative misassembly. This marginally decreased the assembly N50 scaffold size to ~630 kb (Supplemental Table 2). This final assembly has been named Zm-PH207-REFERENCE_NS-UIUC_UMN-1.0 and is available for download at the Dryad Digital Repository (DOI 10.5061/dryad.8vj84), at the Maize Genetics and Genomics Database (MaizeGDB; http://www.maizegdb.org), and at Phytozome (https://phytozome.jgi.doe.gov). Comparison of the B73 reference genome assembly and the final PH207 genome assembly revealed numerous structural variants between the two genomes. However, few large gaps in the assemblies were observed (Figure 1A).

Several approaches were used to further validate the PH207 assembly completeness and error rate. Genomic sequence reads used to generate the assembly were aligned back to the assembly, and 97.1% of the paired-end short read sequences could align to at least one position in the genome, demonstrating the completeness of the assembly.
For the B73 reference genome assembly, comparable mapping was observed, with 98.0% of paired-end short read sequences aligning to at least one position in the genome. From the alignment of the paired-end short read sequences to the PH207 assembly, only 33,812 SNPs or insertion/deletions (InDels) were identified, which equates to less than 0.002% of the total genome assembly and indicates a low error rate in the assembly. To evaluate completeness of the genic space, two approaches were used: alignment of RNAseq reads to the assembly and the Core Eukaryotic Genes Mapping Approach pipeline (Parra et al., 2007). From alignment of RNAseq reads from six PH207 tissues (leaf blade, root cortical parenchyma, germinating kernel, root tip, whole seedling, and root stele), 96.3% of reads on average could map to at least one position in the genome, and 87.1% mapped to a single unique position (Supplemental Figure 2). Alignment of B73 RNAseq reads from these same tissues to the B73 genome assembly resulted in 92.2% of reads mapping to at least one position on average. The Core Eukaryotic Genes Mapping Approach pipeline was run for both the PH207 and B73 assemblies and revealed similar representation of the gene space between the assemblies (PH207: 91.5% complete genes and 99.2% partial genes; B73: 91.5% complete genes and 98.4% partial genes). Taken together, these analyses demonstrate the high quality of the PH207 genome assembly and support its utility in downstream comparative analyses with the existing B73 reference genome assembly.

Gene annotation of the PH207 genome

Structural gene annotation of the PH207 genome assembly was performed on all PH207 scaffolds greater than 500 bp using the MAKER-P pipeline (Campbell et al., 2014b).
De novo PH207 transcript assemblies of RNAseq reads from six different tissues (leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, whole seedling) were used as transcript evidence to aid in gene identification. Additionally, the predicted O. sativa proteome and UniProtKB/Swiss-Prot plant proteins (minus Z. mays sp. proteins) were aligned to the PH207 genome assembly and used as evidence for gene prediction (Kawahara et al., 2013; UniProt Consortium, 2014). Evidence from other maize accessions was not used, so that the annotation would be genotype specific and bias in downstream comparisons between the PH207 and B73 gene space would be reduced. In total, we identified 40,557 high-quality gene models in PH207 supported by aligned transcript evidence, protein evidence, or the presence of a Pfam domain (Finn et al., 2014). These gene models were distributed throughout the genome (Figure 1B) in a pattern similar to that observed in the B73 reference genome assembly (Schnable and Ware et al., 2009; Law et al., 2015). The annotation edit distance analysis of the predicted gene set indicated that the gene annotations were generally well supported (Supplemental Figure 3) (Eilbeck et al., 2009; Yandell and Ence, 2012). The annotated transcripts from the PH207 gene set had an average length of 1,294 bp, with an N50 size of 1,685 bp, slightly smaller than the average annotated transcript in the B73 v3 filtered gene set (1,559 bp with an N50 size of 1,910 bp). This difference is likely due to better representation of the untranslated regions in B73 resulting from the extensive transcript evidence that was used for the B73 annotation.
Indeed, the average predicted protein length of PH207 (353 amino acids) is nearly identical to that of B73 (352 amino acids), suggesting that the PH207 annotation represents coding sequence similar to that of B73. Regardless, the number of annotated genes in PH207 was very similar to that of the B73 v3 gene set, which contains 39,301 nuclear genes.

Putative functional annotation of the predicted PH207 gene models was generated using transitive annotations based on the best BLAST hit to the A. thaliana (33,388), Brachypodium distachyon (37,364), O. sativa (37,201), and Sorghum bicolor (37,998) reference genome annotations and the UniRef100 database (38,710). The functional annotations for many of the genes in these plant species are derived from A. thaliana annotation directly or indirectly. Thus, a plurality of agreement among datasets can in some cases be misleading for assigning functional annotations. Additionally, the multigene family nature of many genes in plant species can result in misannotation based on transitive annotation from best BLAST hits. Gene Ontology terms were also assigned to 18,430 of the PH207 gene models (Supplemental Data Set 1).

Elite maize inbred lines exhibit extensive genome content variation including massive expansion of gene families

Access to two high-quality genome assemblies allowed us to comprehensively explore genome content variation between heterotic maize inbred lines, both in terms of presence/absence as well as copy number variation. To identify dispensable genes between these two lines, representative transcript sequences (longest transcript) from each genotype were aligned to both genome assemblies. Within the subset of transcripts that could map to their cognate genome, 5,291 B73 transcripts did not align to PH207 and 5,029 PH207 transcripts did not align to B73.
While both the B73 assembly and our PH207 assembly are high quality, these assemblies are not 100% complete (Schnable and Ware et al., 2009; Lai et al., 2010; Hansey et al., 2012; Hirsch et al., 2014). As such, direct comparison of the genome assemblies overestimates the number of PAVs between the two inbred lines. Indeed, for approximately half of the PAVs identified through direct comparison of the assemblies, sequence reads from whole genome sequencing did not support the PAV classification.

To determine how many genes represent sequences absent from the reciprocal genome and to provide a conservative estimation of genome PAV between B73 and PH207, all genes from each genotype were categorized as present or absent based on the resequencing data (~11x average realized coverage from B73 and ~5x average realized coverage from PH207). If a gene had greater than 75% coverage in its cognate genotype and less than 25% coverage in the reciprocal genotype at a minimum of 1x coverage, it was considered to be a genotype-specific gene. This is a very strict criterion and eliminated a number of genes that are unmappable with short read sequencing technologies; however, it provides a high confidence method for identifying dispensable genes. Based on this criterion, 1,169 genes were B73 specific and 1,545 were PH207 specific. Cellular response to stress was among the enriched Gene Ontology terms within this set of PAV genes (Supplemental Data Set 2). Additionally, dispensable genes tended to be shorter than genes that were present in both genotypes (Figure 2A). While a large number of genes demonstrated clear PAV, there are many genes that appear to be partial deletions in the reciprocal genome (Supplemental Figure 4). Indeed, 12.2% of the B73 genes were partially deleted in PH207 and 16.2% of the PH207 genes were partially deleted in B73 (Figure 2B).
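The genotype-specific criterion described above can be sketched as a breadth-of-coverage test: the fraction of positions in a gene model covered at 1x or more is computed in each genotype, and the gene is called genotype specific when that fraction is above 75% in the cognate genotype and below 25% in the reciprocal genotype. This is a minimal sketch under assumptions made here: per-base depth lists are assumed to be available for each gene, the function names are hypothetical, and the "shared" class (above 75% in both genotypes) is an illustrative addition rather than a definition taken from the study.

```python
from typing import Sequence


def breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions in a gene model covered at min_depth or more."""
    if not depths:
        raise ValueError("gene model has no positions")
    return sum(d >= min_depth for d in depths) / len(depths)


def classify_gene(
    cognate_depths: Sequence[int],
    reciprocal_depths: Sequence[int],
    present: float = 0.75,   # threshold stated in the text
    absent: float = 0.25,    # threshold stated in the text
) -> str:
    """Classify one gene from per-base read depth in both genotypes."""
    cognate = breadth(cognate_depths)
    reciprocal = breadth(reciprocal_depths)
    if cognate > present and reciprocal < absent:
        return "genotype-specific"
    if cognate > present and reciprocal > present:
        return "shared"  # assumed class, not defined in the text
    return "unclassified"  # e.g. partial deletions or unmappable genes


# Toy depth vectors (hypothetical): 90% covered in the cognate genotype,
# 5% covered in the reciprocal genotype.
call = classify_gene([3] * 90 + [0] * 10, [1] * 5 + [0] * 95)
```

Genes falling between the two thresholds are left unclassified in this sketch, which mirrors the conservative intent of the criterion: ambiguous or poorly mappable genes are not counted as PAV.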
It has previously been suggested that dispensable genes provide an extreme case of complementation, supporting the dominance model of heterosis, in which inferior recessive alleles contributed by one parent are complemented by superior dominant alleles contributed by the other parent (Fu and Dooner, 2002; Lai et al., 2010; Swanson-Wagner et al., 2010). Thus, there should be an enrichment of dispensable genes in low recombination regions of the genome (i.e., near centromeres). A significant enrichment of PAV genes in the pericentromeric regions (30.9% of PAV genes) relative to all genes in the pericentromeric regions (25.9%) was observed (chi-square p-value < 0.0001). However, in terms of total gene number, more genes annotated as PAV between B73 and PH207 were located within high recombination regions of the genome (Figure 2C).

Genome content variation can also arise from CNV, leading to variation in gene family size between genotypes. To evaluate the difference in gene family size between B73 and PH207, genes from both genotypes were clustered using the OrthoMCL algorithm (Li et al., 2003; Chen et al., 2007). As was expected, consistent gene family size was observed for the majority of the genes in both genomes, with 19,697 genes showing a direct one-to-one relationship (Figure 3A). Another 1,720 families showed only modest changes in gene family size (sizes between 1-5 in both individuals; Figure 3B). Interestingly, substantial expansion and contraction of gene families was observed for 136 families, where large expansion was defined as being present in both individuals but having a difference of five or more family members between the genotypes (Figure 3C). In fact, one family in PH207 contained 214 family members, while B73 had only six genes in this family.
Within the genotype that had the expanded family, on average 39.7% of the genes within the family were expressed in at least one of six tissues tested (leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, whole seedling) when requiring reads to align uniquely. This number is likely an underestimate of the true frequency of expression within these families due to the limitations of unique alignments with short reads in large gene families (Hirsch et al., 2015). Across the families that showed large changes in gene family size, an enrichment of Gene Ontology terms related to stress response was observed, among other functions (Supplemental Data Set 2). A large proportion of the additional gene family members were located in non-syntenic positions relative to the copies found in the genotype with the smaller family size (Figure 3D). There are a number of helitrons that are currently annotated as genes but are likely nonfunctional and susceptible to deletion. These non-syntenic expanded families might be a reflection of the action of helitrons. However, examples of expanded families that were completely clustered in a single location (Figure 3E), both clustered and dispersed (Figure 3F), and completely dispersed (Figure 3G) were all observed, indicating that these results cannot be explained simply by the annotation of helitrons or fragments from helitrons as genes. The number of introns was evaluated to determine if the additional copies were primarily the result of retrotransposed transcripts. Within each class (clustered, clustered and dispersed, and dispersed), both genes without introns and genes with at least one intron were observed (Figure 3E-G), indicating that retrotransposition is not the primary mechanism driving the expansion of these gene families.
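The family-size classes used in this section can be expressed as a simple rule on the number of members each genotype contributes to a cluster. The sketch below is one reading of the definitions in the text (one-to-one; modest change with sizes between 1 and 5 in both genotypes; large change with both genotypes represented and a difference of five or more members). The function name, the class labels, and the "single-genotype" and "other" fall-through classes are assumptions introduced here, and the example counts are hypothetical apart from the 6 versus 214 family mentioned in the text.

```python
from collections import Counter


def classify_family(n_b73: int, n_ph207: int) -> str:
    """Assign a cluster to a size class from its per-genotype member counts."""
    if n_b73 < 0 or n_ph207 < 0:
        raise ValueError("member counts must be non-negative")
    if n_b73 == 0 or n_ph207 == 0:
        return "single-genotype"      # assumed label: family seen in one genotype only
    if n_b73 == 1 and n_ph207 == 1:
        return "one-to-one"
    if abs(n_b73 - n_ph207) >= 5:
        return "large change"         # present in both, difference of five or more
    if max(n_b73, n_ph207) <= 5:
        return "modest change"        # sizes between 1 and 5 in both genotypes
    return "other"                    # assumed label: not described in the text


# Hypothetical cluster sizes as (B73 members, PH207 members).
families = {
    "fam1": (1, 1),
    "fam2": (2, 4),
    "fam3": (6, 214),   # the extreme family reported in the text
    "fam4": (0, 3),
    "fam5": (6, 8),
}
tally = Counter(classify_family(b, p) for b, p in families.values())
```

Because a difference of five or more cannot occur when both sizes lie between 1 and 5, the "modest" and "large" classes do not overlap under this reading.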
Genome content variation drives transcriptional variation between elite maize inbred lines

The availability of two maize genome sequence assemblies provided an opportunity to evaluate sources of transcriptional variation between genotypes. We evaluated variation in the transcriptome profiles of six tissues (described above) in the context of both the B73 and PH207 gene models (Supplemental Data Set 3). In total, 84.9% of genes were expressed in at least one of the tissues in at least one of the two genotypes (Table 2). A relatively large number of genes (9.4%) were consistently expressed in a genotype-specific manner, while another 23.3% were genotype specific only in a subset of tissues. Furthermore, 52.5% of genes showed quantitative differential expression between the two genotypes in at least one tissue, and 4.5% of genes showed quantitative differential expression between the two genotypes across all of the tissues. In total, 69.8% of genes showed either quantitative or qualitative variation in transcript abundance in at least one of the six tissues that were evaluated, demonstrating that extensive expression variation is present in the transcriptomes of elite maize inbred lines.

The comparison of the B73 and PH207 genome assemblies provided an opportunity to evaluate the contribution of various types of genomic differences to the extensive transcriptional variation that is present in highly selected complementary maize inbred lines. In total, 9.0% of B73 models and 9.8% of PH207 models showed genotype-specific expression across all of the tissues. For 11.6% of these genes, genomic-level PAV was also observed and was the basis for the observed transcriptional variation. Genes showing genomic-level PAV had more tissue-specific expression (Figure 4A) and had lower average expression when compared to genes that were present in the genomes of both individuals (Figure 4B).
This observation is further supported using the B73 gene atlas, which includes transcript abundance estimates for 79 B73 tissues throughout development (Supplemental Figure 5; Stelpflug et al., 2016). Small transcripts (less than 300 bp) are undersampled and therefore have lower estimated transcript abundance due to technical biases resulting from size selection during library preparation (Hirsch et al., 2015). However, based on the distribution of transcript size for genotype-specific and shared genes, the lower observed expression of genotype-specific genes was not a product of this bias (Figure 4C). The number of genes that were not expressed in any tissue was slightly higher for single exon genes (29.0%) relative to the total number of non-expressed genes (18.5%), and there are therefore other contributing factors to this observed distribution.

Extensive quantitative variation in transcript abundance was also observed. Multiple levels of genomic variation that were observed between B73 and PH207 could underlie this variation, such as partial fragmentation of genes (Figure 2B, Supplemental Figure 4), variation in gene family size that may lead to neo- and sub-functionalization at the expression level (Figure 3), and polymorphisms in the promoters between the two genotypes. Of the 5,383 B73 and 7,991 PH207 genes that showed partial deletions in the reciprocal genome (Figure 2B), 1,883 and 2,835 were differentially expressed in at least one tissue, respectively. However, transcript abundance estimates and differential expression were calculated assuming common transcript lengths between the genotypes. To test the impact of this on differential gene expression, we calculated coverage-corrected transcript abundance estimates.
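The exact form of the coverage correction is not spelled out here, so the following is only one plausible sketch: a standard FPKM-style estimate is recomputed with the transcript length scaled by the fraction of the gene model that is present and mappable in the genotype being measured. The function names and the example numbers are hypothetical, and the use of FPKM as the abundance unit is an assumption.

```python
def fpkm(fragments: float, length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def coverage_corrected_fpkm(
    fragments: float,
    length_bp: float,
    total_mapped_fragments: float,
    covered_fraction: float,
) -> float:
    """FPKM computed on the part of the gene model present in this genotype.

    covered_fraction is the share of the gene model that is present and
    mappable (assumed here to come from resequencing coverage).
    """
    if not 0 < covered_fraction <= 1:
        raise ValueError("covered_fraction must be in (0, 1]")
    return fpkm(fragments, length_bp * covered_fraction, total_mapped_fragments)


# Hypothetical gene: 100 fragments, 2,000 bp model, 10 million mapped fragments,
# with half of the model deleted in the genotype being measured.
uncorrected = fpkm(100, 2000, 1e7)
corrected = coverage_corrected_fpkm(100, 2000, 1e7, 0.5)
```

In this toy case the uncorrected estimate understates abundance by a factor of two, which illustrates how an apparent expression difference between genotypes can arise purely from assuming a common transcript length.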
Based on this correction, among genes that had 25% or more of the gene model deleted and/or unmappable, 20.5% of those that were originally differentially expressed in at least one tissue were no longer differentially expressed in any of the tissues. These results demonstrate how differences in genome annotation can create biases that can affect downstream analyses such as differential gene expression analysis. Still, a large number of genes that show partial deletion between the two genomes were differentially expressed even after implementing a coverage correction.

Relationships between copy number variation, transcript abundance, and phenotypic variation have been documented in multiple plant species on a single gene basis (Sutton et al., 2007; Gaines et al., 2010; Cook et al., 2012; Maron et al., 2013; Wang et al., 2015). On a genome-wide scale, within both B73 and PH207, we see a relationship between gene family size and the number of tissues in which expression is observed (Figure 5). As family size increases, the number of tissues in which expression is observed tends to decrease. Additionally, 29.6% of genes that were differentially expressed in at least one tissue were from variably sized gene families.

Finally, comparison of the B73 and PH207 assemblies uniquely allowed us to evaluate the impact of context sequence, specifically promoter SNP and InDel variation, on transcriptional variation on a genome-wide scale. For 6,379 of the one-to-one genes (Figure 3A), the 1 kb of promoter sequence immediately upstream of the transcription start site could be accurately compared. Sequence similarity between the promoters ranged from 82.1% to 100%, with 4,138 having at least one SNP or InDel in the 1 kb of promoter sequence (Figure 6A). Of these 4,138, 75.3% of the genes were differentially expressed in at least one tissue.
As the number of promoter variants increased, differential expression was observed in a larger number of tissues, both for SNPs (Figure 6B) and for InDels (Figure 6C), indicating the importance of this local variation in driving transcriptional variation in these elite inbred lines on a genome-wide scale.

DISCUSSION

Extensive genome content variation within maize has previously been documented (Gore et al., 2009; Springer et al., 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Chia et al., 2012; Hansey et al., 2012; Hirsch et al., 2014). However, these studies were limited by ascertainment bias, either because variation was investigated only in the context of the reference B73 genome assembly (Springer et al., 2009; Swanson-Wagner et al., 2010) or because reduced representation approaches were used (Gore et al., 2009; Hansey et al., 2012; Hirsch et al., 2014). Additionally, these studies focused on the broad diversity within maize, while the question of how much variation is present within highly selected elite inbred lines remained unanswered. Access to genome assemblies of the highly selected maize inbred lines B73 and PH207 allowed for a comprehensive analysis of the extensive genome and transcriptome variation present in elite maize germplasm following a century of intense selective pressure to improve grain yield.

The search for large-effect yield genes has been the target of many studies. In some cases, these endeavors have proven fruitful (Park et al., 2014; Weber et al., 2014). However, extensive genome content variation was observed between B73 and PH207 (3.4% of genes show genomic-level PAV).
Extensive PAV has previously been observed in other elite maize germplasm (Springer et al., 2009; Lai et al., 2010; Swanson-Wagner et al., 2010; Chia et al., 2012; Hirsch et al., 2014; Lu et al., 2015), and this estimate of PAV between maize inbred lines is quite similar to a previous report where on average 2,044 non-reference transcripts were assembled in each of 503 inbred lines (Hirsch et al., 2014). Additionally, we observed extensive transcriptional variation between B73 and PH207, with 32.7% of genes showing transcript PAV in at least one tissue, and 52.5% of genes differentially expressed in at least one tissue. The presence of this variation in elite maize inbred lines indicates that there are multiple combinations of genes that can be combined to achieve high-yielding varieties. High yields and heterosis that have been obtained in hybrid varieties to date could be rooted in the extensive genome and transcriptome variation that persists in elite inbred lines from opposite heterotic groups, as has been previously hypothesized (Birchler et al., 2003; Song and Messing, 2003; Birchler et al., 2006; Springer and Stupar, 2007; Birchler et al., 2010; Lai et al., 2010; Hansey et al., 2012; Yao et al., 2013) and is supported by comparative analysis of B73 and PH207 genome and transcriptome variation.

In addition to understanding variation in elite maize inbred lines, comparisons between the B73 and PH207 assemblies also allowed for a number of practical lessons to be learned. Comparative genomics studies between species are often conducted in the context of the reference genome assemblies for each species (Dong et al., 2004; Goodstein et al., 2012; Monaco et al., 2014). However, there is extensive genome content variation between individuals within a species not accounted for in these analyses.
Comparison of the B73 and PH207 assemblies revealed the impact of a reference genome centric approach on interspecies comparative genomic studies. Homologous gene clusters were identified between B. distachyon, maize B73, maize PH207, O. sativa, and S. bicolor proteins encoded by the representative transcript (longest) for each gene. Of the subset that included at least one maize protein (24,954 clusters), 2,703 clusters contained only a B73 protein and 2,025 contained only a PH207 protein sequence (Figure 7A). This finding highlights the importance of expanding comparative genomic studies beyond single reference-to-reference comparisons as the maize pan-genome is substantially larger than the genome sequence represented by B73 and PH207 (Hirsch et al., 2014), and already a large number of new maize homologs have been identified.

Within a species, there is a loss of duplicate genes following whole genome duplication events through a process known as fractionation. Maize is an ancient tetraploid that has undergone genome fractionation. A previous study identified and classified genes in the B73 genome into two maize subgenomes, Maize1 and Maize2, and determined there to be differential fractionation between Maize1 and Maize2 (Schnable et al., 2011). To evaluate the presence of differential fractionation between individuals, the PH207 genome was mined for the presence of missing B73 syntelogs in Maize1 or Maize2. Indeed, there is substantial differential fractionation between B73 and PH207. For example, B73 gene GRMZM2G129575, located in a syntenic block on Chr2 in Maize2, is missing from the Maize1 syntenic block on Chr7 (Schnable et al., 2011). Based on sequence clustering using OrthoMCL, this gene clustered with two PH207 genes, one located on Chr2 and the other on Chr7, both in syntenic blocks with genes neighboring GRMZM2G129575 (Figure 7B).
Throughout the genome, over 1,265 families were present in two copies in one of the two genomes and one copy in the other genome, likely due to on-going fractionation between these genomes. This supports previous work conducted in the context of the B73 genome assembly (Schnable et al., 2011) showing differential fractionation between individuals following the maize whole genome duplication. This differential fractionation is an important mechanism that generates natural variation within a species and provides the genetic basis for selection to drive genome content variation between maize inbreds from opposite heterotic groups. Additionally, these results highlight the importance of thinking outside of a single reference genome in studies relating to genome fractionation following polyploidization. Interestingly, many of the gene PAVs that were observed between B73 and PH207 were found outside of the Maize1 and Maize2 syntenic blocks, indicating the presence of additional mechanisms driving genome content variation in maize.

Finally, comparison of the two assemblies provided the means to assess two pitfalls that often arise from the use of single reference genomes to make inferences across multiple individuals within a species. There are many biases that exist with RNAseq for transcript abundance estimates when using a single reference genotype (Hirsch et al., 2015); these biases are further confounded outside of the reference genotype by the alternative gene model structures and partial gene deletions that exist between individuals within a species. For example, transcript abundance estimates for 15.9% of genes were highly influenced by deletion of more than 25% of the gene model in the opposite genome, and 20.5% were falsely identified as differentially expressed in one or more tissues based on this bias.
Variable gene models between individuals have also been shown to compensate for large-effect mutations in A. thaliana (Gan et al., 2011). Based on alignment of PH207 resequencing reads to the B73 genome assembly, 53,377 moderate-to-large effect mutations were identified, of which 12,855 (24.1%) were located in intronic sequences based on the PH207 specific annotation of the PH207 genome assembly (Figure 7C; Supplemental Figure 6).

Access to multiple genome assemblies of elite maize inbred lines has expanded our knowledge on the breadth of natural genome and transcriptome variation that persists in elite maize inbred lines following over a century of intense breeding. This genome-wide characterization of genomic variation and the relationship with transcriptomic variation provides important information in our understanding of the molecular basis of heterosis and the breeding community’s ability to continue to make gains from selection in highly selected maize germplasm. With access to only two whole genome assemblies, many lessons were learned regarding limitations in inter- and intra-species comparisons, transcriptome profiling, and characterization of allelic variation in the context of a single reference genome assembly and annotation that transcend beyond maize.

METHODS

Plant Material and DNA and RNA extraction

DNA was extracted from maize (Zea mays) inbred lines PH207 (PI 601005; Lot 03ncai01) and B73 (PI 550473; Lot 08ncai02) from seedlings germinated from seed provided by the National Plant Germplasm System Regional Plant Introduction Station in Ames, Iowa using a modified CTAB procedure (Murray and Thompson, 1980). Plants were grown under greenhouse conditions (27°C/24°C day/night and 16 h light/8 h dark) in Metro-Mix 300 (Sun Gro Horticulture) with no additional fertilizer and under fluorescent lights.
DNA was quantified using picogreen, absorbance at 260 nm/280 nm using a NanoDrop, and gel bands using 1.2% E-Gel.

RNAseq reads for B73 leaf blade, cortical parenchyma, germinating kernel, root tip, whole seedling, and stele were downloaded from the National Center for Biotechnology Information (accession numbers PRJNA171684 and SRP010680). For each tissue, sequencing reads for three biological replicates were obtained. Each biological replicate consisted of pooled tissue from three individual plants. Tissue sampling and RNA extraction for PH207 leaf blade, cortical parenchyma, germinating kernel, root tip, whole seedling, and stele were conducted as previously described for the comparable B73 tissues (Stelpflug et al., 2016). For each tissue, two biological replicates were collected and processed, again with three individual plants pooled per biological replicate.

DNA and RNA Sequencing

PH207 DNA Sequencing. Shotgun genomic libraries of 300 bp and 800 bp DNA fragment sizes were prepared using the TruSeq DNA Sample Preparation Kit version 2 according to the manufacturer’s protocol (Illumina, San Diego, CA). A third shotgun library was made using the same kit from DNA template fragments size selected from ~350 bp to ~450 bp with no PCR amplification (PCR-free). This fragment size was designed to produce a sequencing overlap of the fragments to be sequenced on the MiSeq as paired-end (PE) sequencing 250 nt per end, thus creating an opportunity to produce ‘stitched’ reads of approximately 350 nt to 400 nt in length. Multiple MP libraries per jump were made with the objective to increase sequence diversity and genome coverage. Four separate MP libraries were constructed for each of the 3 kb and 8 kb jumps, and 2 MP libraries for the 15 kb jump using the Illumina Nextera Mate-Pair Sample Preparation Kit (Illumina, San Diego, CA).
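As a rough check on the PCR-free library design, the expected overlap between the two reads of a pair is the summed read length minus the fragment length. The sketch below is illustrative only: the function name is hypothetical, and it assumes both reads are full length (250 nt) with no adaptor or quality trimming.

```python
def expected_overlap(fragment_len, read_len=250):
    """Overlap in nt between the two reads of a pair.

    Assumes untrimmed reads of equal length; a negative value would mean
    the reads do not meet and cannot be stitched.
    """
    return 2 * read_len - fragment_len


# Fragment sizes spanning the ~350 bp to ~450 bp size selection
for frag in (350, 400, 450):
    print(frag, expected_overlap(frag))
```

Under these assumptions, every fragment in the size-selected range leaves tens of nucleotides of overlap, which is what makes stitching the pair into one longer read possible.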
The 300 bp and 800 bp shotgun libraries and the MP libraries were sequenced on an Illumina HiSeq2000 as 100 nt PE reads. The pool of MP libraries was further sequenced on an Illumina HiSeq2500 as 150 nt PE reads. The PCR-free shotgun library was sequenced on an Illumina MiSeq as 250 nt reads. All sequencing was conducted at the Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois, except for the 300 nt PE genomic library, which was sequenced at Dow AgroSciences (Indianapolis, IN).

The synthetic long-read libraries were constructed, sequenced, and assembled by Illumina Inc. Sequencing Services (San Diego, California). The PH207 genomic libraries were made using the TruSeq Synthetic Long-Read DNA library preparation kit materials and workflow (http://supportres.illumina.com/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqsynthlongreaddna/truseq-synth-long-read-dna-library-prep-guide-15047264-a.pdf). Each library consists of a barcoded pool of 384 indexes (one per each well of 10 kb genomic DNA template fragments), and each of the 10 libraries was sequenced individually on one HiSeq2000 lane as 100 nt PE reads.

B73 DNA Sequencing. A 300 bp DNA fragment size shotgun genomic library was prepared using a TruSeq DNA Sample Preparation Kit version 2 according to the manufacturer’s protocol (Illumina, San Diego, CA). The B73 genomic library was sequenced by Dow AgroSciences on an Illumina HiSeq2000 as 100 nt PE reads.

PH207 RNA Sequencing. Approximately 5 µg of total RNA was processed for mRNA isolation, fragmented, converted to cDNA, and PCR amplified according to the Illumina TruSeq RNA Sample Prep Kit protocol and sequenced on an Illumina HiSeq 2500 (San Diego, CA) at the Roy J. Carver Biotechnology Center (Urbana, IL) at the University of Illinois as 150 nt PE reads.
PH207 Genome Assembly

Reads pre-processing and error correction. For all PH207 genomic libraries, with the exception of TruSeq synthetic long-reads, PCR duplicates were removed using FastUniq software (Xu et al., 2012). The Illumina HiSeq2000 adaptor AGATCGGAAGAGC was removed, and reads were error corrected using the Corrector_HA module of SOAPdenovo (using kmer size 23 and cutoff of 6) (Luo et al., 2012). For the PCR-free library (MiSeq stitched reads), following adaptor truncation, overlapping reads were merged using FLASH (Magoc and Salzberg, 2011) with a minimal required overlap of 10 bp to create the stitched reads. Processing of MP reads consisted of filtering out putative false mate-pairs by searching for the Nextera linker (10 nucleotides of CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG) sequence on either end of the MP. Mate pairs for which the linker was not found were sorted into a separate file for restricted scaffolding application. Mate-pairs that did not hit the linker were used only in support of links found with the filtered MPs, but were not used to create links independently. The TruSeq synthetic long-read assembly pipeline application was processed by Illumina (Illumina IGN FastTrack Long Reads version 41; Illumina, San Diego, CA) to create synthetic long-read builds that represent long contiguous template fragments.

PH207 De Novo Scaffold Assembly. In the first step of assembly, SOAPdenovo v1.05 (Luo et al., 2012) was used to construct a De Bruijn graph of contigs from the single end reads of the PE library using very conservative settings (no bubble merge and no repeat masking, but removing low coverage kmers and edges). A kmer of size 63 bp was optimal for De Bruijn graph construction.
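To make the graph construction step concrete, the following is a minimal sketch of building a De Bruijn graph from reads while discarding low coverage kmers. It is not SOAPdenovo's implementation: the function names and the `min_count` parameter are hypothetical, reverse complements are ignored, and a toy kmer size is used in place of the 63 bp reported above.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(reads, k, min_count=1):
    """Build a De Bruijn graph keyed by (k-1)-mer nodes.

    Each retained k-mer contributes one edge from its (k-1)-mer prefix to
    its (k-1)-mer suffix. K-mers seen fewer than min_count times are
    dropped, a simple stand-in for removing low coverage kmers.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


toy_reads = ["ACGTAC", "CGTACG"]
print(de_bruijn_edges(toy_reads, k=4, min_count=2))
```

In this toy case only the two kmers shared by both reads survive the coverage filter, so the graph reduces to a single unbranched path, which is the kind of component that would be reported as a contig.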
To scaffold the contigs of the De Bruijn graph, non-repetitive contigs within the graph were identified and assembled into scaffolds based on mapping information of the single end reads, followed by application of synthetic long-reads to resolve long repeats in a similar way. The mapping of the single read ends (mapping without gaps) and long-read builds facilitated scaffolding by linking contigs mapping to the same read.

Scaffolding was completed using a directed graph containing scaffolds longer than 200 bp as nodes, with edges based on the PE and MP links. Erroneous connections were filtered out to generate unconnected sub-graphs that were ordered into scaffolds. PE reads were used to find reliable paths in the graph for additional repeat resolving. This was accomplished through searching the De Bruijn graph for a unique path connecting pairs of reads mapping to two different scaffolds. At this phase of scaffolding, scaffolds had an average size of thousands of bases and almost no gaps. The scaffolds were then further ordered and linked using the MP libraries, estimating gaps between the contigs according to the distance of MP links. Linking scaffolds with MP reads required confirmation of at least three filtered MPs or at least one filtered MP with supporting confirmation from two or more filter failed MPs where the Nextera adaptor was not found. Scaffolds shorter than 200 bp were masked and links between non-repetitive contigs mapping to the same scaffolds were united, generating a directed scaffold graph. In agreement with previous reports, a significant amount of erroneous MPs was observed. Since these pairs link between long non-branched components in the scaffold graph, the scaffolding procedure identified the non-branched components and filtered out the rare connections between them.

Reference Guided Construction of Pseudomolecules.
Further ordering of scaffolds was achieved through scaffold alignment to the B73 reference genome using BWA-SW version 0.6.1 (Li and Durbin, 2010) requiring alignment quality of at least one. The most probable genomic location was assigned to the ordered scaffold based on the alignments of the unordered scaffolds that generated it. The ordered scaffolds were then placed into pseudomolecules to maximize synteny between the PH207 and B73 genomes. Finally, small scaffolds that mapped inside the unfilled gap of ordered scaffolds replaced these gaps if MP links existed between them.

TruSeq synthetic long reads were aligned to scaffolds generated without usage of the TruSeq synthetic long-reads. Scaffolds were identified that had a significant alignment of greater than 30 kb to two different locations greater than 10 Mb apart on the B73 reference genome. For those scaffolds where the block aligning to the alternative locations was between two blocks that aligned to the same location, it was assumed to be a translocation between the two genomes, and it was considered unlikely there were two misassemblies at the same region. If a block aligned to one location followed by a block aligning to another distant genomic location, it was considered a misassembly. Suspicious scaffolds were further resolved using a previously defined genotyping-by-sequencing anchor tag pipeline (Lu et al., 2015), and those flagged by both criteria were manually reviewed for misassembly.

To assess the completeness of the genome assembly, PH207 genomic paired-end reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 70 and a minimum quality of 10, and aligned to the PH207 genome assembly using Bowtie version 2.2.4 (Langmead and Salzberg, 2012) with default parameters as single end reads to determine the percentage of reads that could map to the assembly.
Variants were called using Samtools mpileup (Li et al., 2009) with a minimum depth of three and a maximum depth of 200. Only variants exceeding a quality score of 20 were retained. To assess the completeness of the gene space in the assemblies, PH207 RNAseq reads were aligned to the PH207 genome assembly and B73 RNAseq reads were aligned to the B73 version 3.21 genome assembly (ftp://ftp.ensemblgenomes.org/pub/plants/release-21). Reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 75 nt and a minimum quality of 20. All reads were trimmed to 75 nt using fastx_trimmer within the FASTX toolkit version 0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/index.html) for consistency between the B73 and PH207 reads. Reads were aligned using Bowtie2 version 2.2.4 (Langmead and Salzberg, 2012) and TopHat2 version 2.0.12 (Kim et al., 2013) with the maximum number of multihits set to 20, minimum intron size of 10 bp, and maximum intron size of 60,000 bp. Additionally, the Core Eukaryotic Genes Mapping Approach pipeline version 2.4 (Parra et al., 2007) was run using default parameters. Comparison of the B73 and PH207 genome assemblies was completed using the NUCmer program within MUMmer version 3.23 (Kurtz et al., 2004). The B73 version 3.21 assembly was used for the comparison, and the minimum length of a cluster match was set to 5,000 for the genome-wide plot and 250 for the regional view of chromosome 1.

PH207 Genome Annotation

Gene Model Structural Annotation. The MAKER genome annotation pipeline was used to generate gene annotations (Campbell et al., 2014a; Campbell et al., 2014b; Law et al., 2015). The PH207 genome assembly was first masked using REPEATMASKER and a maize custom repeat library (Smit et al., 2013-2015; Law et al., 2015).
Six PH207 transcript assemblies (leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and whole seedling) generated using Trinity (Grabherr et al., 2011), the predicted O. sativa proteome and UniProtKB/Swiss-Prot plant proteins (minus maize sp. proteins) were used as evidence during each stage of the MAKER pipeline (Kawahara et al., 2013; UniProt Consortium, 2014). Initially, a Hidden Markov Model (HMM) was trained for the SNAP ab initio gene prediction program using high-quality transcript assembly alignments as gene proxies (Korf, 2004). MAKER was run using the initial SNAP HMM, and these gene predictions were used to train SNAP a second time. MAKER was run again using the second SNAP HMM to make gene predictions, and these gene models were used to train an HMM for the Augustus gene prediction program (Stanke and Waack, 2003). MAKER was run a final time using both the second SNAP HMM and the Augustus HMM. The genes identified by MAKER included models with and without transcript or protein alignment evidence. The predicted proteins from the MAKER gene set were analyzed with hmmscan to identify Pfam domains (Eddy, 2011; Finn et al., 2014). A high-quality gene set was created using all gene predictions that were supported by transcript or protein evidence or that coded for a protein with a Pfam domain. Using this analysis pipeline, a single transcript was annotated for each locus.

Gene Model Functional Annotation. All PH207 transcripts were functionally annotated using transitive annotation based on best BLAST hits from A. thaliana TAIR10 (Lamesch et al., 2012), B. distachyon v2.1 (International Brachypodium, 2010), O. sativa release 7 (Ouyang et al., 2007), and S. bicolor v1.4 (Paterson et al., 2009) annotations.
For each species, TBLASTN was used within WUBLAST (v2.0) (Altschul et al., 1990b) to search PH207 protein sequences against a nucleotide database for each species. The results were filtered to retain only the top hit requiring an E-value < 1e-5 to retain the hit. The same criteria were used to assign a UniRef100 release 2015_05, 29-Apr-2015 (Suzek et al., 2007) identifier to each PH207 protein, except that BLASTP was used. The -goterms option of InterProScan version 5.0 (Zdobnov and Apweiler, 2001) was used to assign Gene Ontology terms to each PH207 transcript.

Genome Content Variation

To identify genotype-specific genes from each assembly, B73 version 3.21 and PH207 transcript sequences were aligned to both the B73 version 3.21 and PH207 genome assemblies using GMAP version 2012-06-02 (Wu and Watanabe, 2005). Transcripts with an alignment with >85% coverage and >85% identity to both their cognate genome and the reciprocal genome were considered present in both genomes, those with >85% coverage and >85% identity to their cognate genome and <85% coverage and <85% identity to the reciprocal genome were considered absent in the reciprocal genome, and those with <85% coverage and <85% identity to their cognate genome were not classified. Due to gaps in the genome assemblies, PAVs were also determined using mapping of B73 and PH207 resequencing reads to both the B73 version 3.21 and PH207 genome assemblies. B73 and PH207 resequencing reads were mapped to the B73 version 3.21 assembly and the PH207 assembly using Bowtie2 version 2.2.3 (Langmead and Salzberg, 2012) requiring a mapping quality score greater than 20. The portion of the bases with coverage was determined using the intersect program within BEDTools version 2.19.0 (Quinlan and Hall, 2010). A bi-modal distribution of coverage was observed, with the majority of genes having <25% coverage or greater than 75% coverage.
Thus, any gene that had >75% of positions covered by its own reads and <25% of positions covered by the opposite genotype reads was considered PAV. This method was also used to classify partially deleted genes and the percent of the gene that was deleted. For partial gene deletions, the percent of the gene that was deleted was calculated as the percentage of the gene with coverage in the reciprocal genome divided by the percent of the gene with coverage in the cognate genome. The comparison of gene density in high and low recombination regions was determined using a chi-square test with previously determined boundaries based on previously described B73/Mo17 near isogenic lines (Swanson-Wagner et al., 2010; Eichten et al., 2011). Boundary coordinates were converted to B73 version 3.21 coordinates using NCBI BLASTN with 400 bp of context sequence.

To identify gene families that were expanded or contracted in PH207 relative to B73, an all-versus-all blast was performed using WU BLASTN (Altschul et al., 1990a) with the B73 version 3.21 transcripts and PH207 transcripts, requiring a minimum E-value of 1e-10 and allowing up to 5,000 hits per sequence. The transcripts were clustered using OrthoMCL version 1.4 (Li et al., 2003; Chen et al., 2007) in mode 4 with default parameters to identify putative paralogous/homologous gene families between the two genotypes. This analysis was also used to identify high confidence one-to-one genes between the two individuals for subsequent analyses. The enrichment of Gene Ontology terms was analyzed using AgriGO (Du et al., 2010) using Fisher’s exact test to calculate an enrichment P-value, with a value of <0.05 being considered significant after using the Yekutieli method for multiple test correction.
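The resequencing-based PAV rule described earlier in this section can be sketched in a few lines. This is an illustrative stand-in rather than the pipeline that was run (which used BEDTools intersect): the function names are hypothetical, coordinates are assumed to be half-open, and coverage is expressed as a fraction between 0 and 1.

```python
def covered_fraction(gene_start, gene_end, intervals):
    """Fraction of gene positions covered by at least one read interval.

    Intervals are (start, end) pairs in half-open coordinates; overlapping
    intervals are counted once and parts outside the gene are ignored.
    """
    covered = 0
    cursor = gene_start
    for start, end in sorted(intervals):
        start, end = max(start, cursor), min(end, gene_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (gene_end - gene_start)


def classify_pav(own_cov, other_cov, high=0.75, low=0.25):
    """Call PAV when a gene is >75% covered by reads from its own
    genotype and <25% covered by reads from the opposite genotype."""
    if own_cov > high and other_cov < low:
        return "PAV"
    return "not PAV"


own = covered_fraction(0, 100, [(0, 50), (40, 90)])   # overlapping intervals
other = covered_fraction(0, 100, [(10, 20)])
print(own, other, classify_pav(own, other))
```

The two thresholds are kept as parameters because they come from the bi-modal coverage distribution observed in this data set and may not transfer to other comparisons.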
Promoter SNP and InDel variation was explored for one-to-one genes that were located on the same chromosome within 20 Mb of each other between the two genome assemblies and had at least 1 kb of promoter sequence upstream of the transcription start site in both assemblies (N=18,798). Sequences with one or more N’s in either assembly were removed, resulting in 12,537 promoter comparisons. For each pair-wise comparison, the 1 kb of promoter sequences from each assembly were aligned using NCBI BLASTN version 2.2.28 (Altschul et al., 1990b). Promoter comparisons with less than 700 bp alignments (N=6,114) were removed from downstream analysis. Alignments were parsed to identify SNPs and InDels using the blastn2snp.jar scripts within Jvarkit (Lindenbaum, 2015).

Transcriptional Analysis

All RNAseq reads were trimmed to 75 nt to remove low quality bases and aligned to both the B73 version 3.21 reference genome assembly and the PH207 genome assembly using Bowtie2 version 2.1.0 (Langmead and Salzberg, 2012) and TopHat2 version 2.0.10 (Kim et al., 2013). The minimum intron size was set to five bp, and the maximum intron size was set to 60,000 bp. All other mapping parameters were set to the default values. Transcript abundance estimates (measured as fragments per kilobase of exon model per million fragments mapped) were determined using Cufflinks2 version 2.1.1 (Trapnell et al., 2010) with a minimum and maximum intron size of five bp and 60,000 bp, respectively, and providing genome assemblies for bias correction. Transcript abundance counts for the representative transcript (longest transcript) were generated with HTSeq version 0.6.1p1 (Anders et al., 2015) at the gene level using the union mode and a minimum mapping quality of 20 with non-strand specific counting (--stranded no).
Corrected counts by genome coverage were determined by taking the raw counts divided by the percent exon coverage from the resequencing data rounded to the nearest whole number. A gene was defined as expressed in a sample if it had a count of one or more. Differential expression analysis was conducted using DESeq2 version 1.8.2 (Love et al., 2014) within R version 3.2.1 (R Development Core Team, 2011) using Bioconductor version 3.1 (Gentleman et al., 2004) with default parameters. Genes with an adjusted p-value <0.05 and a fold change >2 in either direction were considered differentially expressed. Differential expression analysis was conducted for B73 RNAseq reads mapped to the B73 genome assembly versus PH207 RNAseq reads mapped to the B73 genome assembly as well as for B73 RNAseq reads mapped to the PH207 genome assembly versus PH207 RNAseq reads mapped to the PH207 genome assembly within each tissue. Differential expression analysis was conducted in this way because of our documented difference in untranslated region representation between the annotation for B73 and PH207, which would bias expression downwardly in PH207 one-to-one genes relative to the corresponding B73 gene model.

Comparative Genomics Analysis

Homologous grass genes that were present in only one assembly (either B73 or PH207) were identified using OrthoMCL version 1.4 (Li et al., 2003; Chen et al., 2007). An all-versus-all Blast analysis was performed using WU BLASTP (Altschul et al., 1990a) requiring a minimum E-value of 1e-10 and allowing up to 5,000 hits per sequence with the B. distachyon v2.1 (International Brachypodium, 2010), maize B73 version 3.21, maize PH207, O. sativa release 7 (Ouyang et al., 2007), and S. bicolor v1.4 (Paterson et al., 2009) representative protein sequences defined as the protein encoded by the longest transcript.
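Returning briefly to the count handling described under Transcriptional Analysis, the coverage correction and the expression and differential expression thresholds can be sketched as below. This is a hedged illustration, not the code that was run: the function names are hypothetical, exon coverage is assumed to be supplied as a fraction between 0 and 1, ties are assumed to round up, and the fold change threshold is applied on the log2 scale that DESeq2 reports.

```python
import math


def corrected_count(raw_count, exon_coverage):
    """Raw count divided by fractional exon coverage, rounded to the
    nearest whole number (ties rounded up; an assumption)."""
    if not 0 < exon_coverage <= 1:
        raise ValueError("exon_coverage must be in (0, 1]")
    return math.floor(raw_count / exon_coverage + 0.5)


def is_expressed(count):
    """A gene is called expressed in a sample with a count of one or more."""
    return count >= 1


def is_differentially_expressed(padj, log2_fold_change):
    """Adjusted p-value < 0.05 and fold change > 2 in either direction,
    which is |log2 fold change| > 1."""
    return padj < 0.05 and abs(log2_fold_change) > 1


print(corrected_count(30, 0.6))                  # gene with 60% of exon bases covered
print(is_differentially_expressed(0.01, -1.5))   # down-regulated more than 2-fold
```

Dividing by coverage scales up counts for genes that are only partly present in the other genome, so a gene is not called differentially expressed simply because fewer of its bases can recruit reads.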
Homologous gene families were identified by OrthoMCL version 1.4 in mode 4 with default parameters.

To evaluate differential fractionation in B73 and PH207 following the maize whole genome duplication, the previously defined B73 Maize1 and Maize2 classifications (Schnable et al., 2011) from B73 v2 were converted to B73 v3 coordinates using CrossMap (Zhao et al., 2014) and overlapping gene models were identified. The converted Maize1 and Maize2 classifications were compared with the homologous gene clustering generated using OrthoMCL (Li et al., 2003; Chen et al., 2007) with the B73 version 3.21 and PH207 nucleotide sequences described above.

Analysis of Deleterious Mutations

PH207 genomic reads were cleaned using Cutadapt version 1.8.1 (Martin, 2011) requiring a minimum length of 70 nt and a minimum quality of 10. Cleaned reads were aligned to the B73 version 3.21 reference assembly using Bowtie version 2.2.4 (Langmead and Salzberg, 2012) with default parameters as single end reads. Variants were called using Samtools mpileup (Li et al., 2009) with a minimum depth of three and a maximum depth of 200. Only variants exceeding a quality score of 20 were retained. Variants were then annotated via SnpEff version 4.1 H (Cingolani et al., 2012) using default parameters. The annotated VCF file was filtered to retain only the predicted high and moderate impact variants that were located within the representative transcript of one-to-one genes based on the B73 model. As defined by the VCF annotation standard, high impact effects included: frameshift variants, splice acceptor/donor variants, start/stop lost, and stop gained. Moderate impact predicted effects included: inframe InDels and missense variants. BLASTN 2.2.25+ (Altschul et al., 1990b) was used to align B73 genic sequence from one-to-one genes to the PH207 assembly.
Variants were parsed from the alignment to determine the position of annotated variants from the Bowtie alignment in the context of the PH207 assembly by scanning for each variant in the parsed BLAST output. Only variants that had a consistent alignment position between the Bowtie and BLAST alignment algorithms were retained. The ‘intersect’ sub-command within the BEDTools suite version 2.17.0 (Quinlan and Hall, 2010) was used to determine whether the retained variants were located within PH207 exon sequences.

Accession Numbers

Sequence data from this article can be found in the Sequence Read Archive at the National Center for Biotechnology Information under accession number PRJNA258455. The PH207 genome multifasta file, GFF annotation file, transcript multifasta file, and protein multifasta file are available for download from the Dryad Digital Repository (DOI: 10.5061/dryad.8vj84) and are also available at the Maize Genetics and Genomics Database (MaizeGDB; http://www.maizegdb.org) and at Phytozome (https://phytozome.jgi.doe.gov). Additionally, the PH207 genome assembly results as Integrative Genomics Viewer (Broad Institute) tracks are available for public access at http://nrgene.com.

Supplemental Data

Supplemental Figure 1. Distribution of 23-mers in the PH207 sequence reads.

Supplemental Figure 2. PH207 and B73 RNAseq read mapping statistics.

Supplemental Figure 3. Annotation edit distance curve for the PH207 MakerP annotation.

Supplemental Figure 4. Resequencing-based approach to identify partial gene deletions.

Supplemental Figure 5. Density distribution of expression in B73 and PH207 for shared and genotype-specific genes.

Supplemental Figure 6. Distribution of predicted impact of PH207 variants.

Supplemental Table 1. Reads generated from the PH207 genome assembly.

Supplemental Table 2. Assembly statistics at each progressive assembly step.

Supplemental Data Set 1. Functional annotation of the PH207 gene models.

Supplemental Data Set 2. Enriched Gene Ontology terms for B73 and PH207 genes that are members of gene families that have expanded or contracted between the two genotypes and genes that show PAV between the two genotypes.

Supplemental Data Set 3. Transcript abundance estimates for six distinct tissues in the inbred lines B73 and PH207.

AUTHOR CONTRIBUTIONS

CNH, KK, NMS, EB, CRB, NdL, SMK, and MM designed the study. AGH, CJF, and CLW acquired data. CNH, CDH, ABB, MJB, IS, OB, DST, KB, FL, CJF, and KLC analyzed and interpreted data. CNH and MM wrote the manuscript. All authors read and approved the final manuscript.

ACKNOWLEDGEMENTS

This work was funded in part by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC02-07ER64494), Dow AgroSciences, LLC, and by the National Science Foundation (grant no. IOS–1126998 to KLC). The Minnesota Supercomputing Institute (MSI) at the University of Minnesota provided computational resources that contributed to the research results reported within this paper. CDH was supported by a National Science Foundation National Plant Genome Initiative Postdoctoral Fellowship in Biology (Grant No. 1202724). ABB was supported by the DuPont Pioneer Bill Kuhn Honorary Fellowship.

REFERENCES

Abendroth, L.J., Elmore, R.W., Boyer, M.J., and Marlay, S.K. (2011). Corn growth and development. PMR 1009. Iowa State University Extension, Ames Iowa. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990a). Basic local alignment search tool. J Mol Biol 215, 403-410. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990b). Basic Local Alignment Search Tool.
Journal of Molecular Biology 215, 403-410. Anders, S., Pyl, P.T., and Huber, W. (2015). HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166-169. Anderson, J.E., Kantar, M.B., Kono, T.Y., Fu, F., Stec, A.O., Song, Q., Cregan, P.B., Specht, J.E., Diers, B.W., Cannon, S.B., McHale, L.K., and Stupar, R.M. (2014). A roadmap for functional structural variants in the soybean genome. G3 4, 1307-1318. Arabidopsis Genome, I. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815. Belo, A., Beatty, M.K., Hondred, D., Fengler, K.A., Li, B., and Rafalski, A. (2010). Allelic genome structural variations in maize detected by array comparative genome hybridization. Theor Appl Genet 120, 355-367. Birchler, J.A., Auger, D.L., and Riddle, N.C. (2003). In search of the molecular basis of heterosis. Plant Cell 15, 2236-2239. Birchler, J.A., Yao, H., and Chudalayandi, S. (2006). Unraveling the genetic basis of hybrid vigor. Proc Natl Acad Sci U S A 103, 12957-12958. Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., and Veitia, R.A. (2010). Heterosis. Plant Cell 22, 2105-2112. Brunner, S., Fengler, K., Morgante, M., Tingey, S., and Rafalski, A. (2005). Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17, 343-360. Campbell, M.S., Holt, C., Moore, B., and Yandell, M. (2014a). Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics 48, 4 11 11-14 11 39. Campbell, M.S., Law, M., Holt, C., Stein, J.C., Moghe, G.D., Hufnagel, D.E., Lei, J., Achawanantakun, R., Jiao, D., Lawrence, C.J., Ware, D., Shiu, S.H., Childs, K.L., Sun, Y., Jiang, N., and Yandell, M. (2014b). MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations.
Plant Physiol 164, 513-524. Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., Wang, X., Ott, F., Muller, J., Alonso-Blanco, C., Borgwardt, K., Schmid, K.J., and Weigel, D. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet. Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. (2007). Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes. PLoS ONE 2, e383. Chia, J.M., Song, C., Bradbury, P.J., Costich, D., de Leon, N., Doebley, J., Elshire, R.J., Gaut, B., Geller, L., Glaubitz, J.C., Gore, M., Guill, K.E., Holland, J., Hufford, M.B., Lai, J., Li, M., Liu, X., Lu, Y., McCombie, R., Nelson, R., Poland, J., Prasanna, B.M., Pyhajarvi, T., Rong, T., Sekhon, R.S., Sun, Q., Tenaillon, M.I., Tian, F., Wang, J., Xu, X., Zhang, Z., Kaeppler, S.M., Ross-Ibarra, J., McMullen, M.D., Buckler, E.S., Zhang, G., Xu, Y., and Ware, D. (2012). Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet 44, 803-807. Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92. Cook, D.E., Lee, T.G., Guo, X., Melito, S., Wang, K., Bayless, A.M., Wang, J., Hughes, T.J., Willis, D.K., Clemente, T.E., Diers, B.W., Jiang, J., Hudson, M.E., and Bent, A.F. (2012). Copy number variation of multiple genes at Rhg1 mediates nematode resistance in soybean. Science 338, 1206-1209. Dong, Q., Schlueter, S.D., and Brendel, V. (2004). PlantGDB, plant genome database and analysis tools. Nucleic acids research 32, D354-359. Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic acids research 38, W64-70. Eddy, S.R. (2011). 
Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195. Eichten, S.R., Foerster, J.M., de Leon, N., Kai, Y., Yeh, C.T., Liu, S., Jeddeloh, J.A., Schnable, P.S., Kaeppler, S.M., and Springer, N.M. (2011). B73-Mo17 near-isogenic lines demonstrate dispersed structural variation in maize. Plant Physiol 156, 1679-1690. Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67. Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research 42, D222-230. Fu, H., and Dooner, H.K. (2002). Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci U S A 99, 9573-9578. Gaines, T.A., Zhang, W., Wang, D., Bukun, B., Chisholm, S.T., Shaner, D.L., Nissen, S.J., Patzoldt, W.L., Tranel, P.J., Culpepper, A.S., Grey, T.L., Webster, T.M., Vencill, W.K., Sammons, R.D., Jiang, J., Preston, C., Leach, J.E., and Westra, P. (2010). Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad Sci U S A 107, 1029-1034. Gan, X., Stegle, O., Behr, J., Steffen, J.G., Drewe, P., Hildebrand, K.L., Lyngsoe, R., Schultheiss, S.J., Osborne, E.J., Sreedharan, V.T., Kahles, A., Bohnert, R., Jean, G., Derwent, P., Kersey, P., Belfield, E.J., Harberd, N.P., Kemen, E., Toomajian, C., Kover, P.X., Clark, R.M., Ratsch, G., and Mott, R. (2011). Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477, 419-423.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y., and Zhang, J. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80. Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., and Rokhsar, D.S. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic acids research 40, D1178-1186. Gore, M.A., Chia, J.M., Elshire, R.J., Sun, Q., Ersoz, E.S., Hurwitz, B.L., Peiffer, J.A., McMullen, M.D., Grills, G.S., Ross-Ibarra, J., Ware, D.H., and Buckler, E.S. (2009). A first-generation haplotype map of maize. Science 326, 1115-1117. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., and Regev, A. (2011). Full-length transcriptome assembly from RNASeq data without a reference genome. Nat Biotechnol 29, 644-652. Hansey, C.N., Vaillancourt, B., Sekhon, R.S., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2012). Maize (Zea mays L.) Genome Diversity as Revealed by RNA-Sequencing. PLoS ONE 7, e33071. Hardigan, M.A., Crisovan, E., Hamilton, J.P., Kim, J., Laimbeer, P., Leisner, C.P., Manrique-Carpintero, N.C., Newton, L., Pham, G.M., Vaillancourt, B., Yang, X., Zeng, Z., Douches, D., Jiang, J., Veilleux, R.E., and Buell, C.R. (2016). Genome reduction uncovers a large dispensable genome and adaptive role for copy number variation in asexually propagated Solanum tuberosum. The Plant Cell.
Hirakawa, H., Okada, Y., Tabuchi, H., Shirasawa, K., Watanabe, A., Tsuruoka, H., Minami, C., Nakayama, S., Sasamoto, S., Kohara, M., Kishida, Y., Fujishiro, T., Kato, M., Nanri, K., Komaki, A., Yoshinaga, M., Takahata, Y., Tanaka, M., Tabata, S., and Isobe, S.N. (2015). Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Research 22, 171-179. Hirsch, C.D., Springer, N.M., and Hirsch, C.N. (2015). Genomic Limitations to RNAseq Expression Profiling. The Plant Journal 84, 491-503. Hirsch, C.N., Foerster, J.M., Johnson, J.M., Sekhon, R.S., Muttoni, G., Vaillancourt, B., Penagaricano, F., Lindquist, E., Pedraza, M.A., Barry, K., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2014). Insights into the maize pan-genome and pan-transcriptome. Plant Cell 26, 121-135. International Brachypodium, I. (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763-768. International Rice Genome Sequencing, P. (2005). The map-based sequence of the rice genome. Nature 436, 793-800. Kawahara, Y., de la Bastide, M., Hamilton, J.P., Kanamori, H., McCombie, W.R., Ouyang, S., Schwartz, D.C., Tanaka, T., Wu, J., Zhou, S., Childs, K.L., Davidson, R.M., Lin, H., Quesada-Ocampo, L., Vaillancourt, B., Sakai, H., Lee, S.S., Kim, J., Numa, H., Itoh, T., Buell, C.R., and Matsumoto, T. (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y) 6, 4. Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36. Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics 5, 59.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biol 5, R12. Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., Zhang, M., Jiao, Y., Ni, P., Zhang, J., Li, D., Guo, X., Ye, K., Jian, M., Wang, B., Zheng, H., Liang, H., Zhang, X., Wang, S., Chen, S., Li, J., Fu, Y., Springer, N.M., Yang, H., Wang, J., Dai, J., and Schnable, P.S. (2010). Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet 42, 1027-1030. Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., Muller, R., Dreher, K., Alexander, D.L., Garcia-Hernandez, M., Karthikeyan, A.S., Lee, C.H., Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E. (2012). The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research 40, D1202-1210. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359. Law, M., Childs, K.L., Campbell, M.S., Stein, J.C., Olson, A.J., Holt, C., Panchy, N., Lei, J., Jiao, D., Andorf, C.M., Lawrence, C.J., Ware, D., Shiu, S.H., Sun, Y., Jiang, N., and Yandell, M. (2015). Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol 167, 25-39. Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079. Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13, 2178-2189. 
Lindenbaum, P. (2015). JVarkit: java-based utilities for Bioinformatics. figshare http://dx.doi.org/10.6084/m9.figshare.1425030. Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550. Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang, T., Li, Y., Li, Y., Semagn, K., Zhang, X., Hernandez, A.G., Mikel, M.A., Soifer, I., Barad, O., and Buckler, E.S. (2015). High-resolution genetic mapping of maize pan-genome sequence anchors. Nature communications 6, 6914. Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., Cheung, D.W., Yiu, S.M., Peng, S., Xiaoqian, Z., Liu, G., Liao, X., Li, Y., Yang, H., Wang, J., Lam, T.W., and Wang, J. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18. Magoc, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957-2963. Maron, L.G., Guimaraes, C.T., Kirst, M., Albert, P.S., Birchler, J.A., Bradbury, P.J., Buckler, E.S., Coluccio, A.E., Danilova, T.V., Kudrna, D., Magalhaes, J.V., Pineros, M.A., Schatz, M.C., Wing, R.A., and Kochian, L.V. (2013). Aluminum tolerance in maize is associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A 110, 5241-5246. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. 2011 17. Mikel, M.A. (2011). Genetic Composition of Contemporary U.S. Commercial Dent Corn Germplasm. Crop Sci. 51, 592-599.
Monaco, M.K., Stein, J., Naithani, S., Wei, S., Dharmawardhana, P., Kumari, S., Amarasinghe, V., Youens-Clark, K., Thomason, J., Preece, J., Pasternak, S., Olson, A., Jiao, Y., Lu, Z., Bolser, D., Kerhornou, A., Staines, D., Walts, B., Wu, G., D'Eustachio, P., Haw, R., Croft, D., Kersey, P.J., Stein, L., Jaiswal, P., and Ware, D. (2014). Gramene 2013: comparative plant genomics resources. Nucleic acids research 42, D1193-1199. Morgante, M., De Paoli, E., and Radovic, S. (2007). Transposable elements and the plant pangenomes. Curr Opin Plant Biol 10, 149-155. Murray, M.G., and Thompson, W.F. (1980). Rapid isolation of high molecular weight plant DNA. Nucleic acids research 8, 4321-4325. Ossowski, S., Schneeberger, K., Clark, R.M., Lanz, C., Warthmann, N., and Weigel, D. (2008). Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18, 2024-2033. Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., Orvis, J., Haas, B., Wortman, J., and Buell, C.R. (2007). The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic acids research 35, D883-887. Park, S.J., Jiang, K., Tal, L., Yichie, Y., Gar, O., Zamir, D., Eshed, Y., and Lippman, Z.B. (2014). Optimization of crop productivity in tomato using induced mutations in the florigen pathway. Nat Genet 46, 1337-1342. Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061-1067. 
Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang, H., Wang, X., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U., Grigoriev, I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P., Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L., Carpita, N.C., Freeling, M., Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming, R., Peterson, D.G., Mehboob ur, R., Ware, D., Westhoff, P., Mayer, K.F., Messing, J., and Rokhsar, D.S. (2009). The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551-556. Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842. R Development Core Team. (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M.H., Chia, J.M., Ware, D., McCouch, S.R., and McCombie, W.R. (2014). Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 15, 506. Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A 108, 4069-4074.
Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., Minx, P., Reily, A.D., Courtney, L., Kruchowski, S.S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S.M., Belter, E., Du, F., Kim, K., Abbott, R.M., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S.M., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M.J., McMahan, L., Van Buren, P., Vaughn, M.W., Ying, K., Yeh, C.-T., Emrich, S.J., Jia, Y., Kalyanaraman, A., Hsia, A.-P., Barbazuk, W.B., Baucom, R.S., Brutnell, T.P., Carpita, N.C., Chaparro, C., Chia, J.-M., Deragon, J.-M., Estill, J.C., Fu, Y., Jeddeloh, J.A., Han, Y., Lee, H., Li, P., Lisch, D.R., Liu, S., Liu, Z., Nagel, D.H., McCann, M.C., SanMiguel, P., Myers, A.M., Nettleton, D., Nguyen, J., Penning, B.W., Ponnala, L., Schneider, K.L., Schwartz, D.C., Sharma, A., Soderlund, C., Springer, N.M., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T.K., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J.L., Dawe, R.K., Jiang, J., Jiang, N., Presting, G.G., Wessler, S.R., Aluru, S., Martienssen, R.A., Clifton, S.W., McCombie, W.R., Wing,
R.A., and Wilson, R.K. (2009). The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science 326, 1112-1115. Smit, A., Hubley, R., and Green, P. (2013-2015). RepeatMasker Open-4.0 http://www.repeatmasker.org. Song, R., and Messing, J. (2003). Gene expression of a gene family in maize based on noncollinear haplotypes. Proc Natl Acad Sci U S A 100, 9055-9060. Springer, N.M., and Stupar, R.M. (2007). Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Res 17, 264-275. Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A.L., Barbazuk, W.B., Jeddeloh, J.A., Nettleton, D., and Schnable, P.S. (2009). Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5, e1000734. Stanke, M., and Waack, S. (2003). Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215-225. Stelpflug, S.C., Sekhon, R.S., Vaillancourt, B., Hirsch, C.N., Buell, C.R., de Leon, N., and Kaeppler, S.M. (2016). An Expanded Maize Gene Expression Atlas based on RNA Sequencing and its Use to Explore Root Development. The Plant Genome 9. Sutton, T., Baumann, U., Hayes, J., Collins, N.C., Shi, B.J., Schnurbusch, T., Hay, A., Mayo, G., Pallotta, M., Tester, M., and Langridge, P. (2007). Boron-toxicity tolerance in barley arising from efflux transporter amplification. Science 318, 1446-1449. Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C.H. (2007). UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282-1288. Swanson-Wagner, R.A., Eichten, S.R., Kumari, S., Tiffin, P., Stein, J.C., Ware, D., and Springer, N.M. (2010).
Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Res 20, 1689-1699. Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515. Troyer, A.F. (1999). Background of U.S. Hybrid Corn. Crop Sci. 39, 601-626. Troyer, A.F. (2006). Adaptedness and Heterosis in Corn and Mule Hybrids. Crop Science 46, 528-543. UniProt Consortium. (2014). Activities at the Universal Protein Resource (UniProt). Nucleic acids research 42, D191-198. Wang, Y., Xiong, G., Hu, J., Jiang, L., Yu, H., Xu, J., Fang, Y., Zeng, L., Xu, E., Xu, J., Ye, W., Meng, X., Liu, R., Chen, H., Jing, Y., Wang, Y., Zhu, X., Li, J., and Qian, Q. (2015). Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet. Weber, R.L., Wiebke-Strohm, B., Bredemeier, C., Margis-Pinheiro, M., de Brito, G.G., Rechenmacher, C., Bertagnolli, P.F., de Sa, M.E., Campos Mde, A., de Amorim, R.M., Beneventi, M.A., Margis, R., Grossi-de-Sa, M.F., and Bodanese-Zanettini, M.H. (2014). Expression of an osmotin-like protein from Solanum nigrum confers drought tolerance in transgenic soybean. BMC plant biology 14, 343. Weigel, D., and Mott, R. (2009). The 1001 genomes project for Arabidopsis thaliana. Genome Biol 10, 107. Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859-1875. Xu, H., Luo, X., Qian, J., Pang, X., Song, J., Qian, G., Chen, J., and Chen, S. (2012). FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7, e52249. Yandell, M., and Ence, D. (2012). A beginner's guide to eukaryotic genome annotation.
Nat Rev Genet 13, 329-342. Yao, H., Dogra Gray, A., Auger, D.L., and Birchler, J.A. (2013). Genomic dosage effects on heterosis in triploid maize. Proc Natl Acad Sci U S A 110, 2665-2669. Zdobnov, E.M., and Apweiler, R. (2001). InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847-848. Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.P., and Wang, L. (2014). CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006-1007.

Table 1. Phenotypic variation between B73, PH207 and the F1 reciprocal hybrids between B73 and PH207. Phenotypic measurements were collected on greenhouse-grown plants at the seedling vegetative 1 developmental stage (Abendroth et al., 2011).

Trait                            B73      B73 x PH207   PH207 x B73   PH207
Fresh Aboveground Biomass (g)    0.93     1.43          1.25          0.97
Fresh Belowground Biomass (g)    0.83     1.09          1.33          0.91
Plant Height (cm)                12.84    17.32         17.30         13.82
Root Length (cm)                 24.66    29.84         30.01         26.97

Table 2. Transcriptome expression variation across six tissues in the context of B73 and PH207 annotated genes. B73 and PH207 RNAseq reads were mapped to both their cognate genome and the reciprocal genome and expression was summarized as read counts in each of the mapping scenarios. Tissues included in this analysis were leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and whole seedling.
                                             B73a      PH207a
Expression Competence
  Never Expressed                            6,180     5,885
  Expressed in ≥1 tissue                     33,121    34,672
Differential expression between genotypes
  Always Differentially Expressed            1,655     1,968
  Sometimes Differentially Expressed         18,913    19,363
  Never Differentially Expressed             18,733    19,226
Genotype-specific expression pattern
  Always Genotype Specific if Expressed      3,532     3,964
  Sometimes Genotype Specific                8,719     9,910
  Never Genotype Specific                    27,050    26,683
a The total number of genes in each genotype is 39,301 (B73) and 40,557 (PH207).

FIGURE LEGENDS

Figure 1. Summary of the PH207 genome assembly. A) Genome-wide comparison of the B73 genome assembly and the PH207 genome assembly. The inset shows a zoomed in view of the first Mb of chromosome one from each genome. Forward matches, plotted first, are colored red and reverse matches are colored blue. B) Density of annotated genes in the PH207 genome assembly.

Figure 2. Genome content distribution in elite maize inbred lines B73 and PH207. A) Distribution of gene size for all B73 and PH207 genes and genes showing presence/absence variation (PAV). The whiskers are plotted at a maximum of 1.5 times the interquartile range away from the end of the box in each direction. B) Distribution of partial gene deletions in B73 (right) and PH207 (left) genes relative to the genome in which the gene was annotated based on the ratio of coverage from resequencing data. C) Genome-wide distribution of B73 and PH207 unique genes.

Figure 3. Variation in gene family size in elite maize inbred lines B73 and PH207. A) Distribution of gene families of the same size in B73 and PH207. B) Distribution of gene families with moderate changes in gene family size between B73 and PH207. C) Distribution of gene families with extreme changes in gene family size between B73 and PH207.
D) Density plot of the proportion of genes in the inbred line (B73 or PH207) that has expanded gene family members in syntenic positions relative to the reciprocal inbred line, with fewer copies for gene families with extreme changes. E) Physical location of genes in a family that had only clustered expansion. F) Physical locations of genes in a family that had both clustered and distributed expansion. G) Physical locations of genes in a family that had distributed expansion only. In plots E-G, B73 genes are above the axis, and PH207 genes are below the axis. White circles represent genes with no introns, and black circles represent genes with 1 or more introns.

Figure 4. Distribution of expression in B73 and PH207 for shared and genotype-specific genes. For each B73 tissue used in this analysis the average of three biological replicates was used, and the average of two biological replicates was used for PH207 tissues. For both B73 and PH207 the biological replicates consisted of pooled tissue from three plants. A) Distribution of number of tissues with expression. B) Distribution of average transcript abundance measured in fragments per kilobase of exon models per million fragments mapped (FPKM). The whiskers are plotted at a maximum of 1.5 times the interquartile range away from the end of the box in each direction. For both A and B, expression was measured in leaf blade, root cortical parenchyma, germinating kernel, root tip, whole seedling, and root stele. C) Distribution of transcript size in each class of genes.

Figure 5. Relationship between gene family size and transcriptional variation. Gene families were determined using homologous gene clustering with B73 and PH207 gene models.
Heat 1091 maps show the proportion of genes in families of size N with expression in zero to six tissues, 1092 which include leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and 1093 whole seedling. White color indicates no genes are expressed and red indicates all genes are 1094 expressed. 1095 34 1096 Figure 6. Relationship between variation in promoter sequence and transcriptional variation. A) 1097 Distribution of percent similarity of promoter sequences between B73 and PH207 genes in the 1098 region 1 kb upstream of the transcription start site for genes that show a one-to-one relationship 1099 in the two genome assemblies. B) Relationship between number of tissues with differential 1100 expression and number of SNPs per kb in the promoter sequence. B) Relationship between 1101 number of tissues with differential expression and number of InDels per kb in the promoter 1102 sequence. 1103 1104 Figure 7. Lessons learned from comparisons between genome assemblies of elite maize inbred 1105 lines. A) Homologous gene clusters between Brachypodium distachyon, maize B73, maize 1106 PH207, Oryza sativa, and Sorghum bicolor that contain a maize gene from only B73 or PH207 1107 and at least one other species. B) Example of a duplicated gene showing differential fractionation 1108 between B73 and PH207. Maize is an ancient tetraploid that has undergone genome 1109 fractionation. A previous study identified and classified genes in the B73 genome into two maize 1110 subgenomes, Maize1 and Maize2 (Schnable et al., 2011). The middle genes in these syntenic 1111 blocks are showing differential fractionation between B73 and PH207, with PH207 retaining 1112 both copies and B73 losing the copy in the Maize2 subgenome (Gene Positions: 1113 GRMZM2G003937 1114 chr2:208887428..208891800; 1115 Zm00008a009655 1116 chr2:212,373,566..212,385,564; 1117 GRMZM2G066197 1118 chr7:161,843,771..161,845,175; 1119 Zm00008a029226 1120 chr7:162,164,382..162,166,725). 
C) Example of a gene containing a putative frame shift that is 1121 compensated by alternative intron/exon boundaries when using annotation specific to the 1122 individual. PH207 reads were aligned to the B73 reference genome assembly and putative large 1123 effect variants were identified based on the B73 annotation. The putative frame shift variant 1124 shown in this figure is corrected for by an alternative gene model in PH207 that was identified 1125 using PH207-specific evidence to annotate genes in the PH207 genome assembly. chr2:208,869,363..208,870,921; GRMZM2G454474 GRMZM2G129575 chr2:208,993,547..208,994,705; chr2:212,248,102..212,249,027; Zm00008a009663 chr2: chr7:161,658,677..161,660,464; Zm00008a029217 chr7:162,105,478..162,112,256; 1126 35 Zm00008a009660 212,427,728..212,428,519; GRMZM2G179777 chr7:161,948,327..161,949,627; Zm00008a029228 PH207 Genome Assembly A. Figure 1. Summary of the de novo PH207 genome assembly. A) Genome-wide comparison of the B73 v2 reference genome assembly and the PH207 de novo genome assembly. The inset shows a zoomed in view of the first Mb of chromosome one. Forward matches, plotted first, are colored red and reverse matches are colored blue. B) Density of annotated genes in the PH207 genome assembly. 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 B73 v3 Genome Assembly 8 9 10 1 2 3 4 5 6 Genomic Position 8 9 10 B. 
[Figure 2 image. Panel A: box plots of Gene Size (bp) for All Genes and PAV Genes in B73 and PH207. Panel B: two pie charts with coverage classes 0% Coverage, <25% Coverage, 25-75% Coverage, and >75% Coverage; the extracted values are 3.1%, 2.7%, 6.4%, 87.8% and 5.4%, 3.7%, 7.1%, 83.8%, and their assignment to genotype and coverage class is not recoverable from the extracted text. Panel C: positions of B73 Unique Genes and PH207 Unique Genes across chromosomes 1-10.]
Figure 2. Genome content distribution in elite maize inbred lines B73 and PH207. A) Distribution of gene size for all B73 and PH207 genes and genes showing presence/absence variation (PAV). The whiskers are plotted at a maximum of 1.5 times the interquartile range away from the end of the box in each direction.
B) Distribution of partial gene deletions in B73 (right) and PH207 (left) genes relative to the genome in which the gene was annotated based on the ratio of coverage from resequencing data. C) Genome-wide distribution of B73 and PH207 unique genes.

[Figure 3 image. Panel titles: N=20,843 Families, N=1,720 Families, N=136 Families. Axis labels: Number of Groups; Number of B73 and PH207 Genes; Number of B73 Genes; Number of PH207 Genes; Density; Proportion of High Parent Genes at Positions Syntenic to Low Parent. Panels E-G show gene positions along chromosomes.]
Figure 3. Variation in gene family size in elite maize inbred lines B73 and PH207. A) Distribution of gene families of the same size in B73 and PH207. B) Distribution of gene families with moderate changes in gene family size between B73 and PH207. C) Distribution of gene families with extreme changes in gene family size between B73 and PH207. D) Density plot of the proportion of genes in the inbred line (B73 or PH207) that has expanded gene family members in syntenic positions relative to the reciprocal inbred line with fewer copies for gene families with extreme changes. E) Physical location of genes in a family that had only clustered expansion. F) Physical location of genes in a family that had both clustered and distributed expansion. G) Physical location of genes in a family that had distributed expansion only. In plots E-G, B73 genes are above the axis, and PH207 genes are below the axis.
White circles represent genes with no introns and black circles represent genes with 1 or more introns.

[Figure 4 image. Panel A: density of Number of Tissues with Expression (0-6). Panel B: box plots of Transcript Abundance (FPKM). Panel C: box plots of Transcript Size (bp). Each panel compares four gene classes: PH207 Shared, PH207 Specific, B73 Shared, and B73 Specific.]
Figure 4. Distribution of expression in B73 and PH207 shared and genotype-specific genes. For each B73 tissue used in this analysis, the average of three biological replicates was used, and the average of two biological replicates was used for each PH207 tissue. For both B73 and PH207, the biological replicates consisted of pooled tissue from three plants. A) Distribution of number of tissues with expression. B) Distribution of average transcript abundance measured in fragments per kilobase of exon model per million fragments mapped (FPKM). The whiskers are plotted at a maximum of 1.5 times the interquartile range away from the end of the box in each direction. For both A and B, expression was measured in leaf blade, root cortical parenchyma, germinating kernel, root tip, whole seedling, and root stele. C) Distribution of transcript size in each class of genes.
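The FPKM unit named in the Figure 4 legend can be illustrated with a short calculation. The following is a minimal sketch of the textbook definition only, followed by the per-tissue averaging across biological replicates that the legend describes. The function names and the example numbers are hypothetical, and this is not the quantification software used in the study, which may apply additional model-based corrections.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million fragments mapped.

    fragments: fragments assigned to the gene (hypothetical input)
    exon_length_bp: summed exon length of the gene model, in bp
    total_mapped_fragments: all mapped fragments in the library
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and by (library size / 1e6) is the same
    # as multiplying the raw count by 1e9 and dividing by both.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def mean_across_replicates(values):
    """Average FPKM over the biological replicates of one tissue."""
    if not values:
        raise ValueError("at least one replicate is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up example: 500 fragments on a 2,000 bp gene model
    # in a library of 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))  # 12.5
    # Made-up replicate values for one tissue.
    print(mean_across_replicates([12.5, 10.0, 13.5]))  # 12.0
```

Normalizing by both gene length and library size is what allows transcript abundance to be compared between genes of different sizes and between libraries of different depth, which is the comparison shown in panel B.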
[Figure 5 image. Two heat maps, labeled B73 Genes and PH207 Genes, with axes Gene Family Size and Number of Tissues with Expression (0-6).]
Figure 5. Relationship between gene family size and transcriptional variation. Gene families were determined using homologous gene clustering with B73 and PH207 gene models. Heat maps show the proportion of genes in families of size N with expression in zero to six tissues, which include leaf blade, root cortical parenchyma, root stele, germinating kernel, root tip, and whole seedling. White color indicates no genes are expressed and red indicates all genes are expressed.

[Figure 6 image. Panel A: Count by Percent Similarity (80-100). Panels B and C: Percent of Genes by No. Tissues with Differential Expression (0-6), with legend classes 0-5, 5-20, >20 and 0-2, 2-5, >5.]
Figure 6. Relationship between variation in promoter sequence and transcriptional variation. A) Distribution of percent similarity of promoter sequences between B73 and PH207 genes in the 1 kb upstream of the transcription start site for genes that show a one-to-one relationship in the two genome assemblies. B) Relationship between number of tissues with differential expression and number of SNPs per kb in the promoter sequence. C) Relationship between number of tissues with differential expression and number of Insertions/Deletions per kb in the promoter sequence.
[Figure 7 image. Panel A: Number of Clusters by Taxa Per Cluster (4, 3, 2, Total) for B73 Absent From Cluster and PH207 Absent From Cluster. Panel B: B73 and PH207 gene tracks for the "Maize2" region on Chr2 (B73 coordinates 208,857 kb to 209,007 kb) and the "Maize1" region on Chr7. Panel C: B73 gene model GRMZM2G704285 and PH207 gene model Zm00008a034742 over a 9,726 bp window, with PH207 read alignments, the B73 genome sequence, and the B73, B73-based PH207, and PH207 protein translations.]
Figure 7. Lessons learned from comparisons between genome assemblies of elite maize inbred lines. A) Homologous gene clusters between Brachypodium distachyon, maize B73, maize PH207, Oryza sativa, and Sorghum bicolor that contain a maize gene from only B73 or PH207 and at least one other species. B) Example of a duplicated gene showing differential fractionation between B73 and PH207. Maize is an ancient tetraploid that has undergone genome fractionation. A previous study identified and classified genes in the B73 genome into two maize subgenomes, Maize1 and Maize2 (Schnable et al., 2011). The middle genes in these syntenic blocks are showing differential fractionation between B73 and PH207, with PH207 retaining both copies and B73 losing the copy in the Maize2 subgenome (Gene Positions: GRMZM2G003937 chr2:208,869,363..208,870,921; GRMZM2G129575 chr2:208,887,428..208,891,800; GRMZM2G454474 chr2:208,993,547..208,994,705; Zm00008a009655 chr2:212,248,102..212,249,027; Zm00008a009660 chr2:212,373,566..212,385,564; Zm00008a009663 chr2:212,427,728..212,428,519; GRMZM2G066197 chr7:161,658,677..161,660,464; GRMZM2G179777 chr7:161,843,771..161,845,175; Zm00008a029217 chr7:161,948,327..161,949,627; Zm00008a029226 chr7:162,105,478..162,112,256; Zm00008a029228 chr7:162,164,382..162,166,725). C) Example of a gene containing a putative frame shift that is compensated by alternative intron/exon boundaries when using annotation specific to the individual. PH207 reads were aligned to the B73 reference genome assembly and putative large effect variants were identified based on the B73 annotation. The putative frame shift variant shown in this figure is corrected for by an alternative gene model in PH207 that was identified using PH207-specific evidence to annotate genes in the PH207 genome assembly.

Parsed Citations
Abendroth, L.J., Elmore, R.W., Boyer, M.J., and Marlay, S.K. (2011). Corn growth and development. PMR 1009. Iowa State University Extension, Ames Iowa.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990a). Basic local alignment search tool. J Mol Biol 215, 403-410.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990b). Basic Local Alignment Search Tool. Journal of Molecular Biology 215, 403-410.
Anders, S., Pyl, P.T., and Huber, W. (2015).
HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166-169.
Anderson, J.E., Kantar, M.B., Kono, T.Y., Fu, F., Stec, A.O., Song, Q., Cregan, P.B., Specht, J.E., Diers, B.W., Cannon, S.B., McHale, L.K., and Stupar, R.M. (2014). A roadmap for functional structural variants in the soybean genome. G3 4, 1307-1318.
Arabidopsis Genome, I. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Belo, A., Beatty, M.K., Hondred, D., Fengler, K.A., Li, B., and Rafalski, A. (2010). Allelic genome structural variations in maize detected by array comparative genome hybridization. Theor Appl Genet 120, 355-367.
Birchler, J.A., Auger, D.L., and Riddle, N.C. (2003). In search of the molecular basis of heterosis. Plant Cell 15, 2236-2239.
Birchler, J.A., Yao, H., and Chudalayandi, S. (2006). Unraveling the genetic basis of hybrid vigor. Proc Natl Acad Sci U S A 103, 12957-12958.
Birchler, J.A., Yao, H., Chudalayandi, S., Vaiman, D., and Veitia, R.A. (2010). Heterosis. Plant Cell 22, 2105-2112.
Brunner, S., Fengler, K., Morgante, M., Tingey, S., and Rafalski, A. (2005). Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17, 343-360.
Campbell, M.S., Holt, C., Moore, B., and Yandell, M. (2014a). Genome Annotation and Curation Using MAKER and MAKER-P. Curr Protoc Bioinformatics 48, 4.11.1-4.11.39.
Campbell, M.S., Law, M., Holt, C., Stein, J.C., Moghe, G.D., Hufnagel, D.E., Lei, J., Achawanantakun, R., Jiao, D., Lawrence, C.J., Ware, D., Shiu, S.H., Childs, K.L., Sun, Y., Jiang, N., and Yandell, M. (2014b). MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol 164, 513-524.
Cao, J., Schneeberger, K., Ossowski, S., Gunther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., Wang, X., Ott, F., Muller, J., Alonso-Blanco, C., Borgwardt, K., Schmid, K.J., and Weigel, D. (2011). Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet.
Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. (2007). Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes. PLoS ONE 2, e383.
Chia, J.M., Song, C., Bradbury, P.J., Costich, D., de Leon, N., Doebley, J., Elshire, R.J., Gaut, B., Geller, L., Glaubitz, J.C., Gore, M., Guill, K.E., Holland, J., Hufford, M.B., Lai, J., Li, M., Liu, X., Lu, Y., McCombie, R., Nelson, R., Poland, J., Prasanna, B.M., Pyhajarvi, T., Rong, T., Sekhon, R.S., Sun, Q., Tenaillon, M.I., Tian, F., Wang, J., Xu, X., Zhang, Z., Kaeppler, S.M., Ross-Ibarra, J., McMullen, M.D., Buckler, E.S., Zhang, G., Xu, Y., and Ware, D.
(2012). Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet 44, 803-807.
Cingolani, P., Platts, A., Wang le, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92.
Cook, D.E., Lee, T.G., Guo, X., Melito, S., Wang, K., Bayless, A.M., Wang, J., Hughes, T.J., Willis, D.K., Clemente, T.E., Diers, B.W., Jiang, J., Hudson, M.E., and Bent, A.F. (2012). Copy number variation of multiple genes at Rhg1 mediates nematode resistance in soybean. Science 338, 1206-1209.
Dong, Q., Schlueter, S.D., and Brendel, V. (2004). PlantGDB, plant genome database and analysis tools. Nucleic acids research 32, D354-359.
Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic acids research 38, W64-70.
Eddy, S.R. (2011). Accelerated Profile HMM Searches. PLoS Comput Biol 7, e1002195.
Eichten, S.R., Foerster, J.M., de Leon, N., Kai, Y., Yeh, C.T., Liu, S., Jeddeloh, J.A., Schnable, P.S., Kaeppler, S.M., and Springer, N.M. (2011). B73-Mo17 near-isogenic lines demonstrate dispersed structural variation in maize. Plant Physiol 156, 1679-1690.
Eilbeck, K., Moore, B., Holt, C., and Yandell, M. (2009). Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67.
Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research 42, D222-230.
Fu, H., and Dooner, H.K. (2002). Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci U S A 99, 9573-9578.
Gaines, T.A., Zhang, W., Wang, D., Bukun, B., Chisholm, S.T., Shaner, D.L., Nissen, S.J., Patzoldt, W.L., Tranel, P.J., Culpepper, A.S., Grey, T.L., Webster, T.M., Vencill, W.K., Sammons, R.D., Jiang, J., Preston, C., Leach, J.E., and Westra, P. (2010). Gene amplification confers glyphosate resistance in Amaranthus palmeri. Proc Natl Acad Sci U S A 107, 1029-1034.
Gan, X., Stegle, O., Behr, J., Steffen, J.G., Drewe, P., Hildebrand, K.L., Lyngsoe, R., Schultheiss, S.J., Osborne, E.J., Sreedharan, V.T., Kahles, A., Bohnert, R., Jean, G., Derwent, P., Kersey, P., Belfield, E.J., Harberd, N.P., Kemen, E., Toomajian, C., Kover, P.X., Clark, R.M., Ratsch, G., and Mott, R. (2011). Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477, 419-423.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y., and Zhang, J. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80.
Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., and Rokhsar, D.S. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic acids research 40, D1178-1186.
Gore, M.A., Chia, J.M., Elshire, R.J., Sun, Q., Ersoz, E.S., Hurwitz, B.L., Peiffer, J.A., McMullen, M.D., Grills, G.S., Ross-Ibarra, J., Ware, D.H., and Buckler, E.S. (2009). A first-generation haplotype map of maize. Science 326, 1115-1117.
Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B.W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., and Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-652.
Hansey, C.N., Vaillancourt, B., Sekhon, R.S., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2012). Maize (Zea mays L.)
Genome Diversity as Revealed by RNA-Sequencing. PLoS ONE 7, e33071.
Hardigan, M.A., Crisovan, E., Hamilton, J.P., Kim, J., Laimbeer, P., Leisner, C.P., Manrique-Carpintero, N.C., Newton, L., Pham, G.M., Vaillancourt, B., Yang, X., Zeng, Z., Douches, D., Jiang, J., Veilleux, R.E., and Buell, C.R. (2016). Genome reduction uncovers a large dispensable genome and adaptive role for copy number variation in asexually propagated Solanum tuberosum. The Plant Cell.
Hirakawa, H., Okada, Y., Tabuchi, H., Shirasawa, K., Watanabe, A., Tsuruoka, H., Minami, C., Nakayama, S., Sasamoto, S., Kohara, M., Kishida, Y., Fujishiro, T., Kato, M., Nanri, K., Komaki, A., Yoshinaga, M., Takahata, Y., Tanaka, M., Tabata, S., and Isobe, S.N. (2015). Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Research 22, 171-179.
Hirsch, C.D., Springer, N.M., and Hirsch, C.N. (2015). Genomic Limitations to RNAseq Expression Profiling. The Plant Journal 84, 491-503.
Hirsch, C.N., Foerster, J.M., Johnson, J.M., Sekhon, R.S., Muttoni, G., Vaillancourt, B., Penagaricano, F., Lindquist, E., Pedraza, M.A., Barry, K., de Leon, N., Kaeppler, S.M., and Buell, C.R. (2014). Insights into the maize pan-genome and pan-transcriptome. Plant Cell 26, 121-135.
International Brachypodium, I. (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763-768.
International Rice Genome Sequencing, P. (2005). The map-based sequence of the rice genome. Nature 436, 793-800.
Kawahara, Y., de la Bastide, M., Hamilton, J.P., Kanamori, H., McCombie, W.R., Ouyang, S., Schwartz, D.C., Tanaka, T., Wu, J., Zhou, S., Childs, K.L., Davidson, R.M., Lin, H., Quesada-Ocampo, L., Vaillancourt, B., Sakai, H., Lee, S.S., Kim, J., Numa, H., Itoh, T., Buell, C.R., and Matsumoto, T. (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y) 6, 4.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics 5, 59.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biol 5, R12.
Lai, J., Li, R., Xu, X., Jin, W., Xu, M., Zhao, H., Xiang, Z., Song, W., Ying, K., Zhang, M., Jiao, Y., Ni, P., Zhang, J., Li, D., Guo, X., Ye, K., Jian, M., Wang, B., Zheng, H., Liang, H., Zhang, X., Wang, S., Chen, S., Li, J., Fu, Y., Springer, N.M., Yang, H., Wang, J., Dai, J., and Schnable, P.S. (2010).
Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet 42, 1027-1030.
Lamesch, P., Berardini, T.Z., Li, D., Swarbreck, D., Wilks, C., Sasidharan, R., Muller, R., Dreher, K., Alexander, D.L., Garcia-Hernandez, M., Karthikeyan, A.S., Lee, C.H., Nelson, W.D., Ploetz, L., Singh, S., Wensel, A., and Huala, E. (2012). The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research 40, D1202-1210.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359.
Law, M., Childs, K.L., Campbell, M.S., Stein, J.C., Olson, A.J., Holt, C., Panchy, N., Lei, J., Jiao, D., Andorf, C.M., Lawrence, C.J., Ware, D., Shiu, S.H., Sun, Y., Jiang, N., and Yandell, M. (2015). Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol 167, 25-39.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003).
OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13, 2178-2189.
Lindenbaum, P. (2015). JVarkit: java-based utilities for Bioinformatics. figshare http://dx.doi.org/10.6084/m9.figshare.1425030.
Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550.
Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang, T., Li, Y., Li, Y., Semagn, K., Zhang, X., Hernandez, A.G., Mikel, M.A., Soifer, I., Barad, O., and Buckler, E.S. (2015). High-resolution genetic mapping of maize pan-genome sequence anchors. Nature communications 6, 6914.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., Cheung, D.W., Yiu, S.M., Peng, S., Xiaoqian, Z., Liu, G., Liao, X., Li, Y., Yang, H., Wang, J., Lam, T.W., and Wang, J. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18.
Magoc, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957-2963.
Maron, L.G., Guimaraes, C.T., Kirst, M., Albert, P.S., Birchler, J.A., Bradbury, P.J., Buckler, E.S., Coluccio, A.E., Danilova, T.V., Kudrna, D., Magalhaes, J.V., Pineros, M.A., Schatz, M.C., Wing, R.A., and Kochian, L.V. (2013). Aluminum tolerance in maize is associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A 110, 5241-5246.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. 2011 17.
Mikel, M.A. (2011). Genetic Composition of Contemporary U.S. Commercial Dent Corn Germplasm. Crop Sci. 51, 592-599.
Monaco, M.K., Stein, J., Naithani, S., Wei, S., Dharmawardhana, P., Kumari, S., Amarasinghe, V., Youens-Clark, K., Thomason, J., Preece, J., Pasternak, S., Olson, A., Jiao, Y., Lu, Z., Bolser, D., Kerhornou, A., Staines, D., Walts, B., Wu, G., D'Eustachio, P., Haw, R., Croft, D., Kersey, P.J., Stein, L., Jaiswal, P., and Ware, D. (2014). Gramene 2013: comparative plant genomics resources. Nucleic acids research 42, D1193-1199.
Morgante, M., De Paoli, E., and Radovic, S. (2007). Transposable elements and the plant pan-genomes. Curr Opin Plant Biol 10, 149-155.
Murray, M.G., and Thompson, W.F. (1980). Rapid isolation of high molecular weight plant DNA. Nucleic acids research 8, 4321-4325.
Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ossowski, S., Schneeberger, K., Clark, R.M., Lanz, C., Warthmann, N., and Weigel, D. (2008). Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18, 2024-2033. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., Orvis, J., Haas, B., Wortman, J., and Buell, C.R. (2007). The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic acids research 35, D883-887. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Park, S.J., Jiang, K., Tal, L., Yichie, Y., Gar, O., Zamir, D., Eshed, Y., and Lippman, Z.B. (2014). Optimization of crop productivity in tomato using induced mutations in the florigen pathway. Nat Genet 46, 1337-1342. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061-1067. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang, H., Wang, X., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U., Grigoriev, I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P., Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L., Carpita, N.C., Freeling, M., Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming, R., Peterson, D.G., Mehboob ur, R., Ware, D., Westhoff, P., Mayer, K.F., Messing, J., and Rokhsar, D.S. 
(2009). The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551-556. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841842. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title R Development Core Team. (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schatz, M.C., Maron, L.G., Stein, J.C., Hernandez Wences, A., Gurtowski, J., Biggers, E., Lee, H., Kramer, M., Antoniou, E., Ghiban, E., Wright, M.H., Chia, J.M., Ware, D., McCouch, S.R., and McCombie, W.R. (2014). Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica. Genome Biol 15, 506. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A 108, 4069-4074. 
Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., Minx, P., Reily, A.D., Courtney, L., Kruchowski, S.S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S.M., Belter, E., Du, F., Kim, K., Abbott, R.M., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S.M., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M.J., McMahan, L., Van Buren, P., Vaughn, M.W., Ying, K., Yeh, C.-T., Emrich, S.J., Jia, Y., Kalyanaraman, A., Hsia, A.-P., Barbazuk, W.B., Baucom, R.S., Brutnell, T.P., Carpita, N.C., Chaparro, C., Chia, J.-M., Deragon, J.-M., Estill, J.C., Fu, Y., Jeddeloh, J.A., Han, Y., Lee, H., Li, P., Lisch, D.R., Liu, S., Liu, Z., Nagel, D.H., McCann, M.C., SanMiguel, P., Myers, A.M., Nettleton, D., Nguyen, J., Penning, B.W., Ponnala, L., Schneider, K.L., Schwartz, D.C., Sharma, A., Soderlund, C., Springer, N.M., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T.K., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J.L., Dawe, R.K., Jiang, J., Jiang, N., Presting, G.G., Wessler, 
S.R., Aluru, S., Martienssen, R.A., Clifton, S.W., McCombie, W.R., Wing, R.A., and Wilson, R.K. (2009). The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science 326, 1112-1115. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Smit, A., Hubley, R., and Green, P. (2013-2015). RepeatMasker Open-4.0 http://www.repeatmasker.org. Song, R., and Messing, J. (2003). Gene expression of a gene family in maize based on noncollinear haplotypes. Proc Natl Acad Sci U S A 100, 9055-9060. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Springer, N.M., and Stupar, R.M. (2007). Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Res 17, 264-275. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Springer, N.M., Ying, K., Fu, Y., Ji, T., Yeh, C.T., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A.L., Barbazuk, W.B., Jeddeloh, J.A., Nettleton, D., and Schnable, P.S. (2009). Maize inbreds exhibit high levels of copy number variation (CNV) and presence/absence variation (PAV) in genome content. PLoS Genet 5, e1000734. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Stanke, M., and Waack, S. (2003). Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 Suppl 2, ii215-225. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Stelpflug, S.C., Sekhon, R.S., Vaillancourt, B., Hirsch, C.N., Buell, C.R., de Leon, N., and Kaeppler, S.M. (2016). An Expanded Maize Gene Expression Atlas based on RNA Sequencing and its Use to Explore Root Development. The Plant Genome 9. 
Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Sutton, T., Baumann, U., Hayes, J., Collins, N.C., Shi, B.J., Schnurbusch, T., Hay, A., Mayo, G., Pallotta, M., Tester, M., and Langridge, P. (2007). Boron-toxicity tolerance in barley arising from efflux transporter amplification. Science 318, 1446-1449. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Suzek, B.E., Huang, H., McGarvey, P., Mazumder, R., and Wu, C.H. (2007). UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282-1288. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Swanson-Wagner, R.A., Eichten, S.R., Kumari, S., Tiffin, P., Stein, J.C., Ware, D., and Springer, N.M. (2010). Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Res 20, 1689-1699. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Troyer, A.F. (1999). Background of U.S. Hybrid Corn. Crop Sci. 39, 601-626. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Troyer, A.F. (2006). Adaptedness and Heterosis in Corn and Mule Hybrids. Crop Science 46, 528-543. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title UniProt Consortium. (2014). Activities at the Universal Protein Resource (UniProt). 
Nucleic acids research 42, D191-198. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Wang, Y., Xiong, G., Hu, J., Jiang, L., Yu, H., Xu, J., Fang, Y., Zeng, L., Xu, E., Xu, J., Ye, W., Meng, X., Liu, R., Chen, H., Jing, Y., Wang, Y., Zhu, X., Li, J., and Qian, Q. (2015). Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Weber, R.L., Wiebke-Strohm, B., Bredemeier, C., Margis-Pinheiro, M., de Brito, G.G., Rechenmacher, C., Bertagnolli, P.F., de Sa, M.E., Campos Mde, A., de Amorim, R.M., Beneventi, M.A., Margis, R., Grossi-de-Sa, M.F., and Bodanese-Zanettini, M.H. (2014). Expression of an osmotin-like protein from Solanum nigrum confers drought tolerance in transgenic soybean. BMC plant biology 14, 343. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Weigel, D., and Mott, R. (2009). The 1001 genomes project for Arabidopsis thaliana. Genome Biol 10, 107. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859-1875. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Xu, H., Luo, X., Qian, J., Pang, X., Song, J., Qian, G., Chen, J., and Chen, S. (2012). FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7, e52249. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Yandell, M., and Ence, D. (2012). A beginner's guide to eukaryotic genome annotation. Nat Rev Genet 13, 329-342. 
Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Yao, H., Dogra Gray, A., Auger, D.L., and Birchler, J.A. (2013). Genomic dosage effects on heterosis in triploid maize. Proc Natl Acad Sci U S A 110, 2665-2669. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Zdobnov, E.M., and Apweiler, R. (2001). InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847-848. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Zhao, H., Sun, Z., Wang, J., Huang, H., Kocher, J.P., and Wang, L. (2014). CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006-1007. Pubmed: Author and Title CrossRef: Author and Title Google Scholar: Author Only Title Only Author and Title Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize Candice Hirsch, Cory D Hirsch, Alex B Brohammer, Megan J Bowman, Ilya Soifer, Omer Barad, Doron Shem-Tov, Kobi Baruch, Fei Lu, Alvaro G Hernandez, Christopher J Fields, Chris L Wright, Klaus Koehler, Nathan M. Springer, Edward S. Buckler, C. Robin Buell, Natalia de Leon, Shawn M. 
Kaeppler, Kevin Childs and Mark A Mikel Plant Cell; originally published online November 1, 2016; DOI 10.1105/tpc.16.00353 This information is current as of June 16, 2017 Supplemental Data /content/suppl/2016/11/01/tpc.16.00353.DC1.html Permissions https://www.copyright.com/ccc/openurl.do?sid=pd_hw1532298X&issn=1532298X&WT.mc_id=pd_hw1532298X eTOCs Sign up for eTOCs at: http://www.plantcell.org/cgi/alerts/ctmain CiteTrack Alerts Sign up for CiteTrack Alerts at: http://www.plantcell.org/cgi/alerts/ctmain Subscription Information Subscription Information for The Plant Cell and Plant Physiology is available at: http://www.aspb.org/publications/subscriptions.cfm © American Society of Plant Biologists ADVANCING THE SCIENCE OF PLANT BIOLOGY