Genome and transcriptome sequencing of watermelon (Citrullus lanatus)

Zhangjun Fei (费章君)
Boyce Thompson Institute for Plant Research
USDA Robert W. Holley Center for Agriculture and Health
Cornell University

Research in Fei lab

Purely computational
No wet lab

• Developing biological databases for efficient storage, management, dissemination, and mining of large and diverse public datasets
    • Tomato Functional Genomics Database (http://solgenomics.net/ted)
    • Cucurbit Genomics Database (http://www.icugi.org)
• Developing computational tools and algorithms for efficient processing, analysis, and integration of large-scale ‘omics’ datasets
    • Plant MetGenMAP (http://bioinfo.bti.cornell.edu/tool/MetGenMAP)
    • iAssembler (http://bioinfo.bti.cornell.edu/tool/iAssembler)
    • iTAK (http://bioinfo.bti.cornell.edu/tool/itak)
• Application of bioinformatics and sequencing technologies for trait discovery, crop improvement and knowledge advancement
    • Plant virus identification
    • Tomato epigenome
    • Genome and transcriptome analysis of several important crops
    • ……

Talk overview

• Generation of a high-quality draft genome of watermelon
• Re-sequencing the genomes of 20 representative watermelon accessions from the three C. lanatus sub-species.
• Comparative transcriptome sequencing (RNA-seq) of cucurbit phloem sap and vascular bundles, and watermelon flesh and rind during fruit development
• Initial analysis of sweet potato genome sequencing

Watermelon (Citrullus lanatus)

A major cucurbit and an important vegetable crop
Narrow genetic diversity
Complex quantitative traits: size, SSC (soluble solids content), shape, aroma, maturity, shelf-life, uniformity, growth vigor
Diseases: BFB (bacterial fruit blotch), GSB (gummy stem blight), viruses, Fusarium wilt, powdery mildew, downy mildew, Phytophthora blight
Pests: aphids, root-knot nematode

International Watermelon Genome Initiative

• Generation of a high quality draft genome of watermelon cultivar 97103, an East-Asia type with early maturity and high fruit quality
• Re-sequencing of representative watermelon genotypes to generate a comprehensive genome sequence variation map
• Large-scale sequencing of watermelon transcriptomes to gain deeper understanding of important biological processes

Sequencing the genome of 97103

Insert size   Read length (bp)   Total length (Mb)   Sequence depth (×)
100 bp        50                  6,845.93            16.11
200 bp        75, 100            16,326.05            38.41
400 bp        44, 75             12,803.65            30.13
2 kb          44                  4,172.44             9.82
5 kb          44                  1,884.77             4.43
10 kb         44                  2,526.47             5.94
20 kb         44                  1,616.72             3.8
Total                            46,176.03           108.64

Estimated genome size: 425 Mb

De novo assembly of watermelon genome

Effects of sequence depth and large-insert reads on watermelon genome assembly

1. 100-200 bp
2. 400 bp
3. 2 kb
4. 5 kb
5. 10 kb
6. 20 kb

Evaluation of genome assembly quality

Analysis of un-assembled reads

De novo assembly (353.3 Mb) covers 83.2% of the estimated watermelon genome (425 Mb)

                                        Lane 1 (%)           Lane 2 (%)           Lane 3 (%)
Total reads                             13,107,531 (100)     12,037,084 (100)     15,435,854 (100)
Reads aligned to genome by SOAP         10,540,768 (80.42)   10,113,735 (84.02)   13,329,285 (86.35)
Reads not aligned to genome by SOAP      2,566,763 (19.58)    1,923,349 (15.98)    2,106,569 (13.65)
Reads aligned to genome by blast*        2,373,937 (18.11)    1,705,221 (14.17)    1,901,355 (12.32)
Reads not aligned to genome by blast*      192,826 (1.47)       218,128 (1.81)       205,214 (1.33)

[Figure labels: centromere, telomere, 45S and 5S rDNAs]

Scaffold anchoring and ordering

• 93.5% of the assembled genome anchored
• 70% ordered
• 65% oriented

Whole genome duplication

Cucurbit genome evolution

The transition from the 21-chromosome eudicot intermediate ancestors involved 81 fissions and 91 fusions in order to reach the modern 11-chromosome structure of watermelon, which is represented as a mosaic of 102 ancestral blocks.

[Figure, panel a: diploid ancestor (n=7), WGD, hexaploid ancestor (n=21), 81 fissions and 91 fusions, TE invasion, structural shuffling, modern watermelon species (w1-w11).]
[Figure, panel b: watermelon (Chr10, Chr5, Chr6, Chr3, Chr1, Chr7, Chr4, Chr11, Chr8, Chr2, Chr9), cucumber (Chr3, Chr1, Chr5, Chr4, Chr6, Chr2, Chr7), and melon (LG4, LG6, LG10, LG5, LG2, LG12, LG9, LG7, LG8, LG3, LG11, LG1).]

Repeat sequence annotation

                             Repbase TEs              TE proteins              De novo                  Combined TEs
Type                         Length (bp)  % genome    Length (bp)  % genome    Length (bp)  % genome    Length (bp)  % genome
DNA transposon                 3334975    1.0377        2972663    0.925        12100932     3.77        12100932     3.77
Long interspersed element       765975    0.2383        1893537    0.5892        4013972     1.25         4013972     1.25
Long terminal repeat          16960692    5.2776       20405741    6.3495      107653915    33.5        107653915    33.5
Short interspersed element       27194    0.0085              -    -              637970     0.1985        637970     0.1985
Other                            13490    0.0042              -    -               13490     0.0042         13490     0.0042
Unknown                          13699    0.0043            426    0.0001       50632999    15.76        50632999    15.76
Total                         21116025    6.5706       25272367    7.8638      175053278    54.48       175053278    54.48

Gene annotation

              Number    Percent (%)
Total         23,440    100.00
Annotated     19,836     84.62
  Swissprot   14,873     63.45
  TrEMBL      19,760     84.30
  InterPro    16,266     69.39
  KEGG        10,936     46.66
  GO          11,822     50.44
Unannotated    3,604     15.38

Watermelon genome resequencing

[Figure: the 20 resequenced accessions, labeled by group: C. lanatus subsp. vulgaris East-Asia ecotype, C. lanatus subsp. vulgaris America ecotype, C. lanatus subsp. mucosospermus, and C. lanatus subsp. lanatus. Accessions shown: JX-2, JLM, JXF, RZ-901, XHBFGM, Black Diamond, Calhoun Gray, Sugarlee, Sy-904304, RZ-900, PI482271, PI500301, PI189317, PI595203, PI249010, PI482276, PI482303, PI296341-FR, PI482326, PI248178.]

Genetic diversity of watermelon genomes
Structure of watermelon germplasm
Pattern of 5S and 45S rDNA distribution
C. lanatus subsp. mucosospermus is the recent ancestor of C. lanatus subsp. vulgaris

Selective sweep

• A total of 108 regions, 7.78 Mb in size, containing 741 candidate genes
• GO term enrichment analysis: regulation of carbohydrate utilization, sugar mediated signaling, carbohydrate metabolism, response to sucrose stimulus, regulation of nitrogen compound metabolism, cellular response to nitrogen starvation, and growth

Evolution of disease resistance genes

• Only 44 NBS-LRR genes were identified in the reference 97103 genome
• The LOX family has undergone an expansion in the watermelon genome, with 26 members, of which 19 are arranged in two tandem gene arrays

Evolution of disease resistance genes: de novo assembly of un-aligned reads from low-coverage resequencing

Group 1 (cultivated watermelon): East Asia (5) and America ecotype (5)
Group 2 (semi-wild/wild): Citrullus lanatus subsp. lanatus subsp. egusi (6); Citrullus lanatus subsp. citroides (4)

Group 1: no disease genes.

Annotation                                      E-value
conserved hypothetical protein                  4.00E-38
conserved hypothetical protein                  6.00E-26
cytochrome c biogenesis orf452                  1.00E-133
maturase                                        1.00E-47
NADH-ubiquinone oxidoreductase fe-s protein     3.00E-54
predicted protein                               1.00E-37
ATP binding protein                             0
mycolic acid methyl transferase-like protein    2.00E-69
Galactose oxidase precursor, putative           1.00E-74
ribosomal protein S12                           3.00E-18
xyloglucan endotransglucosylase/hydrolase       5.00E-142

Evolution of disease resistance genes: new genes identified from semi-wild/wild species

Annotations (first column, slide order): pathogenesis-related protein 1; TIR-LRR-NBS disease resistance protein; TIR-LRR-NBS disease resistance protein; TIR-LRR-NBS disease resistance protein; TIR-LRR-NBS disease resistance protein; TIR-LRR-NBS disease resistance protein; TIR-LRR-NBS disease resistance protein; 13S-lipoxygenase; lipoxygenase loxc homologue; (3S)-linalool/(E)-nerolidol synthase; 1-aminocyclopropane-1-carboxylate oxidase; 1-aminocyclopropane-1-carboxylate oxidase-1; 2,4-dienoyl-CoA reductase, putative; 23 kDa jasmonate-induced protein; agglutinin [Amaranthus hypochondriacus]; Alba DNA/RNA-binding protein; ATP synthase CF1 epsilon subunit; ATP synthase subunit alpha, mitochondrial; B3 domain-containing protein At3g25182; cytochrome P450; endo-1,3-beta-glucanase; F1-ATPase alpha subunit; gag protease polyprotein; galactose-binding type-2 ribosome-inactivating protein; heat shock protein; histone H2B.2; huntingtin interacting protein; hypothetical protein; hypothetical protein; hypothetical protein; jasmonate-induced protein; laccase family protein/diphenol oxidase family protein; mandelonitrile lyase 1

E-values (first column, slide order): 1.00E-41, 2.00E-45, 3.00E-189, 2.00E-66, 6.00E-163, 9.00E-71, 1.00E-55, 1.00E-80, 2.00E-104, 3.00E-39, 3.00E-12, 1.00E-38, 2.00E-31, 5.00E-15, 1.00E-12, 2.00E-06, 7.00E-10, 2.00E-13, 1.00E-44, 1.00E-07, 1.00E-13, 7.00E-11, 2.00E-30, 2.00E-06, 8.00E-120, 1.00E-65, 1.00E-12, 1.00E-22, 4.00E-19, 5.00E-08, 2.00E-07, 3.00E-21, 2.00E-19, 8.00E-06

Annotations (second column, slide order): Minor allergen Alt a; monovalent cation:proton antiporter; nodulin family protein; NtPRp27 [Nicotiana tabacum]; nutrient reservoir; orcinol O-methyltransferase; p8MTCP1; pectin methylesterase-like protein; phosphate starvation-induced protein 2; polyphenol oxidase; polyphenol oxidase 4 precursor; predicted protein; predicted protein; predicted protein; Protein PRY2 precursor, putative; putative alcohol acyl-transferases; putative major latex protein; putative major latex protein; putative non-specific lipid-transfer protein type 2 subfamily; putative WRKY transcription factor 62; quinone reductase; ribosomal protein L22; ribosomal protein S12; RNA polymerase beta subunit; RNA recognition motif-containing protein; RNase NE; similar to MtN19-like protein; soluble epoxide hydrolase; UDP-glucosyltransferase; UDP-glucosyltransferase family 1 protein; Ulp1-like peptidase; Ulp1-like peptidase; xyloglucan endotransglucosylase/hydrolase

E-values (second column, slide order): 2.00E-99, 7.00E-130, 2.00E-17, 2.00E-38, 8.00E-29, 6.00E-120, 1.00E-14, 2.00E-25, 3.00E-17, 5.00E-08, 2.00E-46, 8.00E-04, 1.00E-13, 3.00E-11, 6.00E-16, 2.00E-70, 3.00E-30, 1.00E-25, 2.00E-14, 3.00E-12, 6.00E-16, 5.00E-25, 2.00E-60, 1.00E-26, 3.00E-05, 3.00E-32, 2.00E-46, 6.00E-19, 2.00E-122, 3.00E-76, 2.00E-14, 2.00E-08, 2.00E-99

Comparative transcriptome analysis

Strand-specific RNA-seq

Phloem sap and vascular RNA-seq

Sample                                             No. cleaned reads   No. mapped reads
Cucumber Chinese long phloem sap, repeat 1         13807149            10815694
Cucumber Chinese long phloem sap, repeat 2         12733347            10010183
Cucumber Chinese long phloem sap, repeat 3          8686814             7742299
Cucumber Chinese long vascular tissue, repeat 1     7288455             5821286
Cucumber Chinese long vascular tissue, repeat 2    16548993            13637388
Cucumber Chinese long vascular tissue, repeat 3     8215212             7689994
Watermelon 97103 phloem sap, repeat 1              16930703            13232347
Watermelon 97103 phloem sap, repeat 2              12069969             9710926
Watermelon 97103 phloem sap, repeat 3              13465328            11177584
Watermelon 97103 vascular tissue, repeat 1         13607959            10957475
Watermelon 97103 vascular tissue, repeat 2         11380726            10420706
Watermelon 97103 vascular tissue, repeat 3         13224995            12060837

• 13,775 and 14,242 mRNA species were identified in watermelon and cucumber vascular bundles, respectively, while only 1,519 and 1,012 transcripts were identified in the watermelon and cucumber phloem sap, respectively
• Gene sets were found to be almost identical in vascular bundles between the two cucurbit species, whereas only 50-60% of the transcripts detected in the phloem sap were held in common
• The Gene Ontology (GO) terms highly enriched in common phloem transcripts were response to stress or stimulus
• The GO terms macromolecular biosynthesis process and protein metabolic process were highly enriched in watermelon-unique phloem transcripts

Watermelon fruit RNA-seq

Sample               No. raw reads   No. rRNA reads   No. cleaned reads   No. mapped reads
flesh 10 DAP rep1    10,131,218      24,132           10,107,086           6,575,358
flesh 10 DAP rep2    10,752,201      40,929           10,711,272           8,705,967
flesh 18 DAP rep1    18,914,328      52,506           18,861,822          14,277,592
flesh 18 DAP rep2    12,077,551      33,363           12,044,188           8,704,999
flesh 26 DAP rep1    13,588,792      56,979           13,531,813          11,412,819
flesh 26 DAP rep2    12,345,625      36,261           12,309,364           6,546,412
flesh 34 DAP rep1    10,306,366      25,801           10,280,565           3,325,540
flesh 34 DAP rep2     8,148,638      49,097            8,099,541           5,130,657
rind 10 DAP rep1      8,647,278      65,174            8,582,104           5,554,196
rind 10 DAP rep2      8,488,793      40,998            8,447,795           7,656,486
rind 18 DAP rep1     10,154,906      33,819           10,121,087           5,798,131
rind 18 DAP rep2     12,837,948      52,792           12,785,156           7,751,759
rind 26 DAP rep1     10,861,345      51,046           10,810,299           7,751,488
rind 26 DAP rep2     11,755,860      61,439           11,694,421           9,686,947
rind 34 DAP rep1      9,778,317      40,707            9,737,610           5,643,690
rind 34 DAP rep2     11,572,988      40,868           11,532,120           7,752,428

[Figure: fruit at 10 DAP, 18 DAP, 26 DAP, and 34 DAP]

• 3,046 and 558 genes were differentially expressed in flesh and rind, respectively, during fruit development
• A model for sugar metabolism in cells of watermelon fruit flesh was proposed based on the expression profiles of sugar metabolic and transporter genes
• Key enzymes regulating citrulline accumulation in watermelon fruit were identified

Summary of watermelon sequencing

• A high quality draft genome of watermelon cultivar 97103 was generated; the genome contains 23,440 predicted genes.
• Resequencing of 20 watermelon accessions representing three different C. lanatus subspecies produced numerous haplotypes and revealed the extent of genetic diversity and population structure of watermelon germplasm.
• Genomic regions that were preferentially selected during domestication were identified; on the other hand, many disease resistance genes were found to have been lost during domestication.
• Integrative genomic and transcriptomic analyses yielded important insights into aspects of phloem-based vascular signaling, and identified genes critical to valuable fruit quality traits.

The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nature Genetics, in press.

Sweetpotato genome sequencing: flow cytometry analysis

Arumuganathan, K. & Earle, E.D. Nuclear DNA content of some important plant species. Plant Mol. Biol. Reporter 9, 208–218 (1991)

Sweetpotato genome sequencing

• Huachano, a Peruvian landrace which is amenable to genetic transformation
• About 80 Gb of raw sequence data was generated using the Illumina HiSeq 2000 system
• An additional 50 Gb of raw data was generated using the SOLiD system

Raw data
~246 M read pairs in 200 bp library.
~152 M read pairs in 500 bp library.
Total length 80,317 Mb

Cleaned data

Library   # paired   # single   Read size   Total
SP200     185 M      32 M       94 bp       37.68 Gb
SP500     127 M      17 M       90 bp       24.36 Gb
Total     312 M      49 M       92 bp       62.04 Gb

Sweetpotato genome sequencing: k-mer distribution

[Figure panels: sweet potato, watermelon]

Genome size = (total number of k-mers) / (position of peak depth) = 4,639,223,061 / 11 = 421.75 M

Sweetpotato genome sequencing: de novo assembly

Scaffolds >= 200 bp (GC% = 38.35%)

           Contig                     Scaffold
           Size (bp)      Index       Size (bp)      Index
N90        236            737,451     282            538,782
N80        302            552,738     421            392,362
N70        382            407,767     563            292,438
N60        480            292,372     701            212,612
N50        626            202,185     903            149,749
N25        1,267          59,915      1,695          46,336
Largest    19,628         1           21,622         1
Total      492,615,538    998,299     498,123,765    751,346

Sweetpotato genome sequencing: heterozygous SNP

Sample               Same base      homoSNP    heteSNP      Total SNP%   heteSNP%
SP200R1              366,949,901    707,821    1,546,356    0.61%        0.42%
SP200R2              362,273,897    713,986    1,548,594    0.62%        0.42%
SP500R1              434,569,440    728,686    1,920,965    0.61%        0.44%
SP500R2              430,648,776    739,694    1,896,432    0.61%        0.44%
SP Read Pairs        478,919,859    744,669    2,225,150    0.62%        0.46%
cucumber             339,806,337    41,408     81,933       0.04%        0.02%
Cucumber (Maximum)   174,393,370    298,379    392,352      0.39%        0.22%
Cucumber (Mean)      171,397,005    419,873    81,520       0.29%        0.05%

[Figure: heterozygous SNP allele ratios 2:1, 1:1, 3:2; B1B1B2B2B2B2]

Acknowledgement

Zhangjun Fei, Linyong Mao, Yi Zheng, ……
Yong Xu, Shaogui Guo, Honghe Sun, ……
Jianguo Zhang, Zhiwen Wang, Jiumeng Min, ……
Jerome Salse, Florent Murat
William Lucas, Byung-Kook Ham, Zhaoliang Zhang
Erik Legg, Xingping Zhang, Eric Ganko
Joel Piquemal, Michel Ragot
Jack de Wit, Remco Ursem, Zhongkui Sun
Rob Dirks, Aat Vogelaar
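
Worked example: ratios quoted in the talk

Several numbers in the talk are simple ratios: sequence depth (total sequenced bases over genome size), the k-mer genome size estimate (total k-mers over the peak depth), and the SNP percentages in the heterozygous SNP table. The Python sketch below recomputes them from the slide values. All function names are hypothetical. The SNP percentage formula is an assumption inferred from the table columns (SNP count over same-base plus SNP positions); the slides do not state it, although it does reproduce the sweet potato rows.

```python
def sequence_depth(total_length_mb: float, genome_size_mb: float) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    return total_length_mb / genome_size_mb


def kmer_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Genome size in bases, per the slide's formula:
    total number of k-mers divided by the position of the peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


def snp_percentages(same_base: int, homo_snp: int, hete_snp: int) -> tuple[float, float]:
    """Return (total SNP %, heterozygous SNP %).

    Assumed formula (not stated on the slides): the denominator is the
    number of compared positions, same_base + homo_snp + hete_snp.
    """
    compared = same_base + homo_snp + hete_snp
    total_pct = 100.0 * (homo_snp + hete_snp) / compared
    hete_pct = 100.0 * hete_snp / compared
    return total_pct, hete_pct


if __name__ == "__main__":
    # Watermelon 97103: 46,176.03 Mb of reads, 425 Mb estimated genome size.
    print(f"depth: {sequence_depth(46_176.03, 425):.2f}x")
    # K-mer estimate from the k-mer distribution slide.
    print(f"genome size: {kmer_genome_size(4_639_223_061, 11) / 1e6:.2f} M")
    # SP200R1 row of the heterozygous SNP table.
    total_pct, hete_pct = snp_percentages(366_949_901, 707_821, 1_546_356)
    print(f"SNP: {total_pct:.2f}% total, {hete_pct:.2f}% heterozygous")
```

Dividing the total directly gives a depth of 108.65×; the 108.64 in the sequencing table equals the sum of the rounded per-library depths.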