Yu-Chieh Liao, Ph.D.
Assistant Investigator
Institute of Population Health Sciences
National Health Research Institutes
No. 35 Keyan Road, Zhunan Town, Miaoli County 350, Taiwan
Tel: +886-37-246166 ext. 36178
Fax: +886-37-586-467
E-mail: [email protected]

Dec 27, 2016

Dr. Hans Zauner
Academic Editor
GigaScience

Dear Dr. Zauner,

Thank you for your correspondence dated December 7 and the enclosed review commentary regarding the manuscript submitted for publication in GigaScience, drVM: Detection and Reconstruction of Viral Genomes from Metagenomes (GIGA-D-16-00060R1). We found these comments helpful. The manuscript has been amended according to the reviewers’ comments, and our responses to those comments are attached to this letter. We have changed the title of the manuscript to “drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes” and added discussion as suggested by the reviewers. We have demonstrated drVM’s potential to produce prompt and accurate genome assemblies of known viruses from metagenomes of clinical samples. We trust that the current manuscript is suitable for publication in GigaScience. We thank you in advance for your consideration!

Sincerely yours,

Yu-Chieh Liao, Ph.D.
Assistant Investigator
Institute of Population Health Sciences
National Health Research Institutes
No. 35 Keyan Road, Zhunan Town, Miaoli County 350, Taiwan

Responses to reviewers’ comments:

Reviewer #1:

Counter-response: You've done some of this, but it needs to be described more clearly, see below.

Counter-response: This is a very long answer, and as I see it, the core points are:
- The intended use case is to assemble genomes of viruses for which a genus representative already exists, with at least a 20nt match. This is not an unreasonable limitation but it is a definite limitation. Thus it is misleading to say the tool is de novo in an unqualified manner.
This limitation (de novo assembly within previously characterized genera) should be pushed more strongly, including in abstract and discussion, so readers do not believe it can work on any arbitrary virus.
- You show in a single simulated example that assembly works well at 20X but not 10X. While only an example, it is likely representative, so it addresses that concern.
- You show in another simulated example that you can separate two distinct genomes from the same genus via assembly - fine, that shows that within such constraints, you can handle mixtures. Please add to discussion at some point that while you have verified it can tell apart species within a genus, it might still not be able to tell apart even more similar viruses.
- On the example on a complex mixture, is this one of the datasets in your benchmark table? If so, please see concerns below.

[Response] We thank the reviewer for these comments. We have changed the title of our manuscript to “drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes”. We have also added the description “The first two procedures and sequence annotation rely on known viral genomes as a reference database.” in Abstract [line 27-29]. We have added the description “Although drVM produced distinct viral genome assemblies within the same genus, for human papillomavirus types 45 and 53 in SRR062073 (Fig. 2 and Table 3), and for human enterovirus rhinovirus in the simulation dataset (Fig. 3), the assembler may not be able to handle mixtures of very closely related viruses.” in Discussion [line 380-384].

Counter-response: So for your simulation, you show that for one specific example you can separate same-genus representatives. This doesn't conclude there may be a risk of errors where genera are poorly sampled, but such assessment might be outside the reasonable scope of this MS.
However, please add some discussion/caveats in that it isn't known at present whether this (or any other method) may make mistakes if genera are poorly sampled (in the sense of, few and biased references in the databases) or if there are mixtures of very closely related viruses.

[Response] We have added the description “Although drVM produced distinct viral genome assemblies within the same genus, for human papillomavirus types 45 and 53 in SRR062073 (Fig. 2 and Table 3), and for human enterovirus rhinovirus in the simulation dataset (Fig. 3), the assembler may not be able to handle mixtures of very closely related viruses. Moreover, a biased viral reference database may result in assemblies with missing segments or no assembly whatsoever.” in Discussion [line 380-385].

Counter-response: So none of the other tools can do unqualified de novo assembly either? OK, fine then.

[Response] We thank the reviewer for the understanding.

Counter-response: Here perhaps I was imprecise, it's not clear from text or table what the ground truth is. If I were to guess, I would conclude:
- You consider a positive result assembly of something which is 99% similar to something submitted from the original study. In this regard you have a sensitivity of 72/98, somewhat higher if you also consider where you at least make large context. This is fine. But there is also the false positive question. Of your remaining identified coverage maps/detections (please clarify this is your definition of a positive), they were not reported by the original authors of those studies. So they could either be false positives, or novel discoveries that were false negatives in those earlier studies. You need to discuss this. I agree that large contigs >99% identical to sequenced references are unlikely to be artifacts, but you need to spell this out loud in discussing those results and why you therefore don't think of these as false positives.

[Response] We have added the description “Unlike SURPI and VIP, the coverage plots produced by drVM are created by mapping raw reads back to the assembled contigs, not to closely-related references. A coverage plot with a continuous profile reflects the accuracy and continuity of the assembly paradigm. With the support of coverage plot, one can rest assured that viruses with drVM-produced genome assemblies are present in the sample.” in Discussion [line 385-389].

Reviewer #2:

Overall, I appreciate the new data provided by the authors, but still find the presentation of the tool misleading. The main issue I have is the lack of clarity about the tool scope and possibilities. Currently, the manuscript alternatively describes drVM either as a general tool with broad application in viral genome assembly or as a software designed for human/animal viral pathogens detections. Since drVM is based on a classification-then-assembly pipeline, and the authors use a human virus database, I believe that their intended purpose is to provide a tool specifically designed to easily assemble new types/variants of known viral pathogens (which is consistent with their introduction and conclusion mentioning this application, l. 56: "identification of potential pathogens", and l. 322: "detect pathogens in clinical samples"). However, in that case, this has to be clarified throughout the whole manuscript. For example, the current title "drVM: Detection and Reconstruction of Viral Genomes from Metagenomes" wrongly suggests that drVM can be used with any type of viral genome (known or unknown, which is not tested in the manuscript). Similarly, the tool description in the introduction and conclusion are very evasive, e.g. l. 87: "detection and reconstruction of various viral genomes", l. 400: "demonstrate that drVM is indeed able to detect and reconstruct various viral genomes from metagenomic data".
Some claims are even (in my opinion) clearly misleading, for example the authors assert that "We can therefore expect to extend our knowledge of viral diversity by leveraging the unique genome assembly capabilities of drVM" (l. 140), yet all the assemblies provided by drVM are very closely related to the known references (90% ANI or higher), and thus can hardly be qualified as "extending our knowledge of viral diversity". Thus, to me, drVM is a new tool for the "Efficient assembly of known eukaryote viruses genomes from metagenomes", but presenting it simply as "Detection and Reconstruction of Viral Genomes from Metagenomes" is misleading for non-specialist readers, and would not allow such readers to quickly determine if this tool is well suited for their project or not (that is, without reading the whole manuscript and/or testing the tool themselves).

[Response] We thank the reviewer for this valuable suggestion. We have changed the title of the manuscript to “drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes” and revised the text accordingly.

Finally, I found the writing still confusing in multiple places (which could likely be improved if the manuscript was proof-read by a native English speaker). A few examples are:

[Response] We have asked a native English speaker (also a Bioengineering Ph.D.) to revise the manuscript.

Throughout the manuscript, the use of "runs" to describe one set of metagenomic sequences instead of the more traditionally used "libraries".

[Response] Multiple runs can be produced from a single library. Based on the description in SRA Handbook (https://www.ncbi.nlm.nih.gov/books/NBK47529/) “Runs describe the files that belong to the previously created Experiments. They specify the data files for a specific sample to be processed by SRA.
Experiments may contain many Runs depending on how many sequencer runs were involved in data acquisition.”, we decided to use “sequencing runs” instead of “sequencing libraries”.

l. 29: "The feasibility of drVM was validated via the analysis of over 300 sequencing runs"
drVM was validated via the analysis of over 300 sequencing runs generated by Illumina and Ion Torrent platforms to … [line 29]

l. 65: "to assemble reads under genus-classification to assembly-for improved viral genome assembly".
VIP pursues the same strategy as SURPI-subtraction to identification-to subtract host and bacteria reads prior to the identification of viral reads, although it does provide an alternative strategy to the assembly of reads under a genus-classification to assembly-hence enabling improved viral genome assembly. [line 62-66]

l. 187: "Please note that viral contigs"
Note that viral sequences were fully assembled, de novo, by SPAdes using reads within the same genus [line 189]

l. 192: "across genomics positions, the y-axis, thus representing read depth"
The coverage plots were generated by plotting read coverage across genomic position, with the y-axis representing read depth. [line 192-194]

l. 344: "shared only 90% at nucleotide level identity"
This assembled sequence shared only 90% identity at nucleotide level with a closely related porcine kobuvirus genome sequence [line 348]

l. 358: "If multiple contigs present in one coverage plot"
If multiple contigs are present in one coverage plot, reads aligned to the contigs are extracted in pairs for subsequent re-assembly [line 362-363]

l. 371: "that is to say, to extract viral reads from metagenomes is prior to assembly."
suggesting that to extract viral reads from metagenomes is a prior step to sequence assembly [line 375-376]

On the tool output: Since the tool is designed to be user-friendly, I don't understand why the authors did not add y-axis labels directly on the coverage plots: these plots are the main output from drVM, and I would think the authors want to make it as easy as possible for a user to understand what is represented.

[Response] We have changed the scripts of drVM to add “read depth” as the y-axis label in a coverage plot. [Fig. 2 and Fig. 3]

Reviewer #3:

The revisions add clarity and the resulting manuscript is greatly improved. Nice work.

[Response] We appreciate the reviewer’s comments.