Yu-Chieh Liao, Ph.D.
Assistant Investigator
Institute of Population Health Sciences
National Health Research Institutes
No. 35 Keyan Road, Zhunan Town, Miaoli County 350, Taiwan
Tel: +886-37-246166 ext. 36178
Fax: +886-37-586-467
E-mail: [email protected]
Dec 27, 2016
Dr. Hans Zauner
Academic Editor
GigaScience
Dear Dr. Zauner,
Thank you for your correspondence dated December 7 and the enclosed review
commentary regarding our manuscript submitted for publication in
GigaScience, “drVM: Detection and Reconstruction of Viral Genomes from
Metagenomes” (GIGA-D-16-00060R1). We found these comments helpful. The
manuscript has been amended according to the reviewers’ comments, and our responses
to those comments are attached to this letter. We have changed the
title of the manuscript to “drVM: a new tool for efficient genome assembly of known
eukaryotic viruses from metagenomes” and added discussion as the reviewers
suggested. We have demonstrated drVM’s potential to produce prompt and
accurate genome assemblies of known viruses from metagenomes of clinical samples. We
trust that the current manuscript is suitable for publication in GigaScience.
We thank you in advance for your consideration.
Sincerely yours,
Yu-Chieh Liao, Ph.D.
Assistant Investigator
Institute of Population Health Sciences
National Health Research Institutes
No. 35 Keyan Road, Zhunan Town, Miaoli County 350, Taiwan
Responses to reviewers’ comments:
Reviewer #1:
Counter-response: You've done some of this, but it needs to be described more clearly,
see below.
Counter-response: This is a very long answer, and as I see it, the core points are:
- The intended use case is to assemble genomes of viruses for which a genus
representative already exists, with at least a 20nt match. This is not an unreasonable
limitation but it is a definite limitation. Thus it is misleading to say the tool is de novo
in an unqualified manner. This limitation (de novo assembly within previously
characterized genera) should be pushed more strongly, including in abstract and
discussion, so readers do not believe it can work on any arbitrary virus.
- You show in a single simulated example that assembly works well at 20X but not
10X. While only an example, it is likely representative, so it addresses that concern.
- You show in another simulated example that you can separate two distinct genomes
from the same genus via assembly - fine, that shows that within such constraints, you
can handle mixtures. Please add to discussion at some point that while you have
verified it can tell apart species within a genus, it might still not be able to tell
apart even more similar viruses.
- On the example on a complex mixture, is this one of the datasets in your benchmark
table? If so, please see concerns below.
[Response] We thank the reviewer for these comments. We have changed the title of our
manuscript to “drVM: a new tool for efficient genome assembly of known eukaryotic
viruses from metagenomes”. We have also added the description “The first two
procedures and sequence annotation rely on known viral genomes as a reference
database.” in Abstract [line 27-29]. We have added the description “Although drVM
produced distinct viral genome assemblies within the same genus, for human
papillomavirus types 45 and 53 in SRR062073 (Fig. 2 and Table 3), and for human
enterovirus rhinovirus in the simulation dataset (Fig. 3), the assembler may not be
able to handle mixtures of very closely related viruses.” in Discussion [line 380-384].
Counter-response: So for your simulation, you show that for one specific example you
can separate same-genus representatives. This doesn't conclude there may be a risk of
errors where genera are poorly sampled, but such assessment might be outside the
reasonable scope of this MS. However, please add some discussion/caveats in that
it isn't known at present whether this (or any other method) may make mistakes
if genera are poorly sampled (in the sense of, few and biased references in the
databases) or if there are mixtures of very closely related viruses.
[Response] We have added the description “Although drVM produced distinct viral
genome assemblies within the same genus, for human papillomavirus types 45 and 53
in SRR062073 (Fig. 2 and Table 3), and for human enterovirus rhinovirus in the
simulation dataset (Fig. 3), the assembler may not be able to handle mixtures of very
closely related viruses. Moreover, a biased viral reference database may result in
assemblies with missing segments or no assembly whatsoever.” in Discussion [line
380-385].
Counter-response: So none of the other tools can do unqualified de novo assembly
either? OK, fine then.
[Response] We thank the reviewer for understanding.
Counter-response: Here perhaps I was imprecise, it's not clear from text or table what
the ground truth is. If I were to guess, I would conclude:
- You consider a positive result assembly of something which is 99% similar to
something submitted from the original study. In this regard you have a sensitivity of
72/98, somewhat higher if you also consider where you at least make large context.
This is fine. But there is also the false positive question. Of your remaining identified
coverage maps/detections (please clarify this is your definition of a positive), they
were not reported by the original authors of those studies. So they could either be
false positives, or novel discoveries that were false negatives in those earlier studies.
You need to discuss this. I agree that large contigs >99% identical to sequenced
references are unlikely to be artifacts, but you need to spell this out loud in
discussing those results and why you therefore don't think of these as false
positives.
[Response] We have added the description “Unlike SURPI and VIP, the coverage
plots produced by drVM are created by mapping raw reads back to the assembled
contigs, not to closely-related references. A coverage plot with a continuous profile
reflects the accuracy and continuity of the assembly paradigm. With the support of
coverage plot, one can rest assured that viruses with drVM-produced genome
assemblies are present in the sample.” in Discussion [line 385-389].
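To illustrate what is meant by a continuous coverage profile, the following is a minimal sketch and not the drVM implementation. It assumes that read alignments to an assembled contig are already available as 0-based, half-open (start, end) intervals; the function names per_base_depth and uncovered_gaps are hypothetical and were chosen for this example only. The sketch computes read depth at every contig position and reports any stretches where the depth falls below a threshold, i.e. breaks in the profile.

```python
from typing import List, Tuple


def per_base_depth(contig_length: int,
                   alignments: List[Tuple[int, int]]) -> List[int]:
    """Return the read depth at each position of a contig.

    `alignments` holds 0-based, half-open (start, end) intervals,
    one per read mapped back to the contig (an assumed input format).
    """
    # Difference array: +1 where a read starts, -1 where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, contig_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the per-base depth.
    depth = []
    running = 0
    for pos in range(contig_length):
        running += diff[pos]
        depth.append(running)
    return depth


def uncovered_gaps(depth: List[int],
                   min_depth: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) stretches with depth below `min_depth`."""
    gaps = []
    gap_start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if gap_start is None:
                gap_start = pos
        elif gap_start is not None:
            gaps.append((gap_start, pos))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, len(depth)))
    return gaps


if __name__ == "__main__":
    # Two overlapping reads on a 10-base contig; the last two bases are uncovered.
    depth = per_base_depth(10, [(0, 5), (3, 8)])
    print(depth)
    print(uncovered_gaps(depth))
```

Under these assumptions, an empty list from uncovered_gaps corresponds to a continuous profile, and the depth list holds the values that a coverage plot would show on its y-axis.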
Reviewer #2: Overall, I appreciate the new data provided by the authors, but still find
the presentation of the tool misleading. The main issue I have is the lack of clarity
about the tool scope and possibilities. Currently, the manuscript alternatively
describes drVM either as a general tool with broad application in viral genome
assembly or as a software designed for human/animal viral pathogens detections.
Since drVM is based on a classification-then-assembly pipeline, and the authors use a
human virus database, I believe that their intended purpose is to provide a tool
specifically designed to easily assemble new types/variants of known viral pathogens
(which is consistent with their introduction and conclusion mentioning this
application, l. 56: "identification of potential pathogens", and l. 322: "detect
pathogens in clinical samples"). However, in that case, this has to be clarified
throughout the whole manuscript. For example, the current title "drVM: Detection and
Reconstruction of Viral Genomes from Metagenomes" wrongly suggest that drVM
can be used with any type of viral genome (known or unknown, which is not tested in
the manuscript). Similarly, the tool description in the introduction and conclusion are
very evasive, e.g. l. 87: "detection and reconstruction of various viral genomes", l.
400: "demonstrate that drVM is indeed able to detect and reconstruct various viral
genomes from metagenomic data". Some claims are even (in my opinion) clearly
misleading, for example the authors assert that "We can therefore expect to extend
our knowledge of viral diversity by leveraging the unique genome assembly
capabilities of drVM" (l. 140), yet all the assemblies provided by drVM are very
closely related to the known references (90% ANI or higher), and thus can hardly be
qualified as "extending our knowledge of viral diversity".
Thus, to me, drVM is a new tool for the "Efficient assembly of known eukaryote
viruses genomes from metagenomes", but presenting it simply as "Detection and
Reconstruction of Viral Genomes from Metagenomes" is misleading for
non-specialist readers, and would not allow such readers to quickly determine is this
tool is well suited for their project or not (that is, without reading the whole
manuscript and/or testing the tool themselves).
[Response] We thank the reviewer for this valuable suggestion. We have changed the title of
the manuscript to “drVM: a new tool for efficient genome assembly of known
eukaryotic viruses from metagenomes” and revised the text accordingly.
Finally, I found the writing still confusing in multiple places (which could likely be
improved if the manuscript was proof-read by a native english speaker). A few
examples are:
[Response] We have asked a native English speaker (also a Bioengineering Ph.D.) to
revise the manuscript.
Throughout the manuscript, the use of "runs" to describe one set of metagenomic
sequences instead of the more traditionally used "libraries".
[Response] Multiple runs can be produced from a single library. The SRA Handbook
(https://www.ncbi.nlm.nih.gov/books/NBK47529/) states:
“Runs describe the files that belong to the previously created Experiments. They
specify the data files for a specific sample to be processed by SRA. Experiments may
contain many Runs depending on how many sequencer runs were involved in data
acquisition.” We therefore decided to use “sequencing runs” instead of “sequencing libraries”.
l. 29: "The feasibility of drVM was validated via the analysis of over 300 sequencing
runs" → drVM was validated via the analysis of over 300 sequencing runs generated
by Illumina and Ion Torrent platforms to …[line 29]
l. 65: "to assemble reads under genus-classification to assembly-for improved viral
genome assembly". → VIP pursues the same strategy as SURPI-subtraction to
identification-to subtract host and bacteria reads prior to the identification of viral
reads, although it does provide an alternative strategy to the assembly of reads under a
genus-classification to assembly-hence enabling improved viral genome assembly.
[line 62-66]
l. 187: "Please note that viral contigs" → Note that viral sequences were fully
assembled, de novo, by SPAdes using reads within the same genus [line 189]
l. 192: "across genomics positions, the y-axis, thus representing read depth" → The
coverage plots were generated by plotting read coverage across genomic position,
with the y-axis representing read depth. [line 192-194]
l. 344: "shared only 90% at nucleotide level identity" → This assembled sequence
shared only 90% identity at nucleotide level with a closely related porcine kobuvirus
genome sequence [line 348]
l. 358: "If multiple contigs present in one coverage plot" → If multiple contigs are
present in one coverage plot, reads aligned to the contigs are extracted in pairs for
subsequent re-assembly [line 362-363]
l. 371: "that is to say, to extract viral reads from metagenomes is prior to assembly."
→ suggesting that to extract viral reads from metagenomes is a prior step to sequence
assembly [line 375-376]
On the tool output: Since the tool is designed to be user-friendly, I don't understand
why the authors did not add y-axis labels directly on the coverage plots: these plots
are the main output from drVM, and I would think the authors want to make it as easy
as possible for a user to understand what is represented.
[Response] We have changed the scripts of drVM to add “read depth” as the y-axis
label in a coverage plot. [Fig. 2 and Fig. 3]
Reviewer #3: The revisions add clarity and the resulting manuscript is greatly
improved. Nice work.
[Response] We appreciate the reviewer’s comments.