Complete genome sequence of Genus/species

Author list
Institution list
Corresponding author: Robert Edwards

Keywords
5-6 key words that describe the organism and/or your findings

Abstract
You should write a 150-200 word abstract that describes what you have found and why it is interesting.

Abbreviations
EMBL: European Molecular Biology Laboratory
NCBI: National Center for Biotechnology Information (Bethesda, MD, USA)
RDP: Ribosomal Database Project (East Lansing, MI, USA)

Introduction
In the introduction, you should provide additional information concerning the background, purpose and overall approach of what was done. You should describe both the sequencing (in general terms) and the annotation and analysis.

Organism information
In this section, we will describe the organism. The first thing that you will do is identify the 16S genes from the organism, and build a phylogenetic tree of the 5-10 most closely related organisms. You should use that information to generate a Genus and Species name for the strain that you have sequenced, and then use that name throughout the paper.

This is an example figure and figure legend; you should make something that looks like this:

Figure 1. Phylogenetic tree highlighting the position of Sphingomonas wittichii strain RW1 relative to other type and non-type strains within the Sphingomonadaceae. Strains shown are those within the Sphingomonadaceae having corresponding NCBI genome project ids listed within [REF]. The strains and their corresponding GenBank accession numbers (and, when applicable, draft sequence coordinates) for 16S rRNA genes are (type=T): N. aromaticivorans strain SMCC F199T, U20756; Erythrobacter sp. strain NAP1, AAMW01000002.1:1127089-1128582; E. litoralis strain HTCC2594, CP000157; Sphingomonas sp. strain SKA58, AAQG01000001.1:1-836; S. wittichii strain RW1T, AB021492; and Z. mobilis strain ATCC 31821, AF281031.
The tree uses sequences aligned by the RDP aligner and the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions, without the use of alignment inserts and with a minimum comparable position of 200. The tree is built with RDP Tree Builder, which uses Weighbor with an alphabet size of 4 and length size of 1000. The building of the tree also involves a bootstrapping process repeated 100 times to generate a majority consensus tree [REF]. Z. mobilis (AF281031) was used as an outgroup.

Table 1. Classification and general features of your strain.
You should fill this table in to the best of your ability, and you should add evidence codes (either IDA, TAS, or NAS; see the website) as applicable. You may need to consult with Dr. Edwards or Dr. Dinsdale to complete this table!

MIGS ID    Property                     Term       Evidence code (a)
           Current classification       Domain
                                        Phylum
                                        Class
                                        Order
                                        Family
                                        Genus
                                        Species
           Gram stain
           Cell shape
           Motility
           Sporulation
           Temperature range
           Optimum temperature
           Carbon source
           Energy source
           Terminal electron receptor
MIGS-6     Habitat
MIGS-6.3   Salinity
MIGS-22    Oxygen
MIGS-15    Biotic relationship
MIGS-14    Pathogenicity
MIGS-4     Geographic location
MIGS-5     Sample collection time
MIGS-4.1   Latitude
MIGS-4.2   Longitude
MIGS-4.3   Depth
MIGS-4.4   Altitude

Genome sequencing information

Genome project history
You should write a brief summary of the genome sequencing approach, including how the bacteria were found, and how the sequencing was done. You may or may not be able to fill in all of these sentences.

The genome was selected because… The genome sequence was completed in … and presented for public access on …. Finishing was done using …. Annotation was performed using … and…. A summary of the project information is shown in Table 2.

Table 2. Project information

MIGS ID     Property                  Term
MIGS-31     Finishing quality         (e.g., improved-high-quality draft)
MIGS-28     Libraries used            (…)
MIGS-29     Sequencing platforms      (e.g., 454, Sanger)
MIGS-31.2   Fold coverage             (e.g., 14.5 x)
MIGS-30     Assemblers                (e.g., Arachne)
MIGS-32     Gene calling method       (e.g., Glimmer)
            Genome Database release   (…)
            GenBank ID
            GenBank Date of Release
            GOLD ID
            Project relevance         (e.g., biotechnological, pathway)

Growth conditions and DNA isolation
You should discuss how the organism was grown and how the DNA was isolated.

Genome sequencing and assembly
You should discuss general aspects of library construction and sequencing. Types of technologies and methods that could be mentioned include 454 pyrosequencing reads, Newbler assembler (Roche 454), size of overlapping fragments, and q-scores. This is an example of what you might write:

This microbial genome was curated to close as many gaps as possible. Each base pair has a minimum q (quality) score of xxx. The genome of <your strain name> was sequenced as a part of the San Diego State University sequencing center. The error rate of the completed genome sequence is less than xxx in 50000. On average, there is a xxx-fold coverage of the genome, although there are some areas with higher coverage.

Genome annotation
You should describe the genome annotation approach (software and/or databases used, with citations). For example:

Protein encoding genes were identified using the Rapid Annotation using Subsystem Technology (RAST) ORF caller [REF] as part of the RAST genome annotation pipeline at Argonne National Laboratory, Argonne, IL, USA and were compared to ORFs identified using GLIMMER version 3.1 [REF]. The predicted protein encoding genes were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, InterPro, and the RAST non-redundant databases.
The tRNAscan-SE tool [REF] was used to find tRNA genes, and ribosomal RNAs were found by searching [using what software??] against the Greengenes ribosomal database [REF]. The RNA components of the protein secretion complex and the RNaseP were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org/). Additional gene prediction analysis and manual functional annotation were performed within the SEED platform developed at Argonne National Laboratory [REF].

Genome properties
Here you should describe what you find in the genome. For example:

The genome has a single, circular chromosome of XXX bp with an average GC content of XX.X%. The genome also includes two plasmids, for a total size of XXX bp. The chromosome contains XXX predicted genes, XXX of which are protein encoding genes. XXX of the protein coding genes were assigned to a putative function, with the remaining annotated as hypothetical proteins, and YYY of the protein encoding genes have high quality annotations as they are in subsystems. XXX protein coding genes belong to XXX paralogous families in this genome, corresponding to a gene content redundancy of XX.X%. The properties and the statistics of the genome are summarized in Tables 3-5.

Table 3. Summary of genome: one chromosome and two plasmids
An INSDC identifier is typically a GenBank, EMBL or DDBJ accession number. The last column can be used to refer to another identifier scheme (e.g. RAST) depending on what identifier aided the process of data retrieval.

Label          Size (Mb)   Topology   INSDC identifier   RAST ID
Chromosome 1
Plasmid 1
Plasmid 2

Table 4. Nucleotide content and gene count levels of the genome
You should add other rows to this table, depending on the analysis that you have done.
                                    Genome (total)
Attribute                           Value    % of total (a)
Size (bp)
G+C content (bp)
Coding region (bp)
Total genes (b)
RNA genes
Protein-coding genes
Genes in paralog clusters
Genes assigned to COGs
1 or more conserved domains
2 or more conserved domains
3 or more conserved domains
4 or more conserved domains
Genes with signal peptides
Genes with transmembrane helices
Paralogous groups

a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
b) Also includes 54 pseudogenes and 5 other genes.

Table 5. Number of genes associated with the subsystem hierarchies.

Description                                           Number   % of total (a)
Amino Acids and Derivatives
Arabinose Sensor and transport module
Carbohydrates
Cell Division and Cell Cycle
Cell Wall and Capsule
Clustering-based subsystems
Cofactors, Vitamins, Prosthetic Groups, Pigments
DNA Metabolism
Dormancy and Sporulation
Fatty Acids, Lipids, and Isoprenoids
Iron acquisition and metabolism
Membrane Transport
Metabolism of Aromatic Compounds
Miscellaneous
Motility and Chemotaxis
Nitrogen Metabolism
Nucleosides and Nucleotides
Phages, Prophages, Transposable elements, Plasmids
Phosphorus Metabolism
Photosynthesis
Plasmids
Potassium metabolism
Protein Metabolism
RNA Metabolism
Regulation and Cell signaling
Respiration
Secondary Metabolism
Stress Response
Sulfur Metabolism
Virulence, Disease and Defense
General function prediction only
Function unknown
Not in Subsystems

a) The total is based on the total number of protein coding genes in the annotated genome.

Additional Information
You should add other information relevant to this genome. Two examples include:

Profiles of metabolic network and pathways. For instance, how many genes can be associated with a metabolic network? How many enzymes, enzymatic reactions, metabolic pathways and metabolites may be found in the genome? A diagram of interacting cellular components (e.g., amino acids, carbohydrates, proteins, purines, cofactors, tRNAs, etc)?
You can extract most or all of this information from RAST if you check the box to have a metabolic model prediction run.

Comparisons with other fully sequenced genomes. For instance, how do the genome properties compare to other members within the taxonomic family and the overall set of fully sequenced Bacterial and Archaeal genomes? How do the genome properties compare to other organisms from a similar environment?

Conclusion
Write some conclusions about your study.

Associated MIGS Record
Fill in the following table, which will become an electronic record of MIGS compliance associated with this paper. Values left blank will be filled in with the text “not reported”, so fill in as much as you can!

Table S1. Associated MIGS record

MIGS-ID   Field name                                       Description
MIGS-1    Submit to INSDC/Trace archives
1.1       PID
1.2       Trace Archive
MIGS-2    MIGS CHECK LIST TYPE
MIGS-3    Project Name
MIGS-4    Geographic Location
4.1       Latitude
4.2       Longitude
4.3       Depth
4.4       Altitude
MIGS-5    Time of Sample collection
MIGS-6    Habitat (EnvO)
6.1       temperature
6.2       pH
6.3       salinity
6.4       chlorophyll
6.5       conductivity
6.6       light intensity
6.7       dissolved organic carbon (DOC)
6.8       current
6.9       atmospheric data
6.10      density
6.11      alkalinity
6.12      dissolved oxygen
6.13      particulate organic carbon (POC)
6.14      phosphate
6.15      nitrate
6.16      sulfates
6.17      sulfides
6.18      primary production
MIGS-7    Subspecific genetic lineage
MIGS-9    Number of replicons
MIGS-10   Extrachromosomal elements
MIGS-11   Estimated Size
MIGS-12   Reference for biomaterial or Genome report
MIGS-13   Source material identifiers
MIGS-14   Known Pathogenicity
MIGS-15   Biotic Relationship
MIGS-16   Specific Host
MIGS-17   Host specificity or range (taxid)
MIGS-18   Health status of Host
MIGS-19   Trophic Level
MIGS-22   Relationship to Oxygen
MIGS-23   Isolation and Growth conditions
MIGS-27   Nucleic acid preparation
MIGS-28   Library construction                             Emulsion PCR
28.1      Library size
28.2      Number of reads
28.3      vector                                           N/A
MIGS-29   Sequencing method                                Roche Pyrosequencing
MIGS-30   Assembly
30.1      Assembly method                                  Newbler
30.2      estimated error rate
30.3      method of calculation
MIGS-31   Finishing strategy
31.1      Status
31.2      coverage
31.3      contigs
MIGS-32   Relevant SOPs
MIGS-33   Relevant e-resources

References
Please add references here using this format. I strongly recommend that you use Zotero to organize and insert your references.