EXCELERATE Deliverable D6.1
Project Title: ELIXIR-EXCELERATE: Fast-track ELIXIR implementation and drive early user exploitation across the life sciences
Project Acronym: ELIXIR-EXCELERATE
Grant agreement no.: 676559 H2020-INFRADEV-2014-2015/H2020-INFRADEV-1-2015-1
Deliverable title: Specific marine databases
WP No.: 6
Lead Beneficiary: UiT
WP Title: Marine metagenomic infrastructure as a driver for research and industrial innovation
Contractual delivery date: 17 02 2017
Actual delivery date: 17 02 2017
WP leader: Rob Finn, EMBL-EBI (1: EMBL-EBI); Nils Peder Willassen, UiT (24: UiT)
Partner(s) contributing to this deliverable: UiT, EMBL-EBI
Authors and Contributors: Inge Alexander Raknes (UiT), Terje Klementsen (UiT), Sudhagar Veerabadran Balasundaram (UiT), Giacomo Tartari (UiT), Juan Fu (UiT), Espen Robertsen (UiT), Lars Ailo Bongo (UiT), Nils Peder Willassen (UiT), Guy Cochrane (EMBL-EBI), Rob Finn (EMBL-EBI)
EXCELERATE: Deliverable D6.1
Table of contents
1. Executive Summary
2. Project objectives
3. Delivery and schedule
4. Adjustments made
5. Background information
6. Appendix 1: “Marine specific databases”
1. Executive Summary
The marine databases MarRef, MarDb and MarCat are publicly available resources that promote marine research and innovation.
The marine resources, which have been implemented in the Marine Metagenomics Portal (MMP), are a collection of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. MarRef is a reference database of completely sequenced marine prokaryotic genomes, whereas MarDb includes all sequenced marine prokaryotic genomes regardless of level of completeness. MarCat represents a gene (protein) catalogue of uncultivable (and cultivable) marine genes and proteins derived from metagenomics samples.
The first versions of MarRef and MarDb contain 484 and 2557 entries, respectively. Each record is built up of 104 metadata fields, including attributes for sampling, sequencing, assembly and annotation in addition to organism and taxonomic information. For MarRef and MarDb, data from various sources, such as sequence, contextual, taxonomy and literature databases, in addition to data from bacterial diversity metadata and culture collection databases, have been curated and integrated to produce robust databases. The corresponding genome, gene and protein sequence databases have been built by downloading the individual entries from ENA (European Nucleotide Archive).
MarCat currently contains the Tara Oceans samples, comprising 1433 entries. In MarCat each record contains 103 metadata fields. As for MarRef and MarDb, each entry has been manually curated and enriched with taxonomic annotation, assembly and functional annotation data. The corresponding DNA, gene and protein databases were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomics samples.
To generate the contextual databases, controlled vocabularies and ontologies are used, which allow more streamlined curation, better consistency of the data, enhanced quality control (QC) and, not least, data that can be more easily aggregated and analysed. The manual curation of the data produces more robust, richly annotated datasets with highly accurate and detailed information.
The contextual and sequence databases have been incorporated into the Marine Metagenomics Portal (MMP) and are available at https://mmp.sfb.uit.no/
Horizon 2020 grant n. 676559
2. Project objectives
With this deliverable, the project has reached or the deliverable has contributed to the following objectives:
No.  Objective  Yes  No
1  Development and implementation of selected standards for the marine domain. (Task 6.1)  x
2  Development and implementation of databases specific for the marine metagenomics. (Task 6.2)  x
3  Evaluation and implementation of tools and pipelines for metagenomics analysis. (Task 6.3)  x
4  Development of a search engine for interrogation of marine metagenomics datasets and establish training workshops for end users. (Task 6.4)  x
3. Delivery and schedule
The delivery is delayed: ☐ Yes ☑ No
4. Adjustments made
No adjustments were made.
5. Background information
Background information on this WP as originally indicated in the description of action (DoA) is included here for reference.
Work package number: 6
Start date or starting event: month 1
Work package title: Use Case A: Marine metagenomic infrastructure as driver for research and industrial innovation
Lead: Nils Peder Willassen (NO) and Rob Finn (EMBL-EBI)
Participant number and person months per participant:
P1: EMBL-EBI (28 PM) - P17: FCG (2 PM) - P20: CCMAR (11 PM) - P24: UiT (36 PM) - P27: CNRS (10 PM) - P31: CNR (10 PM)
Objectives
The main objective for this Use Case is to develop a sustainable metagenomics infrastructure to enhance research and industrial innovation within the marine domain before M36 of the ELIXIR-EXCELERATE project. The main objective will be achieved by the following specific objectives:
• Development and implementation of selected standards for the marine domain. (Task 6.1)
• Development and implementation of databases specific for the marine metagenomics. (Task 6.2)
• Evaluation and implementation of tools and pipelines for metagenomics analysis. (Task 6.3)
• Development of a search engine for interrogation of marine metagenomics datasets and establish training workshops for end users. (Task 6.4)
Description of work
Metagenomics has the potential to provide unprecedented insight into the structure and function of heterogeneous communities of microorganisms and their vast biodiversity. Microbial communities affect human and animal health and are critical components of all terrestrial and aquatic ecosystems. They can be exploited e.g. to identify novel biocatalysts for production of fuels or chemicals (bioprospecting), make functional feed for aquaculture species, and for environmental monitoring. However, in order to expand the potential further for the research community and biotech industry, especially within the marine domain, the metagenomics methodologies need to overcome a number of challenges
related to standardization, development of relevant databases and bioinformatics tools. New and emerging sequencing technologies and the integration of metadata place an extra burden on the development of future databases and tools. The Use Case “Marine metagenomic infrastructure as driver for research and industrial innovation” will contribute to the overall objectives of the ELIXIR-EXCELERATE project by developing research infrastructure and service provision specific for the marine domain in order to enable metagenomic approaches responding to societal and industrial needs. The outcome of the proposed Use Case will meet the major needs expressed by the marine domain (e.g. ESF Marine Board Position Paper 17 “Marine Microbial Diversity and its role in Ecosystem Functioning and Environmental Change” and Position Paper 15 “Marine Biotechnology: A New Vision and Strategy for Europe”).
Task 6.1: Development and implementation of a comprehensive metagenomics data standards environment for the marine domain (12 PM)
To maximise the impact and long-term utility and discoverability of metagenomics datasets, it is essential that the experimental methods and data acquisition/storage protocols be established. In Task 6.1, we will bring together a comprehensive metagenomics data standards environment in collaboration with marine experimental scientists, data providers, end users and the existing communities involved in marine standards development. The environment will bring together three components:
• Data format conventions and standards will address the various data types for which sharing is required, which will include contextual data (e.g. sample information, expedition-related data), metadata (e.g. provenance and tracking information, descriptions of experimental configurations and bioinformatics tools in use) and data (e.g. raw sequence data, aligned reads, taxonomic identifications, gene calls).
• Reporting standards will address community-accepted thresholds for richness/precision that are required to make data useful, including depth of raw machine data, such as resolution of sequence quality scoring, conventions for references to reference assemblies and minimal reporting requirements for contextual data.
• Validation tools will address the automated validation of compliance with conventions and standards and the meeting of minimal reporting expectations for given datasets in preparation by the marine research community.
In this task, we will bring together components that exist already, in particular the contextual data and metadata reporting standards we have developed under the MicroB3 project (EU FP7), data standards and conventions developed around our European Nucleotide Archive (ENA) programme, such as CRAM and FASTQ conventions, and work existing in the biodiversity and molecular ecology domains (such as tabular data conventions and BIOM matrices), and construct new components as required. The major output of this work will be a set of well described and navigable elements to aid the marine community in the preparation, sharing, dissemination and publication of highly interoperable and comprehensive metagenomics datasets.
Partners: EMBL-EBI, NO
Task 6.2: Establishment of marine specific data resources (20 PM)
Due to the data biases of existing reference databases, only about one quarter of sequences are annotated, and this fraction diminishes further when more diverse samples such as soil and marine are analyzed. To improve the characterization of marine metagenomic samples, this task involves the construction of sustainable public data resources for the marine microbial domain. Task 6.2 will be achieved by establishing marine microbial databases including reference genomes, nucleotide and protein databases. The established databases, based on the standards developed in Task 6.1, will enhance the precision and accuracy of biodiversity and function analysis. The reference databases will be non-redundant datasets generated from sequences acquired from ENA (as part of the International Nucleotide Sequence Database Collaboration), UniProt and other publicly available datasets. In particular, we will use some of the higher-coverage and higher-quality sequence outputs from the Tara Oceans and Ocean Sampling Day metagenomic projects to build high-quality marine specific reference databases. All datasets will be checked with respect to quality, consistency and interoperability, and for compliance with the standards developed in Task 6.1. The respective knowledge-enhanced databases will be the cornerstone for sustainable analysis of marine metagenomics sequence data. The databases will be developed in collaboration with members of the ESFRI infrastructures EMBRC and MIRRI and made publicly available through ELIXIR.
Partners: NO, EMBL-EBI, IT
Task 6.3: Gold-standards for metagenomics analysis (58 PM)
The majority of existing metagenomics analysis platforms, while providing insights into the prokaryotic taxonomic diversity and functional potential of individual samples, lack the tools that enable discoverability across samples and industrial innovation. This task will focus on the evaluation and implementation of new tools and pipelines in order to accelerate research, discoverability and innovation, reducing time to market for new products. In combination with the new standards and databases developed in Task 6.1 and Task 6.2, respectively, new tools for community structure (microbial biodiversity), genetic and functional potential will be evaluated and implemented for environmental applications. For industrial applications, tools and pipelines for the identification of gene products (e.g. enzymes and drug targets) and pathways will be implemented and made publicly available. The evaluation and implementation will be performed in close collaboration with end users (research groups, environmental centers, biotech companies) to ensure usability for the end-user community in order to improve quality, productivity and functionality, as well as to reduce costs for the end users. New tools and pipelines will be made publicly available through e.g. META-pipe (ELIXIR-NO), the EBI Metagenomics Portal (EMBL-EBI) and/or EMBL Embassy cloud technology. Technical requirements will be mapped by WP3 and implemented to meet the requirements of the ELIXIR community. The continued advancement of sequencing technologies and the growing number of public marine metagenomics projects mean that it is becoming increasingly difficult to mine these vast datasets. In this task, initially a web-based search engine will be developed for the interrogation of marine metagenomics results available from the EBI Metagenomics Portal, based on combinations of queries to our web services (already in existence, or to be built as part of existing projects outside ELIXIR-EXCELERATE) for the
discovery of data through metadata, taxonomic and functional fields. This will extend the back-end search functionality that is to be developed as part of on-going efforts. In addition to being downloadable, search results will be able to flow into an expanded comparison tool (currently limited to gene ontology terms from samples in the same project), to allow more in-depth analysis of user-selected datasets, including functional and taxonomic comparisons. In the second phase of this task, the search engine will build upon the data exchange formats in Task 6.1 and federate the search across different pipeline result sets (e.g. META-pipe), so that different results based on the same underlying dataset can be amalgamated into a single search. This will dramatically enhance the discoverability across different marine datasets, allowing the identification of common trends and/or differences. These tools will be developed using user-experience testing and in collaboration with end users to ensure they are fit for purpose.
Partners: NO, EMBL-EBI, IT, FR, PT
Task 6.4: Training workshops for end users (7 PM)
In this task training workshops will be established, in collaboration with WP11 “ELIXIR Training Programme”, for end users with the aim to facilitate accessibility, by training European researchers and industry to more effectively exploit the data, tools and pipelines, and compute infrastructure provided by the ELIXIR marine metagenomics infrastructure. These training workshops and materials will be converted to online training resources, extending the reach of the workshop.
Partners: PT, NO
Appendix 1:
Development and implementation of databases specific for the marine metagenomics.
EXCELERATE Deliverable 6.1: Annex I.
Development and implementation of databases specific for marine metagenomics.
Summary
The marine databases MarRef, MarDb and MarCat are publicly available resources that promote marine research and innovation.
The marine resources, which have been implemented in the Marine Metagenomics Portal (MMP), are a collection of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. MarRef is a reference database of completely sequenced marine prokaryotic genomes, whereas MarDb includes all sequenced marine prokaryotic genomes regardless of level of completeness. MarCat represents a gene (protein) catalogue of uncultivable (and cultivable) marine genes and proteins derived from metagenomics samples.
The first versions of MarRef and MarDb contain 484 and 2557 entries, respectively. Each record is built up of 104 metadata fields, including attributes for sampling, sequencing, assembly and annotation in addition to organism and taxonomic information. For MarRef and MarDb, data from various sources, such as sequence, contextual, taxonomy and literature databases, in addition to data from bacterial diversity metadata and culture collection databases, have been curated and integrated to produce robust databases. The corresponding genome, gene and protein sequence databases have been built by downloading the individual entries from ENA (European Nucleotide Archive).
MarCat currently contains the Tara Oceans samples, comprising 1433 entries. In MarCat each record contains 103 metadata fields. As for MarRef and MarDb, each entry has been manually curated and enriched with taxonomic annotation, assembly and functional annotation data. The corresponding DNA, gene and protein databases were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomics samples.
To generate the contextual databases, controlled vocabularies and ontologies are used, which allow more streamlined curation, better consistency of the data, enhanced quality control (QC) and, not least, data that can be more easily aggregated and analysed. The manual curation of the data produces more robust, richly annotated datasets with highly accurate and detailed information.
The contextual and sequence Mar databases are available at https://mmp.sfb.uit.no/
Background
Microorganisms are ubiquitous in the marine environment, where they play key roles in many biogeochemical processes. With recent advances in community DNA shotgun sequencing (metagenomics) and computational analysis, it is now possible to access the taxonomic and genomic content (microbiome) of marine communities and, thus, to study their diversity, structural patterns and functional potential. These microorganisms, and the communities they form, drive and respond to changes in the environment and alterations in ocean stratification and currents. With an estimated 10^4 to 10^6 cells per millilitre of seawater and a total of 10^29 bacterial cells, they also provide the grounds for immense genetic diversity.
All research and innovation is based on comparison to existing knowledge and information. Therefore, sustainable and highly accurate data resources, which are easy to access, browse and retrieve data from, are vital for performing high-class research and innovation beyond the state of the art.
Up to now, no dedicated data resources have existed for the marine metagenomics domain. This not only hampers the utilization of the vast genetic resources for biotechnology research and innovation (bioprospecting), but also impedes the development of sustainable tools and resources, for example for environmental monitoring, monitoring of fish and shellfish pathogens and development of sustainable feed for marine aquaculture.
Due to the data biases of existing generic reference databases, only about one quarter of the sequences in a metagenomics sample are annotated, and this fraction diminishes further when more diverse samples such as marine water and sediment samples are analysed.
Task 6.2 was established in order to construct non-redundant contextual and sequence databases, including genomes/metagenomes, nucleotide and protein databases, to improve annotation of marine metagenomic samples.
Overview and status
Definition of marine microorganism
Defining “marine microorganism” or “microbial marine biome” is not straightforward, since there are many borderline habitats such as sandy shores and areas near river deltas. According to the definition set by the Environmental Ontology [1], a “marine biome” is defined as “An aquatic biome that comprises systems of open-ocean and unprotected coastal habitats, characterized by exposure to wave action, tidal fluctuation, and ocean currents as well as systems that largely resemble these. Water in the marine biome is generally within the salinity range of seawater: 30 to 38 ppt”. This definition does not fit our mission to assemble sequence data and metadata of microorganisms from the marine environment, since it does not include protected coastal habitats such as harbours and estuarine environments.
In order to establish the marine resources we have chosen to define “marine microbial biome” as “An aquatic microbial biome comprised of microbial communities from open oceans, coastal and protected habitats up to the high water mark, with salinity from 0.5 ppt, as in estuarine (brackish water) environments, to above 100 ppt, as in sea ice brine. The biome also includes marine microbial communities obtained from marine species associated with these habitats”. We have chosen to include soil samples from sandy shores, intertidal zones, salt marshes (coastal salt marsh or tidal marsh), mudflats and estuaries, in addition to habitats such as seawater salterns, sea ice brine and black smokers (hydrothermal vents), where the salinity can be extremely high or low compared to seawater. Microorganisms associated with marine species, as defined by the World Register of Marine Species, WoRMS [2], have also been defined as marine. This includes microorganisms associated with or causing diseases in marine animals or plants, for example coral, shellfish and fish.
[1] http://purl.obolibrary.org/obo/ENVO_00000447
Short description of MarRef, MarDb and MarCat
The construction of the marine sequence databases (BLAST) and their corresponding contextual databases is shown in Fig. 1.
Figure 1. Database construction procedures in MarRef, MarDb and MarCat.
The MarRef, MarDb and MarCat sequence databases are based on non-redundant genome and metagenome data sets obtained from ENA (European Nucleotide Archive) [3] and/or NCBI (The National Center for Biotechnology Information) [4].
MarRef is a database for completely sequenced marine prokaryotic genomes. MarDb includes all sequenced marine prokaryotic genomes regardless of the level of completeness. Each genome assigned to the marine microbial biome according to our definition was incorporated in the MarRef or MarDb contextual database, respectively.
MarCat represents a gene (protein) catalogue of uncultivable (and cultivable) marine genes derived from marine metagenomics samples. Metagenomics sequences were obtained from ENA, and their corresponding gene and protein feature annotations, unique to each sample, were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomics samples.
The corresponding contextual databases support the international community-driven standards of the Genomic Standards Consortium [5] and are fully compliant with its recommendations for the Minimum Information about any (x) Sequence (MIxS) standards, including MIGS (Minimum Information about a Genome Sequence) and MIMS (Minimum Information about a Metagenome Sequence). The databases also include the proposed standards for provenance of analysis developed in Task 6.1.
[2] http://www.marinespecies.org/
[3] http://www.ebi.ac.uk/ena
[4] https://www.ncbi.nlm.nih.gov/
[5] http://gensc.org/
Contextual databases
Data collection
The MarRef, MarDb and MarCat contextual databases are built by compiling data from a number of publicly available sequence, taxonomy and literature databases in a semi-automatic fashion. Other databases or resources, such as bacterial diversity and culture collection databases, a web mapping service and ontology databases, were used extensively for curation of metadata. Resources used in generation of the marine curation databases are shown in Fig. 2 (see also Appendix Table 1).
Figure 2. Public data resources utilized for construction of MarRef, MarDb and MarCat.
Curation
The imported data files were compiled, converted to Tab Separated Value (.tsv) format and imported into Base, a full-featured desktop database front end provided by LibreOffice [6]. MarRef and MarDb contain in total 484 and 2557 entries, respectively, with 106 metadata fields, of which 30 are represented by controlled vocabularies (CV) and the remainder are free text or numeric fields (Appendix Table 2). These 106 metadata fields include information about sampling environment or host, organism and taxonomy, phenotype, pathogenicity, assembly and annotation (see Fig. 3).
Figure 3. Base interface for manual curation of entries.
The use of CV and ontologies can be briefly illustrated by the following example. The three environmental metadata fields used for describing the sampling site of the microorganisms (environmental biome, feature and material) are controlled by 104 CV terms. The environmental biome metadata field contains 11 controlled Environment Ontology (ENVO) terms covering environments such as Estuarine biome (ENVO:01000020), Marginal sea biome (ENVO:01000046), Marine benthic biome (ENVO:01000024), Marine mud (ENVO:00005795), Marine pelagic biome (ENVO:01000023), Marine water body (ENVO:00001999) and Ocean biome (ENVO:01000048). The ontology terms used in the environmental biome, feature and material fields are all well defined and described (http://www.environmentontology.org/) and allow consistency across the datasets.
[6] https://no.libreoffice.org/
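The effect of a controlled vocabulary on curation can be illustrated with a minimal Python sketch. It assumes a hypothetical field name, `env_biome`, and an allow-list containing only the seven ENVO biome terms quoted above; the curation database itself holds 11 biome terms, and its real field names may differ.

```python
# Subset of the ENVO biome terms quoted in the text (label -> ENVO identifier).
ENV_BIOME_CV = {
    "Estuarine biome": "ENVO:01000020",
    "Marginal sea biome": "ENVO:01000046",
    "Marine benthic biome": "ENVO:01000024",
    "Marine mud": "ENVO:00005795",
    "Marine pelagic biome": "ENVO:01000023",
    "Marine water body": "ENVO:00001999",
    "Ocean biome": "ENVO:01000048",
}


def check_env_biome(record, field="env_biome"):
    """Return a list of problems found in one record (empty list if valid).

    `field` is a hypothetical column name used only for illustration.
    """
    value = record.get(field, "").strip()
    if not value:
        return [f"{field}: missing value"]
    if value not in ENV_BIOME_CV:
        return [f"{field}: '{value}' is not a controlled term"]
    return []
```

A record carrying the free-text value "open sea" would be flagged, whereas "Ocean biome" passes; this is the kind of consistency that free-text fields cannot guarantee.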
This first version of MarCat contains 1433 entries from the Tara Oceans expedition, and each record contains 103 metadata fields. We have included metadata fields for provenance of analysis, covering metagenomics analysis metadata such as filtering, assembly, taxonomy, gene prediction and functional assignment.
The databases have links to other public databases. For example, in MarRef sixteen of the metadata fields have active links to other databases, such as the literature databases PubMed and Europe PMC, ontologies such as ENVO and GAZ, sequence databases such as UniProt and ENA, taxonomy databases such as NCBI Taxonomy and Silva, and the DSMZ culture collection. These links allow site visitors to easily access other sites in order to obtain more information about each microorganism. For MarRef all metadata fields have been manually curated to ensure consistency across the datasets, which allows the end user to easily search and browse entries.
More information can be found in databases for microorganisms that have been completely sequenced than for partially sequenced or draft-sequenced microorganisms. While MarRef is thoroughly curated, MarDb and MarCat are only partly curated; the focus has been on curating the taxonomic fields and the environmental biome, feature and material fields.
Entries in the marine databases MarRef, MarDb and MarCat follow the MIxS standard guidelines developed by the Genomic Standards Consortium, in addition to other ontologies such as the Environment Ontology (ENVO). Links to other external resources, such as culture collections, literature and other secondary databases, are provided where available. A list of resources is shown in Appendix Table 1.
Refinement
OpenRefine [7] was used for refining the metadata fields by cleaning, trimming leading and trailing whitespace, transforming data from one format into another and extending the data with web services and external data.
Figure 4. Refinement of metadata fields.
Validation
A validator was developed to convert Tab Separated Value (TSV) files to Extensible Markup Language (XML) files, linking the source TSV curation databases to the XML database. The validator defines a set of rules for the conversion; warnings and errors raised during conversion are reported.
[7] http://openrefine.org/
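The refinement and validation steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the validator used for the MMP: the field names (`accession`, `env_biome`), the XML element names (`records`, `record`) and the two rules are hypothetical, and the whitespace trimming merely stands in for the corresponding OpenRefine operation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rules, for illustration only.
REQUIRED = ("accession",)      # empty value -> error, record is skipped
RECOMMENDED = ("env_biome",)   # empty value -> warning, record is kept


def tsv_to_xml(tsv_text):
    """Convert TSV text to an XML tree, collecting warnings and errors."""
    root = ET.Element("records")
    warnings, errors = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        line_no = reader.line_num  # line of the source text just read
        # Refinement: trim leading and trailing whitespace in every field.
        # Columns beyond the header (key None) are ignored.
        row = {k: (v or "").strip() for k, v in row.items() if k is not None}
        missing = [f for f in REQUIRED if not row.get(f)]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        for f in RECOMMENDED:
            if not row.get(f):
                warnings.append(f"line {line_no}: empty {f}")
        record = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(record, key).text = value
    return root, warnings, errors
```

The resulting tree can be serialised with `ET.tostring(root, encoding="unicode")`. Separating errors (record rejected) from warnings (record kept) mirrors the reporting behaviour described above, although the actual rule set is not given in this document.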
Sequence databases
The MarRef, MarDb and MarCat sequence databases are based on non-redundant genome and metagenome data sets obtained mainly from ENA (European Nucleotide Archive) and/or NCBI (The National Center for Biotechnology Information).
MarRef and MarDb
While MarRef is a database for completely sequenced marine prokaryotic genomes, MarDb includes all sequenced marine prokaryotic genomes regardless of the level of completeness. Each genome assigned to the marine microbial biome according to our definition was incorporated in the MarRef or MarDb contextual database, respectively. For each annotated genome, the set of gene and protein feature annotations unique to that genome was downloaded from the RefSeq database (NCBI) and used in the respective sequence database. All RefSeq archaeal and bacterial genomes are annotated using NCBI's prokaryotic genome annotation pipeline PGAAP [8], which improves consistency across the datasets. However, approx. 20% of all entries in MarDb did not contain any gene and protein information. These genomes were annotated using Prokka, a command line software tool for annotation of prokaryote genomes [9].
MarCat
MarCat represents a gene (protein) catalogue of uncultivable (and cultivable) marine genes derived from metagenomics samples. Metagenomics sequences were obtained from ENA, and their corresponding gene and protein feature annotations, unique to each sample, were generated using META-pipe, a pipeline for taxonomic classification and functional annotation of metagenomics samples [10]. As a start we used the high-coverage and high-quality sequence outputs from the Tara Oceans and Ocean Sampling Day metagenomic projects to build the high-quality marine specific reference databases.
Implementation and user interface
The marine reference databases provide login-free access to all publicly available data. The reference databases have been incorporated into the Marine Metagenomics Portal (MMP) and implemented using the Hugo static website engine [11]. The website engine reads the reference databases from XML files and allows site visitors to:
• Browse each of the databases, viewing all database entries.
• Select attributes to be visible in the table.
• Filter entries to be visible in the table based on the most important record attributes, such as environmental ontologies (biome, feature and material) and taxonomy (phylum, order and genus).
• Use advanced filtering to (i) add one or more filters; (ii) refine current filters by adding new filters or removing already applied filters; and (iii) remove all filters and launch a new search.
Figure 5. Access to MarRef, MarDb and MarCat.
The BLAST databases provide similarity search against all nucleotide and protein sequence entries included in MarRef, MarDb and MarCat. While the MarRef and MarDb databases were built on sequences obtained from the RefSeq database or annotated using the Prokka software package, the MarCat database was generated using META-pipe. All metagenomics samples were assembled and annotated; only full-length protein coding sequences (CDS) were included in MarCat.
The contextual and sequence databases have been incorporated into the Marine Metagenomics Portal (MMP) and are available at https://mmp.sfb.uit.no/
[8] http://www.ncbi.nlm.nih.gov/genome/annotation_prok/
[9] https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btu153
[10] https://f1000research.com/articles/6-70/v1
[11] https://gohugo.io/
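The attribute filtering offered by the user interface (add a filter, refine, remove all) can be illustrated with a short Python sketch. It is not the portal's implementation, which is a static site; the record keys `biome` and `phylum` and the sample entries are hypothetical.

```python
def apply_filters(entries, filters):
    """Keep the entries whose attributes match every active filter.

    `filters` maps an attribute name to the set of accepted values.
    With no active filters, every entry is returned.
    """
    return [
        entry for entry in entries
        if all(entry.get(attr) in accepted for attr, accepted in filters.items())
    ]


# Hypothetical records, for illustration only.
entries = [
    {"name": "A", "biome": "Ocean biome", "phylum": "Proteobacteria"},
    {"name": "B", "biome": "Estuarine biome", "phylum": "Proteobacteria"},
    {"name": "C", "biome": "Ocean biome", "phylum": "Firmicutes"},
]

filters = {"phylum": {"Proteobacteria"}}   # (i) add a filter
first = apply_filters(entries, filters)
filters["biome"] = {"Ocean biome"}         # (ii) refine with a second filter
refined = apply_filters(entries, filters)
filters.clear()                            # (iii) remove all filters
everything = apply_filters(entries, filters)
```

Combining filters with a logical AND is an assumption of this sketch; the document does not state how multiple filters interact.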
Further plans
The further plans can be classified into six broad categories: (i) acquisition of data, (ii) inclusion of viral and eukaryotic microbial genomes and transcriptome samples, (iii) controlled vocabularies, (iv) implementation of Bioschemas, (v) downloading of data and (vi) user interface.
Acquisition of sequence and contextual data
Collection of data from publicly available resources will continue. However, due to the increasing amount of genomic and metagenomic sequence data and metadata, the automatic and semi-automatic import scripts that generate data for the curation database will be improved in order to build more efficient import pipelines.
Including viral, eukaryote microbial genomes and transcriptome data
In this first version of the MarRef and MarDb databases only prokaryote genomes have been included. In the future we aim to include viral and eukaryotic microbial genomes and transcriptome data. In addition, we aim to include more metagenomics and metatranscriptomics data to enhance the quality of the MarCat database.
Controlled vocabularies and ontologies
In order to enhance curation efficiency and provide better consistency of the datasets, the number of metadata fields in the contextual databases that are governed by controlled vocabularies and ontologies will be increased. This effort will streamline the curation, and the data will be more robust and easier to aggregate and analyse.
Implementing schema.org markup
To improve data interoperability we intend to implement schema.org markup, so that MMP websites and services contain more structured information. This structured information will make it easier for the end user to discover, collate and analyze our data.
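As an illustration of what such markup might look like, the following Python sketch builds a schema.org `Dataset` description as JSON-LD. The choice of the `Dataset` type and of the properties shown is an assumption made for this example; the markup eventually adopted for the MMP (for instance a Bioschemas profile) may differ.

```python
import json


def dataset_jsonld(name, description, url):
    """Build a minimal schema.org Dataset description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    return json.dumps(doc, indent=2)


# Example values taken from this document; the property selection is illustrative.
markup = dataset_jsonld(
    name="MarRef",
    description="Completely sequenced marine prokaryotic genomes.",
    url="https://mmp.sfb.uit.no/",
)
```

Such a string is typically embedded in a web page inside a `<script type="application/ld+json">` element, where search engines and other harvesters can read it.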
User interface
Site visitors interact with the databases through user interfaces for browsing, filtering and downloading data. We will implement better search and filtering features by including more metadata fields with controlled vocabularies.
Downloading of data
In the first version of the databases the site visitor can only download the complete sequence (BLAST) and contextual databases. The contextual databases can be downloaded in either TSV or XML format. We aim to implement better systems for downloading single entries or entries selected by searching or filtering of the datasets.
Funding
The work has been conducted as a part of the H2020 ELIXIR-EXCELERATE project (Grant no. 676559), with support from the Research Council of Norway (Grant no. 208481/F50) and UiT The Arctic University of Norway.
Appendix
Table 1. Resources used to generate the marine reference databases
Columns: Resource; Description; Link
ENA (European Nucleotide Archive)
  Description: A comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
  Link: http://www.ebi.ac.uk/ena
NCBI (The National Center for Biotechnology Information)
  Description: NCBI houses a series of databases relevant to biotechnology and biomedicine and is an important resource for bioinformatics tools and services, including GenBank, BLAST and PubMed.
  Link: https://www.ncbi.nlm.nih.gov/
UniProt
  Description: A comprehensive, high-quality resource of protein sequence and function.
  Link: http://www.uniprot.org/
BacDive
  Description: BacDive, the Bacterial Diversity Metadatabase, merges detailed strain-linked information on the different aspects of bacterial and archaeal biodiversity.
  Link: http://bacdive.dsmz.de/
DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH)
  Description: The DSMZ is one of the largest biological resource centers worldwide, including about 27,000 different bacterial, 4,000 fungal strains and 13,000 different types of bacterial genomic DNA.
  Link: https://www.dsmz.de/
OLS
  Description: OLS (Ontology Lookup Service) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions.
  Link: https://www.ebi.ac.uk/ols/index
OLSVis
  Description: Visual browser for OLS.
  Link: http://ols.wordvis.com/
BioPortal
  Description: The world’s most comprehensive repository of biomedical ontologies.
  Link: http://bioportal.bioontology.org/
PubMed
  Description: A bibliographic database for the biomedical literature.
  Link: https://www.ncbi.nlm.nih.gov/pubmed/
Europe PMC
  Description: Europe PMC is a repository providing access to worldwide life sciences articles, books, patents and clinical guidelines.
  Link: https://europepmc.org/
Google Maps
  Description: Google Maps is a web mapping service developed by Google.
  Link: https://www.google.com/maps
Silva
  Description: High quality ribosomal RNA database.
  Link: https://www.arb-silva.de/
NCBI taxonomy browser
  Description: A curated classification and nomenclature database for all of the organisms in public sequence databases.
  Link: https://www.ncbi.nlm.nih.gov/taxonomy
Patric
  Description: An information system designed to support the biomedical research community on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools.
  Link: https://www.patricbrc.org/
Gold
  Description: Genomes Online Database, a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata.
  Link: https://gold.jgi.doe.gov/index
WoRMS
  Description: World Register of Marine Species (WoRMS) provides an authoritative and comprehensive list of names of marine organisms, including information on synonymy.
  Link: http://www.marinespecies.org/
Table 2. Attributes included in MarDb, MarRef and MarCat
Columns: Structured comment name; Item; Description; Examples; Expected value; Value syntax; Preferred units/suffix
alt_elev
  Item: Geographic location (altitude/elevation)
  Description: Sample taken at given elevation above sea level, defined in meters (m) as a positive floating number with two decimals.
  Examples: Ex1: 3.06; Ex2: 1.80-2.15
  Expected value: -
  Value syntax: {float} or {range}
  Preferred units/suffix: meters (m)
collection_date
  Item: Collection date
  Description: The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated.
  Examples: Ex1: 2008-01-23T19:23:10+00:00; Ex2: 2011-11-10; Ex3: 2001-12; Ex4: 2003--2006; Ex5: 2010-01--2011-03; Ex6: 2011-05-28--2011-08-10; Ex7: 2015
  Expected value: date and time, range
  Value syntax: {timestamp}
  Preferred units/suffix: -
depth
  Item: Depth
  Description: Please refer to the definitions of depth in the environmental packages. Water: Sample taken at given depth below sea level, defined in meters (m) as a positive floating number or as a range, both with two decimals.
  Examples: Ex1: 355.20; Ex2: 2.00-5.00
  Expected value: -
  Preferred units/suffix: meters (m)
env_biome
  Item: Environment (biome)
  Description: The environmental biome level comprises the major classes of ecologically similar communities of plants, animals, and other organisms. Biomes are defined based on factors such as plant structures, leaf types, plant spacing, and other factors like climate. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v1.53) terms listed under environmental biome can be found from the link: (http://www.environmentontology.org/Browse-EnvO)
  Examples: Ex1: coral reef; Ex2: tropical
  Expected value: EnvO
  Value syntax: {freetext}
  Preferred units/suffix: -
env_biome_ENVO
  Item: Environment (biome_id)
  Description: Corresponding ENVO identifier related to the term name of Environment (biome).
  Examples: Ex1: ENVO:00000150; Ex2: ENVO:01000204
  Expected value: EnvO
  Value syntax: {accession}
  Preferred units/suffix: -
env_feature
  Item: Environment (feature)
  Description: Environmental feature level includes geographic environmental features. Examples include: harbor, cliff, or lake. EnvO (v1.53) terms listed under environmental feature can be found from the link: (http://www.environmentontology.org/Browse-EnvO)
  Examples: Ex1: coast; Ex2: ocean floor
  Expected value: EnvO
  Value syntax: {term}
  Preferred units/suffix: -
env_feature_ENVO
  Item: Environment (feature_id)
  Description: Corresponding ENVO identifier related to the term name of Environment (feature).
  Examples: https://www.ebi.ac.uk/metagenomics/projects/SRP000183/samples/SRS000447
  Expected value: EnvO
  Value syntax: {accession}
  Preferred units/suffix: -
env_material
  Item: Environment (material)
  Description: The environmental material level refers to the matter that was displaced by the sample, prior to the sampling event. EnvO (v1.53) terms listed under environmental matter can be found from the link: (http://www.environmentontology.org/Browse-EnvO)
  Examples: Ex1: sea water; Ex2: ice
  Expected value: EnvO
  Value syntax: {term}
  Preferred units/suffix: -
env_material_ENVO
  Item: Environment (material_id)
  Description: Corresponding ENVO identifier related to the term name of Environment (material).
  Examples: Ex1: ENVO:00002149; Ex2: ENVO:01000277
  Expected value: EnvO
  Value syntax: {accession}
  Preferred units/suffix: -
env_package
  Item: Environmental package
  Description: MIGS/MIMS/MIMARK extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported.
  Examples: Ex1: Water; Ex2: Host-associated
  Expected value: CV, single data entry
  Value syntax: [Air|Host-associated|Microbial mat/biofilm|Misc environment|Plant associated|Sediment|Soil|Wastewater/sludge|water]
env_salinity
  Item: Environmental salinity
  Description: Please refer to the definitions of salinity in the environmental packages. Water: Salinity measurement, given in practical salinity units (psu) as a positive number with two decimals.
  Examples: 39.14
  Expected value: -
  Value syntax: {float}{unit}
  Preferred units/suffix: psu
env_temp
  Item: Environmental temperature
  Description: Package defined temperature. Temperature of the sample at the time of sampling, given in degrees Celsius as a positive or negative number with two decimals.
  Examples: 16.25
  Expected value: -
  Value syntax: {float}{unit}
  Preferred units/suffix: °C
geo_loc_name
  Item: Geographic location (country and/or sea, region)
  Description: The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list here. Source: (http://insdc.org/country.html)
  Examples: Japan:Kochi Prefecture:Cape Muroto
  Expected value: country or sea name (INSDC):region:specific location name
  Value syntax: {string}:{string}:{string}
  Preferred units/suffix: -
geo_loc_name_GAZ
  Item: Geographic location (GAZ)
  Description: The GAZ ontology (v1.446) may also be used to specify the location of sampling. Source: (http://purl.bioontology.org/ontology/GAZ)
  Examples: Goto Islands
  Expected value: gaz name
  Value syntax: {term}
  Preferred units/suffix: -
geo_loc_name_GAZ_ENVO
  Item: Geographic location (GAZ_id)
  Description: Corresponding Gazetteer (GAZ) identifier related to the term name of Geographic location. LinkOut to EMBL Ontology Lookup Service, see example for GAZ:00045749.
  Examples: https://www.ebi.ac.uk/ols/ontologies/gaz/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGAZ_00045749
  Expected value: gaz number
  Value syntax: {accession}
  Preferred units/suffix: -
investigation_type
  Item: Investigation type
  Description: Nucleic Acid Sequence Report is the root element of all MIGS/MIMS compliant reports as standardized by Genomic Standards Consortium. This field is either eukaryote, bacteria, virus, plasmid, organelle, metagenome, miens-survey or miens-culture.
  Examples: bacteria
  Expected value: CV, single data entry
  Value syntax: [Eukaryote|Bacteria|Archaea|Plasmid|Virus|Organelle|Metagenome|Mimarks survey|Mimarks specimen|NA|unknown|missing]
  Preferred units/suffix: -
lat_lon
  Item: Coordinates (latitude and longitude)
  Description: The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees, positive and negative numbers, and according to the WGS84 system. This can be found using Google Maps.
  Examples: 69.692320,18.973551
  Expected value: decimal degrees
  Value syntax: {string}:{string}
  Preferred units/suffix: -
pathogenicity
  Item: Known pathogenicity
  Description: To what is the entity pathogenic. Ex: human, animal, plant, fungi, bacteria
  Examples: Animal
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
project_name
  Item: Project name
  Description: Name of the project within which the sequencing was organized.
  Examples: -
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
assembly
  Item: Assembly
  Description: How was the assembly done (e.g. with a text based assembler like phrap or a flowgram assembler); estimated error rate associated with the finished sequences (e.g. error rate of 1 in 1000 bp); and the method of calculation. Source: (https://www.ncbi.nlm.nih.gov/assembly)
  Examples: Newbler assembler v. 2.5
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
isol_growth_condt
  Item: Isolation and growth condition
  Description: Publication reference in the form of pubmed ID (pmid), digital object identifier (doi) or url for isolation and growth condition specifications of the organism/material. PubMed ID or DOI ID. Source: (https://www.ncbi.nlm.nih.gov/pubmed)
  Examples: Ex1: 15393697; Ex2: 10.1007/BF00210994; Ex1 LinkOut: https://www.ncbi.nlm.nih.gov/pubmed/?term=15393697; Ex2 LinkOut: https://doi.org/10.1007/BF00210994
  Expected value: PMID, DOI, URL or PMCID, single data entry or a list
  Value syntax: {string} or {list}
  Preferred units/suffix: -
num_replicons
  Item: Number of replicons
  Description: Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote. Source: (https://www.ncbi.nlm.nih.gov/assembly)
  Examples: 3
  Expected value: for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments
  Value syntax: {integer}
  Preferred units/suffix: -
ref_biomaterial
  Item: Reference for biomaterial
  Description: Primary publication if isolated before genome publication; otherwise, primary genome report. PubMed ID or DOI ID. Source: (https://www.ncbi.nlm.nih.gov/pubmed)
  Examples: Ex1: 15393697; Ex2: 10.1007/BF00210994; Ex1 LinkOut: https://www.ncbi.nlm.nih.gov/pubmed/?term=15393697; Ex2 LinkOut: https://doi.org/10.1007/BF00210994
  Expected value: PMID, DOI, URL or PMCID, single data entry or a list
  Value syntax: {string} or {list}
  Preferred units/suffix: -
microbe_package
  Item: Microbe
  Description: A package represents a type of BioSample (NCBI) and specifies the list of attributes by which it should be described. Use for bacteria or other unicellular microbes when it is not appropriate or advantageous to use MIxS, Pathogen or Virus packages. See: https://www.ncbi.nlm.nih.gov/biosample/docs/packages/ See: http://www.ncbi.nlm.nih.gov/biosample/SAMN02911891
  Examples: Microbe version 1.0
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
sample_type
  Item: Sample type
  Description: Sample type, such as cell culture, mixed culture, tissue sample, whole organism, single cell, metagenomic assembly. Source: (https://www.ncbi.nlm.nih.gov/biosample/)
  Examples: Enrichment culture
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
strain
  Item: Strain
  Description: Microbial or eukaryotic strain name. Microbial designated/original strain name followed by sequenced strain from culture collection. Multiple values allowed separated by ':'. Source: (http://bacdive.dsmz.de)
  Examples: Ex1: N-2927; Ex2: Och323:DSM 19469:CIP107377; Ex3: 7-ME:DSM 22347:LMG26961
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
isolation_source
  Item: Isolation source
  Description: Describes the physical, environmental and/or local geographical source of the biological sample from which the sample was derived.
  Examples: Mid-Okinawa Trough hydrothermal sediments
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
collected_by
  Item: Collected by
  Description: Name of persons or institute who collected the sample
  Examples: R. Smith
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
culture_collection
  Item: Culture collection
  Description: Name of source institute and unique culture identifier. See the description for the proper format and list of allowed institutes, http://www.insdc.org/controlled-vocabulary-culturecollection-qualifier. The sample represented by the entry can be stored in one or several culture collections like DSMZ, ATCC and JCM. Source: (http://bacdive.dsmz.de)
  Examples: CIP109004:DSM 16917:KCTC32106
  Expected value: Culture collection abbreviation followed by ID number. Single data entry or a list.
  Value syntax: {string} or {list}
  Preferred units/suffix: -
bacdive_id
  Item: BacDive ID {LinkOut}
  Description: The given ID number of the entry in BacDive meta-database. Source: (http://bacdive.dsmz.de)
  Examples: http://bacdive.dsmz.de/index.php?search=2546
  Expected value: BacDive ID accession number
  Value syntax: URL{integer}
  Preferred units/suffix: -
curation_date
  Item: Curation date
  Description: The date on which the entry curation was completed.
  Examples: 2016-11-28
  Expected value: YYYY-MM-DD
  Value syntax: {timestamp}
  Preferred units/suffix: -
implementation_date
  Item: Implementation date
  Description: The date on which the entry was added to the database.
  Examples: 2016-11-28
  Expected value: YYYY-MM-DD
  Value syntax: {timestamp}
  Preferred units/suffix: -
updated_date
  Item: Updated
  Description: The date on which the entry was updated with more recent information than originally curated.
  Examples: 2016-12-05
  Expected value: YYYY-MM-DD
  Value syntax: {timestamp}
  Preferred units/suffix: -
mmp_biome
  Item: -
  Description: Marked as marine according to the MMP standard.
  Examples: Marine
  Expected value: -
  Value syntax: {string}
  Preferred units/suffix: -
base_ID
  Item: -
  Description: Individual numeric ID for MarRef, MarDB and MarCat, which is only needed for identification in LObase.
  Examples: 1
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
mmp_ID
  Item: -
  Description: Individual numeric ID for MarRef, MarDB and MarCat that corresponds to the numeric system in biosample_accession. In case of a duplicate BioSample number we use ".1, .2 ..." as an additional integer.
  Examples: Ex1: MMP02256433; Ex2: MMP02954345.1; Ex3: MMP02954345.2
  Expected value: -
  Value syntax: {accession}
  Preferred units/suffix: -
silva_accession_SSU
  Item: Silva accession ID {LinkOut}
  Description: Link out to the SILVA Small Subunit rRNA Database (SSU). Number in the BioProject identifier is used to look up the entry in SILVA. See example for how the BioProject id PRJEA30703 is used.
  Examples: https://www.arb-silva.de/search/show/ssu/insdc/30703
  Expected value: URL followed by the BioProject number in the ID.
  Value syntax: {accession}
  Preferred units/suffix: -
silva_accession_LSU
  Item: Silva accession ID {LinkOut}
  Description: Link out to the SILVA Large Subunit rRNA Database (LSU). Number in the BioProject identifier is used to look up the entry in SILVA. See example for how the BioProject id PRJEA30703 is used.
  Examples: http://www.arb-silva.de/search/show/lsu/insdc/30703
  Expected value: URL followed by the BioProject number in the ID.
  Value syntax: {accession}
  Preferred units/suffix: -
uniprot_accession
  Item: UniProt proteome ID {LinkOut}
  Description: The given ID number of the entry in UniProt Proteomes. Source: (http://www.uniprot.org/proteomes/)
  Examples: http://www.uniprot.org/proteomes/UP000027362
  Expected value: -
  Value syntax: {accession}
  Preferred units/suffix: -
assembly_accession
  Item: ENA Assembly accession ID {LinkOut}
  Description: The given accession number of the entry in NCBI/ENA Assembly. Source: (https://www.ncbi.nlm.nih.gov/assembly)
  Examples: http://www.ebi.ac.uk/ena/data/view/GCA_000400245.1
  Expected value: Assembly accession number
  Value syntax: {accession}
  Preferred units/suffix: -
bioproject_accession
  Item: ENA BioProject accession ID {LinkOut}
  Description: The given accession number of the entry in NCBI/ENA BioProject. Source: (https://www.ncbi.nlm.nih.gov/bioproject/)
  Examples: http://www.ebi.ac.uk/ena/data/search?query=PRJNA40879
  Expected value: Bioproject accession number
  Value syntax: {accession}
  Preferred units/suffix: -
biosample_accession
  Item: ENA BioSample accession ID {LinkOut}
  Description: The given accession number of the entry in NCBI/ENA BioSample. Source: (https://www.ncbi.nlm.nih.gov/biosample/)
  Examples: http://www.ebi.ac.uk/biosamples/samples/SAMN02589594
  Expected value: Biosample accession number
  Value syntax: {accession}
  Preferred units/suffix: -
genbank_accession
  Item: ENA GenBank accession ID {LinkOut}
  Description: The given accession number of the entry in NCBI/ENA GenBank. Source: (https://www.ncbi.nlm.nih.gov/genbank/)
  Examples: http://www.ebi.ac.uk/ena/data/view/JQOJ01000000
  Expected value: Genbank accession number
  Value syntax: {accession}
  Preferred units/suffix: -
NCBI_refseq_accession
  Item: NCBI Refseq accession ID {LinkOut}
  Description: The given accession number of the entry in NCBI RefSeq. Source: (https://www.ncbi.nlm.nih.gov/refseq/)
  Examples: https://www.ncbi.nlm.nih.gov/nuccore/NZ_BACE00000000
  Expected value: NCBI refseq accession number
  Value syntax: {accession}
  Preferred units/suffix: -
NCBI_taxon_identifier
  Item: NCBI Taxon identifier {LinkOut}
  Description: The given taxon ID number of the entry in NCBI Taxonomy. Source: (https://www.ncbi.nlm.nih.gov/taxonomy)
  Examples: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1454200
  Expected value: ID number
  Value syntax: {accession}
  Preferred units/suffix: -
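annotation_provider
  Item: Annotation provider
  Description: -
  Examples: NCBI
  Expected value: -
  Value syntax: {string}
  Preferred units/suffix: -
annotation_date
  Item: Annotation date
  Description: -
  Examples: Ex1: 2008-01-23T19:23:10+00:00; Ex2: 2011-11-10; Ex3: 2001-12
  Expected value: date and time
  Value syntax: {timestamp}
  Preferred units/suffix: -
annotation_pipeline
  Item: Annotation pipeline
  Description: Name of the pipeline
  Examples: NCBI Prokaryotic Genome Annotation Pipeline
  Expected value: -
  Value syntax: {string}
  Preferred units/suffix: -
annotation_method
  Item: Annotation method
  Description: Databases and tools integrated in the pipeline
  Examples: Best-placed reference protein set:GeneMarkS
  Expected value: List of databases and tools integrated in the given pipeline.
  Value syntax: {string}:{string}
  Preferred units/suffix: -
annotation_software_revision
  Item: Annotation software revision
  Description: -
  Examples: 3.0
  Expected value: -
  Value syntax: -
features_annotated
  Item: Features annotated
  Description: -
  Examples: Gene:CDS:rRNA:tRNA:ncRNA:repeat_region
  Expected value: List of features annotated
  Value syntax: {string}:{string}:{string}:{string}:{string}:{string}
  Preferred units/suffix: -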
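genes
  Item: Genes
  Description: -
  Examples: 3717
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
cds
  Item: CDS
  Description: -
  Examples: 3625
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
pseudo_genes
  Item: Pseudogenes
  Description: -
  Examples: 24
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -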
rrnas
  Item: rRNAs
  Description: The defined number of 5S, 16S and 23S predicted. Field cannot be empty.
  Examples: 2,1,2;0,0,0
  Expected value: Integer list containing a constant of three numbers that represents 5S, 16S and/or 23S sequences.
  Value syntax: {integer},{integer},{integer}
  Preferred units/suffix: 5S,16S,23S
complete_rrnas
  Item: Complete rRNAs
  Description: The defined number of 5S, 16S and 23S predicted. Field cannot be empty.
  Examples: 2,3,5
  Expected value: Integer list containing a constant of three numbers that represents 5S, 16S and/or 23S sequences.
  Value syntax: {integer},{integer},{integer}
  Preferred units/suffix: 5S,16S,23S
partial_rrnas
  Item: Partial rRNAs
  Description: The defined number of 5S, 16S and 23S predicted. Field cannot be empty.
  Examples: 0,3,0
  Expected value: Integer list containing a constant of three numbers that represents 5S, 16S and/or 23S sequences.
  Value syntax: {integer},{integer},{integer}
  Preferred units/suffix: 5S,16S,23S
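trnas
  Item: tRNAs
  Description: -
  Examples: 57
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
ncrna
  Item: ncRNA
  Description: -
  Examples: 1
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
frameshifted_genes
  Item: Frameshifted genes
  Description: -
  Examples: 12
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
frameshifted_genes_on_monomer_runs
  Item: Frameshifted genes on monomer runs
  Description: -
  Examples: 2
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
frameshifted_genes_not_on_monomer_runs
  Item: Frameshifted genes not on monomer runs
  Description: -
  Examples: 5
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -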
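host_common_name
  Item: Host common name
  Description: The common name of the organism, e.g. Human.
  Examples: Salmon
  Expected value: Organism common name
  Value syntax: {text}
  Preferred units/suffix: -
host_scientific_name
  Item: Host
  Description: The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, e.g. "Homo sapiens".
  Examples: Salmo salar
  Expected value: Host scientific name
  Value syntax: {text}
  Preferred units/suffix: -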
organism
  Item: Organism
  Description: The most descriptive organism name for this sample (to the species, if relevant but without strain or culture collection name).
  Examples: Mobilicoccus pelagius
  Expected value: Text
  Value syntax: {text}
  Preferred units/suffix: -
full_scientific_name
  Item: Full Scientific Name
  Description: Scientific name with authority, strain or culture collection appended. Source: (http://bacdive.dsmz.de)
  Examples: Stigmatella aurantiaca Berkeley and Curtis 1875
  Expected value: Species name followed by authority or designated strain/culture collection.
  Value syntax: {text}
  Preferred units/suffix: -
disease
  Item: Disease
  Description: Relevant disease related to the sample. We restrict our consideration to two parent MeSH headings: Gram-Positive Infections, and Gram-Negative Infections. Source: (https://www.ncbi.nlm.nih.gov/mesh)
  Examples: Botulism
  Expected value: Free text
  Value syntax: -
publication_pmid
  Item: Publication {LinkOut}
  Description: Publication(s) including the genome data. PMID, DOI or URL. Source: (https://www.ncbi.nlm.nih.gov/pubmed)
  Examples: Ex1: 15393697; Ex2: 10.1007/BF00210994; Ex1 LinkOut: https://www.ncbi.nlm.nih.gov/pubmed/?term=15393697; Ex2 LinkOut: https://doi.org/10.1007/BF00210994
  Expected value: PMID, DOI, URL or PMCID, single data entry or a list
  Value syntax: {string} or {list}
  Preferred units/suffix: -
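isolation_country
  Item: Isolation country
  Description: Same as geo_loc_name
  Examples: Philippines
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
isolation_comments
  Item: Isolation comments
  Description: PATRIC
  Examples: isolated at a depth of 2500 m from the Sulu Trough
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
comments
  Item: Comments
  Description: PATRIC
  Examples: Genomic DNA from Strain IMCC9063 from SAR11 group 3 isolated from an Arctic Environment.
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
sequencing_centers
  Item: Sequencing center
  Description: PATRIC
  Examples: Zhejiang University
  Expected value: -
  Value syntax: {text}
  Preferred units/suffix: -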
body_sample_site
  Item: Body sample site
  Description: PATRIC
  Examples: kidney
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
body_sample_subsite
  Item: Body sample subsite
  Description: PATRIC
  Examples: -
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
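other_clinical
  Item: Other clinical
  Description: PATRIC
  Examples: host_health_state:diseased
  Expected value: Free text
  Value syntax: {text}
  Preferred units/suffix: -
gram_stain
  Item: Gram stain
  Description: PATRIC. http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#Gram
  Examples: Positive
  Expected value: CV, single data entry
  Value syntax: [Positive|Negative|unknown|NA|missing]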
cell_shape
  Item: Cell shape
  Description: PATRIC. For more information see: http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#Shape
  Examples: Bacilli
  Expected value: CV, single data entry
  Value syntax: [Bacilli (Rod)|Cocci (Spherical)|Spirilla (Spiral)|Coccobacilli (Elongated coccal)|Filamentous (Long threads)|Vibrios (Slightly curved rod)|Fusobacteria (Spindle)|Square Shaped|Curved Shaped|Tailed|Oval|unknown|NA|missing]
  Preferred units/suffix: -
motility
  Item: Motility
  Description: PATRIC. http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#Motility
  Examples: Yes
  Expected value: CV, single data entry
  Value syntax: [Yes|No|unknown|NA|missing]
  Preferred units/suffix: -
sporulation
  Item: Sporulation
  Description: PATRIC. http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#Endospores
  Examples: No
  Expected value: CV, single data entry
  Value syntax: [Yes|No|unknown|NA|missing]
  Preferred units/suffix: -
temperature_range
  Item: Temperature range
  Description: Bacteria growth temperature tolerance.
  Examples: Psychrophilic
  Expected value: CV, single data entry
  Value syntax: [Cryophilic|Psychrophilic|Psychrotolerant|Mesophilic|Thermophilic|Hyperthermophilic|unknown|NA|missing]
  Preferred units/suffix: -
optimal_temperature
  Item: Optimal growth temperature
  Description: The optimum growth temperature of the bacteria. Given as a single positive or negative number or as a range with two decimals.
  Examples: Ex1: 37.23; Ex2: 15.00-18.50
  Expected value: -
  Value syntax: {float}
  Preferred units/suffix: °C
halotolerance
  Item: Halotolerance
  Description: Slight halophiles prefer 0.3 to 0.8 M (1.7 to 4.8%; seawater is 0.6 M or 3.5%), moderate halophiles 0.8 to 3.4 M (4.7 to 20%), and extreme halophiles 3.4 to 5.1 M (20 to 30%) salt content. Halophiles require sodium chloride (salt) for growth, in contrast to halotolerant organisms, which do not require salt but can grow under saline conditions. http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#Salinity
  Examples: Extreme halophilic
  Expected value: CV, single data entry
  Value syntax: [Halophilic|Halotolerant|Non Halophilic|Moderate Halophilic|Extreme Halophilic|Euryhaline|Stenohaline|unknown|NA|missing]
  Preferred units/suffix: -
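oxygen_requirement
  Item: Oxygen requirement
  Description: PATRIC. http://trace.ddbj.nig.ac.jp/bioproject/submission_e.html#OxygenReq
  Examples: Aerobe
  Expected value: CV, single data entry
  Value syntax: [Aerobic|Microaerophilic|Facultative|Anaerobic|unknown|NA|missing]
  Preferred units/suffix: -
plasmids
  Item: Plasmids
  Description: PATRIC
  Examples: 4
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
genome_length
  Item: Genome length
  Description: PATRIC
  Examples: 1518636
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: bp
gc_content
  Item: GC content
  Description: Float with two decimal numbers.
  Examples: 29.84
  Expected value: -
  Value syntax: {float}
  Preferred units/suffix: %
refseq_cds
  Item: Refseq CDS
  Description: 0 = missing
  Examples: 1490
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
biovar
  Item: Biovar
  Description: A biovar is a variant prokaryotic strain that differs physiologically and/or biochemically from other strains in a particular species.
  Examples: Ex1: Biovar 1 (parvo); Ex2: Biovar 2 (T960)
  Expected value: -
  Value syntax: {text}
  Preferred units/suffix: -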
other_typing
  Item: Other typing
  Description: PATRIC
  Examples: genotype:Wildtype
  Expected value: typing:term
  Value syntax: {typing}:{term}
  Preferred units/suffix: -
type_strain
  Item: Type strain
  Description: PATRIC
  Examples: Yes
  Expected value: CV, single data entry
  Value syntax: [Yes|No|unknown|NA|missing]
  Preferred units/suffix: -
sequencing_platform
  Item: Sequencing Platform
  Description: Sequencing method used; e.g. Sanger, pyrosequencing, ABI-solid. Data source: (https://www.ncbi.nlm.nih.gov/assembly) Options: http://trace.ddbj.nig.ac.jp/dra/submission_e.html#Instrument
  Examples: HiSeq Illumina:Ion PGM:PacBio
  Expected value: free text, single data entry or a list
  Value syntax: [454 GS|454 GS 20|454 GS FLX|454 GS FLX+|454 GS FLX Titanium|454 GS Junior|Illumina Genome Analyzer|Illumina Genome Analyzer II|Illumina Genome Analyzer IIx|Illumina HiSeq|Illumina HiSeq 1000|Illumina HiSeq 1500|Illumina HiSeq 2000|Illumina HiSeq 2500|Illumina HiSeq 3000|Illumina HiSeq 4000|Illumina MiSeq|Illumina HiScanSQ|HiSeq X Five|HiSeq X Ten|NextSeq 500|NextSeq 550|Helicos HeliScope|AB SOLiD System|AB SOLiD System 2.0|AB SOLiD System 3.0|AB SOLiD 3 Plus System|AB SOLiD 4 System|AB SOLiD 4hq System|AB SOLiD PI System|AB 5500 Genetic Analyzer|AB 5500xl Genetic Analyzer|AB 5500xl-W Genetic Analysis System|Complete Genomics|MinION|GridION|PromethION|PacBio RS|PacBio RS II|Sequel|Ion Torrent|Ion Torrent PGM|Ion Torrent Proton|AB|AB Genetic Analyzer|AB 377 Genetic Analyzer|AB 310 Genetic Analyzer|AB 3130 Genetic Analyzer|AB 3130xL Genetic Analyzer|AB 3500 Genetic Analyzer|AB 3500xL Genetic Analyzer|AB 3730 Genetic Analyzer|AB 3730xL Genetic Analyzer|Sanger sequencing]
  Preferred units/suffix: -
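sequencing_depth
  Item: Sequencing depth
  Description: -
  Examples: 25.3:102.8
  Expected value: -
  Value syntax: {float}
  Preferred units/suffix: x:x
contigs
  Item: Contigs
  Description: -
  Examples: 125
  Expected value: -
  Value syntax: {integer}
  Preferred units/suffix: -
genome_status
  Item: Genome status
  Description: Complete, Draft
  Examples: Complete
  Expected value: -
  Value syntax: [Complete|Draft]
  Preferred units/suffix: -
host_sex
  Item: Host sex
  Description: Physical sex of the host
  Examples: male
  Expected value: CV, single data entry
  Value syntax: [Male|Female|Neuter|Hermaphrodite|Not determined|NA|unknown|missing]
  Preferred units/suffix: -
host_health_stage
  Item: Host health
  Description: Health state of host
  Examples: diseased
  Expected value: Host condition at the time of sampling
  Value syntax: {text}
  Preferred units/suffix: -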
host_age
  Item: Host age
  Description: Age of host at the time of sampling; relevant scale depends on species and study, e.g. could be seconds for amoebae or centuries for trees.
  Examples: Ex1: 5 years; Ex2: 26 days; Ex3: 2.5-3.0 hours
  Expected value: Life stage or length of life
  Value syntax: {float}
  Preferred units/suffix: minutes, hours, days, weeks, years, decades, centuries
serovar
  Item: Serovar
  Description: Epidemiologic classification of organisms to the sub-species level based on their cell surface antigens. May be sub-classifications below biovar.
  Examples: O4:K8
  Expected value: -
  Value syntax: {text}
  Preferred units/suffix: -
pathovar
  Item: Pathovar
  Description: The term pathovar is used to refer to strains with similar features that are differentiated at the subspecies level on the basis of differences in plant host range and types of symptoms, and additionally by biochemical profiles. http://www.isppweb.org/about_tppb_naming.asp
  Examples: Pseudomonas syringae pv. lachrymans, abbreviated P. s. pv. lachrymans
  Expected value: -
  Value syntax: {text}
  Preferred units/suffix: -
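kingdom
  Item: Kingdom
  Description: NCBI taxonomy
  Examples: Bacteria
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
phylum
  Item: Phylum
  Description: NCBI taxonomy
  Examples: Bacteroidetes
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
class
  Item: Class
  Description: NCBI taxonomy
  Examples: Flavobacteriia
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
order
  Item: Order
  Description: NCBI taxonomy
  Examples: Flavobacteriales
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
family
  Item: Family
  Description: NCBI taxonomy
  Examples: Flavobacteriaceae
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
genus
  Item: Genus
  Description: NCBI taxonomy
  Examples: Algibacter
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -
species
  Item: Species
  Description: NCBI taxonomy
  Examples: Altererythrobacter marensis
  Expected value: Free text
  Value syntax: {string}
  Preferred units/suffix: -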
taxon_lineage_ids
  Item: Taxon lineage ids
  Description: The complete identifier lineage to the most specific taxon of the organism.
  Examples: 131567:2:1224:28211:204457:335929:361177:543877
  Expected value: List
  Value syntax: {string}:{string}:{string}:{string}:{string}:{string}:{string}
  Preferred units/suffix: -
taxon_lineage_names
  Item: Taxon lineage names
  Description: The complete name lineage from cellular organisms to the most specific taxon of the organism.
  Examples: cellular organisms:Bacteria:Proteobacteria:Alphaproteobacteria:Sphingomonadales:Erythrobacteraceae:Altererythrobacter:Altererythrobacter marensis
  Expected value: List
  Value syntax: {string}:{string}:{string}:{string}:{string}:{string}:{string}
  Preferred units/suffix: -
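The value syntaxes in Table 2 lend themselves to automated checks during curation. The
sketch below shows one possible way to validate three of them (collection_date, lat_lon
and the rRNA count triples) in Python. The function names, the regular expression and the
coordinate range check are assumptions made for this example and are not taken from the
MMP implementation; the semicolon-separated rrnas form ("2,1,2;0,0,0") is deliberately not
handled.

```python
import re

# Right-truncated ISO 8601 forms seen in the collection_date examples:
# YYYY, YYYY-MM, YYYY-MM-DD, or a full timestamp with a UTC offset.
_DATE = r"\d{4}(?:-\d{2}(?:-\d{2}(?:T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})?)?)?"
DATE_OR_INTERVAL = re.compile(rf"^{_DATE}(?:--{_DATE})?$")


def is_valid_collection_date(value):
    """True if value is a single date or a 'start--end' interval."""
    return DATE_OR_INTERVAL.match(value) is not None


def parse_lat_lon(value):
    """Split 'lat,lon' in decimal degrees and range-check both numbers."""
    lat_text, lon_text = value.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value}")
    return lat, lon


def parse_rrnas(value):
    """Map a '5S,16S,23S' count triple such as '2,3,5' to a dict."""
    counts = [int(part) for part in value.split(",")]
    if len(counts) != 3:
        raise ValueError(f"expected three counts, got: {value}")
    return dict(zip(("5S", "16S", "23S"), counts))


print(is_valid_collection_date("2011-05-28--2011-08-10"))
print(parse_lat_lon("69.692320,18.973551"))
print(parse_rrnas("2,3,5"))
```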
Table 3. Controlled vocabularies for Environmental biome
Columns: Preferred name; Definition; ENVO ID; Link
Marine biome
  Definition: An aquatic biome that comprises systems of open-ocean and unprotected coastal habitats, characterized by exposure to wave action, tidal fluctuation, and ocean currents as well as systems that largely resemble these. Water in the marine biome is generally within the salinity range of seawater: 30 to 38 ppt.
  ENVO ID: ENVO:00000447
  Link: http://purl.obolibrary.org/obo/ENVO_00000447
Estuarine biome
  Definition: Expressions of the estuarine biome occur at wide lower courses of rivers where they flow into a sea. Estuaries experience tidal flows and their water is a changing mixture of fresh and salt.
  ENVO ID: ENVO:01000020
  Link: http://purl.obolibrary.org/obo/ENVO_01000020
Marginal sea biome
  Definition: The marginal sea biome comprises parts of an ocean partially enclosed by land such as islands, archipelagos, or peninsulas. Unlike Mediterranean seas, marginal seas have ocean currents caused by ocean winds. Many marginal seas are enclosed by island arcs that were formed from the subduction of one oceanic plate beneath another.
  ENVO ID: ENVO:01000046
  Link: http://purl.obolibrary.org/obo/ENVO_01000046
Marine benthic biome
  Definition: The marine benthic biome (benthic meaning 'bottom') encompasses the seafloor and includes such areas as shores, littoral or intertidal areas, marine coral reefs, and the deep seabed.
  ENVO ID: ENVO:01000024
  Link: http://purl.obolibrary.org/obo/ENVO_01000024
Marine mud
  Definition: A liquid or semi-liquid mixture of water and some combination of soil, silt, and clay.
  ENVO ID: ENVO:00005795
  Link: http://purl.obolibrary.org/obo/ENVO_00005795
Marine pelagic biome
  Definition: The marine pelagic biome (pelagic meaning open sea) is that of the marine water column, from the surface to the greatest depths.
  ENVO ID: ENVO:01000023
  Link: http://purl.obolibrary.org/obo/ENVO_01000023
Marine salt marsh biome
  Definition: The marine salt marsh biome comprises marshes that are transitional intertidals between land and salty or brackish marine water (e.g.: sloughs, bays, estuaries). It is dominated by halophytic (salt tolerant) herbaceous plants. The daily tidal surges bring in nutrients, which tend to settle in roots of the plants within the salt marsh. The natural chemical activity of salty (or brackish) water and the tendency of algae to bloom in the shallow unshaded water also allow for great biodiversity.
  ENVO ID: ENVO:01000022
  Link: http://purl.obolibrary.org/obo/ENVO_01000022
Marine upwelling biome
  Definition: A marine biome which contains communities adapted to living in an environment determined by an upwelling process.
  ENVO ID: ENVO:01000858
  Link: http://purl.obolibrary.org/obo/ENVO_01000858
Marine water body
  Definition: A significant accumulation of water which is part of a marine biome. Ideas like "significant" are fuzzy and need to be modelled more accurately. The definition is a candidate for review.
  ENVO ID: ENVO:00001999
  Link: http://purl.obolibrary.org/obo/ENVO_00001999
Mediterranean sea biome
  Definition: The Mediterranean sea biome comprises mostly enclosed seas that have limited exchange of deep water with outer oceans and where the water circulation is dominated by salinity and temperature differences rather than winds.
  ENVO ID: ENVO:01000047
  Link: http://purl.obolibrary.org/obo/ENVO_01000047
Ocean biome
  Definition: The ocean biome comprises major bodies of saline water, principal components of the hydrosphere. Approximately 71% of the Earth's surface is covered by ocean, a continuous body of water that is customarily divided into several principal oceans and smaller seas. More than half of this area is over 3,000 metres (9,800 ft) deep. Average oceanic salinity is around 35 parts per thousand (ppt) (3.5%), and nearly all seawater has a salinity in the range of 30 to 38 ppt.
  ENVO ID: ENVO:01000048
  Link: http://purl.obolibrary.org/obo/ENVO_01000048
Epeiric sea biome
  Definition: The epeiric sea (also known as an epicontinental sea) biome comprises shallow seas that extend over part of a continent. Epeiric seas are usually associated with the marine transgressions of the geologic past, which have variously been due to either global eustatic sea level changes, local tectonic deformation, or both, and are occasionally semi-cyclic.
  ENVO ID: ENVO:01000045
  Link: http://purl.obolibrary.org/obo/ENVO_01000045
Table 3. Controlled vocabularies for Environmental feature
Columns: Preferred name; Definition; ENVO ID; Link
Environmental feature
  Definition: A material entity determines an environmental system when its removal would cause the collapse of that system. For example, a seamount determines a seamount environment, acting as its 'hub'. This class is currently being aligned to the Basic Formal Ontology. Following this alignment, its definition and the definitions of its subclasses will be revised. A material entity, which determines an environmental system. Includes environmental zones, geographic features, glacial feature, microscopic physical objects, volcanic feature and organic feature.
  ENVO ID: ENVO:00002297
  Link: http://purl.obolibrary.org/obo/ENVO_00002297
Environmental zone or environmental area
  Definition: An environmental zone is an environmental feature whose extent is determined by the presence or influence of one or more material entities or processes. An environmental zone may, itself, assume the role of an environmental feature. For example, an intertidal zone is that part of a coast which is exposed to air and water due to tidal processes. It determines the intertidal zone environment. This class is experimental and not suitable for annotation. Includes: circalittoral zone (ENVO:01000412), field (ENVO:01000352), high tide zone (ENVO:00000318), infralittoral zone (ENVO:01000411), intertidal zone (ENVO:00000316), iron-reducing zone of petroleum contaminated sediment (ENVO:00002178), littoral zone (ENVO:01000407), low tide zone (ENVO:00000319), marine anoxic zone (ENVO:01000066), marine eulittoral zone (ENVO:01000410), marine oligotrophic desert (ENVO:01000073), marine sub-littoral zone (ENVO:01000126), marine supra-littoral zone (ENVO:01000124).
  ENVO ID: ENVO:01000408
  Link: http://purl.obolibrary.org/obo/ENVO_01000408
Geographic feature
  Definition: May appear on a map. Includes: archipelago (ENVO:00000220), volcanic arc (ENVO:00000351), Arrugado (ENVO:00000538), coast (ENVO:01000687), sea coast (ENVO:00000303), brackish estuary (ENVO:00002137), estuarine water (ENVO:01000301), estuarine mud (ENVO:00002160), saline wedge estuary (ENVO:00000226), lagoon (ENVO:00000038), mangrove swamp (ENVO:00000057), mudflat (ENVO:00000192), tidal mudflat (ENVO:00000241), sea beach (ENVO:00000092), sea cliff (ENVO:00000088), sea shore (ENVO:00000485), harbor (ENVO:00000463), marine current (ENVO:01000067). Further hydrographic feature (ENVO:00000012) such as algal bloom (ENVO:2000004), ocean floor (ENVO:00000426), sea floor (ENVO:00000482) and marine pelagic feature (ENVO:01000044) such as mid-ocean ridge (ENVO:00000406), ocean current (ENVO:00000147), ocean basin (ENVO:00002450), ocean trench (ENVO:00000275), marine reef (ENVO:01000143), island (ENVO:00000098), atoll (ENVO:00000166), marine hydrothermal vent chimney (ENVO:01000129), oil seep (ENVO:00002063), oil reservoir (ENVO:00002185), peninsula (ENVO:00000305), continental margin (ENVO:01000298), continental rise (ENVO:00000274), riffle (ENVO:00000148), saline evaporation pond (ENVO:00000055), tidal pool (ENVO:00000317).
  ENVO ID: ENVO:00000000
  Link: http://purl.obolibrary.org/obo/ENVO_00000000
Glacial feature
  Definition: A hydrographic feature characterized by the dominance of snow or ice. Including iceberg (ENVO:00000298), brine pool (ENVO:01000060).
  ENVO ID: ENVO:00000131
  Link: http://purl.obolibrary.org/obo/ENVO_00000131
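In the vocabulary tables, each link is the ENVO identifier with the colon replaced by an
underscore and appended to http://purl.obolibrary.org/obo/. A minimal Python sketch of that
mapping is given below; the function name and the accepted digit count (seven or eight
digits, covering both ENVO:00000447 and ENVO:2000004) are assumptions made for this
example.

```python
import re

# ENVO identifiers in the tables carry seven or eight digits after the prefix.
ENVO_ID = re.compile(r"^ENVO:(\d{7,8})$")


def envo_purl(envo_id):
    """Convert an identifier such as 'ENVO:00000447' into its OBO PURL."""
    match = ENVO_ID.match(envo_id.strip())
    if match is None:
        raise ValueError(f"not an ENVO identifier: {envo_id!r}")
    return f"http://purl.obolibrary.org/obo/ENVO_{match.group(1)}"


print(envo_purl("ENVO:00000447"))
```

Deriving the link from the identifier in this way avoids storing both values by hand and keeps the two columns consistent.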
Table 4. Controlled vocabularies for Environmental material
PreferredName
Definitions
ENVOID
Link
Environmental material
Everything under this parent must be a mass noun. All subclasses are to be
understood as being composed primarily of the named entity, rather than
restricted to that entity. For example, "ENVO:water" is to be understood as
"environmental material composed primarily of some CHEBI:water". This class
is currently being aligned to the Basic Formal Ontology. Following this
alignment, its definition and the definitions of its subclasses will be revised. A
portion of environmental material is a fiat object, which forms the medium or
part of the medium of an environmental system.
ENVO:00010483
http://purl.obolibrary.org/obo/ENVO_00010483
Clay
A group of hydrous aluminium phyllosilicate (phyllosilicates being a subgroup
of silicate minerals) minerals (see clay minerals), that are typically less than
2 micrometres in diameter. Clay consists of a variety of phyllosilicate minerals
rich in silicon and aluminium oxides and hydroxides, which include variable
amounts of structural water.
ENVO:00002982
http://purl.obolibrary.org/obo/ENVO_00002982
Marl
Marl is a mass of calcium carbonate derived from mollusk shells and mixed
with silt and clay.
ENVO:01000853
http://purl.obolibrary.org/obo/ENVO_01000853
Mud
A liquid or semi-liquid mixture of water and some combination of soil, silt, and
clay. Includes: anaerobic mud, arsenic-rich mud, estuarine mud, marine mud.
ENVO:01000001
http://purl.obolibrary.org/obo/ENVO_01000001
Particulate matter
Particulate material is an environmental material, which is composed of
microscopic portions of solid or liquid material suspended in another
environmental material.
ENVO:01000060
http://purl.obolibrary.org/obo/ENVO_01000060
Sand
A naturally occurring granular material composed of finely divided rock and
mineral particles. Includes acid dune sand, beach sand, rocky sand and sea
sand.
ENVO:01000017
http://purl.obolibrary.org/obo/ENVO_01000017
Sediment
Sediment is an environmental substance comprised of any particulate matter
that can be transported by fluid flow and which eventually is deposited as a
layer of solid particles on the bed or bottom of a body of water or other liquid.
Includes anaerobic sediment, biogenous sediment, boulder sediment, clay
sediment, cobble sediment, colloidal sediment, contaminated sediment,
granular sediment, hydrogenous sediment, hyperthermophilic sediment,
intertidal sediment, marine sediment, mesophilic sediment, pebble sediment,
sandy sediment, silty sediment, stream sediment, suspended sediment,
terrigenous sediment, thermophilic sediment.
ENVO:00002007
http://purl.obolibrary.org/obo/ENVO_00002007
Solid environmental material
An environmental material, which is in a solid state. This is a defined class: its
subclasses will not be asserted, but filled by inference. Includes mineral
material, rock, water ice and wood.
ENVO:01000814
http://purl.obolibrary.org/obo/ENVO_01000814
Waste material
A material which is not the desired output of a process and which is typically
the input of a process which removes it from its producer (e.g. a disposal
process). Includes biological waste material, industrial waste material and
wastewater.
ENVO:00002264
http://purl.obolibrary.org/obo/ENVO_00002264
Water
An environmental material primarily composed of dihydrogen oxide in its
liquid form. Includes: contaminated water (ENVO:00002186), eutrophic water
(ENVO:00002224), hydrothermal fluid (ENVO:01000134), muddy water
(ENVO:00005793), oil field production water (ENVO:00002194), saline water
or salt water (ENVO:00002010), sea water or ocean water (ENVO:00002149),
brackish water (ENVO:00002019), estuarine water (ENVO:01000301),
hypersaline water (ENVO:00002012), brine (ENVO:00003044), coastal sea
water (ENVO:00002150), surface water (ENVO:00002042).
ENVO:00002006
http://purl.obolibrary.org/obo/ENVO_00002006
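Several definitions above list subclasses inline as a term followed by its ENVO ID in parentheses. As an illustration only, the sketch below pulls those term and ID pairs out of a definition string so they can be handled as structured data. The function name `extract_terms` and the regular expression are assumptions made for this example; they rely on the "Includes: term (ENVO:ID), ..." layout used in the Water row and are not part of the database software.

```python
import re

# Assumption: a term is a run of text without commas, periods, colons or
# parentheses, directly followed by its ENVO ID in parentheses.
_TERM = re.compile(r"([^(),.:]+?)\s*\(\s*(ENVO:\d+)\s*\)")


def extract_terms(definition: str) -> list[tuple[str, str]]:
    """Return (term, ENVO ID) pairs found in a definition string."""
    return [(term.strip(), envo_id) for term, envo_id in _TERM.findall(definition)]


if __name__ == "__main__":
    # Shortened excerpt of the Water definition from Table 4.
    sample = (
        "Includes: contaminated water (ENVO:00002186), "
        "eutrophic water (ENVO:00002224), brine (ENVO:00003044)."
    )
    for term, envo_id in extract_terms(sample):
        print(f"{envo_id}\t{term}")
```

Definitions that introduce a list with running text instead of a colon (for example "Including iceberg ...") would carry that lead-in word into the first term, so the output of such rows needs a manual check.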