USER GUIDE (v1.15.5)

1. Introduction
    Rationale
    Scope and Intended Users
2. Data Processing
    Sample Acquisition
    Sample Filtering
    Phylogenetic Assignment
3. Website Design and Navigation
4. Search Functionality
    Comparison Search
        Interface Layout
        Search Functionality
        Naming Groups
        Selecting Samples
        Running Search
        Search Results
        Search Results: Analysis Overview
        Search Results: Species Coverage
        Search Results: Heat Map
    Microbiota Search
        Search Functionality
        Menu Options
        Search Function
        Search Results
        Search Results: Analysis Overview
        Search Results: Prevalence Distribution
        Search Results: Co-occurring Species Coverage
    Functional Search
        Search Functionality
        Search Results
        Search Results: Analysis Overview
        Search Results: Prevalence Distribution
        Search Results: Co-occurring Species Coverage
    Sequence Search
        Search Functionality
        Search Results
        Search Results: Analysis Overview
        Search Results: Prevalence Distribution
        Search Results: Co-occurring Species Coverage
5. Manual Annotation
    User Registration
    Annotation Methods
    Annotation Icons
    Performing Annotation
        Viewing Annotations

1. Introduction

The application of whole-genome metagenomic sequencing to uncover the microbial diversity present in almost every environment is revolutionizing many aspects of biology. In particular areas of metagenomic research, the underlying biological knowledge exists and the potential user groups are sufficiently broad to necessitate specific databases with specialized search functionality. The microbiota found within humans, and within the human gastrointestinal tract in particular, represents one such area. From basic research, disease association and biomarker discovery to intervention through bacteriotherapy, the importance of microbiota colonization to human health is continually emerging and represents an active area of research.

Rationale

While many tools exist for searching metagenomic data, the incorporation of biological (particularly microbiological) knowledge and species-level community structure into these datasets remains limited. Equally, manual curation of metagenomic datasets is frequently required to ensure that the maximum possible metadata is available for search across samples. The Human Pan-Microbe Community (HPMC) database provides this functionality in an easily accessible, user-friendly format.
Scope and Intended Users

The primary intended users of the HPMC database are biologists, particularly microbiologists, immunologists and medical scientists. The intention of this resource is to provide important, relevant data that enable these users to progress from basic correlation-based association studies to causative studies that fulfill Koch’s postulates and demonstrate causation. The basic user groups are as follows:

Microbiology researchers: identification and investigation of conditions where species or phylogenetic groups of interest are dominant or absent, and further exploration of the biology and function of important or poorly described species.

Clinical and disease-based researchers: identification of microbiota and community structures typical of a condition or disease state of interest that could enable strain-level pathogen detection and the identification of susceptibility biomarkers or bacteriotherapy candidates.

Genetics and genomics researchers: identification of conditions or communities where genes or functional capabilities exist, such as conditions with a dominance or absence of antimicrobial resistance or important metabolic pathways.

2. Data Processing

A highly selective, manually curated process is applied to ensure the continued high quality of the data contained within the HPMC database. This process relies initially on data submitted to the EBI Metagenomics Portal, which are re-analyzed using a specialized, standardized analysis pipeline. This pipeline integrates extensive genome collections and manually curated sample metadata. The resulting relational database is available for query through the web-based interface.

Sample Acquisition

All samples included in the HPMC database are acquired from the European Nucleotide Archive, with all new samples being included through the EBI Metagenomics Portal.
Samples are considered for inclusion in the database where they originate from human faecal samples and contain sufficient metadata to determine whether they originate from a healthy or diseased individual.

Sample Filtering

Sample data are integrated from high-quality metagenomics datasets within the EMP that are enhanced with manually curated sample metadata and a comprehensive, culture-derived genome collection for phylogenetic analysis. Samples where more than 25% of reads are filtered out due to poor read quality or significant human contamination are excluded at this point. In the current version of the database, 94 samples were excluded by this filtering. Historical, publicly available metagenomic samples that are available in the European Nucleotide Archive but are currently absent from the EBI Metagenomics Portal, and that pass these quality criteria, have also been included.

Reads from included samples undergo quality filtering with Trimmomatic v0.33, high-quality gene fragments are identified using FragGeneScan v1.19, and functional annotation is performed using InterProScan v5.0. This analysis method is equivalent to that described for the original EBI Metagenomics Portal. Identified gene fragments are also included in a reference BLAT database to provide sequence similarity search functionality.

Phylogenetic Assignment

Phylogenetic assignment uses a manually annotated list of known human microbiota-associated genomes in combination with the extensive set of high-quality genomes available through the NCBI genome assembly archive. Reads are assigned to genomes based on the ability to classify them in a phylogenetic context. These counts are then corrected for genome uniqueness to achieve a relative count of species abundance. Genome uniqueness is defined as the percentage of the genome where a 100 bp sliding window would uniquely identify that genome amongst all genomes contained in the database. This approach corrects for uneven genome coverage across phylogenetic groups, enabling direct comparison between species.
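The uniqueness correction described above can be illustrated with a minimal Python sketch. It is not the HPMC pipeline: the function names and the dictionary inputs are hypothetical, and dividing the raw read count by the unique fraction is an assumed reading of "corrected for genome uniqueness". The window length defaults to the 100 bp stated in the text.

```python
from collections import defaultdict


def genome_uniqueness(genomes, window=100):
    """Return, per genome, the fraction of sliding windows found in no other genome.

    genomes maps a genome name to its sequence string (hypothetical input format).
    """
    owners = defaultdict(set)  # window sequence -> names of genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - window + 1):
            owners[seq[i:i + window]].add(name)

    uniqueness = {}
    for name, seq in genomes.items():
        n_windows = len(seq) - window + 1
        if n_windows <= 0:
            # Genome shorter than one window: no window can identify it.
            uniqueness[name] = 0.0
            continue
        unique = sum(
            1 for i in range(n_windows) if owners[seq[i:i + window]] == {name}
        )
        uniqueness[name] = unique / n_windows
    return uniqueness


def corrected_counts(read_counts, uniqueness):
    """Scale raw read counts by 1 / uniqueness (ASSUMED form of the correction).

    Genomes with zero uniqueness are dropped because no read can be
    attributed to them unambiguously.
    """
    return {
        name: count / uniqueness[name]
        for name, count in read_counts.items()
        if uniqueness.get(name, 0.0) > 0.0
    }
```

With two toy genomes that share part of their sequence, `genome_uniqueness` returns a fraction below 1 for each, and `corrected_counts` scales their read counts up accordingly. Indexing every window of every genome in memory would not scale to a real genome collection, which would need a more compact index.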
The resulting corrected counts are subjected to log transformation, samples are standardized to a mean of 0, and multiple-sample scaling is performed according to best-practice metagenomic data analysis.

3. Website Design and Navigation

The website is designed to enable easy navigation and searching across various functions.

1. Menu Bar: The primary navigation is performed through the main horizontal menu bar. All the relevant high-level navigation options are always displayed in this location.
2. User Login: To access personalized features such as custom microbe annotation, it is possible to register and log in for free.
3. Version Number: The current version is always displayed at the bottom center of the screen. Major version changes (e.g. 1.x) are stored in the archive site indefinitely. The archive can be accessed from the help page.

4. Search Functionality

The HPMC database provides four main types of search:

Comparison Search: Compare samples to determine similarities and differences in microbiota communities.
Microbiota Search: Search for a microbial group, identifying common sample conditions and the community structure typically associated with the occurrence of that group.
Functional Search: Search for samples that contain species capable of providing particular functions or possessing specific functional characteristics.
Sequence Search: Search for samples that contain a particular sequence of interest and define the community structure of the microbiota associated with that sequence.

Detailed descriptions of each of these search functionalities are provided in the following sections.

Comparison Search

The comparison search functionality is designed to allow users to select two groups of samples and compare the characteristics of each group. If desired, it is also possible to use this functionality to search just a single group and obtain associated data. To access the comparison search functionality, click on the “Comparison Search” option in the menu bar. This will display the comparison search interface.
Interface Layout

This search interface is divided into two main sections. Panel 1 provides the ability to search samples by particular characteristics including age, gender and ethnicity. Panel 2 provides a list of samples within the database that fulfill the search criteria.

Search Functionality

1. Text Search: Enter any value into the text search area to limit samples to those including that text value.
2. Sample Type Search: Select to include all samples, only healthy samples, or only diseased or treated samples.
3. Age Range Search: Define the age range for samples to include. The range can be defined in years or days. To include only those samples that fit within the defined range, remember to unselect the “Include Unknown Age” checkbox.
4. Gender Search: Use this search option to include only samples originating from males or females.
5. Ethnicity Search: Select the ethnicity of the samples to include in the search.

Naming Groups

Group names can be defined by selecting the “Group Options” section of the search menu bar. This can be expanded by selecting the “+” option. Names can be specified to identify the groups. The total number of samples included in each group is also shown in this visualization.

Selecting Samples

To include samples in a group, select the box to the left of the sample information. Selecting the first group assigns the samples to the blue group, while selecting the second group assigns them to the orange group. For example, in the example above the first three samples are assigned to the Blue group (previously named “Group 1”), the next three samples are assigned to the Orange group (previously named “Group 2”), and the remaining three samples are currently not assigned to either group.

Running Search

Once the samples have been assigned to each group, click the “Run Analysis” button to start the search.
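The five search options listed under Search Functionality act as filters that are combined to narrow the sample list. The following Python sketch shows one way such filters could combine, including the effect of the “Include Unknown Age” checkbox. The `Sample` fields, the `matches` function and its parameter names are all hypothetical and do not describe the site's actual data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    """Hypothetical sample record; the field names are illustrative only."""
    sample_id: str
    description: str
    healthy: bool
    age_years: Optional[float]  # None when the age is unknown
    gender: Optional[str]
    ethnicity: Optional[str]


def matches(
    sample: Sample,
    text: str = "",
    sample_type: str = "all",  # "all", "healthy" or "abnormal"
    age_range: Optional[Tuple[float, float]] = None,
    include_unknown_age: bool = True,
    gender: Optional[str] = None,
    ethnicity: Optional[str] = None,
) -> bool:
    """Return True when the sample passes every active filter."""
    if text and text.lower() not in sample.description.lower():
        return False
    if sample_type == "healthy" and not sample.healthy:
        return False
    if sample_type == "abnormal" and sample.healthy:
        return False
    if age_range is not None:
        if sample.age_years is None:
            # Samples of unknown age are kept unless the checkbox is unselected.
            if not include_unknown_age:
                return False
        else:
            low, high = age_range
            if not (low <= sample.age_years <= high):
                return False
    if gender is not None and sample.gender != gender:
        return False
    if ethnicity is not None and sample.ethnicity != ethnicity:
        return False
    return True
```

In this sketch a sample with no recorded age passes an age filter by default, which is why the checkbox must be unselected to restrict results strictly to the chosen range.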
Search Results

There are three windows of search results associated with the comparison search functionality, which can be viewed by selecting from the following menu bar. These are described in detail below.

Search Results: Analysis Overview

The “Analysis Overview” section provides a detailed summary of the sample characteristics associated with each group. There are four components of this overview that can be accessed from the vertical menu bar.

Sample Type Distribution

The “Sample Type Distribution” displays the number of healthy and abnormal samples present in each group. Abnormal samples include diseased samples and those that have had interventions, such as antibiotics, that are likely to alter the microbiota composition.

Gender Distribution

The “Gender Distribution” displays the number of male and female samples included in each group.

Ethnicity Distribution

The “Ethnicity Distribution” displays the distribution of ethnicities represented in the selected samples for each group.

Age Distribution

An age histogram of the samples in each group is also provided to enable comparison between the selected groups.

Search Results: Species Coverage

The distribution of phylogenetic groups is displayed for browsing. There is a comprehensive menu for altering the values displayed, as detailed below. Placing the cursor over the bars in the graph will provide detailed information about the search results. The graph can also be exported using the dropdown box in the top right corner as described previously.

Menu Options

Phylogenetic Level: The phylogenetic level to display can be selected by clicking on the relevant letter (K: Kingdom, P: Phylum, C: Class, O: Order, F: Family, G: Genus, S: Species).
Format: The counts can be displayed as either raw counts of samples that fulfill the criteria or (as default) the percentage of the group that fulfills the criteria.
Search: The search option provides the ability to search for specific text, such as species names, and display only those groups that match the search.
Fold Ratio: Specify the difference that must be observed between Group 1 and Group 2 for the results to be included.
Expression Cutoff (% Reads): Specify the minimum corrected percentage of reads that a sample must have to be considered positive for a particular species.
Group 1 Minimum: The minimum percentage of Group 1 samples in which a phylogenetic group must be represented to be included in the results.
Group 2 Minimum: The minimum percentage of Group 2 samples in which a phylogenetic group must be represented to be included in the results.

Search Results: Heat Map

To examine the distribution of phylogenetic groups present in each sample, a heat map is displayed for each group. This can be viewed at any phylogenetic level, as described previously for the Species Coverage menu. Where there are more samples than can fit on the screen, scroll bars are displayed at the bottom of the window. Further information can be displayed by moving the cursor over the relevant point in the heat map. As with the other graphs on the site, these can be exported using the dropdown menu in the top right corner.

Microbiota Search

The microbiota search functionality allows one to identify the conditions in which particular phylogenetic groups are found across all samples included in the database. This provides important functionality to enable identification of the community structure associated with particular microbes or groups. To access the microbiota search functionality, click on the “Microbiota Search” option in the menu bar. This will display the microbiota search interface.

Search Functionality

The microbiota search is performed by selecting the phylogenetic group of interest and defining a detection level cutoff. Samples with the group of interest detected above this cutoff will be considered positive.
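The cutoff-based options described above (a per-sample cutoff that marks a sample as positive, the Group 1 and Group 2 minimums, and the Fold Ratio) can be read as a chain of simple filters. The Python sketch below is one possible interpretation, not the site's implementation. The function names are hypothetical, the boundary handling (`>=`) is an assumption, and treating the Fold Ratio as the larger group percentage divided by the smaller is also an assumption, since the guide does not define the direction of the ratio.

```python
def percent_positive(sample_abundances, cutoff):
    """Percentage of samples whose corrected % reads reaches the cutoff."""
    if not sample_abundances:
        return 0.0
    positive = sum(1 for value in sample_abundances if value >= cutoff)
    return 100.0 * positive / len(sample_abundances)


def passes_filters(group1, group2, cutoff,
                   fold_ratio=1.0, group1_min=0.0, group2_min=0.0):
    """Decide whether one phylogenetic group would be shown in the results.

    group1 and group2 are lists of corrected % reads for that phylogenetic
    group, one value per sample (hypothetical input format).
    """
    p1 = percent_positive(group1, cutoff)
    p2 = percent_positive(group2, cutoff)
    # Group minimums: the group must be represented in enough samples.
    if p1 < group1_min or p2 < group2_min:
        return False
    low, high = sorted((p1, p2))
    if high == 0.0:
        return False  # not detected in either group
    if low == 0.0:
        return True   # present in one group only: the ratio is unbounded
    # ASSUMPTION: the fold ratio is symmetric (larger / smaller).
    return high / low >= fold_ratio
```

For instance, a species detected in half of the Group 1 samples and a quarter of the Group 2 samples has a ratio of 2 under this reading, so it would be kept with a Fold Ratio of 2 and dropped with a Fold Ratio of 3.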
Menu Options

The following menu options are available in the Microbiota Search section:

Phylogenetic Search Level: Select the phylogenetic level at which to search.
Include Only Cultured Gastrointestinal Species: Select this option to include only those species that have been manually curated as gastrointestinal species. This classification is based on the fact that they have been cultured from human faecal samples.
Include Only Publicly Annotated Gastrointestinal Species: Species can also be annotated as gastrointestinal species through interactive community annotation. This option will display those species annotated in this way. Please note that all species manually annotated as cultured will also be returned in this category.
Detection Level Cutoff: The level above which detection will be considered positive.

Community annotation can also be performed through this menu by clicking the associated “Up Vote” arrow. Please see section 5 of this manual for more information about this feature.

Search Function

The selected phylogenetic group will be highlighted. Once a group has been selected, click “Run Analysis” to perform the search.

Search Results

There are three windows of search results associated with the microbiota search functionality, which can be viewed by selecting from the following menu bar. These are described in detail below.

Search Results: Analysis Overview

The “Analysis Overview” section provides a detailed summary of the sample characteristics associated with the searched microbial group compared to the background of other samples within the database. There are four components of this overview that can be accessed from the vertical menu bar.

Sample Type Distribution

The “Sample Type Distribution” displays the number of healthy and abnormal samples present in each group. Abnormal samples include diseased samples and those that have had interventions, such as antibiotics, that are likely to alter the microbiota composition.
Gender Distribution

The “Gender Distribution” displays the number of male and female samples included in each group.

Ethnicity Distribution

The “Ethnicity Distribution” displays the distribution of ethnicities represented in the samples containing the selected microbial group compared to the background set.

Age Distribution

An age histogram of the samples in each group is also provided to enable comparison between the selected groups.

Search Results: Prevalence Distribution

The prevalence distribution is displayed for samples where the searched microbial group is detected. This enables one to identify conditions where the searched microbe or phylogenetic group is particularly prevalent. More information can be seen by moving the pointer over the relevant sample in the graph.

Search Results: Co-occurring Species Coverage

To investigate community structure, the HPMC database provides an overview of the community that exists in the samples identified as containing the searched microbe, compared to the background set found across the rest of the database. This enables one to identify those species or phylogenetic groups that co-occur with the species of interest more frequently than would be expected by chance alone.

Menu Options

Phylogenetic Level: The phylogenetic level to display can be selected by clicking on the relevant letter (K: Kingdom, P: Phylum, C: Class, O: Order, F: Family, G: Genus, S: Species).
Format: The counts can be displayed as either raw counts of samples that fulfill the criteria or (as default) the percentage of the group that fulfills the criteria.
Search: The search option provides the ability to search for specific text, such as species names, and display only those groups that match the search.
Fold Ratio: Specify the difference that must be observed between Group 1 and Group 2 for the results to be included.
Expression Cutoff (% Reads): Specify the minimum corrected percentage of reads that a sample must have to be considered positive for a particular species.
Group 1 Minimum: The minimum percentage of Group 1 samples in which a phylogenetic group must be represented to be included in the results.
Group 2 Minimum: The minimum percentage of Group 2 samples in which a phylogenetic group must be represented to be included in the results.

Functional Search

To access the functional search functionality, click on the “Functional Search” option in the menu bar. This will display the functional search interface.

Search Functionality

The functional search provides the ability to identify samples where a particular gene ontology is represented. This enables users to search for groups of species that share a common function.

Search Results

There are three windows of search results associated with the functional search functionality, which can be viewed by selecting from the following menu bar. These are described in detail below.

Search Results: Analysis Overview

The “Analysis Overview” section provides a detailed summary of the sample characteristics associated with the searched microbial group compared to the background of other samples within the database. There are four components of this overview that can be accessed from the vertical menu bar.

Sample Type Distribution

The “Sample Type Distribution” displays the number of healthy and abnormal samples present in each group. Abnormal samples include diseased samples and those that have had interventions, such as antibiotics, that are likely to alter the microbiota composition.

Gender Distribution

The “Gender Distribution” displays the number of male and female samples included in each group.
Ethnicity Distribution

The “Ethnicity Distribution” displays the distribution of ethnicities represented in the samples containing the selected microbial group compared to the background set.

Age Distribution

An age histogram of the samples in each group is also provided to enable comparison between the selected groups.

Search Results: Prevalence Distribution

The prevalence distribution is displayed for samples where the searched microbial group is detected. This enables one to identify conditions where the searched microbe or phylogenetic group is particularly prevalent. More information can be seen by moving the pointer over the relevant sample in the graph.

Search Results: Co-occurring Species Coverage

To investigate community structure, the HPMC database provides an overview of the community that exists in the samples identified as containing the searched microbe, compared to the background set found across the rest of the database. This enables one to identify those species or phylogenetic groups that co-occur with the species of interest more frequently than would be expected by chance alone.

Menu Options

Phylogenetic Level: The phylogenetic level to display can be selected by clicking on the relevant letter (K: Kingdom, P: Phylum, C: Class, O: Order, F: Family, G: Genus, S: Species).
Format: The counts can be displayed as either raw counts of samples that fulfill the criteria or (as default) the percentage of the group that fulfills the criteria.
Search: The search option provides the ability to search for specific text, such as species names, and display only those groups that match the search.
Fold Ratio: Specify the difference that must be observed between Group 1 and Group 2 for the results to be included.
Expression Cutoff (% Reads): Specify the minimum corrected percentage of reads that a sample must have to be considered positive for a particular species.
Group 1 Minimum: The minimum percentage of Group 1 samples in which a phylogenetic group must be represented to be included in the results.
Group 2 Minimum: The minimum percentage of Group 2 samples in which a phylogenetic group must be represented to be included in the results.

Sequence Search

To access the sequence search functionality, click on the “Sequence Search” option in the menu bar. This will display the sequence search interface.

Search Functionality

The sequence search functionality enables users to identify samples that contain a particular sequence of interest and then define the community structure of the microbiota associated with this sequence.

Search Results

There are three windows of search results associated with the sequence search functionality, which can be viewed by selecting from the following menu bar. These are described in detail below.

Search Results: Analysis Overview

The “Analysis Overview” section provides a detailed summary of the sample characteristics associated with the searched microbial group compared to the background of other samples within the database. There are four components of this overview that can be accessed from the vertical menu bar.

Sample Type Distribution

The “Sample Type Distribution” displays the number of healthy and abnormal samples present in each group. Abnormal samples include diseased samples and those that have had interventions, such as antibiotics, that are likely to alter the microbiota composition.

Gender Distribution

The “Gender Distribution” displays the number of male and female samples included in each group.
Ethnicity Distribution

The “Ethnicity Distribution” displays the distribution of ethnicities represented in the samples containing the selected microbial group compared to the background set.

Age Distribution

An age histogram of the samples in each group is also provided to enable comparison between the selected groups.

Search Results: Prevalence Distribution

The prevalence distribution is displayed for samples where the searched microbial group is detected. This enables one to identify conditions where the searched microbe or phylogenetic group is particularly prevalent. More information can be seen by moving the pointer over the relevant sample in the graph.

Search Results: Co-occurring Species Coverage

To investigate community structure, the HPMC database provides an overview of the community that exists in the samples identified as containing the searched sequence, compared to the background set found across the rest of the database. This enables one to identify those species or phylogenetic groups that co-occur with the species of interest more frequently than would be expected by chance alone.

Menu Options

Phylogenetic Level: The phylogenetic level to display can be selected by clicking on the relevant letter (K: Kingdom, P: Phylum, C: Class, O: Order, F: Family, G: Genus, S: Species).
Format: The counts can be displayed as either raw counts of samples that fulfill the criteria or (as default) the percentage of the group that fulfills the criteria.
Search: The search option provides the ability to search for specific text, such as species names, and display only those groups that match the search.
Fold Ratio: Specify the difference that must be observed between Group 1 and Group 2 for the results to be included.
Expression Cutoff (% Reads): Specify the minimum corrected percentage of reads that a sample must have to be considered positive for a particular species.
Group 1 Minimum: The minimum percentage of Group 1 samples in which a phylogenetic group must be represented to be included in the results.
Group 2 Minimum: The minimum percentage of Group 2 samples in which a phylogenetic group must be represented to be included in the results.

5. Manual Annotation

User Registration

To provide manual annotation of species you must be a registered user. Registration is an automatic process that is free for all users who wish to join. Please note that while it is possible to prevent your name from appearing in the public statistics section, it is not possible to prevent your annotations from contributing to the total annotations available to all users for the benefit of the whole community. The user registration form can be accessed from any page by clicking on the register link and appears as follows:

Once registered, users are immediately able to contribute to the community annotation.

Annotation Methods

Community annotation contributions can be provided through the specific “Microbiota Annotation” window. Alternatively, annotations can be added directly when using the Microbiota Search functionality.

Annotation Icons

There are four key icons associated with the annotation. Each icon has one of the following meanings:

Cultured: This symbol indicates that this species or phylogenetic group has been cultured and a genome sequence generated from a human faecal sample.
Community annotated and approved: This symbol indicates that this species or phylogenetic group has been annotated as occurring in the human gastrointestinal tract by the community and approved by moderators; however, no or limited culturing data exist.
Community annotated, not verified: This symbol indicates that this species or phylogenetic group has been annotated as occurring in the human gastrointestinal tract by the community but has not been independently verified by moderators.
Not reported: This symbol indicates that this species or phylogenetic group has not been reported to occur in the human gastrointestinal tract. This could be due to limited data or because it does not occur in this environment.

Performing Annotation

To perform annotation, simply click the Blue Up Arrow next to the species or phylogenetic group name. If the arrow displayed is grey, you will need to log on before you can make annotations. Your annotation will be recorded in your user profile and will continue to be displayed as a tick when you next log on.

Viewing Annotations

When you view your own annotations they will always be displayed as a green tick. This will be visible where the “Up Arrow” was previously located. The community annotation will remain visible in the first column.