Functional annotation
Software manual

Contents

1. Introduction
2. System Requirements
3. Visual overview of the user interface
3.1 Log-on / signing up
3.2 Main application
3.2.1 Menu bar
3.2.2 Side Bar
3.2.2.1 Open File
3.2.2.2 Save Data
3.2.2.3 Add Proteins
3.2.2.4 Run Search
3.2.2.5 Edit
3.2.2.6 Calculate Enrichment
3.2.3 Three view modes: Data Table, Tree Map, Enrichment
3.2.3.1 Data table View
3.2.3.2 Tree Map View
3.2.3.3 Enrichment View
3.2.4 Protein search dialog
3.2.5 Calculate functional term enrichment
3.2.6 Editor
3.2.6.1 User rights and roles
3.2.6.2 Overview
3.2.6.3 Protein
3.2.6.4 Synonyms
3.2.6.5 Description
3.2.6.6 Functions
3.2.6.7 Gene Model
3.2.7 Bin Chooser
3.2.7.1 Evidence codes decision tree
4. References

1. Introduction

Using controlled vocabularies or functional ontologies for functional annotation simplifies the exploration and interpretation of high-throughput data by humans and machines. The MapMan BIN ontology was initiated by Thimm et al. (2004) and can be used as a controlled classification system that is particularly suited to plants. The MapMan ontology is represented as a hierarchically structured tree that currently comprises 35 main biological categories grouping 1854 functional categories (BINs). It follows the paradigm of assigning a protein/gene to as few BINs as possible without losing information. Functional analysis based on hierarchically structured annotations can be used directly and consistently for cross-species and cross-experiment comparisons.
In contrast, functional annotations using GO terms usually consist of a collection of several terms that describe the functional properties of the respective protein only when considered together.

In addition to the MapMan ontology, our Functional annotation tool (Fa) allows the use of a newly developed protein localization ontology. This ontology is also designed as a hierarchical tree structure and currently comprises 13 main localization terms and 49 sub-localization terms.

Our Functional annotation tool is a comprehensive data analysis environment that runs as a web-based cloud service and allows users to work with the ontologies interactively. The main functionalities of the application are:

• Batch gene/protein identifier mapping and functional annotation
• Gene/protein database searching
• TreeMap based ontology visualization and interactive browsing
• Ontology term enrichment - testing and visualization
• Annotation of MapMan protein/gene functions and assignment of localization ontologies according to the user's expertise
• Addition of protein/gene synonyms
• Addition of protein/gene descriptions as free text

2. System Requirements

The Functional Annotation tool is a web-based client application using Microsoft Silverlight. It is compatible with all browsers that have Microsoft Silverlight or Moonlight installed. Microsoft Silverlight is freely available for Microsoft Windows and Mac OS at http://www.microsoft.com/silverlight . Moonlight, the Mono project's open source implementation of Microsoft Silverlight for UNIX systems, can be found at http://www.go-mono.com/moonlight/ .

3. Visual overview of the user interface

The Functional Annotation tool (Fa) is part of the IOMIQS framework application collection and can be used at the preliminary address http://iomiqsdev.mpimp-golm.mpg.de .

An IOMIQS user account is needed to use the full functionality of the tool.
With the Guest login it is not possible to add protein/gene annotations, and Fa can only be used for batch annotations, enrichment calculations and the visualization of data sets.

3.1 Log-on / signing up

Open the main page at http://iomiqsdev.mpimp-golm.mpg.de and click on Fa Functional Annotator. Enter your user name and password. If you do not have an account yet, click on sign up and enter your credentials. Attention: the password requires six or more characters.

3.2 Main application

The main application is designed for work on data tables. The side bar is on the left side of the window. The menu bar is at the top of the main window. Here you can choose the organism, the data source (gene model type) and the ontology type (MapMan or Localization) you want to work with. The main table is in the center and is empty by default. The Fa Functional Annotator features two different workflows:

Workflow 1: no data set
In this case users select the genes/proteins they wish to work with. This is done by pressing the Add Proteins button; the selected genes/proteins then appear in the data table. This data table can be saved as a tab-separated text file by using the Save Data button.

Workflow 2: data set provided by the user
Here the user imports a data set. Data sets must be tab-separated text files and must contain at least one column with gene/protein identifiers or their synonyms. In addition, they may contain other data of any type. Data sets are imported by pressing the Open File button. Do not forget to check or uncheck the Table has Column Names checkbox before loading the file. Protein, Bin(s), Bin Ontology and Search Synonym are columns that are added to your data table automatically.

3.2.1 Menu bar

In addition to the fields Organism, Data source and Ontology type, the menu bar contains radio buttons termed Data Table, Tree Map and Enrichment.
The latter allow you to switch between the view modes featured by the Fa tool.

Remark: The data source you choose describes the gene model version for which you want to get the annotation identifiers. Your data set does not need to contain these identifiers, as the batch annotation mode is backward compatible (as long as the gene model includes the current identifier as a synonym).

3.2.2 Side Bar

The side bar on the left contains buttons for the main functions of the Fa tool, which are explained below.

3.2.2.1 Open File

Open File opens a dialog that enables the user to import a data set as a tab-separated text file. Before opening a file, remember to check or uncheck the Table has column names checkbox.

3.2.2.2 Save Data

Save Data opens a dialog for exporting the current data table as a tab-separated text file.

3.2.2.3 Add Proteins

Add Proteins opens the protein search dialog and allows the user to create a table of genes/proteins for further editing. See Protein search dialog below for more details.

3.2.2.4 Run Search

Run Search maps protein identifiers and functional annotations to the current data set in batch mode. All entries in the data set need to be selected, and the process is started by clicking the Run Search button. A pop-up window appears in which the user selects the column containing the synonyms and confirms the selection by pressing the Run button. A synonym may be a gene/protein identifier or the corresponding gene/protein name.

Attention: You need to have annotations mapped to your data set before you can start working with it.

3.2.2.5 Edit

Edit opens the Entry editor to add or modify information on the selected gene/protein. See Editor below for more details.

3.2.2.6 Calculate Enrichment

Calculate Enrichment is only active in the Tree Map mode (radio button in the menu bar). It opens the Bin enrichment calculator. See Calculate functional term enrichment below for more details.
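
The input format expected by Open File and Run Search (a tab-separated text file with an optional row of column names and at least one column of identifiers or synonyms) can be checked before upload. The following is a minimal sketch of such a check and is not part of the Fa tool: the function names, the column names GeneID, Cluster and log2FC, the placeholder naming scheme Column1, Column2, ... and the example identifiers are all assumptions made for illustration.

```python
import csv
import io


def read_tsv(text, has_column_names=True):
    """Parse tab-separated text into (column_names, rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [], []
    if has_column_names:
        # First row holds the column names (checkbox checked).
        return rows[0], rows[1:]
    # No header row (checkbox unchecked): generate placeholder names.
    # The naming scheme is hypothetical, not the one used by the tool.
    names = [f"Column{i + 1}" for i in range(len(rows[0]))]
    return names, rows


def column_values(column_names, rows, name):
    """Return all values of one column, e.g. the synonym column."""
    idx = column_names.index(name)
    return [row[idx] for row in rows if idx < len(row)]


# Hypothetical example data set: identifiers plus arbitrary extra columns.
example = (
    "GeneID\tCluster\tlog2FC\n"
    "AT1G01010\t1\t0.8\n"
    "AT1G01020\t2\t-1.3\n"
)

column_names, rows = read_tsv(example, has_column_names=True)
synonyms = column_values(column_names, rows, "GeneID")
print(column_names)
print(synonyms)
```

Reading the same text with has_column_names=False treats the first row as data, which mirrors what happens when the Table has Column Names checkbox is left in the wrong state.
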

3.2.3 Three view modes: Data Table, Tree Map and Enrichment

As mentioned above, the view mode is selected with the radio buttons in the menu bar.

3.2.3.1 Data table View

The Data table view shows all columns of the data set that the user has loaded into the Fa tool. It always contains the five columns Protein, #Hits, Bin(s), Bin Ontology and Search Synonym. These columns are present by default even if the user does not import data but creates their own data table with the Add Proteins function.

Protein refers to the protein identifier of the given Search Synonym. The Search Synonym can be any synonym referring to a protein (gene/protein identifier or gene/protein name). Bin Ontology indicates the ontology used. Bin(s) refers to the code of the ontology term. #Hits shows the number of protein identifiers matching the respective Search Synonym. This should be "1", as gene/protein synonyms should be distinct. If this is not the case, the user can correct the mapping by selecting the respective entry and choosing the correct match from the drop-down box that opens when clicking on the arrow next to the entry in the Protein column.

REMARK: In case of multiple matches, please correct the synonym entries of the respective gene/protein.

3.2.3.2 Tree Map View

The Tree Map view is a visualization of the hierarchically organized ontology structure. It visualizes all entries of the data table dynamically. Each box represents an ontology term, and the box size reflects the (log-transformed) number of members within this term. The color corresponds to one of the 35 top ontology terms and becomes lighter the deeper the respective term is in the hierarchy. After enrichment calculation, ontology terms with a p-value smaller than the chosen threshold are encircled by dashed lines.
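
The p-values that decide which terms receive a dashed outline come from a hypergeometric test (Rivals, Personnaz et al. 2007). The following is a minimal sketch of such a test, assuming a one-sided test for over-representation and the standard definition of the mid p-value (only half of the probability of the observed count is included). It is not the tool's implementation; the function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, universe, term_size, group_size):
    """P(X = k): probability that exactly k members of a term fall into
    a group of group_size proteins drawn from universe proteins, of
    which term_size belong to the term."""
    return (
        comb(term_size, k)
        * comb(universe - term_size, group_size - k)
        / comb(universe, group_size)
    )


def enrichment_p_value(k, universe, term_size, group_size, mid_p=False):
    """One-sided enrichment p-value P(X >= k).

    With mid_p=True the standard mid p-value is returned:
    P(X > k) + 0.5 * P(X = k).
    """
    upper = min(term_size, group_size)
    tail = sum(
        hypergeom_pmf(i, universe, term_size, group_size)
        for i in range(k, upper + 1)
    )
    if mid_p:
        tail -= 0.5 * hypergeom_pmf(k, universe, term_size, group_size)
    return tail


# Hypothetical numbers: 20 proteins in the universe, 5 of them in the term,
# 5 proteins in the selected group, 3 of which belong to the term.
p = enrichment_p_value(3, universe=20, term_size=5, group_size=5)
p_mid = enrichment_p_value(3, universe=20, term_size=5, group_size=5, mid_p=True)

threshold = 0.05  # corresponds to the value entered in the Threshold box
print(p, p < threshold)
print(p_mid, p_mid < threshold)
```

The universe argument corresponds to the choice made in the Universe field of the enrichment wizard: either the size of the input data or the size of the whole data present for the organism.
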

3.2.3.3 Enrichment View

The Enrichment View shows the results of the ontology term enrichment calculation as a table once the calculation has been performed (see Calculate functional term enrichment below for further information). The four columns Bin, Bin Name, Bin Enrichment and Proteins contain the following information: Bin is the code of the ontology term. Bin Name is the name of the ontology term. Bin Enrichment contains the p-value calculated with a hypergeometric test for the respective functional term (Rivals, Personnaz et al. 2007). Proteins shows all genes/proteins within this ontology term that were used for the calculation.

3.2.4 Protein search dialog

The Protein search dialog opens after pressing the Add Proteins button in the side bar. It enables the user to search for specific proteins/genes of interest and to add them to a working data set for further editing. The search is set up as a "contains" search: every entry that contains the search phrase is shown in the search results. A search result can be selected and added to the working table. By default, only synonyms are searched; unchecking the option Search Synonyms Only results in a full text search that also includes the description text of the genes/proteins.

3.2.5 Calculate functional term enrichment

Functional term enrichment or Gene set enrichment analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. For calculating GSEA, the data set must contain a column defining the gene/protein set (group of genes), for example the cluster membership derived by a cluster algorithm, encoded as integers (finite number). In the side bar, the button Calculate Enrichment starts the calculation wizard (enabled only in the Tree Map View mode). Group Column selects the column containing the set identifiers.
Group Value marks the value of the set for which the enrichment is calculated. In the field Universe you define whether enrichment is calculated in the context of the input data or in the context of the whole data present for the respective organism. In the Threshold box you can select a p-value below which an enrichment is considered significant; the corresponding terms are encircled by dashed lines in the Tree Map visualization. Mid pValue Threshold sets the number of term members below which a mid p-value is calculated. This is used to penalize terms having only few members. Min Number in Terms sets the minimum number of members a term must contain; terms with fewer members are not considered. The p-value calculation is based on a hypergeometric test (Rivals, Personnaz et al. 2007).

3.2.6 Editor

The main concept of the editor is that users can add information according to their user rights. No user can modify an existing entry. Genes/proteins with ambiguous term assignments are automatically sent to a consortium of experts, who decide which entry should be considered the valid one. Every change made to a user's data is submitted directly to the database, so by enriching your own data set you increase the knowledge of the community. The exception is the guest user, who is only allowed to see changes older than 6 months.

3.2.6.1 User rights and roles

• Guest: As a guest you cannot edit gene/protein information. In addition, guest users only see entries older than 6 months.
• Annotator: As an annotator you are allowed to enter functional annotations (MapMan or Localization) and descriptions as free text.
• SuperAnnotator: As a SuperAnnotator you are also allowed to enter gene/protein synonyms.
• Admin: The admin can change every field.
• Organism: Users may have access to only some organisms.

Once you have pressed the Edit button, a window appears with the following information:

3.2.6.2 Overview

The overview provides a compact view of what is known about the selected gene/protein. In addition, a gene model picture from http://www.phytozome.net is loaded.

3.2.6.3 Protein

Protein shows the distinct protein identifier, which is the gene identifier extended by ".pX". X stands for a number that distinguishes the different protein sequences deriving from the same transcript (mature protein, splicing variant, etc.).

3.2.6.4 Synonyms

Under Synonyms the user (if provided with SuperAnnotator rights) can edit the list of different names assigned to the selected gene/protein. The Value column contains the name (synonym), Type specifies the type of the synonym and Source states where the entry is derived from.

3.2.6.5 Description

In the Description, DefLine and References fields the user may enter the respective information as free text. User indicates the name of the user who entered this information.

3.2.6.6 Functions

Under Functions the associated ontology terms are shown; new terms can be added by pressing the Add button (see Bin Chooser below for further details).

3.2.6.7 Gene Model

Gene model provides information on the transcript, i.e., the locus identifier and the position of the transcript in the genome.

3.2.7 Bin Chooser

The Bin Chooser allows adding controlled ontology terms (currently MapMan and Localization) to a gene/protein. In the field Bin Type the current ontology can be selected, and all terms available for this ontology are shown in the main window. The user may choose among them and add a term to the currently selected gene/protein. Search Bins reduces the selectable terms by a search term. Selected Mapping shows the term currently selected by the user.
Mapping Description allows the user to comment on the annotation as free text. Mapping Type specifies whether the annotation refers to a biological process the gene/protein participates in or to the molecular function of the gene/protein. As the MapMan ontology is based on the description of the biological process, biological process is used as the default. The Evidence Code provides additional information on why the user has assigned a particular annotation to the gene/protein. The Evidence codes decision tree (shown below) may help to select a proper evidence code.

3.2.7.1 Evidence codes decision tree

Source: http://www.geneontology.org/GO.evidence.tree.shtml

What type of evidence is the annotation based on?

Experimental (wet lab)
  Is annotation based on genetic mutations or allelic variation?
    yes: Is a single gene being mutated or compared to other alleles of the same gene?
      yes: IMP
      no: Is more than one gene being mutated in the same strain?
        yes: IGI
    no: Is annotation based on a direct 1 to 1 physical interaction with another gene product?
      yes: IPI
      no: Is annotation based on a direct assay for the function, process, or component of the gene product?
        yes: IDA
        no: Is annotation based on the expression pattern of the gene product?
          yes: IEP

Computational method
  Will each annotation be individually reviewed and confirmed by a human annotator?
    yes (curator reviewed annotations): Does the computation include consideration of the genomic context of the gene?
      yes: IGC
      no: Is the computation an integrated analysis, typically including experimental data sets, and often including multiple data types?
        yes: ICA
        no: Is the computation based purely on the sequence of the gene product (or sequence-based mapping files)?
          yes: ISS
    no (annotations NOT reviewed by a curator): IEA

Author statement from publication
  Is annotation based on an author statement that cites a published reference as the source of information?
    yes: TAS
    no: Is annotation based on an author statement that does not cite a published reference as the source of information?
      yes: NAS

No evidence is available
  Is there a GO annotation in another aspect that allows you to make an inference based on that GO term for an aspect without evidence?
    yes: IC
    no: Have you been able to find any evidence to support a GO annotation in a given GO aspect? [see note on use of ND]
      no: ND

4. References

Rivals, I., L. Personnaz, et al. (2007). "Enrichment or depletion of a GO category within a class of genes: which test?" Bioinformatics 23(4): 401-407.

Thimm, O., O. Bläsing, et al. (2004). "MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes." Plant J 37(6): 914-939.