Functional annotation
Contents
1. Introduction
2. System Requirements
3. Visual overview of the user interface
3.1 Log-on / signing up
3.2 Main application
3.2.1 Menu bar
3.2.2 Side Bar
3.2.2.1 Open File
3.2.2.2 Save Data
3.2.2.3 Add Proteins
3.2.2.4 Run Search
3.2.2.5 Edit
3.2.2.6 Calculate Enrichment
3.2.3 Three view modes: Data Table, Tree Map, Enrichment
3.2.3.1 Data table View
3.2.3.2 Tree Map View
3.2.3.3 Enrichment View
3.2.4 Protein search dialog
3.2.5 Calculate functional term enrichment
3.2.6 Editor
3.2.6.1 User rights and roles
3.2.6.2 Overview
3.2.6.3 Protein
3.2.6.4 Synonyms
3.2.6.5 Description
3.2.6.6 Functions
3.2.6.7 Gene Model
3.2.7 Bin Chooser
3.2.7.1 Evidence codes decision tree
4 References
software manual
1. Introduction
Using controlled vocabularies for functional annotation or functional ontology simplifies the exploration
and interpretation of high-throughput data by humans and machines.
The MapMan Bin ontology was initiated by Thimm et al. (2004) and can be used as a controlled classification
system that is particularly suited for plant organisms. The MapMan ontology is represented as a
hierarchically structured tree that currently comprises 35 main biological categories grouping 1854
functional categories (BINs). It follows the paradigm of assigning a protein/gene to as few BINs as
possible without losing information. Functional analysis based on hierarchically structured annotations can
be used directly and consistently for cross-species and cross-experiment comparisons. In contrast, functional
annotations using GO terms usually consist of a collection of several terms that describe the functional
properties of the respective protein only when considered together.
In addition to the MapMan ontology, our Functional annotation tool (Fa) allows the use of a newly
developed, hierarchically organized protein localization ontology. This ontology is also designed as a
hierarchical tree structure and currently comprises 13 main localization terms and 49 sub-localization
terms.
Our Functional annotation tool is a comprehensive data analysis environment running as a web-based
cloud service that allows users to work with the ontologies interactively.
The main functionalities of the application are:
• Batch gene/protein identifier mapping and functional annotation
• Gene/protein database searching
• TreeMap-based ontology visualization and interactive browsing
• Ontology term enrichment testing and visualization
• Annotation of MapMan protein/gene functions and assignment of localization ontologies
according to the user’s expertise.
• Addition of protein/gene synonyms
• Addition of protein/gene descriptions as free text
2. System Requirements
The Functional Annotation tool is a web-based client application built on Microsoft Silverlight. It is
compatible with all browsers that have Microsoft Silverlight or Moonlight installed.
Microsoft Silverlight is freely available for Microsoft Windows and Mac OS at
http://www.microsoft.com/silverlight . Moonlight, the open source implementation of Microsoft Silverlight
for UNIX systems from the Mono project, can be found at http://www.go-mono.com/moonlight/ .
3. Visual overview of the user interface
The Functional Annotation tool (Fa) is part of the IOMIQS framework application collection and can be
accessed at the preliminary address http://iomiqsdev.mpimp-golm.mpg.de .
An IOMIQS user account is needed to use the full functionality of the tool. With the Guest login it is
not possible to add protein/gene annotations; Fa can then only be used for batch annotations,
enrichment calculations and the visualization of data sets.
3.1 Log-on / signing up
Open the main page at http://iomiqsdev.mpimp-golm.mpg.de and
click on Fa Functional Annotator. Enter your user name and password.
If you don’t have an account yet, click on sign up and enter your
credentials.
Attention: the password must contain six or more characters.
3.2 Main application
The main application is designed for work on data tables. On the left side of the window you find the
side bar. On top of the main window is the menu bar. Here you can choose the organism, the data source
(gene model type) and the ontology type (MapMan or Localization) you want to work with. In the
center you find the main table, which is empty by default.
The Fa Functional Annotator features two different workflows:
Workflow 1 – no data set
In this case users select the genes/proteins they wish to work with. This is done by pressing the Add
Proteins button; the selected genes/proteins then appear in the data table. This data table may be saved as
a tab-separated text file by using the Save Data button.
Workflow 2 - data set provided by the user
Here data sets may be imported by the user. Data sets must be tab-separated text files and must contain
at least one column with gene/protein identifiers or their synonyms. In addition, they may contain other
data of any type. Data sets can be imported by pressing the Open File button. Do not forget to check or
uncheck the Table has Column Names checkbox before loading the file. The columns Protein, #Hits, Bin(s),
Bin Ontology and Search Synonym are added to your data table automatically.
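To illustrate the expected input format, the following Python sketch writes a small tab-separated data set with a header row and reads it back. The column names (GeneId, Cluster) and the identifiers are made up for this example and are not prescribed by the Fa tool; only the tab-separated layout and the presence of an identifier column come from the description above.

```python
import csv
import io

# Hypothetical example data: one identifier column plus arbitrary extra data.
rows = [
    ["GeneId", "Cluster"],   # header row, so "Table has Column Names" would be checked
    ["GENE0001", "1"],
    ["GENE0002", "2"],
]

def write_tsv(rows):
    """Serialize rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()

def read_tsv(text, has_column_names=True):
    """Parse tab-separated text and return (header or None, data rows)."""
    parsed = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if has_column_names:
        return parsed[0], parsed[1:]
    return None, parsed

text = write_tsv(rows)
header, data = read_tsv(text)
```

If the file has no header row, calling read_tsv(text, has_column_names=False) treats the first line as data, which mirrors the effect of leaving the checkbox unchecked.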
3.2.1 Menu bar
In addition to the fields Organism, Data source and Ontology type, the menu bar contains radio buttons
termed Data Table, Tree Map and Enrichment. The latter allow you to switch between the different view
modes featured by the Fa tool.
Remark: The data source you choose describes the gene model version for which you want to get the
annotation identifiers. It is not necessary that your data set contains these identifiers, as the batch
annotation mode is backward compatible (as long as the gene model includes the current identifier as a
synonym).
3.2.2 Side Bar
The side bar on the left contains buttons for the main functions of the Fa tool, which are explained in the
following.
3.2.2.1 Open File
Open File opens a dialog that enables the user to import a data set as a tab-separated text file. Before
opening a file, remember to check or uncheck the Table has Column Names checkbox.
3.2.2.2 Save Data
Save Data opens a dialog for exporting the current data table as a tab-separated text file.
3.2.2.3 Add Proteins
Add Proteins opens the protein search dialog and allows the user to create a table of genes/proteins for
further editing. See Protein search dialog below for more details.
3.2.2.4 Run Search
Run Search maps protein identifiers and functional annotations to the current data set in batch mode. All
entries in the data set need to be selected; the process is then started by clicking the Run Search button. A
pop-up window appears in which the user selects the column containing the synonyms and confirms by
pressing the Run button. A synonym may be a gene/protein identifier or the corresponding gene/protein
name.
Attention: You need to have annotations mapped to your data set before you can start working with it.
3.2.2.5 Edit
Edit opens the Entry editor to add or modify information on the selected gene/protein. See Editor
below for more details.
3.2.2.6 Calculate Enrichment
Calculate Enrichment is only active in the Tree Map mode (radio button in menu bar). It opens the Bin
enrichment calculator. See section on Calculate functional term enrichment below for more details.
3.2.3 Three view modes: Data Table, Tree Map and Enrichment
As mentioned above, the view mode is selected with the radio buttons in the menu bar.
3.2.3.1 Data table View
The Data table view shows all columns of the data set that the user has loaded into the Fa tool.
It always contains the five columns Protein, #Hits, Bin(s), Bin Ontology and Search Synonym. These
columns are present by default even if the user does not import data but creates a new data table with the
Add Proteins function. Protein refers to the protein identifier of the given Search Synonym. The Search
Synonym can be any synonym referring to a protein (gene/protein identifier or gene/protein name). Bin
Ontology indicates the ontology used. Bin(s) refers to the code of the ontology term. #Hits shows the
number of protein identifiers matching the respective Search Synonym. This should be “1”, as gene/protein
synonyms should be unique. If this is not the case, the user can correct the mapping by selecting
the respective entry and choosing the correct match from a drop-down box, which is opened by clicking the
arrow next to the entry in the Protein column.
REMARK: In case of multiple matches, please correct the synonym entries of the respective gene/protein.
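The meaning of the #Hits column can be sketched in a few lines of Python. The synonym index, the identifiers and the function names below are hypothetical and only illustrate how a search synonym that matches more than one protein identifier produces a value greater than 1; they do not reflect the tool's internal data model.

```python
# Hypothetical synonym index: protein identifier -> set of known synonyms.
synonym_index = {
    "GENE0001.p1": {"GENE0001", "abc1"},
    "GENE0002.p1": {"GENE0002", "xyz"},
    "GENE0003.p1": {"GENE0003", "xyz"},   # "xyz" is ambiguous on purpose
}

def find_hits(search_synonym, index):
    """Return the sorted protein identifiers whose synonyms include the search synonym."""
    return sorted(pid for pid, syns in index.items() if search_synonym in syns)

def hits_column(search_synonyms, index):
    """Compute the #Hits value for every search synonym."""
    return {s: len(find_hits(s, index)) for s in search_synonyms}

hits = hits_column(["abc1", "xyz", "unknown"], synonym_index)
```

In this sketch "abc1" yields one hit, "xyz" yields two hits (the case that needs manual correction) and "unknown" yields none.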
3.2.3.2 Tree Map View
The Tree Map view is a visualization of the hierarchically organized ontology structure. It visualizes all
entries of the data table dynamically. Each box represents an ontology term, and the box size reflects
the number of members within this term (log transformed). The color corresponds to one of the 35 top
ontology terms and becomes lighter the deeper the respective term is in the hierarchy. After enrichment
calculation, ontology terms with a p-value smaller than the chosen threshold are encircled by dashed
lines.
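The effect of log-transforming the member counts can be illustrated with a short Python sketch. The exact transformation used by the tool is not specified here; the natural logarithm of (1 + count) below is an assumption chosen only to show how large terms are prevented from dominating the map.

```python
import math

def box_size(member_count):
    """Log-transformed box size (assumed form); a term with zero members gets size 0."""
    return math.log1p(member_count)

# A term with 1000 members is 100 times larger than one with 10 members,
# but its box is less than 3 times as large after the transformation.
ratio = box_size(1000) / box_size(10)
```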
3.2.3.3 Enrichment View
The Enrichment View shows the results of the ontology term enrichment calculation in a table format
after the calculation has been performed (see Calculate functional term enrichment below for further
information). The four columns Bin, Bin Name, Bin Enrichment and Proteins contain the following
information: Bin is the code of the ontology term. Bin Name is the name of the ontology term. Bin
Enrichment contains the p-value calculated with a hypergeometric test for the respective functional
term (Rivals, Personnaz et al. 2007). Proteins shows all genes/proteins used for the calculation within this
ontology term.
3.2.4 Protein search dialog
The Protein search dialog is opened by pressing the Add Proteins button in the side bar. It enables the
user to search for specific proteins/genes of interest and to add them to a working data set for further
editing. The search is set up as a contains search, which means that every entry containing the search
phrase is shown in the search results. A search result can be selected and added to the working table. By
default, only synonyms are searched; unchecking the option Search Synonyms Only results in a full text
search (also including the description text of the genes/proteins).
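A contains search of this kind can be sketched in Python as follows. The record layout, the field names and the case-insensitive matching are assumptions made for this illustration; the manual only states that hits contain the search phrase and that descriptions are included when Search Synonyms Only is unchecked.

```python
# Hypothetical records; the field names are illustrative only.
proteins = [
    {"id": "GENE0001.p1", "synonyms": ["GENE0001", "rbcS"],
     "description": "small subunit of RuBisCO"},
    {"id": "GENE0002.p1", "synonyms": ["GENE0002", "psbA"],
     "description": "photosystem II protein D1"},
]

def contains_search(phrase, records, synonyms_only=True):
    """Return the ids of all records in which a searched field contains the phrase."""
    needle = phrase.lower()               # assumption: matching ignores case
    hits = []
    for rec in records:
        fields = list(rec["synonyms"])
        if not synonyms_only:
            fields.append(rec["description"])   # full text search
        if any(needle in field.lower() for field in fields):
            hits.append(rec["id"])
    return hits

by_synonym = contains_search("rbc", proteins)
by_full_text = contains_search("photosystem", proteins, synonyms_only=False)
```

With synonyms_only left at its default, the phrase "photosystem" finds nothing in this example, because it occurs only in a description.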
3.2.5 Calculate functional term enrichment
Functional term enrichment or Gene set enrichment analysis (GSEA) is a computational method that
determines whether an a priori defined set of genes shows statistically significant, concordant differences
between two biological states. For calculating GSEA, the data set must contain a column defining the
gene/protein set (group of genes), for example the cluster membership derived by a cluster algorithm,
encoded as integers (finite number).
In the side bar, the Calculate Enrichment button starts the calculation wizard (enabled only in the Tree
Map view mode). Group Column selects the column containing the set identifiers. Group Value selects the
value identifying the set for which the enrichment is calculated. In the Universe field you define whether
enrichment is calculated in the context of the input data or in the context of all data present for the
respective organism. In the Threshold box you can select a p-value below which an enrichment is considered
significant; the corresponding terms are encircled by dashed lines in the Tree Map visualization. Mid
pValue Threshold sets the number of term members below which a mid p-value is calculated. This is used
to penalize terms having only few members. Min Number in Terms sets the minimum number of members
a term must contain; terms with fewer members are not considered. The p-value
calculation is based on a hypergeometric test (Rivals, Personnaz et al. 2007).
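For readers who want to reproduce such a p-value outside the tool, the following Python sketch computes a one-sided hypergeometric enrichment p-value and the corresponding mid p-value. The symbols are assumptions of this sketch: N is the size of the universe, K the number of universe members annotated with the term, n the size of the selected group and k the number of group members annotated with the term. The mid p-value is implemented in its standard textbook form (the probability of the observed count is weighted by one half); whether the tool uses exactly this form is not confirmed here.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k term members among n genes drawn from a universe of N with K term members."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p(k, N, K, n, mid=False):
    """Upper-tail p-value P(X >= k); with mid=True, P(X = k) is counted only half."""
    upper = min(K, n)
    tail = sum(hypergeom_pmf(i, N, K, n) for i in range(k + 1, upper + 1))
    weight = 0.5 if mid else 1.0
    return tail + weight * hypergeom_pmf(k, N, K, n)

# Toy example: universe of 10 genes, 4 of them in the term,
# group of 3 genes, 2 of them in the term.
p = enrichment_p(2, N=10, K=4, n=3)
p_mid = enrichment_p(2, N=10, K=4, n=3, mid=True)
```

In the toy example the ordinary p-value is 40/120 and the mid p-value is 22/120, since P(X = 2) = 36/120 and P(X = 3) = 4/120.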
3.2.6 Editor
The main concept of the editor is that users can add information according to their user rights. No user
can modify an existing entry. Genes/proteins with ambiguous term assignments are automatically sent
to a consortium of experts, who decide which entry should be considered the valid one. Every change
a user makes is submitted directly to the database, so that by enriching your own data set you
increase the knowledge of the community. The exception here is the guest user, who is only allowed to
see changes older than 6 months.
3.2.6.1 User rights and roles
• Guest
As a guest you cannot edit gene/protein information. In addition, guest users only see entries
older than 6 months.
• Annotator
As an annotator you are allowed to enter functional annotations (MapMan or Localization) and
description as free text.
• SuperAnnotator
As SuperAnnotator you are also allowed to enter gene/protein synonyms.
• Admin
For the admin it is possible to change every field.
• Organism
Users may have access only to some organisms.
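The role model above can be summarized as a small permission table. The following Python sketch is only an illustration: the field names and the can_edit helper are hypothetical, and the organism check is a simplified reading of the statement that users may have access only to some organisms.

```python
# Hypothetical field names for the parts of an entry.
ALL_FIELDS = {"functions", "description", "synonyms", "protein", "gene_model"}

# Fields each role may add information to, following the list above.
EDITABLE = {
    "Guest": set(),
    "Annotator": {"functions", "description"},
    "SuperAnnotator": {"functions", "description", "synonyms"},
    "Admin": ALL_FIELDS,
}

def can_edit(role, field, organism, allowed_organisms):
    """True if the role may edit the field and the user has access to the organism."""
    return organism in allowed_organisms and field in EDITABLE[role]

annotator_synonyms = can_edit("Annotator", "synonyms", "orgA", {"orgA"})
super_synonyms = can_edit("SuperAnnotator", "synonyms", "orgA", {"orgA"})
```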
Once you have pressed the Edit button, a window appears with the following information:
3.2.6.2 Overview
The overview provides a compact view of what is known about the selected gene/protein. In addition, a gene
model picture from http://www.phytozome.net is loaded.
3.2.6.3 Protein
Protein shows the distinct protein identifier, which is the gene identifier extended by “.pX”. X stands
for a number that distinguishes the different protein sequences derived from the same transcript (mature
protein, splicing variant, etc.).
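The “.pX” convention can be handled programmatically, for example when post-processing an exported data table. The Python sketch below splits a protein identifier into its gene identifier and the number X; the regular expression, the function name and the example identifier are assumptions made for illustration.

```python
import re

# Gene identifier, followed by ".p" and a number at the end of the string.
PROTEIN_ID = re.compile(r"^(?P<gene>.+)\.p(?P<variant>\d+)$")

def split_protein_id(protein_id):
    """Split 'GENE.pX' into (gene identifier, X); raise ValueError for other formats."""
    match = PROTEIN_ID.match(protein_id)
    if match is None:
        raise ValueError(f"not a protein identifier of the form GENE.pX: {protein_id!r}")
    return match.group("gene"), int(match.group("variant"))

gene, variant = split_protein_id("GENE0001.p2")
```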
3.2.6.4 Synonyms
Under Synonyms the user (if provided with SuperAnnotator rights) can edit the list of different names
assigned to the selected gene/protein. The Value column contains the name (synonym), Type specifies
the type of the synonym and Source indicates where the entry is derived from.
3.2.6.5 Description
In the Description, DefLine and References fields the user may enter the respective information as free
text. User indicates the name of the user who has entered this information.
3.2.6.6 Functions
Under Functions the associated ontology terms are shown and can be edited by pressing the Add button
(see Bin Chooser below for further details).
3.2.6.7 Gene Model
Gene model provides information on the transcript, i.e., locus identifier and position of the transcript in
the genome.
3.2.7 Bin Chooser
The Bin Chooser allows adding controlled ontology terms (currently MapMan and Localization) to a gene/
protein. In the field Bin Type the current ontology can be selected, and all terms available for this ontology
are shown in the main window. The user may choose among them and add a term to the gene/protein
currently selected. Search Bins filters the selectable terms by a search term. Selected Mapping shows the
term currently selected by the user. Mapping Description allows the user to comment on the annotation as
free text.
Mapping Type is meant to specify whether the annotation made refers to a biological process the gene/
protein participates in, or the molecular function of the gene/protein. As the MapMan ontology is based
on the description of the biological process, the MapMan ontology is used as default.
The Evidence Code is supposed to provide additional information on why the user has assigned a
particular annotation to the gene/protein. The Evidence codes decision tree (shown below) might help to
select a proper evidence code.
3.2.7.1 Evidence codes decision tree
[Figure: Evidence codes decision tree. Source: http://www.geneontology.org/GO.evidence.tree.shtml]
The tree starts from the question “What type of evidence is the annotation based on?” and continues with
yes/no questions in four branches:
• Experimental (wet lab)
  Is annotation based on genetic mutations or allelic variation?
    Is a single gene being mutated or compared to other alleles of the same gene? yes: IMP
    Is more than one gene being mutated in the same strain? yes: IGI
  Is annotation based on a direct 1 to 1 physical interaction with another gene product? yes: IPI
  Is annotation based on a direct assay for the function, process, or component of the gene product? yes: IDA
  Is annotation based on the expression pattern of the gene product? yes: IEP
• Computational method
  Will each annotation be individually reviewed and confirmed by a human annotator?
    yes: curator reviewed annotations (ISS, IGC, ICA)
    no: annotations NOT reviewed by a curator (IEA)
  Is the computation based purely on the sequence of the gene product (or sequence-based mapping files)?
  Does the computation include consideration of the genomic context of the gene? yes: IGC
  Is the computation an integrated analysis, typically including experimental data sets, and often
  including multiple data types? yes: ICA
• Author statement from publication
  Is annotation based on an author statement that cites a published reference as the source of
  information? yes: TAS
  Is annotation based on an author statement that does not cite a published reference as the source of
  information? yes: NAS
• No evidence is available
  Is there a GO annotation in another aspect that allows you to make an inference based on that GO term
  for an aspect without evidence? yes: IC
  Have you been able to find any evidence to support a GO annotation in a given GO aspect?
  [see note on use of ND] no: ND
4 References
Rivals, I., L. Personnaz, et al. (2007). “Enrichment or depletion of a GO category within a class of genes:
which test?” Bioinformatics 23(4): 401-407.
Thimm, O., O. Blasing, et al. (2004). “MAPMAN: a user-driven tool to display genomics data sets onto
diagrams of metabolic pathways and other biological processes.” Plant J 37(6): 914-939.