BIOINFORMATICS APPLICATIONS NOTE
Vol. 25 no. 4 2009, pages 548–549
doi:10.1093/bioinformatics/btp012
Gene expression
The Flannotator—a gene and protein expression annotation tool
for Drosophila melanogaster
E. Ryder1,∗, H. Spriggs1, E. Drummond1, D. St Johnston2 and S. Russell1
1 Department of Genetics, University of Cambridge, Cambridge CB2 3EH and
2 The Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
Received and revised on November 25, 2008; accepted on January 1, 2009
Advance Access publication January 6, 2009
Associate Editor: Joaquin Dopazo
ABSTRACT
Summary: Understanding developmental processes and building
towards integrative systems biology require detailed knowledge
of the spatio-temporal expression of genes and proteins. We
have developed a software package for collecting, storing and
searching the annotation of protein or gene expression patterns in
Drosophila melanogaster. Using standard Drosophila anatomy and
Gene Ontologies, the system can readily capture expression patterns
at any stage of development and in all recognized tissue types as
well as details of sub-cellular localization. The web-based system
allows multiple groups to work in collaboration and share images
and annotation.
Availability: http://www.flannotator.org.uk/
Contact: [email protected]
1 INTRODUCTION
The ability to generate in vivo tagged proteins has tremendous
potential for furthering our understanding of developmental
processes by allowing the characterization of sub-cellular protein
localization and facilitating the isolation of multi-protein complexes.
We have recently embarked upon a large-scale genetic screen,
using a transposon-based strategy, to generate and characterize yellow
fluorescent protein (YFP)-tagged protein trap lines in the model
organism Drosophila melanogaster. Part of this project is a
collaboration among over 30 UK-based laboratories to annotate the
expression patterns of the new lines during fly development.
To aid in the analysis of YFP-trap lines, we have written
software, The Flannotator, that facilitates the annotation of protein
trap expression during all stages of development and in all tissue
types (including sub-cellular location and spatial descriptors) using
standard FlyBase Drosophila anatomy (http://www.flybase.org)
and Gene Ontology (GO) (The Gene Ontology Consortium,
2000). Our web-based system allows multiple groups to work
in collaboration and share images and annotation easily, whilst
still protecting the original data. The Flannotator is written
in a mixture of PHP and JavaScript and is available as
a VMware image (http://www.vmware.com) that includes the
Ubuntu Linux 8.04 operating system (http://www.ubuntu.com),
web server, database and all software required. Such a design
considerably simplifies installation and guards against potential
incompatibilities between Linux versions or distributions. Although
we designed the Flannotator using Drosophila ontologies, the system
can be readily modified to utilize anatomy ontologies from other
species. Both full and demonstration versions are freely available
from http://www.flannotator.org.uk.
∗ To whom correspondence should be addressed.
2 METHODS
2.1 The Flannotator web engine
Due to the complexity of the Flannotator, the system requires a modern web
browser for full functionality: Firefox (all platforms), Google Chrome and
Internet Explorer 6+ support full access to the annotation interface, but older
browsers can be used for general site viewing and searching.
The Flannotator web site is split into various sections that capture
experiment descriptions, allow image uploading and annotation and facilitate
browsing or searching for specific genes or expression patterns. Individual
users are given unique usernames and passwords that provide access to the
site at different levels (annotators, administrators, etc.). Access to specific
data fields such as mapping data or other users’ annotations can be restricted
if required and experiments are defined so that one user cannot edit the
annotations of another.
2.2 Stock and sequence management
Our screen generates new protein trap insertions at random and consequently
each new line must be mapped to determine the identity of the trapped gene.
The Flannotator includes a full stock management system with a recorded
history to easily track changes. Mapping data from DNA sequencing is
imported and the software automatically determines whether an insertion
is within a gene and in the correct frame to produce a functional YFP fusion
based on the genome release loaded into the system. The GBrowse genome
viewer (Stein et al., 2002) is included to provide a graphical overview of
the transposon insertion sites and gene models. The Flannotator currently
includes D. melanogaster release 5.3 annotation (FlyBase) for determining
gene fusions, although different versions or genomes from other species can
also be loaded if required.
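The in-gene and in-frame test described above can be illustrated with a minimal Python sketch. It is not the Flannotator's own code (which is written in PHP), and the details are assumptions: the insertion is taken to land in an intron between coding exons, the trap construct is taken to have a fixed reading-frame phase (0, 1 or 2), and the names GeneModel, intron_phase_at and classify_insertion are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical, simplified gene model: coding parts of exons only."""
    name: str
    strand: str                       # "+" or "-"
    cds_exons: List[Tuple[int, int]]  # 1-based inclusive (start, end), non-overlapping


def intron_phase_at(gene: GeneModel, position: int) -> Optional[int]:
    """Return the phase (0, 1 or 2) of the coding intron containing position,
    or None if the position is not inside such an intron."""
    # Walk the exons in the direction of transcription.
    exons = sorted(gene.cds_exons, reverse=(gene.strand == "-"))
    coding_bases = 0
    for upstream, downstream in zip(exons, exons[1:]):
        coding_bases += upstream[1] - upstream[0] + 1
        if gene.strand == "+":
            intron = (upstream[1] + 1, downstream[0] - 1)
        else:
            intron = (downstream[1] + 1, upstream[0] - 1)
        if intron[0] <= position <= intron[1]:
            # Phase = coding bases already read, modulo the codon length.
            return coding_bases % 3
    return None


def classify_insertion(gene: GeneModel, position: int,
                       insert_strand: str, construct_phase: int) -> str:
    """Classify an insertion relative to one gene (assumed rules, see lead-in)."""
    start = min(s for s, _ in gene.cds_exons)
    end = max(e for _, e in gene.cds_exons)
    if not start <= position <= end:
        return "outside coding region"
    phase = intron_phase_at(gene, position)
    if phase is None:
        return "in coding exon"
    if insert_strand != gene.strand:
        return "wrong orientation"
    if phase != construct_phase:
        return "out of frame"
    return "in frame"


# Toy gene with made-up coordinates, not taken from any genome release.
toy = GeneModel("toy", "+", [(100, 199), (300, 399), (500, 600)])
print(classify_insertion(toy, 250, "+", 1))
```

Reloading a different genome release only changes the GeneModel records, not the test itself, which is consistent with the statement that the result depends on the release loaded into the system.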
2.3 Defining experiments and uploading images
An annotation experiment is defined by the user ID, tissue category, method
of sample preparation and stock name. After defining a few baseline
parameters (e.g. developmental stage being annotated and microscope used),
images are uploaded to the stage chooser page, which consists of an unsorted
bin and other bins based on the particular developmental stages selected
during the setup phase. Initially, images are stored in the unsorted bin and
are then dragged and dropped into the correct stage bins, where they can be viewed
and processed in the main annotation window.
© The Author 2009. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
Fig. 1. Annotation window.
2.4 Annotation using controlled ontologies
Expression patterns are described using fully controlled vocabularies from
the FlyBase anatomy, GO sub-cellular location ontology and defined FlyBase
descriptors. The current FlyBase anatomy ontology consists of over 6000
terms based on a tree structure. While such a comprehensive ontology
is essential to capture a complete description of expression patterns, the
majority of ontology terms will not be relevant to an individual annotator
focused on a particular tissue or developmental stage. To facilitate the user’s
navigation through this complex ontology, terms are placed into ‘tissue type’
groups that are themselves higher level terms (e.g. central nervous system).
Selecting a tissue group populates a menu with only the terms relevant for
a particular annotation. Individuals can select a subset of ontology terms for
a particular annotation task, with information about each term (place in the
ontology tree, descriptions, etc.) easily accessed. A reverse lookup system is
available if a user knows the term they want to use but is unsure which
tissue group(s) it belongs to. Custom restricted lists can be constructed and
saved by individuals, for example including particular developmental stages
or types of microscopy employed. The large number of GO Cellular_Component
ontology terms poses a similar problem: many terms are irrelevant to
Drosophila biology, so for this particular project we use the GO_slim
ontology subset to provide a useful core set of relevant terms. If required,
the full ontology may be used instead.
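As an illustration of the tissue-group menus and the reverse lookup, the following minimal Python sketch treats a tissue group as a higher-level term and fills its menu with that term and everything below it. The child-to-parent table is a toy example, not real FlyBase data, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy child -> parents table (assumed relationships, for illustration only).
PARENTS: Dict[str, Set[str]] = {
    "egg chamber": {"ovary"},
    "oocyte": {"egg chamber"},
    "nurse cell": {"egg chamber"},
    "follicle cell": {"egg chamber"},
    "brain": {"central nervous system"},
    "ventral nerve cord": {"central nervous system"},
}

# Higher-level terms chosen to act as tissue groups (assumed).
TISSUE_GROUPS: Set[str] = {"ovary", "central nervous system"}


def ancestors(term: str) -> Set[str]:
    """All terms above the given term in the ontology."""
    found: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, ()))
    return found


def menu_for_group(group: str) -> List[str]:
    """Terms offered when a tissue group is selected: the group and its descendants."""
    return sorted(t for t in set(PARENTS) | {group}
                  if t == group or group in ancestors(t))


def groups_containing(term: str) -> List[str]:
    """Reverse lookup: which tissue group(s) does a known term belong to?"""
    candidates = ancestors(term) | {term}
    return sorted(candidates & TISSUE_GROUPS)


print(menu_for_group("central nervous system"))
print(groups_containing("oocyte"))
```

Storing a set of parents per term lets a term belong to more than one tissue group, which matches the "tissue group(s)" wording of the reverse lookup.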
During the annotation process (Fig. 1), all terms are selected using menus
to prevent problems associated with free text entry, and tree construction is
restricted to prevent impossible annotations; for example, sub-cellular terms
cannot have anatomy terms as children. The NOT qualifier is also available
to record explicitly where no protein expression is observed.
Specific comments or observations about experiments or individual images
are allowed as free text entry, and moveable arrows and floating comment
boxes can be used to highlight areas of interest in the image. To aid in
complex annotations, sub-trees (e.g. sub-cellular terms and descriptors) from
one anatomy term can be copied and pasted to another, greatly reducing the
time needed. Terms that are used frequently are put into a ‘top
10’ list, making them immediately available for future annotations.
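The restricted tree construction and the sub-tree copy and paste can be sketched in Python as follows. This is an illustrative assumption about how such a constraint might be enforced, not the Flannotator's implementation; only the rule that a sub-cellular term may not have an anatomy child comes from the text, and AnnotationNode and paste_subtree are hypothetical names.

```python
import copy
from dataclasses import dataclass, field
from typing import List

# (parent kind, child kind) pairs that would give an impossible annotation.
FORBIDDEN_CHILD = {("subcellular", "anatomy")}


@dataclass
class AnnotationNode:
    term: str
    kind: str              # "anatomy", "subcellular" or "descriptor"
    negated: bool = False  # True records the NOT qualifier
    children: List["AnnotationNode"] = field(default_factory=list)

    def add_child(self, child: "AnnotationNode") -> "AnnotationNode":
        """Attach a child term, refusing combinations listed as forbidden."""
        if (self.kind, child.kind) in FORBIDDEN_CHILD:
            raise ValueError(
                f"{child.kind} term '{child.term}' cannot be placed under "
                f"{self.kind} term '{self.term}'"
            )
        self.children.append(child)
        return child


def paste_subtree(source: AnnotationNode, target: AnnotationNode) -> None:
    """Copy every sub-tree under source onto target, re-checking each attachment."""
    for child in source.children:
        target.add_child(copy.deepcopy(child))


# Annotate the oocyte once, then reuse the sub-cellular details for the nurse cell.
oocyte = AnnotationNode("oocyte", "anatomy")
nucleus = oocyte.add_child(AnnotationNode("nucleus", "subcellular"))
nucleus.add_child(AnnotationNode("punctate", "descriptor"))
oocyte.add_child(AnnotationNode("cytoplasm", "subcellular", negated=True))

nurse_cell = AnnotationNode("nurse cell", "anatomy")
paste_subtree(oocyte, nurse_cell)
print([(c.term, c.negated) for c in nurse_cell.children])
```

Copying with deepcopy keeps the pasted sub-tree independent of the original, so later edits to one anatomy term's annotation do not silently change the other.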
Fig. 2. Section of an annotation report showing keyword cloud.
2.5 Browsing and querying expression annotation
Summarized information about each protein trap insertion is displayed on
the stock report page, which consists of the genetic and sequence data along
with images and annotations from all users (Fig. 2).
To give a summary view of the collected annotation data, annotation
keyword clouds based on simple anatomy term analysis are provided with
links to lines showing similar clouds. To facilitate subsequent data mining,
various query tools have been developed. In addition to simple gene queries,
complex queries can also be constructed using a tree system similar to the
annotation process, based on AND and OR operators. Controlled vocabulary
terms further down the tree from the original search term are automatically
included in the search (e.g. searching for egg chamber will also include
oocyte, follicle cell and nurse cell). Sequencing and annotation data can
be exported in XML format for downstream processing or for use in other
pipelines.
3 RESULTS
The FlyProt database currently consists of over 10 000 pieces of
annotation and over 11 000 images from more than 25 annotators on
approximately 300 protein trap lines, displaying the robustness and
expandability of the Flannotator system for large projects.
ACKNOWLEDGEMENTS
We would like to acknowledge all of the members of the FlyProt
team: John Roote, Dr Nick Lowe, Dr Kathryn Lilley, Dr Jo Rees,
Ingrid Wesley, Laura Harris, Jane Webster, Glynnis Johnson, Pam
Fletcher, Svenja Hester and Julie Howard. Also our annotators, in
particular Helen White-Cooper who helped drive the Flannotator
through its early versions. Images reproduced with the kind
permission of Roger Guy Phillips, Andrea Brand, Helen White-Cooper,
Alex Gould and Andrew Bailey.
Funding: Wellcome Trust as part of the FlyProt project (#076739).
Conflict of Interest: none declared.
REFERENCES
Stein, L.D. et al. (2002) The generic genome browser: a building block for a model
organism system database. Genome Res., 12, 1599–1610.
The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of
biology. Nat. Genet., 25, 25–29.